The shift registers can only provide 6mA per channel if they are 74HC parts, and 8mA per channel if they are 74AHC. A single channel on its own can provide 35mA, but you should assume all channels are active at once, and in that case the chip would die if every output were pushed beyond the 6mA (or 8mA) figure. If the chip does not die instantly, it will have a reduced life, because it is operating outside of its specification. The maximum total source current for the whole chip is 70mA, and some of that is used by the internal logic and registers. To get the full LED current you will need source and sink transistors. You can use darlington arrays, but I wouldn't. While they are compact, they are expensive and have a high voltage drop.
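To make the limits concrete, here is a rough sketch of the current budget for driving LEDs straight from a 74HC shift register. The three limits are the figures quoted above; the 20mA LED current and the function name `check_direct_drive` are only assumptions for illustration, so check the datasheet of your actual part.

```python
# Limits quoted above for a 74HC shift register (verify against your datasheet).
RATED_MA_PER_OUTPUT = 6      # current each output is meant to drive
MAX_MA_SINGLE_OUTPUT = 35    # the most a single output can provide on its own
MAX_MA_TOTAL = 70            # the most the whole chip can source

def check_direct_drive(led_ma, outputs_on):
    """Return the list of limits exceeded when LEDs hang directly on the outputs."""
    problems = []
    if led_ma > RATED_MA_PER_OUTPUT:
        problems.append("per-output rated current exceeded")
    if led_ma > MAX_MA_SINGLE_OUTPUT:
        problems.append("single-output maximum exceeded")
    total = led_ma * outputs_on
    if total > MAX_MA_TOTAL:
        problems.append(f"total current {total} mA exceeds {MAX_MA_TOTAL} mA")
    return problems

# Assumed case: eight LEDs at 20 mA each on one register.
print(check_direct_drive(led_ma=20, outputs_on=8))
```

With eight LEDs at 20mA the total comes to 160mA, more than double what the chip can source, which is why the transistors are needed.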
I would use three shift registers, 24 P-channel transistors, and 24 current-limiting resistors for the columns. Each P-channel transistor only has to source 20mA, so just about any should do. For the rows I would use one shift register with 8 N-channel transistors. These transistors have to be strong: with every column lit, a single row could have to sink 20mA × 8 × 3 = 480mA (three registers of eight columns each). So a ULN2803 may be the better choice for the rows. You can still drive it with a shift register.
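As a rough sketch of how the four registers might be loaded, the snippet below works out the worst-case row current and packs one scan step into 32 bits. The bit layout (row byte first, then 24 column bits), the active-low column outputs for the P-channel gates, the active-high row outputs, and the names `worst_case_row_sink_ma` and `build_frame` are all assumptions for illustration. Your wiring may well differ.

```python
LED_MA = 20     # current per LED, as used above
COLUMNS = 24    # three shift registers x 8 outputs
ROWS = 8        # one shift register

def worst_case_row_sink_ma(led_ma=LED_MA, columns=COLUMNS):
    """Current one row driver must sink when every column is lit."""
    return led_ma * columns

def build_frame(row, column_pattern):
    """Pack one scan step as [8 row bits][24 column bits].

    Assumed wiring: a P-channel column transistor turns on when its
    shift register output is LOW, and a row driver turns on when its
    output is HIGH. Only one row is enabled per step.
    """
    if not 0 <= row < ROWS:
        raise ValueError("row out of range")
    if not 0 <= column_pattern < (1 << COLUMNS):
        raise ValueError("column pattern out of range")
    column_bits = ~column_pattern & ((1 << COLUMNS) - 1)  # invert: active low
    row_bits = 1 << row                                   # one-hot: active high
    return (row_bits << COLUMNS) | column_bits

print(worst_case_row_sink_ma())          # 480
print(hex(build_frame(2, 0b1)))          # 0x4fffffe
```

The resulting 32-bit value would then be shifted out to the chained registers and latched, once per row, fast enough that the eye sees a steady image.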
This circuit would be easier if you had a common anode matrix. I hope this helped.