I know a few solutions for 48 V to 12 V at 3 A.
First, I would like to know what "perfect" means in your request.
Each solution is best suited to a particular application: home, automotive, medical, space, or military.
Your test with the LM2596 failed, and that is to be expected, because the LM2596 does not fit your requirement.
The LM2596 is cheaper than most alternatives, but it is a very old design. It has no soft-start, so it can be destroyed when the input is as high as 48 V.
If you still want it to run reliably from 48 V, here is a trick that has worked in some of my projects:
Add a resistor of a few ohms, rated 3 W or 5 W, in series before the input capacitor of the LM2596. An NTC thermistor is even better.
This makes the input voltage of the LM2596 rise slowly and keeps the charging current into the output capacitor within limits.
If the applied input voltage rises too fast, the LM2596 switch turns on at nearly 100% duty cycle, and the current through the IC is (48 V - 0 V)/(ZL + RL).
This high current causes a surge of power dissipation in the high-side switch of the LM2596, which drops over 2 V, and that destroys the IC.
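To get a feel for what a series resistor does, here is a rough sketch in Python. The 4.7 ohm resistor, 100 uF input capacitor and 80% efficiency are my assumed example values, not numbers from your circuit, and the model ignores the inductor and the capacitor ESR.

```python
def series_resistor_estimate(v_in, r_series, c_in, p_out, efficiency):
    """Rough figures for a resistor placed in series before the input capacitor.

    All arguments are illustrative assumptions: volts, ohms, farads, watts,
    and efficiency as a fraction (0..1).
    """
    # Worst case at power-up: the input capacitor is discharged, so the
    # full input voltage appears across the series resistor.
    i_peak = v_in / r_series
    # RC time constant of the input capacitor charging through the resistor.
    tau = r_series * c_in
    # Steady-state input current drawn by the converter.
    i_in = p_out / (efficiency * v_in)
    # Continuous voltage drop and dissipation in the resistor.
    v_drop = i_in * r_series
    p_resistor = i_in ** 2 * r_series
    return {
        "i_peak_a": i_peak,
        "tau_s": tau,
        "i_in_a": i_in,
        "v_drop_v": v_drop,
        "p_resistor_w": p_resistor,
    }


# Example: 48 V input, 12 V x 1 A output (12 W), assumed component values.
est = series_resistor_estimate(
    v_in=48.0, r_series=4.7, c_in=100e-6, p_out=12.0, efficiency=0.8
)
for name, value in est.items():
    print(f"{name}: {value:.4g}")
```

The resistor costs a little voltage and power in steady state, so its value is a trade-off between the inrush limit and the continuous loss.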
Some counterfeit ICs from unauthorized sellers are really 40 V max parts that have been re-marked as the HV version and sold at a higher price; these fail as well.
Even with this trick, I suggest limiting the output to 12 V at 1 A.
If you need 3 A at 12 V, you should use another solution.
Why?
The datasheet does claim 3 A continuous output current, and you read it correctly, but that is only half the story.
3 A at 3.3 V is fine: 3 A x 3.3 V = ~10 W.
At 12 V x 3 A = 36 W, the story is different.
The problem is the efficiency of the solution.
The high-side switch is a transistor with a voltage drop of about 2 V, which causes high power loss both when switching and when conducting.
The efficiency is normally about 80%, so the lost power, most of it in the IC, is around 20% x 36 W = ~7 W (or 9 W if the 80% is taken as output power over input power).
The IC package is not able to get rid of this much power, so the IC temperature rises until the IC burns out.
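The arithmetic above can be sketched in a few lines of Python. I assume here that efficiency means output power divided by input power, and I use the same 80% for both output voltages only for comparison; the real efficiency depends on the operating point.

```python
def converter_loss(v_out, i_out, efficiency):
    """Return (output power, lost power) in watts.

    Efficiency is assumed to be defined as P_out / P_in, as a fraction.
    """
    p_out = v_out * i_out
    p_in = p_out / efficiency
    return p_out, p_in - p_out


# Compare the two cases at 3 A, both at an assumed 80% efficiency.
for v_out in (3.3, 12.0):
    p_out, p_loss = converter_loss(v_out, 3.0, 0.80)
    print(f"{v_out} V x 3 A: P_out = {p_out:.1f} W, loss = {p_loss:.2f} W")
```

With this definition the 12 V case loses 9 W, a bit more than the quick 20% x 36 W estimate, and several times the loss of the 3.3 V case.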
So, before you choose a solution with a monolithic IC, you must check the efficiency at your operating point, the maximum power dissipation, and the temperature rise per watt of lost power.
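A minimal sketch of that thermal check in Python is below. The thermal resistance (30 C/W), ambient temperature (40 C) and junction limit (125 C) are placeholder assumptions; take the real values from the datasheet of the part and package you use, since the thermal resistance depends strongly on the PCB copper area.

```python
def junction_temperature(p_loss_w, theta_ja_c_per_w, t_ambient_c):
    """Estimated junction temperature: ambient plus loss times thermal resistance."""
    return t_ambient_c + p_loss_w * theta_ja_c_per_w


def max_allowed_loss(t_j_max_c, theta_ja_c_per_w, t_ambient_c):
    """Largest loss the package can dissipate without exceeding t_j_max_c."""
    return (t_j_max_c - t_ambient_c) / theta_ja_c_per_w


# Placeholder values, NOT taken from any datasheet.
THETA_JA = 30.0   # C/W, assumed junction-to-ambient thermal resistance
T_AMBIENT = 40.0  # C, assumed ambient temperature
T_J_MAX = 125.0   # C, assumed maximum junction temperature

t_j = junction_temperature(7.0, THETA_JA, T_AMBIENT)  # ~7 W loss at 12 V x 3 A
p_max = max_allowed_loss(T_J_MAX, THETA_JA, T_AMBIENT)
print(f"Junction temperature at 7 W loss: {t_j:.0f} C")
print(f"Maximum allowed loss: {p_max:.2f} W")
```

With these assumed numbers the package can handle under 3 W, far below the loss at 12 V x 3 A.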
After that you have to take care of output ripple, system efficiency, load-step response, radiated noise, capacitor lifetime, switching frequency, inductor and capacitor selection, the compensation loop, layout, protection, temperature, and cost.
The LM2596 also has no external compensation loop that you could tune for each configuration.
You can consider the TPS54360.