What a good question!
There are many humans who can't tell the difference between crying and laughing from the same person (sometimes even when they know that person very well). So this raises the question: how could a person programme a machine to recognise a difference that they cannot recognise themselves?
I will be very interested to understand what you think Matlab, or any other IT solution, can do that its (human) programmers cannot. That would require access to a reliable dataset of voice recordings that have already been labelled reliably, accurately, consistently and universally. Let's not even worry about which technology was used in such studies - the point is: can the distinction be made to an adequate standard of repeatability, reliability and confidence?
(Note: I agree that any one specific baby/person may have distinct personal characteristics which differentiate crying from laughing, and that difference could be coded algorithmically, but I don't believe there is a generic distinction in the audio domain. I'd recommend speaking to some actors and some child psychologists before trying to define the problem in the audio domain.)
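To illustrate the per-person case conceded in the note above - where one speaker's crying and laughing do differ on some measurable audio feature - here is a minimal sketch in Python (standing in for Matlab). Everything in it is an assumption for illustration only: the signals are synthetic sine tones, not real recordings; the feature is zero-crossing rate, a crude pitch proxy; and the premise that this hypothetical speaker's cries sit at a higher pitch than their laughs is invented purely to make the toy classifier work. None of this claims a generic cry/laugh distinction exists.

```python
import math

SR = 8000  # sample rate in Hz; arbitrary for this toy example


def tone(freq, dur):
    """Pure sine tone: a synthetic stand-in for a real voice recording."""
    return [math.sin(2 * math.pi * freq * n / SR) for n in range(int(SR * dur))]


def zcr(sig):
    """Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    For a pure tone this is roughly 2 * freq / SR, so it tracks pitch."""
    crossings = sum(1 for a, b in zip(sig, sig[1:]) if (a >= 0) != (b >= 0))
    return crossings / len(sig)


# Invented premise: for THIS one speaker, cries are higher-pitched than laughs.
cry_examples = [tone(f, 0.5) for f in (500, 520, 540)]
laugh_examples = [tone(f, 0.5) for f in (200, 220, 240)]

# Nearest-centroid "training": average the feature per class.
cry_centroid = sum(zcr(s) for s in cry_examples) / len(cry_examples)
laugh_centroid = sum(zcr(s) for s in laugh_examples) / len(laugh_examples)


def classify(sig):
    """Label a signal by whichever class centroid its ZCR is closer to."""
    f = zcr(sig)
    return "cry" if abs(f - cry_centroid) < abs(f - laugh_centroid) else "laugh"


print(classify(tone(510, 0.5)))   # near the "cry" training pitches -> cry
print(classify(tone(210, 0.5)))   # near the "laugh" training pitches -> laugh
```

The point of the sketch is that the code only works because the per-speaker feature gap was baked in by hand; replace the synthetic tones with real recordings from arbitrary people and there is no guarantee any single audio feature separates the two classes - which is exactly the objection raised above.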