It might help to think about how you would describe a moving human in terms of geometric shapes. Start with the scene as represented by a grid of pixels. A standing human generally appears as a tall, roughly rectangular mass of pixels.
The head is a smaller oval mass, at the top of the body. Face and hair are often different colors.
Arms are long rods which pivot at the shoulder and elbow, and so on for the other body parts.
A moving human usually has moving legs. These are easier to spot when looking from the side.
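As a rough illustration of the "tall rectangular mass" idea, here is a minimal sketch in plain Python. It assumes you already have a binary mask (a grid of 0s and 1s, where 1 marks a subject pixel). The function names and the 1.5 height-to-width threshold are my own choices for illustration, not taken from any particular product or library.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the nonzero pixels, or None if there are none."""
    if not mask or not mask[0]:
        return None
    # Rows and columns that contain at least one subject pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def is_tall_rectangle(mask, min_ratio=1.5):
    """Guess whether the masked blob is taller than it is wide (assumed threshold)."""
    box = bounding_box(mask)
    if box is None:
        return False
    top, left, bottom, right = box
    height = bottom - top + 1
    width = right - left + 1
    return height / width >= min_ratio


# A toy 6x5 mask with a narrow upright blob in columns 1 and 2.
standing = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
print(bounding_box(standing))
print(is_tall_rectangle(standing))
```

This only tests the overall outline. Finding the head, arms and legs as separate shapes would need a good deal more work on top of it.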
The shapes can be picked out more easily if the background is a solid color, and a different hue from the subject.
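To show why a solid background of a different hue helps, here is a small sketch that marks every pixel whose hue is far from the background hue. It assumes the image is a list of rows of (r, g, b) tuples with values from 0 to 255. The names `hue_of` and `foreground_mask`, and the tolerance of 0.1, are hypothetical choices for this example. It uses only `colorsys` from the Python standard library.

```python
import colorsys


def hue_of(rgb):
    """Hue of an (r, g, b) pixel as a number in [0, 1)."""
    r, g, b = (channel / 255.0 for channel in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[0]


def foreground_mask(image, background_rgb, hue_tolerance=0.1):
    """Return a grid of 0/1 values: 1 where the pixel hue differs from the background."""
    bg_hue = hue_of(background_rgb)
    mask = []
    for row in image:
        mask_row = []
        for pixel in row:
            diff = abs(hue_of(pixel) - bg_hue)
            # Hue wraps around, so 0.95 and 0.05 are only 0.1 apart.
            diff = min(diff, 1.0 - diff)
            mask_row.append(1 if diff > hue_tolerance else 0)
        mask.append(mask_row)
    return mask


GREEN = (0, 255, 0)
RED = (255, 0, 0)

# A toy scene: a red subject in the middle column of a green backdrop.
scene = [
    [GREEN, RED, GREEN],
    [GREEN, RED, GREEN],
]
print(foreground_mask(scene, GREEN))
```

Hue is not meaningful for grey, black or white pixels, so a real scene with shadows or pale clothing would need more than this single test.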
Software which can do this needs to be sophisticated. I picture it needing a long and complicated algorithm. There are companies which have done this with a measure of success. They need to charge money for their efforts. I do not think they will share their algorithm for free.