I have a question about feature extraction for speech recognition.
When I looked into which features to extract from the speech signal for speech recognition, many sources suggested using MFCC (in non-noisy conditions) and PLP (in noisy conditions).
In both feature extraction processes, we have to warp the power spectrum of the signal according to the auditory response of the ear (i.e. the mel scale for MFCC or the Bark scale for PLP).
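For concreteness, here is the kind of frequency warping I mean. This is only a minimal sketch: it assumes the commonly quoted mel formula 2595 * log10(1 + f/700) and the Bark approximation 6 * asinh(f/600) that is often associated with PLP. The function names are my own, and actual toolkits may use different variants of these formulas.

```python
import math

def hz_to_mel(f_hz):
    # Commonly quoted mel-scale formula (assumed variant; others exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def hz_to_bark(f_hz):
    # Bark-scale approximation often used with PLP (assumed variant).
    return 6.0 * math.asinh(f_hz / 600.0)

# Print the warped values for a few frequencies to see how the
# high-frequency end of the axis gets compressed.
for f in (100, 500, 1000, 4000, 8000):
    print(f, round(hz_to_mel(f), 1), round(hz_to_bark(f), 2))
```

Both mappings are roughly linear at low frequencies and compress the high frequencies, so equal steps on the warped axis correspond to wider and wider bands in Hz.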
Why do we have to take the ear's auditory response into account when we're doing speech recognition? Please help!