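The articulation bandwidth of voice is roughly 300 Hz to 3 kHz, so the Nyquist criterion calls for a sampling rate above about 6 kHz, and 8 kHz is the usual practical minimum. (This is for making out words clearly; a lower rate would still work, but it's not optimum.)
A lower sampling rate makes the voice sound noisier, because any content above half the sampling rate folds back into the band you keep. (This effect is known as aliasing.)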
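To make the aliasing point concrete, here is a small sketch in plain Python. The helper name alias_frequency and the 5 kHz test tone are my own choices for illustration, and it assumes no anti-aliasing filter in front of the sampler.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Frequency a pure tone at f_hz appears at after sampling at fs_hz.

    Assumes no anti-aliasing filter: the tone folds to its distance
    from the nearest integer multiple of the sampling rate.
    """
    nearest_multiple = math.floor(f_hz / fs_hz + 0.5) * fs_hz
    return abs(f_hz - nearest_multiple)

fs = 8000.0     # sampling rate discussed above
tone = 5000.0   # hypothetical tone above fs / 2
apparent = alias_frequency(tone, fs)

# The sampled sequences of the real tone and its alias are identical,
# so nothing downstream can tell them apart.
n_samples = 64
original = [math.cos(2 * math.pi * tone * n / fs) for n in range(n_samples)]
folded = [math.cos(2 * math.pi * apparent * n / fs) for n in range(n_samples)]
max_diff = max(abs(a - b) for a, b in zip(original, folded))

print(apparent, max_diff)
```

With these numbers the 5 kHz tone lands on 3 kHz, right inside the voice band, which is why it shows up as noise rather than just disappearing.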
The minimum sampling rate that gives you clear data for a human voice is somewhere around 8k samples/sec, but it really depends on the application. Obviously, the more data you have, the more information you can get out of it. If you want a system that both recognises and reproduces speech, you need a much higher rate, because you need more information to get the output reproduced. For a recognition-only system, on the other hand, 8k samples/sec will do: it satisfies the Nyquist criterion for the 3 kHz voice band, which keeps aliasing down.
As I said earlier, reproduction needs more information and hence a higher sampling rate.
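A quick way to check the Nyquist criterion for a few candidate rates is sketched below. The function name nyquist_headroom and the list of rates (apart from 8 kHz) are just assumptions for illustration; the 3 kHz band edge is the figure quoted above.

```python
def nyquist_headroom(fs_hz, f_max_hz):
    """Margin by which fs_hz exceeds the Nyquist bound 2 * f_max_hz.

    Positive means the band up to f_max_hz can be captured without aliasing.
    """
    return fs_hz - 2.0 * f_max_hz

F_MAX_HZ = 3000.0  # upper edge of the articulation band

# Candidate rates other than 8 kHz are arbitrary examples.
for fs_hz in (5000.0, 6000.0, 8000.0, 16000.0):
    headroom = nyquist_headroom(fs_hz, F_MAX_HZ)
    verdict = "ok" if headroom > 0 else "aliases"
    print(f"{fs_hz:>7.0f} Hz: headroom {headroom:>7.0f} Hz -> {verdict}")
```

The headroom matters in practice because a real anti-aliasing filter needs some room between the top of the band and half the sampling rate to roll off.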
I think you are using LPC (linear predictive coding) coefficients or cepstral coefficients to represent the signal when you train the system, and later on you reproduce the signal from them. In such a case, the more samples you take, the more information the coefficients carry, and the better you can reproduce it.
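In case it helps, here is a minimal sketch of how LPC coefficients can be computed with the autocorrelation method and the Levinson-Durbin recursion. I don't know what your system actually uses, so treat the function names, the order, and the toy signal as assumptions; a real front end would also window the speech into short frames first.

```python
def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag (no normalisation)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for LPC coefficients a[0..order] (a[0] = 1) from autocorrelation r.

    Convention: the prediction of x[n] is -sum(a[k] * x[n - k], k = 1..order).
    Returns the coefficients and the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# Toy signal (an assumption, not speech): a decaying exponential,
# which a first-order predictor models almost exactly.
signal = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorr(signal, 2), 2)
print(coeffs, err)
```

For this toy input the first coefficient comes out very close to -0.9, i.e. the predictor has learned x[n] = 0.9 * x[n-1], and the second is essentially zero.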
I can't say precisely which sampling rate would be good for you; it depends on the quality you want. So my advice is to start with a higher rate, say 15 kHz, and step down (10 kHz and so on) until you reach the lowest rate you are still satisfied with.
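The step-down search can be sketched like this. The is_good_enough callback is hypothetical: it stands in for whatever quality check you use (a listening test, a recognition score, and so on), and the 1 kHz step and 6 kHz floor are arbitrary assumptions.

```python
def lowest_acceptable_rate(start_hz, stop_hz, step_hz, is_good_enough):
    """Walk the sampling rate down from start_hz and return the lowest
    rate that still passes is_good_enough, or None if even start_hz fails.

    Stops at the first failure, assuming quality only gets worse
    as the rate drops.
    """
    best = None
    rate = start_hz
    while rate >= stop_hz:
        if not is_good_enough(rate):
            break
        best = rate
        rate -= step_hz
    return best

# Stand-in quality check for illustration only: pretend anything at or
# above 8 kHz sounds acceptable. Replace with your own evaluation.
def demo_check(rate_hz):
    return rate_hz >= 8000

chosen = lowest_acceptable_rate(15000, 6000, 1000, demo_check)
print(chosen)
```

In a real run you would resample your recordings to each candidate rate (with proper low-pass filtering before decimating) and feed the result through your own system inside the check.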