It cannot be done completely, because some musical sounds also have frequency components in the same range as voice signals.
The initial criterion should be frequency-based separation of the voice from the music.
PCA, neural networks, wavelets, FFTs:
use any of them.
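
As a rough illustration of the frequency-based idea using an FFT, here is a minimal sketch in Python with NumPy. The function name bandpass_fft and the 300-3400 Hz band (the classic telephone voice band) are my own assumptions, not anything fixed; real vocals extend beyond that band, and any instrument energy inside the band will still leak through, so treat this as a starting point only.

```python
import numpy as np

def bandpass_fft(signal, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Keep only the FFT bins between low_hz and high_hz (assumed voice band).

    signal      : 1-D array of mono samples
    sample_rate : sampling rate in Hz
    low_hz/high_hz are illustrative defaults, not tuned values.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Real-input FFT and the frequency (in Hz) of each bin.
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Zero every bin outside the chosen band (a crude "brick-wall" filter).
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    # Back to the time domain, same length as the input.
    return np.fft.irfft(spectrum * mask, n=len(signal))
```

For example, calling bandpass_fft(samples, 44100) on a mono track keeps roughly the speech band and drops bass and high-frequency content. A brick-wall mask like this can cause ringing, so a smoother filter may sound better in practice.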
If the audio file is stereo, try inverting the phase of one of the two channels and adding it to the other one. Anything mixed equally into both channels (usually the lead vocal) cancels out. It is very simple; the analog karaoke filter works on this principle.
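
The channel-inversion trick can be sketched in a few lines of Python with NumPy. The function name remove_center and the (samples, 2) array layout are assumptions made for this example; loading and saving the audio file is left out.

```python
import numpy as np

def remove_center(stereo):
    """Cancel whatever is identical in both channels of a stereo signal.

    stereo : array of shape (num_samples, 2), columns = left, right
             (this layout is an assumption for the sketch).
    Returns a mono array: left plus phase-inverted right, scaled by 0.5.
    """
    stereo = np.asarray(stereo, dtype=np.float64)
    left = stereo[:, 0]
    right = stereo[:, 1]
    # Inverting the phase of the right channel and adding it to the left
    # is the same as subtracting: center-panned content cancels.
    return 0.5 * (left - right)
```

Note that this removes the center-panned voice rather than isolating it, the output is mono, and other center-panned parts (often bass and kick drum) disappear as well. Stereo reverb on the vocal usually survives.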