We begin with how the project started. We searched the internet for ways to classify gender from speech and found several candidate features, such as MFCCs, short-term energy, and pitch. We finally chose MFCCs because they give results with much better accuracy. Our aim was then to write the code as simply as possible, with as few lines as possible, and we found that using functions helps with that. The functions we defined for this project are
mfcc_r, frames_window_r, trifilter_r, deltamfcc, lab_s, normalize_norm_mean, normalize_norm, SPKmean.
The function mfcc_r generates the MFCCs of a given audio sample. Before the audio sample is passed to mfcc_r, the parts with amplitude below 0.1 (a threshold chosen arbitrarily by observing the audio samples) are removed.
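The amplitude threshold described above can be illustrated with a minimal Python sketch. It is not the project's MATLAB code: the function name remove_low_amplitude is hypothetical, and the sketch assumes that the signal is normalised to the range [-1, 1] and that the threshold is applied sample by sample, which the text does not state.

```python
def remove_low_amplitude(samples, threshold=0.1):
    """Keep only samples whose absolute amplitude is at least `threshold`.

    Assumption: the threshold is applied per sample; the original code
    may instead drop whole low-energy segments.
    """
    return [s for s in samples if abs(s) >= threshold]


# Toy signal: quiet values around a few louder ones.
signal = [0.02, -0.05, 0.4, -0.6, 0.08, 0.3, -0.01]
trimmed = remove_low_amplitude(signal)
print(trimmed)  # [0.4, -0.6, 0.3]
```

Removing quiet parts before feature extraction keeps silent frames from contributing feature vectors that carry no speaker information.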
mfcc_r uses two other functions: frames_window_r for framing and windowing, and trifilter_r, which applies a triangular filter bank to emphasise the spectral amplitudes. mfcc_r first checks that all required inputs are given and valid; if not, it terminates the whole algorithm. Second, it converts the frame shift and frame duration into numbers of samples and derives the required FFT length from the frame duration in samples. Third, it applies pre-emphasis, and the speech is then framed and windowed (frames_window_r). Fourth, it computes the FFT of each frame and multiplies the result by the triangular filter bank (trifilter_r), so that both the amplitudes and the differences between higher and lower amplitudes are amplified. Next, the result is converted to the mel scale, which approximates how human hearing perceives frequency and so suits speech better than a linear frequency scale. Finally, we apply the discrete cosine transform and limit the output to 13 coefficients, since 13 coefficients are generally sufficient for gender classification. These coefficients are called mel-frequency cepstral coefficients (MFCCs).
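The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the mfcc_r implementation: all function names are hypothetical, and the pre-emphasis factor 0.97, the Hamming window, the 26 filters, the 25 ms / 10 ms frame settings, the next-power-of-two FFT length, and the logarithm taken before the DCT are common textbook choices that the text does not confirm. The sketch also follows the usual ordering in which the triangular filters are themselves spaced on the mel scale, and it uses a naive DFT in place of an FFT to stay dependency-free.

```python
import cmath
import math


def pre_emphasis(x, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed value.
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frame_and_window(x, frame_len, frame_shift):
    # Cut the signal into overlapping frames and apply a Hamming window (assumed).
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, frame_shift)]


def power_spectrum(frame, nfft):
    # Naive DFT of the zero-padded frame; real code would call an FFT routine.
    padded = frame + [0.0] * (nfft - len(frame))
    spectrum = []
    for k in range(nfft // 2 + 1):
        acc = sum(padded[n] * cmath.exp(-2j * math.pi * k * n / nfft)
                  for n in range(nfft))
        spectrum.append(abs(acc) ** 2 / nfft)
    return spectrum


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filter_bank(num_filters, nfft, sample_rate):
    # Triangular filters whose centres are equally spaced on the mel scale.
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((nfft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (nfft // 2 + 1)
        for k in range(left, center):       # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def dct_ii(values, num_coeffs):
    # Unnormalised DCT-II, truncated to the first num_coeffs outputs.
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]


def mfcc(x, sample_rate, frame_ms=25, shift_ms=10, num_filters=26, num_coeffs=13):
    frame_len = int(sample_rate * frame_ms / 1000)    # frame duration in samples
    frame_shift = int(sample_rate * shift_ms / 1000)  # frame shift in samples
    nfft = 1 << (frame_len - 1).bit_length()          # next power of two (assumed)
    bank = mel_filter_bank(num_filters, nfft, sample_rate)
    features = []
    for frame in frame_and_window(pre_emphasis(x), frame_len, frame_shift):
        spectrum = power_spectrum(frame, nfft)
        energies = [sum(w * p for w, p in zip(filt, spectrum)) for filt in bank]
        log_energies = [math.log(max(e, 1e-10)) for e in energies]  # avoid log(0)
        features.append(dct_ii(log_energies, num_coeffs))
    return features


# Toy input: 0.1 s of a 200 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
coeffs = mfcc(tone, 8000)
print(len(coeffs), len(coeffs[0]))  # 8 13
```

With these settings each frame is 200 samples long and shifted by 80 samples, so the 800-sample toy signal yields 8 frames of 13 coefficients each.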
We extract the delta-MFCC and delta-delta-MFCC coefficients by passing the MFCCs to the function deltamfcc. Having computed the MFCC, delta-MFCC, and delta-delta-MFCC coefficients, we append them horizontally to form a single feature vector containing all of them. We then use the bag-of-words technique: spherical k-means clustering groups the feature vectors of the whole training data into 2 clusters. We also compute the mean feature vector of all male data samples and of all female data samples. We found that the mean of cluster 1 is close to the female data mean and the mean of cluster 2 is closer to the male data mean, so we treat cluster 1 as female and cluster 2 as male.
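The delta features and the horizontal stacking described above might look like the following Python sketch. It is not the deltamfcc function itself: the names delta and stack_features are hypothetical, and the regression formula with a window of 2 frames and edge frames repeated at the boundaries is a common convention assumed here, since the text does not say how the deltas are computed.

```python
def delta(features, width=2):
    """Regression-based delta over time for a list of per-frame vectors.

    Assumptions: width = 2 frames on each side, and the first and last
    frames are repeated at the boundaries.
    """
    num_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(features[0])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = features[min(t + n, num_frames - 1)][d]
                prev = features[max(t - n, 0)][d]
                acc += n * (nxt - prev)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(mfcc_frames):
    # Append MFCC, delta and delta-delta horizontally, frame by frame.
    d1 = delta(mfcc_frames)
    d2 = delta(d1)
    return [m + a + b for m, a, b in zip(mfcc_frames, d1, d2)]


# Toy "MFCC" matrix: 6 frames of 2 coefficients that grow linearly in time.
toy = [[float(t), 2.0 * t] for t in range(6)]
stacked = stack_features(toy)
print(len(stacked), len(stacked[0]))  # 6 6
```

With 13 MFCCs per frame, the same stacking gives 39 values per frame.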
We take these two cluster means, generated from all male and female data samples, and compare the test data samples against them using the function lab_s. It outputs the percentage of male and the percentage of female in the given test sample and finally outputs male or female by comparing the two percentages.
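The clustering and the percentage-based decision described above can be sketched in Python as follows. This is an illustration under assumptions, not the SPKmean or lab_s code: the function names are hypothetical, the initialisation (first vector plus the vector least similar to it) is chosen only to make the example deterministic, and the decision rule assumes that each test frame votes for the cluster mean with the higher cosine similarity.

```python
import math


def unit(v):
    # Scale a vector to unit length (a zero vector is returned unchanged).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans_2(vectors, iterations=20):
    """Two-cluster spherical k-means: cosine similarity, unit-length centroids."""
    data = [unit(v) for v in vectors]
    first = data[0]
    second = min(data, key=lambda v: dot(v, first))  # least similar to the first
    centroids = [first, second]
    for _ in range(iterations):
        sums = [[0.0] * len(first) for _ in range(2)]
        for v in data:
            k = 0 if dot(v, centroids[0]) >= dot(v, centroids[1]) else 1
            sums[k] = [s + x for s, x in zip(sums[k], v)]
        # Keep the old centroid if a cluster received no members.
        centroids = [unit(s) if any(s) else c for s, c in zip(sums, centroids)]
    return centroids


def classify(frames, female_mean, male_mean):
    # Each frame votes for the closer cluster mean; the percentages decide.
    female_mean, male_mean = unit(female_mean), unit(male_mean)
    female_votes = sum(1 for f in frames
                       if dot(unit(f), female_mean) >= dot(unit(f), male_mean))
    pct_female = 100.0 * female_votes / len(frames)
    pct_male = 100.0 - pct_female
    label = "female" if pct_female > pct_male else "male"
    return label, pct_female, pct_male


# Toy 2-D training vectors; cluster 1 is treated as female, as in the text.
train = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]]
cluster1, cluster2 = spherical_kmeans_2(train)
test_frames = [[0.8, 0.1], [0.9, 0.3], [0.2, 0.7], [1.0, 0.2]]
print(classify(test_frames, cluster1, cluster2))  # ('female', 75.0, 25.0)
```

Spherical k-means compares directions rather than magnitudes, so frames that differ only in overall level fall into the same cluster.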
All of these functions are run from the following MATLAB codes:
gclass_r – used to generate the feature vector for a single sample,
train_r.m – used to generate the feature vectors for the whole set of training data samples,
SPK_m – used to do spherical k-means clustering,
test_dm – used to run all male data samples,
test_df – used to run all female data samples.
From these we finally determine the gender.