Female to male voice change online

4/16/2024

Suppose that you wanted to take the recording and change the quality of the voice - change it into a male-sounding voice, for instance. How would you go about it? I did this recently for an online audio processing course I took, and since the process touches not just on how audio transformation works, but also on the physics of the human voice, I thought I'd talk about it in this post.

You might guess that to change the female voice in the recording to a male one, we would have to make it lower in pitch. Most women speak at frequencies of around 165 to 255 Hz, while most men speak at frequencies of 85 to 180 Hz - so men's voices are around 1.6 times lower on average. As we saw in the previous post, this is because men tend to have thicker vocal cords, which vibrate at lower frequencies.

Let's take a look at the waveform of a section of the recording, showing how loud the sound is at each point in time:

Waveform of fragment of speech

The simplest way to lower the frequency of this sound would be to stretch out its waveform. Let's stretch it out by a factor of 1.6 along the time axis:

Stretched waveform of fragment of speech

Now each undulation of the wave lasts 1.6 times longer, so the frequency of the wave has decreased by the same factor.

Speech-female.wav by xserra / CC BY, playback speed altered.

Well, the voice is certainly speaking at a lower pitch, but it's also speaking more slowly! That's all right if it's the effect we're going for, but perhaps we want it to speak at the same tempo as the original recording, to portray the same mood. To lower the voice without changing its speed, we have to do something rather more complicated.

In a previous post I talked about the Fourier theorem, which states that all sounds can be broken down into sine waves of varying frequencies; we can compute this breakdown by applying a mathematical algorithm called the Fourier transform. Doing so allows us to obtain a sound's spectrum, which shows us all the frequencies that are present in it, as well as how much of each frequency there is. Here's the spectrum of a small segment of our speech recording:

Now that we can clearly see the frequencies present in the sound, we can directly manipulate them by squishing the spectrum along the horizontal axis so that all the frequencies are 1.6 times lower, while keeping the height of each peak unchanged:

Then we can convert this back into a sound waveform by doing the opposite of the Fourier transform. I've illustrated all this on only one segment of the sound, but to transform the whole sound, we'd have to repeat the process of obtaining the spectrum, squishing it, and converting it back to sound for the consecutive segments that make up the recording.

Speech-female.wav by xserra / CC BY, pitch lowered.

That still doesn't sound that great, does it? Though we've managed to both lower the pitch and keep the speed of the original speech, the voice now has this weird inhuman quality to it. It's a quality that you sometimes hear in special effects in music, but doesn't sound quite natural.

To figure out what's wrong, let's take a closer look at the spectra of the sounds. We see successive peaks that are evenly spaced, indicating the harmonics of the sound. But if we take a step back and look at the overall shape of the spectrum, we see that there are bigger hills and valleys in it, marked out by the orange line in the following pictures:

The overall shape of a spectrum is called the spectral envelope, and the mounds in the envelope are called formants. It turns out that the positions of the formants play a big role in how we perceive a sound. We can distinguish different vowels, for instance, because each vowel has a unique set of formant frequencies (more on that in a future post). More importantly for our problem, male and female voices also have, in general, different formant frequencies and spectral envelopes.

To understand why, we need to know why the formant mounds are present in a voice's spectrum in the first place. In the previous post we saw that the sound produced by our vocal cords sounds like a buzz. The general shape of its spectrum slopes downwards smoothly, with no formant bumps. When this sound goes through the vocal tract (the oral and nasal cavities), however, it is modified to produce different vowels and consonants. It is the presence of the vocal tract that introduces the formants.

How? The answer has to do with a phenomenon called resonance, which describes how objects have certain natural frequencies (called resonant frequencies) at which they like to vibrate. A large object, such as a guitar, tends to vibrate at lower frequencies than a smaller object, such as a ukulele.