Fixing a 4-year-old bug in my favorite project
I finally fixed an old bug in pitch-detection, my favorite among all of my open-source projectsOrigin
Pitch-detection (GitHub link), a C++ collection of pitch detection algorithms, is one of my oldest projects whose origin I have described before.
It started with a guitar tuner Android app:
My first project was Pitcha, a pitch detection app for Android phones ... It had a simple UI that displays the note name (and octave) from the pitch detected from the realtime input stream of audio from the microphone, using the McLeod Pitch Method, or MPM.
I began this journey with a confused question about pitch detection on the DSP stackexchange, which led to autocorrelation and the algorithm I would ultimately end up using: the McLeod Pitch Method (MPM).
My next project was pitch-detection, a collection of O(N logN) pitch detection algorithms in C++. The intention was to isolate the pitch tracking part of Pitcha in a library ...
Bug opened in 2019
The GitHub issue was opened in 2019 by a user who described that specific-sized buffers of a sine wave with a 250 Hz pitch resulted in incorrect results:
Expected pitch: 250 Hz
pitch with 2048 samples @ 16000 Hz: -1 Hz
pitch with 256 samples @ 16000 Hz: 1781.45 Hz
pitch with 4096 samples @ 16000 Hz: -1 Hz
pitch with 2048 samples @ 22050 Hz: 286.617 Hz
pitch with 4096 samples @ 22050 Hz: 1100.9 Hz
pitch with 8092 samples @ 22050 Hz: 250.226 Hz
pitch with 8092 samples @ 4800 Hz: 250.356 Hz
pitch with 4046 samples @ 22050 Hz: 250.337 Hz
pitch with 4046 samples @ 4800 Hz: 250.626 Hz
There seems to be an issue with power-of-two buffer sizes for the sinewave affecting the pitch-detection results. At the time, I mentally filed it as an issue with my sine wave generation code! I didn't think the bug was a result of my pitch detection algorithm producing incorrect results for the size of the input sine wave.
Leaping to conclusions
My worst habit in anything computer-related, whether at work or on my personal projects, is leaping to conclusions. You'll see a symptom, think "oh it's obviously component x", and waste your time because if you looked a little deeper and probed from the symptom rather than immediately jumping to a conclusion, you'll realize your first guess may have led you to a false conclusion - and you almost can't escape! Once I decided my sine wave code was wrong, that's all I could focus on
Mission: fix my sine wave generator
From the above jumping to conclusions, I decided that some day I would address the buggy sine wave generator in my code - this is relevant later
Chesteron's fence and the 8092-sized sine wave buffer
My docs and tests used the strange size 8092
for the number of samples in the sine wave that I was passing to the pitch detection function (which would ultimately be the FFT size).
8092 is not a strange number in itself, but for audio applications that involve the FFT (Fast Fourier Transform), it's much more common to see 8192, which is a power-of-two. The FFTW documentation, a legendary and authoritative FFT implementation in C, describes this:
n is the size of the transform. It can be any positive integer. ... Transforms whose sizes are powers of 2 are especially fast.
So, when a user contributed a fix of 8092 to 8192 ~2 years ago, I accepted it.
But, actually, this wasn't really working! If I compiled my own example with an 8192 buffer, or used 8192-sized sine waves in my unit tests, just like the reported GitHub Issue,
Pitchlite and KissFFT: a lucky break
A year ago, a friend asked for some help; they needed fast pitch detection for a realtime audio web application, and were poking around WebAssembly. This is around the same time I got interested in C++ audio processing for WASM and wrote pitchlite, a light copy of pitch-detection: only YIN and MPM, the two fastest algorithms.
I had to replace ffts, the highly optimized Fastest Fourier Transform in the South, which doesn't work in WASM, with the high-quality KissFFT to whom speed is not a goal.
With ChatGPT's help, I was able to represent the real autocorrelation that I use in pitch-detection (the core of both YIN and MPM) with real FFTs, which are faster and take up less space for real-valued inputs:
void pitchlite::acorr_r(const float *audio_buffer, pitchlite::PitchBase *ba)
{
// forward real FFT
kiss_fftr(ba->fft_forward, audio_buffer, ba->out_im.data());
// scale
kiss_fft_cpx scale = {1.0f / (float)(ba->N * 2), 0.0f};
// for each bin in the first half (plus DC and Nyquist bins)
for (int i = 0; i < ba->N / 2 + 1; ++i)
{
// multiply by complex conjugate
ba->out_im[i] =
complex_mult(ba->out_im[i], complex_conj(ba->out_im[i]));
ba->out_im[i] = complex_mult(ba->out_im[i], scale);
}
// inverse real FFT
kiss_fftri(ba->fft_backward, ba->out_im.data(), ba->out_real.data());
}
In pitch-detection, I used the complex FFT for the real autocorrelation. For a reason I couldn't put my finger on, I didn't succeed at using the real FFT in the real autocorrelation: the pitch outputs come out totally wrong even though on paper it should be working fine.
Putting the clues together
I'm slowly circling around the drain of the solution: being unable to use the real-to-complex FFT of ffts in my autocorrelation function without breaking pitch detection was surely a bug
Working on the sine wave generator in Christmas 2023
Around Christmas 2023, my interest in pitch-detection spiked from a conversation on HN, and I started a big overhaul PR. In it, the first thing I did was try to bypass my "buggy" sinewave generator by instead using librosa.tone to generate sine tones, saving them to txt files, and loading those to use in my unit tests. I was shocked to figure out I had the same test failures.
As an inverted test, I used librosa's pitch tracker on my own sinewave generated samples, and the output was correct. This concludes that my sine waves are fine, but my pitch detection code is wrong for specific input sizes.
Rediscovering an old bug in FFTS
Something jogged my memory. I've seen this bug before. I searched the ffts GitHub issues and found this one, where I was shocked - for the second time - to see that I myself wrote a few comments in it in 2019 pinpointing a few cases of incorrect or unexpected outputs from ffts with printed test cases.
To summarize, I printed some statements to show that with nfft of 1022, 1023, 1024, 1025, and 1026, the outputs compared between ffts and MATLAB did not match for nfft=1024, but matched in all other cases.
Fixing the bug for real
Let's put together all of this information:
- For power-of-two input sizes with c2c fft, my pitch code doesn't work
- For non-power-of-two input sizes with c2c fft, my pitch code works
- For non-power-of-two input sizes with r2c fft, my pitch code doesn't work
Now, I want to be able to use the r2c (real-to-complex) transform for a little bit of efficiency; but that's not a big deal in the grand scheme of things, especially considering how fast ffts already is.
The missing 4th item is that by some miracle, the r2c transform makes the pitch code work correctly for power-of-two input sizes.
In my Christmas PR, I was able to contribute this code and now I have the best of all worlds:
- For power-of-two sizes, use the efficient real-to-complex transform (with correct pitch outputs)
- For non-power-of-two sizes, use the less efficient complex-to-complex transform (with correct pitch outputs)