Cepstral Analysis for pitch detection
- by Ohmu
Hi!
I'm looking to extract pitches from a sound signal.
Someone on IRC just explained to me how taking a double FFT achieves this. Specifically:
1. take FFT
2. take log of square of absolute value (can be done with lookup table)
3. take another FFT
4. take absolute value
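To make the four steps concrete, here is a minimal sketch in plain Python rather than vDSP. Everything named here (`fft`, `cepstrum`, `estimate_pitch`, `harmonic_frame`, the `fmin`/`fmax` search range and the `eps` guard against `log(0)`) is made up for illustration, and the Hann window on the test frame is an extra assumption that is not part of the recipe above. It only shows the shape of the computation, not a tuned pitch detector.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def cepstrum(frame, eps=1e-12):
    spectrum = fft(frame)                                        # step 1
    log_power = [math.log(abs(c) ** 2 + eps) for c in spectrum]  # step 2, eps avoids log(0)
    return [abs(c) for c in fft(log_power)]                      # steps 3 and 4

def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Pick the largest cepstral value whose lag lies in the fmin..fmax range."""
    ceps = cepstrum(frame)
    lo = max(1, int(sample_rate / fmax))
    hi = min(int(sample_rate / fmin), len(frame) // 2)
    lag = max(range(lo, hi + 1), key=lambda q: ceps[q])
    return sample_rate / lag  # lag is in samples, so this is in Hz

def harmonic_frame(f0, sample_rate, n, harmonics=19):
    """Hypothetical test signal: Hann-windowed sum of harmonics of f0."""
    frame = []
    for i in range(n):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)
        tone = sum(math.sin(2 * math.pi * h * f0 * i / sample_rate) / h
                   for h in range(1, harmonics + 1))
        frame.append(window * tone)
    return frame

print(estimate_pitch(harmonic_frame(210.0, 8000, 1024), 8000))
```

Because the peak is picked at an integer lag, the estimate can only take the values `sample_rate / lag`, which is the resolution concern behind the accuracy question further down.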
I can't understand how I didn't come across this technique earlier. I did a lot of hunting and asking questions; several weeks' worth. More to the point, I can't understand why I didn't think of it.
I am attempting to achieve this with the vDSP library. It looks as though it has functions to handle all of these tasks.
However, I'm wondering about the accuracy of the final result.
I have previously used a technique which scours the frequency bins of a single FFT for local maxima. When it encounters one, it uses a cunning trick (the change in that bin's phase since the previous FFT frame) to place the actual peak more accurately within the bin.
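For reference, the phase trick can be sketched roughly as follows. This is a minimal illustration, not the exact code I used: the names `dft_bin` and `refine_by_phase` are hypothetical, a single-bin DFT stands in for a full FFT to keep it short, and it assumes the two frames are `hop` samples apart with `hop` small enough that the wrapped phase deviation is unambiguous.

```python
import cmath
import math

def dft_bin(frame, k):
    """Value of DFT bin k of frame (a full FFT would give the same number)."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))

def refine_by_phase(frame_a, frame_b, k, hop, sample_rate):
    """Refine the frequency of peak bin k from the phase change between two frames."""
    n = len(frame_a)
    phase_a = cmath.phase(dft_bin(frame_a, k))
    phase_b = cmath.phase(dft_bin(frame_b, k))
    # Phase advance a sinusoid sitting exactly at the bin centre would show.
    expected = 2 * math.pi * k * hop / n
    deviation = (phase_b - phase_a) - expected
    # Wrap the deviation into [-pi, pi).
    deviation = (deviation + math.pi) % (2 * math.pi) - math.pi
    # Convert the leftover phase into a fractional bin offset, then into Hz.
    true_bin = k + deviation * n / (2 * math.pi * hop)
    return true_bin * sample_rate / n

# Example: a 443 Hz sine, 256-sample frames, hop of 64 samples, peak in bin 14.
sr, n, hop = 8000, 256, 64
signal = [math.sin(2 * math.pi * 443.0 * i / sr) for i in range(n + hop)]
print(refine_by_phase(signal[:n], signal[hop:hop + n], 14, hop, sr))
```

The refinement depends entirely on the bin phases of two consecutive frames, so it needs the complex FFT output and not just magnitudes.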
I am worried that this precision will be lost with the technique I'm presenting here.
I guess the same trick could be applied after the second FFT to locate the fundamental accurately. But it looks as though the phase information is lost in step 2, where the absolute value is taken.
As this is a potentially tricky process, could someone with some experience look over what I'm doing and check it for sanity?
Also, I've heard there is an alternative technique that involves fitting a quadratic over neighbouring bins. Is this of comparable accuracy? If so, I would favour it, as it doesn't involve remembering bin phases.
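As I understand it, the quadratic fit amounts to the following: take the peak bin and its two neighbours, fit a parabola through the three values, and read off the vertex. A minimal sketch, where the name `parabolic_peak` is hypothetical and `values` is assumed to hold magnitudes (often log magnitudes) with a local maximum at index `k`:

```python
def parabolic_peak(values, k):
    """Fit a parabola through values[k-1], values[k], values[k+1].

    Returns (fractional_index, interpolated_height) of the vertex.
    """
    a, b, c = values[k - 1], values[k], values[k + 1]
    denom = a - 2 * b + c
    if denom == 0:
        # The three points are collinear, so there is no vertex to find.
        return float(k), b
    offset = 0.5 * (a - c) / denom  # lies in [-0.5, 0.5] when k is a true local maximum
    height = b - 0.25 * (a - c) * offset
    return k + offset, height

# Example: samples of an exact parabola with its vertex at x = 3.3, height 5.
samples = [5 - (x - 3.3) ** 2 for x in range(7)]
print(parabolic_peak(samples, 3))
```

This needs only the three values around the peak from a single frame, so no phases have to be stored between frames.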
So, questions:
Does this approach make sense? Can it be improved?
I'm a bit worried about the log-square component. There seems to be a vDSP function to do exactly that: vDSP_vdbcon. However, there is no indication that it precalculates a log table. I assume it doesn't, since the FFT function requires an explicit precalculation function to be called and its result passed in, and this function doesn't.
Is there some danger of harmonics being picked up?
Is there any cunning way of making vDSP pull out the maxima, biggest first?
Can anyone point me towards some research or literature on this technique?
The main question: is it accurate enough? Can the accuracy be improved? I have just been told by an expert that the accuracy IS INDEED not sufficient. Is this the end of the line?
Pi
PS I get SO annoyed (npi) when I want to create tags, but cannot. :| I have suggested to the maintainers that SO keep track of attempted tags, but I'm sure I was ignored. We need tags for vDSP, Accelerate framework, cepstral analysis.