I'm trying to normalize an audio file of speech.
Specifically, where an audio file contains peaks in volume, I'm trying to level it out, so the quiet sections are louder, and the peaks are quieter.
I know very little about audio manipulation, beyond what I've learnt from working on this task. Also, my math is embarrassingly weak.
I've done some research, and the Xuggle site provides a sample which shows reducing the volume using the following code: (full version here)
@Override
public void onAudioSamples(IAudioSamplesEvent event)
{
// get the raw audio byes and adjust it's value
ShortBuffer buffer = event.getAudioSamples().getByteBuffer().asShortBuffer();
for (int i = 0; i < buffer.limit(); ++i)
buffer.put(i, (short)(buffer.get(i) * mVolume));
super.onAudioSamples(event);
}
Here, they modify the bytes in getAudioSamples() by a constant of mVolume.
Building on this approach, I've attempted a normalisation modifies the bytes in getAudioSamples() to a normalised value, considering the max/min in the file. (See below for details). I have a simple filter to leave "silence" alone (ie., anything below a value).
I'm finding that the output file is very noisy (ie., the quality is seriously degraded). I assume that the error is either in my normalisation algorithim, or the way I manipulate the bytes. However, I'm unsure of where to go next.
Here's an abridged version of what I'm currently doing.
Step 1: Find peaks in file:
Reads the full audio file, and finds this highest and lowest values of buffer.get() for all AudioSamples
@Override
public void onAudioSamples(IAudioSamplesEvent event) {
IAudioSamples audioSamples = event.getAudioSamples();
ShortBuffer buffer =
audioSamples.getByteBuffer().asShortBuffer();
short min = Short.MAX_VALUE;
short max = Short.MIN_VALUE;
for (int i = 0; i < buffer.limit(); ++i) {
short value = buffer.get(i);
min = (short) Math.min(min, value);
max = (short) Math.max(max, value);
}
// assign of min/max ommitted for brevity.
super.onAudioSamples(event);
}
Step 2: Normalize all values:
In a loop similar to step1, replace the buffer with normalized values, calling:
buffer.put(i, normalize(buffer.get(i));
public short normalize(short value) {
if (isBackgroundNoise(value))
return value;
short rawMin = // min from step1
short rawMax = // max from step1
short targetRangeMin = 1000;
short targetRangeMax = 8000;
int abs = Math.abs(value);
double a = (abs - rawMin) * (targetRangeMax - targetRangeMin);
double b = (rawMax - rawMin);
double result = targetRangeMin + ( a/b );
// Copy the sign of value to result.
result = Math.copySign(result,value);
return (short) result;
}
Questions:
Is this a valid approach for attempting to normalize an audio file?
Is my math in normalize() valid?
Why would this cause the file to become noisy, where a similar approach in the demo code doesn't?