Detecting pauses in a spoken-word audio file using pymad, PCM, VAD, etc.

Posted by james on Stack Overflow
Published on 2010-04-13T00:51:31Z

Filed under: mp3, pcm

First I'll broadly state what I'm trying to do and ask for advice. Then I'll explain my current approach and ask about the specific problems I've run into.

Problem

I have an MP3 file of a person speaking. I'd like to split it up into segments roughly corresponding to a sentence or phrase. (I'd do it manually, but we are talking hours of data.)

If you have advice on how to do this programmatically, or know of existing utilities that can do it, I'd love to hear it. (I'm aware of voice activity detection and have looked into it a bit, but I didn't find any freely available utilities for it.)

Current Approach

I thought the simplest approach would be to scan the MP3 at fixed intervals and find the places where the average volume drops below some threshold. Then I would use an existing utility to cut the MP3 at those locations.
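A minimal sketch of the thresholding step, in Python since pymad is a Python binding. It assumes you already have one volume level per fixed-length window (for example an RMS level in dBFS), and it counts a run of quiet windows as a pause only if the run lasts long enough, so brief dips between words are not mistaken for phrase boundaries. The function names, the -40 dBFS threshold and the 300 ms minimum pause are hypothetical starting values to tune on your own recordings, not settings from any existing tool.

```python
from typing import List, Tuple


def find_pauses(levels_db: List[float], window_ms: int,
                threshold_db: float = -40.0,
                min_pause_ms: int = 300) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for each run of consecutive windows whose
    level is below threshold_db and which lasts at least min_pause_ms.

    Hypothetical defaults: tune threshold_db and min_pause_ms on real data.
    """
    pauses = []
    run_start = None  # index of the first quiet window in the current run
    # The trailing +inf sentinel is never "quiet", so it closes a final run.
    for i, level in enumerate(levels_db + [float("inf")]):
        if level < threshold_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start_ms, end_ms = run_start * window_ms, i * window_ms
            if end_ms - start_ms >= min_pause_ms:
                pauses.append((start_ms, end_ms))
            run_start = None
    return pauses


def cut_points(pauses: List[Tuple[int, int]]) -> List[int]:
    # Cut in the middle of each pause so neither neighbouring phrase is clipped.
    return [(start + end) // 2 for start, end in pauses]
```

For example, with 100 ms windows and the made-up levels [-20, -20, -50, -50, -50, -20, -55, -20], find_pauses returns [(200, 500)] (the single 100 ms dip starting at 600 ms is too short to count), and cut_points turns that into [350], a millisecond offset you could hand to whatever cutting tool you use.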

I've been playing around with pymad, and I believe I've successfully extracted the PCM (pulse-code modulation) data for each frame of the MP3. Now I'm stuck because I can't wrap my head around how the PCM data translates to relative volume. I'm also aware of complicating factors such as multiple channels and byte order (big-endian vs. little-endian).
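One way to deal with channels and byte order, sketched with only the standard library: decode the raw bytes as signed 16-bit integers with an explicit byte order, average the interleaved channels into a single mono stream, and scale to [-1.0, 1.0). The 16-bit stereo layout and the pcm16_to_mono name are assumptions for illustration; check what your decoder actually emits (sample width, channel count, byte order) before relying on it.

```python
import struct
from typing import List


def pcm16_to_mono(raw: bytes, channels: int = 2,
                  little_endian: bool = True) -> List[float]:
    """Decode interleaved signed 16-bit PCM into mono samples in [-1.0, 1.0).

    Assumed layout: one 16-bit sample per channel per frame, channels
    interleaved (L R L R ... for stereo).
    """
    frame_size = 2 * channels  # bytes per frame
    usable = len(raw) - len(raw) % frame_size  # drop a trailing partial frame
    fmt = ("<" if little_endian else ">") + "h" * channels
    mono = []
    for frame in struct.iter_unpack(fmt, raw[:usable]):
        # Average the channels, then divide by 32768 (16-bit full scale).
        mono.append(sum(frame) / channels / 32768.0)
    return mono
```

Byte order matters because the same two bytes mean very different numbers: b'\x40\x00' decodes to 16384 as big-endian but to 64 as little-endian. If the decoder writes samples in the machine's native order, Python's sys.byteorder reports whether that is 'little' or 'big'.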

Advice on how to map a group of PCM samples to a relative volume would be key.
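A common way to reduce a group of samples to one volume figure is the root-mean-square (RMS) level, often expressed in decibels relative to full scale (dBFS) as 20 * log10(RMS). A plain average of the sample values does not work, because the waveform swings above and below zero: the samples 0.5, -0.5, 0.5, -0.5 average to 0.0 even though the signal is loud, while their RMS is 0.5, about -6 dBFS. The sketch below assumes mono samples already scaled to [-1.0, 1.0]; the window_levels_db name, the 20 ms window and the -100 dB floor used for digital silence are arbitrary illustrative choices.

```python
import math
from typing import List


def window_levels_db(samples: List[float], sample_rate: int,
                     window_ms: int = 20,
                     floor_db: float = -100.0) -> List[float]:
    """Return the RMS level in dBFS of each non-overlapping window.

    `samples` are mono values scaled to [-1.0, 1.0]. The last window may be
    shorter than the others if the input does not divide evenly.
    """
    window = max(1, sample_rate * window_ms // 1000)  # samples per window
    levels = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        # Digital silence has rms == 0 and log10(0) is undefined, so clamp it.
        levels.append(20 * math.log10(rms) if rms > 0 else floor_db)
    return levels
```

Because decibels are logarithmic, a fixed threshold in dBFS behaves more evenly across quiet and loud recordings than a threshold on raw sample magnitudes.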

Thanks!

© Stack Overflow or respective owner
