Editing Sound Files in Emacs

[Screenshot: Emacs PCM Editing]

I buy quite a lot of vinyl still.  And the hipsterish hipsters have started releasing things on tape, since vinyl is obviously too mainstream.  (I’m wondering when 78s will be making a comeback.)  So to listen to this music I need to sample it and then convert it to flac.

That’s trivial enough since I have a very nice A/D converter, but the main issue is editing the music after sampling it.  I have, perhaps needless to say at this point, written an Emacs mode to do this.

As you can see, the mode is pretty self-explanatory.  It shows the wave forms, and you can zoom and set break points, and split the file up into pieces.  (Then name the files after querying freedb, possibly.)

The mode consists of one Emacs Lisp file and two C programs.  The first, summarize, goes through the PCM file and outputs the “energy level” in each section.  The second, bsplit, is just a fast file splitter.  Oh, and there’s a patch to aplay to allow –-seeking to an arbitrary place so that you can skip around in the file and start playing sound at point.  (I see that I’ve forgotten to submit the patch to the ALSA people, so I did that just now.)
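For the curious, here is roughly what a summarize-style pass could look like.  This is a minimal Python sketch, not the actual C program: it assumes headerless signed 16-bit little-endian PCM, and it assumes that “energy level” means the RMS of each block, which may not be what summarize really computes.  The name block_energies and its parameters are made up for illustration.

```python
import math
import struct


def block_energies(pcm_bytes, block_frames=4410, channels=2):
    """Return one RMS value per block of signed 16-bit little-endian PCM.

    Assumption: raw PCM with no header; a "frame" is one sample per channel.
    """
    block_size = block_frames * channels * 2  # 2 bytes per 16-bit sample
    energies = []
    for start in range(0, len(pcm_bytes), block_size):
        chunk = pcm_bytes[start:start + block_size]
        count = len(chunk) // 2  # whole samples only; drop a trailing odd byte
        if count == 0:
            break
        samples = struct.unpack("<%dh" % count, chunk[:count * 2])
        mean_square = sum(s * s for s in samples) / count
        energies.append(math.sqrt(mean_square))
    return energies
```

With the default of 4410 frames per block, a 44.1 kHz file yields ten values per second, which is plenty for drawing a wave form or looking for quiet stretches.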

The interesting bit about wave.el is that it provides auto-splitting capabilities.  Or at least, it tries to.  It originally had a command for trying to put splitting marks at all points where the sound was “silent” for more than four seconds.  This worked somewhat OK, but what’s “silent” varies from sound source to sound source.  Some tapes are quite hissy.  And when I got a new record player, the background noise level dropped to almost nothing, so calibrating “silence” is boring.
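The “silent for more than four seconds” rule is easy to sketch, which also makes the calibration problem easy to see.  This is a hedged Python illustration, not the code in wave.el; it assumes you already have one energy value per block, and silent_gaps, threshold and min_blocks are hypothetical names.

```python
def silent_gaps(energies, threshold, min_blocks):
    """Return (start, end) block ranges, end exclusive, for every run of at
    least min_blocks consecutive blocks whose energy is below threshold."""
    gaps = []
    run_start = None
    for i, energy in enumerate(energies):
        if energy < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_blocks:
                gaps.append((run_start, i))
            run_start = None
    # A quiet run that lasts until the end of the file counts too.
    if run_start is not None and len(energies) - run_start >= min_blocks:
        gaps.append((run_start, len(energies)))
    return gaps
```

With ten energy values per second, four seconds corresponds to min_blocks=40.  Everything hinges on threshold: set it for a clean record and a hissy tape never registers as silent at all.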

Then I thought of a new approach: I know how many tracks there are in each file.  Say it’s a record album side with five songs.  Then I could tell wave.el “this is five tracks, try to find a likely partition”.  It should snip away the initial “stylus hits the album” bump, and trim away the silence at the end, but otherwise put four sectional marks in the sound where it separates the tracks optimally.

However, I just wasn’t able to implement that in a satisfactory way.  The current wave.el isn’t really usable in automatic mode.

When you look at the sound files visually, it’s pretty obvious to a human bean where the tracks are, usually.  But I’m just not able to figure out a nice algorithm.  (I mean, I haven’t really tried a lot.  I think I spent a day on it last summer, if I remember correctly.)

If anybody has any ideas, I’m all ears.

2 thoughts on “Editing Sound Files in Emacs”

  1. As you go through the file linearly, can you at every point give a number to the likelihood that there is a break between tracks at that point? I guess you can: something like minus the average volume for the last four seconds. Or do some advanced dynamic-programming thing where you optimize the volume times the length of the gap, or something. But start with something simple. Call it S(n) for score at time n.

    Similarly, I bet you can give a number to the likelihood that the initial bump is over, and if you go through the file backwards you could estimate the likelihood that you are within the final silence as something like minus the maximal momentary volume after that point in time. Or something.

    Now set up a 6xN table T where N is the number of time ticks at some suitable granularity, and 6 is four inter-track gaps plus the end of the initial noise and the start of the final silence. The entry T(j,n) is the highest possible sum of gap scores under the condition that the j-th gap occurs at or before time n. Populate the table one time tick at a time: T(0,n) is just the likelihood score for the initial bump being over by that time, T(1,n) is the maximum of T(1,n-1) and T(0,n-1)+S(n), T(2,n) is the maximum of T(2,n-1) and T(1,n-1)+S(n), etc. Except maybe you don't believe in gaps right after each other, so it's more like T(j-1,n-k)+S(n), where k is, like, the minimum plausible length of a track.

    Finally you find the earliest maximal number among T(5,…), which should correspond to the best way of splitting the file into tracks. The time tick corresponding to that entry is the beginning of the final silence. You scan the numbers in T(4,…) back from this time tick to find the beginning of the fifth track (find the earliest time when it's equal to T(5,n)-S(n)). Then you scan T(3,…), etc.

    And, umm, tweak each part to lose the gaps. Or, instead of four gap times have four beginning-of-gap times and four end-of-gap times.
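The table-filling scheme described in this comment can be sketched in Python as follows.  Treat it as one possible reading of the comment, not a tested splitter: the function names, the per-row score lists, and the min_sep parameter (the comment's k) are all assumptions made for illustration, and the score helpers are just the "something simple" suggestions from the comment.

```python
NEG_INF = float("-inf")


def gap_scores(volumes, window):
    """S(n): minus the average volume over the last `window` ticks up to n."""
    scores = []
    for n in range(len(volumes)):
        recent = volumes[max(0, n - window + 1):n + 1]
        scores.append(-sum(recent) / len(recent))
    return scores


def final_silence_scores(volumes):
    """Minus the loudest volume from tick n onward, computed back to front."""
    scores = [0] * len(volumes)
    loudest = 0
    for n in range(len(volumes) - 1, -1, -1):
        loudest = max(loudest, volumes[n])
        scores[n] = -loudest
    return scores


def best_marks(score_rows, min_sep=1):
    """Place one mark per row, in time order and at least min_sep ticks apart,
    so that the sum of the chosen scores is maximal.

    score_rows[j][n] is the score for putting mark j at tick n.
    Returns (marks, total)."""
    num_marks, num_ticks = len(score_rows), len(score_rows[0])
    best = [[NEG_INF] * num_ticks for _ in range(num_marks)]   # the table T(j, n)
    where = [[-1] * num_ticks for _ in range(num_marks)]       # tick chosen for mark j
    for j in range(num_marks):
        for n in range(num_ticks):
            if n > 0:  # option 1: mark j already sits somewhere before n
                best[j][n] = best[j][n - 1]
                where[j][n] = where[j][n - 1]
            if j == 0:
                previous = 0.0
            elif n >= min_sep:
                previous = best[j - 1][n - min_sep]
            else:
                previous = NEG_INF
            candidate = previous + score_rows[j][n]  # option 2: mark j at n
            if candidate > best[j][n]:  # strict, so ties keep the earliest tick
                best[j][n] = candidate
                where[j][n] = n
    if best[-1][-1] == NEG_INF:
        raise ValueError("not enough ticks for that many marks")
    marks = []
    n = num_ticks - 1
    for j in range(num_marks - 1, -1, -1):  # walk the table backwards
        n = where[j][n]
        marks.append(n)
        n -= min_sep
    marks.reverse()
    return marks, best[-1][-1]
```

For an album side with five tracks you would presumably pass six rows: a bump-is-over score, four copies of gap_scores(...), and final_silence_scores(...).  Storing the chosen tick in where replaces the "scan back for the matching value" step, but it recovers the same marks.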

  2. Hm… Math'r'like'not'Us, so I'm not quite sure I understand. 🙂 I think doing the initial bump and the fade out can be done somewhat easily, since these have very predictable shapes.

    Finding the inter-song gaps, however, I thought could be achieved by smoothing the curve (somewhat) and then trying to pinpoint the best minima. But I don't seem to have much luck with that approach.

    Your approach seems to be totally different, which is promising. But I don't think I understand it.
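For comparison, the smooth-then-pick-minima idea from this reply could look something like the sketch below.  It is a guess at the approach in Python, not the code in wave.el; the moving-average smoothing and the names smooth and deepest_minima are assumptions.

```python
def smooth(values, radius):
    """Centered moving average; the window simply shrinks at the edges."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def deepest_minima(values, count):
    """Return the indices of the `count` lowest local minima, in time order."""
    minima = [i for i in range(1, len(values) - 1)
              if values[i] < values[i - 1] and values[i] <= values[i + 1]]
    minima.sort(key=lambda i: values[i])  # lowest energy first
    return sorted(minima[:count])
```

For five tracks you would ask for deepest_minima(smooth(energies, radius), 4).  Nothing here stops two marks from landing in the same quiet passage, or a mark from landing in a quiet bit in the middle of a song, which may be part of why this approach disappoints.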
