Access to Audio Samples?

So I thought I’d try to make an automated audio-sync plugin for the video sequence editor. The idea: you sync two audio strips by detecting a shared “clap” sound in both of them. That part would be rather simple and fast to do in numpy.
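For the detection itself, FFT-based cross-correlation of the two signals should do the trick. Here is a minimal numpy sketch of the kind of thing I mean — `find_offset` is just a placeholder name, and cross-correlating the absolute amplitudes as a crude envelope is my own shortcut, not anything Blender provides:

```python
import numpy as np

def find_offset(a, b):
    """Estimate how many samples b lags behind a by cross-correlating
    the absolute amplitudes (a crude envelope).  Sketch only: assumes
    mono float arrays at the same sample rate.  A positive result means
    the clap occurs later in b than in a."""
    n = len(a) + len(b) - 1
    size = 1 << (n - 1).bit_length()      # next power of two, zero-padded
    fa = np.fft.rfft(np.abs(a), size)
    fb = np.fft.rfft(np.abs(b), size)
    corr = np.fft.irfft(fa * np.conj(fb), size)
    lag = int(np.argmax(corr))
    if lag > size // 2:                   # negative lags wrap around the end
        lag -= size
    return -lag
```

With the offset in samples and the strips’ sample rate known, shifting one strip to line up the claps would then be a one-liner.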

Only problem: there is no way to access audio samples from Python! In fact, it would be hard even through the C API. I had somehow assumed there would be something like the pixels attribute on the Image datatype, but there isn’t. And even that property has the slightly annoying habit of copying and converting the whole image data every time it is accessed.

There’s a gazillion ways to load and process sound data in Python. But depending on anything with a C extension beyond what blender provides in its bundled Python is next to impossible. Yes, it may work with builds specialized for a certain operating system / distribution. It may even work on Windows if you happen to know which files to copy where. But describing such an installation process is daunting, at best. It gets even worse if the addon should work with graphicall or gooseberry builds of blender.

The only way I currently see is through an external ffmpeg process, which would have to be installed/configured separately as well. It may be hard to replicate all the sequencer logic to deal with partial strips, and even harder to deal with Metastrips, audio from Scene strips and so on…

So if somebody has a good idea or found some piece of documentation/code I overlooked, please share your thoughts!

That sounds more complicated than what I have in mind. I would want to include the addon in blender itself, and for that to work it needs to be portable and must not require any additional installation/configuration.

The only way I currently see is to use something like pyaudio and/or ffmpeg to convert audio-containing files (including movie clips) to raw samples in memory, and then try to figure out how these strips are related in blender.
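To illustrate the ffmpeg route: have it decode any audio-bearing file to raw mono float samples on stdout and read those straight into numpy. This is only a sketch — it assumes an `ffmpeg` binary is on the PATH, and the helper names are mine:

```python
import subprocess
import numpy as np

def ffmpeg_cmd(path, rate=48000):
    """Build an ffmpeg invocation that decodes the audio of any
    supported file to raw samples on stdout."""
    return ["ffmpeg", "-i", path,
            "-f", "f32le",       # raw little-endian 32-bit float samples
            "-ac", "1",          # downmix to mono
            "-ar", str(rate),    # resample to a known rate
            "pipe:1"]            # write to stdout

def load_samples(path, rate=48000):
    """Run ffmpeg (must be installed and on PATH) and return the
    decoded samples as a numpy float32 array."""
    raw = subprocess.run(ffmpeg_cmd(path, rate),
                         capture_output=True, check=True).stdout
    return np.frombuffer(raw, dtype=np.float32)
```

That covers plain sound strips; mapping the result back onto trimmed strips, Metastrips and Scene strips would still be the hard part.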

In my view, this should not be so complicated… Ideally, something like Image.pixels for sound strips would be nice. But I totally lack the C programming knowledge to make it happen in the API.

My far-future wishlist includes a python-based audio/signal processing compositor node tree. That would be awesome, also for generating sound according to animations.