Importing sound files into Python as NumPy arrays (alternatives to audiolab)

I've been using Audiolab to import sound files, and it works quite well. However:

It doesn't support some formats, such as MP3, because libsndfile refuses to support them
It doesn't work in Python 2.6 under Windows, and the author isn't around to fix it

In [2]: from scikits import audiolab 
-------------------------------------------------------------------- 

ImportError        Traceback (most recent call last) 

C:\Python26\Scripts\<ipython console> in <module>() 

C:\Python26\lib\site-packages\scikits\audiolab\__init__.py in <module>() 
    23 __version__ = _version 
    24 
---> 25 from pysndfile import formatinfo, sndfile 
    26 from pysndfile import supported_format, supported_endianness, \ 
    27      supported_encoding, PyaudioException, \ 

C:\Python26\lib\site-packages\scikits\audiolab\pysndfile\__init__.py in <module>() 
----> 1 from _sndfile import Sndfile, Format, available_file_formats, available_encodings 
     2 from compat import formatinfo, sndfile, PyaudioException, PyaudioIOError 
     3 from compat import supported_format, supported_endianness, supported_encoding 

ImportError: DLL load failed: The specified module could not be found.

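So I'd like to either:

Figure out why it isn't working in 2.6 (something wrong with _sndfile.pyd?), and maybe find a way to extend it to work with the unsupported formats
Find a complete replacement for audiolab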


2010-03-01 endolith

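This problem is specific to Python 2.6 on Windows (i.e. you won't see it on Python 2.5). I haven't found a way to fix it yet. – 2010-07-22 08:55:07

I finally took the time between two flights to look into it, and it turned out to be a naming error. I've released a new 0.11.0 version, which should fix this issue. – 2010-07-23 13:00:10

David, you've made a great tool in audiolab! I use it a lot. Thanks. – 2010-07-25 02:27:58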

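Lately I've been using PySoundFile instead of audiolab. It installs easily with conda.

Like most things, it does not support MP3. MP3 is no longer patented, so there's no reason it can't be supported; someone just has to write support into libsndfile.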


2018-02-26 14:53:32 endolith

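audiolab works for me on Ubuntu 9.04 with Python 2.6.2, so it might be a Windows problem. In the forum thread you linked, the author also suggests that it is a Windows bug.

In the past, this option has worked for me too: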

from scipy.io import wavfile 
fs, data = wavfile.read(filename)

Just beware that data may have an integer dtype, in which case it is not scaled to [-1,1). For example, if data is int16, you must divide data by 2**15 to scale it to [-1,1).
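The scaling step can be written once for every integer width. The following is a minimal sketch, not part of scipy: `to_float` is a hypothetical helper, and it assumes 8-bit data arrives as unsigned values centred on 128 (which is how `wavfile.read` returns 8-bit WAV data) while wider integer data is signed.

```python
import numpy as np


def to_float(data):
    """Scale integer PCM samples to floats in [-1, 1); pass float data through."""
    data = np.asarray(data)
    if data.dtype == np.uint8:
        # 8-bit WAV samples are unsigned and centred on 128
        return (data.astype(np.float64) - 128.0) / 128.0
    if np.issubdtype(data.dtype, np.signedinteger):
        # int16 is divided by 2**15, int32 by 2**31, and so on
        full_scale = float(2 ** (8 * data.dtype.itemsize - 1))
        return data.astype(np.float64) / full_scale
    # already floating point: just make sure it is float64
    return data.astype(np.float64)
```

Usage would be `fs, data = wavfile.read(filename)` followed by `data = to_float(data)`.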


2010-03-01 22:46:13

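Can scipy.io read 24-bit WAV? – endolith 2010-03-01 22:55:21

I'm not sure about that. 16-bit or 32-bit should be fine, but I don't know about 24-bit. – 2010-03-01 23:07:50

It doesn't read anything properly. Even 16-bit files come out inverted, with wraparound errors for the value -1. 24-bit gets "TypeError: data type not understood". Surely there's something better... – endolith 2010-03-09 05:27:18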
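For the 24-bit case specifically, one workaround is to skip scipy and unpack the 3-byte samples by hand. This is a sketch only, assuming an uncompressed PCM WAV that the standard-library `wave` module can open; `read_wav_24bit` is a hypothetical name, not an existing API.

```python
import wave

import numpy as np


def read_wav_24bit(path):
    """Read a 24-bit PCM WAV file; return (rate, int32 array of shape (frames, channels))."""
    with wave.open(path, "rb") as wf:
        n_channels = wf.getnchannels()
        sampwidth = wf.getsampwidth()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    if sampwidth != 3:
        raise ValueError("expected 3 bytes per sample, got %d" % sampwidth)

    # One row per sample, three little-endian bytes each
    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3)
    # Put the three bytes in the upper part of a 4-byte little-endian word...
    padded = np.zeros((b.shape[0], 4), dtype=np.uint8)
    padded[:, 1:] = b
    # ...then shift right arithmetically by 8 bits so the sign is extended
    samples = padded.view("<i4").reshape(-1) >> 8
    return rate, samples.reshape(-1, n_channels)
```

The values end up in the range [-2**23, 2**23), so dividing by 2**23 scales them to [-1,1).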

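Sox (http://sox.sourceforge.net/) can be your friend here. It can read many different formats and output them as raw data in whatever data type you prefer. In fact, I just wrote code to read a block of data from an audio file into a numpy array.

I decided to go this route for portability (sox is very widely available) and to maximize the flexibility of the input audio types I can use. From initial testing, it doesn't seem noticeably slower for what I'm using it for, which is reading a short stretch (a few seconds) of audio from a very long (hours) file.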

Variables you need:

SOX_EXEC # the sox/sox.exe executable filename 
filename # the audio filename of course 
num_channels # duh... the number of channels 
out_byps # Bytes per sample you want, must be 1, 2, 4, or 8 

start_samp # sample number to start reading at 
len_samp # number of samples to read

The actual code is really simple. If you want to extract the whole file, you can remove the start_samp, len_samp and 'trim' parts.

import subprocess  # needed to run sox
import numpy as NP  # I'm lazy and call numpy NP

cmd = [SOX_EXEC,
       filename,                 # input filename
       '-t', 'raw',              # output file type raw
       '-e', 'signed-integer',   # output encoding: signed ints
       '-L',                     # output little endian
       '-b', str(out_byps * 8),  # output bits per sample
       '-',                      # output to stdout
       'trim', str(start_samp) + 's', str(len_samp) + 's']  # only extract requested part

# frombuffer replaces the deprecated fromstring; // keeps the row count an integer
data = NP.frombuffer(subprocess.check_output(cmd), dtype='<i%d' % out_byps)
data = data.reshape(len(data) // num_channels, num_channels)  # make samples x channels
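The note about dropping 'trim' to read the whole file can be folded into a small command builder. This is a sketch under the same assumptions as the code above (raw, little-endian, signed-integer output on stdout); `build_sox_cmd` is a hypothetical helper, not something sox provides.

```python
def build_sox_cmd(sox_exec, filename, out_byps, start_samp=None, len_samp=None):
    """Build the sox argument list that writes raw signed ints to stdout.

    Leave start_samp and len_samp as None to extract the whole file.
    """
    if out_byps not in (1, 2, 4, 8):
        raise ValueError("out_byps must be 1, 2, 4, or 8")
    cmd = [sox_exec, filename,
           "-t", "raw",              # raw output, no header
           "-e", "signed-integer",   # signed integer samples
           "-L",                     # little endian
           "-b", str(out_byps * 8),  # bits per sample
           "-"]                      # write to stdout
    if start_samp is not None or len_samp is not None:
        # the 's' suffix tells sox the positions are in samples, not seconds
        cmd += ["trim", "%ds" % (start_samp or 0)]
        if len_samp is not None:
            cmd.append("%ds" % len_samp)
    return cmd
```

The result can be passed straight to `subprocess.check_output` as before.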

PS: Here is code to read things from the audio file header using sox...

import subprocess
import sys

# universal_newlines=True makes check_output return text rather than bytes
info = subprocess.check_output([SOX_EXEC, '--i', filename], universal_newlines=True)
comments = ''
other = ''
output_unhandled = False  # set to True to echo unrecognized header lines
reading_comments_flag = False
for l in info.splitlines():
    if not l.strip():
        continue
    if reading_comments_flag:
        # everything after the 'Comments' line belongs to the comment block
        if comments:
            comments += '\n'
        comments += l
    elif l.startswith('Input File'):
        input_file = l.split(':', 1)[1].strip()[1:-1]  # drop the surrounding quotes
    elif l.startswith('Channels'):
        num_channels = int(l.split(':', 1)[1].strip())
    elif l.startswith('Sample Rate'):
        sample_rate = int(l.split(':', 1)[1].strip())
    elif l.startswith('Precision'):
        bits_per_sample = int(l.split(':', 1)[1].strip()[0:-4])  # drop the '-bit' suffix
    elif l.startswith('Duration'):
        tmp = l.split(':', 1)[1].strip()
        tmp = tmp.split('=', 1)
        duration_time = tmp[0]
        duration_samples = int(tmp[1].split(None, 1)[0])
    elif l.startswith('Sample Encoding'):
        encoding = l.split(':', 1)[1].strip()
    elif l.startswith('Comments'):
        comments = ''
        reading_comments_flag = True
    else:
        # keep anything unrecognized so it is not silently lost
        if other:
            other += '\n' + l
        else:
            other = l
        if output_unhandled:
            print("Unhandled:", l, file=sys.stderr)


2012-03-21 04:40:56 travc

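Interesting, although kind of kludgy, and maybe not cross-platform? There's [pysox](http://pypi.python.org/pypi/pysox) for interfacing directly with the [libSoX](http://sox.sourceforge.net/libsox.html) library. It looks like [SoX supports a bunch of formats on its own](http://sox.sourceforge.net/Docs/Features) and can use several other libraries for more. I've had a lot of trouble getting audiolab to work, and it doesn't support MP3 etc., so pysox might be worth a try. – endolith 2012-03-21 15:41:32

I'll take a look at pysox... thanks. The subprocess approach with sox isn't really pythonic or pretty, but it is very powerful and relatively portable (since sox binaries/installers can be found for most systems). – travc 2012-04-21 08:56:27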

FFmpeg supports MP3 and works on Windows (http://zulko.github.io/blog/2013/10/04/read-and-write-audio-files-in-python-using-ffmpeg/).

Reading an MP3 file:

import subprocess as sp 

FFMPEG_BIN = "ffmpeg.exe" 

command = [ FFMPEG_BIN, 
     '-i', 'mySong.mp3', 
     '-f', 's16le', 
     '-acodec', 'pcm_s16le', 
     '-ar', '44100', # output will have 44100 Hz 
     '-ac', '2', # stereo (set to '1' for mono) 
     '-'] 
pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

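Convert the data into a NumPy array:

# 88200 stereo frames of 4 bytes each (2 seconds at 44100 Hz)
raw_audio = pipe.stdout.read(88200*4) 

import numpy 

# frombuffer replaces the deprecated fromstring; // keeps the row count an integer
audio_array = numpy.frombuffer(raw_audio, dtype="int16") 
audio_array = audio_array.reshape((len(audio_array)//2, 2))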
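The read above pulls a fixed number of bytes. To get the whole stream, one option is to keep reading until the pipe is exhausted. This is a sketch only: `read_pcm_s16le` is a hypothetical helper that works on any binary file-like object, and it assumes the data is 16-bit little-endian PCM as requested from FFmpeg with `-f s16le`.

```python
import numpy as np


def read_pcm_s16le(stream, n_channels=2, chunk_frames=44100):
    """Read 16-bit little-endian PCM from a binary stream until EOF.

    Returns an int16 array of shape (n_frames, n_channels).
    """
    frame_bytes = 2 * n_channels  # 2 bytes per sample, one sample per channel
    chunks = []
    while True:
        buf = stream.read(chunk_frames * frame_bytes)
        if not buf:  # an empty read means end of stream
            break
        chunks.append(buf)
    raw = b"".join(chunks)
    # drop a trailing partial frame, if any, so the reshape cannot fail
    raw = raw[:len(raw) - len(raw) % frame_bytes]
    return np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)
```

With the `Popen` object from above this would be called as `audio_array = read_pcm_s16le(pipe.stdout)`.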


2016-06-01 15:38:24

If you want to do this for MP3

Here's what I use: it uses pydub and scipy.

Full installation (on Mac; it may differ on other systems):

import os
import tempfile

import pydub
import scipy.io.wavfile


def read_mp3(file_path, as_float=False):
    """
    Read an MP3 file into NumPy data.
    :param file_path: String path to a file
    :param as_float: Cast data to float and normalize to [-1, 1)
    :return: Tuple(rate, data), where
        rate is an integer indicating samples/s
        data is an ndarray(n_samples, 2)[int16] if as_float = False
            otherwise ndarray(n_samples, 2)[float] in range [-1, 1)
    """

    path, ext = os.path.splitext(file_path)
    assert ext == '.mp3'
    mp3 = pydub.AudioSegment.from_mp3(file_path)  # was FILEPATH, an undefined name
    fd, path = tempfile.mkstemp()
    os.close(fd)  # only the path is needed; pydub opens the file itself
    mp3.export(path, format="wav").close()  # export returns an open file handle
    rate, data = scipy.io.wavfile.read(path)
    os.remove(path)
    if as_float:
        data = data / float(2**15)  # float division under Python 2 as well
    return rate, data

Credit to James Thompson's blog


2018-02-26 06:37:39 Peter
