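First, you have to split the signal into small frames of 10 to 30 ms, apply a window function (Hamming is recommended for audio applications), and then compute the Fourier transform of each frame. Once you have the DFT, compute the Mel Frequency Cepstral Coefficients (MFCC) with the following steps:
- Get the power spectrum: |DFT|^2
- Compute a triangular filter bank that maps the Hz scale onto the mel scale, and apply it to the power spectrum
- Take the log of the filtered spectrum
- Apply the discrete cosine transform (DCT)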
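The framing and windowing step is not part of the example that follows, so here is a minimal sketch of it. The function name `frame_power_spectra`, the 25 ms frame length, and the 10 ms hop between frames are my own assumptions for illustration (the text above only asks for frames of 10 to 30 ms), and the signal is assumed to be mono.

```python
import numpy

def frame_power_spectra(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Split a mono signal into overlapping Hamming-windowed frames and
    return one power spectrum (|DFT|^2) per frame, one row per frame."""
    signal = numpy.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // hop_len
    window = numpy.hamming(frame_len)
    # rfft keeps only the bins from 0 Hz up to the Nyquist frequency
    spectra = numpy.empty((num_frames, frame_len // 2 + 1))
    for i in range(num_frames):
        begin = i * hop_len
        frame = signal[begin:begin + frame_len] * window
        spectra[i] = numpy.abs(numpy.fft.rfft(frame)) ** 2
    return spectra

# Usage: one second of a 1 kHz tone sampled at 16 kHz
t = numpy.arange(16000) / 16000.0
spectra = frame_power_spectra(numpy.sin(2 * numpy.pi * 1000 * t), 16000)
print(spectra.shape)
```

Each row of the result is the power spectrum of one frame, so the remaining steps (filter bank, log, DCT) would be applied row by row.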
A Python code example:

    import math
    import numpy
    from scipy.fftpack import dct
    from scipy.io import wavfile

    numCoefficients = 13  # choose the size of the MFCC array
    minHz = 0
    maxHz = 22000

    def freqToMel(freq):
        return 1127.01048 * math.log(1 + freq / 700.0)

    def melToFreq(mel):
        return 700 * (math.exp(mel / 1127.01048) - 1)

    def melFilterBank(blockSize):
        # blockSize is the number of spectrum bins between 0 Hz and Nyquist
        numBands = int(numCoefficients)
        maxMel = int(freqToMel(maxHz))
        minMel = int(freqToMel(minHz))

        # Create a matrix for triangular filters, one row per filter
        filterMatrix = numpy.zeros((numBands, blockSize))

        melRange = numpy.array(range(numBands + 2))
        melCenterFilters = melRange * (maxMel - minMel) / (numBands + 1) + minMel

        # Convert each mel center back to a spectrum bin index; each array
        # entry is the center of one triangular filter. The constant 22050
        # is the Nyquist frequency of a 44.1 kHz signal.
        aux = numpy.log(1 + 1000.0 / 700.0) / 1000.0
        aux = (numpy.exp(melCenterFilters * aux) - 1) / 22050
        aux = 0.5 + 700 * blockSize * aux
        aux = numpy.floor(aux)  # round down
        centerIndex = numpy.array(aux, int)  # get int values

        for i in range(numBands):
            start, centre, end = centerIndex[i:i + 3]
            k1 = numpy.float32(centre - start)
            k2 = numpy.float32(end - centre)
            up = (numpy.array(range(start, centre)) - start) / k1
            down = (end - numpy.array(range(centre, end))) / k2
            filterMatrix[i][start:centre] = up
            filterMatrix[i][centre:end] = down

        return filterMatrix.transpose()

    sampleRate, signal = wavfile.read("file.wav")  # assumes a mono 44.1 kHz file
    # For brevity the whole signal is transformed at once; in practice,
    # run the lines below on each windowed frame.
    complexSpectrum = numpy.fft.rfft(signal)  # bins from 0 Hz to Nyquist only
    powerSpectrum = abs(complexSpectrum) ** 2
    filteredSpectrum = numpy.dot(powerSpectrum, melFilterBank(len(powerSpectrum)))
    logSpectrum = numpy.log(filteredSpectrum)
    dctSpectrum = dct(logSpectrum, type=2)  # MFCC :)

This code is based on the MFCC Vamp example. I hope this helps you!
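The three `aux` lines inside `melFilterBank` can look opaque, so here is a small sketch of what I believe they compute: the inverse mel conversion followed by a mapping from Hz to a spectrum bin index. The names `MEL_SCALE`, `freq_to_mel`, `mel_to_freq`, and `mel_to_bin` are hypothetical and used only for illustration; the default `nyquist=22050.0` mirrors the constant hard-coded in the answer.

```python
import math

MEL_SCALE = 1127.01048  # constant used by freqToMel/melToFreq in the answer

def freq_to_mel(freq):
    return MEL_SCALE * math.log(1 + freq / 700.0)

def mel_to_freq(mel):
    return 700 * (math.exp(mel / MEL_SCALE) - 1)

def mel_to_bin(mel, block_size, nyquist=22050.0):
    """Same arithmetic as the three `aux` lines, written in one step:
    convert mel to Hz, scale Hz to a bin position, round to nearest."""
    return int(math.floor(0.5 + block_size * mel_to_freq(mel) / nyquist))

# log(1 + 1000/700) / 1000 is (to rounding) the reciprocal of MEL_SCALE,
# which is why the answer can use it in place of dividing by 1127.01048.
print(1 / MEL_SCALE, math.log(1 + 1000.0 / 700.0) / 1000.0)

# The scale is built so that 1000 Hz lands at about 1000 mel.
print(freq_to_mel(1000.0), mel_to_freq(freq_to_mel(1000.0)))
```

Seen this way, `centerIndex` is just the list of evenly spaced mel values converted to the nearest bin of a spectrum that spans 0 Hz to 22050 Hz.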
Normally I hate citing Wikipedia for anything technical, but [this page](http://en.wikipedia.org/wiki/Mel-frequency_cepstral_coefficient) basically gives you the steps for getting the coefficients? – Dan 2011-04-30 16:18:08