2016-03-18 22 views
1

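How can Python be faster than PyPy?

I ran a piece of code with python2.7 and cProfile reports 35s, while cProfile under pypy reports 73s! Assuming pypy is the faster interpreter, how is this possible? The code implements the BWT (Burrows-Wheeler transform) on an input stream. I have two files: bwt.py, which is called from fm.py. I invoked it like this: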

pypy -m cProfile fm.py inputfile

and then

python -m cProfile fm.py inputfile

The code from bwt.py is as follows:

import random  # needed by writefile()
import string  # needed by writefile()

def rotations(t):
    ''' Return list of rotations of input string t ''' 
    tt = t * 2 
    return [ tt[i:i+len(t)] for i in xrange(0, len(t)) ] 

def bwm(t): 
    return sorted(rotations(t)) 

def bwtViaBwm(t): 
    ''' Given T, returns BWT(T) by way of the BWM ''' 
    return ''.join(map(lambda x: x[-1], bwm(t))) 

def rankBwt(bw): 
    ''' Given BWT string bw, return parallel list of B-ranks. Also 
     returns tots: map from character to # times it appears. ''' 
    tots = dict() 
    ranks = [] 
    for c in bw:
        if c not in tots: tots[c] = 0
        ranks.append(tots[c])
        tots[c] += 1
    return ranks, tots

def firstCol(tots):
    ''' Return map from character to the range of rows prefixed by 
     the character. ''' 
    first = {} 
    totc = 0 
    for c, count in sorted(tots.iteritems()):
        first[c] = (totc, totc + count)
        totc += count
    return first 

def reverseBwt(bw): 
    ''' Make T from BWT(T) ''' 
    ranks, tots = rankBwt(bw) 
    first = firstCol(tots) 
    rowi = 0 # start in first row 
    t = '$' # start with rightmost character 
    while bw[rowi] != '$':
        c = bw[rowi]
        t = c + t # prepend to answer
        # jump to row that starts with c of same rank
        rowi = first[c][0] + ranks[rowi]
    return t 



def suffixArray(s): 
    satups = sorted([(s[i:], i) for i in xrange(0, len(s))]) 
    print satups 
    return map(lambda x: x[1], satups) 

def bwtViaSa(t): 
    # Given T, returns BWT(T) by way of the suffix array 
    bw = [] 
    for si in suffixArray(t):
        if si == 0:
            bw.append('$')
        else:
            bw.append(t[si-1])
    return ''.join(bw) # return string-ized version of list bw 



def readfile(sd):
    s = ""
    with open(sd, 'r') as myfile:
        s = myfile.read()
    return s.rstrip('\n')

def writefile(sd, N):
    with open(sd, "wb") as sink:
        sink.write(''.join(random.choice(string.ascii_uppercase + string.digits) for _ in xrange(N)))
        sink.write('$')
    return



def main(): 
    data=readfile('inp') 
    b=bwtViaBwm(data) 
    ranks,tots = rankBwt(b) 
    print "Input stream = "+ data 
    print "BWT = " + bwtViaSa(data) 
    print '\n'.join(bwm(data)) 
    print ("Lc ranking:") 
    print zip(b,ranks) 

    fc=[x[0] for x in bwm(data)] 
    fc= ''.join(fc) 
    print ("First column="+ fc) 
    ranks,tots = rankBwt(fc) 
    print("Fc ranking:") 
    print zip(fc,ranks) 

    print reverseBwt(bwtViaSa(data)) 

if __name__=='__main__': 
    main() 

This is the code from fm.py, which is what I run through pypy:

import bwt 
import sys 
from collections import Counter 

def build_FM(fname): 
    stream=bwt.readfile(fname) 
    #print bwt.suffixArray(stream) 
    b=bwt.bwtViaBwm(stream) 
    ranks,tots = bwt.rankBwt(b) 
    lc=zip(b,ranks) 
    fc=[x[0] for x in bwt.bwm(stream)] 
    fc= ''.join(fc) 
    fc= zip(fc,ranks) 
    #print lc,fc 


def main(): 
    fname= sys.argv[1] 
    build_FM(fname) 
    return 


if __name__=='__main__': 
    main() 
+0

Please post an example. – kilojoules

+1

If it takes longer to run under pypy, then it seems your assumption is incorrect... for this particular code and data. –

+0

@WilliamPursell Hmm, I thought pypy was always faster. So I was wrong. I need to find out what kind of code pypy excels at. – curious

Answers

2

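PyPy does not guarantee that a program will run faster. First, the optimizations it implements take time to run (sometimes a significant amount of time). Second, not all code runs faster under PyPy, although most does.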
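One way to see whether optimization time matters for a given workload is to time the same call several times within one process and compare the first run against later ones. The sketch below is only an illustration: `time_repeats` and `count_chars` are hypothetical names, not part of the question's code, and `count_chars` is just a stand-in workload. It is written to run on both Python 2.7 and 3.

```python
from timeit import default_timer


def time_repeats(fn, arg, repeats=5):
    """Call fn(arg) `repeats` times; return each call's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = default_timer()
        fn(arg)
        durations.append(default_timer() - start)
    return durations


def count_chars(text):
    # Hypothetical stand-in workload: a tight pure-Python loop.
    totals = {}
    for c in text:
        totals[c] = totals.get(c, 0) + 1
    return totals


if __name__ == '__main__':
    for i, seconds in enumerate(time_repeats(count_chars, 'ACGT' * 100000)):
        print('run %d: %.4f s' % (i, seconds))
```

If later runs are much faster than the first one under pypy, warm-up is a visible part of the total; if all runs are equally slow, the cause lies elsewhere.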

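In addition, the relative speed of profiled code can differ a lot between the two: PyPy's code is low-level, so enabling profiling may slow it down more (relatively speaking) than it slows CPython. What are the results without profiling active?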
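To separate profiler overhead from the program's own running time, the same function can be timed once plainly and once under `cProfile.Profile`. This is a minimal sketch; `timed`, `timed_with_profiler`, `square` and `workload` are hypothetical names chosen for illustration, and the workload is deliberately heavy on function calls because that is where a profiler adds the most cost.

```python
import cProfile
from timeit import default_timer


def timed(fn, *args):
    """Run fn(*args) without a profiler; return (result, seconds)."""
    start = default_timer()
    result = fn(*args)
    return result, default_timer() - start


def timed_with_profiler(fn, *args):
    """Run fn(*args) under cProfile; return (result, seconds)."""
    profiler = cProfile.Profile()
    start = default_timer()
    result = profiler.runcall(fn, *args)
    return result, default_timer() - start


def square(x):
    return x * x


def workload(n):
    # Hypothetical workload: many small function calls.
    return sum(square(i) for i in range(n))


if __name__ == '__main__':
    _, plain = timed(workload, 200000)
    _, profiled = timed_with_profiler(workload, 200000)
    print('plain: %.4f s, profiled: %.4f s' % (plain, profiled))
```

Running this under both interpreters shows how much of each reported time is the profiler itself. For the whole script, running `time python fm.py inputfile` and `time pypy fm.py inputfile` in a Unix shell gives the unprofiled numbers.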

We would need to see your program to say more.

+0

I have edited my question to include the code. – curious

-1

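Your script allocates an insane amount of memory in rotations(): O(N**2), where N is the size of the input file. As you can see from cProfile and vmprof, most of the time is spent there.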
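The quadratic growth comes from rotations() building N slices of N characters each, so a 1 MB input asks for roughly 10**12 characters in total. As a sketch of one way to avoid this (a different technique from the question's code, not a drop-in fix the author used), the rotations can be sorted by index with prefix doubling, which keeps only a few integer lists of length N in memory. The name `bwt_prefix_doubling` is hypothetical.

```python
def bwt_prefix_doubling(t):
    """Return the last column of the sorted rotations of t (the BWT),
    without materializing the rotations themselves."""
    n = len(t)
    if n == 0:
        return ''
    rank = [ord(c) for c in t]   # rank of each rotation by its first character
    order = list(range(n))       # rotation start positions, to be sorted
    k = 1
    while True:
        # Compare rotations by their first 2*k characters, using the
        # ranks of the two k-character halves.
        def key(i):
            return (rank[i], rank[(i + k) % n])

        order.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            prev, cur = order[j - 1], order[j]
            new_rank[cur] = new_rank[prev] + (1 if key(cur) != key(prev) else 0)
        rank = new_rank
        # Stop once all rotations are distinguished, or the whole
        # rotation length has already been compared.
        if rank[order[-1]] == n - 1 or k >= n:
            break
        k *= 2
    # The BWT character of a rotation starting at i is the one before it.
    return ''.join(t[(i - 1) % n] for i in order)
```

For example, `bwt_prefix_doubling('abaaba$')` gives `'abba$aa'`, the same string that `bwtViaBwm` produces for that input. The sort is repeated O(log N) times, so this trades the memory blow-up for extra sorting passes.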

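So what you are seeing is a difference in memory handling between PyPy and CPython. My guess is that you are swapping and that PyPy has the higher memory usage.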
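The swapping guess can be checked by printing the process's peak resident memory at the end of the run under each interpreter. The sketch below assumes a Unix-like system, since the standard-library `resource` module is not available on Windows; `peak_rss_mib` is a hypothetical helper name.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == 'darwin' else 1024.0
    return peak / divisor


if __name__ == '__main__':
    data = ['x' * 1000 for _ in range(10000)]  # allocate something measurable
    print('peak RSS: %.1f MiB' % peak_rss_mib())
```

Calling `peak_rss_mib()` as the last step of `main()` in fm.py and comparing the value with the machine's physical RAM would show whether either interpreter is being pushed into swap.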