2016-02-29 117 views
import pandas as pd 
df1=pd.read_csv('inputfile.txt',names=['chr','start','stop','gene','strand'], delimiter=r'\s+') 
print(df1) 
count =0 
c = 0 
for i in df1: 
    for y in df1: 
     if abs(df1.loc[i,"start"]- df1.loc[y,"stop"]) < 201: 
      if i != y: 
       index 
       c +=1 
print(c) 

Python pandas error. I have a sample input file:

chr15 74436458 74466677 pi-1700016M24Rik.1 - 
chr17 79734018 79754230 pi-Cdc42ep3.1 - 
chr3 124103907 124128909 pi-1700006A11Rik.1 - 
chr5 102261978 102280532 pi-Wdfy3.1 - 
chr6 85061409 85076088 pi-Gm5878.1 - 
chr9 51573456 51661164 pi-Arhgap20.1 + 
chr10 127114107 127132221 pi-Tmem194.1 + 
chr11 103286577 103315010 11-qE1-9443.1 + 
chr11 107855325 107859037 11-qE1-3997.1 + 
chr11 108278889 108286739 11-qE1-252.1 - 
chr12 99620581 99658258 12-qE-23911.1 - 
chr12 99658453 99692927 12-qE-7089.1 + 
chr13 21595489 21598393 13-qA3.1-213.1 - 
chr13 24997468 25026901 13-qA3.1-355.1 + 
chr1 94888921 94893644 1-qD-4525.1 - 
chr13 50363393 50412729 13-qA5-208.1 + 
chr13 50607591 50690856 13-qA5-464.1 - 
chr13 51001008 51029517 13-qA5-703.1 - 
chr13 52192103 52219527 13-qA5-967.1 + 
chr13 53489036 53549907 13-qB1-1517.1 + 
chr14 20445381 20472632 14-qA3-3095.1 - 
chr14 24901215 24939690 14-qA3-19970.1 + 
chr14 25184829 25189036 14-qA3-2286.1 - 
chr14 25244385 25249047 14-qA3-284.1 - 
chr14 45377787 45409614 14-qC1-1261.1 - 
chr14 45546497 45569941 14-qC1-1010.1 + 
chr15 59081442 59106777 15-qD1-17920.1 - 
chr15 59106921 59123501 15-qD1-4001.1 + 
chr15 74466817 74478882 15-qD3-14639.1 + 
chr15 78483658 78500962 15-qE1-8387.1 - 
chr15 79758435 79764840 15-qE1-1119.1 + 
chr1 127071468 127074556 1-qE3-706.1 + 
chr17 22634368 22656090 17-qA3.3-352.1 + 
chr17 27425220 27461973 17-qA3.3-27363.1 - 
chr17 27462141 27504428 17-qA3.3-26735.1 + 
chr17 49251595 49252836 17-qC-935.1 - 
chr17 50378485 50382342 17-qC-59.1 + 
chr17 66556151 66581098 17-qE1.1-7037.1 + 
chr18 67189100 67226114 18-qE1-36451.1 - 
chr18 67226241 67241315 18-qE1-1295.1 + 
chr19 37333596 37338356 19-qC2-1361.1 - 
chr2 92381298 92439234 2-qE1-35981.1 + 
chr2 127517589 127529447 2-qF1-2536.1 + 
chr2 150953183 150984330 2-qG3-1029.1 + 
chr3 20301593 20405121 3-qA2-617.1 - 
chr3 34725552 34777871 3-qA3-2052.1 + 
chr4 57373062 57377138 4-qB3-3994.1 - 
chr4 61881631 61891970 4-qB3-639.1 - 
chr4 61892039 61900375 4-qB3-277.1 + 
chr4 93946842 93998314 4-qC5-17839.1 - 
chr4 123510867 123519209 4-qD2.2-2182.1 - 
chr4 123571373 123573843 4-qD2.2-349.1 - 
chr4 135182710 135186113 4-qD3-2082.1 + 
chr5 113752221 113769115 5-qF-14508.1 - 
chr5 113769157 113794752 5-qF-14224.1 + 
chr5 115284179 115303596 5-qF-4633.1 - 
chr5 137395015 137412982 5-qG2-950.1 + 
chr5 144519247 144527999 5-qG2-2301.1 + 
chr5 150592651 150627915 5-qG3-23659.1 - 
chr6 81843811 81860488 6-qC3-6258.1 - 
chr6 83525934 83538118 6-qC3-100.1 + 
chr6 85937105 85953600 6-qC3-2394.1 - 
chr6 87932334 87944161 6-qD1-2831.1 - 
chr10 18516611 18551736 10-qA3-2592.1 - 
chr6 127726093 127746390 6-qF3-8009.1 - 
chr6 127746448 127791908 6-qF3-28913.1 + 
chr7 60142976 60169237 7-qB5-6255.1 + 
chr7 77019095 77054469 7-qD1-9417.1 - 
chr7 77054649 77111245 7-qD1-16444.1 + 
chr7 80242711 80250159 7-qD1-654.1 - 
chr7 80250197 80271441 7-qD1-19431.1 + 
chr7 80926316 80961355 7-qD2-24830.1 - 
chr1 57405819 57434364 1-qC1.3-637.1 - 
chr7 80961480 80977906 7-qD2-11976.1 + 
chr7 132476266 132493286 7-qF3-3125.1 - 
chr7 132493384 132508334 7-qF3-246.1 + 
chr10 20030311 20032118 10-qA3-143.1 - 
chr8 28403548 28406760 8-qA2-343.1 - 
chr8 38155119 38158009 8-qA4-332.1 - 
chr8 38166951 38168562 8-qA4-155.1 - 
chr8 94713358 94718315 8-qC5-8200.1 + 
chr8 95933840 95951276 8-qC5-2209.1 - 
chr8 112641565 112656356 8-qE1-3748.1 + 
chr9 3184709 3199792 9-qA1-178.1 - 
chr9 54054980 54097630 9-qA5.3-24188.1 - 
chr9 54097752 54117106 9-qA5.3-1495.1 + 
chr9 67539058 67581593 9-qC-31469.1 - 
chr9 67581751 67608736 9-qC-10667.1 + 
chr9 122711578 122714587 9-qF4-150.1 - 
chr10 62114440 62164257 10-qB4-6488.1 + 
chr10 66154778 66160884 10-qB5.1-5404.1 - 
chr10 66161040 66171440 10-qB5.1-221.1 + 
chr10 75300268 75324443 10-qC1-12816.1 + 
chr10 83951038 83967582 10-qC1-117.1 + 
chr10 85211306 85238346 10-qC1-2617.1 + 
chr10 86011423 86054254 10-qC1-1527.1 - 
chr10 86079756 86088620 10-qC1-875.1 + 
chr10 94136457 94151187 10-qC2-545.1 - 
chr11 50755203 50757227 11-qB1.3-590.1 - 

column1=chr 
column2=start 
column3=end 
column4=gene 
column5=orientation 

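I want to find the entries that are on the same chromosome and whose positions differ by at most 200. The code above is what I have so far, and I keep getting an error.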

If anyone can help, please do. KeyError: 'the label [chr] is not in the [index]'

Answer


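The line for i in df1: actually iterates over the columns of your DataFrame, not the rows, so i takes the values 'chr', 'start', and so on, and df1.loc[i, "start"] then fails with the KeyError. What you want is for i in df1.index: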
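To make the difference concrete, here is a minimal sketch of the corrected loop. It reads three rows copied from the sample input through an in-memory buffer instead of inputfile.txt, and it adds a same-chromosome check that the original loop did not have; that check is an assumption based on the wording of the question, not something the answer states.

```python
import io
import pandas as pd

# Three rows copied from the sample input in the question.
sample = io.StringIO(
    "chr15 74436458 74466677 pi-1700016M24Rik.1 -\n"
    "chr15 74466817 74478882 15-qD3-14639.1 +\n"
    "chr17 79734018 79754230 pi-Cdc42ep3.1 -\n"
)
df1 = pd.read_csv(sample, names=['chr', 'start', 'stop', 'gene', 'strand'], sep=r'\s+')

# Iterating over the DataFrame itself yields the column labels,
# which is why df1.loc['chr', 'start'] raised the KeyError.
column_labels = list(df1)
print(column_labels)

# Iterating over df1.index yields row labels, which .loc accepts.
c = 0
for i in df1.index:
    for y in df1.index:
        if i == y:
            continue
        # Assumption: only compare rows on the same chromosome.
        same_chr = df1.loc[i, 'chr'] == df1.loc[y, 'chr']
        if same_chr and abs(df1.loc[i, 'start'] - df1.loc[y, 'stop']) < 201:
            c += 1
print(c)
```

The nested loop is quadratic in the number of rows and calls .loc for every pair, so it is only reasonable for small files like this one.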

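By the way, it is better to do this kind of thing with vectorized operations on the columns rather than iterating like this, if you can, for example: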

import numpy as np 
# Note: this compares start and stop within the same row of df1, not across pairs of rows
c = np.sum(np.abs(df1['start'] - df1['stop']) < 201)
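The one-liner above compares each row with itself, whereas the question asks about pairs of different rows on the same chromosome. One possible way to vectorize that pairwise comparison is sketched below, using groupby plus NumPy broadcasting. The helper name count_close_pairs and the max_gap parameter are hypothetical, introduced only for illustration, and the data is five rows copied from the sample input.

```python
import io
import numpy as np
import pandas as pd

# Five rows copied from the sample input in the question.
sample = io.StringIO(
    "chr15 74436458 74466677 pi-1700016M24Rik.1 -\n"
    "chr15 74466817 74478882 15-qD3-14639.1 +\n"
    "chr12 99620581 99658258 12-qE-23911.1 -\n"
    "chr12 99658453 99692927 12-qE-7089.1 +\n"
    "chr17 79734018 79754230 pi-Cdc42ep3.1 -\n"
)
df1 = pd.read_csv(sample, names=['chr', 'start', 'stop', 'gene', 'strand'], sep=r'\s+')


def count_close_pairs(group, max_gap=200):
    """Count ordered pairs (i, j), i != j, with |start_i - stop_j| <= max_gap."""
    starts = group['start'].to_numpy()
    stops = group['stop'].to_numpy()
    # Broadcasting builds an n x n matrix: diff[i, j] = |start_i - stop_j|.
    diff = np.abs(starts[:, None] - stops[None, :])
    close = diff <= max_gap
    # Ignore comparisons of a row with itself.
    np.fill_diagonal(close, False)
    return int(close.sum())


# Grouping by chromosome restricts the comparison to rows on the same chromosome.
c = sum(count_close_pairs(group) for _, group in df1.groupby('chr'))
print(c)
```

This builds an n x n matrix per chromosome, so memory grows quadratically with the largest group; for a file of this size that is negligible, but for very large inputs a sort-based approach would likely be a better fit.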