Having trouble grouping data from a txt file

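I'm a beginner coder, and my project requires me to group the data in a text file. The txt file I'm opening looks like this (this isn't exactly what the file looks like: when I copied and pasted it, it looked too messy, and for some reason there was an extra column filled with nothing but the word 'map'):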

MAG  UTC DATE-TIME    LAT   LON  DEPTH Region 
4.3 2014/03/12 20:16:59  25.423  -109.730 10.0  GULF OF CALIFORNIA     
5.2 2014/03/12 20:09:55  36.747  144.050 24.2  JAPAN 
5.0 2014/03/12 20:08:25  35.775  141.893 24.5  JAPAN 
4.8 2014/03/12 19:59:01  38.101  142.840 17.6  Japan 
4.6 2014/03/12 19:55:28  37.400  142.384 24.7  JAPAN 
5.0 2014/03/12 19:45:19  -6.187  154.385 62.0  GUINEA

The output I want is this:

[[JAPAN, '4.3', '5.2', '5.0', '4.8', '4.6'], [GULF OF CALIFORNIA, 4.3], [GUINEA, 5.0]]

My current code (vlist[7:] in the first for loop gives me the region names, and j[1] in the second for loop gives me the magnitude numbers):

def myOpen(filepointer):
    header = filepointer.readline()  # skip the header line
    regions = []    # gathers up all the region names without repeating them
    maglist = []    # matches names with numbers
    filelines = []  # list of lines in the txt file

    for aline in filepointer:  # reads each line
        vlist = aline.split()  # turns the line into a list
        filelines.append(vlist)
        if vlist[7:] not in regions:  # makes a list of names without repeats
            regions.append(vlist[7:])
            regions.sort()

    for j in filelines:  # gets each file line
        for names in regions:  # each name
            if names == j[7:]:
                num = j[1]
                names.append(float(num))
                maglist.append(names)
    return maglist


def main():
    myFile = open('earthquakes.txt', 'r')
    quakes = myOpen(myFile)
    myFile.close()
    print(quakes)

main()

It gives this output:

[[JAPAN, '4.3'], [GULF OF CALIFORNIA, 4.3], [GUINEA, 5.0]]

I don't know why it only picks up the first magnitude for each region and not the others.
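A likely cause, reading the code above: `names.append(float(num))` appends the magnitude to the very list object stored in `regions`. After the first match that entry becomes something like `['JAPAN', 5.2]`, so it no longer equals `j[7:]` (`['JAPAN']`) for the later Japan lines. Also, `['Japan']` and `['JAPAN']` compare unequal, so the mixed-case line becomes a region of its own. Below is a minimal sketch of one fix, assuming the column layout shown above (magnitude first, region from the 7th field on): keep a separate dict from region name to magnitudes instead of mutating the region lists. `group_magnitudes` and its `mag_col`/`region_start` parameters are hypothetical names, not part of the question.

```python
def group_magnitudes(lines, mag_col=0, region_start=6):
    """Hypothetical helper: group magnitude strings by region name."""
    groups = {}  # region name -> list of magnitudes (dicts keep insertion order in Python 3.7+)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Join the region words into one key; upper() merges 'Japan' and 'JAPAN'
        region = " ".join(fields[region_start:]).upper()
        groups.setdefault(region, []).append(fields[mag_col])
    # Build fresh lists for the result instead of appending to the keys
    return [[region] + mags for region, mags in groups.items()]


sample = """MAG  UTC DATE-TIME    LAT   LON  DEPTH Region
4.3 2014/03/12 20:16:59  25.423  -109.730 10.0  GULF OF CALIFORNIA
5.2 2014/03/12 20:09:55  36.747  144.050 24.2  JAPAN
5.0 2014/03/12 20:08:25  35.775  141.893 24.5  JAPAN
4.8 2014/03/12 19:59:01  38.101  142.840 17.6  Japan
4.6 2014/03/12 19:55:28  37.400  142.384 24.7  JAPAN
5.0 2014/03/12 19:45:19  -6.187  154.385 62.0  GUINEA
"""
lines = sample.splitlines()[1:]  # drop the header line
print(group_magnitudes(lines))
# [['GULF OF CALIFORNIA', '4.3'], ['JAPAN', '5.2', '5.0', '4.8', '4.6'], ['GUINEA', '5.0']]
```

For the real file, which according to the question has one extra column (hence `vlist[7:]` and `j[1]`), the equivalent call would presumably be `group_magnitudes(filepointer, mag_col=1, region_start=7)` after `filepointer.readline()` has consumed the header; a file object iterates line by line, so it can be passed directly.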


2014-12-03 Cast

Did you check my code? – Hackaholic 2014-12-03 10:29:41

Here you go, using itertools.groupby, lambda, map, str.split, str.lower and str.join.
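One property of itertools.groupby matters here: it only merges runs of consecutive items with equal keys, which is why the data has to be sorted by region before grouping. A tiny illustration with made-up region names:

```python
import itertools

regions = ["japan", "guinea", "japan"]

# groupby only merges *consecutive* equal items, so 'japan' appears twice...
unsorted_groups = [(key, len(list(group))) for key, group in itertools.groupby(regions)]
# ...while sorting first puts equal regions next to each other
sorted_groups = [(key, len(list(group))) for key, group in itertools.groupby(sorted(regions))]

print(unsorted_groups)  # [('japan', 1), ('guinea', 1), ('japan', 1)]
print(sorted_groups)    # [('guinea', 1), ('japan', 2)]
```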

If your file looks like this:

MAG  UTC DATE-TIME    LAT   LON  DEPTH Region 
4.3 2014/03/12 20:16:59  25.423  -109.730 10.0  GULF OF CALIFORNIA 
5.2 2014/03/12 20:09:55  36.747  144.050 24.2  JAPAN 
5.0 2014/03/12 20:08:25  35.775  141.893 24.5  JAPAN 
4.8 2014/03/12 19:59:01  38.101  142.840 17.6  Japan 
4.6 2014/03/12 19:55:28  37.400  142.384 24.7  JAPAN 
5.0 2014/03/12 19:45:19  -6.187  154.385 62.0  GUINEA

here is working code:

>>> import itertools 
>>> f = open('file.txt') 
>>> [[" ".join(x),list(map(lambda z:z[0],list(y)))] for x,y in itertools.groupby(sorted(list(map(str.split,map(str.lower,list(f)[1:]))),key=lambda x:" ".join(x[6:])),key=lambda x:x[6:])] 
[['guinea', ['5.0']], ['gulf of california', ['4.3']], ['japan', ['5.2', '5.0', '4.8', '4.6']]]
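For readability, the same pipeline can be unrolled into a small function. This is a sketch rather than Hackaholic's exact code: `group_by_region` and `region_of` are hypothetical names, and `io.StringIO` stands in for the opened file so the example runs on its own. Using one string key for both the sort and the groupby keeps the two steps consistent.

```python
import io
import itertools


def region_of(row):
    # Region = every field from index 6 on, joined back into one string
    return " ".join(row[6:])


def group_by_region(f):
    # Skip the header, lower-case each line and split it on whitespace
    rows = [line.lower().split() for line in list(f)[1:]]
    rows = [row for row in rows if row]  # drop blank lines, if any
    # groupby only merges consecutive rows, so sort by region first
    rows.sort(key=region_of)
    return [[region, [row[0] for row in group]]
            for region, group in itertools.groupby(rows, key=region_of)]


sample = """MAG  UTC DATE-TIME    LAT   LON  DEPTH Region
4.3 2014/03/12 20:16:59  25.423  -109.730 10.0  GULF OF CALIFORNIA
5.2 2014/03/12 20:09:55  36.747  144.050 24.2  JAPAN
5.0 2014/03/12 20:08:25  35.775  141.893 24.5  JAPAN
4.8 2014/03/12 19:59:01  38.101  142.840 17.6  Japan
4.6 2014/03/12 19:55:28  37.400  142.384 24.7  JAPAN
5.0 2014/03/12 19:45:19  -6.187  154.385 62.0  GUINEA
"""
print(group_by_region(io.StringIO(sample)))
# [['guinea', ['5.0']], ['gulf of california', ['4.3']], ['japan', ['5.2', '5.0', '4.8', '4.6']]]
```

With a real file, `with open('file.txt') as f: print(group_by_region(f))` also closes the file afterwards. Because Python's sort is stable, the Japan magnitudes keep their original file order.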

Let me explain:

>>> f = open('file.txt') 
>>> k = list(map(str.lower,list(f)[1:])) # convert all lines to lower case, skipping the 1st (header) line 
>>> k 
['4.3 2014/03/12 20:16:59  25.423  -109.730 10.0  gulf of california\n', '5.2 2014/03/12 20:09:55  36.747  144.050 24.2  japan\n', '5.0 2014/03/12 20:08:25  35.775  141.893 24.5  japan\n', '4.8 2014/03/12 19:59:01  38.101  142.840 17.6  japan\n', '4.6 2014/03/12 19:55:28  37.400  142.384 24.7  japan\n', '5.0 2014/03/12 19:45:19  -6.187  154.385 62.0  guinea\n'] 
>>> k = list(map(str.split,k)) # split each line on whitespace 
>>> k 
[['4.3', '2014/03/12', '20:16:59', '25.423', '-109.730', '10.0', 'gulf', 'of', 'california'], ['5.2', '2014/03/12', '20:09:55', '36.747', '144.050', '24.2', 'japan'], ['5.0', '2014/03/12', '20:08:25', '35.775', '141.893', '24.5', 'japan'], ['4.8', '2014/03/12', '19:59:01', '38.101', '142.840', '17.6', 'japan'], ['4.6', '2014/03/12', '19:55:28', '37.400', '142.384', '24.7', 'japan'], ['5.0', '2014/03/12', '19:45:19', '-6.187', '154.385', '62.0', 'guinea']] 
>>> k = sorted(k,key = lambda x:" ".join(x[6:])) # sort k by region 
>>> k 
[['5.0', '2014/03/12', '19:45:19', '-6.187', '154.385', '62.0', 'guinea'], ['4.3', '2014/03/12', '20:16:59', '25.423', '-109.730', '10.0', 'gulf', 'of', 'california'], ['5.2', '2014/03/12', '20:09:55', '36.747', '144.050', '24.2', 'japan'], ['5.0', '2014/03/12', '20:08:25', '35.775', '141.893', '24.5', 'japan'], ['4.8', '2014/03/12', '19:59:01', '38.101', '142.840', '17.6', 'japan'], ['4.6', '2014/03/12', '19:55:28', '37.400', '142.384', '24.7', 'japan']] 
>>> [[" ".join(x),list(map(lambda z:z[0],list(y)))] for x,y in itertools.groupby(k,key = lambda x:x[6:])] 
[['guinea', ['5.0']], ['gulf of california', ['4.3']], ['japan', ['5.2', '5.0', '4.8', '4.6']]]
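As an alternative to splitting the whole line and re-joining x[6:], str.split accepts a maxsplit argument: splitting at most six times leaves the region as a single string. A minimal sketch, assuming every data line has exactly the six leading fields shown above (the helper name `parse_line` is hypothetical); a line with fewer fields would raise ValueError on unpacking.

```python
def parse_line(line):
    # strip() first: with maxsplit, trailing whitespace would otherwise stay on the last field
    mag, date, time, lat, lon, depth, region = line.strip().split(maxsplit=6)
    return float(mag), region.upper()  # upper() merges 'Japan' and 'JAPAN'


print(parse_line("4.3 2014/03/12 20:16:59  25.423  -109.730 10.0  GULF OF CALIFORNIA     \n"))
# (4.3, 'GULF OF CALIFORNIA')
print(parse_line("4.8 2014/03/12 19:59:01  38.101  142.840 17.6  Japan \n"))
# (4.8, 'JAPAN')
```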


2014-12-03 09:41:05 Hackaholic
