2017-06-18 119 views
0

Maximum value in a CSV file: creating a dictionary of dictionaries from a CSV file and finding the max in Python 3.x

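I have a CSV data file with state names, crop types, and various values. I want to create a dictionary of dictionaries so that the output looks like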

{'Corn': {'Illinois': ['93']}} 
{'Soybeans': {'Illinois': ['94']}} 

where the structure is {"crop type": {"state": ["max_value"]}}.

Here is my current code:

STATES = ['Alaska', 'Alabama', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'] 

def open_file(): 
    fp = open('alltablesGEcrops.csv', 'r') 
    return fp 

def read_file(fp): 
    fp.readline() 
    dict1 = {} 
    dict2 = {} 
    for line in fp: 
        line_lst = line.strip().split(',')
        state = line_lst[0]
        crop = line_lst[1]
        variety = line_lst[3]
        year = int(line_lst[4])
        value = line_lst[6]
        if variety == 'All GE varieties' and state == 'Illinois':
            max_value = max(value, key=int)
            dict1.setdefault(state,[]).append(max_value)
            dict2 = {crop:dict1}
            print(dict2)

def main(): 
    fp = open_file() 
    data = read_file(fp) 
    print(data) 

if __name__ == "__main__": 
    main() 

Its output looks like this:

How can I fix my code so that it prints only the last line for each crop type? Also, when I look for the max value, it always prints

{'Soybeans': {'Illinois': ['7', '6', '2', '8', '3', '6', '5', '7', ...]}} 

instead of

{'Soybeans': {'Illinois': ['94']}} 

How can I fix this?

+1

I don't see where your code looks for the max. Also, what does 'alltablesGEcrops.csv' look like? Please provide sample data. –

+0

@DmitryPolonskiy I'm sorry, I just edited my code. – Chrisiicandy

+0

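That's your problem: you set `value` to a single value, take its max, and append that to the list, rather than taking the max of the list you are appending to. –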
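
To make the comment above concrete: `value` holds one field from one row, a string such as `'94'`. Calling `max` on a string iterates over its characters, so `max(value, key=int)` returns the largest digit, not the largest value seen so far. A minimal sketch with made-up values (not taken from the actual file):

```python
value = '94'                      # one CSV field, as read from a single line
largest_digit = max(value, key=int)
print(largest_digit)              # 9: the largest character in '94'

# Taking the max over all collected values gives the intended result.
values = ['7', '62', '94', '85']  # hypothetical values for one state and crop
largest_value = max(values, key=int)
print(largest_value)              # 94
```

This would explain the list of single digits in the question's output: each row contributes the largest digit of its own value.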

Answers

1

You can do this without pandas, but why would you?

import pandas as pd 

# load dataframe 
df = pd.read_csv('alltablesGEcrops.csv', na_values={"Value": ("*", ".")}) 

# produce results 
print(df.groupby(['State', 'Crop'])['Value'].max()) 

This gives

State           Crop
Alabama         Upland cotton  98
Arkansas        Soybeans       99
                Upland cotton  99
California      Upland cotton   9
Georgia         Upland cotton  99
Illinois        Corn           93
                Soybeans       94
Indiana         Corn            9
                Soybeans       96
Iowa            Corn           95
                Soybeans       97
Kansas          Corn           95
                Soybeans       96
Louisiana       Upland cotton  99
Michigan        Corn           93
                Soybeans       95
Minnesota       Corn           93
                Soybeans       96
Mississippi     Soybeans       99
                Upland cotton  99
Missouri        Corn           93
                Soybeans       94
Missouri 2/     Upland cotton  99
Nebraska        Corn           96
                Soybeans       97
North Carolina  Upland cotton  98
North Dakota    Soybeans       98
North Dakota    Corn           97
Ohio            Corn            9
                Soybeans       91
Other States    Corn           91
                Soybeans       94
                Upland cotton  98
South Dakota    Corn           98
                Soybeans       98
Tennessee       Upland cotton  99
Texas           Upland cotton  93
Texas           Corn           91
U.S.            Corn           93
                Soybeans       94
                Upland cotton  96
Wisconsin       Corn           92
                Soybeans       95
Name: Value, dtype: object
+0

Because the requirement is to do this using dictionaries..... – Chrisiicandy
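
One thing worth checking in the pandas output above: `dtype: object` suggests the `Value` column was read as strings, in which case `max` compares text rather than numbers. That may be why a few entries show 9 where neighboring states show values in the 90s, though without the file this is only a guess. A small sketch of the difference, using made-up values:

```python
values = ['85', '9', '89']         # hypothetical values, stored as strings

as_text = max(values)              # text comparison: '9' sorts after '85' and '89'
as_numbers = max(values, key=int)  # numeric comparison
print(as_text, as_numbers)         # 9 89
```

In pandas, converting the column first, for example with `pd.to_numeric(df['Value'], errors='coerce')`, should make the `groupby` compare numbers instead.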

0

You can try this using just dictionaries:

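from collections import defaultdict

# Read the file and split each line into its comma-separated fields
with open('alltablesGEcrops.csv') as fp:
    rows = [line.strip('\n').split(',') for line in fp]

d = defaultdict(dict)

for row in rows[1:]:  # skip the header row
    state, crop, value = row[0], row[1], row[-1].strip()
    if not value.isdigit():
        continue  # skip placeholder entries such as '*' or '.'
    # Compare as integers: as strings, '9' would sort above '89'
    if state not in d[crop] or int(value) > int(d[crop][state][0]):
        d[crop][state] = [value]

print(dict(d))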
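
For the exact shape asked for in the question, including the 'All GE varieties' filter, a sketch along these lines might work. The column positions (0 state, 1 crop, 3 variety, 6 value) are taken from the question's code; the header names and sample rows below are invented for illustration, and the function name `max_by_crop_and_state` is hypothetical.

```python
import csv
import io

# Hypothetical sample data; the real file's layout is assumed to match
# the indices used in the question (0 state, 1 crop, 3 variety, 6 value).
SAMPLE = """State,Crop,Unit,Variety,Year,Table,Value
Illinois,Corn,Percent,All GE varieties,2015,1,93
Illinois,Corn,Percent,All GE varieties,2016,1,89
Illinois,Corn,Percent,Bt only,2016,1,97
Illinois,Soybeans,Percent,All GE varieties,2015,2,94
Illinois,Soybeans,Percent,All GE varieties,2016,2,*
Iowa,Corn,Percent,All GE varieties,2016,1,95
"""

def max_by_crop_and_state(fp, variety='All GE varieties'):
    """Return {crop: {state: [max_value]}} for rows of the given variety."""
    reader = csv.reader(fp)
    next(reader)  # skip the header row
    result = {}
    for row in reader:
        state, crop, value = row[0].strip(), row[1].strip(), row[6].strip()
        if row[3].strip() != variety or not value.isdigit():
            continue  # wrong variety, or a placeholder such as '*'
        states = result.setdefault(crop, {})
        # Compare as integers; store the value as a string in a one-item list
        if state not in states or int(value) > int(states[state][0]):
            states[state] = [value]
    return result

data = max_by_crop_and_state(io.StringIO(SAMPLE))
print(data)
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by something like `open('alltablesGEcrops.csv', newline='')`. The function returns the dictionary instead of printing inside the loop, so the caller prints the result once rather than once per row.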