Python: adding multiple data points to a dictionary from a CSV file

I have a CSV file that looks like this:

CountryCode, NumberCalled, CallPrice, CallDuration 
BS,+1234567,0.20250,29 
BS,+19876544,0.20250,1 
US,+121234,0.01250,4 
US,+1543215,0.01250,39 
US,+145678,0.01250,11 
US,+18765678,None,0

I would like to parse the file and work out some statistics from the data:

CountryCode, NumberOfTimesCalled, TotalPrice, TotalCallDuration 
US, 4, 1.555, 54

At the moment I have a dictionary set up like this:

CalledStatistics = {}

When I read each row from the CSV, what is the best way to put the data into the dictionary?

CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

Would adding the second US row overwrite the first one, or would the data be added based on the key 'CountryCode'?


2016-03-04 Mathew Jenkinson

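What is the question? You have a dictionary, and every time you read a CSV row the value for that country code gets overwritten, so you end up with a dictionary with keys (BS, US) and value = the most recent entry, i.e. overwritten data. – Seekheart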

Do you really intend to assign a set to CalledStatistics['CountryCode']? – MattDMo

In a dictionary a KEY is a unique value, so yes, doing that will overwrite the VALUE. You are simply assigning a new VALUE to an existing KEY (US). – catalesia

Each call to:

CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

will overwrite the previous one.
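
A quick way to see the overwrite for yourself. This is only a minimal sketch: the name `stats` and the durations are made up for illustration.

```python
stats = {}

# Assigning to a key that already exists replaces the old value.
stats['US'] = {'CallDuration': 4}
stats['US'] = {'CallDuration': 39}
print(stats)  # only the second assignment is left under 'US'

# Braces holding bare items (no "key: value" pairs) build a set, not a dict.
fields = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}
print(type(fields).__name__)
```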

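To compute the sums you need, you can use a dictionary of dictionaries. Suppose that inside your for loop you have put the data in the variables country_code, call_duration and call_price, and that collected_statistics is where you store the data. (Edit: I added the first line to turn call_price into 0 if it is None in the data. This code is meant to handle consistent data, e.g. only integers; if other kinds of data may show up, they need to be converted to integers [or any numbers of the same type] before Python can sum them.)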

call_price = call_price if call_price is not None else 0

if country_code not in collected_statistics: 
    collected_statistics[country_code] = {'CallDuration' : [call_duration], 
              'CallPrice' : [call_price]} 
else: 
    collected_statistics[country_code]['CallDuration'] += [call_duration] 
    collected_statistics[country_code]['CallPrice'] += [call_price]

and, after the loop, for each country_code:

number_of_times_called[country_code] = len(collected_statistics[country_code]['CallDuration'])

total_call_duration[country_code] = sum(collected_statistics[country_code]['CallDuration']) 
total_price[country_code] = sum(collected_statistics[country_code]['CallPrice'])
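
Putting the two fragments together, something like the following should run on its own. It is a sketch under assumptions: the `rows` list is hypothetical, already-parsed data (not read from your file), and I use `dict.setdefault` in place of the if/else, which does the same thing here.

```python
from decimal import Decimal

# Hypothetical, already-parsed rows: (country_code, call_price, call_duration).
rows = [
    ('BS', Decimal('0.20250'), 29),
    ('BS', Decimal('0.20250'), 1),
    ('US', Decimal('0.01250'), 4),
    ('US', None, 0),
]

collected_statistics = {}
for country_code, call_price, call_duration in rows:
    # Treat a missing price as zero (an assumption about the data).
    call_price = call_price if call_price is not None else Decimal(0)
    # setdefault creates the inner dict the first time a country is seen.
    entry = collected_statistics.setdefault(
        country_code, {'CallDuration': [], 'CallPrice': []})
    entry['CallDuration'].append(call_duration)
    entry['CallPrice'].append(call_price)

number_of_times_called = {}
total_call_duration = {}
total_price = {}
for country_code, entry in collected_statistics.items():
    number_of_times_called[country_code] = len(entry['CallDuration'])
    total_call_duration[country_code] = sum(entry['CallDuration'])
    total_price[country_code] = sum(entry['CallPrice'])
```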

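OK, finally, here is a complete working script that handles the example you gave. It reads a file named CalledData with exactly the same content as you provided: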

#!/usr/bin/env python3

import csv
import decimal

with open('CalledData', newline='') as csvfile:
    csv_r = csv.reader(csvfile, delimiter=',', quotechar='|')

    # btw this creates a dict, not a set
    collected_statistics = {}

    for row in csv_r:

        [country_code, number_called, call_price, call_duration] = row

        # Only to avoid the first line, but would be better to have a list of available
        # (and correct) codes, and check if the country_code belongs to this list:
        if country_code != 'CountryCode':

            call_price = call_price if call_price != 'None' else 0

            if country_code not in collected_statistics:
                collected_statistics[country_code] = {
                    'CallDuration': [int(call_duration)],
                    'CallPrice': [decimal.Decimal(call_price)]}
            else:
                collected_statistics[country_code]['CallDuration'].append(int(call_duration))
                collected_statistics[country_code]['CallPrice'].append(decimal.Decimal(call_price))

    for country_code in collected_statistics:
        print(str(country_code) + ":")
        print("number of times called: " + str(len(collected_statistics[country_code]['CallDuration'])))
        print("total price: " + str(sum(collected_statistics[country_code]['CallPrice'])))
        print("total call duration: " + str(sum(collected_statistics[country_code]['CallDuration'])))

It outputs:

$ ./test_script 
BS: 
number of times called: 2 
total price: 0.40500 
total call duration: 30 
US: 
number of times called: 4 
total price: 0.03750 
total call duration: 54
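
If you only need the totals and not every individual value, keeping running sums is a possible variation. This is a sketch, not a replacement for the script: the function name `summarize`, the inner keys 'calls', 'price' and 'duration', and the inline `SAMPLE` string are my own choices, and it assumes that a price of 'None' means no charge. `csv.DictReader` with `skipinitialspace=True` takes care of the blank after each comma in the header row.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Inline copy of the sample data so the sketch runs without a file.
SAMPLE = """\
CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0
"""


def summarize(csvfile):
    """Return {country_code: {'calls', 'price', 'duration'}} running totals."""
    totals = defaultdict(
        lambda: {'calls': 0, 'price': Decimal(0), 'duration': 0})
    # skipinitialspace drops the blank that follows each comma in the header.
    for row in csv.DictReader(csvfile, skipinitialspace=True):
        entry = totals[row['CountryCode']]
        entry['calls'] += 1
        if row['CallPrice'] != 'None':  # assumption: 'None' means no charge
            entry['price'] += Decimal(row['CallPrice'])
        entry['duration'] += int(row['CallDuration'])
    return dict(totals)


totals = summarize(io.StringIO(SAMPLE))
for code, entry in totals.items():
    print(code, entry['calls'], entry['price'], entry['duration'])
```

With a real file you would pass the object returned by `open('CalledData', newline='')` instead of the `io.StringIO` wrapper.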


2016-03-04 16:56:30 zezollo

This won't work, because the **None** value in the last line will raise a **TypeError**. But it's a good idea. – catalesia

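Indeed. I think we can assume that a price of None can be treated as zero. So the data needs to be processed before it is used. I edited my post to reflect that. – zezollo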

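It's not as simple as you think :) We don't know all the details. The more complex the case, the more complex it gets! Did you test it? Does it work? Imagine that somewhere in the file someone put "five" instead of 5 ;) – catalesia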

Dictionaries can contain lists, and lists of dictionaries, so you can achieve the structure you want as follows:

CalledStatistics['CountryCode'] =[ { 
    'CallDuration':cd_val, 
    'CallPrice':cp_val, 
    'NumberOfTimesCalled':ntc_val } ]

Then you can add values like this:

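for line in lines:
    parts = line.split(',')
    # pop(0) removes the country code; setdefault creates the list on first use.
    # After the pop, parts is [NumberCalled, CallPrice, CallDuration].
    CalledStatistics.setdefault(parts.pop(0), []).append({
        'NumberCalled': parts[0],
        'CallPrice': parts[1],
        'CallDuration': parts[2]})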

By making each CountryCode entry a list, you can add as many separate dictionaries as you like per country code.

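The pop(i) method returns the value and mutates the list, so what remains is exactly the data you need for the dictionary values. That is why we pop index 0 and then add indices 0-2 to the dictionary.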
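
To see the list-of-dictionaries structure end to end, here is a small sketch. The name `called` and the three sample lines are only illustrative, and it assumes the column order CountryCode, NumberCalled, CallPrice, CallDuration from the question.

```python
lines = [
    'BS,+1234567,0.20250,29',
    'US,+121234,0.01250,4',
    'US,+1543215,0.01250,39',
]

called = {}
for line in lines:
    parts = line.split(',')
    code = parts.pop(0)  # returns the country code and removes it from parts
    # parts is now [NumberCalled, CallPrice, CallDuration]
    called.setdefault(code, []).append({
        'NumberCalled': parts[0],
        'CallPrice': parts[1],
        'CallDuration': int(parts[2]),
    })

# Every call is kept, so statistics can be computed afterwards.
us_calls = called['US']
us_total_duration = sum(call['CallDuration'] for call in us_calls)
print(len(us_calls), us_total_duration)
```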


2016-03-04 16:57:04 arctelix

Your approach could be slightly different. Just read the file and turn it into lists (readlines(), strip("\n"), split(",")).

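Ignore the first line and the last line (the last one is most likely empty; test it). Then you can use a dictionary as in @zezollo's example, simply adding the values to the keys of the dictionary you create. Make sure that all the values you add to the lists are of the same type.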
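
Roughly what I mean, as a sketch with assumptions: the helper name `parse_lines` is made up, rows that cannot be converted are simply skipped (you may prefer to log or raise), and a price of 'None' is treated as zero.

```python
from decimal import Decimal, InvalidOperation


def parse_lines(lines):
    """Yield (country_code, price, duration) tuples, skipping unusable rows."""
    for line in lines[1:]:  # skip the header line
        line = line.strip("\n")
        if not line.strip():  # ignore empty lines, e.g. the last one
            continue
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:
            continue
        code, _number, price, duration = parts
        try:
            price = Decimal(0) if price == "None" else Decimal(price)
            duration = int(duration)
        except (InvalidOperation, ValueError):
            continue  # e.g. "five" instead of 5
        yield code, price, duration


raw = [
    "CountryCode, NumberCalled, CallPrice, CallDuration\n",
    "US,+121234,0.01250,4\n",
    "US,+1543215,0.01250,five\n",
    "US,+18765678,None,0\n",
    "\n",
]
parsed = list(parse_lines(raw))
print(parsed)
```

For a real file, `raw` would come from `open('CalledData').readlines()`; the tuples can then be fed into the dictionary from @zezollo's answer.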

It is not hard work at all, and you will remember it for a long time ;)

Test, test, test on mock examples. And read the Python help and documentation. It is great.


2016-03-04 18:17:24 catalesia
