2011-05-30 43 views
1

Group and check-mark using Python

I have several files, each of which has data like this (file name: the data inside, separated by newlines):

  1. Mike: Plane\nCar
  2. Paula: Plane\nTrain\nBoat\nCar
  3. Bill: Boat\nTrain
  4. Scott: Car

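How can I use Python to group all the different vehicles and then create a CSV file that puts an X under each person the vehicle applies to, like: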

output

+1

Are the line numbers also in your files? – 2011-05-30 20:45:17

+0

No, they are only there to show that these are separate files. – mike 2011-05-30 21:39:30

Answers

1

Assuming those line numbers aren't there (easy enough to fix if they are), and with an input file like the following:

Mike: Plane 
Car 
Paula: Plane 
Train 
Boat 
Car 
Bill: Boat 
Train 
Scott: Car 

The solution can be found here: https://gist.github.com/999481

import sys
from collections import defaultdict
import csv

# see http://stackoverflow.com/questions/6180609/group-and-check-mark-using-python
def main():
    # files = ["group.txt"]
    files = sys.argv[1:]
    if len(files) < 1:
        print("usage: ./python_checkmark.py file1 [file2 ... filen]")
        sys.exit(1)

    # maps each person's name to the set of vehicles listed for them
    name_map = defaultdict(set)

    for f in files:
        with open(f, "r") as file_handle:
            process_file(file_handle, name_map)

    print_csv(sys.stdout, name_map)

def process_file(input_file, name_map):
    cur_name = ""
    for line in input_file:
        if ":" in line:
            # "Name: Vehicle" starts a new person; split on the first colon only
            cur_name, item = [x.strip() for x in line.split(":", 1)]
        else:
            item = line.strip()
        if item:  # skip blank lines
            name_map[cur_name].add(item)


def print_csv(output_file, name_map):
    names = list(name_map.keys())
    items = set()
    for item_set in name_map.values():
        items |= item_set

    writer = csv.writer(output_file, quoting=csv.QUOTE_MINIMAL)
    writer.writerow([""] + names)
    for item in sorted(items):
        # one row per vehicle, with an "X" under each person who has it
        row_contents = ["X" if item in name_map[name] else "" for name in names]
        writer.writerow([item] + row_contents)


if __name__ == '__main__':
    main()

Output:

,Mike,Bill,Scott,Paula 
Boat,,X,,X 
Car,X,,X,X 
Plane,X,,,X 
Train,,X,,X 

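The only thing this script doesn't do is preserve the order of the column names. You could keep a separate list to maintain the order, since maps/dicts are inherently unordered (at least in the Python 2 this was written for).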
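
For what it's worth, the "separate list" idea could look roughly like the sketch below. It is not part of the gist: `parse_lines` and `build_rows` are hypothetical helper names, the sample lines are copied from the input above, and it assumes Python 3.

```python
import csv
import io


def parse_lines(lines):
    """Return (names in first-seen order, {name: set of vehicles})."""
    order = []   # separate list that remembers the column order
    owned = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ":" in line:
            # "Name: Vehicle" starts a new person
            current, line = [part.strip() for part in line.split(":", 1)]
            if current not in owned:
                owned[current] = set()
                order.append(current)
        if current is not None and line:
            owned[current].add(line)
    return order, owned


def build_rows(order, owned):
    """Header row plus one row per vehicle, 'X' where the person has it."""
    vehicles = sorted(set().union(*owned.values())) if owned else []
    rows = [[""] + order]
    for vehicle in vehicles:
        rows.append([vehicle] + ["X" if vehicle in owned[n] else "" for n in order])
    return rows


sample = ["Mike: Plane", "Car", "Paula: Plane", "Train", "Boat", "Car",
          "Bill: Boat", "Train", "Scott: Car"]
order, owned = parse_lines(sample)
rows = build_rows(order, owned)

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
print(buffer.getvalue())
```

With this, the header comes out in input order (Mike, Paula, Bill, Scott) regardless of how the dict itself happens to iterate.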

+0

This works very well; the only thing is that the file output gets an extra newline after every row. – mike 2011-05-30 22:47:42

+0

Hm.. don't you want each row on its own line? – I82Much 2011-05-30 23:50:03

+1

Actually, the problem was that I wasn't opening the output CSV file in binary mode, as described in this [post](http://stackoverflow.com/questions/1170214/pythons-csv-writer-produce-wrong-line-terminator) – mike 2011-05-31 13:04:12
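
The binary-mode fix in that comment applies to Python 2. On Python 3 the `csv` documentation recommends opening the file in text mode with `newline=""` instead. A minimal sketch, assuming Python 3; the rows and the `vehicles.csv` file name are made up for illustration:

```python
import csv
import os
import tempfile

rows = [["", "Mike", "Bill"], ["Boat", "", "X"], ["Car", "X", ""]]

# Hypothetical output path inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "vehicles.csv")

# newline="" stops Python from translating the "\r\n" that csv.writer
# emits by default; without it, Windows ends up with "\r\r\n" and
# the file appears to have a blank line after every row.
with open(out_path, "w", newline="") as handle:
    csv.writer(handle).writerows(rows)

# Read the raw bytes back to check the line terminators.
with open(out_path, "rb") as handle:
    raw = handle.read()

print(raw)
```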

0

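Here is an example of how to parse this kind of file.

Note that the dictionary is unordered here. You can use an ordered dictionary from the standard library (on Python 3.2/2.7), find any available implementation/backport if you have an older version of Python, or just keep the order in an additional list :)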

file_path = 'input.txt'  # placeholder: path to one of the input files
data = {}
name = None

with open(file_path) as f:
    for line in f:
        line = line.strip()  # drop the trailing newline
        if ':' in line:  # we have a name here
            name, first_vehicle = [part.strip() for part in line.split(':', 1)]
            data[name] = set([first_vehicle])  # a set of vehicles per name
        elif name and line:
            data[name].add(line)

# now a dictionary with names/vehicles is available
# let's convert it to simple csv-formatted strings..

# all available vehicles, sorted so the column order is predictable
vehicles = sorted(set(v for vlist in data.values()
                      for v in vlist))

for name in data:
    name_vehicles = data[name]
    # the vehicle name if this person has it, an empty cell otherwise
    cells = [v if v in name_vehicles else '' for v in vehicles]
    csv_line = ','.join([name] + cells)
    print(csv_line)
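
The ordered-dictionary suggestion from this answer could be sketched like this, using `collections.OrderedDict` from the standard library. The sample lines are a shortened, made-up version of the input, and the parsing loop is only an illustration of the idea:

```python
from collections import OrderedDict

lines = ["Mike: Plane", "Car", "Paula: Plane", "Train", "Bill: Boat"]

data = OrderedDict()  # remembers the order in which names are first added
name = None
for raw in lines:
    line = raw.strip()
    if ":" in line:
        # "Name: Vehicle" starts a new person
        name, line = [part.strip() for part in line.split(":", 1)]
        data.setdefault(name, [])
    if name is not None and line:
        data[name].append(line)

print(list(data.keys()))
```

Iterating over `data` now yields the names in the order they appeared in the input, so the rows (or columns) of the CSV keep that order too.
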
0

Assuming the input looks like this:

Mike: Plane 
Car 
Paula: Plane 
Train 
Boat 
Car 
Bill: Boat 
Train 
Scott: Car 

This Python script then puts the vehicles in a dictionary, indexed by person:

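#!/usr/bin/python
from __future__ import print_function  # lets the same script run on Python 2 and 3

persons = {}
vehicles = set()
p = None  # current person; stays None until the first "Name: Vehicle" line

with open('input') as fd:
    for line in fd:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if ':' in line:
            tmp = line.split(':', 1)
            p = tmp[0].strip()
            v = tmp[1].strip()
            persons[p] = [v]
            vehicles.add(v)
        elif p is not None:
            persons[p].append(line)
            vehicles.add(line)

for k, v in persons.items():
    print(k, v)

print('vehicles', vehicles)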

Result (as printed by Python 2, where dict and set ordering is arbitrary):

Mike ['Plane', 'Car'] 
Bill ['Boat', 'Train'] 
Scott ['Car'] 
Paula ['Plane', 'Train', 'Boat', 'Car'] 
vehicles set(['Train', 'Car', 'Plane', 'Boat']) 

Now all the needed data is in data structures. The CSV part is left as an exercise for the reader :-)
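
One possible take on that exercise, as a sketch rather than the answerer's own solution: it assumes Python 3, hard-codes `persons` and `vehicles` with the values shown in the result above, and uses a hypothetical helper `to_rows`.

```python
import csv
import sys

# Same shape as the structures built by the script above (sample values).
persons = {
    "Mike": ["Plane", "Car"],
    "Paula": ["Plane", "Train", "Boat", "Car"],
    "Bill": ["Boat", "Train"],
    "Scott": ["Car"],
}
vehicles = {"Plane", "Car", "Train", "Boat"}


def to_rows(persons, vehicles):
    """Header row of names, then one row per vehicle with 'X' marks."""
    names = sorted(persons)
    rows = [[""] + names]
    for vehicle in sorted(vehicles):
        rows.append([vehicle] + ["X" if vehicle in persons[n] else "" for n in names])
    return rows


rows = to_rows(persons, vehicles)
csv.writer(sys.stdout, lineterminator="\n").writerows(rows)
```

Sorting both the names and the vehicles is a choice made here only so the output is the same on every run.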

0

The most elegant and simplest way is something like this:

import os

vehiclesToPeople = {}
people = []

for root, dirs, files in os.walk('/path/to/folder/with/files'):
    for file in files:
        person = file  # the file name is the person's name
        people.append(person)
        path = os.path.join(root, file)

        with open(path) as f:
            for line in f:
                vehicle = line.strip()  # drop the trailing newline
                if vehicle:
                    vehiclesToPeople.setdefault(vehicle, set()).add(person)

people.sort()
table = [[''] + people]
for vehicle, owners in sorted(vehiclesToPeople.items()):
    # vehicle name first, then an 'X' under each person who has it
    table.append([vehicle] + ['X' if p in owners else '' for p in people])

csv = '\n'.join(','.join(row) for row in table)

You can also do pprint.pprint(table) to take a look at it.
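
As a small illustration of inspecting such a table and turning it into CSV text, here is a sketch assuming Python 3. The `table` values are made up (including a vehicle name with a comma in it), and it swaps the manual `','.join` for the standard `csv` module, which quotes fields that contain commas:

```python
import csv
import io
import pprint

# Made-up table in the same shape: header row, then one row per vehicle.
table = [["", "Bill", "Mike"], ["Boat", "X", ""], ["Car, used", "", "X"]]

pprint.pprint(table)

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(table)
csv_text = buffer.getvalue()
print(csv_text)
```

For plain one-word vehicle names the manual join gives the same text; the difference only shows up once a field contains a comma or a quote.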