2017-08-14 334 views
1

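Pandas: how to add column names to a DataFrame from a CSV file

I'm new to Python but excited about it, and I need your advice. I came up with the code below to compare two CSV files produced by nmap scans: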

import pandas as pd 
from pandas import DataFrame 
import os 
file = raw_input('\nEnter the Old CSV file: ') 
file1 = raw_input('\nEnter the New CSV file: ') 
A=set(pd.read_csv(file, index_col=False, header=None)[0]) 
B=set(pd.read_csv(file1, index_col=False, header=None)[0]) 
final=list(A-B) 
df = pd.DataFrame(final, columns=["host"]) 
df.to_csv('DIFF_'+file) 

print "Completed!" 

When I run it, I get the following result:

host 
0,82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
1,82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 

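My question is how to add labels/names for columns 2, 3 and so on, for example: hostname, port, port name, state, etc. I tried df['hostname'] = range(1, len(df) + 1), but when I open the file in Excel, hostname is added to the first column together with host.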

+0

Do you want to compare all columns, or only the first one? – jezrael

Answers

2

I think you first need read_csv with the parameters sep=';' and names to define the column names:

file = raw_input('\nEnter the Old CSV file: ') 
file1 = raw_input('\nEnter the New CSV file: ') 

cols = ['hostname','port','portname', ...] 
A= pd.read_csv(file, index_col=False, header=None, sep=';', names=cols) 
B= pd.read_csv(file1, index_col=False, header=None, sep=';', names=cols) 
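
To see why sep matters here, the following minimal sketch reads one line of the scan output twice, once with the default comma separator and once with sep=';'. The variable names default_sep and semicolon are made up for this illustration; the data line is taken from the question.

```python
import io
import pandas as pd

# One line copied from the question's nmap-based CSV (fields separated by ';').
line = u"82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;\n"

# Default separator is ',': the line contains no commas, so the whole
# line ends up in a single column.
default_sep = pd.read_csv(io.StringIO(line), header=None)

# With sep=';' every field becomes its own column.
semicolon = pd.read_csv(io.StringIO(line), header=None, sep=";")

print(default_sep.shape)
print(semicolon.shape)
```

This is likely why the original script produced a single "host" column holding the entire line.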

Then use merge with boolean indexing for the comparison, if you need to compare all columns:

df = pd.merge(A, B, how='outer', indicator=True) 
df = df[df['_merge']=='left_only'].drop('_merge',axis=1) 

df.to_csv('DIFF_'+file) 

print "Completed!" 
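
Note that to_csv writes comma-separated output and an extra index column by default. If the result should look like the input files, a sketch along these lines may help; the small frame below is a hypothetical stand-in for the filtered df, and an in-memory buffer replaces the real file name only so the result can be inspected.

```python
import io
import pandas as pd

# Hypothetical two-row result standing in for the filtered frame `df`.
df = pd.DataFrame(
    {
        "hostname": ["82.214.228.71", "82.214.228.74"],
        "port": [111, 111],
        "state": ["open", "open"],
    }
)

# Write to an in-memory buffer; pass a path such as 'DIFF_' + file
# instead of `buf` to write to disk.
buf = io.StringIO()
df.to_csv(buf, sep=";", index=False)  # semicolons, no index column
csv_text = buf.getvalue()
print(csv_text)
```

Dropping index=False keeps the leading 0, 1, ... column seen in the question's output.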

Sample:

import pandas as pd 
from io import StringIO  # pandas.compat.StringIO is no longer available in newer pandas

temp=u"""82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.74;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.75;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;""" 
#after testing replace 'StringIO(temp)' to 'filename.csv' 
cols = ['hostname','port','portname', 'a','b','c','d','e','f','g','h','i', 'j'] 
A = pd.read_csv(StringIO(temp), sep=";", names=cols) 
print (A) 
        hostname                         port  portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
1  82.214.228.70  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind
2  82.214.228.74  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
3  82.214.228.75  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
0  open  NaN  NaN  syn-ack  NaN  3  NaN
1  open  NaN  NaN  syn-ack  NaN  3  NaN
2  open  NaN  NaN  syn-ack  NaN  3  NaN
3  open  NaN  NaN  syn-ack  NaN  3  NaN

temp=u"""82.214.228.75;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.77;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
""" 
#after testing replace 'StringIO(temp)' to 'filename.csv' 
cols = ['hostname','port','portname', 'a','b','c','d','e','f','g','h','i', 'j'] 
B = pd.read_csv(StringIO(temp), sep=";", names=cols) 
print (B) 
        hostname                         port  portname    a    b        c  \
0  82.214.228.75  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
1  82.214.228.70  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind
2  82.214.228.77  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
0  open  NaN  NaN  syn-ack  NaN  3  NaN
1  open  NaN  NaN  syn-ack  NaN  3  NaN
2  open  NaN  NaN  syn-ack  NaN  3  NaN

df1 = pd.merge(A, B, how='outer', indicator=True) 

print (df1) 

        hostname                         port  portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
1  82.214.228.70  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind
2  82.214.228.74  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
3  82.214.228.75  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind
4  82.214.228.75  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
5  82.214.228.77  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j      _merge
0  open  NaN  NaN  syn-ack  NaN  3  NaN   left_only
1  open  NaN  NaN  syn-ack  NaN  3  NaN        both
2  open  NaN  NaN  syn-ack  NaN  3  NaN   left_only
3  open  NaN  NaN  syn-ack  NaN  3  NaN   left_only
4  open  NaN  NaN  syn-ack  NaN  3  NaN  right_only
5  open  NaN  NaN  syn-ack  NaN  3  NaN  right_only
#only values in A 
df1 = df1[df1['_merge']=='left_only'].drop('_merge',axis=1) 
print (df1) 
        hostname                         port  portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
2  82.214.228.74  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
3  82.214.228.75  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
0  open  NaN  NaN  syn-ack  NaN  3  NaN
2  open  NaN  NaN  syn-ack  NaN  3  NaN
3  open  NaN  NaN  syn-ack  NaN  3  NaN
#only values in B 
df1 = pd.merge(A, B, how='outer', indicator=True) 
df11 = df1[df1['_merge']=='right_only'].drop('_merge',axis=1) 
print (df11) 
        hostname                         port  portname    a    b        c  \
4  82.214.228.75  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
5  82.214.228.77  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
4  open  NaN  NaN  syn-ack  NaN  3  NaN
5  open  NaN  NaN  syn-ack  NaN  3  NaN
#same values in both dataframes 
df12 = df1[df1['_merge']=='both'].drop('_merge',axis=1) 
print (df12) 
        hostname                         port  portname    a    b        c  \
1  82.214.228.70  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
1  open  NaN  NaN  syn-ack  NaN  3  NaN
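
The three filters above can be folded into one helper. This is only a sketch: the function name split_by_presence and the tiny old/new frames are made up for illustration, and it assumes both frames share the same column names so that merge joins on all of them.

```python
import pandas as pd


def split_by_presence(left, right):
    """Return rows found only in left, only in right, and in both."""
    # indicator=True adds a '_merge' column with the values
    # 'left_only', 'right_only' or 'both'.
    merged = pd.merge(left, right, how="outer", indicator=True)
    parts = {}
    for label in ("left_only", "right_only", "both"):
        rows = merged[merged["_merge"] == label]
        parts[label] = rows.drop(columns="_merge").reset_index(drop=True)
    return parts


# Hypothetical miniature versions of the old and new scans.
old = pd.DataFrame({"hostname": ["82.214.228.71", "82.214.228.70"], "b": [111, 111]})
new = pd.DataFrame({"hostname": ["82.214.228.70", "82.214.228.77"], "b": [111, 111]})

parts = split_by_presence(old, new)
print(parts["left_only"])
print(parts["right_only"])
print(parts["both"])
```

Running the merge once and filtering three times avoids recomputing it for each category.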

But if you need to compare only the first column, hostname, use isin to build a mask and invert it with ~ for boolean indexing:

df2 = A[~A['hostname'].isin(B['hostname'])] 
print (df2) 
        hostname                         port  portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
2  82.214.228.74  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
0  open  NaN  NaN  syn-ack  NaN  3  NaN
2  open  NaN  NaN  syn-ack  NaN  3  NaN
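
The same mask works in the other direction as well. The sketch below uses small made-up frames (old, new) and hypothetical result names (removed, added) to show both directions when only the hostname column decides whether a row counts as present.

```python
import pandas as pd

# Hypothetical miniature scans; only 'hostname' is used for the comparison.
old = pd.DataFrame(
    {"hostname": ["82.214.228.71", "82.214.228.70", "82.214.228.75"],
     "c": ["rpcbind", "rpcbind", "rpcbind"]}
)
new = pd.DataFrame(
    {"hostname": ["82.214.228.75", "82.214.228.70", "82.214.228.77"],
     "c": ["rpcbind", "rpcbind", "rpcbind"]}
)

# Boolean mask: True where the old hostname also appears in the new scan.
in_new = old["hostname"].isin(new["hostname"])

removed = old[~in_new]                                  # only in the old scan
added = new[~new["hostname"].isin(old["hostname"])]     # only in the new scan

print(removed)
print(added)
```

Unlike the merge approach, differences in the other columns are ignored here.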
+0

Hey Jez, thanks! I'll try it and get back to you. –

+0

Yes, sure. A small note: if the CSV also has a header row, remove the parameters header=None and names. – jezrael
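
For the header-row case mentioned in this comment, a minimal sketch could look as follows. The sample text and its column names are assumptions for illustration; read_csv then takes the column names from the first line.

```python
import io
import pandas as pd

# Hypothetical file content whose first line is a header row.
text = u"""host;hostname;port
82.214.228.71;dsl-radius-02.direcpceu.com;111
82.214.228.70;dsl-radius-01.direcpceu.com;111
"""

# No header=None and no names: the first line supplies the column names.
df = pd.read_csv(io.StringIO(text), sep=";")
print(df.columns.tolist())
```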

+0

Perfect, Jez! Works like a charm! I just added sep=';' to the write statement: df.to_csv('DIFF_' + file, sep=';'), and I got what I wanted :). I'm accepting this answer. If you don't mind, just one more thing. I'm getting the following:
host hostname hostname_type protocol port \
24 82.214.228.70 dsl-radius-01.direcpceu.com PTR tcp 111
32 82.214.228.71 dsl-radius-02.direcpceu.com PTR tcp 111 –

1

You can add the labels where you define the DataFrame. For example, the following should work:

df = pd.DataFrame(final, columns=["host"].append([x for x in range(1, len(df) + 1)])) 
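
One caveat with the line above: list.append returns None, so the columns argument evaluates to None, and in the question's script each element of final is still one unsplit line. A sketch of one way to get named columns at construction time is to split each line first. The shortened sample lines are made up, and the column names are borrowed from the asker's later comment.

```python
import pandas as pd

# Hypothetical stand-in for `final`: each element is one unsplit line.
final = [
    "82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111",
    "82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111",
]

# Assumed column names, one per field.
cols = ["host", "hostname", "hostname_type", "protocol", "port"]

# Split every line on ';' so each field lands in its own column.
rows = [line.split(";") for line in final]
df = pd.DataFrame(rows, columns=cols)
print(df)
```

All values stay strings with this approach, so numeric columns such as port would need an explicit conversion if numbers are wanted.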
+0

Thanks, Amit! I'll try it and reply. –

+0

Thanks, Amit. This works well too! –

+0

@IvanMadolev Thanks for the feedback. – Amit