2016-05-16 70 views

I'm using some sample code (below) to test a Naive Bayes classifier, and I'm getting the following error from line 22:

_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? 

Here is a sample row from the CSV file:

b8:27:eb:38:72:a7,df598b5eb8f4,5/9/16 14:47,154aec250ef6,-84,outside 

The code sample:

from sklearn.preprocessing import LabelBinarizer 
import numpy as np 
from sklearn import naive_bayes 
import csv 
import random 
from sklearn import metrics 
import urllib 
url = "example.com" 
webpage = urllib.urlopen(url) 
# download the file 
#raw_data = urllib.urlopen(url) 

datareader = csv.reader(webpage) #line 22 is this one 

ct = 0; 
for row in datareader: 
    ct = ct+1 
webpage = urllib.urlopen(url) 
datareader = csv.reader(webpage) 
data = np.array(-1*np.ones((ct,6),float),object); 
k=0; 
for row in datareader: 
    data[k,:] = np.array(row) 
    k = k+1; 

featnames = np.array(['unti','dongle','timestamp','tracker','rssi','label'],str) 

keys = [[]]*np.size(data,1) 
numdata = -1*np.ones_like(data); 

for k in range(np.size(data,1)): 
    keys[k],garbage,numdata[:,k] = np.unique(data[:,k],True,True) 

numrows = np.size(numdata,0); 
numcols = np.size(numdata,1); 
numdata = np.array(numdata, int) 
xdata = numdata[:,:-1] 
ydata = numdata[:,-1] 

lbin = LabelBinarizer(); 
for k in range(np.size(xdata,1)): 
    if k==0: 
        xdata_ml = lbin.fit_transform(xdata[:,k]); 
    else: 
        xdata_ml = np.hstack((xdata_ml,lbin.fit_transform(xdata[:,k]))) 
ydata_ml = lbin.fit_transform(ydata) 


allIDX = np.arange(numrows); 
random.shuffle(allIDX); 
holdout_number = numrows/10; 
testIDX = allIDX[0:holdout_number]; 
trainIDX = allIDX[holdout_number:]; 

xtest = xdata_ml[testIDX,:]; 
xtrain = xdata_ml[trainIDX,:]; 
ytest = ydata[testIDX]; 
ytrain = ydata[trainIDX]; 

mnb = naive_bayes.MultinomialNB(); 
mnb.fit(xtrain,ytrain); 
print "Classification accuracy of MNB =", mnb.score(xtest,ytest) 

Can anyone help me find the error and suggest a fix?

Answers

0

Are you using Windows? If so, this can be solved with:

datareader = csv.reader(webpage, dialect=csv.excel_tab) 

Nope, Python on a Mac. – DataGuy

Hmm.. weird.. have you tried adding the 'dialect' kwarg? – silviomoreto

Yes, I tried that. – DataGuy

0

Some of the answers to "CSV new-line character seen in unquoted field error" deal with CSV files on a Mac.
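Before changing anything, it may help to confirm which line-ending convention the data actually uses. The following is only a rough sketch, written for Python 3; `count_line_endings` is a hypothetical helper name, and the bytes passed to it stand in for whatever was downloaded.

```python
def count_line_endings(raw_bytes):
    """Count how many line terminators of each style appear in raw_bytes."""
    crlf = raw_bytes.count(b"\r\n")            # Windows style
    cr_only = raw_bytes.count(b"\r") - crlf    # bare carriage returns (classic Mac style)
    lf_only = raw_bytes.count(b"\n") - crlf    # Unix style
    return {"CRLF": crlf, "CR": cr_only, "LF": lf_only}

# Two rows terminated by bare carriage returns
print(count_line_endings(b"a,b\rc,d\r"))
```

If the "CR" count is the only non-zero one, the file uses bare carriage returns, which is the case the universal-newline error message points to.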

You could try downloading the file manually to your Mac and then working with it as a local file:

1) Save the file as CSV (MS-DOS Comma-Separated)

2) Save the file as CSV (Windows Comma-Separated)

3) Run the following script

with open(csv_filename, 'rU') as csvfile: 
    csvreader = csv.reader(csvfile) 
    for row in csvreader: 
        print ', '.join(row) 
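The script above reads a local file. Since the question reads the data from a URL, the same normalization could probably be done in memory instead. This is a minimal sketch assuming Python 3, UTF-8 data, and no quoted fields that contain line breaks; `rows_from_bytes` is a hypothetical helper name, not part of the csv module.

```python
import csv

def rows_from_bytes(raw_bytes, encoding="utf-8"):
    """Parse CSV bytes whose rows may end in CR, LF, or CRLF."""
    text = raw_bytes.decode(encoding)
    # str.splitlines() splits on \r, \n and \r\n alike and drops the terminators,
    # so csv.reader never sees a bare carriage return inside a row.
    return list(csv.reader(text.splitlines()))

# The sample row from the question, repeated twice with bare \r terminators
sample = b"b8:27:eb:38:72:a7,df598b5eb8f4,5/9/16 14:47,154aec250ef6,-84,outside\r" * 2
rows = rows_from_bytes(sample)
print(len(rows), rows[0][-1])
```

In Python 3 the bytes could come from `urllib.request.urlopen(url).read()`. Because the rows are then held in a list, there is also no need to download the file a second time just to count them.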

Explanation of 'rU': https://www.python.org/dev/peps/pep-0278/

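In a Python with universal newline support open() the mode parameter can also be "U", meaning "open for input as a text file with universal newline interpretation". Mode "rU" is also allowed, for symmetry with "rb".

Rationale

Universal newline support is implemented in C, not in Python. This is done because we want files with a foreign newline convention to be import-able, so a Python Lib directory can be shared over a remote file system connection, or between MacPython and Unix-Python on Mac OS X.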
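The 'rU' mode applies to Python 2, which is what the code in the question uses. For anyone on Python 3, a roughly equivalent approach is to open the file with `newline=''`, as the csv module documentation recommends. The sketch below assumes Python 3; `read_csv_rows` and the temporary demo file are made up for illustration.

```python
import csv
import os
import tempfile

def read_csv_rows(path):
    """Read all rows from a CSV file, whatever line endings it uses."""
    # newline='' passes line endings through untranslated so the csv module
    # can handle \r, \n and \r\n itself.
    with open(path, newline="") as csvfile:
        return list(csv.reader(csvfile))

# Write a small file with bare \r terminators to try it out
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b,c\rd,e,f\r")
    demo_path = tmp.name

demo_rows = read_csv_rows(demo_path)
os.remove(demo_path)
print(demo_rows)
```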