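Iterating over a CSV and deleting analyzed data
Hello, I am trying to work with a CSV file and iterate over each customer's data. To explain, each customer has 12 months of data. I want to analyze each customer's yearly data, save the correlations from that data to a new list, and loop until every customer has been analyzed.
I have been able to get this working for a single customer, producing the correlations from that customer's CSV data. However, my data sheet contains thousands of customers. I want to use a nested for loop to collect every customer's correlation values into a list/array. The list would hold one row of correlations for a given customer, and the next row would belong to the next customer.
Here is my current code: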
import numpy
from numpy import genfromtxt
overalldata = genfromtxt('C:\Users\User V\Desktop\CUSTDATA.csv', delimiter=',')
emptylist = []
overalldatasubtract = overalldata[13::]
#This is where I try to use the for loop to go through all the customers. I don't know if len will give me the number of rows or the number of columns.
for x in range(0,len(overalldata),11):
    for x in range(0,13,1):
        cust_months = overalldata[0:x,1]
        cust_balancenormal = overalldata[0:x,16]
        cust_demo_one = overalldata[0:x,2]
        cust_demo_two = overalldata[0:x,3]
        num_acct_A = overalldata[0:x,4]
        num_acct_B = overalldata[0:x,5]
        #Correlation Calculations
        demo_one_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_one)[1,0]
        demo_two_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_two)[1,0]
        demo_one_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_one)[1,0]
        demo_one_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_one)[1,0]
        demo_two_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_two)[1,0]
        demo_two_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_two)[1,0]
        result_correlation = [demo_one_corr_balance, demo_two_corr_balance, demo_one_corr_acct_a, demo_one_corr_acct_b, demo_two_corr_acct_a, demo_two_corr_acct_b]
        result_correlation_combined = emptylist.append(result_correlation)
    #This is where I try to delete the rows I have already analyzed.
    overalldata = overalldata[11**x::]
print result_correlation_combined
print overalldatasubtract
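On the question raised in the code comment about len: for a 2-D NumPy array, len counts rows (the first axis), and .shape gives both dimensions. The snippet below is only a small illustration on a made-up array; the name `demo` and its 24 x 17 size are assumptions standing in for the real CSV.

```python
import numpy

# Hypothetical stand-in for the CSV: 24 rows (2 customers x 12 months), 17 columns.
demo = numpy.zeros((24, 17))

n_rows = len(demo)                  # len() counts rows, i.e. the first axis
shape_rows, n_cols = demo.shape     # .shape gives (rows, columns)

first_customer = demo[0:12]         # rows 0..11, all columns
second_customer = demo[12:24]       # rows 12..23, all columns
empty_slice = demo[0:0, 1]          # what a slice like [0:x, 1] yields when x == 0
```

Note that a slice such as `[0:x, 1]` always starts at row 0, so it grows with x instead of moving on to the next customer, and it is empty when x is 0.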
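At first it seemed that my addition and subtraction were working, but when I tried it on my larger data set I realized that my approach was completely wrong.
Would you do this a different way? I think it can work, but I cannot find my mistake.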
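One possible way to approach this, offered as a minimal sketch rather than a definitive fix: step through the array in blocks of 12 rows and slice each block out by its start index, so nothing has to be deleted from the array. The function name `correlations_per_customer` and the `months` parameter are hypothetical. The sketch assumes that the rows are grouped by customer with exactly 12 consecutive rows each, that there is no header row, and that the column positions (16, 2, 3, 4, 5) match the ones used in the code above.

```python
import numpy


def correlations_per_customer(data, months=12):
    """Return one row of six correlations per customer.

    Assumes `data` is a 2-D array whose rows are grouped by customer,
    `months` consecutive rows each, with the column layout from the question.
    """
    results = []
    # Step through the array one customer at a time: 0, 12, 24, ...
    for start in range(0, len(data), months):
        block = data[start:start + months]   # this customer's rows only
        balance = block[:, 16]
        demo_one = block[:, 2]
        demo_two = block[:, 3]
        acct_a = block[:, 4]
        acct_b = block[:, 5]
        # Same six correlations, in the same order, as in the question.
        results.append([
            numpy.corrcoef(balance, demo_one)[1, 0],
            numpy.corrcoef(balance, demo_two)[1, 0],
            numpy.corrcoef(acct_a, demo_one)[1, 0],
            numpy.corrcoef(acct_b, demo_one)[1, 0],
            numpy.corrcoef(acct_a, demo_two)[1, 0],
            numpy.corrcoef(acct_b, demo_two)[1, 0],
        ])
    return numpy.array(results)
```

Under those assumptions, calling `correlations_per_customer(overalldata)` would return an array with one row per customer. If the row count is not a multiple of 12, the last block would be shorter than a full year, and a column that is constant within a block would produce nan for its correlations.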
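Thanks, this seems to be what I am trying to do, but I am still not getting any output. I want to save these correlations with: result_correlation_combined = emptylist.append(result_correlation) However, this does not seem to save anything, because I keep getting an empty list.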
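A likely cause of the missing output in that line: list.append modifies the list in place and returns None, so the name on the left of the assignment holds None while the data sits in emptylist. A small illustration with placeholder numbers (not real results):

```python
emptylist = []
result_correlation = [0.1, 0.2, 0.3]  # placeholder values, not real results

# append mutates emptylist in place; its return value is always None
returned = emptylist.append(result_correlation)

print(returned)    # None
print(emptylist)   # [[0.1, 0.2, 0.3]]
```

Printing emptylist itself after the loop, instead of the value returned by append, should show the accumulated rows.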