2016-09-29 43 views

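Hello, I am trying to read a CSV file and iterate over each customer's data. To explain: every customer has 12 months of data. I want to analyze each customer's yearly data, save the correlations from that data into a new list, and loop until all customers have been analyzed. (Iterating over a CSV and deleting analyzed data.)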

For example, here is what one customer's data might look like (simplified): [image]

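I have been able to get this working for a single customer's data, producing the correlations from the CSV. However, my data sheet contains thousands of customers. I would like to use a nested for loop to collect all the correlation values for every customer into a list/array. Each row of that list would hold the correlations for one customer, and the next row would be the next customer.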

Here is my current code:

import numpy 
from numpy import genfromtxt 
overalldata = genfromtxt('C:\Users\User V\Desktop\CUSTDATA.csv', delimiter=',') 
emptylist = [] 
overalldatasubtract = overalldata[13::] 
#This is where I try to use the for loop to go through all the customers. I don't know if len will give me all the rows or the number of columns. 
for x in range(0,len(overalldata),11): 
    for x in range(0,13,1): 
      cust_months = overalldata[0:x,1] 
      cust_balancenormal = overalldata[0:x,16] 
      cust_demo_one = overalldata[0:x,2] 
      cust_demo_two = overalldata[0:x,3] 
      num_acct_A = overalldata[0:x,4] 
      num_acct_B = overalldata[0:x,5] 
    #Correlation Calculations 
      demo_one_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_one)[1,0] 
      demo_two_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_two)[1,0] 
      demo_one_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_one)[1,0] 
      demo_one_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_one)[1,0] 
      demo_two_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_two)[1,0] 
      demo_two_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_two)[1,0] 

      result_correlation = [demo_one_corr_balance, demo_two_corr_balance, demo_one_corr_acct_a, demo_one_corr_acct_b, demo_two_corr_acct_a, demo_two_corr_acct_b] 

result_correlation_combined = emptylist.append(result_correlation) 
#This is where I try to delete the rows I have already analyzed. 
overalldata = overalldata[11**x::] 

print result_correlation_combined 
print overalldatasubtract 

My add-and-subtract approach appeared to work, but when I tried it on my larger data set I realized my method is completely wrong.

Would you do this a different way? I thought it could work, but I can't find my mistake.

Answers


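You are using the same variable x for both loops. In the second loop, x goes from 0 to 12 no matter which customer you are on, and since the row index is built from x alone, you stay stuck on the first customer.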
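To make the shadowing concrete, here is a minimal sketch with made-up bounds (3 customers, 3 months each, chosen only to keep the output short). The inner loop reassigns x on every pass, so the outer value never reaches the loop body:

```python
visited = []
for x in range(0, 36, 12):   # outer loop: intended as the customer offset
    for x in range(0, 3):    # inner loop reuses the name x and overwrites it
        visited.append(x)    # only ever sees the inner values 0, 1, 2

print(visited)
```

The outer loop still runs three times, because `for` takes its next value from the range iterator rather than from x, but the body only ever sees the inner values.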

Your double loop should instead look like this:

# loop over the customers (12 rows per customer)
for x_customer in range(0,len(overalldata),12):
    # loop over the months
    for x_month in range(0,12,1):
        # row number: x_customer is already a row offset (0, 12, 24, ...)
        x = x_customer + x_month
        ...

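I changed the bounds and the step of the loops because:

  • loop 1: each customer has 12 months, so 12 rows per customer -> step = 12
  • loop 2: there are 12 months, so the month index ranges from 0 to 11 -> range(0,12,1)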
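The same bounds can also be used to slice out one customer's block of rows at a time. The sketch below uses a synthetic array in place of the CSV (its shape and the name `MONTHS_PER_CUSTOMER` are assumptions for illustration). It also shows that `len()` on a 2-D NumPy array returns the number of rows, which the question was unsure about:

```python
import numpy as np

MONTHS_PER_CUSTOMER = 12  # from the question: 12 rows per customer

# Synthetic stand-in for the CSV: 3 customers x 12 months, 4 columns (assumption).
n_rows = 3 * MONTHS_PER_CUSTOMER
data = np.arange(n_rows * 4, dtype=float).reshape(n_rows, 4)

# len() of a 2-D array is its number of rows, i.e. data.shape[0].
n_from_len = len(data)

block_info = []
for start in range(0, len(data), MONTHS_PER_CUSTOMER):
    # The 12 consecutive rows that belong to one customer.
    block = data[start:start + MONTHS_PER_CUSTOMER]
    block_info.append((start, block.shape))

print(n_from_len)
print(block_info)
```

Slicing by `start:start + 12` leaves the original array untouched, so no rows need to be deleted after each customer.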

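Thanks, this seems to be what I was trying to do, but I am still not getting any output. I want to save the correlations with: result_correlation_combined = emptylist.append(result_correlation) However, this does not seem to save anything, because I keep getting an empty list.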
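One likely contributor to the missing output: `list.append` modifies the list in place and returns `None`, so assigning its result to a variable keeps nothing useful. A minimal sketch with placeholder values (the names `results` and `returned` are illustrative):

```python
results = []

# append() mutates `results` and returns None.
returned = results.append([0.1, 0.2])

print(returned)
print(results)
```

The accumulated rows live in the list that `append` was called on (here `results`, and `emptylist` in the question), so that is the variable to print or save.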


Here is how I solved the problem: it was the placement of my for loop. A simple indentation issue. Thanks to the poster above for the help.

for x_customer in range(0,len(overalldata),12):
    for x in range(0,13,1):
        cust_months = overalldata[0:x,1]
        cust_balancenormal = overalldata[0:x,16]
        cust_demo_one = overalldata[0:x,2]
        cust_demo_two = overalldata[0:x,3]
        num_acct_A = overalldata[0:x,4]
        num_acct_B = overalldata[0:x,5]
        #Correlation Calculations
        demo_one_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_one)[1,0]
        demo_two_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_two)[1,0]
        demo_one_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_one)[1,0]
        demo_one_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_one)[1,0]
        demo_two_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_two)[1,0]
        demo_two_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_two)[1,0]

        result_correlation = [(demo_one_corr_balance),(demo_two_corr_balance),(demo_one_corr_acct_a),(demo_one_corr_acct_b),(demo_two_corr_acct_a),(demo_two_corr_acct_b)]
        numpy.savetxt('correlationoutput.csv', (result_correlation))
    result_correlation_combined = emptylist.append([result_correlation])
    cust_delete_list = [0,(x_customer),1]
    overalldata = numpy.delete(overalldata, (cust_delete_list), axis=0)
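As an alternative sketch, not the poster's method, the per-customer correlations can be collected by slicing one 12-row block per customer and appending one row of results per block, with no deletion step. The helper name `customer_correlations`, the synthetic data, and the column pairs below are all assumptions for illustration; the real CSV would use its own column indices (such as 16 for the balance column).

```python
import numpy as np

MONTHS = 12  # rows per customer, as stated in the question


def customer_correlations(data, pairs, months=MONTHS):
    """Return one row of Pearson correlations per customer block.

    `pairs` is a list of (col_a, col_b) column-index tuples (hypothetical).
    """
    usable = (len(data) // months) * months  # ignore an incomplete trailing block
    rows = []
    for start in range(0, usable, months):
        block = data[start:start + months]   # this customer's 12 rows
        rows.append([np.corrcoef(block[:, a], block[:, b])[0, 1]
                     for a, b in pairs])
    return np.array(rows)


# Synthetic data: 3 customers x 12 months, 6 columns (assumption).
rng = np.random.default_rng(0)
data = rng.normal(size=(3 * MONTHS, 6))
data[:, 1] = 2.0 * data[:, 0] + 1.0  # column 1 is perfectly correlated with column 0

pairs = [(0, 1), (2, 3), (4, 5)]
corr = customer_correlations(data, pairs)
print(corr.shape)
```

Each row of `corr` belongs to one customer, so the whole result can be written once after the loop, for example with `np.savetxt('correlationoutput.csv', corr, delimiter=',')`, rather than overwriting the file on every iteration.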