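Python: iterating over a large dataset and deleting evaluated data
I am working with a dataset of 10,000 customers, each with data for months 1-12. For each customer, I compute correlations between different values over that customer's 12-month period.
Currently my output correlation file has more rows than my original file. I realized this comes from an iteration error in how I try to delete the already-evaluated rows from the original dataset.
The result I want is a dataset with 10,000 entries: one row per customer, holding the various correlations for that customer's year.
I have marked in bold (with asterisks) the place where I think the error is.
Here is my current code: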
    for x_customer in range(0,len(overalldata),12):
        for x in range(0,13,1):
            cust_months = overalldata[0:x,1]
            cust_balancenormal = overalldata[0:x,16]
            cust_demo_one = overalldata[0:x,2]
            cust_demo_two = overalldata[0:x,3]
            num_acct_A = overalldata[0:x,4]
            num_acct_B = overalldata[0:x,5]
            out_mark_channel_one = overalldata[0:x,25]
            out_service_channel_two = overalldata[0:x,26]
            out_mark_channel_three = overalldata[0:x,27]
            out_mark_channel_four = overalldata[0:x,28]
            #Correlation Calculations
            #Demographic to Balance Correlations
            demo_one_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_one)[1,0]
            demo_two_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_two)[1,0]
            #Demographic to Account Number Correlations
            demo_one_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_one)[1,0]
            demo_one_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_one)[1,0]
            demo_two_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_two)[1,0]
            demo_two_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_two)[1,0]
            #Marketing Response Channel One
            mark_one_corr_acct_a = numpy.corrcoef(num_acct_A, out_mark_channel_one)[1, 0]
            mark_one_corr_acct_b = numpy.corrcoef(num_acct_B, out_mark_channel_one)[1, 0]
            mark_one_corr_balance = numpy.corrcoef(cust_balancenormal, out_mark_channel_one)[1, 0]
            #Marketing Response Channel Two
            mark_two_corr_acct_a = numpy.corrcoef(num_acct_A, out_service_channel_two)[1, 0]
            mark_two_corr_acct_b = numpy.corrcoef(num_acct_B, out_service_channel_two)[1, 0]
            mark_two_corr_balance = numpy.corrcoef(cust_balancenormal, out_service_channel_two)[1, 0]
            #Marketing Response Channel Three
            mark_three_corr_acct_a = numpy.corrcoef(num_acct_A, out_mark_channel_three)[1, 0]
            mark_three_corr_acct_b = numpy.corrcoef(num_acct_B, out_mark_channel_three)[1, 0]
            mark_three_corr_balance = numpy.corrcoef(cust_balancenormal, out_mark_channel_three)[1, 0]
            #Marketing Response Channel Four
            mark_four_corr_acct_a = numpy.corrcoef(num_acct_A, out_mark_channel_four)[1, 0]
            mark_four_corr_acct_b = numpy.corrcoef(num_acct_B, out_mark_channel_four)[1, 0]
            mark_four_corr_balance = numpy.corrcoef(cust_balancenormal, out_mark_channel_four)[1, 0]
            #Result Correlations For Exporting to CSV of all Correlations
            result_correlation = [(demo_one_corr_balance),(demo_two_corr_balance),(demo_one_corr_acct_a),(demo_one_corr_acct_b),(demo_two_corr_acct_a),(demo_two_corr_acct_b),(mark_one_corr_acct_a),(mark_one_corr_acct_b),(mark_one_corr_balance),
                                  (mark_two_corr_acct_a),(mark_two_corr_acct_b),(mark_two_corr_balance),(mark_three_corr_acct_a),(mark_three_corr_acct_b),(mark_three_corr_balance),(mark_four_corr_acct_a),(mark_four_corr_acct_b),
                                  (mark_four_corr_balance)]
            result_correlation_nan_nuetralized = numpy.nan_to_num(result_correlation)
            c.writerow(result_correlation)
            **result_correlation_combined = emptylist.append([result_correlation])
            cust_delete_list = [0,x_customer,1]
            overalldata = numpy.delete(overalldata, (cust_delete_list), axis=0)**
To expand: when I supply a file of 10 customers, each with 12 months of data, I get an output file with 130 rows, when it should have only 10.
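
For reference, a hedged sketch of one way to get exactly one row per customer. It assumes that every customer occupies 12 consecutive rows of `overalldata`, that the column indices are the ones used in the code above, and that the `writerow` call sits inside the inner `for x in range(0,13,1)` loop, which would explain the row count: 10 customers × 13 inner iterations = 130 rows. The names `customer_correlations`, `PAIRS`, and `MONTHS_PER_CUSTOMER` are made up for illustration, and the random `demo` array only stands in for the real data. Instead of deleting rows after they are evaluated, the sketch slices each customer's block by its start offset and leaves the array untouched.

```python
import numpy as np

MONTHS_PER_CUSTOMER = 12  # assumption: 12 consecutive rows per customer

# Column indices copied from the question's code.
BALANCE, DEMO_ONE, DEMO_TWO, ACCT_A, ACCT_B = 16, 2, 3, 4, 5
CHANNELS = (25, 26, 27, 28)  # marketing/service channels one to four

# Column pairs, in the same order as result_correlation in the question.
PAIRS = [
    (BALANCE, DEMO_ONE), (BALANCE, DEMO_TWO),
    (ACCT_A, DEMO_ONE), (ACCT_B, DEMO_ONE),
    (ACCT_A, DEMO_TWO), (ACCT_B, DEMO_TWO),
] + [(left, ch) for ch in CHANNELS for left in (ACCT_A, ACCT_B, BALANCE)]


def customer_correlations(overalldata, months=MONTHS_PER_CUSTOMER):
    """Return one row of correlations per customer (hypothetical helper)."""
    rows = []
    for start in range(0, len(overalldata), months):
        # Slice this customer's rows only; nothing is deleted from the array.
        block = overalldata[start:start + months]
        # Constant columns give NaN correlations; silence the warning here
        # and replace NaN with 0 below, as the question does with nan_to_num.
        with np.errstate(invalid="ignore", divide="ignore"):
            corr = [np.corrcoef(block[:, a], block[:, b])[1, 0] for a, b in PAIRS]
        rows.append(np.nan_to_num(corr))
    return np.array(rows)


# Stand-in data: 10 customers x 12 months, 29 columns of random values.
rng = np.random.default_rng(0)
demo = rng.normal(size=(10 * MONTHS_PER_CUSTOMER, 29))
result = customer_correlations(demo)
print(result.shape)  # (10, 18)
```

Assuming `c` is a `csv.writer`, each row of `result` could then be written with a single `c.writerow(row)` per customer. Two details in the starred lines are also worth checking: `list.append` returns `None`, so `result_correlation_combined` ends up as `None`, and `numpy.delete(overalldata, [0, x_customer, 1], axis=0)` removes the rows at indices 0, `x_customer`, and 1 rather than a range of rows.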