2017-05-24

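Pandas DataFrame merge

I am new to pandas. I am trying to build a dataset with ZIP codes, the population of each ZIP code, and the number of counties in each ZIP code.

I got the data from the Census website: https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_county_rel_10.txt

I have tried hard with the code below, but it does not work. Could you help me figure out the correct code? I have a hunch that the error is related to the ordering of the DataFrame or to data types, but I cannot work out the fix. Please let me know what you think. Thank you in advance!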

import pandas as pd

df = pd.read_csv("zcta_county_rel_10.txt", dtype={'ZCTA5': str, 'STATE': str, 'COUNTY': str}, usecols=['ZCTA5', 'STATE', 'COUNTY', 'ZPOP'])
zcta_pop = df.drop_duplicates(subset={'ZCTA5', 'ZPOP'}).drop(['STATE', 'COUNTY'], 1)
zcta_ct_county = df['ZCTA5'].value_counts()
zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']
pre_merge_1 = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')[['ZCTA5', 'ZPOP', 'CT_COUNTY']]

Here is my error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python27/lib/python2.7/site-packages/pandas/tools/merge.py", line 58, in merge
    copy=copy, indicator=indicator)
  File "/usr/local/python27/lib/python2.7/site-packages/pandas/tools/merge.py", line 473, in __init__
    'type {0}'.format(type(right)))
ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>

SOLUTION

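import pandas as pd

# Read the ZCTA, state, and county codes as strings so leading zeros are preserved.
df = pd.read_csv("zcta_county_rel_10.txt", dtype={'ZCTA5': str, 'STATE': str, 'COUNTY': str}, usecols=['ZCTA5', 'STATE', 'COUNTY', 'ZPOP'])
# Keep one row per ZCTA/population pair and drop the state and county columns.
zcta_pop = df.drop_duplicates(subset=['ZCTA5', 'ZPOP']).drop(['STATE', 'COUNTY'], axis=1)
# value_counts() returns a Series; reset_index() turns it into a two-column DataFrame.
zcta_ct_county = df['ZCTA5'].value_counts().reset_index()
zcta_ct_county.columns = ['ZCTA5', 'CT_COUNTY']
pre_merge_1 = pd.merge(zcta_pop, zcta_ct_county, on='ZCTA5')[['ZCTA5', 'ZPOP', 'CT_COUNTY']]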
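As an alternative to building two intermediate frames and merging them, the same table can probably be produced in one pass with groupby and named aggregation. This is only a sketch, not the approach used above: it assumes a pandas version that supports named aggregation (0.25 or later), it assumes ZPOP is constant within each ZCTA5 (the drop_duplicates step above relies on the same thing), and the toy rows are made up rather than taken from the Census file.

```python
import pandas as pd

# Toy stand-in for zcta_county_rel_10.txt; the values are illustrative only.
df = pd.DataFrame({
    "ZCTA5": ["00601", "00601", "00602", "00603"],
    "COUNTY": ["001", "141", "003", "005"],
    "ZPOP": [18570, 18570, 41520, 54689],
})

# One row per ZCTA: take the (repeated) population value and count the rows,
# which matches what value_counts() on ZCTA5 counts.
result = (
    df.groupby("ZCTA5")
      .agg(ZPOP=("ZPOP", "first"), CT_COUNTY=("COUNTY", "size"))
      .reset_index()
)
print(result)
```

Because no merge is involved, there is no Series-versus-DataFrame mismatch to run into here.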
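If you answered your own question, you should post the solution as an answer and mark it as accepted, rather than leaving it in the question itself.

Thanks for the reminder. @jezrael helped me solve this problem. The solution is posted at the bottom of my post, and I have accepted jezrael's answer below. – Counter10000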

Answer

I think you need to add reset_index, because value_counts returns a Series, whereas the merge needs a DataFrame with 2 columns:

zcta_ct_county = df['ZCTA5'].value_counts().reset_index()
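To make the point concrete, here is a small sketch on made-up data (the rows below are illustrative, not from the Census file). It shows that value_counts gives back a Series with the ZCTA codes in the index, and then converts it to a two-column DataFrame. Naming the index and the values explicitly with rename_axis and reset_index(name=...) is my own choice here, so the result does not depend on the default column names, which differ between pandas versions.

```python
import pandas as pd

# Toy stand-in for the Census relationship file; values are illustrative only.
df = pd.DataFrame({
    "ZCTA5": ["00601", "00601", "00602", "00603"],
    "COUNTY": ["001", "141", "003", "005"],
    "ZPOP": [18570, 18570, 41520, 54689],
})

counts = df["ZCTA5"].value_counts()
print(type(counts).__name__)  # a Series: the ZCTA codes are the index, not a column

# Turn the Series into a DataFrame with explicitly named columns.
zcta_ct_county = counts.rename_axis("ZCTA5").reset_index(name="CT_COUNTY")

# One row per ZCTA with its population, then merge on the shared key column.
zcta_pop = df[["ZCTA5", "ZPOP"]].drop_duplicates()
merged = zcta_pop.merge(zcta_ct_county, on="ZCTA5")
print(merged)
```

Once both sides are DataFrames that share a ZCTA5 column, the merge has a key to join on.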