Reading a text file into a desired DataFrame format
I have a txt file that looks like this:
Alabama[edit]
Auburn (Auburn University, Edward Via College of Osteopathic Medicine)
Birmingham (University of Alabama at Birmingham, Birmingham School of
Alaska[edit]
Anchorage[21] (University of Alaska Anchorage)
Fairbanks (University of Alaska Fairbanks)[16]
I want to read the txt file into a DataFrame that looks like this:
state county
Alabama Auburn
Alabama Birmingham
Alaska Anchorage
Alaska Fairbanks
What I have so far is:
import re
import pandas as pd

university_towns = open('university_towns.txt', 'r')
# use a list (not a set) so the column order is fixed
df_university_towns = pd.DataFrame(columns=['State', 'RegionName'])
# loop over each line of the file object
# determine if each line is state or county.
# if the line has [edit], it's state
state_pattern = re.compile(r'\[edit\]')
county_pattern = re.compile(r'\(')  # a literal "(" has to be escaped
for line in university_towns:
    state_pattern_m = state_pattern.search(line)
    county_pattern_m = county_pattern.search(line)
    if state_pattern_m:
        # extract everything before [edit]
        print(state_pattern_m.start())
        end_position = state_pattern_m.start()
        print(line[0:end_position])
        state_name = line[0:end_position]
    if county_pattern_m:
        # extract everything before (
        county_name = line[0:county_pattern_m.start()].strip()
This code only gives me something like this:
State County
Alabama Auburn
Birmingham
.
.
.
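For reference, here is a minimal sketch of one way the desired frame could be built. It assumes every state line contains "[edit]", every other non-empty line is a town, and bracketed footnote markers such as "[21]" should be dropped. The function name parse_university_towns and the inline sample list are hypothetical, made up for illustration. The key idea is to remember the most recent state and attach it to each town line, so no row ends up with an empty State.

```python
import re
import pandas as pd


def parse_university_towns(lines):
    """Build a State/RegionName DataFrame from raw lines (hypothetical helper)."""
    rows = []
    current_state = None  # remembered until the next "[edit]" line appears
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if '[edit]' in line:
            # State header: keep everything before "[edit]"
            current_state = line.split('[edit]')[0].strip()
        else:
            # Town line: keep everything before "(" ...
            town = line.split('(')[0]
            # ... and drop footnote markers such as "[21]" (assumption)
            town = re.sub(r'\[\d+\]', '', town).strip()
            rows.append((current_state, town))
    return pd.DataFrame(rows, columns=['State', 'RegionName'])


# Sample lines copied from the question (the Birmingham line is truncated there too)
sample = [
    'Alabama[edit]\n',
    'Auburn (Auburn University, Edward Via College of Osteopathic Medicine)\n',
    'Birmingham (University of Alabama at Birmingham, Birmingham School of\n',
    'Alaska[edit]\n',
    'Anchorage[21] (University of Alaska Anchorage)\n',
    'Fairbanks (University of Alaska Fairbanks)[16]\n',
]
df = parse_university_towns(sample)
print(df)
```

Since a file object is iterable line by line, the same function should also accept the opened file directly, e.g. inside a "with open('university_towns.txt') as f:" block. Collecting rows in a list and building the DataFrame once at the end also avoids appending to the frame inside the loop.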