pandas read_csv: building an index from a list

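I need to read a CSV file that I created by copying and pasting some data from Wikipedia. The data is a list of universities classified by their state. What I want is to load this data into a pandas DataFrame whose index is the state name. However, when I import the CSV with read_csv, the data is one-dimensional: the state names end up in the same column as the university names. From this DataFrame I now need to extract the states from the first column and use them as the index, and I don't know how to do that. I suppose I could try a for/if loop over a list of state names, but there is probably a faster, more elegant way. Any suggestions?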

This is what the CSV file looks like:

Alabama[edit] 
Auburn (Auburn University, Edward Via College of Osteopathic Medicine)[14] 
Birmingham (University of Alabama at Birmingham, Birmingham School of Law, Cumberland School of Law, Miles Law School)[15] 
Dothan (Fortis College, Troy University Dothan Campus, Alabama College of Osteopathic Medicine) 
Florence (University of North Alabama) 
Homewood (Samford University) 
Huntsville (University of Alabama, Huntsville) 
Jacksonville (Jacksonville State University)[16] 
Livingston (University of West Alabama)[16] 
Mobile (University of South Alabama)[17] 
Montevallo (University of Montevallo, Faulkner University)[16] 
Montgomery (Alabama State University, Huntingdon College, Auburn University at Montgomery, H. Councill Trenholm State Technical College, Faulkner University) 
Troy (Troy University)[16] 
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[18][19] 
Tuskegee (Tuskegee University)[20] 
Alaska[edit] 
Anchorage[21] (University of Alaska Anchorage) 
Fairbanks (University of Alaska Fairbanks)[16] 
Juneau (University of Alaska Southeast) 
Ketchikan (University of Alaska Southeast-extended campus) 
Sitka (University of Alaska Southeast-extended campus)

Thanks a lot!

Source

2017-07-29 Jemba88

Could you post a sample of the data? – MedAli

I just pasted part of the data. – Jemba88

Thanks @ayhan! I solved it from there. – Jemba88

As described in the pandas.read_csv documentation, you can use index_col to define which column of the CSV file is used as the index.
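As a minimal sketch of what index_col does, the snippet below reads a small in-memory CSV. The two-column layout and its contents are made up for illustration; they are not the asker's file, which has only a single column, so this shows the parameter itself rather than a fix for that file.

```python
import io

import pandas as pd

# Hypothetical two-column CSV, invented for illustration only.
csv_text = (
    "state,university\n"
    "Alabama,Auburn University\n"
    "Alaska,University of Alaska Anchorage\n"
)

# index_col=0 tells read_csv to use the first column as the index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df)
```

index_col also accepts a column name, so index_col="state" would select the same column here.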

For your particular case, here is a working code example. You need to put your data in a file and edit the code below so that it reads that file.

import pandas as pd 

# read your data into a list of lines 
# (text mode, so each line is a str and the string operations below work) 
with open("/tmp/data.txt", "r") as myfile: 
    data = myfile.readlines() 

# strip whitespace from each line 
data = [i.strip() for i in data] 

# split each line on spaces into a list of words 
data = [i.split(" ") for i in data] 

# create a list of lists where each inner list holds 
# the first word of the line in the first element 
# and the remaining words, re-joined, in the second element 
data = [[i[0], " ".join(i[1:])] for i in data] 

# create a data frame from the prepared data 
data = pd.DataFrame(data, columns=["state", "university"]) 

# convert the state column to the dataframe index 
data = data.set_index("state") 

# see the results 
print(data.head())

The result looks like this:

             university 
state               
Alabama[edit]             
Auburn   (Auburn University, Edward Via College of Oste... 
Birmingham  (University of Alabama at Birmingham, Birmingh... 
Dothan   (Fortis College, Troy University Dothan Campus... 
Florence       (University of North Alabama)

Source

2017-07-29 08:47:58 MedAli

Unfortunately, the CSV also has only one column. – Jemba88

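This approach might work with some tweaks. The point is that the 'state' column should contain only the entries followed by [edit], and the other column should contain the remaining data, the entries with (). The 'state' column should keep the same value until the state changes. For example, the first 8 rows of the 'state' column should be Alabama 8 times, matching the 8 universities listed for that state in the 'university' column. I did find the answer in another thread, though. Thanks a lot for your time! – Jemba88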
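The layout described in this comment can be sketched as follows. This is not the answer from the other thread, which is not shown here; it is one possible approach, assuming every state line ends with the literal marker [edit] and every other line is a city entry. The inline sample is a shortened copy of the data from the question.

```python
import pandas as pd

# Shortened sample in the same shape as the data in the question.
raw = """Alabama[edit]
Auburn (Auburn University, Edward Via College of Osteopathic Medicine)[14]
Florence (University of North Alabama)
Alaska[edit]
Juneau (University of Alaska Southeast)
"""

# Read whole lines ourselves: the entries contain commas, so parsing
# them as comma-separated fields would split them in the wrong places.
lines = [line.strip() for line in raw.splitlines() if line.strip()]
df = pd.DataFrame({"university": lines})

# Assumption: a line is a state header if and only if it ends with "[edit]".
is_state = df["university"].str.endswith("[edit]")

# Keep the state name on header rows (missing elsewhere), drop the marker,
# then forward-fill so each entry inherits the state above it.
df["state"] = (
    df["university"]
    .where(is_state)
    .str.replace("[edit]", "", regex=False)
    .ffill()
)

# Drop the header rows and use the state as the index.
df = df[~is_state].set_index("state")

print(df)
```

Forward-filling avoids the explicit for/if loop over a list of state names mentioned in the question, because the [edit] marker already identifies the header rows.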
