2017-09-14 67 views
0

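Please help me figure out how to do this. I have a DataFrame. The "Indicator" column contains many different parameters (strings), but I only need "Life satisfaction". I don't know how to delete the other indicators, such as "Dwellings without basic facilities", together with their corresponding values and countries. That is, I want to delete the rows containing those strings, along with their corresponding values in the other columns.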

import numpy as np 
import pandas as pd 

oecd_bli = pd.read_csv("/Users/vladelec/Desktop/Life.csv") 
df = pd.DataFrame(oecd_bli) 
df.drop(df.columns[[0,2,4,5,6,7,8,9,10,11,12,13,15,16]], axis=1, inplace=True) 
# dropped the other columns that I do not need

Here is a screenshot of my DataFrame:

Example of Dataframe

+0

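You don't need to do both `oecd_bli = pd.read_csv("/Users/vladelec/Desktop/Life.csv")` and `df = pd.DataFrame(oecd_bli)`; the first line alone is enough, because `pd.read_csv()` already returns a DataFrame. – GiantsLoveDeathMetal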

+0

Possible duplicate of [Deleting DataFrame row in Pandas based on column value](https://stackoverflow.com/questions/18172851/deleting-dataframe-row-in-pandas-based-on-column-value) – GiantsLoveDeathMetal

Answers

1

You can load your data like this:

file_name = "/Users/vladelec/Desktop/Life.csv" 

# Columns you want to load 
keep_cols = ['Country', 'Indicator'] 

# pd.read_csv() will load the data into a pd.DataFrame 
oecd_bli = pd.read_csv(file_name, usecols=keep_cols) 

If you only want the "Life Satisfaction" indicator, you can do the following:

oecd_bli = oecd_bli[oecd_bli['Indicator'] == "Life Satisfaction"] 

If there are several indicators you want to keep, you can do this:

keep_indicators = [ 
    "Life Satisfaction", 
    "Crime Indicator", 
] 

oecd_bli = oecd_bli[oecd_bli['Indicator'].isin(keep_indicators)] 
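
The steps above can be tried end to end without the original file. The following is a minimal sketch that assumes a small in-memory CSV in place of Life.csv; the `Unit` and `Value` columns and every row in it are made up for illustration and are not taken from the real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for Life.csv; the real file has more columns and rows.
csv_text = """Country,Indicator,Unit,Value
Australia,Life Satisfaction,Average score,7.3
Australia,Crime Indicator,Ratio,1.0
Austria,Life Satisfaction,Average score,7.0
Austria,Dwellings without basic facilities,Percentage,1.0
"""

# Load only the columns of interest ("Unit" is skipped at read time)
keep_cols = ["Country", "Indicator", "Value"]
oecd_bli = pd.read_csv(io.StringIO(csv_text), usecols=keep_cols)

# Keep a single indicator
life = oecd_bli[oecd_bli["Indicator"] == "Life Satisfaction"]

# Keep several indicators at once
keep_indicators = ["Life Satisfaction", "Crime Indicator"]
several = oecd_bli[oecd_bli["Indicator"].isin(keep_indicators)]

print(life)
print(several)
```

Note that the comparison is an exact, case-sensitive string match, so the spelling has to agree with what is actually stored in the "Indicator" column.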
+0

Thanks for your answer, man! –

+0

Don't forget to accept the answer – GiantsLoveDeathMetal

0

@GiantsLoveDeathMetal makes a good point. In principle, you can read the raw data into oecd_bli and then select the subset of the DataFrame that satisfies some condition.

Demo

import pandas as pd 


# Given a DataFrame of raw data 
d = { 
    "Country": pd.Series(["Australia", "Austria", "Fiji", "Japan"]), 
    "Indicator": pd.Series(["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."]), 
    "Value": pd.Series([1.1, 1.0, 2.2, 2.9]), 
} 

oecd_bli = pd.DataFrame(d, columns=["Country", "Indicator", "Value"]) 
oecd_bli 


# Select rows starting with "Life" in column "Indicator" 
condition = oecd_bli["Indicator"].str.startswith("Life") 
subset = oecd_bli[condition] 
subset 


Alternatively, select the subset with .loc, which indexes by label:

subset = oecd_bli.loc[condition, :] 

Here loc expects [<rows>, <columns>], so this returns the rows that satisfy the condition, with all columns.
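
Because loc takes a column selector as well, the same call can also narrow the columns. A minimal sketch, reusing the demo data from above (the choice of the two columns is only an illustration):

```python
import pandas as pd

# Same toy data as in the demo
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

condition = oecd_bli["Indicator"].str.startswith("Life")

# Rows where the condition is True, restricted to two columns
subset = oecd_bli.loc[condition, ["Country", "Value"]]
print(subset)
```

The original row labels (2 and 3 here) are kept in the result.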


Details

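Notice that the resulting view of the DataFrame contains every row for which the condition is True. This works because a DataFrame can be indexed with a boolean array.

An example of a boolean array: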

>>> condition = oecd_bli["Indicator"].str.startswith("Life") 
>>> condition 

0 False 
1 False 
2  True 
3  True 
Name: Indicator, dtype: bool 

Other ways to set the condition:

>>> condition = oecd_bli["Indicator"] == "Life ..." 
>>> condition = ~oecd_bli["Indicator"].str.startswith("Dwell") 
>>> condition = oecd_bli["Indicator"].isin(["Life ...", "Crime ..."]) 
>>> condition = (oecd_bli["Indicator"] == "Life ...") | (oecd_bli["Indicator"] == "Crime ...") 
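  1. Direct equality (==)
  2. Exclusion (~) of unwanted occurrences
  3. Inclusion via a whitelist of values (isin)
  4. Combination with the element-wise logical operators (|, &, etc.)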
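
The operators in the last item also combine conditions on different columns. A minimal sketch on the demo data (the 2.5 threshold is an arbitrary value chosen for illustration); each comparison needs its own parentheses when written inline, because & and | bind more tightly than == and >:

```python
import pandas as pd

# Same toy data as in the demo
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

# Build the conditions separately, then combine them element-wise
is_life = oecd_bli["Indicator"] == "Life ..."
is_high = oecd_bli["Value"] > 2.5
both = oecd_bli[is_life & is_high]

# Inline form: the parentheses around each comparison are required
inline = oecd_bli[(oecd_bli["Indicator"] == "Life ...") & (oecd_bli["Value"] > 2.5)]

# Optionally renumber the remaining rows from 0
result = both.reset_index(drop=True)
print(result)
```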