@GiantsLoveDeathMetal makes a good point. In principle, you can read the raw data into `oecd_bli` and then select the subset of the DataFrame that satisfies some condition.
Demo
import pandas as pd
# Given a DataFrame of raw data
d = {
"Country": pd.Series(["Australia", "Austria", "Fiji", "Japan"]),
"Indicator": pd.Series(["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."]),
"Value": pd.Series([1.1, 1.0, 2.2, 2.9]),
}
oecd_bli = pd.DataFrame(d, columns=["Country", "Indicator", "Value"])
oecd_bli
# Select rows starting with "Life" in column "Indicator"
condition = oecd_bli["Indicator"].str.startswith("Life")
subset = oecd_bli[condition]
subset
Alternatively, select the subset with `.loc`, which indexes by label:
subset = oecd_bli.loc[condition, :]
Here `loc` expects `[<rows>, <columns>]`, so the rows that meet the condition are returned along with all columns.
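Since `loc` takes both a row selector and a column selector, the same condition can be combined with a list of column names. The following is a minimal sketch that rebuilds the demo data from above; the name `life_values` and the choice of columns are only illustrative.

```python
import pandas as pd

# Same toy data as in the demo above
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})
condition = oecd_bli["Indicator"].str.startswith("Life")

# Rows where the condition is True, restricted to two columns
life_values = oecd_bli.loc[condition, ["Country", "Value"]]
print(life_values)
```

The original index labels (2 and 3 here) are kept; call `reset_index(drop=True)` on the result if a fresh 0-based index is wanted.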
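Details
Notice that the result contains every row for which the condition is `True`. This works because a DataFrame can be indexed with a boolean array. An example of such a boolean array: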
>>> condition = oecd_bli["Indicator"].str.startswith("Life")
>>> condition
0 False
1 False
2 True
3 True
Name: Indicator, dtype: bool
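To make the mechanism concrete, here is a small sketch (using the same toy data) showing that the condition is just a Series of booleans, and that a hand-written list of booleans of the same length selects the same rows. The names `mask` and `manual` are illustrative only.

```python
import pandas as pd

oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

mask = oecd_bli["Indicator"].str.startswith("Life")

# The mask has one boolean per row
print(mask.tolist())    # [False, False, True, True]
# True counts as 1, so the sum is the number of matching rows
print(int(mask.sum()))  # 2

# A plain list of booleans with one entry per row selects the same rows
manual = oecd_bli[[False, False, True, True]]
print(manual.equals(oecd_bli[mask]))  # True
```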
Other ways to set the condition:
>>> condition = oecd_bli["Indicator"] == "Life ..."
>>> condition = ~oecd_bli["Indicator"].str.startswith("Dwell")
>>> condition = oecd_bli["Indicator"].isin(["Life ...", "Crime ..."])
>>> condition = (oecd_bli["Indicator"] == "Life ...") | (oecd_bli["Indicator"] == "Crime ...")
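- direct equality (`==`)
- exclusion (`~`) of unwanted occurrences
- inclusion via a whitelist of values (`isin`)
- combination with bitwise logical operators (`|`, `&`, etc.)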
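The `|` example above has an `&` counterpart. As a rough sketch on the same toy data, two conditions on different columns can be combined so that both must hold; the threshold `2.5` is an arbitrary value chosen for illustration. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `==` and `>`.

```python
import pandas as pd

oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

# Both conditions must be True for a row to be kept
condition = (oecd_bli["Indicator"] == "Life ...") & (oecd_bli["Value"] > 2.5)
subset = oecd_bli[condition]
print(subset["Country"].tolist())  # ['Japan']
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the bitwise operators are used here.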
You don't need both `oecd_bli = pd.read_csv("/Users/vladelec/Desktop/Life.csv")` and the follow-up `pd.DataFrame(oecd_bli)` assignment; the first line alone is enough. – GiantsLoveDeathMetal
Possible duplicate of [Deleting DataFrame row in Pandas based on column value](https://stackoverflow.com/questions/18172851/deleting-dataframe-row-in-pandas-based-on-column-value) – GiantsLoveDeathMetal