Remove non-JSON-object rows from a Python DataFrame column

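I have a DataFrame in which one column contains both JSON objects and strings. I want to get rid of the rows that do not contain a JSON object.

Here is what my DataFrame looks like: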

import pandas as pd 

df = pd.DataFrame({'A': ["hello","world",{"a":5,"b":6,"c":8},"usa","india",{"a":9,"b":10,"c":11}]}) 

print(df)

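How should I remove the rows that contain only strings, so that once those string rows are gone I can apply the code below to turn the JSON objects in this column into separate DataFrame columns: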

from pandas import json_normalize  # pandas >= 1.0; older releases: from pandas.io.json import json_normalize
df = json_normalize(df['A']) 
print(df)

Source

2017-10-20 Nikita Gupta

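Once you've built the df, what you have is not JSON, it's a dictionary. But it has got me hooked on trying to selectively keep those columns, for sure :) – roganjosh

Yes, by JSON I mean only dict objects. Any idea how to remove all the rows that contain simple strings like "hello", "world", etc.? –

I asked this as a question: https://stackoverflow.com/questions/46856988/np-isreal-behavior-different-in-pandas-dataframe-and-numpy-array – Wen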

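I think I'd prefer to use an isinstance check (note that the outputs in this answer show the second dict with the keys d, e, f rather than a, b, c as in the question's frame):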

In [11]: df.loc[df.A.apply(lambda d: isinstance(d, dict))] 
Out[11]: 
          A 
2 {'a': 5, 'b': 6, 'c': 8} 
5 {'d': 9, 'e': 10, 'f': 11}

If you want to include numbers as well, you can do:

In [12]: df.loc[df.A.apply(lambda d: isinstance(d, (dict, np.number)))] 
Out[12]: 
          A 
2 {'a': 5, 'b': 6, 'c': 8} 
5 {'d': 9, 'e': 10, 'f': 11}

Adjust this to include whichever types you want.
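One way to make the accepted types adjustable is to wrap the check in a small helper. The sketch below is only an illustration: the name keep_rows_of_type and its parameters are hypothetical and not part of pandas, and the sample frame is made up for the example. It also relies on one detail worth knowing: plain Python ints and floats stored in an object column are not instances of np.number, so numbers.Number is used here to catch them.

```python
import numbers
import pandas as pd

def keep_rows_of_type(df, column, types):
    """Hypothetical helper: keep rows whose value in `column` is an instance of `types`."""
    mask = df[column].apply(lambda value: isinstance(value, types))
    return df.loc[mask]

# Made-up sample frame mixing strings, numbers and dicts
df = pd.DataFrame({'A': ["hello", 3, {"a": 5, "b": 6, "c": 8}, "usa", 2.5,
                         {"d": 9, "e": 10, "f": 11}]})

dicts_only = keep_rows_of_type(df, 'A', dict)
dicts_and_numbers = keep_rows_of_type(df, 'A', (dict, numbers.Number))

print(list(dicts_only.index))         # [2, 5]
print(list(dicts_and_numbers.index))  # [1, 2, 4, 5]
```

Passing a tuple of types works because isinstance accepts one directly.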

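For the last step, json_normalize needs a list of JSON objects; for whatever reason a Series is no good (and raises a KeyError), so you can turn it into a list and you're good to go: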

In [21]: df1 = df.loc[df.A.apply(lambda d: isinstance(d, (dict, np.number)))] 

In [22]: json_normalize(list(df1["A"])) 
Out[22]: 
    a b c d  e  f 
0 5.0 6.0 8.0 NaN NaN NaN 
1 NaN NaN NaN 9.0 10.0 11.0
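On pandas 1.0 and later, the same two steps can be written with the top-level pd.json_normalize. This is a minimal sketch, assuming the variant of the frame whose second dict is keyed d, e, f (as the outputs above show); the variable names dict_rows and flat are illustrative.

```python
import pandas as pd

# Assumed input: the frame variant implied by the outputs above
df = pd.DataFrame({'A': ["hello", "world", {"a": 5, "b": 6, "c": 8}, "usa", "india",
                         {"d": 9, "e": 10, "f": 11}]})

# Step 1: keep only the rows that hold a dict
dict_rows = df.loc[df['A'].apply(lambda d: isinstance(d, dict))]

# Step 2: pass json_normalize a plain list of dicts; each key becomes a column
flat = pd.json_normalize(dict_rows['A'].tolist())
print(flat)
```

Keys missing from a given dict come out as NaN in that row, which is why the integer values are shown as floats.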

Source

2017-10-20 20:13:34

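I prefer this answer. Since the other discussion doesn't seem to be going anywhere, do you happen to know why "isreal" works, so you can point me in the right direction for reading? – roganjosh

After applying your code and then the "normalize code", it gives a KeyError. –

@roganjosh I don't know, I think you'd need to look at the code. I don't think np.isreal is intended to be used like that (I wouldn't want to rely on it) –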

df[df.applymap(np.isreal).sum(1).gt(0)] 
Out[794]: 
          A 
2 {'a': 5, 'b': 6, 'c': 8} 
5 {'d': 9, 'e': 10, 'f': 11}

Source

2017-10-20 19:59:55 Wen

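Please explain what exactly it is doing –

I'm also confused about how this works. The docs don't offer much, certainly not for strings. Is this a side effect? – roganjosh

'df[df.applymap(np.isreal).values]' might be a bit more concise. – cmaher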

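If you want an ugly solution, that works too... Here is a function I wrote that finds the rows containing only strings and returns the df minus those rows. (Since your df has only one column, you end up with a one-column DataFrame containing all the dicts.) From there, you need to use df = json_normalize(df['A'].values) rather than just df = json_normalize(df['A']).

For a single-column DataFrame...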

import pandas as pd
import numpy as np
from pandas import json_normalize  # pandas >= 1.0; older releases: from pandas.io.json import json_normalize

def delete_strings(df):
    nrows = df.shape[0]
    rows_to_keep = []
    for row in np.arange(nrows):
        # add the row number to the list of rows to keep
        # if the first column of that row contains a dict
        if isinstance(df.iloc[row, 0], dict):
            rows_to_keep.append(row)
    # [0] (a list) keeps the result a DataFrame, so df['A'] still works below;
    # a bare 0 would return a Series and df['A'] would raise a KeyError
    return df.iloc[rows_to_keep, [0]]

df = pd.DataFrame({'A': ["hello", "world", {"a": 5, "b": 6, "c": 8}, "usa", "india",
                         {"a": 9, "b": 10, "c": 11}]})
df = delete_strings(df)
df = json_normalize(df['A'].values)
print(df)
#    a   b   c
# 0  5   6   8
# 1  9  10  11

For a multi-column DataFrame (this also works with a single-column one):

def delete_rows_of_strings(df):
    rows = df.shape[0]  # number of rows in df
    cols = df.shape[1]  # number of columns in df
    rows_to_keep = []   # list to track rows to keep
    for row in np.arange(rows):  # for every row in the DataFrame
        # num_string counts the number of strings in the row
        num_string = 0
        for col in np.arange(cols):  # for each column in the row...
            # if the value is a string, add one to num_string
            if isinstance(df.iloc[row, col], str):
                num_string += 1
        # if num_string, the number of strings in the row,
        # isn't equal to the number of columns...
        if num_string != cols:  # ...add that row number to the list of rows to keep
            rows_to_keep.append(row)
    # return the df with rows containing at least one non-string
    return df.iloc[rows_to_keep, :]


df = pd.DataFrame({'A': ["hello", "world", {"a": 5, "b": 6, "c": 8}, "usa", "india"],
                   'B': ['hi', {"a": 5, "b": 6, "c": 8}, 'sup', 'america', 'china']})
# first three rows of df:
#                           A                         B
# 0                     hello                        hi
# 1                     world  {'a': 5, 'b': 6, 'c': 8}
# 2  {'a': 5, 'b': 6, 'c': 8}                       sup
print(delete_rows_of_strings(df))
#                           A                         B
# 1                     world  {'a': 5, 'b': 6, 'c': 8}
# 2  {'a': 5, 'b': 6, 'c': 8}                       sup
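The nested loops above can also be expressed as a boolean mask. The following is a sketch of an equivalent filter, not the answer author's code; the function name drop_all_string_rows is hypothetical. It applies Series.map column by column, so it does not depend on DataFrame.applymap.

```python
import pandas as pd

def drop_all_string_rows(df):
    """Hypothetical helper: drop the rows in which every cell is a str."""
    # Boolean frame: True where the cell holds a string
    is_str = df.apply(lambda col: col.map(lambda v: isinstance(v, str)))
    # Keep the rows that have at least one non-string cell
    return df.loc[~is_str.all(axis=1)]

df = pd.DataFrame({'A': ["hello", "world", {"a": 5, "b": 6, "c": 8}, "usa", "india"],
                   'B': ['hi', {"a": 5, "b": 6, "c": 8}, 'sup', 'america', 'china']})
result = drop_all_string_rows(df)
print(list(result.index))  # [1, 2]
```

Like the loop version, this keeps a row as soon as one of its cells is not a string, whatever type that cell has.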

Source

2017-10-20 20:19:13
