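Extracting pandas objects into lists and extracting unique values

I have a dictionary of files stored as pandas DataFrame objects, and I access each file in a for loop to extract its 'Country' column. What I want to do is extract each of these columns into a list and then take the set of the whole list of lists. Below is the code and my dilemma: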

country_setter = []
for file in files_list:
    country_setter.append(all_comps[file]['Country'].tolist())

uni_country_setter = ?

The output is a list of lists: each pandas DF['Country'] column becomes one list inside the parent list. It looks like this:

[['France', 
    'United States', 
    'Poland', 
    'Poland', 
    'Poland', 
    'Poland', 
    'Hungary', 
    'Poland', 
    'France', 
    'United Kingdom', 
    .... 
    'Namibia', 
    'China', 
    'China', 
    'Ireland'], 
['Netherlands', 
    'Canada', 
    'United States', 
    'Canada', 
    'Canada', 
    'United States', 
    'Sweden', 
    'Sweden', 
    'United Kingdom', 
    .... 
    'Ireland', 
    'Netherlands', 
    'Netherlands', 
    'France', 
    'Hong Kong', 
    'France', 
    'France', 
    'United States', 
    'France', 
    'United States']]

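This is a list containing 40 separate lists. I can use set(country_setter[0]), which works fine for getting the unique values of the first list, but I need the unique values across all files.

Let me know if any of you can help. I dug through Stack Overflow and found only one similar question, but its goal was to preserve the list structure in the unique extraction, and it used itertools. Here I want the unique individual values across all of the lists.

Thanks in advance!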

Source

2017-10-14 fattmagan

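Can you add a data sample? – jezrael

Sure, I'll give the structure. – fattmagan

@jezrael does that help? – fattmagan

I think you need to flatten the list of lists and then use set to get the unique values: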

uni_country_setter = list(set([item for sublist in country_setter for item in sublist]))

EDIT:

The first loop is not necessary; you can use:

uni_country_setter = list(set([item for file in files_list 
           for item in all_comps[file]['Country'].tolist()]))
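For comparison, here is a minimal sketch of two standard-library alternatives to the nested comprehension: `itertools.chain.from_iterable` and `set().union`. The small `country_setter` below is a made-up stand-in for the real 40-list structure, and sorting the result is an added assumption (sets are unordered, so this just gives a stable order).

```python
from itertools import chain

# Hypothetical stand-in for country_setter (the real one holds 40 lists).
country_setter = [
    ['France', 'United States', 'Poland', 'Poland'],
    ['Netherlands', 'Canada', 'United States', 'France'],
]

# Option 1: chain the sublists lazily and let set() drop the duplicates.
unique_chain = set(chain.from_iterable(country_setter))

# Option 2: union every sublist into one (initially empty) set.
unique_union = set().union(*country_setter)

# Sets are unordered, so sort to get a stable, readable list.
uni_country_setter = sorted(unique_chain)
print(uni_country_setter)
```

Both options should give the same set as the list comprehension above; which one reads better is mostly a matter of taste.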

Source

2017-10-14 19:23:29 jezrael

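Thanks! I don't think I could have figured that out myself. Can you explain the logic behind this double "for" call? Are you defining each sublist and then iterating over them? – fattmagan

Maybe a better explanation of flattening is [here](https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python). – jezrael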
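As a rough illustration of the double "for": the comprehension reads left to right like two nested loops, with the outer loop naming each sublist and the inner loop walking its items. The sample data below is invented for the sketch.

```python
# Hypothetical sample data standing in for country_setter.
country_setter = [
    ['France', 'Poland', 'Poland'],
    ['Canada', 'France'],
]

# Expanded form of: [item for sublist in country_setter for item in sublist]
flat = []
for sublist in country_setter:   # first "for": take each inner list in turn
    for item in sublist:         # second "for": take each value in that list
        flat.append(item)        # the leading "item" is what gets collected

# The one-line comprehension builds exactly the same flat list.
flat_comprehension = [item for sublist in country_setter for item in sublist]

# set() then removes the duplicates from the flattened values.
unique_values = set(flat)
print(flat)
```

So the sublists are not defined anywhere beforehand; `sublist` is just the loop variable of the outer loop, and `item` is the loop variable of the inner one.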
