2016-04-15 232 views
2

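Converting a Pandas DataFrame column to an R factor

I am trying to convert a column of a pandas DataFrame to a factor, because the R function I am calling expects a factor.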

pandas2ri.activate()  
#second column of labels has to be converted to factors 
labels = read_csv(path_to_csv) 
as_factor = ro.r['as.factor'] 
output = package.function(another_df, as_factor(labels['column_name'])) 

Here is the error I get:

rpy2.rinterface.RRuntimeError: Error in sort.list(y) : 'x' must be atomic for 'sort.list' 
Have you called 'sort' on a list? 

What should I do?

A reproducible example follows:

import pandas as pd 

df = pd.DataFrame({'Col': [10, 20], 
        'x': ['Control', 'Low_Cav02']}) 

from rpy2 import robjects as ro 

from rpy2.robjects import pandas2ri 
pandas2ri.activate() 

as_factor = ro.r['as.factor'] 

labels = as_factor(df['Col']) 
print(labels)

labels = as_factor(df['x']) 
print(labels)

Output:

[1] 10 20 
Levels: 10 20 

/Users/swetabh/Envs/damet/lib/python2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: Error in sort.list(y) : 'x' must be atomic for 'sort.list' 
Have you called 'sort' on a list? 

    res = super(Function, self).__call__(*new_args, **new_kwargs) 
Traceback (most recent call last): 
    File "damet/analysis.py", line 26, in <module> 
    labels = as_factor(df['x']) 
    File "/Users/swetabh/Envs/damet/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 178, in __call__ 
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs) 
    File "/Users/swetabh/Envs/damet/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 106, in __call__ 
    res = super(Function, self).__call__(*new_args, **new_kwargs) 
rpy2.rinterface.RRuntimeError: Error in sort.list(y) : 'x' must be atomic for 'sort.list' 
Have you called 'sort' on a list? 
+0

Could you show a reproducible example that we can run, so that we can help you?

+1

I don't know whether it will solve your problem, but the pandas equivalent of an R factor is the categorical dtype: 'df["some_column"].astype("category")' – ayhan

+0

@MathieuB Done. I hope this helps. – Swetabh
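
To make the comparison between an R factor and a pandas categorical concrete, here is a minimal pure-Python sketch of what both store: the sorted unique levels plus one integer code per value. It is an illustration only, not how either library implements it, and the name `factorize_sketch` is hypothetical.

```python
def factorize_sketch(values):
    """Split values into sorted unique levels and one integer code per value."""
    # Sorting the unique values is the step that needs plain, comparable
    # (atomic) values; the sort.list error above comes from the same kind
    # of step inside R's as.factor.
    levels = sorted(set(values))
    position = {level: i for i, level in enumerate(levels)}
    # 0-based codes, as pandas uses; R factors store 1-based codes.
    codes = [position[v] for v in values]
    return levels, codes


levels, codes = factorize_sketch(['Low_Cav02', 'Control', 'Control'])
print(levels)  # ['Control', 'Low_Cav02']
print(codes)   # [1, 0, 0]
```
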

Answers

1

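This works fine on my end. Which version of rpy2 are you using?

Edit: the original answer follows; I misread the question.

If you are trying to build an R DataFrame, the default converter in rpy2 turns Python lists into R lists. If you want an R vector, use the vector constructors.

For your example, that could look like this: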

df = ro.DataFrame({'Col': ro.vectors.IntVector([10, 20]), 
        'x': ro.vectors.StrVector(['Control', 'Low_Cav02'])}) 
+0

I get the following error when I do this: ValueError: If using all scalar values, you must pass an index – Swetabh

+0

Yes. I somehow managed to misread the question and wrote non-working code in the answer. I am editing the answer. – lgautier
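
One way the vector-constructor advice might be applied to the pandas column from the question is sketched below. It assumes the failure comes from the string (object dtype) column reaching R as a list rather than an atomic vector, which has not been confirmed in this thread. The helper names `as_str_list` and `to_r_factor` are hypothetical; `StrVector` and `FactorVector` are the rpy2 vector classes. rpy2 is imported inside the function so that the pure-Python helper remains usable without R installed.

```python
def as_str_list(values):
    """Return the values as a plain list of Python strings.

    Accepts any iterable, including a pandas Series such as df['x'].
    Note that missing values would become the strings 'nan' or 'None'.
    """
    return [str(v) for v in values]


def to_r_factor(values):
    """Build an R factor from an iterable of labels (requires rpy2 and R).

    Assumption: handing R an explicit StrVector sidesteps the default
    conversion of the column to an R list, which as.factor cannot sort.
    """
    # Imported lazily so that as_str_list works even when R is unavailable.
    from rpy2.robjects.vectors import FactorVector, StrVector

    return FactorVector(StrVector(as_str_list(values)))
```

With the DataFrame from the question, `to_r_factor(df['x'])` would take the place of `as_factor(df['x'])` when calling the R function.
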
