2010-12-14 84 views

Problem with rpy2 and rpart: passing data correctly from Python to R

I am trying to run rpart through rpy2, using Python 2.6.5 and R 10.0.

I create a data frame in Python and pass it along, but I get an error stating:

Error in function (x) : binary operation on non-conformable arrays 
Traceback (most recent call last): 
    File "partitioningSANDBOX.py", line 86, in <module> 
    model=r.rpart(**rpart_params) 
    File "build/bdist.macosx-10.3-fat/egg/rpy2/robjects/functions.py", line 83, in __call__ 
    File "build/bdist.macosx-10.3-fat/egg/rpy2/robjects/functions.py", line 35, in __call__ 
rpy2.rinterface.RRuntimeError: Error in function (x) : binary operation on non-conformable arrays 

Can anyone help me work out what I am doing wrong to cause this error?

The relevant part of my code is:

import numpy as np 
import rpy2 
import rpy2.robjects as rob 
import rpy2.robjects.numpy2ri 


#Fire up the interface to R 
r = rob.r 
r.library("rpart") 

datadict = dict(zip(['responsev','predictorv'],[cLogEC,csplitData])) 
Rdata = r['data.frame'](**datadict) 
Rformula = r['as.formula']('responsev ~.') 
#Generate an RPART model in R. 
Rpcontrol = r['rpart.control'](minsplit=10, xval=10) 
rpart_params = {'formula': Rformula,
                'data': Rdata,
                'control': Rpcontrol}
model=r.rpart(**rpart_params) 

The two variables cLogEC and csplitData are numpy arrays of floats.

Also, my data frame looks like this:

In [2]: print Rdata 
------> print(Rdata) 
    responsev predictorv 
1 0.6020600  312 
2 0.3010300  300 
3 0.4771213  303 
4 0.4771213  249 
5 0.9242793  239 
6 1.1986571  297 
7 0.7075702  287 
8 1.8115750  270 
9 0.6020600  296 
10 1.3856063  248 
11 0.6127839  295 
12 0.3010300  283 
13 1.1931246  345 
14 0.3010300  270 
15 0.3010300  251 
16 0.3010300  246 
17 0.3010300  273 
18 0.7075702  252 
19 0.4771213  252 
20 0.9294189  223 
21 0.6127839  252 
22 0.7075702  267 
23 0.9294189  252 
24 0.3010300  378 
25 0.3010300  282 
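
Because the error message mentions non-conformable arrays, one quick check before building the data frame is that every column is one-dimensional and that all columns share the same length. The sketch below is a minimal illustration in plain Python; `check_columns` is a hypothetical helper (not part of rpy2 or numpy), and the sample values are simply the first three rows printed above.

```python
def check_columns(columns):
    """Return the common length of the columns, or raise ValueError.

    `columns` maps a column name to a plain sequence or a numpy array.
    This is a hypothetical helper written for illustration only.
    """
    lengths = {}
    for name, values in columns.items():
        # numpy arrays expose .shape; plain sequences are treated as 1-D.
        if hasattr(values, "shape"):
            shape = tuple(values.shape)
        else:
            shape = (len(values),)
        if len(shape) != 1:
            raise ValueError("%s is not one-dimensional: shape %r" % (name, shape))
        lengths[name] = shape[0]
    if len(set(lengths.values())) > 1:
        raise ValueError("column lengths differ: %r" % (lengths,))
    return next(iter(lengths.values())) if lengths else 0


# First three rows of the data frame shown in the question.
n_rows = check_columns({
    "responsev": [0.6020600, 0.3010300, 0.4771213],
    "predictorv": [312.0, 300.0, 303.0],
})
```

With the 25 aligned rows printed above, a check like this would be expected to pass, which suggests the cause lies somewhere other than the shape of the inputs.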

And the formula looks like this:

In [3]: print Rformula 
------> print(Rformula) 
responsev ~ . 

Data frames in R are lists. Maybe you should pass an array or a matrix instead? – 2010-12-14 01:58:56


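I tried passing a matrix, but that threw an error too. Interestingly, if I use r.plsr in place of r.rpart it works fine, even though both rpart and plsr say they need the data as a data.frame .... – mishaF 2010-12-14 02:48:55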

Answer


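The problem comes from idiosyncratic R code inside the rpart package (to be precise, in the following block, and in particular its last line: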

m <- match.call(expand.dots = FALSE) 
m$model <- m$method <- m$control <- NULL 
m$x <- m$y <- m$parms <- m$... <- NULL 
m$cost <- NULL 
m$na.action <- na.action 
m[[1L]] <- as.name("model.frame") 
m <- eval(m, parent.frame()) 

)。

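One way to work around this is to avoid entering that code block (see below); another might be to construct a nested evaluation from Python (so that parent.frame() behaves as expected). This is not as simple as one would hope, but I may find time in the future to make it easier.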

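import numpy as np
from rpy2.robjects import DataFrame, Formula
# Depending on the rpy2 version, the numpy conversion may need to be
# activated explicitly after this import.
import rpy2.robjects.numpy2ri as npr
from rpy2.robjects.packages import importr

rpart = importr('rpart')
stats = importr('stats')

# Example data standing in for the question's cLogEC and csplitData
cLogEC = np.random.uniform(size=10)
csplitData = np.array(range(10), 'i')

dataf = DataFrame({'responsev': cLogEC,
                   'predictorv': csplitData})
formula = Formula('responsev ~.')
# Passing a prebuilt model frame as `model` keeps rpart from entering
# the match.call()/eval() block quoted above.
rpart.rpart(formula=formula, data=dataf,
            control=rpart.rpart_control(minsplit=10, xval=10),
            model=stats.model_frame(formula, data=dataf))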
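
To make the idea behind the workaround more concrete, here is a loose analogy in plain Python. It does not reproduce how R, rpart or rpy2 actually evaluate calls, and the failure it shows (a NameError) is not the R error from the question. It only illustrates that code which looks things up in its caller's frame can behave differently when called through an intermediary, and that handing over a precomputed value sidesteps the lookup. The functions `fit`, `direct_call`, `bridged_call` and `bridge` are all hypothetical.

```python
import inspect


def fit(data_name, model=None):
    """Hypothetical stand-in for a function that resolves its input by name.

    If `model` is supplied it is returned directly and the caller's frame
    is never consulted (loosely like passing `model=` to rpart above).
    """
    if model is not None:
        return model
    # Loose analogue of eval(m, parent.frame()): resolve the name in the
    # frame of whoever called fit().
    caller = inspect.currentframe().f_back
    return eval(data_name, caller.f_globals, caller.f_locals)


def direct_call():
    dataf = [1.0, 2.0, 3.0]
    # Works: `dataf` is a local variable of the immediate caller.
    return fit("dataf")


def bridged_call():
    dataf = [1.0, 2.0, 3.0]

    def bridge(name, **kwargs):
        # The immediate caller of fit() is now bridge(), which has no `dataf`.
        return fit(name, **kwargs)

    try:
        bridge("dataf")
    except NameError:
        lookup_failed = True
    else:
        lookup_failed = False
    # Supplying the precomputed value skips the frame lookup entirely.
    return lookup_failed, bridge("dataf", model=dataf)
```

Calling `direct_call()` returns the list, while `bridged_call()` reports that the name lookup failed through the bridge yet still returns the list once `model` is passed explicitly.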

Your answer is perfect and the solution works flawlessly. Thank you so much! I was pulling my hair out. – mishaF 2010-12-14 17:18:08