2016-01-23

R and data selection

I have a data table dt (its dput output is shown further below) and the following functions:

DSanity.markWinsorize <- function(dt, colnames) 
{ 
    PERnames <- unlist(lapply(colnames, function(x) paste0("PER",x))); 
    print(dt[,colnames]) 
    if(length(colnames)>1) 
    {dt[,PERnames] <- sapply(dt[,colnames], Num.calPtile);} 
    else 
    {dt[,PERnames] <- Num.calPtile(dt[,colnames]);} 

    return(dt) 
} 

## Calculate the percentile score of each element of a data vector 
Num.calPtile <- function(x) 
{ 
    return((ecdf(x))(x)) 
} 

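The job of this function is to create new columns (named with a PER prefix) in the data table, which is given here: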

structure(list(IM = c(0.830088495575221, 0.681436210847976, 0.498810939357907, 
    0.47265400115141, 0.527908540685945, 0.580763582966226, 0.408069043807859, 
    0.467368671545006, 0.44662887412295, 0.0331974034502217, 0.0368210899219588, 
    0.0333698233772947, 0.0294312465832275, 0.578743426515361, 0.566950053134963, 
    0.808756701221038, 0.585507838980771, 0.61507839619537, 0.586388329979879, 
    0.794196637085474), CM = c(0.876991150442478, 0.996180290297937, 
    0.651605231866825, 0.824409902130109, 0.94418291862811, 0.961820851688693, 
    0.943861532396347, 1.10137922144883, 1.1524325077831, 0.128868067469359, 
    0.155932251596297, 0.159414951213752, 0.196968075413411, 1.19678937171326, 
    0.901168969181722, 3.42528220866977, 2.4377239516641, 2.0040870054458, 
    1.86099597585513, 1.51928615911568), RM = c(0.601769911504425, 
    0.495034377387319, 0.405469678953627, 0.368451352907311, 0.361802286482851, 
    0.320851688693098, 0.791548118347242, 0.816050925099649, 0.786622368849031, 
    0.545805622636092, 0.594370732740163, 0.594771872860171, 0.536043514857356, 
    0.617215610296153, 0.619287991498406, 0.602602774009141, 0.634069706132375, 
    0.596543561108693, 0.582203219315895, 0.695985131558462)), .Names = c("IM", "CM", "RM"), 
    class = c("data.table", "data.frame"), row.names = c(NA, -20L), 
    .internal.selfref = <pointer: 0x00000000003f0788>) 

The function I wrote (above) computes the percentile of each data point in the columns passed to markWinsorize.
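As a small illustration of what Num.calPtile computes, here is a sketch on a made-up vector (the values are hypothetical and not taken from the table above). ecdf(x) returns the empirical cumulative distribution function of x, and evaluating it at x itself gives, for each point, the fraction of values less than or equal to that point.

```r
# Percentile score of each element: the share of values <= that element.
Num.calPtile <- function(x) {
  ecdf(x)(x)
}

x <- c(10, 20, 20, 40)   # hypothetical example values
scores <- Num.calPtile(x)
print(scores)            # 0.25 0.75 0.75 1.00
```

Tied values (the two 20s here) receive the same score, and the largest value always scores 1.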

Here I try to run the function markWinsorize:

colnames <- c('CM','AM','BM') 
DSanity.markWinsorize(dt,colnames) 

I get the following error:

> sdc1 <- DSanity.markWinsorize(sdc,colnames) 
[1] "CM" "AM" "BM" 
Error in approxfun(vals, cumsum(tabulate(match(x, vals)))/n, method = "constant", : 
    zero non-NA points 
In addition: Warning message: 
In xy.coords(x, y) : NAs introduced by coercion 

It would be great if some of you could help me here. Thanks.

Answer

Your approach is rather clumsy. I would recommend a completely different one.

library(dplyr) 

colnames <- c("CM", "AM", "BM") 

dt %>% 
    select_(.dots = colnames) %>% 
    mutate_each(funs(ntile(., 100))) 

I think this gives you what you want (perhaps with %>% bind_cols(dt) added at the end).
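For readers who would rather keep the original ecdf-based scores, the reported error appears to have two causes. First, the printed output [1] "CM" "AM" "BM" suggests that dt[,colnames] returned the character vector itself instead of the columns, so ecdf() ended up being called on strings; with data.table, dt[, colnames, with = FALSE] or .SDcols is the usual way to select columns by a character vector. Second, AM and BM are not columns of the table shown (it has IM, CM and RM). Note also that ntile() returns integer bucket numbers rather than the fractions ecdf() produces, so the two approaches will not give identical values. The following is a minimal sketch under those assumptions, not the original author's code: add_percentiles is a hypothetical name, a plain data.frame stands in for the data.table, and the sample values are made up.

```r
# Hypothetical helper: adds a PER<col> column holding the ecdf score of each value.
add_percentiles <- function(df, cols) {
  # Fail early with a clear message if a requested column does not exist.
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  for (col in cols) {
    x <- df[[col]]                          # [[ extracts the column values themselves
    df[[paste0("PER", col)]] <- ecdf(x)(x)  # works the same for one or many columns
  }
  df
}

# Made-up sample data (not the table from the question).
df <- data.frame(IM = c(0.8, 0.6, 0.4, 0.2), CM = c(0.9, 3.4, 0.1, 1.5))
res <- add_percentiles(df, c("IM", "CM"))
print(res)

# Calling ecdf() on a column *name* instead of its values raises an error,
# which is what seems to have happened in the question.
err <- tryCatch(suppressWarnings(ecdf("CM")),
                error = function(e) conditionMessage(e))
print(err)
```

Looping over the column names avoids the separate single-column and multi-column branches of the original function, and the up-front check turns a misspelled column name into a readable message.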


Thanks for your solution. It works great! – Sumit