2017-09-02 64 views

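R: summarise the tibbles in a list, change the names of the data frames, and save the results to the environment

This question is closely related to this one and this one. My trouble is that I cannot apply a function/code to all elements of a list of tibbles. I know how to get the desired result one step at a time, but not for the whole list at once.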

As an example, let's take two tibbles that are structurally very similar to my real data.

MyRes_tw <- structure(list(text = c("follow @SmartRE_Info and get your token in waves t.co/g3q4XelPaK #SmartRE", 
"RT @investFeed: Make your FEED work for you - check out this blog on the power of the FEED token: t.co/JOHSCeitGc", 
"RT @investFeed: WE HAVE NOW PASSED 8,000 $ETH IN OUR TOKEN SALE PURCHASED! t.co/bx7s1xWyXI #ICO #Tokensale t.co/ZFndFhUfVT" 
), Tweet.id = c("889602043249254400", "889589518159945729", "889573909405679616" 
), created.date = structure(c(17371, 17371, 17371), class = "Date"), 
    created.week = c(30, 30, 31), retweet = c(0, 0, 0), custom = c(0, 
    0, 0)), .Names = c("text", "Tweet.id", "created.date", "created.week", 
"retweet", "custom"), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame")) 

MyRes1_tw <- structure(list(text = c("RT @AmbrosusAMB: We are on the front page of #NASDAQ/#Editorial Choice, Proud #Ethereum #Blockchain #ICO #TGE @Nasdaq @gavofyork @jutta_s…", 
"RT @MyBit_DApp: 10 minutes left in #mybit #tokensale over 10,000 #ethereum contributed! Check it out t.co/AgyRCcyyzD", 
"RT @MyBit_DApp: only 23 ETH left now", "RT @MyBit_DApp: #MyBit #tokensale ends in ~1 hour. 9k+ $ETH raised so far. Only 125 #ethereum left at 25% discount. t.co/AgyRCcyyzD", 
"RT @MyBit_DApp: ~12 hours left in the t.co/AgyRCcyyzD #TokenSale #ICO 25% Bonus activated for #ethereum $ether #bitcoin $BTC $xbt" 
), Tweet.id = c("897499492219445252", "897487635442274305", "897487621714305024", 
"897487610494558208", "897487593117450244"), created.date = structure(c(17393, 
17393, 17393, 17393, 17393), class = "Date"), created.week = c(33, 
33, 34, 34, 34), retweet = c(0, 0, 0, 0, 0), custom = c(0, 0, 
0, 0, 0)), .Names = c("text", "Tweet.id", "created.date", "created.week", 
"retweet", "custom"), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame")) 

These two data frames contain data from Twitter. I want to do some tidying on them and end up with results like these:

MyRes <- structure(list(created.week = c(33, 34, 35), retweet = c(12, 
0, 8), custom = c(0, 0, 2), Twitter.name = c("MyRes", "MyRes", 
"MyRes")), .Names = c("created.week", "retweet", "custom", "Twitter.name" 
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame" 
)) 

MyRes1 <- structure(list(created.week = c(33, 34, 35), retweet = c(12, 
0, 8), custom = c(0, 0, 2), Twitter.name = c("MyRes1", "MyRes1", 
"MyRes1")), .Names = c("created.week", "retweet", "custom", "Twitter.name" 
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame" 
)) 

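Note that the names matter: each result tibble takes the name of its source tibble with the _tw suffix dropped.

Also note that, in the final result, the last column $Twitter.name should reflect the tibble's name.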

I list the tibbles in my environment like this: myUser.tw <- ls(,pattern = "_tw"), since they are the only objects whose names end in _tw.
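Because `pattern` is a regular expression, `"_tw"` also matches names that merely contain `_tw` somewhere. A minimal sketch of the difference, assuming a throwaway environment `demo_env` and made-up objects (all hypothetical, used only so the example does not touch the global environment):

```r
# Hypothetical sandbox environment so the example does not touch .GlobalEnv
demo_env <- new.env()
assign("MyRes_tw", data.frame(retweet = 0), envir = demo_env)
assign("MyRes1_tw", data.frame(retweet = 0), envir = demo_env)
# This name contains "_tw" but does not end with it
assign("my_tw_notes", "not a tibble", envir = demo_env)

# Unanchored pattern: matches any name containing "_tw"
loose <- ls(envir = demo_env, pattern = "_tw")

# Anchored pattern: matches only names ending in "_tw"
strict <- ls(envir = demo_env, pattern = "_tw$")

# mget() turns the character vector of names into a named list of objects
tw_list <- mget(strict, envir = demo_env)
names(tw_list)
```

If the `_tw` objects really are the only matches, both patterns return the same names; the anchored form simply guards against accidental matches later on.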

I wrote this helper function:

MySummarize <- function(x){ 
    summarise(group_by(x, created.week, Retweet.count = sum(retweet), Custom.count = sum(custom))) 
} 
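As written, both `sum()` calls sit inside `group_by()`, so they become extra grouping columns computed over the whole input rather than per-week totals. A minimal sketch of what the helper presumably intends, assuming dplyr is installed; the name `MySummarizeFixed` and the data in `toy` are hypothetical and only for illustration:

```r
library(dplyr)

# Group by week first, then compute the per-week totals inside summarise()
MySummarizeFixed <- function(x) {
  x %>%
    group_by(created.week) %>%
    summarise(Retweet.count = sum(retweet), Custom.count = sum(custom))
}

# Made-up data for illustration only
toy <- tibble(
  created.week = c(30, 30, 31),
  retweet      = c(1, 2, 4),
  custom       = c(0, 1, 1)
)

res <- MySummarizeFixed(toy)
res
```

With the sample tibbles in the question every count is zero, so the misplaced parenthesis does not change the visible numbers; it would with non-zero data.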

Now comes the pain! Here is my working code:

testLst <- mget(myUser.tw) %>% 
    lapply(function(x) MySummarize(x)) %>% 
    list2env(testLst, envir = .GlobalEnv) 

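After that, I cannot find a way to:

  1. change the names of the data frames to MyRes, MyRes1
  2. add to each one a column containing that text (MyRes, MyRes1) in every row
  3. save the results in my environment.

Believe it or not, I have been at this for a long time. I would appreciate help completing my code. Thanks.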

Answers

1

One possible approach:

library(dplyr)

# list of tibbles whose names end in _tw
myUser.tw.list <- mget(myUser.tw)

# lapply over the positions rather than the elements, so that each
# element's name is available inside the function
myUser <- lapply(seq_along(myUser.tw.list),
     function(i){
     myUser.tw.list[[i]] %>% group_by(created.week) %>%
      summarise(retweet = sum(retweet), custom = sum(custom)) %>%
      ungroup() %>%
      mutate(Twitter.name = gsub("_tw$", "", names(myUser.tw.list)[i]))
     })
# name the results after the source tibbles, minus the _tw suffix
names(myUser) <- gsub("_tw$", "", myUser.tw)

Result: a named list of tibbles

> myUser 
$MyRes 
# A tibble: 2 x 4 
    created.week retweet custom Twitter.name 
     <dbl> <dbl> <dbl>  <chr> 
1   30  0  0  MyRes 
2   31  0  0  MyRes 

$MyRes1 
# A tibble: 2 x 4 
    created.week retweet custom Twitter.name 
     <dbl> <dbl> <dbl>  <chr> 
1   33  0  0  MyRes1 
2   34  0  0  MyRes1 
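The question also asked for the results to be saved as separate objects. A minimal sketch using base R's `list2env()`, which assigns each element of a named list to a variable of the same name; the list `results` and the environment `target_env` are hypothetical stand-ins for `myUser` and `.GlobalEnv`:

```r
# Stand-in for the named list of summaries (made-up values)
results <- list(
  MyRes  = data.frame(created.week = c(30, 31), Twitter.name = "MyRes"),
  MyRes1 = data.frame(created.week = c(33, 34), Twitter.name = "MyRes1")
)

# Hypothetical target; replace with .GlobalEnv to create the objects
# in the workspace
target_env <- new.env()

# Each list element becomes an object named after its list name
list2env(results, envir = target_env)

ls(target_env)
```

With `envir = .GlobalEnv` this would create `MyRes` and `MyRes1` in the workspace, overwriting any existing objects with those names.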
2

It is not clear what "DF" refers to, but if the goal is to get a list of summaries labelled with a source column:

library(dplyr) 

myUser.tw %>% 
    mget(.GlobalEnv) %>% 
    lapply(MySummarize) %>% 
    bind_rows(.id = "source") %>% 
    mutate(source = sub("_tw$", "", source)) %>% 
    split(.$source) 

which gives:

$MyRes 
# A tibble: 2 x 4 
# Groups: created.week, Retweet.count [2] 
    source created.week Retweet.count Custom.count 
    <chr>  <dbl>   <dbl>  <dbl> 
1 MyRes   30    0   0 
2 MyRes   31    0   0 

$MyRes1 
# A tibble: 2 x 4 
# Groups: created.week, Retweet.count [2] 
    source created.week Retweet.count Custom.count 
    <chr>  <dbl>   <dbl>  <dbl> 
1 MyRes1   33    0   0 
2 MyRes1   34    0   0 

Or, if you want a single data frame, omit the split.
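A minimal sketch of that single-data-frame variant, assuming dplyr is installed; the list `summaries` holds made-up values (hypothetical) standing in for the output of `lapply(MySummarize)`:

```r
library(dplyr)

# Made-up stand-in for the list of per-tibble summaries
summaries <- list(
  MyRes_tw  = tibble(created.week = c(30, 31), Retweet.count = c(0, 0)),
  MyRes1_tw = tibble(created.week = c(33, 34), Retweet.count = c(0, 0))
)

combined <- summaries %>%
  bind_rows(.id = "source") %>%              # list names become the "source" column
  mutate(source = sub("_tw$", "", source))   # drop the _tw suffix

combined
```

Keeping everything in one data frame with a `source` column can be easier to filter or plot than a set of separate objects.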

Thank you for your list solution. I must admit I don't know which one I prefer, so let's say "girl power". :-) – gabx
