Creating a large data frame

Suppose I want to generate a large data frame from scratch.

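I usually create data frames with the data.frame function. However, building a df like the one below this way is extremely error-prone and inefficient.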

So is there a more efficient way to create the following data frame?

df <- data.frame(GOOGLE_CAMPAIGN=c(rep("Google - Medicare - US", 928), rep("MedicareBranded", 2983), 
            rep("Medigap", 805), rep("Medigap Branded", 1914), 
            rep("Medicare Typos", 1353), rep("Medigap Typos", 635), 
            rep("Phone - MedicareGeneral", 585), 
            rep("Phone - MedicareBranded", 2967), 
            rep("Phone-Medigap", 812), 
            rep("Auto Broad Match", 27), 
            rep("Auto Exact Match", 80), 
            rep("Auto Exact Match", 875)),     
       GOOGLE_AD_GROUP=c(rep("Medicare", 928), rep("MedicareBranded", 2983), 
            rep("Medigap", 805), rep("Medigap Branded", 1914), 
            rep("Medicare Typos", 1353), rep("Medigap Typos", 635), 
            rep("Phone ads 1-Medicare Terms",585), 
            rep("Ad Group #1", 2967), rep("Medigap-phone", 812), 
            rep("Auto Insurance", 27), 
            rep("Auto General", 80), 
            rep("Auto Brand", 875)))

Yikes, that is some 'bad' code. How can I generate this "large" data frame more efficiently?

2011-08-26 ATMathew

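I'm curious why you have so much repeated data in both columns anyway. Usually, when I repeat data in one column, it varies or cycles in the other (like counting in binary). – Owen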

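If your only source for this information is a piece of paper, then you probably won't do much better than this, but you can at least consolidate everything into a single call to rep for each column: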

#I'm going to cheat and not type out all those strings by hand 
x <- unique(df[,1]) 
y <- unique(df[,2]) 

#Vectors of repeat counts, one per label (955 = 80 + 875, because "Auto Exact Match" spans two ad groups)
x1 <- c(928,2983,805,1914,1353,635,585,2967,812,27,955) 
y1 <- c(x1[-11],80,875) 

dd <- data.frame(GOOGLE_CAMPAIGN = rep(x, times = x1), 
       GOOGLE_AD_GROUP = rep(y, times = y1))

The result should be identical:

> all.equal(dd,df) 
[1] TRUE
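
As a self-contained illustration of the same idea, the sketch below types each label and count once and lets rep() expand them through its times argument. The counts here are small made-up values chosen for the example, not the figures from the question.

```r
# Illustrative labels and counts (hypothetical, much smaller than the question's data)
campaign   <- c("Medigap", "Medigap Typos", "Auto Exact Match")
n_campaign <- c(3, 2, 4)

ad_group   <- c("Medigap", "Medigap Typos", "Auto General", "Auto Brand")
n_ad_group <- c(3, 2, 1, 3)  # the last two split the 4 "Auto Exact Match" rows

# Both columns must expand to the same number of rows
stopifnot(sum(n_campaign) == sum(n_ad_group))

# times= takes one count per element, so a single rep() call per column suffices
small <- data.frame(
  GOOGLE_CAMPAIGN = rep(campaign, times = n_campaign),
  GOOGLE_AD_GROUP = rep(ad_group, times = n_ad_group)
)

# Same values as chaining one rep() per label inside c()
long_form <- c(rep("Medigap", 3), rep("Medigap Typos", 2), rep("Auto Exact Match", 4))
identical(as.character(small$GOOGLE_CAMPAIGN), long_form)
```

Checking that the two count vectors have the same sum before calling data.frame catches the most likely typing mistake early.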

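However, if this information already exists in some data structure in R and you only need to transform it, that may well be easier, but we would need to know what that structure is.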

2011-08-26 23:25:54 joran

Damn.. beat me to it again...... – John

@John Sorry. I've lost count of the answers I've discarded because someone beat me to it, if that makes you feel any better. – joran

It doesn't... you could send me a million dollars... that would help – John

By hand, create (1) this data frame:

> dfu <- unique(df) 
> rownames(dfu) <- NULL 
> dfu 
           GOOGLE_CAMPAIGN            GOOGLE_AD_GROUP
1   Google - Medicare - US                   Medicare
2          MedicareBranded            MedicareBranded
3                  Medigap                    Medigap
4          Medigap Branded            Medigap Branded
5           Medicare Typos             Medicare Typos
6            Medigap Typos              Medigap Typos
7  Phone - MedicareGeneral Phone ads 1-Medicare Terms
8  Phone - MedicareBranded                Ad Group #1
9            Phone-Medigap              Medigap-phone
10        Auto Broad Match             Auto Insurance
11        Auto Exact Match               Auto General
12        Auto Exact Match                 Auto Brand

and (2) this vector of lengths:

> lens <- rle(as.numeric(interaction(df[[1]], df[[2]])))$lengths 
> lens 
[1] 928 2983 805 1914 1353 635 585 2967 812 27 80 875
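
For readers unfamiliar with rle(): it run-length encodes a vector, returning the value and length of each consecutive run. The small sketch below uses a made-up vector v to show the behavior. Because rle() counts runs rather than totals, it only gives one length per unique row when identical rows sit next to each other, as they do in df.

```r
# Hypothetical vector with three consecutive runs
v <- c("a", "a", "a", "b", "b", "c")
runs <- rle(v)
runs$lengths  # 3 2 1
runs$values   # "a" "b" "c"

# rle() counts consecutive runs, not totals: an interleaved value starts a new run
rle(c("a", "b", "a"))$lengths  # 1 1 1

# inverse.rle() rebuilds the original vector from the encoding
identical(inverse.rle(runs), v)
```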

From these two inputs (dfu and lens), we can reconstruct df (called df2 here):

> df2 <- dfu[rep(seq_along(lens), lens), ] 
> rownames(df2) <- NULL 
> identical(df, df2) 
[1] TRUE
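
If the table is being typed in rather than recovered from an existing df, the same reconstruction could start from a small hand-written specification with one row per unique combination and a count column. The sketch below is only an illustration: the names spec, n, idx and big are hypothetical, and it uses three of the question's combinations with made-up small counts.

```r
# One row per unique combination, plus how many times to repeat it
# (spec, n, and the counts are illustrative, not the question's real figures)
spec <- data.frame(
  GOOGLE_CAMPAIGN = c("Phone-Medigap", "Auto Exact Match", "Auto Exact Match"),
  GOOGLE_AD_GROUP = c("Medigap-phone", "Auto General", "Auto Brand"),
  n = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Repeat each row index n times, then drop the helper column
idx <- rep(seq_len(nrow(spec)), times = spec$n)
big <- spec[idx, c("GOOGLE_CAMPAIGN", "GOOGLE_AD_GROUP")]
rownames(big) <- NULL

nrow(big)  # 6
table(big$GOOGLE_AD_GROUP)
```

Keeping each campaign and ad group pair on the same row means there is only one count per pair to type, so the two columns cannot drift out of step.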

2011-08-26 23:56:22
