2017-07-14 91 views
1

How do I subset a data frame based on column names?

I have this data frame:

dput(df) 
structure(list(Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"), 
    Date = structure(1:6, .Label = c("7/13/2017 15:01", "7/13/2017 15:02", 
    "7/13/2017 15:03", "7/13/2017 15:04", "7/13/2017 15:05", 
    "7/13/2017 15:06"), class = "factor"), Host_CPU = c(1.812950134, 
    2.288070679, 1.563278198, 1.925239563, 5.350669861, 2.612503052 
    ), UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19, 
    38.22), jvm1 = c(10.91, 11.13, 11.34, 11.56, 11.77, 11.99 
    ), jvm2 = c(11.47, 11.7, 11.91, 12.13, 12.35, 12.57), jvm3 = c(75.65, 
    76.88, 56.93, 58.99, 65.29, 67.97), jvm4 = c(39.43, 40.86, 
    42.27, 43.71, 45.09, 45.33), jvm5 = c(27.42, 29.63, 31.02, 
    32.37, 33.72, 37.71)), .Names = c("Server", "Date", "Host_CPU", 
"UsedMemPercent", "jvm1", "jvm2", "jvm3", "jvm4", "jvm5"), class = "data.frame", row.names = c(NA, 
-6L)) 

I just want to subset this data frame based on this vector of column names:

select<-c("jvm3", "jvm4", "jvm5") 

So my final df should look like this:

structure(list(Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"), 
    Date = structure(1:6, .Label = c("7/13/2017 15:01", "7/13/2017 15:02", 
    "7/13/2017 15:03", "7/13/2017 15:04", "7/13/2017 15:05", 
    "7/13/2017 15:06"), class = "factor"), Host_CPU = c(1.812950134, 
    2.288070679, 1.563278198, 1.925239563, 5.350669861, 2.612503052 
    ), UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19, 
    38.22), jvm3 = c(75.65, 76.88, 56.93, 58.99, 65.29, 67.97 
    ), jvm4 = c(39.43, 40.86, 42.27, 43.71, 45.09, 45.33), jvm5 = c(27.42, 
    29.63, 31.02, 32.37, 33.72, 37.71)), .Names = c("Server", 
"Date", "Host_CPU", "UsedMemPercent", "jvm3", "jvm4", "jvm5"), class = "data.frame", row.names = c(NA, 
-6L)) 

Any ideas?

+1

The solution is: 'df[select]' –

+2

'df[c("Server", "Date", "Host_CPU", "UsedMemPercent", select)]'. Or you could use 'df[, c("Server", "Date", "Host_CPU", "UsedMemPercent", select)]'. Or 'subset(df, select = c("Server", "Date", "Host_CPU", "UsedMemPercent", select))'. See '?subset' or '?"["' for details. – Gregor

+0

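As a note, it is much appreciated when you take the extra step of turning the dput output into something that can be pasted directly into R, for example by wrapping it as 'your_data <- {insert dput output here}'. – Dason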

Answers

1

Save your data frame to a variable df:

df <-
  structure(
    list(
      Server = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "servera", class = "factor"),
      Date = structure(
        1:6,
        .Label = c(
          "7/13/2017 15:01",
          "7/13/2017 15:02",
          "7/13/2017 15:03",
          "7/13/2017 15:04",
          "7/13/2017 15:05",
          "7/13/2017 15:06"
        ),
        class = "factor"
      ),
      Host_CPU = c(
        1.812950134,
        2.288070679,
        1.563278198,
        1.925239563,
        5.350669861,
        2.612503052
      ),
      UsedMemPercent = c(38.19, 38.19, 38.19, 38.19, 38.19, 38.22),
      jvm1 = c(10.91, 11.13, 11.34, 11.56, 11.77, 11.99),
      jvm2 = c(11.47, 11.7, 11.91, 12.13, 12.35, 12.57),
      jvm3 = c(75.65, 76.88, 56.93, 58.99, 65.29, 67.97),
      jvm4 = c(39.43, 40.86, 42.27, 43.71, 45.09, 45.33),
      jvm5 = c(27.42, 29.63, 31.02, 32.37, 33.72, 37.71)
    ),
    .Names = c(
      "Server",
      "Date",
      "Host_CPU",
      "UsedMemPercent",
      "jvm1",
      "jvm2",
      "jvm3",
      "jvm4",
      "jvm5"
    ),
    class = "data.frame",
    row.names = c(NA, -6L)
  )

df[, select] should be what you're looking for.

+0

This answer doesn't work – user1471980

+0

@user1471980 If you create 'select' the way you showed, this answer works fine, but you never said that you also wanted to keep several other columns. –

+1

@user1471980 Yes, I misunderstood your question; it looks like you need: 'cbind(df[, 1:4], df[, select])' –

1

Here is one way:

df[,c(1:4,7:9)]
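
Hard-coded positions such as 7:9 stop matching if the columns are ever reordered. As a minimal sketch, the positions can instead be looked up from the names with match(). The two-row data frame below is only a shortened stand-in for the one in the question, and keep_idx is a name made up for this illustration:

```r
# Shortened stand-in for the question's data frame (two rows, for illustration only)
df <- data.frame(
  Server = c("servera", "servera"),
  Date = c("7/13/2017 15:01", "7/13/2017 15:02"),
  Host_CPU = c(1.812950134, 2.288070679),
  UsedMemPercent = c(38.19, 38.19),
  jvm1 = c(10.91, 11.13),
  jvm2 = c(11.47, 11.70),
  jvm3 = c(75.65, 76.88),
  jvm4 = c(39.43, 40.86),
  jvm5 = c(27.42, 29.63)
)
select <- c("jvm3", "jvm4", "jvm5")

# Keep the first four columns by position and find the rest by name
keep_idx <- c(1:4, match(select, names(df)))
result <- df[, keep_idx]
names(result)
```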

You can also select the columns with dplyr:

dplyr::select(df, Server, Date, Host_CPU, UsedMemPercent, jvm3, jvm4, jvm5)

4

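Please revisit indexing. When you use the indexing mechanism [ in R, there are three main types of index you can use:

  • Logical vector: same length as the number of columns; TRUE means the column is selected
  • Numeric vector: selects columns by position
  • Character vector: selects columns by name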
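
To make the three index types concrete, here is a small sketch on a toy data frame. The data frame and the variable names are invented for this illustration and are not part of the question:

```r
# Toy data frame, made up for this illustration
toy <- data.frame(
  id = 1:3,
  label = c("x", "y", "z"),
  value = c(1.5, 2.5, 3.5)
)

by_logical  <- toy[c(TRUE, FALSE, TRUE)]  # one TRUE/FALSE per column
by_position <- toy[c(1, 3)]               # first and third column
by_name     <- toy[c("id", "value")]      # columns looked up by name

# All three select the same two columns
same_cols <- identical(names(by_logical), names(by_position)) &&
  identical(names(by_position), names(by_name))
```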

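When you use the indexing mechanism on a data frame, you can treat the object in two ways:

  • as a list, because data frames are lists internally
  • as a matrix, because they mimic matrix behaviour in many cases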
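
The two views can be checked directly on the built-in iris data set. This is only a rough sketch; col_classes and cell are names chosen for the illustration:

```r
# List view: a data frame is a list with one element per column
is.list(iris)
length(iris)                      # number of columns
col_classes <- sapply(iris, class)  # iterate over columns like list elements

# Matrix view: it also has two dimensions and supports [row, column]
dim(iris)
cell <- iris[2, 3]                # row 2, column 3 (Petal.Length)
```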

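Take the iris data frame as an example and compare the various ways you can select columns from a data frame. If you treat it as a list, you have the following two options:

Use [[ if you want a single column as a vector: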

iris[["Species"]] 
# [1] setosa  setosa  setosa ... : is a vector 

Use [ if you want one or more columns and need a data frame back:

iris["Species"] 
iris[c("Sepal.Width", "Species")] 

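If you treat it as a matrix, you do exactly what you would do with a matrix. If you don't specify a row index, these commands are equivalent to the ones used above: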

iris[ , "Species"] # is the same as iris[["Species"]] 
iris[ , "Species", drop = FALSE] # is the same as iris["Species"] 
iris[ , c("Sepal.Width", "Species")] # is the same as iris[c("Sepal.Width", "Species")] 
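
The equivalences claimed in the comments above can be checked with identical(). The sketch below also shows the role of drop: without drop = FALSE, the matrix-style form reduces a single column to a plain vector. The variable names are made up for the illustration:

```r
# Matrix-style vs. list-style selection of the same columns
same_vector <- identical(iris[, "Species"], iris[["Species"]])
same_single <- identical(iris[, "Species", drop = FALSE], iris["Species"])
same_multi  <- identical(iris[, c("Sepal.Width", "Species")],
                         iris[c("Sepal.Width", "Species")])

# A single column is dropped to a vector unless drop = FALSE is given
dropped_is_df <- is.data.frame(iris[, "Species"])
kept_is_df    <- is.data.frame(iris[, "Species", drop = FALSE])
```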

So in your case, you just need:

select <- c("Server","Date","Host_CPU","UsedMemPercent", 
      "jvm3","jvm4","jvm5") 
df[select] 
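
If you would rather not retype the four fixed column names, one possible variation is to work out which jvm columns to drop and keep everything else. This sketch assumes the unwanted columns are exactly those whose names start with "jvm" and are not listed in select. The two-row data frame is a shortened stand-in for the one in the question, and drop_cols is a made-up name:

```r
# Shortened stand-in for the question's data frame (two rows, for illustration only)
df <- data.frame(
  Server = c("servera", "servera"),
  Date = c("7/13/2017 15:01", "7/13/2017 15:02"),
  Host_CPU = c(1.812950134, 2.288070679),
  UsedMemPercent = c(38.19, 38.19),
  jvm1 = c(10.91, 11.13),
  jvm2 = c(11.47, 11.70),
  jvm3 = c(75.65, 76.88),
  jvm4 = c(39.43, 40.86),
  jvm5 = c(27.42, 29.63)
)
select <- c("jvm3", "jvm4", "jvm5")

# jvm columns that were not asked for
drop_cols <- setdiff(grep("^jvm", names(df), value = TRUE), select)

# Keep every column except those, preserving the original order
result <- df[setdiff(names(df), drop_cols)]
names(result)
```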

Note: subset works, but it should only be used interactively. There is a warning on its help page stating:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
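
For use inside your own functions, a small helper built on [ keeps the behaviour predictable. The function keep_columns below is hypothetical, written only for this illustration, and simply fails early when a requested column is missing:

```r
# Hypothetical helper: select columns by name with standard [ indexing
keep_columns <- function(data, cols) {
  unknown <- setdiff(cols, names(data))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  data[cols]  # list-style indexing always returns a data frame
}

result <- keep_columns(iris, c("Sepal.Width", "Species"))
names(result)
```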
