Removing the column names from a DataFrame

In SparkR I have a DataFrame called data. When I type head(data), I get the following output:

  C0 C1      C2               C3
1 id user_id foreign_model_id machine_id
2 1  3145    4                12
3 2  4079    1                8
4 3  1174    7                1
5 4  2386    9                9
6 5  5524    1                7

I want to remove C0, C1, C2 and C3, because they cause problems later on. For example, when I use the filter function:

filter(data,data$machine_id==1)

it cannot run because of this.

I read the data like this:

data <- read.df(sqlContext, "/home/ole/.../data", "com.databricks.spark.csv")


2016-03-07 Ole Petersen

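I think you read the data frame in incorrectly. You want to remove the column names of the data frame and use row 1 as the new column names, right? –

Yes, that's correct. –

Then can't you just do 'colnames(data) = data[1,]' and 'data = data[-1,]'? – Konrad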

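SparkR puts the header into the first row and gives the DataFrame a new header, because the default value of the header option is "false". Set the option to header="true" and you won't have to deal with this problem: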

data <- read.df(sqlContext, "/home/ole/.../data", "com.databricks.spark.csv", header="true")
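SparkR's read.df needs a running Spark session, so the sketch below uses base R's read.csv as a stand-in to show the same effect of the header option. The inline CSV text is a hypothetical copy of the question's data, not the original file.

```r
# Hypothetical CSV content mirroring the question's data (stand-in for the file)
csv_text <- "id,user_id,foreign_model_id,machine_id
1,3145,4,12
2,4079,1,8
3,1174,7,1
4,2386,9,9
5,5524,1,7"

# header = FALSE: the header line is read as an ordinary data row,
# and R generates the column names V1, V2, ...
no_header <- read.csv(text = csv_text, header = FALSE)

# header = TRUE: the first line supplies the column names
with_header <- read.csv(text = csv_text, header = TRUE)

print(names(no_header))    # "V1" "V2" "V3" "V4"
print(names(with_header))  # "id" "user_id" "foreign_model_id" "machine_id"

# With a proper header, rows can be selected by column name directly
print(subset(with_header, machine_id == 1))
```

The same idea carries over to SparkR: once the file's first line is treated as the header, no renaming or row deletion is needed afterwards.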


2016-03-07 19:19:34 xyzzy

This is the only correct answer. –

Using the data shown at the end of this answer, try

colnames(data) <- unlist(data[1,]) 
data <- data[-1,] 
> data 
# id user_id foreign_model_id machine_id 
#2 1 3145    4   12 
#3 2 4079    1   8 
#4 3 1174    7   1 
#5 4 2386    9   9 
#6 5 5524    1   7

If you like, you can add rownames(data) <- NULL after removing the first row to renumber the rows.
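A self-contained sketch of the whole sequence, on a hypothetical three-row stand-in for data whose columns are factors (as in the dput output in this answer). It wraps unlist() in as.character() to make the factor-to-label conversion explicit, and the final step that converts the columns to numbers is an addition that is not part of the answer.

```r
# Hypothetical stand-in: the real header sits in row 1 and all columns are factors
data <- data.frame(
  C0 = c("id", "1", "2"),
  C1 = c("user_id", "3145", "4079"),
  C2 = c("foreign_model_id", "4", "1"),
  C3 = c("machine_id", "12", "8"),
  stringsAsFactors = TRUE
)

# Promote the first row to column names (as.character yields the factor labels)
colnames(data) <- as.character(unlist(data[1, ]))

# Drop the header row, then renumber the remaining rows starting from 1
data <- data[-1, ]
rownames(data) <- NULL

# Optional extra step: the columns are still factors, so convert them to numbers
data[] <- lapply(data, function(x) as.numeric(as.character(x)))

print(data)
```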

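After this operation, the rows that satisfy a specific condition can be selected in base R with, for example,

subset(data, data$machine_id==1) 
# id user_id foreign_model_id machine_id 
#4 3 1174    7   1

The function filter() suggested in the OP is part of the stats namespace and is usually reserved for time-series analysis.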
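To illustrate the distinction, here is a minimal base R sketch. stats::filter() applies a linear filter to a numeric series (here a centred three-point moving average), while subset() selects data frame rows. The vector x and the small data frame df are hypothetical. In SparkR, filter() on a SparkDataFrame is SparkR's own row filter; this sketch only covers base R.

```r
# stats::filter: linear filtering of a numeric series (centred 3-point moving average)
x <- c(1, 2, 3, 4, 5)
smoothed <- stats::filter(x, rep(1 / 3, 3), sides = 2)
print(smoothed)  # a ts object with NA at both ends

# subset: keep the data frame rows that satisfy a condition
df <- data.frame(id = 1:5, machine_id = c(12, 8, 1, 9, 7))
print(subset(df, machine_id == 1))
```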

Data

data <- structure(list(
    C0 = structure(c(6L, 1L, 2L, 3L, 4L, 5L),
        .Label = c("1", "2", "3", "4", "5", "id"), class = "factor"),
    C1 = structure(c(6L, 3L, 4L, 1L, 2L, 5L),
        .Label = c("1174", "2386", "3145", "4079", "5524", "user_id"), class = "factor"),
    C2 = structure(c(5L, 2L, 1L, 3L, 4L, 1L),
        .Label = c("1", "4", "7", "9", "foreign_model_id"), class = "factor"),
    C3 = structure(c(6L, 2L, 4L, 1L, 5L, 3L),
        .Label = c("1", "12", "7", "8", "9", "machine_id"), class = "factor")),
    .Names = c("C0", "C1", "C2", "C3"), class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6"))


2016-03-07 09:37:08 RHertel

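Why do you use 'unlist' to get the names? Wouldn't 'colnames(data) <- data[1,]' do it? – Sotos

@Sotos Did you try it? The default setting when reading a file is 'stringsAsFactors = TRUE', which is why your suggestion doesn't work, at least not on my computer. When I use your code, the names are "6", "6", "5", "6", which correspond to the level numbers (within each column) of the entries in the first row. – RHertel

Ah... OK, I see. It needs to be a character for it to work without 'unlist'. Cheers – Sotos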
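The point about level numbers can be seen directly: a factor stores integer codes that index its sorted levels, so as.integer() yields the codes while as.character() yields the labels. A minimal sketch with a hypothetical factor modelled on the machine_id column:

```r
# Levels are sorted: "12" < "8" < "machine_id", so the codes are 3, 1, 2
f <- factor(c("machine_id", "12", "8"))

print(levels(f))        # "12" "8" "machine_id"
print(as.integer(f))    # 3 1 2  (level codes, like the "6" names in the comment)
print(as.character(f))  # "machine_id" "12" "8"  (the labels wanted as names)
```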

Try this:

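# Collect the entries of the first row as character strings
# (a separate variable name avoids shadowing the names() function)
new_names <- c()
for (i in seq_along(data)) {
    new_names <- c(new_names, toString(data[1, i]))
}

# Use them as column names and drop the first row
names(data) <- new_names
data <- data[-1, ]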
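The loop can also be written without growing a vector, by applying vapply() over the columns of the first row. A sketch on a hypothetical two-row stand-in for data; it should give the same names as the loop.

```r
# Hypothetical stand-in: header values in row 1, factor columns
data <- data.frame(
  C0 = c("id", "1"),
  C1 = c("user_id", "3145"),
  stringsAsFactors = TRUE
)

# One string per column of row 1; as.character turns each factor into its label
new_names <- vapply(data[1, ], as.character, character(1))

# vapply keeps the old column names as element names, so drop them
names(data) <- unname(new_names)
data <- data[-1, ]
print(names(data))  # "id" "user_id"
```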


2016-03-07 09:39:17 Paul

I didn't use any of the answers, because with SparkR they cannot run: object of type 'S4' is not subsettable. I solved the problem this way, but I think there is a better solution:

data <- withColumnRenamed(data, "C0","id") 
data <- withColumnRenamed(data, "C1","user_id") 
data <- withColumnRenamed(data, "C2","foreign_model_id") 
data <- withColumnRenamed(data, "C3","machine_id")

Now I can use the filter function successfully, as I wanted.
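The four repeated calls can be folded into a single pass over an old-to-new name mapping. This is a sketch under stated assumptions: rename_columns and rename_base are hypothetical helpers, and rename_base is a base R stand-in so the example runs without Spark. With SparkR you would pass withColumnRenamed, whose (x, existingCol, newCol) arguments fit the same shape, but that call is not exercised here.

```r
# Hypothetical helper: apply rename_fn once per (old -> new) pair, left to right.
# rename_fn must have the shape function(df, old, new) and return the renamed df.
rename_columns <- function(df, mapping, rename_fn) {
  Reduce(function(acc, old) rename_fn(acc, old, mapping[[old]]),
         names(mapping), init = df)
}

# Hypothetical base R stand-in for SparkR's withColumnRenamed
rename_base <- function(df, old, new) {
  names(df)[names(df) == old] <- new
  df
}

mapping <- c(C0 = "id", C1 = "user_id", C2 = "foreign_model_id", C3 = "machine_id")
df <- data.frame(C0 = 1:2, C1 = c(3145, 4079), C2 = c(4, 1), C3 = c(12, 8))
renamed <- rename_columns(df, mapping, rename_base)
print(names(renamed))

# With SparkR (not run here; needs a Spark session):
# data <- rename_columns(data, mapping, withColumnRenamed)
```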


2016-03-07 09:58:37
