How to convert variables to quantitative?

I have a data matrix (900 columns and 5000 rows) and I want to run a PCA on it. How do I make the variables quantitative?

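The matrix looks perfectly fine in Excel (meaning all the values are quantitative), but when I read the file into R and try to run the PCA code, I get an error saying "The following variables are not quantitative", followed by a list of non-quantitative variables.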

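So in general, some variables are quantitative and some are not; see the example below. When I check variable1, it is correct and quantitative (a random subset of the variables in the file look like this). When I check variable2, it is incorrect and non-quantitative (another random subset of the variables look like this).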

> data$variable1[1:5] 
[1] -0.7617504 -0.9740939 -0.5089303 -0.1032487 -0.1245882 

> data$variable2[1:5] 
[1] -0.183546332959017 -0.179283451229594 -0.191165669598284 -0.187060515423038 
[5] -0.184409474669824 
731 Levels: -0.001841783473108 -0.001855956210119 ... -1,97E+05

So my question is: how can I convert all the non-quantitative variables to quantitative ones?

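Shortening the file did not help, because the values themselves are quantitative. I don't know what is going on, so here is the link to my original file: https://docs.google.com/file/d/0BzP-YLnUNCdwakc4dnhYdEpudjQ/edit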

I also tried the answers given below, but they still did not help.

So let me show exactly what I did:

> data <- read.delim("file.txt", header=T) 
> res.pca = PCA(data, quali.sup=1, graph=T) 
Error in PCA(data, quali.sup = 1, graph = T) : 
The following variables are not quantitative: batch 
The following variables are not quantitative: target79 
The following variables are not quantitative: target148 
The following variables are not quantitative: target151 
The following variables are not quantitative: target217 
The following variables are not quantitative: target266 
The following variables are not quantitative: target515 
The following variables are not quantitative: target530 
The following variables are not quantitative: target587 
The following variables are not quantitative: target620 
The following variables are not quantitative: target730 
The following variables are not quantitative: target739 
The following variables are not quantitative: target801 
The following variables are not quantitative: target803 
The following variables are not quantitative: target809 
The following variables are not quantitative: target819 
The following variables are not quantitative: target868 
The following variables a 
In addition: There were 50 or more warnings (use warnings() to see the first 50)
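To list which columns R did not read as numbers (the ones the error above names), a minimal sketch like the following may help. The small `data` frame is a hypothetical stand-in for the real file, with one numeric column and one column R turned into a factor.

```r
# Hypothetical stand-in for the result of read.delim(): one numeric column
# and one column that R has read as a factor
data <- data.frame(
  variable1 = c(-0.7617504, -0.9740939, -0.5089303),
  variable2 = factor(c("-0.183546332959017", "-0.179283451229594", "-1,97E+05"))
)

# TRUE for every column R treats as numeric
is_num <- vapply(data, is.numeric, logical(1))

# Names of the columns PCA would reject as "not quantitative"
names(data)[!is_num]

# ...and the class R gave each of them
vapply(data[!is_num], function(col) class(col)[1], character(1))
```

On the real 900-column data this should produce the same list of names as the error message, but without the truncation.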

2013-02-28 Letin

I may be wrong, but I suspect the -1,97E+05 is the culprit. Check for entries that contain something non-numeric. Did you export as CSV? – 2013-02-28 09:58:26
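One way to find such entries is to parse each column's labels and see which cells come back as NA. This is a sketch on a hypothetical factor column `x`; the last step assumes the comma in a cell like -1,97E+05 is a decimal separator, which is an assumption about how the file was exported.

```r
# Hypothetical factor column: one cell uses a decimal comma, so R could
# not read the column as numbers
x <- factor(c("-0.183546332959017", "-0.179283451229594", "-1,97E+05"))

# Parse the labels (not the level codes); unparseable cells become NA
values <- suppressWarnings(as.numeric(as.character(x)))

# Position and raw text of every cell that failed to parse
bad <- which(is.na(values) & !is.na(x))
bad                  # 3
as.character(x[bad]) # "-1,97E+05"

# Assumption: the comma is a decimal separator, so swap it for a point
fixed <- as.numeric(sub(",", ".", as.character(x), fixed = TRUE))
fixed[bad]           # -197000
```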

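@sebastian-c I have now removed every value containing "E" from the file (such as -1,97E+05), and I still get the same error. I exported it as "Text (Tab delimited)". One more thing: compare variable1 and variable2. The values of the quantitative variable are short, while those of the non-quantitative variable are long. – Letin 2013-02-28 10:08:06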

How does your data get from Excel into R? What you have in variable2 is a factor. – themel 2013-02-28 10:09:08

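As Arun mentioned, R treats these variables as factors, and the result is a data.frame (which is actually a list). There are many ways to fix this; one is to convert it to a numeric data matrix like this: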

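m <- as.numeric(as.matrix(data))  # factor columns are converted via their labels, not their codes
dim(m) <- dim(data)               # restore the rows x columns shape lost by as.numeric()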

You can now run the PCA on the matrix.
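One caveat with the approach above: when a data frame mixes numeric and factor columns, as.matrix() first turns the numeric columns into text with format() (about 7 significant digits by default), and a qualitative column such as batch becomes NA after as.numeric(). A column-by-column conversion avoids both. This is a sketch on a hypothetical data frame; `to_num` is an illustrative helper, not part of any package.

```r
# Hypothetical data frame: a qualitative first column (like "batch"),
# a numeric column and a factor column holding numbers as labels
data <- data.frame(
  batch   = factor(c("A", "A", "B", "B")),
  target1 = c(0.12345678901, -0.5, 1.25, 2),
  target2 = factor(c("-0.18", "0.22", "1.5", "-0.7"))
)

# Hypothetical helper: convert one column to numeric, using factor labels
# rather than the underlying integer codes
to_num <- function(col) {
  if (is.factor(col)) col <- as.character(col)
  as.numeric(col)
}

# Convert every column except the qualitative first one, keeping names
num <- data
num[-1] <- lapply(data[-1], to_num)

str(num)
```

Because `batch` stays a factor in the first column, the result should be usable with the PCA(..., quali.sup = 1) call from the question.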

Edit:

To expand on the example a bit: the second part of charlie's suggestion will not work. Copy the session below to see how this works:

d <- data.frame(
a = factor(runif(2000)), 
b = factor(runif(2000)), 
c = factor(runif(2000))) 

as.numeric(d) #does not work on a list (data frame is a list) 

as.numeric(d$a) # does work, because d$a is a vector, but this is not what you are 
# after: R returns the integer codes of the factor levels instead of the actual values. 

(m <- as.numeric(as.matrix(d))) # this does the right thing 
dim(m)      # but m loses the dimensions and is now a vector (dim(m) is NULL) 

dim(m) <- dim(d)    # assign the dimensions of d to m 

svd(m)      # run the PCA function of your liking on m, e.g. prcomp(m); svd() alone does not center the columns
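Since svd(m) decomposes the raw matrix while PCA centers each column first, a small check may clarify how the two relate. This sketch uses random data in place of the real matrix and shows that the component standard deviations from prcomp() match an SVD of the column-centered matrix.

```r
set.seed(1)
m <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)

# PCA via prcomp(): centers the columns (no scaling by default)
p <- prcomp(m)

# The same decomposition by hand: SVD of the column-centered matrix
centered <- scale(m, center = TRUE, scale = FALSE)
s <- svd(centered)

# Component standard deviations agree (the loadings agree up to sign)
all.equal(p$sdev, s$d / sqrt(nrow(m) - 1))
```

If the variables are on very different scales, prcomp(m, scale. = TRUE) additionally standardizes each column before the decomposition.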

2013-02-28 11:07:21 Edwin

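Thanks Edwin. Let me try this and get back to you. I just took the time to rerun my analysis on the file and got back to the specific error, and I will also link to my file. I'll report back on whether it works. – Letin 2013-02-28 11:13:06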

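By default (in R versions before 4.0.0), R coerces strings to factors when reading a file. This can lead to unexpected behavior. Turn this default off with: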

 read.csv(x, stringsAsFactors = FALSE)
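A minimal sketch of the effect, reading hypothetical tab-delimited text (standing in for the exported file) with read.delim(), which accepts the same argument: the malformed column comes back as plain character, which as.numeric() converts to values directly.

```r
# Hypothetical tab-delimited content: column b has one cell with a decimal comma
txt <- "a\tb\n-0.76\t-0.18\n-0.97\t-1,97E+05\n"

# stringsAsFactors = FALSE keeps non-numeric columns as character vectors
d <- read.delim(text = txt, stringsAsFactors = FALSE)
class(d$a)   # "numeric"
class(d$b)   # "character"

# Character vectors convert to their values directly; malformed cells give NA
b_num <- suppressWarnings(as.numeric(d$b))
b_num        # -0.18 NA
```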

Alternatively, coerce factors to numeric with:

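 newVar <- as.numeric(oldVar)  # caution: on a factor this returns the level codes; as.numeric(as.character(oldVar)) returns the values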
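The difference between level codes and values is easy to see on a tiny hypothetical factor. The last line is the idiom recommended in the R FAQ, which converts each distinct level only once.

```r
f <- factor(c("10", "5", "10", "2.5"))

# Levels are sorted as strings: "10" "2.5" "5"
levels(f)

as.numeric(f)                # integer codes: 1 3 1 2 (not the values!)
as.numeric(as.character(f))  # actual values: 10 5 10 2.5
as.numeric(levels(f))[f]     # same values, converting each level once
```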

2013-02-28 11:18:56 charlie

Hey charlie, thanks for your reply. But I get: file_new <- as.numeric(file) Error: (list) object cannot be coerced to type 'double' – Letin 2013-02-28 12:22:16

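You get that error because the object you are converting, 'file', was created as a data frame, in which some variables are numeric and some are character or factor, and as.numeric() cannot coerce a whole data frame (a list). Check with 'class(file)'. – 2013-02-28 12:55:46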

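You're right, I should have been clearer. You cannot coerce a whole data frame, and, as Edwin correctly pointed out, you probably don't want to. In my experience, the default conversion to factors in read.table() causes headaches. I set up my editor to type "stringsAsFactors = FALSE" by default. – charlie 2013-02-28 21:34:15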