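Although RStudio shows that df_intrate has the expected number of rows for ASSET CLASS A, does df_intrate still retain structural information from df?
Yes. This is how categorical variables, called factors, are stored in R: both the levels (a vector of all possible values) and the actual values taken are stored: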
x = factor(c('a', 'b', 'c', 'a', 'b', 'b'))
x
# [1] a b c a b b
# Levels: a b c
y = x[1]
y
# [1] a
# Levels: a b c
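To make the storage model a little more concrete, here is a small sketch that looks at the two pieces separately: the levels, and the integer codes that point into them. It reuses the same `x` as above and uses only base R.

```r
x = factor(c("a", "b", "c", "a", "b", "b"))

levels(x)       # the vector of all possible values
as.integer(x)   # the integer codes, each one an index into levels(x)

y = x[1]
levels(y)       # subsetting keeps the full set of levels
nlevels(y)      # so the count of levels is unchanged
```

The subset `y` holds a single value, but it still carries all of the levels of `x`, which is why the unused levels show up when it is printed.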
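You can get rid of unused levels with `droplevels()`, or by re-applying the `factor` function, which creates a new factor out of only the values that are present: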
droplevels(y)
# [1] a
# Levels: a
factor(y)
# [1] a
# Levels: a
You can also use `droplevels` on a data frame to drop all unused levels from all of its factor columns:
dat = data.frame(x = x)
str(dat)
# 'data.frame': 6 obs. of 1 variable:
# $ x: Factor w/ 3 levels "a","b","c": 1 2 3 1 2 2
str(dat[1, ])
# Factor w/ 3 levels "a","b","c": 1
str(droplevels(dat[1, ]))
# Factor w/ 1 level "a": 1
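Closer to the situation in the question, here is a rough sketch with a multi-column data frame. The data and the column names `asset_class` and `int_rate` are made up for illustration; they are not taken from your actual `df`.

```r
# Hypothetical data standing in for df; names and values are invented.
df = data.frame(
  asset_class = factor(c("A", "A", "B", "C")),
  int_rate    = c(0.05, 0.07, 0.11, 0.15)
)

# Subsetting rows gives the expected number of rows ...
df_a = df[df$asset_class == "A", ]
nrow(df_a)

# ... but the factor column still remembers every level of the original.
levels(df_a$asset_class)
table(df_a$asset_class)    # B and C are listed with a count of zero

# droplevels() on the data frame cleans up every factor column at once.
df_a_clean = droplevels(df_a)
levels(df_a_clean$asset_class)
```

Because `df` here has more than one column, `df[rows, ]` stays a data frame, so `droplevels()` is applied to the data frame rather than to a bare factor.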
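Though unrelated to your current problem, it is also worth mentioning that `factor` has an optional argument, `levels`, which can be used to specify the levels of a factor and the order they should go in. This can be useful if you want a specific order (perhaps for plotting or modeling), or if there are more possible levels than are actually present and you want to include them. If you don't specify `levels`, the default is alphabetical order.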
x = c("agree", "disagree", "agree", "neutral", "strongly agree")
factor(x)
# [1] agree disagree agree neutral strongly agree
# Levels: agree disagree neutral strongly agree
## not a good order
factor(x, levels = c("disagree", "neutral", "agree", "strongly agree"))
# [1] agree disagree agree neutral strongly agree
# Levels: disagree neutral agree strongly agree
## better order
factor(x, levels = c("strongly disagree", "disagree", "neutral", "agree", "strongly agree"))
# [1] agree disagree agree neutral strongly agree
# Levels: strongly disagree disagree neutral agree strongly agree
## good order, more levels than are actually present
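One place where the extra levels pay off is tabulation. The following sketch, using the same survey-style values as above, shows that a level with no observations still appears with a count of zero, and that a value missing from `levels` turns into `NA`. The variable names `lv`, `f`, `counts`, and `g` are just for this illustration.

```r
x  = c("agree", "disagree", "agree", "neutral", "strongly agree")
lv = c("strongly disagree", "disagree", "neutral", "agree", "strongly agree")

f = factor(x, levels = lv)
counts = table(f)
counts                     # "strongly disagree" is listed with a count of 0

# A value that is not among the specified levels becomes NA.
g = factor(c("agree", "maybe"), levels = lv)
is.na(g)
```

This is worth keeping in mind when typing out `levels` by hand: a misspelled level silently produces missing values rather than an error.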
See `?reorder` and `?relevel` (or just use `factor` again) to change the order of the levels of a factor that has already been created.
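As a quick sketch of the three options, assuming a small made-up factor `f` and a made-up numeric vector `score` to order by:

```r
f = factor(c("agree", "disagree", "agree", "neutral"))
levels(f)                               # alphabetical by default

# relevel() moves a single level to the front
# (the first level is the reference level in many models).
f_ref = relevel(f, ref = "neutral")
levels(f_ref)

# reorder() sorts the levels by a summary of another variable,
# using the mean within each level unless FUN says otherwise.
score = c(4, 1, 5, 3)                   # invented values, one per element of f
f_ord = reorder(f, score)
levels(f_ord)

# Calling factor() again lets you spell out the full order yourself.
f_man = factor(f, levels = c("disagree", "neutral", "agree"))
levels(f_man)
```

With these values the mean scores are 4.5 for agree, 1 for disagree, and 3 for neutral, so `reorder()` puts the levels in increasing order of those means.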
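Note that if you use `read.csv(path, as.is = TRUE)`, you will get character columns instead of factor columns. Also note that `header = TRUE` and `sep = ","` are the defaults for `read.csv`, so you don't have to specify them.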
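A minimal way to check this without a file on disk is to pass the CSV contents through the `text` argument. The contents below are invented for the example. Note also that since R 4.0.0 `stringsAsFactors` defaults to `FALSE`, so on a recent R you have to ask for factors explicitly.

```r
# Invented CSV contents, passed inline instead of read from a path.
csv_text = "asset_class,int_rate\nA,0.05\nB,0.11\n"

# as.is = TRUE keeps strings as character vectors.
d_chr = read.csv(text = csv_text, as.is = TRUE)
class(d_chr$asset_class)

# stringsAsFactors = TRUE converts string columns to factors.
d_fac = read.csv(text = csv_text, stringsAsFactors = TRUE)
class(d_fac$asset_class)
levels(d_fac$asset_class)
```

With character columns, subsetting does not carry any levels along, so the unused-levels issue does not arise until you convert to a factor yourself.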