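I don't think I asked the right question. Is the data not being read in correctly?
New question: I have a 1.5 GB TSV file. It has 6 lines of junk at the top and one line of junk at the bottom, all of which I want to remove without opening the file. Line 7 is the header row. I have 13 column headers. The number of rows is unknown.
How do I read the file into a data frame so that I can do basic descriptive statistics, box plots, etc.?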
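One possible way to approach this, as a minimal sketch rather than a definitive answer: skip the junk lines at the top while reading, drop the last row afterwards, and then let R re-detect the column types. The temporary file, its contents, and the names `path`, `raw`, and `clean` are made up for illustration. For a 1.5 GB file, reading may be slow and memory-hungry, so this is only meant to show the idea on a small stand-in.

```r
# Hypothetical stand-in file: 6 junk lines, a header row, two data rows,
# and one junk footer line (the real file has 13 columns).
path <- tempfile(fileext = ".txt")
writeLines(c(
  paste("junk line", 1:6),
  "Keyword\tClicks\tAvg.Position",
  "chagall pro\t1\t3",
  "matisse\t1\t3.1",
  "Total\t-\t-"
), path)

# skip = 6 ignores the junk lines at the top (a number, not the string "6").
# read.delim() assumes "." as the decimal mark; read.delim2() assumes ",".
raw <- read.delim(path, header = TRUE, skip = 6, stringsAsFactors = FALSE)

# Drop the junk footer, which was read in as the last row.
clean <- head(raw, -1)

# The footer forced the numeric columns to be read as character,
# so re-detect each column's type now that the footer is gone.
clean[] <- lapply(clean, type.convert, as.is = TRUE)

str(clean)
summary(clean$Avg.Position)
```

After this, `clean$Avg.Position` should be numeric, so `hist()` and `boxplot()` can be applied to it directly.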
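Original question:
Hi,
I have a feeling this is really easy and I'm just missing something.
I have a tab-separated txt file with 6 lines of junk at the top and a junk line at the bottom. Between the junk I have data of the form: Label1 Label2 Label3 Label4 .... Label13 text ID number percent .... number
Here is my input in R: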
datadump <- read.delim2("truncate.txt", header=TRUE, skip="6")
cleandata <- datadump[c(-dim(datadump)[1]),]
avgposition <- cleandata$Avg.Position
hist(avgposition)
Avg.Position is Label13 and holds numbers of the form ##
However, I get an error: Error in hist.default(avgposition) : 'x' must be numeric
Why doesn't it see the data as a number?
Thanks!
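A hedged illustration of what is probably going on: the `dput` output for this data shows `Avg.Position` stored as a factor with levels "", "3", "3.1", so `hist()` rejects it as non-numeric. The likely causes are the junk footer row, which puts a non-numeric value in the column, and `read.delim2`, which expects "," rather than "." as the decimal mark. The sketch below rebuilds that column by hand; the names `codes`, `values`, `cost`, and `ctr` are made up for illustration.

```r
# Rebuild Avg.Position as it appears in the dput output: a factor, not numbers.
avgposition <- factor(c("3", "3.1"), levels = c("", "3", "3.1"))

is.numeric(avgposition)   # FALSE, which is why hist() refuses it

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the values the labels spell out.
codes <- as.numeric(avgposition)

# Going through character first recovers the actual numbers.
values <- as.numeric(as.character(avgposition))

# Columns such as Cost and CTR also need their symbols stripped first.
cost <- as.numeric(gsub("[$, ]", "", c("$0.05 ", "$0.11 ")))
ctr  <- as.numeric(sub("%", "", c("25.00%", "6.25%"))) / 100

hist(values)
```

Converting through `as.character()` matters here: calling `as.numeric()` directly on the factor would plot the level codes instead of the positions.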
Since some data was requested, here it is:
> dput(cleandata)
structure(list(Account = structure(c(2L, 2L), .Label = c("Crap1",
"XXS"), class = "factor"), Campaign = structure(c(1L, 1L), .Label = c("3098012",
"Crap2"), class = "factor"), Customer.Id = structure(c(2L, 2L
), .Label = c("", "nontech broad (7)"), class = "factor"), Ad.Group = structure(c(2L,
2L), .Label = c("", "RR 236 (300)"), class = "factor"), Keyword = structure(2:3, .Label = c("",
"chagall pro", "matisse"), class = "factor"), Keyword.Matching = structure(c(2L,
2L), .Label = c("", "Broad"), class = "factor"), Impressions = c(4L,
16L), Clicks = c(1L, 1L), CTR = structure(2:3, .Label = c("",
"25.00%", "6.25%"), class = "factor"), Avg.CPC = structure(2:3, .Label = c("",
"$0.05 ", "$0.11 "), class = "factor"), Avg.CPM = structure(2:3, .Label = c("",
"$12.50 ", "$6.88 "), class = "factor"), Cost = structure(2:3, .Label = c("",
"$0.05 ", "$0.11 "), class = "factor"), Avg.Position = structure(2:3, .Label = c("",
"3", "3.1"), class = "factor")), .Names = c("Account", "Campaign",
"Customer.Id", "Ad.Group", "Keyword", "Keyword.Matching", "Impressions",
"Clicks", "CTR", "Avg.CPC", "Avg.CPM", "Cost", "Avg.Position"
), row.names = 1:2, class = "data.frame")
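Any chance you could post the exact contents of a few lines of the text file? – 2010-09-27 23:05:25
I modified the data to keep it anonymous, but essentially I have 1 gig of it in the form: – datayoda 2010-09-27 23:12:14
Try using head(x, 5), then copy and paste a dput() of that. It makes it easier for people to look at your example. – 2010-09-27 23:22:37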
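As a small sketch of that suggestion, assuming a data frame named `x` (the contents here are made up for illustration):

```r
# Hypothetical data frame standing in for the real data.
x <- data.frame(
  Keyword = c("chagall pro", "matisse"),
  Clicks = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Keep at most the first 5 rows, then print an R expression that
# recreates them exactly, ready to paste into a question.
sample_rows <- head(x, 5)
dput(sample_rows)
```

Anyone can then paste the printed `structure(...)` expression into their own session to get the same rows, column types included.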