2014-10-05 81 views
1

Reading a matrix into R from a file laid out in sets of 7 columns

I have a matrix stored in a file in the following format:

  V1  V2  V3  V4  V5  V6  V7 
[1,] 17.67787 12.375978 12.007860 16.089949 24.864464 37.64243 42.711561 
... 
[10,] 42.89655 21.535867 7.975470 6.580414 10.326551 11.06297 11.201733 
     V8  V9  V10  V11  V12  V13  V14 
[1,] 30.41993 35.46864 16.97427 10.992030 11.408483 17.417670 33.815149 
... 
[10,] 

That is, 10 rows and N column vectors.

How can I read this into R as a matrix?

scan throws an error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : 
    scan() expected 'a real', got 'V1' 

If I call "as.matrix(read.table(..))" I get:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : 
    line 11 did not have 8 elements 
+0

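I think it would be helpful to post the actual 'head -11' of the file you are having trouble reading. That said, if the extra column headers (e.g. 'V9'-'V16') really are interspersed in the file, then you will need to use 'readLines', roll up your sleeves and do some post-processing. – hrbrmstr 2014-10-05 04:29:11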

+0

Edited my post to include an actual extract of the file I need to read. – raf 2014-10-05 04:38:02

+0

@akun I get: Error in read.table("centroid0.txt", fill = TRUE, header = TRUE): duplicate 'row.names' are not allowed – raf 2014-10-05 07:06:21

Answers

1

You could try:

lines <- readLines(textConnection("V1 .... #... all the data you showed in `pastebin` 
       ... 33.21421")) 

If you are reading it from a file:

lines <- readLines("raf.txt") 
lines1 <- gsub("\\[.*]","",lines) # remove the `[number,]` row label at the beginning of each data line 
library(stringr) 
lines2 <- str_trim(lines1) # remove the leading/trailing spaces 

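Here, I split lines2 into groups in a list, using an index created by grepl and cumsum, so that each new group contains a header line and its data. Then read.table is applied to each group with lapply: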

lst1 <- lapply(split(lines2, cumsum(grepl("^V", lines2))), 
               function(x) read.table(text = x, header = TRUE)) 
names(lst1) <- NULL   # drop the list names so cbind does not prefix the column names 
res <- do.call(cbind, lst1) 

If you want to convert it to a matrix:

m1 <- as.matrix(res) 
dim(res) 
#[1] 10 128 

res[1:3,1:3] 
#  V1  V2  V3 
#1 17.67787 12.37598 12.00786 
#2 29.44688 19.44888 15.06014 
#3 30.49377 19.64495 11.15946 
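
As a self-contained illustration of the split-on-header idea above, here is a minimal sketch that uses base R only (trimws instead of stringr). The sample lines are made up for the example and are much smaller than the real file; the names clean, block_id and blocks are hypothetical, not part of the original answer.

```r
# Small made-up sample in the same layout: two blocks of 3 rows each
lines <- c(
  "        V1       V2",
  "[1,] 1.5 2.5",
  "[2,] 3.5 4.5",
  "[3,] 5.5 6.5",
  "        V3       V4",
  "[1,] 7.5 8.5",
  "[2,] 9.5 10.5",
  "[3,] 11.5 12.5"
)

# Drop the "[n,]" row labels and the surrounding whitespace
clean <- trimws(sub("^[[:space:]]*\\[[0-9]+,?\\]", "", lines))
clean <- clean[nzchar(clean)]   # ignore blank lines, if any

# A header line starts with "V"; cumsum() gives every block its own id
block_id <- cumsum(grepl("^V", clean))
blocks <- lapply(split(clean, block_id),
                 function(x) read.table(text = x, header = TRUE))

# unname() so that cbind keeps the original column names V1, V2, ...
m <- as.matrix(do.call(cbind, unname(blocks)))
dim(m)
```

Because the blocks are found from the header lines, this does not need to know in advance how many blocks the file contains.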
0

I built a text object, 'txt', that matches your original description.

dput(txt) 
"   V1 V2 V3 V4 V5 V6 V7 V8\n [1] 10074 10146 10079 10091 10040 10066 10009 10152\n [2] 10137 10136 10032 10139 10038 10122 1\n [3] 10046 10120 10062 10061 10149 10029 10030 10059\n [4] 10003 10028 10148 10050 10057 10100 10144 10084\n [5] 10076 10012 10114 10073 10026 10135 10130 10083\n [6] 10007 10119 10063 10078 10086 10160 10125 10087\n [7] 10031 10090 10021 10092 10093 10067 10106 10129\n [8] 10004 10102 10113 10134 10042 10064 10037 10140\n [9] 10101 10156 10060 10121 10097 10002 10109 10033\n[10] 10075 10096 10024 10089 10115 10147 10036 10103\n   V9 V10 V11 V12 V13 V14 V15 V16\n [1] 10153 10107 10049 10143 10047 10126 10039 10018\n [2] 10065 10127 10048 10133 10108 10124 10117 10077\n [3] 10105 10051 10131 10069 10098 10058 10088 10006\n [4] 10132 10104 10112 10138 10128 10027 10043 10145\n [5] 10010 10072 10151 10111 10110 10052 10020 10082\n [6] 10023 10016 10044 10158 10159 10041 10155 10019\n [7] 10099 10008 10094 10142 10045 10068 10070 10015\n [8] 10013 10080 10053 10071 10085 10014 10056 10034\n [9] 10022 10011 10150 10054 10154 10035 10081 10118\n[10] 10116 10055 10017 10005 10025 10157 10141 10001" 

tcon <- textConnection(txt) # the first description did not have commas 

Instead of txt you could use a file() call; the principle is that you can read from a connection incrementally:

cbind(read.table(text= readLines(tcon,n=11), header=TRUE), # first 11 lines 
     read.table(text= readLines(tcon,n=11), header=TRUE)) # second 11 

     V1 V2 V3 V4 V5 V6 V7 V8 
[1] 10074 10146 10079 10091 10040 10066 10009 10152 
[2] 10137 10136 10032 10139 10038 10122 1
[3] 10046 10120 10062 10061 10149 10029 10030 10059 
[4] 10003 10028 10148 10050 10057 10100 10144 10084 
[5] 10076 10012 10114 10073 10026 10135 10130 10083 
[6] 10007 10119 10063 10078 10086 10160 10125 10087 
[7] 10031 10090 10021 10092 10093 10067 10106 10129 
[8] 10004 10102 10113 10134 10042 10064 10037 10140 
[9] 10101 10156 10060 10121 10097 10002 10109 10033 
[10] 10075 10096 10024 10089 10115 10147 10036 10103 
     V9 V10 V11 V12 V13 V14 V15 V16 
[1] 10153 10107 10049 10143 10047 10126 10039 10018 
[2] 10065 10127 10048 10133 10108 10124 10117 10077 
[3] 10105 10051 10131 10069 10098 10058 10088 10006 
[4] 10132 10104 10112 10138 10128 10027 10043 10145 
[5] 10010 10072 10151 10111 10110 10052 10020 10082 
[6] 10023 10016 10044 10158 10159 10041 10155 10019 
[7] 10099 10008 10094 10142 10045 10068 10070 10015 
[8] 10013 10080 10053 10071 10085 10014 10056 10034 
[9] 10022 10011 10150 10054 10154 10035 10081 10118 
[10] 10116 10055 10017 10005 10025 10157 10141 10001 

This does the same thing for the long file, and converting the result to a matrix is still trivial:

txt <- readLines("~/Downloads/YjwpsANG.txt") 
tcon <- textConnection(txt) 
# 19 blocks of 11 lines each (1 header line + 10 data rows), read one after another 
X <- do.call(cbind, lapply(1:19, function(i) 
  read.table(text = readLines(tcon, n = 11), header = TRUE))) 
close(tcon) 
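
If the number of blocks is not known in advance, the same incremental read can be written as a loop. The following is only a sketch under a few assumptions: the file is a small made-up one written to a temporary path, every block has the same number of rows, and there are no trailing blank lines. The names path, n_rows, con and blocks are hypothetical. One detail worth noting: the file connection is opened explicitly with open = "r". When readLines is given a connection that is not open, it opens it for that one call and closes it again, so every call would start from the top of the file.

```r
# Write a small made-up file in the same layout so the sketch is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c(
  "     V1 V2",
  "[1,] 1 2",
  "[2,] 3 4",
  "     V3 V4",
  "[1,] 5 6",
  "[2,] 7 8"
), path)

n_rows <- 2                     # data rows per block (10 in the question)
con <- file(path, open = "r")   # open once so each read continues where the last stopped

blocks <- list()
repeat {
  chunk <- readLines(con, n = n_rows + 1)   # 1 header line + n_rows data lines
  if (length(chunk) == 0) break             # nothing left to read
  # The header has one field fewer than the data lines, so read.table
  # uses the "[n,]" labels as row names
  blocks[[length(blocks) + 1]] <- read.table(text = chunk, header = TRUE)
}
close(con)

X <- data.matrix(do.call(cbind, blocks))   # numeric matrix, n_rows x (total columns)
X %*% t(X)                                 # matrix operations now work
```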
+0

I may not have explained this well: I want to read it in as a matrix so that I can then do matrix operations on it – raf 2014-10-05 05:27:50

+1

I gave you a 10 x N data frame; you only need to convert it to a matrix with the 'data.matrix' function – 2014-10-05 05:29:49

+0

Actually, this does not seem to work. I just get the same 11 lines over and over... – raf 2014-10-05 06:26:06