2017-03-06 58 views

Organizing a .txt file into a data frame in R

I have a txt file that looks exactly like this:

ENVI ASCII Plot File [Sun Mar 5 00:06:04 2017] 
Column 1: Band Number 
Column 2: Mean: red_1 [Magenta] 20 points~~7 
Column 3: Mean: red_2 [Red] 12 points~~2 
Column 4: Mean: red_3 [Green] 12 points~~3 
Column 5: Mean: red_4 [Blue] 15 points~~4 
Column 6: Mean: red_5 [Yellow] 20 points~~5 
Column 7: Mean: red_6 [Cyan] 25 points~~6 
Column 8: Mean: red_7 [Maroon] 16 points~~8 
Column 9: Mean: red_8 [Sea Green] 6 points~~9 
Column 10: Mean: red_9 [Purple] 12 points~~10 
Column 11: Mean: red_10 [Coral] 6 points~~11 
Column 12: Mean: bcs_1 [Aquamarine] 16 points~~12 
Column 13: Mean: bcs_2 [Orchid] 16 points~~13 
Column 14: Mean: bcs_3 [Sienna] 30 points~~14 
Column 15: Mean: bcs_4 [Chartreuse] 16 points~~15 
Column 16: Mean: bcs_5 [Thistle] 25 points~~16 
Column 17: Mean: bcs_6 [Red1] 16 points~~17 
Column 18: Mean: bcs_7 [Red2] 15 points~~18 
Column 19: Mean: bcs_8 [Red3] 12 points~~19 
Column 20: Mean: bcs_9 [Green1] 20 points~~20 
Column 21: Mean: bcs_10 [Green2] 20 points~~21 
1.000000 0.061581 0.078073 0.057892 0.065844 0.090056 0.088098  0.089036 0.077258 0.055721 0.124091 0.037674 0.040654 0.037246 0.049291 0.041737 0.052611 0.059882 0.057625 0.054079 0.053647 
2.000000 0.042688 0.037923 0.045340 0.046383 0.046419 0.047063 0.053226 0.049161 0.028502 0.026902 0.057672 0.045742 0.028775 0.041979 0.038616 0.046102 0.053043 0.029172 0.045776 0.040539 
3.000000 0.018434 0.036316 0.032751 0.024035 0.027343 0.027738 0.036514 0.014953 0.022183 0.034359 0.010836 0.014596 0.011336 0.014386 0.011091 0.016790 0.014971 0.016921 0.016966 0.019890 
4.000000 0.018490 0.015526 0.018201 0.014678 0.016888 0.013276 0.024992 0.019930 0.014847 0.007780 0.018094 0.009815 0.006283 0.014529 0.012734 0.009747 0.011569 0.007291 0.013920 0.008032 

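I would like to build a data frame in which each ROI (i.e. red_1, red_2, red_3, and so on) is a row and the band numbers are the columns. This involves transposing the data, which I don't know how to do. The final data frame should look like this: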

ROI Band_1 Band_2 Band_3 Band_4 
Red_1 0.061581 0.042688 0.018434 0.018490 
Red_2 0.078073 0.037923 0.036316 0.015526 
... and so forth 

So far I have this:

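# read the file as a character vector, one element per line 
txt <- readLines("name_of_file.txt") 

# drop the 22 header lines so that only the data rows remain 
data_lines <- txt[-(1:22)] 

# find the lines that contain the names of the ROIs 
roi_lines <- grep("Mean:", txt) 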

Any clues on how to transpose the values would be greatly appreciated!

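It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you have written so far, example input (if there is any), the expected output, and the output you actually get (console output, tracebacks, etc.). The more detail you provide, the more answers you are likely to receive. Check the [FAQ](http://stackoverflow.com/tour) and [How to Ask](http://stackoverflow.com/questions/how-to-ask). – TigerhawkT3

I'm voting to close this question as off-topic because SO is not a free coding service. – TigerhawkT3

Oh shoot! I didn't realize that. I'll give it a try and then edit the question. – JAG2024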

Answer

Use:

# reading the text file 
txt <- readLines('name_of_file.txt') 

# extract the column names from the 'Column' header lines of the text file 
colnms <- sapply(strsplit(grep('^Column ', txt, value = TRUE),':'), function(i) trimws(tail(i,1))) 
colnms <- sub('(\\w+).*', '\\1', colnms) 

# reading the data lines into a dataframe with 'read.table' 
# and use the 'col.names' parameter to assign the column names 
dat <- read.table(text = txt, skip = 22, header = FALSE, col.names = colnms) 

# reshape the data into the desired format 
library(reshape2) 
dat2 <- recast(dat, variable ~ paste0('Band_',Band), id.var = 'Band') 
names(dat2)[1] <- 'ROI' 

which gives:

> dat2 
     ROI Band_1 Band_2 Band_3 Band_4 
1 red_1 0.061581 0.042688 0.018434 0.018490 
2 red_2 0.078073 0.037923 0.036316 0.015526 
3 red_3 0.057892 0.045340 0.032751 0.018201 
4 red_4 0.065844 0.046383 0.024035 0.014678 
5 red_5 0.090056 0.046419 0.027343 0.016888 
6 red_6 0.088098 0.047063 0.027738 0.013276 
7 red_7 0.089036 0.053226 0.036514 0.024992 
8 red_8 0.077258 0.049161 0.014953 0.019930 
9 red_9 0.055721 0.028502 0.022183 0.014847 
10 red_10 0.124091 0.026902 0.034359 0.007780 
11 bcs_1 0.037674 0.057672 0.010836 0.018094 
12 bcs_2 0.040654 0.045742 0.014596 0.009815 
13 bcs_3 0.037246 0.028775 0.011336 0.006283 
14 bcs_4 0.049291 0.041979 0.014386 0.014529 
15 bcs_5 0.041737 0.038616 0.011091 0.012734 
16 bcs_6 0.052611 0.046102 0.016790 0.009747 
17 bcs_7 0.059882 0.053043 0.014971 0.011569 
18 bcs_8 0.057625 0.029172 0.016921 0.007291 
19 bcs_9 0.054079 0.045776 0.016966 0.013920 
20 bcs_10 0.053647 0.040539 0.019890 0.008032 

The last step, reshaping the data, can also be done with the data.table package:

library(data.table) 
dcast(melt(setDT(dat), id = 1, variable.name = 'ROI'), ROI ~ paste0('Band_',Band)) 
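
For comparison, the same reshaping can probably be done in base R with `t()`, without any extra package. This is only a minimal sketch and not part of the original answer: the small `dat` below is a stand-in for the data frame built above, using just the first two ROIs and bands, and the name `dat3` is made up for the illustration.

```r
# Stand-in for 'dat' as built above (first two ROIs and bands only)
dat <- data.frame(Band  = c(1, 2),
                  red_1 = c(0.061581, 0.042688),
                  red_2 = c(0.078073, 0.037923))

# Transpose the measurement columns: ROIs become rows, bands become columns
m <- t(as.matrix(dat[, -1, drop = FALSE]))
colnames(m) <- paste0("Band_", dat$Band)

# Move the row names into a regular 'ROI' column
dat3 <- data.frame(ROI = rownames(m), m, row.names = NULL)
dat3
```

Unlike the `dcast` approaches, this keeps the band columns in the order in which they appear in the file, since no column names are sorted along the way.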
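
Hi @Jaap. One more question. When I use a text file with more ROIs than in the example I gave here, my code doesn't output "Band_1". Can you point out which line of code selects Band_1, Band_2, etc. and puts them into columns? – JAG2024

@JAG2024 `max(grep('^Column ', txt))` will give the index of the last line that starts with *"Column"*. That value should be used for the `skip` parameter in `read.table`. – Jaap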
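
To make that last suggestion concrete, here is a rough sketch of computing `skip` from the header instead of hard-coding 22. The `txt` vector is a shortened stand-in for the output of `readLines()` (two ROIs, two bands), and `n_skip` is a name chosen for this illustration only. It assumes the header always ends with the last `Column` line.

```r
# Shortened stand-in for readLines('name_of_file.txt')
txt <- c("ENVI ASCII Plot File [Sun Mar 5 00:06:04 2017]",
         "Column 1: Band Number",
         "Column 2: Mean: red_1 [Magenta] 20 points~~7",
         "Column 3: Mean: red_2 [Red] 12 points~~2",
         "1.000000 0.061581 0.078073",
         "2.000000 0.042688 0.037923")

# Index of the last header line, i.e. the number of lines to skip
n_skip <- max(grep("^Column ", txt))

# Same column-name extraction as in the answer
colnms <- sapply(strsplit(grep("^Column ", txt, value = TRUE), ":"),
                 function(i) trimws(tail(i, 1)))
colnms <- sub("(\\w+).*", "\\1", colnms)

# Read only the data rows, however many ROI columns the file has
dat <- read.table(text = txt, skip = n_skip, header = FALSE,
                  col.names = colnms)
dat
```

With a hard-coded `skip = 22`, a file with more ROIs has a longer header, so some `Column` lines would be read as data and the first band rows could be lost or misparsed.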