2017-04-09

R nested for loop iterating over rows and column names

I'm new to R, so please forgive the basic question.

Here's a Dropbox link to a .csv of my data.

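I have data on countries from 1990 to 2010. My data is wide: each country is a row, and each year has two columns, one for each of two data sources. However, the data for some countries is incomplete. For example, a country's row may have NA values in the columns for 1990-1995.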

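I want to create two columns, and for each country row, I want the values in these columns to be the earliest non-NA year for each of the two data types.

I also want to create two other columns, and for each country row, I want the values in these columns to be the earliest non-NA value for each of the two data types.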

So the last four columns would look like this:

1990, 12, 1990, 87 
1990, 7, 1990, 132 
1996, 22, 1996, 173 
1994, 14, 1994, 124 

I imagine a nested for loop; my rough, half-pseudocode attempt looks like this:

for i in (number of rows){ 
    for j in names(df){ 
    if(is.na(df$j) == FALSE) df$earliest_year = j 
    } 
} 

How can I generate these four desired columns? Thanks!

Answer

You mentioned a loop, so I tried writing a for loop. You may want to try other R functions later, such as apply. This code is a bit verbose, but I hope it helps:

# read data; I'm assuming the first column holds row names and is not important
df <- read.csv("wb_wide.csv", row.names = 1)

# get the column names for the two data sources
# here I used grep to find column names matching the NY and SP patterns;
# but if the columns consistently alternate between the two sources,
# you can use a sequence of numbers instead
dataSourceA <- grep("NY", names(df), value = TRUE)
dataSourceB <- grep("SP", names(df), value = TRUE)

# create new columns for the data set
# if I understand correctly: the first non-NA value from source A
# and from source B, and then the year of each of these non-NA values
df$sourceA <- NA_real_
df$yearA <- NA_character_
df$sourceB <- NA_real_
df$yearB <- NA_character_

# start the for loop that iterates over rows
for (i in seq_len(nrow(df))) {

    # position of the first non-NA value among the source A columns
    # (assumes each row has at least one non-NA value for this source)
    firstA <- which(!is.na(df[i, dataSourceA]))[1]

    # store that value in the sourceA column
    df$sourceA[i] <- df[i, dataSourceA[firstA]]

    # gsub cleans the column name so that only the year is left;
    # you can also skip this and clean the names afterward
    df$yearA[i] <- gsub(x = dataSourceA[firstA], pattern = "^.*X", replacement = "")

    # same as above, but selecting from source B
    firstB <- which(!is.na(df[i, dataSourceB]))[1]
    df$sourceB[i] <- df[i, dataSourceB[firstB]]
    df$yearB[i] <- gsub(x = dataSourceB[firstB], pattern = "^.*X", replacement = "")

}

I tried to make the code specific to your example and desired output.
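
Since apply was mentioned as an alternative to the explicit loop, here is a minimal sketch of how that could look. It assumes column names in which the year follows an X, as the gsub pattern above implies. The toy data frame, its column names, and the helper first_non_na are made up for illustration and are not taken from the original data.

```r
# Toy data in the same wide layout; these column names are hypothetical
# stand-ins for the real ones in wb_wide.csv.
toy <- data.frame(
    NY_X1990 = c(12, NA, NA),
    NY_X1991 = c(13, 22, NA),
    SP_X1990 = c(87, NA, NA),
    SP_X1991 = c(90, 173, NA),
    row.names = c("A", "B", "C")
)

# Hypothetical helper: for one group of columns, return the first non-NA
# value in each row and the year parsed from the name of its column.
first_non_na <- function(data, cols) {
    m <- as.matrix(data[, cols, drop = FALSE])
    # index of the first non-NA column per row (NA if the whole row is NA)
    idx <- apply(m, 1, function(r) which(!is.na(r))[1])
    data.frame(
        year  = as.integer(sub("^.*X", "", cols[idx])),
        # matrix indexing by (row, column) pairs; an NA index yields NA
        value = m[cbind(seq_len(nrow(m)), idx)],
        row.names = rownames(data)
    )
}

a <- first_non_na(toy, grep("NY", names(toy), value = TRUE))
b <- first_non_na(toy, grep("SP", names(toy), value = TRUE))

# append the four new columns to the original data
result <- cbind(toy,
                yearA = a$year, sourceA = a$value,
                yearB = b$year, sourceB = b$value)
```

In this sketch, a row with no data at all for a source (row C in the toy data) ends up with NA in both the year and the value column, which may or may not be what you want for such countries.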

This is great! Thank you so much!! Very helpful explanation. – Jim