R: nested for loop iterating over rows and column names

Here's a Dropbox link to a .csv of my data.

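I have country-level data from 1990 to 2010. The data is in wide format: each country is a row, and each year has two columns, one for each of two data sources. However, the data for some countries is incomplete. For example, a country row may have NA values in the columns for 1990-1995.

I want to create two columns where, for each country row, the value is the earliest non-NA value for each of the two data types.

I also want to create two other columns where, for each country row, the value is the earliest non-NA year for each of the two data types.

So the final four columns would look something like this: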

1990, 12, 1990, 87 
1990, 7, 1990, 132 
1996, 22, 1996, 173 
1994, 14, 1994, 124

Here is my rough, half-pseudocode attempt at the nested for loop I have in mind:

for i in (number of rows){ 
    for j in names(df){ 
    if(is.na(df$j) == FALSE) df$earliest_year = j 
    } 
}
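
Two details matter when turning this sketch into running R. First, `df$j` looks for a column literally named `j`, not the column whose name is stored in `j`; `df[i, j]` or `df[[j]]` does the lookup by value. Second, without a `break` the loop keeps overwriting the result, so it ends on the last non-NA column rather than the first. A minimal runnable version on a made-up two-row data frame (the names `toy`, `X1990`, `X1991` and `X1992` are placeholders, not the real column names) might look like this:

```r
# Toy data, invented for illustration only
toy <- data.frame(X1990 = c(NA, 5), X1991 = c(3, 6), X1992 = c(4, NA))
toy$earliest_year <- NA_character_

for (i in seq_len(nrow(toy))) {
    for (j in c("X1990", "X1991", "X1992")) {
        # toy[i, j] looks the column up by the name held in j
        if (!is.na(toy[i, j])) {
            toy$earliest_year[i] <- j
            break  # stop at the first non-NA column for this row
        }
    }
}
```

Rows that are entirely NA simply keep the `NA` they were initialised with.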

How can I generate these four desired columns? Thanks!

Source

2017-04-09 Jim

You mentioned loops, so I tried a for loop. You may want to look at other R functions later, such as apply. The code is a bit lengthy, but I hope it helps:

# Read the data; I'm assuming the first column holds row names and is not important
df <- read.csv("wb_wide.csv", row.names = 1)

# Get the column names for the two data sources.
# Here I use grep to find column names matching the NY and SP patterns;
# if the columns consistently alternate, you could use a number sequence instead
dataSourceA <- grep("NY", names(df), value = TRUE)
dataSourceB <- grep("SP", names(df), value = TRUE)

# Create the new columns, pre-filled with NA.
# If I understand correctly: the first non-NA value from source A and
# source B, and then the year of each of these non-NA values
df$sourceA <- NA_real_
df$yearA <- NA_character_
df$sourceB <- NA_real_
df$yearB <- NA_character_

# Loop over the rows
for (i in seq_len(nrow(df))) {

    # Position of the first non-NA value among the source A columns
    # (NA if every source A column is missing for this row)
    firstA <- which(!is.na(df[i, dataSourceA]))[1]
    if (!is.na(firstA)) {
        # Store that first non-NA value in the sourceA column
        df$sourceA[i] <- df[i, dataSourceA[firstA]]
        # Use gsub to clean the column name so that only the year is left;
        # you can also skip this and clean up afterward
        df$yearA[i] <- gsub(pattern = "^.*X", replacement = "", x = dataSourceA[firstA])
    }

    # Same steps for source B
    firstB <- which(!is.na(df[i, dataSourceB]))[1]
    if (!is.na(firstB)) {
        df$sourceB[i] <- df[i, dataSourceB[firstB]]
        df$yearB[i] <- gsub(pattern = "^.*X", replacement = "", x = dataSourceB[firstB])
    }

}

I tried to make the code specific to your example and desired output.
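
For comparison, here is a rough sketch of the same idea using `apply` instead of an explicit loop over rows. The helper `first_non_na` and the toy data frame are made up for illustration; the column names only mimic the `NY`/`SP` pattern and are not taken from the real file.

```r
# Toy data, invented for illustration only
toy <- data.frame(
    NY_X1990 = c(NA, 7), NY_X1991 = c(12, 8),
    SP_X1990 = c(NA, 132), SP_X1991 = c(NA, 140)
)

# Hypothetical helper: for each row, return the first non-NA value
# and the year taken from the name of the column it came from
# (both NA if the row has no data for this source)
first_non_na <- function(data, cols) {
    m <- as.matrix(data[, cols, drop = FALSE])
    # Column position of the first non-NA entry in each row
    idx <- apply(m, 1, function(r) which(!is.na(r))[1])
    data.frame(
        value = m[cbind(seq_len(nrow(m)), idx)],
        year = gsub("^.*X", "", cols[idx])
    )
}

resA <- first_non_na(toy, grep("NY", names(toy), value = TRUE))
resB <- first_non_na(toy, grep("SP", names(toy), value = TRUE))
```

The two results can then be attached to the original data with `cbind` or by assigning the `value` and `year` columns one at a time. This assumes the source columns are already in chronological order, as the loop version does.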

Source

2017-04-09 02:17:32 din

This is great! Thank you so much!! Very helpful explanation. – Jim
