2016-12-26 35 views
1

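I have a problem working with three data.frames in R: how do I import a variable from one data frame into another, based on a logical test that matches variables/observations?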

The first one comes from UNHCR (I only kept the European countries):

'data.frame': 41 obs. of 6 variables: 
$ Country  : chr "Albania" "Austria" "Belarus" "Belgium" ... 
$ 2005   : num 92 62770 13202 34593 199518 ... 
$ 2011   : num 106 72046 8036 42107 177653 ... 
$ 2012   : num 7560 74712 7607 41053 177260 ... 
$ 2013   : num 7767 78956 7404 39578 163730 ... 
$ 2014   : num 8026 79285 7628 41719 144115 ... 

The second one comes from the World Bank (population, total):

'data.frame': 2640 obs. of 11 variables: 
$ iso2c  : chr "1A" "1A" "1A" "1A" ... 
$ country : chr "Arab World" "Arab World" "Arab World" "Arab World" ... 
$ SP.POP.TOTL: num 3.37e+08 3.21e+08 3.13e+08 3.29e+08 3.77e+08 ... 
$ year  : num 2008 2006 2005 2007 2013 ... 
$ iso3c  : Factor w/ 248 levels "ABW","AFG","AGO",..: 6 6 6 6 6 6 6 6 6 6 ... 
$ region  : Factor w/ 8 levels "Aggregates","East Asia & Pacific (all income levels)",..: 1 1 1 1 1 1 1 1 1 1 ... 
$ capital : Factor w/ 211 levels "","Abu Dhabi",..: 1 1 1 1 1 1 1 1 1 1 ... 
$ longitude : Factor w/ 211 levels "","-0.126236",..: 1 1 1 1 1 1 1 1 1 1 ... 
$ latitude : Factor w/ 211 levels "","-0.229498",..: 1 1 1 1 1 1 1 1 1 1 ... 
$ income  : Factor w/ 7 levels "Aggregates","High income: nonOECD",..: 1 1 1 1 1 1 1 1 1 1 ... 
$ lending : Factor w/ 5 levels "Aggregates","Blend",..: 1 1 1 1 1 1 1 1 1 1 ... 

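Finally, the last one comes from UNDP (Human Development Index). I can't post the link because I don't have enough "reputation" to post more than 2 links, but you can easily find it with a quick Google search: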

'data.frame': 189 obs. of 13 variables: 
$ HDI.Rank: int 171 85 83 34 149 58 40 85 2 23 ... 
$ Country : chr "Afghanistan" "Albania" "Algeria" "Andorra" ... 
$ X1980 : Factor w/ 96 levels "","0.190","0.199",..: 4 59 1 1 1 1 71 1 1 85 ... 
$ X1985 : Factor w/ 104 levels "","0.199","0.207",..: 5 62 1 1 1 1 77 1 1 93 ... 
$ X1990 : Factor w/ 127 levels "","0.214","0.218",..: 9 68 53 1 1 1 89 70 127 115 ... 
$ X1995 : Factor w/ 129 levels "","0.232","0.241",..: 10 64 56 1 1 1 96 59 128 115 ... 
$ X2000 : Factor w/ 144 levels "","0.257","0.284",..: 13 74 68 1 19 1 107 69 143 127 ... 
$ X2005 : Factor w/ 153 levels "","0.289","0.324",..: 17 85 79 1 29 1 112 85 152 131 ... 
$ X2010 : Factor w/ 159 levels "0.326","0.362",..: 17 90 92 128 35 115 124 89 158 142 ... 
$ X2011 : Factor w/ 159 levels "0.333","0.368",..: 18 94 95 126 37 115 125 91 158 143 ... 
$ X2012 : Factor w/ 163 levels "0.342","0.373",..: 20 95 97 133 36 114 129 94 162 143 ... 
$ X2013 : Factor w/ 161 levels "0.345","0.348",..: 20 91 92 130 37 111 126 90 160 141 ... 
$ X2014 : Factor w/ 160 levels "0.348","0.350",..: 18 88 90 130 37 110 125 88 159 140 ... 

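What I am trying to do is pull some key variables from the second data.frame (SP.POP.TOTL) and the third data.frame (X2005, X2011, X2012, X2013, X2014) into my first data frame (the UNHCR-based one).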

I managed to do this with the second data.frame (the sum is there because I also needed to regroup two countries into one, to harmonize base$Country (UNHCR) with pop$country (World Bank)):

>base$pop2005 = NA 
>for(pays in base$Country){base$pop2005[base$Country == pays] = sum(pop$SP.POP.TOTL[pop$year == 2005 & pop$country == pays])} 

Repeating the same command (with pop$year == 2011, etc.) gave me an updated version of the first data.frame:

'data.frame': 41 obs. of 11 variables: 
$ Country  : chr "Albania" "Austria" "Belarus" "Belgium" ... 
$ 2005   : num 92 62770 13202 34593 199518 ... 
$ 2011   : num 106 72046 8036 42107 177653 ... 
$ 2012   : num 7560 74712 7607 41053 177260 ... 
$ 2013   : num 7767 78956 7404 39578 163730 ... 
$ 2014   : num 8026 79285 7628 41719 144115 ... 
$ pop2005  : num 3011487 8227829 9663000 10478617 3833377 ... 
$ pop2011  : num 2904780 8391643 9473000 11047744 3832310 ... 
$ pop2012  : num 2900247 8429991 9464000 11128246 3828419 ... 
$ pop2013  : num 2896652 8479375 9466000 11182817 3823533 ... 
$ pop2014  : num 2893654 8541575 9483000 11231213 3817554 ... 
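
The repeated commands described above could also be collapsed into a single loop over the years. This is only a minimal sketch on toy data: `base_toy`, `pop_toy` and their values are made up for illustration, and country "B" has two rows per year to mimic the regrouping handled by `sum`.

```r
# Toy stand-ins for base and pop (hypothetical values)
base_toy <- data.frame(Country = c("A", "B"), stringsAsFactors = FALSE)
pop_toy <- data.frame(
  country     = c("A", "A", "B", "B", "B", "B"),
  year        = c(2005, 2011, 2005, 2005, 2011, 2011),
  SP.POP.TOTL = c(100, 110, 40, 60, 45, 65),
  stringsAsFactors = FALSE
)

# One new column per year: pop2005, pop2011, ...
for (yr in c(2005, 2011)) {
  base_toy[[paste0("pop", yr)]] <- vapply(
    base_toy$Country,
    # Sum every pop row matching this country and this year
    function(pays) sum(pop_toy$SP.POP.TOTL[pop_toy$year == yr & pop_toy$country == pays]),
    numeric(1),
    USE.NAMES = FALSE
  )
}
```

With the real data, the year vector would presumably be `c(2005, 2011:2014)`.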

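But for some reason, which is either mysterious or down to my lack of knowledge, the same command does not work with the third data.frame. This is the command I tried: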

>base$idh2005 = NA 
>for(pays in base$Country)base$idh2005[base$Country == pays] = hdi$X2005[hdi$Country == pays] 

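Strictly speaking, this command runs. But instead of the variable hdi$X2005, I get the variable HDI.Rank (even though it appears nowhere in the command). I tried to remove the variable HDI.Rank with hdi$HDI.Rank <- NULL, but that changed nothing.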
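
One likely explanation, offered as an assumption rather than something confirmed in the thread: the str() output shows that hdi$X2005 is a factor, and assigning factor elements into an existing NA column stores the underlying integer level codes, not the printed values. Those codes can easily be mistaken for ranks. The sketch below reproduces the effect on toy data (`hdi_toy`, `base_toy` and the HDI values are made up) and converts the factor with `as.numeric(as.character(...))` before copying.

```r
# Toy stand-ins (hypothetical values); X2005 is deliberately a factor
hdi_toy <- data.frame(Country = c("Albania", "Austria"), stringsAsFactors = FALSE)
hdi_toy$X2005 <- factor(c("0.696", "0.851"))
base_toy <- data.frame(Country = c("Austria", "Albania"), stringsAsFactors = FALSE)

# Same pattern as the loop in the question
base_toy$idh2005 <- NA
for (pays in base_toy$Country) {
  base_toy$idh2005[base_toy$Country == pays] <- hdi_toy$X2005[hdi_toy$Country == pays]
}
codes <- base_toy$idh2005   # integer level codes, not HDI values

# Convert the factor to numbers through character first, then copy by country
hdi_toy$X2005 <- as.numeric(as.character(hdi_toy$X2005))
fixed <- hdi_toy$X2005[match(base_toy$Country, hdi_toy$Country)]
base_toy$idh2005 <- fixed
```

Calling `as.numeric()` directly on a factor would return the same level codes, which is why the intermediate `as.character()` step matters.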

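What am I doing wrong in this command? (By the way, all the country names in the first, second and third data frames have been corrected so that they match.)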

Thank you very much in advance for taking the time to read this, and to anyone able to reply! :)

Post scriptum: as requested by allinr, here is the dput of my data frames.

base (UNHCR):

structure(list(Country = c("Albania", "Austria", "Belarus", "Belgium", 
"Bosnia and Herzegovina", "Bulgaria", "Croatia", "Cyprus", "Czech Republic", 
"Denmark", "Estonia", "Finland", "France", "Georgia", "Germany", 
"Greece", "Hungary", "Iceland", "Ireland", "Italy", "Latvia", 
"Liechtenstein", "Lithuania", "Luxembourg", "Malta", "Monaco", 
"Montenegro", "Netherlands", "Norway", "Portugal", "Moldova", 
"Romania", "Serbia and Kosovo (S/RES/1244 (1999))", "Slovak Republic", 
"Slovenia", "Spain", "Sweden", "Switzerland", "Macedonia, FYR", 
"Ukraine", "United Kingdom"), `2005` = c(92, 62770, 13202, 34593, 
199518, 5218, 10867, 13769, 2726, 45457, 136015, 12658, 179541, 
238602, 783980, 14257, 8922, 375, 9531, 21615, 418658, 210, 9295, 
1896, 2088, 0, 0, 139725, 44177, 363, 1775, 2720, 486938, 3075, 
1151, 5401, 96402, 63448, 4448, 76860, 316590), `2011` = c(106, 
72046, 8036, 42107, 177653, 7072, 23944, 6562, 3358, 18712, 97808, 
15209, 260641, 276088, 659820, 45810, 5555, 245, 13687, 72763, 
312688, 153, 4379, 4728, 8409, 37, 21248, 87956, 54960, 653, 
2268, 2158, 309391, 840, 262, 6935, 117399, 67426, 2921, 46625, 
211461), `2012` = c(7560, 74712, 7607, 41053, 177260, 3558, 23998, 
6267, 4882, 17706, 94305, 13903, 268966, 281870, 682000, 38527, 
4551, 297, 11871, 79654, 281056, 124, 5077, 4326, 9015, 37, 20224, 
85056, 54835, 1233, 2259, 1620, 304567, 2448, 311, 7337, 127871, 
72532, 2541, 42904, 169764), `2013` = c(7767, 78956, 7404, 39578, 
163730, 8880, 19875, 6534, 4798, 19383, 91376, 14906, 285468, 
259574, 335562, 73027, 4439, 477, 11581, 90267, 268143, 116, 
4882, 2116, 10808, 34, 20200, 77137, 54585, 1357, 2366, 2217, 
291139, 2562, 249, 9251, 163999, 74678, 2977, 41720, 151840), 
    `2014` = c(8026, 79285, 7628, 41719, 144115, 17898, 19599, 
    7602, 5120, 26844, 88262, 15874, 309525, 265589, 455439, 
    42882, 18675, 426, 10577, 140626, 263230, 173, 4796, 2473, 
    6275, 33, 20978, 91393, 56472, 1052, 2523, 2878, 271473, 
    2678, 354, 13582, 226149, 83628, 3260, 867451, 154292), pop2005 = c(3011487, 
    8227829, 9663000, 10478617, 3833377, 7739900, 4442000, 1032586, 
    10211216, 5419432, 1354775, 5246096, 63179356, 4190000, 82469422, 
    10987314, 10087065, 296734, 4159914, 57969484, 2238799, 34852, 
    3322528, 465158, 403834, 33808, 614261, 16319868, 4623291, 
    10503330, 3595187, 21319685, 9146549, 5372807, 2000474, 43653155, 
    9029572, 7437115, 2042894, 47105150, 60401206), pop2011 = c(2904780, 
    8391643, 9473000, 11047744, 3832310, 7348328, 4280622, 1116644, 
    10496088, 5570572, 1327439, 5388272, 65342776, 3875000, 81797673, 
    11104899, 9971727, 319014, 4576794, 59379449, 2059709, 36537, 
    3028115, 518347, 416268, 37189, 620079, 16693074, 4953088, 
    10557560, 3559986, 20147528, 9025056, 5398384, 2052843, 46742697, 
    9449213, 7912398, 2065888, 45706100, 63258918), pop2012 = c(2900247, 
    8429991, 9464000, 11128246, 3828419, 7305888, 4267558, 1129303, 
    10510785, 5591572, 1322696, 5413971, 65659790, 3825000, 80425823, 
    11045011, 9920362, 320716, 4586897, 59539717, 2034319, 36791, 
    2987773, 530946, 419455, 37404, 620601, 16754962, 5018573, 
    10514844, 3559519, 20058035, 9004277, 5407579, 2057159, 46773055, 
    9519374, 7996861, 2069270, 45593300, 63700300), pop2013 = c(2896652, 
    8479375, 9466000, 11182817, 3823533, 7265115, 4255689, 1141652, 
    10514272, 5614932, 1317997, 5438972, 65972097, 3776000, 82132753, 
    10965211, 9893082, 323764, 4598294, 60233948, 2012647, 37040, 
    2957689, 543360, 423374, 37528, 621207, 16804432, 5079623, 
    10457295, 3558566, 19983693, 8982249, 5413393, 2059953, 46620045, 
    9600379, 8089346, 2072543, 45489600, 64128226), pop2014 = c(2893654, 
    8541575, 9483000, 11231213, 3817554, 7223938, 4238389, 1153658, 
    10525347, 5643475, 1314545, 5461512, 66495940, 3727000, 80982500, 
    10892413, 9866468, 327386, 4617225, 60789140, 1993782, 37286, 
    2932367, 556319, 427364, 37623, 621810, 16865008, 5137232, 
    10401062, 3556397, 19908979, 8943347, 5418649, 2061980, 46480882, 
    9696110, 8188649, 2075625, 45362900, 64613160), pourcentage2005 = c(0.00305496918963954, 
    0.762898694175584, 0.136624236779468, 0.330129443608827, 
    5.20475810232075, 0.0674168916911071, 0.244642053129221, 
    1.33344825515744, 0.0266961349167425, 0.838777938352211, 
    10.039674484693, 0.24128418542093, 0.284176685814904, 5.6945584725537, 
    0.6152, 0.129758738122893, 0.0884499108511742, 
    0.126375811332709, 0.229115313441576, 0.0372868594103753, 
    18.7001155530264, 0.602547916905773, 0.279756859836847, 0.407603437971614, 
    0.51704413199483, 0, 0, 0.856165013099371, 0.955531460165497, 
    0.00345604679658737, 0.0493715625918763, 0.0127581622336353, 
    5.3237346675779, 0.0572326532481066, 0.05753636388176, 0., 
    1.06762535367125, 0.853126514784295, 0.217730337452653, 0.16316687241204, 
    0.524145163591601), pourcentage2011 = c(0.0036491575954117, 
    0.858544625885539, 0.0848305710968014, 0.381136637489066, 
    4.63566360758916, 0.0962395799425393, 0.559357962464333, 
    0.587653719538188, 0.0319928720109816, 0.335908053966451, 
    7.3681728501272, 0.282261177609445, 0.398882655368055, 7.12485161290323, 
    0.806648863959736, 0.412520636162472, 0.0557075018198954, 
    0.0767991373419348, 0.299052131251701, 0.122539028612408, 
    15.1811736512294, 0.418753592248953, 0.144611416673409, 0.912130291098434, 
    2.02009282481478, 0.0994917852053026, 3.42666015136781, 0.526901156731229, 
    1.10961081248708, 0.00618514126370108, 0.063708115706073, 
    0.0107109914427219, 3.42813385313066, 0.0155602120931005, 
    0.0127627879969389, 0.014836542273117, 1.24242092965837, 
    0.852156324795593, 0.14139198252761, 0.102010453746874, 0.33427855974394 
    ), pourcentage2012 = c(0.260667453496202, 0.886264291385364, 
    0.0803782755705833, 0.368908092074888, 4.63010971369644, 
    0.0487004454489311, 0.562335649568207, 0.554944067269811, 
    0.0464475298467241, 0.316655137410374, 7.1297561949231, 0.256798568001195, 
    0.409635790793726, 7.36915032679739, 0.847986348862106, 0.348818122498927, 
    0.0458753420490099, 0.0926052956509809, 0.258802410431278, 
    0.13378296709069, 13.8157289982545, 0.337038949743144, 0.169925894637913, 
    0.814772123718796, 2.14921743691218, 0.0989199016148005, 
    3.25877657303163, 0.50764663029376, 1.09264127472092, 0.0117262795339617, 
    0.0634636309006919, 0.00807656383090367, 3.38247035270017, 
    0.0452697963358464, 0.015117936921745, 0.0156863818281701, 
    1.34327110165017, 0.90700588643469, 0.122796928385373, 0.0941015456218348, 
    0.266504239383488), pourcentage2013 = c(0.268137145918806, 
    0.931153534311196, 0.0782167758292838, 0.353917979700464, 
    4.28216521212188, 0.122227934451141, 0.467021908790797, 0.572328520424788, 
    0.0456332116954935, 0.345204536760196, 6.93294446041986, 
    0.274059142058462, 0.432710210803213, 6.87431144067797, 0.408560516655274, 
    0.665988096353093, 0.0448697382676096, 0.147329536328931, 
    0.251854274650555, 0.149860673253561, 13.3229026252492, 0.31317494600432, 
    0.165061302929416, 0.389428739693757, 2.55282563407295, 0.0905990193988489, 
    3.25173412405205, 0.459027713641258, 1.07458762195541, 0.0129765871575776, 
    0.0664874559021808, 0.0110940455300229, 3.24127064391112, 
    0.0473270645600643, 0.012087654427067, 0.019843395689558, 
    1.70825547616401, 0.923164864007548, 0.143639963079174, 0.0917132707256164, 
    0.236775612660796), pourcentage2014 = c(0.277365573078191, 
    0.928224595581026, 0.0804386797426975, 0.37145587034989, 
    3.77506120411132, 0.247759601480522, 0.462416262405362, 0.658947452364566, 
    0.0486444769944402, 0.475664373457843, 6.71426234933, 0.290652112455305, 
    0.465479546570813, 7.12607995707003, 0.562391874787763, 0.393686871770286, 
    0.189277459775879, 0.130121630124685, 0.229076988884016, 
    0.231334083686659, 13.2025467177455, 0.463981118918629, 0.16355387985201, 
    0.444529128072203, 1.46830336668507, 0.0877123036440475, 
    3.37369936154131, 0.541909022515732, 1.09926902269549, 0.010114351784462, 
    0.0709425859936334, 0.0144557890186132, 3.03547430285328, 
    0.0494219131004795, 0.0171679647717243, 0.0292206159082782, 
    2.33236834153078, 1.02126736657048, 0.157061126166817, 1.91224767375983, 
    0.238793459412912)), .Names = c("Country", "2005", "2011", 
"2012", "2013", "2014", "pop2005", "pop2011", "pop2012", "pop2013", 
"pop2014", "pourcentage2005", "pourcentage2011", "pourcentage2012", 
"pourcentage2013", "pourcentage2014"), row.names = c(NA, 41L), class = "data.frame") 

And for hdi: the body is limited to 30000 characters, and with the dput of this data frame I reach 47259, so I will post it in another post.

Scott

+0

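@allinr I don't have room to post both dputs. Please let me know when you have finished reading the dput of the first data.frame, and I will post the third (or the second, if you need it). Thank you very much for your help :) Scott –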

+0

Am I missing something, or don't you just need to merge all three data frames by the country column? – Parfait

Answers

1

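I am making the following two assumptions (let me know if they do not hold):

  1. The first column of each data.frame is the country name
  2. The same country names appear across the data.frames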

    # Generate example data from R's built-in mtcars data 
    car_names = rownames(mtcars) 
    list1 = cbind(car_names, data.frame(mtcars[, c(1, 2, 3, 4)])) 
    list2 = cbind(car_names, data.frame(mtcars[, c(5, 6, 7, 8)])) 
    list3 = cbind(car_names, data.frame(mtcars[, c(9, 10, 11)]))  # am, gear, carb 
    # Remove rownames 
    rownames(list1) = NULL 
    rownames(list2) = NULL 
    rownames(list3) = NULL 
    # Now the example data is ready 
    
    # We will use the lookup() function from the qdapTools package 
    library(qdapTools) 
    
    # Add specific values from list2 or list3 to list1 based on car_names. 
    # lookup() expects a two-column key (match column, value column), 
    # so pass only car_names and the column to import. 
    list1$gear = lookup(list1$car_names, list3[, c("car_names", "gear")]) # Add gear from list3 to list1 
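
The same lookup can be sketched in base R with `match()`, which avoids the extra package. This is an illustrative alternative, not part of the original answer; it rebuilds the example tables and shuffles `list3` to show that row order does not matter.

```r
# Rebuild two of the example tables from mtcars
car_names <- rownames(mtcars)
list1 <- data.frame(car_names, mtcars[, 1:4], stringsAsFactors = FALSE)
list3 <- data.frame(car_names, mtcars[, 9:11], stringsAsFactors = FALSE)

# Reverse list3 so its rows are no longer aligned with list1
list3 <- list3[rev(seq_len(nrow(list3))), ]

# For each car in list1, find its row position in list3, then pull gear
idx <- match(list1$car_names, list3$car_names)
list1$gear <- list3$gear[idx]
```

Names missing from `list3` would give `NA` in `idx`, and therefore `NA` in the imported column.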
    
+0

This is perfect, thank you very much for your answer :) –

0

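Consider reshaping the pop data frame, since it is in long format and you need it in wide format for the selected years, and then merging all the data frames. The code below renames the columns and runs a chain merge with Reduce over a list of the data frames: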

# RESHAPE DATAFRAME LONG TO WIDE (SELECTED YEARS ONLY) 
popwide <- reshape(pop[pop$year %in% c(2005, 2011:2014), c("country", "year", "SP.POP.TOTL")], 
                   idvar = "country", timevar = "year", direction = "wide") 
# PREFIX YEAR COLS WITH pop (reshape names them SP.POP.TOTL.<year>) 
names(popwide) <- gsub("SP.POP.TOTL.", "pop", names(popwide), fixed = TRUE) 

# RENAMING TO LOWER CASE country FOR CHAIN MERGE 
names(base)[names(base) == "Country"] <- "country" 
names(hdi)[names(hdi) == "Country"] <- "country" 
# PREFIX YEAR COLS WITH idh 
names(hdi) <- gsub("^X", "idh", names(hdi)) 

# CHAIN MERGE 
finaldf <- Reduce(function(...) merge(..., by = "country", all = TRUE), list(base, popwide, hdi))
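
To see what the reshape and chain merge produce, here is a minimal sketch on toy data. The population figures are taken from the question, while `refugees2005`, `idh2005`, their values and all `*_toy` names are made up for illustration.

```r
# Toy stand-ins for base, pop and hdi
base_toy <- data.frame(country = c("Albania", "Austria"),
                       refugees2005 = c(92, 62770),
                       stringsAsFactors = FALSE)
pop_toy <- data.frame(country = c("Albania", "Albania", "Austria", "Austria"),
                      year = c(2005, 2011, 2005, 2011),
                      SP.POP.TOTL = c(3011487, 2904780, 8227829, 8391643),
                      stringsAsFactors = FALSE)
hdi_toy <- data.frame(country = c("Albania", "Austria"),
                      idh2005 = c(0.70, 0.85),  # hypothetical values
                      stringsAsFactors = FALSE)

# Long to wide: one row per country, one SP.POP.TOTL.<year> column per year
popwide_toy <- reshape(pop_toy, idvar = "country", timevar = "year",
                       direction = "wide")
names(popwide_toy) <- sub("SP.POP.TOTL.", "pop", names(popwide_toy), fixed = TRUE)

# Chain merge: merge(merge(base, popwide), hdi), keeping unmatched rows
final_toy <- Reduce(function(x, y) merge(x, y, by = "country", all = TRUE),
                    list(base_toy, popwide_toy, hdi_toy))
```

Because `all = TRUE` keeps countries that appear in only one table, any name mismatch shows up as a row of `NA` values rather than being dropped silently.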