2017-05-04
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015) 

Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,632,720,814,920,1044,1186) 

How can I find the population for the years 1925, 1963, 1978, 1988 and 1998 using polynomial linear regression?

Polynomial model in R


One place to look is the [Formula](https://stackoverflow.com/documentation/r/1061/formula#t=201705041558583178259) topic in the documentation. – G5W

Answer


Here is a starting point for solving your problem.

Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965, 
      1970,1975,1980,1985,1990,1995,2000,2005,2010,2015) 
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550, 
      632,720,814,920,1044,1186) 
df <- data.frame(Year, Africa) 

# Polynomial linear regression of order 5 
model1 <- lm(Africa ~ poly(Year,5), data=df) 
summary(model1) 

########### 
Call: 
lm(formula = Africa ~ poly(Year, 5), data = df) 

Residuals:
    Min      1Q  Median      3Q     Max
-59.639 -27.119 -12.397   9.149  97.398

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      411.32      10.12  40.643  < 2e-16 ***
poly(Year, 5)1   881.26      47.47  18.565 3.01e-12 ***
poly(Year, 5)2   768.50      47.47  16.190 2.42e-11 ***
poly(Year, 5)3   709.43      47.47  14.945 8.07e-11 ***
poly(Year, 5)4   628.45      47.47  13.239 4.89e-10 ***
poly(Year, 5)5   359.04      47.47   7.564 1.14e-06 ***
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 47.47 on 16 degrees of freedom 
Multiple R-squared: 0.9852, Adjusted R-squared: 0.9805 
F-statistic: 212.5 on 5 and 16 DF, p-value: 4.859e-14 
############# 

pred <- predict(model1) 
plot(Year, Africa, type="o", xlab="Year", ylab="Africa") 
lines(Year, pred, lwd=2, col="red") 

[Figure: observed Africa population by year (points and line) with the order-5 polynomial fit in red]

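The model estimated above fits poorly for years before 1900. It is therefore preferable to estimate the model using only the data after 1900.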
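One way to check where the order-5 model fits poorly is to list its residuals year by year. The sketch below refits the same model; the `check` data frame, the `period` column and the 1900 cut-off used for grouping are illustrative choices, not part of the original answer.

```r
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
          1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
            632,720,814,920,1044,1186)
df <- data.frame(Year, Africa)
model1 <- lm(Africa ~ poly(Year, 5), data = df)

# Observed value, fitted value and residual (observed - fitted) for each year
check <- data.frame(Year   = df$Year,
                    Africa = df$Africa,
                    fitted = unname(fitted(model1)),
                    resid  = unname(residuals(model1)))

# Illustrative grouping (an assumption): years before 1900 vs. 1900 onwards
check$period <- ifelse(check$Year < 1900, "before 1900", "1900 onwards")
print(check)

# Mean absolute residual within each period
mean_abs_resid <- tapply(abs(check$resid), check$period, mean)
print(mean_abs_resid)
```

Large residuals relative to the observed values in the early rows would support restricting the fit to recent years.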

# Polynomial linear regression of order 2 
df2 <- subset(df,Year>1900) 
model2 <- lm(Africa ~ poly(Year,2), data=df2) 
summary(model2) 

########### 
Call: 
lm(formula = Africa ~ poly(Year, 2), data = df2) 

Residuals:
   Min     1Q Median     3Q    Max
-9.267 -2.489 -0.011  3.334 12.482

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     586.857      1.677  349.93  < 2e-16 ***
poly(Year, 2)1 1086.646      6.275  173.17  < 2e-16 ***
poly(Year, 2)2  245.687      6.275   39.15 3.65e-13 ***
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 6.275 on 11 degrees of freedom 
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9996 
F-statistic: 1.576e+04 on 2 and 11 DF, p-value: < 2.2e-16 
########### 

df2$pred <- predict(model2) 
plot(df2$Year, df2$Africa, type="o", xlab="Year", ylab="Africa") 
lines(df2$Year, df2$pred, lwd=2, col="red") 

The fit of this second model, estimated only on the data after 1900, is clearly better:

[Figure: observed Africa population for years after 1900 (points and line) with the order-2 polynomial fit in red]
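Note that the two R-squared values are computed on different sets of years, so they are not directly comparable. A minimal sketch of a like-for-like check, assuming it is reasonable to score both models on the years after 1900 only; the helper `rmse` is a hypothetical name introduced here for illustration.

```r
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
          1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
            632,720,814,920,1044,1186)
df  <- data.frame(Year, Africa)
df2 <- subset(df, Year > 1900)

model1 <- lm(Africa ~ poly(Year, 5), data = df)   # order 5, all years
model2 <- lm(Africa ~ poly(Year, 2), data = df2)  # order 2, years after 1900

# Root mean squared error between observed and predicted values
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# Score both models on the same rows (years after 1900)
rmse_model1 <- rmse(df2$Africa, predict(model1, newdata = df2))
rmse_model2 <- rmse(df2$Africa, predict(model2, newdata = df2))
print(c(order5_all_years = rmse_model1, order2_after_1900 = rmse_model2))
```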

Finally, we get the model predictions for the years 1925, 1963, 1978, 1988 and 1998:

df3 <- data.frame(Year=c(1925, 1963, 1978, 1988, 1998)) 
df3$pred <- predict(model2, newdata=df3) 
df3 

  Year     pred
1 1925 286.4863
2 1963 301.1507
3 1978 451.7210
4 1988 597.6301
5 1998 779.9623
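One caveat on the 1925 value: `subset(df, Year>1900)` keeps only 1950 to 2015, so 1925 lies outside the range the second model was fitted on. Its prediction (286.49) is higher than the observed values for both 1900 (133) and 1950 (229), which suggests the extrapolation should be treated with caution. The sketch below flags extrapolated years and adds 95% prediction intervals. The `extrapolated` column and the alternative fit `model2b`, which keeps 1900 in the data so that 1925 is interpolated, are assumptions added here and not part of the original answer.

```r
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
          1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
            632,720,814,920,1044,1186)
df  <- data.frame(Year, Africa)
df2 <- subset(df, Year > 1900)
model2 <- lm(Africa ~ poly(Year, 2), data = df2)

new_years <- data.frame(Year = c(1925, 1963, 1978, 1988, 1998))

# Point predictions with 95% prediction intervals (columns fit, lwr, upr)
pred_int <- predict(model2, newdata = new_years,
                    interval = "prediction", level = 0.95)
result <- cbind(new_years, pred_int)

# TRUE where the requested year is outside the range used to fit model2
result$extrapolated <- result$Year < min(df2$Year) | result$Year > max(df2$Year)
print(result)

# Alternative (an assumption): keep 1900 so that 1925 is interpolated
df2b <- subset(df, Year >= 1900)
model2b <- lm(Africa ~ poly(Year, 2), data = df2b)
pred_1925_b <- predict(model2b, newdata = data.frame(Year = 1925))
print(pred_1925_b)
```

Whether a single quadratic is adequate across the 1900 to 1950 gap is a judgment call, since there are no observations between those two years.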

Thanks, Marco Sandri. – Legend