A Quantitative Analysis of Changes in the Number of Enrolled Graduate Students
I. Research question
This paper takes the total number of enrolled graduate students as the dependent variable and analyzes it against several explanatory factors (listed below). Relevant data are collected and a model is built for a quantitative analysis. The model relates the total number of enrolled graduate students to its main influencing factors. The size of each factor's coefficient in the model equation is used to assess its importance, to identify which factors play the key role in changes in the number of graduate students, and to offer suggestions on the future trend in graduate enrollment.
The main factors affecting the total number of enrolled graduate students are as follows:
Per capita GDP - an important factor affecting the total number of enrolled graduate students (graduate study is costly, so only those with a certain economic base have more opportunity to pursue it)
Total population - an important factor, since it is the underlying source of students
Number of unemployed persons - a direct factor (when unemployment is high, more people choose to take the postgraduate entrance examination (Kaoyan) to improve their position in the job market)
Number of colleges and universities - a significant factor (more institutions of higher learning allow more people to take part in Kaoyan)
II. Model specification
Y = α + β1X1 + β2X2 + β3X3 + β4X4 + u
where:
Y - total number of enrolled graduate students (dependent variable)
X1 - per capita GDP (explanatory variable)
X2 - total population (explanatory variable)
X3 - number of unemployed persons (explanatory variable)
X4 - number of colleges and universities (explanatory variable)
III. Data collection
1. Data description
The model is fitted to time-series data for a single region (China).
2. Data
Annual time-series data for 1986 to 2005 are shown in Table 1.
Table 1: Data for 1986-2005

Year        Y      X1       X2      X3     X4
1986   110371     963   107507   264.4   1054
1987   120191    1112   109300   276.6   1063
1988   112776    1366   111026   296.2   1075
1989   101339    1519   112704   377.9   1075
1990    93018    1644   114333   383.2   1075
1991    88128    1893   115823   352.2   1075
1992    94164    2311   117171   363.9   1053
1993   106771    2998   118517   420.1   1065
1994   127935    4044   119850   476.4   1080
1995   145443    5046   121121   519.6   1054
1996   163322    5846   122389   552.8   1032
1997   176353    6420   123626   576.8   1020
1998   198885    6796   124761   571     1022
1999   233513    7159   125786   575     1071
2000   301239    7858   126743   595     1041
2001   393256    8622   127627   681     1225
2002   500980    9398   128453   770     1396
2003   651260   10542   129227   800     1552
2004   819896   12336   129988   827     1731
2005   978610   14040   130756   839     1792
IV. Model parameter estimation, testing and correction
1. Parameter estimation, economic interpretation and statistical inference tests
. twoway (scatter Y X1)
[Figure: scatter plot of Y against X1]
. twoway (scatter Y X2)
[Figure: scatter plot of Y against X2]
. twoway (scatter Y X3)
[Figure: scatter plot of Y against X3]
. twoway (scatter Y X4)
[Figure: scatter plot of Y against X4]
. graph twoway lfit Y X1
[Figure: fitted values of Y against X1]
. graph twoway lfit Y X2
[Figure: fitted values of Y against X2]
. graph twoway lfit Y X3
[Figure: fitted values of Y against X3]
. graph twoway lfit Y X4
[Figure: fitted values of Y against X4]
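. reg Y X1 X2 X3 X4

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  4,    15) =  945.14
       Model |  1.2988e+12     4  3.2471e+11           Prob > F      =  0.0000
    Residual |  5.1533e+09    15   343556320           R-squared     =  0.9960
-------------+------------------------------           Adj R-squared =  0.9950
       Total |  1.3040e+12    19  6.8631e+10           Root MSE      =   18535

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   59.22455   6.352288     9.32   0.000     45.68496    72.76413
          X2 |  -7.158603   3.257541    -2.20   0.044    -14.10189   -.2153182
          X3 |  -366.8774   157.9402    -2.32   0.035    -703.5189   -30.23585
          X4 |   621.3348   46.72257    13.30   0.000      521.748    720.9216
       _cons |   270775.2   369252.9     0.73   0.475    -516268.7     1057819
------------------------------------------------------------------------------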
Y = 59.22454816*X1 - 7.158602346*X2 - 366.8774279*X3 + 621.3347694*X4 + 270775.151
se = (6.352288) (3.257541) (157.9402) (46.72256) (369252.8)
t = (9.323341) (-2.197548) (-2.322889) (13.29839) (0.733306)
R-squared = 0.996048  Adjusted R-squared = 0.994994  F = 945.1415
DW = 1.596173
The t values of X1, X2, X3 and X4 are all significant at the 5% level, indicating that per capita GDP, total population, the registered urban unemployed population and the number of colleges and universities are the main factors affecting the total number of enrolled graduate students. The coefficient of determination is 0.996048 and the adjusted coefficient of determination is 0.994994; both are large, indicating a good fit. The F value of 945.1415 indicates that the model as a whole is significant.
In addition, the coefficients of X1 and X4 have the expected economic signs, but those of X2 and X3 do not. In economic terms, the total number of graduate students should rise with the total population (X2), and as the number of unemployed rises, more people should choose graduate school, so the number of unemployed (X3) and the number of graduate students should be positively correlated. The signs of the X2 and X3 coefficients are contrary to expectations, which may indicate severe multicollinearity.
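As a cross-check on the goodness-of-fit figures quoted above, the adjusted coefficient of determination and the F statistic can be recomputed from R-squared, the sample size and the number of regressors. The following is a minimal Python sketch and is not part of the original Stata session; the variable names are illustrative.

```python
# Recompute adjusted R-squared and the F statistic from the reported R-squared.
# n, k and r2 are the values reported for the four-variable regression.
n = 20          # number of observations (1986-2005)
k = 4           # number of explanatory variables (X1..X4)
r2 = 0.996048   # reported R-squared

# Adjusted R-squared penalizes R-squared for the degrees of freedom used.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# F statistic for the joint significance of all slope coefficients.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

print(f"adjusted R-squared = {adj_r2:.6f}")
print(f"F({k}, {n - k - 1}) = {f_stat:.2f}")
```

Under these inputs the two values should agree with the reported 0.994994 and 945.14, up to rounding of R-squared.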
2. Econometric tests
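. corr X1 X2 X3 X4
(obs=20)

             |       X1       X2       X3       X4
-------------+------------------------------------
          X1 |   1.0000
          X2 |   0.9422   1.0000
          X3 |   0.9808   0.9593   1.0000
          X4 |   0.8021   0.6165   0.7762   1.0000

. corr X1 X2
(obs=20)

             |       X1       X2
-------------+------------------
          X1 |   1.0000
          X2 |   0.9422   1.0000

. corr X1 X3
(obs=20)

             |       X1       X3
-------------+------------------
          X1 |   1.0000
          X3 |   0.9808   1.0000

. corr X1 X4
(obs=20)

             |       X1       X4
-------------+------------------
          X1 |   1.0000
          X4 |   0.8021   1.0000

. corr X2 X3
(obs=20)

             |       X2       X3
-------------+------------------
          X2 |   1.0000
          X3 |   0.9593   1.0000

. corr X2 X4
(obs=20)

             |       X2       X4
-------------+------------------
          X2 |   1.0000
          X4 |   0.6165   1.0000

. corr X3 X4
(obs=20)

             |       X3       X4
-------------+------------------
          X3 |   1.0000
          X4 |   0.7762   1.0000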
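The pairwise correlations in the output above can be recomputed directly from Table 1. The sketch below does this for X1 and X3 in plain Python. The variance inflation factor it also prints is an additional diagnostic that the original analysis does not report, and it is the two-regressor special case only; the function name is illustrative.

```python
import math

# X1 (per capita GDP) and X3 (number of unemployed), copied from Table 1.
X1 = [963, 1112, 1366, 1519, 1644, 1893, 2311, 2998, 4044, 5046,
      5846, 6420, 6796, 7159, 7858, 8622, 9398, 10542, 12336, 14040]
X3 = [264.4, 276.6, 296.2, 377.9, 383.2, 352.2, 363.9, 420.1, 476.4, 519.6,
      552.8, 576.8, 571, 575, 595, 681, 770, 800, 827, 839]

def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)

r = pearson(X1, X3)

# With only two regressors, the variance inflation factor is 1 / (1 - r^2).
# (Assumption: this pairwise form stands in for the full VIF, which would
# regress each variable on all of the others.)
vif_pair = 1 / (1 - r ** 2)

print(f"corr(X1, X3) = {r:.4f}, pairwise VIF = {vif_pair:.1f}")
```

A correlation close to the reported 0.9808 implies a pairwise variance inflation factor well above 10, a common rule-of-thumb threshold for problematic collinearity.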
The correlation results above show that the explanatory variables are highly positively correlated: X1 with X2, X3 and X4, and X2 with X1 and X3. This indicates serious multicollinearity. The model is corrected below by stepwise regression:
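. reg Y X1

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =   91.02
       Model |  1.0887e+12     1  1.0887e+12           Prob > F      =  0.0000
    Residual |  2.1529e+11    18  1.1961e+10           R-squared     =  0.8349
-------------+------------------------------           Adj R-squared =  0.8257
       Total |  1.3040e+12    19  6.8631e+10           Root MSE      = 1.1e+05

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   60.21977   6.311944     9.54   0.000     46.95887    73.48067
       _cons |  -61096.25   42959.23    -1.42   0.172    -151350.3    29157.75
------------------------------------------------------------------------------

Y = 60.21976901*X1 - 61096.25048
se = (6.311944) (42959.23)
t = (9.540606) (-1.422191)
Adjusted R-squared = 0.825725  F = 91.02316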
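. reg Y X2

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =   23.16
       Model |  7.3371e+11     1  7.3371e+11           Prob > F      =  0.0001
    Residual |  5.7028e+11    18  3.1682e+10           R-squared     =  0.5627
-------------+------------------------------           Adj R-squared =  0.5384
       Total |  1.3040e+12    19  6.8631e+10           Root MSE      = 1.8e+05

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X2 |   27.05878   5.622791     4.81   0.000     15.24574    38.87183
       _cons |   -2993786   680596.9    -4.40   0.000     -4423667    -1563905
------------------------------------------------------------------------------

Y = 27.05878289*X2 - 2993786.354
se = (5.622791) (680596.9)
t = (4.812340) (-4.398766)
R-squared = 0.562668  F = 23.15862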
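. reg Y X3

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =   57.87
       Model |  9.9463e+11     1  9.9463e+11           Prob > F      =  0.0000
    Residual |  3.0936e+11    18  1.7187e+10           R-squared     =  0.7628
-------------+------------------------------           Adj R-squared =  0.7496
       Total |  1.3040e+12    19  6.8631e+10           Root MSE      = 1.3e+05

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X3 |    1231.66   161.9045     7.61   0.000     891.5113    1571.809
       _cons |  -371863.7   90051.37    -4.13   0.001    -561054.6   -182672.8
------------------------------------------------------------------------------

Y = 1231.659997*X3 - 371863.6509
se = (161.9045) (90051.37)
t = (7.607324) (-4.129461)
Adjusted R-squared = 0.749576  F = 57.87138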
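. reg Y X4

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =  255.89
       Model |  1.2183e+12     1  1.2183e+12           Prob > F      =  0.0000
    Residual |  8.5699e+10    18  4.7610e+09           R-squared     =  0.9343
-------------+------------------------------           Adj R-squared =  0.9306
       Total |  1.3040e+12    19  6.8631e+10           Root MSE      =   69000

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X4 |    1053.52   65.85948    16.00   0.000     915.1542    1191.885
       _cons |  -964699.8   79072.71   -12.20   0.000     -1130825   -798574.2
------------------------------------------------------------------------------

Y = 1053.519847*X4 - 964699.7964
se = (65.85948) (79072.71)
t = (15.99648) (-12.20016)
Adjusted R-squared = 0.930628  F = 255.8874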
Among the four simple regressions, the linear regression of the total number of graduate students Y on the number of colleges and universities X4 has the best goodness of fit:
Y = 1053.519847*X4 - 964699.7964
se = (65.85948) (79072.71)
t = (15.99648) (-12.20016)
Adjusted R-squared = 0.930628  F = 255.8874
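The simple regression of Y on X4 above can be reproduced from Table 1 with the textbook closed-form least-squares formulas. The following plain-Python sketch is an illustration only; the function name simple_ols is hypothetical and the original estimates were obtained in Stata.

```python
# Y (enrolled graduate students) and X4 (number of colleges and universities),
# copied from Table 1.
Y = [110371, 120191, 112776, 101339, 93018, 88128, 94164, 106771, 127935, 145443,
     163322, 176353, 198885, 233513, 301239, 393256, 500980, 651260, 819896, 978610]
X4 = [1054, 1063, 1075, 1075, 1075, 1075, 1053, 1065, 1080, 1054,
      1032, 1020, 1022, 1071, 1041, 1225, 1396, 1552, 1731, 1792]

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns slope, intercept, R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r2 = slope * s_xy / s_yy   # explained sum of squares / total sum of squares
    return slope, intercept, r2

slope, intercept, r2 = simple_ols(X4, Y)
n = len(Y)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)   # one regressor plus a constant

print(f"Y = {slope:.4f}*X4 + ({intercept:.1f})")
print(f"R-squared = {r2:.4f}, adjusted R-squared = {adj_r2:.4f}")
```

Swapping X4 for another column of Table 1 gives the other three simple regressions, so the four adjusted R-squared values can be compared in the same way.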
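. reg Y X4 X1

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  2,    17) =  700.80
       Model |  1.2884e+12     2  6.4418e+11           Prob > F      =  0.0000
    Residual |  1.5627e+10    17   919210968           R-squared     =  0.9880
-------------+------------------------------           Adj R-squared =  0.9866
       Total |  1.3040e+12    19  6.8631e+10           Root MSE      =   30318

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X4 |   714.1694   48.45708    14.74   0.000     611.9339    816.4049
          X1 |   25.58238   2.930053     8.73   0.000     19.40051    31.76425
       _cons |  -708247.7   45496.23   -15.57   0.000    -804236.4   -612259.1
------------------------------------------------------------------------------

Y = 714.1694264*X4 + 25.58237739*X1 - 708247.7381
se = (48.45708) (2.930053) (45496.23)
t = (14.73818) (8.731029) (-15.56718)
Adjusted R-squared = 0.986606  F = 700.7988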
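. reg Y X4 X2

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  2,    17) =  302.26
       Model |  1.2683e+12     2  6.3416e+11           Prob > F      =  0.0000
    Residual |  3.5667e+10    17  2.0981e+09           R-squared     =  0.9726
-------------+------------------------------           Adj R-squared =  0.9694
       Total |  1.3040e+12    19  6.8631e+10           Root MSE      =   45805

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X4 |   886.3584    55.5267    15.96   0.000     769.2073    1003.509
          X2 |   8.974091   1.837722     4.88   0.000     5.096836    12.85135
       _cons |   -1852247   189180.7    -9.79   0.000     -2251383    -1453110
------------------------------------------------------------------------------

Y = 886.3583756*X4 + 8.974091045*X2 - 1852246.686
se = (55.52670) (1.837722) (189180.7)
t = (15.96274) (4.883269) (-9.790886)
Adjusted R-squared = 0.969430  F = 302.2581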
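. reg Y X4 X3

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  2,    17) =  299.57
       Model |  1.2680e+12     2  6.3401e+11           Prob > F      =  0.0000
    Residual |  3.5979e+10    17  2.1164e+09           R-squared     =  0.9724
-------------+------------------------------           Adj R-squared =  0.9692
       Total |  1.3040e+12    19  6.8631e+10           Root MSE      =   46004

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X4 |   791.5193   69.64253    11.37   0.000     644.5864    938.4521
          X3 |   436.7502   90.10899     4.85   0.000     246.6369    626.8636
       _cons |  -885870.1   55171.66   -16.06   0.000     -1002272   -769468.1
------------------------------------------------------------------------------

Y = 791.519267*X4 + 436.7502136*X3 - 885870.134
se = (69.64253) (90.10899) (55171.66)
t = (11.36546) (4.846910) (-16.05662)
Adjusted R-squared = 0.969163  F = 299.5666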
Comparing these results, adding per capita GDP (X1) gives the largest improvement, with Adjusted R-squared = 0.986606, and the t test of every parameter is significant, so X1 is retained.
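The two-variable step can also be checked outside Stata. With two regressors and a constant, the least-squares coefficients have a closed form in terms of centered sums of squares and cross products. The plain-Python sketch below applies it to X4 and X1 from Table 1; the helper names are hypothetical and the code is an illustration, not the procedure actually used in this paper.

```python
# Data copied from Table 1.
Y = [110371, 120191, 112776, 101339, 93018, 88128, 94164, 106771, 127935, 145443,
     163322, 176353, 198885, 233513, 301239, 393256, 500980, 651260, 819896, 978610]
X1 = [963, 1112, 1366, 1519, 1644, 1893, 2311, 2998, 4044, 5046,
      5846, 6420, 6796, 7159, 7858, 8622, 9398, 10542, 12336, 14040]
X4 = [1054, 1063, 1075, 1075, 1075, 1075, 1053, 1065, 1080, 1054,
      1032, 1020, 1022, 1071, 1041, 1225, 1396, 1552, 1731, 1792]

def centered_cross(a, b):
    """Sum of cross products of two lists after subtracting their means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))

def ols_two(x_a, x_b, y):
    """Fit y = const + b_a * x_a + b_b * x_b; returns b_a, b_b, const, adjusted R-squared."""
    n = len(y)
    s_aa = centered_cross(x_a, x_a)
    s_bb = centered_cross(x_b, x_b)
    s_ab = centered_cross(x_a, x_b)
    s_ay = centered_cross(x_a, y)
    s_by = centered_cross(x_b, y)
    s_yy = centered_cross(y, y)
    det = s_aa * s_bb - s_ab ** 2          # near zero if x_a and x_b are collinear
    b_a = (s_bb * s_ay - s_ab * s_by) / det
    b_b = (s_aa * s_by - s_ab * s_ay) / det
    const = sum(y) / n - b_a * sum(x_a) / n - b_b * sum(x_b) / n
    r2 = (b_a * s_ay + b_b * s_by) / s_yy  # explained / total sum of squares
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 3)
    return b_a, b_b, const, adj_r2

b4, b1, const, adj_r2 = ols_two(X4, X1, Y)
print(f"Y = {b4:.4f}*X4 + {b1:.4f}*X1 + ({const:.1f}), adjusted R-squared = {adj_r2:.4f}")
```

Running the same function with X2 or X3 in place of X1 gives the other two candidate models, so the adjusted R-squared values used in the comparison above can be placed side by side.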
The remaining variables are then added in turn to continue the stepwise regression:
. reg Y X4 X1 X2

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  3,    16) =  987.18
       Model |  1.2970e+12     3  4.3233e+11           Prob > F      =  0.0000
    Residual |  7.0071e+09    16   437944327           R-squared     =  0.9946
-------------+------------------------------           Adj R-squared =  0.9936
       Total |  1.3040e+12    19  6.8631e+10           Root MSE      =   20927

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X4 |   570.3758   46.57535    12.25   0.000     471.6405    669.1111
          X1 |   53.53863   6.618152     8.09   0.000     39.50878    67.56849
          X2 |  -12.18902     2.7475    -4.44   0.000    -18.01346   -6.364578
       _cons |   777507.8   336370.1     2.31   0.034     64435.18     1490580
------------------------------------------------------------------------------

Y = 570.3757921*X4 + 53.53863254*X1 - 12.18901747*X2 + 777507.8381
se = (46.57535) (6.618152) (2.747500) (336370.1)
t = (12.24630) (8.089665) (-4.436403) (2.311466)
R-squared = 0.994626  Adjusted R-squared = 0.9936  F = 987.1753

After the new variable X2 is added, its coefficient is -12.18901747, indicating a negative relationship between X2 and Y. In economic terms, however, the total population X2 and the number of graduate students Y should be positively related: the larger the total population, the larger the absolute number of graduate students. X2 should therefore be removed.
. reg Y X4 X1 X3

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  3,    16) = 1015.53
       Model |  1.2972e+12     3  4.3239e+11           Prob > F      =  0.0000
    Residual |  6.8125e+09    16   425778299           R-squared     =  0.9948
-------------+------------------------------           Adj R-squared =  0.9938
       Total |  1.3040e+12    19  6.8631e+10           Root MSE      =   20634

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X4 |   700.5114   33.11564    21.15   0.000     630.3093    770.7134
          X1 |   53.63805   6.480708     8.28   0.000     39.89956    67.37654
          X3 |  -597.6141   131.3478    -4.55   0.000    -876.0589   -319.1692
       _cons |  -534866.2   49101.16   -10.89   0.000      -638956   -430776.4
------------------------------------------------------------------------------

Y = 700.5113451*X4 + 53.63805156*X1 - 597.614061*X3 - 534866.1749
se = (33.11564) (6.480707) (131.3478) (49101.16)
t = (21.15) (8.28) (-4.55) (-10.89)
Adjusted R-squared = 0.9938  F = 1015.53
Similarly, when the new variable X3 is added, its parameter estimate is again negative. X3 is the number of unemployed persons in urban areas. In economic terms, higher urban unemployment should encourage more people to take the postgraduate entrance examination (Kaoyan) in order to improve their skills, employability and opportunities. In reality the two should therefore be positively correlated, so X3 should be removed.
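3. White test

. estat imt, wh

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(9)      =     12.22
         Prob > chi2  =    0.2015

Cameron & Trivedi's decomposition of IM-test

              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |      12.22      9    0.2015
            Skewness |       6.29      3    0.0981
            Kurtosis |       1.69      1    0.1933
---------------------+-----------------------------
               Total |      20.20     13    0.0903

Since Prob > chi2 = 0.2015 is greater than 0.05, the null hypothesis of homoskedasticity is not rejected at the 5% level.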
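The p-value of the White test follows from the upper tail of the chi-square distribution. For an odd number of degrees of freedom this tail has a closed form built from the complementary error function and a short finite series, so it can be checked with the Python standard library alone. The sketch below is illustrative; the function name chi2_sf_odd is hypothetical and the inputs are the rounded statistic and degrees of freedom from the output above.

```python
import math

def chi2_sf_odd(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x), odd df only."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this closed form only covers odd degrees of freedom")
    root = math.sqrt(x)
    series = 0.0
    term = root                      # first series term: x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)      # next term: x^(r + 1/2) / (1 * 3 * ... * (2r + 1))
    tail_normal = math.erfc(root / math.sqrt(2))            # two-sided normal tail
    density = math.sqrt(2 / math.pi) * math.exp(-x / 2)     # twice the normal density
    return tail_normal + density * series

chi2_stat = 12.22   # reported White statistic (rounded to two decimals)
df = 9              # reported degrees of freedom
p_value = chi2_sf_odd(chi2_stat, df)

print(f"p-value = {p_value:.4f}")
print("reject homoskedasticity at 5%" if p_value < 0.05 else "do not reject homoskedasticity at 5%")
```

Because the statistic is entered rounded to two decimals, the computed p-value is expected to be close to, but not exactly equal to, the reported 0.2015.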
The final model after the above tests and corrections is:
Y = -51055.44688 + 66.53070046*X1 + 382.1680346*X4
se = (9052.520) (9.443438) (78.77833)
t = (-5.639916) (7.045178) (4.851182)
Adjusted R-squared = 0.921287  F = 106.3395  DW = 1.627477
V. Analysis and conclusions of the model
The model leads to the following conclusions:
(1) The total number of enrolled graduate students is significantly related only to the number of colleges and universities and to per capita GDP.
(2) The coefficients of X1 and X4 pass the economic significance test. In economic terms, the total number of graduate students increases with per capita GDP and with the number of colleges and universities. The number of colleges and universities is the most important factor affecting the total number of graduate students.
(3) The adjusted coefficient of determination and the F value of the model are both high, so the goodness of fit of the model is good.