Indicator | Metrics in 2012 | Metrics in 2013 | Metrics in 2014 | Metrics in 2015 | Metrics in 2016 | Metrics in 2017 | Metrics in 2018 | Percent in 2012-14 | Percent in 2015-16 | Percent in 2017-18 |
---|---|---|---|---|---|---|---|---|---|---|
Capabilities | 14 | 100% | ||||||||
Crisis and risk management | 8 | 5 | 100% | |||||||
Digital services | 7 | 6 | 54% | 46% | ||||||
Fiscal and financial management | 1 | 1 | 4 | 17% | 17% | 67% | ||||
HR management | 5 | 4 | 100% | |||||||
Inclusiveness | 3 | 2 | 100% | |||||||
Integrity | 1 | 2 | 11 | 2 | 1 | 18% | 65% | 18% | ||
Openness | 1 | 3 | 4 | 2 | 40% | 60% | ||||
Policy making | 8 | 100% | ||||||||
Procurement | 6 | 100% | ||||||||
Regulation | 6 | 3 | 100% | |||||||
Tax administration | 5 | 1 | 83% | 17% | ||||||
Total | 15 | 1 | 2 | 33 | 28 | 23 | 14 | 16% | 53% | 32% |
Table 5.2.A in the original PDF publication |
17 Sensitivity analysis
Building statistical models and indices involves stages where subjective judgements have to be made. These can include the selection of individual data sets, the treatment of missing values, and the approach to weighting and aggregation. Good modelling practice means we should evaluate our model, testing the assumptions and judgements made in its building and analysing the uncertainties associated with the modelling process. Sensitivity analysis is one way to undertake such an assessment.
To test the robustness and uncertainty of the modelling approach used by InCiSE, five types of sensitivity analysis have been undertaken:
- Varying the set of countries selected for results to be produced;
- Excluding out-of-date data;
- Alternative approaches to weighting;
- Using the ranks of source data; and,
- Alternative approaches to imputation.
This chapter summarises the approach and results of these different analyses, while detailed results can be found in Appendix B.
17.1 Country selection
Section 3.3 discusses how the approach to country selection for the 2019 edition of InCiSE differs from the 2017 Pilot, as it now uses the results of the data quality assessment (DQA) to identify countries for inclusion. The DQA produces a score for each country that summarises the quality of the data within the InCiSE model about that country (before imputation of missing values). The threshold for inclusion in the 2019 edition of InCiSE is an overall DQA score of 0.50 or greater.
The three countries included in the InCiSE Index with the lowest data quality scores have markedly poorer data quality by indicator than other countries (see Table 2.8.A). For each of these three countries only two or three of the 12 InCiSE indicators are rated green, a further two or three indicators are rated as amber, while five or six are rated as red, and one indicator is fully imputed.
Section 3.8 also outlines an approach to ‘grading’ countries based on their data quality scores. DQA scores of 0.75 are given an ‘A+’ grade, while those below 0.6 are given a ‘D’ grade. In this ‘D’ group there are four more countries in addition to the three discussed above.
The 2017 Pilot used a simpler approach to country inclusion, with a threshold of having at least 75% of metrics available, which produced a set of 31 countries [1]. For the 2019 edition’s set of metrics, 31 countries also achieve the 75% threshold, but the country coverage differs from the set of countries in the 2017 Pilot.
[1] One further country in 2017 met this criterion but was not an OECD member, so it was excluded to simplify interpretation of results.
The first two sensitivity tests for country coverage altered the DQA threshold used to determine country inclusion. The first test used a DQA score of 0.55 or higher, excluding the three countries in the 2019 set with the lowest data quality, while the second test used a DQA score of 0.6 or higher. The third test used the 2017 Pilot’s threshold of countries with 75% of data being available. The fourth test used the 31 countries included in the 2017 Pilot.
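The first two tests amount to re-running the country filter with a stricter cut-off. A minimal sketch of that filter is given below; the country labels and DQA scores are made up for illustration and are not InCiSE values, and `select_countries` is a hypothetical helper name.

```python
# Hypothetical DQA scores for illustration only; these are not InCiSE values.
dqa_scores = {"AAA": 0.82, "BBB": 0.61, "CCC": 0.57, "DDD": 0.52, "EEE": 0.48}


def select_countries(scores, threshold):
    """Return the countries whose overall DQA score meets the threshold."""
    return sorted(country for country, score in scores.items() if score >= threshold)


baseline = select_countries(dqa_scores, 0.50)  # 2019 inclusion rule
test_one = select_countries(dqa_scores, 0.55)  # first sensitivity test
test_two = select_countries(dqa_scores, 0.60)  # second sensitivity test
```

Raising the threshold can only shrink the set of countries, which is why rank changes in these tests are driven largely by which countries drop out.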
17.2 Reference date
The reference dates of the source data for the 2019 edition of InCiSE range from 2012 to 2018. However, as shown in Table 5.2.A, the reference dates vary across indicators. A third of the metrics have a reference date of 2017 or 2018, around half of the metrics have a reference date of 2015 or 2016, while just 18 of the 116 metrics have a reference date between 2012 and 2014.
Of these 18 metrics, 14 are the metrics for the capabilities indicator, which is composed solely of data with a reference year of 2012. It is the only indicator with 100% of its data having a reference date before 2015 [2]. Only two other indicators have data from before 2015, and in both cases this is a small number of their constituent metrics.
[2] The lack of recency of the data source for the capabilities indicator (the OECD’s Survey of Adult Skills) is discussed in Chapter 6.
The first two sensitivity tests for recency exclude the capabilities indicator. In the first analysis the capabilities indicator is excluded but the weightings of the other indicators are not adjusted. In the second analysis the weightings are recalculated to account for the removal of the capabilities indicator.
In the third test, only data with a reference year of 2015 or later is included in the model; the four other metrics from before 2015 are excluded in addition to the 14 capabilities metrics. In the fourth test, only data with a reference year of 2016 or later is included in the model; the 51 metrics with a reference date of 2015 or earlier are therefore excluded. For both these analyses there is no adjustment to the weightings, either to calculate the indicators from their constituent metrics or to calculate the index from the indicators.
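The number of metrics dropped by each cut-off can be checked against the "Total" row of Table 5.2.A. The sketch below uses those totals; the helper name `excluded_count` is illustrative only.

```python
# Total number of metrics per reference year, from the "Total" row of Table 5.2.A.
metrics_per_year = {2012: 15, 2013: 1, 2014: 2, 2015: 33, 2016: 28, 2017: 23, 2018: 14}


def excluded_count(counts, first_year_kept):
    """Count the metrics whose reference year falls before the first year kept."""
    return sum(n for year, n in counts.items() if year < first_year_kept)


total_metrics = sum(metrics_per_year.values())            # all metrics in the model
excluded_third = excluded_count(metrics_per_year, 2015)   # third test keeps 2015 onwards
excluded_fourth = excluded_count(metrics_per_year, 2016)  # fourth test keeps 2016 onwards
```

With these totals the model holds 116 metrics, of which 18 are dropped in the third test (the 14 capabilities metrics plus four others) and 51 in the fourth.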
17.3 Alternative approaches to weighting
The InCiSE Index is a weighted aggregation of the InCiSE indicators, which themselves are weighted aggregations of the InCiSE metrics. Section 3.7 sets out the approach to weighting the InCiSE indicators to calculate the InCiSE Index. Two-thirds of an indicator’s weight is based on an ‘equal share’ approach (i.e. 1/12), while one-third is based on the results of the data quality assessment. Section 3.6 and Chapters 3-14 outline how the individual metrics are weighted to produce each of the 12 indicator scores.
The first three sensitivity tests for alternative weighting look at the proportion of indicator weighting that is assigned to the ‘equal share’ and the data quality assessment. The first test uses a 50:50 split rather than the 67:33 split. The second test uses solely an ‘equal share’ approach (i.e. indicator weights set to 1/12 each). The third test uses solely the results of the data quality assessment to determine the weighting.
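The effect of changing the split can be illustrated with a small sketch. It rests on an assumption that is not stated in this chapter: the data-quality part of each weight is taken to be the indicator's DQA score divided by the sum of all indicators' DQA scores. The function name, the three indicator labels, and their scores are hypothetical (the real model has 12 indicators).

```python
def blended_weights(dqa_scores, equal_share=2 / 3):
    """Blend an equal-share weight with a DQA-proportional weight per indicator.

    ASSUMPTION: the DQA-based part is each indicator's score divided by the sum
    of all scores. This chapter does not specify the rule, so treat it as a guess.
    """
    if not 0 <= equal_share <= 1:
        raise ValueError("equal_share must lie between 0 and 1")
    n = len(dqa_scores)
    total = sum(dqa_scores.values())
    return {
        name: equal_share / n + (1 - equal_share) * score / total
        for name, score in dqa_scores.items()
    }


# Hypothetical indicator-level DQA scores, for illustration only.
example_scores = {"indicator_a": 0.9, "indicator_b": 0.6, "indicator_c": 0.3}

baseline = blended_weights(example_scores)         # 67:33 split used in the model
test_one = blended_weights(example_scores, 0.5)    # 50:50 split
test_two = blended_weights(example_scores, 1.0)    # equal share only
test_three = blended_weights(example_scores, 0.0)  # data quality only
```

Under this assumption the weights sum to 1 for any split, so the tests change only how weight is distributed across indicators, not the scale of the index.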
The fourth and fifth tests focus on metrics weighting: the fourth does not apply weighting to metrics within indicators (i.e. all metrics contribute equally to the calculation of their indicator), and the fifth is a simple summation of the metrics, then normalised as per the standard calculations of the indicators and index (as set out in Section 3.5).
17.4 Adjusting the base data
In the InCiSE model, metrics are normalised after missing data is imputed. An alternative approach would be to normalise the data before it is imputed.
Three sensitivity tests were done where normalisation of the data occurred before the imputation. In the first test the data was ranked, in the second test the data was rescaled using the same min-max normalisation applied to the outputs of the model, and in the third test the data was converted to z-scores with a mean of 0 and a standard deviation of 1.
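The three transformations can be sketched as follows for a single metric with no missing values. Several choices here are assumptions rather than details taken from the InCiSE method: ranks run from 1 for the smallest value, tied values share their average rank, and z-scores use the population standard deviation. The function names and the sample values are illustrative.

```python
from statistics import mean, pstdev


def to_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1          # 1-based position of first occurrence
        last = first + ordered.count(v) - 1   # 1-based position of last occurrence
        ranks.append((first + last) / 2)
    return ranks


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


raw = [2.0, 4.0, 6.0, 8.0]  # made-up metric values
ranked = to_ranks(raw)
rescaled = min_max(raw)
standardised = z_scores(raw)
```

In the sensitivity tests these transformations are applied to data that still contains gaps, so a fuller version would compute the statistics from the observed values only and leave missing entries for the imputation step.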
17.5 Alternative imputation methods
As discussed in section 2.4, missing data in the InCiSE base data is handled through multiple imputation, and in particular the predictive mean matching method.
Four sensitivity tests were carried out using different approaches to imputation. Section 3.4 outlines how the imputation of missing data is handled on a per-indicator basis; the first test changes this to adopt a “kitchen sink”/“all-in-one” approach in which the full dataset of all 116 metrics (and two external predictor variables) is supplied to the imputation function. The second test uses a modified form of predictive mean matching called ‘midas touch’ to generate imputed values. The third test uses the ‘random forest’ method, a machine learning approach, to generate imputed values. The fourth test uses mean imputation, where missing data is replaced with the simple arithmetic mean of the observed data.
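Of the four alternatives, mean imputation is the simplest to illustrate. The sketch below fills gaps in one metric with the mean of its observed values; the values are made up, `None` is used here to mark a missing value, and the function name is hypothetical. It does not attempt to reproduce predictive mean matching or the other methods.

```python
from statistics import mean


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a metric with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up metric values for four countries, one of them missing.
metric = [0.25, None, 0.75, 0.5]
imputed = mean_impute(metric)
```

Because every gap receives the same value, mean imputation reduces the variance of the metric, which is one reason it serves as a comparison case rather than the main method.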
17.6 Results of the sensitivity analysis
Table 17.2 shows the results of the 2019 InCiSE model for each country and the range of ranks across the five different sets of sensitivity analysis, while Figure 17.1 shows how the InCiSE Index score varies by country for each of the sensitivity tests carried out. The results of the five sets of sensitivity analysis demonstrate general stability in the model, with country ranks either unchanged or changed by only one or two places on average, and the same groupings of countries at the top and bottom of the rankings. Full results from the sensitivity analysis are provided in Appendix B.
In the country coverage sensitivity analysis, the main driver of change in rankings is the exclusion of countries itself: Figure 5.1 shows that the scores of individual countries do not substantially change as a result of the exclusion of different countries. When varying the reference date there are some changes as a result of the exclusion of the capabilities indicator, and further changes as a result of excluding data with a reference year of 2015 and earlier.
Altering the weighting schemes for the calculation of the index and indicators does not result in many changes, except when calculating the index as a simple sum of all metrics (i.e. applying no weighting at all). Similarly, making alterations to the metrics (e.g. ranking, rescaling, standardisation) before they are imputed does not result in many changes to country scores or rankings.
Varying the imputation methodology results in slightly more variation of country scores and ranks than the previous sensitivity checks. Only three countries see no change in their ranking; however, among those that do change, the difference in ranks is still small at around one or two places.
One way to consider the effectiveness of the sensitivity analysis is to calculate the Mean Absolute Error (MAE) arising from the analysis. MAE is a common technique for assessing the quality of statistical models by comparing the difference of the model’s estimates/predictions with the original data. It is calculated as the sum of the absolute errors divided by the number of cases. In the case of the InCiSE sensitivity analysis, ‘error’ is calculated as the difference between the 2019 InCiSE Index results and the results from each of the sensitivity tests.
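As a minimal sketch of this calculation, the function below takes the baseline index scores and the scores from one sensitivity test and returns the MAE. The baseline values are the 2019 scores for GBR, NZL and CAN from Table 17.2; the test scores are invented for illustration.

```python
def mean_absolute_error(baseline, alternative):
    """Sum of absolute differences between paired scores, divided by the number of pairs."""
    if len(baseline) != len(alternative) or not baseline:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(b - a) for b, a in zip(baseline, alternative)) / len(baseline)


baseline_scores = [1.000, 0.980, 0.916]  # 2019 results for GBR, NZL, CAN (Table 17.2)
test_scores = [1.000, 0.970, 0.936]      # hypothetical scores from one sensitivity test
mae = mean_absolute_error(baseline_scores, test_scores)
```

Here the absolute differences are 0, 0.01 and 0.02, giving an MAE of about 0.01.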
The overall MAE figure for the sensitivity analysis, that is the mean level of ‘error’ across all 20 sensitivity tests for all 38 countries, is ±0.017. The MAE can also be calculated for each sensitivity test or each set of tests. The per-set MAE figures are presented in Table 17.3, while the per-test MAE figures are presented in the tables in Appendix B. Across the different sets of methodological sensitivity tests, the smallest MAE is ±0.007 for the set of tests varying country selection, while the highest MAE is ±0.023 for the set of tests changing the reference date.
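The overall figure can be roughly cross-checked from the per-set figures. The sketch below weights each set's MAE by the number of tests in that set, as described earlier in this chapter (four, four, five, three and four). It is an approximation under two assumptions: every test is treated as covering the same number of countries, which is not strictly true for the country coverage tests, and the inputs are already rounded to three decimal places.

```python
# (per-set MAE from Table 17.3, number of tests in the set as described in this chapter)
sets = {
    "country coverage": (0.007, 4),
    "reference date": (0.023, 4),
    "alternative weightings": (0.018, 5),
    "adjust base data": (0.014, 3),
    "imputation method": (0.022, 4),
}

n_tests = sum(count for _, count in sets.values())
# Test-count-weighted mean of the per-set MAE figures (approximation, see assumptions above).
approx_overall_mae = sum(mae * count for mae, count in sets.values()) / n_tests
```

Under these assumptions the weighted mean comes to roughly 0.017 across 20 tests, in line with the overall figure quoted above.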
Finally, the MAE can also be calculated by country; these figures are also included in Table 17.2 and range from ±0.001 to ±0.032. However, because the same two countries place highest and lowest across most tests, the minimum per-country MAE is skewed by the limited variability in these two countries’ scores; when these two countries are excluded, the minimum MAE rises from ±0.001 to ±0.009.
Country | 2019 score | 2019 rank | Rank range: country coverage | Rank range: reference date | Rank range: alternative weightings | Rank range: adjust base data | Rank range: imputation method | Mean absolute error (MAE) |
---|---|---|---|---|---|---|---|---|
GBR | 1.000 | 1 | 1 | 1 | 1-2 | 1 | 1-2 | 0.003 |
NZL | 0.980 | 2 | 2 | 2 | 1-2 | 2 | 1-2 | 0.019 |
CAN | 0.916 | 3 | 3 | 3 | 3 | 3 | 3-5 | 0.021 |
FIN | 0.883 | 4 | 4 | 4-5 | 4-5 | 4 | 3-4 | 0.013 |
AUS | 0.863 | 5 | 5 | 4-5 | 4-5 | 5-6 | 4-7 | 0.014 |
DNK | 0.832 | 6 | 5-6 | 7-9 | 6-8 | 5-7 | 5-7 | 0.021 |
NOR | 0.830 | 7 | 6-7 | 6 | 6-7 | 6-10 | 5-7 | 0.010 |
NLD | 0.794 | 8 | 7-8 | 8-9 | 8-10 | 8-9 | 8-9 | 0.014 |
KOR | 0.785 | 9 | 8-10 | 9-11 | 6-11 | 7-11 | 10 | 0.019 |
SWE | 0.785 | 10 | 9-10 | 7-10 | 8-10 | 8-9 | 8-9 | 0.009 |
USA | 0.765 | 11 | 11 | 10-11 | 10-11 | 10-11 | 11 | 0.029 |
EST | 0.674 | 12 | 10-12 | 12-17 | 12 | 12-13 | 12-15 | 0.023 |
CHE | 0.650 | 13 | 11-13 | 13-14 | 13-14 | 12-15 | 12-15 | 0.020 |
IRL | 0.625 | 14 | 14-16 | 15-16 | 14-17 | 14-15 | 16-17 | 0.021 |
FRA | 0.619 | 15 | 12-15 | 12-14 | 13-16 | 13-15 | 12-15 | 0.012 |
AUT | 0.617 | 16 | 13-15 | 15-16 | 13-16 | 16-17 | 13-15 | 0.014 |
ESP | 0.599 | 17 | 15-17 | 13-17 | 15-17 | 16-17 | 16-17 | 0.010 |
MEX | 0.507 | 18 | 17-19 | 19-20 | 18-24 | 18-23 | 18-20 | 0.020 |
DEU | 0.505 | 19 | 16-19 | 18-21 | 18-19 | 19-21 | 18-20 | 0.010 |
LTU | 0.487 | 20 | 18-20 | 18-20 | 20-22 | 20-21 | 20-22 | 0.018 |
BEL | 0.485 | 21 | 19-22 | 18-22 | 20-21 | 19-20 | 18-21 | 0.017 |
JPN | 0.472 | 22 | 17-21 | 21-22 | 19-24 | 18-23 | 21-24 | 0.020 |
LVA | 0.466 | 23 | 20-23 | 23-26 | 20-24 | 24 | 24-26 | 0.031 |
CHL | 0.454 | 24 | 21-24 | 23-25 | 22-24 | 22-23 | 21-23 | 0.014 |
ITA | 0.419 | 25 | 22-25 | 23-25 | 25-26 | 25 | 23-25 | 0.014 |
SVN | 0.369 | 26 | 23-26 | 26-28 | 25-26 | 26 | 25-26 | 0.018 |
ISR | 0.315 | 27 | 27 | 24-27 | 27 | 27 | 27-29 | 0.022 |
POL | 0.282 | 28 | 24-28 | 28-36 | 28-29 | 28-29 | 27-29 | 0.025 |
PRT | 0.259 | 29 | 25-29 | 29-30 | 28-29 | 31 | 28-31 | 0.015 |
CZE | 0.245 | 30 | 26-30 | 27-32 | 30-32 | 28-30 | 30-31 | 0.018 |
ISL | 0.228 | 31 | 31 | 30-32 | 30-32 | 29-30 | 28-31 | 0.019 |
TUR | 0.189 | 32 | 27-32 | 28-32 | 30-35 | 32 | 32-33 | 0.026 |
SVK | 0.172 | 33 | 28-33 | 31-34 | 32-35 | 33 | 32-34 | 0.015 |
BGR | 0.147 | 34 | — | 34-35 | 33-34 | 35 | 35-36 | 0.016 |
HRV | 0.140 | 35 | — | 36-37 | 34-36 | 34 | 33-34 | 0.019 |
ROU | 0.127 | 36 | — | 35-37 | 36-37 | 36 | 35-37 | 0.022 |
GRC | 0.107 | 37 | 29-34 | 33-35 | 34-38 | 37 | 36-37 | 0.027 |
HUN | 0.000 | 38 | 30-35 | 38 | 37-38 | 38 | 38 | 0.001 |
Table 5.6.A in the original 2019 publication |
Measure | Country coverage | Reference date | Alternative weightings | Adjust base data | Imputation method |
---|---|---|---|---|---|
Mean absolute error (MAE) | 0.007 | 0.023 | 0.018 | 0.014 | 0.022 |
Countries with no change in rank | 8 | 5 | 3 | 16 | 3 |
Largest difference in rank | 5 | 8 | 6 | 5 | 3 |
Average difference in rank | 2 | 2 | 2 | 1 | 2 |
Table 5.6.A in the original 2019 publication |