2 Methodology of the InCiSE Index
As outlined in Chapter 1, the InCiSE Index is a composite index formed from a series of indicators, each of which comprises a set of individual metrics. The overall Index is the normalised and weighted average of the scores of the constituent InCiSE indicators. The InCiSE indicators are themselves normalised weighted averages of their individual metrics. The calculation and modelling process to produce the Index is as follows:
- Data processing:
  - Data preparation (Section 2.1)
  - Data quality assessment (Section 2.2)
  - Country coverage selection (Section 2.3)
  - Imputation of missing data (Section 2.4)
  - Data normalisation (Section 2.5)
- Calculation of the InCiSE indicators (Section 2.6):
  - Raw score calculated as a weighted average of the individual metrics
  - Raw score normalised to produce the final indicator score
- Calculation of the InCiSE Index (Section 2.7):
  - Raw score calculated as a weighted average of the indicator scores
  - Raw score normalised to produce the final Index score
This chapter outlines the methodology for each of these different stages, and finishes with a discussion of key data quality considerations in Section 2.8 and comparisons over time in Section 2.9, while Chapters 3-14 provide details on the specific methodology of each of the InCiSE indicators.
2.1 Data preparation
The data for InCiSE comes from a wide range of independent sources, such as the UN’s E-Government Survey, Transparency International’s Global Corruption Barometer, and Bertelsmann’s Sustainable Governance Indicators (SGIs).1 The InCiSE partnership does not produce any of the source data itself or engage in primary data collection.
1 A full list of data sources can be found in the References section at the end of this report.
The data for the 2019 edition of InCiSE is the latest available as of 30 November 2018. As well as the source metrics, some additional data are collected to aid the imputation of missing data – these data do not directly contribute to the scores and are therefore not included in the published results.
Some of the source data requires processing before it is suitable for use in the InCiSE calculations and modelling. For example (a short illustrative sketch follows this list):
- Binary/multiple categorical data: some of the source data are binary measures (e.g. yes/no questions) or assess multiple categories (e.g. groups subject to whistleblower protection). In most cases this type of data is summed.
- Individual-level microdata: InCiSE uses a custom analysis of the Programme for the International Assessment of Adult Competencies (PIAAC) individual-level microdata to produce country scores. The Opentender data on procurement covers individual contracts, which also requires analysis to produce country scores.
- Negatively framed data: some of the source data is based on negatively framed questions, where a higher score indicates poorer performance. To align with the other metrics, this data is inverted so that higher scores represent better performance.
- Calculations against reference data: for the inclusiveness indicator, women’s representation in the civil service/public sector is compared to the labour market in general. Tax administration data from the OECD is published in raw form; InCiSE uses rates based on these data, which must therefore be calculated.
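As a minimal illustration of the last two transformations (this is not the published InCiSE code; the values and the specific inversion rule are assumptions for the example):

```python
# Illustrative sketch only: two of the transformations described above,
# applied to made-up numbers.

# Inverting negatively framed data: if a higher raw value means worse
# performance, flip it so that higher always means better. For a metric
# bounded on [0, 1], subtracting from 1 is one simple convention.
negatively_framed = [0.12, 0.45, 0.30]         # hypothetical raw values
inverted = [1 - x for x in negatively_framed]  # higher is now better

# Calculating a rate against reference data: e.g. women's share of the
# civil service relative to their share of the wider labour market
# (the exact InCiSE formula is set out in the indicator chapters).
women_civil_service = 0.54   # hypothetical share
women_labour_market = 0.47   # hypothetical share
representation_rate = women_civil_service / women_labour_market

print(inverted, round(representation_rate, 3))
```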
Chapters 3-14 outline the underlying source data for each of the indicators and cover the specific transformations that are applied to the source data. Appendix A outlines the construction and calculation of the composite metrics (metrics calculated from more than a single data point in the original source) that are included in some of the indicators.
When importing data to the InCiSE model, records are matched against a reference list of 249 countries and territories produced by Arel-Bundock et al. (2018), using the three-letter ISO 3166-1 alpha-3 codes. Some source data natively uses the alpha-3 codes, but some use the two-letter ISO code, another code system, or a name of the territory (either the official long/short name, or a colloquial name). Therefore, as part of data preparation, all country references are converted to the ISO alpha-3 code.
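As a minimal illustration of this harmonisation (not the published InCiSE code), the sketch below uses a tiny hypothetical extract of such a reference lookup; the real list covers all 249 countries and territories:

```python
# Hypothetical extract of a reference lookup: each territory keyed by the
# identifiers that appear in source datasets (2-letter code, 3-letter code,
# official or colloquial names).
TO_ISO3 = {
    "GB": "GBR", "GBR": "GBR", "United Kingdom": "GBR",
    "KR": "KOR", "KOR": "KOR", "Republic of Korea": "KOR", "South Korea": "KOR",
}

def harmonise_country(identifier: str) -> str:
    """Map a two-letter code, three-letter code, or name to ISO 3166-1 alpha-3."""
    try:
        return TO_ISO3[identifier.strip()]
    except KeyError:
        raise ValueError(f"Unrecognised country identifier: {identifier!r}")

assert harmonise_country("South Korea") == "KOR"
assert harmonise_country("GB") == "GBR"
```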
2.2 Data quality assessment
In order to provide a clearer understanding of the quality of the InCiSE Index, a data quality assessment has been calculated and published alongside the 2019 edition. This assessment has a dual role: it is an important piece of metadata that will help users of the InCiSE Index better understand the results, and it has also been used to determine the country coverage of the InCiSE Index. This section describes the method for conducting the data quality assessment. The use of the assessment for country selection and weighting is discussed in Sections 2.3 and 2.7 respectively, while a wider discussion of data quality based on the results of the assessment is provided in Section 2.8.
The data quality assessment is a purely quantitative exercise based on three factors: data availability, the (non-)use of public sector proxy data, and the recency of the data. The assessment does not include any subjective evaluation of the methodology or quality of the sources from which the underlying data comes.
The data quality assessment also does not incorporate assessments of the reliability or validity of indicator and index construction. Its purpose is to provide an assessment of easily quantifiable characteristics of the data, which can help interpretation of the InCiSE results for countries and of the indicators.
The simple mean of the three measures is taken as the data quality score for each country for each indicator. The 12 indicator-level data quality scores are then combined as a simple mean to produce an overall data quality assessment for each country.
For each indicator, the data quality assessment is based on three measures: (1) the proportion of metrics with data; (2) the proportion of metrics that have civil service specific data; and (3) the recency of the data. All three measures take a simple assessment of whether data is missing or present as their basis. However, each measure has different weighting rules for the data:
- Data availability: a missing data point for a metric with a within-indicator weight of 15% gives a greater penalty than a missing data point for a metric with a within-indicator weight of 5%.
- Civil service data (1) or a public sector proxy (0): data points that come from public sector proxy data are treated as equivalent to being missing.
- Recency of the data: The reference year of the metric is scaled from 0 (for 2012, the earliest year) to 1 (for 2018, the latest year) and used as the weighting.2
2 For example, a data point with a reference year of 2013 will be weighted 0.1667, while one with a reference year of 2016 will be weighted 0.6667.
The data quality score (\(DQA_{c,i}\)) for a given country (\(c\)) and indicator (\(i\)) is calculated by multiplying the missing data matrix of the indicator’s metrics for that country (\(d_{c,i}\)) by each of: the within-indicator weighting of the metrics (\(m_i\)), the proxy data status of each metric (\(s_i\)), and the recency of each metric (\(r_i\)). The resulting products are summed and divided by three to give the mean data quality for that country and indicator.
\[ DQA_{c,i} = \frac{{(d_{c,i} * m_i) + (d_{c,i} * s_i) + (d_{c,i} * r_i)}}{3} \]
The overall data quality indicator for a country (\(DQA_c\)) is then calculated as the sum of data quality assessment scores of that country for each indicator divided by the number of indicators (\(n_i\)).
\[ DQA_c = \frac{\sum{DQA_{c,i}}}{n_i} \]
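As a minimal illustration of these two formulas, the sketch below computes \(DQA_{c,i}\) for one hypothetical country and indicator with three metrics. It assumes each of the three measures is a metric-weighted value on the 0–1 scale (the exact vector/matrix conventions of the published model may differ), and all numbers are made up:

```python
# Illustrative sketch of the data quality assessment formulas above.

d = [1, 1, 0]          # d_{c,i}: metric data present (1) or missing (0)
m = [0.5, 0.3, 0.2]    # m_i: within-indicator metric weights (sum to 1)
s = [1, 0, 1]          # s_i: civil service data (1) vs public sector proxy (0)

def recency(year, earliest=2012, latest=2018):
    """Scale a reference year to [0, 1]: 2012 -> 0, 2018 -> 1."""
    return (year - earliest) / (latest - earliest)

r = [recency(y) for y in (2013, 2016, 2018)]  # r_i: 0.1667, 0.6667, 1.0

availability = sum(di * mi for di, mi in zip(d, m))            # measure 1
non_proxy = sum(di * mi * si for di, mi, si in zip(d, m, s))   # measure 2
recentness = sum(di * mi * ri for di, mi, ri in zip(d, m, r))  # measure 3

dqa_ci = (availability + non_proxy + recentness) / 3
print(round(dqa_ci, 3))  # 0.528

# The overall country score DQA_c is then the simple mean of dqa_ci
# across the 12 indicators.
```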
The data quality assessment scores therefore have a theoretical range from 0 to 1, where 0 represents no metrics being available and 1 represents data being available for all metrics, with all data representing the civil service (i.e. not using a public sector proxy) and all data relating to the latest available year. Table 2.1 illustrates the complex picture of data quality across all countries and indicators.
The table shows how the maximum data quality score varies from 0.333 for capabilities, where all the available data is a public sector proxy and the oldest in the model, to 1.000 for policy making, where all the available data relates to the civil service and to the latest available year.
The indicators for openness, fiscal & financial management and crisis & risk management have good data quality (DQA score greater than or equal to 0.5) for a very large number of countries. Other indicators (such as HR management or tax administration) have a moderate number of countries with good data quality, but have a large number of countries with poorer data quality. Finally, some indicators (such as digital services or policy making) have data for only a small number of countries, which is typically due to the source data covering only OECD or EU members (or both).
Indicator | Highest country DQA score | Countries with DQA ≥ 0.5 | Countries with 0.5 > DQA > 0 | Countries with DQA = 0
---|---|---|---|---
Capabilities | 0.333 | 0 | 31 | 218
Crisis & risk management | 0.855 | 95 | 13 | 141
Digital services | 0.581 | 34 | 0 | 215
Fiscal & financial management | 0.889 | 109 | 88 | 52
HR management | 0.673 | 37 | 83 | 129
Inclusiveness | 0.722 | 34 | 82 | 133
Integrity | 0.569 | 30 | 127 | 92
Openness | 0.928 | 105 | 93 | 51
Policy making | 1.000 | 41 | 0 | 208
Procurement | 0.722 | 20 | 24 | 205
Regulation | 0.963 | 38 | 5 | 206
Tax administration | 0.852 | 46 | 141 | 62
Overall data quality assessment | 0.757 | 38 | 162 | 49

Table 2.2.A in the original PDF publication
2.3 Country coverage selection
For the 2017 Pilot edition of the InCiSE Index only two countries had data for all 76 metrics, and a simple threshold of 75% data availability plus membership of the OECD was used as the selection criterion for country inclusion. However, analysis of the pilot showed (as Table 2.1 illustrates) that there is a mixed picture of data availability and quality across indicators which is not reflected in this simple threshold. The data quality assessment outlined in Section 2.2 provides a more nuanced way to consider the variation in data availability and quality, and is therefore used to determine which countries are included in the 2019 edition of the InCiSE Index.
In determining country coverage, the InCiSE Partners have decided to use an overall data quality assessment score of 0.5 or greater as the threshold for country inclusion; 38 countries reached this score. Although two further countries would be included if data quality scores were rounded to 1 decimal place, these two countries have lower data availability (57% and 51% of all metrics respectively), which is judged to be too low for reliable analysis. Therefore, the 38 countries with a data quality score of 0.5 or higher (when rounded to 2 decimal places) are included in the 2019 edition of the InCiSE Index. This includes all 31 countries covered by the InCiSE pilot.
Table 2.2 provides an overview of the country-level data quality scores for the group of 38 countries. The table shows that for most indicators the 38 countries have generally good data quality. However, for four indicators (capabilities, crisis & risk management, digital services and procurement) there are a small number of countries with no available data at all.
Table 2.3 provides a summary of the data quality assessment for all 38 countries selected for the 2019 edition of InCiSE, while Table 2.4 provides the assessment for the five countries with the next highest data quality scores. One country (the United Kingdom) achieved the highest overall data quality score of 0.757, followed closely by five others (Italy, Poland, Sweden, Norway and Slovenia). Countries included for the first time in the 2019 edition of the Index are flagged with a “[new]” marker next to their country name in Table 2.3.
Further discussion of data quality issues is provided at the end of this chapter in Section 2.8, covering both the quality of the indicators and the interpretation of country-level results from the InCiSE Index.
Indicator | Lowest country DQA score | Highest country DQA score | Mean country DQA score | Countries with DQA ≥ 0.5 | Countries with 0.5 > DQA > 0 | Countries with DQA = 0
---|---|---|---|---|---|---
Capabilities | 0.000 | 0.333 | 0.244 | 0 | 28 | 10 |
Crisis & risk management | 0.000 | 0.855 | 0.631 | 26 | 11 | 1 |
Digital services | 0.000 | 0.581 | 0.444 | 29 | 0 | 9 |
Fiscal & financial management | 0.439 | 0.889 | 0.783 | 37 | 1 | 0 |
HR management | 0.293 | 0.673 | 0.640 | 35 | 3 | 0 |
Inclusiveness | 0.375 | 0.722 | 0.663 | 33 | 5 | 0 |
Integrity | 0.402 | 0.569 | 0.526 | 29 | 9 | 0 |
Openness | 0.283 | 0.928 | 0.818 | 35 | 3 | 0 |
Policy making | 1.000 | 1.000 | 1.000 | 38 | 0 | 0 |
Procurement | 0.000 | 0.722 | 0.513 | 20 | 16 | 2 |
Regulation | 0.339 | 0.963 | 0.908 | 35 | 3 | 0 |
Tax administration | 0.352 | 0.852 | 0.770 | 34 | 4 | 0 |
Overall data quality | 0.501 | 0.757 | 0.662 | 38 | 0 | 0 |
Table 2.3.A in the original PDF publication
Code | Country | Overall DQA score | Percent of metrics available | Number of indicators where 0.5 > DQA > 0 | Number of indicators with completely missing data (DQA = 0) | Indicators with completely missing data
---|---|---|---|---|---|---
GBR | United Kingdom | 0.757 | 100% | 1 | 0 | |
ITA | Italy | 0.755 | 99% | 1 | 0 | |
POL | Poland | 0.755 | 99% | 1 | 0 | |
SWE | Sweden | 0.755 | 99% | 1 | 0 | |
NOR | Norway | 0.752 | 99% | 1 | 0 | |
SVN | Slovenia | 0.750 | 99% | 1 | 0 | |
AUT | Austria | 0.738 | 98% | 1 | 0 | |
FIN | Finland | 0.736 | 97% | 2 | 0 | |
ESP | Spain | 0.733 | 97% | 1 | 0 | |
NLD | The Netherlands | 0.731 | 98% | 1 | 0 | |
FRA | France | 0.718 | 97% | 2 | 0 | |
PRT | Portugal | 0.716 | 85% | 1 | 1 | CAP |
DNK | Denmark | 0.707 | 93% | 2 | 0 | |
DEU | Germany | 0.701 | 96% | 2 | 0 | |
GRC | Greece | 0.696 | 94% | 2 | 0 | |
SVK | Slovakia | 0.692 | 93% | 1 | 0 | |
HUN | Hungary | 0.671 | 81% | 1 | 1 | CAP |
EST | Estonia | 0.669 | 90% | 2 | 0 | |
CZE | Czechia | 0.659 | 91% | 3 | 0 | |
TUR | Turkey | 0.650 | 90% | 4 | 0 | |
MEX | Mexico | 0.648 | 73% | 3 | 2 | CAP, DIG |
NZL | New Zealand | 0.644 | 83% | 4 | 1 | DIG |
CHL | Chile | 0.643 | 79% | 4 | 1 | DIG |
CAN | Canada | 0.638 | 78% | 4 | 1 | DIG |
KOR | Republic of Korea | 0.636 | 78% | 4 | 1 | DIG |
BEL | Belgium | 0.635 | 85% | 3 | 1 | CRM |
LVA | Latvia [new] | 0.628 | 75% | 2 | 1 | CAP |
CHE | Switzerland | 0.627 | 79% | 2 | 1 | CAP |
AUS | Australia | 0.618 | 71% | 3 | 3 | CAP, DIG, PRO |
LTU | Lithuania [new] | 0.615 | 82% | 5 | 0 | |
IRL | Ireland | 0.614 | 84% | 4 | 0 | |
JPN | Japan | 0.597 | 75% | 5 | 1 | DIG |
USA | United States of America | 0.579 | 74% | 4 | 2 | DIG, PRO |
ISR | Israel [new] | 0.578 | 72% | 5 | 1 | DIG |
ISL | Iceland [new] | 0.563 | 68% | 5 | 1 | CAP |
ROU | Romania [new] | 0.529 | 66% | 5 | 1 | CAP |
BGR | Bulgaria [new] | 0.511 | 66% | 6 | 1 | CAP |
HRV | Croatia [new] | 0.501 | 65% | 6 | 1 | CAP |
NA | Mean of all 43 countries assessed | 0.635 | 82% | 3 | 1 |
Table 2.3.B in the original PDF publication
Code | Country | Overall DQA score | Percent of metrics available | Number of indicators where 0.5 > DQA > 0 | Number of indicators with completely missing data (DQA = 0) | Indicators with completely missing data
---|---|---|---|---|---|---
COL | Colombia | 0.471 | 57% | 6 | 3 | CAP, DIG, POL
LUX | Luxembourg | 0.460 | 51% | 7 | 2 | CAP, INC |
CYP | Cyprus | 0.435 | 64% | 9 | 1 | CRM |
CRI | Costa Rica | 0.417 | 48% | 7 | 3 | CAP, DIG, POL |
MLT | Malta | 0.375 | 49% | 9 | 2 | CAP, CRM |
Table 2.3.B in the original PDF publication
2.4 Imputation of missing data
As seen in Table 2.3, only one country has complete data (i.e. 100% of metrics). The average level of data availability is 86% across the 38 countries, and 7 of the included countries have data availability below the 75% threshold used for the 2017 Pilot, with the lowest level of data availability being 65%. Of the 38 countries, 15 have one indicator with a data quality score of 0 (i.e. no data at all for that indicator), two countries have two indicators with a data quality score of 0, and one country has three indicators with a data quality score of 0.
This presents issues for the analysis of the data and for providing an effective method of aggregating the metrics into indicators and an overall index. The 2017 Pilot edition of InCiSE adopted two methods of imputation: multiple imputation using linear regression, and median imputation. For the 2019 edition of InCiSE a decision was made to move fully to a multiple imputation approach, using the ‘predictive mean matching’ (PMM) technique of van Buuren & Groothuis-Oudshoorn (2011). The PMM technique uses correlation – of both the values and the pattern of missing data – to identify, for a country with missing data, those countries in the dataset that closely match it, and randomly selects one of their observed values to replace the missing value. Following the approach set out by van Buuren (2018), 15 imputations are generated for each missing value (each of which has also been iterated 15 times). A simple mean of these 15 imputed values is then calculated and used as the country’s value in the ‘final’ dataset.
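The published model performs PMM via the R mice package described by van Buuren & Groothuis-Oudshoorn (2011). As a rough single-variable illustration of the technique (not the InCiSE implementation, and omitting the chained-equations iteration and parameter perturbation that mice adds), the sketch below regresses the observed values on a predictor, finds the k observed ‘donors’ whose predictions are closest to each missing case’s prediction, samples a donor value, and averages 15 such draws:

```python
import numpy as np

rng = np.random.default_rng(42)

def pmm_impute(y, X, k=5, m=15):
    """Impute missing entries of y from predictor X, averaging m PMM draws."""
    miss = np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])        # design matrix
    beta, *_ = np.linalg.lstsq(Xd[~miss], y[~miss], rcond=None)
    pred = Xd @ beta                                  # predictions for all rows
    draws = np.zeros((m, miss.sum()))
    obs_idx = np.flatnonzero(~miss)
    for j, i in enumerate(np.flatnonzero(miss)):
        # Donors: the k observed rows whose predicted values are closest
        # to the missing row's prediction.
        donors = obs_idx[np.argsort(np.abs(pred[obs_idx] - pred[i]))[:k]]
        draws[:, j] = y[rng.choice(donors, size=m)]   # sample observed values
    out = y.copy()
    out[miss] = draws.mean(axis=0)                    # mean of the m imputations
    return out

# Hypothetical data: 10 countries, one metric with two missing values.
X = np.arange(10, dtype=float)
y = 2.0 * X + rng.normal(0, 1, 10)
y[[3, 7]] = np.nan
print(pmm_impute(y, X))
```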
Imputation is handled on a per-indicator basis – in most cases imputation will be solely from within the metrics of that indicator. However, a few indicators have external predictors, either data from elsewhere in the InCiSE model or from an external data source. Full details of the imputation approach for each indicator are described in Chapters 3-14.
2.5 Data normalisation
As a result of coming from different sources, the underlying data that drives the InCiSE model has a variety of formats: some are proportions or scores from 0 to 1 or 0 to 100; some are ratings on a scale, or the average of ratings given by a set of assessors/survey participants; and some are counts. The different formats of these data are not easily comparable, and cannot be directly averaged together to produce a combined score. In order to facilitate the comparison and combination of data from different sources, the metrics are normalised so that they are all in a common format.
There are a number of normalisation techniques that could be used. A useful discussion of the different methods is provided in the OECD et al. (2008) Handbook on Constructing Composite Indicators. The InCiSE Index uses min-max normalisation at all stages, as this maintains the underlying distribution of each metric while providing a common scale of 0 to 1. The common scale is of particular benefit, as it helps achieve InCiSE’s goal of assessing relative performance. In the min-max normalisation 0 represents the lowest achieved score and 1 represents the highest achieved score. It is therefore important to note that scoring 0 on a particular metric, indicator or the index itself does not represent poor performance in absolute terms, nor does scoring 1 represent high performance in absolute terms. Rather the country is either the lowest or highest performing of the 38 countries selected.
The min-max normalisation operates via the following mathematical formula:
\[ m_c = \frac{x_c-x_{min}}{x_{max}-x_{min}} \]
For a given country, a metric’s normalised score (\(m_c\)) is calculated as the difference between the country’s original score (\(x_c\)) and the metric’s minimum score (\(x_{min}\)), divided by the range of the metric’s scores (the difference between the metric’s maximum score (\(x_{max}\)) and its minimum score (\(x_{min}\))).
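A minimal sketch of this normalisation, using hypothetical values:

```python
def min_max(scores):
    """Rescale so the lowest achieved score maps to 0 and the highest to 1."""
    lo, hi = min(scores), max(scores)
    return [(x - lo) / (hi - lo) for x in scores]

raw = [3.2, 4.8, 4.0, 2.6]   # hypothetical country scores on one metric
print(min_max(raw))          # [0.2727..., 1.0, 0.6363..., 0.0]
```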
2.6 Calculation of the InCiSE indicators
Once the data has been processed, missing data imputed, and the metrics normalised, the InCiSE indicators can be calculated. There are two stages to the calculation of the indicators: the weighting of the metrics into an aggregate score, and the normalisation of that score.
As outlined in Figure 1.2, the InCiSE data model first groups metrics into themes before aggregating into the indicator scores themselves. These themes are purely structural and scores for them are not computed. The raw score for an indicator follows this formula:
\[ i_c = \sum{(m_{i,c}*w_m*w_t)} \]
A country’s raw score for an indicator (\(i_c\)) is calculated as the sum of the products of each metric within the indicator for that country (\(m_{i,c}\)) with the weight of that metric within its theme (\(w_m\)) and the weight of that theme within the indicator (\(w_t\)). The weighting structure for each indicator is listed in detail in Chapters 3-14. After the raw scores are calculated they are normalised as described in Section 2.5 above.
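As a minimal illustration of this aggregation for one country, the sketch below uses a hypothetical indicator with two themes; all names, scores, and weights are made up:

```python
metrics = {                     # normalised metric scores m_{i,c}
    "theme_a_metric_1": 0.80,
    "theme_a_metric_2": 0.60,
    "theme_b_metric_1": 0.90,
}
w_m = {                         # weight of each metric within its theme (w_m)
    "theme_a_metric_1": 0.5,
    "theme_a_metric_2": 0.5,
    "theme_b_metric_1": 1.0,
}
w_t = {                         # weight of each metric's theme within the indicator (w_t)
    "theme_a_metric_1": 0.6,
    "theme_a_metric_2": 0.6,
    "theme_b_metric_1": 0.4,
}

raw_indicator = sum(metrics[k] * w_m[k] * w_t[k] for k in metrics)
print(round(raw_indicator, 3))  # 0.78; then min-max normalised across countries
```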
2.7 Calculation of the InCiSE Index
The InCiSE Index is an aggregation of the InCiSE indicators. Ideally, the indicators would be combined equally; however, in producing the 2017 Pilot edition the InCiSE Partners felt it important to consider relative data quality. In the 2017 Pilot this was done by placing a lower weight on the indicators measuring ‘attributes’ than on those measuring ‘functions’, as the four attribute indicators were considered to generally have lower data quality than those measuring functions. The 2019 edition builds on this approach to weighting by using the results of the data quality assessment (Section 2.2).
For this approach to weighting, two-thirds of the weighting is allocated on an equal basis, while one third is allocated according to the outcome of the data quality assessment. The weight for an indicator is calculated as follows:
\[ w_i = \left(\frac{2}{3}*\frac{1}{n_i}\right) + \left(\frac{1}{3}*Q_i\right) \]
Here the indicator weight (\(w_i\)) is equal to the product of two-thirds and the equal share (1 divided by \(n_i\), the number of indicators; i.e. 1/12), plus the product of one-third and the data quality weight for the indicator (\(Q_i\)). The data quality weight is calculated by first summing the data quality scores of the 38 selected countries for the indicator. The indicator’s data quality sum is then divided by the sum of all indicators’ data quality sums, in essence giving each indicator’s share of the total data quality across the 38 selected countries. The resulting weights are shown in Table 2.5.
A country’s overall raw index score (\(I_c\)) is thus calculated as the sum of the product of the normalised indicator scores for the country (\(i_c\)) with the indicator weights (\(w_i\)):
\[ I_c = \sum{(i_c * w_i)} \]
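The sketch below illustrates the weighting formula and the aggregation. The two data quality sums and the total are taken from Table 2.5; the indicator scores at the end are made up, and only three of the twelve indicators are shown for brevity:

```python
n_indicators = 12

def indicator_weight(dqa_sum, total_dqa):
    """w_i = (2/3) * (1/n) + (1/3) * Q_i, where Q_i is the indicator's DQA share."""
    return (2 / 3) * (1 / n_indicators) + (1 / 3) * (dqa_sum / total_dqa)

total = 301.749                                    # sum of all indicator DQA scores
print(round(indicator_weight(38.000, total), 3))   # policy making: ~0.098
print(round(indicator_weight(9.271, total), 3))    # capabilities: ~0.066

# Raw index score: weighted sum of a country's normalised indicator scores,
# followed by min-max normalisation across the 38 countries (Section 2.5).
scores = [0.7, 0.4, 0.9]           # hypothetical normalised indicator scores i_c
weights = [0.098, 0.066, 0.088]    # corresponding weights w_i
raw_index = sum(i * w for i, w in zip(scores, weights))
```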
After the raw index scores are calculated, they are normalised as outlined in Section 2.5, resulting in the overall index scores for the 2019 edition of InCiSE.
InCiSE indicator | Sum of data quality scores | Share of total data quality scores | Final weight | Approximate fraction |
---|---|---|---|---|
Capabilities | 9.271 | 3.1% | 6.6% | 1/15 |
Crisis & risk management | 23.967 | 7.9% | 8.2% | 1/12 |
Digital services | 16.855 | 5.6% | 7.4% | 1/13 |
Fiscal & financial management | 29.763 | 9.9% | 8.8% | 1/11 |
HR management | 24.332 | 8.1% | 8.2% | 1/12 |
Inclusiveness | 25.188 | 8.3% | 8.3% | 1/12 |
Integrity | 19.995 | 6.6% | 7.8% | 1/13 |
Openness | 31.100 | 10.3% | 9.0% | 1/11 |
Policy making | 38.000 | 12.6% | 9.8% | 1/10 |
Procurement | 19.500 | 6.5% | 7.7% | 1/13 |
Regulation | 34.510 | 11.4% | 9.4% | 1/11 |
Tax administration | 29.269 | 9.7% | 8.8% | 1/11 |
Overall | 301.749 | 100.0% | 100.0% | |
Table 2.7.A in the original PDF publication
2.8 Data quality considerations
Sections 2.3 and 2.7 illustrate how the data quality assessment described in Section 2.2 is used within the InCiSE model for country selection and indicator weighting.
The assessment can also be used to help interpret the results of the InCiSE Index, both in terms of the quality of the indicators and for country results.
2.8.1 Quality of indicators
The data quality assessment conducts three checks for each indicator: the availability of metrics, the (non-)use of wider public sector data as a proxy, and the recency of the data. Table 2.6 summarises the results of these three checks for each of the indicators.
As discussed in Sections 2.3 and 2.4, there are four indicators where at least one country is missing all data for the indicator. Conversely, there is only one indicator (policy making) where all 38 countries have all data available. When it comes to the use of public sector proxy data, there are six indicators where none of the data is a public sector proxy, giving them a maximum proxy data score of 1, and two indicators (capabilities and digital services) where all the data relates to the wider public sector rather than specifically to the civil service, meaning their maximum proxy score is 0. The recency calculation is a relative assessment in which the oldest data (2012) scores 0 and the most recent data (2018) scores 1 – here only one indicator (policy making) is composed solely of 2018 data, and only one indicator (capabilities) is composed solely of 2012 data.
InCiSE indicator | Data availability (min) | Data availability (max) | Public sector proxy (min) | Public sector proxy (max) | Recency of data (min) | Recency of data (max) | Overall DQA score (min) | Overall DQA score (max) | Countries with max DQA score | Mean DQA score | RAG rating
---|---|---|---|---|---|---|---|---|---|---|---
Capabilities | 0.00 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.33 | 25 | 0.244 | Red
Crisis & risk management | 0.00 | 1 | 0.00 | 1.00 | 0.00 | 0.56 | 0.00 | 0.85 | 18 | 0.631 | Amber
Digital services | 0.00 | 1 | 0.00 | 0.00 | 0.00 | 0.74 | 0.00 | 0.58 | 29 | 0.444 | Amber
Fiscal & financial management | 0.40 | 1 | 0.50 | 1.00 | 0.42 | 0.67 | 0.44 | 0.89 | 19 | 0.783 | Green
HR management | 0.60 | 1 | 0.00 | 0.44 | 0.28 | 0.57 | 0.29 | 0.67 | 34 | 0.640 | Amber
Inclusiveness | 0.63 | 1 | 0.20 | 0.60 | 0.30 | 0.57 | 0.38 | 0.72 | 30 | 0.663 | Amber
Integrity | 0.78 | 1 | 0.00 | 0.18 | 0.43 | 0.53 | 0.40 | 0.57 | 14 | 0.526 | Amber
Openness | 0.30 | 1 | 0.30 | 1.00 | 0.25 | 0.78 | 0.28 | 0.93 | 22 | 0.818 | Green
Policy making | 1.00 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 38 | 1.000 | Green
Procurement | 0.00 | 1 | 0.00 | 0.50 | 0.00 | 0.67 | 0.00 | 0.72 | 18 | 0.513 | Amber
Regulation | 0.35 | 1 | 0.33 | 1.00 | 0.33 | 0.89 | 0.34 | 0.96 | 34 | 0.908 | Green
Tax administration | 0.50 | 1 | 0.33 | 1.00 | 0.22 | 0.56 | 0.35 | 0.85 | 24 | 0.770 | Green
Table 2.8.A in the original PDF publication
- Green: mean DQA score ≥ 0.75
- Amber: mean DQA score between 0.25 and 0.75
- Red: mean DQA score < 0.25
We can also see in Table 2.6 that there is noticeable variation in the number of countries achieving the maximum overall data quality score for each indicator. For policy making all 38 countries achieve the maximum score, while for integrity only 14 countries do.
Besides integrity, three other indicators (crisis & risk management, fiscal & financial management, and procurement) have fewer than 20 countries achieving the maximum score, while three indicators besides policy making have 30 or more countries achieving the maximum score (HR management, inclusiveness, and regulation).
The indicator data quality scores can also be used to create a data-driven red-amber-green (RAG) rating for data quality. Using the mean overall data quality scores for each indicator from the 38 countries selected for the 2019 edition of InCiSE, a ‘green’ rating is assigned to those with a score of 0.75 or higher, ‘amber’ to those with a score between 0.25 and 0.75, and ‘red’ to those with a score below 0.25.
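A minimal sketch of this rule, checked against three of the mean scores in Table 2.6:

```python
def rag_rating(mean_dqa: float) -> str:
    """Green >= 0.75, red < 0.25, amber in between."""
    if mean_dqa >= 0.75:
        return "green"
    if mean_dqa < 0.25:
        return "red"
    return "amber"

assert rag_rating(0.908) == "green"   # regulation
assert rag_rating(0.631) == "amber"   # crisis & risk management
assert rag_rating(0.244) == "red"     # capabilities
```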
However, the data quality assessment does not consider the reliability and validity of each indicator’s construction, and therefore says nothing about how well the indicator represents the concept it is trying to measure. Instead, these data-driven RAG ratings can be combined with a subjective assessment of wider data quality concerns to make an overall assessment of the general ‘quality’ of each indicator. Table 2.7 shows the data quality assessment of each indicator alongside a high-level qualitative assessment of the indicator and a ‘final’ subjective RAG rating.
InCiSE indicator | Mean DQA score | Number of metrics | DQA-based RAG rating | High-level assessment of the reliability and validity of the indicator construction | Final RAG rating
---|---|---|---|---|---
Policy making | 1.000 | 8 | Green | The indicator uses a wide range of metrics that give a broad overview of the concept; however, these come from a single source relying on external expert perception. |
Regulation | 0.908 | 3 | Green | The indicator contains a number of metrics which appear to give a detailed overview of the concept. |
Openness | 0.818 | 10 | Green | The indicator uses a large number of metrics from a wide range of sources that give a broad overview of the concept. |
Fiscal & financial management | 0.783 | 6 | Green | The indicator contains a number of metrics which appear to give a detailed overview of the concept. |
Tax administration | 0.770 | 6 | Green | The indicator has a small number of metrics that give an overview of some aspects of the concept. |
Inclusiveness | 0.663 | 5 | Amber | The indicator has only a small number of metrics, which provide a partial picture of performance across the concept. |
HR management | 0.640 | 9 | Amber | The indicator's metrics give an overview of some aspects of the concept, but several metrics are dependent on external perceptions and public sector proxy data. |
Crisis & risk management | 0.631 | 13 | Amber | The indicator contains a wide range of metrics which provide a broad overview of the concept; however, one of the two data sources focuses solely on natural disaster risk management. |
Integrity | 0.526 | 17 | Amber | The indicator has a large number of metrics that give a broad overview of the concept; however, it relies heavily on external expert perceptions. |
Procurement | 0.513 | 6 | Amber | The indicator has a small number of metrics that give an overview of some aspects of the concept. |
Digital services | 0.444 | 13 | Amber | The indicator relies on a number of metrics from a single source which gives an overview of some aspects of the concept and relies on public sector proxy data. |
Capabilities | 0.244 | 14 | Red | While the indicator has a large number of metrics, these are all drawn from a public sector proxy and date from 2012 to 2015. |
IT for officials | | | | No data available: indicator not measured. |
Innovation | | | | No data available: indicator not measured. |
Internal finance | | | | No data available: indicator not measured. |
Social security administration | | | | The social security administration indicator has been deprecated following an in-depth review. |
Staff engagement | | | | No data available: indicator not measured. |
Table 2.8.B in the original PDF publication
(In the original PDF publication the DQA-based and final RAG ratings are shown as green, amber, red, or X icons.)
Five of the indicators have a mean data quality score of 0.75 or higher, earning them an initial ‘green’ rating. Of these indicators, three retain their green rating after wider considerations of the quality of the indicators are taken into account, meaning that these indicators are considered to provide broad and robust coverage of their respective concepts. Two of the five are demoted from green to amber, reflecting concerns about whether the indicators are sufficiently broad.
Six of the indicators have an initial ‘amber’ rating. Five of these indicators retain their rating, meaning they may only provide partial coverage of the underlying concept or be heavily reliant on one particular data source or type of data. One of the six is demoted from amber to red, reflecting concerns that the indicator provides limited coverage of the underlying concept.
One indicator has an initial ‘red’ rating, driven largely by its lack of recent data and its being composed solely of public sector proxy data. Finally, the social security administration indicator, which was included in the 2017 Pilot, is given a ‘red’ rating following its removal from the 2019 edition of InCiSE due to data quality concerns. This change is discussed further in Chapter 15 and Chapter 17.
2.8.2 Quality of country-level results
Country-level data quality has already been considered to some degree, through the determination of country selection in Section 2.3. However, as with the quality of indicators, the results of the data quality assessment can be used to show the relative quality of the selected countries, which can help improve interpretation of the results of the InCiSE Index.
Table 2.8 presents a detailed overview of the data quality by country. Each country has been given an overall data quality letter “grade” based on its overall data quality score, and for each indicator each country has been given a “RAG” rating.
The overall data quality grades are allocated as follows based on a country’s data quality score rounded to 2 decimal places:
- A+ for those countries that achieve the highest overall data quality assessment scores (i.e. a data quality score of 0.75 or higher when rounded to 2 decimal places)
- A for countries with a data quality score greater than or equal to 0.7 but less than 0.75
- B for countries with a data quality score greater than or equal to 0.65 but less than 0.7
- C for countries with a data quality score greater than or equal to 0.6 but less than 0.65
- D for countries with a data quality score greater than or equal to 0.5 but less than 0.6
For the indicators, a four-category “RAG+” rating system is adopted. The data quality scores have been normalised (using min-max normalisation) by indicator; a short sketch of both rating schemes follows the list:
- A ‘green’ rating is given to those countries with a normalised indicator data quality score of 1 – the country has the best possible data for this indicator.
- An ‘amber’ rating is given to those countries with a normalised indicator data quality score of greater than or equal to 0.5 – the country’s data quality is at least half as good as the ‘best’ possible data for that indicator.
- A ‘red’ rating is given to those countries with a normalised indicator data quality score of less than 0.5 – the country’s data quality is less than half as good as the ‘best’ possible data for that indicator.
- An ‘X’ rating is given to those countries which have no data at all for that indicator – all of the country’s scores for the metrics in that indicator have been imputed.
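A minimal sketch of the two rating schemes described above. The rounding convention follows the text; edge cases in the published tables may differ slightly:

```python
def dqa_grade(score: float) -> str:
    """Letter grade from a country's overall DQA score (rounded to 2 d.p.)."""
    s = round(score, 2)
    if s >= 0.75: return "A+"
    if s >= 0.70: return "A"
    if s >= 0.65: return "B"
    if s >= 0.60: return "C"
    return "D"

def rag_plus(normalised_dqa):
    """Four-category RAG+ rating from a min-max normalised indicator DQA."""
    if normalised_dqa is None: return "X"      # no data: fully imputed
    if normalised_dqa == 1:    return "green"  # best possible data
    if normalised_dqa >= 0.5:  return "amber"
    return "red"

assert dqa_grade(0.757) == "A+"   # United Kingdom
assert dqa_grade(0.501) == "D"    # Croatia
```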
Country code | Overall data quality score | Data quality grade | Percent of metrics available | CAP | CRM | DIG | FFM | HRM | INC | INT | OPN | POL | PRO | REG | TAX |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GBR | 0.757 | A+ | 100% | ||||||||||||
ITA | 0.755 | A+ | 99% | ||||||||||||
POL | 0.755 | A+ | 99% | ||||||||||||
SWE | 0.755 | A+ | 99% | ||||||||||||
NOR | 0.752 | A+ | 99% | ||||||||||||
SVN | 0.750 | A+ | 99% | ||||||||||||
AUT | 0.738 | A | 98% | ||||||||||||
FIN | 0.736 | A | 97% | ||||||||||||
ESP | 0.733 | A | 97% | ||||||||||||
NLD | 0.731 | A | 98% | ||||||||||||
FRA | 0.718 | A | 97% | ||||||||||||
PRT | 0.716 | A | 85% | ||||||||||||
DNK | 0.707 | A | 93% | ||||||||||||
DEU | 0.701 | A | 96% | ||||||||||||
GRC | 0.696 | B | 94% | ||||||||||||
SVK | 0.692 | B | 93% | ||||||||||||
HUN | 0.671 | B | 81% | ||||||||||||
EST | 0.669 | B | 90% | ||||||||||||
CZE | 0.659 | B | 90% | ||||||||||||
TUR | 0.650 | B | 90% | ||||||||||||
MEX | 0.648 | B | 73% | ||||||||||||
NZL | 0.644 | C | 83% | ||||||||||||
CHL | 0.643 | C | 79% | ||||||||||||
CAN | 0.638 | C | 78% | ||||||||||||
KOR | 0.636 | C | 78% | ||||||||||||
BEL | 0.635 | C | 85% | ||||||||||||
LVA | 0.628 | C | 75% | ||||||||||||
CHE | 0.627 | C | 79% | ||||||||||||
AUS | 0.618 | C | 71% | ||||||||||||
LTU | 0.615 | C | 82% | ||||||||||||
IRL | 0.614 | C | 84% | ||||||||||||
JPN | 0.597 | D | 75% | ||||||||||||
USA | 0.579 | D | 74% | ||||||||||||
ISR | 0.578 | D | 72% | ||||||||||||
ISL | 0.563 | D | 68% | ||||||||||||
ROU | 0.529 | D | 66% | ||||||||||||
BGR | 0.511 | D | 66% | ||||||||||||
HRV | 0.501 | D | 65% | ||||||||||||
Table 2.3.C in the original PDF publication
(In the original PDF publication each indicator column contains a green, amber, red, or X rating icon for each country.)
Table 2.8 reveals interesting patterns in data quality:
- Six countries are given an “A+” rating – one has full data for all indicators (i.e. all indicators rated ‘green’), while the other five have just one indicator with an ‘amber’ rating.
- Eight countries achieve an “A” rating – they have generally good coverage of data but typically have two or three indicators rated ‘amber’ or ‘red’; only one country has an indicator where all the data has been imputed (an ‘X’ rating).
- Seven countries achieve a “B” rating for data quality – these countries have a greater number of ‘amber’ and ‘red’ rated indicators, typically four. All but one have at least one ‘red’ rated indicator; one country has one indicator fully imputed, while another has two indicators fully imputed.
- Ten countries achieve a “C” rating for data quality – all countries have at least one ‘red’ rated indicator and eight of the countries have at least one indicator fully imputed.
- Seven countries achieve a “D” rating for data quality – all have both at least one indicator fully imputed and at least one indicator rated ‘red’; four countries have at least four indicators rated ‘red’.
2.9 Comparisons over time
The InCiSE project is still in its infancy, and the methodology for the 2019 Index has built substantially on the foundations of the 2017 Pilot – most of the metrics used in the 2017 Pilot have continued to be used in the 2019 edition. Of the 70 metrics in the 2017 Pilot that are directly comparable to the 2019 edition, 33 have since had updates which are incorporated into the model.
In addition to the 70 metrics carried over from the 2017 Pilot, a further 46 metrics have been incorporated into the InCiSE methodology, bringing the total number of metrics for the 2019 model to 116. Most of these additional metrics (30) are from existing sources. Some have been collected multiple times, but some are new and have no previous data collection. Changes are summarised in Chapter 15.
A further consideration for comparisons over time is the need to deal with different reference dates and frequencies of updating.
Some data is updated on an annual basis, while other data is on a two-year, three-year, or longer update cycle. For example, the data for capabilities has not been updated since it was first collected in 2012. These differing cycles are a function of a variety of factors, such as the perceived pace of change within a given topic area, or the funding and resourcing of the data producers.
As outlined in Section 2.4, the InCiSE model uses imputation methods that rely on statistical techniques to estimate a country’s missing data. While the imputation is based on predictive methods, it is not a firm prediction of what a given country would have scored, and is better understood as indicative. The imputation methods may change between years, and the relationships in the observed data (from which the imputation is drawn) may also change, limiting the reliability of comparing data imputed in one year with data imputed in another.
It may also be the case that at one time point a country did not have data for a given metric but has data at a later time point (or vice versa). This would mean that at one time point the metric’s value would have been imputed.
Comparing a score based on ‘real’ data with one based on imputed estimates is unlikely to be reliable. In addition, as the methodology for InCiSE develops, future versions of the InCiSE Index could adopt back-/forward-casting (i.e. using results from different time points) to improve the quality of the imputation methods. This would also make time-series comparison more complicated or less feasible.
Finally, consideration should be given to the changing country composition. The 2017 Pilot covered 31 countries, while the 2019 edition covers 38 countries. As outlined in Section 2.5, the data is normalised so that country scores are relative to the group of countries selected. This again means it is not possible to directly compare scores from one edition of InCiSE to another, as the scores are relative to the specific data range and country set used for that edition.
As a result of these varied challenges, the InCiSE Partners have decided not to include any comparisons between the 2017 Pilot and the 2019 edition of the InCiSE Index.
Furthermore, the Partners strongly advise against any direct or indirect comparisons being made beyond references to changes in the underlying source data itself (i.e. before the data is imported into the InCiSE data model, processed, imputed and normalised).