2 Methodology of the InCiSE Index

As outlined in Chapter 1, the InCiSE Index is a composite index formed from a series of indicators, each of which is composed of a set of individual metrics. The overall Index is the normalised and weighted average of the scores of the constituent InCiSE indicators. The InCiSE indicators are themselves normalised weighted averages of their individual metrics. The calculation and modelling process to produce the Index is as follows:

  1. Data processing:
    1. Data preparation (Section 2.1)
    2. Data quality assessment (Section 2.2)
    3. Country coverage selection (Section 2.3)
    4. Imputation of missing data (Section 2.4)
    5. Data normalisation (Section 2.5)
  2. Calculation of the InCiSE indicators (Section 2.6):
    1. Raw score calculated as a weighted average of the individual metrics
    2. Raw score normalised to produce the final indicator score
  3. Calculation of the InCiSE Index (Section 2.7):
    1. Raw score calculated as a weighted average of the indicator scores
    2. Raw score normalised to produce the final Index score

This chapter outlines the methodology for each of these different stages, and finishes with a discussion of key data quality considerations in Section 2.8 and comparisons over time in Section 2.9, while Chapters 3-14 provide details on the specific methodology of each of the InCiSE indicators.

2.1 Data preparation

The data for InCiSE comes from a wide range of independent sources, such as the UN’s E-Government Survey, Transparency International’s Global Corruption Barometer, and Bertelsmann’s Sustainable Governance Indicators (SGIs).1 The InCiSE partnership does not produce any of the source data itself or engage in primary data collection.

  • 1 A full list of data sources can be found in the References section at the end of this report.

  The data for the 2019 edition of InCiSE is the latest available as of 30 November 2018. As well as the source metrics, some additional data are collected to aid the imputation of missing data; these data do not directly contribute to the scores and are therefore not included in the published results.

    Some of the source data requires processing before it is suitable for use in the InCiSE calculations and modelling. For example:

    • Binary/multiple categorical data: some of the source data are binary measures (e.g. yes/no questions) or assess multiple categories (e.g. groups subject to whistleblower protection). In most cases this type of data is summed.

    • Individual level microdata: InCiSE uses a custom analysis of the Programme for the International Assessment of Adult Competencies (PIAAC) individual-level microdata to produce country scores. The Opentender data on procurement is on individual contracts, which also requires analysis to produce country scores.

    • Negatively framed data: some of the source data is based on negatively framed questions, where a higher score indicates poorer performance. To align with other metrics, these data are inverted so that higher scores indicate better performance.

    • Calculations against reference data: for the inclusiveness indicator, women’s representation in the civil service/public sector is compared to the labour market in general. Tax administration data from the OECD is published as raw figures; InCiSE uses rates derived from these data, which must therefore be calculated.
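Two of these transformations can be sketched in code. This is a minimal sketch with invented example values, not actual InCiSE source data:

```python
# Hypothetical examples of two of the preparation steps described above.

# Binary/multiple categorical data: sum the categories covered
# (e.g. groups subject to whistleblower protection, coded yes=1 / no=0).
whistleblower_coverage = {"civil_servants": 1, "contractors": 0, "military": 1}
coverage_score = sum(whistleblower_coverage.values())  # 2

# Negatively framed data: invert so that a higher score means better
# performance. For an invented 0-100 scale where 100 is the worst outcome:
perceived_corruption = 35.0                   # higher = worse
inverted_score = 100 - perceived_corruption   # 65.0, higher = better
```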

    Chapters 3-14 outline the underlying source data for each of the indicators, and cover the specific transformations that are applied to the source data. Appendix A outlines the construction and calculation of the composite metrics (metrics calculated from more than a single data point in the original source) that are included in some of the indicators.

    When importing data to the InCiSE model, data is matched against a reference list of 249 countries and territories produced by Arel-Bundock et al. (2018) using the 3-letter ISO 3166-1 alpha-3 codes. Some source data natively uses the 3-letter ISO codes, but some use the 2-letter ISO code, another code system, or a name of the territory (either the official long/short name or a colloquial name). Therefore, as part of data preparation, all country references are converted to the 3-letter ISO country code.
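In code, this matching step amounts to a lookup from whatever identifier a source uses to the alpha-3 code. A minimal sketch with an invented three-country lookup table (the real model uses the full 249-entry reference list of Arel-Bundock et al., 2018):

```python
# Invented miniature lookup table: any known identifier -> ISO 3166-1 alpha-3.
TO_ALPHA3 = {
    "GB": "GBR", "GBR": "GBR", "United Kingdom": "GBR",
    "FR": "FRA", "FRA": "FRA", "France": "FRA",
    "DE": "DEU", "DEU": "DEU", "Germany": "DEU",
}

def to_alpha3(identifier: str) -> str:
    """Convert a 2-letter code, 3-letter code or country name to alpha-3."""
    try:
        return TO_ALPHA3[identifier.strip()]
    except KeyError:
        raise ValueError(f"Unrecognised country identifier: {identifier!r}")
```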

    2.2 Data quality assessment

    In order to provide a clearer understanding of the quality of the InCiSE Index, a data quality assessment has been calculated and published alongside the 2019 edition. This assessment has a dual role: it is an important piece of metadata that helps users of the InCiSE Index better understand the results, and it has also been used to determine the country coverage of the Index. This section describes the method for conducting the data quality assessment. The use of the assessment for country selection and weighting is discussed in Sections 2.3 and 2.7 respectively, while a wider discussion of data quality based on the results of the assessment is provided in Section 2.8.

    The data quality assessment is a purely quantitative exercise based on three factors: data availability, the (non-)use of public sector proxy data, and the recency of the data. The assessment does not include any subjective evaluation of the methodology or quality of the sources from which the underlying data comes.

    The data quality assessment also does not incorporate assessments of the reliability or validity of indicator and index construction. Its purpose is to provide an assessment of easily quantifiable characteristics of the data, which can help in interpreting the InCiSE results both for countries and for indicators.

    For each indicator, the data quality assessment is based on three measures: (1) the proportion of metrics with data; (2) the proportion of metrics that have civil service specific data; and (3) the recency of the data. The simple mean of the three measures is taken as the data quality score for each country for each indicator, and the 12 indicator quality scores are then combined as a simple mean to produce an overall data quality assessment for each country.

    All three measures take a simple assessment of whether data is missing or present as their basis. However, each measure has different weighting rules for the data:

    • Data availability: a missing data point for a metric with a within-indicator weight of 15% gives a greater penalty than a missing data point for a metric with a within-indicator weight of 5%.
    • Civil service data (1) or a public sector proxy (0): data points that come from public sector proxy data are treated as equivalent to being missing.
    • Recency of the data: The reference year of the metric is scaled from 0 (for 2012, the earliest year) to 1 (for 2018, the latest year) and used as the weighting.2
  • 2 For example a datapoint with a reference year of 2013 will be weighted 0.1667, while one with a reference year of 2016 will be weighted 0.6667.
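The recency scaling in the footnote can be written directly as a linear rescaling of the reference year:

```python
def recency_weight(year: int, earliest: int = 2012, latest: int = 2018) -> float:
    """Scale a metric's reference year to a 0-1 weight (0 = oldest, 1 = newest)."""
    return (year - earliest) / (latest - earliest)

# Reproducing the footnote's examples:
print(round(recency_weight(2013), 4))  # 0.1667
print(round(recency_weight(2016), 4))  # 0.6667
```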

    The data quality score (\(DQA_{c,i}\)) for a given country (\(c\)) and indicator (\(i\)) is calculated by multiplying the missing data matrix of the metrics in the indicator for that country (\(d_{c,i}\)) by each of: the within-indicator weighting of the metrics (\(m_i\)), the proxy data status of each metric (\(s_i\)), and the recency of each metric (\(r_i\)). The resulting products are summed and divided by three to give the mean data quality for that country and indicator.

    \[ DQA_{c,i} = \frac{{(d_{c,i} * m_i) + (d_{c,i} * s_i) + (d_{c,i} * r_i)}}{3} \]

    The overall data quality indicator for a country (\(DQA_c\)) is then calculated as the sum of data quality assessment scores of that country for each indicator divided by the number of indicators (\(n_i\)).

    \[ DQA_c = \frac{\sum{DQA_{c,i}}}{n_i} \]
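The two formulas can be sketched as follows, with invented toy values and under the assumption (implied by the stated 0-1 range of the scores) that the proxy-status and recency vectors carry the metric weights multiplied by the proxy indicator and recency weight respectively:

```python
# Toy data for one country and one three-metric indicator (invented values).
d = [1, 1, 0]              # data present (1) or missing (0) per metric
m = [0.5, 0.3, 0.2]        # within-indicator metric weights (sum to 1)
proxy = [1, 1, 0]          # civil service data (1) or public sector proxy (0)
recency = [1.0, 0.5, 1.0]  # scaled reference year per metric

# Assumed interpretation: each measure re-weights the metric weights.
s = [w * p for w, p in zip(m, proxy)]
r = [w * y for w, y in zip(m, recency)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# DQA_{c,i} = (d.m + d.s + d.r) / 3
dqa_ci = (dot(d, m) + dot(d, s) + dot(d, r)) / 3  # (0.8 + 0.8 + 0.65) / 3

# DQA_c = mean of the indicator scores (here just 3 invented ones, not 12).
indicator_scores = [dqa_ci, 0.9, 0.4]
dqa_c = sum(indicator_scores) / len(indicator_scores)
```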

    The data quality assessment scores therefore have a theoretical range from 0 to 1, where 0 represents no metrics being available and 1 represents data being available for all metrics, with all data representing the civil service (i.e. not using a public sector proxy) and all data relating to the latest available year. Table 2.1 illustrates the complex picture of data quality across all countries and indicators.

    The table shows how the maximum data quality achieved varies from 0.333 for capabilities, where the only available data is a public sector proxy and the oldest data in the model, to 1.000 for policy making, where all the available data relates to the civil service and to the latest available year.

    The indicators for openness, fiscal & financial management and crisis & risk management have good data quality (DQA score greater than or equal to 0.5) for a very large number of countries. Other indicators (such as HR management or tax administration) have a moderate number of countries with good data quality, but have a large number of countries with poorer data quality. Finally, some indicators (such as digital services or policy making) have data for only a small number of countries, which is typically due to the source data covering only OECD or EU members (or both).

    Table 2.1: Data quality assessment (DQA) results across the 12 InCiSE indicators and overall, for all 249 countries and territories considered by the InCiSE data model

    | Indicator | Highest country DQA score | Countries with DQA ≥ 0.5 | Countries with 0.5 > DQA > 0 | Countries with DQA = 0 |
    |---|---|---|---|---|
    | Capabilities | 0.333 | 0 | 31 | 218 |
    | Crisis & risk management | 0.855 | 95 | 13 | 141 |
    | Digital services | 0.581 | 34 | 0 | 215 |
    | Fiscal & financial management | 0.889 | 109 | 88 | 52 |
    | HR management | 0.673 | 37 | 83 | 129 |
    | Inclusiveness | 0.722 | 34 | 82 | 133 |
    | Integrity | 0.569 | 30 | 127 | 92 |
    | Openness | 0.928 | 105 | 93 | 51 |
    | Policy making | 1.000 | 41 | 0 | 208 |
    | Procurement | 0.722 | 20 | 24 | 205 |
    | Regulation | 0.963 | 38 | 5 | 206 |
    | Tax administration | 0.852 | 46 | 141 | 62 |
    | Overall data quality assessment | 0.757 | 38 | 162 | 49 |

    (Table 2.2.A in the original PDF publication)

    2.3 Country coverage selection

    For the 2017 Pilot edition of the InCiSE Index, only two countries had data for all 76 metrics, and a simple threshold of 75% data availability plus membership of the OECD was used as the selection criterion for country inclusion. However, analysis of the pilot showed (as Table 2.1 illustrates) that there is a mixed picture of data availability and quality across indicators which is not reflected in this simple threshold. The data quality assessment outlined in Section 2.2 provides a more nuanced way to consider the variation in data availability and quality, and is therefore used to determine which countries are included in the 2019 edition of the InCiSE Index.

    In determining country coverage, the InCiSE Partners decided to use an overall data quality assessment score of 0.5 or greater as the threshold for country inclusion; 38 countries reached this score. Although two further countries would be included if data quality scores were rounded to 1 decimal place, these two countries have lower data availability (57% and 51% of all metrics respectively), which is judged too low for reliable analysis. Therefore, the 38 countries with a data quality score of 0.5 or higher (when rounded to 2 decimal places) are included in the 2019 edition of the InCiSE Index. This includes all 31 countries covered by the InCiSE pilot.

    Table 2.2 provides an overview of the country-level data quality scores for the group of 38 countries. The table shows that for most indicators the 38 countries have generally good data quality. However, for four indicators (capabilities, crisis & risk management, digital services and procurement) there are a small number of countries with no available data at all.

    Table 2.3 provides a summary of the data quality assessment for all 38 countries selected for the 2019 edition of InCiSE, while Table 2.4 provides the assessment for the five countries with the next highest data quality scores. One country (the United Kingdom) achieved the highest overall data quality score of 0.757, followed closely by five others (Italy, Poland, Sweden, Norway and Slovenia). Countries included for the first time in the 2019 edition of the Index are flagged with the “[new]” marker next to their country name in Table 2.3.

    Further discussion of data quality issues is provided at the end of this chapter in Section 2.8, covering both the quality of the indicators and interpretation of country-level results from the InCiSE Index.

    Table 2.2: Data quality assessment (DQA) results for the 38 countries included in the 2019 index

    | Indicator | Lowest country DQA score | Highest country DQA score | Mean country DQA score | Countries with DQA ≥ 0.5 | Countries with 0.5 > DQA > 0 | Countries with DQA = 0 |
    |---|---|---|---|---|---|---|
    | Capabilities | 0.000 | 0.333 | 0.244 | 0 | 38 | 10 |
    | Crisis & risk management | 0.000 | 0.855 | 0.631 | 26 | 12 | 1 |
    | Digital services | 0.000 | 0.581 | 0.444 | 29 | 9 | 9 |
    | Fiscal & financial management | 0.439 | 0.889 | 0.783 | 37 | 1 | 0 |
    | HR management | 0.293 | 0.673 | 0.640 | 35 | 3 | 0 |
    | Inclusiveness | 0.375 | 0.722 | 0.663 | 33 | 5 | 0 |
    | Integrity | 0.402 | 0.569 | 0.526 | 29 | 9 | 0 |
    | Openness | 0.283 | 0.928 | 0.818 | 35 | 3 | 0 |
    | Policy making | 1.000 | 1.000 | 1.000 | 38 | 0 | 0 |
    | Procurement | 0.000 | 0.722 | 0.513 | 20 | 18 | 2 |
    | Regulation | 0.339 | 0.963 | 0.908 | 35 | 3 | 0 |
    | Tax administration | 0.352 | 0.852 | 0.770 | 34 | 4 | 0 |
    | Overall data quality | 0.501 | 0.757 | 0.662 | 38 | 0 | 0 |

    (Table 2.3.A in the original PDF publication)
    Table 2.3: Data quality assessment (DQA) results by country

    | Code | Country | Overall DQA score | Percent of metrics available | Indicators with 0.5 > DQA > 0 | Indicators with DQA = 0 | Missing indicators |
    |---|---|---|---|---|---|---|
    | GBR | United Kingdom | 0.757 | 100% | 1 | 0 | |
    | ITA | Italy | 0.755 | 99% | 1 | 0 | |
    | POL | Poland | 0.755 | 99% | 1 | 0 | |
    | SWE | Sweden | 0.755 | 99% | 1 | 0 | |
    | NOR | Norway | 0.752 | 99% | 1 | 0 | |
    | SVN | Slovenia | 0.750 | 99% | 1 | 0 | |
    | AUT | Austria | 0.738 | 98% | 1 | 0 | |
    | FIN | Finland | 0.736 | 97% | 2 | 0 | |
    | ESP | Spain | 0.733 | 97% | 1 | 0 | |
    | NLD | The Netherlands | 0.731 | 98% | 1 | 0 | |
    | FRA | France | 0.718 | 97% | 2 | 0 | |
    | PRT | Portugal | 0.716 | 85% | 1 | 1 | CAP |
    | DNK | Denmark | 0.707 | 93% | 2 | 0 | |
    | DEU | Germany | 0.701 | 96% | 2 | 0 | |
    | GRC | Greece | 0.696 | 94% | 2 | 0 | |
    | SVK | Slovakia | 0.692 | 93% | 1 | 0 | |
    | HUN | Hungary | 0.671 | 81% | 1 | 1 | CAP |
    | EST | Estonia | 0.669 | 90% | 2 | 0 | |
    | CZE | Czechia | 0.659 | 91% | 3 | 0 | |
    | TUR | Turkey | 0.650 | 90% | 4 | 0 | |
    | MEX | Mexico | 0.648 | 73% | 3 | 2 | CAP, DIG |
    | NZL | New Zealand | 0.644 | 83% | 4 | 1 | DIG |
    | CHL | Chile | 0.643 | 79% | 4 | 1 | DIG |
    | CAN | Canada | 0.638 | 78% | 4 | 1 | DIG |
    | KOR | Republic of Korea | 0.636 | 78% | 4 | 1 | DIG |
    | BEL | Belgium | 0.635 | 85% | 3 | 1 | CRM |
    | LVA | Latvia [new] | 0.628 | 75% | 2 | 1 | CAP |
    | CHE | Switzerland | 0.627 | 79% | 2 | 1 | CAP |
    | AUS | Australia | 0.618 | 71% | 3 | 3 | CAP, DIG, PRO |
    | LTU | Lithuania [new] | 0.615 | 82% | 5 | 0 | |
    | IRL | Ireland | 0.614 | 84% | 4 | 0 | |
    | JPN | Japan | 0.597 | 75% | 5 | 1 | DIG |
    | USA | United States of America | 0.579 | 74% | 4 | 2 | DIG, PRO |
    | ISR | Israel [new] | 0.578 | 72% | 5 | 1 | DIG |
    | ISL | Iceland [new] | 0.563 | 68% | 5 | 1 | CAP |
    | ROU | Romania [new] | 0.529 | 66% | 5 | 1 | CAP |
    | BGR | Bulgaria [new] | 0.511 | 66% | 6 | 1 | CAP |
    | HRV | Croatia [new] | 0.501 | 65% | 6 | 1 | CAP |
    | NA | Mean of 38 countries | 0.635 | 82% | 3 | 1 | |

    (Table 2.3.B in the original PDF publication)
    Table 2.4: Data quality assessment (DQA) results by country for the next five countries after the 38 selected for inclusion in the InCiSE 2019 model

    | Code | Country | Overall DQA score | Percent of metrics available | Indicators with 0.5 > DQA > 0 | Indicators with DQA = 0 | Missing indicators |
    |---|---|---|---|---|---|---|
    | COL | Colombia | 0.471 | 57% | 6 | 3 | CAP, DIG, POL |
    | LUX | Luxembourg | 0.460 | 51% | 7 | 2 | CAP, INC |
    | CYP | Cyprus | 0.435 | 64% | 9 | 1 | CRM |
    | CRI | Costa Rica | 0.417 | 48% | 7 | 3 | CAP, DIG, POL |
    | MLT | Malta | 0.375 | 49% | 9 | 2 | CAP, CRM |

    (Table 2.3.B in the original PDF publication)

    2.4 Imputation of missing data

    As seen in Table 2.3, only one country has complete data (i.e. 100% of metrics). The average level of data availability is 86% across the 38 countries, and 7 of the included countries have data availability below the 75% threshold used for the 2017 Pilot, with the lowest level of data availability being 65%. Of the 38 countries, 15 have one indicator with a data quality score of 0 (i.e. no data at all for that indicator), two countries have two such indicators, and one country has three.

    This presents issues for the analysis of the data and for providing an effective method of aggregating the metrics into indicators and an overall index. The 2017 Pilot edition of InCiSE adopted two methods for imputation: multiple imputation using linear regression, and median imputation. For the 2019 edition of InCiSE a decision was made to move fully to a multiple imputation approach, using the ‘predictive mean matching’ (PMM) technique of van Buuren & Groothuis-Oudshoorn (2011). PMM uses correlation, in both the values and the pattern of missing data, to identify, for a country with missing data, those countries in the dataset that closely match it, and randomly selects one of them to supply the replacement value. Following the approach set out by van Buuren (2018), 15 imputations are generated for each missing value (each of which is also iterated 15 times). A simple mean of these 15 imputed values is then calculated and used as the country’s value in the ‘final’ dataset.
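A much-simplified, single-variable sketch of the PMM idea follows. This is illustrative only: the real model uses the full mice algorithm of van Buuren & Groothuis-Oudshoorn (2011), and the regression, donor-selection and iteration details here are assumptions for the sake of a runnable example:

```python
import numpy as np

def pmm_impute(X, y, n_imputations=15, n_donors=5, seed=0):
    """Simplified single-variable predictive mean matching (PMM).

    X: fully observed predictor matrix (n, p); y: target vector with
    np.nan for missing values. Each missing entry is replaced by the
    mean of `n_imputations` donor draws, mirroring the averaging of
    imputations described above.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()
    obs = ~np.isnan(y)
    # Fit a linear regression (with intercept) on the observed rows.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)
    pred = A @ beta
    for i in np.where(~obs)[0]:
        # Donors: observed rows whose predicted value is closest to row i's.
        donors = np.argsort(np.abs(pred[obs] - pred[i]))[:n_donors]
        draws = rng.choice(y[obs][donors], size=n_imputations, replace=True)
        y[i] = draws.mean()  # mean of the imputations, as in InCiSE
    return y
```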

    Imputation is handled on a per-indicator basis – in most cases imputation will be solely from within the metrics of that indicator. However, a few indicators have external predictors, either data from elsewhere in the InCiSE model or from an external data source. Full details of the imputation approach for each indicator are described in Chapters 3-14.

    2.5 Data normalisation

    As a result of coming from different sources, the underlying data that drives the InCiSE model has a variety of formats: some are proportions or scores from 0 to 1 or 0 to 100; some are ratings on a scale, or the average of ratings given by a set of assessors/survey participants; and some are counts. The different formats of these data are not easily comparable, and cannot be directly averaged together to produce a combined score. In order to facilitate the comparison and combination of data from different sources, the metrics are normalised so that they are all in a common format.

    There are a number of normalisation techniques that could be used. A useful discussion of the different methods is provided in the OECD et al. (2008) Handbook on Constructing Composite Indicators. The InCiSE Index uses min-max normalisation at all stages, as this maintains the underlying distribution of each metric while providing a common scale of 0 to 1. The common scale is of particular benefit, as it helps achieve InCiSE’s goal of assessing relative performance. In the min-max normalisation 0 represents the lowest achieved score and 1 represents the highest achieved score. It is therefore important to note that scoring 0 on a particular metric, indicator or the index itself does not represent poor performance in absolute terms, nor does scoring 1 represent high performance in absolute terms. Rather the country is either the lowest or highest performing of the 38 countries selected.

    The min-max normalisation operates via the following mathematical formula:

    \[ m_c = \frac{x_c-x_{min}}{x_{max}-x_{min}} \]

    For a metric for a given country, its normalised score (\(m_c\)) is calculated as the difference between the country’s original score (\(x_c\)) and the metric’s minimum score (\(x_{min}\)), divided by the range of the metric’s scores (the difference between the metric’s maximum score (\(x_{max}\)) and its minimum score (\(x_{min}\))).
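A direct implementation of the formula:

```python
def min_max(scores):
    """Min-max normalise a list of scores to the 0-1 range."""
    lo, hi = min(scores), max(scores)
    return [(x - lo) / (hi - lo) for x in scores]

# The lowest-scoring country maps to 0 and the highest to 1;
# intermediate scores keep their relative position.
min_max([2.0, 4.0, 8.0])  # [0.0, 0.333..., 1.0]
```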

    2.6 Calculation of the InCiSE indicators

    Once the data has been processed, missing data imputed, and the metrics normalised, the InCiSE indicators can be calculated. There are two stages to the calculation of the indicators: the weighting of the metrics into an aggregate score, and the normalisation of that score.

    As outlined in Figure 1.2, the InCiSE data model first groups metrics into themes before aggregating into the indicator scores themselves. These themes are purely structural and scores for them are not computed. The raw score for an indicator follows this formula:

    \[ i_c = \sum{(m_{i,c}*w_m*w_t)} \]

    A country’s raw score for an indicator (\(i_c\)) is calculated as the sum of the product of each metric within the indicator for that country (\(m_{i,c}\)) with the weight of that metric within its theme (\(w_m\)) and the weight of that theme within the indicator (\(w_t\)). The weighting structure for each indicator is listed in detail in Chapters 3-14. After the raw scores are calculated they are normalised as described in Section 2.5 above.
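The aggregation can be sketched with invented metric scores and weights (the real weighting structures are those given in Chapters 3-14):

```python
# Invented example: one indicator with two themes.
# Each entry: (normalised metric score m, weight within theme w_m,
#              theme weight within the indicator w_t).
metrics = [
    (0.8, 0.6, 0.5),  # theme A, metric 1
    (0.4, 0.4, 0.5),  # theme A, metric 2
    (0.9, 1.0, 0.5),  # theme B, single metric
]

# i_c = sum(m * w_m * w_t)
raw_indicator = sum(m * w_m * w_t for m, w_m, w_t in metrics)  # 0.77
```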

    2.7 Calculation of the InCiSE Index

    The InCiSE Index is an aggregation of the InCiSE indicators. Ideally, the indicators would be combined equally; however, in producing the 2017 Pilot edition the InCiSE Partners felt it important to consider relative data quality. In the 2017 Pilot this was done by placing a lower weight on the indicators measuring ‘attributes’ than on those measuring ‘functions’, as the four attribute indicators were considered to generally have lower data quality. The 2019 edition builds on this approach to weighting by using the results of the data quality assessment (Section 2.2).

    For this approach to weighting, two-thirds of the weighting is allocated on an equal basis, while one third is allocated according to the outcome of the data quality assessment. The weight for an indicator is calculated as follows:

    \[ w_i = \left(\frac{2}{3}*\frac{1}{n_i}\right) + \left(\frac{1}{3}*Q_i\right) \]

    Here the indicator weight (\(w_i\)) is equal to the product of two-thirds and the equal share (1 divided by \(n_i\), the number of indicators; i.e. 1/12) plus the product of one-third and the data quality weight for the indicator (\(Q_i\)). The data quality weight is calculated by first summing the data quality scores of the 38 selected countries for the indicator. The indicator’s data quality sum is then divided by the sum of all indicator data quality scores, in essence providing a score that represents that indicator’s share of the total data quality for the 38 selected countries. The resulting weights are shown in Table 2.5.
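Using the data quality sums reported in Table 2.5, the weight calculation can be checked directly:

```python
# Sums of country data quality scores per indicator, from Table 2.5
# (only three of the twelve shown here for brevity).
dq_sums = {
    "Capabilities": 9.271,
    "Openness": 31.100,
    "Policy making": 38.000,
}
TOTAL_DQ = 301.749   # sum over all 12 indicators (Table 2.5)
N_INDICATORS = 12

def indicator_weight(dq_sum: float) -> float:
    """w_i = (2/3) * (1/n_i) + (1/3) * Q_i, where Q_i = dq_sum / TOTAL_DQ."""
    return (2 / 3) * (1 / N_INDICATORS) + (1 / 3) * (dq_sum / TOTAL_DQ)

round(indicator_weight(dq_sums["Capabilities"]), 3)   # 0.066, as in Table 2.5
round(indicator_weight(dq_sums["Policy making"]), 3)  # 0.098
```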

    A country’s overall raw index score (\(I_c\)) is thus calculated as the sum of the product of the normalised indicator scores for the country (\(i_c\)) with the indicator weights (\(w_i\)):

    \[ I_c = \sum{(i_c * w_i)} \]

    After calculating the raw index scores, they are then normalised as outlined in Section 2.5, resulting in the overall index scores for the 2019 edition of InCiSE.

    Table 2.5: InCiSE 2019 indicator weightings

    | InCiSE indicator | Sum of data quality scores | Share of total data quality scores | Final weight | Approximate fraction |
    |---|---|---|---|---|
    | Capabilities | 9.271 | 3.1% | 6.6% | 1/15 |
    | Crisis & risk management | 23.967 | 7.9% | 8.2% | 1/12 |
    | Digital services | 16.855 | 5.6% | 7.4% | 1/13 |
    | Fiscal and financial management | 29.763 | 9.9% | 8.8% | 1/11 |
    | HR management | 24.332 | 8.1% | 8.2% | 1/12 |
    | Inclusiveness | 25.188 | 8.3% | 8.3% | 1/12 |
    | Integrity | 19.995 | 6.6% | 7.8% | 1/13 |
    | Openness | 31.100 | 10.3% | 9.0% | 1/11 |
    | Policy making | 38.000 | 12.6% | 9.8% | 1/10 |
    | Procurement | 19.500 | 6.5% | 7.7% | 1/13 |
    | Regulation | 34.510 | 11.4% | 9.4% | 1/11 |
    | Tax administration | 29.269 | 9.7% | 8.8% | 1/11 |
    | Overall | 301.749 | 100.0% | 100.0% | |

    (Table 2.7.A in the original PDF publication)

    2.8 Data quality considerations

    Sections 2.3 and 2.7 illustrate how the data quality assessment described in Section 2.2 is used within the InCiSE model for country selection and indicator weighting.

    The assessment can also be used to help interpret the results of the InCiSE Index, both in terms of the quality of the indicators and for country results.

    2.8.1 Quality of indicators

    The data quality assessment conducts three checks for each indicator: the availability of metrics, the (non-)use of wider public sector data as a proxy, and the recency of the data. Table 2.6 summarises the results of these three checks for each of the indicators.

    As discussed in Sections 2.3 and 2.4, there are four indicators where at least one country is missing all data for the indicator. Conversely, there is only one indicator (policy making) where all 38 countries have all data available. When it comes to the use of public sector proxy data, there are six indicators where at least one country's data includes no public sector proxy, giving those indicators a maximum proxy data score of 1, and two indicators (capabilities and digital services) where all of the data is a public sector proxy, which means their maximum proxy score is 0. The recency calculation is a relative assessment in which the oldest data (2012) scores 0 and the most recent data (2018) scores 1; only one indicator (policy making) is composed solely of 2018 data, and again only one indicator (capabilities) is composed solely of 2012 data.

    Table 2.6: Summary of data quality metadata for the 38 countries of the InCiSE 2019 Index

    | InCiSE indicator | Data availability (min) | Data availability (max) | Public sector proxy (min) | Public sector proxy (max) | Recency (min) | Recency (max) | Overall DQA (min) | Overall DQA (max) | Countries with max DQA score | Mean DQA score | RAG rating |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | Capabilities | 0.00 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.33 | 25 | 0.244 | Red |
    | Crisis & risk management | 0.00 | 1 | 0.00 | 1.00 | 0.00 | 0.56 | 0.00 | 0.85 | 18 | 0.631 | Amber |
    | Digital services | 0.00 | 1 | 0.00 | 0.00 | 0.00 | 0.74 | 0.00 | 0.58 | 29 | 0.444 | Amber |
    | Fiscal & financial management | 0.40 | 1 | 0.50 | 1.00 | 0.42 | 0.67 | 0.44 | 0.89 | 19 | 0.783 | Green |
    | HR management | 0.60 | 1 | 0.00 | 0.44 | 0.28 | 0.57 | 0.29 | 0.67 | 34 | 0.640 | Amber |
    | Inclusiveness | 0.63 | 1 | 0.20 | 0.60 | 0.30 | 0.57 | 0.38 | 0.72 | 30 | 0.663 | Amber |
    | Integrity | 0.78 | 1 | 0.00 | 0.18 | 0.43 | 0.53 | 0.40 | 0.57 | 14 | 0.526 | Amber |
    | Openness | 0.30 | 1 | 0.30 | 1.00 | 0.25 | 0.78 | 0.28 | 0.93 | 22 | 0.818 | Green |
    | Policy making | 1.00 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 38 | 1.000 | Green |
    | Procurement | 0.00 | 1 | 0.00 | 0.50 | 0.00 | 0.67 | 0.00 | 0.72 | 18 | 0.513 | Amber |
    | Regulation | 0.35 | 1 | 0.33 | 1.00 | 0.33 | 0.89 | 0.34 | 0.96 | 34 | 0.908 | Green |
    | Tax administration | 0.50 | 1 | 0.33 | 1.00 | 0.22 | 0.56 | 0.35 | 0.85 | 24 | 0.770 | Green |

    (Table 2.8.A in the original PDF publication)

    • Green: Mean DQA ≥ 0.75
    • Amber: 0.25 ≤ Mean DQA < 0.75
    • Red: Mean DQA < 0.25

    We can also see in Table 2.6 that there is noticeable variation in the number of countries that achieve the maximum overall data quality score for each indicator. For policy making all 38 countries achieve the maximum score, while for integrity only 14 countries do.

    Besides integrity, three other indicators (crisis & risk management, fiscal & financial management, and procurement) have fewer than 20 countries achieving the maximum score, while three indicators besides policy making have 30 or more countries achieving the maximum score (HR management, inclusiveness, and regulation).

    The indicator data quality scores can also be used to create a data-driven red-amber-green (RAG) rating for data quality. Using the mean overall data quality scores for each indicator from the 38 countries selected for the 2019 edition of InCiSE, a ‘green’ rating is assigned to those with a score of 0.75 or higher, ‘amber’ to those with a score between 0.25 and 0.75, and ‘red’ to those with a score below 0.25.

    However, the data quality assessment does not consider the reliability and validity of each indicator’s construction, and therefore says nothing about how well the indicator represents the concept it is trying to measure. Instead, these data-driven RAG ratings can be combined with a subjective assessment of wider data quality concerns to make an overall assessment of the general ‘quality’ of each indicator. Table 2.7 shows the data quality assessment of each indicator alongside a high-level qualitative assessment and a ‘final’ subjective RAG rating for the indicator.

    Table 2.7: Overall quality assessment ‘RAG’ rating of the 2019 InCiSE indicators

    | InCiSE indicator | Mean DQA score | Number of metrics | DQA-based RAG rating | High-level assessment of the reliability and validity of the indicator construction | Final RAG rating |
    |---|---|---|---|---|---|
    | Policy making | 1.000 | 8 | Green | The indicator uses a wide range of metrics that give a broad overview of the concept; however, these come from a single source relying on external expert perception. | Amber |
    | Regulation | 0.908 | 3 | Green | The indicator contains a number of metrics which appear to give a detailed overview of the concept. | Green |
    | Openness | 0.818 | 10 | Green | The indicator uses a large number of metrics from a wide range of sources that give a broad overview of the concept. | Green |
    | Fiscal & financial management | 0.783 | 6 | Green | The indicator contains a number of metrics which appear to give a detailed overview of the concept. | Green |
    | Tax administration | 0.770 | 6 | Green | The indicator has a small number of metrics that give an overview of some aspects of the concept. | Amber |
    | Inclusiveness | 0.663 | 5 | Amber | The indicator has only a small number of metrics which only provide a partial picture of performance across the concept. | Red |
    | HR management | 0.640 | 9 | Amber | The indicator's metrics give an overview of some aspects of the concept, but several metrics are dependent on external perceptions and public sector proxy data. | Amber |
    | Crisis & risk management | 0.631 | 13 | Amber | The indicator contains a wide range of metrics which provide a broad overview of the concept; however, one of the two data sources focuses solely on natural disaster risk management. | Amber |
    | Integrity | 0.536 | 17 | Amber | The indicator has a large number of metrics that give a broad overview of the concept; however, it relies heavily on external expert perceptions. | Amber |
    | Procurement | 0.513 | 6 | Amber | The indicator has a small number of metrics that give an overview of some aspects of the concept. | Amber |
    | Digital services | 0.444 | 13 | Amber | The indicator relies on a number of metrics from a single source which gives an overview of some aspects of the concept and relies on public sector proxy data. | Amber |
    | Capabilities | 0.244 | 14 | Red | While the indicator has a large number of metrics, these are all drawn from a public sector proxy and date between 2012-2015. | Red |
    | IT for officials | x | | x | No data available: indicator not measured. | x |
    | Innovation | x | | x | No data available: indicator not measured. | x |
    | Internal finance | x | | x | No data available: indicator not measured. | x |
    | Social security administration | x | | x | The social security administration indicator has been deprecated following an in-depth review. | x |
    | Staff engagement | x | | x | No data available: indicator not measured. | x |

    (Table 2.8.B in the original PDF publication)

    Five of the indicators have a mean data quality score of 0.75 or higher, earning them an initial ‘green’ rating. Of these indicators, three retain their green rating after wider considerations of the quality of the indicators are taken into account, meaning that these indicators are considered to provide broad and robust coverage of their respective concepts. Two of the five are demoted from green to amber, reflecting concerns about whether the indicators are sufficiently broad.

    Six of the indicators have an initial ‘amber’ rating. Five of these indicators retain their rating, meaning they may only provide partial coverage of the underlying concept or be heavily reliant on one particular data source or type of data. One of the six is demoted from amber to red, reflecting concerns that the indicator provides limited coverage of the underlying concept.

    One indicator has an initial ‘red’ rating, which is driven largely by its lack of recent data and being solely composed of public sector proxy data. Finally, the social security function, which was included in the 2017 Pilot, is given a ‘red’ rating following its removal from the 2019 edition of InCiSE due to data quality concerns. This change is discussed further in Chapter 15 and Chapter 17.

    2.8.2 Quality of country-level results

    Country-level data quality has already been considered to some degree, through the determination of country selection in Section 2.3. However, as with the quality of indicators, the results of the data quality assessment can be used to show the relative quality of the selected countries, which can help improve interpretation of the results of the InCiSE Index.

    Table 2.8 presents a detailed overview of the data quality by country. Each country has been given an overall data quality letter “grade” based on its overall data quality score, and for each indicator each country has been given a “RAG” rating.

    The overall data quality grades are allocated as follows based on a country’s data quality score rounded to 2 decimal places:

    • A+ for those countries that achieve the highest overall data quality assessment score (i.e. a data quality score of 0.75 when rounded to 2 decimal places)
    • A for countries with a data quality score greater than or equal to 0.7 but less than 0.75
    • B for countries with a data quality score greater than or equal to 0.65 but less than 0.7
    • C for countries with a data quality score greater than or equal to 0.6 but less than 0.65
    • D for countries with a data quality score greater than or equal to 0.5 but less than 0.6
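    As an illustration, the grade allocation above can be expressed as a simple threshold function. This is a hypothetical sketch (the function name is our own, not part of the InCiSE model); note that it compares the raw score directly, whereas the published rule applies rounding to 2 decimal places first.

```python
def quality_grade(score):
    """Illustrative mapping from an overall data quality score to a
    letter grade, using the thresholds listed above. (Sketch only:
    the published rule rounds to 2 decimal places before comparing;
    here the raw score is compared directly.)"""
    if score >= 0.75:
        return "A+"
    elif score >= 0.70:
        return "A"
    elif score >= 0.65:
        return "B"
    elif score >= 0.60:
        return "C"
    else:
        return "D"  # covers the lowest defined band (0.5 to 0.6)

# Examples using overall scores from Table 2.8:
print(quality_grade(0.757))  # GBR -> A+
print(quality_grade(0.718))  # FRA -> A
print(quality_grade(0.597))  # JPN -> D
```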

    For the indicators, a four-category “RAG+” rating system is adopted. The data quality scores have been normalised (using min-max normalisation) by indicator:

    • A ‘green’ rating is given to those countries with a normalised indicator data quality score of 1 – the country has the best possible data for this indicator.

    • An ‘amber’ rating is given to those countries with a normalised indicator data quality score of greater than or equal to 0.5 – the country’s data quality is at least half as good as the ‘best’ possible data for that indicator.

    • A ‘red’ rating is given to those countries with a normalised indicator data quality score of less than 0.5 – the country’s data quality is less than half as good as the ‘best’ possible data for that indicator.

    • An ‘X’ rating is given to those countries which have no data at all for that indicator – i.e. all of the country’s scores for the metrics in that indicator have been imputed.
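    The RAG+ classification above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the InCiSE codebase; the function name, input format, and country codes in the example are invented.

```python
def rag_plus(scores):
    """Illustrative RAG+ rating from per-country data quality scores
    for a single indicator. None marks a country with no data at all
    for the indicator (all metrics imputed)."""
    observed = [v for v in scores.values() if v is not None]
    lo, hi = min(observed), max(observed)
    ratings = {}
    for country, score in scores.items():
        if score is None:
            ratings[country] = "X"
            continue
        # Min-max normalise within the indicator
        norm = (score - lo) / (hi - lo) if hi > lo else 1.0
        if norm == 1:
            ratings[country] = "Green"   # best possible data
        elif norm >= 0.5:
            ratings[country] = "Amber"   # at least half as good as the best
        else:
            ratings[country] = "Red"     # less than half as good as the best
    return ratings

# Hypothetical scores for one indicator:
example = {"AAA": 0.9, "BBB": 0.7, "CCC": 0.4, "DDD": None}
print(rag_plus(example))  # AAA: Green, BBB: Amber, CCC: Red, DDD: X
```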

    Table 2.8: Data quality scores by indicator and country
    Country Overall data quality score Data quality grade Percent of metrics available CAP CRM DIG FFM HRM INC INT OPN POL PRO REG TAX
    GBR 0.757 A+ 100% Green Green Green Green Green Green Green Green Amber Green Green Green
    ITA 0.755 A+ 99% Amber Green Green Green Green Green Green Green Amber Green Green Green
    POL 0.755 A+ 99% Green Green Green Green Green Green Amber Green Amber Green Green Green
    SWE 0.755 A+ 99% Green Green Green Green Green Green Amber Green Amber Green Green Green
    NOR 0.752 A+ 99% Green Green Green Green Green Green Amber Green Amber Green Green Green
    SVN 0.750 A 99% Green Green Green Green Green Green Green Amber Amber Green Green Green
    AUT 0.738 A 98% Green Green Green Amber Green Green Amber Green Amber Green Green Green
    FIN 0.736 A 97% Green Green Green Amber Green Green Red Green Amber Green Green Green
    ESP 0.733 A 97% Amber Green Green Amber Green Green Amber Amber Amber Green Green Green
    NLD 0.731 A 98% Green Green Green Amber Green Green Green Green Amber Amber Green Green
    FRA 0.718 A 97% Amber Green Green Green Green Green Green Green Amber Red Green Green
    PRT 0.716 A 85% x Amber Green Green Green Green Amber Green Amber Green Green Green
    DNK 0.707 A 93% Green Amber Green Amber Green Green x Amber Amber Green Green Amber
    DEU 0.701 A 96% Green Green Green Green Green Red Green Green Amber Green Green x
    GRC 0.696 B 94% Green Green Green Amber Green Green Amber Green Amber Amber Green Red
    SVK 0.692 B 93% Green Amber Green Amber Green Green Amber Red Amber Green Green Amber
    HUN 0.671 B 81% x Amber Green Green Green Red Green Amber Amber Green Green Green
    EST 0.669 B 90% Green Red Green Amber Green Red Green Amber Amber Green Green Amber
    CZE 0.659 B 90% Green Amber Green Green Green x Amber Green Amber Red Green Green
    TUR 0.650 C 90% Green Green Green Amber Green Green Amber Green Amber Red Green x
    MEX 0.648 C 73% x Green x Green Green Green Green Green Amber Amber Green Amber
    NZL 0.644 C 83% Green Green x Green Amber x Amber Green Amber Amber Green Green
    CHL 0.643 C 79% Green Red x Green Green Green Green Green Amber Amber Green Green
    CAN 0.638 C 78% Green Red x Green Green Green Amber Green Amber Amber Green Green
    KOR 0.636 C 78% Green Red x Green Green Green Green Amber Amber Amber Green Green
    BEL 0.635 C 85% Green x Green Amber Green Green Green Green Amber Red Green Green
    LVA 0.628 C 75% x Red Green Red Green Green Amber Red Amber Green Green Green
    CHE 0.627 C 79% x Green Green Amber Green Green Green Red Amber Red Green Amber
    AUS 0.618 C 71% x Green x Green Green Green Amber Green Amber x Green Green
    LTU 0.615 C 82% Green Red Green x Green Green Red x Amber Green Green Green
    IRL 0.614 C 84% Green Red Green Amber Green Green Amber Red Amber Red Green Green
    JPN 0.597 D 75% Green Red x Green Green Green Amber Green Amber Red Green Red
    USA 0.579 D 74% Green Red x Green Green Green Green Green Amber x Amber Amber
    ISR 0.578 D 72% Green Red x Amber Green Green Red Red Amber Amber Green Amber
    ISL 0.563 D 68% x Red Green Red Green Green Red Red Amber Red Green Amber
    ROU 0.529 D 66% x Amber Green Red x x Red Amber Amber Green x Amber
    BGR 0.511 D 66% x Amber Green Red x x Red Amber Amber Red x Green
    HRV 0.501 D 65% x Amber Green Red x x Red Amber Amber Red x Amber
    Table 2.3.C in the original PDF publication

    Table 2.8 reveals several patterns in data quality:

    • Six countries are given an “A+” rating – one has full data for all indicators (i.e. all indicators rated ‘green’), while the other five have just one indicator with an ‘amber’ rating.
    • Eight countries achieve an “A” rating – they have generally good coverage of data but typically have two or three indicators rated ‘amber’ or ‘red’; only one of these countries has an indicator for which all of the data has been imputed (rated ‘X’).
    • Seven countries achieve a “B” rating for data quality – these countries have a greater number of ‘amber’ and ‘red’ rated indicators, typically four. All but one have at least one ‘red’ rated indicator; one country has one indicator fully imputed, while another has two indicators fully imputed.
    • Ten countries achieve a “C” rating for data quality – all countries have at least one ‘red’ rated indicator and eight of the countries have at least one indicator fully imputed.
    • Seven countries achieve a “D” rating for data quality – all have at least one indicator fully imputed and at least one indicator rated ‘red’, and four countries have at least four indicators rated ‘red’.

    2.9 Comparisons over time

    The InCiSE project is still in its infancy, and the methodology for the 2019 Index has built substantially on the foundations of the 2017 Pilot – most of the metrics used in the 2017 Pilot have continued to be used in the 2019 edition. Of the 70 metrics in the 2017 Pilot that are directly comparable to the 2019 edition, 33 have since had updates which are incorporated into the model.

    In addition to the 70 metrics carried over from the 2017 Pilot, a further 46 metrics have been incorporated into the InCiSE methodology, bringing the total number of metrics for the 2019 model to 116. Most of these additional metrics (30) come from existing sources; the remainder come from new sources, some of which have been collected multiple times before, while others have no previous data collection. Changes are summarised in Chapter 15.

    A further consideration for comparisons over time is the need to deal with different reference dates and frequencies of updating.

    Some data are updated on an annual basis, while others follow two-year, three-year, or longer update cycles. For example, the data for capabilities has not been updated since it was first collected in 2012. These differing cycles are a function of a variety of factors, such as the perceived pace of change within a given topic area or the funding and resourcing of the data producers.

    As outlined in Section 2.4, the InCiSE model uses imputation methods which apply statistical techniques to provide an estimate of a country’s missing data. While the imputation is based on predictive methods, it is not a firm prediction of what a given country would have scored, but is better understood as indicative. The imputation methods may change between years, and the relationships in the observed data (from which the imputation is drawn) may also change, limiting the reliability of comparing data imputed in one year with data imputed in another year.

    It may also be the case that a country has no data for a given metric at one time point but does have data at a later time point (or vice versa). This would mean that at one of the time points the metric’s values would have been imputed.

    Comparing a score based on ‘real’ data with one based on imputed estimates is unlikely to be reliable. In addition, as the methodology for InCiSE develops, future versions of the InCiSE Index could adopt back-/forward-casting (i.e. using results from different time points) to improve the quality of the imputation methods. This would also make time-series comparison more complicated or less feasible.

    Finally, consideration should be given to the changing country composition. The 2017 Pilot covered 31 countries, while the 2019 edition covers 38 countries. As outlined in Section 2.5, the data is normalised so that country scores are relative to the group of countries selected. This again means it is not possible to directly compare scores from one edition of InCiSE to another, as the scores are relative to the specific data range and country set used for that edition.
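    The effect of a changing country set on min-max normalised scores can be illustrated with a small hypothetical example (the values below are invented, not InCiSE data): the same underlying score receives a different normalised score when the group of countries changes.

```python
def min_max(values):
    """Min-max normalise a list of scores to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# The middle country's underlying score (0.6) is unchanged, but a
# different country set shifts its normalised score:
print(min_max([0.2, 0.6, 0.8]))  # 0.6 normalises to ~0.67
print(min_max([0.5, 0.6, 0.8]))  # 0.6 normalises to ~0.33
```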

    As a result of these varied challenges, the InCiSE Partners have decided not to include any comparisons between the 2017 Pilot and the 2019 edition of the InCiSE Index.

    Furthermore, the Partners strongly advise against any direct or indirect comparisons being made beyond references to changes in the underlying source data itself (i.e. before the data is imported into the InCiSE data model, processed, imputed and normalised).