2 Methodology of the InCiSE Index

As outlined in Chapter 1, the InCiSE Index is a composite index formed from a series of indicators, each of which is composed of a set of individual metrics. The overall Index is the normalised and weighted average of the scores of the constituent InCiSE indicators. The InCiSE indicators are themselves normalised weighted averages of their individual metrics. The calculation and modelling process to produce the Index is as follows:

  1. Data processing:
    1. Data preparation (Section 2.1)
    2. Data quality assessment (Section 2.2)
    3. Country coverage selection (Section 2.3)
    4. Imputation of missing data (Section 2.4)
    5. Data normalisation (Section 2.5)
  2. Calculation of the InCiSE indicators (Section 2.6):
    1. Raw score calculated as a weighted average of the individual metrics
    2. Raw score normalised to produce the final indicator score
  3. Calculation of the InCiSE Index (Section 2.7):
    1. Raw score calculated as a weighted average of the indicator scores
    2. Raw score normalised to produce the final Index score

This chapter outlines the methodology for each of these different stages, and finishes with a discussion of key data quality considerations in Section 2.8 and comparisons over time in Section 2.9, while Chapters 3-14 provide details on the specific methodology of each of the InCiSE indicators.

2.1 Data preparation

The data for InCiSE comes from a wide range of independent sources, such as the UN’s E-Government Survey, Transparency International’s Global Corruption Barometer, and Bertelsmann’s Sustainable Governance Indicators (SGIs).1 The InCiSE partnership does not produce any of the source data itself or engage in primary data collection.

  • 1 A full list of data sources can be found in the References section at the end of this report.

  The data for the 2019 edition of InCiSE is the latest available as of 30 November 2018. As well as the source metrics, some additional data are collected to aid the imputation of missing data; these data do not directly contribute to the scores and are therefore not included in the published results.

    Some of the source data requires processing before it is suitable for use in the InCiSE calculations and modelling. For example:

    • Binary/multiple categorical data: some of the source data are binary measures (e.g. yes/no questions) or assess multiple categories (e.g. groups subject to whistleblower protection). In most cases this type of data is summed.

    • Individual level microdata: InCiSE uses a custom analysis of the Programme for the International Assessment of Adult Competencies (PIAAC) individual-level microdata to produce country scores. The Opentender data on procurement is on individual contracts, which also requires analysis to produce country scores.

    • Negatively framed data: some of the source data is based on negatively framed questions, where a higher score indicates poorer performance. To align with other metrics, these data are inverted so that higher scores indicate better performance.

    • Calculations against reference data: for the inclusiveness indicator, women’s representation in the civil service/public sector is compared to the labour market in general. Tax administration data from the OECD is published as raw figures; InCiSE uses rates derived from these data, which must therefore be calculated.
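Two of these transformations can be sketched in code. This is a minimal sketch with invented example values, not actual InCiSE source data:

```python
# Hypothetical examples of two of the preparation steps described above.

# Binary/multiple categorical data: sum the categories covered
# (e.g. groups subject to whistleblower protection, coded yes=1 / no=0).
whistleblower_coverage = {"civil_servants": 1, "contractors": 0, "military": 1}
coverage_score = sum(whistleblower_coverage.values())  # 2

# Negatively framed data: invert so that a higher score means better
# performance. For an invented 0-100 scale where 100 is the worst outcome:
perceived_corruption = 35.0                   # higher = worse
inverted_score = 100 - perceived_corruption   # 65.0, higher = better
```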

    Chapters 3-14 outline the underlying source data for each of the indicators, and cover the specific transformations that are applied to the source data. Appendix A outlines the construction and calculation of the composite metrics (metrics calculated from more than a single data point in the original source) that are included in some of the indicators.

    When importing data to the InCiSE model, data is matched against a reference list of 249 countries and territories produced by Arel-Bundock et al. (2018) using the 3-letter ISO 3166-1 alpha-3 codes. Some source data natively uses the 3-letter ISO codes, but some use the 2-letter ISO code, another code system, or a name of the territory (either the official long/short name or a colloquial name). Therefore, as part of data preparation, all country references are converted to the 3-letter ISO country code.
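In code, this matching step amounts to a lookup from whatever identifier a source uses to the alpha-3 code. A minimal sketch with an invented three-country lookup table (the real model uses the full 249-entry reference list of Arel-Bundock et al., 2018):

```python
# Invented miniature lookup table: any known identifier -> ISO 3166-1 alpha-3.
TO_ALPHA3 = {
    "GB": "GBR", "GBR": "GBR", "United Kingdom": "GBR",
    "FR": "FRA", "FRA": "FRA", "France": "FRA",
    "DE": "DEU", "DEU": "DEU", "Germany": "DEU",
}

def to_alpha3(identifier: str) -> str:
    """Convert a 2-letter code, 3-letter code or country name to alpha-3."""
    try:
        return TO_ALPHA3[identifier.strip()]
    except KeyError:
        raise ValueError(f"Unrecognised country identifier: {identifier!r}")
```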

    2.2 Data quality assessment

    In order to provide a clearer understanding of the quality of the InCiSE Index, a data quality assessment has been calculated and published alongside the 2019 edition. This assessment has a dual role: it is an important piece of metadata that helps users of the InCiSE Index better understand the results, and it has also been used to determine the country coverage of the Index. This section describes the method for conducting the data quality assessment. The use of the assessment for country selection and weighting is discussed in Sections 2.3 and 2.7 respectively, while a wider discussion of data quality based on the results of the assessment is provided in Section 2.8.

    The data quality assessment is a purely quantitative exercise based on three factors: data availability, the (non-)use of public sector proxy data, and the recency of the data. The assessment does not include any subjective evaluation of the methodology or quality of the sources from which the underlying data comes.

    The data quality assessment also does not incorporate assessments of the reliability or validity of indicator and index construction. Its purpose is to provide an assessment of easily quantifiable characteristics of the data, which can help in interpreting the InCiSE results both for countries and for indicators.

    For each indicator, the data quality assessment is based on three measures: (1) the proportion of metrics with data; (2) the proportion of metrics that have civil service specific data; and (3) the recency of the data. The simple mean of the three measures is taken as the data quality score for each country for each indicator, and the 12 indicator quality scores are then combined as a simple mean to produce an overall data quality assessment for each country.

    All three measures take a simple assessment of whether data is missing or present as their basis. However, each measure has different weighting rules for the data:

    • Data availability: a missing data point for a metric with a within-indicator weight of 15% gives a greater penalty than a missing data point for a metric with a within-indicator weight of 5%.
    • Civil service data (1) or a public sector proxy (0): data points that come from public sector proxy data are treated as equivalent to being missing.
    • Recency of the data: The reference year of the metric is scaled from 0 (for 2012, the earliest year) to 1 (for 2018, the latest year) and used as the weighting.2
  • 2 For example a datapoint with a reference year of 2013 will be weighted 0.1667, while one with a reference year of 2016 will be weighted 0.6667.
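The recency scaling in the footnote can be written directly as a linear rescaling of the reference year:

```python
def recency_weight(year: int, earliest: int = 2012, latest: int = 2018) -> float:
    """Scale a metric's reference year to a 0-1 weight (0 = oldest, 1 = newest)."""
    return (year - earliest) / (latest - earliest)

# Reproducing the footnote's examples:
print(round(recency_weight(2013), 4))  # 0.1667
print(round(recency_weight(2016), 4))  # 0.6667
```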

    The data quality score (\(DQA_{c,i}\)) for a given country (\(c\)) and indicator (\(i\)) is calculated by multiplying the missing data matrix of the metrics in the indicator for that country (\(d_{c,i}\)) by each of: the within-indicator weighting of the metrics (\(m_i\)), the proxy data status of each metric (\(s_i\)), and the recency of each metric (\(r_i\)). The resulting products are summed and divided by three to give the mean data quality for that country and indicator.

    \[ DQA_{c,i} = \frac{{(d_{c,i} * m_i) + (d_{c,i} * s_i) + (d_{c,i} * r_i)}}{3} \]

    The overall data quality indicator for a country (\(DQA_c\)) is then calculated as the sum of data quality assessment scores of that country for each indicator divided by the number of indicators (\(n_i\)).

    \[ DQA_c = \frac{\sum{DQA_{c,i}}}{n_i} \]
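The two formulas can be sketched as follows, with invented toy values and under the assumption (implied by the stated 0-1 range of the scores) that the proxy-status and recency vectors carry the metric weights multiplied by the proxy indicator and recency weight respectively:

```python
# Toy data for one country and one three-metric indicator (invented values).
d = [1, 1, 0]              # data present (1) or missing (0) per metric
m = [0.5, 0.3, 0.2]        # within-indicator metric weights (sum to 1)
proxy = [1, 1, 0]          # civil service data (1) or public sector proxy (0)
recency = [1.0, 0.5, 1.0]  # scaled reference year per metric

# Assumed interpretation: each measure re-weights the metric weights.
s = [w * p for w, p in zip(m, proxy)]
r = [w * y for w, y in zip(m, recency)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# DQA_{c,i} = (d.m + d.s + d.r) / 3
dqa_ci = (dot(d, m) + dot(d, s) + dot(d, r)) / 3  # (0.8 + 0.8 + 0.65) / 3

# DQA_c = mean of the indicator scores (here just 3 invented ones, not 12).
indicator_scores = [dqa_ci, 0.9, 0.4]
dqa_c = sum(indicator_scores) / len(indicator_scores)
```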

    The data quality assessment scores therefore have a theoretical range from 0 to 1, where 0 represents no metrics being available and 1 represents data being available for all metrics, with all data representing the civil service (i.e. not using a public sector proxy) and all data relating to the latest available year. Table 2.1 illustrates the complex picture of data quality across all countries and indicators.

    The table shows how the maximum data quality achieved varies from 0.333 for capabilities, where the only available data is a public sector proxy and the oldest data in the model, to 1.000 for policy making, where all the available data relates to the civil service and to the latest available year.

    The indicators for openness, fiscal & financial management and crisis & risk management have good data quality (DQA score greater than or equal to 0.5) for a very large number of countries. Other indicators (such as HR management or tax administration) have a moderate number of countries with good data quality, but have a large number of countries with poorer data quality. Finally, some indicators (such as digital services or policy making) have data for only a small number of countries, which is typically due to the source data covering only OECD or EU members (or both).

    Table 2.1: Data quality assessment (DQA) results across the 12 InCiSE indicators and overall, for all 249 countries and territories considered by the InCiSE data model

    | Indicator | Highest country DQA score | Countries with DQA ≥ 0.5 | Countries with 0.5 > DQA > 0 | Countries with DQA = 0 |
    |---|---|---|---|---|
    | Capabilities | 0.333 | 0 | 31 | 218 |
    | Crisis & risk management | 0.855 | 95 | 13 | 141 |
    | Digital services | 0.581 | 34 | 0 | 215 |
    | Fiscal & financial management | 0.889 | 109 | 88 | 52 |
    | HR management | 0.673 | 37 | 83 | 129 |
    | Inclusiveness | 0.722 | 34 | 82 | 133 |
    | Integrity | 0.569 | 30 | 127 | 92 |
    | Openness | 0.928 | 105 | 93 | 51 |
    | Policy making | 1.000 | 41 | 0 | 208 |
    | Procurement | 0.722 | 20 | 24 | 205 |
    | Regulation | 0.963 | 38 | 5 | 206 |
    | Tax administration | 0.852 | 46 | 141 | 62 |
    | Overall data quality assessment | 0.757 | 38 | 162 | 49 |

    (Table 2.2.A in the original PDF publication)

    2.3 Country coverage selection

    For the 2017 Pilot edition of the InCiSE Index, only two countries had data for all 76 metrics, and a simple threshold of 75% data availability plus membership of the OECD was used as the selection criterion for country inclusion. However, analysis of the pilot showed (as Table 2.1 illustrates) that there is a mixed picture of data availability and quality across indicators which is not reflected in this simple threshold. The data quality assessment outlined in Section 2.2 provides a more nuanced way to consider the variation in data availability and quality, and is therefore used to determine which countries are included in the 2019 edition of the InCiSE Index.

    In determining country coverage, the InCiSE Partners decided to use an overall data quality assessment score of 0.5 or greater as the threshold for country inclusion; 38 countries reached this score. Although two further countries would be included if data quality scores were rounded to 1 decimal place, these two countries have lower data availability (57% and 51% of all metrics respectively), which is judged too low for reliable analysis. Therefore, the 38 countries with a data quality score of 0.5 or higher (when rounded to 2 decimal places) are included in the 2019 edition of the InCiSE Index. This includes all 31 countries covered by the InCiSE pilot.

    Table 2.2 provides an overview of the country-level data quality scores for the group of 38 countries. The table shows that for most indicators the 38 countries have generally good data quality. However, for four indicators (capabilities, crisis & risk management, digital services and procurement) there are a small number of countries with no available data at all.

    Table 2.3 provides a summary of the data quality assessment for all 38 countries selected for the 2019 edition of InCiSE, while Table 2.4 provides the assessment for the five countries with the next highest data quality scores. One country (the United Kingdom) achieved the highest overall data quality score of 0.757, followed closely by five others (Italy, Poland, Sweden, Norway and Slovenia). Countries included for the first time in the 2019 edition of the Index are flagged with the “[new]” marker next to their country name in Table 2.3.

    Further discussion of data quality issues is provided at the end of this chapter in Section 2.8, covering both the quality of the indicators and interpretation of country-level results from the InCiSE Index.

    Table 2.2: Data quality assessment (DQA) results for the 38 countries included in the 2019 index

    | Indicator | Lowest country DQA score | Highest country DQA score | Mean country DQA score | Countries with DQA ≥ 0.5 | Countries with 0.5 > DQA > 0 | Countries with DQA = 0 |
    |---|---|---|---|---|---|---|
    | Capabilities | 0.000 | 0.333 | 0.244 | 0 | 38 | 10 |
    | Crisis & risk management | 0.000 | 0.855 | 0.631 | 26 | 12 | 1 |
    | Digital services | 0.000 | 0.581 | 0.444 | 29 | 9 | 9 |
    | Fiscal & financial management | 0.439 | 0.889 | 0.783 | 37 | 1 | 0 |
    | HR management | 0.293 | 0.673 | 0.640 | 35 | 3 | 0 |
    | Inclusiveness | 0.375 | 0.722 | 0.663 | 33 | 5 | 0 |
    | Integrity | 0.402 | 0.569 | 0.526 | 29 | 9 | 0 |
    | Openness | 0.283 | 0.928 | 0.818 | 35 | 3 | 0 |
    | Policy making | 1.000 | 1.000 | 1.000 | 38 | 0 | 0 |
    | Procurement | 0.000 | 0.722 | 0.513 | 20 | 18 | 2 |
    | Regulation | 0.339 | 0.963 | 0.908 | 35 | 3 | 0 |
    | Tax administration | 0.352 | 0.852 | 0.770 | 34 | 4 | 0 |
    | Overall data quality | 0.501 | 0.757 | 0.662 | 38 | 0 | 0 |

    (Table 2.3.A in the original PDF publication)
    Table 2.3: Data quality assessment (DQA) results by country

    | Code | Country | Overall DQA score | Percent of metrics available | Indicators with 0.5 > DQA > 0 | Indicators with DQA = 0 | Missing indicators |
    |---|---|---|---|---|---|---|
    | GBR | United Kingdom | 0.757 | 100% | 1 | 0 | |
    | ITA | Italy | 0.755 | 99% | 1 | 0 | |
    | POL | Poland | 0.755 | 99% | 1 | 0 | |
    | SWE | Sweden | 0.755 | 99% | 1 | 0 | |
    | NOR | Norway | 0.752 | 99% | 1 | 0 | |
    | SVN | Slovenia | 0.750 | 99% | 1 | 0 | |
    | AUT | Austria | 0.738 | 98% | 1 | 0 | |
    | FIN | Finland | 0.736 | 97% | 2 | 0 | |
    | ESP | Spain | 0.733 | 97% | 1 | 0 | |
    | NLD | The Netherlands | 0.731 | 98% | 1 | 0 | |
    | FRA | France | 0.718 | 97% | 2 | 0 | |
    | PRT | Portugal | 0.716 | 85% | 1 | 1 | CAP |
    | DNK | Denmark | 0.707 | 93% | 2 | 0 | |
    | DEU | Germany | 0.701 | 96% | 2 | 0 | |
    | GRC | Greece | 0.696 | 94% | 2 | 0 | |
    | SVK | Slovakia | 0.692 | 93% | 1 | 0 | |
    | HUN | Hungary | 0.671 | 81% | 1 | 1 | CAP |
    | EST | Estonia | 0.669 | 90% | 2 | 0 | |
    | CZE | Czechia | 0.659 | 91% | 3 | 0 | |
    | TUR | Turkey | 0.650 | 90% | 4 | 0 | |
    | MEX | Mexico | 0.648 | 73% | 3 | 2 | CAP, DIG |
    | NZL | New Zealand | 0.644 | 83% | 4 | 1 | DIG |
    | CHL | Chile | 0.643 | 79% | 4 | 1 | DIG |
    | CAN | Canada | 0.638 | 78% | 4 | 1 | DIG |
    | KOR | Republic of Korea | 0.636 | 78% | 4 | 1 | DIG |
    | BEL | Belgium | 0.635 | 85% | 3 | 1 | CRM |
    | LVA | Latvia [new] | 0.628 | 75% | 2 | 1 | CAP |
    | CHE | Switzerland | 0.627 | 79% | 2 | 1 | CAP |
    | AUS | Australia | 0.618 | 71% | 3 | 3 | CAP, DIG, PRO |
    | LTU | Lithuania [new] | 0.615 | 82% | 5 | 0 | |
    | IRL | Ireland | 0.614 | 84% | 4 | 0 | |
    | JPN | Japan | 0.597 | 75% | 5 | 1 | DIG |
    | USA | United States of America | 0.579 | 74% | 4 | 2 | DIG, PRO |
    | ISR | Israel [new] | 0.578 | 72% | 5 | 1 | DIG |
    | ISL | Iceland [new] | 0.563 | 68% | 5 | 1 | CAP |
    | ROU | Romania [new] | 0.529 | 66% | 5 | 1 | CAP |
    | BGR | Bulgaria [new] | 0.511 | 66% | 6 | 1 | CAP |
    | HRV | Croatia [new] | 0.501 | 65% | 6 | 1 | CAP |
    | NA | Mean of 38 countries | 0.635 | 82% | 3 | 1 | |

    (Table 2.3.B in the original PDF publication)
    Table 2.4: Data quality assessment (DQA) results by country for the next five countries after the 38 selected for inclusion in the InCiSE 2019 model

    | Code | Country | Overall DQA score | Percent of metrics available | Indicators with 0.5 > DQA > 0 | Indicators with DQA = 0 | Missing indicators |
    |---|---|---|---|---|---|---|
    | COL | Colombia | 0.471 | 57% | 6 | 3 | CAP, DIG, POL |
    | LUX | Luxembourg | 0.460 | 51% | 7 | 2 | CAP, INC |
    | CYP | Cyprus | 0.435 | 64% | 9 | 1 | CRM |
    | CRI | Costa Rica | 0.417 | 48% | 7 | 3 | CAP, DIG, POL |
    | MLT | Malta | 0.375 | 49% | 9 | 2 | CAP, CRM |

    (Table 2.3.B in the original PDF publication)

    2.4 Imputation of missing data

    As seen in Table 2.3, only one country has complete data (i.e. 100% of metrics). The average level of data availability is 86% across the 38 countries, and 7 of the included countries have data availability below the 75% threshold used for the 2017 Pilot, with the lowest level of data availability being 65%. Of the 38 countries, 15 have one indicator with a data quality score of 0 (i.e. no data at all for that indicator), two countries have two such indicators, and one country has three.

    This presents issues for the analysis of the data and for providing an effective method of aggregating the metrics into indicators and an overall index. The 2017 Pilot edition of InCiSE adopted two methods for imputation: multiple imputation using linear regression, and median imputation. For the 2019 edition of InCiSE a decision was made to move fully to a multiple imputation approach, using the ‘predictive mean matching’ (PMM) technique of van Buuren & Groothuis-Oudshoorn (2011). PMM uses correlation, in both the values and the pattern of missing data, to identify, for a country with missing data, those countries in the dataset that closely match it, and randomly selects one of them to supply the replacement value. Following the approach set out by van Buuren (2018), 15 imputations are generated for each missing value (each of which is also iterated 15 times). A simple mean of these 15 imputed values is then calculated and used as the country’s value in the ‘final’ dataset.
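A much-simplified, single-variable sketch of the PMM idea follows. This is illustrative only: the real model uses the full mice algorithm of van Buuren & Groothuis-Oudshoorn (2011), and the regression, donor-selection and iteration details here are assumptions for the sake of a runnable example:

```python
import numpy as np

def pmm_impute(X, y, n_imputations=15, n_donors=5, seed=0):
    """Simplified single-variable predictive mean matching (PMM).

    X: fully observed predictor matrix (n, p); y: target vector with
    np.nan for missing values. Each missing entry is replaced by the
    mean of `n_imputations` donor draws, mirroring the averaging of
    imputations described above.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()
    obs = ~np.isnan(y)
    # Fit a linear regression (with intercept) on the observed rows.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)
    pred = A @ beta
    for i in np.where(~obs)[0]:
        # Donors: observed rows whose predicted value is closest to row i's.
        donors = np.argsort(np.abs(pred[obs] - pred[i]))[:n_donors]
        draws = rng.choice(y[obs][donors], size=n_imputations, replace=True)
        y[i] = draws.mean()  # mean of the imputations, as in InCiSE
    return y
```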

    Imputation is handled on a per-indicator basis – in most cases imputation will be solely from within the metrics of that indicator. However, a few indicators have external predictors, either data from elsewhere in the InCiSE model or from an external data source. Full details of the imputation approach for each indicator are described in Chapters 3-14.

    2.5 Data normalisation

    As a result of coming from different sources, the underlying data that drives the InCiSE model has a variety of formats: some are proportions or scores from 0 to 1 or 0 to 100; some are ratings on a scale, or the average of ratings given by a set of assessors/survey participants; and some are counts. The different formats of these data are not easily comparable, and cannot be directly averaged together to produce a combined score. In order to facilitate the comparison and combination of data from different sources, the metrics are normalised so that they are all in a common format.

    There are a number of normalisation techniques that could be used. A useful discussion of the different methods is provided in the OECD et al. (2008) Handbook on Constructing Composite Indicators. The InCiSE Index uses min-max normalisation at all stages, as this maintains the underlying distribution of each metric while providing a common scale of 0 to 1. The common scale is of particular benefit, as it helps achieve InCiSE’s goal of assessing relative performance. In the min-max normalisation 0 represents the lowest achieved score and 1 represents the highest achieved score. It is therefore important to note that scoring 0 on a particular metric, indicator or the index itself does not represent poor performance in absolute terms, nor does scoring 1 represent high performance in absolute terms. Rather the country is either the lowest or highest performing of the 38 countries selected.

    The min-max normalisation operates via the following mathematical formula:

    \[ m_c = \frac{x_c-x_{min}}{x_{max}-x_{min}} \]

    For a metric for a given country, its normalised score (\(m_c\)) is calculated as the difference between the country’s original score (\(x_c\)) and the metric’s minimum score (\(x_{min}\)), divided by the range of the metric’s scores (the difference between the metric’s maximum score (\(x_{max}\)) and its minimum score (\(x_{min}\))).
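A direct implementation of the formula:

```python
def min_max(scores):
    """Min-max normalise a list of scores to the 0-1 range."""
    lo, hi = min(scores), max(scores)
    return [(x - lo) / (hi - lo) for x in scores]

# The lowest-scoring country maps to 0 and the highest to 1;
# intermediate scores keep their relative position.
min_max([2.0, 4.0, 8.0])  # [0.0, 0.333..., 1.0]
```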

    2.6 Calculation of the InCiSE indicators

    Once the data has been processed, missing data imputed, and the metrics normalised, the InCiSE indicators can be calculated. There are two stages to the calculation of the indicators: the weighting of the metrics into an aggregate score, and the normalisation of that score.

    As outlined in Figure 1.2, the InCiSE data model first groups metrics into themes before aggregating into the indicator scores themselves. These themes are purely structural and scores for them are not computed. The raw score for an indicator follows this formula:

    \[ i_c = \sum{(m_{i,c}*w_m*w_t)} \]

    A country’s raw score for an indicator (\(i_c\)) is calculated as the sum of the product of each metric within the indicator for that country (\(m_{i,c}\)) with the weight of that metric within its theme (\(w_m\)) and the weight of that theme within the indicator (\(w_t\)). The weighting structure for each indicator is listed in detail in Chapters 3-14. After the raw scores are calculated they are normalised as described in Section 2.5 above.
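The aggregation can be sketched with invented metric scores and weights (the real weighting structures are those given in Chapters 3-14):

```python
# Invented example: one indicator with two themes.
# Each entry: (normalised metric score m, weight within theme w_m,
#              theme weight within the indicator w_t).
metrics = [
    (0.8, 0.6, 0.5),  # theme A, metric 1
    (0.4, 0.4, 0.5),  # theme A, metric 2
    (0.9, 1.0, 0.5),  # theme B, single metric
]

# i_c = sum(m * w_m * w_t)
raw_indicator = sum(m * w_m * w_t for m, w_m, w_t in metrics)  # 0.77
```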

    2.7 Calculation of the InCiSE Index

    The InCiSE Index is an aggregation of the InCiSE indicators. Ideally, the indicators would be combined equally; however, in producing the 2017 Pilot edition the InCiSE Partners felt it important to consider relative data quality. In the 2017 Pilot this was done by placing a lower weight on the indicators measuring ‘attributes’ than on those measuring ‘functions’, as the four attribute indicators were considered to generally have lower data quality. The 2019 edition builds on this approach to weighting by using the results of the data quality assessment (Section 2.2).

    For this approach to weighting, two-thirds of the weighting is allocated on an equal basis, while one third is allocated according to the outcome of the data quality assessment. The weight for an indicator is calculated as follows:

    \[ w_i = \left(\frac{2}{3}*\frac{1}{n_i}\right) + \left(\frac{1}{3}*Q_i\right) \]

    Here the indicator weight (\(w_i\)) is equal to the product of two-thirds and the equal share (1 divided by \(n_i\), the number of indicators; i.e. 1/12) plus the product of one-third and the data quality weight for the indicator (\(Q_i\)). The data quality weight is calculated by first summing the data quality scores of the 38 selected countries for the indicator. The indicator’s data quality sum is then divided by the sum of all indicator data quality scores, in essence providing a score that represents that indicator’s share of the total data quality for the 38 selected countries. The resulting weights are shown in Table 2.5.
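Using the data quality sums reported in Table 2.5, the weight calculation can be checked directly:

```python
# Sums of country data quality scores per indicator, from Table 2.5
# (only three of the twelve shown here for brevity).
dq_sums = {
    "Capabilities": 9.271,
    "Openness": 31.100,
    "Policy making": 38.000,
}
TOTAL_DQ = 301.749   # sum over all 12 indicators (Table 2.5)
N_INDICATORS = 12

def indicator_weight(dq_sum: float) -> float:
    """w_i = (2/3) * (1/n_i) + (1/3) * Q_i, where Q_i = dq_sum / TOTAL_DQ."""
    return (2 / 3) * (1 / N_INDICATORS) + (1 / 3) * (dq_sum / TOTAL_DQ)

round(indicator_weight(dq_sums["Capabilities"]), 3)   # 0.066, as in Table 2.5
round(indicator_weight(dq_sums["Policy making"]), 3)  # 0.098
```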

    A country’s overall raw index score (\(I_c\)) is thus calculated as the sum of the product of the normalised indicator scores for the country (\(i_c\)) with the indicator weights (\(w_i\)):

    \[ I_c = \sum{(i_c * w_i)} \]

    After calculating the raw index scores, they are then normalised as outlined in Section 2.5, resulting in the overall index scores for the 2019 edition of InCiSE.

    Table 2.5: InCiSE 2019 indicator weightings

    | InCiSE indicator | Sum of data quality scores | Share of total data quality scores | Final weight | Approximate fraction |
    |---|---|---|---|---|
    | Capabilities | 9.271 | 3.1% | 6.6% | 1/15 |
    | Crisis & risk management | 23.967 | 7.9% | 8.2% | 1/12 |
    | Digital services | 16.855 | 5.6% | 7.4% | 1/13 |
    | Fiscal and financial management | 29.763 | 9.9% | 8.8% | 1/11 |
    | HR management | 24.332 | 8.1% | 8.2% | 1/12 |
    | Inclusiveness | 25.188 | 8.3% | 8.3% | 1/12 |
    | Integrity | 19.995 | 6.6% | 7.8% | 1/13 |
    | Openness | 31.100 | 10.3% | 9.0% | 1/11 |
    | Policy making | 38.000 | 12.6% | 9.8% | 1/10 |
    | Procurement | 19.500 | 6.5% | 7.7% | 1/13 |
    | Regulation | 34.510 | 11.4% | 9.4% | 1/11 |
    | Tax administration | 29.269 | 9.7% | 8.8% | 1/11 |
    | Overall | 301.749 | 100.0% | 100.0% | |

    (Table 2.7.A in the original PDF publication)

    2.8 Data quality considerations

    Sections 2.3 and 2.7 illustrate how the data quality assessment described in Section 2.2 is used within the InCiSE model for country selection and indicator weighting.

    The assessment can also be used to help interpret the results of the InCiSE Index, both in terms of the quality of the indicators and for country results.

    2.8.1 Quality of indicators

    The data quality assessment conducts three checks for each indicator: the availability of metrics, the (non-)use of wider public sector data as a proxy, and the recency of the data. Table 2.6 summarises the results of these three checks for each of the indicators.

    As discussed in Sections 2.3 and 2.4, there are four indicators where at least one country is missing all data for the indicator. Conversely, there is only one indicator (policy making) where all 38 countries have all data available. When it comes to the use of public sector proxy data, there are six indicators where at least one country's data includes no public sector proxy, giving those indicators a maximum proxy data score of 1, and two indicators (capabilities and digital services) where all of the data is a public sector proxy, which means their maximum proxy score is 0. The recency calculation is a relative assessment in which the oldest data (2012) scores 0 and the most recent data (2018) scores 1; only one indicator (policy making) is composed solely of 2018 data, and again only one indicator (capabilities) is composed solely of 2012 data.

    Table 2.6: Summary of data quality metadata for the 38 countries of the InCiSE 2019 Index

    | InCiSE indicator | Data availability (min) | Data availability (max) | Public sector proxy (min) | Public sector proxy (max) | Recency (min) | Recency (max) | Overall DQA (min) | Overall DQA (max) | Countries with max DQA score | Mean DQA score | RAG rating |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | Capabilities | 0.00 | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.33 | 25 | 0.244 | Red |
    | Crisis & risk management | 0.00 | 1 | 0.00 | 1.00 | 0.00 | 0.56 | 0.00 | 0.85 | 18 | 0.631 | Amber |
    | Digital services | 0.00 | 1 | 0.00 | 0.00 | 0.00 | 0.74 | 0.00 | 0.58 | 29 | 0.444 | Amber |
    | Fiscal & financial management | 0.40 | 1 | 0.50 | 1.00 | 0.42 | 0.67 | 0.44 | 0.89 | 19 | 0.783 | Green |
    | HR management | 0.60 | 1 | 0.00 | 0.44 | 0.28 | 0.57 | 0.29 | 0.67 | 34 | 0.640 | Amber |
    | Inclusiveness | 0.63 | 1 | 0.20 | 0.60 | 0.30 | 0.57 | 0.38 | 0.72 | 30 | 0.663 | Amber |
    | Integrity | 0.78 | 1 | 0.00 | 0.18 | 0.43 | 0.53 | 0.40 | 0.57 | 14 | 0.526 | Amber |
    | Openness | 0.30 | 1 | 0.30 | 1.00 | 0.25 | 0.78 | 0.28 | 0.93 | 22 | 0.818 | Green |
    | Policy making | 1.00 | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 38 | 1.000 | Green |
    | Procurement | 0.00 | 1 | 0.00 | 0.50 | 0.00 | 0.67 | 0.00 | 0.72 | 18 | 0.513 | Amber |
    | Regulation | 0.35 | 1 | 0.33 | 1.00 | 0.33 | 0.89 | 0.34 | 0.96 | 34 | 0.908 | Green |
    | Tax administration | 0.50 | 1 | 0.33 | 1.00 | 0.22 | 0.56 | 0.35 | 0.85 | 24 | 0.770 | Green |

    (Table 2.8.A in the original PDF publication)

    • Green: Mean DQA ≥ 0.75
    • Amber: 0.25 ≤ Mean DQA < 0.75
    • Red: Mean DQA < 0.25

    We can also see in Table 2.6 that there is noticeable variation in the number of countries that achieve the maximum overall data quality score for each indicator. For policy making all 38 countries achieve the maximum score, while for integrity only 14 countries do.

    Besides integrity, three other indicators (crisis & risk management, fiscal & financial management, and procurement) have fewer than 20 countries achieving the maximum score, while three indicators besides policy making have 30 or more countries achieving the maximum score (HR management, inclusiveness, and regulation).

    The indicator data quality scores can also be used to create a data-driven red-amber-green (RAG) rating for data quality. Using the mean overall data quality scores for each indicator from the 38 countries selected for the 2019 edition of InCiSE, a ‘green’ rating is assigned to those with a score of 0.75 or higher, ‘amber’ to those with a score between 0.25 and 0.75, and ‘red’ to those with a score below 0.25.

    However, the data quality assessment does not consider the reliability and validity of each indicator’s construction, and therefore says nothing about how well the indicator represents the concept it is trying to measure. Instead, these data-driven RAG ratings can be combined with a subjective assessment of wider data quality concerns to make an overall assessment of the general ‘quality’ of each indicator. Table 2.7 shows the data quality assessment of each indicator alongside a high-level qualitative assessment and a ‘final’ subjective RAG rating for the indicator.

    Table 2.7: Overall quality assessment ‘RAG’ rating of the 2019 InCiSE indicators

    | InCiSE indicator | Mean DQA score | Number of metrics | DQA-based RAG rating | High-level assessment of the reliability and validity of the indicator construction | Final RAG rating |
    |---|---|---|---|---|---|
    | Policy making | 1.000 | 8 | Green | The indicator uses a wide range of metrics that give a broad overview of the concept; however, these come from a single source relying on external expert perception. | Amber |
    | Regulation | 0.908 | 3 | Green | The indicator contains a number of metrics which appear to give a detailed overview of the concept. | Green |
    | Openness | 0.818 | 10 | Green | The indicator uses a large number of metrics from a wide range of sources that give a broad overview of the concept. | Green |
    | Fiscal & financial management | 0.783 | 6 | Green | The indicator contains a number of metrics which appear to give a detailed overview of the concept. | Green |
    | Tax administration | 0.770 | 6 | Green | The indicator has a small number of metrics that give an overview of some aspects of the concept. | Amber |
    | Inclusiveness | 0.663 | 5 | Amber | The indicator has only a small number of metrics which only provide a partial picture of performance across the concept. | Red |
    | HR management | 0.640 | 9 | Amber | The indicator's metrics give an overview of some aspects of the concept, but several metrics are dependent on external perceptions and public sector proxy data. | Amber |
    | Crisis & risk management | 0.631 | 13 | Amber | The indicator contains a wide range of metrics which provide a broad overview of the concept; however, one of the two data sources focuses solely on natural disaster risk management. | Amber |
    | Integrity | 0.536 | 17 | Amber | The indicator has a large number of metrics that give a broad overview of the concept; however, it relies heavily on external expert perceptions. | Amber |
    | Procurement | 0.513 | 6 | Amber | The indicator has a small number of metrics that give an overview of some aspects of the concept. | Amber |
    | Digital services | 0.444 | 13 | Amber | The indicator relies on a number of metrics from a single source which gives an overview of some aspects of the concept and relies on public sector proxy data. | Amber |
    | Capabilities | 0.244 | 14 | Red | While the indicator has a large number of metrics, these are all drawn from a public sector proxy and date between 2012-2015. | Red |
    | IT for officials | x | | x | No data available: indicator not measured. | x |
    | Innovation | x | | x | No data available: indicator not measured. | x |
    | Internal finance | x | | x | No data available: indicator not measured. | x |
    | Social security administration | x | | x | The social security administration indicator has been deprecated following an in-depth review. | x |
    | Staff engagement | x | | x | No data available: indicator not measured. | x |

    (Table 2.8.B in the original PDF publication)

    Five of the indicators have a mean data quality score of 0.75 or higher, earning them an initial ‘green’ rating. Of these indicators, three retain their green rating after wider considerations of the quality of the indicators are taken into account, meaning that these indicators are considered to provide broad and robust coverage of their respective concepts. Two of the five are demoted from green to amber, reflecting concerns about whether the indicators are sufficiently broad.

    Six of the indicators have an initial ‘amber’ rating. Five of these indicators retain their rating, meaning they may only provide partial coverage of the underlying concept or be heavily reliant on one particular data source or type of data. One of the six is demoted from amber to red, reflecting concerns that the indicator provides limited coverage of the underlying concept.

    One indicator has an initial ‘red’ rating, which is driven largely by its lack of recent data and being solely composed of public sector proxy data. Finally, the social security function, which was included in the 2017 Pilot, is given a ‘red’ rating following its removal from the 2019 edition of InCiSE due to data quality concerns. This change is discussed further in Chapter 15 and Chapter 17.

    2.8.2 Quality of country-level results

    Country-level data quality has already been considered to some degree, through the determination of country selection in Section 2.3. However, as with the quality of indicators, the results of the data quality assessment can be used to show the relative quality of the selected countries, which can help improve interpretation of the results of the InCiSE Index.

    Table 2.8 presents a detailed overview of the data quality by country. Each country has been given an overall data quality letter “grade” based on its overall data quality score, and for each indicator each country has been given a “RAG” rating.

    The overall data quality grades are allocated as follows based on a country’s data quality score rounded to 2 decimal places:

    • A+ for those countries that achieve the highest overall data quality assessment score (i.e. a data quality score of 0.75 when rounded to 2 decimal places)
    • A for countries with a data quality score greater than or equal to 0.7 but less than 0.75
    • B for countries with a data quality score greater than or equal to 0.65 but less than 0.7
    • C for countries with a data quality score greater than or equal to 0.6 but less than 0.65
    • D for countries with a data quality score greater than or equal to 0.5 but less than 0.6
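    As an illustration, the grade allocation above can be expressed as a simple threshold function. This is a hypothetical sketch (the function name is our own, not part of the InCiSE model); note that it compares the raw score directly, whereas the published rule applies rounding to 2 decimal places first.

```python
def quality_grade(score):
    """Illustrative mapping from an overall data quality score to a
    letter grade, using the thresholds listed above. (Sketch only:
    the published rule rounds to 2 decimal places before comparing;
    here the raw score is compared directly.)"""
    if score >= 0.75:
        return "A+"
    elif score >= 0.70:
        return "A"
    elif score >= 0.65:
        return "B"
    elif score >= 0.60:
        return "C"
    else:
        return "D"  # covers the lowest defined band (0.5 to 0.6)

# Examples using overall scores from Table 2.8:
print(quality_grade(0.757))  # GBR -> A+
print(quality_grade(0.718))  # FRA -> A
print(quality_grade(0.597))  # JPN -> D
```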

    For the indicators, a four-category “RAG+” rating system is adopted. The data quality scores have been normalised (using min-max normalisation) by indicator:

    • A ‘green’ rating is given to those countries with a normalised indicator data quality score of 1 – the country has the best possible data for this indicator.

    • An ‘amber’ rating is given to those countries with a normalised indicator data quality score of greater than or equal to 0.5 – the country’s data quality is at least half as good as the ‘best’ possible data for that indicator.

    • A ‘red’ rating is given to those countries with a normalised indicator data quality score of less than 0.5 – the country’s data quality is less than half as good as the ‘best’ possible data for that indicator.

    • An ‘X’ rating is given to those countries which have no data at all for that indicator – i.e. all of the country’s scores for the metrics in that indicator have been imputed.
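    The RAG+ classification above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the InCiSE codebase; the function name, input format, and country codes in the example are invented.

```python
def rag_plus(scores):
    """Illustrative RAG+ rating from per-country data quality scores
    for a single indicator. None marks a country with no data at all
    for the indicator (all metrics imputed)."""
    observed = [v for v in scores.values() if v is not None]
    lo, hi = min(observed), max(observed)
    ratings = {}
    for country, score in scores.items():
        if score is None:
            ratings[country] = "X"
            continue
        # Min-max normalise within the indicator
        norm = (score - lo) / (hi - lo) if hi > lo else 1.0
        if norm == 1:
            ratings[country] = "Green"   # best possible data
        elif norm >= 0.5:
            ratings[country] = "Amber"   # at least half as good as the best
        else:
            ratings[country] = "Red"     # less than half as good as the best
    return ratings

# Hypothetical scores for one indicator:
example = {"AAA": 0.9, "BBB": 0.7, "CCC": 0.4, "DDD": None}
print(rag_plus(example))  # AAA: Green, BBB: Amber, CCC: Red, DDD: X
```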

    Table 2.8: Data quality scores by indicator and country
    Country Overall data quality score Data quality grade Percent of metrics available CAP CRM DIG FFM HRM INC INT OPN POL PRO REG TAX
    GBR 0.757 A+ 100% Green Green Green Green Green Green Green Green Amber Green Green Green
    ITA 0.755 A+ 99% Amber Green Green Green Green Green Green Green Amber Green Green Green
    POL 0.755 A+ 99% Green Green Green Green Green Green Amber Green Amber Green Green Green
    SWE 0.755 A+ 99% Green Green Green Green Green Green Amber Green Amber Green Green Green
    NOR 0.752 A+ 99% Green Green Green Green Green Green Amber Green Amber Green Green Green
    SVN 0.750 A 99% Green Green Green Green Green Green Green Amber Amber Green Green Green
    AUT 0.738 A 98% Green Green Green Amber Green Green Amber Green Amber Green Green Green
    FIN 0.736 A 97% Green Green Green Amber Green Green Red Green Amber Green Green Green
    ESP 0.733 A 97% Amber Green Green Amber Green Green Amber Amber Amber Green Green Green
    NLD 0.731 A 98% Green Green Green Amber Green Green Green Green Amber Amber Green Green
    FRA 0.718 A 97% Amber Green Green Green Green Green Green Green Amber Red Green Green
    PRT 0.716 A 85% x Amber Green Green Green Green Amber Green Amber Green Green Green
    DNK 0.707 A 93% Green Amber Green Amber Green Green x Amber Amber Green Green Amber
    DEU 0.701 A 96% Green Green Green Green Green Red Green Green Amber Green Green x
    GRC 0.696 B 94% Green Green Green Amber Green Green Amber Green Amber Amber Green Red
    SVK 0.692 B 93% Green Amber Green Amber Green Green Amber Red Amber Green Green Amber
    HUN 0.671 B 81% x Amber Green Green Green Red Green Amber Amber Green Green Green
    EST 0.669 B 90% Green Red Green Amber Green Red Green Amber Amber Green Green Amber
    CZE 0.659 B 90% Green Amber Green Green Green x Amber Green Amber Red Green Green
    TUR 0.650 C 90% Green Green Green Amber Green Green Amber Green Amber Red Green x
    MEX 0.648 C 73% x Green x Green Green Green Green Green Amber Amber Green Amber
    NZL 0.644 C 83% Green Green x Green Amber x Amber Green Amber Amber Green Green
    CHL 0.643 C 79% Green Red x Green Green Green Green Green Amber Amber Green Green
    CAN 0.638 C 78% Green Red x Green Green Green Amber Green Amber Amber Green Green
    KOR 0.636 C 78% Green Red x Green Green Green Green Amber Amber Amber Green Green
    BEL 0.635 C 85% Green x Green Amber Green Green Green Green Amber Red Green Green
    LVA 0.628 C 75% x Red Green Red Green Green Amber Red Amber Green Green Green
    CHE 0.627 C 79% x Green Green Amber Green Green Green Red Amber Red Green Amber
    AUS 0.618 C 71% x Green x Green Green Green Amber Green Amber x Green Green
    LTU 0.615 C 82% Green Red Green x Green Green Red x Amber Green Green Green
    IRL 0.614 C 84% Green Red Green Amber Green Green Amber Red Amber Red Green Green
    JPN 0.597 D 75% Green Red x Green Green Green Amber Green Amber Red Green Red
    USA 0.579 D 74% Green Red x Green Green Green Green Green Amber x Amber Amber
    ISR 0.578 D 72% Green Red x Amber Green Green Red Red Amber Amber Green Amber
    ISL 0.563 D 68% x Red Green Red Green Green Red Red Amber Red Green Amber
    ROU 0.529 D 66% x Amber Green Red x x Red Amber Amber Green x Amber
    BGR 0.511 D 66% x Amber Green Red x x Red Amber Amber Red x Green
    HRV 0.501 D 65% x Amber Green Red x x Red Amber Amber Red x Amber
    Table 2.3.C in the original PDF publication

    Table 2.8 reveals several patterns in data quality:

    • Six countries are given an “A+” rating – one has full data for all indicators (i.e. all indicators rated ‘green’), while the other five have just one indicator with an ‘amber’ rating.
    • Eight countries achieve an “A” rating – they have generally good coverage of data but typically have two or three indicators rated ‘amber’ or ‘red’; only one of these countries has an indicator for which all of the data has been imputed (rated ‘X’).
    • Seven countries achieve a “B” rating for data quality – these countries have a greater number of ‘amber’ and ‘red’ rated indicators, typically four. All but one have at least one ‘red’ rated indicator; one country has one indicator fully imputed, while another has two indicators fully imputed.
    • Ten countries achieve a “C” rating for data quality – all countries have at least one ‘red’ rated indicator and eight of the countries have at least one indicator fully imputed.
    • Seven countries achieve a “D” rating for data quality – all have at least one indicator fully imputed and at least one indicator rated ‘red’, and four countries have at least four indicators rated ‘red’.

    2.9 Comparisons over time

    The InCiSE project is still in its infancy, and the methodology for the 2019 Index has built substantially on the foundations of the 2017 Pilot – most of the metrics used in the 2017 Pilot have continued to be used in the 2019 edition. Of the 70 metrics in the 2017 Pilot that are directly comparable to the 2019 edition, 33 have since had updates which are incorporated into the model.

    In addition to the 70 metrics carried over from the 2017 Pilot, a further 46 metrics have been incorporated into the InCiSE methodology, bringing the total number of metrics for the 2019 model to 116. Most of these additional metrics (30) come from existing sources; the remainder come from new sources, some of which have been collected multiple times before, while others have no previous data collection. Changes are summarised in Chapter 15.

    A further consideration for comparisons over time is the need to deal with different reference dates and frequencies of updating.

    Some data are updated on an annual basis, while others follow two-year, three-year, or longer update cycles. For example, the data for capabilities has not been updated since it was first collected in 2012. These differing cycles are a function of a variety of factors, such as the perceived pace of change within a given topic area or the funding and resourcing of the data producers.

    As outlined in Section 2.4, the InCiSE model uses imputation methods which apply statistical techniques to provide an estimate of a country’s missing data. While the imputation is based on predictive methods, it is not a firm prediction of what a given country would have scored, but is better understood as indicative. The imputation methods may change between years, and the relationships in the observed data (from which the imputation is drawn) may also change, limiting the reliability of comparing data imputed in one year with data imputed in another year.

    It may also be the case that a country has no data for a given metric at one time point but does have data at a later time point (or vice versa). This would mean that at one of the time points the metric’s values would have been imputed.

    Comparing a score based on ‘real’ data with one based on imputed estimates is unlikely to be reliable. In addition, as the methodology for InCiSE develops, future versions of the InCiSE Index could adopt back-/forward-casting (i.e. using results from different time points) to improve the quality of the imputation methods. This would also make time-series comparison more complicated or less feasible.

    Finally, consideration should be given to the changing country composition. The 2017 Pilot covered 31 countries, while the 2019 edition covers 38 countries. As outlined in Section 2.5, the data is normalised so that country scores are relative to the group of countries selected. This again means it is not possible to directly compare scores from one edition of InCiSE to another, as the scores are relative to the specific data range and country set used for that edition.
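    The effect of a changing country set on min-max normalised scores can be illustrated with a small hypothetical example (the values below are invented, not InCiSE data): the same underlying score receives a different normalised score when the group of countries changes.

```python
def min_max(values):
    """Min-max normalise a list of scores to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# The middle country's underlying score (0.6) is unchanged, but a
# different country set shifts its normalised score:
print(min_max([0.2, 0.6, 0.8]))  # 0.6 normalises to ~0.67
print(min_max([0.5, 0.6, 0.8]))  # 0.6 normalises to ~0.33
```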

    As a result of these varied challenges, the InCiSE Partners have decided not to include any comparisons between the 2017 Pilot and the 2019 edition of the InCiSE Index.

    Furthermore, the Partners strongly advise against any direct or indirect comparisons being made beyond references to changes in the underlying source data itself (i.e. before the data is imported into the InCiSE data model, processed, imputed and normalised).