Beware data bias in AI models

Insurers should be aware of the risks of data bias associated with artificial intelligence (AI) models. Chris Halliday looks at some of these risks, particularly the ethical considerations and how an actuary can address these.

The use of advanced analytics techniques and machine learning models in insurance has increased significantly over the last few years. It’s an exciting time for actuaries and an opportunity to innovate. Leading insurers in this area are gaining better insights and greater predictive power, ultimately leading to better performance.

However, with every new technology come new risks. With AI, such risks could be material in terms of regulatory implications, litigation, public perception, and reputation.

Why data bias in AI models matters

The ethical risks associated with data bias are not unique to AI models, but data bias is more prevalent in them for a number of reasons. Firstly, AI models make predictions based on patterns in data without assuming any particular form of statistical distribution. Since these models learn from historical data, any biases present in the training data can be perpetuated by the AI systems. This can lead to biased outcomes and unfair treatment for certain groups or individuals.

For instance, a tech giant had to abandon the trial of a recruitment AI system when it was found to discriminate against women for technical roles. This turned out to be the result of training the model on a dataset spanning a number of years. Since, historically, the majority of these roles were held by men, the algorithm undervalued applications from women.

Furthermore, AI models can inadvertently reinforce existing biases present in society or in existing practices. For example, if historical data reflects biased decisions made by humans, the AI model may learn and perpetuate those biases. This creates a feedback loop where biased AI outcomes further reinforce the existing biases. Non-AI models may be less susceptible to this feedback loop as they typically don’t have the ability to learn and adapt over time.

Secondly, AI models can process vast amounts of data at a fast rate, enabling them to make decisions and predictions on a large scale and in real-time. This amplifies the potential impact of biases present in the data if human oversight is missing or reduced.

Finally, AI models can be highly complex and opaque, making it challenging to understand how they arrive at decisions. This lack of transparency can make it difficult to detect and address biases within the models. In contrast, non-AI models, such as traditional rule-based systems or models based on statistical distributions, are often more transparent, allowing humans to directly inspect and understand the decision-making process.

Given these factors, data bias is a more critical concern for AI models than for traditional ones, and mitigating it is crucial to ensuring fair and ethical outcomes.

Different forms of data bias

Selection bias arises when certain samples are systematically overrepresented or underrepresented in the training data. This can occur if data collection processes inadvertently favour certain groups or exclude others. As a result, the AI model may be more accurate or effective for the overrepresented groups. Also, if the training data does not adequately capture the diversity of the target population, the AI model may not generalise well and could make inaccurate or unfair predictions. This might happen if, for example, an Asian health insurer bases its pricing on an AI model which has been trained predominantly on health metrics data from Western populations; the result will most likely be neither accurate nor fair.

Temporal bias refers to biases that emerge due to changes in societal norms, regulations, or circumstances over time. If the training data does not adequately represent the present reality or includes outdated information, the AI model may produce biased predictions or decisions that are not aligned with current regulatory and social dynamics.
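One simple way to check whether training data still resembles the present is to compare the distribution of a rating factor in the training period with its distribution in recent business. The sketch below uses the population stability index (PSI) for a categorical variable. It is an illustration only: the function name, the `floor` parameter and the toy data are assumptions, and PSI measures drift in the data rather than changes in social or regulatory norms, which still need human judgement.

```python
from collections import Counter
from math import log


def population_stability_index(train_values, current_values, floor=1e-6):
    """PSI between two samples of a categorical variable.

    0 means identical category shares; larger values mean more drift.
    `floor` (an assumed safeguard) avoids log(0) for unseen categories.
    """
    train_counts = Counter(train_values)
    current_counts = Counter(current_values)
    psi = 0.0
    for category in set(train_counts) | set(current_counts):
        expected = max(train_counts[category] / len(train_values), floor)
        actual = max(current_counts[category] / len(current_values), floor)
        psi += (actual - expected) * log(actual / expected)
    return psi


# Hypothetical example: vehicle fuel type in training data vs. recent quotes
training = ["petrol"] * 70 + ["diesel"] * 25 + ["electric"] * 5
recent = ["petrol"] * 50 + ["diesel"] * 20 + ["electric"] * 30
print(round(population_stability_index(training, recent), 3))
```

A large value for a variable suggests the model is being applied to a population that differs from the one it learned from, which is a prompt to investigate rather than proof of bias.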

Historical bias arises when the historical data contains discriminatory practices or reflects societal biases. The AI model may learn and perpetuate those biases, resulting in unfair treatment or discrimination against specific groups of individuals.

For instance, a lawsuit was filed against a US-based insurer which used an AI fraud detection model to help with claims management. The model outputs meant that black customers were subject to a significantly higher level of scrutiny than their white counterparts, resulting in more interactions and paperwork, and thus longer delays in settling claims. It has been argued that the AI model perpetuated the racial bias already present in the historical data.

Proxy bias arises when the training data includes variables that act as proxies for sensitive attributes, such as race or gender. Even if these sensitive attributes are not explicitly included in the data, the AI model may indirectly infer them from the proxy variables, leading to biased outcomes. For instance, occupation could act as a proxy for gender and location could act as a proxy for ethnicity. Fitting these in the model could result in biased predictions even if the protected characteristics are not captured in the data.
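A candidate proxy can be screened by measuring how strongly it is associated with a sensitive attribute in an audit sample where that attribute is known. The sketch below computes Cramér's V, a standard association measure for two categorical variables. The function name and toy data are assumptions for illustration; in practice the sensitive attribute may only be available for a subset of policies or from an external source.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two categorical lists (0 = no association, 1 = perfect)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    chi2 = 0.0
    for x, n_x in x_totals.items():
        for y, n_y in y_totals.items():
            expected = n_x * n_y / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_totals), len(y_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical audit sample in which occupation perfectly predicts gender
occupation = ["nurse", "nurse", "nurse", "engineer", "engineer", "engineer"]
gender = ["F", "F", "F", "M", "M", "M"]
print(cramers_v(occupation, gender))  # 1.0 for this toy sample
```

A value close to 1 indicates that the model could effectively reconstruct the sensitive attribute from the rating factor, so the factor deserves closer scrutiny before it is fitted.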

Moreover, these types of bias can often overlap and interact with each other, making it necessary to adopt comprehensive strategies to identify, mitigate, and monitor biases in AI models.

Ways to mitigate data bias

To mitigate the risks associated with data bias, an actuary will benefit from gaining a thorough understanding of the data collection methods used and identifying any potential sources of bias in the data collection process. Actuaries often have control over data quality improvement processes where they are involved in data cleaning, removing outliers and addressing missing values.

By applying rigorous data cleaning techniques, biases which are introduced by data quality issues can be reduced. For example, if a particular demographic group has disproportionately missing data, imputing missing values in a manner that preserves fairness and avoids bias can help mitigate bias in the analysis.
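Before choosing an imputation method, it helps to see whether missing values are concentrated in particular groups. The sketch below is a minimal diagnostic under assumed names: the records, the `region` grouping key and the `income` field are all hypothetical.

```python
from collections import defaultdict


def missing_rate_by_group(records, group_key, field):
    """Share of records in each group where `field` is absent or None."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row.get(field) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical policy records
records = [
    {"region": "north", "income": 42000},
    {"region": "north", "income": 38000},
    {"region": "south", "income": None},
    {"region": "south", "income": 51000},
]
print(missing_rate_by_group(records, "region", "income"))
# {'north': 0.0, 'south': 0.5}
```

If one group shows a much higher missing rate, a single portfolio-wide default value would distort that group's data more than the others, which is the situation the paragraph above warns about.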

If the training data contains imbalanced representations of different demographic groups, resampling techniques can be employed to address the imbalance and give equal, or representative, weight to all groups, reducing potential bias.
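The same rebalancing effect can be achieved with observation weights instead of literally duplicating or dropping records. The sketch below assigns each record a weight so that every group carries the same total weight; it illustrates the "equal weight" option only, and the function name and data are assumptions. Representative weighting would instead scale each group to its share of the target population.

```python
from collections import Counter


def balancing_weights(groups):
    """Per-record weights that give every group the same total weight.

    Weights sum to the number of records, so overall exposure is unchanged.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


# Hypothetical training set: group A is four times as common as group B
groups = ["A"] * 8 + ["B"] * 2
weights = balancing_weights(groups)
print(weights[0], weights[-1])  # 0.625 2.5
```

Most model-fitting routines accept such weights as a sample-weight argument, which avoids the extra noise introduced by random oversampling.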

Internal data can be supplemented with external data sources that provide a broader perspective and mitigate potential biases. By incorporating external data, the representation of various demographic groups can be expanded. However, insurers also need to be cautious about the potential biases in external data sources. The applicability and relevance of the external data to the analysis need to be carefully considered.

Actuaries often also need to make assumptions when building models or performing analyses. As well as considering data biases, it is crucial to critically assess these assumptions for potential biases. For example, an assumption that implicitly treats different demographic groups as uniform could introduce bias. A practitioner should validate these assumptions using available data, conduct sensitivity analyses, and challenge the assumptions to ensure they do not lead to biased results.

Model validations to reduce ethical risk in AI

As well as mitigating data biases, actuaries should also design a robust model governance framework. This should include regular monitoring and evaluation of the model outputs against actual emerging data. Actuaries should carefully analyse the tail ends of the model output distribution to gain an understanding of the risk profile of individuals receiving a significantly high or low prediction. If the predictions at the tails fall materially outside the acceptable range, they could decide to apply caps and collars (upper and lower limits) to the model prediction.
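Caps and collars amount to clipping each prediction to an agreed range. A minimal sketch follows; the function name and the limits used in the example are illustrative assumptions, and the acceptable range itself is a governance decision.

```python
def cap_and_collar(predictions, collar, cap):
    """Clip each prediction to the range [collar, cap]."""
    if collar > cap:
        raise ValueError("collar must not exceed cap")
    return [min(max(p, collar), cap) for p in predictions]


# Hypothetical model relativities with an agreed range of 0.5 to 2.0
print(cap_and_collar([0.2, 1.0, 3.5], collar=0.5, cap=2.0))
# [0.5, 1.0, 2.0]
```

It is worth recording how many predictions are clipped and for whom, since frequent clipping in one segment is itself a signal that the model behaves poorly there.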

Continuously monitoring and evaluating the model performance, particularly in terms of fairness metrics, across different demographic groups should help identify any emerging biases. These could then be rectified by taking corrective actions and updating the model.
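One commonly used fairness metric is the demographic parity difference: the gap between the highest and lowest rate at which groups receive a given outcome, such as being referred for fraud investigation. The sketch below is illustrative; the function names and data are assumptions, and demographic parity is only one of several fairness definitions an insurer might monitor.

```python
from collections import defaultdict


def rate_by_group(flags, groups):
    """Share of records flagged (1/True) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for flag, group in zip(flags, groups):
        totals[group] += 1
        positives[group] += int(flag)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(flags, groups):
    """Largest gap in flag rate between any two groups (0 = equal rates)."""
    rates = rate_by_group(flags, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical claims referred for investigation (1) or not (0)
flags = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(rate_by_group(flags, groups))                  # {'A': 0.75, 'B': 0.25}
print(demographic_parity_difference(flags, groups))  # 0.5
```

Tracking this figure at each monitoring date gives an early indication of emerging bias, although a gap on its own does not show that the difference is unjustified.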

It can be challenging to carry out a fully robust analysis of fairness when the data it requires, such as protected characteristics, is not typically collected by an insurer. Assessing fairness may therefore require the use of proxies (as described earlier) or allocation methods that draw on data unavailable to the model itself.

Practitioners should also focus on conducting ethical reviews of the model’s design, implementation, and impact to ensure compliance with legal and regulatory requirements on fairness and non-discrimination. Ethical review processes can help identify and address potential biases before deploying the models in practice.

It is also vital to gain a deep understanding of the algorithm and features of the model. Incorporating explainability into a model is essential in building the trust of management, regulators and customers. Models that enable explainability can more easily reveal bias and identify areas for improvement. Gaining a deeper understanding of the drivers of the output should also facilitate interventions that could give rise to more favourable outcomes for the business.

Explainability metrics such as SHapley Additive exPlanations (SHAP) values, individual conditional expectation (ICE) plots and partial dependence plots should be part of the model governance framework. Apart from performing reasonability checks on values of these metrics across variables, it might also be worth comparing these against similar and comparable metrics, for example partial dependence plots vs generalised linear model (GLM) relativities. Although care should be taken when interpreting these differences, this approach may help to highlight areas of significant deviation that might need control or correction.
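A partial dependence curve is the average model prediction when one feature is forced to each value on a grid while all other features keep their observed values. The sketch below computes it for any prediction function and rescales the curve against a base level so that it can be read alongside GLM relativities. The toy model, feature names and helper names are assumptions for illustration.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with `feature` set to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original data is untouched
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def as_relativities(curve, base_index=0):
    """Rescale a curve so the chosen base level equals 1.0."""
    return [value / curve[base_index] for value in curve]


# Hypothetical stand-in for a fitted AI model
def toy_model(row):
    return 100 + 2 * row["age"] + (50 if row["urban"] else 0)


rows = [{"age": 30, "urban": True}, {"age": 50, "urban": False}]
curve = partial_dependence(toy_model, rows, "age", [20, 40])
print(curve)  # [165.0, 205.0]
print(as_relativities(curve))
```

Plotting these rescaled values next to the GLM relativities for the same factor shows where the two models disagree most, which is where review effort is best spent.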

Another way of addressing model bias is to incorporate fairness considerations directly into the model training process by using techniques that explicitly account for fairness. For example, fairness-aware learning algorithms can be used to enhance fairness during the training process.
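As one illustration, the reweighing approach (usually attributed to Kamiran and Calders) assigns each training record a weight so that group membership and outcome look statistically independent in the weighted data; the weights are then passed to the training routine. Strictly, this is a pre-processing step feeding into training rather than a change to the learning algorithm itself, and it is shown here only as a sketch. The function name and data are assumptions.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight for each record: P(group) * P(label) / P(group, label)."""
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Hypothetical history in which group A was approved far more often than B
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)


def weighted_rate(group):
    """Weighted share of positive labels within one group."""
    num = sum(w for w, g, y in zip(weights, groups, labels) if g == group and y == 1)
    den = sum(w for w, g in zip(weights, groups) if g == group)
    return num / den


print(weighted_rate("A"), weighted_rate("B"))
```

In the weighted data both groups have the same approval rate, so a model trained with these weights has less incentive to learn the historical disparity. Other fairness-aware methods instead add a fairness constraint or penalty to the training objective.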

Potential bias awareness is key

The application of advanced analytics techniques, when used appropriately, can create opportunities for insurers to offer customers greater access to more targeted products at equitable prices, promoting safer behaviours and enhancing overall business outcomes.

However, it is crucial to recognise that neglecting the risks of AI models can have substantial consequences for business viability, regulatory compliance, and reputation. Establishing trust is key to the advancement of modelling techniques. Thoughtful consideration and mitigation of ethical risks should not only ensure a fairer outcome for society, but also advance the use of AI models within the insurance industry.

Chris Halliday is a Director and Consultant Actuary in WTW’s Insurance Consulting and Technology business.
