Measuring Bias in German Prompts to GPT Models Using Contact Hypothesis
Version
Published
Date Issued
2025-03-20
Author(s)
Type
Conference Paper
Language
English
Abstract
Large Language Models (LLMs) have been shown to perpetuate social biases present in their training data, leading to unfair outcomes in various applications. Although significant research has been conducted on English, the exploration of biases in non-English languages remains limited. This paper investigates the presence of social biases when prompting LLMs in German using the Contact Hypothesis, a psychological theory suggesting that intergroup contact can reduce prejudice. Replicating previous work with English prompts, we construct a culturally adapted dataset of German prompts that adheres to the principles of intergroup contact and evaluate bias in the models GPT-3.5, GPT-4, and GPT-4o.
Our findings reveal that bias patterns when prompting LLMs in German differ from their English counterparts, with higher bias levels in German outputs, particularly under negative contact conditions. While positive contact prompts successfully mitigate bias in both languages, German models still exhibit higher residual bias than English models, even in neutral contexts. Additionally, our study highlights the importance of culturally relevant prompt design, as direct translations from English might fail to account for linguistic and societal differences in bias expression. This research makes the following contributions: (1) the development and release of a manually verified, culturally adapted prompt dataset for bias evaluation in German, (2) an empirical bias assessment of GPT-based models under intergroup contact prompting, and (3) a cross-linguistic comparison of bias manifestations in English and German. Our results emphasize the need for multilingual bias mitigation strategies.
Publisher DOI
Publisher URL
Related URL
Conference
2nd Workshop on AI bias: Measurements, Mitigation, Explanation Strategies (AIMMES 2025): Proceedings
Submitter
Kurpicz-Briki, Mascha
Citation apa
Ikae, C., & Kurpicz-Briki, M. (2025). Measuring Bias in German Prompts to GPT Models Using Contact Hypothesis. 2nd Workshop on AI bias: Measurements, Mitigation, Explanation Strategies (AIMMES 2025): Proceedings. https://doi.org/10.24451/arbor.12797
File(s)
open access
Name
ikae2025.pdf
License
Attribution 4.0 International
Version
published
Size
1.8 MB
Format
Adobe PDF
Checksum (MD5)
7a8ee2e0a65272f55bb3748b50f31162
