Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models in Court Decisions

URI
https://arbor.bfh.ch/handle/arbor/37252
Version
Published
Date Issued
2024-06-21
Author(s)
Nyffenegger, Alex
Stürmer, Matthias  
Niklaus, Joël  
Type
Conference Paper
Language
English
Abstract
Anonymity in court rulings is a critical aspect of privacy protection in the European Union and Switzerland, but with the advent of LLMs, concerns about large-scale re-identification of anonymized persons are growing. In accordance with the Federal Supreme Court of Switzerland (FSCS), we study re-identification risks using actual legal data. Following the initial experiment, we constructed an anonymized Wikipedia dataset as a more rigorous testing ground to further investigate the findings. In addition to the datasets, we also introduce new metrics to measure performance. We systematically analyze the factors that influence successful re-identifications, identifying model size, input length, and instruction tuning among the most critical determinants. Despite high re-identification rates on Wikipedia, even the best LLMs struggled with court decisions. We demonstrate that for now, the risk of re-identifications using LLMs is minimal in the vast majority of cases. We hope that our system can help enhance the confidence in the security of anonymized decisions, thus leading the courts to publish more decisions.
Subjects
K Law (General)
QA75 Electronic computers. Computer science
ISBN
979-8-89176-119-3
DOI
10.24451/arbor.22330
https://doi.org/10.24451/arbor.22330
Publisher DOI
10.18653/v1/2024.findings-naacl.157
Publisher URL
https://aclanthology.org/2024.findings-naacl.157/
Related URL
https://arxiv.org/abs/2308.11103 publication
Organization
Institut Public Sector Transformation (IPST)  
Digital Sustainability Lab  
Wirtschaft  
Conference
Findings of the Association for Computational Linguistics: NAACL 2024
Publisher
Association for Computational Linguistics
Submitter
Stürmer, Matthias
Citation (APA)
Nyffenegger, A., Stürmer, M., & Niklaus, J. (2024). Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models in Court Decisions. Findings of the Association for Computational Linguistics: NAACL 2024. Association for Computational Linguistics. https://doi.org/10.24451/arbor.22330
File(s)
open access
Name
2024.findings-naacl.157.pdf
License
Attribution 4.0 International
Version
published
Size
5.63 MB
Format
Adobe PDF
Checksum (MD5)
1d7f1098da57ad19768d6f8e4624f8d4
