GraLMatch : matching groups of entities with graphs and language models

URI
https://arbor.bfh.ch/handle/arbor/37484
Version
Published
Date Issued
2025-03
Author(s)
de Meer Pardo, Fernando
Lehmann, Claude
Gehrig, Dennis
Nagy, Andrea
Braschler, Martin
Stockinger, Kurt
Hadji Misheva, Branka  
Nicoli, Stefano
Type
Article
Language
English
Abstract
In this paper, we present an end-to-end multi-source Entity Matching problem, which we call entity group matching, where the goal is to assign to the same group records that originate from multiple data sources but represent the same real-world entity. We focus on the effects of transitively matched records, i.e., the records connected by paths in the graph G = (V,E), whose nodes represent the records and whose edges indicate whether two records are a match. We present a real-world instance of this problem, where the challenge is to match records of companies and financial securities originating from different data providers. We also introduce two new multi-source benchmark datasets that present matching challenges similar to those of real-world records. A distinctive characteristic of these records is that they are regularly updated following real-world events, but updates are not applied uniformly across data sources. As a result, certain groups of records can only be matched by using transitive information.
In our experiments, we illustrate why considering transitively matched records is challenging: a small number of false positive pairwise match predictions can throw off the group assignment of large quantities of records. We therefore propose GraLMatch, a method that can partially detect and remove false positive pairwise predictions through graph-based properties. Finally, we show that fine-tuning a Transformer-based model (DistilBERT) on a reduced number of labeled samples yields a better final entity group matching than training on more samples and/or incorporating fine-tuning optimizations, illustrating how precision becomes the deciding factor in the entity group matching of large volumes of records.
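The effect of transitive matching described in the abstract can be illustrated with a minimal sketch. It assumes that groups are formed as the connected components of the graph of predicted matches; the record identifiers, the helper name `group_records`, and the union-find approach are illustrative assumptions, not the authors' implementation, and the sketch does not reproduce GraLMatch's false positive removal.

```python
from collections import defaultdict


def group_records(records, matched_pairs):
    """Group records into connected components of the match graph.

    Hypothetical helper: `records` are node ids, `matched_pairs` are the
    edges predicted as matches by some pairwise model.
    """
    parent = {r: r for r in records}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = defaultdict(set)
    for r in records:
        groups[find(r)].add(r)
    return sorted(sorted(g) for g in groups.values())


# Two entities, each described by three records from different sources
# (made-up identifiers for illustration only).
records = ["A1", "A2", "A3", "B1", "B2", "B3"]
true_pairs = [("A1", "A2"), ("A2", "A3"), ("B1", "B2"), ("B2", "B3")]

# A1 and A3 are never compared directly, yet end up in the same group
# through the path A1 - A2 - A3.
clean = group_records(records, true_pairs)

# One false positive edge between the two entities merges both groups.
noisy = group_records(records, true_pairs + [("A3", "B1")])

print(clean)
print(noisy)
```

In this toy example a single wrong pairwise prediction places all six records in one group, which is why the abstract treats precision of the pairwise predictions as the deciding factor for group-level quality.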
DOI
10.24451/arbor.22168
https://doi.org/10.24451/arbor.22168
Journal
Open Proceedings: 28th International Conference on Extending Database Technology (EDBT), Barcelona, Spain, 25-28 March 2025
ISSN
2367-2005
Publisher URL
https://openproceedings.org/2025/conf/edbt/paper-10.pdf
Related URL
https://openproceedings.org/html/pages/index.html
Organization
Hochschule der Künste Bern  
Institut Applied Data Science & Finance  
Applied Data Science  
Wirtschaft  
Submitter
Hadji Misheva, Branka
Citation (APA)
de Meer Pardo, F., Lehmann, C., Gehrig, D., Nagy, A., Braschler, M., Stockinger, K., Hadji Misheva, B., & Nicoli, S. (2025). GraLMatch : matching groups of entities with graphs and language models. In Open Proceedings: 28th International Conference on Extending Database Technology (EDBT), Barcelona, Spain, 25-28 March 2025. https://doi.org/10.24451/arbor.22168
File(s)

open access

Name

paper-10.pdf

License
Attribution-NonCommercial-NoDerivatives 4.0 International
Version
published
Size

1.01 MB

Format

Adobe PDF

Checksum (MD5)

cea1dc0baae7f37631b562c451883db8
