Implementing Informative-Based Active Learning in Biomedical Record Linkage for the Splink Package in Python
Version
Published
Date Issued
2023
Author(s)
Miletic, Marko
Sariyar, Murat
Editor(s)
Mantas, John
Gallos, Parisis
Zoulias, Emmanouil
Hasman, Arie
Househ, Mowafa S.
Charalampidou, Martha
Magdalinou, Andriana
Type
Article
Language
English
Abstract
In biomedical record linkage, efficiently determining the similarity threshold at which two records should be classified as belonging to the same patient often remains an open issue. Here, we describe how to implement an efficient active learning strategy that puts into practice a measure of the usefulness of training sets for this task. Our results show that active learning should always be considered when training data is to be produced via manual labeling. In addition, active learning gives a quick indication of how complex a problem is through the label frequencies: if the most difficult entities always stem from the same class, the classifier will probably have fewer problems distinguishing the classes. In big data applications, these two properties are essential, as the problems of under- and overfitting are exacerbated in such contexts.
Subjects
QA75 Electronic computers. Computer science
ISBN
9781643684000
Publisher DOI
Journal or Series
Studies in Health Technology and Informatics
Series/Report No.
Studies in Health Technology and Informatics
ISSN
1879-8365
Publisher URL
Volume
305
Publisher
IOS Press
Submitter
Sariyar, Murat
Citation (APA)
Miletic, M., & Sariyar, M. (2023). Implementing Informative-Based Active Learning in Biomedical Record Linkage for the Splink Package in Python. In J. Mantas, P. Gallos, E. Zoulias, A. Hasman, M. S. Househ, M. Charalampidou, & A. Magdalinou (Eds.), Studies in Health Technology and Informatics (Vol. 305, pp. 509–512). IOS Press. https://doi.org/10.24451/arbor.20913
File(s)
open access
Name
SHTI-305-SHTI230545.pdf
License
Attribution-NonCommercial 4.0 International
Version
published
Size
177.38 KB
Format
Adobe PDF
Checksum (MD5)
9efd13337c2ed3aaf4d96d58f70cf3d2
