Challenges of Using Synthetic Data Generation Methods for Tabular Microdata
Version
Published
Date Issued
2024-07-09
Author(s)
Miletic, M.
Sariyar, Murat
Type
Article
Language
English
Abstract
The generation of synthetic data holds significant promise for augmenting limited datasets while avoiding privacy issues, facilitating research, and enhancing machine learning models’ robustness. Generative Adversarial Networks (GANs) stand out as promising tools, employing two neural networks, a generator and a discriminator, to produce synthetic data that mirrors real data distributions. This study evaluates GAN variants (CTGAN, CopulaGAN), a variational autoencoder (TVAE), and copulas on diverse real datasets of different complexity encompassing numerical and categorical attributes. The results highlight CTGAN’s sensitivity to training parameters and TVAE’s robustness across datasets. Scalability challenges persist, with GANs demanding substantial computational resources. TVAE stands out for its high utility across all datasets, even for high-dimensional data, though it incurs higher privacy risks, which is indicative of the curse of dimensionality. While no single model universally excels, understanding the trade-offs and leveraging model strengths can significantly enhance synthetic data generation (SDG). Future research should focus on adaptive learning mechanisms, scalability enhancements, and standardized evaluation metrics to advance SDG methods effectively. Addressing these challenges will foster broader adoption and application of synthetic data.
Publisher DOI
Journal
Applied Sciences
ISSN
2076-3417
Publisher URL
Volume
14
Issue
14
Publisher
MDPI AG
Submitter
Sariyar, Murat
Citation apa
Miletic, M., & Sariyar, M. (2024). Challenges of Using Synthetic Data Generation Methods for Tabular Microdata. In Applied Sciences (Vol. 14, Issue 14). MDPI AG. https://doi.org/10.24451/dspace/11576
File(s)
Name
Challenges_of_Using_Synthetic_Data_Generation_Meth.pdf
Description
Version published
License
Attribution 4.0 International
Size
528.14 KB
Format
Adobe PDF
Checksum (MD5)
93b382289f3db6480ee60b2cb855e8e0
