Challenges of Using Synthetic Data Generation Methods for Tabular Microdata

URI
https://arbor.bfh.ch/handle/arbor/44803
Version
Published
Date Issued
2024-07-09
Author(s)
Miletic, Marko  
Sariyar, Murat  
Type
Article
Language
English
Abstract
The generation of synthetic data holds significant promise for augmenting limited datasets while avoiding privacy issues, facilitating research, and enhancing machine learning models’ robustness. Generative Adversarial Networks (GANs) stand out as promising tools, employing two neural networks (a generator and a discriminator) to produce synthetic data that mirrors real data distributions. This study evaluates GAN variants (CTGAN, CopulaGAN), a variational autoencoder (TVAE), and copulas on diverse real datasets of different complexity encompassing numerical and categorical attributes. The results highlight CTGAN’s sensitivity to training parameters and TVAE’s robustness across datasets. Scalability challenges persist, with GANs demanding substantial computational resources. TVAE stands out for its high utility across all datasets, even for high-dimensional data, though it incurs higher privacy risks, which is indicative of the curse of dimensionality. While no single model universally excels, understanding the trade-offs and leveraging model strengths can significantly enhance synthetic data generation (SDG). Future research should focus on adaptive learning mechanisms, scalability enhancements, and standardized evaluation metrics to advance SDG methods effectively. Addressing these challenges will foster broader adoption and application of synthetic data.
DOI
https://doi.org/10.24451/dspace/11576
Publisher DOI
10.3390/app14145975
Journal
Applied Sciences
ISSN
2076-3417
Publisher URL
https://www.mdpi.com/2076-3417/14/14/5975
Organization
Technik und Informatik  
Institut für Optimierung und Datenanalyse IODA  
Volume
14
Issue
14
Publisher
MDPI AG
Submitter
Sariyar, Murat
Citation apa
Miletic, M., & Sariyar, M. (2024). Challenges of Using Synthetic Data Generation Methods for Tabular Microdata. In Applied Sciences (Vol. 14, Issue 14). MDPI AG. https://doi.org/10.24451/dspace/11576
File(s)
Name
Challenges_of_Using_Synthetic_Data_Generation_Meth.pdf
Description
Version published
License
Attribution 4.0 International
Size
528.14 KB
Format
Adobe PDF
Checksum (MD5)
93b382289f3db6480ee60b2cb855e8e0