Semantic Similarity and Correlation of Linked Statistical Data Analysis.
Version
Published
Date Issued
2014
Author(s)
Capadisli, S., Meroño-Peñuela, A., Auer, S., Riedl, R.
Type
Conference Paper
Language
English
Abstract
Statistical data is increasingly made available in the form of
Linked Data on the Web. As more and more statistical datasets become
available, a fundamental question on statistical data comparability
arises: To what extent can arbitrary statistical datasets be faithfully
compared? Beyond purely statistical comparability, we are interested
in the role that semantics plays in the data to be compared. Our
hypothesis is that semantic relationships between different components
of statistical datasets might be related to their statistical
correlation. Our research focuses on studying whether these statistical
and semantic relationships influence each other, by comparing the
correlation of statistical data with their semantic similarity. The ongoing
research problem is, hence, to investigate why machines have difficulty
revealing meaningful correlations or establishing non-coincidental
connections between variables in statistical datasets. We describe a fully
reproducible pipeline to compare statistical correlation with semantic
similarity in arbitrary Linked Statistical Data. We present a use case
using World Bank data expressed as RDF Data Cube, and we examine
whether dataset titles can help predict strong correlations.
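As a rough illustration of the comparison the abstract describes, the sketch below pairs up datasets, computes the Pearson correlation of their values, and sets it beside a similarity score for their titles. It is not the authors' pipeline: the abstract does not name the semantic similarity measure, so a simple Jaccard overlap of title words is assumed here as a stand-in, and the dataset titles and values are made up for the example rather than taken from World Bank data. The function names `pearson`, `title_similarity`, and `compare` are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def title_similarity(a, b):
    """Jaccard overlap of lowercase word tokens.

    Assumption: a stand-in for a semantic similarity measure,
    not the measure used in the paper.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def compare(datasets):
    """For every pair of datasets, return (title1, title2, correlation, similarity)."""
    rows = []
    for (t1, v1), (t2, v2) in combinations(datasets.items(), 2):
        rows.append((t1, t2, pearson(v1, v2), title_similarity(t1, t2)))
    return rows


if __name__ == "__main__":
    # Made-up titles and values, for illustration only (not World Bank data).
    toy = {
        "Rural population": [10.0, 12.0, 13.0, 15.0],
        "Urban population": [20.0, 23.0, 27.0, 30.0],
        "Forest area": [9.0, 8.5, 8.0, 7.0],
    }
    for t1, t2, r, s in compare(toy):
        print(f"{t1} / {t2}: correlation={r:.3f}, title similarity={s:.3f}")
```

With many dataset pairs, the two resulting columns (correlation and title similarity) could themselves be compared, for instance by checking whether pairs with similar titles tend to show stronger correlations.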
Publisher URL
Organization
Conference
13th International Semantic Web Conference
Submitter
ServiceAccount
Citation apa
Capadisli, S., Meroño-Peñuela, A., Auer, S., & Riedl, R. (2014). Semantic Similarity and Correlation of Linked Statistical Data Analysis. 13th International Semantic Web Conference. https://arbor.bfh.ch/handle/arbor/32269
