Semantic Similarity and Correlation of Linked Statistical Data Analysis.

Capadisli, Sarven; Meroño-Peñuela, Albert; Auer, Sören; Riedl, Reinhard (2014). Semantic Similarity and Correlation of Linked Statistical Data Analysis. In: 13th International Semantic Web Conference. Riva del Garda / Italy. 19.-23.10.2014.

Full text not available from this repository.

Statistical data is increasingly made available in the form of Linked Data on the Web. As more and more statistical datasets become available, a fundamental question of statistical data comparability arises: to what extent can arbitrary statistical datasets be faithfully compared? Besides purely statistical comparability, we are interested in the role that semantics plays in the data to be compared. Our hypothesis is that semantic relationships between different components of statistical datasets might be related to their statistical correlation. Our research focuses on studying whether these statistical and semantic relationships influence each other, by comparing the correlation of statistical data with their semantic similarity. The ongoing research problem is, hence, to investigate why machines have difficulty revealing meaningful correlations or establishing non-coincidental connections between variables in statistical datasets. We describe a fully reproducible pipeline to compare statistical correlation with semantic similarity in arbitrary Linked Statistical Data. We present a use case using World Bank data expressed as RDF Data Cube, and we highlight whether dataset titles can help predict strong correlations.
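The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which correlation or similarity measures were used, so Pearson correlation and a Jaccard overlap of title tokens are assumptions made here for illustration only. The dataset titles and values are invented toy data, not World Bank figures, and the function names `pearson`, `title_similarity`, and `compare` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def title_similarity(a, b):
    """Jaccard overlap of lowercase title tokens.

    Assumption: a simple stand-in for a semantic similarity measure.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical toy data: dataset title -> values aligned on the same
# reference areas (invented numbers, for illustration only).
datasets = {
    "Mortality rate infant": [50.0, 30.0, 10.0, 5.0],
    "Mortality rate under five": [70.0, 40.0, 12.0, 6.0],
    "Internet users per 100 people": [5.0, 20.0, 60.0, 80.0],
}


def compare(datasets):
    """For each dataset pair, return (title1, title2, |correlation|, similarity)."""
    rows = []
    for (t1, v1), (t2, v2) in combinations(datasets.items(), 2):
        rows.append((t1, t2, abs(pearson(v1, v2)), title_similarity(t1, t2)))
    return rows


rows = compare(datasets)
strength = [row[2] for row in rows]
similarity = [row[3] for row in rows]

# Second-level comparison: does title similarity track correlation strength?
relationship = pearson(strength, similarity)
```

With real Linked Statistical Data, the `datasets` mapping would be built from RDF Data Cube observations joined on a shared dimension such as reference area, and the token-overlap measure would likely be replaced by a richer semantic similarity measure.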

Item Type: Conference or Workshop Item (Paper)


Business School


Date Deposited: 03 Oct 2019 10:22

Last Modified: 03 Oct 2019 10:22

Uncontrolled Keywords: Linked Data, Statistics, Statistical database, Semantic Similarity, Correlation
