Dr Anusuriya Devaraju1
1CSIRO, Kensington, Australia, firstname.lastname@example.org
The adoption of open data by universities, research institutions and government agencies has led to a dramatic increase in the number of open datasets on the Web. As a result of this proliferation, users face the challenge of discovering relevant datasets. Existing data repositories address this challenge through keyword and faceted search. However, these search mechanisms are primarily intended for users who know what they are looking for or are familiar with the structure of the repositories. In addition, they may return search results that are too broad or too narrow, which makes it difficult for users to filter out datasets that are not of interest. Recommender systems complement these search mechanisms. They are information filtering systems that present users with product recommendations that match the users’ preferences or contexts, and they have been widely employed on e-commerce sites to improve product discovery and enhance the user experience.
We developed a recommendation approach for a new application area, open data discovery. The approach leverages content-based filtering (CBF) and item-to-item co-occurrence (I2I), tuned to a feature weighting model obtained through a user survey. CBF quantifies the similarity of datasets by comparing their metadata, e.g., title, keyword, and location, while I2I considers their statistical co-occurrence, e.g., downloads by the same users. We applied the approach in the context of the CSIRO Data Access Portal and evaluated it through a user study. A total of 113 data users participated in the study and evaluated 216 target datasets. We identified 5 data recommendations for each of the target datasets, yielding 1080 relevance judgments in total. The results of the user study show that the recommendation approach can accurately quantify the relevance of datasets, which we consider an important contribution to the challenge of discovering relevant open datasets.
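To illustrate how a metadata-based (CBF) score and a co-occurrence (I2I) score might be combined, the following Python sketch may help. It is not the implementation used in the study. The feature weights, the Jaccard and exact-match similarity measures, the cosine-style co-occurrence formula, the mixing parameter alpha, and all function names and sample records are assumptions made for illustration only. The weights in the study came from a user survey and are not reproduced here.

```python
from math import sqrt

# Hypothetical feature weights; the study derived its weights from a user
# survey, and those values are not reproduced here.
WEIGHTS = {"title": 0.5, "keywords": 0.3, "location": 0.2}


def jaccard(a, b):
    """Jaccard similarity of two collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cbf_score(d1, d2, weights=WEIGHTS):
    """Content-based score: weighted sum of per-feature metadata similarities."""
    sims = {
        "title": jaccard(d1["title"].lower().split(), d2["title"].lower().split()),
        "keywords": jaccard(d1["keywords"], d2["keywords"]),
        # Assumed simplification: locations match exactly or not at all.
        "location": 1.0 if d1["location"] == d2["location"] else 0.0,
    }
    return sum(weights[f] * sims[f] for f in weights)


def i2i_score(users1, users2):
    """Co-occurrence score: shared downloaders, cosine-normalised."""
    users1, users2 = set(users1), set(users2)
    if not users1 or not users2:
        return 0.0
    return len(users1 & users2) / sqrt(len(users1) * len(users2))


def recommend(target_id, datasets, downloads, alpha=0.7, k=5):
    """Return up to k (dataset_id, score) pairs, best first.

    alpha is a hypothetical mixing parameter between CBF and I2I.
    """
    target = datasets[target_id]
    scored = []
    for other_id, other in datasets.items():
        if other_id == target_id:
            continue
        cbf = cbf_score(target, other)
        i2i = i2i_score(downloads.get(target_id, ()), downloads.get(other_id, ()))
        scored.append((other_id, alpha * cbf + (1 - alpha) * i2i))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up sample records, for illustration only.
datasets = {
    "a": {"title": "Soil moisture grid", "keywords": {"soil", "moisture"},
          "location": "Australia"},
    "b": {"title": "Soil moisture observations", "keywords": {"soil", "hydrology"},
          "location": "Australia"},
    "c": {"title": "Ocean temperature", "keywords": {"ocean"},
          "location": "Pacific"},
}
downloads = {"a": {"u1", "u2"}, "b": {"u1", "u2"}, "c": {"u3"}}

top = recommend("a", datasets, downloads)
```

With this sample data, dataset "b" ranks above "c" for target "a", because it shares title terms, a keyword, a location and both downloaders. The default k=5 mirrors the five recommendations per target dataset used in the user study.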
Description of why it is relevant to this year’s conference.
This talk is relevant to the conference as it addresses the challenge of discovering open datasets. It presents a concrete experience in a new application area, namely the development of a recommendation approach to improve the discovery of open datasets.
Anusuriya Devaraju is currently a postdoctoral fellow at CSIRO Mineral Resources. Prior to joining the research center, she worked as a researcher at the Institute for Bio- and Geosciences, Forschungszentrum Juelich, where she was involved in the data management of the TERENO and TERENO-MED long-term terrestrial observatories. Her research focuses on the discovery of research assets such as datasets, software packages and physical collections in Earth and Environmental Science using recommender systems, persistent identifiers and semantic technologies.