Mr Fumihiro Kato¹, Dr Ikki Ohmukai¹, Dr Teruhito Kanazawa¹, Dr Kei Kurakawa¹
¹National Institute of Informatics, Chiyoda, Japan
The National Institute of Informatics (NII) has been providing scholarly information services for Japanese researchers and students. CiNii is NII's discovery service for Japanese research literature such as articles, books and dissertations. It harvests and integrates publication metadata from institutional repositories, the National Diet Library, academic societies and other scholarly databases in Japan. Since sharing and reusing research data is one of the key concepts of open science, in 2017 we launched a project called CiNii Research to extend CiNii so that it supports research data as a first-class citizen.
CiNii Research aims to enable search and discovery of the publications and datasets produced by research projects in Japan. To achieve this goal, we are updating existing NII scholarly services to support research data, and we are also developing the CiNii Research system as a whole. CiNii Research consists of the three components illustrated in Figure 1. The first aggregates metadata of research objects related to research projects in Japan. The second extracts research objects and the relationships among them from the collected metadata to build a knowledge graph. The third provides a discovery service for research objects by indexing the nodes of the knowledge graph. In this presentation, we report on the progress of this development.
Figure 1: Components of CiNii Research
For the first component, NII already collaborates with Japanese universities and institutions to collect the research objects behind its scholarly information services. For instance, IRDB is a national aggregator of institutional repositories in Japan. As of June 2018, it held 2.8 million records from 681 repositories, of which 55 thousand (2.5% of the total) were datasets. Because CiNii uses the metadata collected by IRDB, we will update IRDB to JPCOAR Schema 1.0, the latest metadata schema for Japanese institutional repositories, which supports new features such as research data, identifiers and open access policies.
Another aggregator is KAKEN, which collects and hosts the result reports of Grants-in-Aid for Scientific Research (KAKENHI), one of the major research funding programs of the Government of Japan. In addition to these existing aggregators, we also collect persistent identifiers from JaLC, DataCite, Crossref and ORCID. JaLC is the DOI registration agency in Japan, and NII is one of its board members. JaLC is our primary DOI source because Japanese repositories use it to assign DOIs to research objects, including datasets.
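Because the same DOI can arrive from several registries, some merging step is needed before records enter the knowledge graph. The following is a minimal sketch, assuming each source yields `(doi, metadata)` pairs, of how DOIs might be normalized and merged so that JaLC records take precedence. The names `SOURCE_PRIORITY`, `normalize_doi` and `merge_by_doi`, the record layout, and the ranking of DataCite and Crossref after JaLC are all hypothetical; ORCID iDs identify researchers rather than works and are not handled here.

```python
from typing import Dict, Iterable, List, Tuple

# JaLC first, as stated in the text; the order of the other registries is a
# hypothetical choice made only for this illustration.
SOURCE_PRIORITY: List[str] = ["jalc", "datacite", "crossref"]

# Common ways a DOI is written; stripped before comparison.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Drop resolver/URI prefixes and lowercase (DOI names are case-insensitive)."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def merge_by_doi(feeds: Dict[str, Iterable[Tuple[str, dict]]]) -> Dict[str, dict]:
    """Merge (doi, metadata) pairs from several sources, keyed by normalized DOI.

    When two sources describe the same DOI, the record from the source ranked
    higher in SOURCE_PRIORITY wins; unknown sources rank last.
    """
    rank = {name: i for i, name in enumerate(SOURCE_PRIORITY)}
    lowest = len(rank)
    merged: Dict[str, dict] = {}
    for source, records in feeds.items():
        for raw_doi, metadata in records:
            doi = normalize_doi(raw_doi)
            current = merged.get(doi)
            if current is None or rank.get(source, lowest) < rank.get(current["source"], lowest):
                merged[doi] = {**metadata, "source": source}
    return merged


if __name__ == "__main__":
    feeds = {
        "crossref": [("https://doi.org/10.1234/ABC", {"title": "From Crossref"})],
        "jalc": [("10.1234/abc", {"title": "From JaLC"})],
    }
    print(merge_by_doi(feeds))
```

Keeping the winning source on each record makes it easy to trace later which registry a DOI's metadata came from.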
Constructing a knowledge graph of research objects is an essential part of a modern discovery service, because links between scholarly literature and datasets help users find further related research objects. We defined the target types of research objects as products, researchers, projects, organizations and funds, where products is a superset of articles, books, dissertations and datasets. We currently focus on products, researchers and projects, because these types and their relationships are the most important for CiNii Research. Our system extracts research objects of these types from the aggregated metadata and identifies them using persistent identifiers and name disambiguation techniques.
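The type model and the identification step can be illustrated with a small sketch. It assumes each extracted record carries a type, a name and optional identifiers, and computes a key under which duplicate records of the same object collapse: a persistent identifier when one is present, otherwise a normalized name. The identifier scheme names in `ID_SCHEMES`, the class and function names, and the name fallback are hypothetical; the fallback is only a naive placeholder for real name disambiguation, which the text does not specify.

```python
import unicodedata
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ObjectType(Enum):
    PRODUCT = "product"
    RESEARCHER = "researcher"
    PROJECT = "project"
    ORGANIZATION = "organization"
    FUND = "fund"


# Products is a superset of these kinds, following the text.
PRODUCT_KINDS = {"article", "book", "dissertation", "dataset"}

# Hypothetical identifier schemes tried in order for each type.
ID_SCHEMES: Dict[ObjectType, List[str]] = {
    ObjectType.PRODUCT: ["doi"],
    ObjectType.RESEARCHER: ["orcid", "researcher_number"],
    ObjectType.PROJECT: ["kakenhi_project_number"],
}


@dataclass
class ResearchObject:
    obj_type: ObjectType
    name: str
    identifiers: Dict[str, str] = field(default_factory=dict)
    kind: Optional[str] = None  # subtype, used only for products

    def __post_init__(self) -> None:
        if self.obj_type is ObjectType.PRODUCT and self.kind not in PRODUCT_KINDS:
            raise ValueError(f"unknown product kind: {self.kind!r}")


def identity_key(obj: ResearchObject) -> str:
    """Return a key under which duplicate records of one object should collapse."""
    # 1. Prefer a persistent identifier when the record carries one.
    for scheme in ID_SCHEMES.get(obj.obj_type, []):
        value = obj.identifiers.get(scheme)
        if value:
            return f"{obj.obj_type.value}:{scheme}:{value.strip().lower()}"
    # 2. Otherwise fall back to a normalized name. This is only a naive stand-in
    #    for name disambiguation: different people with the same name would merge.
    name = " ".join(unicodedata.normalize("NFKC", obj.name).casefold().split())
    return f"{obj.obj_type.value}:name:{name}"
```

The fallback branch shows why persistent identifiers matter: without them, records can only be matched on names, which is exactly where disambiguation becomes hard.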
Acquiring links between identified objects is the most important but also the hardest step in creating a knowledge graph, as explicit links in metadata are still rather few. Our current challenge is to extract relationships from our existing scholarly services and integrate them into the knowledge graph. Because KAKEN holds reports, product lists and researcher information for research projects, the system can obtain links among products, researchers and projects from it. Similarly, CiNii holds products and researchers, from which the system can obtain links between the two. However, the main issue is that each scholarly service is largely independent, and currently only some national researcher identifiers are shared among them. We are therefore concentrating on integrating research objects and their links across services.
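As an illustration of deriving and integrating links across services, here is a sketch that turns a KAKEN-style project report and a CiNii-style record into (subject, relation, object) triples and unions them. The dict layouts, the relation names `hasMember`, `hasOutput` and `hasAuthor`, and the identifier strings are all hypothetical, not the actual formats of either service.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (subject key, relation, object key)


def edges_from_kaken_report(report: dict) -> List[Edge]:
    """Derive project/researcher/product links from one KAKEN-style report.

    Hypothetical layout:
      {"project": "...", "researchers": [...],
       "products": [{"id": "...", "authors": [...]}]}
    """
    project = report["project"]
    edges: List[Edge] = []
    for researcher in report.get("researchers", []):
        edges.append((project, "hasMember", researcher))
    for product in report.get("products", []):
        edges.append((project, "hasOutput", product["id"]))
        for author in product.get("authors", []):
            edges.append((product["id"], "hasAuthor", author))
    return edges


def edges_from_cinii_record(record: dict) -> List[Edge]:
    """Product-researcher links from a CiNii-style record (hypothetical layout)."""
    return [(record["id"], "hasAuthor", a) for a in record.get("authors", [])]


def merge_edges(*sources: Iterable[Edge]) -> Set[Edge]:
    """Union edges from several services; identical triples collapse into one.

    Links only line up across services when both use the same identifier for
    a node, which is why shared researcher identifiers matter so much.
    """
    graph: Set[Edge] = set()
    for edges in sources:
        graph.update(edges)
    return graph
```

If KAKEN and CiNii used different keys for the same researcher, the two `hasAuthor` triples above would not coincide and the graph would contain two unconnected researcher nodes.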
We also expect the amount of metadata containing identifiers of research objects and links between them to increase in the future, as NII has been developing WEKO3, a new version of its institutional repository system that implements the JPCOAR Schema. The current version of WEKO is used by about 500 Japanese universities and institutions through our hosting service. Once the hosting service has migrated to WEKO3, we will encourage researchers and librarians to enter identifiers for research objects, and the relationships between them, in their public repositories. This information will help us grow and refine our knowledge graph.
Creating a knowledge graph of research objects is also important for global collaboration with other discovery and related services. Scholix provides an interoperability framework for exchanging links between scholarly literature and data among global aggregators of such links, such as DataCite, Crossref, OpenAIRE and EMBL-EBI. OpenAIRE also provides the OpenAIRE LOD services, which publish its integrated data about research as Linked Data. Research Graph builds local graphs for research management systems and links them to the larger Research Graph, which includes funding information, collections of research datasets and open access repositories. We intend to share and exchange such links between research objects in collaboration with these international activities.
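To make the idea of exchanging literature-data links more concrete, the sketch below builds a minimal Scholix-style link record between two DOIs and derives its inverse. The field names and relationship types reflect our reading of the Scholix metadata schema and cover only a small subset of it; they should be checked against the current Scholix specification. The provider name and DOIs used in any example are placeholders.

```python
from datetime import date
from typing import Dict

# Scholix relationship types paired with their inverses, as we understand the
# Scholix schema; verify against the current specification before use.
INVERSE: Dict[str, str] = {
    "References": "IsReferencedBy",
    "IsReferencedBy": "References",
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
    "IsRelatedTo": "IsRelatedTo",
}
OBJECT_TYPES = {"literature", "dataset"}


def scholix_object(doi: str, obj_type: str) -> Dict:
    """Describe one side of a link by its DOI and Scholix object type."""
    if obj_type not in OBJECT_TYPES:
        raise ValueError(f"unsupported object type: {obj_type!r}")
    return {
        "Identifier": [{"ID": doi, "IDScheme": "doi", "IDURL": f"https://doi.org/{doi}"}],
        "Type": {"Name": obj_type},
    }


def scholix_link(source: Dict, target: Dict, relation: str,
                 provider: str, published: date) -> Dict:
    """Build a minimal Scholix-style link record (a subset of the schema's fields)."""
    if relation not in INVERSE:
        raise ValueError(f"unsupported relationship type: {relation!r}")
    return {
        "LinkPublicationDate": published.isoformat(),
        "LinkProvider": [{"Name": provider}],
        "RelationshipType": {"Name": relation},
        "Source": source,
        "Target": target,
    }


def invert(link: Dict) -> Dict:
    """Return the same link seen from the target's side."""
    return {
        **link,
        "RelationshipType": {"Name": INVERSE[link["RelationshipType"]["Name"]]},
        "Source": link["Target"],
        "Target": link["Source"],
    }
```

Storing one direction and deriving the inverse on export keeps the local graph free of redundant edges while still letting partners query from either side.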
We are implementing an integrated search with Elasticsearch that shows information about research objects together with related objects from our knowledge graph, so that users can follow relationship links to find further related research objects. CiNii Research provides a simple keyword search form. Before searching, a user can choose from tabs either a single target type of research object, as defined for the knowledge graph, or all types. For example, if a user selects the “dataset” tab, the results are restricted to datasets. At this time, CiNii Research does not support the faceted search that many discovery services offer, because we want to keep the results as simple as possible.
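The tab-based filtering described above maps naturally onto an Elasticsearch `bool` query: keywords go into the scoring part and the selected type into a filter. The following sketch builds such a request body (for example, to send to the `_search` endpoint of an index). The tab list, the `object_type` field and the searched field names are hypothetical; the text does not describe the actual index mapping.

```python
import json
from typing import Dict, Optional

# Hypothetical tabs mapped to values of a hypothetical "object_type" field in
# the index; None means "all types" (no filter).
TABS: Dict[str, Optional[str]] = {
    "all": None,
    "article": "article",
    "book": "book",
    "dissertation": "dissertation",
    "dataset": "dataset",
    "researcher": "researcher",
    "project": "project",
}


def build_search_body(keywords: str, tab: str = "all", size: int = 20) -> Dict:
    """Build an Elasticsearch Query DSL body for a keyword search on one tab."""
    if tab not in TABS:
        raise ValueError(f"unknown tab: {tab!r}")
    query: Dict = {
        "bool": {
            "must": [{
                "multi_match": {
                    "query": keywords,
                    # Hypothetical field names; matches in title are boosted.
                    "fields": ["title^2", "description", "creator"],
                    "operator": "and",
                }
            }]
        }
    }
    object_type = TABS[tab]
    if object_type is not None:
        # Filter context: restricts results without affecting relevance scores.
        query["bool"]["filter"] = [{"term": {"object_type": object_type}}]
    # No "aggs" section: facets are deliberately left out, as in the text.
    return {"query": query, "size": size}


if __name__ == "__main__":
    print(json.dumps(build_search_body("open science", tab="dataset"), indent=2))
```

Putting the type restriction in filter context rather than in `must` means switching tabs narrows the result set without changing how the remaining hits are ranked.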
We also plan to connect CiNii Research to GakuNin RDM, our research data management platform. This will make it possible to import specific research data directly into GakuNin RDM after finding it on CiNii Research.
- CiNii. Available from: https://ci.nii.ac.jp/en, accessed 7 Jun 2018.
- IRDB. Available from: http://irdb.nii.ac.jp/analysis/index_e.php, accessed 7 Jun 2018.
- JPCOAR Schema Guidelines. Available from: https://schema.irdb.nii.ac.jp/en, accessed 7 Jun 2018.
- KAKEN. Available from: https://kaken.nii.ac.jp/en/, accessed 7 Jun 2018.
- Japan Link Center (JaLC). Available from: https://japanlinkcenter.org/top/english.html, accessed 7 Jun 2018.
- Burton, A., et al. The Scholix Framework for Interoperability in Data-Literature Information Exchange. D-Lib Magazine, 2017.
- Alexiou, G., et al. OpenAIRE LOD Services: Scholarly Communication Data. Save-SD 2016, Lecture Notes in Computer Science, vol. 9792, 2016, pp. 45-50.
- Aryani, A. and Wang, J. Research Graph: Building a Distributed Graph of Scholarly Works using Research Data Switchboard. In Proceedings of Open Repositories 2017, 2017.
- Komiyama, Y. and Yamaji, K. Nationwide Research Data Management Service of Japan in the Open Science Era. In Proceedings of the 6th IIAI International Congress on Advanced Applied Informatics, 2017, pp. 129-133.
Fumihiro Kato has been a researcher at the Research Center for Open Science and Data Platform, National Institute of Informatics, since 2017. He is currently responsible for the development of the Japanese research data discovery service. He also works on Linked Open Data projects such as DBpedia Japanese and the IMI project, which creates a common vocabulary for Japanese national and local governments.
He received his Master of Media and Governance from Keio University in 2004. His research interests include web technologies, the Semantic Web and Linked Open Data.