Google's launch of a separate search engine for Open Access data sets attests to the growing importance of Open Access for scientists and of the customized search portals that connect researchers with the content that Open Access repositories store.
A Blog Article by Pablo Markin.
In early September 2018, Google launched Dataset Search, a customized search engine that restricts its queries to online databases offering Open Access data to scholars, journalists and researchers around the world. As a customized search engine, this portal accesses information only on the basis of how its proprietors have classified it. Thus, the unveiling of Google Dataset Search indicates that, much like the scholarly articles in online databases indexed by Google Scholar, the Open Access empirical data made available by publishing houses, government agencies and research organizations via online repositories have reached a critical mass that calls for dedicated portals for querying these resources.
According to estimates, thousands of Open Access data repositories holding millions of data files already exist across the globe. However, without a customized search engine for these repositories, accessing Open Access data can be challenging, due to the unstructured nature of the information they hold. Open-source tagging vocabularies for these data sets, e.g., schema.org, can help bridge the information gap between data repositories and research communities, such as through the standardization of metadata. This significantly strengthens the momentum behind the adoption of Open Access for data sets and, as a result, customized search engines continue to evolve.
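As a rough illustration of what such standardized metadata can look like, the sketch below builds a schema.org Dataset description in JSON-LD and wraps it in the script tag that a repository's landing page could embed. The property names (name, description, url, keywords, license, creator) come from the schema.org vocabulary. The helper functions, the repository name and the example.org addresses are hypothetical and serve only as placeholders. This is a minimal sketch, not a statement of what any particular search engine requires.

```python
import json


def build_dataset_jsonld(name, description, url, keywords, license_url, creator_name):
    """Return a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper for illustration; only the keys are schema.org terms.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        "license": license_url,
        "creator": {"@type": "Organization", "name": creator_name},
    }


def to_script_tag(record):
    """Wrap a JSON-LD dictionary in the script tag used to embed it in HTML."""
    payload = json.dumps(record, indent=2)
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Placeholder values: the data set, organization and URLs are invented.
example = build_dataset_jsonld(
    name="Example Survey of Open Access Usage",
    description="Illustrative record for a hypothetical Open Access data set.",
    url="https://example.org/datasets/oa-usage",
    keywords=["open access", "survey", "humanities"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creator_name="Example Research Organization",
)

if __name__ == "__main__":
    print(to_script_tag(example))
```

Because the record is plain structured data, the repository that owns the data set remains responsible for supplying it, which is consistent with the point that search portals depend on metadata provided by data owners.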
While data recoverability is likely to facilitate the funding of Open Access repositories for data sets, the organizations that own the data need to collaborate with customized search engines, as the latter do not generate metadata on their own. This is illustrated by JURN, which was launched in 2009 as a project for searching databases and repositories of Open Access scholarly articles in the arts and humanities, and which expanded its scope to all sciences by 2014.
At present, JURN runs searches of Open Access databases and repositories via a customized search engine. The Google-based JURN directory indexes over 4,000 Open Access journals and repositories with a primary emphasis on social sciences and humanities, such as art, history, literature and philosophy. In this respect, JURN is similar to the Directory of Open Access Journals (DOAJ), as it relies on article metadata for its search functions.
JURN also fills an important gap in the accessibility of Open Access articles that may be insufficiently tagged for automated discovery by more established search engines, such as Google Scholar. Whereas in some domains the majority of Open Access articles may continue to be discoverable by Google Scholar only to a limited extent, customized search engines such as JURN and the DOAJ apparently make a critical contribution to the visibility of Open Access scholarly papers.
In other words, customized search engines, such as Dataset Search, JURN, the DOAJ and the Bielefeld Academic Search Engine (BASE), use the metadata of digital content to connect Open Access data and article repositories to researchers worldwide.
Featured Image Credits: Google Dance 2007, San Jose, CA, USA, August 21, 2007 | © Courtesy of Latham Jenkins/Flickr.