
The Growing Use of Big Data Sets and Data Analytics Algorithms for Knowledge Discovery and Presentation

Day 1, Big Data & Analytics Conference, CeBIT Australia, Sydney, Australia, May 2, 2016 | © Courtesy of CEBIT AUSTRALIA/Flickr.

The constantly increasing quantities of digitized empirical data are transforming multiple disciplines through the new knowledge they help generate.

A Blog Article by Pablo Markin.

As Clark Glymour, Joseph Ramsey and Kun Zhang (2019: 42) have argued in their recent article, big data can only contribute to the generation of new knowledge when primary empirical data are paired with algorithmically powered computation as a basis for making inferences, testing hypotheses and estimating their validity.[1] However, the underlying algorithms remain highly demanding in terms of computing capacity and statistical modelling (Glymour et al., 2019: 42). Nevertheless, the growing volume of big data sets is already becoming sufficient for testing scientific theories and causal hypotheses across multiple fields of inquiry, such as astronomy (Glymour et al., 2019: 47).

Yet the growing performance of computing systems capable of handling big data does not reduce the need for the independent development of theoretical approaches to both empirical phenomena and data analysis algorithms, as Piet Daas, Marco Puts, Bart Buelens and Paul van den Hurk (2015: 256) highlight.[2] Nevertheless, the benefits of using big data outweigh the challenges, since big data can significantly reduce the cost, increase the speed and improve the quality of knowledge discovery, for instance in the production of official statistics (Daas et al., 2015: 259).

As Blagoj Ristevski and Ming Chen (2018: 1, 3-4) indicate,[3] the key means of generating knowledge from big data are analytics techniques aimed at discovering hidden patterns in large-scale data sets. These patterns serve as a basis for new knowledge that can be discipline-specific or interdisciplinary in nature. In this respect, the quality of the data analytics algorithms and of the empirical samples to which they are applied is also likely to affect the scientific validity and practical usability of the novel insights they produce (Ristevski and Chen, 2018: 4).

For these reasons, as Esko Juuso (2018: 403-404) notes,[4] growing research attention is being paid to optimizing data analysis methodologies in order to increase the efficiency and adaptability of the underlying algorithms, for example by applying artificial intelligence solutions that digitally reproduce human cognitive functions for the purposes of automated information processing. Among these methodologies are statistical modelling, fuzzy set frameworks and nonlinear systems that embed artificial intelligence into big data analytics processes (Juuso, 2018: 404-405).

Consequently, in the last decade, data science has emerged as a discipline that studies the extraction of knowledge from data sets that are usually large in scale, as Il-Yeol Song and Yongjun Zhu (2017: 5) have indicated in their review of the challenges of using big data for scientific and educational purposes.[5] This is because the successful use of big data analytics is likely to demand pooling expertise from domains as disparate as computer programming, statistical analysis and project management (Song and Zhu, 2017: 7).

Additionally, the deployment of big data solutions can require visualizing the new knowledge they create, such as through graphs. As Marcia Lei Zeng (2017: 3, 6) explores in relation to the digital humanities, such visualizations can help data scientists make sense of their findings while concentrating on the most important information.[6] The continued growth of social media usage also opens new possibilities for digital ethnography through topographic visualizations of patterns that arise from online interactions, such as on Instagram, according to the paper by Catherine Emma Jones, Marta Severo and Daniele Guido (2018).[7]




  1. Glymour, Clark, Joseph D. Ramsey, and Kun Zhang. “The Evaluation of Discovery: Models, Simulation and Search through ‘Big Data’.” Open Philosophy 2.1 (2019): 39-48.
  2. Daas, Piet J. H., Marco J. Puts, Bart Buelens, and Paul A. M. van den Hurk. “Big Data as a Source for Official Statistics.” Journal of Official Statistics 31.2 (2015): 249-262.
  3. Ristevski, Blagoj, and Ming Chen. “Big Data Analytics in Medicine and Healthcare.” Journal of Integrative Bioinformatics 15.3 (2018): 1-5.
  4. Juuso, Esko K. “Smart Adaptive Big Data Analysis with Advanced Deep Learning.” Open Engineering 8.1 (2018): 403-416.
  5. Song, Il-Yeol, and Yongjun Zhu. “Big Data and Data Science: Opportunities and Challenges of iSchools.” Journal of Data and Information Science 2.3 (2017): 1-18.
  6. Zeng, Marcia Lei. “Smart Data for Digital Humanities.” Journal of Data and Information Science 2.1 (2017): 1-12.
  7. Jones, Catherine Emma (Kate), Marta Severo, and Daniele Guido. “Socio-spatial visualisations of cultural routes.” Netcom 32.3/4 (2018). Published online March 6, 2018; accessed April 25, 2019. DOI: 10.4000/netcom.3674.