COVID-19 is a rapidly transmitted disease caused by SARS-CoV-2. It is now a global pandemic with a high fatality rate [1] and almost no effective control measures other than a few conventional protective measures [2]. The crisis has already spread to 213 countries and territories across the globe. As of February 10, 2021, the total number of confirmed cases was 107,389,998 and the number of deaths was 2,349,171 (Figure 1) [3]. An immediate and effective strategy can control the rapid transmission and the death rate [4]. Traditional technology has failed to manage and control the pandemic. A large volume of information that is useful for understanding the COVID-19 pandemic is generated every day. It is therefore necessary to store this information so that it can be used to develop appropriate measures to fight the coronavirus [5]. Big data analytics is a key tool for analyzing these large amounts of data to examine the trend, transmission pattern, virus association, and differences in genomic characteristics [6]. Since the big data approach can handle a huge amount of data on infected people, it can explain the nature of the dissemination of the virus. It also helps to develop proper preventive methods to stop its dissemination. Moreover, it helps to show the trends of infection, recovery, and death [7]. These data can be applied efficiently to identify new cases and the rates of serious cases, death, and recovery, and to help allocate resources for crisis management [8]. Big data technologies also help to locate patients and to capture proximity, people’s travel history, co-morbidity, infected people’s physiology, and symptoms of infection [9]. This information can easily be obtained at the grassroots level using GPS, remote sensing, and other related technologies [10].
Information technology is an indispensable part of daily life [11]. Its rapid development has created demand for it in every sector [12]. The health sector is no exception, given the extensive use of information technology in almost all stages of the healthcare industry [13]. Technological integration is not new in the healthcare industry. It started in 1970 in the form of e-health and telemedicine. Medical computing was limited to a few disciplines of computer science to provide better healthcare [14]. Nowadays, health information technology (HIT) covers a range of information, health, and computer-related fields as a result of technological transformation [15]. This transformation is a shift from traditional medical practices to the adoption of modern information technology to enhance development in all stages of healthcare services [16]. HIT comprises various contemporary technologies such as e-health, m-health, telemedicine, social networking sessions, email, and messaging [17]. These technologies provide a huge opportunity for treatment seekers and healthcare professionals to ensure better services. Some other modern technologies, such as electronic health records (EHRs), smart patient rooms, and computer-aided surgical instruments, are also effective for storing COVID-19 related information and supporting treatment [18].
The healthcare sector is data-oriented, and it can use the potential of big data technologies for efficient healthcare delivery [19]. The clinical, operational, and managerial sections of a healthcare provider generate a huge amount of data that can easily be maintained by several advanced technologies such as the EHR (electronic health record) and LIMS (Laboratory Information Management System) [20]. Scholars and practitioners are continually trying to develop big data-based technology that provides rapid, efficient, and effective services to people seeking healthcare. In the COVID-19 pandemic crisis, big data can be an effective addition to serve people and protect millions from the deadliest global crisis [21].
The analysis of big data provides useful insights to practitioners, scholars, healthcare workers, and other related stakeholders in the fight against the pandemic. It can be used to show how the virus is transmitted across the globe as well as the progress of the medical field. The analysis of big data also enables forecasting of possible transmission in a particular area. At the same time, big data helps to develop effective treatment procedures and to tackle the crisis. Many studies have already been published on the epidemiology of COVID-19 [22], guidelines for controlling infection [7], safety [23], health services [22], information seeking behaviors [24], application of digital technologies [25], and expert opinion [26], but the potential of big data technologies has not yet been adequately explored. Therefore, this study aims to explore the potential of the big data approach for effective management of the COVID-19 pandemic crisis. To attain the research objective, this study addresses three key research questions: (i) How can big data technology help to control transmission of COVID-19? (ii) What are the key applications of big data to manage the COVID-19 crisis? And (iii) what are the challenges associated with implementation of big data technology? This study will provide new insight and help policy makers and administrators to develop data driven initiatives to tackle and manage the COVID-19 crisis.
Methods
Research design
A systematic literature review was conducted covering the last 10 years. Information related to the latest innovations was compiled to illuminate the debate regarding the potential of big data technology for improving healthcare services. This study mainly emphasizes a new paradigm of big data application that can play a vital role in improving healthcare services, particularly for COVID-19 pandemic crisis management.
Search strategy
Literature reviews can contribute to developing a particular field by accumulating key issues in the existing literature. Therefore, this study extensively searched a number of renowned databases including Web of Science, Engineering Village, Scopus, and Google Scholar using the necessary keywords COVID-19, big data, digital technology, health, and coronavirus. This systematic literature review was done in August 2020.
Inclusion and exclusion criteria
Two inclusion criteria were followed for the selection of quality documents: (a) Does this study focus on COVID-19 and big data? And (b) does it deal with a big data approach or any related approach for managing the COVID-19 pandemic? This study also considers the related studies based on primary data as well as systematic literature reviews to present strong arguments for controlling the COVID-19 pandemic.
Results
This study is mainly guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist. The major stages of the PRISMA checklist are identification, screening, eligibility, and inclusion. At the identification stage, 311 quality documents were selected, along with 7 others from the references of those documents. At the screening stage, 136 documents were removed after careful abstract screening against the inclusion and exclusion criteria of this study. At the eligibility stage, 182 quality documents were selected by removing 147 documents for reasons such as unavailability of the full text, lack of relevance, and not focusing on COVID-19 and big data. At the inclusion stage, 32 quality documents were finally selected, comprising journal articles, book chapters, books, and working papers, to explain the potential of big data technologies for managing the COVID-19 crisis (Figure 2).
Discussion
This section is presented in three sub-sections: the first deals with sources of COVID-19 related big data, the second with key applications of big data, and the third with challenges of implementing big data. This study presents a conceptual framework (Figure 3) that explains the key steps of big data driven COVID-19 management. The framework comprises four major steps: COVID-19 data sources, big data techniques, big data processing, and big data application [27]. Details are given in the following sub-sections.
Sources of COVID-19 related big data
This study analyzes all the selected documents carefully and reveals ten possible sources of big data. These sources provide a large amount of COVID-19 related data on people’s movement, information seeking, treatment, and daily affairs. The major sources of big data are social media, immigration and customs databases, COVID-19 database/healthcare data, mobile data, mobile technology, public transportation systems, bank card transactions, closed-circuit camera/security camera, car Geographical Positioning System (GPS), and wearable tracking devices (Table I) [1, 10, 12, 18, 27–31]. Social media platforms such as Facebook, Twitter, WeChat, IMO, and QQ are among the greatest sources of data on people’s connectivity [27]. Social media provide not only accurate information but also fake information. During this pandemic, most countries have adopted some common measures such as social distancing, massive testing, wearing masks, washing hands, using sanitizer, avoiding crowded places, and lockdown [28]. During lockdown, people spend more time using social media. Owing to their availability, social media have already achieved acceptance among all classes of people [32]. Data collected from social media should therefore be verified to avoid fake information. In contrast, data from public sources such as immigration, customs, healthcare agencies, mobile operators, and public transportation are authentic sources of big data [1]. Some other sources such as bank card transactions, security cameras, car GPS, mobile technology, and tracking devices also provide authentic data. Additionally, some developed countries including China, South Korea, Taiwan, and Japan have already developed their own health apps to track the movement of people and to maintain an electronic health record (EHR) through the app [33]. Since the various sources of big data are cheap, available, and handy, they could be a great resource for developing appropriate measures to manage the COVID-19 crisis.
Table I
Sources | Description | Researcher |
---|---|---|
Social media | Social media apps and networks | Saheb [27] |
Immigration and customs databases | Airports, seaports and land ports | Whitelaw et al. [28] |
COVID-19 database/Healthcare data | Various hospital and diagnostic centers | Whitelaw et al. [28] |
Mobile data | Mobile companies | Radanliev et al. [29] |
Mobile technology | Various apps or mobile technology | Dwivedi et al. [18] |
Public transportation system | Aviation, railway and ground | Jovanović et al. [12] |
Bank card transaction | Debit and credit card transactions in automated teller machine (ATM), POS (point of sale) and online transactions | Lin and Hou [1] |
Closed-circuit camera/Security camera | Security camera of different places e.g. road, railway station, airport, land port, office and home | Whitelaw et al. [28]; Aceto et al. [30] |
Car Geographical Positioning System (GPS) | Geographical Positioning System (GPS) of car and other transport | Beaunoyer et al. [31]; Vafea et al. [10] |
Wearable tracking device | Various devices to track people’s movement | Whitelaw et al. [28] |
Big data techniques
This study also identified 10 key techniques that can usually be used for COVID-19 crisis management: modelling, machine learning, data mining, visualization, statistics, simulation, optimization, text mining, forecasting, and social network analysis (Table II) [28, 34–47]. Scholars and policy makers can easily take decisions by using these techniques after obtaining data from various sources.
Table II
Techniques | Researchers |
---|---|
Modelling | Liu et al. [34]; Whitelaw et al. [28] |
Machine learning | Sujath et al. [35]; Khanday et al. [36] |
Data mining | Kumar [37]; Benke and Benke [38] |
Visualization | Zhou et al. [39]; Preuveneers et al. [40] |
Statistics | Benke and Benke [38] |
Simulation | Nazir et al. [41]; Rahman et al. [42] |
Optimization | Ajayi et al. [43] |
Text mining | Khanday et al. [36] |
Forecasting | Shinde et al. [44]; Hu et al. [45] |
Social network analysis | Yuan et al. [46]; Rajendran et al. [47] |
Real-time monitoring of COVID-19 can be done by using big data analytics. Traditional computational models, e.g. the Susceptible-Infected-Removed (SIR) model, fail to predict the trend of transmission of COVID-19 [48]. The SIR model depends mainly on two key assumptions, i.e. that recovered people cannot be infected again and that the model does not allow for variation over time [49]. The infection and transmission pattern of COVID-19 cannot meet these conditions, which reduces the model’s suitability for the COVID-19 pandemic. Researchers are trying to develop a model compatible with big data analytics for better forecasting of COVID-19 transmission. Big data analytics can enable people to visualize the movement of the virus, which helps in taking timely decisions to control the pandemic [50, 51]. Machine learning is a recognized tool that can help to map the transmission trend and hotspots and to forecast COVID-19. Generally, a few machine learning techniques and models have potential for controlling COVID-19, such as the Boltzmann machine, the Markov model, and neural networks. Data mining is also a popular approach to controlling the pandemic. The governments of various countries have already adopted a data mining approach for analyzing the trend of transmission of the coronavirus. Visualization has also been adopted by practitioners and policy makers for mapping the pattern of transmission, hotspots, and vulnerable areas. Statistics helps to establish the magnitude of the infection rate, morbidity, and recovery, which aids in taking proper steps to fight COVID-19. Simulation, optimization, and text mining can help to forecast a possible outbreak using big data. Forecasting provides an opportunity to take the necessary measures to fight the COVID-19 pandemic. Social network analysis is a tool for analyzing social network data to understand, in real time, the extent of an outbreak of COVID-19.
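To make the two SIR assumptions concrete, the following is a minimal sketch of the classic SIR equations, dS/dt = -βSI/N, dI/dt = βSI/N - γI, and dR/dt = γI, integrated with a simple forward-Euler scheme. It is an illustration only and not the model used in any of the reviewed studies. The function name `simulate_sir`, the step size, and all parameter values in the usage lines are assumptions chosen for demonstration.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the classic SIR equations with a forward-Euler scheme.

    beta (transmission rate) and gamma (removal rate) are held constant,
    which is the "no variation over time" assumption, and people who enter
    the removed compartment never return to the susceptible one, which is
    the "no reinfection" assumption.
    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_removals = gamma * i * dt
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        history.append((s, i, r))
    return history


# Illustrative (hypothetical) parameters, not fitted to any real dataset.
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print("day of peak infections:", peak_day)
```

Because β and γ are fixed for the whole run, the sketch cannot represent lockdowns, changing behavior, or reinfection, which is the limitation described above.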
Potential application of big data
This study also identified 8 key applications: infection identification, travel history, symptoms of fever, early detection, identification of transmission, ready information in a lockdown period, people’s movement, and development of treatments and vaccine (Table III) [2, 4, 10, 15, 24, 34]. It has been shown that big data helps to diagnose and identify infected people. Big data is also used for forecasting the outcome of treatment, which helps in selecting the course of treatment [52]. For example, genome data can be used for multiple PCR (polymerase chain reaction) tests to obtain accurate results about the probable threat of COVID-19 in a specific area [53]. Similarly, genomic data can also be used for travelers. Travelers’ history and genome data can provide a clear picture of viral diversity, which can also help in finding effective treatment. This kind of data can also be used for areas that have no genomic data as a reference [54]. For example, Zhongnan Hospital, China analyzed a dataset of 11,500 people, of whom 276 and 170 were identified as suspected and infected cases, respectively, which made hematology examination and pathogen detection easier [55].
Table III
Applications | Description | Researcher |
---|---|---|
Infection identification | Big data helps to identify infected people and keep records for further use | Haleem et al. [2]; Vafea et al. [10] |
Travel history | Travel history helps to identify people who have come into contact with an infected person and helps to control virus transmission | Petersen et al. [51] |
Symptoms of fever | Big data usually stores the data of fever and related symptoms of people so that it can help to identify the suspicious person | Ohia et al. [24] |
Early detection | Big data helps to identify possible infection trend and early detection | Liu et al. [34] |
Identification of transmission | Since COVID-19 is a rapidly transmitted disease, big data can help to identify the way and area of transmission and control it | Haleem et al. [2] |
Ready information in lockdown period | Usually people have to stay home due to lockdown, so ready information is an urgent need for them that can be provided by big data | Zwitter and Gstrein [4] |
People’s movement | People’s movement can easily be traced by using big data technology that can also help to identify possible infected area | Chen et al. [15] |
Development of treatments and vaccine | Development of treatments and vaccines can be achieved by advanced digital technologies | Zwitter and Gstrein [4] |
Various public agencies collect the daily body temperature of a large group of people and build a big dataset [56]. This fever related dataset helps to identify infected and suspected cases. Big data also helps to achieve early detection through analysis of a dataset [57]. Big data also plays a vital role in identifying transmission patterns and predicting outbreaks. For example, Giordano et al. [58] analyzed a dataset of the pandemic in Italy and predicted the possibility of an outbreak of COVID-19, which helped in adopting a proper control strategy. Similarly, Brauer and Castillo-Chavez [59] applied a model to a dataset from Italy to determine the pattern of human transmission of COVID-19 and predicted it successfully. This kind of model also helps to visualize the infected areas and hotspots of COVID-19 transmission. Another study, by Strzelecki [60], covered the global scope of the COVID-19 pandemic using a large dataset from Google. He applied the Google Trends tool to a dataset collected from China, Italy, South Korea, and Iran, and visualized the trend of outbreaks and possible hotspots of COVID-19 outbreaks. An optimization model applied to a large dataset can improve the accuracy of predictions of present and future outbreaks. In a lockdown period, people can easily get ready information regarding the status of the outbreak, extent of severity, exposure, sensitivity, and probable control measures from the results of various analyses [61]. Big data can also track the COVID-19 transmission pattern, which is very useful to healthcare related organizations for smooth control of the crisis [1]. For example, Zhao et al. [62] used data on 854,424 air passengers at various airports in Wuhan, China during the period December 2019 to January 2020 to detect the pattern of virus transmission. They applied a big data-oriented approach along with statistical models and found a highly significant relationship between population and infection cases.
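One simple way to quantify an association of the kind reported by Zhao et al. [62] is a Pearson correlation coefficient between travel volume and reported cases. The sketch below is a hedged illustration only: the reviewed text does not state which statistical models were actually used, the helper name `pearson_r` is hypothetical, and the numbers in the usage lines are made-up placeholder values, not data from any study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only (NOT real passenger or case data):
# passengers arriving in five hypothetical destination cities, and the cases
# later reported in each of those cities.
passengers = [1200, 5400, 800, 9700, 3100]
reported_cases = [3, 14, 2, 31, 9]
print(f"Pearson r = {pearson_r(passengers, reported_cases):.3f}")
```

A coefficient close to 1 would indicate that destinations receiving more passengers also tended to report more cases. A significance test and adjustment for confounding factors would still be needed before drawing conclusions.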
Big data-based modelling also helps to determine the infected people in a specific area. At this moment, vaccination for COVID-19 is an urgent issue to relieve people from the deadliest threat of COVID-19 [63]. Big data provides an opportunity to analyze a large dataset and choose a suitable element for vaccine development. Some attempts have already been made to develop a vaccine by researchers using big data technologies that help to screen effective spike sequences for successful vaccine development.
Challenges of big data implementation
A number of challenges have been identified, such as expertise limitation, operation challenges, regulatory challenges, resource limitation, limited access to the market, privacy, and ethical issues (Table IV) [2, 25, 28, 64–67]. Each major challenge has several contributing factors. For example, expertise limitation is caused by a lack of cooperation among staff, a lack of data analysis expertise, poor data handling ability, and poor communication ability. Similarly, operation challenges arise from a lack of knowledge of data integration, poor support to patients, poor approachability, and reluctance to work. Regulatory challenges usually arise from poor regulatory compliance, the credibility of data, the high investment required, and work pressure. Another challenge of big data implementation is resource limitation, which is caused by regulatory constraints, poor data access, restraints on data use, and wrong diagnosis. The benefits of big data technologies can be gained by addressing the above challenges.
Table IV
Dimension of challenges | Factor | Researchers |
---|---|---|
Expertise limitation | Lack of cooperation among staff; Lack of data analysis expertise; Poor data handling ability; Poor communication | Whitelaw et al. [28] |
Operation challenges | Lack of knowledge of data integration; Poor support to patients; Poor approachability; Reluctance to work | Haleem et al. [2] |
Regulatory challenges | Poor regulatory compliance; Credibility of data; Required high investment; Work pressure | Owusu [25] |
Resource limitation | Constraint of regulation; Poor data access; Restraints on data use; Wrong diagnosis | Abusaada and Elshater [64] |
Limited access to market | Low power for decision making; Scarcity of value addition; Low incentives | Li et al. [65] |
Privacy | Maintaining privacy of everyone | Anisetti et al. [66] |
Ethical issues | All ethical standards should be maintained. | Ma and Tsai [67] |
The limitation of expertise is one of the top challenges to using big data technology for controlling the COVID-19 pandemic. An expert team is necessary to deal with the analysis and interpretation of huge volumes of data. Similarly, operational challenges create barriers to utilizing the big data approach. The scarcity of error-free, standard databases also hinders the achievement of a trustworthy solution. Errors in datasets sometimes produce results whose reliability is questionable, so an error-free and clean dataset is necessary to get accurate results. The main goal of controlling COVID-19 is to keep people safe and healthy. At the same time, data privacy and security are also important. However, some software applications require personal data such as location, travel trajectory, age, date of birth, citizenship identity, and other details to track and analyze the status of virus transmission [68]. The expert team should be sincere and trustworthy in using people’s personal information and should protect the privacy of data. The concerned authority should uphold ethical standards when handling a large dataset.
In conclusion, the COVID-19 crisis has rapidly expanded globally due to the lack of effective control measures. Technology driven control measures can be a potential tool to control this global crisis. Application of big data for improving health related quality of life is a new paradigm of healthcare services. The healthcare industry can use the potential of big data technologies at various stages. The use of big data is considered a transformative driver for enhancing the quality of people’s lives. This study identified the thirty-two most relevant documents for qualitative analysis. This study argues that application of big data is a key approach to analyzing the virus infection trend, transmission pattern, virus association, and differences in genetic modifications. It can save people from the deadliest threat of the COVID-19 pandemic through its real-time, rapid, accurate, and cost-effective characteristics. The major sources of big data are social media, immigration and customs databases, COVID-19 database/healthcare data, mobile data, mobile technology, public transportation systems, bank card transactions, closed-circuit camera/security camera, and car GPS. Health professionals can easily get data from these sources, which are almost readily available, low cost, accurate, and generated from usual daily affairs. This study also identified 10 key techniques that can usually be used for COVID-19 crisis management, i.e. modelling, machine learning, data mining, visualization, statistics, simulation, optimization, text mining, forecasting, and social network analysis. This study also identified 8 key applications, i.e. infection identification, travel history, symptoms of fever, early detection, identification of transmission, ready information in a lockdown period, people’s movement, and development of treatments and vaccine. These applications have already proved their effectiveness, accuracy, and efficiency for COVID-19 pandemic control and management.
This study also explores several limitations of big data usage, e.g. unethical use, privacy, and exploitative use of data. The findings of the study provide new insight and help policy makers and administrators to develop data driven initiatives to tackle and manage the COVID-19 crisis.
A thorough systematic literature review is important for generating new knowledge and accumulating existing knowledge that can contribute to various disciplines. Many scholars consider an extensive systematic review to be original research. However, it still has some limitations because it depends solely on the existing literature. As a systematic review that depends on secondary data, this paper shares these limitations. A future study on the same issue may be carried out based on primary data for more accurate results.