G. 2051

Page 1

Global Research Journal of Natural Science & Technology (GRJNST)

Volume: 04 - Issue 2 (2026), 2051

ISSN P: 2790-7643 ISSN E: 2790-7651

www.grjnst.net

https://doi.org/10.53762/grjnst.04.02.03

Overcoming Big Data Challenges: Scalability, Quality, and Privacy in AI-Integrated Systems

Received: 09 January 2026. Accepted: 29 January 2025. Published: 31 March 2026

Mehran M. Memon

Department of Computer Science,

DHA Suffa University, Karachi Pakistan mehran.memon@dsu.edu.pk

Syeda Tehreem Naqvi

Department of Computer Science,

DHA Suffa University, Karachi Pakistan tehreem.naqvi@dsu.edu.pk

Shahid Iqbal (Corresponding Author)

Department of Computer Engineering,

Faculty of Engineering BZU, Multan.

shahid.iqbal@bzu.edu.pk

Nimra Memon

Government Girls Degree College Nawab Shah

memonnimra03@gmail.com

Huma Jamshed

Department of Computer Science,

DHA Suffa University, Karachi Pakistan huma.jamshed@dsu.edu.pk

GRJNST, Volume: 04 - Issue 2 (2026) / ISSN P: 2790-7643

Article ID: 2051

https://doi.org/10.53762/grjnst.04.02.03

Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use and distribution


Abstract: The rapid progress of sensor networks, IoT devices, and Big Data technology has changed how data is managed in numerous sectors, including government agencies, healthcare, and smart cities. Given this pace of technological advancement, merely acquiring data is no longer sufficient: the real value of Big Data lies in using AI to analyze data instantly and generate useful insights. Using AI in Big Data systems, however, raises concerns about scalability, data quality, interpretability, and compliance with global data privacy regulations. To address these issues, technologies such as edge computing, federated learning, and zero-trust architecture are being adopted. By synthesizing Big Data architectural development, ethical data practices, and AI integration, this paper offers a unified framework that conforms to emerging regulations. By connecting these dimensions, the research offers a forward-looking view on creating intelligent, adaptive, and regulation-compliant data ecosystems.

Keywords: Big Data, 10Vs, Data Governance, Real-Time Analytics, Edge Computing, AI Ethics, Data

Privacy, Distributed System

1. Introduction

Big Data is a term that describes large and complex data sets that cannot be processed using conventional data processing methods [1], [2]. Data can be structured (for example, databases or transaction records), semi-structured (for example, XML or JSON), or unstructured (for example, videos or social media content) [3]. New storage and computing technologies have reduced the cost and complexity of accumulating the massive amounts of data generated by edge devices and sensors, enterprise systems, and digital platforms [4], [5]. Owing to the rapid increase in volume, velocity, and variety (the original 3Vs of Big Data), data has become a highly important strategic resource for organizations [6]. Organizations across many industries, including healthcare, finance, manufacturing, and smart cities, leverage Big Data to gain insights, improve efficiency, and drive innovation [7], [8]. The size and complexity of Big Data present challenges beyond those faced by traditional systems. New ecosystems must manage not only the storage and processing of this data but also the integration of AI for real-time analytics and adequate governance [9]. Integrating ML into data pipelines raises issues of data quality and bias, model transparency, and compliance with privacy laws such as GDPR and CCPA [10], [11].


Likewise, these ecosystems will be deployed on cloud-based or other system architectures that offer increasing dependability and scalability [10]. Newer technologies improve efficiency and privacy by performing locally the analytic processing that was traditionally done remotely (e.g., information is processed and stored at the source of an event). Such systems also require strong security mechanisms and encryption to provide a safe computing experience [12], [13], [14]. Zero-trust security concepts are an effective way to mitigate cybersecurity hazards, strengthen data ownership, and enable trust in today's always-connected digital society [15]. This paper discusses the evolution of Big Data management, highlights recent developments, and identifies challenges. It summarizes the current techniques, frameworks, and ethical issues that will shape the future of data-centric systems. A number of research works focus on individual aspects of Big Data, such as storage technologies, analytics platforms, and management frameworks. However, few review articles bring these aspects together from the perspectives of AI ethics, real-time decision-making, and ever-changing regulations. This article addresses that gap by bringing these aspects together and providing a unique framework for developing Big Data systems that are scalable, secure, and compliant with regulations.

The paper is organized as follows. Section 2 provides the historical context in which Big Data emerged and the characteristics it acquired over time, eventually leading to the 10 Vs framework. Section 3 examines the challenges associated with Big Data, both conventional and emerging. Section 4 discusses real-time analytics in the context of Big Data, including the role of AI. Section 5 covers the ethical and legal implications of Big Data, including the challenges posed by international regulations. Section 6 presents architectures for Big Data, including technologies such as containerization, edge computing, federated learning, and zero trust. Section 7 presents case studies of Big Data in industries worldwide and regionally. Finally, Section 8 outlines future directions for Big Data, including innovative ways in which it could be made to work effectively.

2. Background: Emergence of Big Data


The emergence of Big Data stems from the exponential growth of digital information driven by the rapid evolution of internet technologies, computational power, and the widespread digitization of services [16], [17]. In the early stages, data was primarily structured and stored in relational databases to support enterprise functions such as finance, inventory, and operations. However, with the advent of digital platforms, mobile applications, sensor networks, and social media, the amount and complexity of data exceeded what conventional data systems could handle.

As communication technology evolved, data on the World Wide Web increasingly took semi-structured and unstructured forms [18], owing to the rise of multimedia, user logs, and social interactions on the internet [19], [20]. This data deluge was generated from multiple sources, including IoT devices, cloud platforms, edge systems, and federated networks. Systems of the time were unable to handle such data streams [21], [22], and there was an urgent need for scalable, secure, and intelligent systems capable of extracting timely insights from highly heterogeneous data sources [23]. This motivated high-tech organizations such as Google and Amazon to introduce distributed frameworks such as the Google File System (GFS) and MapReduce to manage large-scale data processing [24]. This ultimately resulted in the development of Apache Hadoop, a scalable, fault-tolerant data processing system that became an ecosystem for managing the structured, semi-structured, and unstructured data known as Big Data [25].
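As an illustration of the MapReduce programming model mentioned above, the following minimal Python sketch counts words in a single process. It is not Hadoop or Google code: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are assumptions made for this example, and a real deployment would distribute the map and reduce calls across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as a framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # On a cluster, each document (input split) would be mapped on a different node.
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big insight", "data at scale"]))
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can run in parallel, which is the property that lets this model scale out.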

Big Data can be defined as a dataset that is extremely large and diverse and that continues to grow exponentially over time [26]. Conventional data management technology fails to handle such immense volume, high velocity, and wide variety [27], [28], [29]. The main characteristics of Big Data were initially defined by the 3Vs framework: Volume, Velocity, and Variety [26], [30]. Over time, additional dimensions were introduced to address further concerns related to Big Data. According to IDC, global data volume is expected to surpass 175 zettabytes by the end of 2025, with over 90% being unstructured (IDC, 2023). The real value of Big Data today therefore stems not from its storage or processing capabilities but from its capacity to drive AI-powered real-time analytics, uphold ethical governance, and guarantee data sovereignty [31], [32], [33].

3. The Evolving Dimensions of Big Data


The fundamental characteristics of Big Data were initially defined by the 3Vs model: Volume, Velocity, and Variety. As the complexity and adoption of Big Data grew, additional dimensions were added to address emerging problems of data quality, governance, usability, and security. Today, the ten Vs framework is commonly used to characterize Big Data because it captures its complexity, limitations, and transformational potential [34], [35]. Figure 1 shows this conceptual progression from the initial 3Vs to the more complete 10Vs model, giving a holistic view of the characteristics that define modern Big Data analytics.

Figure 1: 10Vs of Big Data

3.1. 3Vs: Volume, Velocity, and Variety

By definition, Big Data has three defining characteristics:

1. Volume: Large amounts of data are generated by many different sources (e.g., social media, sensors, business tools). This volume is increasing rapidly and requires continual oversight and handling to extract valuable insights from it.

2. Velocity: Data is generated at a high and constantly changing pace, which requires timely processing to support decision-making (e.g., from real-time observation).


3. Variety: Data comes in different types from multiple sources, including structured (e.g., database records), semi-structured (e.g., JSON), and unstructured (e.g., social media content), and therefore requires specific data curation methods and techniques.

3.2. 4th V: Veracity

The fourth V was introduced to address concerns about the reliability and integrity of Big Data:

4. Veracity: The integrity and dependability of Big Data. This characteristic covers issues of data inconsistency, incompleteness, bias, and ambiguity.

3.3. 5th V: Value

The fifth characteristic, Value, was added to capture the significance of Big Data:

5. Value: The usefulness of the data collected from a variety of sources. High-value data has the potential to transform operations, drive innovation, and provide a competitive edge.

3.4. 6th and 7th V: Validity, Volatility

To further refine the characteristics of Big Data, two more components, Validity and Volatility, were added, forming the 7Vs framework:

6. Validity: Whether data is accurate and suitable for the purpose for which it is intended. For example, a data set with good veracity would still not be valid if it were outdated or irrelevant to that purpose.

7. Volatility: The time span over which data remains relevant and stable. Some data has a very long shelf life and remains relevant for many years, while other data has a very short shelf life and quickly becomes irrelevant. This life cycle helps determine whether particular data is kept or discarded.

3.5. 8th to 10th V: Variability, Visualization, Vulnerability

More recently, three further characteristics have been added, bringing the total to ten Vs:


8. Variability: How data flows vary in value over time or carry value at different times. Variability creates operational difficulties when analyzing a data set whose definition or form of use has shifted.

9. Visualization: The graphical representation of data for better understanding, exploration, and communication of insights.

10. Vulnerability: The importance of data privacy and security when handling sensitive or personally identifiable information. Data handling should align with legal standards and maintain user trust.

4. Traditional and Emerging Challenges in Big Data

4.1. Traditional Concerns: Storage, Processing, and Data Integration

The shift from conventional systems to the Big Data ecosystem has created challenges in data storage, processing, and integration. As shown in Figure 2, growth in the 3Vs of Big Data (Volume, Velocity, and Variety) has exposed the shortcomings of conventional data handling mechanisms [36], which cannot scale effectively and exhibit bottlenecks in storage, latency, and operational cost [36], [37], [38]. As the Big Data ecosystem evolved, these shortcomings created a need for scalable, fault-tolerant, and cost-effective mechanisms. In addition, the need for real-time analysis has exposed the limits of conventional batch-processing systems such as Hadoop MapReduce, which can process large data sets but cannot provide the real-time environment required for decision-making [39], [40]. As a result, distributed stream processing systems such as Apache Spark have gained prominence as a solution for real-time data environments. However, the heterogeneous nature of the data continues to pose problems for data integration [41].


Figure 2. Traditional Concerns: Storage, Processing, and Data Integration

Table 1: Traditional Data Management Concerns and Evolving Needs

Category: Storage
Challenges:
● Limited scalability with volume growth
● Monolithic architecture
● High costs of expanding capacity
Modern Needs:
● Cost-effective capacity expansion
● Fault tolerance

Category: Processing
Challenges:
● Problem in meeting real-time needs
● Latency and throughput limitations
● Complex resource orchestration
Modern Needs:
● Distributed stream processing
● In-memory computing

Category: Integration
Challenges:
● Diverse formats and data sources
● Data silos from legacy systems
Modern Needs:
● Advanced ETL pipelines
● Data cleansing and normalization
● Unified, seamless analytics

4.1.1 Data Storage: Evolution and Integration


In the early years, Big Data projects depended mostly on traditional data warehousing technologies and focused on storing and managing massive volumes of data. These solutions were designed for structured data and worked well in the early stages of data gathering. As data volume, velocity, and variety grew, the constraints of data warehouses became more apparent, particularly their inability to scale out and to manage unstructured and semi-structured data. To overcome these constraints, companies began using distributed storage solutions, which proved useful for fault-tolerant, scalable, and flexible data processing. This was necessary not only to handle the exponential rise in data volume but also to allow real-time data access and analysis across geographically distributed systems. Storage virtualization has made it possible to abstract several physical storage devices into a single logical resource. Through centralized control and decentralized access, this enabled better data transparency and operational efficiency. Despite these advances, cloud and hybrid storage systems still experienced problems with data security, data integrity, and data latency, for which new technologies are emerging as solutions [42].

4.1.2 Data Processing: From Traditional to Distributed Approaches

One of the significant changes in data handling and use is the shift from centralized to distributed data processing systems. Cloud computing (a distributed model) and cloud-based processing combined with edge processing create new efficiencies through reduced energy usage and changes in how cloud data management is provisioned. The defining characteristics of distributed computing systems (the ability to scale out, high availability, high performance, and relatively low cost) make them indispensable for modern applications. Real-time processing of data within the system stack makes systems more responsive. Pre-filtering data at the storage nodes, for example with FPGA-based network interface cards that apply query filters, reduces bandwidth consumption and speeds up transfer to large data processing locations [43].
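The bandwidth effect of pre-filtering at the storage node can be sketched in software. The example below is only an analogy for the hardware approach cited above: it applies a query filter before records leave a simulated storage node and compares the bytes that would be transferred. The record layout, the `is_hot` predicate, and the use of JSON size as a proxy for network cost are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def transfer_size(records: Iterable[Record]) -> int:
    """Approximate bytes sent over the network as the JSON-encoded size."""
    return sum(len(json.dumps(r).encode("utf-8")) for r in records)


def scan_at_storage_node(records: List[Record],
                         predicate: Callable[[Record], bool]) -> List[Record]:
    """Apply the query filter where the data lives, before any transfer."""
    return [r for r in records if predicate(r)]


def is_hot(record: Record) -> bool:
    """Hypothetical query filter: keep readings above 27 degrees."""
    return record["temp"] > 27.0


# Hypothetical sensor readings held on one storage node.
readings = [{"sensor": i, "temp": 20.0 + i} for i in range(10)]

# Without pre-filtering: every record is shipped and filtered at the compute node.
bytes_unfiltered = transfer_size(readings)

# With pre-filtering: only matching records leave the storage node.
matches = scan_at_storage_node(readings, is_hot)
bytes_filtered = transfer_size(matches)
```

The saving grows with the selectivity of the filter: the fewer records a query matches, the larger the share of traffic that never leaves the storage node.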

4.1.3 Data Integration: Methods and Challenges

The ETL (extract, transform, load) process is the foundation for integrating datasets from multiple sources; however, many older ETL approaches cannot accommodate large amounts of data because of their limited scalability and flexibility and the difficulty of dealing with unstructured or semi-structured datasets. Integrating and querying multiple data repositories increasingly requires virtual repositories and standard interfaces, especially as data volume grows. Additionally, adding new data to these repositories at run time is time-consuming, which delays data processing, risks data loss, and leaves available data underused. Virtualization, distributed computing, shared data models, and advanced security measures are all being employed to overcome these storage, processing, and integration challenges. New technologies such as blockchain, edge computing, and homomorphic encryption are opening avenues for solutions to these data-intensive applications [44], [45]. The issues of data silos, data quality, and data heterogeneity extend far beyond the technology that supports data systems, as shown in Figure 3. Integrating, managing, and enriching both compact and varied data is critical for organizations and industries that want to achieve and sustain a competitive advantage from unified data. Failure to address these issues will continue to inhibit innovation, commerce, and, most importantly, decision-making [46], [47].
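To make the ETL step discussed in this subsection concrete, the following sketch extracts records from one structured (CSV) and one semi-structured (JSON lines) source, transforms both into a shared schema, and loads them into a single list. The field names, the sample data, and the unified schema are hypothetical; production pipelines would add error handling, schema evolution, and a real target store.

```python
import csv
import io
import json
from typing import Any, Dict, Iterator, List

# Hypothetical unified schema: every record ends up with these three fields.
UNIFIED_FIELDS = ("device_id", "timestamp", "temperature_c")


def extract_csv(text: str) -> Iterator[Dict[str, str]]:
    """Extract rows from a structured CSV export."""
    yield from csv.DictReader(io.StringIO(text))


def extract_json_lines(text: str) -> Iterator[Dict[str, Any]]:
    """Extract records from semi-structured JSON lines."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def transform_csv(row: Dict[str, str]) -> Dict[str, Any]:
    # This source reports Fahrenheit under its own column names.
    return {
        "device_id": row["id"],
        "timestamp": row["time"],
        "temperature_c": round((float(row["temp_f"]) - 32.0) * 5.0 / 9.0, 2),
    }


def transform_json(rec: Dict[str, Any]) -> Dict[str, Any]:
    # This source nests the reading and already uses Celsius.
    return {
        "device_id": rec["device"],
        "timestamp": rec["ts"],
        "temperature_c": float(rec["reading"]["celsius"]),
    }


def run_etl(csv_text: str, json_text: str) -> List[Dict[str, Any]]:
    """Load both sources into one list that shares the unified schema."""
    target: List[Dict[str, Any]] = []
    target.extend(transform_csv(r) for r in extract_csv(csv_text))
    target.extend(transform_json(r) for r in extract_json_lines(json_text))
    return target


CSV_SOURCE = "id,time,temp_f\nA1,2026-01-09T10:00:00Z,212\n"
JSON_SOURCE = '{"device": "B7", "ts": "2026-01-09T10:00:05Z", "reading": {"celsius": 21.5}}\n'
unified = run_etl(CSV_SOURCE, JSON_SOURCE)
```

Keeping one transform function per source isolates format-specific knowledge, so adding a new source does not touch the code for existing ones.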

4.2. Beyond Infrastructure: Data Silos, Quality, and Heterogeneity

As shown in Figure 3, the problems of data silos, data quality, and data heterogeneity extend well beyond the technological infrastructure of data systems [48]. As organizations rely on larger, more sophisticated sources and more types of data, it becomes imperative to combine, manage, and create value from diverse and irregularly structured data. Effective decision-making, process optimization, and innovation all depend on these factors [49].


Figure 3: Data Management Challenges Beyond the 10Vs

4.2.1 Data Silos: Integration and Interoperability

A data silo is a repository of data controlled by a single unit or department within an organization and isolated from other departments or systems. Because siloed data is usually held in separate systems, it is often incompatible with other data sets, which limits integration, cooperation, and holistic analysis. This fragmentation is a major barrier to integrated analytics and prevents the organization from making sound, data-driven decisions.

Organizations and institutions have widely adopted Digital Twin (DT) technologies to address these problems, as they enable integrated data management and provide quick operational insights through seamless connection across several systems, thereby supporting strategic decision-making [50], [51]. The FAIR Digital Objects (FAIR DOs) framework helps eliminate isolated data sources by applying the principles of Findability, Accessibility, Interoperability, and Reusability. It provides structured, standardized, machine-readable data that facilitates the discovery, sharing, and reuse of data across domains and technology platforms.

Industry-recognized open standards, such as Building Information Modeling (BIM), Geographic Information Systems (GIS), and Industry Foundation Classes (IFC), are being adopted by multiple industries. Adopting such standards enables organizations to create more standardized, automated digital environments that adapt to change, maintain data integrity, and allow multiple software applications and systems to work together, thereby solving many of the problems associated with fragmented information systems.

4.2.2 Data Quality: Design, Dynamics, and Value

Data quality is becoming one of the most important factors in an organization's overall success, because a growing number of sources produce many types of Big Data streams that are difficult to manage. Many data feeds, such as social media posts and low-cost IoT sensors, are noisy, frequently refreshed, and unreliable. Organizations nevertheless have a responsibility to maintain their integrity by keeping high-quality data about every item they manage, however large or small. To manage data effectively, organizations need carefully planned and implemented data cleansing and validation procedures that minimize the impact of data inconsistencies on decision-making; adaptive algorithms can support these procedures. Provenance and clarity are among the data quality attributes most important for establishing trust and enabling effective decision-making; achieving them gives organizations the means to unlock valuable insights and gain competitive advantages in fast-paced data environments.
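A data cleansing and validation procedure of the kind described above can be sketched as a set of explicit rules applied to each incoming record. In the example below, every reading keeps its source (provenance), and rejected readings are quarantined together with the reason rather than silently dropped. The `Reading` type, the plausible range of -40 to 60, and the sample feed are assumptions chosen for illustration, not values taken from any cited system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reading:
    source: str               # provenance: which feed produced the value
    value: Optional[float]    # None models a missing measurement


def validate(reading: Reading, low: float, high: float) -> Optional[str]:
    """Return a rejection reason, or None if the reading passes every rule."""
    if reading.value is None:
        return "missing value"
    if not low <= reading.value <= high:
        return "out of plausible range"
    return None


def cleanse(readings: List[Reading], low: float = -40.0,
            high: float = 60.0) -> Tuple[List[Reading], List[Tuple[Reading, str]]]:
    """Split readings into accepted ones and rejected ones with a reason."""
    accepted: List[Reading] = []
    rejected: List[Tuple[Reading, str]] = []
    for reading in readings:
        reason = validate(reading, low, high)
        if reason is None:
            accepted.append(reading)
        else:
            rejected.append((reading, reason))
    return accepted, rejected


raw = [
    Reading("sensor-a", 21.4),
    Reading("sensor-b", None),      # dropped packet
    Reading("sensor-c", 999.0),     # sensor glitch
]
clean, quarantined = cleanse(raw)
```

Recording the rejection reason alongside the source makes it possible to audit which feeds degrade quality most often, which supports the provenance attribute discussed above.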

4.3. Heterogeneity: Complexity and Methodological Considerations

Data infrastructures are usually highly heterogeneous, relying on many tools and technologies and on varying types of data. Evaluating such infrastructures objectively requires analysis techniques that take account of their interrelated, dynamic, and relational nature. These infrastructures involve complex activities, such as data integration, method selection, and preservation of semantic properties, which call for flexible, multilayered models that can address these issues in practice.

Table 2 summarizes the key issues with data silos, data quality, and heterogeneity, together with suggested solutions and the resulting benefits. Data silos, quality issues, and data heterogeneity are not only technical issues but also strategic priorities in many industries. Handling them properly makes systems more integrated, agile, and intelligent, enabling better-informed decision-making and operational excellence. The impact of such improvements shows in key application areas, where better data integration and management translate into tangible gains. Table 3 indicates the application domains in which addressing these problems leads to better results.

Table 2. Challenges, Solutions, and Benefits Related to Data Silos, Quality, and Heterogeneity

Category: Data Silos
Challenges:
● Isolation
● Fragmented systems
Solutions:
● Digital Twins
● FAIR Digital Objects (DOs)
Benefits:
● Unified views
● Strategic decisions

Category: Quality
Challenges:
● Inconsistency
● Inaccuracy
● Unreliability
Solutions:
● Socio-technical alignment
● Quality-by-design
Benefits:
● Replicability
● Reliable analytics
● Informed decisions

Category: Heterogeneity
Challenges:
● Diverse data types
● Contextual diversity
Solutions:
● Adaptive frameworks
● Multi-layer modeling
Benefits:
● Scalable analytics
● Complex data fusion

Table 3. Application-Specific Benefits of Resolving Data Silos, Quality, and Heterogeneity

Application Area: Key Benefits
● Supply Chain Management: Integrated insights into new product development and strategic planning
● Civil Infrastructure Management: Improved predictive maintenance and asset lifecycle efficiency
● Research Data Infrastructures: Cross-disciplinary data sharing and accelerated scientific discovery
● Urban and Road Systems: Better segmentation and management of physical infrastructure

5. Real-Time Analytics and AI-Driven Decision Making

5.1. Issues with real-time analytics in big data

Real-time analytics is the continuous collection, processing, and analysis of data through efficient, low-latency pipelines. This capability is especially valuable in fields such as predictive maintenance, fraud detection, autonomous systems, and personalized marketing, where even a split-second delay might cause financial loss [52]. To meet these demands, real-time systems must analyze high-speed data streams generated by a variety of sources, including IoT sensors, social media channels, transactional databases, and other distributed systems. Such systems need robust streaming infrastructure, such as Apache Kafka, Apache Flink, and Apache Spark Streaming, that supports scalable, fault-tolerant, low-latency processing in continuous data pipelines. However, bringing real-time streaming technology into business ecosystems has significant drawbacks. In a variety of contexts, companies have to handle data governance problems, including ensuring quality, consistency, and compliance, as well as the technical complexity of scaling and harmonizing data flows [53], [54]. In addition, real-time Big Data analysis across multiple IoT and edge devices raises issues of inconsistent data formats, insufficient standardized metadata, and contextual incompatibility. Information systems frequently employ ontology-based or semantic modelling approaches to address these issues. The key building blocks, enabling technologies, and architectural components for real-time Big Data analysis are summarized in Table 4 and Figure 4 and can guide companies in constructing robust and responsive data infrastructures.


Figure 4. Key Components of Real-Time Analytics in Big Data Systems

Table 4: Key components of real-time analytics architecture, supporting continuous data flow and timely insight generation

Component: Data Ingestion
Description: Continuous collection of streaming data
Example Technologies: Apache Kafka, MQTT

Component: Stream Processing
Description: Real-time data transformation and analysis
Example Technologies: Apache Flink, Spark Streaming

Component: Data Storage
Description: Low-latency storage for processed data
Example Technologies: Apache Cassandra, HBase

Component: Visualization & Alerts
Description: Real-time dashboards and notification systems
Example Technologies: Grafana, Kibana
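The four components in Table 4 can be illustrated with a single-process sketch in which plain Python objects stand in for the real technologies. The ingestion generator stands in for a broker consumer, a tumbling-window mean stands in for the stream processor, and two lists stand in for the low-latency store and the alerting channel. The event format, the 10-second window, and the alert threshold are assumptions made for this example; none of the listed products is actually called.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Event = Tuple[float, str, float]  # (timestamp in seconds, sensor id, value)


def ingest(events: Iterable[Event]) -> Iterator[Event]:
    """Ingestion stage: stands in for a consumer reading from a message broker."""
    yield from events


def tumbling_window_mean(events: Iterable[Event],
                         window_s: float) -> Iterator[Tuple[int, str, float]]:
    """Stream processing stage: mean value per sensor per fixed-size window.

    Assumes events arrive in timestamp order. A window is emitted as soon as
    an event from a later window arrives, and once more at end of stream.
    """
    current: Optional[int] = None
    buckets: Dict[str, List[float]] = defaultdict(list)
    for ts, sensor, value in events:
        window = int(ts // window_s)
        if current is not None and window != current:
            for sid, values in sorted(buckets.items()):
                yield current, sid, sum(values) / len(values)
            buckets.clear()
        current = window
        buckets[sensor].append(value)
    if current is not None:
        for sid, values in sorted(buckets.items()):
            yield current, sid, sum(values) / len(values)


def run_pipeline(events: Iterable[Event], window_s: float = 10.0,
                 threshold: float = 30.0):
    store = []    # storage stage: stands in for a low-latency database
    alerts = []   # alerting stage: stands in for a dashboard notification
    for window, sensor, mean in tumbling_window_mean(ingest(events), window_s):
        store.append((window, sensor, mean))
        if mean > threshold:
            alerts.append(f"window {window}: {sensor} mean {mean:.1f} above {threshold}")
    return store, alerts


stream = [(1.0, "s1", 20.0), (4.0, "s1", 24.0), (12.0, "s1", 35.0), (15.0, "s1", 37.0)]
store, alerts = run_pipeline(stream)
```

Because every stage is a generator or a simple loop, results for a window are available as soon as the window closes rather than after the whole data set has been read, which is the essential difference from batch processing.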

5.2. Integration of AI into Data Pipelines

The integration of Artificial Intelligence (AI) into Big Data environments has significantly enhanced decision-making through predictive analytics, anomaly detection, and automated insights. Machine Learning (ML) models exploit huge and diverse datasets to identify patterns and generate findings relevant to business objectives. However, implementing these models in commercial systems presents several challenges. First, training models on high-volume, high-dimensional, and heterogeneous data is hard and requires significant preprocessing and tuning. Second, real-time inference and continuous model updates that adjust to changing data patterns are computationally challenging and expensive.

Likewise, model explainability and transparency are essential for satisfying regulatory requirements and building user confidence [55], [56]. As models become better at capturing complicated patterns in real-time data, their internal decision logic becomes less comprehensible, posing safety concerns in regulated organizations. Combining ML with distributed Big Data sets adds complexity, which calls for containerization, microservices, and MLOps frameworks to support lifecycle management, scalability, and continuous monitoring [57]. Balancing high prediction accuracy with interpretability remains an issue, specifically when AI models are added to real-time data pipelines that demand both speed and accountability.

5.3. Trust and Accountability in AI Decision Systems

Bias in ML algorithms can lead to inaccurate predictions. The training data for these models often fails to represent the full spectrum of people in the real world because of biased sampling, systemic social inequities, or human bias. To keep ML systems accountable, organizations need procedures that audit ML models regularly and comprehensively throughout the development life cycle. In addition, explainability, the ability to understand how and why an AI model arrived at its output, contributes greatly to user trust, transparency, and accountability. To support fairness and regulatory compliance, explanation tools such as SHAP and LIME are often used to show which inputs drove a model's output. These tools increase the transparency of the AI system, allowing users to identify potential sources of error and bias before they escalate. Additionally, to comply with applicable international data protection laws, AI systems must be designed with respect for user privacy, using technologies such as differential privacy, federated learning, and secure multi-party computation [58], [59].
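The attribution idea that SHAP builds on can be shown with exact Shapley values on a tiny model. This sketch does not use the `shap` library or its API; it enumerates every feature coalition directly, which is only feasible for a handful of features. The scoring function `toy_score` and its coefficients are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley attribution of model(instance) - model(baseline)."""
    n = len(instance)

    def value(coalition: frozenset) -> float:
        # Features in the coalition take the instance value, the rest the baseline.
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    attributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))
        attributions.append(phi)
    return attributions


def toy_score(x: Sequence[float]) -> float:
    """Hypothetical scoring function, used only for illustration."""
    income, debt, age = x
    return 0.5 * income - 0.8 * debt + 0.1 * income * age


if __name__ == "__main__":
    phi = shapley_values(toy_score, [4.0, 2.0, 3.0], [0.0, 0.0, 0.0])
    for name, contribution in zip(["income", "debt", "age"], phi):
        print(f"{name}: {contribution:+.2f}")
```

The attributions sum to the difference between the prediction and the baseline prediction, which is the property that lets an auditor see how much each input pushed a decision up or down.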

6. Data Governance and Regulatory Compliance

6.1. Addressing Challenges in Privacy, Security, and Ethical Data Use

Organizations must take responsibility for how they use big data as it becomes more widely adopted across industries. Managing large data volumes now depends on issues of privacy, user consent, and ethical responsibility, among others. Traditional data management techniques adapt poorly to the rapidly changing nature of data collection and use. This highlights the urgent need for flexible and responsive policies that can adapt to these constant changes [60], [61].

Three major issues arise when handling modern data: confidentiality, security, and ethics. Since data from healthcare systems, social media, and IoT devices often includes sensitive personal information, privacy becomes especially important. To safeguard this data while still enabling its careful use, methods such as anonymization, differential privacy, and secure data-sharing systems are essential.
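To make differential privacy concrete, the sketch below releases a noisy count using the Laplace mechanism. The choice of the Laplace mechanism, the function names, and the example records are assumptions for illustration; the source does not prescribe a particular mechanism.

```python
import random
from typing import Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    records: Iterable[bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a noisy count of True records under epsilon-differential privacy."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if record)
    # Adding or removing one person changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical flags: does each patient record carry a given diagnosis?
    has_diagnosis = [True] * 100 + [False] * 50
    print(private_count(has_diagnosis, epsilon=0.5))
```

Smaller values of `epsilon` add more noise and give stronger privacy, so the parameter expresses the trade-off between data utility and protection discussed above.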

Because big data systems are decentralized, securing data becomes more complex: data is reached through multiple access points from different sources, and multiple platforms connect to the same data. As the number of access points increases, so does the potential for unauthorized access, and the resulting data breaches can have significant financial, legal, and reputational consequences. To mitigate these risks, organizations must build an effective security framework that includes strong access controls and ongoing access monitoring. In addition, organizations should manage their data responsibly; when an organization uses data to make decisions, it should ensure that the decision-making process is transparent, equitable, and ethical. Acting in this way demonstrates an organization's commitment to responsible data use and builds trust among all parties involved.

6.2. Observance of International Privacy Regulations

The primary obstacles to regulatory oversight of big data derive from its sheer size, complexity, and velocity. These characteristics prevent regulators from effectively monitoring the use of data across jurisdictions, which creates the need for stronger regulatory structures for the protection of privacy. In particular, the California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR) give individuals more rights over their data and obligate organizations to practice data minimization, transparency, consent collection, data access, and breach reporting. However, compliance remains a challenge for large multinationals. Big data

systems continuously ingest data from various sources while having to comply with regulations whose consent, storage, and audit requirements differ widely from nation to nation. Organizations are responding by developing flexible data governance frameworks that apply uniform policies across the countries in which they operate. Emerging technologies also offer promising possibilities. Federated learning allows collaborative analysis while keeping data local, and blockchain provides an immutable record for compliance verification. To succeed in a fast-changing digital environment, organizations will need to adopt adaptive governance frameworks and privacy-preserving technologies that balance data utility against compliance requirements.
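The immutable record for compliance verification can be approximated, in its simplest form, by a hash-chained audit log. The sketch below is not a blockchain (there is no consensus or distribution); it only demonstrates the tamper-evidence property. The event fields and function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder hash that precedes the first entry


def _digest(event: Dict[str, str], prev_hash: str) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: List[dict], event: Dict[str, str]) -> None:
    """Append an event, linking it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)})


def verify_log(log: List[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    audit_log: List[dict] = []
    append_entry(audit_log, {"actor": "analyst-1", "action": "read", "dataset": "claims"})
    append_entry(audit_log, {"actor": "service-a", "action": "export", "dataset": "claims"})
    print(verify_log(audit_log))  # chain is intact
    audit_log[0]["event"]["action"] = "delete"  # simulated tampering
    print(verify_log(audit_log))  # chain no longer verifies
```

An auditor who holds only the latest hash can detect any retroactive change to earlier entries, which is the property regulators care about when reviewing access records.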

6.3. Ethical Frameworks and Data Sovereignty

Data sovereignty is a key principle in data management that focuses on ensuring data complies with the laws of the location where it is collected. Companies operating in different countries often grapple with the nuances of each jurisdiction's privacy legislation. A well-structured, policy-based data governance framework is important for ensuring legal compliance while also improving the efficiency, security, and accessibility of worldwide data systems.

Ethical data governance frameworks are fundamental to the responsible and ethical use of data and AI, both individually and collectively. Such frameworks should be built on widely accepted principles, such as transparency, informed consent, equity, accountability, and bias reduction. It is critical to involve all relevant stakeholders, especially when AI models automate decisions with minimal human oversight or ethical judgment. This collaborative approach helps organizations create clear guidelines for the implementation of AI and routinely assess the ethical risks of AI decisions. An appropriate governance model rests on three main pillars: compliance with legal regulations for data protection; fiduciary governance addressing ethics, security, privacy, and sovereignty; and an ethical framework supporting the responsible use of data and AI. Figure 5 portrays these pillars as a conceptual framework of interconnected principles.

Figure 5: Ethical Data Governance and Compliance

7. Infrastructure and Architecture for Big Data

7.1. Shifts Towards Cloud-Native and Distributed Architectures

Cloud-native frameworks have emerged as a major response to the challenges of big data. Cloud computing provides scalable resources on demand along with fault tolerance and high availability, and integration with container orchestration tools adds performance and flexibility. Modern analytics and AI-dependent applications rely on hybrid frameworks that combine batch processing and stream processing as critical components of their architectures. The hybrid cloud allows workloads to run across on-premises infrastructure and public clouds. This improves both cost efficiency and performance while also addressing concerns related to data privacy and data sovereignty compliance. Alongside these architectural advances, newer technologies such as federated learning, which in theory can preserve privacy, are still in the early stages of development and adoption. Obstacles to widespread adoption include model convergence, coordination across many devices, and communication with all participating devices.
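The coordination pattern behind federated learning can be sketched with federated averaging on a one-parameter linear model. Each client trains on its own data and shares only the updated weight; the server averages the weights in proportion to client data size. The model, learning rate, number of rounds, and client datasets are all assumptions chosen to keep the example small.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_update(weight: float, data: Dataset, lr: float = 0.05, epochs: int = 5) -> float:
    """Run a few gradient-descent steps for y = weight * x on one client's data."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(global_weight: float, clients: List[Dataset], rounds: int = 20) -> float:
    """Server loop: broadcast the weight, collect local updates, average by data size."""
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        # Only weights and sample counts leave the clients, never the raw (x, y) pairs.
        updates = [(local_update(global_weight, data), len(data)) for data in clients]
        global_weight = sum(w * n for w, n in updates) / total
    return global_weight


if __name__ == "__main__":
    # Three hypothetical devices whose data follows y = 2x.
    clients = [
        [(1.0, 2.0), (2.0, 4.0)],
        [(3.0, 6.0)],
        [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
    ]
    print(round(federated_average(0.0, clients), 4))
```

Each round costs one broadcast and one upload per client, which is why communication and device coordination dominate the adoption challenges listed above.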

7.2. Needs for Dynamic Scalability, Low Latency, and High Availability

Modern Big Data systems prioritize availability to ensure continuous data processing and analytics. Techniques such as data replication, failover systems, and distributed consensus protocols such as Paxos

and Raft, help minimize downtime and improve fault tolerance [62], [63]. Low latency is crucial for real-time analytics and AI-driven applications that need quick, accurate results. Technologies such as in-memory data grids, edge caching, and streamlined pipelines reduce latency and shorten the path to insight. Auto-scaling on cloud platforms provides elastic scalability by automatically adjusting resources to fluctuating workloads [64].
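A minimal sketch of how auto-scaling adjusts resources is shown below. It uses the proportional rule applied by horizontal autoscalers such as the Kubernetes Horizontal Pod Autoscaler (desired replicas equal the current replicas scaled by the ratio of observed to target load, rounded up), simplified here without tolerance bands or cooldown periods. The function name and the replica bounds are assumptions.

```python
import math


def desired_replicas(
    current_replicas: int,
    observed_load: float,
    target_load: float,
    min_replicas: int = 1,
    max_replicas: int = 20,
) -> int:
    """Scale the replica count in proportion to observed versus target load."""
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    proposed = math.ceil(current_replicas * observed_load / target_load)
    # Clamp to the configured bounds so a spike cannot scale without limit.
    return max(min_replicas, min(max_replicas, proposed))


if __name__ == "__main__":
    # Four workers at 90% average CPU against a 60% target.
    print(desired_replicas(4, observed_load=90.0, target_load=60.0))
```

The upper bound caps cost during traffic spikes, while the lower bound keeps enough capacity running to preserve availability when load drops.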

7.3. New Trends: Federated Learning, Edge Computing, and Zero Trust Architectures

To avoid moving data across the network, edge computing processes data at or near its source, which lowers latency, reduces bandwidth use, and ultimately enhances privacy. This approach supports autonomous systems, industrial automation, and IoT [64]. Federated learning trains ML models on decentralized data sources without exchanging raw data, which supports data privacy and GDPR compliance [64]. The essential idea of Zero Trust Architectures (ZTA) is 'never trust, always verify,' which requires strict identity checks, limited access permissions, and continuous monitoring, even inside the private network. In decentralized Big Data systems, such security measures are essential to protect the confidentiality of sensitive information against advanced cyber threats, including insider risks. Table 5 summarizes these paradigms with their descriptions, benefits, and example technologies.

Table 5: Paradigms with example technologies and associated benefits

| Component/Paradigm | Description | Benefits | Example Technologies |
|---|---|---|---|
| Cloud-Native Architectures | Modular, containerized systems for scalability | Flexibility, rapid deployment | Kubernetes, Docker |
| Distributed Architectures | Decentralized data ownership and processing | Scalability, fault tolerance | Data lakes, data mesh |
| Edge Computing | Localized data processing near data sources | Low latency, bandwidth savings | AWS IoT Greengrass, Azure IoT Edge |
| Federated Learning | Collaborative ML training without raw data sharing | Privacy preservation, compliance | TensorFlow Federated |
| Zero Trust Security | Continuous verification and least privilege access | Enhanced security, breach mitigation | BeyondCorp, Palo Alto Networks |
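The 'never trust, always verify' principle of Zero Trust from Section 7.3 can be sketched as a per-request authorization check that is denied by default. Every request must carry a verifiable identity, come from a compliant device, and match a least-privilege policy, regardless of where on the network it originates. The roles, resources, signing scheme, and `SECRET_KEY` are hypothetical stand-ins for a real identity provider and policy engine.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

SECRET_KEY = b"demo-only-secret"  # assumption: a real system would use a key manager

# Hypothetical least-privilege policy: role -> allowed (resource, action) pairs.
POLICY: Dict[str, FrozenSet[Tuple[str, str]]] = {
    "analyst": frozenset({("sales_db", "read")}),
    "engineer": frozenset({("sales_db", "read"), ("sales_db", "write")}),
}


@dataclass(frozen=True)
class Request:
    user: str
    role: str
    device_compliant: bool
    resource: str
    action: str
    signature: str


def sign(user: str, role: str) -> str:
    """Issue a signature binding a user to a role (stand-in for an identity token)."""
    return hmac.new(SECRET_KEY, f"{user}:{role}".encode("utf-8"), hashlib.sha256).hexdigest()


def authorize(request: Request) -> bool:
    """Allow a request only if identity, device posture, and policy all check out."""
    identity_ok = hmac.compare_digest(request.signature, sign(request.user, request.role))
    permitted = (request.resource, request.action) in POLICY.get(request.role, frozenset())
    return identity_ok and request.device_compliant and permitted


if __name__ == "__main__":
    token = sign("amina", "analyst")
    print(authorize(Request("amina", "analyst", True, "sales_db", "read", token)))
    print(authorize(Request("amina", "analyst", True, "sales_db", "write", token)))
```

Because the check runs on every request, a stolen session or an insider with a valid login still cannot act outside the permissions attached to the verified role.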

8. Case Studies and Industry Examples

8.1. Successful Big Data Implementations Across Industries

Several industries have begun implementing Big Data solutions, with the establishment of robust systems a key part of this process. These systems make it possible to handle substantial volumes of information, preserve confidentiality, and process data in real time as it arrives. In healthcare, hospitals now combine real-time data analysis with AI to identify diseases early, develop treatment plans tailored to each patient, and monitor patients. The financial sector has also adopted this trend: major banking organizations such as JPMorgan Chase have begun to implement cloud technologies along with zero-trust security frameworks. These tools allow them to handle millions of transactions each day without compromising security. The systems are designed to identify suspicious or fraudulent activity in real time, helping to protect both the bank and its clients from financial crime.

Manufacturing firms are part of this trend as well, continuously monitoring their production machinery and equipment to ensure smooth operations. The Predix platform created by General Electric illustrates what edge computing can achieve. Data is processed where it is created, which allows machines to work efficiently, reduces repair times, and makes it possible to predict when interventions may be needed before problems arise. These cases illustrate that when organizations invest in appropriate data management systems and manage them capably, they not only make their operations more efficient and effective but also improve how they identify and manage risk, surface valuable insights, and make informed business decisions. The combination of advanced technology and effective organizational systems is proving successful across varied industries, and the trend is likely to grow as more organizations recognize the need for a complete Big Data platform and strategy.

8.2. Regional Focus: Big Data Governance in Pakistan

The use of big data in Pakistan is growing across several industries, including banking, government services, and telecommunications. However, the nation still faces major difficulties with effective data management, suitable technological infrastructure, and the protection of people's data. The country has recently made progress by proposing data privacy policies aligned with the EU GDPR. The aim is for Pakistan to comply with international data standards, promote data accountability, and gain more control over its own data. This gives the region a solid basis for a more secure and robust big data ecosystem.

For big data to be truly effective in Pakistan, three things must happen: companies need to choose technology that matches their goals, they must build ethical guidelines into their systems from the start, and they must build infrastructure that can adapt as new regulations are implemented. Developing economies such as Pakistan need to focus on creating their own policy rules and building up workforce skills so that big data adoption is sustainable and benefits everyone.

9. Future Directions and Innovative Solutions

9.1. Advances in AI Explainability, Privacy-Preserving AI, Adaptive and Resilient Infrastructure

Machine learning models are expected to become increasingly complex in the near future. As a result, establishing the interpretability of such systems is vital both for regulatory compliance and for gaining user trust. This need has led to the development of Explainable AI (XAI) approaches that provide better insight into model decision-making. The authors of [65] suggest employing homomorphic encryption and federated learning to train collaborative models without revealing raw data, which addresses concerns about ownership of sensitive data. In addition, AI-driven automation is expected to become widespread, so these systems must support adaptive scalability and resource optimization to improve response times and reliability. Distributing data processing across multiple tiers through edge computing architectures may substantially reduce latency and improve the ability to recover from failures in real time. Despite these benefits, key challenges will persist, including the lack of a standardized XAI approach, inconsistencies in model behavior in federated environments, and the difficulty of establishing strong recovery capabilities without compromising performance. Future research should concentrate on developing

interoperable governance frameworks, adaptive/ethical AI models, and strong edge/cloud coordination

protocols to facilitate the coming generation of intelligent, effective and trustworthy Big Data systems.
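To illustrate how homomorphic encryption lets a server combine contributions without seeing them, the sketch below implements a toy version of the Paillier cryptosystem, which is additively homomorphic. Paillier is our choice for the example; the source mentions homomorphic encryption only in general terms. The primes are deliberately tiny and insecure, and the function names are hypothetical.

```python
import math
import random
from typing import Tuple

PrivateKey = Tuple[int, int, int]  # (n, lambda, mu)


def generate_keys(p: int = 61, q: int = 53) -> Tuple[int, PrivateKey]:
    """Build a toy key pair from two small primes (insecure, for illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # this shortcut holds because the generator is g = n + 1
    return n, (n, lam, mu)


def encrypt(n: int, message: int, rng: random.Random) -> int:
    """Encrypt an integer 0 <= message < n under public key n."""
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:  # the random blinding factor must be invertible
            break
    return (pow(n + 1, message, n_sq) * pow(r, n, n_sq)) % n_sq


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(private_key: PrivateKey, ciphertext: int) -> int:
    n, lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


if __name__ == "__main__":
    rng = random.Random()
    public_n, private_key = generate_keys()
    # Two clients encrypt integer-scaled model updates; the server adds them blind.
    combined = add_encrypted(public_n, encrypt(public_n, 120, rng), encrypt(public_n, 345, rng))
    print(decrypt(private_key, combined))
```

In a federated setting, the aggregator would hold only the public key, so it can sum encrypted updates but cannot read any individual client's contribution.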

9.2. Frameworks for Continuous Improvement

Governance is expected to be a crucial element in the creation of AI-powered systems. Governance frameworks should incorporate core ethical principles, help satisfy evolving regulations, and provide early risk detection through real-time monitoring and automated policy processes. A good governance model promotes openness, accountability, and fairness throughout the AI life cycle in big data platforms. Responsible, scalable AI systems need a mix of infrastructure resilience, constant monitoring, and privacy-conscious explainability. Figure 6 shows how these three concepts converge. The future course of Big Data and AI innovation depends on these three elements working together.

Figure 6: Innovative Solutions & Future Directions for Big Data Systems

Conclusion

This research has explored the development of Big Data management alongside the continuous evolution of regulations, ethical guidelines, and technical advancements. Overall, it has highlighted the challenges associated with the anticipated increase in data volume, velocity, and variety, together with the other dimensions of the 10 Vs of Big Data, which creates demand for real-time data analytics through AI-based decision-making. Building trust in the data ecosystem is achieved through a combination of the technical sophistication of systems, the ethical use of data, and strict compliance

with data protection and privacy laws globally. Moreover, the shift to cloud-based, distributed, and

edge-based systems using a zero-trust model will create the resilient architecture needed to support this

vision. In the future, the development of Intelligent Data Systems will be driven by the advancements

of Explainable AI (XAI), Privacy Preserving Technologies (PPT), and Adaptable/Autonomous Self-

Healing Infrastructure (ASHI).

The key message is that the potential of Big Data, and the development opportunities it presents, can be realized only through a unified strategy that integrates cutting-edge technology, ethical governance, and continuous oversight to build intelligent, secure, and ready-to-use data ecosystems.

References

[1] X. Han, O. J. Gstrein, and V. Andrikopoulos, “When we talk about Big Data, What do we really mean? Toward a more precise definition of Big Data,” Front Big Data, vol. 7, p. 1441869, Sep. 2024, doi: 10.3389/FDATA.2024.1441869.

[2] F. Marozzo and D. Talia, “Perspectives on Big Data, Cloud-Based Data Analysis and Machine Learning Systems,” Big Data and Cognitive Computing, vol. 7, no. 2, p. 104, May 2023, doi: 10.3390/BDCC7020104.

[3] W.-C. Tan, “Unstructured and structured data: Can we have the best of both worlds with large language models?,” Apr. 2023, Accessed: Jun. 24, 2025. [Online]. Available: https://arxiv.org/pdf/2304.13010

[4] G. Jeon, M. Albertini, V. Bellandi, and A. Chehri, “Intelligent mobile edge computing for IoT big data,” Complex and Intelligent Systems, vol. 8, no. 5, pp. 3595–3601, Oct. 2022, doi: 10.1007/S40747-022-00821-7.

[5] K. Saini, U. Pandey, and P. Raj, “Edge computing challenges and concerns,” Advances in Computers, vol. 127, pp. 259–278, Jan. 2022, doi: 10.1016/BS.ADCOM.2022.02.006.

[6] M. Shahnawaz and M. Kumar, “A Comprehensive Survey on Big Data Analytics: Characteristics, Tools and Techniques,” ACM Comput Surv, vol. 57, no. 8, pp. 1–33, Mar. 2025, doi: 10.1145/3718364.

[7] H. Liu et al., “An historical overview of artificial intelligence for diagnosis of major depressive disorder,” Front Psychiatry, vol. 15, p. 1417253, Nov. 2024, doi: 10.3389/FPSYT.2024.1417253.

[8] S. K. Sharma, A. I. Alutaibi, A. R. Khan, G. G. Tejani, F. Ahmad, and S. J. Mousavirad, “Early detection of mental health disorders using machine learning models using behavioral and voice data analysis,” Sci Rep, vol. 15, no. 1, p. 16518, Dec. 2025, doi: 10.1038/S41598-025-00386-8.

[9] K. Mao, Y. Wu, and J. Chen, “A systematic review on automated clinical depression diagnosis,” npj Mental Health Research, vol. 2, no. 1, pp. 1–17, Nov. 2023, doi: 10.1038/s44184-023-00040-z.

[10] V. Patel et al., “The Lancet Commission on global mental health and sustainable development,” The Lancet, vol. 392, no. 10157, pp. 1553–1598, Oct. 2018, doi: 10.1016/S0140-6736(18)31612-X.

[11] P. Cruz-Gonzalez et al., “Artificial intelligence in mental health care: A systematic review of diagnosis, monitoring, and intervention applications,” Psychol Med, vol. 55, Feb. 2025, doi: 10.1017/S0033291724003295.

[12] L. Albshaier, S. Almarri, and A. Albuali, “Federated Learning for Cloud and Edge Security: A Systematic Review of Challenges and AI Opportunities,” Electronics (Switzerland), vol. 14, no. 5, p. 1019, Mar. 2025, doi: 10.3390/ELECTRONICS14051019.

[13] G. K. Mahato, A. Banerjee, S. K. Chakraborty, and X. Z. Gao, “Privacy preserving verifiable federated learning scheme using blockchain and homomorphic encryption,” Appl Soft Comput, vol. 167, p. 112405, Dec. 2024, doi: 10.1016/J.ASOC.2024.112405.

[14] R. H. Alamir, A. Noor, H. Almukhalfi, R. Almukhlifi, and T. H. Noor, “SecFedDNN: A Secure Federated Deep Learning Framework for Edge–Cloud Environments,” Systems, vol. 13, no. 6, p. 463, Jun. 2025, doi: 10.3390/SYSTEMS13060463.

[15] A. Rahman et al., “Machine Learning-Based Prediction of Mental Well-Being Using Health Behavior Data from University Students,” Bioengineering, vol. 10, no. 5, p. 575, May 2023, doi: 10.3390/BIOENGINEERING10050575.

[16] Y. Kumar, J. Marchena, A. H. Awlla, J. J. Li, and H. B. Abdalla, “The AI-Powered Evolution of Big Data,” Applied Sciences, vol. 14, no. 22, p. 10176, Nov. 2024, doi: 10.3390/APP142210176.

[17] C. Jin, A. Xu, Y. Zhu, and J. Li, “Technology growth in the digital age: Evidence from China,” Technol Forecast Soc Change, vol. 187, p. 122221, Feb. 2023, doi: 10.1016/J.TECHFORE.2022.122221.

[18] K. Guo, D. Diefenbach, A. Gourru, and C. Gravier, “Wikidata as a seed for Web Extraction,” Proceedings of the ACM Web Conference 2023, vol. 1, p. 10, Jan. 2024, doi: 10.1145/3543507.

[19] E. Olshannikova, T. Olsson, J. Huhtamäki, and H. Kärkkäinen, “Conceptualizing Big Social Data,” J Big Data, vol. 4, no. 1, pp. 1–19, Dec. 2017, doi: 10.1186/S40537-017-0063-X.

[20] S. Bazzaz Abkenar, M. Haghi Kashani, E. Mahdipour, and S. M. Jameii, “Big data analytics meets social media: A systematic review of techniques, open issues, and future directions,” Telematics and Informatics, vol. 57, p. 101517, Mar. 2020, doi: 10.1016/J.TELE.2020.101517.

[21] F. R. Mughal et al., “Adaptive federated learning for resource-constrained IoT devices through edge intelligence and multi-edge clustering,” Sci Rep, vol. 14, no. 1, p. 28746, Dec. 2024, doi: 10.1038/S41598-024-78239-Z.

[22] C. Prigent, A. Costan, G. Antoniu, and L. Cudennec, “Enabling federated learning across the computing continuum: Systems, challenges and future directions,” Future Generation Computer Systems, vol. 160, pp. 767–783, Nov. 2024, doi: 10.1016/J.FUTURE.2024.06.043.

[23] A. / Ml, O. Feature, G. Al, and B. I. Tools, “Big Data Architecture for Large Organizations,” May 2025, Accessed: Jun. 24, 2025. [Online]. Available: https://arxiv.org/pdf/2505.04717

[24] Y. Chen, S. Alspaugh, and R. Katz, “Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads,” Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 1802–1813, Aug. 2012, doi: 10.14778/2367502.2367519.

[25] I. Polato, R. Ré, A. Goldman, and F. Kon, “A comprehensive view of Hadoop research—A systematic literature review,” Journal of Network and Computer Applications, vol. 46, pp. 1–25, Nov. 2014, doi: 10.1016/J.JNCA.2014.07.022.

[26] R. A. de Oliveira and M. H. J. Bollen, “Deep learning for power quality,” Electric Power Systems Research, vol. 214, Jan. 2023, doi: 10.1016/j.epsr.2022.108887.

[27] J. Moorthy et al., “Big Data: Prospects and Challenges,” Vikalpa, vol. 40, no. 1, pp. 74–96, Mar. 2015, doi: 10.1177/0256090915575450.

[28] T. P. Raptis, A. Passarella, and M. Conti, “Data Management in Industry 4.0: State of the Art and Open Challenges,” IEEE Access, vol. 7, pp. 97052–97093, May 2019, doi: 10.1109/ACCESS.2019.2929296.

[29] N. Freitas, A. D. Rocha, and J. Barata, “Data management in industry: concepts, systematic review and future directions,” Journal of Intelligent Manufacturing, pp. 1–29, Feb. 2025, doi: 10.1007/S10845-025-02570-Z.

[30] I. Lee, “Big data: Dimensions, evolution, impacts, and challenges,” Bus Horiz, vol. 60, no. 3, pp. 293–303, May 2017, doi: 10.1016/J.BUSHOR.2017.01.004.

[31] A. Li, “AI-Driven Big Data Analytics: Scalable Architectures and Real-Time Processing,” 2025. [Online]. Available: https://pinnaclepubs.com/index.php/EJACI

[32] Y. Kumar, J. Marchena, A. H. Awlla, J. J. Li, and H. B. Abdalla, “The AI-Powered Evolution of Big Data,” Applied Sciences, vol. 14, no. 22, p. 10176, Nov. 2024, doi: 10.3390/APP142210176.

[33] F. von Scherenberg, M. Hellmeier, and B. Otto, “Data Sovereignty in Information Systems,” Electronic Markets, vol. 34, no. 1, pp. 1–11, Dec. 2024, doi: 10.1007/S12525-024-00693-4.

[34] Z. Sun, K. Strang, and R. Li, “Big data with ten big characteristics,” ACM International Conference Proceeding Series, pp. 56–61, Oct. 2018, doi: 10.1145/3291801.3291822.

[35] M. Shahnawaz and M. Kumar, “A Comprehensive Survey on Big Data Analytics: Characteristics, Tools and Techniques,” ACM Comput Surv, vol. 57, no. 8, pp. 1–33, Mar. 2025, doi: 10.1145/3718364.

[36] P. Kostakis and A. Kargas, “Big-Data Management: A Driver for Digital Transformation?,” Information, vol. 12, no. 10, p. 411, Oct. 2021, doi: 10.3390/INFO12100411.

[37] C. W. Tsai, C. F. Lai, H. C. Chao, and A. V. Vasilakos, “Big data analytics: a survey,” J Big Data, vol. 2, no. 1, pp. 1–32, Dec. 2015, doi: 10.1186/S40537-015-0030-3.

[38] G. Bello-Orgaz, J. J. Jung, and D. Camacho, “Social big data: Recent achievements and new challenges,” Inf Fusion, vol. 28, p. 45, Mar. 2015, doi: 10.1016/J.INFFUS.2015.08.005.

[39] S. Shahrivari and S. Jalili, “Beyond Batch Processing: Towards Real-Time and Streaming Big Data,” Computers, vol. 3, no. 4, pp. 117–129, Mar. 2014, doi: 10.3390/computers3040117.

[40] V. Gurusamy, S. Kannan, and K. Nandhini, “The Real Time Big Data Processing Framework Advantages and Limitations,” International Journal of Computer Sciences and Engineering, vol. 5, no. 12, pp. 305–312, Dec. 2017, doi: 10.26438/IJCSE/V5I12.305312.

[41] R. Hai, C. Koutras, C. Quix, and M. Jarke, “Data Lakes: A Survey of Functions and Systems,” IEEE Trans Knowl Data Eng, vol. 35, no. 12, pp. 12571–12590, Feb. 2023, doi: 10.1109/TKDE.2023.3270101.

[42] M. Strohbach, J. Daubert, H. Ravkin, and M. Lischka, “Big data storage,” New Horizons for a Data-Driven Economy: A Roadmap for Usage and Exploitation of Big Data in Europe, pp. 119–141, Jan. 2016, doi: 10.1007/978-3-319-21569-3_7.

[43] A. Rodríguez, J. Valverde, J. Portilla, A. Otero, T. Riesgo, and E. De La Torre, “FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo3 Framework,” Sensors, vol. 18, no. 6, p. 1877, Jun. 2018, doi: 10.3390/S18061877.

[44] R. Raj et al., “Blockchain and Homomorphic Encryption for Data Security and Statistical Privacy,” Electronics, vol. 13, no. 15, p. 3050, Aug. 2024, doi: 10.3390/ELECTRONICS13153050.

[45] S. M. F. Ali and R. Wrembel, “From conceptual design to performance optimization of ETL workflows: current state of research and open problems,” VLDB Journal, vol. 26, no. 6, pp. 777–801, Dec. 2017, doi: 10.1007/S00778-017-0477-2.

[46] M. Janssen, H. van der Voort, and A. Wahyudi, “Factors influencing big data decision-making quality,” J Bus Res, vol. 70, pp. 338–345, Jan. 2017, doi: 10.1016/J.JBUSRES.2016.08.007.

[47] S. Sleep, P. Gala, and D. E. Harrison, “Removing silos to enable data-driven decisions: The importance of marketing and IT knowledge, cooperation, and information quality,” J Bus Res, vol. 156, p. 113471, Feb. 2023, doi: 10.1016/J.JBUSRES.2022.113471.

[48] A. Fawzy, A. Tahir, M. Galster, and P. Liang, “Exploring data management challenges and solutions in agile software development: a literature review and practitioner survey,” Empir Softw Eng, vol. 30, no. 3, pp. 1–61, Jun. 2025, doi: 10.1007/S10664-025-10630-4.

[49] Á. Szukits, “The illusion of data-driven decision making – The mediating effect of digital orientation and controllers’ added value in explaining organizational implications of advanced analytics,” Journal of Management Control, vol. 33, no. 3, pp. 403–446, Jun. 2022, doi: 10.1007/S00187-022-00343-W.

[50] S. Li and F. Brennan, “Digital twin enabled structural integrity management: Critical review and framework development,” Proceedings of the Institution of Mechanical Engineers Part M: Journal of Engineering for the Maritime Environment, vol. 238, no. 4, pp. 707–727, Nov. 2024, doi: 10.1177/14750902241227254.

[51]

M. R. Yan, L. Y. Hong, and K. Warren, “Integrated knowledge visualization and the enterprise digital

twin system for supporting strategic management decision,” Management Decision, vol. 60, no. 4, pp.

1095–1115, Mar. 2022, doi: 10.1108/MD-02-2021-0182/FULL/XML.

[52]

[53]

F. Li and Z. Chen, “Dynamic quantification anti-fraud machine learning model for real-time

transaction fraud detection in banking,” Discover Computing, vol. 28, no. 1, pp. 1–15, Dec. 2025, doi:

10.1007/S10791-025-09549-7/TABLES/5.

E. Costa E Silva, O. Oliveira, and B. Oliveira, “Enhancing Real-Time Analytics: Streaming Data

Quality Metrics for Continuous Monitoring,” ACM International Conference Proceeding Series, pp.

97–101, Dec. 2024, doi: 10.1145/3686592.3686609/ASSETS/HTML/IMAGES/ICOMS2024-

17-FIG2.JPG.

[54]

O. Obioha Val, O. Selesi-Aina, T. M. Kolade, M. O. Gbadebo, O. Olateju, and O. O. Olaniyi, “Real-

Time Data Governance and Compliance in Cloud-Native Robotics Systems,” SSRN Electronic

Journal, 2025, doi: 10.2139/SSRN.5018252.

[55]

S. Baron, “Trust, Explainability and AI,” Philos Technol, vol. 38, no. 1, pp. 1–23, Mar. 2025, doi:

10.1007/S13347-024-00837-6.

[56]

M. Xu and Y. Wang, “Explainability increases trust resilience in intelligent agents,” British Journal of

Psychology, 2024, doi: 10.1111/BJOP.12740.

[57]

B. Eken, S. Pallewatta, N. K. Tran, A. Tosun, and M. A. Babar, “A Multivocal Review of MLOps

Practices, Challenges and Open Issues,” ACM Comput Surv, vol. 1, Jun. 2024, Accessed: Jul. 07, 2025.

[Online]. Available: https://arxiv.org/pdf/2406.09737

[58]

S. Saifullah, D. Mercier, A. Lucieri, A. Dengel, and S. Ahmed, “The privacy-explainability trade-off:

unraveling the impacts of differential privacy and federated learning on attribution methods,” Front

Artif Intell, vol. 7, p. 1236947, Jul. 2024, doi: 10.3389/FRAI.2024.1236947.

[59]

D. Wasif, D. Chen, S. Madabushi, N. Alluru, T. J. Moore, and J.-H. Cho, “Empirical Analysis of

Privacy-Fairness-Accuracy Trade-offs in Federated Learning: A Step Towards Responsible AI,” Mar.

2025, Accessed: Jul. 07, 2025. [Online]. Available: https://arxiv.org/pdf/2503.16233

[60]

N. Gruschka, V. Mavroeidis, K. Vishi, and M. Jensen, “Privacy Issues and Data Protection in Big Data:

A Case Study Analysis under GDPR,” Proceedings - 2018 IEEE International Conference on Big

Data, Big Data 2018, pp. 5027–5033, Nov. 2018, doi: 10.1109/BigData.2018.8622621.

[61]

J. Masinde, F. Mugambi, and D. W. Muthee, “Big data and personal information privacy in developing

countries: insights from Kenya,” Front Big Data, vol. 8, p. 1532362, Apr. 2025, doi:

10.3389/FDATA.2025.1532362.

[62]

J. Rao, E. J. Shekita, and S. Tata, “Using Paxos to build a scalable, consistent, and highly available

datastore,” Proceedings of the VLDB Endowment, vol. 4, no. 4, pp. 243–254, Jan. 2011, doi:

10.14778/1938545.1938549.

[63]

Z. Hussein, M. A. Salama, and S. A. El-Rahman, “Evolution of blockchain consensus algorithms: a

review on the latest milestones of blockchain consensus algorithms,” Cybersecurity, vol. 6, no. 1, pp.

1–22, Dec. 2023, doi: 10.1186/S42400-023-00163-Y.

[64]

D. Shenoy, R. Bhat, and K. Krishna Prakasha, “Exploring privacy mechanisms and metrics in federated

learning,” Artificial Intelligence Review, vol. 58, no. 8, pp. 1–51, May 2025, doi:

10.1007/S10462-025-11170-5.
