Big Data Opportunities and Challenges: Opportunities, Open Issues, and Challenges | Saylor Academy

Opportunities, Open Issues, and Challenges

According to McKinsey, the effective use of Big Data benefits and transforms economies and ushers in a new wave of productive growth. Capitalizing on the valuable knowledge that can be extracted from Big Data has become a basic competitive strategy for current enterprises. New competitors must be able to attract employees who possess critical skills in handling Big Data. By harnessing Big Data, businesses gain many advantages, including increased operational efficiency, informed strategic direction, improved customer service, new products, and new customers and markets.

With Big Data, users face not only numerous attractive opportunities but also significant challenges. These difficulties lie in data capture, storage, searching, sharing, analysis, and visualization. They must be overcome to make full use of Big Data, because the amount of information already surpasses our capacity to harness it. For several decades, computer architecture has been CPU-heavy but I/O-poor, and this system imbalance limits the exploration of Big Data. CPU performance doubles roughly every 18 months, a trend commonly attributed to Moore's Law, and the performance of disk drives doubles at the same rate. However, the rotational speed of disks has improved only slightly over the last decade. As a result of this imbalance, random I/O speeds have improved only moderately, whereas sequential I/O speeds have increased gradually with density.
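To give a sense of scale for the imbalance described above, the following minimal Python sketch computes how much a quantity grows if it doubles every 18 months. The function name `growth_factor` and the ten-year horizon are illustrative assumptions, not figures from the source.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Return the multiplicative growth after `months`, assuming steady doubling.

    The 18-month default follows the doubling period quoted in the text;
    real hardware does not follow such a clean curve.
    """
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    decade = growth_factor(120)  # 120 months = 10 years (assumed horizon)
    print(f"Growth over a decade at an 18-month doubling period: {decade:.1f}x")
```

Under this idealized assumption, a decade of doubling yields roughly a hundredfold improvement, which is why a component that improves only slightly over the same period becomes the bottleneck.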

At the same time, information is increasing at an exponential rate, whereas information processing methods are improving relatively slowly. Currently, only a limited number of tools can fully address the issues in Big Data analysis. The state-of-the-art techniques and technologies used in many important Big Data applications (e.g., Hadoop, HBase, and Cassandra) cannot ideally solve the real problems of storage, searching, sharing, visualization, and real-time analysis. Moreover, Hadoop and MapReduce lack query processing strategies and offer only low-level infrastructure for data processing and management. SAS, R, and MATLAB are unsuitable for large-scale data analysis. GraphLab provides a framework for computing graph-based machine learning algorithms; however, it does not manage data effectively. Therefore, proper tools to adequately exploit Big Data are still lacking.
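The remark about low-level infrastructure can be illustrated with a small sketch. The code below is a plain-Python imitation of the map, shuffle, and reduce phases, not the Hadoop API; all function names are hypothetical. It shows that even a simple count per key, which a declarative query language would express in one statement, must be spelled out by hand in this programming model.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in sequence on an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["big data", "Big tools"]))
```

A real framework distributes these phases across machines and handles failures; the sketch only mirrors the shape of the code a developer has to write.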

Challenges in Big Data analysis include data inconsistency and incompleteness, scalability, timeliness, and security. Prior to data analysis, data must be well constructed. However, considering the variety of datasets in Big Data, the efficient representation, access, and analysis of unstructured or semistructured data remain challenging. Understanding how data can be preprocessed is important for improving data quality and analysis results. Datasets are often very large, at several gigabytes or more, and they originate from heterogeneous sources. Hence, current real-world databases are highly susceptible to inconsistent, incomplete, and noisy data. Therefore, numerous data preprocessing techniques, including data cleaning, integration, transformation, and reduction, should be applied to remove noise and correct inconsistencies. Each subprocess faces a different challenge with respect to data-driven applications. Future research must also address the remaining issues related to confidentiality. These issues include encrypting large amounts of data, reducing the computational cost of encryption algorithms, and applying different encryption algorithms to heterogeneous data.
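The four preprocessing techniques named above (cleaning, integration, transformation, and reduction) can be sketched on a toy dataset. This is a minimal illustration using only the Python standard library; the record fields (`id`, `age`, `city`), the helper names, and the choice of mean imputation, min-max scaling, and duplicate removal are assumptions made for the example, not methods prescribed by the source.

```python
from statistics import mean


def integrate(primary, secondary):
    """Integration: enrich primary records with secondary fields sharing the same id."""
    by_id = {r["id"]: r for r in secondary}
    return [{**by_id.get(r["id"], {}), **r} for r in primary]


def clean(records, field):
    """Cleaning: drop records without an id, fill missing values with the mean."""
    kept = [r for r in records if r.get("id") is not None]
    known = [r[field] for r in kept if r.get(field) is not None]
    fill = mean(known) if known else 0
    return [{**r, field: r[field] if r.get(field) is not None else fill} for r in kept]


def reduce_duplicates(records):
    """Reduction: keep only the first record seen for each id."""
    seen, unique = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)
    return unique


def transform(records, field):
    """Transformation: min-max scale a numeric field into the range [0, 1]."""
    values = [r[field] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - low) / span} for r in records]


def preprocess(primary, secondary, field="age"):
    """Chain the four steps; the order shown is one reasonable choice, not the only one."""
    merged = integrate(primary, secondary)
    return transform(reduce_duplicates(clean(merged, field)), field)


if __name__ == "__main__":
    source_a = [{"id": 1, "age": 20}, {"id": 2, "age": None}, {"id": 2, "age": None},
                {"id": None, "age": 50}, {"id": 3, "age": 40}]
    source_b = [{"id": 1, "city": "Oslo"}, {"id": 3, "city": "Lima"}]
    for row in preprocess(source_a, source_b):
        print(row)
```

At real Big Data scale each step would run on a distributed engine rather than in memory, but the division of labor among the steps stays the same.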

Privacy is a major concern in outsourced data. Recently, some controversies have revealed how some security agencies use data generated by individuals for their own benefit without permission. Therefore, policies that cover all user privacy concerns should be developed. Furthermore, rule violators should be identified, and user data should not be misused or leaked.

Cloud platforms contain large amounts of data. However, customers cannot physically access the data because of data outsourcing. Thus, data integrity is jeopardized. The major challenge in integrity is that previously developed hashing schemes are no longer applicable to such large amounts of data. Integrity checking is also difficult because of the lack of support for remote data access and the lack of information regarding internal storage. The following questions must also be answered. How can integrity assessment be conducted realistically? How can large amounts of data be processed under integrity rules and algorithms? How can online integrity be verified without exposing the structure of internal storage?
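One commonly discussed direction for the integrity problem above is to hash data block by block, so that a single block can be checked without rehashing the whole dataset. The sketch below illustrates that idea with Python's standard `hashlib` module. The block size, function names, and in-memory data are hypothetical, and the sketch does not answer the open questions raised in the text (for example, it assumes the client keeps a trusted copy of the digests).

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose; real systems use far larger blocks


def block_digests(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and return one SHA-256 hex digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def root_digest(digests):
    """Summarize all block digests in one value the data owner can store locally."""
    return hashlib.sha256("".join(digests).encode("ascii")).hexdigest()


def verify_block(block: bytes, index: int, trusted_digests) -> bool:
    """Check one retrieved block against the trusted digest recorded for its position."""
    return hashlib.sha256(block).hexdigest() == trusted_digests[index]


if __name__ == "__main__":
    original = b"abcdefghij"
    trusted = block_digests(original)           # computed before outsourcing
    print(verify_block(b"efgh", 1, trusted))    # untouched block
    print(verify_block(b"efgX", 1, trusted))    # tampered block
```

Checking a sampled block costs one hash over that block rather than one hash over the entire dataset, which is the property that makes block-level schemes attractive for large outsourced data.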

Big Data has developed to the point that it cannot be harnessed by individuals working alone. Big Data is characterized by large systems, profits, and challenges. Thus, additional research is needed to address these issues and improve the efficient display, analysis, and storage of Big Data. Such research basically requires capital investment, human resources, and innovative ideas.
