• Unit 4: Data Warehousing and Integration

    Once your data is mined and cleaned, it needs to be stored in a way that is useful to BI teams. As the uses for information have grown, technologies such as collection and analysis software and dashboards have been developed to keep up with demand. This unit explores the methods used to access, use, and secure information. Simply put, a data warehouse (DW) is a central store that holds all of an organization's relevant data. Building one, however, requires designers to map data between source and target models and to capture the details of each transformation in a metadata repository. Tools that support these modeling, mapping, and documentation activities are known as DW design tools. What about data integration (DI)? DI is a family of techniques and best practices that repurpose data by transforming it as it is moved. Consider how you study this course: you decide what information to "extract" from each unit; you "transform" it by organizing your notes, perhaps adding diagrams, and processing what you have read; and, after weighing it all up, you "load" your concluding thoughts by sharing them with others. This extract, transform, load (ETL) pattern is the most common form of DI in data warehousing.
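
    The "extract", "transform", and "load" steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the sample records, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records standing in for rows pulled from a source system.
SOURCE_ROWS = [
    {"customer": " Hamid ", "commodity": "wheat", "amount": "120.50"},
    {"customer": "Leila", "commodity": "Barley", "amount": "80"},
    {"customer": "", "commodity": "wheat", "amount": "15"},  # no customer name
]


def extract():
    """Extract: read the raw rows from the (illustrative) source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: trim names, lowercase commodities, cast amounts to numbers,
    and drop rows that have no customer name."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip()
        if not customer:
            continue
        cleaned.append((customer, row["commodity"].lower(), float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, commodity TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


# Run the three steps in order against an in-memory database.
conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)

# Once loaded, the data can be queried for analysis.
totals = conn.execute(
    "SELECT commodity, SUM(amount) FROM sales GROUP BY commodity ORDER BY commodity"
).fetchall()
print(totals)
```

    Keeping each step in its own function mirrors the way ETL tools separate the stages, so that a transformation rule can change without touching how the data is read or written.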

    Completing this unit should take you approximately 10 hours.

    • 4.1: The History of Data Storage

      The earliest known need to accumulate and store data arose in ancient Sumer, where it served commerce. Just as you use, or will use, data to support your firm's decision-making, ancient traders used the notches they recorded to decide whether to keep extending credit to Hamid, or whether to find additional sources of wheat because this season's supplies were selling faster than expected. Data has always been crucial to business success, but today the amount available and our understanding of its uses have expanded exponentially.

      • 4.1.1: Early Days

        The history of data storage is wide, varied, and extremely complex. It runs from punch cards, which were used to feed information to equipment long before computers were developed, to Professor Frederic Williams's work on random-access memory (RAM) in 1948. The longest-serving era was that of IBM, whose magnetic disk storage development dominated the market from the mid-1950s to approximately 2003. Since then, data storage and warehousing technology has become much faster, while large mainframes have remained relatively similar in size.

      • 4.1.2: The Evolution of Data Storage

        Data modeling defines not just data elements but also the structures they form and the relationships between them. The underlying theories generally concern the languages used in data storage: how a question is posed and how it relates to logic, or, put another way, how you communicate with and make sense of the information.
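
        To make the idea of elements, structures, and relationships concrete, here is a small sketch in Python. Every name in it (`Customer`, `Order`, `Ledger`, and their fields) is a hypothetical choice made for illustration; real data models are usually expressed in a database schema or a modeling tool rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    # Data elements describing one customer.
    customer_id: int
    name: str


@dataclass
class Order:
    # Data elements describing one order.
    order_id: int
    customer_id: int  # relationship: each order refers to exactly one customer
    commodity: str
    quantity: int


@dataclass
class Ledger:
    # Structure: the collections that hold the elements together.
    customers: dict = field(default_factory=dict)
    orders: list = field(default_factory=list)

    def add_customer(self, customer):
        self.customers[customer.customer_id] = customer

    def add_order(self, order):
        # Enforce the relationship: an order must point at a known customer.
        if order.customer_id not in self.customers:
            raise ValueError(f"unknown customer_id {order.customer_id}")
        self.orders.append(order)

    def orders_for(self, customer_id):
        # Pose a question of the model: which orders belong to this customer?
        return [o for o in self.orders if o.customer_id == customer_id]


ledger = Ledger()
ledger.add_customer(Customer(1, "Hamid"))
ledger.add_order(Order(100, 1, "wheat", 20))
ledger.add_order(Order(101, 1, "barley", 5))
print([o.commodity for o in ledger.orders_for(1)])
```

        The check in `add_order` plays the role that a foreign-key constraint plays in a relational database: it keeps the relationship between orders and customers consistent.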

    • 4.2: Big Data Administration

      Big data management (BDM) is the administration, management, and governance of large volumes of structured or unstructured data. Governance uses processes and controls to ensure that corporate policies and governmental rules are followed. Because the need for greater efficiency and simplicity persists, developers keep searching for different methods and new tools.

      • 4.2.1: How Data Warehousing Works

        Think back to our Sumerian trader. He may have organized his clay tablets by who owed him money, by whom he owed, or by commodity. Figuring out how to organize and store more information than we can keep track of in our heads is an age-old human problem. As we develop the ability to create more data, we must find ways to store and retrieve it logically so it is not lost forever. When we typed on a sheet of paper, for instance, the paper kept no record of what we typed unless we accidentally placed a second sheet under it or put a carbon sheet between two pieces of paper. Even then, we only created the same information twice, not two pieces of information. The typewriter itself collected no information either. Today, our word processing systems log every keystroke, and our preferred fonts, phrasing, addresses, and much more are collected every minute. Think about how all that information, multiplied by millions of people every day, could be stored and easily retrieved.

      • 4.2.2: Common Methods and Tools

        This section provides an overview of the data warehousing methods and tools most commonly discussed in publicly available sources. There are thousands of proprietary tools, and more are created as we learn about the warehousing concept. They are developed for specific dataset types and particular industries. Large companies even write data analyst job descriptions that expect recruits to have heard of them. This subunit will help you in those kinds of interviews and enable you to speak confidently about the various types, so potential employers will know you can pick up their system quickly.

    • 4.3: Trends in Data Storage and Integration

      Several factors have recently changed the outlook for data storage. In a move toward greater security, containers are increasingly used as more microservice architectures are implemented, and addressing the issues they raise, such as operability, will be a key trend. As cloud infrastructure grows, so does the market for on-premises storage facilities, as more businesses want in-house control.

      • 4.3.1: Local Data vs. Cloud Storage

        Despite cloud storage's prevalence and greater security, most businesses still prefer to use local data storage instead of the cloud. As providers like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud continue to showcase their ability to handle extremely large data sets securely, how will storage change as data continues to grow? We need more space, especially with heavier consumer and enterprise use of video for surveillance and social media, and because any organization that generates economic value from data must collect and analyze web-generated data. For AI/ML and the Internet of Things (IoT), access to more data for training is how these systems grow.

      • 4.3.2: Beyond the Cloud

        What's next? Currently, cloud computing rules the world and plays some role in nearly every aspect of our lives. As technology advances exponentially, understanding what comes next is paramount. Alternatives that go beyond the cloud are already under development, especially to address data loss from ransomware, theft, and power outages.

    • Study Guide: Unit 4

      We recommend reviewing this Study Guide before taking the Unit 4 Assessment.

    • Unit 4 Assessment

      • Receive a grade