Topic outline
-
Once your data is mined and cleaned, it needs to be stored in a useful way for BI teams. As the uses for information have grown, technologies such as collection and analysis software and dashboards have been developed to keep up with demand. This unit explores the methods used to access, use, and secure information. Simply put, a data warehouse (DW) is a warehouse that holds all the relevant data. However, data warehousing requires designers to map data between source and target models and to capture the details of each transformation in a metadata repository. Tools that support these modeling, mapping, and documentation activities are known as DW design tools. What about data integration (DI)? DI is a family of techniques and best practices that repurpose data by transforming it as it is moved. Consider how you study: what information you "extract" from units, how you organize the pages, whether you use diagrams, how you process the information ("transform"), and what concluding thoughts, after weighing everything up, you share with others ("load"). That is your own version of extract, transform, and load (ETL), the most common form of DI in data warehousing.
Completing this unit should take you approximately 10 hours.
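To make the extract, transform, and load idea from the unit introduction concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sample rows, the `FIELD_MAP` source-to-target mapping, and the `sales` table are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw rows, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"cust": " alice ", "amount": "120.50", "date": "2024-01-03"},
    {"cust": "bob", "amount": "80", "date": "2024-01-04"},
    {"cust": "", "amount": "15.00", "date": "2024-01-05"},  # missing customer
]

# Source-to-target mapping, kept as data so it could be stored as metadata.
FIELD_MAP = {"cust": "customer_name", "amount": "sale_amount", "date": "sale_date"}


def extract():
    """Extract: read the raw records from the source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: rename fields per FIELD_MAP, clean values, drop unusable rows."""
    cleaned = []
    for row in rows:
        target = {FIELD_MAP[key]: value for key, value in row.items()}
        target["customer_name"] = target["customer_name"].strip().title()
        target["sale_amount"] = float(target["sale_amount"])
        if target["customer_name"]:  # skip rows with no customer
            cleaned.append(target)
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_name TEXT, sale_amount REAL, sale_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (:customer_name, :sale_amount, :sale_date)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(transform(extract()), conn)
total = conn.execute("SELECT SUM(sale_amount) FROM sales").fetchone()[0]
print(total)
```

Keeping the mapping in a plain dictionary is one simple way to record how source fields become target fields, which is the kind of detail a metadata repository would hold.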
-
One of the earliest recorded needs for data accumulation and storage arose in ancient Sumer, where it served commerce. Just as you will use or are using data to support your firm's decision-making, ancient traders used their notches to determine whether to continue extending credit to Hamid or whether to find additional sources of wheat, as this season's supplies were selling faster than expected. Data has always been crucial to business success, but today the amount available and our understanding of its uses have expanded exponentially.
-
Imagine how befuddled our Sumerian trader would be with the dizzying amount of data we can capture and use today!
-
This article briefly explores the history and development of the purposes and phases of data collection and storage. That history can broadly be characterized by shifts in how and where the data is stored and in the nature of the data (from structured to unstructured to mobile and sensor-based content). What are some uses of data warehousing in your life based on each of the big data phases defined? What do you think the next phase of big data will include?
-
Data sharing is important to furthering scientific discovery, and the COVID-19 pandemic has made this more apparent than ever. While the medical sector is one of the most important areas for data sharing, can you think of other sectors where data sharing is highly significant? Consider, for example, agriculture and the processes by which our food is developed. As you apply the concepts of this article, what are some additional potential concerns for data sharing, and how could they affect future research?
-
A data warehouse is considered a core business intelligence and mining component. What are some differences between data warehousing and data mining, and how do the two intersect/relate to each other?
-
The history of data storage is wide, varied, and extremely complex. It runs from punch cards, which were used to communicate information to equipment long before computers were developed, to Professor Frederic Williams's development of random-access memory (RAM) in 1948. The longest-running era was that of IBM, whose magnetic disk storage development dominated the market from the mid-1950s to approximately 2003. Since then, data warehousing and storage technology has continued to advance in speed, while large mainframes have remained relatively similar in physical size.
-
The various data storage media used over time are shown on this page: vinyl, floppy disks, CDs, USB drives, and memory cards. As you can see, media have become smaller, more pocket-sized, and more secure. How many of these have you used? Can you identify when each was core to data storage, along with its advantages and disadvantages? How long will it be before all personal hardware modes of data storage are defunct?
-
Before the advent of database systems, data was stored in file-based systems. Database systems were developed to overcome the disadvantages of the file-based approach, which included integrity problems, isolation of data, and security issues. How might a small agricultural business use a database system to gain a competitive advantage in the marketplace? Think about the areas in which a database system would be used, and how. For example, how could a homeowner implement a database to finesse their home management?
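As one illustration of the integrity problems a database system addresses, the sketch below uses Python's built-in `sqlite3` module. The farm-themed `fields` and `harvests` tables and the `try_insert` helper are hypothetical, chosen only to show the database itself rejecting inconsistent records, something a plain file cannot do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema for a small farm.
conn.execute("CREATE TABLE fields (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.execute(
    """CREATE TABLE harvests (
           id INTEGER PRIMARY KEY,
           field_id INTEGER NOT NULL REFERENCES fields(id),
           crop TEXT NOT NULL,
           kilograms REAL NOT NULL CHECK (kilograms > 0)
       )"""
)
conn.execute("INSERT INTO fields (id, name) VALUES (1, 'North field')")
conn.execute("INSERT INTO harvests (field_id, crop, kilograms) VALUES (1, 'wheat', 950.0)")
conn.commit()


def try_insert(field_id, crop, kilograms):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO harvests (field_id, crop, kilograms) VALUES (?, ?, ?)",
                (field_id, crop, kilograms),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted_valid = try_insert(1, "barley", 400.0)     # known field, positive weight
accepted_unknown = try_insert(99, "wheat", 100.0)   # field 99 does not exist
accepted_negative = try_insert(1, "wheat", -5.0)    # weight must be positive
print(accepted_valid, accepted_unknown, accepted_negative)
```

In a file-based system, each program that writes harvest records would have to repeat these checks itself; here the rules live in one place, alongside the data.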
-
-
Data modeling defines not just data elements but also the structures they form and the relationships between them. The theories generally relate to the languages used in data storage regarding how a question is posed and its relationship to logic or how you communicate and make sense of the information.
-
The main real-world datasets used in the studies analyzed for this paper were sensor data, image metadata, website publications, and electronic documents. Most of the studies analyzed did not document the specific languages or tools they used to model their data. However, because of the need to analyze large volumes of data that have various structures and arrive at high frequency, database research became more focused on NoSQL than on relational databases. According to the trends captured in this review of research, why might a NoSQL approach be preferred over a relational one for database management?
-
This article explores the various tools and technologies currently being leveraged, such as Hadoop, which is useful for developing applications that can perform absolute statistical analysis on vast quantities of data. It also covers the issues faced when using them: heterogeneity, timeliness, security, incompleteness, and scalability of the data are the biggest obstacles when analyzing big data. What are some additional areas where big data utilization can grow? What needs to improve? What other technologies do you envision being used in collaboration with big data in the future, and in what ways?
-
This article lists the various types of computer information systems and storage and explains how they work. It includes definitions of various types of storage, from hard drives and flash memory (such as USB drives, solid state drives, and memory cards) to optical discs and smart cards. We currently use smart cards more than this article suggests.
-
-
-
Big data management (BDM) is the administration, management, and governance of large volumes of structured or unstructured data. Governance ensures that corporate and governmental rules and policies are followed using policies, processes, and controls. The need for more efficiency and simplicity continues with developers searching for different methods and new tools.
-
Think back to our Sumerian trader. He may have had various clay tablets organized by those who owed him money, those he owed, or by commodity. Figuring out how to organize and store the information that is too much to keep track of in our heads is a long-term human problem. As we develop the ability to create more data, we must find ways to store and retrieve it logically so it is not lost forever. When we typed on a sheet of paper, for instance, the paper did not keep track of what we were typing unless we accidentally placed a second sheet of paper under it or a carbon sheet between two pieces of paper. In this way, we only created the same information twice, not two pieces of information. The typewriter was also not collecting any information. Today, our every keystroke is logged by our word processing systems, and our preferred fonts, phrasing, addresses, and much more are collected every minute. Think about how all that information multiplied by millions of people every day could be stored and easily retrieved.
-
This paper explores what a Database Management System (DBMS) suited to the future may look like based on issues that can be seen today, as well as emerging trends and how this system may be created. An apt example includes a system that allows efficient and continuous querying and mining of data flows that can be employed on media with different computing capacities. What human-to-machine communication and interoperability do you think was most beneficial? Consider how, for example, an individual embedded medical device will be included in DBMS as processes get more complex and storage facilities become more distributed. What are some key aspects of DBMS that could benefit future architectures?
-
This video provides an in-depth explanation with real-world case examples in specific industries of DW strategies. How might you apply these concepts to your industry? What would the pros and cons be?
-
This article highlights the importance and intersection of data mining and data warehousing in the context of big data. The data in a data warehouse helps analysts make informed decisions for an organization. Based on your understanding of a data warehouse and the data mining life cycle, consider a specific issue in an industry you know. How might you apply these steps to address the issue?
-
-
This section will provide an overview of those methods and tools for data warehousing commonly discussed in open-source information. There are thousands of proprietary tools, with more being created as we learn about the warehousing concept. They are developed for specific dataset types and particular industries. Large companies even create data analyst job descriptions expecting recruits to have heard of them. This subunit will help you in those kinds of interviews and enable you to speak confidently about various types so potential employers will know you can pick up their system quickly.
-
Businesses and institutions must collect and store temporal data for accountability and traceability. This paper highlights an approach to dealing with transaction lineage that considers how data can be stored based on timestamp granularities and methods for refreshing data warehouses with time-varying data via batch cycles. Identify three ways transaction lineage can be used and how this is relevant to temporal data. What industries do you think transaction lineage will always be relevant in? How?
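One common way to keep time-varying data traceable across batch refreshes is to store, with every row, the interval during which it was current. The Python sketch below illustrates that general idea only; it is not the method from the paper, and the `apply_batch` and `value_as_of` helpers, the account key, and the dates are all hypothetical. Note that the timestamps here have batch granularity: they record when a batch was loaded, not when each transaction occurred.

```python
from datetime import datetime


def apply_batch(history, batch, loaded_at):
    """Refresh the history with one batch of (key, value) pairs.

    Changed keys get their current row closed and a new row opened,
    so earlier values stay available for audit.
    """
    current = {row["key"]: row for row in history if row["valid_to"] is None}
    for key, value in batch:
        old = current.get(key)
        if old is not None and old["value"] == value:
            continue  # unchanged since the last batch
        if old is not None:
            old["valid_to"] = loaded_at  # close the superseded row
        new_row = {"key": key, "value": value, "valid_from": loaded_at, "valid_to": None}
        history.append(new_row)
        current[key] = new_row


def value_as_of(history, key, moment):
    """Return the value that was current for `key` at `moment`, or None."""
    for row in history:
        started = row["valid_from"] <= moment
        not_ended = row["valid_to"] is None or moment < row["valid_to"]
        if row["key"] == key and started and not_ended:
            return row["value"]
    return None


history = []
apply_batch(history, [("acct-1", 100)], datetime(2024, 1, 1))
apply_batch(history, [("acct-1", 250)], datetime(2024, 1, 2))
apply_batch(history, [("acct-1", 250)], datetime(2024, 1, 3))  # no change, no new row

print(value_as_of(history, "acct-1", datetime(2024, 1, 1, 12)))
print(value_as_of(history, "acct-1", datetime(2024, 1, 5)))
```

Because superseded rows are closed rather than overwritten, a question such as "what balance did we report on January 1?" can still be answered after later batches have run.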
-
This case study provides insight into how a data warehouse was built for a firm in the financial sector using its existing Microsoft technology. It touches on the "static reports" currently used within the company, which are identified as problematic. The case study then walks step by step through how the DW is built. After reading, you should understand the theory and practical application of the DW approach. How would you apply a similar framework to a large department store chain's supply chain?
-
-
-
Many factors affecting the future of data storage have changed in the current year. In a move toward greater security, containers are being used as more microservice architectures are implemented, and addressing the issues they raise, such as operationality, will be a key trend. As cloud infrastructure grows, so does the market for on-premises storage facilities, as more businesses want in-house control.
-
Despite cloud storage's prevalence and security, many businesses still prefer local data storage to the cloud. As companies like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud continue to showcase their ability to handle extremely large data sets securely, how will storage change as data continues to grow? We need more space, especially with heavier consumer and enterprise use of video for surveillance and social media, and because anything generated for economic value depends on collecting and analyzing web-generated data. AI/ML and the Internet of Things (IoT) also grow by accessing more data for training.
-
"Good" hardware is integral for a data warehouse and its software to function efficiently, and the architect of the warehouse must be "hardware aware". As each hardware and software technology advances, so do data warehouses with the advent of, for example, new nonvolatile memory (NVM) and high-speed networks for base support. This article focuses on the need to develop and adopt new management and analysis methods.
-
This paper covers big data and business analytics methods for improved business decision-making, along with technological approaches, applications, and open research challenges. Big data has brought companies in developed countries many positive effects, which those in emerging and developing nations may replicate. However, big data's many challenges include data security, management, characteristics, compliance, and regulation. The paper also contains a neatly organized breakdown of the structure, components, and tools of the Hadoop ecosystem that provide effective and efficient processing.
-
This toolkit was developed with the World Bank to teach and provide tools for entrepreneurs to collect data. Business analysis for tech hubs is difficult because the hubs simultaneously influence and are influenced by their local ecosystems. Areas in which tech hubs may benefit from business analytics include finding focus, sharing success with customers, and fundraising.
Imagine you are setting up a tech hub using the framework provided in this toolkit. Make a plan of how you would effectively collect the data. How would you decide what to measure? What resources will you need to effectively implement, monitor, and report the services your tech hub offers?
-
-
What's next? Currently, cloud computing rules the world and plays some part in nearly every aspect of our lives. Understanding what comes next is paramount as technology advances rapidly. Alternatives that go beyond the cloud are already being developed, especially to address data loss from ransomware, theft, and power outages.
-
This article highlights use cases of ocean observation to explore how cloud computing can be improved to handle increased data flows. As the amount of data ingested increases, the cloud could replace traditional approaches to data warehousing. High-performance mass storage of observational data, coupled with on-demand computing to run model simulations near the data, tools to manage workflows, and a framework to share and collaborate, enables a more flexible and adaptable observation and prediction computing architecture. Apply this structure in your industry regarding how to get data, store data, organize it, and conduct analysis and visualization in the cloud. What are some potential problems for large datasets? Think about how you would overcome those challenges. How would "sandboxes" provide some security when testing a system?
-
Cloud computing provides easy-to-access, high-performance computing, networking, and storage over the internet. Future work should be geared toward data science/AI/ML services that protect user data and make it more secure. What are the three service models of the cloud? How do they differ for the consumer? Identify, define, and provide examples of issues you may have experienced with your social media accounts, along with potential solutions.
-
-
-
This review video is an excellent way to review what you've learned so far and is presented by one of the professors who created the course.
-
Watch this as you work through the unit and prepare to take the final exam.
-
We also recommend that you review this Study Guide before taking the Unit 4 Assessment.
-
-
-
Take this assessment to see how well you understood this unit.
- This assessment does not count towards your grade. It is just for practice!
- You will see the correct answers when you submit your answers. Use this to help you study for the final exam!
- You can take this assessment as many times as you want, whenever you want.
-