• Unit 6: Data on the Internet

    The growth of the internet has led to an increase in e-business and e-commerce. An e-business is any organization that conducts business over the internet. E-commerce is the buying and selling of goods and services, and the transfer of funds, over an electronic network, primarily the internet. Both e-business and e-commerce may occur as business-to-business (B2B), business-to-consumer (B2C), consumer-to-consumer (C2C), or consumer-to-business (C2B). The internet gives us instant access to millions of IP addresses and digitally connects us to numerous networks with the click of a key or touch of a screen. These advancements have increased the use of the internet for business, where data is easily collected and used for growth.

    Websites collect and store vast amounts of data on each consumer, and organizations determine what is relevant and irrelevant to that consumer. So far, you have learned that database management systems (DBMS) are built from complex data structures. To improve the user experience and ease user interaction on the internet, developers hide irrelevant internal details from the user; this process of delivering only necessary information while concealing background details is called data abstraction. The abstracted data is used to conduct marketing and drive growth in B2B and B2C sales.

    This unit covers data on the internet and its growth, which has been immense in recent years. The unit also reviews data integration and information retrieval, such as structured queries over the web.

    Completing this unit should take you approximately 3 hours.

    • 6.1: Web Data

      Websites are probably the primary source of "big data". Organizations across every industry use technology to collect, store, and integrate consumer data sourced from websites in database management systems (DBMS). A relational database management system (RDBMS) is a DBMS designed especially for relational data, which makes it easy to store web data in a structured format using rows and columns.

      Web analytics provide organizations with data about consumers or site visitors, which is used to optimize content based on user interest. Think back to what you learned about SQL in Unit 5 and why that query language matters. Since the internet is the most common source of big data, it is important that you continue to develop your SQL skills so you can leverage web data.
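      The rows-and-columns idea above can be sketched with Python's built-in sqlite3 module, a small relational engine. The `page_views` table and its columns are illustrative assumptions, not part of any real analytics system:

```python
# Minimal sketch: storing web analytics data in relational (rows-and-columns)
# form and querying it with SQL, using Python's built-in sqlite3 module.
# Table and column names are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE page_views (
        visitor_id TEXT,
        page       TEXT,
        seconds    INTEGER
    )
""")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [("v1", "/home", 12), ("v1", "/products", 45), ("v2", "/home", 8)],
)

# A typical analytics question: which pages hold visitors' attention longest?
rows = conn.execute("""
    SELECT page, AVG(seconds) AS avg_time
    FROM page_views
    GROUP BY page
    ORDER BY avg_time DESC
""").fetchall()
print(rows)  # → [('/products', 45.0), ('/home', 10.0)]
```

      The same `GROUP BY`/`AVG` pattern works against any RDBMS; sqlite3 is used here only because it ships with Python.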

      • 6.1.1: Approaches to Web Data Abstraction

        Advancements in technology have sparked growth in e-business and e-commerce. As a result, web data has become the primary source of "big data". Next, you will learn different data abstraction methodologies for organizational web data.

        There are various approaches to extracting web data. Web data extraction is also known as web harvesting, web scraping, and screen scraping, and it is commonly done through an application programming interface (API).

        An API is a software intermediary that allows two or more applications to communicate. Remember how APIs are used to send requests for data over the internet? Let's learn a few more approaches to web data abstraction.
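        Most web APIs reply with JSON, which the requesting application then parses. The sketch below skips the network call (a real client would fetch the payload with, say, `urllib.request`) and uses a hard-coded response body as a stand-in; the field names are assumptions for illustration:

```python
# Minimal sketch of consuming a web API response. The JSON payload below is
# a made-up stand-in for what a real API might return over HTTP.
import json

response_body = '{"user": "v1", "pages_visited": ["/home", "/products"]}'
data = json.loads(response_body)   # parse the JSON text into a Python dict

print(data["user"])                # → v1
print(len(data["pages_visited"]))  # → 2
```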

      • 6.1.2: Applications of Data Abstraction

        You learned that abstraction is the process of selecting relevant data from databases. Once you model data using abstraction, the same data can be reused across different applications. For example, Java implements abstraction using abstract classes and interfaces.
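        The text names Java's abstract classes and interfaces; the same idea can be sketched in Python with the standard `abc` module. The `Report` classes below are hypothetical, invented only to show the pattern:

```python
# Illustrative analogue of abstraction via abstract classes: callers depend
# on the abstract interface, not on internal details. Class names are made up.
from abc import ABC, abstractmethod

class Report(ABC):
    """Abstract interface: users see generate(), not how data is gathered."""
    @abstractmethod
    def generate(self) -> str: ...

class SalesReport(Report):
    def generate(self) -> str:
        # Internal details (queries, joins, caching) stay hidden in here.
        return "sales summary"

report: Report = SalesReport()
print(report.generate())  # → sales summary
```

        Code written against `Report` keeps working when a different concrete class is swapped in, which is exactly how abstraction lets one data model serve many applications.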

      • 6.1.3: Web Crawling

        A web crawler is an automated script or program that browses the internet in a methodical, automated way. Web crawlers are also known as web spiders or web robots. Many internet services, especially search engines, use crawling to provide up-to-date data.

        Web crawlers copy web pages so that a search engine can process them later, which allows users to find those pages quickly when searching. Sometimes, web crawlers are used to extract other forms of information or data from websites.
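        A core step of crawling is extracting the links on a fetched page so the crawler knows what to visit next. This toy sketch uses Python's standard `html.parser`; the page is a hard-coded stand-in for HTML that a real crawler would download (and it would also respect robots.txt and rate limits):

```python
# Toy sketch of one crawling step: collecting the href links from a page.
# The HTML string below stands in for a page a real crawler would fetch.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the destination of every <a href="..."> anchor we see.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = '<html><body><a href="/about">About</a> <a href="/shop">Shop</a></body></html>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # → ['/about', '/shop']
```

        A real crawler would add each collected link to a queue of pages to fetch, repeating until the queue is empty or a depth limit is reached.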

      • 6.1.4: Legal Issues

        You learned about using web crawling to browse the internet and extract data from other sites, a practice encouraged by governments passing open data laws. However, there are a few legal concerns associated with web crawling.

        Web crawling is legal when you do it for your own purposes, which falls under the fair use doctrine. However, problems start if you want to use scraped data for other reasons, particularly commercial purposes. On September 9, 2019, the U.S. Ninth Circuit Court of Appeals ruled that web scraping public websites does not violate the Computer Fraud and Abuse Act (CFAA). Even so, some website owners consider scraping theft because they believe this information is "their own".
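        Whatever the legal landscape, a practical first step toward responsible crawling is honoring a site's robots.txt rules. Python's standard `urllib.robotparser` can evaluate them; the rules and URLs below are invented for illustration (normally `set_url()` and `read()` would fetch the site's real robots.txt):

```python
# Sketch: checking made-up robots.txt rules before crawling a URL.
# A real crawler would fetch the site's actual robots.txt instead.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("MyCrawler", "https://example.com/products"))      # → True
print(rp.can_fetch("MyCrawler", "https://example.com/private/data"))  # → False
```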

        During this unit, you learned about web data collected via e-business and e-commerce operations. Because of advancements in technology, websites account for the majority of "big data". Now you understand the approaches, methodologies, and applications associated with data abstraction. Remember, web crawling collects data from websites that may then be shared with outside parties. Because laws and regulations are ever-changing, you should stay informed about the country and state open data laws that govern web crawling. Unit 7 covers data sharing between users and organizations in more detail.

    • Study Guide: Unit 6

      We recommend reviewing this Study Guide before taking the Unit 6 Assessment.

    • Unit 6 Assessment

      • Receive a grade