BUS611 Study Guide

Unit 7: Data Sharing

7a. Explain the advantages and disadvantages of sharing data

  • How is data managed in a shared environment?
  • What are some of the advantages and disadvantages of data sharing?

Data sharing is the process of making data available to others. This exchange process allows any organization or individual to use data or metadata. Over time, data sharing has become one of the most essential methods to encourage scientific and organizational progress. Today, organizations and institutions encourage a culture of openness, accountability, and secondary analysis of data. Data sharing is important because it promotes the cross-flow of information and builds partnerships between researchers and organizations. Data sharing allows others to further investigate previous research or reveal new insights about an event from previously collected data.
 
Data sharing is also common in commercial applications. Organizations share data for a wide variety of reasons. For example, an organization may outsource its payroll processing and then share data with the payroll processor, typically through an API. The degree of coordination between the organizations will vary. The simplest way of sharing data is through an API, with each organization maintaining its own systems and data structures. In other cases, the organizations may jointly design and operate a database management system to serve several organizations in the same industry.
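
The payroll example above can be sketched in code. This is a minimal illustration, not a description of any real payroll API: the record fields, the SHARED_FIELDS list, and the build_payroll_payload function are all hypothetical names. It shows one organization keeping its own internal data structure while exporting only the fields it has agreed to share.

```python
import json

# Hypothetical internal employee records; the field names are illustrative only.
EMPLOYEES = [
    {"employee_id": "E100", "name": "Ana Ruiz", "hours": 160,
     "hourly_rate": 30.0, "performance_note": "internal only"},
    {"employee_id": "E101", "name": "Ben Cho", "hours": 152,
     "hourly_rate": 28.5, "performance_note": "internal only"},
]

# Fields the two organizations have agreed to exchange (an assumption).
SHARED_FIELDS = ("employee_id", "name", "hours", "hourly_rate")


def build_payroll_payload(records, shared_fields=SHARED_FIELDS):
    """Return a JSON string containing only the agreed fields of each record."""
    shared = [{name: record[name] for name in shared_fields} for record in records]
    return json.dumps({"records": shared}, sort_keys=True)


# The JSON text is what would be sent to the payroll processor's API.
payload = build_payroll_payload(EMPLOYEES)
```

Because the payload is built from an explicit list of agreed fields, internal-only fields such as performance_note never leave the organization, and each side remains free to change its own internal structures.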
 
When managing data in a shared environment, many users will have the ability to make changes to the data over time. Therefore, managing data integrity (the overall accuracy, completeness, and consistency of data) is a significant challenge in a shared-data environment. To keep data integrity from degrading over time, the organization managing the data must implement strong data governance and integrity policies and procedures. The policies and agreements that document how data will be shared between organizations also need regular review. These agreements can become even more complex when cloud service providers are also a part of the shared data environment.
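
As a rough illustration of what an integrity procedure might check, the sketch below tests one record for completeness and consistency and computes a fingerprint that reveals later changes. The field names, the REQUIRED_FIELDS list, and both functions are hypothetical. Real governance policies would define their own rules.

```python
import hashlib
import json

# Hypothetical fields that every shared payroll record is expected to carry.
REQUIRED_FIELDS = ("record_id", "gross_pay", "deductions", "net_pay")


def integrity_problems(record):
    """Return a list of integrity problems found in one record."""
    problems = []
    # Completeness: every required field must be present and not None.
    for name in REQUIRED_FIELDS:
        if record.get(name) is None:
            problems.append(f"missing {name}")
    # Consistency: related fields must agree with each other.
    if not problems:
        expected = round(record["gross_pay"] - record["deductions"], 2)
        if round(record["net_pay"], 2) != expected:
            problems.append("net_pay does not equal gross_pay - deductions")
    return problems


def fingerprint(record):
    """Return a SHA-256 digest of the record so later changes can be detected."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


good = {"record_id": 1, "gross_pay": 5000.0, "deductions": 1200.0, "net_pay": 3800.0}
bad = {"record_id": 2, "gross_pay": 5000.0, "deductions": 1200.0, "net_pay": 4000.0}
incomplete = {"record_id": 3, "gross_pay": 5000.0, "deductions": None, "net_pay": 3800.0}
```

A sharing partner could store the fingerprint when it receives a record and recompute it later. A different digest means the record was changed in the meantime.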
 
Among academic and research users, data sharing means distributing exact information through multiple applications. Distributing and sharing data improves our ability to learn more about a topic. Organizations distribute data by publishing it on different platforms, which makes the data discoverable: other researchers and industries can locate and share it more easily. However, it is essential to discuss the pros and cons of sharing data before distributing it through publication.
 
Challenges that arise from data sharing include the ownership of intellectual property, coordination of security systems to avoid data breaches, and the technical complexities of tight interconnectivity between heterogeneous systems. Sharing can also become challenging when the participating organizations are located in different countries or other legal jurisdictions, and the laws among the various jurisdictions are not harmonized. In this case, it may be necessary to comply with the laws of the most stringent jurisdiction, which may impose unnecessary costs on sharing participants in less stringent jurisdictions.
 
To review, see What is Data Sharing?

 

7b. Identify data reuse, sharing, and access policies from funding agencies, institutions, and publishers

  • Why is data sharing and reuse important in research activities?
  • What are some of the considerations that funding agencies and publishers must consider?

Data sharing has significant advantages for research in government, academic institutions, and industry. First, researchers can further investigate or develop new concepts based on the foundation of previous research when research data is shared. Second, shared data can be more reliable when collected by other researchers. Third, data sharing reduces costs associated with collecting new data. However, data sharing also has disadvantages. Organizations and institutions must work together to agree on policies, guidelines, and data-sharing standards to counter data-sharing disadvantages. This is especially true when data is being shared among industry participants, who may want to keep results proprietary, and academic or government participants who desire to make data and research results freely available through publication to the general population.
 
Advocates have promoted data sharing for decades. The idea of using data collected for other purposes is known as secondary data analysis. For example, pharmaceutical companies might use data that public health authorities collected to detect disease patterns in order to facilitate the development of new treatments. Although this type of data sharing is widely accepted in science, barriers exist when using data from other sources in a more commercial setting. Some issues and barriers include concerns about data manipulation errors, data possessiveness, data documentation, and data management. When data is being used and manipulated by several different entities, procedures must be put in place so that each user of the data can be confident that prior users handled it properly. Also, many organizations view their data as an asset and are often unwilling to share it without receiving significant compensation.
 
International barriers and obstacles to sharing data also exist. These include language differences, legal differences, differences in technology advancement, and differences in data documentation and standards. There can also be cultural and other societal differences around how data is copied, how results are attributed, and how intellectual property law applies, among others.
 
Laws, frameworks, and expectations about the rights of data providers, or those about whom data is being collected, are constantly evolving. These frameworks define certain rights that data subjects possess. Among them is that the customer or other data provider should always have the right to be informed about how information is to be collected and used. For example, the General Data Protection Regulation (GDPR) is a set of rules and legislation created in the European Union (EU). The legislation contains extensive regulations about handling individuals' personal information. These rules apply to any organization that handles the personal data of people in the EU, regardless of where the organization is located. The principles of the GDPR are being adopted worldwide, and every organization should consider how it will implement these principles in its data handling practices. In particular, there is an increasing focus on protecting personally identifiable information (PII), which is any data that could potentially identify a specific individual.
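
One common way to reduce PII exposure before sharing is to drop direct identifiers and replace the record key with a keyed hash (pseudonymization). The sketch below is only an illustration: the field names, the DIRECT_IDENTIFIERS list, and the pseudonymize function are hypothetical. Note that pseudonymized data may still be treated as personal data under rules such as the GDPR.

```python
import hashlib
import hmac

# Hypothetical fields that directly identify a person and must not be shared.
DIRECT_IDENTIFIERS = ("name", "email")


def pseudonymize(record, secret_key, id_field="customer_id"):
    """Return a copy of the record without direct identifiers.

    The original ID is replaced by an HMAC-SHA256 pseudonym, so the same
    person maps to the same pseudonym only for holders of secret_key.
    """
    shared = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != id_field
    }
    message = str(record[id_field]).encode("utf-8")
    shared["pseudonym"] = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return shared


customer = {
    "customer_id": 42,
    "name": "Ana Ruiz",
    "email": "ana@example.com",
    "postal_region": "North",
    "purchases": 7,
}
shared_record = pseudonymize(customer, b"demo-key-kept-by-the-data-owner")
```

The recipient can still link records that belong to the same person through the pseudonym, but cannot recover the name, email, or original ID without the data owner's key.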
 
To review, see Better Data Sharing Rules.

 

7c. Assess issues/obstacles related to reusing and sharing of data, such as different legal, governance, and ethical systems

  • What are some differences in international law that relate to the management of data?
  • Why is security so important in the international sharing of data?

Both local and international issues and barriers are associated with data sharing. Institutions and publishers continually revisit their publishing requirements to address these concerns, treating the review as an ongoing process improvement strategy for incorporating data-sharing agreements between organizations. Governments also regulate data sharing; in Europe, for example, the GDPR applies. Such regulations are often designed to protect data privacy. Data privacy is the branch of data management that deals with handling personal data in compliance with data protection laws, regulations, and general privacy best practices. Notice that while laws and regulations are usually specified very precisely, the notion of best practices can vary.
 
Companies need to better understand where data exists within their infrastructure for everyone, not only people who live in the EU. They need to know where that data lives, who has access to it, how it is processed, who else it might be transmitted to, how to provide it to an individual who requests it, and how to delete it when that individual requests deletion. One of the friendliest ways to do this is by building diagrams. This exercise helps visualize how data flows into an organization, where it ends up, how it is used, who knows it is there, and where it is most vulnerable. This helps organizations accomplish other important things, like designing disaster recovery tactics, incident response plans, and overall resilience. Efforts to build a better understanding of how an organization works and how it is most vulnerable pay for themselves in a crisis, when unplanned events compromise productivity, reputations, and bottom lines.
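
The same inventory that a diagram captures can also be kept as a simple data structure. The sketch below is a hypothetical illustration (the DataStore class, its fields, and the two helper functions are assumptions, not a standard tool). It records where data lives, who can access it, and who it is shared with, and it supports access and deletion requests.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """One place in the organization where personal data is kept."""
    name: str
    location: str          # where the data lives
    accessible_by: list    # who has access to it
    shared_with: list      # who else it is transmitted to
    records: dict = field(default_factory=dict)  # subject ID -> stored data


def find_subject(stores, subject_id):
    """Return everything held about one individual, keyed by store name."""
    return {s.name: s.records[subject_id] for s in stores if subject_id in s.records}


def erase_subject(stores, subject_id):
    """Delete one individual's data everywhere; return the stores affected."""
    erased = []
    for store in stores:
        if subject_id in store.records:
            del store.records[subject_id]
            erased.append(store.name)
    return erased


stores = [
    DataStore("crm", "EU data center", ["sales"], ["email vendor"],
              {"u1": {"email": "ana@example.com"}, "u2": {"email": "ben@example.com"}}),
    DataStore("billing", "US cloud region", ["finance"], ["payroll processor"],
              {"u1": {"plan": "premium"}}),
]
```

Calling find_subject(stores, "u1") answers an access request, and erase_subject(stores, "u1") answers a deletion request. Both depend on the inventory being complete, which is the point of the mapping exercise.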
 
Notice that database security is of even greater importance in a shared environment than in a closed environment. The diversity of systems, networks, users, and administrators presents additional vulnerabilities. Thus, the security mechanisms to prevent misuse must be even more robust. If the data is transmitted internationally, the number of interception points increases significantly. For this reason, special care must be taken – especially if the data is being transmitted on openly accessible telecommunications systems in countries that are not known for superior performance in securing data. There have even been examples of state espionage, where state actors and security services engage in espionage to steal data and gain a competitive advantage for themselves or companies that are closely aligned with the national government. Security techniques like encryption (converting information or data into a code, especially to prevent unauthorized access) of transmitted data can reduce but not eliminate this threat.
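
To make the idea of encryption concrete, the toy sketch below uses a one-time pad: each byte of the message is combined (XOR) with a byte of a random secret key of the same length, and applying the same key again recovers the message. This is an illustration only, and the function names are hypothetical. Real systems rely on vetted protocols and audited libraries rather than hand-written code like this.

```python
import secrets


def generate_key(length):
    """Return a random secret key of the given length in bytes."""
    return secrets.token_bytes(length)


def xor_bytes(data, key):
    """Encrypt or decrypt by XOR-ing each data byte with the matching key byte."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = "Quarterly payroll file".encode("utf-8")
key = generate_key(len(message))        # must be shared securely and used only once
ciphertext = xor_bytes(message, key)    # what an eavesdropper would see in transit
recovered = xor_bytes(ciphertext, key)  # only a key holder can do this
```

The sketch also shows why encryption reduces but does not eliminate the threat: anyone who obtains the key can read the data, so the protection is only as strong as the handling of the key.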
 
To review, see Equitable Design.

 

7d. Explain how open access, open science, and open data can lead to process improvements and higher levels of cooperation

  • How does open data lead to improved processes and technologies?
  • What legal structures are in place to balance the needs of developers and users of new technology?

Open access to data and open scientific inquiry can lead to several significant benefits. Indeed, the pace of scientific advancement can be increased if researchers are more willing to collaborate. In addition, if the research results, such as new technologies, are made available to others, the whole economy and society can benefit. Having access to other innovators' data and research results can also often take innovations in new and unanticipated directions. For example, the developer of a new compression algorithm may not have considered how such a technology might improve the process of video distribution in the entertainment industry.
 
The challenge can be constructing legal systems that motivate people to cooperate and share but that do not inhibit the motivation to conduct research in the first place. While many researchers, especially in academic institutions, may be motivated to create new knowledge for the sake of knowledge alone, many are doing so with an expectation that they will retain the financial benefits of new processes, technology, and so forth. This is especially true in corporations, where process and technology improvements are significant ways to increase competitive advantage and profitability.
 
Most nations have created carefully designed legal systems, such as patents, to balance these competing motivations. A patent is a formal grant of rights to an inventor by a government entity. In the United States, for example, the original creator of a new technology or process can make that innovation available to the larger society while retaining a significant financial interest through the patent and trademark system. This typically takes the form of a license, where the inventor of the new technology agrees to share the technology with others in exchange for a fee. Users of the technology pay this fee to access and use it, and thereby avoid the infringement lawsuit they could face if they used it without a license. To balance the needs of society as a whole, this ownership and licensing structure has a time limit. After the time limit has expired, the technology enters the public domain and may be used by anyone.

To review, see Interoperability and Data Sharing.

 

Unit 7 Vocabulary

This vocabulary list includes the terms that you will need to know to successfully complete the final exam.

  • data integrity
  • data privacy
  • data sharing
  • encryption
  • General Data Protection Regulation (GDPR)
  • license
  • personally identifiable information (PII)
  • patent
  • state espionage