Opportunities and Challenges for Data, Models, Computation, and Workflows

This article highlights use cases of ocean observation to explore how cloud computing can be improved to handle increased data flows. As the amount of data ingested increases, the cloud could replace traditional approaches to data warehousing. High-performance mass storage of observational data, coupled with on-demand computing to run model simulations near the data, tools to manage workflows, and a framework to share and collaborate, enables a more flexible and adaptable observation and prediction computing architecture. As you read, consider how this structure could apply in your industry: how would you get data, store and organize it, and conduct analysis and visualization in the cloud? What problems might large datasets pose, and how would you overcome them? How would "sandboxes" provide some security when testing a system?

Introduction

Advances in ocean observations and models mean increasing flows of data. Integrating observations between disciplines over spatial scales from regional to global presents challenges. Running ocean models and managing the results is computationally demanding. The rise of cloud computing presents an opportunity to rethink traditional approaches. This includes developing shared data processing workflows utilizing common, adaptable software to handle data ingest and storage, and an associated framework to manage and execute downstream modeling. Working in the cloud presents challenges: migration of legacy technologies and processes, cloud-to-cloud interoperability, and the translation of legislative and bureaucratic requirements for "on-premises" systems to the cloud. To respond to the scientific and societal needs of a fit-for-purpose ocean observing system, and to maximize the benefits of more integrated observing, research on utilizing cloud infrastructures for sharing data and models is underway. Cloud platforms and the services/APIs they provide offer new ways for scientists to observe and predict the ocean’s state. High-performance mass storage of observational data, coupled with on-demand computing to run model simulations in close proximity to the data, tools to manage workflows, and a framework to share and collaborate, enables a more flexible and adaptable observation and prediction computing architecture. Model outputs are stored in the cloud and researchers either download subsets for their area of interest or feed them into their own simulations without leaving the cloud. Expanded storage and computing capabilities make it easier to create, analyze, and distribute products derived from long-term datasets. In this paper, we provide an introduction to cloud computing, describe current uses of the cloud for management and analysis of observational data and model results, and describe workflows for running models and streaming observational data.
We discuss topics that must be considered when moving to the cloud: costs, security, and organizational limitations on cloud use. Future uses of the cloud via computational sandboxes and the practicalities and considerations of using the cloud to archive data are explored. We also consider the ways in which the human elements of ocean observations are changing – the rise of a generation of researchers whose observations are likely to be made remotely rather than hands-on – and how their expectations and needs drive research towards the cloud. In conclusion, visions of a future where cloud computing is ubiquitous are discussed.
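
The idea of downloading only a subset of stored model output for an area of interest can be illustrated with a minimal sketch. It is not taken from the paper: the names `BoundingBox` and `subset_grid` are hypothetical, the grid is small synthetic data held in memory, and a real workflow would likely read from cloud object storage with a dedicated array library rather than plain Python lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical area of interest, in degrees (inclusive bounds)."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


def subset_grid(lats, lons, values, box):
    """Return the lats, lons, and values that fall inside the bounding box.

    `values` is a list of rows, one row per latitude, one column per longitude.
    """
    # Indices of the grid rows and columns that lie within the box.
    row_idx = [i for i, lat in enumerate(lats) if box.lat_min <= lat <= box.lat_max]
    col_idx = [j for j, lon in enumerate(lons) if box.lon_min <= lon <= box.lon_max]

    # Keep only the selected rows and columns of the gridded field.
    sub_values = [[values[i][j] for j in col_idx] for i in row_idx]
    return [lats[i] for i in row_idx], [lons[j] for j in col_idx], sub_values


# Synthetic "model output": a 4 x 5 grid whose values encode row and column.
lats = [40.0, 41.0, 42.0, 43.0]
lons = [-72.0, -71.0, -70.0, -69.0, -68.0]
values = [[10 * i + j for j in range(len(lons))] for i in range(len(lats))]

# Assumed area of interest, chosen only for illustration.
box = BoundingBox(lat_min=41.0, lat_max=42.0, lon_min=-71.0, lon_max=-69.0)
sub_lats, sub_lons, sub_values = subset_grid(lats, lons, values, box)
print(sub_lats, sub_lons, sub_values)
```

Only the selected rows and columns are returned, so the amount of data a researcher transfers scales with the area of interest rather than with the full model domain.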


Source: Tiffany C. Vance, Micah Wengren, Eugene Burger, Debra Hernandez, Timothy Kearns, Encarni Medina-Lopez, Nazila Merati, Kevin O’Brien, Jon O’Neil, James T. Potemra, Richard P. Signell, and Kyle Wilcox, https://www.frontiersin.org/articles/10.3389/fmars.2019.00211/full
This work is licensed under a Creative Commons Attribution 4.0 License.