"Data lineage includes the data origin, what happens to it and where it moves over time (essentially the full journey of a piece of data)". This page explains the concept of data lineage and how it can be used to trace errors back to their root cause in a data processing pipeline. Data lineage is one way of debugging Big Data pipelines, but the process is not simple: it faces many challenges, such as scalability, fault tolerance, anomaly detection, and more. For each of the challenges listed, write your own definition.
Associations
An association is a combination of the inputs, the outputs, and the operation itself. The operation is treated as a black box, also known as the actor. Associations describe the transformations that are applied to the data, and they are stored in association tables, with one association table for each unique actor. An association itself has the form {i, T, o}, where i is the set of inputs to the actor T and o is the set of outputs produced by the actor. Associations are the basic units of data lineage: individual associations are later combined to reconstruct the entire history of transformations that were applied to the data.
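To make the {i, T, o} idea concrete, here is a minimal Python sketch, assuming data items can be identified by simple names such as file names and that each actor is identified by a string. The names `Association`, `LineageStore`, `record`, `producers_of`, and `trace` are hypothetical and chosen for illustration only. The store keeps one association table per actor, and `trace` combines associations by walking backwards from an output to the items no actor produced, which is one possible way of chaining associations into a history, not the only one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One {i, T, o} triple: inputs i consumed and outputs o produced by actor T."""
    inputs: frozenset
    actor: str
    outputs: frozenset


class LineageStore:
    """Hypothetical lineage store with one association table per unique actor."""

    def __init__(self):
        # actor name -> list of associations recorded for that actor
        self.tables = defaultdict(list)

    def record(self, actor, inputs, outputs):
        # The actor stays a black box: only its inputs and outputs are stored.
        assoc = Association(frozenset(inputs), actor, frozenset(outputs))
        self.tables[actor].append(assoc)

    def producers_of(self, item):
        # Linear scan over every table; a real system would index by output.
        return [a for table in self.tables.values() for a in table if item in a.outputs]

    def trace(self, item):
        """Chain associations backwards from `item`.

        Returns the associations visited (nearest first) and the set of
        source items, i.e. items that no recorded actor produced.
        """
        steps, visited, sources = [], set(), set()
        frontier, seen = [item], set()
        while frontier:
            current = frontier.pop()
            if current in seen:
                continue
            seen.add(current)
            producers = self.producers_of(current)
            if not producers:
                sources.add(current)  # nothing produced it, so it is an origin
                continue
            for assoc in producers:
                if assoc not in visited:
                    visited.add(assoc)
                    steps.append(assoc)
                    frontier.extend(assoc.inputs)
        return steps, sources


# Usage: two hypothetical actors, "clean" and "join", form a small pipeline.
store = LineageStore()
store.record("clean", {"raw.csv"}, {"clean.csv"})
store.record("join", {"clean.csv", "users.csv"}, {"report.csv"})

steps, sources = store.trace("report.csv")
print([a.actor for a in steps])  # ['join', 'clean']
print(sorted(sources))           # ['raw.csv', 'users.csv']
```

If `report.csv` turns out to be wrong, the traced chain points at the `join` and `clean` actors and at the two source files as the candidates to inspect, which is the kind of root-cause tracing described at the top of this page.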