Data Lineage

Prescriptive data lineage

The concept of prescriptive data lineage combines the logical model (entity) of how data should flow with the actual lineage observed for a given instance.

Data lineage and provenance typically refer to the steps by which a dataset came to its current state, as well as all of its copies and derivatives. However, reconstructing lineage forensically, by looking back only at audit or log correlations, is flawed for certain data management cases. For instance, without the logical model it is impossible to determine with certainty whether the route a data workflow took was correct or in compliance.

Only by combining a logical model with atomic forensic events can proper activities be validated:

  1. Authorized copies, joins, or CTAS operations
  2. Mapping of processing to the systems that those processes run on
  3. Ad hoc versus established processing sequences
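
The three checks above can be illustrated with a minimal sketch. It assumes the logical model is available as an ordered list of prescribed steps and that the forensic events have already been extracted from audit or log data. The names `Step`, `Event`, and `validate`, along with the dataset and system names, are hypothetical and serve only as illustration; they are not drawn from any particular lineage tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One prescribed edge in the logical model (hypothetical structure)."""
    source: str
    target: str
    operation: str  # e.g. "copy", "join", "ctas"
    system: str     # system the process is expected to run on


@dataclass(frozen=True)
class Event:
    """One atomic forensic event taken from audit or log data."""
    source: str
    target: str
    operation: str
    system: str     # system the process actually ran on


def validate(prescribed, events):
    """Return (event, reason) pairs for every observed deviation."""
    # Position of each prescribed step, keyed by what it moves and how.
    index = {(s.source, s.target, s.operation): i
             for i, s in enumerate(prescribed)}
    deviations = []
    last = -1  # highest prescribed position matched so far
    for event in events:
        i = index.get((event.source, event.target, event.operation))
        if i is None:
            # Check 1: the copy, join, or CTAS is not in the model.
            deviations.append((event, "unauthorized operation"))
            continue
        if prescribed[i].system != event.system:
            # Check 2: the step ran on a system other than the mapped one.
            deviations.append((event, "unexpected system"))
        if i < last:
            # Check 3: the step ran after a later step had already run.
            deviations.append((event, "out of prescribed sequence"))
        last = max(last, i)
    return deviations


# Illustrative data only; all names are made up.
prescribed = [
    Step("raw.orders", "staging.orders", "copy", "etl-01"),
    Step("staging.orders", "mart.orders", "ctas", "warehouse-01"),
]
events = [
    Event("raw.orders", "staging.orders", "copy", "etl-01"),
    Event("staging.orders", "scratch.orders_tmp", "copy", "laptop-17"),
    Event("staging.orders", "mart.orders", "ctas", "warehouse-02"),
]
deviations = validate(prescribed, events)
reasons = [reason for _, reason in deviations]
```

With this sample data, the second event is flagged as an unauthorized copy and the third as having run on an unexpected system. The logs alone would show three successful operations; the deviations become visible only because the events are compared against the prescribed model.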

Many certified compliance reports require provenance of the data flow as well as the end-state data for a specific instance. In these situations, any deviation from the prescribed path needs to be accounted for and potentially remediated. This marks a shift in thinking from a purely look-back model to a framework that is better suited to capturing compliance workflows.