Businesses and institutions must collect and store temporal data for accountability and traceability. This paper highlights an approach to dealing with transaction lineage that considers how data can be stored based on timestamp granularities, together with methods for refreshing data warehouses with time-varying data via batch cycles. Identify three ways transaction lineage can be used and explain how this is relevant to temporal data. In which industries do you think transaction lineage will always be relevant, and how?
Process of Loading Derived Tables
In data warehouses, derived tables are often created for analytical purposes. These tables are loaded by pulling data from multiple tables and performing various aggregations, summations, and other computations. The response time of standard repetitive queries can be improved significantly if the results of complex reporting queries are stored in a simple table with keyed access. These table structures are created in such a way that they fulfill the reporting needs of different business applications, and the report tools point to these tables via simple (SELECT *) views. When loading the derived tables by pulling data from primary source tables and dimension or header tables, row-effective timestamp ranges need to be generated for the dimension/header/secondary source tables to establish the relationship between the row effective date and the timestamp on primary/fact table rows.
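The pattern above can be sketched with a small example. The table names, columns, and data below are illustrative assumptions, not the paper's actual schema; the SQL is run through SQLite only to make the sketch self-contained.

```python
import sqlite3

# Hypothetical fact table aggregated into a derived summary table
# that reporting tools reach through a simple SELECT * view.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE sales_fact (
    doc_nbr   INTEGER,
    region_cd TEXT,
    sale_amt  REAL
);
INSERT INTO sales_fact VALUES
    (10001, 'EAST', 100.0),
    (10002, 'EAST', 250.0),
    (10003, 'WEST',  75.0);

-- Derived table: pre-aggregated totals with keyed access by region,
-- so the repetitive reporting query no longer scans the fact table.
CREATE TABLE sales_summary AS
SELECT region_cd, COUNT(*) AS doc_cnt, SUM(sale_amt) AS total_amt
FROM sales_fact
GROUP BY region_cd;

-- Report tools point at a simple (SELECT *) view over the derived table.
CREATE VIEW v_sales_summary AS SELECT * FROM sales_summary;
""")
rows = cur.execute(
    "SELECT * FROM v_sales_summary ORDER BY region_cd").fetchall()
print(rows)
# → [('EAST', 2, 350.0), ('WEST', 1, 75.0)]
```

The view adds no logic of its own; it simply decouples the reporting tools from the physical derived table so the table can be rebuilt or renamed without touching the reports.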
Figure 2: Table showing data after Initial Full Refresh.
Figure 2 is an example of loading staging and analytical subject area tables. Initial full refreshes in staging subject areas are performed using database utilities such as Fast-Load and Multi-Load. The downstream analytical subject areas are then refreshed with data from the staging tables. Since both source (staging) and target (analytical) tables reside in the same data warehouse, database-specific software such as stored procedures and macros can be used to update the analytical subject areas.
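A minimal sketch of the staging-to-analytical flow, assuming a simple two-table schema (the names `stg_orders` and `anl_orders` are invented for illustration). A full refresh reloads the target wholesale; an incremental refresh upserts only the delta from the latest batch cycle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE stg_orders (doc_nbr INTEGER PRIMARY KEY, amt REAL);
CREATE TABLE anl_orders (doc_nbr INTEGER PRIMARY KEY, amt REAL);
INSERT INTO stg_orders VALUES (10001, 100.0), (10002, 250.0);
""")

def full_refresh(cur):
    # Full refresh: empty the analytical table, then reload from staging.
    cur.executescript("""
        DELETE FROM anl_orders;
        INSERT INTO anl_orders SELECT * FROM stg_orders;
    """)

full_refresh(cur)
n_full = cur.execute("SELECT COUNT(*) FROM anl_orders").fetchone()[0]

# Incremental refresh: a new batch cycle lands one new row in staging;
# upsert only that delta into the analytical table.
cur.execute("INSERT INTO stg_orders VALUES (10003, 75.0)")
cur.execute("""
    INSERT OR REPLACE INTO anl_orders
    SELECT * FROM stg_orders WHERE doc_nbr = 10003
""")
n_incr = cur.execute("SELECT COUNT(*) FROM anl_orders").fetchone()[0]
print(n_full, n_incr)  # → 2 3
```

In a production warehouse both steps would typically live inside stored procedures or macros, as the paper notes, since source and target sit in the same database.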
SQL in the form of stored procedures can be used to load data into the derived tables. When performing a full or incremental refresh, joins between the primary source table and secondary/dimension tables, or between line and header tables, must be based on the primary key plus the row_eff_ts column. Because both the primary and the secondary/dimension tables hold transaction lineage, it is important to establish a one-to-one relationship for the transaction lineage data using the row_eff_ts columns. In this case the secondary/dimension table row_eff_ts must be less than or equal to the row_eff_ts column of the primary source table. Note that the secondary/dimension table data must arrive from the source at the same time as, or during a refresh prior to, the primary source table rows in order to have a matching row effective timestamp; otherwise data will be filtered out when loading the target table. This can happen when the primary source table and the secondary/dimension tables are joined with an inner join (instead of a left outer join).
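The join rule above can be illustrated as follows. The schema and data are hypothetical, but the predicate follows the paper's description: for each fact row, pick the latest dimension row whose row_eff_ts is less than or equal to the fact row's row_eff_ts, and use a left outer join so that fact rows whose dimension data has not yet arrived are kept rather than filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
-- Primary (fact) and dimension tables both carry transaction lineage
-- via a row-effective timestamp (row_eff_ts).
CREATE TABLE fact_tbl (doc_nbr INTEGER, amt REAL, row_eff_ts TEXT);
CREATE TABLE dim_tbl  (doc_nbr INTEGER, status TEXT, row_eff_ts TEXT);
INSERT INTO fact_tbl VALUES
    (10001, 100.0, '2010-01-05 10:00:00'),
    (10002, 250.0, '2010-01-06 09:00:00');
INSERT INTO dim_tbl VALUES
    (10001, 'OPEN',   '2010-01-05 10:00:00'),
    (10001, 'CLOSED', '2010-01-07 08:00:00');
    -- doc_nbr 10002 has no dimension row yet (arrives in a later refresh)
""")
rows = cur.execute("""
SELECT f.doc_nbr, f.amt, d.status
FROM fact_tbl f
LEFT OUTER JOIN dim_tbl d
  ON  d.doc_nbr = f.doc_nbr
  AND d.row_eff_ts = (SELECT MAX(d2.row_eff_ts)    -- latest dim lineage row
                      FROM dim_tbl d2               -- not newer than the
                      WHERE d2.doc_nbr = f.doc_nbr  -- fact row's row_eff_ts
                        AND d2.row_eff_ts <= f.row_eff_ts)
ORDER BY f.doc_nbr
""").fetchall()
print(rows)
# → [(10001, 100.0, 'OPEN'), (10002, 250.0, None)]
```

With an inner join instead of the left outer join, the row for doc_nbr 10002 would disappear from the result, which is exactly the silent filtering the paragraph warns about.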
Figure 3: Table showing data after Incremental Refresh followed by Full Refresh.
Figure 3 is an example of a table update with incremental data (Rahman 2010). In Analytical Table-A, for doc_nbr 10002, we can see that every time changed data arrives from the source, the most recent row for a key is expired with a new date (row_expr_dt) and time (row_expr_tm). The most current row for a particular key (e.g., fin_doc = 10002) displays a row_expr_dt of '9999-12-31'.
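The expire-and-insert mechanics behind Figure 3 can be sketched as below. Column names follow the paper (row_eff_ts, row_expr_dt, row_expr_tm); the table name and the helper `apply_change` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE anl_tbl (
    doc_nbr     INTEGER,
    amt         REAL,
    row_eff_ts  TEXT,
    row_expr_dt TEXT,   -- '9999-12-31' marks the current row for a key
    row_expr_tm TEXT
);
INSERT INTO anl_tbl VALUES
    (10002, 250.0, '2010-01-06 09:00:00', '9999-12-31', '00:00:00');
""")

def apply_change(cur, doc_nbr, amt, eff_ts):
    # Expire the current row for this key with the new row's effective
    # date/time, then insert the new version as the open-ended current row.
    eff_dt, eff_tm = eff_ts.split(' ')
    cur.execute("""
        UPDATE anl_tbl
        SET row_expr_dt = ?, row_expr_tm = ?
        WHERE doc_nbr = ? AND row_expr_dt = '9999-12-31'
    """, (eff_dt, eff_tm, doc_nbr))
    cur.execute(
        "INSERT INTO anl_tbl VALUES (?, ?, ?, '9999-12-31', '00:00:00')",
        (doc_nbr, amt, eff_ts))

apply_change(cur, 10002, 300.0, '2010-01-08 11:30:00')
current = cur.execute("""
    SELECT amt FROM anl_tbl
    WHERE doc_nbr = 10002 AND row_expr_dt = '9999-12-31'
""").fetchall()
print(current)
# → [(300.0,)]
```

After the change is applied, the old row survives with a closed expiry date, preserving the transaction lineage, while exactly one row per key carries the '9999-12-31' sentinel.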
The SQL in Figure 4 shows how the join relating to row_eff_ts should appear:
Figure 4: Temporal Relation between primary source and dimension tables.
The 'and' clause highlighted in Figure 4 is a less-than-or-equal-to predicate that establishes the one-to-one relationship for the transaction lineage data of the two tables. The join preserves this one-to-one relationship even when the lower and upper bounds of the timestamp intervals are not the same in the joining tables. Because dimension data changes slowly, the dimension table may have different lower and upper bounds of timestamps for its current record than the primary source or fact table.
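Since Figure 4's SQL is not reproduced here, the following is only a sketch of what such a temporal range join may look like, with invented table and column names. Each dimension row carries an effective interval (row_eff_ts to row_expr_ts), and the fact row's timestamp must fall inside it; the less-than-or-equal-to predicate on row_eff_ts is the one the paragraph highlights.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
-- Dimension rows carry an effective interval; the current row is
-- open-ended ('9999-12-31...'). Bounds differ from the fact table's
-- timestamps because dimension data changes slowly.
CREATE TABLE dim_tbl (cust_id INTEGER, segment TEXT,
                      row_eff_ts TEXT, row_expr_ts TEXT);
CREATE TABLE fact_tbl (doc_nbr INTEGER, cust_id INTEGER, row_eff_ts TEXT);
INSERT INTO dim_tbl VALUES
    (1, 'BRONZE', '2010-01-01 00:00:00', '2010-01-06 23:59:59'),
    (1, 'GOLD',   '2010-01-07 00:00:00', '9999-12-31 23:59:59');
INSERT INTO fact_tbl VALUES
    (10001, 1, '2010-01-05 10:00:00'),
    (10002, 1, '2010-01-08 09:00:00');
""")
rows = cur.execute("""
SELECT f.doc_nbr, d.segment
FROM fact_tbl f
INNER JOIN dim_tbl d
  ON  d.cust_id = f.cust_id
  AND d.row_eff_ts <= f.row_eff_ts      -- the highlighted <= predicate
  AND f.row_eff_ts <= d.row_expr_ts     -- fact ts inside the dim interval
ORDER BY f.doc_nbr
""").fetchall()
print(rows)
# → [(10001, 'BRONZE'), (10002, 'GOLD')]
```

Each fact row matches exactly one dimension version, even though the two tables' interval bounds never coincide, which is the one-to-one property the predicate is meant to guarantee.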