Introduction

In a competitive business environment, successful businesses are data driven. The choice of a data warehouse architecture is founded on business needs. Business executives want to make strategic as well as tactical decisions with accurate information at the right time. The accuracy of information depends on detailed data as well as time-varying data. A data warehouse with time-varying data is instrumental in strategic decision making. The business requirements for temporal data go beyond what is typical of a conventional database implementation.

Customer transactions keep changing over time as customer behavior patterns change. Temporal data is time-varying data: each version of a record is relevant to some moment in time. The temporal aspects normally consist of valid time and transaction time. Valid time defines the period during which a particular tuple is true in the modeled reality, while transaction time defines the period during which that tuple is recorded in the database.
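The distinction between valid time and transaction time can be illustrated with a minimal sketch. The record layout, the field names (`valid_from`, `valid_to`, `tx_from`, `tx_to`), the `FOREVER` sentinel, and the sample prices below are all hypothetical and chosen only for illustration; half-open intervals are an assumption, not something the text prescribes.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date(9999, 12, 31)  # assumed sentinel for an open-ended period


@dataclass(frozen=True)
class PriceVersion:
    """Hypothetical bitemporal row; intervals are half-open [from, to)."""
    product: str
    price: float
    valid_from: date  # valid time: when the fact becomes true in reality
    valid_to: date
    tx_from: date     # transaction time: when the database recorded the row
    tx_to: date       # transaction time: when the row was superseded


rows = [
    # Recorded on 5 Jan: price 10.0, believed valid from 1 Jan onward.
    PriceVersion("widget", 10.0, date(2024, 1, 1), FOREVER,
                 date(2024, 1, 5), date(2024, 2, 10)),
    # Recorded on 10 Feb: the price actually changed to 12.0 on 1 Feb.
    PriceVersion("widget", 10.0, date(2024, 1, 1), date(2024, 2, 1),
                 date(2024, 2, 10), FOREVER),
    PriceVersion("widget", 12.0, date(2024, 2, 1), FOREVER,
                 date(2024, 2, 10), FOREVER),
]


def price_as_known(rows, valid_on, known_on):
    """Return the price true on `valid_on`, as the database knew it on `known_on`."""
    for r in rows:
        if r.valid_from <= valid_on < r.valid_to and r.tx_from <= known_on < r.tx_to:
            return r.price
    return None
```

Asking for the price valid on 15 February returns a different answer depending on whether the question is posed against the database as it stood on 5 February or on 20 February, which is exactly what the two time dimensions are meant to capture.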

A temporal data warehouse differs significantly from an operational database in many respects. Operational source systems are usually non-temporal and maintain only the current state of data, as opposed to a complete history of data with transaction lineage. Data warehouses, by contrast, are maintained to hold large volumes of historical data.

Data management and warehousing are considered the foundation of business intelligence (BI) and analytics. Over the last decade, data warehousing has achieved prominence. Scattered databases and data marts are being consolidated into more useful data warehouses. The advent of new information technologies and techniques such as temporal data warehousing presents unique opportunities for firms to enhance their customer agility. This also speaks to the maturity of data warehousing technologies. Temporal data warehousing has gained prominence among different stakeholders, including suppliers, business users, and researchers, because of user popularity and management patronage.

"A temporal data warehouse is a repository of historical information, originating from multiple, autonomous, (sometimes) heterogeneous and non-temporal sources. It is available for queries and analysis (such as data mining) not only to users interested in current information but also to those interested in researching past information to identify relevant trends".

W.H. Inmon defines a temporal data warehouse as "a collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is relevant to some moment in time. The data warehouse contains atomic data and lightly summarized data". In this definition, being relevant to some moment in time (that is, time-varying) means that different values of the same record can be kept as the record changes over time.

Temporal data warehouses provide a history of serialized changes to data, identified by the times when the changes occurred. This allows for querying the current state as well as past states of a record. Conventional databases provide users with only the current state of data, which is true as of a single point in time. Users of a data warehouse are interested not only in the current state of data but also in the transaction lineage, that is, how a particular record has evolved over time. A record inserted into a temporal data warehouse is never physically deleted. Instead, a new record or a new version of an existing record is always added to reflect the transaction lineage for that data. Thus an evolving history of data is maintained in the temporal data warehouse.

Temporal data has important applications in many domains. Applications in domains such as banking, retail sales, financial services, medical records, inventory management, telecommunications, and reservation systems can benefit from a temporal data warehouse. In the case of a bank account, an account holder's balance will change after each transaction. The amounts or descriptions in a financial document may change for business reasons. Such data is often valuable to different stakeholders and should be stored in both its current state and all previously current states.

Although there are clear benefits of, and demand for, temporal database management systems (DBMS), only a few are commercially available. Most current commercial databases are non-temporal and hence do not provide a special temporal query language, temporal data definition language, or temporal data manipulation language.

In the absence of a temporal DBMS, we argue that an effort should be made to take advantage of current commercial databases and to handle multiple versions of data, including past, current, and future states, through application code. Current commercial relational databases with a high-level language such as SQL are mature enough to manage complex data transformations and also offer performance improvement measures, such as efficient indexing algorithms. Improvements in disk storage technology and the declining cost of data storage have also made it possible to store and manage temporal data efficiently with all transaction lineages.

A temporal database can be implemented by extending a non-temporal data model into a temporal data model and building temporal support into applications. Two timestamp fields need to be added to each table of the conventional data model. The new columns, 'row effective timestamp' and 'row expired timestamp', hold date and time values that identify each individual row as past, current, or future.
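As a rough illustration of the two-timestamp approach, the sketch below uses Python's built-in `sqlite3` module. The table `account_balance`, the column names `row_eff_ts` and `row_exp_ts`, the high-date sentinel `HIGH_TS`, and the helper functions are assumptions made for this example and are not taken from any particular product or from the implementation discussed in this article. Timestamps are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3

HIGH_TS = "9999-12-31 23:59:59"  # assumed sentinel meaning "not yet expired"


def create_schema(conn):
    # A conventional table extended with the two timestamp columns.
    conn.execute("""
        CREATE TABLE account_balance (
            account_id INTEGER NOT NULL,
            balance    REAL    NOT NULL,
            row_eff_ts TEXT    NOT NULL,  -- row effective timestamp
            row_exp_ts TEXT    NOT NULL,  -- row expired timestamp
            PRIMARY KEY (account_id, row_eff_ts)
        )""")


def apply_change(conn, account_id, balance, change_ts):
    """Expire the open version, then add the new one; no row is deleted."""
    with conn:  # both statements commit or roll back together
        conn.execute(
            "UPDATE account_balance SET row_exp_ts = ? "
            "WHERE account_id = ? AND row_exp_ts = ?",
            (change_ts, account_id, HIGH_TS))
        conn.execute(
            "INSERT INTO account_balance VALUES (?, ?, ?, ?)",
            (account_id, balance, change_ts, HIGH_TS))


def balance_as_of(conn, account_id, as_of_ts):
    """Return the balance whose effective period contains as_of_ts, if any."""
    row = conn.execute(
        "SELECT balance FROM account_balance "
        "WHERE account_id = ? AND row_eff_ts <= ? AND ? < row_exp_ts",
        (account_id, as_of_ts, as_of_ts)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_schema(conn)
apply_change(conn, 1, 100.0, "2024-01-01 09:00:00")
apply_change(conn, 1, 250.0, "2024-02-01 09:00:00")
```

With these two changes loaded, `balance_as_of(conn, 1, "2024-01-15 00:00:00")` reads the earlier version while a later timestamp reads the current one, and both versions remain physically present in the table.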

Data warehouses are refreshed at certain time intervals with data from different operational databases. To keep a data warehouse running efficiently and to maintain consistent data in it, it is important that data arrive in the warehouse in a timely fashion and be loaded via batch cycle runs. Since a data warehouse consists of thousands of tables in multiple subject areas, the table refreshes must be done in order of dependencies via batch cycles. Batch refreshes have proven to be an efficient method of loading from the standpoint of performance and data consistency. Another aspect of storing data in data warehouses is that data is initially captured in staging subject areas, with a one-to-one relationship between operational source tables and data warehouse staging area tables. Analytical subject areas are then refreshed from the staging area tables. An analytical subject area refresh requires collecting data from more than one subject area or from more than one table in a particular staging subject area.
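Refreshing tables in order of dependencies amounts to a topological ordering of the table dependency graph. The following is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the table names and the dependency map are hypothetical, and a real batch scheduler would also handle parallelism, retries, and failures.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each table maps to the set of tables
# that must be refreshed before it.
dependencies = {
    "stg_customer": set(),                        # staging, loaded from source
    "stg_orders": set(),                          # staging, loaded from source
    "customer_dim": {"stg_customer"},             # analytical, from staging
    "sales_fact": {"stg_orders", "customer_dim"}, # analytical, from several tables
}


def refresh_order(deps):
    """Return one valid refresh order in which prerequisites come first."""
    return list(TopologicalSorter(deps).static_order())


order = refresh_order(dependencies)
```

Any order returned places staging tables ahead of the analytical tables that read from them. If the dependency map contains a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful early check on a batch cycle definition.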

The purpose of this article is to discuss implementation issues such as temporal data update methodologies, consistent viewing of data, coexistence of load and query against the same table, performance improvement of load and report queries, and maintenance of views. The intended result is a temporal data warehouse that can be loaded with new data while various reporting applications concurrently query it and receive results consistent with their selected time slice.