"Good" hardware is integral for a data warehouse and its software to function efficiently, and the architect of the warehouse must be "hardware aware". As each hardware and software technology advances, so do data warehouses with the advent of, for example, new nonvolatile memory (NVM) and high-speed networks for base support. This article focuses on the need to develop and adopt new management and analysis methods.
The Trend of Hardware
In recent years, storage, processor, and network technologies have made great breakthroughs. As shown in Fig. 1, a growing set of new hardware, architectures, and features is becoming the foundation of future computing platforms. Current trends indicate that these technologies, including high-performance processors and hardware accelerators, NVM, and RDMA-capable (remote direct memory access) networks, are significantly changing the underlying environment of traditional data management and analysis systems. Notably, the emerging underlying environments, marked by heterogeneous multi-core architectures and hybrid storage hierarchies, make the already complicated software design space even more sophisticated.
Fig. 1
New hardware and environment
The Trend of Processor Technologies
The development of processor technology has spanned more than 40 years. Its roadmap has shifted from scale-up to scale-out, and the aim has moved away from pursuing higher clock speeds toward putting more cores on each processor. In the era of serial computing, following Moore's law, continuously raising the processor's clock frequency was one of the most important ways to improve computer performance. At the same time, many optimization techniques, such as instruction-level parallelism (ILP), pipelining, prefetching, branch prediction, out-of-order execution, multi-level caches, and hyper-threading, could be identified and exploited automatically by the processor and the compiler. Software therefore enjoyed free and regular performance gains consistently and transparently. However, limited by heat, power consumption, the limits of instruction-level parallelism, manufacturing processes, and other factors, the scale-up approach has reached its ceiling.
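As a small, hedged illustration of how transparently these mechanisms act, the following C++ micro-benchmark (sizes and names are purely illustrative) compares a sequential scan, which caches, out-of-order execution, and the hardware prefetcher accelerate without any code changes, with a pointer-chasing traversal of the same data that defeats them.

```cpp
// Illustrative micro-benchmark: a sequential scan benefits transparently from
// caches, ILP, and hardware prefetching; a pointer-chasing traversal does not.
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t n = 1 << 24;                     // ~16M elements (128 MB)
    std::vector<std::uint64_t> data(n, 1);

    // Sequential scan: contiguous accesses are recognized by the prefetcher.
    auto t0 = std::chrono::steady_clock::now();
    std::uint64_t seq_sum =
        std::accumulate(data.begin(), data.end(), std::uint64_t{0});
    auto t1 = std::chrono::steady_clock::now();

    // Pointer chasing: follow a single random cycle over the same elements
    // (Sattolo's algorithm), so nearly every access is an unpredictable miss.
    std::vector<std::uint32_t> next(n);
    std::iota(next.begin(), next.end(), 0U);
    std::mt19937 rng(42);
    for (std::uint32_t i = static_cast<std::uint32_t>(n) - 1; i > 0; --i) {
        std::uniform_int_distribution<std::uint32_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    std::uint64_t chase_sum = 0;
    std::uint32_t pos = 0;
    auto t2 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) { chase_sum += data[pos]; pos = next[pos]; }
    auto t3 = std::chrono::steady_clock::now();

    auto ms = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    std::printf("sequential scan: %lld ms (sum=%llu)\n",
                static_cast<long long>(ms(t0, t1)),
                static_cast<unsigned long long>(seq_sum));
    std::printf("pointer chasing: %lld ms (sum=%llu)\n",
                static_cast<long long>(ms(t2, t3)),
                static_cast<unsigned long long>(chase_sum));
}
```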
After 2005, high-performance processor technology entered the multi-core era, and multi-core parallel processing became mainstream. Although data processing capability is significantly enhanced on multi-core architectures, software does not gain the benefits automatically. Instead, programmers have to transform traditional serial programs into parallel programs and optimize algorithm performance for the last-level cache (LLC) of multi-core processors. Nowadays, the performance of multi-core processors has improved significantly along with semiconductor technology. For example, a 14-nm Xeon processor currently integrates up to 24 cores, supporting up to 3.07 TB of memory and 85 GB/s of memory bandwidth. However, x86 processors still suffer from comparatively low integration, high power consumption, and high prices, and general-purpose multi-core processors can hardly meet the demands of highly concurrent applications. Processor development is therefore turning toward application-specific optimization, i.e., specialized hardware accelerators.
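A minimal sketch of this transformation, assuming a simple column-sum workload (all names and sizes are illustrative), might look as follows in C++: each thread scans one contiguous chunk, and the per-thread results are padded to a full cache line to avoid false sharing in the shared LLC.

```cpp
// Illustrative serial-to-parallel rewrite of an aggregation over a large array.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

struct alignas(64) PaddedSum {           // one cache line per thread's result
    std::uint64_t value = 0;
};

std::uint64_t parallel_sum(const std::vector<std::uint64_t>& data,
                           unsigned num_threads) {
    std::vector<PaddedSum> partial(num_threads);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(begin + chunk, data.size());
            std::uint64_t local = 0;       // accumulate locally, write back once
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t].value = local;
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (const auto& p : partial) total += p.value;
    return total;
}

int main() {
    std::vector<std::uint64_t> data(1 << 24, 1);       // synthetic column of "1"s
    unsigned hw = std::max(1u, std::thread::hardware_concurrency());
    std::printf("sum = %llu using %u threads\n",
                static_cast<unsigned long long>(parallel_sum(data, hw)), hw);
}
```

On a multi-core machine such a rewrite typically scales well until memory bandwidth, rather than core count, becomes the limiting factor.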
GPUs, Xeon Phi coprocessors, field programmable gate arrays (FPGAs), and the like are representative dedicated hardware accelerators. By exploiting them, parts of compute-intensive and data-intensive workloads can be offloaded from the CPU efficiently (a minimal offload sketch follows Table 1). Some fundamental hardware characteristics of these accelerators are given in Table 1. There is no doubt that the processing environment within a computer system is becoming more and more complicated; correspondingly, data management and analysis systems have to seek diversified ways to adapt actively to the new situation.
Table 1 Processor characteristics
Type | Xeon E7-8890 V4 | Xeon Phi 7290F | Xeon Phi 724P | NVIDIA Tesla V100 |
---|---|---|---|---|
#core/#thread | 24/48 | 72/288 | 68/272 | 5120 CUDA cores/640 tensor cores |
Core frequency | 2.20 GHz | 1.50 GHz | 1.3 GHz | 1.455 GHz |
Memory capacity | 3.07 TB | 384 GB | 16 GB | 16 GB HBM2 VRAM |
Cache capacity | 60 MB L3 | 36 MB L2/16 GB HBM | 34 MB L2 | 6 MB L2 |
Memory type | DDR4-1866/4 channels | DDR4-2400/6 channels | MCDRAM/16 channels | HBM2 |
Memory bandwidth (GB/s) | 85 | 115.2 | 500 | 900 |
Price | $7174.00 | $3368.00 | $3324.00 | $149,000.00 |
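As a hedged sketch of such offloading, the following CUDA C++ fragment (illustrative names and sizes; compiled with nvcc, error handling omitted) pushes a simple filter-and-count over a synthetic integer column from the CPU to a GPU such as those in Table 1.

```cpp
// Illustrative CUDA C++ sketch: offload "how many rows have value > threshold"
// from the CPU to the GPU. Error handling is omitted for brevity.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void count_matches(const int* values, int n, int threshold,
                              unsigned long long* result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && values[i] > threshold) {
        atomicAdd(result, 1ULL);          // each matching row increments the counter
    }
}

int main() {
    const int n = 1 << 24;
    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i % 100;      // synthetic column data

    int* d_values = nullptr;
    unsigned long long* d_count = nullptr;
    cudaMalloc(&d_values, n * sizeof(int));
    cudaMalloc(&d_count, sizeof(unsigned long long));
    cudaMemcpy(d_values, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(unsigned long long));

    const int block = 256;
    const int grid = (n + block - 1) / block;
    count_matches<<<grid, block>>>(d_values, n, 90, d_count);

    unsigned long long count = 0;
    cudaMemcpy(&count, d_count, sizeof(count), cudaMemcpyDeviceToHost);
    std::printf("rows with value > 90: %llu of %d\n", count, n);

    cudaFree(d_values);
    cudaFree(d_count);
}
```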
The Trend of Storage Technologies
As high-performance processors and hardware accelerator technologies develop rapidly, the performance gap between the CPU and storage keeps widening year by year. This "memory wall" makes data access a non-negligible performance bottleneck. Faced with the slow I/O capabilities of traditional secondary storage devices, data management and analysis systems have had to adopt design strategies such as cache (buffer) pools, concurrency control, and disk-oriented algorithms and data structures to mitigate or hide the I/O performance gap. However, I/O bottlenecks still severely constrain the processing power of data-intensive computing.
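A minimal sketch of one such strategy, an LRU buffer (cache) pool that keeps hot pages in memory so that repeated accesses avoid slow secondary-storage I/O, might look as follows in C++; this is a toy illustration rather than any particular system's implementation, and the disk read is stubbed.

```cpp
// Illustrative LRU buffer pool: hot pages are served from memory, cold pages
// are "read from disk" (stubbed) and the least recently used page is evicted.
#include <cstddef>
#include <cstdio>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class BufferPool {
public:
    explicit BufferPool(std::size_t capacity) : capacity_(capacity) {}

    // Returns the page, reading it from "disk" only on a cache miss.
    const std::string& get_page(int page_id) {
        auto it = index_.find(page_id);
        if (it != index_.end()) {                        // hit: move to MRU position
            lru_.splice(lru_.begin(), lru_, it->second);
            return it->second->second;
        }
        if (lru_.size() == capacity_) {                  // full: evict the LRU page
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        lru_.emplace_front(page_id, read_from_disk(page_id));
        index_[page_id] = lru_.begin();
        return lru_.front().second;
    }

private:
    std::string read_from_disk(int page_id) {            // stub for a slow disk read
        std::printf("disk read of page %d\n", page_id);
        return "contents of page " + std::to_string(page_id);
    }

    std::size_t capacity_;
    std::list<std::pair<int, std::string>> lru_;          // front = most recently used
    std::unordered_map<int, std::list<std::pair<int, std::string>>::iterator> index_;
};

int main() {
    BufferPool pool(2);
    pool.get_page(1);   // miss: disk read
    pool.get_page(2);   // miss: disk read
    pool.get_page(1);   // hit: served from memory
    pool.get_page(3);   // miss: evicts page 2
}
```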
It is especially interesting that new storage media, represented by NVM, provide a potential avenue for breaking the I/O bottleneck. NVM is a general term for a class of storage technologies rather than one specific technology or medium; it is also referred to as storage class memory (SCM) in some research literature. Typical NVMs include phase change memory (PCM), magnetoresistive random access memory (MRAM), resistive random access memory (RRAM), and ferroelectric RAM (FeRAM). Although the characteristics and manufacturing processes of these memories differ markedly, they share common features, including durability, high storage density, low-latency random reads and writes, and fine-grained byte addressability; their specifications are given in Table 2 (a minimal persistence sketch follows the table). From a performance point of view, NVM is close to DDR memory but is also nonvolatile. It may therefore gradually become the main storage device, with DDR memory used as a temporary data cache. Flash memory, by contrast, is already a mature technology today: a single PCIe flash device can reach a capacity of 12.8 TB with high read/write performance, so it can serve either as a cache between RAM and hard disk or as a persistent storage device replacing the hard drive. In terms of energy consumption, DRAM consumes less energy under high load but more energy under low load than other storage devices, because the entire DRAM must be refreshed continuously. The common feature of NVMs is that they combine DRAM-like high-speed access with disk-like persistence, effectively breaking through the "performance wall" that traditional storage media cannot overcome.
Table 2 Performance metrics of different storage devices
Attribute | DRAM | PCM (25 nm) | MRAM | RRAM | FeRAM | Flash SSD | HDD |
---|---|---|---|---|---|---|---|
Nonvolatile | No | Yes | Yes | Yes | Yes | Yes | Yes |
Density (µm²/bit) | 0.00380 | 0.00250 | 0.00740 | 0.00580 | 0.00355 | 0.00210 | 0.00006 |
Read latency (ns) | 55 | 48 | 20 | 116 | 55 | 25,000 | 3,000,000 |
Write latency (ns) | 55 | 150 | 20 | 145 | 55 | 200,000 | 3,000,000 |
Read power consumption (pJ/bit) | 12.5 | 2 | 0.02 | 4.81 | 2.13 | 250 | 2500 |
Write power consumption (pJ/bit) | 12.5 | 19.2 | 0.03 | 13.8 | 10.12 | 250 | 2500 |
Lifespan (write cycles) | > 10^15 | 10^8 | 10^15 | 10^8 | 10^14 | 10^4 | > 10^15 |
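To make the combination of byte addressability and persistence concrete, the following Linux/x86-specific C++ sketch assumes an NVM device exposed through a DAX-mounted file (the /mnt/pmem path is an assumption) and uses the common flush-and-fence recipe; production systems would typically rely on a persistent-memory library such as PMDK rather than raw intrinsics.

```cpp
// Illustrative byte-addressable persistence sketch (Linux/x86, DAX assumed).
// A store becomes durable only after the affected cache lines are flushed
// and the flush is fenced.
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <immintrin.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    const char* path = "/mnt/pmem/example_record";      // assumed DAX mount point
    const std::size_t size = 4096;

    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0 || ftruncate(fd, size) != 0) { std::perror("open/ftruncate"); return 1; }

    // With filesystem DAX, this mapping goes directly to the NVM device, so
    // ordinary loads and stores access persistent memory at byte granularity.
    char* pmem = static_cast<char*>(
        mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    if (pmem == MAP_FAILED) { std::perror("mmap"); return 1; }

    // Update the record in place with normal stores ...
    std::strcpy(pmem, "order=42,status=shipped");

    // ... then flush the written cache lines and fence to make them durable.
    const std::size_t written = std::strlen(pmem) + 1;
    for (std::size_t off = 0; off < written; off += 64) _mm_clflush(pmem + off);
    _mm_sfence();

    munmap(pmem, size);
    close(fd);
    std::printf("record persisted (assuming a DAX-capable device at %s)\n", path);
    return 0;
}
```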
At the same time, the development of new storage technologies has also had a significant impact on processor technology. 3D stacking technology, which delivers higher bandwidth, can be applied to the on-package memory of many-core processors, providing high-performance data caching for powerful parallel processing. With NVM technology, the multi-level hybrid storage environment will certainly break the traditional balance among the CPU, main memory, system bus, and external storage in computer architecture. It will also change the existing storage hierarchy and optimize critical data access paths to bridge the performance gap between storage tiers, providing new opportunities for data management and analytics.
The Trend of Network Technologies
In addition to the local storage I/O bottleneck, the network I/O bottleneck is also a main performance issue in the datacenter. Over traditional Ethernet networks, the limited data transmission capability and the non-trivial CPU overhead of the TCP/IP stack have severely impacted the performance of distributed data processing. As a result, the overall throughput of a distributed database system drops sharply under a high proportion of distributed transactions, which can lead to heavy network I/O. Existing data management systems therefore have to resort to specific strategies, such as coordinated partitioning, relaxed consistency guarantees, and deterministic execution schemes, to control or reduce the ratio of distributed transactions. However, most of these measures rest on unrealistic assumptions or narrow applicability conditions, or remain opaque to application developers. In particular, the scalability of the system is still greatly limited, especially when the workload lacks distinguishable characteristics that would allow it to be split into independent partitions.
It is important to note that an increased likelihood of contention is the most frequently cited reason when discussing the scalability issue of distributed transactions, but the most important factor is the CPU overhead of the TCP/IP stack incurred by the traditional Ethernet network. In other words, software-oriented optimization alone will not fundamentally address the scalability issue in distributed environments. In recent years, high-performance RDMA-enabled networks have dramatically improved network latency and allow data transfers to bypass the CPU. InfiniBand, iWARP, and RoCE are all RDMA-enabled network protocols; with the appropriate hardware, they can accelerate operations and increase the value of applications. With the price reduction of RDMA-related hardware, more and more emerging industry clusters are running on RDMA-capable networks, which requires a fundamental rethinking of the design of data management and analysis systems, including but not limited to distributed query processing, transaction processing, and other core functions.
Although the development of new hardware exhibits complicated variety and the composition of future hardware environments remains uncertain, it is foreseeable that these technologies will eventually become standard hardware components. Data management and analysis on modern hardware will become a new research hotspot.