The Trend of Hardware

In recent years, storage, processor, and network technologies have made great breakthroughs. As shown in Fig. 1, a growing set of new hardware, architectures, and features is becoming the foundation of future computing platforms. Current trends indicate that these technologies, including high-performance processors, hardware accelerators, non-volatile memory (NVM), and RDMA-capable (remote direct memory access) networks, are significantly changing the underlying environment of traditional data management and analysis systems. Notably, the emerging environments, marked by heterogeneous multi-core architectures and hybrid storage hierarchies, make the already complicated software design space even more sophisticated.

Fig. 1 New hardware and environment


The Trend of Processor Technologies

The development of processor technology has spanned more than 40 years. Its roadmap has shifted from scale-up to scale-out: the aim has moved away from pursuing ever higher clock speeds and instead focuses on placing more cores on each processor. According to Moore's law, continuously increasing processor clock frequency was one of the most important ways to improve computer performance in the era of serial computing. At the same time, many optimization techniques, such as instruction-level parallelism (ILP), pipelining, prefetching, branch prediction, out-of-order execution, multi-level caches, and hyper-threading, could be identified and exploited automatically by the processor and the compiler. Software could therefore consistently and transparently enjoy free and regular performance gains. However, limited by heat dissipation, power consumption, instruction-level parallelism, manufacturing processes, and other factors, the scale-up approach has reached its ceiling.

After 2005, high-performance processor technology entered the multi-core era, and multi-core parallel processing became mainstream. Although data processing capability is significantly enhanced in multi-core architectures, software does not gain the benefits automatically. Instead, programmers have to transform traditional serial programs into parallel ones and optimize algorithm performance for the last-level cache (LLC) of multi-core processors. Nowadays, the performance of multi-core processors has improved significantly along with semiconductor technology. For example, the 14-nm Xeon processor currently integrates up to 24 cores and supports up to 3.07 TB of memory and 85 GB/s of memory bandwidth. However, x86 processors still suffer from low integration, high power consumption, and high price, and general-purpose multi-core processors can hardly meet the demands of highly concurrent applications. Processor development is therefore moving toward optimization for specific applications, i.e., specialized hardware accelerators.
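
As a concrete illustration of this shift, the sketch below (not taken from any specific system discussed here) turns a serial aggregation into a multi-threaded one with OpenMP, processing the input in fixed-size chunks so that each core works largely within its share of the last-level cache; the array size, chunk size, and reduction pattern are illustrative assumptions.

```c
/* A minimal sketch of parallelizing a serial aggregation for a multi-core
 * CPU with OpenMP. Build with: gcc -O2 -fopenmp scan.c */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N     (16 * 1024 * 1024)   /* 16M 4-byte values (illustrative)      */
#define CHUNK (1 << 18)            /* 256K values (1 MB): assumed to fit in */
                                   /* one core's share of the LLC           */

int main(void) {
    int *values = malloc((size_t)N * sizeof(int));
    if (!values) return 1;
    for (long i = 0; i < N; i++) values[i] = i % 100;

    long long sum = 0;
    /* Each thread aggregates disjoint chunks; the reduction clause combines
     * the per-thread partial sums without explicit locking. */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long chunk = 0; chunk < N; chunk += CHUNK) {
        long end = (chunk + CHUNK < N) ? chunk + CHUNK : N;
        for (long i = chunk; i < end; i++)
            sum += values[i];
    }

    printf("sum = %lld (threads = %d)\n", sum, omp_get_max_threads());
    free(values);
    return 0;
}
```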

GPUs, Xeon Phi coprocessors, field-programmable gate arrays (FPGAs), and the like are representative dedicated hardware accelerators. By exploiting GPUs, Xeon Phi coprocessors, and FPGAs, parts of compute-intensive and data-intensive workloads can be offloaded from the CPU efficiently. Some fundamental hardware characteristics of these accelerators are given in Table 1. There is no doubt that the processing environment within the computer system is becoming more and more complicated, and correspondingly, data management and analysis systems have to seek diversified ways to actively adapt to the new situation.

Table 1 Processor characteristics

Type | Xeon E7-8890 V4 | Xeon Phi 7290F | Xeon Phi 724P | NVIDIA Tesla V100
#core/#thread | 24/48 | 72/288 | 68/272 | 5120 CUDA cores/640 tensor cores
Core frequency | 2.20 GHz | 1.50 GHz | 1.3 GHz | 1.455 GHz
Memory capacity | 3.07 TB | 384 GB | 16 GB | 16 GB HBM2 VRAM
Cache capacity | 60 MB L3 | 36 MB L2/16 GB HBM | 34 MB L2 | 6 MB L2
Memory type | DDR4-1866/4 channels | DDR4-2400/6 channels | MCDRAM/16 channels | HBM2
Memory bandwidth (GB/s) | 85 | 115.2 | 500 | 900
Price | $7174.00 | $3368.00 | $3324.00 | $149,000.00
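
To make the idea of offloading concrete, the following minimal sketch moves a compute-intensive SAXPY loop to an accelerator via OpenMP target directives. It is an illustration only, not a configuration tied to the devices in Table 1; it assumes an offload-capable compiler and falls back to the host CPU when no device is present.

```c
/* A minimal sketch of offloading a loop to an accelerator with OpenMP
 * target directives. Build with an offload-capable compiler, e.g.
 * clang -O2 -fopenmp (plus the vendor-specific offload target flags). */
#include <stdio.h>
#include <omp.h>

#define N (1 << 22)                     /* 4M elements, illustrative size */

int main(void) {
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    const float a = 3.0f;
    /* Map the inputs to the device, run the loop there, map y back. */
    #pragma omp target teams distribute parallel for \
            map(to: x[0:N]) map(tofrom: y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];         /* SAXPY kernel */

    printf("y[0] = %.1f, offload devices = %d\n", y[0], omp_get_num_devices());
    return 0;
}
```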

The Trend of Storage Technologies

As high-performance processors and hardware accelerator technologies develop rapidly, the performance gap between the CPU and storage keeps widening year by year. This "memory wall" makes data access a non-negligible performance bottleneck. Faced with the slow I/O capabilities of traditional secondary storage devices, data management and analysis systems have had to adopt design strategies such as buffer pools, concurrency control, and disk-oriented algorithms and data structures to mitigate or hide the I/O performance gap. However, I/O bottlenecks still severely constrain the processing power of data-intensive computing.
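
The buffer-pool strategy mentioned above can be illustrated with a deliberately simplified sketch: a small array of RAM frames caches 4 KB disk pages and evicts victims with a CLOCK (second-chance) policy, so repeated accesses are served from memory. The file name, frame count, and linear-scan lookup are illustrative simplifications, not the design of any particular system.

```c
/* A minimal sketch of a buffer pool that hides disk I/O behind RAM. */
#include <stdio.h>

#define PAGE_SIZE  4096
#define NUM_FRAMES 128                  /* RAM frames available to the pool */

typedef struct {
    long page_no;                       /* disk page cached here, -1 if free */
    int  referenced;                    /* CLOCK reference bit               */
    char data[PAGE_SIZE];
} Frame;

static Frame pool[NUM_FRAMES];
static int   clock_hand = 0;

static void read_page_from_disk(FILE *db, long page_no, char *buf) {
    fseek(db, page_no * PAGE_SIZE, SEEK_SET);
    fread(buf, 1, PAGE_SIZE, db);
}

/* Return the in-memory copy of page_no, touching the disk only on a miss. */
char *get_page(FILE *db, long page_no) {
    /* Hit path: linear scan (a real pool would use a hash table). */
    for (int i = 0; i < NUM_FRAMES; i++) {
        if (pool[i].page_no == page_no) {
            pool[i].referenced = 1;
            return pool[i].data;
        }
    }
    /* Miss path: CLOCK eviction picks a victim frame, then reads the page. */
    for (;;) {
        Frame *f = &pool[clock_hand];
        clock_hand = (clock_hand + 1) % NUM_FRAMES;
        if (f->page_no == -1 || !f->referenced) {
            read_page_from_disk(db, page_no, f->data);
            f->page_no = page_no;
            f->referenced = 1;
            return f->data;
        }
        f->referenced = 0;              /* give the page a second chance */
    }
}

int main(void) {
    for (int i = 0; i < NUM_FRAMES; i++) pool[i].page_no = -1;
    FILE *db = fopen("table.db", "rb"); /* hypothetical data file */
    if (!db) return 1;
    get_page(db, 42);                   /* first access: disk read  */
    get_page(db, 42);                   /* second access: RAM hit   */
    fclose(db);
    return 0;
}
```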

It is especially noteworthy that new storage media, represented by NVM, provide a potential avenue to break the I/O bottleneck. NVM is a general term for a class of storage technologies rather than one specific technology or medium; it is also referred to as storage-class memory (SCM) in some research literature. Typical NVMs include phase-change memory (PCM), magnetoresistive random access memory (MRAM), resistive random access memory (RRAM), and ferroelectric RAM (FeRAM). Although the characteristics and manufacturing processes of these memories differ considerably, they generally share some common features, including durability, high storage density, low-latency random reads/writes, and fine-grained byte addressing. Their specifications are given in Table 2. From a performance point of view, NVM is close to DDR memory but is also nonvolatile; it may therefore gradually become the main storage device, with DDR memory used as a temporary data cache. Flash memory, by contrast, is already a mature technology: a single PCIe flash device can reach a capacity of 12.8 TB with high read/write performance, so it can serve either as a cache between RAM and the hard disk or as a persistent replacement for the hard drive. In terms of energy consumption, DRAM consumes less energy under high load, but under low load it consumes more energy than other storage devices because the entire DRAM must be refreshed. The common feature of NVMs is that they combine DRAM-like high-speed access with disk-like persistence, effectively breaking the "performance wall" that traditional storage media cannot overcome.

Table 2 Performance metrics of different storage devices

Attribute | DRAM | PCM (25 nm) | MRAM | RRAM | FeRAM | Flash SSD | HDD
Nonvolatile | No | Yes | Yes | Yes | Yes | Yes | Yes
Density (µm²/bit) | 0.00380 | 0.00250 | 0.00740 | 0.00580 | 0.00355 | 0.00210 | 0.00006
Read latency (ns) | 55 | 48 | 20 | 116 | 55 | 25,000 | 3,000,000
Write latency (ns) | 55 | 150 | 20 | 145 | 55 | 200,000 | 3,000,000
Read power consumption (pJ/bit) | 12.5 | 2 | 0.02 | 4.81 | 2.13 | 250 | 2500
Write power consumption (pJ/bit) | 12.5 | 19.2 | 0.03 | 13.8 | 10.12 | 250 | 2500
Lifespan (write cycles) | > 10^15 | 10^8 | 10^15 | 10^8 | 10^14 | 10^4 | > 10^15
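
The byte-addressable persistence highlighted above changes how durability is programmed: instead of block-oriented write() calls, an ordinary store can be made durable with a cache-line flush and a store fence. The sketch below assumes a DAX-mapped NVM file at a hypothetical path and a trivial record layout; it is an illustration of the idea, not the interface of any specific NVM product.

```c
/* A minimal sketch of persisting a small record to byte-addressable NVM,
 * assuming the device is exposed as a DAX-mappable file. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <emmintrin.h>              /* _mm_clflush, _mm_sfence */

int main(void) {
    int fd = open("/mnt/pmem/log", O_RDWR);   /* hypothetical DAX file */
    if (fd < 0) { perror("open"); return 1; }

    size_t len = 4096;
    char *pmem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (pmem == MAP_FAILED) { perror("mmap"); return 1; }

    /* Update a record in place with ordinary stores ... */
    uint64_t txn_id = 42;
    memcpy(pmem, &txn_id, sizeof(txn_id));

    /* ... then flush the affected cache line and order the flush, so the
     * new value survives a power failure. */
    _mm_clflush(pmem);
    _mm_sfence();

    munmap(pmem, len);
    close(fd);
    return 0;
}
```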

At the same time, the development of new storage technologies also has a significant impact on processor technology. 3D stacking technology, which enables much higher bandwidth, can be applied to the on-package memory of many-core processors, delivering high-performance data caching for powerful parallel processing. With NVM technology, the multi-level hybrid storage environment will break the balance among the CPU, main memory, system bus, and external storage in the traditional computer architecture. It will also change the existing storage hierarchy and optimize critical data access paths to bridge the performance gap between storage tiers, providing new opportunities for data management and analytics.


The Trend of Network Technologies

In addition to the local storage I/O bottleneck, the network I/O bottleneck is also a major performance issue in the datacenter. Over traditional Ethernet networks, the limited data transmission capability and the non-trivial CPU overhead of the TCP/IP stack severely impact the performance of distributed data processing. As a result, the overall throughput of a distributed database system drops sharply when a high proportion of transactions are distributed, since such transactions potentially lead to heavy network I/O. Existing data management systems therefore have to resort to specific strategies such as coordinated partitioning, relaxed consistency guarantees, and deterministic execution schemes to control or reduce the ratio of distributed transactions. However, most of these measures suffer from unrealistic assumptions, restrictive applicability conditions, or opaqueness to application developers. In particular, the scalability of the system is still greatly limited, especially when the workload cannot be partitioned into independent pieces.

It is important to note that increased contention likelihood is the most frequently cited reason in discussions of the scalability of distributed transactions, but the more important factor is the CPU overhead of the TCP/IP stack incurred by traditional Ethernet networks. In other words, software-oriented optimization alone will not fundamentally address the scalability issue in distributed environments. In recent years, high-performance RDMA-enabled networks have dramatically improved network latency and allow data to be transferred over the network while bypassing the CPU. InfiniBand, iWARP, and RoCE are all RDMA-enabled network protocols that, with appropriate hardware, can accelerate network operations and increase application value. As the price of RDMA-related hardware falls, more and more emerging industry clusters run on RDMA-enabled networks, which requires a fundamental rethinking of the design of data management and analysis systems, including but not limited to distributed query processing, transaction processing, and other core functions.
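
As a sketch of how one-sided RDMA bypasses the remote CPU, the fragment below posts an RDMA WRITE with the libibverbs API and polls the completion queue. Queue-pair creation, connection establishment, and the exchange of the remote address and rkey are assumed to have happened elsewhere (e.g., via rdma_cm or a TCP handshake); the function and variable names are illustrative.

```c
/* A minimal sketch of a one-sided RDMA WRITE using libibverbs; the queue
 * pair, completion queue, and registered memory region are assumed to be
 * set up already. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int rdma_write_example(struct ibv_qp *qp, struct ibv_cq *cq,
                       struct ibv_mr *mr, void *local_buf, size_t len,
                       uint64_t remote_addr, uint32_t remote_rkey) {
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,                      /* from ibv_reg_mr() */
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided: no remote CPU work */
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = remote_rkey;

    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Poll the completion queue until the NIC reports the write is done. */
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;                                        /* busy-poll for simplicity */
    return wc.status == IBV_WC_SUCCESS ? 0 : -1;
}
```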

Although the development of new hardware exhibits considerable variety and the composition of future hardware environments remains uncertain, it is foreseeable that these technologies will eventually become standard hardware components. Data management and analysis on modern hardware will become a new research hotspot.