## GPU, Distributed, Grid, and Cloud Computing

Read section 2.8 to learn about GPU computing. GPU stands for Graphics Processing Unit. These are of interest because of the increase in the amount of graphics data handled by popular laptops and desktop computers. Furthermore, it turns out that since a GPU does primarily arithmetic computations, the architecture of a GPU is applicable to other types of applications that involve arithmetic on large amounts of data.

Read section 2.10 to learn about distributed, grid, and cloud computing. These are configurations of multiple computers for increasing performance and/or decreasing cost for high volume database access, sharing of resources, or accessing remote computer resources, respectively.

#### 2.8 GPU computing

A GPU (or sometimes General Purpose Graphics Processing Unit (GPGPU)) is a special purpose processor, de-signed for fast graphics processing. Howver, since the operations done for graphics are a form of arithmetic, GPUs have gradually evolved a design that is also useful for non-graphics computing. The general design of a GPU is motivated by the ‘graphics pipeline’: in a form of data parallelism (section 2.4.1) identical operations are performed on many data elements, and a number of such blocks of data parallelism can be active at the same time.

Present day GPUs have an architecture that combines SIMD and MIMD parallelism. For instance, an NVidia GPU has 16 Streaming Multiprocessors (SMs), and a SMs consists of 8 Streaming Processors (SPs), which correspond to processor cores; see figure 2.13.

The SIMD, or , nature of GPUs becomes apparent in the way CUDA starts processes. A kernel, that is, a function that will be executed on the GPU, is started on mn cores by:

KernelProc<< m,n >>(args)

The collection of mn cores executing the kernel is known as a grid, and it is structured as m thread blocks of n threads each. A thread block can have up to 512 threads.

Recall that threads share an address space (see section 2.5.1.1), so they need a way to identify what part of the data each thread will operate on. For this, the blocks in a thread are numbered with x, y coordinates, and the threads in a block are numbered with x, y, z coordinates. Each thread knows its coordinates in the block, and its block’s coordinates in the grid.

Thread blocks are truly data parallel: if there is a conditional that makes some threads take the true branch and others the false branch, then one branch will be executed first, with all threads in the other branch stopped. Subsequently, and not simultaneously, the threads on the other branch will then execute their code. This may induce a severy performance penalty.

These are some of the differences between GPUs and regular CPUs:

• First of all, as of this writing (late 2010), GPUs are attached processors, so any data they operate on has to be transferred from the CPU. Since the memory bandwidth of this transfer is low, sufficient work has to be done on the GPU to overcome this overhead.
• Since GPUs are graphics processors, they put an emphasis on single precision floating point arithmetic. To accomodate the scientific computing community, double precision support is increasing, but double precision speed is typically half the single precision flop rate.
• A CPU is optimized to handle a single stream of instructions, that can be very heterogeneous in character; a GPU is made explicitly for data parallelism, and will perform badly on traditional codes.
• A CPU is made to handle one thread , or at best a small number of threads. A GPU needs a large number of threads, far larger than the number of computational cores, to perform efficiently.

#### 2.10 Distributed computing, grid computing, cloud computing

In this section we will take a short look at terms such as cloud computing, and an earlier term distributed computing. These are concepts that have a relation to parallel computing in the scientific sense, but that differ in certain fundamental ways.

Distributed computing can be traced back as coming from large database servers, such as airline reservations systems, which had to be accessed by many travel agents simultaneously. For a large enough volume of database accesses, a single server will not suffice, so the mechanism of remote procedure call was invented, where the central server would call code (the procedure in question) on a different (remote) machine. The remote call could involve transfer of data, the data could be already on the remote machine, or there would be some mechanism that data on the two machines would stay synchronized. This gave rise to the Storage Area Network (SAN). A generation later than distributed database systems, web servers had to deal with the same problem of many simultaneous accesses to what had to act like a single server.

We already see one big difference between distributed computing and high performance parallel computing. Scientific computing needs parallelism because a single simulation becomes too big or slow for one machine; the business applications sketched above deal with many users executing small programs (that is, database or web queries) against a large data set. For scientific needs, the processors of a parallel machine (the nodes in a cluster) have to have a very fast connection to each other; for business needs no such network is needed, as long as the central dataset stays coherent.

Both in HPC and in business computing, the server has to stay available and operative, but in distributed computing there is considerably more liberty in how to realize this. For a user connecting to a service such as a database, it does not matter what actual server executes their request. Therefore, distributed computing can make use of virtualization: a virtual server can be spawned off on any piece of hardware.

An analogy can be made between remote servers, which supply computing power wherever it is needed, and the electric grid, which supplies electric power wherever it is needed. This has led to grid computing or utility computing, with the Teragrid, owned by the US National Science Foundation, as an example. Grid computing was originally intended as a way of hooking up computers connected by a Local Area Network (LAN) or Wide Area Network (WAN), often the Internet. The machines could be parallel themselves, and were often owned by dif-ferent institutions. More recently, it has been viewed as a way of sharing resources, both datasets and scientific instruments, over the network.

Some of what are now described as ‘cloud applications’ are of a massively parallel nature. One is Google’s search engine, which indexes the whole of the Internet, and another is the GPS capability of Android mobile phones, which combines GIS, GPS, and mashup data. This type of parallelism is different from the scientific kind. One computing model that has been formalized is Google’s MapReduce [24], which combines a data parallel aspect (the ‘map’ part) and a central accumulation part (‘reduce’). Neither involves the tightly coupled neighbour-to-neighbour communication that is common in scientific computing. An open source framework for MapReduce computing exists in Hadoop [3]. Amazon offers a commercial Hadoop service.

The concept of having a remote computer serve user needs is attractive even if no large datasets are involved, since it absolves the user from the need of maintaining software on their local machine. Thus, Google Docs offers various ‘office’ applications without the user actually installing any software. This idea is sometimes called Software-as-a-Service, where the user connects to an ‘application server’, and accesses it through a client such as a web browser. In the case of Google Docs, there is no longer a large central dataset, but each user interacts with their own data, maintained on Google’s servers. This of course has the large advantage that the data is available from anywhere the user has access to a web browser.

The term cloud computing usually refers to this internet-based model where the data is not maintained by the user. However, it can span some or all of the above concepts, depending on who uses the term. Here is a list of characteristics:

• A cloud is remote, involving applications running on servers that are not owned by the user. The user pays for services on a subscription basis, as pay-as-you-go.
• Cloud computing is typically associated with large amounts of data, either a single central dataset such as on airline database server, or many independent datasets such as for Google Docs, each of which are used by a single user or a small group of users. In the case of large datasets, they are stored distributedly, with concurrent access for the clients.
• Cloud computing makes a whole datacenter appear as a single computer to the user [71].
• The services offered by cloud computing are typically business applications and IT services, rather than scientific computing.
• Computing in a cloud is probably virtualized, or at least the client interfaces to an abstract notion of a server. These strategies often serve to ‘move the work to the data’.
• Server processes are loosely coupled, at best synchronized through working on the same dataset.
• Cloud computing can be interface through a web browser; it can involve a business model that is ‘pay as you go’.
• The scientific kind of parallelism is not possible or not efficient using cloud computing.

Cloud computing clearly depends on the following factors:

• The ubiquity of the internet;
• Virtualization of servers;
• Commoditization of processors and hard drives.

The infrastructure for cloud computing can be interesting from a computer science point of view, involving dis-tributed file systems, scheduling, virtualization, and mechanisms for ensuring high reliability.

Source: Victor Eijkhout, Edmond Chow, and Robert van de Geijn, https://s3.amazonaws.com/saylordotorg-resources/wwwresources/site/textbookuploads/5345_scicompbook.pdf