Visualization methods

Historically, the primary areas of visualization were Scientific Visualization and Information Visualization. In recent decades, however, the field of Visual Analytics has also been developing actively.

As a separate discipline, visualization emerged in the 1980s as a reaction to the increasing amount of data generated by computer calculations. It was named Scientific Visualization, as it displayed data from scientific experiments related to physical processes. This is primarily realistic three-dimensional visualization, which has been used in architecture, medicine, biology, meteorology, etc. It is also known as Spatial Data visualization and focuses on the visualization of volumes and surfaces.

Information Visualization emerged as a branch of the Human-Computer Interaction field at the end of the 1980s. It uses graphics to help people comprehend and interpret data. Because it helps to form mental models of the data, it becomes easier for humans to reveal specific features and patterns of the information obtained.

Visual Analytics combines visualization and data analysis. It has absorbed features of both Information Visualization and Scientific Visualization. Its main difference from the other fields is the development and provision of visualization technologies and tools.

Efficient visualization tools should take into account the cognitive and perceptual properties of the human brain. Visualization aims to improve the clarity and aesthetic appeal of the displayed information and allows a person to understand large amounts of data and interact with them. The main purposes of the visual representation of Big Data are: to identify hidden patterns or anomalies in the data; to increase flexibility when searching for certain values; to compare different units in order to obtain the relative difference in quantities; and to enable real-time human interaction (touring, scaling, etc.).

Visualization methods have evolved considerably over the last decades (see Fig. 3), the only limit for novel techniques being human imagination. To anticipate the next steps in the development of data visualization, it is necessary to take into account the successes of the past. It is often assumed that quantitative data visualization appeared in the field of statistics and analytics quite recently. However, its main precursors were cartography and statistical graphics, created before the 19th century for the expansion of statistical thinking, business planning and other purposes. The evolution of visualization techniques was driven by advances in mathematics and statistics as well as in drawing and image reproduction.

Fig. 3

The evolution of visualization methodology. Development of visualization methods originates from the 18th century and is rapidly improving today due to technical sophistication

By the 16th century, tools for accurate observation and measurement had been developed, and it was in this period that the first steps were taken in the development of data visualization. The 17th century was dominated by problems of measuring space, time and distance. Furthermore, the study of the world's population and economic data had begun.

The 18th century was marked by the expansion of statistical theory, ideas of graphical data representation and the advent of new graphic forms. At the end of the century, thematic maps displaying geological, medical and economic data were used for the first time. For example, Charles de Fourcroy used geometric figures and cartograms to compare areas and demographic quantities. Johann Lambert (1728–1777) was a pioneer who used different types of tables and line graphs to display variable data. The earliest methods were simple plots, followed by one-dimensional histograms. Still, such examples are useful only for small amounts of data: as more information is introduced, this type of diagram quickly loses its value.
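The scaling limit of these early forms can be sketched directly. Below is a minimal, illustrative pure-Python one-dimensional histogram; the function name and the text-bar rendering are ours, not taken from any historical tool:

```python
from collections import Counter

def histogram(values, bin_width):
    """Bucket numeric values into fixed-width bins and count occurrences."""
    counts = Counter(int(v // bin_width) for v in values)
    # Render each bin as a bar of '#' characters, one per observation.
    return {b * bin_width: '#' * counts[b] for b in sorted(counts)}

# A handful of observations is still readable as text bars...
print(histogram([1, 2, 2, 3, 7, 8, 8, 8], bin_width=5))
# ...but with millions of values the bars collapse into noise,
# which is exactly the scaling limit noted above.
```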

At the turn of the 20th–21st centuries, steps were taken in the development of interactive statistical computing and new paradigms for data analysis. Technological progress was certainly a significant prerequisite for the rapid development of visualization techniques, methods and tools: large-scale statistical and graphics software was developed, and computer processing speed and capacity vastly increased.

However, the next step, presenting a system with the addition of a time dimension, was a significant breakthrough. At the beginning of the present century, low-dimensional visualization methods such as 2D/3D node-link diagrams were in use. Already at this level of abstraction a user can identify the goal and specify further analytical steps for the research, but data scaling became an essential issue.

Moreover, currently used technologies for data visualization already impose enormous resource demands, including high memory requirements and extremely high deployment costs. In contrast to the past, when imagination was the limiting factor, the existing environment now faces a limitation based on the sheer amount of data to be visualized. Modern effective methods therefore focus on representation in dedicated rooms equipped with widescreen monitors or projectors.

Nowadays, there is a fairly large number of data visualization tools offering different possibilities. These tools can be classified by three factors: the data type, the visualization technique type and the interoperability. The first refers to the different types of data to be visualized:

  • Univariate data: one-dimensional arrays, time series, etc.
  • Two-dimensional data: point two-dimensional graphs, geographical coordinates, etc.
  • Multidimensional data: financial indicators, results of experiments, etc.
  • Texts and hypertexts: newspaper articles, web documents, etc.
  • Hierarchies and links: the subordination structure in an organization, e-mails, documents and hyperlinks, etc.
  • Algorithms and programs: information flows, debug operations, etc.

The second factor is based on the visualization techniques and templates used to represent different types of data. Visualization techniques can be either elementary (line graphs, charts, bar charts) or complex (based on a mathematical apparatus). Furthermore, visualization can be performed as a combination of various methods. However, any visualized representation of data is abstract and strongly limited by one's perceptual capabilities and needs (see Fig. 4).

Fig. 4

Human perception capability issue. Human perceptual capabilities are not sufficient to embrace large amounts of data

Types of visualization techniques are listed below:

  1. 2D/3D standard figures. May be implemented as bars, line graphs, various charts, etc. (see Fig. 5). The main drawback of this type is the difficulty of producing acceptable visualizations for complicated data structures;
  2. Geometric transformations. This technique represents information as scatter diagrams (see Fig. 6). It is geared towards transforming a multidimensional data set in order to display it in Cartesian and non-Cartesian geometric spaces. This class includes the methods of mathematical statistics;
  3. Display icons. Ruled shapes (needle icons) and star icons. Basically, this type displays the values of elements of multidimensional data as properties of images (see Fig. 7). Such images may include human faces, arrows, stars, etc. Images can be grouped together for holistic analysis. The result of the visualization is a texture pattern, which varies according to specific characteristics of the data;
  4. Pixel-oriented methods. Recursive templates and cyclic segments. The main idea is to map the value in each dimension to a colored pixel and to merge some of them according to specific measurements (see Fig. 8). Since one pixel is used to display a single value, visualization of very large amounts of data becomes achievable with this methodology;
  5. Hierarchical images. Tree maps and overlay measurements (see Fig. 9). These methods are used with hierarchically structured data.
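As a sketch of the hierarchical technique (item 5), a slice-and-dice treemap can be built in a few lines of pure Python; the function and data names below are illustrative, not from any particular library. Each node's rectangle is split among its children in proportion to their weights, alternating the split direction at each level:

```python
def treemap(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice treemap: assign each leaf a rectangle whose
    area is proportional to its weight. node = (name, weight, children)."""
    if out is None:
        out = []
    name, weight, children = node
    if not children:
        out.append((name, (x, y, w, h)))
        return out
    total = sum(c[1] for c in children)
    offset = 0.0
    for child in children:
        frac = child[1] / total
        if depth % 2 == 0:  # split horizontally at even depths
            treemap(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:               # split vertically at odd depths
            treemap(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out

tree = ("root", 10, [("a", 6, []), ("b", 4, [("b1", 2, []), ("b2", 2, [])])])
for name, rect in treemap(tree, 0, 0, 100, 100):
    print(name, rect)
```

Production treemaps usually use "squarified" tiling to keep rectangles closer to squares; the slice-and-dice variant above is the simplest member of the family.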

The third factor is related to interoperability with visual imagery and techniques for better data analysis. The application used for visualization should present visual forms that capture the essence of the data itself. However, this is not always enough for a complete analysis. The data representation should be constructed so as to allow a user to take different visual points of view. Thus, appropriate interaction capabilities should be provided:

  1. Dynamic projection. Non-static change of projections is used in multidimensional data sets; an example is the dynamic projection of multidimensional data onto a two-dimensional plane in scatter plots. It is necessary to note that the number of possible projections increases exponentially with the number of dimensions, so perception suffers accordingly;
  2. Interactive filtering. In the investigation of large amounts of data there is a need to partition data sets and highlight significant subsets in order to filter images. Importantly, there should be an opportunity for a visual representation in real time. A subset can be chosen either directly from a list or by specifying the properties of interest;
  3. Scaling images. Scaling is a well-known interaction method used in many applications. It is especially useful for Big Data processing due to the ability to represent data in a compressed form while simultaneously displaying any part of the image in more detail. A lower-level entity may be represented at a higher level by a pixel, a certain visual image or an accompanying text label;
  4. Interactive distortion supports the data exploration process using distortion with varying levels of detail. The basic idea of this method is that a part of the data is displayed at fine granularity while the rest is shown at a low level of detail. The most popular methods are hyperbolic and spherical distortion;
  5. Interactive combination brings together different visualization techniques to overcome specific deficiencies through their conjugation. For example, different points of a dynamic projection can be combined with coloring techniques.
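The first two interaction styles above can be sketched in plain Python (all names are illustrative): a parameterised rotation implements a dynamic projection of 3-D points onto a 2-D plane, and a predicate implements interactive filtering:

```python
import math

def project(points, angle):
    """Rotate 3-D points about the z-axis by `angle`, then drop z:
    varying `angle` over time yields a dynamic 2-D projection."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y, _ in points]

def filter_points(points, predicate):
    """Interactive filtering: keep only the subset of interest."""
    return [p for p in points if predicate(p)]

data = [(1.0, 0.0, 5.0), (0.0, 1.0, -2.0), (3.0, 4.0, 0.5)]
visible = filter_points(data, lambda p: p[2] > 0)   # e.g. positive z only
frame = project(visible, math.pi / 2)               # one animation frame
print(frame)
```

Re-running `project` with a slowly changing angle produces the succession of frames a dynamic-projection tool would animate.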

To summarize, any visualization method can be classified by data type, visualization technique and interoperability. Each method can support different types of data, various images and varied methods for interaction.

Fig. 5

An example of the 2D/3D standard figures visualization techniques. a The simple line graph and b example of a bar chart


Fig. 6

An example of the geometric transformations visualization techniques. a Example of a parallel coordinates and b the scatter plot


Fig. 7

An example of the display icons visualization techniques. Picture demonstrates the visualization of various social connections in Australia


Fig. 8

An example of the methods focused on the pixels. Picture demonstrates an amount of data visualized in pixels. Each color has its specific meaning

Fig. 9

An example of the hierarchical images. Picture illustrates a tree map of data


A visual representation of Big Data analysis is crucial for its interpretation. As already mentioned, human perception is limited. The main purpose of modern data representation methods is to improve representation in the form of images, diagrams or animation. Examples of well-known techniques for data visualization are presented below:

  • Tag cloud is used in text analysis, with a weighting value dependent on the frequency of use (citation) of a particular word or phrase (see Fig. 10). It consists of an accumulation of lexical items (words, symbols or combinations of the two). This technique is commonly integrated into web sources to quickly familiarize visitors with the content via key words.
  • Clustergram is an imaging technique used in cluster analysis to represent how the assignment of individual data elements changes as the number of clusters changes (see Fig. 11). Choosing the optimal number of clusters is also an important component of cluster analysis.
  • Motion charts allow effective exploration of large and multivariate data and interaction with it via dynamic 2D bubble charts (see Fig. 12). The blobs (bubbles), the central objects of this technique, can be controlled through the variable mapping for which they are designed. For instance, motion chart tools are provided by Google, amCharts and IBM Many Eyes.
  • Dashboard enables the display of log files of various formats and the filtering of data based on chosen data ranges (see Fig. 13). Traditionally, a dashboard consists of three layers: data (raw data), analysis (formulas applied to data imported from the data layer into tables) and presentation (graphical representation based on the analysis layer).
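The frequency weighting behind a tag cloud is easy to sketch; the linear size-interpolation scheme below is one common choice rather than a fixed standard, and all names are illustrative:

```python
from collections import Counter

def tag_cloud_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size proportional to its frequency."""
    counts = Counter(text.lower().split())
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {word: min_size + (n - lo) * (max_size - min_size) / span
            for word, n in counts.items()}

sizes = tag_cloud_sizes("big data needs big tools and big ideas")
print(sizes)  # 'big' gets the largest font size
```

Real tag-cloud tools typically add stop-word removal and logarithmic scaling on top of this basic frequency-to-size mapping.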

Nowadays, there are many publicly available tools for creating meaningful and attractive visualizations. For instance, Sharon Machlis has published a chart of open tools for data visualization and analysis. The author provides a list of more than 30 tools, ordered from easiest to most difficult: Zoho Reports, Weave, Infogr.am, Datawrapper and others.

Fig. 10

An example of the tag cloud. This picture illustrates visualization of the paper abstract


Fig. 11

An example of the clustergram. This picture illustrates different states of data in several clusters


Fig. 12

An example of the motion chart. This picture illustrates the data in forms of bubbles that have various meaning based on color and size


Fig. 13

An example of the dashboard. This picture illustrates a pie chart, visualization of data in pixels, a line graph and a bar chart

All of these modern methods and tools follow fundamental principles of cognitive psychology and use essential criteria of successful data representation, such as the manipulation of size, color and connections between visual objects (see Fig. 14). In terms of human cognition, the Gestalt principles are relevant. The basis of Gestalt psychology is the study of visual perception. It suggests that people tend to perceive the world in the form of holistic, ordered configurations rather than constituent fragments (e.g. a person first perceives a forest and only then identifies single trees as parts of the whole). Moreover, our mind fills in gaps, seeks to avoid uncertainty and easily recognizes similarities and differences. The main Gestalt principles should be taken into account in Big Data visualization: the law of proximity (objects close to each other form a group), the law of similarity (objects are grouped perceptually if they are similar to each other), symmetry (people tend to perceive objects as symmetrical shapes), closure (our mind tends to close up objects that are incomplete) and the figure-ground law (prominent and recessed roles of visual objects).

Fig. 14

Fundamental cognitive psychology principles. Color is used to catch significant differences in the data sets by view; manipulation of visual object sizes may assist persons to identify the most important elements of the information; representation of connections improves pattern identification and aims to facilitate data analysis; grouping objects using the similarity principle decreases cognitive load
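The law of proximity, for instance, can be approximated computationally: points closer than a threshold are grouped as one perceived object. A minimal sketch using single-linkage merging (all names are illustrative):

```python
import math

def proximity_groups(points, threshold):
    """Group points whose pairwise distance is below `threshold`,
    mimicking the Gestalt law of proximity (single-linkage merging)."""
    groups = [[p] for p in points]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if any(math.dist(a, b) < threshold
                       for a in groups[i] for b in groups[j]):
                    groups[i] += groups.pop(j)
                    merged = True
                    break
            if merged:
                break
    return groups

pts = [(0, 0), (0.5, 0), (5, 5), (5.2, 5.1)]
print(proximity_groups(pts, threshold=1.0))  # two perceived clusters
```

A visualization tool can exploit this by placing related items within the threshold distance, so viewers perceive them as a single group without any explicit border.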

Consequently, the most effective visualization method is one that uses multiple criteria in an optimal manner. Otherwise, too many colors, shapes and interconnections may make the data harder to comprehend, or some visual elements may be too complex to recognize.

After this observation and discussion of existing visualization methods and tools for Big Data, we can outline their important disadvantages, which are widely discussed by specialists from different fields. Data become meaningful through interpretation, and it is easy to distort valuable information in its visualization, because a picture convinces people more effectively than textual content. Existing visualization tools aim to create images that are as simple and abstract as possible, which can lead to a problem: significant data may be interpreted as disordered information, and important connections between data units may be hidden from the user. This is the problem of visibility loss, which also relates to display resolution, where the quality of the represented data depends on the number of pixels and their density. A solution may be the use of larger screens; however, this in turn runs into the cognitive-perceptual limitations of the human brain, as will be discussed in detail in the section Integration with Augmented and Virtual Reality.
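The visibility-loss problem can be made concrete: when a series contains more points than there are horizontal pixels, several values collapse into one pixel column, and naive downsampling can hide spikes. One common mitigation, sketched here with hypothetical names, keeps the per-column minimum and maximum so extremes stay visible:

```python
def per_pixel_min_max(series, width):
    """Aggregate a long series into `width` pixel columns, keeping the
    (min, max) per column so extreme values survive downsampling."""
    per_col = max(1, len(series) // width)
    columns = []
    for i in range(0, len(series), per_col):
        chunk = series[i:i + per_col]
        columns.append((min(chunk), max(chunk)))
    return columns

# 1000 flat samples with one spike: the spike survives aggregation
series = [0.0] * 1000
series[500] = 9.9
cols = per_pixel_min_max(series, width=100)
print(max(hi for _, hi in cols))  # 9.9 — the spike is still visible
```

Picking only every tenth sample instead would have a 90% chance of dropping the spike entirely, which is precisely the visibility loss described above.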

Using visual and automated methods in Big Data processing makes it possible to exploit human knowledge and intuition, and to discover novel solutions for complex data visualization. Vast amounts of information motivate researchers and developers to create new tools for quick and accurate analysis; the rapid development of visualization techniques is one example. In a world of interconnected research areas, developers need to combine existing, proven visualization methods with new technological opportunities to solve the central problems and challenges of Big Data analysis.