Interactive Visualizations of Big Data
Description
If you are selecting or building a tool, it is important to understand its strengths and weaknesses, the types of visual outputs you require, what it can produce, the users' skill level, and the analytics it can perform. This article addresses how to understand and evaluate those capabilities.
Abstract
Purpose
Big Data introduces large volumes and new forms of structured, unstructured and semi-structured data into the field of accounting, and this requires alternative data management and reporting methods. Generating insights from these new data sources highlights the need for different and interactive forms of visualization in the field of visual analytics. Nonetheless, a considerable gap between the recommendations in research and current usage in practice is evident. In order to understand and overcome this gap, a detailed analysis of the status quo as well as the identification of potential barriers to adoption is vital. The paper aims to discuss this issue.
Design/methodology/approach
A survey of 145 business accountants from Austrian companies, covering a wide array of business sectors and all hierarchy levels, was conducted. The survey is targeted toward the purpose of this study: identifying barriers, clustered as human-related and technology-related, as well as investigating current practice with respect to interactive visualization use for Big Data.
Findings
The lack of knowledge and experience regarding new visualization types and interaction techniques and the sole focus on Microsoft Excel as a visualization tool can be identified as the main barriers, while the use of multiple data sources and the gradual implementation of further software tools determine the first drivers of adoption.
Research limitations/implications
Due to the data collection with a standardized survey, there was no possibility of dealing with participants individually, which could lead to a misinterpretation of the given answers. Further, the sample population is Austrian, which might cause issues in terms of generalizing results to other geographical or cultural contexts.
Practical implications
The study shows that those knowledgeable and familiar with interactive Big Data visualizations indicate high perceived ease of use. It is, therefore, necessary to offer sufficient training as well as user-centered visualizations and technological support to further increase usage within the accounting profession.
Originality/value
A lot of research has been dedicated to the introduction of novel forms of interactive visualizations. However, little attention has been paid to the impact of these new tools for Big Data from practitioners' perspectives and to practitioners' needs.
Keywords
- Financial reporting
- Big data
- Visual analytics
- Interactive visualization
- Big data visualization
Source: Lisa Maria Perkhofer, Peter Hofer, Conny Walchshofer, Thomas Plank, and Hans-Christian Jetter, https://www.emerald.com/insight/content/doi/10.1108/JAAR-10-2017-0114/full/html
This work is licensed under a Creative Commons Attribution 4.0 License.
1. Introduction
Over the last decade, there has been a tremendous increase in the amount of digital data accrued from various sources. Database systems allow for a mass-collection of data which can be used for problem detection and prediction, but also for gaining an increased overall understanding of current business models by identifying important influencing factors. Consequently, we can observe an increased interest in using these large structured and unstructured data sets in order to enhance informed and analytical decision-making.
This trend has also influenced accounting professionals. Handling large amounts of data is not a new requirement, as at a fundamental level the profession has to summarize, structure and prepare data for various decision-making purposes. However, the concept of Big Data is increasingly understood as the handling of data that is not only high in volume, but also high in variety and velocity. This is because not only company-internal data is used for analysis (mostly structured data) but also external sources such as websites, texts, videos, the Internet of Things, RFID, sensors and other items/sources (mostly semi-structured and unstructured data), which are required to generate a more comprehensive picture. Nonetheless, we speak of Big Data as soon as one of the above-mentioned criteria is met.
In this context, the discipline of information visualization (InfoVis) has gained increasing attention because its goal is to identify and create distinct images by emphasizing particular task or data characteristics to facilitate understanding and comprehension. Visualization per se is particularly helpful in this regard and especially for management accounting as the objective is to inform internal and external stakeholders about the past, the current and the future state of the company. By means of visualization, trends, correlations and irregularities can be localized in a more efficient and effective way. This is especially true if the data sets are increasing in size and complexity. To do so, reporting in various forms (e.g. internal and external) has been institutionalized in accounting and the use of traditional visualizations (for instance pie, column or bar charts) is already common practice.
However, with the change in data structure brought about by Big Data use, researchers and domain experts argue that the way various stakeholders are informed needs to be altered. Including not only structured but also semi-structured and unstructured data sets has led to the development of newer forms of visualizations better suited to the new requirements. These new and interactive visualization options are now generated on a regular basis; nonetheless, studies indicate that there is a considerable gap between what is possible and recommended from an expert's perspective and what is actually employed by practitioners. Although the value of the new visualization forms in the context of Big Data is recognized, they seem to be a preferred analytics topic on the wish list for future implementation rather than something widely used. Consequently, users continue to rely on simplistic and traditional visualizations for complex problems, which erroneously oversimplifies the information contained in the data sets. Unfortunately, this can cause interesting relationships to remain hidden, decision-making options to be reduced, and users to be encouraged to rely on biases and heuristics during the decision-making process.
The purpose of this paper is twofold: first, we investigate the status quo of reporting and analyze whether the incorporation of semi-structured and unstructured data sets has already taken place and whether this change has caused modifications in current reporting practice with regard to visualization use. With respect to modifications made, we are interested not only in the adoption of newer visualization types but also in the adoption of interaction techniques, as the latter is also a key component for sense-making in early Big Data visual analytics processes. Second, we focus on reasons for the previously mentioned slow adoption. We contribute to the ongoing discussion by being the first to investigate in detail why this obvious gap between experts and practitioners exists. To do so, this paper systematically distinguishes between human-related and technology-related barriers from a finance and accounting perspective. For human-related barriers, we concentrate on perception of and experience with new visualization forms; for technology-related barriers, we investigate functionalities and characteristics of tools and data sets. Gaining an understanding of barriers may help in overcoming them and enhance the use of interactive visualizations for Big Data, which is highly relevant for sense-making in a data-driven environment. Further, this is one of the few studies that concentrate on actual users and their needs rather than introducing and promoting new visualization options, which is current practice in the field of InfoVis.
Empirical evidence is collected using a quantitative questionnaire distributed to accounting professionals in Austria. In total, 145 evaluable responses from a broad variety of business sectors were obtained and used for analysis. To distinguish the visualization types under investigation, consider the following figures: Figure 1 represents visualizations encountered in everyday life (also the main types used by accounting professionals in the past; in the further course of this paper defined as type I visualizations), while Figure 2 represents new and more complex visualization types, designed to be used interactively and to handle large structured and unstructured data sets, which are henceforth called type II visualizations.
2. Theoretical Background
The profession of accounting has been handling large data sets for a long time; however, the integration of various data sources increases the necessity to change current evaluation and reporting practices. Big Data is not only increasing the volume of data for analysis, but also its variety and velocity. Visualizations, and in particular new, complex type II visualizations, are an essential component of identifying relationships, outliers and patterns within these large structured and unstructured data sets. However, based on previous studies, it seems that newer and interactive visualizations face barriers inhibiting their widespread use.
According to the literature on adoption and/or resistance in an information systems context, barriers manifest if the costs outweigh the benefits. In a first step toward ascertaining whether type II visualizations are a vain attempt by domain experts, we evaluate the "benefits". Hence, we provide evidence for the need to adapt existing and frequently used visualizations to account for changes in task and data characteristics. In a second step, we investigate potential reasons that increase the "costs". These seem to arise from inherent factors of the technology (e.g. a poor visualization design, no access to interactive type II visualizations, the necessity of purchasing software) and from factors associated with human-computer interaction (e.g. poor interaction design, little or no experience with new visualization types).
2.1 Benefits of domain specific interactive visualizations
2.1.1 Information processing and visualizations
Visualizations per se have been recognized as useful for information processing since the 1970s. This is because visualizing information supports specific features of the data as well as various abilities of the decision maker. This interplay allows visualizations to be seen as a mode that speaks a unified language, supports the comprehension of large amounts of information, and enhances the human ability to detect patterns, trends and sequences. As a result, visualizations are said to boost information processing by relying on the human perceptual system, which is highly developed and allows multiple processes to be executed simultaneously.
However, for an efficient and effective usage of visualizations, they need to be adjusted depending on the situation and the user. The need for adjustment concerning task characteristics was conceptualized in the 1990s by the theory of cognitive fit; this theory states that efficient and effective decision-making can only be achieved if the external representation (the visualization handed to the decision maker or user) fits the user's internal representation (the mental representation the decision maker associates with the task). Otherwise, additional cognitive effort is needed for processing, which deteriorates decision-making quality. This theory has been adjusted multiple times, allowing for other important influences such as data and user characteristics to be recognized.
Acknowledging this fact has led to a broad bandwidth of visualization options. The bandwidth thereby ranges from visualizations encountered in everyday life such as pie, line, or bar charts (for examples see Figure 1) to domain specific multi-dimensional and interactive visualization types such as parallel coordinates, scatterplot matrices, or force-directed graphs (for examples see Figure 2). Each visualization type is introduced because it particularly supports specific tasks and data sets in order to enhance decision-making.
Figure 1. Visualizations used in everyday life (type I visualizations)
Figure 2. Visualizations designed to cope with large structured and unstructured data sets (type II visualizations)
2.1.2 Domain specific visualizations
Consequently, when looking at big, complex and unstructured data sets it seems reasonable to adjust visualization practice accordingly. Two competing strategies (and a combination of both) can be found in the context of visualizing these data sets:
- first, the use of new and interactive type II visualizations; and
- second, the use of type I visualizations, particularly in an interactive form in combination with computer-supported aggregation techniques to report summary data.
With respect to interactive type II visualizations, new forms such as sunburst, force-directed graph, treemap, heatmap, parallel coordinates, etc., have been generated on a regular basis. Type II visualizations are designed such that larger amounts of information can be presented to the user, and each type is generated to emphasize particular features of the underlying data set. These visualizations are therefore created to serve a precise purpose and convey a specific message. This allows insights to emerge that would otherwise have remained hidden. For example, in fraud detection, small amounts of continuous money outflow regularly stay unrecognized because they are hardly visible at an aggregated reporting level. However, with a Sankey chart or parallel coordinates, every single transaction can be plotted, allowing unauthorized payments to be detected with a considerably enhanced probability (see the sketch below). These charts show connections and therefore visually indicate problems with missing or wrong classifications. However, these advantages also have implications for perceptual processing. It has been hypothesized that type II visualizations increase the need for domain knowledge during the sense-making process. Given users' lack of experience with the interaction techniques and the visualization in use, information processing is considered to be impaired by the increased risk of inducing a state of information overload through the complexity and breadth of the displayed data. Additionally, showing the full bandwidth of data can lead to an overlap of data points in the visual representation and further impair information processing.
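To make the parallel-coordinates example concrete, the following is a minimal Python sketch (not part of the original study): it plots each transaction of a small, invented data set as its own line, so that a single anomalous payment stands out from the bundle. All column names and values are hypothetical.

```python
# Minimal sketch: plot individual transactions on parallel coordinates to
# surface outliers that aggregated reports would hide. All data invented.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

transactions = pd.DataFrame({
    "amount":       [120, 95, 110, 4900, 105, 98],     # payment amount
    "hour":         [10, 11, 14, 3, 9, 15],            # hour of day posted
    "vendor_score": [0.9, 0.8, 0.85, 0.2, 0.9, 0.88],  # hypothetical trust score
    "status":       ["ok", "ok", "ok", "suspicious", "ok", "ok"],
})

# Each line is one transaction; the 03:00, low-trust, high-amount payment
# diverges visibly from the bundle of ordinary payments.
parallel_coordinates(transactions, class_column="status",
                     color=["#999999", "#d62728"])
plt.title("Every transaction plotted individually")
plt.show()
```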
The mentioned negative effects of type II visualizations have led to a second focus, namely, type I visualizations extended to an interactive form. More precisely, the user can work with visualization techniques such as column charts but has the possibility to add filters, drill-down options or linking techniques in order to change or reduce the underlying data set. By actively clicking, scrolling and filtering the data, the user gains a deeper understanding of the relations within the data set. In this context, users can rely on already known and broadly used visualization options while being allowed to interact with the data set. This approach reduces the load imposed on the decision maker because the visual format and the volume of data presented at once stay the same; through interaction, however, access to the underlying and bigger data set is granted. Interactive type I visualizations provide the user with a general overview before interaction techniques are used to drill down or filter for further detail. The process of interaction is controlled by the user and only applied if he or she is interested in further details or captures anomalies or outliers in the presented overview. Negative aspects associated with this approach concern the level and the technique of aggregation. Aggregation can lead to dangerous misinterpretation through reliance on sums and averages, which can drastically reduce the likelihood of detecting anomalies and outliers as well as increase the risk of hiding interesting relationships within the data set. In the context of data exploration in particular, aggregated data should be avoided because the relationships within the data are still unknown.
2.1.3 Interactive visualizations
Both of the above-mentioned approaches have one commonality: interaction. With interaction, a limited amount of data is visible on the screen but at the same time, the user has the possibility to explore the whole data set. Allowing interactive features means providing control to the user by adjusting the values and properties according to their needs. Being allowed to choose what information to display and how to display it can increase understanding and comprehension. Dilla et al. summarize interactive visualization as being an "on demand visualization process that allows decision makers to navigate to selected data and display it at various levels of detail and in various formats". The user or decision maker can individually determine the sequence with which they want to explore the data. Interactive information visualization supports explorative data analysis to identify patterns and generate hypotheses. Moreover, the evaluation process can start according to Shneiderman's mantra: "overview first, zoom and filter, then details-on-demand" and therefore mitigate problems in connection with information overload.
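As a deliberately simple illustration of Shneiderman's mantra, the following sketch (ours, not the authors') walks through the three steps with plain pandas rather than an interactive tool; the data set and column names are invented.

```python
# "Overview first, zoom and filter, then details-on-demand" with pandas.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "West"],
    "product": ["A", "B", "A", "B", "A"],
    "revenue": [120.0, 80.0, 200.0, 30.0, 95.0],
})

# 1. Overview first: an aggregated summary of the whole data set.
print(sales.groupby("region")["revenue"].sum())

# 2. Zoom and filter: the user narrows down to an interesting subset.
south = sales[sales["region"] == "South"]

# 3. Details-on-demand: individual records are shown only on request.
print(south.sort_values("revenue"))
```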
2.2 Cost of domain specific interactive visualizations
2.2.1 Human-related factors
In order to successfully interpret visualizations in a Big Data context and to arrive at comprehensive insights, the user needs the expertise to decode the visualization and to use the provided interactive skillset. As explained before, during the sense-making process, an internal representation is created, which at best corresponds to the external representation. This internal representation stands for the optimum solution a user can think of when confronted with a particular problem. Consequently, the internal representation is dependent on similar task-related performances that the user has experienced previously. Visualizations that were helpful in a similar situation are considered as options. Therefore, an internal representation of a specific visualization can only exist if the user has experience with this particular visualization.
The importance of experience can further be explained by examining cognitive load theory. The theory states that one needs schemes stored in long-term memory in order to process information effectively and efficiently. In other words, the higher the experience of the user with a specific visualization, the better the construction of the corresponding schema in the user's long-term memory and the higher the probability that the appropriate visualization pops up as an internal representation. If no schema exists, processing is inhibited and users do not feel sufficiently supported, and consequently tend to dislike or even oppose the proposed visualization options.
Further, in order to create schemas for future processing, cognitive effort needs to be directed toward learning and rehearsal. On the one hand, this means that the user needs to be confident that directing cognitive resources toward learning is worth the effort (the benefits outweigh the costs); on the other hand, additional investment costs for sufficient training and support are incurred in the initial implementation phases. Visualizations, as well as the possibilities of interacting with specific visualizations, need to be explained and presented. Strategies for efficient and effective learning from cognitive load theory can be applied in this context (e.g. worked examples).
2.2.2 Technology-related factors
From a more technical perspective, Big Data is the collection of large data sets, which can also show a great diversity of data types. The biggest challenge is the efficient use of semi-structured and unstructured data sources (e.g. text, image and video). In this context, a study conducted by IBM showed that the integration of various data sources is on the rise and already common practice, especially when it comes to geo-location based and sensor-based data. Unlike structured data, which can be processed rather easily with traditional relational database management systems (ERP), semi-structured or unstructured data require specific tools for comprehensive data preparation (e.g. parsing, indexing) and analysis. Subsequently, the challenge arises through linking multiple data sources that were initially used as stand-alone silos. Consequently, the physical merging of various data sources calls for adequate technical support. Using multiple data sources therefore not only increases the complexity during decision-making processes but also calls for investment in data storage technologies and analytic tools.
Looking at the annual study conducted by Gartner, we can also observe a change in the front-end products offered. New tools for easy integration of various sources, as well as for adapting reporting and planning practice, are offered by almost all major players (e.g. Microsoft, SAP, Oracle). With the increased usage of these tools by practitioners, the market is currently shifting from traditional ERP systems and their standardized, static reporting practices toward online platforms and self-service. Furthermore, these tools also allow for the integration of interaction and type II visualizations. Easy access and widespread utilization of such tools could drastically increase familiarity and experience with these new visualization types.
2.3 Hypotheses development
As discussed in the theoretical background, two competing strategies can be associated with the use of Big Data: interactive type I and interactive type II visualizations. From a human-related perspective, it seems reasonable to keep using type I visualizations, as doing so requires lower cognitive effort during analytical decision-making processes. Problems with visualizations, and especially with type II visualizations, emerge if no schemas for reading them are available in long-term memory. The large amount of data in combination with the unusual layout can lead to information overload and consequently impair processing; as a state of information overload is perceived as stressful and unpleasant, the consequence should be a reduced perceived usability of type II visualizations. This leads to the following hypotheses:
H1. There is no difference in use between interactive type I and interactive type II visualizations.
H2. The lower the use of type II visualizations, the lower their perceived ease of use (EoU).
Interaction is an essential part of the sense-making process and enhances the user's processing capabilities. Multiple options have been developed, such as filtering, zooming, distorting, as well as linking and brushing. However, research also suggests that too many and too complex interaction techniques (e.g. zooming, drill-through, and multiple views of the same data) can negatively affect users. This is the case because the process of interaction is a rather new and challenging concept, and dealing with a large number of options again increases the risk of information overload. Therefore, in the first stages of implementation, the focus should be placed on simple interaction techniques that have also been used in previous reporting systems (e.g. filtering, drag and drop). As most companies are still in the early stages of implementing interactive visualization techniques, simple ones should result in higher performance and subsequently in higher preference and usage:
H3. There is no difference in use between simple and advanced interaction techniques.
H4. The lower the use of interaction techniques, the lower their perceived EoU.
Being able to interpret visualizations and use interaction techniques in an efficient and effective manner has a significant influence on preferences and subjective assessment. If no schema is available, either information processing is inhibited or additional cognitive resources must be directed toward processing and the creation of schemas. Directing additional resources, however, demands higher engagement and a motivated user. Confirming this line of argument, Grammel et al. showed in their observational study a strong selection bias of users toward already known visualizations, even at the cost of performance. They concluded that, when confronted with both type I and type II visualizations, only those familiar with the newer options turned to type II visualizations. The more extensive use of type I visualizations can therefore be explained by the fact that the related schemas have already been created: these visualizations have been around much longer and are integrated into everyday use. We therefore propose the following hypotheses (Figure 3):
Figure 3. Research model human-related barriers
H5a. The lower the familiarity with type II visualizations, the lower their use.
H5b. The lower the familiarity with type II visualizations, the lower their perceived EoU.
H6. There is no difference in familiarity between type I and type II visualizations.
It has been shown that type II visualizations and interaction techniques are helpful in the early stages of data exploration, especially for analyzing semi-structured and unstructured data sets. For conventional data sets from structured and internal ERP systems, type I visualizations, which have been employed for centuries, are still a viable option. Consequently, it seems reasonable that type II visualizations as well as interaction techniques are only integrated if a high proportion of semi-structured and unstructured data is used:
H7. The lower the number of various data sources, the lower the use of type II visualizations.
H8. The lower the number of various data sources, the lower the use of interaction techniques.
Current studies suggest that Microsoft Excel is today's most popular generic data analytics tool. Microsoft Excel is widely available at low cost; however, it does not provide sufficient possibilities to integrate interaction, and it fails to adequately support the creation and usage of type II visualizations. Although type I visualizations can be created effortlessly in Microsoft Excel, more advanced visualization types are either impossible to produce or require considerable expertise. Thus, as the majority of users are neither visualization nor data analytics experts, new visualization types remain largely unknown or their implementation seems too demanding.
Following this line of argument, it seems plausible that with more accessible and usable tools, the community of users could possibly be extended beyond domain experts and novices could be enabled to actively work and interact with data. Tools for visual analytics such as Microsoft Power BI, Tableau, or Qlik have been introduced to the market, allowing the integration of domain specific and interactive visualizations into current reporting practices. We, therefore, propose that with the use of such tools barriers could be reduced, leading to our final hypotheses (Figure 4):
Figure 4. Research model technology-related barriers
H9. The lower the number of visualization tools used, the lower the use of type II visualizations.
H10. The lower the number of visualization tools used, the lower the use of interaction techniques.
3. Research Methods
The data collection was completed by the end of 2016 with an online survey. The questionnaire was distributed to participants at the "Controlling Insights Steyr" event, where 337 business practitioners and leaders from 192 different companies took part. This event annually gathers managerial accountants within Austria and therefore provides a solid basis for analyzing the status quo of reporting practice in this area. Of the 337 attendees, 105 participated in the survey. To increase the data sample, the questionnaire was additionally sent out, via Facebook, to alumni of an Austrian economic university. The university has a study program specifically designed for managerial accounting and hence includes the target audience needed for this analysis. This two-step sampling approach resulted in 145 evaluable responses from a broad variety of business sectors.
3.1 Questionnaire
The questionnaire started by introducing the purpose of the study and depicting the various visualization types under investigation. The visualizations used are clustered into type I (business graphics encountered in everyday life) and type II visualizations (geographical, hierarchical, multi-dimensional, network, and text-/web-based visualizations). With respect to these options, participants had to answer whether those visualizations are in use. Additionally, they rated familiarity with each type on a seven-point Likert scale.
For collecting information on interaction techniques, participants were asked to indicate their use. Classifications are made based on the discussion in the theoretical background and other widely cited studies. The following interaction techniques, clustered into simple and advanced techniques, have been used:
- Simple interaction techniques:
  - assigning data to axes (drag and drop, drop-down);
  - filtering the data; and
  - assigning color and symbols to data (brushing and highlighting).
- Advanced interaction techniques:
  - selecting data points for further analysis (zooming, drill-through); and
  - multiple views of the same data.
As an indication of whether interactive type II visualizations are a useful concept, we used the construct of perceived EoU from the technology acceptance model introduced by Fred D. Davis in 1989, which is regularly cited and well recognized in the information systems literature. Per this definition, perceived EoU is "the degree to which the prospective user expects the target system to be free of effort". More precisely, the participants had to indicate on a seven-point Likert scale whether:
- interactive visualizations are easy to understand;
- they support the comprehension of content;
- they decrease task difficulty; and
- they increase working performance.
By analyzing these questions, we were able to derive indications of the perceived benefits of interactive type II visualizations. These can then be contrasted with the costs of adoption.
With respect to the tools used, options based on frequently used tools of the Gartner Magic Quadrant 2016 were chosen: Qlik, Microsoft Power BI, Tableau, R, SAS and in-house developed software. In addition, an option for others was presented to the participants. For the list of data sources offered, we referred to the IBM report. The options given were ERP, economic data, geographical data, web analytics, social media, and sensor data (IoT). Additionally, we provided the option others.
Finally, demographic information was collected; the results are summarized in the next subsection.
3.2 Demographic information
Table I provides an overview of the respondents, clustered by business sector, Table II summarizes the positions held within the company and Table III summarizes age as well as gender. The tables demonstrate coverage of numerous business sectors and therefore support the generalizability of our results across industries. The high proportion (about 50 percent) of participants in management positions indicates high-quality data.
Table I Overview of respondents, clustered by business sector
| Business sector | No. | % |
|---|---|---|
| Manufacturing | 55 | 37.9 |
| Services | 14 | 9.7 |
| Finance, insurance and real estate | 10 | 6.9 |
| Construction | 6 | 4.1 |
| Wholesale and retail trade | 13 | 9.0 |
| Transportation, communications, electric, gas and sanitary services | 5 | 3.4 |
| Public administration | 11 | 7.6 |
| Not specified | 31 | 21.4 |
| Total | 145 | 100.0 |
Table II Overview of respondents, clustered by positions
| Position within the company | No. | % |
|---|---|---|
| Top-management | 25 | 17.2 |
| Middle management | 33 | 22.8 |
| Lower management | 14 | 9.7 |
| Employee | 41 | 28.3 |
| Not specified | 31 | 21.4 |
| Total | 145 | 100.0 |
Table III Overview of respondents, descriptive statistics
| Descriptive statistics of participants | No. | % |
|---|---|---|
| Male | 79 | 54.5 |
| Female | 35 | 24.1 |
| Not specified | 31 | 21.4 |
| 20-30 | 35 | 24.1 |
| 31-40 | 36 | 24.8 |
| 41-50 | 28 | 19.3 |
| ≥ 51 | 14 | 9.7 |
| Not specified | 32 | 22.1 |
| Total | 145 | 100.0 |
3.3 Data analysis
For data analysis, we coded answers in Microsoft Excel and carried out the statistical analysis in SPSS. To evaluate visualization and interaction use as well as data source and visualization tool utilization, a graphically modified table provides information about which options, and how many, are currently in use. Following the descriptive statistics, the Kruskal–Wallis test was applied for significance testing. Furthermore, analyses of variance (ANOVA) and Pearson correlations were conducted to determine differences between groups and correlations, respectively. Details on the conducted analyses are provided in the respective results sections.
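For readers who want to reproduce this kind of test battery outside SPSS, a minimal SciPy sketch is shown below; the arrays are invented stand-ins for coded survey answers, not the study's data.

```python
# Hedged sketch of the reported tests using SciPy instead of SPSS.
from scipy import stats

# Invented "use" (1) / "no use" (0) answers for three visualization types.
group_a = [1, 0, 1, 1, 0, 1]
group_b = [0, 0, 1, 0, 0, 1]
group_c = [1, 1, 1, 0, 1, 1]

# Kruskal-Wallis: rank-based test for differences across several groups.
h_stat, p_kw = stats.kruskal(group_a, group_b, group_c)

# One-way ANOVA: tests for differences in group means.
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

# Pearson correlation between two counts (e.g. type II use vs. interaction use).
r, p_corr = stats.pearsonr([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])

print(f"Kruskal-Wallis p={p_kw:.3f}, ANOVA p={p_anova:.3f}, "
      f"Pearson r={r:.2f} (p={p_corr:.3f})")
```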
3.4 Limitations
Due to the data collection with a standardized survey, there was no possibility of dealing with participants individually, which could lead to a misinterpretation of the given answers. To proactively avoid a misinterpretation of questions, we attempted to phrase all questions unambiguously and provided introductory information about the purpose of the survey. Pre-tests and interviews with five participants were conducted before launching the study. Further, the sample population is Austrian, which might cause issues in terms of generalizing results to other geographical areas or cultural contexts.
4. Results
This section is structured in accordance with the goals of this study. First, the status quo in Austria is presented and discussed (Section 4.1) before a deeper analysis of possible barriers is conducted. These barriers are divided into human-related (Section 4.2) and technology-related factors (Section 4.3), as discussed in previous sections.
4.1 Status quo in Austria
4.1.1 Visualization use
For the different visualization types, participants had to answer if the presented types are in use within their companies. "Use" is coded with 1 and "No use" with 0. Answers provided for the various visualization types are presented in Figure 5, which is ordered by the number of visualization types in use. The color code provides additional information and highlights the most common combinations in black and the least common combinations in light gray.
Figure 5. Utilization of visualization types
The most frequently utilized visualizations are business graphics (e.g. line, bar and pie) or type I visualizations, which are applied by 93.8 percent (136 out of 145), followed by geographical visualizations (34.5 percent; 50 out of 145). Common combinations are business graphs with geographical or multi-dimensional visualizations. One noteworthy finding is that 40.7 percent base their analysis solely on type I visualizations. A significant difference between these visualization types can be detected (Kruskal–Wallis test).
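To illustrate the 1/0 coding and the combination counts that a figure like Figure 5 is built on, here is a small pandas sketch; the answers and column names are invented, not the study's data.

```python
# Sketch of the "use" (1) / "no use" (0) coding and combination counts.
import pandas as pd

answers = pd.DataFrame({
    "business_graphics": [1, 1, 1, 0, 1],
    "geographical":      [1, 0, 1, 0, 0],
    "multi_dimensional": [0, 0, 1, 0, 1],
})

# How often does each combination of visualization types occur?
print(answers.value_counts().reset_index(name="respondents"))

# Share of respondents using each type at all (cf. the percentages above).
print((answers.mean() * 100).round(1))
```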
The following table presents possible influences on the use of type II visualizations. We checked whether gender, age, position held within the company, company size or industry has an effect on utilization. Only gender shows a significant difference: men use type II visualizations more often than women, with a difference in means of 0.406.
Table IV Possible influences on use of type II visualizations
| | Gender ANOVA | Age correlation | Position ANOVA | Company size correlation | Industries ANOVA |
|---|---|---|---|---|---|
| Type II use | p = 0.040 | p = 0.210, r = 0.118 | p = 0.427 | p = 0.483, r = 0.066 | p = 0.103 |
4.1.2 Interaction use
Analysis on the use of various interaction techniques is presented in Figure 6. This analysis shows that the utilization ranges from 86 answers (67.7 percent) for filtering as the most common technique to 27 (21.3 percent) for the selection of data points as the least common one. Overall, 85.8 percent use at least one interaction technique and most of them use a combination of two interaction techniques.
Figure 6. Utilization of interaction techniques
As type II visualizations are recommended to be used interactively, we check for a correlation between the use of type II visualizations and interactions. Using Pearson's correlation, a significant relationship (p=0.003; r=0.246) is evident; this indicates that practitioners in the field of accounting take note of this recommendation. Additionally, we have ascertained whether interaction use is influenced by any of the variables mentioned in Table V. No significant results can be derived for either of the tested possible influences.
Table V Possible influences on interaction use
| | Gender ANOVA | Age correlation | Position ANOVA | Company size correlation | Industries ANOVA |
|---|---|---|---|---|---|
| Interaction use | p = 0.990 | p = 0.597, r = 0.050 | p = 0.138 | p = 0.544, r = -0.057 | p = 0.112 |
4.1.3 Data source use
New technologies in data collection and data storage allow various forms of data to be analyzed for deeper insights. This section therefore analyzes how many and how intensively such sources are currently used. For this analysis, 29 participants did not provide answers, reducing the basis to 116 participants. The analysis presents data on "Use" and "No use," coded with 1 and 0, respectively.
Although ERP systems top the list (89.6 percent), they are closely followed by economic data from external databases (81.0 percent). Data more likely to be clustered as semi-structured or unstructured, such as IoT and social media, are also quite common: IoT and social media are used by 52.6 percent and 53.4 percent, respectively. A notable point is that one-third of the participants use a combination of all mentioned data sources, and more than 50 percent use at least a second data source in addition to ERP. In order to test for significant differences between the ranks, the Kruskal–Wallis test has been used for an overall investigation (p=0.000), which is significant.
None of the collected demographic variables show a significant influence on the data sources in use. Gender, age and position are not tested because whether and how many data sources are used is determined by the company as a whole rather than by a single person (Table VI).
Table VI Possible influences on data sources in use
| | Gender | Age | Position | Company size correlation | Industries ANOVA |
|---|---|---|---|---|---|
| Data sources in use | n/a | n/a | n/a | p = 0.354 | p = 0.989 |
| Data sources in use excluding ERP | n/a | n/a | n/a | p = 0.285 | p = 0.995 |
4.1.4 Visualization tools use
With respect to visualization tools, we focus on current top-selling software products. This analysis begins by inspecting the utilization of the various products, which are summarized in Figure 8.
Figure 8. Utilization of software tools used
By far the most commonly used tool is Microsoft Excel, which is the basis of analysis for 84.8 percent of the companies represented in this survey (or 96.9 percent if those not providing any answer are excluded). On average, 1.5 tools are used, with the combination of Microsoft Excel together with Qlik or Microsoft Power BI constituting the most common examples. Under "other software tools," participants stated, for example, IBM Cognos, SAP BI, MicroStrategy, Jedox, or Infor. The Kruskal–Wallis test indicates that there is a significant difference in usage between the different visualization tool options (p=0.000). With respect to the tested additional variables, only industries show a significant result. A high variety of tools is used in the service industry as well as in wholesale and retail trade. Industries relying mainly on Microsoft Excel are finance and public administration (Table VII).
Table VII Possible influences on tools in use
| | Gender | Age | Position | Company size correlation | Industries ANOVA |
|---|---|---|---|---|---|
| Tools in use | n/a | n/a | n/a | p = 0.885 | p = 0.012* |
| Tools in use without Microsoft Excel | n/a | n/a | n/a | p = 0.723 | p = 0.003** |
Notes: *p<0.05; **p<0.01
4.1.5 Summary of status quo
In Austria, more than 50 percent of the participants in the discipline of accounting stated that they use type II visualizations (with geographical visualizations being the most frequently used type) and 85.8 percent indicated the use of interaction techniques to some extent. Filtering, one of the simplest interaction techniques, is used most frequently. Moreover, the use of type II visualizations is positively correlated with the use of interaction, following the recommendation of domain experts to use type II visualizations in an interactive form. Given that we can observe different stages of adoption, we have a solid basis for testing reasons for resistance. These reasons (or barriers) are discussed in detail in the following subsections.
In the context of technical advances, we can observe that the use of various data sources besides traditional ERP systems (representing mainly structured data sets) seems to be common. With an average of 3.9 data sources in use and with the inclusion of semi-structured and unstructured data sets such as IoT or social media, it can be concluded that the integration of Big Data into financial analysis has arrived in practice. With respect to visualization tools, Microsoft Excel is still the most common one; however, other tools are also used quite frequently. Big Data, therefore, is no longer a catchphrase; it has already started to change practices and tools in the management accounting profession. Interestingly, the use of tools besides Microsoft Excel is significantly influenced by industry. While companies in the service industry (including advisory) have a high rate of adoption, companies in traditional finance domains (banking, insurance) seem to resist the use of other visualization software tools such as Microsoft Power BI or QlikView.
4.2 Human-related barriers
4.2.1 Resistance to new visualization types
To answer the first hypothesis, we analyzed the difference in use between interactive type I and interactive type II visualizations. Consequently, we separated our data file into those using at least one kind of interaction and those not using any interaction at all. In sum, 109 participants indicated that they use interaction, which is the basis for our comparison of interactive visualization type use. Thereof, 41 participants indicated that they only use type I, two participants stated that they only use type II, 62 use a combination of type I and type II, and four use none of the presented visualization options (most likely they are using tables with filtering options). Significance testing based on the Mann–Whitney U test shows that interactive type I visualizations are used more often than interactive type II visualizations (p=0.000), rejecting our null hypothesis (H1) of no difference in usage. Therefore, we can detect a resistance to change when it comes to the adoption of newer and probably unfamiliar types of visualizations.
4.2.2 Resistance to new interaction techniques
The logical next step is to analyze whether users are similarly resistant to new interaction techniques. Again, the basis for the calculation is the 109 participants stating that they use interaction. Thereof, only three indicate using solely advanced interaction techniques, 61 indicate using solely simple interaction techniques and 45 indicate using a mix of both. The difference based on the Mann–Whitney U test shows that simple interaction techniques are used significantly more often than advanced ones (p=0.000). Based on these results, our null hypothesis (H3) can be rejected.
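For reference, the Mann–Whitney U comparison can be sketched in a few lines of SciPy; the two samples below are invented per-respondent counts, not the study's raw data.

```python
# Sketch of a Mann-Whitney U test comparing two usage distributions.
from scipy.stats import mannwhitneyu

simple_counts   = [3, 2, 3, 1, 2, 3, 2]  # simple techniques in use per respondent
advanced_counts = [0, 1, 0, 0, 1, 0, 1]  # advanced techniques in use per respondent

u_stat, p_value = mannwhitneyu(simple_counts, advanced_counts,
                               alternative="two-sided")
print(f"U={u_stat}, p={p_value:.4f}")  # small p: usage differs between groups
```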
H3. There is no difference in use between simple and advanced interaction techniques.
4.2.3 Perceived EoU
To test H2 and H4, perceived EoU is measured. The hypotheses propose a correlation between the use of multiple type II visualizations, as well as the use of multiple interaction techniques, and the construct perceived EoU. The correlations are calculated using Pearson's correlation coefficient, while Cronbach's α is used to test the internal reliability of the construct (a computational sketch follows below). Cronbach's α for perceived EoU is 0.767 and therefore above the 0.7 threshold. The mean level of agreement for the four questions described in Section 3.1 lies between 4.52 and 5.53, which is well above average. This indicates a medium to high perceived EoU for interactive visualizations. For the correlation analysis, a sum score of the four items is used, with the results presented in the following table (Table VIII).
H2. The lower the use of type II visualization, the lower their perceived ease of use (EoU).
H4. The lower the use of interaction techniques, the lower their perceived EoU.
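For reference, Cronbach's α can be computed as sketched below; the response matrix is invented (rows are respondents, columns are the four seven-point Likert items) and merely illustrates the calculation.

```python
# Minimal sketch of Cronbach's alpha for a four-item construct.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of sum score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

responses = np.array([  # invented seven-point Likert answers
    [6, 5, 5, 6],
    [4, 4, 5, 4],
    [7, 6, 6, 7],
    [5, 5, 4, 5],
    [3, 4, 3, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.3f}")  # > 0.7 acceptable
```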
Table VIII Results of perceived EoU
| | Type II count | Interaction count |
|---|---|---|
| Perceived EoU | | |
| Pearson | 0.279** | 0.220* |
| Sign. | 0.002 | 0.018 |
| n | 116 | 116 |
Results indicate that the higher the use of type II visualizations, the more likely participants are to perceive them as helpful in their daily work. Furthermore, the strength of the relationship between usage and perceived EoU, with a coefficient of 0.279, can be classified as moderately strong. The same seems to be true for the use of multiple forms of interaction; therefore, H2 and H4 can be confirmed.
4.2.4 Familiarity
Visualization types were rated according to the participants' familiarity on a seven-point Likert scale, where 1 represents no familiarity and 7 indicates high familiarity. We included familiarity in our study because visualization types could still be known even if they are not used. Results based on ANOVA and a post hoc SNK test are presented in the following table (Table IX).
Table IX ANOVA familiarity with different chart types (seven-point Likert scale)

| ANOVA and post hoc SNK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Average | Significant sub-groups |
|---|---|---|---|---|---|---|---|---|---|
| Business graphics | 0 | 3 | 1 | 1 | 2 | 22 | 116 | 6.669 | 1.000 |
| Geographical Vis | 6 | 10 | 15 | 16 | 43 | 36 | 19 | 4.821 | 1.000 |
| Multi-dimensional Vis | 19 | 29 | 35 | 18 | 27 | 14 | 3 | 3.986 | 1.000 |
| Text-/Web-based Vis | 14 | 19 | 23 | 21 | 43 | 17 | 8 | 3.407 | 0.210 |
| Network Vis | 25 | 26 | 36 | 25 | 23 | 8 | 2 | 3.186 | |
This analysis demonstrates that type I visualizations are by far the most familiar visualization types, which is in line with the high utilization presented in the previous analysis. Based on these results, we can reject H6, which posited no difference in familiarity between type I and type II visualizations. However, based on the aggregated average of all type II visualizations, it seems that the majority of participants have at least some experience with type II visualizations. Only five participants indicated that they are not familiar at all (an average score of 1 across type II visualizations) and 10 indicated that they are mostly unfamiliar (an average score of 2). In contrast, 49 participants indicated above-average familiarity (above 5).
Again, we test for possible influences of the variables collected (gender, age, position held within the company, company size and industry). Significant differences can be observed between industries: in services and public administration we identify a high familiarity, while for the transportation, communications and electric industries a low familiarity is evident. In addition, there is an indication of higher familiarity depending on position: participants in higher positions (top or middle management) are more familiar with type II visualizations than participants in lower positions (lower management or employees) in management accounting. The results are presented in Table X.
Table X Possible influences on familiarity with type II visualizations

| | Gender ANOVA | Age correlation | Position ANOVA | Company size correlation | Industries ANOVA |
|---|---|---|---|---|---|
| Familiarity_Average | p = 0.406 | r = -0.034, p = 0.722 | p = 0.077 | r = -0.161, p = 0.088 | p = 0.045 |
| Familiarity_Sum | p = 0.398 | r = -0.047, p = 0.621 | p = 0.074 | r = -0.154, p = 0.103 | p = 0.076 |
To test H5a and H5b, an analysis of the correlation between familiarity and type II utilization as well as familiarity and the perceived EoU has been conducted. The results show a strong positive correlation for both usage and perceived EoU. Therefore, both hypotheses can be confirmed.
H5a. The lower the familiarity with type II visualizations, the lower their use.
H5b. The lower the familiarity with type II visualizations, the lower their perceived EoU.
Table XI Results of familiarity
| | Type II count | Perceived EoU |
|---|---|---|
| Familiarity_Average | | |
| Pearson | 0.340** | 0.350* |
| Sign. | 0.000 | 0.000 |
| n | 145 | 116 |
| Familiarity_Sum | | |
| Pearson | 0.330** | 0.372** |
| Sign. | 0.000 | 0.000 |
| n | 145 | 116 |
4.2.5 Summary of human-related barriers
Based on this analysis, we can state that a medium degree of familiarity with type II visualizations is already present in practice. However, only if type II visualizations are used as intended (in combination with interaction techniques) can they release their full potential and enable users to benefit from their use. The lack of willingness to deal with more advanced interaction techniques negatively affects the use of more complex type II visualizations. It is necessary to increase familiarity with both type II visualizations and advanced interaction techniques in order to achieve more widespread usage across industry sectors. As soon as this initial barrier is crossed and participants are familiar with type II visualizations, the perceived EoU will also be positively influenced and thus frequency of use will be enhanced. This last part is essential, as it indicates that type II visualizations are not dispensable: they are considered useful by those who are knowledgeable. The barrier lies in introducing the new options to their user base in an appropriate manner.
4.3 Technology-related barriers
4.3.1 Data sources
Semi-structured or unstructured data sets are mainly connected to economic data, web analytics, social media data, and sensor data, while structured data sets are related to traditional ERP systems. In this analysis, we want to check whether a high usage of semi-structured or unstructured data sets correlates with the likelihood of turning to type II visualizations as well as to a higher number of interaction techniques.
The results in Table XII indicate that a moderate correlation for type II visualizations can be found, while for interaction no correlation exists. Therefore, H7 can be confirmed while H8 needs to be rejected.
Table XII Results on data sources
| | Type II count | Interaction count |
|---|---|---|
| Source Count | | |
| Pearson without ERP | 0.236** | 0.179 |
| Sign. | 0.011 | 0.055 |
| n | 116 | 116 |
H7. The lower the number of various data sources, the lower the use of type II visualizations.
H8. The lower the number of various data sources, the lower the use of interaction techniques.
4.3.2 Visualization tools
The relation between the number of visualization tools in use and the number of different interaction techniques and type II visualizations is analyzed using Pearson's correlation. While there is an effect of the number of tools in use on type II visualizations, there is no effect on interaction count. Therefore, H9 can be confirmed, while H10 needs to be rejected. This analysis is additionally calculated with and without Microsoft Excel, as Excel does not provide sufficient support for either type II visualizations or advanced interaction techniques.
H9. The lower the number of visualization tools used, the lower the use of type II visualizations.
H10. The lower the number of visualization tools used, the lower the use of interaction techniques.
Table XIII Results of visualization tools
| | Tools count | Tools without Microsoft Excel |
|---|---|---|
| Type II count | | |
| Pearson | 0.330** | 0.345* |
| Sign. | 0.000 | 0.000 |
| n | 127 | 127 |
| Interaction count | | |
| Pearson | 0.132 | 0.141 |
| Sign. | 0.139 | 0.113 |
| n | 127 | 127 |
To extend the tool-based analysis, we also examine those who base their analysis solely on Microsoft Excel. Using ANOVA (the split variable being sole Microsoft Excel use compared to a combination of tools in use) to calculate the difference in the number of type II visualizations used, it is evident that there is significantly less usage among participants using only Microsoft Excel (p=0.000). The sole use of Microsoft Excel can therefore itself be seen as a barrier. No significant results can be obtained for interaction techniques.
4.3.3 Summary of technology-related barriers
In the context of technology-related barriers, we can first and foremost identify the sole focus on Microsoft Excel as a barrier. The use of different data sets that are semi-structured or unstructured in nature can be identified as an enabler or driver: the more such data sets are integrated into traditional reporting and management information systems, the higher the likelihood of type II visualizations being employed. The same can be said about visualization tools, as their use also increases the usage of type II visualizations.
With respect to interaction, no correlations can be found with either data sets or visualization tools. More advanced interaction techniques are not applied even when data sets require, or visualization tools offer, their use.
5. Conclusion
In this paper, we describe the state of the art of visualization use in Austrian companies from an accounting-related perspective with a focus on Big Data. In particular, we concentrate on novel interactive visualization options (type II visualizations) and analyze their impact on utilization and preference. An analysis of interactive type II visualizations is important because the need to integrate large structured and unstructured data sets into current reporting practices is rising, theoretically favoring their use. Therefore, our objective was to document the current state of adoption with respect to visualization types and interaction techniques, to understand to what extent semi-structured and unstructured data sources are already used by accounting professionals, and how these data sources and structures influence visualization practice and preferences. The latter allows us to derive barriers and enablers for adoption, which are clustered into human-related and technology-related factors:
- Summarizing the status quo, multiple stages of adoption are evident; nevertheless, the majority of companies are still at the beginning. Concerning type II visualizations, a mix of types is used, with geographical visualizations topping the list. However, their use is still underrepresented compared to type I visualizations. For interaction techniques, filtering is by far the most frequently used technique, while more advanced techniques such as multiple coordinated views are rarely utilized. Unfortunately, some type II visualizations require advanced interaction techniques in order to unleash their full potential; regardless, using only simple interaction techniques limits the EoU of type II visualizations.
- With respect to data sources, traditional sources (ERP) are by far the most frequently used. ERP systems draw on internal data and can be associated with structured data. However, the introduction of additional semi-structured or unstructured data sources is considerable (e.g. the use of IoT or social media), and the higher their use, the higher the likelihood of including type II visualizations.
The results obtained in this study, and the fact that Big Data (increasing volume, variety and velocity of data sets) is being introduced into everyday business processes, show that dispensability of type II visualizations is not the cause of the identified gap between research and practice. This intermediate stage between actual adoption and possible resistance is a solid basis for our analysis and allows us to derive related factors and identify possible barriers.
- Both the lack of familiarity and the lack of knowledge with respect to new and interactive visualization options have been identified as human-related barriers. This can first be explained by the fact that type II visualizations are more complex and therefore increase the risk of information overload, which in turn results in a selection bias toward already known visualization types. Second, the tendency to rely on already known and rather simple interaction techniques aggravates this problem, because some of the more complex type II visualizations are inherently built on newer forms of interaction techniques. Only with the use of more advanced interaction techniques can these types release their full potential. Results on the proposed hypotheses in the context of human-related barriers are presented in Figure 9.
Figure 9. Results on research model human-related barriers
- Technology-related factors did not provide information on potential barriers, with the exception of the sole focus on Microsoft Excel. Instead, these factors seem to be enablers or drivers of adoption. The higher the use of various data sources, the higher the benefits and the necessity of using interactive type II visualizations. Additionally, the more tools in use, the easier the access to and inclusion of interactive type II visualizations. In terms of industries, the sole focus on Microsoft Excel is especially pronounced in transportation, communications and electric (100 percent) as well as in finance, insurance and real estate (60 percent) in our data sample. Results on the proposed hypotheses in the context of technology-related barriers are presented in Figure 10.
Figure 10. Results on research model technology-related barriers
This study is the first to show how the use of interactive visualization types can be boosted, namely, by technological support (tools) as well as by introducing new and interactive visualization options to the target audience in an appropriate manner. Knowledge of their use is key to enhancing their perceived EoU and, in turn, increasing their utilization. Education in accounting needs to incorporate interactive visualization into its curricula to foster appropriate and widespread use. Tools might also need to include educational support, e.g. short videos on the construction, operation and understanding of new visualization options, to increase usability, especially when users work with Big Data.
Already identified and promising areas for interactive type II visualizations in managerial accounting are fraud detection, records management and risk management. Further, conventional reporting practice (internal and external) could benefit a great deal, as these visualization types are also task- and data-optimized for the semi-structured and unstructured data sets that are increasingly being used. In conclusion, the mentioned gap between research and practice remains pronounced, possibly negatively affecting decision-making in a Big Data context. However, promising ways to overcome this gap have been identified and suggested in this paper.