A Survey of Visualizing Business Data
Site: | Saylor Academy |
Course: | BUS610: Advanced Business Intelligence and Analytics |
Book: | A Survey of Visualizing Business Data |
Description
By classifying business intelligence appropriately, we allow ourselves to spot opportunities for investment and exploitation, increasing our ability to turn the data and insight we collect into profit. Business intelligence and its research can be divided into a taxonomy. This paper breaks that down. Even without data, are there areas that may contain similar opportunities?
Abstract
A rapidly increasing number of businesses rely on visualisation solutions for their data management challenges. This demand stems from an industry-wide shift towards data-driven approaches to decision making and problem-solving. However, an overwhelming mass of heterogeneous data is collected as a result. The analysis of these data becomes a critical and challenging part of the business process. Employing visual analysis increases data comprehension, thus enabling a wider range of users to interpret the underlying behaviour, as opposed to only skilled but expensive data analysts. Widening the reach to an audience with a broader range of backgrounds creates new opportunities for decision making, problem-solving, trend identification, and creative thinking. In this survey, we identify trends in the business visualisation and visual analytics literature where visualisation is used to address data challenges, and we identify areas in which industries use visual design to develop their understanding of the business environment. Our novel classification of the literature includes three topics: business intelligence, business ecosystem, and customer centric. This survey provides a valuable overview of and insight into the business visualisation literature, with a novel classification that highlights both mature and less developed research directions.
Source: Richard C. Roberts and Robert S. Laramee, https://www.mdpi.com/2078-2489/9/11/285/htm
This work is licensed under a Creative Commons Attribution 4.0 License.
1. Introduction and Motivation
As businesses make the transition to digital solutions, they become overwhelmed by the volume of data they collect. The continued evolution of improved hardware propagates a cycle of collecting larger quantities of data at a lower cost. Despite the investment made to collect data, it still only accounts for a fraction of the process required for useful output. Interpretation of data is vital to unlock the potential value held within and to make the most informed decisions. Companies often employ teams of data analysts to achieve this. However, this can be very costly. In addition to the cost, it also limits the number of people who may understand or access the analysis. Employing visualisation approaches enables employees with a wide range of backgrounds to view and understand it. Opening the analysis to a wider audience encourages ideas and provokes discussion about the nature of the behaviour under investigation. This broadening of the audience highlights a unique benefit that data visualisation and visual analytics can offer.
Visualisation and visual analytics have the capability to overcome the challenges associated with large datasets and multidimensional relationships. In business, the holistic nature of "big picture" approaches is valuable, providing a complete overview of a scenario or situation. Pairing this requirement of large-scale data analytics with the capabilities of data visualisation produces fruitful and thought-provoking output.
The body of business visualisation literature is growing rapidly. During the IEEE VIS 2014 conference, a workshop entitled "Business" focused on the conversion of business data into meaningful visual insight which aids in better decision making. The workshop was so popular that a second workshop was held during the IEEE VIS 2015 conference entitled "From Data to Actionable Business Insights". In addition to these workshops, Computer Graphics and Applications published a special issue entitled "Business Intelligence Analytics".
The challenges associated with the transition to data visualisation are typically related to skill set, data scale, and ease of interpretation. Traditional visual designs such as simple bar charts and line plots are often unable to accommodate the scale and complexity of data. Many off-the-shelf software packages are created to address business visualisation requirements but often require specialised training to use or are incapable of conveying the proprietary data a company generates. To overcome these challenges, custom software can be developed to address them directly and reduce the training necessary to use it. According to a study by Gartner, the "Visualization and Data Discovery" market segment is the fastest growing area of Business Intelligence. Inspired by our collaboration with industry, we collected literature that addresses the challenges associated with visualising business data with the aim of understanding the processes involved and maximising business output. Our contributions include:
- The first Business Visualisation survey of its kind to our knowledge;
- An overview and classification of 70 published visualisation business papers;
- A novel categorisation of Business Visualisation literature supported by related literature sources;
- A reference for businesses looking to explore their datasets with visualisation; and
- The identification of both mature and immature research directions in this rapidly evolving field.
Although the relationship between academic research and business is often complicated due to conflicts in interests and goals, this survey shows that the two are not only compatible but have a vast potential for growth in their respective fields.
This paper can act as a reference for businesses wishing to explore their own data through visualisation. Utilising both the primary and secondary classifications used in this paper, the reader can find previously published visualisation research that explores data similar to their own.
1.1. Literature Classification
To develop a classification, we looked for predominant and recurring themes in the visualisation and visual analysis literature. Firstly, we selected papers that focus on the visualisation of business data designed for a practical business application. We then divided the papers into three primary categories. The top-level categories we used to classify the publications are:
- Business Intelligence
- Business Ecosystem
- Customer Centric
See Figure 1, Table 1, and Table 5 for an overview of the literature in these categories. Section 1.2 presents the inspiration behind this classification.
Figure 1. The top level hierarchical classification of the literature survey. Green classifications represent leaf nodes, while yellow represents umbrella classifications (see Section 1.1). Financial visualisation (red) is closely linked to this field, but has been published as a survey by Ko et al.
Table 1. This table shows our sub-classification based on data source. Divided into primary, secondary, and hybrid, the source shows the origin of data behind the research. Papers labelled in yellow are not summarised in detail due to space limitations but are still cited to be comprehensive. Papers labelled in green are summarised in the survey.
Type | Source | Business Intelligence | Business Ecosystem | Customer Centric | ||
---|---|---|---|---|---|---|
Internal Intelligence | External Intelligence | Business Ecosystem | Customer Behaviour | Customer Feedback | ||
Primary Data | Intentional, Active, Digital Collection | Otsuka et al. | Yaeli et al. Nagaoka et al. |
|||
Intentional, Active, Research Study Data | Burkhard Sedlmair et al. Kandel et al. Aigner Lafon et al. |
Bresciani and Eppler Bertschi Keahey |
Merino et al. Basole et al. |
Dou et al. | Brodbeck and Girardin | |
Hybrid Data | WebScrape | Ramesh et al. | Lu et al. | Shi et al. Sijtsma et al. |
Chen et al. Ziegler et al. Oelke et al. Wu et al. Hao et al. Saitoh Fayoumi et al. Haleem et al. Saga and Yagi |
|
Secondary Data | A Priori Database | Wright Vliegen et al. Bai et al. Nicholas et al. Roberts et al. Kumar and Belwal Roberts et al. |
Ferreira et al. | Wattenberg Wu and Phillips Basole et al. Basole et al. Deligiannidis and Noyes Basole et al. Iyer and Basole Schotter et al. Basole et al. |
Woo et al. Hanafizadeh and Mirzazadeh Kameoka et al. Wu et al. Sathiyanarayanan et al. |
Kang et al. |
Business Process | Du et al. Broeksema et al. Ghooshchi et al. Bachhofner et al. Lea et al. |
Hao et al. Hao et al. |
Basole Basole and Bellamy |
|||
Business By-product | Gresh and Kelton Eick Keim et al. |
Liu et al. | Otjacques et al. Ko et al. |
Rodden Nair et al. |
1.1.1. Business Intelligence (BI)
"The main task of business intelligence (BI) is providing decision support for business activities based on empirical information".
Papers that fall in this category aim to provide a visual design that improves the understanding of a business' internal or external environment. The emphasis is that the resulting visual system is created for the use of a single business as opposed to a whole economy or ecosystem. In the "Business Intelligence Guidebook", Sherman states that BI turns data into "actionable" information. It is this output that businesses strive for through whatever means are available to them. BI is seen as both a process as well as a saleable product. We identify two subcategories in this section: Internal Intelligence and External Intelligence.
Internal Intelligence (II)
Internal Intelligence involves the knowledge of internal business processes. Papers in this category often aim to improve business process efficiency or gain a better understanding of the internal structure of the company. This perspective is inward facing. For example, Kandel et al. explored the role that visualisation plays in day-to-day business operations. The focus is placed on a company's internal operations.
External Intelligence (EI)
External intelligence examines the business ecosystem from the perspective of a single business. The focus is often placed on the business's competitors to aid in competitive development or in identifying business operations outside the business's control. This perspective is outward facing. For example, Hao et al. used visualisation to explore fraud detection data in the banking industry.
1.1.2. Business Ecosystem (BE)
A Business Ecosystem is defined as:
"An economic community supported by a foundation of interacting organisations and individuals – the organisms of the business world. The economic community produces goods and services of value to customers, who are themselves members of the ecosystem. The member organisms also include suppliers, lead producers, competitors, and other stakeholders". To further understand the definition, Rothschild stated:"A capitalist economy can best be comprehended as a living ecosystem. Key phenomena observed in nature – competition, specialisation, co-operation, exploitation, learning, growth, and several others are also central to business life".
This topic encompasses research that focuses on an economic community. The literature here differs from business intelligence as the research aims to understand an economy from an external perspective instead of through the eyes of an individual business. The focus is on business networks and their surrounding environments. For example, Basole et al. presented an overview of the telecommunications industry using an in-house visualisation tool called dotlink360.
1.1.3. Customer Centric (CC)
Customer-centric literature focuses on visualising customer data. Businesses are moving towards a customer-focused method of operating. This focus ensures that the customers' interests are seen as the highest priority and therefore benefits the business through customer loyalty and superior product development. There are two sub-categories in this classification, based on customer feedback and customer behaviour.
Customer Behaviour (CB)
The Customer Behaviour sub-category examines potential customers and identifies patterns in their behaviour so that their actions may either be predicted or utilised to sell a product or service more effectively. The scope of behaviour is broad. Geospatial information can be used to inform the physical movement of people, online tracking data can be used to optimise website sale processes, and customer segmentation information can be used to estimate the future behaviour of customers. For example, Yaeli et al. visualised the movements of customers through physical retail stores using GPS tracking data. This customer path dependency falls into the customer behaviour category.
Customer Feedback (CF)
Customer Feedback research focuses on customers who have used a product or service and have provided feedback through any medium. Often surveys are used, and sentiment analysis is performed on the data to examine the feedback. In other scenarios, direct customer feedback is used (e.g., interviews). This feedback is highly important to customer-focused businesses because it provides vital insight into the reception of a product or service. For example, Oelke et al. presented a visual analysis of web-scraped customer feedback data from multiple online sources.
1.1.4. Business Finance
The topic of financial data visualisation could be included in this survey. However, this topic is covered fully in the EuroVis 2016 STAR paper by Ko et al. We briefly summarise that survey in Section 2. Additionally, Rodriguez and Kaczmarek published a "Visualising Financial Data" textbook. As such, it is not the focus of our survey. In addition, combining business and financial visualisation in a single survey would make it too large.
In addition to the primary classification, we have created a second-level classification that is outlined in Section 1.3.
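The hierarchy described in Section 1.1 can be written down as a small data structure. The following is a minimal sketch, not part of the survey itself: the names `TAXONOMY` and `leaf_categories` are hypothetical, and the only assumption is that a top-level category with no sub-categories (Business Ecosystem) is treated as its own leaf, as the Figure 1 caption suggests.

```python
# Top-level categories mapped to their sub-categories, as listed in Section 1.1.
# An empty list marks a category that has no sub-classification.
TAXONOMY = {
    "Business Intelligence": ["Internal Intelligence", "External Intelligence"],
    "Business Ecosystem": [],
    "Customer Centric": ["Customer Behaviour", "Customer Feedback"],
}


def leaf_categories(taxonomy):
    """Return the leaf classifications in order of appearance."""
    leaves = []
    for top_level, sub_categories in taxonomy.items():
        # A category without sub-categories is itself a leaf.
        leaves.extend(sub_categories if sub_categories else [top_level])
    return leaves


print(leaf_categories(TAXONOMY))
```

The five leaves returned here correspond to the five literature columns of Table 1.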
1.2. Justification of Classification: Turning Data into Profit
The original inspiration motivating this classification came from our research into why businesses collect data and what they do with it. In short, they do so to maximise profit. Our taxonomy encompasses the core platforms by which businesses derive profit. A 2016 Gartner study identifies the key areas in which a digital enterprise can support the business, which include placing value on data analysis and information systems for decision making (business intelligence), customer-focused analysis, and ecosystem analysis. The report also identifies an IoT platform as valuable; however, IoT is beyond the scope of this survey. Therefore, we have chosen these facets as the basis for our top-level classification. A complete business taxonomy does not exist in a form that would be useful to this survey, and so we have developed this taxonomy from existing business sources and from the research found within the literature search. We found that the research fell into these categories due to recurring themes. The significance of the taxonomy facets is nevertheless thoroughly established in the academic business world.
Increasingly, businesses are striving to utilise their valuable resource: data. Bean presented an article in the Harvard Business Review that highlights a decrease in expenditure as the ultimate goal of data analytics, with the objectives of increasing revenue, product innovation, and operational efficiency (business intelligence). Bean stated that businesses are now looking to find new ways to innovate with data to extract previously untapped value. We propose that these new methods of innovation can be achieved through the creation of visual designs that depict the data to support decision making.
Buluswar interviewed six senior leaders from major businesses, asking how they utilise their data. A strong theme emerged highlighting the importance of business intelligence but also of consumer-centred analysis, focusing on what serves the customer best: looking at their experience with the product and exploring how to align the business with customer expectations. Many books have been published that discuss how to collect and data-mine customer-centred business data so that it can be used for business development purposes.
The business ecosystem was identified by the Gartner report as a critical facet of a business's knowledge. Moore described the practical benefits of embracing the ecosystem-centred view:
"A business ecosystem can also be conceived as a network of interdependent niches that in turn are occupied organisations. These niches can be said to be more or less open, to the degree to which they embrace alternative contributors. One of the most exciting ideas in business today is that business ecosystems can be "opened up" to the entire world of potential contributions and creative participants".
1.3. Data Classification
Our secondary level of classification is based on the origin of the data used in each paper. The data source list can be seen in Table 1. Each paper is assigned a type of data source.
We break the data sources down into primary and secondary. Primary data can be defined as "Original data collected by the visualisation researchers for a specific research goal". Secondary data can be defined as "Data originally collected for a different purpose and reused for another research question". We also include one sub-classification of web-scraped data that is a hybrid of both primary and secondary.
To illustrate these two, we present a data collection pipeline for visualisation in Figure 2. We use the term "A Priori" to describe the visualisation hypothesis that is formulated before the data is collected. If the initial data are collected after the visualisation hypothesis, then we refer to this as an "A Priori Hypothesis".
Figure 2. The differences between primary and secondary data sources in visualisation. Research involving primary data proposes the visualisation hypothesis before the data is collected (a priori hypothesis), whereas research involving secondary data proposes the visualisation hypothesis with the knowledge that the data has already been collected (a posteriori hypothesis). The hybrid pipeline contains two stages of data collection, pre- and post-hypothesis. The initial creation of the data is often in the context of social media. At this point, the data are not collected to support the hypothesis posed by the visualisation research. Once the visualisation hypothesis is formulated, the data can then be scraped and collated into a second structured dataset for the purpose of visual research.
Primary:
In this category, we include:
- Intentional, Active Digital Collection
- Intentional, Active Research Study Data
The first primary data source is the collection of data for the explicit purpose of the visualisation research featured in this survey. For example, Otsuka et al. collected data through electronic name tags worn by members of staff that identify and record interactions between the staff. These data are then used to visualise the inter-office relationships.
Study data are collected to support an a priori hypothesis first hand through interviews, questionnaires, and reviews. The most popular use of study data in this survey lies in the internal intelligence classification. These data are typically collected as part of the visualisation research. For example, Kandel et al. presented an interview study with data visualisation analysts working with industry to characterise the process of industrial data analysis.
Secondary:
Secondary data sources are not collated by those performing the visualisation research featured in this survey. Here, the researchers pose a hypothesis a posteriori, i.e., after the initial data collection. Researchers use the pre-existing data to explore and perform the analysis. In this category, we include:
- A Priori Databases
- Business Processes
- Business By-product
Pre-existing databases are often used as a case study to demonstrate new visualisation techniques. They are databases that were created for the purpose of previous analysis, and not for the visualisation research in which they are currently used. Roberts et al. used a pre-existing database, provided by their industry partner, for their treemap-based research on call centre data.
Business process visualisation refers to the graphical representation of the operational procedures implemented within a company. For example, Broeksema et al. presented a visualisation system for business decision management and the processes behind these decisions.
The data by-product of a business is similar to a pre-existing database except that the data are collected as a by-product of business operations. These databases are often in the form of financial records or Point of Sale (PoS) transactions. For example, Keim et al. presented a novel approach to bar chart designs using transaction data. The business by-product differs from the pre-existing database in that by-product data are generated and collected continuously, regardless of who uses them.
Hybrid:
We classify web-scraped data as a hybrid of both primary and secondary data. Although the data are originally gathered for a purpose other than visualisation (Secondary), the collation of a new, structured dataset is performed after a hypothesis for visualisation is posed (Primary). This leads to our creation of the hybrid primary/secondary classification. It is often used in the field of customer feedback (see Figure 1) and often involves social media data. The data are downloaded from various online sources into an archive that can be used for research purposes. For example, Hao et al. presented a visual sentiment analysis of customer feedback streams through the scraping of Twitter data.
Web-scraped data are differentiated from a digital collection by the process through which the data are obtained. Web scraping collates online sources of information into one structured dataset, whereas the real-world, digital collection of data utilises hardware components that collect information offline.
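The primary, secondary, and hybrid distinction in this section reduces to two questions about when the data were gathered relative to the visualisation hypothesis. The sketch below is one illustrative way to encode that rule; the names `SourceType` and `classify_source` and the two boolean parameters are assumptions introduced here, not terminology from the survey.

```python
from enum import Enum


class SourceType(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    HYBRID = "hybrid"


def classify_source(collected_for_hypothesis: bool,
                    recollated_after_hypothesis: bool) -> SourceType:
    """Classify a data source by its relation to the visualisation hypothesis.

    collected_for_hypothesis: the raw data were gathered by the researchers
        for the visualisation research itself (a priori hypothesis).
    recollated_after_hypothesis: pre-existing data were scraped and collated
        into a new structured dataset after the hypothesis was posed.
    """
    if collected_for_hypothesis:
        return SourceType.PRIMARY
    if recollated_after_hypothesis:
        return SourceType.HYBRID
    return SourceType.SECONDARY


# Examples taken from the text of this section:
print(classify_source(True, False))   # name-tag data collected for the study
print(classify_source(False, True))   # Twitter feedback scraped after the hypothesis
print(classify_source(False, False))  # call centre database from an industry partner
```

The three example calls mirror the Otsuka et al., Hao et al., and Roberts et al. cases described above.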
1.4. Literature Search Methodology
To ensure that we include a complete collection of the literature, we employed a methodical approach to the literature search. Firstly, we identified keywords and phrases that encompass the field of business visualisation, e.g., "business", "customer", "market", "visualisation", "visual analysis", "economic", "ecosystem", "intelligence", and "finance". Then, we used a logical AND combination of these keywords to search through digital libraries and conference proceedings as shown in Table 2.
We started by searching the IEEE Xplore and ACM Digital Libraries, and then we browsed the conference proceedings of both the IEEE Visualisation and EuroVis conferences. We repeated this process with Google Scholar. VisPubData, a database that documents the publications of the IEEE VIS conference from 1990 to 2015, was also incorporated. The combination of search terms is outlined below. Each search term was split into two segments, and each combination of first and second terms was used in the search process.
First Term | Second Term |
---|---|
Business | Visualisation |
Customer | Visual Analytics |
Market | Visual Analysis |
Economic | Visual Intelligence |
Finance | |
Corporate | |
Commercial | |
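The pairing of first and second terms described above is a Cartesian product. As a rough sketch, the full set of queries could be generated as follows; the term lists are copied from the table, while the quoted `AND` query syntax and the function name `build_queries` are assumptions, since the survey does not state the exact query format used with each library.

```python
from itertools import product

# Terms copied from the search-term table above.
FIRST_TERMS = ["Business", "Customer", "Market", "Economic",
               "Finance", "Corporate", "Commercial"]
SECOND_TERMS = ["Visualisation", "Visual Analytics",
                "Visual Analysis", "Visual Intelligence"]


def build_queries(first_terms, second_terms):
    """Return one logical-AND query string per (first, second) term pair."""
    return [f'"{first}" AND "{second}"'
            for first, second in product(first_terms, second_terms)]


queries = build_queries(FIRST_TERMS, SECOND_TERMS)
print(len(queries))  # 28 combinations (7 first terms x 4 second terms)
print(queries[0])    # "Business" AND "Visualisation"
```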
Once the list was complete, we carefully read through each paper, summarising and classifying it according to the systematic process outlined previously and prioritising the most recently published work. Then, we inspected the references of each paper, looking for previous papers that did not appear in the preliminary searches. The literature search process lasted over a year.
In summary, our primary sources during the literature search are:
- IEEE Xplore
- ACM Digital Library
- Google Scholar
- References of collected papers
Table 3 shows where most of the papers are published.
Table 3. This table describes the distribution of papers found in each of the major publication venues.
Conference/Journal | Count |
---|---|
IEEE Transactions on Visualization and Computer Graphics | 13 |
The IEEE Information Visualisation Conference (IV) | 6 |
IEEE Transactions on Visualization and Computer Graphics | 5 |
IEEE Visual Analytics Science and Technology (VAST) | 3 |
The Annual EuroVis Conference | 3 |
Information Visualisation Journal (SAGE) | 3 |
The Annual PacificVis Conference | 1 |
VIS Business Visualisation Workshop | 1 |
Other | 34 |
total | 69 |
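As a quick arithmetic check on Table 3, the per-venue counts can be summed and converted to shares. This is a minimal sketch: the rows are copied from the table in their original order, and the helper names `total_papers` and `share` are hypothetical. A list of pairs is used rather than a dictionary because one venue name appears in two rows.

```python
# (venue, count) rows copied from Table 3, in table order.
VENUE_COUNTS = [
    ("IEEE Transactions on Visualization and Computer Graphics", 13),
    ("The IEEE Information Visualisation Conference (IV)", 6),
    ("IEEE Transactions on Visualization and Computer Graphics", 5),
    ("IEEE Visual Analytics Science and Technology (VAST)", 3),
    ("The Annual EuroVis Conference", 3),
    ("Information Visualisation Journal (SAGE)", 3),
    ("The Annual PacificVis Conference", 1),
    ("VIS Business Visualisation Workshop", 1),
    ("Other", 34),
]


def total_papers(rows):
    """Sum the paper counts over all rows."""
    return sum(count for _, count in rows)


def share(rows, venue):
    """Fraction of all papers whose venue label equals `venue`."""
    matching = sum(count for name, count in rows if name == venue)
    return matching / total_papers(rows)


print(total_papers(VENUE_COUNTS))                    # 69
print(round(100 * share(VENUE_COUNTS, "Other"), 1))  # 49.3
```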
1.5. Survey Scope
Due to the multidisciplinary nature of this state-of-the-art report, we require a well-defined scope to ensure that the most relevant research is included. The scope encompasses academic research that places emphasis on state-of-the-art visualisation for the purposes of business data exploration. The quality of the visual design must be regarded as state-of-the-art, and the motivation behind the research must focus on business data.
Many conferences and journals publish articles which include visualisation of business data but do not focus on novel visual design aspects of the research. Papers published in visualisation journals are a valuable source for our literature scope. Non-visualisation journal and conference papers are only in scope if the focus of the research is visualisation-oriented with the ultimate goal of better understanding the business data for informing the decision-making process.
The primary source of in-scope research comes from the conferences and journals that make visualisation the subject of their publications. The conference proceedings of IEEE VIS and EuroVis, and the IEEE TVCG and CGF journals, contain a wealth of publications that focus their attention on business data. These papers are considered the primary driving force behind the evolution of the field. See Table 3.
1.5.1. Out of Scope
Publication venues such as the conference "Software Engineering and Service Science" (ICSESS) may publish papers that include visualisation of business data but do not place emphasis on novel visual design or value. For example, in a paper entitled "Visualizations-based Analysis of Telco Data for Business Intelligence", Ashraf and Khan designed imagery to represent telecommunications data. However, these images come in the form of a pie chart and a radar chart where most of the analysis is performed through a numerical calculation. We do not include papers like this in the scope due to the limited visual component of the research.
Papers we consider to be borderline might have good potential in the field of business visualisation and often mention this as a valid application; however, they do not focus on the business aspect. For example, Wu et al. created an opinion-based visual design from social media data which shows valuable public opinion on products but also on non-business events such as WWII and political scandals. Because of the tenuous connection, we do not include this within scope.
The topic of social media falls outside of the scope of this survey. Including it would make the survey too large. We refer readers to Wanner et al. for an existing survey on this topic.
1.5.2. In Scope
Papers in cross-disciplinary journals can be within scope, but only if the visualisations used are state-of-the-art or add value to the visualisation literature. For example, the journal "Expert Systems with Applications" published Hanafizadeh and Mirzazadeh's "Visualizing market segmentation using self-organizing maps and Fuzzy Delphi method-ADSL market of a telecommunication company", where advanced visual methods are used with emphasis placed on the contribution of the visualisation within a business context. A further example of an in-scope paper comes from the proceedings of the "International Conference on Web Intelligence and Intelligent Agent Technology" in a paper by Ziegler et al. entitled "Mining and Exploring Unstructured Customer Feedback Data Using Language Models and Treemap Visualizations", where customer feedback data are structured in a specialised treemap. This falls within scope due to the focus on the customised treemap design and the novel features implemented in the software.
The main body of publications in this survey was obtained from the major visualisation publication venues. However, the "business" component of each paper is more complicated to define. To clarify this aspect of the scope, we impose a heuristic that a business related subject has to be mentioned in the title or abstract of the paper. A case study alone is not enough to fall within scope. For an overview of the business-related subjects, please refer to the literature search methodology in Section 1.4.
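The title-or-abstract heuristic above can be illustrated with a simple keyword check. This is only a sketch under stated assumptions: the survey does not give a definitive list of "business related subjects", so `BUSINESS_TERMS` reuses the first-term search keywords from Section 1.4, and the function name `mentions_business_subject` and the case-insensitive substring match are choices made here for illustration. Passing the check is a necessary condition in this reading, not a sufficient one, since the visual design criteria of Section 1.5 still apply.

```python
# Assumed keyword list, borrowed from the first-term column in Section 1.4.
BUSINESS_TERMS = ("business", "customer", "market", "economic",
                  "finance", "corporate", "commercial")


def mentions_business_subject(title: str, abstract: str) -> bool:
    """Return True if any assumed business term occurs in the title or abstract.

    Matching is a case-insensitive substring test, so "market" also matches
    "marketing"; a stricter tokenised match may be preferable in practice.
    """
    text = f"{title} {abstract}".lower()
    return any(term in text for term in BUSINESS_TERMS)


# Title of the Ziegler et al. paper cited in Section 1.5.2 (abstract omitted).
print(mentions_business_subject(
    "Mining and Exploring Unstructured Customer Feedback Data Using "
    "Language Models and Treemap Visualizations", ""))

# A made-up title with no business term, for contrast.
print(mentions_business_subject("A faster treemap layout algorithm", ""))
```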
1.6. Organisation of Survey
The survey is presented in the same structure as the classification outlined in Section 1.1 and in Table 1. The three primary classification groups are Business Intelligence, Business Ecosystem, and Customer Centric. The Business Intelligence classification contains two sub-classifications: Internal Intelligence and External Intelligence. The Customer Centric classification also contains two sub-classifications: Customer Behaviour and Customer Feedback. See Figure 1.
2. Related Surveys
This section outlines closely related surveys in this field that have a different scope. See the work of McNabb and Laramee for a comprehensive overview of survey papers in information visualisation. Firstly, we provide an overview of the survey on visual analytics of financial data. Secondly, we describe Zhang et al.'s survey outlining state-of-the-art commercial visualisation systems readily available on the market.
Ko et al. presented a comprehensive survey on visual analytical approaches to financial data. This paper aims to classify financial visualisation papers as well as to contribute requirements for a financial visualisation paper. The survey first identifies financial task requirements by interviewing analysts with a background in financial analysis. They introduce metrics by which financial visualisations can be quantified. Secondly, the survey collects and classifies previously published research in the field of financial visualisation. The requirements for VA systems are a product of the interview process with industry experts and range from R1 (basic requirements) to R7 (maximised utility).
- R1: Provide sufficient information to deduce basic patterns including historical and context data.
- R2: Automated techniques for pattern detection, trends and anomalies.
- R3: User interaction with the system. Enabling data resolution selection (drill-down), and data comparison.
- R4: Statistical analysis of trends and anomalies identifying "statistically significant" trends.
- R5: Forecasting for future trends based on currently available data.
- R6: Additional functions for data cleansing, customisation and presentation.
- R7: Clear visualisations that avoid occlusion as well as supporting R6 and R3 functionality.
Ko et al. highlighted the discrepancy in the volume of research between the classes of financial visualisation. The discrepancy is attributed to data privacy issues within the financial community.
Zhang et al. presented a comparison of the industry-leading visual analytics software used in Big Data Analysis. The Top 10 most prolific pieces of visual analytics software are compared. The four most popular of these are: Tableau, Spotfire, QlikView, and JMP.
Contributions include a comparison of how the software handles data, what frameworks are used, and the efficiency of the data management. Aspects such as data import ease, data compatibility, etc. are taken into account when assessing the software. Four subsections of criteria are used to assess the automatic analysis of the software: statistics, data modelling, dimensionality reduction, and visual query analysis. This is considered a way of learning about the data without greatly customised user input. Features such as pattern recognition are mentioned as useful in some of the software but not implemented across all of them. Visualisation techniques are divided into two subsections: graphical representations and interaction techniques. System and architecture is divided into stand-alone desktop applications and server-sided dashboard tools.
The visual analytics field was initially defined by Thomas and Cook, and then further refined by Keim et al. Using these as a basis for analysis, Zhang et al. carried out the software comparison research. Our survey is neither a survey of financial visualisation nor a survey of software tools.
3. Business Visualisation Articles
This section contains the core state-of-the-art review of Business Visualisation.
3.1. Business Intelligence (BI)
The category of business intelligence (BI) consists of two sub-classifications: internal intelligence and external intelligence. Business intelligence is seen as the generation of meaningful insight into business data.
3.1.1. Internal Intelligence (II)
This subsection contains all inwardly focused BI visualisation research. If the emphasis is placed on internal structures or processes within the business, then we classify the paper to be internal.
Primary Data as Intentional, Active Digital Collection (II): This section contains research that uses digital hardware to collect data for the purpose of internal intelligence visualisation research. Only a small amount of research is found in this category, presumably due to the availability of existing data and the high cost of collecting new data. Otsuka et al. presented methods of visualising internal staffing relationships within a company. Each staff member wears a digital name tag that records interactions with other staff members. All internal interactions are recorded and then visually mapped using a topographic map called a dendrogram. See Figure 3.
Figure 3. The dendrographic representation of the hierarchical relationship. The higher up the member is, the closer to the centre their node is drawn. Clusters of users are highlighted in green. The dots represent individuals and are also given unique identification numbers. The dots are coloured by the role classification of the employee and the definition of the contours around the dots within each group highlight how well the group integrates. Image courtesy of Otsuka et al.
The employees are clustered into groups showing the social and working dynamic of the organisation. Each employee can fall into multiple groups and therefore can be represented by multiple nodes. The process of adding new members automatically generates
the dendrograms. This is done by either creating a new group, adding the member to an existing group, or merging two groups to make the member fit. The novel algorithm classifies the interactions into a hierarchy in such a way that members can belong
to multiple groupings, and the staff hierarchy can also be seen. The research informs managers about the employee networks in their own company.
The strongest influences in the field of visualising hierarchical structures have been MoireGraphs, Multi-Tree Hierarchies, and ConeTrees. These research data have strong links with social networking data. Visualisation projects of this nature are becoming
increasingly popular as social networks grow.
Primary Data as Intentional, Active Research Study (II): In the following, internal intelligence (II) is derived from data sourced using direct research studies. Sedlmair et al. examined the analytical role of visualisation tools used in large
automotive companies. Nine different challenges are discussed along with a set of recommendations for planning and evaluating large company visualisation software. The paper also explores two case studies within the automotive industry that highlight
the challenges of information visualisation in the environment of large businesses.
The article outlines many field characteristics that present challenges when attempting to perform analytical data visualisation in a large business environment. These challenges range from aligning the software capabilities with the corporate aims,
through data acquisition processes, to the end result of either publication, research, or appealing to the stakeholders. The recommendations for the field are established as a counter to the challenges, e.g., the challenge of publication and stakeholder
appeasement can be overcome by making all publication conditions clear at the start of the project and agreeing, through documented communication, which components of the research can be published.
- Case Study 1: AutobahnVis. The AutobahnVis software provides an overview and navigation of error detection in network communication logs. The challenges that arose while developing the software were largely the complexity of the data and the specialised skill required to interpret it. This expertise had to be acquired from busy staff members within the company, resulting in a large time cost and expense to the project. The complexity of the project is reflected in the design, and therefore presented several challenges along the way.
- Case Study 2: MostVis. The MostVis software is designed as an alternative visual access to auxiliary information. It presents large hierarchical data related to the bus systems of car models. The visual hierarchy tree runs from left to right and shows complex information about a car's auxiliary data. Company stakeholders accepted the resulting software research and provided funding to expand further, highlighting the importance of stakeholder support in visualisation research.
The next paper uses primary interview data to increase understanding of a company's employees. Kandel et al. conducted an interview study with 35 enterprise analysts with the aim of better understanding their day-to-day operations and how visualisation tools
are used from the analysts' perspective. The participants came from 25 organisations within 15 different industries.
It was found that analysts generally fall into three main categories: hacker, scripter, and application user. These three groups have very different tool requirements. Six of the "hackers" claimed that visualisation tools such as Tableau or
D3 are useful only as reporting tools, as they do not offer any data flexibility and can only be used to present information. The hackers already know what information they would like to portray. Scripters did use statistical visualisation packages to produce
visualisations for exploration purposes, and they found that using the same package for visualisation and analysis helped them transition smoothly between the two. Application users created visual designs only
through simple packages such as Excel or standard reporting tools such as Crystal Reports.
Analysts report that a primary benefit of visualisation is error detection. When working with large amounts of data, errors in the collection often go unnoticed, and visualisation highlights these errors. In general, it is reported that visualisations
are best used alongside statistical analysis. The findings indicate that imagery of large amounts of high dimensional data is too complex, and simple visualisations do not scale to this level.
Prior to Kandel et al., Sedlmair et al. discussed the difficulties of evaluating visualisation tools in corporate environments. Kwon and Fisher discussed the difficulties of using visualisation tools from the perspective of someone who is not trained
in the field. Other research has looked at the processes of analysts but often does not focus specifically on business or on visualisation.
Burkhard presented a framework for the creation of business strategy visual designs. Building on the knowledge visualisation framework, this research identifies the aspects of strategic data suited for visualisation by isolating the different perspectives
of the visual design. The resulting guidelines produce valuable strategy-based imagery suitable for a business context – focussing on the internal operations of a business.
- The Function perspective distinguishes functions of visualisations based on the desired outcome; i.e., if the goal is to create new insight, recall the data, produce motivation, elaboration etc.
- The Knowledge perspective identifies the type of knowledge that is required to be transferred, i.e., what, who, where, why, and how?
- The Recipient perspective highlights the target group recipient, i.e., individual worker, team leader, senior management, workgroup etc.
- The Visualisation type perspective examines the type of visual design suitable for the above context, i.e., sketches, diagrams, maps, images, interactive visualisations, stories.
Secondary Data as A Priori Database (II): The literature in this section studies pre-existing data for internal intelligence visualisation. The first paper describes some visual designs to inform a company's internal decision making after
the original data is collected.
Wright presented six case study examples of info-vis software used in a business context. These real-world examples are among the first recorded utilisations of information visualisation software used in the day-to-day management of the business. In this
article, case studies focus on 3D visualisations of pre-existing business data ranging from financial information to management support.
- Fixed Income Management: In this case study, a dataset of financial portfolios is depicted using 3D line graphs. Emphasis is placed on the 3D nature of the visual design as it enables thousands of data points to be plotted compared to a smaller number in 2D. The more holistic view enables investors to quickly see the state of their portfolio or compare multiple 2D visual designs.
- Derivatives Risk Management: The software conveys the risk involved in options trading. A virtual environment contains multiple visual representations including a virtual screen showing the yield curve, a surface plot mapping the current profit and loss, and a grid map that shows the relative profit and loss. Users can interact by adjusting the extraneous variables such as interest rates to change the forecast visual designs.
- Management Decision Support: A geospatial map is used to display the locations of a chain of businesses and then 3D bar charts are overlaid on top to show the metric values used to analyse the businesses. This enables managers to evaluate and balance multiple business locations.
- Credit Scoring: This design uses a geospatial map to display credit scores in the United States. The software enables the market risk of permitting loans to be analysed.
- Retail Sales Analysis: Again using geospatial maps, this enables the user to compare the retail value of stores across the U.S. either individually or aggregated by state. Three-dimensional bar charts or raised map tiles are used to show the sales from each sector or store.
- Management Reporting: This managerial software uses a virtual environment and 3D bar charts to show the portfolios of a business. The portfolios are grouped into asset classes and represent the main axis of data. A virtual screen shows 60 scenarios that would affect the portfolios and users can select each to see the effect. Another virtual screen shows the currency conversion rates which change with the scenario.
Figure 4. The transformation of the treemaps into other shapes: (a) the original treemap; (b) the original transformed into a pyramid but without density uniformity; (c) the modified version of (b) now adjusted to show uniform density; and (d) the transformation of the original into a pie chart. Image courtesy of Vliegen et al.
The first proposed mixed treemap takes the structured layout of the slice and dice algorithm and combines it with the square readability of the squarified treemaps algorithm. This way, the higher level nodes remain ordered, but the lower level nodes become
squarified. The matrix modification enables data comparison of different sizes by subdividing the rectangle into a grid. When the number of cells in the grid exceeds the number of items, then dummy nodes are added and blank cells are used. Another
modification involves the transformation of the treemap visualisations in pixel space to reflect the aesthetics of other popular visual designs, such as pie charts.
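The dummy-node padding used by the matrix modification can be illustrated with a short sketch. This is not the implementation of Vliegen et al.; the function name `grid_layout`, the near-square default grid, and the use of `None` for blank cells are assumptions made purely for illustration.

```python
import math

def grid_layout(items, columns=None):
    """Place items row by row in a grid, padding with None as dummy nodes."""
    if not items:
        return []
    # Assumption: default to a near-square grid when no column count is given.
    cols = columns or math.ceil(math.sqrt(len(items)))
    rows = math.ceil(len(items) / cols)
    # Dummy nodes fill the cells left over when the grid is larger than the item list.
    padded = list(items) + [None] * (rows * cols - len(items))
    return [padded[r * cols:(r + 1) * cols] for r in range(rows)]

# Five items in a 2 x 3 grid leave one blank cell.
layout = grid_layout(["A", "B", "C", "D", "E"])
```

Each `None` entry would be rendered as an empty cell, so that every real item occupies a cell of identical size and items can be compared directly.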
Prior to Vliegen et al., Harris provided an extensive overview of traditional depictions of business data. The treemap was first introduced by Johnson and Shneiderman in the early 1990s. Further work enhancing the original design is also presented.
Continuing on this theme, we look at research utilising pre-existing databases for the purposes of exploring internal business operations and functions. Nicholas et al. presented a novel way of conveying the failure rates of automotive components and
the effect that this has on customer satisfaction. The base design of the visual layout extends the standard chord diagram and depicts three-way relationships using both curved angles and glyphs. Previous attempts at chord diagrams focus on two-way
relationships. This method enables a third relationship to be added and compared. The dataset is collected by an automotive company recording the failure rates of automotive components. The automotive data are divided into 11 autonomous fault categories
(engine, transmission, etc.). The aim of the visual design is to ascertain which combination of component failures yields the most dissatisfied customers. See Figure 5.
Figure 5. The extended chord diagram where glyphs are added to show the relationship between automotive component failure rates and customer satisfaction. Image courtesy of Nicholas et al.
The multi-chord diagram represents the frequency of failures between components with the curve thickness and may use colour to represent how dissatisfied the customer is. See Figure 5. In the glyph extension, the lines of the chord diagram are
shortened and only display the intersection of three component failures. This reduces overlap and improves visibility by reducing clutter while retaining the vital information. The benefit of this design is that the company can now clearly see what products
typically fail together as well as how badly the hardware failure affects customer satisfaction. Recommendations can be made to improve the worst offending components that have the largest impact on customer satisfaction. Less focus is placed on the
failing components that do not have a negative influence on customer satisfaction.
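The quantities that such a three-way diagram encodes can be sketched as a simple aggregation: how often each triple of components fails together, and how dissatisfied the affected customers are on average. The record layout, the numeric dissatisfaction score, and the function name `triple_failures` are hypothetical; the source does not describe how Nicholas et al. compute these values.

```python
from collections import defaultdict
from itertools import combinations

def triple_failures(records):
    """Aggregate co-failures per component triple.

    records: iterable of (failed_components, dissatisfaction_score).
    Returns {triple: (failure_count, mean_dissatisfaction)}.
    """
    stats = defaultdict(lambda: [0, 0.0])
    for components, dissatisfaction in records:
        # Sorting makes each triple order-independent.
        for triple in combinations(sorted(set(components)), 3):
            stats[triple][0] += 1
            stats[triple][1] += dissatisfaction
    return {t: (n, total / n) for t, (n, total) in stats.items()}

# Hypothetical survey records: components that failed and a dissatisfaction score.
records = [
    (["engine", "brakes", "transmission"], 4),
    (["engine", "brakes", "transmission", "audio"], 2),
    (["engine", "audio"], 1),
]
summary = triple_failures(records)
```

In a diagram of this kind, the count would plausibly drive the curve thickness and the mean score the colour.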
The inspiration for the chord diagram extension came from Bostock, Ogievetsky and Heer in their "Data-Driven Documents" paper. A radial technique by Kerren and Jusufi enables the visualisation of an undirected hyper-graph. Nicholas et al. used elements
of this visual design.
This next example of derived internal intelligence using pre-existing, industry-acquired data involves the analysis of call centre data. Roberts et al. presented an analytics system that visualises call centre data provided by their industry partner.
They modified the traditional treemap to accommodate time series, event-based data such that 24 h of call centre activity can be presented in one view. Novel interactions and filtering methods are used to modify the view from a full day of data to
individual call records.
The top hierarchical level of the treemap shows leaf nodes representing time frames (24 h > 1 h > 10 min > 1 min). When an hour node is selected, the graphic is zoomed in smoothly to reveal the individual call records as the new leaf nodes of
the treemap. A selection of sliders enables the user to filter the call records by a range of metrics, narrowing the scope of the calls in a focus + context environment. Roberts et al. noted that queue times sharply increase around 13:00 each day.
This is attributed to shifting staff levels within the call centre. See Figure 6.
Figure 6. A focus + context treemap visualisation of call centre data: (a) all callers who have waited longer than 15 min in the queue but spoke to an agent for less than that period of time; (b) a combination of temporal and event based filters; (c) the increased queue time at the start of 13:00; and (d) all abandoned calls during the hour. Image courtesy of Roberts et al.
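The time hierarchy behind this treemap (24 h > 1 h > 10 min > 1 min) can be sketched as a nested grouping of call records by start time. The representation of a call as seconds since midnight and the names `time_path` and `build_hierarchy` are assumptions for illustration only, not the data model of Roberts et al.

```python
from collections import defaultdict

def time_path(seconds_since_midnight):
    """Return the (hour, ten-minute block, minute) path for a call start time."""
    hour = seconds_since_midnight // 3600
    minute = (seconds_since_midnight % 3600) // 60
    return hour, minute // 10, minute

def build_hierarchy(call_starts):
    """Group call start times into hour > 10-minute block > minute buckets."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for start in call_starts:
        hour, block, minute = time_path(start)
        tree[hour][block][minute].append(start)
    return tree

# Two calls at 13:25 and one at 09:03 (hypothetical values).
calls = [13 * 3600 + 25 * 60 + 7, 13 * 3600 + 25 * 60 + 40, 9 * 3600 + 3 * 60]
tree = build_hierarchy(calls)
```

Selecting an hour node then corresponds to descending one level of `tree`, with the individual call records as the new leaves.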
A common area of research in the call centre is the customer service quality being provided by the staff. This research extends Blanch and Lecolinet's work on navigating treemaps using a zoom interaction.
Roberts et al. continued their work exploring call centre event data, using parallel coordinates plots (PCP). See Figure 7. The focus of this research is to develop new brushing techniques to overcome the challenges associated with overplotting.
The main contribution lies in the "sketch-based" brushing that can be easily applied, modified, and moved around the plot to enhance the analytics of the software.
Figure 7. The sketch brush placement with the dynamic brush interval range glyphs which were automatically placed at 1 standard deviation from the mean. Image courtesy of Roberts et al.
Additionally, a range of glyph-based user options guide the user in their brush placement and provide additional information about the surrounding metadata. A priority rendering feature lets the user select a point on the n-dimensional plot that they
want to focus on, and the draw order of the polylines changes so that the selected data are drawn on top, helping with overplotted graphs. The software also includes an automatic brushing feature that can be used to either scale the parallel coordinates axis or
apply a brush sketch according to the distribution of the data. This feature aims to make complex datasets accessible to new users.
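A distribution-based brush of the kind shown in Figure 7, with limits placed one standard deviation either side of the mean, can be sketched as follows. The use of the population standard deviation, the dictionary record layout, and the names `auto_brush_interval` and `brushed` are assumptions; the source does not specify these details.

```python
from statistics import mean, pstdev

def auto_brush_interval(values, k=1.0):
    """Return (low, high) limits placed k standard deviations around the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # Assumption: population standard deviation.
    return mu - k * sigma, mu + k * sigma

def brushed(records, axis, interval):
    """Keep the records whose value on the given axis lies inside the brush."""
    low, high = interval
    return [r for r in records if low <= r[axis] <= high]

# Hypothetical queue times (minutes) on one parallel-coordinates axis.
queue_times = [2, 4, 4, 4, 5, 5, 7, 9]
interval = auto_brush_interval(queue_times)
records = [{"queue": q} for q in queue_times]
selected = brushed(records, "queue", interval)
```

Varying `k` widens or narrows the brush, which is one way a novice user could start from a sensible default and then adjust it.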
Secondary Data as a Business Process (II): This sub-section focuses on internal intelligence generated from existing business processes. The derived internal intelligence of this literature aims to inform decision making.
Broeksema et al. presented a visual analytics system for operational decision making in a business management environment. The system displays the decision-making process in the software. A car insurance case study is used. The VA toolset is used
to analyse a dataset from the car insurance industry. The data used is from an auto-quote request that has been stripped of confidential information.
A set of 144 rules processes the input information and generates a quote. Discounts are applied based on variables such as car safety features, no claims bonuses, etc. These data are then fed into the decision map visualisation. The decision map is based
on work by Zizi et al. but maps concepts rather than instances. Two diverging factors in the data are plotted against each other in a scatterplot and the space is segmented according to the most prolific value in that data space. See Figure 8.
Figure 8. The decision map as an interactive analysis of variables of interest: (a) the analyst has selected the complete list of available attributes associated with drivers and then selected the age group bar chart to show where the clusters of age groups sit in the plot; (b) the simplified version where all non-contributing variables are removed; and (c) a breakdown of the decision process and classification of the "students" demographic. Image courtesy of Broeksema et al.
The decision map is primarily based on Zizi et al.'s dynamic map. Decision support systems are often used with financial data.
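The idea of segmenting a scatterplot by the most frequent value in each region can be sketched with a regular grid. The grid itself is an assumption made for simplicity; the segmentation actually used by Broeksema et al. may differ, and `segment_by_majority` is a hypothetical name.

```python
from collections import Counter, defaultdict

def segment_by_majority(points, cell_size):
    """Label each grid cell with the most frequent category among its points.

    points: iterable of (x, y, label).
    Returns {(column, row): majority_label}.
    """
    cells = defaultdict(Counter)
    for x, y, label in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in cells.items()}

# Hypothetical quote requests positioned by two factors and labelled by age group.
points = [
    (1, 1, "student"),
    (2, 3, "student"),
    (4, 4, "senior"),
    (11, 2, "senior"),
]
regions = segment_by_majority(points, cell_size=10)
```

Colouring each cell by its majority label gives a coarse map of where each group dominates the plot.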
Secondary Data as a Business By-Product (II): This segment describes visualisation research related to internal intelligence using business by-product data. In this example, the by-product is data recording delivery times and product sales. Gresh and Kelton present a visualisation application focused on the presentation of business intelligence data using 2D and 3D visual designs in the delivery industry.
Customer delivery times are the focus of this research. The business has delivery targets to achieve; however, due to the vast stock range of hardware and the small delivery targets, the business needs to optimise its warehouse locations and stocking. This optimisation analysis is done through the visualisation of the delivery times. When using the software, the user is presented with a control window that enables them to choose a subset of the data. The user is then shown a combination of two- and three-dimensional visual designs. These images show the target service level in comparison to the actual service level.
Comparable software has been created, but does not exploit the full potential of the data. A more customised approach is required for this subset.
Keim et al. proposed a new type of bar chart that can be used in the visual analysis of large transaction datasets. The design enables the user to see transaction value correlations and outliers. The bar chart is created from sales data by separating the range of transaction values into tiers and then assigning each transaction a tier. The bar is drawn as a sorted accumulation of all transactions within that period. The tiers are coloured such that the bar appears to be subject to a continuous colour gradation while still visualising each transaction as discrete. See Figure 9.
Figure 9. The Value-Cell bar chart. Each bar represents a month's worth of sales. Within the bar, tiered sales values for each month are shown. The red tips of some bars represent outlying large sales during that time period. The colour legend shows the value of the transactions. The tiers blend together to create a gradation effect. Image courtesy of Keim et al.
The design of the value-cell bar chart shares elements with Keim's VisDB system that sorts coloured pixels in relation to a user query. Further related work introduced a pixel-based bar chart for large transaction datasets.
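The construction of a value-cell bar, as described above, amounts to assigning each transaction a tier and then stacking the transactions of a period in sorted order. The sketch below assumes equal-width tiers over the global value range, which the source does not specify; the names `assign_tier` and `value_cell_bars` are hypothetical.

```python
def assign_tier(value, low, high, tiers):
    """Map a value in [low, high] to a tier index 0..tiers-1 (equal-width tiers)."""
    if high == low:
        return 0
    index = int((value - low) / (high - low) * tiers)
    return min(index, tiers - 1)  # The maximum value falls in the top tier.

def value_cell_bars(transactions, tiers=5):
    """Build one sorted stack of (value, tier) cells per period.

    transactions: list of (period, value).
    """
    values = [value for _, value in transactions]
    low, high = min(values), max(values)
    bars = {}
    for period, value in transactions:
        cell = (value, assign_tier(value, low, high, tiers))
        bars.setdefault(period, []).append(cell)
    for cells in bars.values():
        cells.sort()  # Sorted accumulation: smallest transactions at the base.
    return bars

# Hypothetical sales data.
sales = [("Jan", 10), ("Jan", 100), ("Jan", 55), ("Feb", 30)]
bars = value_cell_bars(sales)
```

Because the cells in each bar are sorted by value, mapping the tier index to a colour scale yields the gradation effect, with outlying large sales appearing at the tip of the bar.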
3.1.2. External Intelligence (EI)
This subsection contains outward-focused BI visualisation research. If the emphasis is placed on the external environment such as direct competition within the market sector, then we classify the paper as external.
The types of stakeholders examined by Keahey range from the marketing team to the end users. Each has a different set of expectations for the visual design. The range of requirements spans two axes. The first ranges from effective to attractive,
and the second from simple to complex. Although this is a generalisation, effective visualisations are often not considered aesthetically pleasing, whereas attractive graphics are not always the most effective. The y-axis is the scale of visual complexity.
Some stakeholders require a complex visualisation to show off the work while others require a simple graphic to aid in information retention. See Figure 10.
Figure 10. The stakeholder visualisation requirements map. The x-axis ranges from "effective" to "attractive" and the y-axis ranges from "simple" to "complex". The marketing department's expectations of a visualisation rank highly in both complexity and attractiveness, but indicate less interest in its effectiveness. Image courtesy of Keahey.
Secondary Data as A Priori Database (EI): This subsection contains one visualisation research paper that uses a pre-existing database for analysis in the field of external intelligence. Ferreira et al. utilised the large volume of urban taxi behaviour
data that have been collected in New York City. Geospatial data are primarily used in the visualisations to map out the routes the taxis typically take. This research intends to answer previously unanswered questions such as "What
is the average trip time?", "How does taxi activity vary throughout the week?", and "What effect do major events have on the taxi behaviour?".
The TaxiVis software enables the user to select the time frame. The geospatial paths appear on a map widget, and the raw data appear on a data summary widget. See Figure 11. A heat map is overlaid onto a map of the city to display the majority of the data. This heat map highlights the trip density within the city as plotting individual points would occlude most of the data due to its volume. Side-by-side comparisons are used to show each day's activity throughout the week. It is easy to observe that Monday is the quietest day with activity progressively getting busier throughout the week. The same side-by-side comparison is used to compare the taxi activity during major city events such as presidential visits, or natural disasters. Using day-to-day comparisons of the same map visualisations shows the progression of such events.
Figure 11. Taxi activity in Manhattan during the week of Hurricane Irene. Image courtesy of Ferreira et al.
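The aggregation behind a trip-density heat map can be sketched by counting trips per grid cell instead of plotting each point. This is a generic binning sketch rather than the TaxiVis implementation; the bounding-box origin, the cell size in degrees, and the name `trip_density` are assumptions.

```python
from collections import Counter

def trip_density(pickups, lat_min, lon_min, cell_deg):
    """Count pickups per grid cell.

    pickups: iterable of (latitude, longitude).
    Returns a Counter keyed by (row, column).
    """
    counts = Counter()
    for lat, lon in pickups:
        row = int((lat - lat_min) // cell_deg)
        col = int((lon - lon_min) // cell_deg)
        counts[(row, col)] += 1
    return counts

# Hypothetical pickup locations in Manhattan.
pickups = [(40.755, -73.985), (40.7551, -73.9849), (40.712, -74.006)]
density = trip_density(pickups, lat_min=40.70, lon_min=-74.02, cell_deg=0.01)
```

Computing one such grid per day and rendering the grids side by side would give the kind of day-to-day comparison described above.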
Previously, Ge et al. proposed an analytics method for taxi drivers to calculate the most financially efficient way of finding passengers. Peng et al. modelled the day-to-day habits of taxi drivers; however, neither of these research topics is visualisation
focused. There has been research into the visualisation of movement data, but not explicitly related to taxi data.
Secondary Data as a Business Process (EI): Here, we look at two papers by Hao et al. that explore the external environment of a business that impacts the day-to-day operations of the business.
Hao et al. presented BizViz, a visualisation software that interactively visualises business operations processes. The BizViz software analyses the relationships between important external operational parameters. The primary focus is on data distribution
of up to three operational parameters but the user can drill down to a one-on-one comparison of parameters where the transaction sets can be seen.
The design uses a circular chord-like plot to present its data. Separated into three sections, the circle presents one attribute on the left half, one attribute on the centre dividing line, and the final attribute on the right side. See Figure 12.
Edges denote relationships. When drilled down to two attributes, the circle is split into two, and the half circles are plotted with horizontal lines at varying density to represent the two selected operational parameters. The data specifically deal
with fraud prevention and credit card usage in retail.
Figure 12. The BizViz visualisation. The left side of the circle shows each geographical region the data was collected from. Lines are drawn from each region to the centre line which holds a range denoting fraud value. From there, the line crosses over to the right axis where the fraud count is tallied. The user can select a region to drill down into and reveal comparisons of the fraud value vs. fraud count. Image courtesy of Hao et al.
Hao et al. later published further research in this field, continuing the parallel/chord visualisation in the financial security sector. They focused on business process impact by adapting the use case of the previous visualisation method to show the new data.
The examples show the source attribute to be the customer type (measured by importance or size). The intermediate attribute is mapped to a time frame for ordering (delays, or order time), and the destination attribute is mapped to the outcome (order
accepted/rejected, penalty costing, etc.).
Secondary Data as a Business By-Product (EI): This subsection contains external intelligence visualisation research using data sourced from the by-product of business operations.
Liu et al. explored the potential of billboard placement through the visualisation of GPS taxi data. Using a combination of geospatial visualisation and glyph designs, Liu et al. explored the optimum placements of billboards across a city. A geospatial
heat map is used to display the flow of taxi traffic, highlighting high volume areas where billboard potential is maximised. The solution view uses layered circular glyphs to evaluate potential points of interest where billboards can be placed. Using
a combination of the location view and solution view, the user can select which location is best suited to their billboard as a function of cost and footfall.
The billboard location selection process is a sub-problem of Multicriteria Decision Making, using a geospatial context. Taxi GPS data have been extensively researched with applications from route optimisation to urban planning. Visualisation is often
used to present data of this nature. Chen et al. provided a full survey of traffic data visualisation.
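Choosing a location as a function of cost and footfall can be illustrated with a weighted-sum score, one of the simplest multicriteria decision-making techniques. This is not presented as the method of Liu et al.; the min-max normalisation, the weight parameter, and the name `rank_locations` are assumptions for illustration.

```python
def rank_locations(candidates, footfall_weight=0.5):
    """Rank candidate sites by a weighted sum of footfall (benefit) and cost (penalty).

    candidates: {name: (cost, footfall)}.
    Returns the names ordered from best to worst.
    """
    costs = [cost for cost, _ in candidates.values()]
    footfalls = [footfall for _, footfall in candidates.values()]

    def normalise(value, low, high):
        # Min-max scaling to [0, 1]; a constant criterion contributes nothing.
        return 0.0 if high == low else (value - low) / (high - low)

    scores = {}
    for name, (cost, footfall) in candidates.items():
        benefit = normalise(footfall, min(footfalls), max(footfalls))
        penalty = normalise(cost, min(costs), max(costs))
        scores[name] = footfall_weight * benefit - (1 - footfall_weight) * penalty
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical sites: (monthly cost, daily taxi passes).
sites = {"A": (100, 1000), "B": (300, 5000), "C": (200, 1500)}
ranking = rank_locations(sites, footfall_weight=0.7)
```

Raising `footfall_weight` favours busy but expensive sites, while lowering it favours cheaper ones.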
Data Overview: We provide an overview of the data descriptions in each paper and their availability in Table 4. It shows a description of each data source in the survey categorised to match Table 1. It shows that most data in the
business visualisation literature are still proprietary.
Table 4. This table summarises the data sources in each research paper, identifying the accessibility of the data as well as a brief description.
Classification | Paper Ref | Access | Description | |
---|---|---|---|---|
Business Intelligence | Internal Intelligence | Wright | Proprietary | Case Study from portfolio management, derivatives management, customer credit scores |
Gresh and Kelton | Proprietary | Private IBM business by-product data | ||
Eick | Proprietary | Log data from web servers used to analyse the efficiency of their website | ||
Burkhard | Proprietary | Case study from Swiss Federal Institute of Technology using business strategy data | ||
Vliegen et al. | Proprietary | Unspecified business data | ||
Keim et al. | Proprietary | Transaction datasets | ||
Otsuka et al. | Proprietary | Digital nametags collect employee interaction data | ||
Sedlmair et al. | Survey | Existing software evaluation | ||
Kandel et al. | Proprietary | Interview Study with industry experts | ||
Du et al. | Survey | A survey of business process visualisation literature | ||
Aigner | Proprietary | Text from interview study | ||
Broeksema et al. | Proprietary | Decision model data | ||
Bai et al. | Proprietary | Geospatial data for utility network coverage | ||
Lafon et al. | Proprietary | User Study of unspecified business data visualisation | ||
Nicholas et al. | Proprietary | Private customer survey database from automotive company | ||
Roberts et al. | Proprietary | Private call centre interaction database | ||
Ghooshchi et al. | Proprietary | Business Processes from undefined source | ||
Kumar and Belwal | Public | Multiple public data sources looking at different aspects of a business | ||
Bachhofner et al. | Proprietary | Business processes from industry contacts | ||
Lea et al. | Proprietary | Business process data was used alongside simulated data to test prototypes | ||
Roberts et al. | Proprietary | Call centre event data from industry partner | ||
External Intelligence | Hao et al. | Proprietary | Case study data from financial transactions, service contracts data | |
Hao et al. | Proprietary | Case Study Data: Financial transactions, service contracts data | ||
Bresciani and Eppler | Public/Proprietary | Case study from Gartner, Argument Map, Five Forces Process | ||
Bertschi | N/a | Critical Discussion of knowledge visualisation in business. No data used | ||
Ferreira et al. | Proprietary | Data provided by Taxi and Limousine Commission of New York City | ||
Keahey | Proprietary | Expert opinion data | ||
Liu et al | Proprietary | GPS trajectory data | ||
Ramesh et al. | Public | Data mined from "various sources". Presented for insight into the external operations of a business | ||
Business Intelligence | Business Ecosystem | Wattenberg | Public | Public stock market data |
Merino et al. | Public | Stock market data | ||
Otjacques et al. | Proprietary | Human resources data | ||
Wu and Phillips | Public | Public Dow Jones 30 data | ||
Basole et al. | Proprietary | Business ecosystem data | ||
Ko et al. | Proprietary | Generic Point of Sale data | ||
Basole et al. | Commercially and Publicly Available | The Thomson Reuters SDC Platinum database and Capital IQ Compustat database | ||
Basole | Proprietary | Case study: global supply chain data, competitive dynamics data, venture capital network data | ||
Deligiannidis and Noyes | Proprietary | Data obtained from US Department of Commerce Census Bureau | ||
Basole and Bellamy | Proprietary | Supply network Structure data | ||
Lu et al. | Public | Twitter data + IMDb | ||
Basole et al. | Proprietary | Three commercial datasets are used that cover finance, relationships, and public opinion | ||
Iyer and Basole | Proprietary | The visualisations use IoT data to show the "big players" in the technology industry | ||
Basole et al. | Proprietary | User study generated data looking at the effectiveness of different visual designs for decision support | ||
Schotter et al. | Proprietary | Investment data is used alongside geospatial data | ||
Basole et al. | Proprietary | Combination of multiple proprietary datasets including geospatial and commercial data | ||
Customer Centric | Customer Behaviour | Woo et al. | Proprietary | Audio data from customers in call centre |
Hanafizadeh and Mirzazadeh | Proprietary | Six-dimensional vector customer dataset | ||
Shi et al. | Proprietary | Generic search engine data | ||
Rodden | Proprietary | Private YouTube site navigation data | ||
Yaeli et al. | Proprietary | Digitally collected customer path tracking data | ||
Dou et al. | Proprietary | Survey conducted on Reddit.com | ||
Kameoka et al. | Proprietary | Dataset provided by industry partner – supermarket PoS data | ||
Nair et al. | Proprietary | Large customer behaviour dataset – unspecified origin | ||
Wu et al. | Proprietary | Telco data obtained from China's largest telecommunications operator | ||
Nagaoka et al. | Proprietary | Customer behaviour collected from digital devices | ||
Sijtsma et al. | Public | Twitter data mined to collect the customer experience and expectation of various retail stores | ||
Sathiyanarayanan et al. | Public | Email exchange at company level | ||
Customer Centric | Customer Feedback | Brodbeck and Girardin | Proprietary | Questionnaires distributed to customer of the public transport network |
Chen et al. | Public | Amazon.com reviews | ||
Ziegler et al. | Proprietary | Unspecified textual customer feedback data | ||
Oelke et al. | Public | Amazon.com reviews | ||
Wu et al. | Public | TripAdvisor data used | ||
Hao et al. | Public | Twitter data | ||
Saitoh | Proprietary | Web scraped customer review data | ||
Kang et al. | Proprietary | Combination of production and customer service data direct from manufacturer | ||
Fayoumi et al. | Proprietary | Web scraped social media data from Twitter | ||
Haleem et al. | Proprietary | Web scraped customer reviews | ||
Saga and Yagi | Public | Customer feedback collected from web crawler using specified keywords about the examined product |
3.2. Business Ecosystem (BE)
This section exemplifies literature that places emphasis on a complete business ecosystem from an external perspective without the specific focus on a single body within the ecosystem. These ecosystems can be collaborative, competitive environments, industries,
or stock market-based. The focus is often placed on an overview of the complete ecosystem. This classification is dominated by research from Rahul C. Basole, who specialises in Business Ecosystem visualisation, and is associated with eight papers
of the seventeen in this field.
Primary Data as Intentional, Active Research Study (BE): Here, we present a visualisation paper that performs a research study using business ecosystem data. The ecosystem is represented by stock market activity. Merino et al. presented a
user study evaluation of different visualisations and identified which are better suited to visualising large amounts of stock market data. This is done through the "Task-At-Hand" interface which offers a selection of visualisation techniques to display
the data and user options that incorporate brushing and linking.
Traditional visual designs range from bar charts and line graphs to pie charts and tables. These visualisation techniques are easy to understand but are limited in the depth of data that they can display. The Geometrically-transformed group contains visualisations
such as parallel coordinates and pyramid representations of data. These are useful for identifying trends in the data but, again, it can be challenging to interpret more complicated datasets.
Iconic displays map attributes for complex datasets. Glyph-based approaches depict a large number of data attributes; however, they are not appropriate for large datasets due to overplotting. Pixel-based designs offer the benefit of conveying a large
number of data points but present a challenge when positioning the data points being plotted. Stacked Display techniques are designed to present hierarchical data. Treemaps are often used for visualising data of this nature. The space-filling approach
enables useful analysis of a dataset; however, it is dependent on the algorithm used to generate the treemap as varying aspect ratios can have a negative impact on utility.
Each design category is assessed by a range of criteria. See Figure 13 for results. The visualisation technique categorisation was identified by Keim and the criteria by which the designs are judged were identified by Zhou et al.
Figure 13. The results table for the task evaluation. The "Charts" category scores highly, whereas the "Parallel Coordinates" category scores relatively poorly. Evaluation criteria such as "Correlations" score highly only in the "Charts" category. This shows the limitations of more elaborate visualisations, and the value of simplicity. Image courtesy of Merino et al.
Basole et al. performed a study that evaluates the effectiveness of visualisation methods on business ecosystem data. The three presentation types examined – list, matrix, and network – are the most popular forms of ecosystem visualisation. Seven criteria
were used to evaluate the visual representation: ease to learn, ease of use (beginner), ease of use (intermediate), speed of use, speed to learn, control over analysis, and flexible capabilities. See Figure 14.
Figure 14. The results of the study. Network outperforms the other two methods, but the matrix method provides a consistently quick and useful visual tool as well. Image courtesy of Basole et al.
The results show that the network outperformed the other two in almost every criterion other than ease to learn. Once the cost of training has been expended, the network is a far superior method of displaying business ecosystem data.
Hybrid Web-scrape (BE): In the following category, web-scraped data are used to visualise the business ecosystem. Lu et al. outlined methods that use visual analysis to predict box office success. Social media data are used to derive predictions,
ranging from Twitter data to Bitly link data mining.
To obtain customer data about the movies, tweet data are mined using keywords related to a given movie. In addition to this, IMDb is used to collect numerical data about the film. All data are collected two weeks before the release date. Both the
volume of tweets and the content are taken into account when analysing the tweet data. Sentiment analysis is performed on the text so that the film reception can be assessed and the qualitative data are quantified. A word cloud of the most commonly
used words where colour is mapped to sentiment enables users to see at a glance how positively the Twitter community ranks the film. A timeline visualisation shows the magnitude of positive or negative tweets over a two-week period before the release
of the film. This enables marketing campaigns to be measured through a new medium. Both film review scores and the film revenues were predicted from the data. Despicable Me 2 has a predicted score of 7.8 and an actual score of 7.9, whereas the predicted
five-day revenue for the film is $116.5 m and the actual five-day revenue is $143 m. This shows the score is far easier to predict than the revenue, but the system can calculate a rough estimate.
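The gap between the two predictions can be made concrete with a relative error calculation. The following is a minimal sketch using only the figures quoted above; the helper name `relative_error` is our own and does not come from Lu et al.

```python
def relative_error(predicted: float, actual: float) -> float:
    """Absolute prediction error expressed as a fraction of the actual value."""
    return abs(predicted - actual) / actual


# Figures reported in the text for Despicable Me 2.
score_error = relative_error(7.8, 7.9)        # review score
revenue_error = relative_error(116.5, 143.0)  # five-day revenue, $ millions

print(f"score error:   {score_error:.1%}")
print(f"revenue error: {revenue_error:.1%}")
```

Under this measure the score prediction is off by roughly 1.3%, whereas the revenue prediction is off by roughly 18.5%.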
Prediction models for the film industry have previously been worked on. The relationship between movie review and revenue is also examined. Asur and Huberman found that the volume of tweets relating to a film had a direct relationship with the revenue,
accounting for 80% of the variance in prediction.
Secondary Data as A Priori Database (BE): The literature in this section contains a number of visualisation research papers on the business ecosystem where the data are taken from pre-existing (a priori) databases. This is by far the most
common data source for an ecosystem visualisation paper. Wu and Phillips presented a visual design of the 2008 financial crisis. Using stock prices and news headlines on a timeline leading up until the crash, the user can identify relationships between
financial news headlines and the Dow Jones Industrial Average (DJIA).
The user is presented with a dashboard-style visual design that uses brushing and linking techniques to show different aspects of the finance evolution. The dashboard is made up of three components. The bubble motion chart depicts the influence of news
articles on the trading of stocks. This visual design projects a live animation to present the data. A radial plot displays the frequency by which important words are used in the headlines of financial news articles. Words that result in a positive
impact are drawn in green. The News Events Bar Chart shows a simple chart that indicates the number of news articles written about each of the chosen companies up until a given point.
Lux presented the first overview of financial visualisation, which Wu and Phillips extended. Merino et al. analysed different visualisation techniques for financial information to discover those most suited to the data. Keim et al. suggested that charts
are the most efficient method of presenting financial data.
Basole et al. presented an overview of the business ecosystem using the example of the mobile industry. To achieve the overview, Basole et al. presented dotlink360, an in-house visual analytics application that utilises data from pre-existing finance
and business databases as well as current news articles to explore a business's ecosystem. This involves exploring the connections between companies and the types of connections, the difference between companies in similar market positions, and how
these positions have changed over the years. See Figure 15.
Figure 15. The dotlink360 application. This network clusters firms by industry. The position of the nodes is based on the weight of their involvement within each of the industries listed around the perimeter. The user can select a firm to see their involvement with other companies within their ecosystem. Image courtesy of Basole et al.
Previous software that attempts to yield similar insight into the business ecosystem does not take into account the dynamic and complex data involved. This research takes a more holistic approach to the data analysis which encompasses the complex data
landscape "from core to periphery". Emphasis is placed on the user interaction design to ensure a practical use in the field. The dotlink360 incorporates a range of visual designs, namely a "periscope view" that maps the companies in a network that
are clustered by industry.
Few citations to previous related work are provided here, most notably Basole's visualisation of interfirm relationships in a converging ecosystem.
In a later paper, Basole et al. also presented methods of visualising relationships between firms within an industry that enable analysis of the surrounding business ecosystem. The networks that link businesses together are the focal point. By extension,
the designs convey a holistic competitive insight into the ecosystem beyond the traditional single market view. See Figure 16.
Figure 16. The Path view in the connectivity perspective of the software. Clusters of companies can be seen as market segments where the links between nodes signify the company agreements. Image courtesy of Basole et al.
The software primarily focuses on the agreement portfolios of an entity, defined as either a company, market segment, or country. An agreement is classed as an official interaction between two entities whereby a decision has been processed.
The connectivity perspective is the primary window that shows the set of connections between firms. See Figure 16. There are four types of view in the connectivity perspective: Path, a network view that shows connections between companies; Segment,
a view that depicts a company's position relative to its market segment (company focused); ScatterNet, a node-link diagram combined with a scatterplot to show company-to-company agreements; and Geography, a view that maps the physical location of
the companies on a map.
The foundation of research in this paper stems from the development of network analysis software that shows static imagery of organisational networks. The business process analysis component of the research has a foundation in previous business visualisation
tools. Knowledge management and discovery tools were also available but in separate software packages.
Schotter et al. explored international business relationships through the creation of interactive dashboard designs. Data are overlaid on a Google Maps view which represents the pathways in which Japan is expanding its business throughout Asia. This
interactive map shows the "communities" within a network, where nodes share a higher rate of connections than in other areas. Additionally, heat maps and hexbinning (tile-based heat maps) are also used to show the relationships between these communities.
The paper presents a range of visual designs that utilise the plotting of meta-data over a Google Maps window, or through standard visualisation methods such as matrix plots and chord diagrams. These dashboard-like designs focus on the nature of the relationships
between business entities, and the communities within an ecosystem.
Basole et al. presented ecoxight, an interactive network diagram displaying business ecosystem data. The network is constituted of two components, nodes and edges. Nodes represent either a company, API, or investor within an ecosystem, and edges
represent the relationships between the nodes. Each edge has a source, target and weight. User options enable interaction through data filtering and visual control of size, colour, and shape of nodes within the network. Multiple views enable the user
to explore the data from different perspectives.
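The node and edge model described above can be sketched as a small data structure. This is an illustrative sketch only: the class names, the `filter_edges` helper, and the sample firms are hypothetical and are not taken from the ecoxight implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # one of "company", "api", "investor"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    weight: float


def filter_edges(edges, nodes, kinds, min_weight=0.0):
    """Keep edges whose endpoints both have an allowed kind and whose
    weight is at least min_weight (a simple stand-in for data filtering)."""
    allowed = {n.name for n in nodes if n.kind in kinds}
    return [
        e for e in edges
        if e.source in allowed and e.target in allowed and e.weight >= min_weight
    ]


# Hypothetical sample data.
nodes = [
    Node("FirmA", "company"),
    Node("MapsAPI", "api"),
    Node("FundX", "investor"),
]
edges = [
    Edge("FirmA", "MapsAPI", 3.0),
    Edge("FundX", "FirmA", 1.0),
    Edge("FundX", "MapsAPI", 0.5),
]

visible = filter_edges(edges, nodes, kinds={"company", "api"}, min_weight=1.0)
print(visible)
```

Filtering on node kind and edge weight is only one plausible reading of "data filtering"; the visual controls for size, colour, and shape would sit on top of a structure like this.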
Secondary Data as a Business Process (BE): This subsection features two visualisation papers that focus on the business processes involved in the business ecosystem. Both use data that were not collected originally by the visualisation researchers
(secondary data). Basole utilised visualisation tools to enhance understanding of a business's position in the global market by exploring supply chain network process. The approach in creating the tool is to co-author the design with the corporate
users to ensure the use case is being adhered to.
Three scenarios are devised and tools are created to cater to each of their user requirements.
The global supply chain is an integral process to the success of a business. Poor optimisation of this process can wreak havoc on the finances of the business in the future. The tool depicts the risk involved in managing a global supply chain using a
network graph to show suppliers and the relationships between them. Shared suppliers are seen as a higher risk and are therefore highlighted. Through this visual design, we can clearly see what suppliers should be used to minimise the risk of chain
disruption. See Figure 17.
Figure 17. The force-directed network visualisation of ecosystem convergence. The nodes represent market segments and edges show inter-firm relationships between the segments. Image courtesy of Basole.
It is essential for corporate management to have a detailed and accurate view of the competitive market in which they are engaged. A force-directed network is used to achieve this. A user can view the market ecosystem as a whole which enables informed
decision making. Venture capitalists are highly sought after in the business world. Understanding the motivation behind the investments these people make is integral to gaining funding from them. Here, the previous force-directed network is adapted
to show venture capitalist activity within those markets.
Previously, corporate visualisation has been studied, but no focus is placed on intelligence tools.
As business globalisation increases, the process of supply chain management is becoming increasingly more complex. Large supply operations are often subject to delays and susceptible to extraneous environmental factors outside business control. Basole
and Bellamy provided insight into visual forms of risk analysis for supply chains. Network graphs are used to depict the risk across all supply chain visualisations. There are two main focus areas: firm level and industry level.
For firm-level analysis, the network graph places the firm at the centre of the layout and all firm connections are shown within a circle radius connected by relationship curves. See Figure 18. The connecting firm's dependants are then shown on the next circle radius. This continues until a map of the supply chain is built around the first firm. The industry-level visualisation places all firms in an industry as nodes around a circle. Relationship edges connect the firms and colours are used to map the risk level. Node size is linked to topological importance.
Figure 18. A three tier supply network of a major electronics manufacturer. The inner layer represents the primary supply dependants of the manufacturer, the middle layer represents the secondary dependants (primary dependants for the inner circle of dependants), and the third layer represents the tertiary dependants (primary dependants of the middle layer of dependants). Image courtesy of Basole and Bellamy.
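The ring assignment in the firm-level layout can be sketched as a breadth-first traversal outward from the focal firm. This is a minimal sketch under our own assumptions: the function name, the adjacency format, and the rule that a firm reachable by several paths sits on its innermost ring are illustrative and are not confirmed by Basole and Bellamy.

```python
from collections import deque


def supply_tiers(focal, suppliers_of):
    """Assign each firm to a ring: 0 for the focal firm, 1 for its direct
    dependants, 2 for their dependants, and so on (shortest distance wins)."""
    tier = {focal: 0}
    queue = deque([focal])
    while queue:
        firm = queue.popleft()
        for supplier in suppliers_of.get(firm, []):
            if supplier not in tier:  # first visit is the shortest path
                tier[supplier] = tier[firm] + 1
                queue.append(supplier)
    return tier


# Hypothetical supply network: firm -> list of its direct suppliers.
suppliers_of = {
    "Maker": ["S1", "S2"],
    "S1": ["S3"],
    "S2": ["S3", "S4"],
    "S3": ["S5"],
}

tiers = supply_tiers("Maker", suppliers_of)
print(tiers)
```

Here `S3` supplies both `S1` and `S2`, which is the kind of shared supplier that the earlier supply chain tool highlights as a higher risk.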
Decision support related research has been carried out in the field of network topologies. There is previous research investigating workflow visualisation, identifying influential nodes in topological networks, and identifying unhealthy supply networks.
Secondary Data as a Business By-Product (BE): In the following, business by-product data are used to depict the business ecosystem. Business by-product data often come in the form of transaction and sales records. Ko et al. created a visual
analytics system that draws a comparison of two competing businesses, displaying trends, growth rates, sales, etc. The software utilises PoS (Point of Sale) data to create the analysis and design. Visual features of MarketAnalyser can be seen
in Figure 19.
Figure 19. The main window of MarketAnalyser. The left panel contains all the filtering options and colour legends. The main screen space conveys the pixel-based comparisons, and the right side panel contains the geographical view. The bottom panel contains the stacked bar view and time sliders. Image courtesy of Ko et al.
A screen space saving layout is implemented to display sales, growth, and trends for each of the chosen companies. Filtering options enable the user to prioritise stores through a cumulative method of filtering companies using sales or trends. The user
can choose the most suitable company to compare with their data. Figure 19 shows the geographical view of the software. This enables the user to see the geographical sale locations for each company. The colour of each region shows
the direction of the trend in sales. The stacked graphs in Figure 19 can show relationships between different product purchases. It can highlight combinational trends of multiple products between businesses. Sliders are used to select time
intervals.
Market trends and financial forecasting are traditionally conveyed using the standard set of visualisation tools such as bar charts and line graphs. Treemaps are sometimes used to represent market data. The design of the display matrix utilises elements
from Keim's work on pixel-oriented visual designs.
3.3. Customer Centric Literature
This section presents the two-part customer centric literature. Here, we examine visualisation research that focuses on the customers involved in the business, as opposed to the business itself. Table 5 shows the number of research papers appearing
in each business classification of the survey by year. We can observe a general increase in the number of publications over time.
Table 5. This table shows a temporal classification of the papers included in the survey by year. Primary papers included in the survey are labelled in green and secondary papers are labelled in yellow. See Section 1.5 for a description of our primary and secondary papers. The secondary classification used in the survey is shown in Table 1.
Business Intelligence | Business Ecosystem | Customer Centric | |||
---|---|---|---|---|---|
Internal Intelligence | External Intelligence | Business Ecosystem | Customer Behaviour | Customer Feedback | |
1997 | Wright | ||||
1999 | Wattenberg | ||||
2003 | Gresh and Kelton Eick |
Brodbeck and Girardin | |||
2004 | Hao et al. | ||||
2005 | Burkhard | Woo et al. | |||
2006 | Vliegen et al. | Hao et al. | Merino et al. | Chen et al. | |
2007 | Keim et al. | ||||
2008 | Ziegler et al. | ||||
2009 | Otsuka et al. | Bresciani and Eppler Bertschi |
Otjacques et al. | Oelke et al. | |
2010 | Wu and Phillips | Wu et al. | |||
2011 | Sedlmair et al. | Basole et al. | Hanafizadeh and Mirzazadeh | ||
2012 | Kandel et al. Du et al. |
Ko et al. | |||
2013 | Aigner Broeksema et al. Bai et al. Lafon et al. |
Ferreira et al. | Basole et al. | Hao et al. | |
2014 | Nicholas et al. | Basole Deligiannidis and Noyes Basole and Bellamy Lu et al. |
Shi et al. Rodden Yaeli et al. |
Saitoh | |
2015 | Keahey | Basole et al. | Dou et al. Kameoka et al. Nair et al. |
||
2016 | Roberts et al. | Liu et al. | Iyer and Basole Basole et al. |
Wu et al. Nagaoka et al. Sijtsma et al. |
|
2017 | Ghooshchi et al. Kumar and Belwal Bachhofner et al. |
Ramesh et al. | Schotter et al. | Kang et al. Fayoumi et al. |
|
2018 | Lea et al. Roberts et al. |
Basole et al. | Sathiyanarayanan et al. | Haleem et al. Saga and Yagi |
3.3.1. Customer Behaviour (CB)
The sub-category of customer behaviour encompasses literature that focuses on profiling customers or potential customers in an attempt to observe or predict customer behaviour. This type of analysis has become progressively more popular in recent years
due to the availability of suitable data (see Table 5).
Primary Data as Intentional, Active Digital Collection (CB): The following research conveys geo-location data collected through hardware that is used to track customer behaviour. Yaeli et al. analysed the movement of customers shopping in
retail stores. The data are collected from mobile devices capable of WiFi, NFC, and Bluetooth in the target area. Analytical visual interfaces are used to explore the customer path in the retail store. The analysis aims to provide a better customer
in-store experience and to improve business decision making.
Customer value is derived by combining the motion data with other meta-data such as point of sale information. Customers who visit the store more frequently or make more significant purchases are clustered into a group with the goal of targeting these
customers more. Other segmentation criteria taken from the original tracking data outline the customer's path through the store, e.g., in a department store, some customers browse the entire inventory while others go directly to one section of the
store.
The data used in the design are department store data. A geospatial map of the store is drawn with each department labelled. Customers are given an entry point and then arrows depict the path that each customer makes through the store. Contrary to popular
belief, it is shown that low-value customers are more likely to walk at random through the store, whereas high-value customers take a direct and efficient path around the store. See Figure 20.
Figure 20. These designs show the path taken for typical high value and low value customers. It is immediately evident that high value customers are more methodical in their walking path, possibly with a specific product to purchase in mind. Low value customers are more likely to randomly choose the next route direction. Image courtesy of Yaeli et al.
In-store path dependency analysis has been widely researched. However, there is no noted related work for the visualisation of these data.
Primary Data as Intentional, Active Research Study (CB): This subsection presents visualisation research that explores customer behaviour through research-based user studies – primary data collected by the visualisation researchers themselves.
Dou et al. presented an analytics system that provides insight into social, economic and behavioural issues through demographic analysis. An online survey is used to collect textual data along with the correct demographic information about the respondent
so that the predictive visualisations can be tested. The software utilises a number of visualisation methods and their effectiveness is analysed and ranked based on their ability to predict demographic information.
The ultimate aim of this system is to answer three questions: What are different demographic groups interested in? Which demographic groups share interests? Can we successfully classify online users into the correct demographic?
To address the first question, a parallel set diagram is combined with a word cloud such that the user can see a customer segment or demographic (shown with the parallel set) alongside their interests (shown with the word cloud). The user can select a
demographic in the parallel set and the word cloud will update with that segment's interests. See Figure 21a. The cluster view maps each demographic against their interests to see if there is any overlap between demographic and interests. See Figure
21b. A tabular view shows the most common textual attributes associated with each demographic variable. This addresses the third question.
Figure 21. (a) Parallel set + word cloud visualisation; and (b) the User Cluster visualisation. Image courtesy of Dou et al.
Parallel sets are often used to visualise categorical or demographic data. Previous research focuses on the visualisation of text data without the linking of demographic data alongside them.
Hybrid Web-scrape (CB): The research in this section uses web-scraped data to derive and study customer behaviour.
Shi et al. presented a visual analytics system that tracks user search engine loyalty and the behaviour of users switching between different search engines. A new visualisation method "Flow View" is presented that develops its structure based on a flow
metaphor. A density map and a word cloud are also used to further the understanding of the customer loyalty behaviour. See Figure 22.
Figure 22. The Flow View of the LoyalTracker software. The depth of each layer represents the level of loyalty the users portray. Curved lines are used to convey the inflow and outflow of customers for each loyalty layer. The flow shows a week's worth of data for each cycle in the visual design. Image courtesy of Shi et al.
Inspired by an infographic from Randall Munroe's XKCD webcomic, the flow view presents user loyalty by separating the flow out into different degrees of loyalty. The transitions between these layers show the trending changes in customer loyalty. Highlighting the
link between loyalty and satisfaction, the density map plots loyalty on the x-axis and satisfaction on the y-axis. A word cloud displays the keywords searched in each search engine where transitions between search engines can be seen.
Web behaviour visualisation is a mature research topic. Trees have also been used to depict these data. Search engine switching behaviour has been widely researched, though Randall Munroe's webcomic XKCD inspired the visual design itself.
Sijtsma et al. presented TweetViz, a tool that utilises Twitter data for the purposes of customer feedback and business intelligence. A dashboard approach is taken that visualises the geographical location of the business alongside the tweet feedback.
The user can select what company to view, and then all of the stores from that company are marked on the map. The sentiment of the tweets about these stores is computed and displayed to the user through the colour of a marker on the map. This system
enables the user to quickly find "problem stores" and then read the Twitter feedback of the customers who have been there. User options provide filtering of customer demographics and competitor comparisons.
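The mapping from tweet sentiment to marker colour can be sketched as a per-store aggregation. This is a hedged illustration only: the sentiment scores in [-1, 1], the averaging step, the threshold, and the colour names are our assumptions and are not details reported by Sijtsma et al.

```python
from collections import defaultdict
from statistics import mean


def marker_colours(scored_tweets, threshold=0.2):
    """Average the sentiment scores per store and map the average to a
    marker colour. Scores are assumed to lie in [-1, 1] (hypothetical)."""
    by_store = defaultdict(list)
    for store, score in scored_tweets:
        by_store[store].append(score)

    colours = {}
    for store, scores in by_store.items():
        average = mean(scores)
        if average > threshold:
            colours[store] = "green"   # mostly positive feedback
        elif average < -threshold:
            colours[store] = "red"     # candidate "problem store"
        else:
            colours[store] = "grey"    # neutral or mixed feedback
    return colours


# Hypothetical (store, sentiment score) pairs.
scored_tweets = [
    ("Store A", 0.8), ("Store A", 0.4),
    ("Store B", -0.6), ("Store B", -0.2),
    ("Store C", 0.1),
]

colours = marker_colours(scored_tweets)
print(colours)
```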
Twitter data have often been used for sentiment and business analytics purposes; the novelty of this application is that it collects customer opinions of physical stores and visualises the sentiment behind them, removing the need for a review to be explicitly
written.
Secondary Data as A Priori Database (CB): This subsection contains research that derives and depicts customer behaviour from pre-existing secondary databases. Woo et al. presented a method of conveying customer targeting data using a heat
map. This method depicts value distribution across customer needs and characteristics which helps the planning of a customer-oriented business strategy.
A pre-existing database appears to have been used for this research.
A customer need is interpreted as any voiced opinion about a product. This can be represented linearly by taking the volume of expressed opinions without a positive or negative correlation. The customer characteristic refers to a linear scale that can
quantify a customer. Using a number of metrics, customers are placed on a single sliding scale. The customer map is drawn on a 2D plane with one axis representing a key customer characteristic and the other a key customer need. Each customer will
occupy their own space on the x–y grid and therefore a heat map can be created to see where the customers fall. Clusters of customers can be identified as well as trends in the data.
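The construction of the customer map can be sketched as two-dimensional binning. The sketch below assumes that both the customer characteristic and the customer need have already been scaled to the range [0, 1]; the function name, the bin count, and the sample values are hypothetical and are not taken from Woo et al.

```python
def customer_heat_map(customers, bins=4):
    """Count customers per cell of a bins x bins grid.

    customers: iterable of (characteristic, need) pairs, each in [0, 1].
    Returns grid[row][col], where row indexes the need axis and col
    indexes the characteristic axis.
    """
    grid = [[0] * bins for _ in range(bins)]
    for characteristic, need in customers:
        col = min(int(characteristic * bins), bins - 1)  # clamp value 1.0
        row = min(int(need * bins), bins - 1)
        grid[row][col] += 1
    return grid


# Hypothetical customers as (characteristic, need) pairs.
customers = [
    (0.10, 0.20), (0.15, 0.22),                 # low characteristic, low need
    (0.80, 0.90), (0.85, 0.95), (0.90, 1.00),   # high characteristic, high need
    (0.50, 0.10),
]

grid = customer_heat_map(customers)
for row in reversed(grid):  # print with the need axis increasing upwards
    print(row)
```

Cells with high counts correspond to the clusters of customers that the heat map is intended to reveal.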
Customer segmentation visualisation methods previously used neural networks to draw a self-organising map. Mulhern suggested a framework that emphasises the importance of segment-based target marketing as calculated by visually inspecting profit curves.
Hanafizadeh and Mirzazadeh presented methods for displaying market segmentation data to inform marketing strategies. Utilising a pre-existing database, clustering techniques alongside self-organising maps are used to segment customers into behavioural
demographics. Each cell of the SoM grid is coloured using a combination of RGB values which are derived from the attributes associated with the customer such as education level or income. These customer nodes are arranged into clusters, isolating
groups of similar purchasers. This segmentation enables the user to see distinct groups of customers and the motivation behind their purchases.
The self-organising map was introduced by Kohonen in 1981 and is used to present data of high dimensionality. The maps have previously been used to present business-related data.
The self-organising map was more recently explored by Kameoka et al. who continued this area of research by visualising customer segmentation data, as opposed to market segmentation data. The dataset that was used contained over 100k members of a
Japanese supermarket loyalty scheme. The SoM clusters loyalty club members into unique categories which can be used to more effectively market selected products to the different groups.
Wu et al. presented "TelCoVis", a visual analytics system that highlights behavioural patterns in potential customers through data obtained from China's largest telecommunications company. This system focuses on co-occurrence (people from two regions
visiting the same urban space during the same time span). See Figure 23.
Figure 23. The contour-based treemap view: (a) the circular segmentation of time throughout the day; (b) the temporal contour of visits to a given location; and (c) each sector represents a home location of the visitors during the time frame. Image courtesy of Wu et al.
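Co-occurrence as defined above (people from two regions visiting the same urban space during the same time span) can be sketched as a counting step over visit records. The record format, the function name, and the sample data are hypothetical; this is not the TelCoVis pipeline.

```python
from collections import defaultdict
from itertools import combinations


def co_occurrence(visits):
    """Count, for each unordered pair of home regions, the number of
    (place, time slot) combinations in which both regions were present.

    visits: iterable of (home_region, place, time_slot) records.
    """
    present = defaultdict(set)
    for home, place, slot in visits:
        present[(place, slot)].add(home)

    counts = defaultdict(int)
    for regions in present.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(regions), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical visit records: (home region, place visited, hour of day).
visits = [
    ("North", "Mall", 9), ("South", "Mall", 9),
    ("North", "Park", 9),
    ("South", "Mall", 10), ("East", "Mall", 10), ("North", "Mall", 10),
]

pairs = co_occurrence(visits)
print(pairs)
```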
They used two visual designs to convey these data. The first is a contour-based treemap view that divides radial space into time segments and renders a contoured treemap within the space, representing the distribution of visitors in that area. The second is a geospatial
heat map that shows the flow of people into and out of a given area. Domain expert feedback attests to the usefulness of these designs in the context of business intelligence.
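The notion of co-occurrence used by TelCoVis (people from two regions visiting the same urban space during the same time span) can be sketched as a simple counting step. The record layout, the hourly time span, and the function name below are assumptions for illustration and do not reflect the system's actual data model.

```python
from collections import defaultdict
from itertools import combinations

def region_cooccurrence(visits):
    """Count, for each pair of home regions, the (place, hour) slots they share.

    visits: iterable of (home_region, place, hour) tuples (hypothetical layout).
    """
    regions_present = defaultdict(set)
    for home_region, place, hour in visits:
        regions_present[(place, hour)].add(home_region)

    counts = defaultdict(int)
    for regions in regions_present.values():
        # Every unordered pair of regions seen in the same slot co-occurs once.
        for a, b in combinations(sorted(regions), 2):
            counts[(a, b)] += 1
    return dict(counts)

visits = [
    ("north", "mall", 9),
    ("south", "mall", 9),
    ("north", "park", 9),
    ("south", "mall", 10),
    ("east", "mall", 10),
    ("north", "mall", 10),
]
cooccurrence = region_cooccurrence(visits)
# north and south share the mall at both 9 and 10, so their count is 2.
```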
The visualisation of mobile phone data is an established area of research. Deville et al. presented population mapping, and the data have even been studied for the purpose of fire and rescue services. Work on movement behaviour often employs heat map-like
rendering to address the geospatial data as well as the intensity data.
3.3.2. Customer Feedback (CF)
This subsection discusses the visualisation of feedback data. The papers focus on aspects of the customer experience such as satisfaction or opinions. These visual insights aim to adapt the business product or service to better suit customer expectations.
Primary Data as Intentional, Research Study (CF)
This subsection presents a paper that uses interview studies to visualise customer feedback, with the data collected by the visualisation researchers. Broadbeck and Girardin presented a visualisation
tool that uses parallel coordinates combined with a tree structure to analyse customer feedback data.
The design of the tool maps the data to parallel coordinates and then implements a selection system that converts the customer survey data to a hierarchical structure by clustering questions by dimensions. The user can select subsets of the data according
to these dimensions. The three dimensions are quality criteria (questions), quality dimensions (segmented), and indices (individual data records). Primarily, the contribution lies in the combination of the data selection tool with the traditional
parallel coordinate diagram and the creation of an easy-to-use visual analytics system. See Figure 24. The tool primarily uses parallel coordinates but introduces a hierarchical tree structure to the data. With the two views used together, brushing and
linking are also incorporated.
Figure 24. The three parallel coordinates represent each layer of the hierarchy. The file system tree-view below shows the data selection system where the user can choose subsets of the data to visualise. Image courtesy of Broadbeck and Girardin.
Hybrid Web-scrape (CF)
In the following, many research papers use web-scraped data to visualise customer feedback.
Ziegler et al. presented a system that analyses textual customer feedback data from an unspecified online feedback website. The system uses clustering techniques to analyse short segments of data-mined text to provide a quantitative context for customer
feedback. Treemaps are then used to convey these data and the nodes are automatically labelled. See Figure 25.
The top view of the visual layout provides an overview of the most frequently cited topics and issues by presenting them in clustered treemap nodes. Colour is mapped to sentiment and the size of the nodes is mapped to the volume of feedback within a given
cluster. Treemap nodes are formed from clusters of similar feedback. These nodes are labelled by the groups of words found within the clusters. To ensure that the labels are legible, the orientation of the label is rotated to match its longest axis.
Twenty-three people were shown the system and were asked if they were more confident with the treemap or being presented the data in a list format. Fifteen opted for the treemap while six indicated they preferred lists. Two users were indifferent. This
supports the use of visual interfaces in a business data analysis environment. The treemap design and layout are taken from Bederson et al. The textual analysis and clustering are based on the work of Ponte et al.
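The visual mappings described above (node size from feedback volume, colour from sentiment, and label orientation following the longer side of the node) can be sketched as follows. This is a deliberately simplified single-level slice layout, not the Bederson et al. layout used by Ziegler et al.; the cluster data, the red-to-green colour ramp, and all function names are assumptions.

```python
def sentiment_colour(sentiment):
    """Map a sentiment in [-1, 1] to an (r, g, b) tuple from red to green (assumed ramp)."""
    t = (max(-1.0, min(1.0, sentiment)) + 1.0) / 2.0
    return (1.0 - t, t, 0.0)

def slice_layout(clusters, x=0.0, y=0.0, width=1.0, height=1.0):
    """Split a rectangle into strips whose areas are proportional to feedback volume."""
    total = sum(c["volume"] for c in clusters)
    nodes = []
    offset = 0.0
    for c in clusters:
        share = c["volume"] / total
        if width >= height:
            rect = (x + offset * width, y, share * width, height)   # vertical strips
        else:
            rect = (x, y + offset * height, width, share * height)  # horizontal strips
        offset += share
        w, h = rect[2], rect[3]
        nodes.append({
            "label": c["label"],
            "rect": rect,                      # (x, y, width, height)
            "label_rotated": h > w,            # rotate the label along the longer side
            "colour": sentiment_colour(c["sentiment"]),
        })
    return nodes

# Hypothetical feedback clusters.
clusters = [
    {"label": "delivery delay", "volume": 50, "sentiment": -0.8},
    {"label": "helpful support", "volume": 30, "sentiment": 0.6},
    {"label": "pricing", "volume": 20, "sentiment": -0.1},
]
nodes = slice_layout(clusters, width=2.0, height=1.0)
```

A production treemap would recurse into nested clusters and aim for better aspect ratios; the sketch only demonstrates the three visual mappings.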
Oelke et al. presented an approach to visually analyse large quantities of customer review data scraped from online sources. The research takes a holistic approach to calculate the customers' opinions, breaking down each component attributed to the
product and producing an overall product score. The visual designs enable quick overviews to be made of the data as well as clustered comparisons of similar reviews.
The visual summary reports offer an overview of the customer feedback. Using a matrix grid that compares multiple products against a range of features, the design enables the viewer to look at either the most favourable product or their most valued attribute
to inform their purchasing decision. See Figure 26. The cluster analysis uses 2D scatterplots to categorise the reviewers. The table sizes vary depending on the number of people within a cluster.
Figure 26. The visual summary report. Each row represents a product (in this case, printers), and each column represents an attribute of the product. The internal square in each of the matrix compartments shows how many reviewers commented on this attribute. Image courtesy of Oelke et al.
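The data behind such a summary matrix can be sketched as a simple aggregation: one cell per product and attribute, holding an average opinion score and the number of reviewers who commented. The score scale, the example mentions, and the assumption of one mention per reviewer per attribute are illustrative and are not taken from Oelke et al.

```python
from collections import defaultdict

def summarise_reviews(mentions):
    """Aggregate (product, attribute, score) mentions into matrix cells.

    Returns {(product, attribute): {"mean_score": float, "reviewers": int}}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for product, attribute, score in mentions:
        cell = totals[(product, attribute)]
        cell[0] += score   # running sum of opinion scores
        cell[1] += 1       # number of reviewers commenting on this attribute
    return {
        key: {"mean_score": total / count, "reviewers": count}
        for key, (total, count) in totals.items()
    }

# Hypothetical mentions with scores in [-1, 1].
mentions = [
    ("printer_a", "print quality", 1.0),
    ("printer_a", "print quality", -1.0),
    ("printer_a", "price", 1.0),
    ("printer_b", "price", -1.0),
]
matrix = summarise_reviews(mentions)
```

In a rendering of this matrix, `mean_score` would drive the cell colour and `reviewers` the size of the internal square.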
Chord diagram and parallel coordinate hybrid plots are used to show a detailed view of the data. The hybrid can be used to highlight correlations between different attributes of the customer feedback. The left half of the chord circle shows the values
for each product attribute and the right half shows the product score. Edges that connect the two join via a centre axis that represents each product. Filtering processes enable the user to identify trends and observe the customer opinion of each
product.
Previously, Gamon et al. presented Pulse, a clustered visualisation technique for displaying customer review data. This calculates the average review per cluster and incorporates a treemap. Gregory et al. focused on sentiment analysis but did not only
calculate a positive or negative outcome. Instead, they identified more detailed thoughts such as pleasure, pain, power, conflict, etc. These data are rendered primarily through radial plotting methods such as the rose plot.
Wu et al. presented OpinionSeer, interactive software that presents customer feedback of hotels. The opinion data are mined from Trip Advisor, a popular venue review website. The main focus of this visualisation is the opinion wheel. See Figure 27.
Figure 27. (a) The OpinionSeer temporal rings at different scales. Each layer represents a year and the colour depicts the feedback. (b) The reviewer location linked in with the time of year the review is made. The user selects the location from the outside ring to see the feedback from that country. Image courtesy of Wu et al.
The vertices of the opinion triangle represent the reviewer's disbelief, uncertainty, and belief. The three component values are weighted so that one coordinate can be plotted to represent the value for all the variables. This triangle lies at the centre
of the OpinionSeer. The software offers multiple options for the ring of the radial layout. Some discretise the ring into sections representing the age demographic of the reviewer and then colour the section according to the average scores by the
reviewers. Some use a layering system to present the same information and variations of these to present the demographic of reviewers along with the review data. The interactive software is designed such that users may analyse the data at different
levels of detail. The user can focus on one aspect of the hotel service and compare these, or rather look at the hotel overview data. In addition to the opinion triangle and rings, a word cloud is used to analyse the textual customer feedback data.
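Plotting three weighted components as a single point inside a triangle is commonly done with barycentric coordinates. The sketch below shows that general technique only; the vertex positions, the normalisation step, and the function name are assumptions and may differ from the weighting actually used in OpinionSeer.

```python
import math

# Assumed vertex positions of an equilateral opinion triangle.
VERTICES = {
    "disbelief": (0.0, 0.0),
    "belief": (1.0, 0.0),
    "uncertainty": (0.5, math.sqrt(3) / 2),
}

def opinion_point(belief, disbelief, uncertainty):
    """Return the (x, y) position of an opinion inside the triangle.

    The three non-negative components are normalised to sum to one and used
    as barycentric weights on the triangle's vertices.
    """
    total = belief + disbelief + uncertainty
    if total <= 0:
        raise ValueError("at least one component must be positive")
    weights = {
        "belief": belief / total,
        "disbelief": disbelief / total,
        "uncertainty": uncertainty / total,
    }
    x = sum(weights[name] * VERTICES[name][0] for name in VERTICES)
    y = sum(weights[name] * VERTICES[name][1] for name in VERTICES)
    return x, y

certain_believer = opinion_point(1.0, 0.0, 0.0)   # lies on the belief vertex
balanced = opinion_point(1.0, 1.0, 1.0)           # lies at the triangle's centroid
```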
Previous work in the field often uses standardised bar charts to present customer feedback data. Scatter plots have been suggested to present positive/negative customer data. Similar software systems have been created but are less flexible and versatile.
Hao et al. utilised the abundant resource of social media feedback by analysing the customer reception of products worldwide. The customer sentiment is analysed from the text data and then visualised using a geo-temporal map.
The data are mined from Twitter due to the ease of access, the quality of meta-data such as location, the subject availability from hashtags, and the concise messages limited to 140 characters. Given that the text streams are unpredictable, each noun or compound-noun
in the message is run through sentiment analysis algorithms and an average sentiment value is derived for each message. Once the sentiment is quantified, the values can be mapped (depending on the quality of information) to a geo-spatial heat map.
In addition to the geo-map, a matrix plot-like view can visualise the location against the days in a month. This temporal view can show the change in product sentiment over time.
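The per-message averaging and the location-by-day view can be sketched as below. The lexicon lookup stands in for the sentiment analysis algorithms used by Hao et al., which are not detailed here; the lexicon values, the pre-extracted noun lists, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical noun-level sentiment scores in [-1, 1].
LEXICON = {"battery": -0.5, "screen": 0.75, "camera": 0.5, "price": -0.25}

def message_sentiment(nouns, lexicon=LEXICON):
    """Average the scores of the nouns found in one message; None if none are known."""
    scores = [lexicon[noun] for noun in nouns if noun in lexicon]
    return mean(scores) if scores else None

def location_day_matrix(messages):
    """Average message sentiment per (location, day) cell of the matrix view.

    messages: iterable of (location, day_of_month, nouns), with nouns already extracted.
    """
    cells = defaultdict(list)
    for location, day, nouns in messages:
        score = message_sentiment(nouns)
        if score is not None:          # skip messages without any scored noun
            cells[(location, day)].append(score)
    return {key: mean(values) for key, values in cells.items()}

messages = [
    ("UK", 1, ["battery", "screen"]),
    ("UK", 1, ["camera"]),
    ("US", 2, ["price"]),
    ("US", 2, ["weather"]),
]
matrix_view = location_day_matrix(messages)
```

Each cell of `matrix_view` corresponds to one location on one day, so a sequence of cells along a row would show how sentiment changes over the month.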
The analysis of Twitter data is thoroughly studied, as is sentiment analysis from text data. Popescu and Etzioni proposed a noun filtering method for text analysis so that message noise is removed. More semantic rules were developed to improve the analysis
quality from the data. Feature-based sentiment visualisation was developed for customer feedback streams and forms the basis of Hao et al.'s research. Saga and Yagi researched network visualisation of customer expectation through use of a web
crawler that collects feedback data from the search engine Bing. The results are automatically collated into an expectation network which presents words from the feedback as nodes in the network and the connecting edges represent the relationships
between the words. The case study in this paper explores customer expectations for coffee products, and how different segments of customers prioritise different qualities in the coffee such as bitterness or richness. The network provides a holistic viewpoint
from multiple customer perspectives.
4. Discussion and Observations
Throughout this survey, we examine the trends and driving forces behind business-oriented visualisation. We identify a range of varying classifying features for the research (see Table 1 and Table 5). Ultimately, the goal for businesses
is to generate a profit. This can be done through improving the efficiency of internal processes (internal intelligence), identifying the actions of competition (external intelligence and business ecosystem), or improving business-customer relationships
and therefore increasing sales (customer feedback and customer behaviour).
The most popular data source was pre-existing databases showing business ecosystem data. Rahul C. Basole, a significant contributor to this field, is associated with over half of the papers in this collection.
The second most popular primary data source was used in the field of internal intelligence, demonstrating the relative affordability of internally generated data for the purpose of research. This is often qualitative study data that evaluate certain
operations of the business. Internal operations are more easily accessible than external operations, and so produce more research.
Customer-centric visualisation literature has seen a shift from customer feedback to customer behaviour research (see Table 5). Prior to 2011, five customer feedback visualisation papers were published in contrast to just one customer behaviour visualisation
paper. Post-2011, just one customer feedback visualisation paper was published and six customer behaviour visualisation papers were published. We speculate that this decrease in feedback-driven analysis and increase in behaviour-driven analysis
is strongly related to the increasing availability of GPS data from devices such as smartphones. This idea is reinforced by the timeline over which smartphone GPS data started to be utilised in research. The benefit of tracking customer behaviour
over collecting customer feedback is that it provides an unbiased view of the consumer. Response bias can skew analysis and warp the decisions made on the basis of subjective data.
Our data classification reflects the business ethos of cost reduction. Primary data are expensive to collect, especially on a large scale. Even in the customer-centric fields where accurate, up-to-date information on the consumer is essential for successful
business operations, we find that researchers instead opt for web-scraped data, sacrificing data quality for data quantity. This sacrifice in quality for quantity might be attributed to the advancements in big data utilisation. Instead of running
costly studies and questionnaires that detail the thoughts and feelings of a smaller number of potential customers, a business can base its decisions on lower quality feedback from a large number of potential customers, assuming that the data are
representative of useful information and knowledge.
In the case of web scraping customer feedback, many data already exist publicly on the Internet. Creating a web scraper automates the collection and organisation process, substantially reducing the cost. While the quality may not be as high as primary
data, the quantity often compensates for that. No secondary data sources are used in the customer feedback classification, presumably due to the time-critical nature of customer interactions. A company would not wait until the data already exist
to analyse their target market. In addition, the nature of these data is very niche, i.e., it is likely that the data would only be of use to one company.
Web-scraped data are seldom used for business intelligence. This is interesting as it highlights the relative distrust companies have for these data regarding the operations of their business. Primary data would be trusted the most, and secondary data sources
can at least be validated. However, the unknown nature of online data is enough to prevent companies from deriving actionable intelligence from it unless the data are in the form of customer feedback.
Secondary data sources are far more popular across most fields, particularly in business ecosystem research, which overwhelmingly uses pre-existing databases. This is presumably due to the ease with which existing databases can be accessed, but also due to the broad
utility of business ecosystem datasets (see Table 5). The data are often used as a case study or proof of concept for visualisation techniques where the emphasis is placed on the visualisation techniques and not the business insight. However,
it is still important to observe how the business data are being utilised through visualisation. If the visual design is successful in presenting the data in a meaningful way, then insight should naturally follow.
Table 1 highlights some gaps and trends in the data sources used in the field of business visualisation. The most notable trend shows that customer feedback data are heavily dependent on web-scraped sources. The table also shows a significant proportion
of the data used in the research is not publicly available. While this may seem disappointing at first, it suggests that the data were explicitly made available to academic researchers or the research was performed by professionals in industry. This
increases our confidence that the visual designs were created with the goal of business insight in mind as the work was, at least in part, collaborative.
We planned a classification that identified the target user for each visualisation. However, the majority of the research did not specify who would benefit from interpreting the visual designs. Assumptions could be made, but not without some subjective
guesswork. This is interesting because an ill-defined target audience suggests that the research is mostly experimental and in its early stages of maturity. In turn, this suggests that, if successful, the field of business visualisation could experience more
growth in the future once the effectiveness of the field has been validated.
Other attempted classifications were afflicted by similar issues. We also looked at the different industries with which the data were associated. However, few trends were available using this taxonomy and often the industry was difficult to classify due to vague descriptions, possibly because businesses wanted to keep details of their data undisclosed. We also attempted a "visualisation type" classification that based the structure of the survey on the visualisation techniques used in each paper. Though the taxonomy was not particularly useful, we observed that older research was more inclined to use 3D visualisations, with more recent publications focusing on two-dimensional graphics, although the data are not conclusive. This topic was discussed at the IEEE VIS 2014 conference during the "2D vs 3D" panel.
We already observe an increase in visualisation adoption (see Table 5 and Figure 28) whereby newer publications are being published in non-visualisation journals, but still focus on visualisation. This is a reliable indicator of adoption as business-oriented journals are publishing visualisation-centred papers.
5. Future Work
Among the most recently published of the literature, both Roberts et al. and Wu et al. visualised telecommunications data. The future work discussions in each of these papers emphasise the need for more features to be added to their software
in an attempt to extract all the value from the data. Each additional useful feature provides the user with an advantage over their competition. This desire for feature-rich software is prevalent throughout the literature. We also observe the desire for deeper,
more meaningful data evaluation in the future work of the field – quantitatively evaluating the current visual features and their impact on real-world scenario data. The final outstanding action noted in the future work is to improve the current features
that have been implemented.
Another interesting classification of the literature presented here could be based on user-tasks. We have identified this as future work. We also believe that a survey of machine learning techniques combined with visualisation is a promising direction for future work given the rise in popularity of machine learning. At present, commercial breakthroughs appear to be kept secret to provide a competitive edge. However, as the field evolves, it would make a good survey topic in itself.
These directions for future work highlight the differences between the academic and business agenda. Businesses will focus on the development of software with maximised utility, whether that involves focusing on current features and refining them or adding new features that complement the existing system. Academics place more emphasis on brand new ideas as opposed to finding the perfect existing solution to data problems. Therein lies the dichotomy of interest. We also note the lack of literature focusing on company performance. This could be due to the confidential nature of business performance data, which companies want to keep away from the public. We believe this is a direction rich in future work. As the availability of data increases, businesses will be more inclined to utilise the analytical potential of data visualisation.
As the field of data visualisation matures, we can expect more businesses to begin the adoption process. Early adopters of visualisation such as IBM will enjoy a competitive advantage over the later adopters. As the adoption rate increases, the field should advance at a faster rate due to higher levels of popularity and interest.
6. Conclusions
In this survey, we present a unique overview and insight into the development of business-oriented visualisation in academic research and beyond. We use a novel classification of literature that enables readers to understand the state-of-the-art in this field comprehensively. We have outlined trends in the research and discussed the potential future of visualisation within the business world. The survey provides a valuable resource for both the visualisation community and businesses interested in adopting visualisation approaches.