Saturday, August 29, 2015

Data Visualization

We have seen that the most useful graph type cannot be derived from the characteristics of the data alone. The message of the graphical presentation must be identified first, so that the graph matches the goal of the user. As discovered earlier, the goal of the user depends on the phase in the decision process and the state of the data. Furthermore, the effectiveness of each graphical technique depends on how well an observer can perceive it (Mackinlay, 1986).

Optimal graphical presentation


The process by which one moves from data to the optimal graphical presentation involves three successive steps as depicted in the figure below.



Graphical presentations show values compared to other values. The kind of message the information system transmits to the user leads to one of five basic kinds of comparison (Zelazny, 1996):

  • Component: percentage of total
  • Item: ranking of items
  • Time series: changes over time
  • Frequency distribution: items within ranges
  • Correlation: relationship between variables
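As a minimal sketch, the five kinds of comparison can be held in a lookup table that returns candidate chart forms. The pairings below follow a common reading of Zelazny's chart forms and are an assumption on my part, as are the names CHART_FORMS and candidate_charts.

```python
# Hypothetical lookup from comparison kind to candidate chart forms.
# The pairings are an assumed reading of Zelazny, not a definitive mapping.
CHART_FORMS = {
    "component": ["pie"],
    "item": ["bar"],
    "time series": ["column", "line"],
    "frequency distribution": ["column", "line"],
    "correlation": ["bar", "dot"],
}


def candidate_charts(comparison):
    """Return the candidate chart forms for one kind of comparison."""
    key = comparison.strip().lower()
    if key not in CHART_FORMS:
        raise ValueError("unknown kind of comparison: " + comparison)
    return list(CHART_FORMS[key])
```

A table like this only narrows the choice. Which kind of comparison applies still has to come from the message.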

The state of the data


The message a graph could transmit depends on the decision phase the user is in, the state of the data and which measure and dimension types are available. However, at this time, I cannot make a valid and complete mapping to derive which message (and hence which type of comparison should be made) is appropriate at a given time. Formulating a message still requires human interpretation and processing. Consider the following information, which displays the percentage of January sales by product line.

Product line   Region A   Region B
Accounting     13%        39%
Education      35%        6%
Law            27%        27%
Tax            25%        28%

Table 9: January sales by product line and region

6 messages


This table of figures can be translated into at least six messages. Displayed as two pie charts, the data emphasizes that the mix of sales differs between regions A and B. Displayed as two horizontal bar charts, it stresses that the figures for regions A and B vary by product line. However, which one is appropriate? It depends on the message one wants to convey to the intended audience. In the above table, we can see that the first two product lines, ‘Accounting’ and ‘Education’, show the most variation (13% versus 39% and 35% versus 6%). The other two product lines are nearly equal across the two regions. In order to select the correct message and the corresponding optimal graphical presentation, the user would have to indicate which aspect of the figures interests him most. Since this requires an explicit action from the user (which should be avoided), we could sidestep this problem by attempting to apply some kind of intelligence.
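One way to approximate which aspect of the figures stands out is to rank the product lines by the gap between the two regions. The sketch below uses the January percentages from the table. The function names and the threshold of 10 percentage points are assumptions for illustration only.

```python
# January sales share (%) per product line: (Region A, Region B).
SALES = {
    "Accounting": (13, 39),
    "Education": (35, 6),
    "Law": (27, 27),
    "Tax": (25, 28),
}


def rank_by_variation(sales):
    """Sort product lines by the absolute gap between regions, largest first."""
    gaps = {line: abs(a - b) for line, (a, b) in sales.items()}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


def notable_lines(sales, threshold=10):
    """Keep the lines whose gap reaches the threshold (an assumed cut-off)."""
    return [line for line, gap in rank_by_variation(sales) if gap >= threshold]


print(rank_by_variation(SALES))
# [('Education', 29), ('Accounting', 26), ('Tax', 3), ('Law', 0)]
print(notable_lines(SALES))
# ['Education', 'Accounting']
```

With these figures, a message about variation would center on Education and Accounting, while Law and Tax would support a message about similarity.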

Which presentations are useful


This kind of intelligence will first apply the expressiveness criteria in order to filter out those graphical presentations that are not valid representations of the data. Secondly, to determine the message, the system could favor variations over equal values, and could also take into account whether the equal values have been historically stable and have already been observed by the user. Thirdly, the mode the user is in can be used to determine which presentations are useful. In addition, the intelligent system provides the user with a few extra graphs, shown at random locations and at random times. I expect that a graph type that reveals a message useful to the manager will be observed for longer than a graph type that is not very useful.
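The idea that a useful graph is observed for longer can be sketched as a ranking by average viewing time. I assume here that the system logs (graph type, seconds observed) pairs. The log format and the name rank_by_dwell are hypothetical.

```python
from collections import defaultdict


def rank_by_dwell(observations):
    """Rank graph types by mean observation time, longest first.

    observations: iterable of (graph_type, seconds) pairs (assumed log format).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for graph_type, seconds in observations:
        totals[graph_type] += seconds
        counts[graph_type] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    return sorted(means, key=means.get, reverse=True)


log = [("pie", 2.0), ("bar", 8.0), ("pie", 4.0), ("bar", 6.0), ("line", 5.0)]
print(rank_by_dwell(log))
# ['bar', 'line', 'pie']
```

Viewing time is only a proxy. A graph may also be looked at for longer because it is hard to read, so the ranking should be treated as a hint, not as proof of usefulness.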

Data Mining


Combined with meta information, such as the degree of variation in the data, the navigation path walked through, the type of the dimension, the perspective, the type of the measure, the mode of the process and so on, these observation times will be input for a data mining process. Data mining can distinguish those variables that give the best indication for the optimal graph type in a particular situation, provided that all relevant variables are stored and available. The process will further refine and optimize the graphical presentation. Its outcomes will be made explicit in the form of newly discovered rules or a change in the uncertainty factors of existing rules.
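To illustrate how a mining step might pick out the variables that best indicate the graph type, the sketch below computes the information gain of each variable over a tiny, made-up log. Information gain is one possible technique that I am substituting for illustration. The text does not prescribe it, and the records, field names and values are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, variable, target="chart"):
    """Reduction in entropy of the target after splitting on one variable."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[variable] for r in records}:
        subset = [r[target] for r in records if r[variable] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


# Made-up log: which chart was preferred under which meta information.
records = [
    {"mode": "explore", "measure": "ratio", "chart": "pie"},
    {"mode": "explore", "measure": "amount", "chart": "pie"},
    {"mode": "monitor", "measure": "ratio", "chart": "line"},
    {"mode": "monitor", "measure": "amount", "chart": "line"},
]

gains = {v: information_gain(records, v) for v in ("mode", "measure")}
best_variable = max(gains, key=gains.get)
print(best_variable)
# mode
```

In this toy log the mode fully determines the preferred chart and the measure type tells us nothing. A real log would be far noisier, and the result is only as good as the variables that were stored.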
