Saturday, June 6, 2015

Criteria of expressiveness of the data

Designing expressive graphs that satisfy the users' information needs requires considering at least the following data characteristics (Roth and Mattis, 1998b):


Table of criteria


| Dimension | Property | Description |
|---|---|---|
| Data type | Set ordering | The nature of the ordering relationship among the values of a data set. There are three types of ordering: quantitative, ordinal and nominal. Quantitative sets can be mapped to graphical components with a quantitatively varying visual dimension. Pounds, for example, are quantitative by nature and can be encoded by axis position, pie angle and so on. Ordinal sets consist of an ordered list of items (easy, moderate, difficult) whose meaning follows from their position in the list. They can only be presented by graphical components that have no quantitative dimension (shape or color). Nominal sets (e.g. a set of phone brands) can likewise only be mapped to components that do not vary quantitatively. |
| Data type | Coordinates versus amounts | Coordinate data types (e.g. a calendar date, latitude, time zone, district) need a frame of reference to be meaningful. These elements specify a point or location temporally, spatially or otherwise. Amounts (e.g. number of orders) can be interpreted without a particular frame of reference. |
| Data type | Domain of membership | The domain of membership lets an intelligent graphics system apply common conventions. A measure can belong to different domains, such as time, space, temperature or mass. Knowing the domain helps the system preserve subtle stylistic conventions, such as presenting time along a horizontal axis and temperature along a vertical axis. |
| Properties of relational structure | Relational coverage | Whether every element of a set can be mapped to at least one element of another set. For instance, every value of the cost attribute of an activity maps to a set of dollar amounts. If so, the relation has coverage. If not, it has non-coverage. Bar and plot charts cannot appropriately display non-coverage relations for quantitative sets. Non-coverage occurs when data is missing, when a relation is not applicable, or when null values are informative. |
| Properties of relational structure | Cardinality | The number of elements of a set that have a relation with an element of another set. The cardinality can be single-valued (1:1), fixed-multiple valued (1:n[..]) or variably-valued (1:n). |
| Properties of relational structure | Uniqueness | Whether one element of a set maps uniquely to an element of another set. Start dates are not unique, since multiple activities can start on the same date. |
| Relationships among relations | Complex data types | Some relations map to multiple values, each playing a different role. The period-of-employment relation maps an employee to two years: the first and the last year. Such composite relations can be displayed properly as intervals, statistical abstractions and 2-D coordinate locations. |
| Relationships among relations | Algebraic dependencies | In most EISs many algebraic dependencies exist (e.g. total revenue = revenue department A + revenue department B and so on). Such dependencies, in combination with the number of relations, can be used to choose the proper graph. |
| Type of relation | Unary, binary and n-ary | Unary relations express a single property possessed by some elements of a set but not by others. Binary (Boolean) relations map one element of a set to a set of exactly two elements that are mutually exclusive and opposite. N-ary relations map one element of a set to a set that is not limited in size but consists of more than two elements. |
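
The three properties of relational structure listed in the table (coverage, cardinality and uniqueness) lend themselves to a mechanical check. The following is only a minimal sketch, assuming a relation is given as a list of (source, target) pairs and that a target of None stands for a missing value. The function name `relation_properties`, the sample activities and the rule used to tell fixed-multiple from variably-valued relations are illustrative assumptions, not part of Roth and Mattis' work.

```python
from collections import defaultdict


def relation_properties(domain, pairs):
    """Summarise coverage, cardinality and uniqueness of a relation.

    domain: the source elements that the relation is expected to cover.
    pairs:  (source, target) tuples; a target of None means "no value"
            (an assumption made for this sketch).
    """
    targets_by_source = defaultdict(set)
    sources_by_target = defaultdict(set)
    for source, target in pairs:
        if target is None:
            continue  # a null value contributes no mapping
        targets_by_source[source].add(target)
        sources_by_target[target].add(source)

    # Coverage: every element of the domain maps to at least one target.
    coverage = all(element in targets_by_source for element in domain)

    # Cardinality: compare how many targets each mapped source has.
    counts = {len(targets) for targets in targets_by_source.values()}
    if counts <= {1}:
        cardinality = "single-valued"
    elif len(counts) == 1:
        cardinality = "fixed-multiple valued"
    else:
        cardinality = "variably-valued"

    # Uniqueness: no target is shared by two different sources.
    unique = all(len(sources) == 1 for sources in sources_by_target.values())

    return {"coverage": coverage, "cardinality": cardinality, "unique": unique}


# Hypothetical sample data: two activities share a start date, one has none.
activities = ["design", "build", "test"]
start_dates = [
    ("design", "2015-06-01"),
    ("build", "2015-06-01"),
    ("test", None),
]
properties = relation_properties(activities, start_dates)
```

With this sample, the start-date relation is reported as having non-coverage (the "test" activity has no date), as single-valued, and as not unique, which matches the start-date example in the table.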

These characteristics are primarily concerned with the criteria of expressiveness of the data.
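
As an illustration of the set-ordering property, the sketch below classifies a set of values and looks up the kinds of graphical components the table allows for it. It is a rough heuristic under stated assumptions: numeric values are treated as quantitative, values that all appear in a caller-supplied ordered list are treated as ordinal, and everything else as nominal. The names `classify_set_ordering`, `ENCODINGS` and `known_order` are hypothetical.

```python
from numbers import Number

# Encodings per set ordering, taken from the set-ordering row of the table.
ENCODINGS = {
    "quantitative": ["axis position", "pie angle"],
    "ordinal": ["shape", "color"],
    "nominal": ["shape", "color"],
}


def classify_set_ordering(values, known_order=None):
    """Guess the set ordering of a collection of values (heuristic sketch)."""
    values = list(values)
    # Booleans are numbers in Python, so they are excluded explicitly.
    if values and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    ):
        return "quantitative"
    if known_order is not None and all(v in known_order for v in values):
        return "ordinal"
    return "nominal"


difficulty_scale = ["easy", "moderate", "difficult"]
ordering = classify_set_ordering(["easy", "difficult"], known_order=difficulty_scale)
candidates = ENCODINGS[ordering]
```

A real system would need explicit metadata rather than guessing from the values, since a numeric column can also hold identifiers or codes.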

Will the criteria lead to more effective graphs?


Will these syntactical characteristics lead to effective graphics, in particular those used in EISs? EISs work with aggregate data most of the time. The above characteristics are very useful for small, detailed data sets. However, they need to be adapted to the large, aggregated data sets that data warehouses or multi-dimensional cubes deliver.

Large data sets


Steven Roth later focused on extending these characteristics to large data sets (Roth, 1998). He integrates several data manipulation and decomposition techniques to deal with such data sets. However, these techniques still take the OLTP database as their starting point. Aggregations are built dynamically, and I believe this workaround brings back the drawbacks of querying the OLTP database on the fly.

Aggregated information


Furthermore, aggregated information loses some of the characteristics mentioned in the table. For instance, the property of complex data types is no longer useful and can be left out when the period of employment is aggregated as an average or a ratio. An average period of employment can no longer be decomposed into a first and a last year of employment, so the need to detect composite relations in the data and to display them decreases. However, the results of aggregate functions such as maximum and minimum retain enough detail for the issue of complex data types to remain relevant. These functions select the highest or the lowest element of a specific set, and that element can be traced back to the source set of data elements.
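
The difference between an average and a maximum can be made concrete with a small sketch. The employee names and years below are invented for illustration, and the helper names are hypothetical. The point is only that the mean is a single number with no first or last year behind it, while the maximum still points at one source record whose composite structure is intact.

```python
from statistics import mean

# Hypothetical period-of-employment relation: employee -> (first year, last year).
employment = {
    "alice": (2001, 2009),
    "bob": (2005, 2015),
    "carol": (1999, 2004),
}

# Duration per employee, derived from the composite (first, last) value.
durations = {name: last - first for name, (first, last) in employment.items()}

# The average is one number; no (first, last) pair can be recovered from it.
average_duration = mean(durations.values())

# The maximum selects an existing element, so its source record is still available.
longest_employee = max(durations, key=durations.get)
longest_period = employment[longest_employee]
```

Here `longest_period` is still a (first year, last year) pair that could be drawn as an interval, whereas `average_duration` can only be shown as a single amount.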

Number of arguments


Finally, the above table can be extended with two properties that are also important for designing effective graphical presentations. The first is the number of arguments in a relation, which can be matched against the number of encoding components of a graph. Meters or indicators express relations with one argument well. Relations with two arguments (for instance Product and Sales) can be expressed appropriately by pie, bar and line charts. The second is the proportion of quantitative to non-quantitative arguments.
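
A minimal sketch of how these two extra properties could be derived from a description of a relation follows. It assumes each argument is labelled with its set ordering, and the lookup table only encodes the two cases named above (one argument, two arguments). The names `CHARTS_BY_ARGUMENT_COUNT` and `describe_relation` are hypothetical.

```python
# Candidate graph types by number of arguments, limited to the cases named in the text.
CHARTS_BY_ARGUMENT_COUNT = {
    1: ["meter", "indicator"],
    2: ["pie chart", "bar chart", "line chart"],
}


def describe_relation(arguments):
    """Summarise a relation given as a mapping of argument name -> set ordering."""
    quantitative = sum(
        1 for ordering in arguments.values() if ordering == "quantitative"
    )
    return {
        "argument_count": len(arguments),
        "quantitative": quantitative,
        "non_quantitative": len(arguments) - quantitative,
        # An empty list means the text above names no candidates for this count.
        "candidate_charts": CHARTS_BY_ARGUMENT_COUNT.get(len(arguments), []),
    }


summary = describe_relation({"Product": "nominal", "Sales": "quantitative"})
```

For the Product and Sales example, the summary reports two arguments, one quantitative and one non-quantitative, with pie, bar and line charts as candidates.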

Maps can only express non-quantitative arguments


Maps, scatter plots (which do not include bubble charts in this context) and link diagrams can only express relations whose arguments are all non-quantitative. I conclude that the above characteristics are very useful when single data elements have to be displayed. This is rarely the case with EISs, although managers might be able to drill down to a certain level of detail. At that point, the characteristics are fully applicable and useful, but only if there is a connection between the meta information of the EIS database and the OLTP database.

The expressiveness of the data characteristics does not guarantee that users are provided with useful and insightful graphs. Therefore, a second cycle of inferencing should be applied to ensure the effectiveness of the graphical presentation, which is discussed in the next paragraph. A third cycle of inferencing should consider the task of the user and the goal a user wishes to achieve. This subject is treated in more detail in one of the next articles.
