Friday, July 12, 2013

Categories of a dimension

At run time, a dimension is filled with categories. The number of categories can be a reason to:
  • Present a different type of graph;
  • Decide to use replacement instead of insertion of the categories of the level below;
  • Split the categories automatically into smaller groups.
If the number of categories exceeds six, a pie chart is inappropriate (Zelazny, 1996); alternatively, the remaining categories should be taken together in a single group ‘Other...’. Moreover, a pie chart cannot be used if the categories contain both positive and negative numbers. In that case, a bar chart is a better choice.
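
As a minimal sketch of this rule, the function below picks a chart type from the category values. The name `choose_chart`, the choice to keep the largest categories, and the decision to fold the rest into ‘Other...’ so that exactly six slices remain are assumptions made for illustration, not part of the cited guideline.

```python
PIE_LIMIT = 6  # limit quoted in the text (Zelazny, 1996)


def choose_chart(values, pie_limit=PIE_LIMIT):
    """Return (chart_type, data) for a dict mapping category -> number."""
    has_positive = any(v > 0 for v in values.values())
    has_negative = any(v < 0 for v in values.values())

    # Mixed signs cannot be shown as slices of a whole: use a bar chart.
    if has_positive and has_negative:
        return "bar", dict(values)

    # Few enough categories: a pie chart is acceptable as-is.
    if len(values) <= pie_limit:
        return "pie", dict(values)

    # Too many categories: keep the largest ones (assumption) and
    # fold the remainder into a single 'Other...' slice.
    ranked = sorted(values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    data = dict(ranked[:pie_limit - 1])
    data["Other..."] = sum(v for _, v in ranked[pie_limit - 1:])
    return "pie", data
```

An interface could just as well switch to a bar chart when the limit is exceeded; the text allows both options.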


More than 10 categories?


In addition, if the number of categories exceeds 10 and the window would be almost completely filled by the categories of the sub-level, replacement is a better way to present the information. When using replacement, make sure that the chosen category of the level above remains on the screen.
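
The decision could be sketched roughly as follows. The text does not say what counts as "almost completely filled", so the threshold of 90% of the visible rows is an assumption, as are the function names and the `rows_visible` parameter.

```python
REPLACE_LIMIT = 10     # limit quoted in the text
FILL_THRESHOLD = 0.9   # assumed meaning of "almost completely filled"


def drill_down_mode(n_children, rows_visible,
                    replace_limit=REPLACE_LIMIT,
                    fill_threshold=FILL_THRESHOLD):
    """Choose 'replace' or 'insert' when a category is drilled down.

    rows_visible is the (positive) number of rows that fit in the window.
    """
    fill = n_children / rows_visible
    if n_children > replace_limit and fill >= fill_threshold:
        return "replace"
    return "insert"


def rows_after_drill_down(siblings, parent, children, mode):
    """Return the rows shown after drilling down on parent."""
    if mode == "replace":
        # The chosen category of the level above stays on screen.
        return [parent] + list(children)
    # Insertion: children appear directly below their parent.
    i = siblings.index(parent)
    return siblings[:i + 1] + list(children) + siblings[i + 1:]
```

With replacement, the parent category is kept as the first row so the user does not lose context.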

Four classes to break down


There are four classes of decomposition by which the interface could decide to break down categories into smaller groups (Roth, 1998b):
  • User-defined or predefined natural groupings: these can be defined interactively by the user or in advance as built-in knowledge. Time is an example of a predefined natural grouping: the system can decompose all days using year, quarter and month as natural levels. An analyst may instead decompose the days into holidays and non-holidays;
  • Element frequency divisions: divisions are computed to have the same number of categories (equi-frequency);
  • Set interval divisions: groupings are computed to have the same interval size (equi-interval). For example, car damages could be grouped in intervals of a thousand dollars;
  • System-provided statistical or data-mining methods to partition the data into meaningful groups.
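
The second and third classes are simple to compute. The sketch below illustrates both on a list of numeric values; the function names and the sample damage amounts in the usage note are hypothetical.

```python
def equi_frequency(values, n_groups):
    """Split the sorted values into n_groups groups of (almost) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        # The first `extra` groups get one additional element.
        end = start + size + (1 if g < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


def equi_interval(values, width):
    """Group values into intervals [k * width, (k + 1) * width)."""
    groups = {}
    for v in sorted(values):
        low = (v // width) * width
        groups.setdefault((low, low + width), []).append(v)
    return groups
```

For the damages `[150, 320, 980, 1200, 1750, 2100, 2400, 5300]`, `equi_frequency(damages, 4)` gives four groups of two values each, while `equi_interval(damages, 1000)` gives groups of three, two, two and one value, with no group for the empty intervals between 3000 and 5000.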

Exploit the underlying data model


Exploiting the underlying data model can prevent groupings that have too many categories. A multi-dimensional OLAP model consists of dimensions and measures. Dimensions are extracted or calculated from attributes of the OLTP database. Imagine a dimension Customer that contains thousands of categories and is linked to the attribute ‘Customer.Name’. The entity Customer in the integrated database can also have other attributes that make a useful decomposition of such a dimension possible. Of course, such attributes should be meaningful to the user.

Set of rules


Therefore, such attributes should be applied carefully, following a set of rules:
  • Build up the hierarchy from the lowest level upwards.
  • First look for attributes that are defined as foreign keys and are not already used for that purpose.
  • Among the attributes that pass the previous step, look for those with the highest number of distinct categories.
  • The relationship between the Customer entity and the foreign table should be n:1.
  • From those candidate attributes, select those that give the best equi-frequency.
  • Present the list of discovered attributes to the user to confirm which ones are meaningful.
  • The attribute most used at run time is presumably the most meaningful.
  • If the number of categories for a given level still exceeds a limit of, for instance, 20, repeat the steps above.
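
The steps that rank candidate attributes by number of distinct categories and by equi-frequency could be sketched as below. The text does not define how equi-frequency is measured; normalized Shannon entropy (1.0 for perfectly even groups) is used here as an assumption. The function names and the row layout (one dict per customer, keyed by attribute name) are also hypothetical.

```python
import math
from collections import Counter


def evenness(rows, attribute):
    """Normalized entropy of the group sizes: 1.0 means equi-frequency."""
    counts = Counter(row[attribute] for row in rows)
    k = len(counts)
    if k < 2:
        return 0.0  # a single group does not break anything down
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(k)


def rank_candidates(rows, candidates):
    """Rank foreign-key attributes: most distinct categories first,
    then best equi-frequency. Returns (attribute, distinct, evenness)."""
    scored = []
    for attr in candidates:
        distinct = len({row[attr] for row in rows})
        if distinct > 1:
            scored.append((attr, distinct, evenness(rows, attr)))
    return sorted(scored, key=lambda s: (s[1], s[2]), reverse=True)
```

The ranked list is only a proposal: as the rules state, the user still has to confirm which attributes are meaningful.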
If all these steps fail to deliver a convenient attribute to break down categories into smaller groups, exception reporting and ranking are suitable ways to single out the categories that are more significant than others.
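
Both fallbacks can be illustrated with a short sketch. Ranking is taken here to mean keeping the top n categories by value, and exception reporting to mean flagging categories that deviate from a target by more than a tolerance; both readings, the names and the default values are assumptions.

```python
def top_n(values, n=10):
    """Ranking: keep the n categories with the highest values."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]


def exceptions(actual, target, tolerance=0.10):
    """Exception reporting: categories whose actual value deviates
    from the target by more than `tolerance` (as a fraction)."""
    flagged = {}
    for category, goal in target.items():
        if goal == 0:
            continue  # relative deviation is undefined for a zero target
        deviation = (actual.get(category, 0) - goal) / goal
        if abs(deviation) > tolerance:
            flagged[category] = deviation
    return flagged
```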
