Clickstream Data Warehousing
      Companion website for the book
      by Mark Sweiger, Mark Madsen, Jimmy Langston, and Howard Lombard

             Published by John Wiley & Sons, January 2002


Search this site or  the web
powered by FreeFind




    Site search
    Web search
A Clickstream Data Warehouse Meta-Schema
By Mark Sweiger, President and Principal, Clickstream Consulting

This is the fifth in a series of articles that focus on the application of clickstream data warehouse technology to an e-business environment. In this installment we introduce the concept of a clickstream data warehouse meta-schema that can be customized to fit any type of e-business.



The best way to understand the clickstream data warehouse meta-schema is to start out with a familiar brick-and-mortar Sales and Marketing schema, like the one below:

This schema has a central Sales fact table, surrounded by the standard dimensions-Time, Geography, and Product. To spice up this canonical schema, I have added a Promotion dimension for tracking the effectiveness of any advertising promotions. With a schema like this, an enterprise can get answers to questions like, “How do the sales by quarter for the top selling product in 2000 compare to the sales by quarter for the top selling product in 1999?” or “How much did the Millennium advertising promotion raise sales for its target products in the Eastern Region in the 4th fiscal quarter?”.



As late as 1997 or 1998, your enterprise could win an award for a good implementation of the above Sales and Marketing data warehouse, but this was soon to change with the arrival of the Customer Relationship Management (CRM) juggernaut. Interestingly, transforming this Sales and Marketing data warehouse schema into a CRM-oriented data warehouse schema requires the addition of only one new dimension, the Customer dimension:

With a CRM-oriented data warehouse schema one can answer questions like “What customers responded to the Millennium promotion in the 4th fiscal quarter in the Eastern Region” or “What was the difference in spending by customers who responded to the 4th quarter Millennium promotion, versus the 3rd quarter spending by those same customers (when the promotion was not in effect)?”. Furthermore, this type of CRM data warehouse is cutting-edge technology. If your enterprise currently has a successful implementation of this type of CRM data warehouse, it could win an award right now.

Let’s take the cutting-edge CRM schema and make it even more so by transforming it, step-by-step, into a clickstream data warehouse Electronic Relationship Management (eRM) meta-schema:

CRM sales facts become User Activity facts in the eRM environment. All user activity, not just sales events, can be recorded on the Web. Examples of additional types of user activity include casual browsing,



directed shopping excursions, registration events, ad click-throughs, site entry pages, site exit pages, time spent on a page, etc. User Activity Facts can be strung together into entire user visit histories, which then can be classified by type.

The CRM-style Customer dimension becomes the eRM User dimension in the meta-schema. Users come in many different types, including customers, casual browsers, directed searchers, advertising click-throughs, etc. With a User dimension you can ask business questions that distinguish between the different



types of users resulting in much richer business intelligence than is possible with a limited Customer dimension.

The CRM-style Geography dimension splits into three new dimensions in the eRM environment. Physical Geography is fairly analogous to the old-style CRM regional Geography dimension. The new Web Geography dimension maps the site entry and exit pages, including information about the external referring



sites, so that you can see who drives traffic to your site and where the traffic goes when it leaves. Site Geography maps the parent/child relationships between all website pages, creating a map of the site and how it changes over time.

The Time dimension splits into the Fiscal Time dimension and the User Time dimension in the ERM meta-schema. The enterprise-centric Fiscal Time dimension is analogous to the old CRM corporate Time dimension, but the new User Time dimension recognizes the worldwide



nature of user access, which is definitely not done according to traditional corporate time rules. Internet users can be in any time zone and their seasons and holidays vary widely across their time zones and Physical Geographies. The User Time dimension makes it possible to analyze user activity by specific time zone and seasonalities.

The CRM Product dimension splits into the Content and Activity dimensions in the eRM meta-schema. Because web sites are pure information entities, they are made up of different types of Content and sets of Activities that you can perform on that content. For example, if your e-business has a product-oriented site, then one class of content is your product catalog pages, but you also have many other types of content



including things like registration forms, user feedback forms, business transaction forms, downloads, advertising links, etc. Each type of content has a specific set of activities that correspond to it, and these are often interesting to analyze separately. Examples of activities include a simple look at a page, a clickthrough on an ad, abandonment of a shopping cart, a purchase transaction, entering the site, exiting the site, etc.

Many people have asked me if it is OK to put a Product dimension into their eRM schema, usually in lieu of the Content dimension. I think this is OK so long as you realize that other types of site content are not getting tracked when you narrow the scope of this dimension to only Products. There are many other types of content and there are many types of Web business models that are not just Product focused, which is why I prefer a more general model.

Finally, the Promotion dimension is split into the Internal and External Promotion dimensions in the new meta-schema. In the Internet environment, promotions can exist either internally within the site or externally outside the site, like banner ads and affiliated links. It is interesting to analyze each type of promotion separately, because the internal and external



environments are so different. Because of the wide-ranging nature of advertising engines, your banner ad can appear on sites you never dreamed of (the equivalent of a billboard in Beijing), and click-throughs from these exotic sites need to be separately analyzed from the “Internet Special” promotion you do within your own site.

I call this completed eRM schema a “meta-schema” because it needs to be instantiated with the measures and attributes that are specific to your e-business. You may even find yourself changing the name of a particular dimension to suit the particular business purpose. But before we can show an example of this kind of customization, we need a more specific discussion about the fact table measures, the dimensional attributes, the data sources, and the extract and load procedures necessary to produce an eRM clickstream data warehouse. These topics will be the subject of the next several articles.

Book Endorsements     Book Table of Contents    Book Authors    Referenced Material  
  Related Articles    Related Links    Download a Project Plan    Discussion Forum     Links    

© Copyright 2001, 2002 Clickstream Consulting, All Rights Reserved