general (mis)understanding of the Data Warehouse jobs

I understand the benefit of using the Data Warehouse build option for data cubes - improved performance by creating a cached table in the DBI Warehouse database. So far, so good.

In general, do I need to consider the design of my data cube and return results that are more data-warehouse-like? Meaning, rather than having a table with many user-friendly varchar columns, should the results instead return only surrogate keys or integer key values? If I go the second route, would I then create a user hierarchy for each, pairing the surrogate/key field with a friendly description field from a dimension table? Does the number of hierarchies matter?

I guess what I’m getting at is: will I realize more performance gains if my data cube’s query is a proper star schema rather than many variable-data-type fields (and should I spend the time testing this)? Or will Dundas just warehouse whatever you throw at it (after some review it seems that this is the case)?
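To make the two shapes concrete, here is a small sketch of the difference being asked about, using an invented schema (the table and column names are hypothetical, not anything Dundas-specific): a "user-friendly" result repeats wide varchar values on every row, while a star-schema-style result carries only integer surrogate keys and joins to small dimension tables for the friendly labels.

```python
import sqlite3

# Hypothetical comparison of the two result shapes; all names are invented.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Option A: "user-friendly" result -- every row repeats wide varchar values.
cur.execute(
    "CREATE TABLE wide_result (product_name TEXT, region_name TEXT, revenue REAL)"
)

# Option B: star-schema-style result -- fact rows carry only integer surrogate
# keys, and the friendly names are stored once in small dimension tables.
cur.execute("CREATE TABLE fact (product_key INTEGER, region_key INTEGER, revenue REAL)")
cur.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT)")
cur.execute("CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region_name TEXT)")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget Deluxe')")
cur.execute("INSERT INTO dim_region VALUES (10, 'North America')")
cur.execute("INSERT INTO fact VALUES (1, 10, 99.50)")

# With the star shape, friendly labels come back via joins rather than
# being stored on every fact row.
row = cur.execute("""
    SELECT p.product_name, r.region_name, f.revenue
    FROM fact f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_region  r ON r.region_key  = f.region_key
""").fetchone()
print(row)  # ('Widget Deluxe', 'North America', 99.5)
```

The trade-off is narrower, more index-friendly fact rows in option B, at the cost of needing joins (or hierarchies) to get the display names back.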

If I look at the steps being performed, the Job Run Details display something along the lines of the following:

  1. Fact table: [zd__]
    -this creates a table in the DBI warehouse database with the results from the data cube, for faster retrieval
  2. Creating indexes for fact table
    -this creates indexes on the table created in step 1
  3. Hierarchy ‘’, level ‘B.’: [zd_b_]
    -this caches the hierarchies included in the data cube as well.
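Roughly, the three steps above amount to something like the following sketch (the table names here are made up; the real zd_-prefixed names are generated by Dundas BI):

```python
import sqlite3

# Hypothetical illustration of the three warehouse job steps; table and
# column names are invented -- Dundas BI generates its own zd_ names.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Step 1: materialize the cube's result set as a fact table.
cur.execute("""
    CREATE TABLE fact_sales (
        product_key INTEGER,
        order_date  TEXT,
        quantity    INTEGER,
        revenue     REAL
    )
""")
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, "2024-01-01", 3, 29.97), (2, "2024-01-01", 1, 5.00)],
)

# Step 2: create indexes on the fact table for faster retrieval.
cur.execute("CREATE INDEX ix_fact_product ON fact_sales (product_key)")
cur.execute("CREATE INDEX ix_fact_date ON fact_sales (order_date)")

# Step 3: cache each hierarchy level as its own small table.
cur.execute("""
    CREATE TABLE hier_product AS
    SELECT DISTINCT product_key FROM fact_sales
""")

print(cur.execute("SELECT COUNT(*) FROM hier_product").fetchone()[0])  # 2
```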

I would say no, because I don’t do that, and the Dundas warehousing of a cube is more like a pseudo-warehousing. If you look at the Dundas warehouse table you will see it does all of that for you.

As far as whether it will perform better if you do it first: I would guess only in cases where it cannot detect a relationship in your data.
I would be interested in what one of the programmers thinks on this.

Good questions Tom/James.

The Dundas BI data warehouse will create a fact table with all of your columns indexed and at least one table for the hierarchies. The only exception to this is in the case of date hierarchies and user hierarchies.

In general, the guideline is that you should only warehouse the data you really need for performance. Where possible, you should try to pre-aggregate your data if you don’t need up-to-the-second reporting. If your date data is granular down to the second, pre-aggregating will make a huge performance difference for you. Also, in the case of SQL Server, be aware that we cannot create an index on any column wider than 900 bytes (generally any OLAP data is like this).
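As an illustration of the pre-aggregation point (invented data, not Dundas-specific): if the source records events at second-level granularity but reports only need daily numbers, rolling the rows up to one per day before warehousing shrinks the fact table dramatically.

```python
import sqlite3

# Hypothetical example of pre-aggregating second-level data to daily grain.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE raw_events (event_time TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [
        ("2024-01-01 09:15:01", 10.0),
        ("2024-01-01 09:15:02", 5.0),
        ("2024-01-02 14:00:00", 7.5),
    ],
)

# Roll up to one row per day before loading the warehouse fact table;
# three second-level rows collapse into two daily rows.
cur.execute("""
    CREATE TABLE fact_daily AS
    SELECT date(event_time) AS event_date, SUM(amount) AS total_amount
    FROM raw_events
    GROUP BY date(event_time)
""")
for row in cur.execute("SELECT * FROM fact_daily ORDER BY event_date"):
    print(row)  # ('2024-01-01', 15.0) then ('2024-01-02', 7.5)
```

On real data the reduction is often several orders of magnitude (86,400 possible rows per day collapsing to one), which directly cuts warehouse build time and query cost.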