Ideas needed for how to speed up data display for large data sets

Hey Guys


Due to one of my cubes needing a number of unpivots and loading lots of data, there are currently over 4 million rows! It takes quite a few minutes to load initially and again for each filter change. I have warehoused the cube, and the raw data in the cube preview comes back pretty quickly; it's the summary in the cube, and the charts in the dashboard, that take ages.

There is some caching going on, because if you wait for the dashboard to load, filter to some other parameter selections, and then go back to the first parameters, it loads almost instantly. This cache doesn't last for long, though.


I have already looked at this page: https://www.dundas.com/support/blog/a-deeper-look-into-dundas-bi-s-in-memory-engine, but didn't come up with any concrete ideas for my situation. Increasing the cache time for this one cube might help, but is this possible, and what ramifications would it have for dashboard speed / server load?


Our IT department says there is not enough memory on our server to do in-memory caching, and they really don't want to spend anything to buy more.


Is there anything else I can do to speed up at least the initial load time?


Thanks

Hi David,


There are a few methods you can use to speed up data retrieval for data cubes:

- Changing the storage type to an in-memory type would be the first solution for improving performance.
- De-select any unused columns in your data cube.
- If your data doesn't need to be shown at the same level of detail as the raw data, aggregate it down to the level of detail you actually want to show.
- If there is a static filter being used on the dashboard, add that filter to the cube as a transform; that should also give you some performance gain (the sketch below shows this and the column pruning in miniature).
- Finally, make sure the connection between the Dundas BI server and your machine is fast and the server is located as close as possible, so the network isn't a limiting factor.
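Roughly, outside of Dundas (in Dundas you would do this with transforms in the data cube itself), and with a file name and column names I've made up purely for illustration:

```python
import pandas as pd

# Hypothetical export of the unpivoted data; the file and column names are
# placeholders for whatever your cube actually produces.
raw = pd.read_csv("unpivoted_export.csv")

# Keep only the columns the dashboard actually uses, instead of carrying
# every unpivoted column through the cube.
slim = raw[["TransactionDate", "Region", "Account", "Amount"]]

# Apply the filter that is static on the dashboard once, inside the cube,
# so every query afterwards scans fewer rows.
slim = slim[slim["Region"] == "EMEA"]

print(f"{len(raw):,} raw rows -> {len(slim):,} rows after pruning and filtering")
```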

The cache you mentioned that makes it almost instant is the result cache, and that is different from the in-memory data cube caching. The result cache is on by default for 20 minutes (configurable from the admin --> setup --> config page). What it does is save the result of the queries (or, to be more accurate, of the metric sets), so if another user (or even the same user) runs the same queries within that time window (i.e. 20 minutes), the results come back from the cache rather than forcing a new execution of the queries.
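To illustrate the behaviour (this is just a toy sketch of the idea, not how Dundas implements it internally), the result cache is essentially keyed on the metric set plus the filter values, with a time-to-live:

```python
import time

CACHE_TTL_SECONDS = 20 * 60        # mirrors the default 20-minute window
_result_cache = {}                 # (metric set, filters) -> (timestamp, result)

def get_result(metric_set_id, filter_values, execute_query):
    """Return the cached result if the same metric set + filter combination was
    produced within the TTL window; otherwise run the query and cache it."""
    key = (metric_set_id, tuple(sorted(filter_values.items())))
    cached = _result_cache.get(key)
    if cached and time.time() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]                                    # near-instant, from cache
    result = execute_query(metric_set_id, filter_values)    # the slow path
    _result_cache[key] = (time.time(), result)
    return result
```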

You could technically "warm up the cache" by setting a scheduled notification for every 20 minutes that generates an image file of that dashboard into some kind of folder, but that would be a small fix for the initial load and won't help when the user changes the filter and queries other data. It looks like a deeper investigation is required to see what your data cube is doing - 4M rows in the warehouse isn't necessarily a large volume that should be that slow. Make sure you follow the info under "building dashboards with performance in mind", and especially take a close look at the performance measurements.
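If you wanted to script that warm-up yourself instead of using a scheduled notification, it would look something along the lines of the sketch below - the export URL, credentials and interval are all placeholders for whatever your own instance uses:

```python
import time
import requests

# Placeholders only - substitute your own server, dashboard id and service account.
DASHBOARD_EXPORT_URL = "https://your-dundas-server/export/dashboard/your-dashboard-id"
REFRESH_SECONDS = 20 * 60 - 30   # re-run just before the 20-minute result cache expires

while True:
    # Hitting the dashboard forces its queries to run with the default filter
    # values, so the result cache is already warm when the first user opens it.
    response = requests.get(DASHBOARD_EXPORT_URL, auth=("service_user", "password"))
    print("warm-up request returned HTTP", response.status_code)
    time.sleep(REFRESH_SECONDS)
```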

David,

This may be a stupid question, but are you aggregating all you can in the warehoused cube? Your problem description sounds as though the dashboard is doing some of the summarizing you want to show. I have a similar table (it grows by 1.8 million rows every 6 months) that I've learned to aggregate as much as possible in the cube, summarizing by calendar date at a bare minimum. You may be the only one who can determine the required aggregations. In-memory did not buy me enough of a performance benefit. Another performance hit is doing period-over-period metric-set calcs when the data is fluffy.
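In code terms, this is roughly what I mean by summarizing to calendar dates (pandas just to show the idea; the file and column names are invented):

```python
import pandas as pd

# Hypothetical raw extract at transaction grain.
raw = pd.read_csv("transactions.csv", parse_dates=["TransactionDate"])

# Roll the data up to one row per day per account before the dashboard sees it.
daily = (raw
         .groupby([pd.Grouper(key="TransactionDate", freq="D"), "Account"])["Amount"]
         .sum()
         .reset_index())

print(f"{len(raw):,} transaction rows -> {len(daily):,} daily summary rows")
```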


Hi Steve


That is exactly the problem :(


I have to allow the user to filter on many columns, as well as provide multi-select filters. I don't think I can aggregate the cube at all. I will have another think about it...

David,

Skating further out on thin ice: have you crafted hierarchies for related elements in your cube? For example, we use hierarchies in our General Ledger views for the GL segments, and hierarchies for location (Region, District, Store Location) and inventory (the classes wouldn't make sense publicly). Those hierarchies also mitigate some of the preliminary need to filter and lend themselves nicely to Slicers. From your record counts, I would assume you're working with financial or marketing data.
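Again with made-up column names, the hierarchy idea in miniature - summarize once at the lowest level you actually chart, and the higher levels roll up from that small summary rather than from the raw rows:

```python
import pandas as pd

# Hypothetical flat columns making up a location hierarchy:
# Region -> District -> StoreLocation.
sales = pd.read_csv("sales_export.csv")

# One summary at the lowest hierarchy level the dashboard drills to.
store_level = (sales.groupby(["Region", "District", "StoreLocation"])["Amount"]
                    .sum()
                    .reset_index())

# Higher levels (the typical slicer selections) roll up from the small summary.
district_level = store_level.groupby(["Region", "District"])["Amount"].sum().reset_index()
region_level = store_level.groupby("Region")["Amount"].sum().reset_index()
```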