Cascading data cubes - build errors

ken · July 6, 2021, 8:24pm

Hey everyone, I have one data cube that queries raw data and stores it in the warehouse.

Then I have another cube that uses the data from that first cube to do a bunch of things, and stores the result in the warehouse as well.

Sometimes, when I try to build both of these cascading data cubes, the second one fails because the first isn’t done building (yes, I’m starting #1 first, then #2). But most times - including the scheduled daily runs - I can start the first one building, and then the second one a moment later, and they run fine.

Is there a best practice for handling these types of cube builds?

david.glickman · July 7, 2021, 8:38am

I’d like to know this as well, it’s something we do often.

Another feature request I have is for some way to chain the builds. So I would set that when cube #1 finishes then cube #2 should start, and then I could click once and the whole cascade would process in order.

ole.christian.valstad · July 7, 2021, 10:16am

Second this also. It would be an amazing feature if we could chain the build of cubes

satya.sankini · July 7, 2021, 1:58pm

Hi Everyone,

We understand that a feature like this would be a great help. I’m not sure which version of Dundas BI you’re currently using, but I’m happy to report that our R&D team has addressed this in version 9, with better support!

david.glickman · July 7, 2021, 2:07pm

Great.
When is the beta of version 9 due out?

satya.sankini · July 7, 2021, 2:25pm

Hey David,

We are planning to release it around the middle or end of the third quarter this year.

steve.glaeser · July 7, 2021, 8:06pm

If you don’t want to wait for release 9, you could do this the old-fashioned way by building a dashboard with a button that sequences the data cube warehousing you need. Here’s an example of 10 data cubes built in a required sequence from my library of tricks (this is the script for the click action on the button):

// Get the data cube service.
var dataCubeService = this.getService("DataCubeService");
// Build each data cube using its data cube id.
// Second parameter: true if the data cube is to be built in-memory,
// false if the data cube is built in the warehouse.
// Each delay is in milliseconds, measured from the button click.
dataCubeService.buildStorage("e6731a14-c5d3-48d6-a3d7-3df4e1df165d", false);
setTimeout(function(){ dataCubeService.buildStorage("be7899f6-b129-437e-8cb2-2a275c666e97", false); }, 120000);
setTimeout(function(){ dataCubeService.buildStorage("89774a91-0a74-4804-bdea-5e71309b7a00", false); }, 150000);
setTimeout(function(){ dataCubeService.buildStorage("6851d309-ce30-46a9-80d0-cf939509b69b", false); }, 180000);
setTimeout(function(){ dataCubeService.buildStorage("5c96791f-115e-414a-98a2-8d2772207b40", false); }, 210000);
setTimeout(function(){ dataCubeService.buildStorage("fc3fc3f4-29e2-46d7-8847-0c0e94a0b1f9", false); }, 240000);
setTimeout(function(){ dataCubeService.buildStorage("bd8bd2fe-078a-4f75-b91e-1aad96fb12ac", false); }, 270000);
setTimeout(function(){ dataCubeService.buildStorage("f20f2227-37a7-4d46-b3fe-da796df89349", false); }, 300000);
setTimeout(function(){ dataCubeService.buildStorage("b4112e28-138e-41ed-b904-a5966a4e5e73", false); }, 330000);
setTimeout(function(){ dataCubeService.buildStorage("96c9edb8-0f70-4b4d-b61b-5e83502810c1", false); }, 360000);
setTimeout(function(){ dataCubeService.buildStorage("1a6191b3-9ed1-4e1b-a858-9f6a5c88a5b6", false); }, 420000);

Check it out - works today in 7.0 and 8.0. As a caution, remember to leave the browser tab open while these 10 builds are executing. When you dismiss the browser tab (dashboard), there goes the punch list! My start/delay times were derived from the actual build times; they are all relative to the start time of the button click (not concurrent). Credit to Jay Gong, I believe.
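The delay arithmetic described here can be kept in one place instead of being hand-computed per line. The following is only a sketch: the names `builds`, `computeStartOffsets` and `scheduleBuilds`, the placeholder ids and the sample durations are illustrative assumptions, not part of the original script. The only call taken from the thread is `dataCubeService.buildStorage(id, false)`, which would be passed in as `startBuild`. It also assumes the script editor runs in a browser that accepts modern JavaScript syntax.

```javascript
// Hypothetical build list: ids are placeholders, durations are measured build times in ms.
const builds = [
  { id: "cube-1-id", durationMs: 120000 },
  { id: "cube-2-id", durationMs: 30000 },
  { id: "cube-3-id", durationMs: 30000 }
];

// Each build starts once the estimated durations of all earlier builds have elapsed,
// so every offset is relative to the button click.
function computeStartOffsets(durationsMs) {
  const offsets = [];
  let elapsed = 0;
  for (const d of durationsMs) {
    offsets.push(elapsed);
    elapsed += d;
  }
  return offsets;
}

// startBuild is whatever kicks off one build, for example
// (id) => dataCubeService.buildStorage(id, false) inside the click script.
// setTimer defaults to setTimeout and is only a parameter to make the function testable.
function scheduleBuilds(buildList, startBuild, setTimer = setTimeout) {
  const offsets = computeStartOffsets(buildList.map((b) => b.durationMs));
  buildList.forEach((b, i) => setTimer(() => startBuild(b.id), offsets[i]));
  return offsets;
}
```

This is still purely timing-based, so the same cautions apply: the tab has to stay open, and a build that runs longer than its estimate will overlap with the next one.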

ken · July 7, 2021, 9:04pm

Much appreciated, but this isn’t a build on the dash level. I’m in the cube trying to get it to build and it won’t work. It was working most of the time and all of a sudden stopped. I figured I had just gotten lucky in the past and would ask what the proper way to do it is, but I guess this is a known problem!

david.glickman · July 7, 2021, 10:22pm

This script is something I could do, and I would even improve on it by checking if the first cube had finished before starting the second.
Problem is that it relies on the page staying open (in my case for over an hour) so would not really work.
I need a fire and forget method.

I was thinking of writing back to a cube, then the next time anyone loaded the page checking that and going from there, but I reckon I’ll stick it out until version 9.
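The "check that the first cube has finished before starting the second" idea could look roughly like the sketch below. Everything here is an assumption for illustration: `buildInSequence`, `sleep`, `startBuild` and `isFinished` are hypothetical names, and the thread does not say how a build's completion can be queried, so `isFinished` is a placeholder the caller would have to supply. `startBuild` would presumably wrap `dataCubeService.buildStorage(id, false)`.

```javascript
// Resolve after ms milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// startBuild(id) and isFinished(id) are hypothetical callbacks supplied by the caller.
// Either may return a plain value or a promise.
async function buildInSequence(cubeIds, startBuild, isFinished, pollMs = 5000) {
  const completed = [];
  for (const id of cubeIds) {
    await startBuild(id);
    // Poll until the caller-supplied check reports this build as done.
    while (!(await isFinished(id))) {
      await sleep(pollMs);
    }
    completed.push(id);
  }
  return completed;
}
```

Because the loop runs in the page, this does not solve the fire-and-forget problem: closing the tab still abandons the remaining builds.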

steve.glaeser · July 7, 2021, 10:26pm

Fair enough, my use is also for dependent metric sets and dashboards. However, the dashboard and button here are simply a way to build multiple cubes in the correct sequence; it is not at all a user-oriented experience. I wrote the mini dashboard for my boss to use while I’m on vacation, so that nobody needs to fuss about the proper sequence. The use covers several areas of the enterprise, and I’ll be dazzled when Dundas creates a data cube awareness to keep this within the data cube family only.

ken · July 9, 2021, 3:51pm

So, it turns out I needed this after all! I built a refresh button for one of my critical visualizations, but it didn’t work just doing table1.loadData() since the underlying cube needed to be re-run, too. Used your code and BOOM, it worked! Thanks, Steve!

ken · February 2, 2022, 4:45pm

Hi Satya, I see that ‘batching’ got rolled out with version 9. I’m looking at a brief description here.

I -think- this would work for what I need but I’m not sure I’m clear on how this runs. If I have Cube 1 that needs to run and complete before Cube 2 can be fired, how would I set that up?

Also, if I use the API to trigger Cube 1 ONLY, will it automatically fire off Cube 2 when it’s done?

@david.glickman @ole.christian.valstad @steve.glaeser - are any of you using this new batching feature in 9 yet?

david.glickman · February 2, 2022, 5:19pm

@ken - I did try to use the batching feature, but didn’t manage to work out how to set it up properly.
It always seemed to run both simultaneously when I clicked ‘run’ after setting up the batch.

If someone can provide a more detailed user guide, I would be most grateful.

steve.glaeser · February 2, 2022, 6:41pm

I have not yet used the batching feature, and I have been on release 9.0 since last June. I really have more of a business need to sequence the building of dependent data cubes than getting them all to execute a warehouse build concurrently. The only way I know to do that controlled execution remains in the format I contributed (way earlier) on this topic.

ole.christian.valstad · February 2, 2022, 9:02pm

I have been playing around with it in version 9, but I also find it a bit confusing to use…

From what I understand, if you have Cube A and Cube B, and Cube A needs to run before Cube B, then Cube A needs to be referenced by Cube B for this to happen in batching.

But it’s super confusing that job details now only show for the sequence job, instead of showing two jobs running in sequence.
Also, I have the issue that I have 3 cubes, where Cube A and B need to run before C (C uses both A and B as sources; A and B are independent of each other), but Cube A and B are too heavy to run at the same time…

If this is actually possible in version 9 I also hope someone can provide a more detailed guide.
I also hope that Dundas can expand this functionality going forward, for instance:

- Let us set up a sequence of jobs in any way we see fit, no matter if the cubes reference each other or not, and let us track each of these jobs in the sequence independently in the job details.
- Let us set up conditions between jobs, e.g. if job A fails, there’s no need to run job B.
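As an illustration of what is being asked for (not of anything Dundas BI currently does), a sequence runner with a "stop after the first failure" condition and a per-job result could be sketched as follows. The function name `runUntilFailure` and the `{ name, run }` job shape are assumptions made up for this example.

```javascript
// Run jobs one at a time; once a job fails, mark the remaining ones as skipped.
// Each job is { name, run }, where run() returns true on success (hypothetical shape).
function runUntilFailure(jobs) {
  const results = [];
  let failed = false;
  for (const job of jobs) {
    if (failed) {
      results.push({ name: job.name, status: "skipped" });
      continue;
    }
    let ok;
    try {
      ok = job.run() === true;
    } catch (err) {
      // A thrown error counts as a failed job.
      ok = false;
    }
    results.push({ name: job.name, status: ok ? "succeeded" : "failed" });
    failed = !ok;
  }
  return results;
}
```

Each entry in the returned array tracks one job independently, which corresponds to the first request; the `skipped` status corresponds to the second.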

steve.glaeser · February 2, 2022, 9:52pm

If I were in Development’s shoes, I’d want to understand the timeline. You can always schedule the warehouse builds chronologically for the ‘dark of night’ time window. My business case required a mid-day ‘on demand’ build of dependent cubes. So Development may be assuming the ‘dark of night’ issues are what they need to solve, whereas I read your comments as being about the ‘on demand’ sort of warehouse build. It may not have been clear from my initial comment (a while ago) that I was building a button, and thus a button script, to build my warehoused cubes in a literal sequence. The batching feature suggests that its design assumptions were not about a dashboard button script.

david.glickman · February 3, 2022, 9:35am

The ‘dark of night’ is a difficult time to pinpoint when you have global clients. It’s always daytime somewhere on the spinning ball that we live on. (Apologies to any flat-earthers out there!)

I have spent literally hours setting up such a system. Find out the exact order that cubes need to run in. Find out the average time that they take to run. Start at the beginning and set the schedules accordingly for each of 40+ cubes.
But then the cube grows and takes longer, or someone puts a load on the system by running something else, and the whole system gets out of whack.

And for an on demand run, that system doesn’t work either. Whoever kicks it off will need to know the exact order.

adrian.dobrin · February 3, 2022, 1:09pm

Say we have Cube1, Cube2 and Cube3, which are unrelated to each other. Cube2 has CubeA and CubeB as dependencies.
The user wants to make sure the order of the build is 1=>2=>3.
The user must create a batch job. In this batch job, the selection must consist of Cube1, Cube2 and Cube3, in this specific order.
A single batch job will be created. The result is that all 3 cubes will show up as building at the same time, but in fact they are built according to the sequence.
However, Cube2 has dependencies, and the “dependencies build first” checkbox has been checked.
Therefore the final order of the build is as follows:
Cube1, CubeA, CubeB, Cube2 and ends with Cube3.
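The ordering rule described above can be modelled in a few lines. This is only a model of the behaviour as described in this post, not Dundas BI’s implementation; `resolveBuildOrder` and the shape of the `dependencies` map are hypothetical, and whether the product expands dependencies of dependencies is an assumption made here.

```javascript
// Expand a batch selection into the effective build order.
// dependencies maps a cube name to the cubes it reads from (hypothetical shape).
function resolveBuildOrder(selection, dependencies, dependenciesBuildFirst) {
  const order = [];
  const seen = new Set();
  function visit(cube) {
    if (seen.has(cube)) return; // each cube is built at most once
    seen.add(cube);
    if (dependenciesBuildFirst) {
      for (const dep of dependencies[cube] || []) visit(dep);
    }
    order.push(cube);
  }
  selection.forEach(visit);
  return order;
}

const order = resolveBuildOrder(
  ["Cube1", "Cube2", "Cube3"],
  { Cube2: ["CubeA", "CubeB"] },
  true
);
console.log(order.join(", "));
// Cube1, CubeA, CubeB, Cube2, Cube3
```

With the flag set to `false`, the function returns the selection unchanged, so an explicit selection is built exactly in the order given.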

ole.christian.valstad · February 3, 2022, 1:21pm

Thanks for the description. Then it works as I hoped it would; my confusion originated from this:

The result is that all 3 cubes will show up as building at the same time, but in fact they are built according to the sequence.

Therefore the final order of the build is as follows:
Cube1, CubeA, CubeB, Cube2 and ends with Cube3.

Now my questions / feature requests are as follows:

- Can we have it so that if any cube fails, any subsequent cubes in the batch order will not run?
- Would it be correct that if I
  A) want Cube A & B to run in parallel, I would set up the batch order Cube 1 -> 2 -> 3, and
  B) want Cube A & B to run in sequence, I would set up the batch order Cube 1 -> A -> B -> 2 -> 3 and then NOT check the “dependencies build first” checkbox?

adrian.dobrin · February 3, 2022, 2:21pm

This would be a feature request, quite easy to implement.

In a batch/sequence job, NO cubes are run in parallel.

As they are part of a sequence job, they cannot run in parallel, but certainly your proposal works for the stated goal.