Cascading data cubes - build errors

This script is something I could do, and I would even improve on it by checking whether the first cube had finished before starting the second.
The problem is that it relies on the page staying open (in my case for over an hour), so it would not really work.
I need a fire-and-forget method.

I was thinking of writing back to a cube, then checking that the next time anyone loaded the page and going from there, but I reckon I’ll stick it out until version 9.
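For concreteness, the "keep the page open" pattern I mean looks roughly like this. It's only a sketch: `getStatus` is a stand-in for however you'd query the build status on your instance, which is exactly the part that ties the whole thing to an open session.

```javascript
// Generic poll-until-done helper. `getStatus` is any async function that
// returns a job status string ('Running', 'Complete', 'Failed'); the
// endpoint behind it is up to you and is NOT a documented Dundas call.
async function waitForJob(getStatus, { intervalMs = 5000, timeoutMs = 2 * 60 * 60 * 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await getStatus();
    if (status === 'Complete') return true;
    if (status === 'Failed') throw new Error('Cube build failed');
    // Sleep before polling again so we don't hammer the server.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for cube build');
}
```

The button script then becomes "trigger Cube 1, `await waitForJob(...)`, trigger Cube 2" — which is precisely why the page has to stay open for the whole build.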

Fair enough, my use is also for dependent metric sets and dashboards. However, the dashboard and button here are simply a method to build multiple cubes in the correct sequence; it is not at all a user-oriented experience. I wrote the mini dashboard for my boss to use while I’m on vacation, so he doesn’t need to fuss about the proper sequence. The use covers several areas of the enterprise, and I’ll be dazzled when Dundas adds data cube awareness to keep this within the data cube family itself.

So, it turns out I needed this after all! I built a refresh button for one of my critical visualizations, but it didn’t work just doing table1.loadData() since the underlying cube needed to be re-run, too. Used your code and BOOM, it worked! Thanks, Steve!

Hi Satya, I see that ‘batching’ got rolled out with version 9. I’m looking at a brief description here.

I -think- this would work for what I need but I’m not sure I’m clear on how this runs. If I have Cube 1 that needs to run and complete before Cube 2 can be fired, how would I set that up?

Also, if I use the API to trigger Cube 1 ONLY, will it automatically fire off Cube 2 when it’s done?

@david.glickman @ole.christian.valstad @steve.glaeser - are any of you using this new batching feature in 9 yet?

@ken - I did try to use the batching feature, but didn’t manage to work out how to set it up properly.
It always seemed to run both simultaneously when I clicked ‘run’ after setting up the batch.

If someone can provide a more detailed user guide, I would be most grateful.

I have not yet used the batching feature, and I have been on release 9.0 since last June. My business need is really for sequencing the building of dependent data cubes rather than getting them all to execute a warehouse build concurrently. The only way I know to do that controlled execution remains the script I contributed earlier in this topic.

I have been playing around with it in version 9, but I also find it a bit confusing to use…

From what I understand, if you have Cubes A and B and Cube A needs to run before Cube B, Cube A needs to be referenced by Cube B for this to happen in batching.

But it’s super confusing that job details now only show for the sequence job as a whole, instead of for the two jobs running in sequence.
I also have the issue that I have 3 cubes, where Cubes A and B need to run before C (C uses both A and B as sources; A and B are independent of each other), but Cubes A and B are too heavy to run at the same time…

If this is actually possible in version 9 I also hope someone can provide a more detailed guide.
I also hope that Dundas can expand this functionality going forward for instance:

  1. Let us set up a sequence of jobs in any way we see fit, whether or not the cubes reference each other - and let us track each of the jobs in the sequence independently in the job details
  2. Let us set up conditions between jobs, e.g. if job A fails, there’s no need to run job B
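A stop-on-failure sequence like in point 2 can at least be sketched outside of batching, assuming you have some `runJob(cubeId)` function that resolves when a build finishes (how to implement that against the API is the open question in this thread):

```javascript
// Run cube builds strictly one after another; if any build rejects,
// the remaining cubes are skipped and reported instead of being run.
// `runJob` is a hypothetical trigger-and-wait function, not a Dundas API.
async function runSequence(cubeIds, runJob) {
  const completed = [];
  for (const id of cubeIds) {
    try {
      await runJob(id);
      completed.push(id);
    } catch (err) {
      return {
        completed,
        failed: id,
        skipped: cubeIds.slice(cubeIds.indexOf(id) + 1),
      };
    }
  }
  return { completed, failed: null, skipped: [] };
}
```

The returned object makes the "job A failed, so job B never ran" outcome visible, which is exactly the per-job tracking point 1 asks for.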

If I were in Development’s shoes, I’d want to understand the timeline. You can always schedule the warehouse builds chronologically for the ‘dark of night’ time window, but my business case required a mid-day ‘on demand’ build of dependent cubes. So Development may be assuming the ‘dark of night’ issues are what they need to solve, whereas I read your comments as being about the ‘on demand’ sort of build. It may not have been clear from my initial comment (a while ago) that I was building a button, and thus a button script, to build my warehoused cubes in a literal sequence. The batching development suggests the assumptions were not about a dashboard button script.

The ‘dark of night’ is a difficult time to pinpoint when you have global clients. It’s always daytime somewhere on the spinning ball that we live on. (Apologies to any flat-earthers out there!)

I have spent literally hours setting up such a system. Find out the exact order that cubes need to run in. Find out the average time that they take to run. Start at the beginning and set the schedules accordingly for each of 40+ cubes.
But then the cube grows and takes longer, or someone puts a load on the system by running something else, and the whole system gets out of whack.

And for an on demand run, that system doesn’t work either. Whoever kicks it off will need to know the exact order.

Let’s have Cube1, Cube2 and Cube3. They are unrelated. Cube2 has CubeA and CubeB as dependencies.
The user wants to make sure the order of the build is 1=>2=>3.
The user must create a batch job. In this batch job, the selection must consist of Cube1, Cube2 and Cube3, in this specific order.
A single batch job will be created. The result is that all 3 cubes will show up as building at the same time, but in fact they are built according to the sequence.
However, Cube2 has dependencies, and the “dependencies build first” checkbox has been checked.
Therefore the final order of the build is as follows:
Cube1, CubeA, CubeB, Cube2 and ends with Cube3.
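In other words, the selection expands mechanically into dependencies-then-cube, skipping anything already built. A sketch of that rule (the names mirror the example above; this illustrates the described behaviour and is not Dundas source code):

```javascript
// Expand a batch selection into the effective build order, assuming the
// "dependencies build first" checkbox is checked. `deps` maps a cube to
// the cubes it uses as sources.
function effectiveBuildOrder(selection, deps) {
  const order = [];
  const seen = new Set();
  const build = (cube) => {
    if (seen.has(cube)) return;              // never build the same cube twice
    for (const d of deps[cube] || []) build(d); // sources go first
    seen.add(cube);
    order.push(cube);
  };
  selection.forEach(build);
  return order;
}

effectiveBuildOrder(['Cube1', 'Cube2', 'Cube3'], { Cube2: ['CubeA', 'CubeB'] });
// → ['Cube1', 'CubeA', 'CubeB', 'Cube2', 'Cube3']
```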

Thanks for the description :) Then it works as I hoped it would, and at least my confusion originated from this:

The result is that all 3 cubes will show up as building at the same time, but in fact they are built according to the sequence.

Therefore the final order of the build is as follows:
Cube1, CubeA, CubeB, Cube2 and ends with Cube3.

Now my question / Feature Request is as follows:

  • Can we have it so that if any cube fails, any subsequent cubes in the batch order will not run?
  • Would it be correct that if I:
    A) want Cube A & B to run in parallel, I would set up the batch order Cube 1 -> 2 -> 3
    B) want Cube A & B to run in sequence, I would set up the batch order Cube 1 -> A -> B -> 2 -> 3 and then NOT check the “dependencies build first” checkbox?

This would be a feature request, quite easy to implement.

In a batch/sequence job, NO cubes are run in parallel.

As they are part of a sequence job, they cannot run in parallel, but certainly your proposal works for the stated goal.

@ole.christian.valstad Based on Adrian’s response, I think it makes the most sense to manually trigger any cubes you want to run in parallel, for now at least. I have the same situation, so that’s what I’ll do. I have the API call each cube one at a time right now, which isn’t working (since the dependency isn’t completing before the downstream cube fires).

I think what we need is a ‘pending’ or ‘queued’ status or something in the jobs list. Currently they all show up as running - if I understand Adrian properly - and that is confusing/worrying.
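For the record, the manual orchestration I mean is just this, assuming again a hypothetical `runJob(cubeId)` that resolves when the cube’s build completes:

```javascript
// Build CubeA and CubeB concurrently, then start CubeC only once both
// have finished. `runJob` is a stand-in trigger-and-wait function; the
// cube names are just the A/B/C example from earlier in the thread.
async function buildDiamond(runJob) {
  await Promise.all([runJob('CubeA'), runJob('CubeB')]); // independent cubes in parallel
  return runJob('CubeC');                                // depends on both A and B
}
```

If A and B are too heavy to run simultaneously, the `Promise.all` line is the only thing that changes: `await` them one after the other instead.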

The status belongs to the job. As this is a single sequential job, its status is shared by all the cubes contained within.

I understand that is how it is implemented, but I think @david.glickman, @ken and I are asking whether, in the future, this can be expanded so that it is easier to keep track of each individual data cube job within a sequential job.

If I had 10, 20 or 50+ cubes I wanted to run sequentially on a schedule, it would be very useful to be able to see the status and statistics for each single data cube in the job overview.

Don’t get me wrong here, the batching feature introduced in version 9 is great, and I’m very happy to see Dundas add more and more functionality for data cubes - the “ETL layer” is part of what makes Dundas unique. I/we are just asking for this to evolve further :)

Just to be clear:

  1. Batching can be used to run cubes manually from the UI
  2. Batching can be used to set up timed schedules
  3. But batching cannot be used to set up runs via the API

Is that right?

Just to be clear - my issue is that I trigger the cube runs via the API.

So if Cube 1 needs to run before Cube 2, then I have to call the API to trigger Cube 1, then keep the process open and waiting for it to complete (can be a long time) before I can let the API call Cube 2 to run.

That’s the problem I am trying to solve in my use case. The wait time is moderate currently, but could be prohibitive in future use cases.

All DBI functionality is API driven, so sequential jobs are API triggered as well.

I thought so, but I cannot find a way to trigger it properly via the API. The dev console shows a POST being made, but it is tied to a session and I’m unsure how that session is created. I can’t find anything in the documentation about how to do it, and when I called support on Friday they were unsure too. Can you assist in getting that call working from the API?
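In the meantime, here is the shape I’d expect the calls to take: a logon that trades credentials for a session key, then a job-run request that carries that key. Note that every endpoint path and field name below is an assumption on my part, not something I’ve confirmed in the Dundas documentation, so treat it as a sketch to validate against the REST reference for your version:

```javascript
// Build a logon request. The '/api/logon/' path and the accountName /
// password fields are PLACEHOLDERS - verify them against your version's
// REST API reference before using.
function buildLogonRequest(baseUrl, accountName, password) {
  return {
    url: `${baseUrl}/api/logon/`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ accountName, password }),
    },
  };
}

// Build a job-run request that carries the session key from the logon
// response. Again, the path and the sessionId query parameter are
// assumptions to be checked against the real API.
function buildRunRequest(baseUrl, sessionId, cubeJobId) {
  return {
    url: `${baseUrl}/api/job/run/?sessionId=${encodeURIComponent(sessionId)}`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify([cubeJobId]),
    },
  };
}
```

You would then issue each request with `fetch(req.url, req.options)` from a script, keeping the returned session key for subsequent calls.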