Max number of transforms/connectors in a cube


(Mark Josephson) #1

Is there an upper limit on the number of transforms and/or connectors that can be brought into a cube before reliability suffers?


I have about 40 data connectors I need to get into a dashboard. Each one only has 2-3 metrics and 2-3 hierarchies. It is getting cumbersome to work with in one cube, and even small changes create waterfalling failures in the cube that result in a lot of extra work. I was wondering if it would be better technically to break it up into 5-6 sub cubes and then either union those cubes into a super cube or feed my dashboard from the 6 cubes. Which would be better/faster/more reliable?


Thanks.


(Ariel Pohoryles) #2

Quick questions:

  • What is/are your data source(s)? Depending on the source you may get different recommendations.
  • Do you need to join the data together and display it on the same visual, or just side by side on the same dashboard?

(Mark Josephson) #3

The data is coming from our proprietary connector, the PowerTools Platform DataProvider. We are unioning one month of hourly data from 8 sources (let's collectively call them "source"), then joining it on 2 dimensions to the master, where we compare 2 values between source and master to ensure they match.


In the dashboard we have a flat table that displays the values for source and master, with a state to highlight any mismatches.
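For anyone following along, the union/join/compare logic described above can be sketched in pandas. The column names (`dim1`, `dim2`, `val1`, `val2`) and the inline sample frames are hypothetical stand-ins; the real data comes from the PowerTools Platform DataProvider connector:

```python
import pandas as pd

# Hypothetical stand-ins for two of the 8 hourly source feeds.
sources = [
    pd.DataFrame({"dim1": ["a"], "dim2": ["x"], "val1": [10.0], "val2": [1.0]}),
    pd.DataFrame({"dim1": ["b"], "dim2": ["y"], "val1": [20.0], "val2": [2.0]}),
]
# Hypothetical master data; note val1 disagrees on the second row.
master = pd.DataFrame({"dim1": ["a", "b"], "dim2": ["x", "y"],
                       "val1": [10.0, 21.0], "val2": [1.0, 2.0]})

# Union the hourly sources, then join to master on the two dimensions.
source = pd.concat(sources, ignore_index=True)
merged = source.merge(master, on=["dim1", "dim2"], suffixes=("_src", "_mst"))

# Flag rows where either value disagrees between source and master;
# this is the condition a dashboard state could highlight.
merged["mismatch"] = (
    (merged["val1_src"] != merged["val1_mst"])
    | (merged["val2_src"] != merged["val2_mst"])
)
print(merged[merged["mismatch"]])
```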


(Ariel Pohoryles) #4

Sounds like your data structure isn't expected to change, so to reduce the cube complexity I would indeed recommend breaking some of the manipulations out into several data cubes and then using another cube that leverages those. That way, if you need to make a change in one of the data cubes, it shouldn't break the rest of the process, which makes it easier to manage and maintain.

I’m not sure how your custom data connector was developed, but you could potentially reduce the complexity by re-engineering it or moving the data calls into a Python data generator transform. The goal here would be to reduce the number of data connections you need to establish, but again – it may not be applicable in your case – I just don’t know enough about it.
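To illustrate the idea (not your actual connector): a Python data generator transform could fetch all the feeds itself and hand the cube a single unioned result, so the model only needs one data connection instead of one per source. Everything here is hypothetical – the feed names and the `fetch_source` call are placeholders for whatever the PowerTools Platform DataProvider actually exposes:

```python
import pandas as pd

# Hypothetical identifiers for the 8 hourly feeds.
SOURCE_IDS = [f"feed_{i}" for i in range(8)]

def fetch_source(source_id):
    # Placeholder for the real API/connector call; returns one feed's
    # hourly data as a DataFrame.
    return pd.DataFrame({"source": [source_id], "dim1": ["a"], "val1": [1.0]})

def generate():
    # Fetch every feed and union them in one place, so the cube sees
    # a single input instead of eight separate connections.
    frames = [fetch_source(sid) for sid in SOURCE_IDS]
    return pd.concat(frames, ignore_index=True)

df = generate()
```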


(Devon Byrd) #5

Follow up question:

If the "master" cube is on a schedule to build once every hour (or some frequency) will it kick off a build of each of the nested cubes? Or do we need to structure the schedules to build in sequence?


(Ariel Pohoryles) #6

It will never kick off a build of the nested cubes. If the nested cubes are stored in the warehouse, the master cube will take the results from the warehouse (which is updated according to the schedule on the nested cubes). If the nested cubes are not stored in the warehouse, or are stored in-memory, it will process the logic of those data cubes (as if the storage was none) within the master cube when it's executed. You can read about it here.