join tables on overlapping time spans.

matthieu · August 1, 2022, 6:41am

I try to find in Dundas how I could manage the following scenario:

I have:

a Table A of holidays with start/end
a Table B of staff with their holidays
I would like to generate a Table C, which tells for each staff in which ‘holidays period’ their holidays are. For that, the ‘staff holidays’ should be included within the ‘holidays period’.
ie: staff_holiday_start_date >= holiday_period_start_date AND staff_holiday_end_date <= holiday_period_end_date

Table A: Holidays

holidays period	start_date	end_date
spring holidays	2022-03-01	2022-03-30
summer holidays	2022-07-01	2022-07-30
winter holidays	2022-12-01	2022-12-30

Table B: Staff holidays

staff	start_date	end_date
alice	2022-12-03	2022-12-05
bob	2022-11-10	2022-12-13
john	2022-03-20	2022-03-25

Table C: Resultant table

staff	holidays
alice	winter holidays
john	spring holidays

Bob doesn’t show, because his holidays are not included fully in one of the holidays of the Table A.

I haven’t found what is the way to achieve that in Dundas datacubes.

derek.sell · August 1, 2022, 12:53pm

Hi @matthieu

Could you possibly do it in the query?

I’m not sure but would this work? SELECT b.staff
,a.holidays_period
FROM table a
INNER JOIN table b ON a.start_date <= b.start_date
and a.end_date >= b.end_date

matthieu · August 1, 2022, 1:06pm

Thanks for your answer Derek.

Actually my tables come from two different sources (one datacube and one json API). So I am not able to do it at the query level.

upasana · August 1, 2022, 3:29pm

Hi Matthieu,

In this scenario, since for each row of staff, there should be three comparisons done (for Alice, compare dates for spring, summer and winter, for Bob, compare dates for … and so on), a cartesian join is required. Unfortunately, it is not yet supported in Dundas cubes - we do have a Feature Request for this.

However, a cartesian join is equivalent to an inner join: table_1 CROSS JOIN table_2 is table_1 INNER JOIN table_2 ON 1 = 1

So, for the table in cube, you can add a Calculated Element Transform to return just 1 for an Integer type key column. For the data fetched via API, you can either do the same or if you are using Python to connect to the API, you may just append a field containing 1 in the Dataframe itself and return. Next, use Inner join between these two datasets using the newly created constant key fields. This will give you the following.
You will have to add another Calculated element Transform to now compare between the dates and return, let’s say 1 for satisfied conditions and 0 otherwise.

image.png761×378 14.1 KB
Or if you prefer, you can drag the Staff table first. then use a Python Analysis Transform to fetch data from API, join, do analysis and return everything as a Dataframe.

matthieu · August 4, 2022, 8:32am

Hi Upasana,
I followed your first solution (1), and it worked! thanks!