join tables on overlapping time spans.

I try to find in Dundas how I could manage the following scenario:

I have:

  • a Table A of holidays with start/end
  • a Table B of staff with their holidays
  • I would like to generate a Table C, which tells for each staff in which ‘holidays period’ their holidays are. For that, the ‘staff holidays’ should be included within the ‘holidays period’.
    ie: staff_holiday_start_date >= holiday_period_start_date AND staff_holiday_end_date <= holiday_period_end_date

Table A: Holidays

holidays period start_date end_date
spring holidays 2022-03-01 2022-03-30
summer holidays 2022-07-01 2022-07-30
winter holidays 2022-12-01 2022-12-30

Table B: Staff holidays

staff start_date end_date
alice 2022-12-03 2022-12-05
bob 2022-11-10 2022-12-13
john 2022-03-20 2022-03-25

Table C: Resultant table

staff holidays
alice winter holidays
john spring holidays

Bob doesn’t show, because his holidays are not included fully in one of the holidays of the Table A.

I haven’t found what is the way to achieve that in Dundas datacubes.

Hi @matthieu

Could you possibly do it in the query?

I’m not sure but would this work? SELECT b.staff
FROM table a
INNER JOIN table b ON a.start_date <= b.start_date
and a.end_date >= b.end_date

Thanks for your answer Derek.

Actually my tables come from two different sources (one datacube and one json API). So I am not able to do it at the query level.

Hi Matthieu,

In this scenario, since for each row of staff, there should be three comparisons done (for Alice, compare dates for spring, summer and winter, for Bob, compare dates for … and so on), a cartesian join is required. Unfortunately, it is not yet supported in Dundas cubes - we do have a Feature Request for this.

However, a cartesian join is equivalent to an inner join: table_1 CROSS JOIN table_2 is table_1 INNER JOIN table_2 ON 1 = 1

  1. So, for the table in cube, you can add a Calculated Element Transform to return just 1 for an Integer type key column. For the data fetched via API, you can either do the same or if you are using Python to connect to the API, you may just append a field containing 1 in the Dataframe itself and return. Next, use Inner join between these two datasets using the newly created constant key fields. This will give you the following.
    You will have to add another Calculated element Transform to now compare between the dates and return, let’s say 1 for satisfied conditions and 0 otherwise.

  2. Or if you prefer, you can drag the Staff table first. then use a Python Analysis Transform to fetch data from API, join, do analysis and return everything as a Dataframe.

1 Like

Hi Upasana,
I followed your first solution (1), and it worked! thanks!