Add an auto-increment column


(Octavian Popescu) #1

Hello,


I have a Cube done with ETL process with some joins. calculated elements and aggregates.

I need to add a column right before the Process result, with a primary key (distinct and not null), done within the ETL layer (not in the manual select).


Any idea on how this can be done?

PS: tried the rank function, added a sort of PK but i cannot use the column in a Join (it doesn't join with the other table)


Thanks!


(James Davis) #2

Apicture of your Data Cube might help us understand what you are trying to do.

Also I am not sure why you need to do it.


"done within the ETL layer (not in the manual select)", I think yoiu mean done with the one of the "Insert Common" or "Insert Other" data transformation tools in the Data Cube.


I can not at the moment find it but there was a webinar or video that might have something like what you are asking for. It had to do with comparing flower species, I think it might be the python or R.

Does anyone remeber and know what video that is (Ariel, Jordan, Dundas Support)?


(James Davis) #3

This might work

https://dundas.influitive.com/discussions/topics/74260?page=0


just have it start at 1 and then add1 instead of what in the post.


(Octavian Popescu) #4

Thanks for the article. What I don't know is what formula to insert in the calculated element so it can generate an auto-increment primary key.


(Octavian Popescu) #5

https://dundas.com/learning/webinars/10-07-2018-10-advanced-data-visualizations-_-understand-your-data-faster


I think this is the video you are talking about. (starting at min. 25).


So this is my cube:

What I want to achieve is almost the exact same thing as in the video, but the problem is that my data is dynamic (rows can be added).


In my cube I have 'n' colomns (no primary key) and in the data input cube I have a 'status' column. (no PK).

In order to join my cube and the data input cube, I need to have a primary key, which I am trying to generate (and don't know how) and add in my cube. After I learn how to generate it, I will copy it into the input cube, and then I will be able to do the join.


After all this is done, I will encounter another problem. If in my cube, a row will be added and a pk will be assigned for it, how will this particular pk be inserted automatically in the input cube?


Thanks!


(James Davis) #6

To do a join you usually need a PK and FK. Those need to come from the source.

Even if it is not an ID column it needs to be something that says this row belong to that rows in the other source/table.

I have two sources (that I am currently working on) that do not naturally have Key relationship but they deal with the same data just in a different fashion.


if no keys the look for person name, car make and model, flavor of soda and serving size, not sure what you data is but if you are trying to join it certainly they are related in some way.


Let explain mine a little more.

We have a lot of SSIS packages doing a lot of ETL. and we use Bi Xpress to do some monitoring of what is running did it finish and how long it took how many rows got read and insterted, excetra.

We also want to graph the system agent job information as well, for the ones that run these SSIS packages.

Now I created a view made up of the system tables data I want to graph and I have a query to the Bi Xpress table for the data we want to graph.

But there no natural key bewteen these two sources because well one does not really care about.

I had to find data that was the same in both sources to be able to join them.

I found that in some form in both sources there was the name of the SSIS package, but it was not that easy. In one table the package name was in the middle of a varchar so ai had to extract it to a new column. This is what I will use to join them.


You should not just add a column with made up numbers to both table to join them, doing that does not match rows that belong together.


You can Join in the Data Cube, but there is nothing in there that will help you find what to join on and making something just seams like the wrong way to go.


Go back to the data and see if there anything that can be used to match the orws up. if you need to combine a few things together or extract one part out of one, all that can be done in the Data Cube with calulated element.


(Octavian Popescu) #7

Thanks for the response, James!


The input cube is actually a blank cube that is populated with values that are introduced in the dashboard.

Along with a table that is represented by the cube I've showed you in the screenshot earlyer, lets say I have a dropdown in my dashboard with 3 status choices (OK, Not OK, Unassigned) and a text box in which I enter a number (the PK), which I can find in the dashboard table, in the first column.

When I choose one of the selections, that selection will be written in the input cube, along with the PK number I've entered.

So when the join is made, the process result will show only one table, with the status I have selected corresponding to the row that has the respective PK.


This is why I need to generate a PK in the cube...


(James Davis) #8

The missing peice of information! That make a lot more sence now.

Let me think on this.

So you need to link/join the inputed data with data that already exist? is that correct?


I am still leaning towards when you input this new data have something that is chosen out of the existing data that will make that connection, unless that is the (OK,Not OK, Unassigned).


That is not the video, the one I am remebering was all about flower species, found a link but now not sure if it useful for you.

https://www.dundas.com/support/learning/documentation/cleanse-consolidate-modify-data/how-to/clustering-predictions-using-r-language-analysis


The first link I gave where it uses a value of previous row to make the next value (there it is minusing but you will just + 1) I think might be your best bet.


I myslef have asked very specific questins on here and we do this to protect our company data, but usually it is better to be more genral and give lots of details for better understanding of the problem becuase there might be a different/better solution to the problem than the one we ask how to do. might not be in this case.





(Octavian Popescu) #9

''So you need to link/join the inputed data with data that already exist? is that correct? ''

Exactly!

In my case I doo need a PK, because the data in the table might be repetitive, so I cannot link to it.

The ideal case would be to automatically update the input cube with the PK from my cube as new rows are inserted and pk's are generated.


I will try to be more specific next time :)


(James Davis) #10

When they are inputing the data can you have them pick what it links to?


Yes it is always easier to figure things out when you can see the actual process and all the parts involved (that why Dudas suppoort uses goto meeting, my support tickets never get going untill we do a screen share I always leave something out too).


(Ariel Pohoryles) #11

You can probably achieve a solution with the design that you have in mind using the following approach:

  • Create PK column on your cube flow that represent your source data (not the data input cube). If you expect to be able to always sort the source data in a way that will return new inserted rows at the bottom than you can use the Dundas BI built in Rank transform. If the sort order is expected to change then you can use the Python transform to create the logic that you need to ensure that new values are inserted only to new rows in the K column.
  • Your data input flow of the cube will have 2 columns: JoinCol (that is the column you will use for the join with the PK column in the source flow) and the status column. Both empty by default.
  • Use a left join to join the 2 flows. You can then have a calculated column to return a certain status value if the status value is Unknown (meaning it was never set by the users on the dashboard).
  • Then on the dashboard, when the users set the status value they want (right it back to the data input) you would write back the status value they choose as well as the PK value from the source flow into the JoinCol of the data input.





(Octavian Popescu) #12

Thanks for the formula Ariel. I tried to play with the Rank option but I noticed it doesn't work with the join (the join doesn't show any results after I apply all the above). Tried to join with another column and it worked so I guess the Rank is the problem...


(Ariel Pohoryles) #13

Can you please confirm the join doesn't show any results using a left join? If so, I suggest you move this matter over to the Dundas Support team.


(Octavian Popescu) #14

Hi Ariel,

It managed to work but only when I copied the rank column with a calculated element and transformed it into string. Also the column in the Data Input cube must be string format.
With double format, it didn’t work.

The next level thing would be to automatically update the PK column in the data input cube, so that i could have a status drop down on each row of my table/scorecard.

Thanks!


(Ariel Pohoryles) #15

Good. Note you don’t have to use a calculated element for converting the data type - you can use the data conversion transform instead and convert your rank measure to int and then your join should work.

As for updating the PK for the data input flow - is that still needed with my suggested steps above (using a left join approach)?


(Octavian Popescu) #16

Hi all,

In this webinar, at min 25:30, is shown what I want to achieve: (I got the link wrong several posts ago)

Dundas BI 5.0.1 - Intelligent. Fast. Personal - New features requested by you

So Ariel, if I’m not wrong, I need to update the PK in the data input, so I can have a dropdown on each row, as in the video at min. 25, every time a new row is added in my inital cube.

Basically I want to by pass the part where I have to insert the PK value, as you wrote above, and have it automatically inserted in the data input cube, depending on the row I am selecting the status on.

PS: the difference between the webinar example and mine is that my number of rows will be dynamic as the database will be populated with values while in the example there is a defined number of rows and values.

Thanks!


(Ariel Pohoryles) #17

I’m not sure how your approach would work but there is a probably a way to do so. Have you tried using my suggested approach and detailed steps? If so, is there a reason this approach won’t work for your case?