Dynamically generate SQL in a cube

Context: I am connecting to Hive via the official ODBC driver from HDP. This driver tries to be smart, and fails quite miserably. To short-circuit the “smartness” I can ask it to pass the query as-is to Hive, but if the query has parameters (e.g. a placeholder in a manual SQL input in a Cube), this fails because Hive cannot process question marks in a query.
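To illustrate, here is roughly the kind of thing that fails, sketched at the ODBC level with pyodbc (the DSN, table and column names are made up):

    import pyodbc  # illustration only; DSN, table and column names are invented

    conn = pyodbc.connect("DSN=HiveHDP", autocommit=True)
    cur = conn.cursor()

    # Literal SQL passed through as-is: Hive accepts this.
    cur.execute("SELECT year, SUM(amount) FROM sales GROUP BY year")

    # Parameterized SQL: in pass-through mode the '?' is forwarded verbatim
    # instead of being bound, and Hive chokes on the question mark.
    cur.execute("SELECT SUM(amount) FROM sales WHERE year = ?", 2017)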

After a lot of tinkering and swearing, I came to the conclusion that the best option for me would be to generate the SQL (maybe via a Dundas script, or a Python step, or maybe just by reading it from disk, where a Python step could have generated it :)) and feed it to the SQL step.

Is it at all possible to do something like that? My feeling is that it’s not possible, but I would love to be shown that I am wrong.

Thanks,

Not the resolution I wished for, but I still put it here for posterity.

Basically I could not do what I wanted, so I ended up paying for a driver which actually works, making the use case in my question not applicable any more.

I will keep this question in mind, though, because I would really like to find out a way to do it.

What might help you is that it is possible to build a SQL command as a string, pass it to the cube, and just let it be executed. At least with MS SQL something like this works, if you disable the sub-query optimization:

    exec('SELECT 1 AS ONE')

There is a Manual SQL transform that allows users to input their own string command, and DBI will execute it as-is.

Thanks Adrian, but I wanted to generate my own string, not just input it. For instance, I would like a Python step to output SQL, which would then be executed by the next step.

This is not possible out of the box, as all SQL statements (generated or user-input) must be part of inception transforms (first transform in the chain).

However, all transforms are written as plug-ins based on a dedicated API, so it is possible to write a new inception transform that uses Python internally to generate SQL, and then uses that SQL to query a database table.

Can’t the Python step execute SQL itself?
With enough Python code, surely it could go anywhere and fetch anything?

Of course, you are right.

Indeed, Python could do anything (even invoke the Dundas API to modify itself, I suppose. Inception, here I come).
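To sketch what I mean (purely hypothetical; the DSN, table and column names are invented, and any DB-API driver reachable from the Python step would do):

    import pyodbc  # any DB-API compatible driver would do; names below are invented

    def fetch_totals(year):
        # Build the SQL as a plain string so no '?' placeholder ever reaches Hive.
        # int() is a naive guard against injection for this single numeric value.
        sql = ("SELECT product, SUM(amount) AS total "
               "FROM sales WHERE year = {} GROUP BY product").format(int(year))

        conn = pyodbc.connect("DSN=HiveHDP", autocommit=True)
        try:
            rows = conn.cursor().execute(sql).fetchall()
        finally:
            conn.close()

        # Reshape row-wise results into the column arrays the Python step expects.
        products = [r[0] for r in rows]
        totals = [r[1] for r in rows]
        return [products, totals]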

So if only there were a way for the Python step itself to output column names, and maybe types as well, instead of just a data dump with useless column names, that would be great.

To explain what I mean:
Currently, to output data with Python, you need to return an array of columns: return [col1_array, col2_array, col3_array]. The output elements are named f1, f2 and so on (if I recall correctly).

If only I could return a dictionary instead, return {'name1': col1_array, 'name2': col2_array}, this would make the life of any ETL’er much easier, especially for autogenerated data, where you might not know in advance the names or the number of columns.

If, on top of that, it were possible to give a type as well, this would finally be really usable. For instance:

    return {
      'name1': {
        'data': col1_array,
        'type': 'datetime'
      },
      'name2': {
        'data': col2_array,
        'type': 'int'
      }
    }
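To show why this matters for autogenerated data, here is a purely hypothetical way the structure could be built when neither the names nor the number of columns are known in advance:

    # Hypothetical: the Python step does not accept this today.
    def to_proposed_output(rows, header, types):
        output = {}
        for i, name in enumerate(header):
            output[name] = {
                'data': [row[i] for row in rows],
                'type': types[i],
            }
        return output

    # Header and types discovered at runtime, e.g. from a query result.
    rows = [(1, '2017-01-01'), (2, '2017-01-02')]
    print(to_proposed_output(rows, ['id', 'day'], ['int', 'datetime']))
    # In the Python step you would then simply return that dictionary.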

This is not possible, as it would mean a backwards-incompatible change in the API.

I can very well imagine a backward-compatible way of handling that, based on the type of the returned data structure.
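For instance, the step could simply dispatch on the type of whatever the script returns; a rough sketch (all names invented) of what I have in mind:

    def normalize_result(result):
        # Existing behaviour: a list of column arrays gets auto-named f1, f2, ...
        if isinstance(result, (list, tuple)):
            return {'f{}'.format(i + 1): {'data': col, 'type': None}
                    for i, col in enumerate(result)}
        # New behaviour: a dict already carries names (and optionally types), so
        # old scripts keep working and new ones opt in just by returning a dict.
        if isinstance(result, dict):
            return {name: (col if isinstance(col, dict)
                           else {'data': col, 'type': None})
                    for name, col in result.items()}
        raise TypeError("unsupported return type: " + type(result).__name__)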

I’m not actually asking for it anyway, because I do not need it currently, but having this option would open up a huge number of possibilities. Currently the Python step is useful only if there is a small number of output columns, and I am indeed only using it as a quick data generator for tests.

And I would add to that a request to be able to reference all the input columns without having to create placeholders for them all.