Dynamically generate SQL in a cube

Context: I am connecting to Hive via the official ODBC driver from HDP. This driver tries to be smart, and fails quite miserably. To short-circuit the “smartness” I can ask it to pass the query as-is to Hive, but if the query has parameters (e.g. a placeholder in a manual SQL input in a Cube), this fails because Hive cannot process question marks in a query.
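To illustrate, here is roughly the kind of thing that fails, sketched at the ODBC level with pyodbc (the DSN, table and column names are made up):

    import pyodbc  # illustration only; DSN, table and column names are invented

    conn = pyodbc.connect("DSN=HiveHDP", autocommit=True)
    cur = conn.cursor()

    # Literal SQL passed through as-is: Hive accepts this.
    cur.execute("SELECT year, SUM(amount) FROM sales GROUP BY year")

    # Parameterized SQL: in pass-through mode the '?' is forwarded verbatim
    # instead of being bound, and Hive chokes on the question mark.
    cur.execute("SELECT SUM(amount) FROM sales WHERE year = ?", 2017)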

After a lot of tinkering and swearing, I came to the conclusion that the best option for me would be to generate the SQL (maybe via a Dundas script, or a Python step, or maybe just by reading it from disk, where a Python step could have generated it :)) and feed it to the SQL step.

Is it at all possible to do something like that? My feeling is that it’s not possible, but I would love to be shown that I am wrong.

Thanks,

Not the resolution I wished for, but I still put it here for posterity.

Basically I could not do what I wanted, so I ended up paying for a driver which actually works, making the use case in my question not applicable any more.

I will keep this question in mind, though, because I would really like to find out a way to do it.

What might help you is that it is possible to build a SQL command as a string, pass it to the cube, and just let it be executed. At least with MS SQL something like this works, if you disable the sub-query optimization:

    exec('SELECT 1 AS ONE')

There is a Manual SQL transform that allows users to input their own string command, and DBI will execute it as-is.

Thanks Adrian, but I wanted to generate my own string, not just input it. For instance, I would like a Python step to output SQL, which would then be executed by the next step.

This is not possible out of the box, as all SQL statements (generated or user-input) must be part of inception transforms (first transform in the chain).

However, all transforms are written as plug-ins based on a dedicated API, so it is possible to write a new inception transform that uses Python internally to generate SQL, and then uses that SQL to query a database table.

Can’t the Python step execute SQL itself?
With enough Python code, surely it could go anywhere and fetch anything?

Of course, you are right.

Indeed, Python could do anything (even invoke the Dundas API to modify itself, I suppose. Inception, here I come).
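To sketch what I mean (purely hypothetical; the DSN, table and column names are invented, and any DB-API driver reachable from the Python step would do):

    import pyodbc  # any DB-API compatible driver would do; names below are invented

    def fetch_totals(year):
        # Build the SQL as a plain string so no '?' placeholder ever reaches Hive.
        # int() is a naive guard against injection for this single numeric value.
        sql = ("SELECT product, SUM(amount) AS total "
               "FROM sales WHERE year = {} GROUP BY product").format(int(year))

        conn = pyodbc.connect("DSN=HiveHDP", autocommit=True)
        try:
            rows = conn.cursor().execute(sql).fetchall()
        finally:
            conn.close()

        # Reshape row-wise results into the column arrays the Python step expects.
        products = [r[0] for r in rows]
        totals = [r[1] for r in rows]
        return [products, totals]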

So if only there were a way for the Python step itself to output column names, and maybe types as well, instead of just a data dump with useless column names, that would be great.

To explain what I mean:
Currently, to output data with Python, you need to return an array of columns: return [col1_array, col2_array, col3_array]. The output elements are named f1, f2 and so on (if I recall correctly).

If only I could return a dictionary instead, return {'name1': col1_array, 'name2': col2_array}, this would make the life of any ETL’er much easier, especially for autogenerated data, where you might not know in advance the names or the number of columns.

If, on top of that, it were possible to give a type as well, this would finally be really usable. For instance:

    return {
      'name1': {
        'data': col1_array,
        'type': 'datetime'
      },
      'name2': {
        'data': col2_array,
        'type': 'int'
      }
    }
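To show why this matters for autogenerated data, here is a purely hypothetical way the structure could be built when neither the names nor the number of columns are known in advance:

    # Hypothetical: the Python step does not accept this today.
    def to_proposed_output(rows, header, types):
        output = {}
        for i, name in enumerate(header):
            output[name] = {
                'data': [row[i] for row in rows],
                'type': types[i],
            }
        return output

    # Header and types discovered at runtime, e.g. from a query result.
    rows = [(1, '2017-01-01'), (2, '2017-01-02')]
    print(to_proposed_output(rows, ['id', 'day'], ['int', 'datetime']))
    # In the Python step you would then simply return that dictionary.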

This is not possible, as it would mean a backwards-incompatible change in the API.

I can very well imagine a backward-compatible way of handling that, based on the type of the returned data structure.
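For instance, the step could simply dispatch on the type of whatever the script returns; a rough sketch (all names invented) of what I have in mind:

    def normalize_result(result):
        # Existing behaviour: a list of column arrays gets auto-named f1, f2, ...
        if isinstance(result, (list, tuple)):
            return {'f{}'.format(i + 1): {'data': col, 'type': None}
                    for i, col in enumerate(result)}
        # New behaviour: a dict already carries names (and optionally types), so
        # old scripts keep working and new ones opt in just by returning a dict.
        if isinstance(result, dict):
            return {name: (col if isinstance(col, dict)
                           else {'data': col, 'type': None})
                    for name, col in result.items()}
        raise TypeError("unsupported return type: " + type(result).__name__)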

I’m not actually asking for it anyway, because I do not need it currently, but having this option would open up a huge number of possibilities. Currently the Python step is useful only if there is a small number of output columns, and I am indeed only using it as a quick data generator for tests.

And I would add to that a request to be able to reference all the input columns without having to create placeholders for them all.