I have a pandas DataFrame that looks like this:

tab_name col_name
CV_TAB1 TAB1_COL1
CV_TAB1 TAB1_COL2
CV_TAB1 TAB1_COL3
CV_TAB2 TAB2_COL1
CV_TAB2 TAB2_COL2

What I need is a Python script that generates one SQL script per distinct tab_name (2 in this example). Each script creates a view named after the table WITHOUT the CV_ prefix, and selects the associated columns aliased WITHOUT the TABX_ prefix. So:

'CREATE VIEW TAB1 AS (
SELECT 
TAB1_COL1 AS COL1,
TAB1_COL2 AS COL2,
TAB1_COL3 AS COL3
FROM CV_TAB1);'

And another one for TAB2.

Note that in the FROM clause the table name keeps the CV_ prefix.

I've written the code that creates the script:

script = ""
i = 0
list_of_scripts = list()


for t in list_of_table_names:     
    if t['name'][:3] == 'CV_':
        table_name = t['name'][3:] # Removing CV_ from table name
        script += "DROP VIEW IF EXISTS " + str(table_name) + ";\n\n" + "CREATE VIEW "+ str(table_name) + " AS ( " + "\n" + "SELECT\n" # First part of SQL script
        for c in list_of_columns_out_of_pd_df:
            if c['columnName'][:len(table_name)] == str(table_name): # Check for the TABX_ prefix
                column_name = c['columnName'][len(table_name)+1:]
                if i == 0:
                    script += str(c['columnName']) + " AS " + str(column_name) + "\n"
                    i+=1
                else:
                    script += "," + str(c['columnName']) + " AS " + str(column_name) + "\n"
            else:
                if i == 0:
                    script += str(c['columnName']) + " AS " + str(c['columnName']) + "\n"
                    i+=1
                else:
                    script += "," + str(c['columnName']) + " AS " + str(c['columnName']) + "\n"


        i = 0
        script += "FROM " + str(t['name']) + ");"
        list_of_scripts.append(script)
        script = ""

What I am missing are the list_of_table_names and list_of_columns_out_of_pd_df variables.

Thanks!

  • Can you please share the script you have written? Commented Mar 29, 2022 at 14:25
  • Done. Please note that the script sometimes refers to a previous structure where I was using dictionaries, but due to some errors that turned out not to be the solution. Commented Mar 29, 2022 at 14:36
  • @Federicofkt is the formatting important (a new line for each column, for FROM, etc.)? Commented Mar 29, 2022 at 14:44
  • You could really use a template engine here, like Jinja. Commented Mar 29, 2022 at 14:47

1 Answer

Here is a way using groupby.

l = []
for tb, cols in df.groupby('tab_name')['col_name']:
    # aggregate the cols with the wanted format
    _s = ',\n'.join([f'{col} as {col.split("_")[-1]}' for col in cols])
    # create the full query
    s = f'''CREATE VIEW {tb.split("_")[-1]} AS (\nSELECT\n{_s}\nFROM {tb});'''
    l.append(s)

print(l[0])
# CREATE VIEW TAB1 AS (
# SELECT
# TAB1_COL1 as COL1,
# TAB1_COL2 as COL2,
# TAB1_COL3 as COL3
# FROM CV_TAB1);

print(l[1])
# CREATE VIEW TAB2 AS (
# SELECT
# TAB2_COL1 as COL1,
# TAB2_COL2 as COL2
# FROM CV_TAB2);

3 Comments

That's almost awesome. The thing is that sometimes I have a column name like TAB1_XX_TT_CC and your script returns CC, whereas I need XX_TT_CC.
@Federicofkt you can add the maxsplit parameter to the split. Try something like _s = ',\n'.join([f'{col} as {col.split("_", maxsplit=1)[-1]}' for col in cols]). You can add the same parameter to the split on the table name tb if needed.
That's a great improvement. The thing is that sometimes I have columns without the TABX_ prefix that start with _, and maxsplit deletes the first _. I've solved it with an if/else inside the brackets and it's working fine, but if you have a better suggestion, even better. Thanks!
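
One possible way to cover both cases raised in the comments (names with several underscores, and names without the TABX_ prefix) is to strip the prefix only when the name actually starts with it, rather than splitting on "_". The following is a minimal sketch, not the answerer's code. The helper names strip_prefix and build_view_scripts, and the sample column _OTHER, are hypothetical. It takes plain (tab_name, col_name) pairs so it runs without pandas.

```python
from collections import defaultdict


def strip_prefix(name, prefix):
    """Return name without a leading prefix; leave it unchanged if the prefix is absent."""
    return name[len(prefix):] if name.startswith(prefix) else name


def build_view_scripts(rows, table_prefix="CV_"):
    """Build one CREATE VIEW statement per table from (tab_name, col_name) pairs."""
    # Group the column names by table, keeping first-seen order.
    columns_by_table = defaultdict(list)
    for tab_name, col_name in rows:
        columns_by_table[tab_name].append(col_name)

    scripts = {}
    for tab_name, cols in columns_by_table.items():
        view_name = strip_prefix(tab_name, table_prefix)
        # Alias each column by removing "<view_name>_" only when it is really there.
        select_list = ",\n".join(
            f"{col} AS {strip_prefix(col, view_name + '_')}" for col in cols
        )
        scripts[view_name] = (
            f"CREATE VIEW {view_name} AS (\nSELECT\n{select_list}\nFROM {tab_name});"
        )
    return scripts


# Hypothetical sample rows, including the two problem cases from the comments.
rows = [
    ("CV_TAB1", "TAB1_COL1"),
    ("CV_TAB1", "TAB1_XX_TT_CC"),
    ("CV_TAB1", "_OTHER"),
    ("CV_TAB2", "TAB2_COL1"),
]
scripts = build_view_scripts(rows)
print(scripts["TAB1"])
# CREATE VIEW TAB1 AS (
# SELECT
# TAB1_COL1 AS COL1,
# TAB1_XX_TT_CC AS XX_TT_CC,
# _OTHER AS _OTHER
# FROM CV_TAB1);
```

With the DataFrame from the question, the pairs could presumably be supplied with df[['tab_name', 'col_name']].itertuples(index=False). On Python 3.9 or later, str.removeprefix does the same job as the strip_prefix helper.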
