How to automatically create table and its columns based on CSV using python


This is a snippet of one CSV, shown transposed (each row below is a column header and its value):

Column Header                   Values
LGA_CODE_2016                   LGA10050    
Median_age_persons              39  
Median_mortgage_repay_monthly   1421    
Median_tot_prsnl_inc_weekly     642 
Median_rent_weekly              231 
Median_tot_fam_inc_weekly       1532    
Average_num_psns_per_bedroom    0.8 
Median_tot_hhd_inc_weekly       1185    
Average_household_size          2.3

I have 200+ CSV files whose columns contain a mix of data types such as VARCHAR, INTEGER and FLOAT.
The first column of every table must be the primary key (e.g. LGA_CODE_2016 in the CSV above).

Here is the code I tried:

import csv
import psycopg2
import os
import glob
import re

conn = psycopg2.connect("host=hostnamexx dbname=dbnamexx user=usernamexx password=pwdxx")
print("Connecting to Database")

csvPath = "./TestDataLGA/"

# Loop through each CSV
for filename in glob.glob(csvPath+"*.csv"):
    # Create a table name
    tablename = filename.replace("./TestDataLGA\\", "").replace(".csv", "")
    print tablename

    # Open file
    fileInput = open(filename, "r")

    # Extract first line of file
    firstLine = fileInput.readline().strip()

    # Extract second line of file
    secondLine = fileInput.readline()

    # Split columns into an array [...]
    columns = firstLine.split(",")
    colvals = secondLine.split(",")
     

    # Build SQL code to drop table if exists and create table
    sqlQueryCreate = 'DROP TABLE IF EXISTS '+ " abs.ABS_" + tablename + ";\n"
    sqlQueryCreate += 'CREATE TABLE'+ " abs.ABS_" + tablename + "("

    # Define columns for table
    for column in columns:
        for dtype in colvals:
            dt = bool(re.match(r"^\d+?\.\d+?$", dtype))   
            if dtype.isdigit():
                dtype = "INTEGER"
                 
            elif dt == True:
                dtype = "FLOAT(2)"
                
            else:
                dtype = "VARCHAR(64)"
                
                
        sqlQueryCreate += column + " " + dtype + ",\n"
            
        
    sqlQueryCreate = sqlQueryCreate[:-2]
    sqlQueryCreate += ");"
    
    print sqlQueryCreate

    #cur = conn.cursor()
    #cur.execute(sqlQueryCreate)
    #conn.commit()
    #cur.close()

This is the output I get:

DROP TABLE IF EXISTS  abs.ABS_G02_AUS_LGA;
CREATE TABLE abs.ABS_G02_AUS_LGA(LGA_CODE_2016 FLOAT(2),
Median_age_persons FLOAT(2),
Median_mortgage_repay_monthly FLOAT(2),
Median_tot_prsnl_inc_weekly FLOAT(2),
Median_rent_weekly FLOAT(2),
Median_tot_fam_inc_weekly FLOAT(2),
Average_num_psns_per_bedroom FLOAT(2),
Median_tot_hhd_inc_weekly FLOAT(2),
Average_household_size FLOAT(2));
PS C:\Python27\Scripts>

If I run the inner for loop by itself, I get the correct set of data types for the CSV, but when I nest it inside the outer for loop, every column header gets only the last generated data type, which is FLOAT(2). I am also unsure where to put the code for the primary key.

Can anyone help me fix this issue?

I have tried several permutations and combinations of the loops, including using a break statement, but nothing seems to work.

PS: I am working on test data, so only one CSV file appears in the output above. This is a continuation of my earlier question, "how to automatically create table based on CSV into postgres using python".

edited Jul 28, 2020 at 4:11

asked Jul 28, 2020 at 3:58

Rose


Did you try pandas? – Peaceful, Jul 28, 2020 at 4:13
@Peaceful No, I am not very familiar with pandas; I am fairly new to Python. I would like to continue with this approach, since I have started to understand it, unless pandas would fix this straight away. – Rose, Jul 28, 2020 at 4:20
If you get only the last item, you probably have the wrong indentation: some code runs outside the loop when it should run inside it. – furas, Jul 28, 2020 at 5:26
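Following that hint, the underlying problem is that the question's code loops over every value for every column, so the type appended for a column is whatever the last value produced. Pairing each column with the value in the same position (zip) gives one type per column. The following is a minimal sketch, written for Python 3 and assuming the first data row is representative of each column's type; sql_type and build_create_table are hypothetical helper names, and the sample data is a three-column excerpt of the CSV above. It also shows one place the primary key can go: a PRIMARY KEY (...) table constraint naming the first column.

```python
import csv
import io
import re

# Same float rule as the question: digits, a dot, digits.
FLOAT_RE = re.compile(r"^\d+\.\d+$")


def sql_type(value):
    """Map one CSV value to a column type using the question's rules."""
    value = value.strip()
    if value.isdigit():
        return "INTEGER"
    if FLOAT_RE.match(value):
        return "FLOAT(2)"
    return "VARCHAR(64)"


def build_create_table(table_name, header, first_row):
    """Return DROP/CREATE statements with one type per column.

    zip() pairs each column name with the value at the same position,
    so every column is typed from its own value, not from the last one.
    """
    column_defs = []
    for name, value in zip(header, first_row):
        column_defs.append(name.strip() + " " + sql_type(value))
    # The first column becomes the primary key (table constraint form).
    column_defs.append("PRIMARY KEY (" + header[0].strip() + ")")
    return ("DROP TABLE IF EXISTS " + table_name + ";\n"
            + "CREATE TABLE " + table_name + " (\n    "
            + ",\n    ".join(column_defs) + "\n);")


# Sample data standing in for the first two lines of one CSV file.
sample = io.StringIO(
    "LGA_CODE_2016,Median_age_persons,Average_household_size\n"
    "LGA10050,39,2.3\n"
)
reader = csv.reader(sample)
header = next(reader)
first_row = next(reader)
print(build_create_table("abs.ABS_G02_AUS_LGA", header, first_row))
```

In the original script, the two readline()/split() calls and the nested loop could be replaced by reading header and first_row with csv.reader(fileInput), which also copes with quoted commas that split(",") would break on. Because the names are concatenated straight into SQL, this assumes trusted file and column names; psycopg2's sql.Identifier can quote them otherwise.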
First, you should use print() to see the values in your variables and which part of the code is executed. This is called "print debugging". – furas, Jul 28, 2020 at 5:28
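Applying that print-debugging advice to a cut-down copy of the question's loops (three columns only, Python 3 print) makes the behaviour visible. The names columns and colvals mirror the question's variables; the chosen dictionary is a hypothetical addition that records the type each column would get. The inner loop reclassifies every value for every column, and once it finishes dtype holds only the type of the last value ("2.3", so FLOAT(2)), which is what ends up in the CREATE TABLE text.

```python
import re

# Header and first data row as the question's code splits them (shortened).
columns = ["LGA_CODE_2016", "Median_age_persons", "Average_household_size"]
colvals = ["LGA10050", "39", "2.3"]

chosen = {}  # hypothetical: records the type each column ends up with
for column in columns:
    for dtype in colvals:
        # Same rules as the question; dtype is reassigned on every pass.
        if dtype.isdigit():
            dtype = "INTEGER"
        elif re.match(r"^\d+?\.\d+?$", dtype):
            dtype = "FLOAT(2)"
        else:
            dtype = "VARCHAR(64)"
        print("inner loop:", column, "<-", dtype)
    # After the inner loop ends, only the LAST value's type is left.
    chosen[column] = dtype
    print("used for", column, "=", dtype)
```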
