Here is a snippet of one of the CSVs:
Column header                   Value
LGA_CODE_2016                   LGA10050
Median_age_persons              39
Median_mortgage_repay_monthly   1421
Median_tot_prsnl_inc_weekly     642
Median_rent_weekly              231
Median_tot_fam_inc_weekly       1532
Average_num_psns_per_bedroom    0.8
Median_tot_hhd_inc_weekly       1185
Average_household_size          2.3
I have 200+ CSVs, each containing a mix of datatypes such as VARCHAR, INTEGER and FLOAT.
The first column of every table must be the primary key (e.g. LGA_CODE_2016 in the CSV above).
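Because the types are guessed from the data, one option is to look at every value in a column rather than only the first data row, and give the column the widest type any of its values needs. A minimal Python 3 sketch, assuming the same three types as the question's code and treating empty cells as NULL; the function names and sample values are made up for illustration.

```python
import re

# Types ordered from narrowest to widest; a column gets the widest type
# required by any of its values.
TYPE_ORDER = ["INTEGER", "FLOAT(2)", "VARCHAR(64)"]


def value_type(value):
    """Classify one CSV value, ignoring surrounding whitespace and the newline."""
    value = value.strip()
    if re.fullmatch(r"\d+", value):
        return "INTEGER"
    if re.fullmatch(r"\d+\.\d+", value):
        return "FLOAT(2)"
    return "VARCHAR(64)"


def column_type(values):
    """Widest type over all non-empty values of one column (assumption: empty = NULL)."""
    types = [value_type(v) for v in values if v.strip()]
    if not types:
        return "VARCHAR(64)"  # all cells empty: fall back to text
    return max(types, key=TYPE_ORDER.index)


print(column_type(["39", "41", "40"]))        # INTEGER
print(column_type(["1", "0.8", "2"]))         # FLOAT(2)
print(column_type(["LGA10050", "LGA10100"]))  # VARCHAR(64)
```

Note that in PostgreSQL FLOAT(2) is stored as `real`: FLOAT(p) with p from 1 to 24 selects single precision, and p counts binary digits, not decimal places.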
Here is the code I tried:
import csv
import psycopg2
import os
import glob
import re
conn = psycopg2.connect("host=hostnamexx dbname=dbnamexx user=usernamexx password=pwdxx")
print("Connecting to Database")
csvPath = "./TestDataLGA/"
# Loop through each CSV
for filename in glob.glob(csvPath+"*.csv"):
    # Create a table name
    tablename = filename.replace("./TestDataLGA\\", "").replace(".csv", "")
    print tablename
    # Open file
    fileInput = open(filename, "r")
    # Extract first line of file
    firstLine = fileInput.readline().strip()
    # Extract second line of file
    secondLine = fileInput.readline()
    # Split columns into an array [...]
    columns = firstLine.split(",")
    colvals = secondLine.split(",")
    # Build SQL code to drop table if exists and create table
    sqlQueryCreate = 'DROP TABLE IF EXISTS '+ " abs.ABS_" + tablename + ";\n"
    sqlQueryCreate += 'CREATE TABLE'+ " abs.ABS_" + tablename + "("
    # Define columns for table
    for column in columns:
        for dtype in colvals:
            dt = bool(re.match(r"^\d+?\.\d+?$", dtype))
            if dtype.isdigit():
                dtype = "INTEGER"
            elif dt == True:
                dtype = "FLOAT(2)"
            else:
                dtype = "VARCHAR(64)"
        sqlQueryCreate += column + " " + dtype + ",\n"
    sqlQueryCreate = sqlQueryCreate[:-2]
    sqlQueryCreate += ");"
    print sqlQueryCreate
    #cur = conn.cursor()
    #cur.execute(sqlQueryCreate)
    #conn.commit()
    #cur.close()
This is the output I get:
DROP TABLE IF EXISTS abs.ABS_G02_AUS_LGA;
CREATE TABLE abs.ABS_G02_AUS_LGA(LGA_CODE_2016 FLOAT(2),
Median_age_persons FLOAT(2),
Median_mortgage_repay_monthly FLOAT(2),
Median_tot_prsnl_inc_weekly FLOAT(2),
Median_rent_weekly FLOAT(2),
Median_tot_fam_inc_weekly FLOAT(2),
Average_num_psns_per_bedroom FLOAT(2),
Median_tot_hhd_inc_weekly FLOAT(2),
Average_household_size FLOAT(2));
PS C:\Python27\Scripts>
If I run the inner for loop by itself, I get the correct set of datatypes for the CSV, but when I run it inside the outer for loop, every column header gets only the last generated datatype, FLOAT(2). I am also unsure where to put the code for the primary key.
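A minimal Python 3 sketch of one way to handle both points: zip() pairs each header with the value in the same position, so there is no inner loop over all of colvals (which leaves only the last value's type), and enumerate() marks the first column as PRIMARY KEY inside its column definition. The helper names and the three-column sample are illustrative, not from the original code.

```python
import re


def infer_sql_type(value):
    """Hypothetical helper: INTEGER, FLOAT(2) or VARCHAR(64) for one value."""
    value = value.strip()  # drop the trailing "\n" left by readline()
    if re.fullmatch(r"\d+", value):
        return "INTEGER"
    if re.fullmatch(r"\d+\.\d+", value):
        return "FLOAT(2)"
    return "VARCHAR(64)"


def build_create_table(table, columns, values):
    """Build DROP/CREATE TABLE SQL with the first column as the primary key."""
    col_defs = []
    # zip() pairs each header with the value at the same position, so each
    # column gets its own type instead of the type of the row's last value.
    for i, (column, value) in enumerate(zip(columns, values)):
        col_def = column.strip() + " " + infer_sql_type(value)
        if i == 0:
            col_def += " PRIMARY KEY"  # first column is the primary key
        col_defs.append(col_def)
    body = ",\n    ".join(col_defs)
    return (f"DROP TABLE IF EXISTS abs.ABS_{table};\n"
            f"CREATE TABLE abs.ABS_{table} (\n    {body}\n);")


header = "LGA_CODE_2016,Median_age_persons,Average_household_size\n"
row = "LGA10050,39,2.3\n"
sql = build_create_table("G02_AUS_LGA", header.split(","), row.split(","))
print(sql)
```

The resulting string can be passed to cur.execute() as in the commented-out lines. A table-level `PRIMARY KEY (LGA_CODE_2016)` clause after the column list is an equivalent alternative. PostgreSQL folds unquoted identifiers to lower case, so the table is created as abs.abs_g02_aus_lga.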
Can anyone help me fix this issue?
I have tried several combinations of loop ordering and using break, but nothing seems to work.
PS: I am working on test data, so only one CSV file appears in the output here. This is a continuation of my earlier question, "how to automatically create table based on CSV into postgres using python".
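Since the real run covers 200+ files, here is a small Python 3 sketch of reading each file's header and first data row with the csv module (imported in the code above but unused), which drops the line terminator and handles quoted fields that contain commas. Deriving the table name with os.path.basename and os.path.splitext avoids the hard-coded "./TestDataLGA\\" prefix. The function name is hypothetical.

```python
import csv
import glob
import os


def read_header_and_first_row(path):
    """Return (table_name, column_names, first_row_values) for one CSV file."""
    # basename() splits on the platform's path separator(s), so this works
    # with whatever glob returns on Windows or on POSIX systems.
    table_name = os.path.splitext(os.path.basename(path))[0]
    with open(path, newline="") as f:
        reader = csv.reader(f)
        columns = next(reader)  # first line: column headers
        values = next(reader)   # second line: first data row, no "\n"
    return table_name, columns, values


for path in glob.glob(os.path.join("./TestDataLGA", "*.csv")):
    print(read_header_and_first_row(path))
```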
Use print() to see the values in your variables and which part of the code is executed. This is called "print debugging".
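Applying that print-debugging suggestion: below is a Python 3 replay of the question's nested loop with the same type checks, trimmed to three of the snippet's columns. The values, including the "\n" left on the last one because secondLine is not stripped, are an assumption based on the snippet. The debug lines show that each column ends up with the type of the last value, "2.3\n", and that this value is classed as FLOAT(2) because `$` in a Python regex also matches just before a trailing newline, while str.isdigit() rejects it.

```python
import re

# Trimmed replay of the question's loop; colvals mimics secondLine.split(",").
columns = ["LGA_CODE_2016", "Median_age_persons", "Average_household_size"]
colvals = ["LGA10050", "39", "2.3\n"]

chosen = []
for column in columns:
    for dtype in colvals:
        raw = dtype  # keep the raw value for the debug line
        dt = bool(re.match(r"^\d+?\.\d+?$", dtype))
        if dtype.isdigit():
            dtype = "INTEGER"
        elif dt:
            dtype = "FLOAT(2)"
        else:
            dtype = "VARCHAR(64)"
        print("  DEBUG inner:", repr(raw), "->", dtype)
    # Only the type of the LAST value survives the inner loop.
    print("DEBUG outer:", column, "gets", dtype)
    chosen.append((column, dtype))
```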