Create SQL INSERT Script with python

Question

I have a CSV file and want to organize its (unnormalized) data relationally with Python.

An ID should be created to connect the tables.

For example, I want to split the data from the CSV file and create an m:n relationship. The result should be three tables.

The following example may clarify this:

person_name, person_age, pet_name
Lisa, 8, Snowball I
Lisa, 8, Snowball II
Bart, 10, Santa's Little Helper

This should be the result:

person_ID, person_name, person_age
1, Lisa, 8
2, Bart, 10

pet_ID, pet_Name
1, Snowball I
2, Snowball II
3, Santa's Little Helper

person_ID, pet_ID
1, 1
1, 2
2, 3

I want to know whether there are Python modules or some code to accomplish this.

EDIT: My strategy so far has been to create a MySQL script with formatted strings. The code below shows how I created an INSERT script without assigning any new IDs or keys.

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 
import csv 

#set counter variable
cntr = 0

# open the output file for writing
myfile = open('insert_bundesland.sql', 'w')

# Create header
myfile.write('INSERT INTO tbl_local (loc_gemeindeschl_ID, loc_bundesland_ID,      loc_bundesland, loc_stadt, loc_stadt_status, loc_einwohner, loc_einwohner_m, loc_einwohner_w)\n') 

# open csv file
with open('gem_schl.csv') as f:
    reader = csv.reader(f)
    # init for loop - loop over row
    for row in reader:
        # split if there is beside the name of city a status of the city
        x = str.split(row[3], ",")
        if len(x) == 1:
            # if there is no status assign NULL string value
            x.append('NULL')
        del row[3]
        x = row + x
        if  cntr == 0:
            cntr = cntr + 1
        else:
            if cntr == 1:
                # first data row: start the VALUES list
                x = "\tVALUES\t(%s, %s, '%s', '%s', '%s', %s, %s, %s)" % (x[2], x[0], x[1], x[11], x[12], x[3], x[4], x[5])
                myfile.write(x)
                cntr = cntr + 1
            else:
                # later rows: separate the value tuples with a comma
                x = ",\n\t\t\t(%s, %s, '%s', '%s', '%s', %s, %s, %s)" % (x[2], x[0], x[1], x[11], x[12], x[3], x[4], x[5])
                myfile.write(x)
myfile.write('\n;')
myfile.close()
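
The script above writes the rows as they are. As a rough sketch of how new IDs could be assigned while generating such a script, the following uses plain dictionaries to give each distinct person and pet a number and then emits INSERT statements for three tables. The table names person, pet and person_pet, the helper names sql_quote and build_inserts, and the Python 3 syntax are assumptions made for illustration. The quoting helper only doubles single quotes and is not a substitute for parameterized queries through a database driver.

```python
import csv
import io

# Sample rows from the question; a real run would read the CSV file instead.
SAMPLE = """person_name, person_age, pet_name
Lisa, 8, Snowball I
Lisa, 8, Snowball II
Bart, 10, Santa's Little Helper
"""


def sql_quote(value):
    """Quote a string literal by doubling single quotes (illustration only)."""
    return "'" + value.replace("'", "''") + "'"


def build_inserts(lines):
    """Return INSERT statements for person, pet and person_pet with new IDs."""
    persons = {}  # (name, age) -> person_ID
    pets = {}     # pet name -> pet_ID
    links = []    # (person_ID, pet_ID) pairs, in first-seen order
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    for name, age, pet in reader:
        # setdefault returns the existing ID, or stores and returns the next one
        person_id = persons.setdefault((name.strip(), int(age)), len(persons) + 1)
        pet_id = pets.setdefault(pet.strip(), len(pets) + 1)
        if (person_id, pet_id) not in links:
            links.append((person_id, pet_id))

    statements = []
    for (name, age), person_id in persons.items():
        statements.append(
            "INSERT INTO person (person_ID, person_name, person_age) "
            "VALUES (%d, %s, %d);" % (person_id, sql_quote(name), age))
    for pet, pet_id in pets.items():
        statements.append(
            "INSERT INTO pet (pet_ID, pet_name) VALUES (%d, %s);"
            % (pet_id, sql_quote(pet)))
    for person_id, pet_id in links:
        statements.append(
            "INSERT INTO person_pet (person_ID, pet_ID) VALUES (%d, %d);"
            % (person_id, pet_id))
    return statements


if __name__ == "__main__":
    print("\n".join(build_inserts(io.StringIO(SAMPLE))))
```

Keeping the IDs in dictionaries means the whole CSV is processed in one pass, which should be fine for small files; for large datasets, letting the database assign the keys is probably the better choice.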

To answer this question effectively, we need to know which database/DB-API driver you are using and, if you are using an ORM, which one. Also, how big is the dataset? The approach for a small dataset will differ from that for a huge one. This is also a very vague question as worded. Have you already written some code to read the CSV? If so, please include it.
– Michael Robellard, Commented Aug 26, 2015 at 19:19
Please show us what you have so far and explain what you're stuck on.
– Kevin, Commented Aug 26, 2015 at 19:20
I see you are relatively new to Stack Overflow. Stack Overflow is a question-and-answer site. Readers, such as yourself, ask questions and other readers attempt to answer them. Your post seems to be missing the key ingredient: a question. Do you have a specific question?
– Robᵩ, Commented Aug 26, 2015 at 19:43
Good hint @Robᵩ, I'm also new to Python. My question is: Is there a convenient way or module to transform CSV data into relational tables?
– jnshsrs, Commented Aug 26, 2015 at 20:11

Robᵩ · Accepted Answer · 2015-08-26 23:21:20Z

The modules csv and sqlite3 should fit your needs. Here is a sample:

#!/usr/bin/env python2

import sqlite3
import csv


def quotify(s):
    return '"' + s.strip().replace('"', '""') + '"'

con = sqlite3.connect("pets.db")

# Example contents of pets.csv:
# person_name, person_age, pet_name
# Lisa, 8, Snowball I
# Lisa, 8, Snowball II
# Bart, 10, Santa's Little Helper
with open("pets.csv") as pets:
    pets = csv.reader(pets)
    with con:
        names = next(pets)
        names = [name.decode('utf-8') for name in names]
        for name in names:
            con.execute('drop table if exists %s;' % quotify(name))
            con.execute('create table %s (value unique on conflict ignore);'
                        % quotify(name))
        con.execute("drop table if exists master")
        st = "create table master(%s);" % (
            ','.join("%s" % quotify(name) for name in names))
        con.execute(st)
        for row in pets:
            row = [item.decode('utf-8') for item in row]
            rowids = []
            for name, value in zip(names, row):
                rowids.append(
                    con.execute("insert into %s (value) values(?)"
                                % quotify(name),
                                (value.strip(),)).lastrowid)
            st = 'insert into master values(%s)' % (
                ','.join('?' for rowid in rowids))
            con.execute(st, rowids)

# Demonstration, using the Simpsons example from the question:
from pprint import pprint
st = '''select person_name.value, person_age.value, pet_name.value
          from person_name, person_age, pet_name, master
         where master.person_name = person_name.rowid
           and master.person_age = person_age.rowid
           and master.pet_name = pet_name.rowid;'''

pprint(con.execute(st).fetchall())
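
The sample above creates one lookup table per CSV column. If the goal is exactly the three tables from the question (persons, pets, and a link table for the m:n relationship), a minimal Python 3 sketch along the same lines might look like the following. The table and column names and the load function are assumptions chosen to match the question's example, and the data is read from an in-memory string rather than pets.csv so the sketch is self-contained.

```python
import csv
import io
import sqlite3

# Sample data from the question; in practice, pass open("pets.csv") instead.
SAMPLE = """person_name, person_age, pet_name
Lisa, 8, Snowball I
Lisa, 8, Snowball II
Bart, 10, Santa's Little Helper
"""


def load(con, lines):
    """Split flat CSV rows into person, pet and a person_pet link table."""
    con.executescript("""
        create table person (
            person_id integer primary key,
            person_name text,
            person_age integer,
            unique (person_name, person_age));
        create table pet (
            pet_id integer primary key,
            pet_name text unique);
        create table person_pet (
            person_id integer references person,
            pet_id integer references pet,
            primary key (person_id, pet_id));
    """)
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    with con:  # commit everything in one transaction
        for name, age, pet in reader:
            person = (name.strip(), int(age))
            # "or ignore" skips rows that would violate a unique constraint
            con.execute(
                "insert or ignore into person (person_name, person_age) "
                "values (?, ?)", person)
            con.execute(
                "insert or ignore into pet (pet_name) values (?)",
                (pet.strip(),))
            # look up both generated IDs and link them
            con.execute(
                "insert or ignore into person_pet (person_id, pet_id) "
                "select person.person_id, pet.pet_id from person, pet "
                "where person_name = ? and person_age = ? and pet_name = ?",
                person + (pet.strip(),))


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    load(con, io.StringIO(SAMPLE))
    # iterdump() yields the database contents as SQL statements,
    # which can serve as a starting point for an INSERT script.
    for statement in con.iterdump():
        print(statement)
```

Here the database assigns the IDs through the integer primary key columns, and the unique constraints keep repeated persons or pets from being inserted twice. The dump is in SQLite's dialect, so it may need adjusting before it can be run against another database such as MySQL.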
