I have a CSV file. I want to organize this (unnormalized) data in a relational manner with Python:

An ID should be created that connects the tables.

For example, I want to split the data from the CSV file and create an m:n relationship. The result should be three tables.

The following example may clarify this:

person_name, person_age, pet_name
Lisa, 8, Snowball I
Lisa, 8, Snowball II
Bart, 10, Santa's Little Helper

This should be the result:

person_ID, person_name, person_age
1, Lisa, 8
2, Bart, 10

pet_ID, pet_name
1, Snowball I
2, Snowball II
3, Santa's Little Helper

person_ID, pet_ID
1, 1
1, 2
2, 3

I want to know if there are modules in Python or some code to accomplish this.

EDIT: My strategy so far was to create a MySQL script with formatted strings. The code below shows how I created an INSERT script without assigning any new IDs or keys.

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 
import csv 

#set counter variable
cntr = 0

# open the output file for writing
myfile = open('insert_bundesland.sql', 'w')

# Create header
myfile.write('INSERT INTO tbl_local (loc_gemeindeschl_ID, loc_bundesland_ID, loc_bundesland, loc_stadt, loc_stadt_status, loc_einwohner, loc_einwohner_m, loc_einwohner_w)\n')

# open csv file
with open('gem_schl.csv') as f:
    reader = csv.reader(f)
    # loop over the rows of the csv file
    for row in reader:
        # the city field may hold "name, status"; split it into two parts
        x = row[3].split(",")
        if len(x) == 1:
            # if there is no status, use the string 'NULL' instead
            x.append('NULL')
        del row[3]
        x = row + x
        if  cntr == 0:
            cntr = cntr + 1
        else:
            if cntr == 1:
                # write sql statements 
                x = "\tVALUES\t(%s, %s, '%s', '%s', '%s', %s, %s, %s)\n" % (x[2], x[0], x[1], x[11], x[12], x[3], x[4], x[5])
                myfile.write(x)
                cntr = cntr + 1
            else:
                x = "\t\t\t(%s, %s, '%s', '%s', '%s', %s, %s, %s)\n" % (x[2], x[0], x[1], x[11], x[12], x[3], x[4], x[5])
                myfile.write(x)
myfile.write(';')
myfile.close()
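
The ID assignment that the script above leaves out could be sketched roughly as follows. This is only a minimal sketch using the pet example from the top of the question, not the gem_schl.csv data. The function name normalize and the in-memory SAMPLE string are assumptions made for illustration. Each distinct person and each distinct pet gets the next free integer as its ID, and the ID pairs form the m:n table.

```python
import csv
import io

# Sample data from the example above (hypothetical stand-in for a real file).
SAMPLE = """person_name, person_age, pet_name
Lisa, 8, Snowball I
Lisa, 8, Snowball II
Bart, 10, Santa's Little Helper
"""


def normalize(lines):
    """Split flat rows into person, pet and link tables with new IDs."""
    persons = {}  # (person_name, person_age) -> person_ID
    pets = {}     # pet_name -> pet_ID
    links = []    # (person_ID, pet_ID) pairs, i.e. the m:n table
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    for name, age, pet in reader:
        # setdefault returns the existing ID or stores the next free one
        person_id = persons.setdefault((name, age), len(persons) + 1)
        pet_id = pets.setdefault(pet, len(pets) + 1)
        if (person_id, pet_id) not in links:
            links.append((person_id, pet_id))
    return persons, pets, links


persons, pets, links = normalize(io.StringIO(SAMPLE))
for (name, age), person_id in persons.items():
    print(person_id, name, age)
for pet, pet_id in pets.items():
    print(pet_id, pet)
for pair in links:
    print(*pair)
```

The same dictionaries could then feed the INSERT script, one statement block per table.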
  • Why wouldn't you use a SQL module in Python like SQLite? Commented Aug 26, 2015 at 19:14
  • To answer this question effectively, we need to know which database/DB-API driver you are using and, if you are using an ORM, which one. Also, how big is the dataset? The approach for a small dataset will differ from that for a huge one. The question is also very vague as worded. Have you already written some code to read the CSV? If so, please include it. Commented Aug 26, 2015 at 19:19
  • Please show us what you have so far and explain what you're stuck on. Commented Aug 26, 2015 at 19:20
  • I see you are relatively new to Stack Overflow. Stack Overflow is a question-and-answer site. Readers, such as yourself, ask questions and other readers attempt to answer them. Your post seems to be missing the key ingredient: a question. Do you have a specific question? Commented Aug 26, 2015 at 19:43
  • Good hint @Robᵩ, I'm also new to Python. My question is: Is there a convenient way or module to transform CSV data into relational tables? Commented Aug 26, 2015 at 20:11

1 Answer

The modules csv and sqlite3 should fit your needs. Here is a sample that creates one lookup table per CSV column, plus a master table that references their row IDs:

#!/usr/bin/env python3

import sqlite3
import csv


def quotify(s):
    # Quote a CSV header so it is safe to use as an SQL identifier.
    return '"' + s.strip().replace('"', '""') + '"'

con = sqlite3.connect("pets.db")

# Example contents of pets.csv:
# person_name, person_age, pet_name
# Lisa, 8, Snowball I
# Lisa, 8, Snowball II
# Bart, 10, Santa's Little Helper
with open("pets.csv", newline="", encoding="utf-8") as f:
    pets = csv.reader(f)
    with con:
        names = next(pets)
        # One lookup table per column; duplicate values are ignored.
        for name in names:
            con.execute('drop table if exists %s;' % quotify(name))
            con.execute('create table %s (value unique on conflict ignore);'
                        % quotify(name))
        con.execute("drop table if exists master")
        st = "create table master(%s);" % (
            ','.join(quotify(name) for name in names))
        con.execute(st)
        for row in pets:
            rowids = []
            for name, value in zip(names, row):
                value = value.strip()
                con.execute("insert into %s (value) values(?)"
                            % quotify(name), (value,))
                # lastrowid is stale when the insert was ignored as a
                # duplicate, so look the rowid up explicitly.
                rowids.append(
                    con.execute("select rowid from %s where value = ?"
                                % quotify(name), (value,)).fetchone()[0])
            st = 'insert into master values(%s)' % (
                ','.join('?' for rowid in rowids))
            con.execute(st, rowids)

# Demonstration, using the Simpsons example from the question:
from pprint import pprint
st = '''select person_name.value, person_age.value, pet_name.value
          from person_name, person_age, pet_name, master
         where master.person_name = person_name.rowid
           and master.person_age = person_age.rowid
           and master.pet_name = pet_name.rowid;'''

pprint(con.execute(st).fetchall())
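
The sample above builds one table per column. If you want exactly the three tables from the question, a variant along these lines might work. Treat it as a rough sketch: the table names person, pet and person_pet, the load function and the in-memory SAMPLE string are assumptions for illustration, and it assumes a person is identified by name plus age and a pet by its name.

```python
import csv
import io
import sqlite3

# Sample data from the question (hypothetical stand-in for pets.csv).
SAMPLE = """person_name, person_age, pet_name
Lisa, 8, Snowball I
Lisa, 8, Snowball II
Bart, 10, Santa's Little Helper
"""

# Assumed schema: two entity tables plus a link table for the m:n relation.
SCHEMA = """
create table person (
    person_ID   integer primary key,
    person_name text,
    person_age  integer,
    unique (person_name, person_age)
);
create table pet (
    pet_ID   integer primary key,
    pet_name text unique
);
create table person_pet (
    person_ID integer references person (person_ID),
    pet_ID    integer references pet (pet_ID),
    primary key (person_ID, pet_ID)
);
"""


def load(con, lines):
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    with con:  # one transaction for the whole import
        for name, age, pet in reader:
            # "or ignore" skips rows that already exist; IDs are
            # assigned by SQLite through the integer primary key
            con.execute("insert or ignore into person"
                        " (person_name, person_age) values (?, ?)",
                        (name, int(age)))
            con.execute("insert or ignore into pet (pet_name) values (?)",
                        (pet,))
            # look both IDs up by value and store the pair
            con.execute(
                "insert or ignore into person_pet (person_ID, pet_ID)"
                " select person_ID, pet_ID from person, pet"
                " where person_name = ? and person_age = ?"
                " and pet_name = ?",
                (name, int(age), pet))


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
load(con, io.StringIO(SAMPLE))
people = con.execute("select * from person order by person_ID").fetchall()
pets = con.execute("select * from pet order by pet_ID").fetchall()
links = con.execute(
    "select * from person_pet order by person_ID, pet_ID").fetchall()
```

Looking the IDs up by value, rather than relying on lastrowid, keeps the link table correct even when a person or pet was already inserted by an earlier row.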