I have a CSV file and want to organize its (unnormalized) data relationally with Python.
New IDs should be created to link the resulting tables.
For example, I want to split the data from the CSV file and create an m:n relationship, so that I end up with three tables.
The following example should clarify this:
person_name, person_age, pet_name
Lisa, 8, Snowball I
Lisa, 8, Snowball II
Bart, 10, Santa's Little Helper
This should be the result:
person_ID, person_name, person_age
1, Lisa, 8
2, Bart, 10
pet_ID, pet_name
1, Snowball I
2, Snowball II
3, Santa's Little Helper
person_ID, pet_ID
1, 1
1, 2
2, 3
Are there Python modules, or some code, that can accomplish this?
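To make the goal concrete, here is a minimal sketch of the split described in the example above, using only the standard library. It assumes Python 3.7+ (dicts keep insertion order), that a person is identified by the pair (name, age), and that a pet is identified by its name alone; the function name `normalize` is just a placeholder, and the ages stay strings as read from the CSV.

```python
import csv
import io


def normalize(rows):
    """Split (person_name, person_age, pet_name) rows into three tables."""
    persons = {}  # (name, age) -> person_ID
    pets = {}     # pet_name -> pet_ID
    links = []    # (person_ID, pet_ID) pairs for the m:n table
    for name, age, pet in rows:
        # setdefault stores the next free ID only the first time a key is seen
        person_id = persons.setdefault((name, age), len(persons) + 1)
        pet_id = pets.setdefault(pet, len(pets) + 1)
        if (person_id, pet_id) not in links:
            links.append((person_id, pet_id))
    person_table = [(pid, name, age) for (name, age), pid in persons.items()]
    pet_table = [(pid, pet) for pet, pid in pets.items()]
    return person_table, pet_table, links


# The example data from above, read through csv.reader as a file would be.
sample = io.StringIO(
    "person_name, person_age, pet_name\n"
    "Lisa, 8, Snowball I\n"
    "Lisa, 8, Snowball II\n"
    "Bart, 10, Santa's Little Helper\n"
)
reader = csv.reader(sample, skipinitialspace=True)
next(reader)  # skip the header row
person_table, pet_table, links = normalize(reader)
```

With a real file, `sample` would be replaced by the opened CSV file. If two different pets can share a name, the pet key would need more columns than the name.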
EDIT: My strategy so far was to generate a MySQL script using formatted strings. The code below shows how I created an INSERT script without assigning any new IDs or keys.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
# counter: 0 = header row, 1 = first data row, 2 = any later data row
cntr = 0
# open the output file for writing
myfile = open('insert_bundesland.sql', 'w')
# write the INSERT header
myfile.write('INSERT INTO tbl_local (loc_gemeindeschl_ID, loc_bundesland_ID, loc_bundesland, loc_stadt, loc_stadt_status, loc_einwohner, loc_einwohner_m, loc_einwohner_w)\n')
# open the csv file
with open('gem_schl.csv') as f:
    reader = csv.reader(f)
    # loop over the rows
    for row in reader:
        # split the city field if a status follows the name of the city
        x = row[3].split(",")
        if len(x) == 1:
            # if there is no status, use the string 'NULL' instead
            x.append('NULL')
        del row[3]
        x = row + x
        if cntr == 0:
            # skip the header row
            cntr = cntr + 1
        else:
            if cntr == 1:
                # write the first value tuple
                x = "\tVALUES\t(%s, %s, '%s', '%s', '%s', %s, %s, %s)" % (x[2], x[0], x[1], x[11], x[12], x[3], x[4], x[5])
                myfile.write(x)
                cntr = cntr + 1
            else:
                # later value tuples must be separated by a comma
                x = ",\n\t\t\t(%s, %s, '%s', '%s', '%s', %s, %s, %s)" % (x[2], x[0], x[1], x[11], x[12], x[3], x[4], x[5])
                myfile.write(x)
myfile.write('\n;')
myfile.close()
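One weakness of building the statements with '%s' string formatting is that a value containing an apostrophe, such as Santa's Little Helper, would produce invalid SQL. A possible alternative is to insert through a database driver with placeholders. The sketch below uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption, not my actual setup); the table names and the hard-coded rows are hypothetical and simply mirror the expected result tables from the example. MySQL drivers offer the same mechanism, but the placeholder syntax differs per driver.

```python
import sqlite3

# Hypothetical, already normalized rows matching the expected result tables.
persons = [(1, "Lisa", 8), (2, "Bart", 10)]
pets = [(1, "Snowball I"), (2, "Snowball II"), (3, "Santa's Little Helper")]
links = [(1, 1), (1, 2), (2, 3)]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.executescript("""
    CREATE TABLE person (
        person_ID INTEGER PRIMARY KEY,
        person_name TEXT,
        person_age INTEGER
    );
    CREATE TABLE pet (
        pet_ID INTEGER PRIMARY KEY,
        pet_name TEXT
    );
    CREATE TABLE person_pet (
        person_ID INTEGER REFERENCES person(person_ID),
        pet_ID INTEGER REFERENCES pet(pet_ID),
        PRIMARY KEY (person_ID, pet_ID)
    );
""")

# The ? placeholders leave the quoting to the driver, so apostrophes are safe.
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", persons)
conn.executemany("INSERT INTO pet VALUES (?, ?)", pets)
conn.executemany("INSERT INTO person_pet VALUES (?, ?)", links)
conn.commit()

# Join the three tables back together to check the m:n relationship.
query = """
    SELECT p.person_name, pet.pet_name
    FROM person_pet AS pp
    JOIN person AS p ON p.person_ID = pp.person_ID
    JOIN pet ON pet.pet_ID = pp.pet_ID
    ORDER BY pp.person_ID, pp.pet_ID
"""
joined = conn.execute(query).fetchall()
```

Joining the link table back to both sides reproduces the original unnormalized rows, which is a quick way to check that the generated IDs are consistent.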