1

I have two .txt files that I converted to .csv using https://convertio.co/csv-xlsx/. Now I would like to import these two .csv files, person.csv and person_votes.csv, into two database tables using SQLite in Python (in a Jupyter Notebook). I followed the code given here (Importing a CSV file into a sqlite3 database table using Python):

import sqlite3, csv

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE person (personid STR,age STR,sex STR,primary_voting_address_id STR,state_code STR,state_fips STR,county_name STR,county_fips STR,city STR,zipcode STR, zip4 STR,  PRIMARY KEY(personid))") 

with open('person.csv','r') as person_table: # `with` statement available in 2.5+
    # csv.DictReader uses first line in file for column headings by default
    dr = csv.DictReader(person_table) # comma is default delimiter
#personid   age sex primary_voting_address_id   state_code  state_fips  county_name county_fips city    zipcode zip4
    to_db = [(i['personid'], i['age'], i['sex'], i['primary_voting_address_id'], i['state_code'], i['state_flips'], i['county_name'], i['county_fips'], i['city'], i['zipcode'], i['zip4']) for i in dr]

cur.executemany("INSERT INTO t (age, sex) VALUES (?, ?);", to_db)
con.commit()

When I execute the code above, I keep getting the error message "KeyError: 'personid'", and I don't understand why. Could someone please help?

Also, if I create another database table named to_db2 for the file person_votes.csv in the same Python file, would the following query give me all the common elements between the two tables:

select ID from to_db, to_db2 WHERE to_db.ID ==  to_db2

The link to the two .csv files above is here: https://drive.google.com/open?id=0B-cyvC6eCsyCQThUeEtGcWdBbXc.

7
  • csvkit.readthedocs.io/en/1.0.2/tutorial/… Commented Sep 3, 2017 at 23:12
  • I tried their instructions, but failed right on their first line of code (csvsql -i sqlite3 person.csv) SyntaxError: invalid syntax at sqlite3 (what??) Commented Sep 3, 2017 at 23:28
  • @cricket_007: could you please let me know why I got that error? I already imported csvkit and installed it into my Anaconda environment ;p Commented Sep 3, 2017 at 23:45
  • Check that page again. Says sqlite Commented Sep 4, 2017 at 2:05
  • Thanks for your response. My Python version is 3.0+, so it does not have sqlite, only sqlite3, but I believe they are not much different. Commented Sep 4, 2017 at 2:10

2 Answers

1

This works for me on Windows 10, but should work under Linux/Unix too. There are several problems:

  1. The last two rows of person.csv are not in the correct format, but this does not prevent the program from working. You can fix this with a text editor.
  2. person.csv uses tabs as the delimiter, not commas.
  3. There is a typo in the line that starts with "to_db =": the key is spelled 'state_flips' instead of 'state_fips'.
  4. There is a mismatch in the number of columns to import: the INSERT statement names 2 columns, but each row has 11 values.
  5. The executemany call uses the wrong table name (t instead of person).
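
Problem 2 is what produces the KeyError: 'personid'. As a minimal sketch, using a made-up in-memory sample rather than the real person.csv, this is how csv.DictReader names its keys with and without the tab delimiter:

```python
import csv
import io

# Made-up sample that mimics a tab-delimited file (not the real data).
sample = "personid\tage\tsex\n1\t34\tF\n"

# Default comma delimiter: the whole header line becomes one single key,
# so looking up row['personid'] raises KeyError.
comma_rows = list(csv.DictReader(io.StringIO(sample)))
comma_keys = list(comma_rows[0].keys())

# Tab delimiter: every column gets its own key.
tab_rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
tab_keys = list(tab_rows[0].keys())

print(comma_keys)  # one key containing the tabs
print(tab_keys)    # three separate keys
```

Printing the keys of the first row like this is a quick way to check which delimiter a file really uses.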

In addition, I create the database in a file rather than in memory. It is small enough that performance should not be a problem and also any changes you make will be saved.

Here is my corrected file (you can do the other table yourself):

import sqlite3, csv

# con = sqlite3.connect(":memory:")
con = sqlite3.connect("person.db")
cur = con.cursor()
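# TEXT keeps leading zeros in codes such as zipcode; SQLite gives the
# unrecognized type name STR numeric affinity, which would store '01234' as 1234.
cur.execute("CREATE TABLE person (personid TEXT,age TEXT,sex TEXT,primary_voting_address_id TEXT,state_code TEXT,state_fips TEXT,county_name TEXT,county_fips TEXT,city TEXT,zipcode TEXT, zip4 TEXT,  PRIMARY KEY(personid))")

# newline='' is what the csv module documentation recommends when opening files
with open('person.csv', 'r', newline='') as person_table:
    # person.csv is tab-delimited, so override the default comma delimiter
    dr = csv.DictReader(person_table, delimiter='\t')
    to_db = [(i['personid'], i['age'], i['sex'], i['primary_voting_address_id'], i['state_code'], i['state_fips'], i['county_name'], i['county_fips'], i['city'], i['zipcode'], i['zip4']) for i in dr]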

cur.executemany("INSERT INTO person VALUES (?,?,?,?,?,?,?,?,?,?,?);", to_db)
con.commit()
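
On the second part of the question: to_db is a Python list, not a table, so the query has to name the SQL tables themselves. The following is only a sketch. It assumes the second table is called person_votes and that it also has a personid column; neither is confirmed here, and the sample rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Cut-down, hypothetical versions of the two tables.
cur.execute("CREATE TABLE person (personid TEXT PRIMARY KEY, age TEXT)")
cur.execute("CREATE TABLE person_votes (personid TEXT, election TEXT)")

cur.executemany("INSERT INTO person VALUES (?, ?)",
                [("1", "34"), ("2", "51"), ("3", "27")])
cur.executemany("INSERT INTO person_votes VALUES (?, ?)",
                [("1", "2016"), ("3", "2016"), ("3", "2012"), ("9", "2016")])

# INNER JOIN keeps only ids present in both tables;
# DISTINCT collapses people who have several vote rows.
cur.execute("""
    SELECT DISTINCT person.personid
    FROM person
    INNER JOIN person_votes ON person.personid = person_votes.personid
    ORDER BY person.personid
""")
common_ids = [row[0] for row in cur.fetchall()]
print(common_ids)
con.close()
```

Note that SQL compares with a single = and that both sides of the comparison must be columns, not a column and a table name.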

11 Comments

Thank you so much for your help, Sir! I will try it and let you know right away if I run into any problems executing your code. So far, the only difference I have seen between yours and mine is the line dr = csv.DictReader(person_table, delimiter='\t')
Btw, do you know if the DictReader() function works with a txt file? Apparently, when I used the website convertio.co to convert my person.txt file into a .csv file, it included only a portion of the data.
@ghjk it depends on how the txt file is organized, do you have a link to it?
I do!! Here is the link you need: drive.google.com/open?id=0B-cyvC6eCsyCTXBUa1lOSHRVOFU. Please let me know in which situations for a txt file, the above code does not work? And how to fix it if that's the case?
Basically the txt file has to be organized in columns with a single tab between columns. If a value is missing you will then have two tabs. Both of your txt files work fine. If, for example, a value has an embedded tab then you have to edit it to use quotes. A very simple test is to open the txt file with Excel.
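
The rules in the comment above can be checked with a small in-memory sample. This is a sketch with made-up rows, not the real person.txt: one row has a missing value (two consecutive tabs) and one has a quoted value with an embedded tab.

```python
import csv
import io

# Made-up tab-delimited sample (not the real person.txt).
sample = (
    "personid\tage\tcity\n"
    "1\t\tAustin\n"              # missing age: two tabs in a row
    '2\t40\t"New\tYork"\n'       # embedded tab protected by quotes
)

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))

missing_age = rows[0]["age"]    # empty string for the missing value
quoted_city = rows[1]["city"]   # the tab stays inside the value

print(repr(missing_age))
print(repr(quoted_city))
```

For a file on disk, the same reader can be given open('person.txt', newline='') in place of the io.StringIO object, so no conversion to .csv is needed.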
0

Looks like you might be missing some column names in your INSERT INTO ... statement.

Probably not great practice leaving the Primary Key as NULL too.

6 Comments

Thank you for your comment. I did specify Primary Key = personID. Did you miss it? Other than that, the error I got came from this line: to_db = [(i['personid'], i['age'], i['sex'], i['primary_voting_address_id'], i['state_code'], i['state_flips'], i['county_name'], i['county_fips'], i['city'], i['zipcode'], i['zip4']) for i in dr], so I am not sure if the line INSERT INTO.... is the reason. I still get the same error message when specifying ALL the column names, to be honest.
After I tried adding all the column names into INSERT INTO..., I got a new error message from the cursor.execute... statement: OperationalError: table person already exists. If I change the name of the table from person to person1, I ended up with the original error message at line 11: KeyError: 'personid'. Not sure why:(
@Marichyasana: personID is unique, isn't it? Did you take a look at the .csv files??
@Marichyasana: could you try importing the two csv files by fixing my code above, and let me know what went wrong? I still could not see why I got that error:(
@ghjk I downloaded the two files using wget and checked the first one with csvlint and a little bit using R. Give me a little time and I'll see.
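
On the "OperationalError: table person already exists" message mentioned in the comments above: a likely cause, though the thread does not confirm it, is re-running the CREATE TABLE statement against a connection (or database file) in which the table was already created, which happens easily when a notebook cell is executed twice. A minimal sketch of one way around it, with a shortened, hypothetical column list and a hypothetical helper name:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

def create_person_table(cursor):
    """Hypothetical helper: recreate the table so the cell can be re-run."""
    # Dropping first discards any rows from a previous run.
    cursor.execute("DROP TABLE IF EXISTS person")
    cursor.execute(
        "CREATE TABLE person (personid TEXT PRIMARY KEY, age TEXT, sex TEXT)"
    )

create_person_table(cur)
create_person_table(cur)  # second call does not raise OperationalError

tables = [row[0] for row in
          cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)
con.close()
```

CREATE TABLE IF NOT EXISTS is an alternative when the existing rows should be kept, but re-running the inserts would then collide with the primary key.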