
When running this code I am getting the error "Error while connecting to MySQL Not all parameters were used in the SQL statement".

I have also tried to ingest the data with another technique:

import mysql.connector as msql
from mysql.connector import Error
import pandas as pd

empdata = pd.read_csv('path_to_file', index_col=False, delimiter = ',')
empdata.head()

try:
    conn = msql.connect(host='localhost', user='test345',  
                        password='test123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("CREATE DATABASE timetheft")
        print("Database is created")
except Error as e:
    print("Error while connecting to MySQL", e)

try:
    conn = msql.connect(host='localhost', database='timetheft', user='test345', password='test123')
    if conn.is_connected():
        cursor = conn.cursor()
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("You're connected to database: ", record)
        cursor.execute('DROP TABLE IF EXISTS company;')
        print('Creating table....')
        create_company_table = """
         CREATE TABLE company ( ID VARCHAR(40) PRIMARY KEY, 
        Company_Name VARCHAR(40), 
        Country VARCHAR(40), 
        City VARCHAR(40), 
        Email VARCHAR(40),  
        Industry VARCHAR(30),
        Employees VARCHAR(30)
        );
        """
        cursor.execute(create_company_table)
        print("Table is created....")
        for i,row in empdata.iterrows():
            sql = "INSERT INTO timetheft.company VALUES (%S, %S, %S, %S, %S,%S,%S,%S)"
            cursor.execute(sql, tuple(row))
            print("Record inserted")
            # the connection is not auto committed by default, so we must commit to save our changes
            conn.commit()
except Error as e:
            print("Error while connecting to MySQL", e)

The second technique I tried:

LOAD DATA LOCAL INFILE 'path_to_file'  
INTO TABLE company  
FIELDS TERMINATED BY ';' 
ENCLOSED BY '"'   
LINES TERMINATED BY '\n'  
IGNORE 1 LINES;

This worked better, but with many errors; only 20% of the rows were ingested.

Finally, here is an excerpt from the .csv (the data is consistent throughout all 1K rows):

"ID";"Company_Name";"Country";"City";"Email";"Industry";"Employees" 217520699;"Enim Corp.";"Germany";"Bamberg";"[email protected]";"Internet";"51-100" 352428999;"Lacus Vestibulum Consulting";"Germany";"Villingen-Schwenningen";"[email protected]";"Food Production";"100-500" 371718299;"Dictum Ultricies Ltd";"Germany";"Anklam";"[email protected]";"Primary/Secondary Education";"100-500" 676789799;"A Consulting";"Germany";"Andernach";"[email protected]";"Government Relations";"100-500" 718526699;"Odio LLP";"Germany";"Eisenhüttenstadt";"[email protected]";"E-Learning";"11-50"

2 Comments

  • Your table has seven columns, but your insert statement has eight placeholders. (May 3, 2021 at 9:50)
  • Yes, but this is not the root of the issue. I tried with 7 placeholders as well. (May 3, 2021 at 12:48)

1 Answer


I fixed these issues to get the code to work:

  • make the number of placeholders in the insert statement equal to the number of columns
  • the placeholders should be lower-case '%s'
  • the cell delimiter appears to be a semi-colon, not a comma.

For simply reading a CSV with ~1000 rows, Pandas is overkill (and iterrows seems not to behave as you expect). I've used the csv module from the standard library instead.

import csv 

...

sql = "INSERT INTO company VALUES (%s, %s, %s, %s, %s, %s, %s)"

with open("67359903.csv", "r", newline="") as f:
    reader = csv.reader(f, delimiter=";")
    # Skip the header row.
    next(reader)
    # For large files it may be more efficient to commit 
    # rows in batches.
    cursor.executemany(sql, reader)
    conn.commit()
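
To make the comment about batching concrete, here is a minimal sketch of committing in batches, assuming the same conn and cursor objects as above; the batch size of 500 is an arbitrary choice for illustration, not something taken from the question.

import csv

BATCH_SIZE = 500  # arbitrary; tune for your data volume

sql = "INSERT INTO company VALUES (%s, %s, %s, %s, %s, %s, %s)"

with open("67359903.csv", "r", newline="") as f:
    reader = csv.reader(f, delimiter=";")
    next(reader)  # skip the header row
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == BATCH_SIZE:
            cursor.executemany(sql, batch)
            conn.commit()  # commit each full batch
            batch = []
    if batch:
        # insert and commit any rows left over in the final partial batch
        cursor.executemany(sql, batch)
        conn.commit()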

If using the csv module is not convenient, the dataframe's itertuples method may be used to iterate over the data:

empdata = pd.read_csv('67359903.csv', index_col=False, delimiter=';')

for tuple_ in empdata.itertuples(index=False):
    cursor.execute(sql, tuple_)
conn.commit()

Or the dataframe can be dumped to the database directly.

import sqlalchemy as sa

engine = sa.create_engine('mysql+mysqlconnector:///test')
empdata.to_sql('company', engine, index=False, if_exists='replace')
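
If the file is much larger (the comments below mention a CSV of roughly 1.5 million rows), one option is to stream it in chunks so the whole file never has to fit in memory at once: pandas' read_csv accepts a chunksize argument and to_sql can append each piece. A rough sketch, where the file name and chunk size are placeholders rather than values from the question:

import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine('mysql+mysqlconnector:///test')

# Read the CSV in chunks of 10000 rows (arbitrary) and append each chunk
# to the company table.
for chunk in pd.read_csv('large_file.csv', sep=';', chunksize=10000):
    chunk.to_sql('company', engine, index=False, if_exists='append')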

3 Comments

Thanks for the answer, but it still gives me the same error. And I have a second .csv of over 50 MB with, I think, 1.5 million rows... :/
I've added two ways to load the table using Pandas. All three work with the data provided in the question, with the assumption that each row of data is separated by a newline.
Thanks so much! The SQLAlchemy dump worked wonders! I could not reproduce it with itertuples; I got the same argument error from SQL. Thanks again!
