
I am trying to read data from "data.txt" and write it to an SQLite database, but only the first row of the file ends up in the database. By printing the pandas DataFrame object I confirmed that it does read all the data from the txt file, yet it fails to iterate through it. The loop looks fine to me; can anybody figure out what I am doing wrong that keeps it from inserting all the rows of the file?

File data looks like this:

862     1
4828    1
6429    1
10013   1
7729    1
380    1
3808 1
7246    1
1663 1

Secondly, you will notice that the spacing between the columns is inconsistent: some rows are space-separated and others are tab-separated. I have a few thousand inconsistent rows; this is just a subset for reference. Is there any way to handle this?

I tweaked the delimiter parameter and also the split function, but neither worked here. Any suggestions?

Actual results:

Only one row into DB

862     1

Expected results:

862     1
4828    1
6429    1
10013   1
7729    1
380     1
3808    1
7246    1
1663    1

Here is the piece of code I tried:

data = pd.read_csv ("data.txt")   
rows = pd.DataFrame(data)

#print (rows)

for row in rows:
    r = row.split()
    object_id = r[0]
    object_type = r[1]
    cur.execute(''' INSERT INTO objects (object_id, object_type) 
     VALUES (?, ?) ''', (object_id, object_type))
    conn.commit()

Any valuable input is appreciated! Thanks in advance.

2 Answers


This should work as expected:

data = pd.read_csv ("data.txt", delimiter=' ', header=None, names=['object_id', 'object_type'], skipinitialspace=True)
 
for row in data.itertuples(index=False):
    cur.execute(''' INSERT INTO objects (object_id, object_type) 
     VALUES (?, ?) ''', (row.object_id, row.object_type))

conn.commit()

header=None ensures the first row in the file is not treated as a header row.
names assigns the listed column names.
skipinitialspace=True resolves the extra-whitespace issue.

In your code you were iterating over the column names, not the rows.
Since the first row (862, 1) was parsed as the header, only that row got inserted into the db.
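To see this in isolation, here is a minimal sketch (with a made-up two-row frame) showing that plain iteration over a DataFrame yields column labels, while itertuples() yields the rows:

```python
import pandas as pd

df = pd.DataFrame({"object_id": [862, 4828], "object_type": [1, 1]})

# Plain iteration walks the column labels, not the rows ...
print(list(df))  # ['object_id', 'object_type']

# ... so use itertuples() (or iterrows()) to walk the rows.
first = next(df.itertuples(index=False))
print(first.object_id, first.object_type)  # 862 1
```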

Also, you don't have to commit on every iteration; a single commit at the end of the loop suffices.
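If the file mixes tabs and runs of spaces, a regex separator is one way to make the parse robust. A self-contained sketch, with the sample rows inlined via io.StringIO and an in-memory database standing in for the real data.txt and connection:

```python
import io
import sqlite3
import pandas as pd

# The sample rows from the question, with tabs and spaces mixed.
sample = "862     1\n4828    1\n6429\t1\n10013   1\n7729    1\n380    1\n3808 1\n7246    1\n1663 1\n"

# sep=r"\s+" splits on any run of whitespace, so tabs and multiple
# spaces are both handled; header=None keeps 862 as data, not a header.
data = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None,
                   names=["object_id", "object_type"])

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE objects (object_id INTEGER, object_type INTEGER)")

# tolist() yields plain Python ints, which sqlite3 accepts directly,
# and executemany() replaces the explicit loop; commit once at the end.
cur.executemany("INSERT INTO objects (object_id, object_type) VALUES (?, ?)",
                data.to_numpy().tolist())
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM objects").fetchone()[0])  # 9
```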


6 Comments

It helped a lot, thanks for the input. But I figured out that with itertuples(index=False), all values except the index fall into one column (object_id), whereas they should land in separate columns according to the column names specified. If itertuples returns a tuple whose first value is the index, how can we get the remaining values into separate columns?
I tested with your sample "data" and I do see separate columns. Ensure pd.read_csv has skipinitialspace=True argument to clean up the spaces.
Only the first row falls into separate columns; every other row falls into the first column, and yes, I checked that skipinitialspace=True is set.
This is what itertuples returns. It parses the first row fine, but for every other row both values end up in one column, separated by a tab: Pandas(object_id='862', object_type=1.0) Pandas(object_id='4828\t1', object_type=nan) Pandas(object_id='6429\t1', object_type=nan) Pandas(object_id='10013\t1', object_type=nan) Pandas(object_id='7729\t1', object_type=nan) Pandas(object_id='380\t1', object_type=nan) Pandas(object_id='3808\t1', object_type=nan) Pandas(object_id='7246\t1', object_type=nan)
It worked in my case using the single argument delim_whitespace=True; there is no need to explicitly set delimiter=' ', and then the skipinitialspace=True argument is also unnecessary.

Nevermind this answer, Shiva's is better.

A few things are wrong:

  1. If you want to iterate over the rows of a DataFrame, you have to use:
for row in df.iterrows():
    print(row)
  2. If the file has inconsistent spaces, then I don't think you can use pandas here. I would do something like this to build a dictionary:
with open('test.txt','r') as f:
    lines = f.readlines()

out = []
for line in lines:
    for item in line.split():  # no-arg split() handles runs of spaces and tabs alike
        if item:
            out.append(item)

d = {}
i = 0
while i < len(out)-1:
    key = out[i]
    value = out[i+1]
    d[key] = value
    i+=2

Then iterate over the dictionary d to upload to SQL.
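That last upload step could look like this; a sketch assuming the same d as above, an in-memory database, and the objects table from the question. (One caveat with the dict approach, and a reason the pandas answer is preferable: duplicate object_ids would collapse into a single key.)

```python
import sqlite3

# Example subset standing in for the d built above.
d = {'862': '1', '4828': '1', '6429': '1'}

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE objects (object_id, object_type)")

# d.items() already yields (object_id, object_type) pairs.
cur.executemany("INSERT INTO objects (object_id, object_type) VALUES (?, ?)",
                d.items())
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM objects").fetchone()[0])  # 3
```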

Comments
