
I am trying to read data from "data.txt" and write it to an SQLite database, but only the first row of the file ends up in the database. By printing the pandas DataFrame object I confirmed that it does read all the data from the txt file, yet it fails to iterate through it. The loop looks fine to me; can anybody figure out what I am doing wrong that keeps it from inserting all the rows of the file?

File data looks like this:

862     1
4828    1
6429    1
10013   1
7729    1
380    1
3808 1
7246    1
1663 1

Secondly, you will notice that the spacing between the columns is inconsistent: some rows are space-separated and others are tab-separated. I have a few thousand inconsistent rows; this is just a subset for reference. Is there any way to handle this?

I tweaked the delimiter parameter and also the split function, but neither worked here. Any suggestions?

Actual results:

Only one row into DB

862     1

Expected results:

862     1
4828    1
6429    1
10013   1
7729    1
380     1
3808    1
7246    1
1663    1

Here is the piece of code I tried:

data = pd.read_csv ("data.txt")   
rows = pd.DataFrame(data)

#print (rows)

for row in rows:
    r = row.split()
    object_id = r[0]
    object_type = r[1]
    cur.execute(''' INSERT INTO objects (object_id, object_type) 
     VALUES (?, ?) ''', (object_id, object_type))
    conn.commit()

Any valuable input is appreciated! Thanks in advance.

2 Answers


This should work as expected:

data = pd.read_csv ("data.txt", delimiter=' ', header=None, names=['object_id', 'object_type'], skipinitialspace=True)
 
for row in data.itertuples(index=False):
    cur.execute(''' INSERT INTO objects (object_id, object_type) 
     VALUES (?, ?) ''', (row.object_id, row.object_type))

conn.commit()

header=None ensures the first row in the file is not treated as a header row.
names assigns the listed column names.
skipinitialspace=True resolves the extra-whitespace issue.

In your code you were iterating over the column names, not the rows.
Since the first row (862, 1) was parsed as the header, only that row got inserted into the db.
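To see this in isolation, here is a minimal sketch (with a made-up two-row frame) showing that plain iteration over a DataFrame yields column labels, while itertuples() yields the rows:

```python
import pandas as pd

df = pd.DataFrame({"object_id": [862, 4828], "object_type": [1, 1]})

# Plain iteration walks the column labels, not the rows ...
print(list(df))  # ['object_id', 'object_type']

# ... so use itertuples() (or iterrows()) to walk the rows.
first = next(df.itertuples(index=False))
print(first.object_id, first.object_type)  # 862 1
```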

Also, you don't have to commit on every iteration; a single commit at the end of the loop suffices.
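If the file mixes tabs and runs of spaces, a regex separator is one way to make the parse robust. A self-contained sketch, with the sample rows inlined via io.StringIO and an in-memory database standing in for the real data.txt and connection:

```python
import io
import sqlite3
import pandas as pd

# The sample rows from the question, with tabs and spaces mixed.
sample = "862     1\n4828    1\n6429\t1\n10013   1\n7729    1\n380    1\n3808 1\n7246    1\n1663 1\n"

# sep=r"\s+" splits on any run of whitespace, so tabs and multiple
# spaces are both handled; header=None keeps 862 as data, not a header.
data = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None,
                   names=["object_id", "object_type"])

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE objects (object_id INTEGER, object_type INTEGER)")

# tolist() yields plain Python ints, which sqlite3 accepts directly,
# and executemany() replaces the explicit loop; commit once at the end.
cur.executemany("INSERT INTO objects (object_id, object_type) VALUES (?, ?)",
                data.to_numpy().tolist())
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM objects").fetchone()[0])  # 9
```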


6 Comments

It helped a lot, thanks for the input. But I figured out that with itertuples(index=False), all values except the index fall into one column (object_id), whereas they should land in separate columns according to the column names specified. If itertuples returns a tuple whose first value is the index, how can we get the remaining values into separate columns?
I tested with your sample "data" and I do see separate columns. Ensure pd.read_csv has skipinitialspace=True argument to clean up the spaces.
Only the first row falls into separate columns; every other row falls into the first column, and yes, I checked that skipinitialspace=True is set.
This is what itertuples returns. It parses the first row fine, but for every other row both values end up in one column, separated by a tab: Pandas(object_id='862', object_type=1.0) Pandas(object_id='4828\t1', object_type=nan) Pandas(object_id='6429\t1', object_type=nan) Pandas(object_id='10013\t1', object_type=nan) Pandas(object_id='7729\t1', object_type=nan) Pandas(object_id='380\t1', object_type=nan) Pandas(object_id='3808\t1', object_type=nan) Pandas(object_id='7246\t1', object_type=nan)
It worked in my case using the single argument delim_whitespace=True; there is no need to explicitly set delimiter=' ', and then the skipinitialspace=True argument is also unnecessary.

Nevermind this answer, Shiva's is better.

A few things are wrong:

  1. If you want to iterate over the rows of a DataFrame, you have to use:
for row in df.iterrows():
    print(row)
  2. If the file has inconsistent spaces, then I don't think you can use pandas here. I would do something like this to build a dictionary:
with open('test.txt','r') as f:
    lines = f.readlines()

out = []
for line in lines:
    for item in line.split():  # no-arg split() handles runs of spaces and tabs alike
        if item:
            out.append(item)

d = {}
i = 0
while i < len(out)-1:
    key = out[i]
    value = out[i+1]
    d[key] = value
    i+=2

Then iterate over the dictionary d to upload to SQL.
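That last upload step could look like this; a sketch assuming the same d as above, an in-memory database, and the objects table from the question. (One caveat with the dict approach, and a reason the pandas answer is preferable: duplicate object_ids would collapse into a single key.)

```python
import sqlite3

# Example subset standing in for the d built above.
d = {'862': '1', '4828': '1', '6429': '1'}

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE objects (object_id, object_type)")

# d.items() already yields (object_id, object_type) pairs.
cur.executemany("INSERT INTO objects (object_id, object_type) VALUES (?, ?)",
                d.items())
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM objects").fetchone()[0])  # 3
```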

Comments
