
Is there an elegant way to read one file at a time, do some preprocessing, and then merge everything into one big dataframe? The way I do it is shown below; I am sure there must be some way to get rid of the variable i here.

import os
from pandas import DataFrame, read_csv, concat

i = 0
outdf = DataFrame()
for myfile in myfiles:
    tdf = read_csv(myfile)  # Read
    # Do some annotations
    tdf['Class'] = os.path.basename(myfile).split('.')[0]  # e.g. filename up to the first '.'
    # ..............
    #-----------------
    if i == 0:
        outdf = tdf
    else:
        outdf = concat([outdf, tdf])
    i = i + 1
  • AFAIK you don't need i or the if clause in that loop. Just use outdf = concat([outdf, tdf]). In the first iteration it will concatenate with the empty dataframe, so it will return the same dataframe (see the short sketch after these comments). Commented May 12, 2016 at 18:28
  • At some point I started doing this kind of funny thing. Thanks a lot. Commented May 12, 2016 at 18:31
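
For reference, a minimal sketch of the loop that comment describes, assuming the same imports as in the question (note that some newer pandas versions may emit a FutureWarning when an empty frame is included in a concat):

outdf = DataFrame()
for myfile in myfiles:
    tdf = read_csv(myfile)
    # ... annotations as in the question ...
    outdf = concat([outdf, tdf])  # first iteration concatenates with the empty frame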

2 Answers


You don't need to concatenate the DataFrames on each iteration, as concat can concatenate multiple DataFrames. Just store each individual DataFrame in a list, and concatenate at the end.

outdf = []
for myfile in myfiles:
    tdf = read_csv(myfile)
    # Do some annotations
    tdf['Class'] = os.path.basename(myfile).split('.')[0]
    # ..............
    #-----------------
    outdf.append(tdf)

outdf = concat(outdf)
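
If the per-file annotation fits into a single expression, the same pattern can be collapsed further; the sketch below uses a hypothetical .assign call standing in for the real annotation step:

outdf = concat(
    read_csv(myfile).assign(Class=os.path.basename(myfile).split('.')[0])
    for myfile in myfiles
)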

1 Comment

This will also be faster: concatenating inside the loop copies the accumulated data on every iteration, while a single concat at the end copies each frame only once.

You can use enumerate to get rid of the manual counter.

outdf = DataFrame()
for i, myfile in enumerate(myfiles):
    tdf = read_csv(myfile)
    tdf['Class'] = os.path.basename(myfile).split('.')[0]
    if i == 0:
        outdf = tdf
    else:
        outdf = concat([outdf, tdf])
