
I have written a parser that extracts information from raw HTML source code and returns it as a tuple. I need to call this function in a loop and use each return value as one row of a DataFrame. Here's what I have done:

import pandas as pd
import leveldb

df = pd.DataFrame()  # start empty; one row is added per page
for key, value in db.RangeIter():
    html = db.Get(key)
    result = parser(html)
    df = df.append(pd.Series(result, index=index), ignore_index=True)

Note that parser and index are already defined, and db is a leveldb object that stores all the links and their corresponding HTML source code. My question: what is a more efficient way to construct that DataFrame? Thanks!

  • Do you want to keep the tuple in one column, or split among len(tuple) columns? If the former, you're probably better off just appending to a simple list, then converting that list to a series after the for loop. Commented May 10, 2017 at 11:20
  • @pshep123 Sorry for the ambiguity. I am trying to make each return value one row, with each element in its own column. So yes, there are len(tuple) columns in total. Commented May 10, 2017 at 11:29
  • Just updated my answer; it should do what you're looking for. Commented May 10, 2017 at 12:26

1 Answer


I would create a dataframe before the loop starts, then append successive dataframes to that. Note that if result is a tuple, it needs to be converted to a list before being converted into a dataframe. And I assume your index is already a list. So:

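import pandas as pd

df = pd.DataFrame()

for key, value in db.RangeIter():
    html = db.Get(key)
    result = parser(html)
    # Build a one-row frame whose columns are the labels in index
    row = pd.DataFrame(list(result), index=index).transpose()
    # pd.concat stands in for DataFrame.append, which was removed in pandas 2.0
    df = pd.concat([df, row])

# Every added row carries label 0, so renumber and drop the old labels
df.reset_index(drop=True, inplace=True)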

This is not to say your parser could not more efficiently return data for the creation of a dataframe, but I'm working within the confines of a single returned tuple.

Also, depending on the number of elements in the tuple, it may be more efficient to build plain Python lists inside the loop and create the DataFrame from them once the loop finishes, but you don't state the tuple size.
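
For what it's worth, here is a minimal sketch of that list-based approach. It assumes db.RangeIter() yields (key, value) pairs where value is the stored HTML, as the loop in the question suggests. FakeDB, the toy parser, and the column names in index are hypothetical stand-ins so the snippet runs without a real LevelDB store; swap in your own db, parser, and index.

```python
import pandas as pd


class FakeDB:
    """Hypothetical stand-in that mimics only the RangeIter() call from the question."""

    def __init__(self, items):
        self._items = dict(items)

    def RangeIter(self):
        # Yield (key, value) pairs in key order
        return iter(sorted(self._items.items()))


def parser(html):
    # Toy parser (assumption): returns a fixed-size tuple per page
    return (len(html), html.count("<"), html[:10])


# Hypothetical column labels, one per tuple element
index = ["length", "tag_count", "preview"]
db = FakeDB({"a": "<p>hi</p>", "b": "<div><b>x</b></div>"})

# Collect one tuple per page in a plain list; appending to a list is cheap
rows = [parser(value) for key, value in db.RangeIter()]

# Build the DataFrame once, after the loop: each tuple becomes one row
df = pd.DataFrame(rows, columns=index)
```

The DataFrame is constructed a single time here, so the existing rows are not copied on every iteration the way they are when a frame is grown inside the loop.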
