Convert "Empty Dataframe" / List Items to Dataframe?

Question

I parsed a table from a website using Selenium (by XPath), then called pd.read_html on the table element, and now I'm left with what looks like a list that contains the table. It looks like this:

[Empty DataFrame
Columns: [Symbol, Expiration, Strike, Last, Open, High, Low, Change, Volume]
Index: [],        Symbol  Expiration  Strike  Last  Open  High   Low  Change   Volume
0  XPEV Dec20  12/18/2020    46.5  3.40  3.00  5.05  2.49    1.08    696.0
1  XPEV Dec20  12/18/2020    47.0  3.15  3.10  4.80  2.00    1.02   2359.0
2  XPEV Dec20  12/18/2020    47.5  2.80  2.67  4.50  1.89    0.91   2231.0
3  XPEV Dec20  12/18/2020    48.0  2.51  2.50  4.29  1.66    0.85   3887.0
4  XPEV Dec20  12/18/2020    48.5  2.22  2.34  3.80  1.51    0.72   2862.0
5  XPEV Dec20  12/18/2020    49.0  1.84  2.00  3.55  1.34    0.49   4382.0
6  XPEV Dec20  12/18/2020    50.0  1.36  1.76  3.10  1.02    0.30  14578.0
7  XPEV Dec20  12/18/2020    51.0  1.14  1.26  2.62  0.78    0.31   4429.0
8  XPEV Dec20  12/18/2020    52.0  0.85  0.95  2.20  0.62    0.19   2775.0
9  XPEV Dec20  12/18/2020    53.0  0.63  0.79  1.85  0.50    0.13   1542.0]

How do I turn this into an actual dataframe, with the "Symbol, Expiration, etc..." as the header, and the far left column as the index?

I've been trying several different things, but to no avail. Where I left off was trying:

# From reading the html of the table step
dfs = pd.read_html(table.get_attribute('outerHTML'))
dfs = pd.DataFrame(dfs)

... and when I print the new dfs, I get this:

0  Empty DataFrame
Columns: [Symbol, Expiration, ...
1         Symbol  Expiration  Strike  Last  Open ...

pd.read_html always returns a list of DataFrames; index into the one you need.
– Asish M., Commented Dec 16, 2020 at 0:34

Parfait · Accepted Answer · 2020-12-16 00:34:15Z

Per pandas.read_html docs,

This function will always return a list of DataFrame or it will fail, e.g., it will not return an empty list.

According to your list output, the non-empty DataFrame is the second element of that list, so retrieve it by indexing with dfs[1] (Python sequences are zero-indexed, so dfs[0] is the empty one). Note that DataFrames stored in a list or dict can be used directly:

dfs[1].head()
dfs[1].tail()
dfs[1].describe()
...

single_df = dfs[1].copy()
del dfs

Or index in the same call:

single_df = pd.read_html(...)[1]
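
If the position of the populated table is not known in advance, one option is to pick the first non-empty DataFrame instead of hard-coding the index. The sketch below is only an illustration and not part of the original answer: pick_non_empty is a hypothetical helper, and the list dfs is built by hand as a stand-in for what pd.read_html returned in the question, so the example runs without a browser or an HTML parser.

```python
import pandas as pd


def pick_non_empty(frames):
    """Return the first DataFrame in `frames` that has at least one row.

    Hypothetical helper for illustration; not a pandas function.
    """
    for frame in frames:
        if not frame.empty:
            return frame
    raise ValueError("every DataFrame in the list is empty")


# Hand-built stand-in for the list returned by pd.read_html in the question:
# an empty DataFrame followed by the one that holds the data.
columns = ["Symbol", "Expiration", "Strike", "Last"]
dfs = [
    pd.DataFrame(columns=columns),
    pd.DataFrame(
        [
            ["XPEV Dec20", "12/18/2020", 46.5, 3.40],
            ["XPEV Dec20", "12/18/2020", 47.0, 3.15],
        ],
        columns=columns,
    ),
]

single_df = pick_non_empty(dfs)
print(single_df)
```

With the real scraped data, the same helper would be called on the list returned by pd.read_html, assuming exactly one of the parsed tables is the one you want.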

wildcat89 Over a year ago

Right on, makes sense. Thanks for the answer! I learned something!
