I located a table on a website using Selenium (by XPath), then called pd.read_html on the table element's HTML, and now I'm left with what looks like a list containing the table. It looks like this:

[Empty DataFrame
Columns: [Symbol, Expiration, Strike, Last, Open, High, Low, Change, Volume]
Index: [],        Symbol  Expiration  Strike  Last  Open  High   Low  Change   Volume
0  XPEV Dec20  12/18/2020    46.5  3.40  3.00  5.05  2.49    1.08    696.0
1  XPEV Dec20  12/18/2020    47.0  3.15  3.10  4.80  2.00    1.02   2359.0
2  XPEV Dec20  12/18/2020    47.5  2.80  2.67  4.50  1.89    0.91   2231.0
3  XPEV Dec20  12/18/2020    48.0  2.51  2.50  4.29  1.66    0.85   3887.0
4  XPEV Dec20  12/18/2020    48.5  2.22  2.34  3.80  1.51    0.72   2862.0
5  XPEV Dec20  12/18/2020    49.0  1.84  2.00  3.55  1.34    0.49   4382.0
6  XPEV Dec20  12/18/2020    50.0  1.36  1.76  3.10  1.02    0.30  14578.0
7  XPEV Dec20  12/18/2020    51.0  1.14  1.26  2.62  0.78    0.31   4429.0
8  XPEV Dec20  12/18/2020    52.0  0.85  0.95  2.20  0.62    0.19   2775.0
9  XPEV Dec20  12/18/2020    53.0  0.63  0.79  1.85  0.50    0.13   1542.0]

How do I turn this into an actual dataframe, with the "Symbol, Expiration, etc..." as the header, and the far left column as the index?

I've been trying several different things, but to no avail. Where I left off was trying:

# From reading the html of the table step
dfs = pd.read_html(table.get_attribute('outerHTML'))
dfs = pd.DataFrame(dfs)

... and when I print the new dfs, I get this:

0  Empty DataFrame
Columns: [Symbol, Expiration, ...
1         Symbol  Expiration  Strike  Last  Open ...
Comment: pd.read_html always returns a list of dataframes - index into the one you want/need (commented Dec 16, 2020 at 0:34)

1 Answer

Per pandas.read_html docs,

This function will always return a list of DataFrame or it will fail, e.g., it will not return an empty list.

Judging by your list output, the non-empty DataFrame is the second element of that list, so retrieve it by indexing with dfs[1] (remember that Python sequences start at index zero). Note that you can also work with DataFrames directly while they are stored in a list or dict:

dfs[1].head()
dfs[1].tail()
dfs[1].describe()
...

single_df = dfs[1].copy()
del dfs

Or index directly on the same call:

single_df = pd.read_html(...)[1]
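
If the position of the populated table is not known in advance, one option is to filter the list on DataFrame.empty rather than hard-coding the index. The following is only a minimal sketch: the dfs list is built by hand as a stand-in for what pd.read_html returned in the question (one empty table, one with data), and the column subset and the non_empty name are illustrative assumptions.

```python
import pandas as pd

columns = ["Symbol", "Expiration", "Strike", "Last"]

# Hand-built stand-in for the list returned by pd.read_html:
# the first table has headers but no rows, the second holds the data.
dfs = [
    pd.DataFrame(columns=columns),
    pd.DataFrame(
        [
            ["XPEV Dec20", "12/18/2020", 46.5, 3.40],
            ["XPEV Dec20", "12/18/2020", 47.0, 3.15],
        ],
        columns=columns,
    ),
]

# Keep only the tables that actually contain rows, then take the first one.
non_empty = [df for df in dfs if not df.empty]
single_df = non_empty[0].copy()

print(single_df)
```

The result is an ordinary DataFrame with the column names as the header and the default integer index on the left. If no table has rows, non_empty[0] raises an IndexError, so check the list length first when that case is possible.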

1 Comment

Right on, makes sense. Thanks for the answer! I learned something!
