1

This one has been relatively tricky for me. I am trying to extract the embedded table sourced from google sheets in python.

Here is the link

I do not own the sheet but it is publicly available.

here is my code thus far, when I go to output the headers it is showing me "". Any help would be greatly appreciated. End goal is to convert this table into a pandas DF. Thanks guys

import lxml.html as lh
import pandas as pd

url = 'https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQ--HR_GTaiv2dxaVwIwWYzY2fXTSJJN0dugyQe_QJnZEpKm7bu5o7eh6javLIk2zj0qtnvjJPOyvu2/pubhtml/sheet?headers=false&gid=1503072727'

page = requests.get(url)

doc = lh.fromstring(page.content)

tr_elements = doc.xpath('//tr')

col = []
i = 0

for t in tr_elements[0]:
    i +=1
    name = t.text_content()
    print('%d:"%s"'%(i,name))
    col.append((name,[])) 

1 Answer 1

1

Well if you would like to get the data into a DataFrame, you could load it directly:

df = pd.read_html('https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQ--HR_GTaiv2dxaVwIwWYzY2fXTSJJN0dugyQe_QJnZEpKm7bu5o7eh6javLIk2zj0qtnvjJPOyvu2/pubhtml/sheet?headers=false&gid=1503072727', 
                  header=1)[0]
df.drop(columns='1', inplace=True)  # remove unnecessary index column called "1"

This will give you:

                               Target Ticker                   Acquirer  \
0       Acacia Communications Inc Com   ACIA      Cisco Systems Inc Com   
1  Advanced Disposal Services Inc Com   ADSW   Waste Management Inc Com   
2                    Allergan Plc Com    AGN             Abbvie Inc Com   
3           Ak Steel Holding Corp Com    AKS   Cleveland Cliffs Inc Com   
4      Td Ameritrade Holding Corp Com   AMTD  Schwab (Charles) Corp Com   

  Ticker.1 Current Price Take Over Price Price Diff % Diff Date Announced  \
0     CSCO        $68.79          $70.00      $1.21  1.76%       7/9/2019   
1       WM        $32.93          $33.15      $0.22  0.67%      4/15/2019   
2     ABBV       $197.05         $200.22      $3.17  1.61%      6/25/2019   
3      CLF         $2.98           $3.02      $0.04  1.34%      12/3/2019   
4     SCHW        $49.31          $51.27      $1.96  3.97%     11/25/2019   

  Deal Type  
0      Cash  
1      Cash  
2       C&S  
3     Stock  
4     Stock  

Note read_html returns a list. In this case there is only 1 DataFrame, so we can refer to the first and only index location [0]

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.