I am trying to use Python (Selenium) to extract the data from all the RSPO CREDITS Highcharts charts on https://rspo.org/palmtrace into a pandas DataFrame with the chart name, year, month, and values (number of credits and price in USD). I have looked at other posts on scraping Highcharts data, but these charts seem to be structured a bit differently, so any help is much appreciated.

1 Answer

Since the page has two charts with 22 series each and two charts with 16 series each, a rough solution would be:

from selenium import webdriver
import time
import pandas as pd

driver = webdriver.Chrome()

website = "https://rspo.org/palmtrace"

driver.get(website)
time.sleep(2)  # crude wait so the charts have time to render

my_data = []

# Charts 0 and 1 have 22 series each; charts 2 and 3 have 16 series each.
for chart, n_series in [(0, 22), (1, 22), (2, 16), (3, 16)]:
    for series in range(n_series):
        # One row per series: the series name followed by its data points.
        temp = driver.execute_script('return window.Highcharts.charts[{}]'
                                     '.series[{}].options.data'.format(chart, series))
        temp.insert(0, driver.execute_script('return window.Highcharts.charts[{}]'
                                             '.series[{}].options.name'.format(chart, series)))
        my_data.append(temp)

driver.quit()  # close the browser once the data has been collected

df = pd.DataFrame(my_data)
print(df)
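
If you would rather have one row per chart, year, and month, the wide rows can be unpivoted. This is only a sketch: it assumes each row in my_data starts with the series name (the year) followed by one value per month in calendar order, and that the number of series per chart is known. The helper name to_long_records and its chart_sizes parameter are made up for illustration.

```python
def to_long_records(rows, chart_sizes):
    """Unpivot wide rows into one record per (chart, year, month).

    rows        -- list of [series_name, value_1, value_2, ...]
    chart_sizes -- number of series belonging to each chart, in order
                   (hypothetical parameter, e.g. [22, 22, 16, 16])
    """
    records = []
    start = 0
    for chart_index, size in enumerate(chart_sizes):
        # Take the slice of rows that belongs to this chart.
        for row in rows[start:start + size]:
            year, values = row[0], row[1:]
            # Assumption: column position 1 is January, 2 is February, ...
            for month_number, value in enumerate(values, start=1):
                records.append({
                    'chart': chart_index,
                    'year': year,
                    'month': month_number,
                    'value': value,
                })
        start += size
    return records
```

With the code above, pd.DataFrame(to_long_records(my_data, [22, 22, 16, 16])) should give a frame with chart, year, month, and value columns.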

5 Comments

Do you know how I can pull the values out into a pandas DataFrame? I updated my question to reflect this.
I have edited my code now.
Thanks, but it doesn't include the chart title, year, and month. Do you know how I can have those appended to the DataFrame as well?
I have added the years. Months are the columns. The first chart is the first 22 rows, the second chart is the next 22 rows, the third chart is the next 16 rows, and the fourth is the rest.
This was very helpful for my scenario, thanks. Just to add that you can get the number of charts on the page, e.g. charts_length = driver.execute_script('return window.Highcharts.charts.length'), and then for each chart you can get the number of series, e.g. chart_series_length = driver.execute_script('return window.Highcharts.charts[{}].series.length'.format(chart)). For my scenario, this avoided some hard-coding.
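
The suggestion in the last comment could be put together roughly like this. It is a sketch rather than a tested scraper for this particular page: collect_series is a made-up helper name, and driver is assumed to be a Selenium WebDriver that has already loaded the page and given the charts time to render.

```python
def collect_series(driver):
    """Return one row per series: [chart_index, series_name, *data_points].

    `driver` is assumed to be a Selenium WebDriver (anything with an
    execute_script method works) on a page where window.Highcharts exists.
    """
    rows = []
    # Ask the page how many charts it holds instead of hard-coding the count.
    n_charts = driver.execute_script('return window.Highcharts.charts.length')
    for chart in range(n_charts):
        # Likewise, ask each chart how many series it has.
        n_series = driver.execute_script(
            'return window.Highcharts.charts[{}].series.length'.format(chart))
        for series in range(n_series):
            base = ('return window.Highcharts.charts[{}]'
                    '.series[{}].options'.format(chart, series))
            name = driver.execute_script(base + '.name')
            data = driver.execute_script(base + '.data')
            rows.append([chart, name] + list(data))
    return rows
```

Usage would be along the lines of df = pd.DataFrame(collect_series(driver)). Note that Highcharts leaves an undefined entry in Highcharts.charts when a chart is destroyed, so on pages that create and remove charts dynamically the series lookup may need a guard.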
