I am trying to use Python (Selenium) to extract the data from all the RSPO CREDITS Highcharts charts on https://rspo.org/palmtrace into a pandas DataFrame with the chart name, year, month, and values (number of credits and price in USD). I have been looking at other posts like this and this, but these charts seem to be structured a bit differently, so any help is much appreciated.
1 Answer
Since the page has two charts with 22 series each and two charts with 16 series each, a rough solution would be:
```python
from selenium import webdriver
import time
import pandas as pd

driver = webdriver.Chrome()
website = "https://rspo.org/palmtrace"
driver.get(website)
time.sleep(2)  # crude wait so the Highcharts charts have time to render

my_data = []

# Charts 0 and 1 have 22 series each
for chart in range(2):
    for series in range(22):
        # The series' data points as a Python list
        temp = driver.execute_script('return window.Highcharts.charts[{}]'
                                     '.series[{}].options.data'.format(chart, series))
        # Prepend the series name so each row is labelled
        temp.insert(0, driver.execute_script('return window.Highcharts.charts[{}]'
                                             '.series[{}].options.name'.format(chart, series)))
        my_data.append(temp)

# Charts 2 and 3 have 16 series each
for chart in range(2, 4):
    for series in range(16):
        temp = driver.execute_script('return window.Highcharts.charts[{}]'
                                     '.series[{}].options.data'.format(chart, series))
        temp.insert(0, driver.execute_script('return window.Highcharts.charts[{}]'
                                             '.series[{}].options.name'.format(chart, series)))
        my_data.append(temp)

df = pd.DataFrame(my_data)
print(df)

driver.quit()  # close the browser once the data has been read
```
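The code above gives one row per series but no chart title or month labels. A minimal sketch of a long-format alternative is below: it reads every chart in a single `execute_script` call and flattens the result to one row per chart, series and x-axis position. The names `CHARTS_JS`, `point_value` and `charts_to_long_df` are hypothetical, and the property paths (`options.title.text`, `xAxis[0].categories`, `options.data`) are assumptions about how these particular charts are configured that I have not checked against the page.

```python
import pandas as pd

# Hypothetical JavaScript: collect title, x-axis categories and every series'
# name and data from all Highcharts charts in one round trip. The property
# paths used here are assumptions about how the charts are configured.
CHARTS_JS = """
return window.Highcharts.charts.filter(Boolean).map(function (c) {
    return {
        title: c.options.title ? c.options.title.text : null,
        categories: c.xAxis[0].categories || null,
        series: c.series.map(function (s) {
            return {name: s.name, data: s.options.data};
        })
    };
});
"""


def point_value(point):
    """Return the y value of a data point given as a number, [x, y] or {'y': ...}."""
    if isinstance(point, dict):
        return point.get("y")
    if isinstance(point, (list, tuple)):
        return point[-1]
    return point


def charts_to_long_df(charts):
    """Flatten the chart dicts into one row per chart, series and x position."""
    rows = []
    for chart in charts:
        categories = chart.get("categories")
        for series in chart["series"]:
            for i, point in enumerate(series["data"] or []):
                # Fall back to the point index when the axis has no categories
                x = categories[i] if categories and i < len(categories) else i
                rows.append({
                    "chart": chart.get("title"),
                    "series": series["name"],
                    "x": x,
                    "value": point_value(point),
                })
    return pd.DataFrame(rows, columns=["chart", "series", "x", "value"])
```

With the driver from the answer this would be called as `charts_to_long_df(driver.execute_script(CHARTS_JS))`. If the x-axis categories turn out to be months and the series names are years (again, not verified), `df.rename(columns={"series": "year", "x": "month"})` gives the columns asked for in the question.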
5 Comments
Funkeh-Monkeh
Do you know how I can pull the values out into a pandas DataFrame? I updated my question to reflect this.
Duc Nguyen
Have edited my code now.
Funkeh-Monkeh
Thanks, but it doesn't include the chart title, year, and month. Do you know how I can have those appended to the DataFrame as well?
Duc Nguyen
Have added the years. The months are the columns: the 1st chart is the first 22 rows, the 2nd chart is the next 22 rows, the 3rd chart is the next 16 rows, and the 4th chart is the rest.
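Following that row layout, a small sketch that tags each row of the answer's `df` with its chart number and reshapes it to one row per data point. It assumes, as in the answer's code, that column 0 holds the series name and the remaining columns hold the data points; `label_and_melt` is a hypothetical helper, and the default counts `(22, 22, 16, 16)` come from the answer's description of the page rather than from checking it.

```python
import pandas as pd


def label_and_melt(df, series_per_chart=(22, 22, 16, 16)):
    """Tag each row with its chart number, then reshape to one row per data point.

    Assumes column 0 of df is the series name and columns 1..N are data points.
    """
    if sum(series_per_chart) != len(df):
        raise ValueError("series_per_chart does not match the number of rows")
    # Chart 1 covers the first n1 rows, chart 2 the next n2 rows, and so on
    charts = [i + 1 for i, n in enumerate(series_per_chart) for _ in range(n)]
    wide = df.rename(columns={0: "series"})
    wide.insert(0, "chart", charts)
    long = wide.melt(id_vars=["chart", "series"], var_name="point", value_name="value")
    # Drop the NaN padding pandas adds when series have different lengths
    return long.dropna(subset=["value"]).reset_index(drop=True)
```

The `point` column holds the original column position (1 for the first data point); mapping it to a month name is left out because the order of the months on the page is not stated here.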
Mike Honey
This was very helpful for my scenario, thanks. Just to add that you can get the number of charts on the page, e.g. `charts_length = driver.execute_script('return window.Highcharts.charts.length')`, and then the number of series in each chart, e.g. `chart_series_length = driver.execute_script('return window.Highcharts.charts[{}].series.length'.format(chart))`. For my scenario, this avoided some hard-coding.
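Putting that suggestion into the answer's loop, a sketch that sizes both loops from the page instead of hard-coding 2×22 and 2×16. `collect_series` and `CHART` are hypothetical names; the guard for empty slots relies on `Highcharts.charts` keeping an `undefined` entry where a chart has been destroyed.

```python
import pandas as pd

# JavaScript expression for one chart on the page, e.g. window.Highcharts.charts[0]
CHART = 'window.Highcharts.charts[{}]'


def collect_series(driver):
    """Return one row per series: [chart index, series name, data points...]."""
    rows = []
    # Size the outer loop from the page instead of hard-coding the chart count
    n_charts = driver.execute_script('return window.Highcharts.charts.length')
    for chart in range(n_charts):
        ref = CHART.format(chart)
        # Empty (undefined) slots in Highcharts.charts are treated as 0 series
        n_series = driver.execute_script(
            'var c = {}; return c ? c.series.length : 0'.format(ref))
        for series in range(n_series):
            data = driver.execute_script(
                'return {}.series[{}].options.data'.format(ref, series))
            name = driver.execute_script(
                'return {}.series[{}].options.name'.format(ref, series))
            rows.append([chart, name] + list(data or []))
    # Rows of different lengths are padded with NaN by pandas
    return pd.DataFrame(rows)
```

With the answer's `driver` this would be `df = collect_series(driver)`; the chart index in column 0 then replaces counting rows by hand.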