Parsing data google sheets (javascript) using Xpath or RegExp

Question

I am trying to parse data from the following website (https://www.fundsquare.net/security/information?idInstr=275136)

I want to display the price of the fund in Google sheets. However, when using the 'importxml' function I get an error that the "imported content is empty". Anybody know what I can do to fix it?

Ways I tried the function:

=IMPORTXML("https://www.fundsquare.net/security/summary?idInstr=275136" ,"//*[@class ='surligneorange']" )

=IMPORTXML("https://www.fundsquare.net/security/information?idInstr=275136" , "//*[@id='content']/table[2]/tbody/tr/td[3]/span[1]")

=IMPORTXML("https://www.fundsquare.net/security/information?idInstr=275136" , "//*[@id='content']//span[1]")

I keep on getting the same error. When looking for this error I get the difference between static and dynamic data. This data changes so I guess its dynamic but i'm not sure how that would impact the formula.

I have been trying some things with script editor but no success. Also trying something with RegExp but couldn't get any further than the examples. My knowledge of scraping is limited so any tips and tricks when trying to parse data is greatly appreciated! Any help would be greatly appreciated!

Edit: Within script editor I tried the following code:

function importdata() {
  var found, html, content = '';
  var response = UrlFetchApp.fetch("https://www.fundsquare.net/security/information?idInstr=275136");

  if (response) {
    html = response.getContentText();
    if (html) content = html.match(/<span class="surligneorange">(.*)<\/span>/)[0];
  }
   Logger.log(content);
}

This gives me the following log output:

[20-06-05 07:44:58:529 PDT] <span class="surligneorange">31.15&nbsp;EUR</span>&nbsp;<span style="color:#DD0000;text-align:left;padding:4px 0;"> -0.67 &nbsp;%&nbsp;<img src="/images/share/variationNegative.gif" style="vertical-align:middle;"/></span></td></tr></table><div id="onglet"><a  href="/security/documents?idInstr=275136">Documents</a><a  href="/security/eusd?idInstr=275136">Taxes</a><a  href="/security/histo-divid?idInstr=275136">Dividends</a><a  href="/security/histo-prices?idInstr=275136">Hist. Prices</a><a  href="/security/price?idInstr=275136">Price</a><a  href="/security/order-ref-data?idInstr=275136">Order Ref. Data</a><a class="selected" href="/security/information?idInstr=275136">Security Information</a><a  href="/security/summary?idInstr=275136">Overview</a><br class="clear_r"/></div><div id="blocresume"><table class="portlet100pct" border="0" cellspacing="0" cellpadding="0"><tr><td valign="top"  class="portletleft50pct"><table width="100%" border="0" cellspacing="0" cellpadding="0"><tr><td valign="top" class="portletBordGris"><div style="position: relative; left: 1px;" class="bloctitle"><img src="/Fundsquare/images/share/x.gif" border="0" height="1" width="1" /></div><DIV class="bloctitle" style="position: relative; top: -21px; right: 1px;"><span style="top: 3px;" >General information</span>

The value 31.15 is what I want to scrape. How can I get this value in my spreadsheet?

Edit 06/06 10:14: further questions

Could you please help me understand what you changed. What exactly is the difference between what I tried to match and what you matched.

mine:

if (html) content = html.match(/<span class="surligneorange">(.*)<\/span>/)[0];

yours:

if (html) content = html.match(/<span class="surligneorange">([\d.]*).*?<\/span>/)[1];


and:
if (html) content = html.match(/<span class="surligneorange">([\d.]*).*<\/span>/)[1];

What is the difference between my [0] and your [1]. is it that you only request the first value?

What is the difference between my .* and your ([\d.]*).* or [\d.]*).*???? My knowledge from javascript is not so good so I am unsure what it does. Thanks for the help!

Tanaike · Accepted Answer · 2020-06-06 08:03:33Z

2

How about this answer?

Unfortunately, when I saw the HTML and your formulas, I thought that the value might not be directly retrieved by IMPORTXML. So I think that your approach to use Google Apps Script can be used for retrieving the value you expect. But I think that your script has the modification points a little for achieving it. In this answer, I would like to propose the modification points of your Google Apps Script.

In your case, I think that the method of match and the regex are required to be modified.

Modified script

When your script is modified, please modify as follows.

From:

if (html) content = html.match(/<span class="surligneorange">(.*)<\/span>/)[0];

To:

if (html) content = html.match(/<span class="surligneorange">([\d.]*).*?<\/span>/)[1];

In my test, the modified script returns 31.15.
I think that ([\d.]*).*<\/span> can be also used.

Script for putting to a cell in Google Spreadsheet

As the modified script for putting to a cell in Google Spreadsheet, from your situation, I thought that you might want to use this as the custom function. If it's so, how about the following modified script?

When your script is used as the custom function, please rename the function name from importdata to others. Because importdata has already been used as the built-in function.

function sample() {
  var found, html, content = '';
  var response = UrlFetchApp.fetch("https://www.fundsquare.net/security/information?idInstr=275136");
  if (response) {
    html = response.getContentText();
    if (html) content = html.match(/<span class="surligneorange">([\d.]*).*<\/span>/)[1];
  }
  return content;
}

In this case, when =sample() is put to a cell, the value of 31.15 is put to the cell. If you want to put the value as the number, please modify return content to return Number(content).

Reference:

match()

Added:

For your additional question, I added the answers as follows.

In your script, html.match(/(.*)<\/span>/)[0]; returns the full match. When you want to retrieve the group, in this case, it's html.match(/(.*)<\/span>/)[1];. But, in this regex, the retrieved value is 31.15 EUR.....General information. I thought that the reason of your issue is this.

In order to retrieve the value of 31.15 EUR, .* is modified to .*?. By this, the matched value is from 31.15 EUR.....General information to 31.15 EUR.

But in your goal, you want to retrieve only 31.15. So I used ([\d.]*).*. By this, 31.15 of 31.15 EUR.....General information is retrieved. When ([\d.]*).*? is used, 31.15 of 31.15 EUR is retrieved.

So, in the case of ([\d.]*).*?<\/span>, 31.15 EUR is matched. And ([\d.]*) is 31.15.

In the case of ([\d.]*).*<\/span>, 31.15 EUR.....General information is matched. And ([\d.]*) is 31.15.

Reference:

Regular expressions

edited Jun 6, 2020 at 8:03

answered Jun 5, 2020 at 23:42

Tanaike

203k12 gold badges126 silver badges222 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Mees Mouwen Over a year ago

You are the best! thank you for the help. been struggling on this for a week but it finally works! A added a few questions as a reply that I hope you could still help me with!

Tanaike Over a year ago

@Mees Mouwen Thank you for replying. I'm glad your issue was resolved. 1. When you add more information, please add it to your question. In your case, the question is posted as an answer. Please delete the answer and add it to your question. 2. About your additional question, I added the answer in my answer. Could you please confirm it?

Mees Mouwen Over a year ago

thank you for the tip. I am new to posting on Stack overflow since I started to teach myself some coding and scraping so I wasn't aware of how to add the information. Thanks! My goal with the code is to retrieve data from different websites and different span classes. I adjusted my code to a function with parameters as in an example I found but it gives me errors. Could you take one last look at my code? its added to the question!

Tanaike Over a year ago

@Mees Mouwen Thank you for your response. But when I saw your question now, unfortunately, I thought that your added issue is different from your initial question. In this case, the question was changed. My answer is for your initial question. And you said You are the best! thank you for the help. been struggling on this for a week but it finally works! for my answer. And about your additional question related to the regex, I added the answer in my answer.

Mees Mouwen Over a year ago

Yes I understand this is not easy to understand for different users and you. I will try to be more clearly what exactly the question is. I have closed this post ( I think) by marking your answer and opened a new question for my next issue. Like I said I am new to this platform so appreciate you explaining what the code of conduct is on stack overflow!

|

Collectives™ on Stack Overflow

Parsing data google sheets (javascript) using Xpath or RegExp

1 Answer 1

Modified script

Script for putting to a cell in Google Spreadsheet

Reference:

Added:

Reference:

6 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Modified script

Script for putting to a cell in Google Spreadsheet

Reference:

Added:

Reference:

6 Comments

Your Answer

Sign up or log in

Post as a guest

Related