I'm having a bit of a headache with a scraping scenario I'm trying in Google Sheets.

In a nutshell, we want to use Google Sheets with ImportXML to create a scraped feed that pulls product details from clients' websites.

Here is a link to a smaller version of the doc: https://docs.google.com/a/sprt.co.za/spreadsheets/d/1dSbglYniWa_cijb6yDty576j33CTk9Cf8J38a3VXHSU/edit?usp=sharing

Currently, this specific client only exposes the item price and similar details in a textarea in the page source. When I use =ImportXml($C$2, "//textarea"), it returns the entire textarea content across two cells. I only need to pull details out of the second of these cells, but I am pretty stuck on writing a regex for a piece of data this big.

" { ""id"": ""061013AACI9"", ""productId"": ""061013AACI9"", ""name"": ""VANS MEN'S 
PERFORATED LEATHER ERA"", ""price"": ""R 799.00"", ""oldPrice"": """", ""brand"": 
""Vans"", ""brandURL"": ""/plp/vans/_/N-1z140je"", ""defaultImages"": [ ], 
""images"": [ { ""thumb"": 
""http://tfgsrv.wigroup.co/06/Thumbnail/31460739.jpg"", ""large"": 
""http://tfgsrv.wigroup.co/06/Detail/31460739.jpg"" } , { ""thumb"": 
""http://tfgsrv.wigroup.co/06/ThumbnailAlternative/31460739_01.jpg"", 
""large"": ""http://tfgsrv.wigroup.co/06/DetailAlternative/31460739_01.jpg"" } 
, { ""thumb"": 
""http://tfgsrv.wigroup.co/06/ThumbnailAlternative/31460739_02.jpg"", 
""large"": ""http://tfgsrv.wigroup.co/06/DetailAlternative/31460739_02.jpg"" } 
, { ""thumb"": 
""http://tfgsrv.wigroup.co/06/ThumbnailAlternative/31460739_03.jpg"", 
""large"": ""http://tfgsrv.wigroup.co/06/DetailAlternative/31460739_03.jpg"" } 
], ""transientProfile"": ""true"", ""wishListId"": ""anonymous"", ""colors"": [ { 
""id"": ""31460739"", ""name"": ""White"", ""path"": 
""http://tfgsrv.wigroup.co/06/ColourSwatch/31460739_SW.jpg"", ""activeColor"" : 
true, ""available"" : true } ], ""sizes"": [ { ""id"": ""31460740_06"", ""name"": 
""6"", ""available"": false } , { ""id"": ""31460741_06"", ""name"": ""7"", 
""available"": true } , { ""id"": ""31460742_06"", ""name"": ""8"", ""available"": true 
} , { ""id"": ""31460743_06"", ""name"": ""9"", ""available"": false } , { ""id"": 
""31460744_06"", ""name"": ""10"", ""available"": true } , { ""id"": ""31460745_06"", 
""name"": ""11"", ""available"": false } ], ""productType"" : ""ColourSize"" } "

I need to pull the R 799.00 value out of that mess, so any help would be appreciated. Frankly, my talent and skill have run their course in trying to navigate that with regex.

1 Answer

Try this:

""price"":\s""([^"]+)""


Output:

MATCH 1
1.  [124-132]   `R 799.00`
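
To check the pattern outside a regex tester, here is a minimal Python sketch. It assumes the text keeps the doubled quotes (`""`) exactly as pasted in the question; the `sample` string is a shortened excerpt of that text and `extract_field` is a hypothetical helper name, not part of any library. Python's `re` engine is not RE2, but this pattern uses only syntax both support.

```python
import re

# Shortened excerpt of the textarea content from the question,
# keeping the doubled quotes ("") as they appear there.
sample = '""price"": ""R 799.00"", ""oldPrice"": """", ""brand"": ""Vans""'


def extract_field(text, key):
    """Return the first non-empty string value for `key`, or None.

    Hypothetical helper: it generalises the answer's pattern by
    substituting the key name into it.
    """
    pattern = '""' + re.escape(key) + r'"":\s""([^"]+)""'
    match = re.search(pattern, text)
    return match.group(1) if match else None


price = extract_field(sample, "price")
brand = extract_field(sample, "brand")
old_price = extract_field(sample, "oldPrice")  # empty value, so no match

print(price, brand, old_price)
```

Because `[^"]+` needs at least one character, a key with an empty value such as `oldPrice` yields no match rather than an empty string.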

4 Comments

The regex is working nicely, thanks. It also works for pulling other values from the textarea, so double bonus. Now I'm trying to get the regex to work in Sheets, but =REGEXEXTRACT(E3,'""price"":\s""([^"]+)""') isn't giving me any joy.
Try this: =REGEXEXTRACT(E3,"price.*?:\s.(.*?)..,")
Or this: =REGEXEXTRACT(E3,"price\W*:\W*(.*?)\W?,") For reference, the RE2 regular expression syntax: re2.googlecode.com/hg/doc/syntax.html
Fantastic. Thanks a lot!
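
A note on the comments above, as a hedged sketch rather than a confirmed diagnosis: Sheets string literals are delimited by double quotes, and a double quote inside one is written twice, so a pattern wrapped in single quotes is unlikely to parse. The Python below builds the formula text with a hypothetical helper, `to_sheets_literal`, and checks the pattern from the last comment. It assumes the cell holds ordinary JSON with single `"` characters; with doubled quotes in the cell the capture may include a trailing quote.

```python
import re

# Pattern from the last comment; it contains no quote characters,
# which makes it easy to embed in a Sheets formula.
COMMENT_PATTERN = r'price\W*:\W*(.*?)\W?,'

# Assumption: the cell content is plain JSON (quotes not doubled).
cell_text = '"name": "VANS", "price": "R 799.00", "oldPrice": ""'


def to_sheets_literal(text):
    """Wrap text as a Sheets string literal, doubling embedded quotes.

    Hypothetical helper for illustration only.
    """
    return '"' + text.replace('"', '""') + '"'


match = re.search(COMMENT_PATTERN, cell_text)
price = match.group(1) if match else None

# Formula text that could be pasted into a cell next to E3.
formula = "=REGEXEXTRACT(E3, " + to_sheets_literal(COMMENT_PATTERN) + ")"

# The answer's pattern rewritten for plain quotes, then escaped for Sheets.
quoted_formula_arg = to_sheets_literal(r'"price":\s"([^"]+)"')

print(price)
print(formula)
print(quoted_formula_arg)
```

Under these assumptions, a quote-free pattern is the simpler choice in Sheets, since each quote in the pattern has to be doubled again inside the formula.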
