
I have a giant string, shown below, that I obtained with Nokogiri from inside a <script> tag:

   ....com\\/shop\",\"url\":\"?search=espadrille&options=reserve-eligible\",\"slug\":\"options\",\"order\":null,\"matchesMainFilter\":null,\"name\":\"Reserve Eligible\",\"type\":\"options\",\"identifier\":\"reserve-eligible\"}],\"title\":\"Options\",\"identifier\":\"options\",\"remove_url\":\"?search=espadrille\",\"classification\":\"\",\"view_all_url\":\"\",\"count\":\"\",\"slug\":\"\"},{\"children\":[{\"id\":95,\"children\":[{\"id\":150,\"children\":[],\"count\":1,\"applied\":false,\"removeUrl\":\"https:\\/\\/www.fashionphile.com\\/shop\",\"url\":\"?brands=chanel&chanel=lambskin&search=espadrille\",\"slug\":\"lambskin\",\"order\":null,\"matchesMainFilter\":false,\"name\":\"Lambskin\",\"type\":\"brand\"}],\"count\":7,\"applied\":false,\"removeUrl\":\"https:\\/\\/www.fashionphile.com\\/shop\",\"url\":\"?brands=chanel&search=espadrille\",\"slug\":\"chanel\",\"order\":null,\"matchesMainFilter\":false,\"name\":\"Chanel\",\"type\":\"brand\",\"identifier\":\"chanel\"},{\"id\":98,\"children\":[],\"count\":1,\"applied\":false,\"removeUrl\":\"https:\\/\\/www.fashionphile.com\\/shop\",\"url\":\"?brands=louboutin&search=espadrille\",\"slug\":\"louboutin\",\"order\":null,\"matchesMainFilter\":false,\"name\":\"Christian 
Louboutin\",\"type\":\"brand\",\"identifier\":\"christian-louboutin\"},{\"id\":103,\"children\":[],\"count\":3,\"applied\":false,\"removeUrl\":\"https:\\/\\/www.fashionphile.com\\/shop\",\"url\":\"?brands=gucci&search=espadrille\",\"slug\":\"gucci\",\"order\":null,\"matchesMainFilter\":false,\"name\":\"Gucci\",\"type\":\"brand\",\"identifier\":\"gucci\"},{\"id\":104,\"children\":[],\"count\":1,\"applied\":false,\"removeUrl\":\"https:\\/\\/www.fashionphile.com\\/shop\",\"url\":\"?brands=hermes&search=espadrille\",\"slug\":\"hermes\",\"order\":null,\"matchesMainFilter\":false,\"name\":\"Hermes\",\"type\":\"brand\",\"identifier\":\"hermes\"},{\"id\":107,\"children\":[{\"id\":132,\"children\":[],\"count\":1,\"applied\":false,\"removeUrl\":\"https:\\/\\/www.fashionphile.com\\/shop\",\"url\":\"?brands=louis-vuitton&louis-vuitton=louis-vuitton-monogram&search=espadrille\",\"slug\":\"louis-vuitton-monogram\",\"order\":null,\"matchesMainFilter\":false,\"name\":\"Monogram\",\"type\":\"brand\"}],\"count\":2,\"applied\":false,\"removeUrl\":\"https:\\/\\/www.fashionphile.com\\/shop\",\"url\":\"?brands=louis-vuitton&search=espadrille\",\"slug\":\"louis-vuitton\",\"order\":null,\"matchesMainFilter\":false,\"name\":\"Louis Vuitton\",\"type\":\"brand\",\"identifier\":\"louis-vuitton\"},{\"id\":115,\"children\":[],\"count\":4,\"applied\":false,\"removeUrl\":\"https:\\/\\/www.fashionphile.com\\/shop\",\"url\":\"?brands=valentino&search=espadrille\",\"slug\":\"valentino\",\"order...

I want to find a way to collect all the brands=xxxxxx values into an array like ["chanel", "LV"], or maybe a hash like {brand1: "chanel", brand2: "LV"}.

--- EDIT ---

Also, how can I access <meta itemprop=\"brand\" content=\"Chanel\"> and associate it with its <span class=\"sale-price\" itemprop=\"price\" content=\"595.00\"> in an array or hash like this:

hash = {chanel: "200", LV: "100"}

Here is the script with the data stripped out to make it smaller:

<script>
    var bootstrappedShopResults = {"products":"<div class=\"container-fluid\">\n    <div class=\"product-flex\">\n            <\/div>\n<\/div>\n","meta":{"pagination":"","total":0,"itemsFrom":null,"itemsTo":null},"aggregations":[{"children":[],"title":"Price","identifier":"price","remove_url":"?","classification":"","view_all_url":"","count":"","slug":""},{"children":[],"title":"Options","identifier":"options","remove_url":"?","classification":"","view_all_url":"","count":"","slug":""},{"children":[],"title":"Brands","identifier":"brands","remove_url":"?","classification":"","view_all_url":"","count":"","slug":""},{"children":[],"title":"Condition","identifier":"condition","remove_url":"?","classification":"","view_all_url":"","count":"","slug":""}],"parameters":{"pageSize":180,"sort":"date-desc","search":"espadrille.json"},"appliedFilters":[],"mainFilter":null,"pageTitle":"Shop Pre owned Designer Handbags | Used Designer Bags | Fashionphile","metaDescription":"Fashionphile offers a wide selection of pre-owned designer handbags and accessories. Add quality, used designer bags and more to your collection today!","removeSearchUrl":"?pageSize=180&sort=date-desc"};
</script>
  • Is that the whole string? I am asking because it looks like JSON, and if it is, I would suggest using a JSON parser, but something seems to be missing at the start and at the end. Where is this string coming from? Commented Jul 6, 2020 at 18:39
  • I second what @spickermann said. This looks like malformed JSON. If what you posted is not the entire string, then the full string is likely properly formed JSON and would be fairly simple to handle with a JSON parser. Commented Jul 6, 2020 at 19:27
  • It's from inside a script tag, obtained by scraping a website. I tried to use a parser, but I was told it's not possible because I get one big string, so I cannot just convert it to JSON. Do you think you can do it? I would prefer that approach if it's possible. Commented Jul 6, 2020 at 20:01
  • @Antoine What are you using for scraping? A few notes: 1) Page scraping is tough because the page can change at any time. 2) Libraries like Nokogiri offer CSS and XPath selectors, which can be much more targeted than string selection. 3) How you obtained this data is paramount to your question, as it might lead to far better solutions than your original question of segmenting a string. Commented Jul 6, 2020 at 21:55
  • Yes, I used Nokogiri. I just edited the question to also add the Ruby object I am expecting. Commented Jul 7, 2020 at 8:15
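
To illustrate the JSON-parser suggestion from the comments above, here is a minimal sketch. It assumes the script text has the shape shown in the question (var bootstrappedShopResults = {...};) and that the brands sit under the aggregation whose identifier is "brands", with one child per brand. The script_text value is a trimmed, hypothetical stand-in for whatever Nokogiri returns for the script node, not the real page.

```ruby
require "json"

# Hypothetical, trimmed stand-in for the text of the <script> tag.
script_text = <<~'JS'
  var bootstrappedShopResults = {"products":"<div><\/div>","aggregations":[{"children":[{"id":95,"slug":"chanel","name":"Chanel","type":"brand"},{"id":103,"slug":"gucci","name":"Gucci","type":"brand"}],"title":"Brands","identifier":"brands"}]};
JS

# Capture the object literal between the assignment and the final "};".
json_text = script_text[/var bootstrappedShopResults\s*=\s*(\{.*\})\s*;/m, 1]
data = JSON.parse(json_text)

# Assumption: brands live under the aggregation identified as "brands".
brands_group = data["aggregations"].find { |group| group["identifier"] == "brands" }
brand_slugs = brands_group["children"].map { |child| child["slug"] }

p brand_slugs
```

Once the text is parsed, the brands are plain hash and array lookups, so no regex over the URL strings is needed. If the page changes the variable name or the nesting, the lookups above would need adjusting.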

2 Answers


You can scan with a regex, like so:

brands_array = string.scan(/brands=([^&]+)/)

This returns:

[["chanel"], ["chanel"], ["louboutin"], ["gucci"], ["hermes"], ["louis-vuitton"], ["louis-vuitton"], ["valentino"]]

If you don't want duplicates, just call uniq:

brands_array = string.scan(/brands=([^&]+)/).uniq

This will return:

[["chanel"], ["louboutin"], ["gucci"], ["hermes"], ["louis-vuitton"], ["valentino"]]
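
Because the pattern contains a capture group, scan returns one single-element array per match. If a flat array or the {brand1: ...} hash from the question is wanted, something like the following sketch should work. The string here is a shortened, unescaped sample made up for illustration, and the brandN keys simply follow the order of first appearance.

```ruby
# Shortened sample in the same shape as the URLs in the question.
string = '"url":"?brands=chanel&chanel=lambskin&search=espadrille",' \
         '"url":"?brands=chanel&search=espadrille",' \
         '"url":"?brands=gucci&search=espadrille"'

# flatten turns [["chanel"], ["chanel"], ["gucci"]] into a flat array.
brands = string.scan(/brands=([^&]+)/).flatten.uniq
p brands  # prints ["chanel", "gucci"]

# Build symbol keys brand1, brand2, ... in order of appearance.
brands_hash = brands.each_with_index.map { |brand, i| [:"brand#{i + 1}", brand] }.to_h
p brands_hash
```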

2 Comments

Thanks. And how can I access <meta itemprop=\"brand\" content=\"Chanel\"> and associate it with its <span class=\"sale-price\" itemprop=\"price\" content=\"595.00\"> in an array or hash?
Do you know of a way for me to work out which regex I need for my different data-fetching needs?
str.gsub(/(?<=\bbrands=)[^&]+/).to_a
  #=> ["chanel", "chanel", "louboutin", "gucci", "hermes", "louis-vuitton",
  #    "louis-vuitton", "valentino"]

Tack on .uniq if desired.

This makes use of the fact that String#gsub returns an enumerator when used without a block.
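
A small sketch of that behaviour, using a shortened sample string invented for illustration: the enumerator yields each match, the original string is left untouched, and because the lookbehind keeps "brands=" out of the match, scan with the same pattern gives the same flat array.

```ruby
# Shortened sample in the same shape as the URLs in the question.
str = '"url":"?brands=chanel&search=espadrille","url":"?brands=gucci&search=espadrille"'
pattern = /(?<=\bbrands=)[^&]+/

enum = str.gsub(pattern)   # no block and no replacement: returns an Enumerator
matches = enum.to_a        # each element is one matched substring
p matches

# With no capture group in the pattern, scan also returns a flat array.
p str.scan(pattern)

# gsub without a block did not modify or replace anything.
p str.include?("brands=chanel")
```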

2 Comments

Thanks. How can I access <meta itemprop=\"brand\" content=\"Chanel\"> and associate it with its <span class=\"sale-price\" itemprop=\"price\" content=\"595.00\"> in an array or hash?
That's a different question. You should only ask one question at a time. I suggest you roll back your question to before you added the EDIT and then post a separate question. (The meaning of your EDIT is not clear to me, btw.) When doing so, see if you can reduce the length of your string as much as you can. It's also helpful to assign a variable to each input in your examples, e.g., str = 'com\\/shop\",\"url\":\"?search...'. That way the variable can be referenced in answers and comments without having to define it. Lastly, please show the desired result (a Ruby object) for each example.
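
For the brand and price follow-up, here is a rough sketch in the same regex style as the answers. It assumes the product HTML has already been unescaped, and that every product carries exactly one brand meta tag followed by one price span, so the two lists line up by position. The markup and the second price below are made up for illustration, since the real product HTML is not shown in the question. Nokogiri's CSS selectors would likely be more robust; a regex is used only to keep the sketch dependency-free.

```ruby
# Hypothetical, unescaped product markup; the real structure may differ.
html = <<~'HTML'
  <div class="product">
    <meta itemprop="brand" content="Chanel">
    <span class="sale-price" itemprop="price" content="595.00"></span>
  </div>
  <div class="product">
    <meta itemprop="brand" content="Gucci">
    <span class="sale-price" itemprop="price" content="310.00"></span>
  </div>
HTML

brands = html.scan(/itemprop="brand" content="([^"]*)"/).flatten
prices = html.scan(/itemprop="price" content="([^"]*)"/).flatten

# Assumption: the nth brand belongs to the nth price.
pairs = brands.zip(prices)
p pairs

# A plain hash would keep only one price per brand, so group prices into arrays.
prices_by_brand = pairs.group_by(&:first).transform_values { |rows| rows.map(&:last) }
p prices_by_brand
```

Grouping into arrays matters here because the same brand can appear on several products, and a hash like {chanel: "200"} can hold only one value per key.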
