I am getting this type of response from the URL that I am hitting, and I need to parse it to get the desired HTML.

this=ajax({"htmlInfo":"SOME-HTML", "otherInfo": "Blah Blah", "moreInfo": "Bleh Bleh"})

As shown above, the response has three key-value pairs, from which I need to get "SOME-HTML". How can I get that? The main problem is that "SOME-HTML" contains escape characters. Below is the kind of content that will be present.

\u003Cdiv class=\u0022container columns-2\u0022\u003E\n\n \u003Csection class=\u0022col-main\u0022\u003E\n \r\n\u003Cdiv class=\u0027visor-article-list list list-view-recent\u0027 \u003E\r\n\u003Cdiv class=\u0027grid_item visor-article-teaser list_default\u0027 \u003E\n \u003Ca class=\u0027grid_img\u0027 href=\u0027/manUnited-is-the-best\u0027\u003E\n \u003Cimg src=\u0022http://www.xyz.com/sites//files/styles/w400h22

Can anyone please help me with this? I am not sure how to tackle it.

Thanks in advance.

  • Please edit and make the question clearer so it's easier for people to help you out. What was the key-value pair? Was it JavaScript? How are you using it with Go? Provide Go code and real information rather than just "SOME HTML", "Blah Blah" and "Bleh Bleh". Commented Apr 11, 2016 at 1:19

1 Answer

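The easiest way is to extract the JSON and then unmarshal it into a struct. The \uXXXX sequences are JSON Unicode escapes (for example, \u003C is "<" and \u0022 is a double quote), which json.Unmarshal decodes automatically.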

package main

import (
    "encoding/json"
    "fmt"
    "regexp"
)

// Data follows the structure of the JSON data in the response
type Data struct {
    HTMLInfo  string `json:"htmlInfo"`
    OtherInfo string `json:"otherInfo"`
    MoreInfo  string `json:"moreInfo"`
}

func main() {
    // input is an example of the raw response data. It's probably a []byte if
    // you got it from io.ReadAll(resp.Body) (ioutil.ReadAll before Go 1.16).
    input := []byte(`this=ajax({"htmlInfo":"\u003Cdiv class=\u0022container columns-2\u0022\u003E\n\n \u003Csection class=\u0022col-main\u0022\u003E\n \r\n\u003Cdiv class=\u0027visor-article-list list list-view-recent\u0027 \u003E\r\n\u003Cdiv class=\u0027grid_item visor-article-teaser list_default\u0027 \u003E\n \u003Ca class=\u0027grid_img\u0027 href=\u0027/manUnited-is-the-best\u0027\u003E\n \u003Cimg src=\u0022http://example.com/sites//files/styles/w400h22", "otherInfo": "Blah Blah", "moreInfo": "Bleh Bleh"})`)

    // First we want to extract the data json using regex with a capture group.
    dataRegex, err := regexp.Compile("ajax\\((.*)\\)")
    if err != nil {
        fmt.Println("regex failed to compile:", err)
        return
    }

    // FindSubmatch should return two matches:
    // 0: The full match
    // 1: The contents of the capture group (what we want)
    matches := dataRegex.FindSubmatch(input)
    if len(matches) != 2 {
        fmt.Println("incorrect number of match results:", len(matches))
        return
    }
    dataJSON := matches[1]

    // Since the data is in JSON format, we can unmarshal it into a struct.  If
    // you don't care at all about the fields other than "htmlInfo", you can
    // omit them from the struct.
    data := &Data{}
    if err := json.Unmarshal(dataJSON, data); err != nil {
        fmt.Println("failed to unmarshal data json:", err)
        return
    }

    // You now have access to the "htmlInfo" property
    fmt.Println("HTML INFO:", data.HTMLInfo)
}

Which will produce:

HTML INFO: <div class="container columns-2">

 <section class="col-main">

<div class='visor-article-list list list-view-recent' >
<div class='grid_item visor-article-teaser list_default' >
 <a class='grid_img' href='/manUnited-is-the-best'>
 <img src="http://example.com/sites//files/styles/w400h22
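
If you would rather avoid the regex, the same extraction can be sketched with plain byte searches. This is only a minimal sketch under a few assumptions: the response always wraps the JSON in a single pair of parentheses, and the helper name extractHTML and the shortened sample payload are made up for illustration. One possible advantage is that it does not depend on "." matching newlines, so it should also cope with a response that contains literal line breaks between the JSON keys.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// extractHTML is a hypothetical helper (not part of any library). It strips
// the callback wrapper by taking everything between the first "(" and the
// last ")", then decodes only the "htmlInfo" field.
func extractHTML(raw []byte) (string, error) {
	start := bytes.IndexByte(raw, '(')
	end := bytes.LastIndexByte(raw, ')')
	if start < 0 || end <= start {
		return "", errors.New("no callback wrapper found")
	}

	// Only the field we care about is declared; json.Unmarshal ignores the
	// other keys in the object.
	var payload struct {
		HTMLInfo string `json:"htmlInfo"`
	}
	if err := json.Unmarshal(raw[start+1:end], &payload); err != nil {
		return "", fmt.Errorf("unmarshal payload: %w", err)
	}
	return payload.HTMLInfo, nil
}

func main() {
	// Shortened, made-up sample in the same shape as the real response.
	raw := []byte(`this=ajax({"htmlInfo":"\u003Cp class=\u0022x\u0022\u003Ehi\u003C/p\u003E", "otherInfo": "Blah Blah"})`)

	html, err := extractHTML(raw)
	if err != nil {
		fmt.Println("extract failed:", err)
		return
	}
	fmt.Println(html) // prints: <p class="x">hi</p>
}
```

Returning an error instead of printing inside the helper leaves it to the caller to decide what to do with a malformed response.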