I am using Go's regexp package, and I want to call ReplaceAllStringFunc with an extra argument, not only with the matched string.

For example, I want to update this text:

"<img src=\"/m/1.jpg\" />  <img src=\"/m/2.jpg\" />  <img src=\"/m/3.jpg\" />"

to this (changing "m" to "a", or to anything else):

"<img src=\"/a/1.jpg\" />  <img src=\"/a/2.jpg\" />  <img src=\"/a/3.jpg\" />"

I would like to have something like:

func UpdateText(text string) string {
    re, _ := regexp.Compile(`<img.*?src=\"(.*?)\"`)
    text = re.ReplaceAllStringFunc(text, updateImgSrc) 
    return text
}

// update "/m/1.jpg" to "/a/1.jpg" 
func updateImgSrc(imgSrcText, prefix string) string {
    // replace "m" by prefix
    return "<img src=\"" + newImgSrc + "\""
}

I checked the docs: ReplaceAllStringFunc only accepts a func(string) string, so I can't pass an extra argument to it. What would be the best way to achieve my goal?

More generally, I would like to find all occurrences of a pattern and then replace each one with a new string built from the matched text plus an extra parameter. Does anyone have an idea how to do this?

  • No, you don't want to process HTML with regexps. Commented Jun 20, 2016 at 10:12
  • @Volker, the text is not an entire HTML document, it is the content of a news article. What would be the best solution in your opinion? I don't think strings.Replace can easily match a pattern. Commented Jun 20, 2016 at 11:16
  • Use a proper HTML parser. golang.org/x/net/html is one option, and you might find github.com/PuerkitoBio/goquery useful. Do this search to get an overview of what's there. Commented Jun 20, 2016 at 11:29
  • Parsing it as HTML5 works in a lot of cases; maybe just add a doctype and an <html> element manually. Or parse it as XML. Commented Jun 20, 2016 at 11:29
  • Of course it supports an argument. Your question is very unclear. Commented Jun 20, 2016 at 16:00
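The last comment most likely refers to closures: ReplaceAllStringFunc takes a func(string) string, but that function can be an anonymous function that captures extra variables such as prefix. Below is a minimal sketch, assuming every src is double-quoted and the part to swap is a leading "/m/"; the regexp, the helper updateImgSrc and its three-parameter signature are hypothetical, adapted from the question. Because the callback only receives the whole match, the sketch re-runs the regexp on it with FindStringSubmatch to get the src value.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// imgSrcRe matches `<img ... src="..."`; group 1 is the src value.
// \s before src keeps attributes like data-src from matching.
var imgSrcRe = regexp.MustCompile(`<img[^>]*?\ssrc="([^"]*)"`)

// UpdateText rewrites the src of every <img>. The extra prefix argument
// reaches the callback because the anonymous function closes over it.
func UpdateText(text, prefix string) string {
	return imgSrcRe.ReplaceAllStringFunc(text, func(match string) string {
		// The callback only gets the full match, so run the regexp again
		// on it to extract submatch 1 (the src value).
		src := imgSrcRe.FindStringSubmatch(match)[1]
		return updateImgSrc(match, src, prefix)
	})
}

// updateImgSrc is a hypothetical helper taking the asker's extra parameter.
// match always ends with src + `"`, so everything before that is kept.
func updateImgSrc(match, src, prefix string) string {
	if !strings.HasPrefix(src, "/m/") {
		return match // leave other paths untouched
	}
	newSrc := "/" + prefix + "/" + strings.TrimPrefix(src, "/m/")
	head := match[:len(match)-len(src)-1]
	return head + newSrc + `"`
}

func main() {
	in := `<img src="/m/1.jpg" />  <img src="/m/2.jpg" />  <img src="/m/3.jpg" />`
	fmt.Println(UpdateText(in, "a"))
}
```

The same pattern works for any extra state: whatever is in scope when the closure is created is available inside the callback.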

2 Answers

I agree with the comments: you probably don't want to parse HTML with regular expressions (bad things will happen).

However, let's pretend it's not HTML and you only want to replace submatches. You could do this:

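import (
    "regexp"
    "strings"
)

func UpdateText(input string) (string, error) {
    // Group 1 is the path without the extension, group 2 the extension.
    re, err := regexp.Compile(`img.*?src=\"(.*?)\.(.*?)\"`)
    if err != nil {
        return "", err
    }
    indexes := re.FindAllStringSubmatchIndex(input, -1)

    // Copy input into a builder, swapping in the rewritten submatch 1 of
    // each match. Slicing only the original input keeps the indexes valid
    // even when the replacement has a different length than the original.
    var b strings.Builder
    last := 0 // end of the part of input already copied
    for _, match := range indexes {
        imgStart, imgEnd := match[2], match[3] // bounds of submatch 1
        newImgName := strings.Replace(input[imgStart:imgEnd], "m", "a", -1)
        b.WriteString(input[last:imgStart])
        b.WriteString(newImgName)
        last = imgEnd
    }
    b.WriteString(input[last:])
    return b.String(), nil
}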

see on playground

(note that I've slightly changed your regular expression to match the file extension separately)
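For this specific case there may be a shorter route that needs no index bookkeeping: Regexp.ReplaceAllString expands `$` templates (as Regexp.Expand does), so the new prefix can be passed in as plain data. A minimal sketch, assuming the directory to swap is always the literal "/m/" at the start of src; the function name ReplaceImgDir is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// imgDirRe matches `<img ... src="/m/`; group 1 is everything up to and
// including `src="/`, so only the "m/" directory part gets replaced.
var imgDirRe = regexp.MustCompile(`(<img[^>]*?\ssrc="/)m/`)

// ReplaceImgDir (hypothetical) turns a leading "/m/" in every img src into
// "/"+prefix+"/". ${1} re-inserts group 1; the braces matter because prefix
// may start with a letter or digit. Any "$" in prefix is doubled so the
// template expansion keeps it as a literal dollar sign.
func ReplaceImgDir(text, prefix string) string {
	repl := "${1}" + strings.ReplaceAll(prefix, "$", "$$") + "/"
	return imgDirRe.ReplaceAllString(text, repl)
}

func main() {
	in := `<img src="/m/1.jpg" />  <img src="/m/2.jpg" />  <img src="/m/3.jpg" />`
	fmt.Println(ReplaceImgDir(in, "a"))
}
```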

Thanks to kostix's advice, here is my solution using an HTML parser (goquery).

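import (
    "strings"

    "github.com/PuerkitoBio/goquery"
)

// UpdateAllResourcePath rewrites the src attribute of every <img> in text.
// UpdateResourcePath (not shown) computes the new path for a single src.
func UpdateAllResourcePath(text, prefix string) (string, error) {
    // goquery parses the fragment as a full HTML document, wrapping it in
    // <html><head></head><body>...</body></html>.
    doc, err := goquery.NewDocumentFromReader(strings.NewReader(text))
    if err != nil {
        return "", err
    }

    sel := doc.Find("img")
    for i := range sel.Nodes {
        img := sel.Eq(i)
        imgSrc, ok := img.Attr("src")
        if !ok {
            continue // <img> without a src attribute
        }

        newImgSrc, err := UpdateResourcePath(imgSrc, prefix) // change the img src here
        if err != nil {
            return "", err
        }

        img.SetAttr("src", newImgSrc)
    }

    // Serialize only the contents of <body>, i.e. the original fragment.
    newtext, err := doc.Find("body").Html()
    if err != nil {
        return "", err
    }

    return newtext, nil
}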
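The answer calls UpdateResourcePath without showing it. Here is a hypothetical version, assuming every src is an absolute path whose first segment (the "m" in "/m/1.jpg") is the part to replace; the name and signature come from the call above, but the body is only a guess at what it does.

```go
package main

import (
	"fmt"
	"strings"
)

// UpdateResourcePath (hypothetical body) replaces the first segment of an
// absolute path with prefix: "/m/1.jpg" with prefix "a" becomes "/a/1.jpg".
func UpdateResourcePath(src, prefix string) (string, error) {
	if !strings.HasPrefix(src, "/") {
		return "", fmt.Errorf("not an absolute path: %q", src)
	}
	// "/m/1.jpg" -> ["m", "1.jpg"]; parts[1] is everything after the first segment.
	parts := strings.SplitN(src[1:], "/", 2)
	if len(parts) < 2 {
		return "", fmt.Errorf("no directory segment in %q", src)
	}
	return "/" + prefix + "/" + parts[1], nil
}

func main() {
	for _, src := range []string{"/m/1.jpg", "/m/sub/2.png", "1.jpg"} {
		newSrc, err := UpdateResourcePath(src, "a")
		fmt.Println(newSrc, err)
	}
}
```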
