
I want to rewrite all links in a webpage so that they point to a reverse-proxy domain.

The rules are:

https://test.com/xxx --> https_test_com.proxy.com/xxx
http://sub.test.com/xxx --> http_sub_test_com.proxy.com/xxx

How can I achieve this with a regex in Go?

The response body is a []byte, UTF-8 encoded.
I have tried the code below, but it does not replace every dot in the original domain with an underscore. The number of subdomain levels varies, so the number of dots varies too.

respBytes := []byte(`_.Xc=function(a){var b=window.google&&window.google.logUrl?"":"https://www.google.com";b+="/gen_204?";b+=a.j(2040-b.length);
        <cite class="iUh30 Zu0yb tjvcx">https://cloud.google.com</cite></div><div class="eFM0qc"><a class="fl" href="https://webcache.googleusercontent.com/search?q=cache:80SWJ_cSDhwJ:https://cloud.google.com/+&amp;cd=1&amp;hl=en&amp;ct=clnk&amp;gl=au" ping="/url?sa=t&amp;source=web&amp;rct=j&amp;url=https://webcache.googleusercontent.com/search%3Fq%3Dcache:80SWJ_cSDhwJ:https://cloud.google.com/%2B%26cd%3D1%26hl%3Den%26ct%3Dclnk%26gl%3Dau&amp;ved=2ahUKEwia5ovYsv3xAhXS4jgGHad0BJYQIDAAegQIBRAG"><span>Cached</span></a></li><li class="action-menu-item OhScic zsYMMe" role="menuitem"><a class="fl" href="/search?q=related:https://cloud.google.com/+google+cloud&amp;sa=X&amp;ved=2ahUKEwia5ovYsv3xAhXS4jgGHad0BJYQHzAAegQIBRAH">
        `)
proxyURI := "proxy.com"
var re = regexp.MustCompile(`(http[s]*):\/\/([a-zA-Z0-9_\-.:]*)`)
content := re.ReplaceAll(respBytes, []byte("${1}_${2}."+proxyURI))


Origin                                   Actual result                                    Expected
https://www.google.com                   https_www.google.com.proxy.com                   https_www_google_com.proxy.com
https://cloud.google.com                 https_cloud.google.com.proxy.com                 https_cloud_google_com.proxy.com
https://webcache.googleusercontent.com   https_webcache.googleusercontent.com.proxy.com   https_webcache_googleusercontent_com.proxy.com
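Why the dots survive: in a ReplaceAll template, ${2} is expanded by copying the captured text as-is (the same expansion Regexp.Expand performs), so the template itself has no way to transform a group. A minimal reproduction, assuming a hypothetical askerRewrite helper that wraps the pattern from the question and a shortened input made up for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// askerRewrite (hypothetical name) reproduces the question's attempt:
// ${2} is pasted into the output verbatim, so the dots in the host remain.
func askerRewrite(respBytes []byte, proxyURI string) []byte {
	re := regexp.MustCompile(`(http[s]*):\/\/([a-zA-Z0-9_\-.:]*)`)
	return re.ReplaceAll(respBytes, []byte("${1}_${2}."+proxyURI))
}

func main() {
	in := []byte(`b="https://www.google.com";<cite>https://cloud.google.com</cite>`)
	fmt.Println(string(askerRewrite(in, "proxy.com")))
	// prints: b="https_www.google.com.proxy.com";<cite>https_cloud.google.com.proxy.com</cite>
}
```

Getting underscores in place of the dots therefore needs a replacement function rather than a template.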
Answer:

Here's how you can do this:

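import (
    "fmt"
    "regexp"
)

func replaceAndPrint() {
    src := `
<a href="https://test.com/xxx">link 1</a>
<a href="https://test.com/yyy">link 2</a>
`
    // Group 1 captures from "test.com" up to the last double quote on the same
    // line ('.' does not match '\n' by default). The quotes are part of the
    // match but not of the replacement, so they are dropped from the output.
    r := regexp.MustCompile(`"https://(test\.com.*)"`)
    result := r.ReplaceAllString(src, "http://sub.$1")
    fmt.Println(result)
}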

Output:

<a href=http://sub.test.com/xxx>link 1</a>
<a href=http://sub.test.com/yyy>link 2</a>

Explanation: the pattern passed to regexp.MustCompile defines a capturing group (the part inside the parentheses). Its value is referenced as $1 in the replacement string passed to r.ReplaceAllString.
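A related detail of Go's template syntax: in the $name form the name is read as long as possible, so $1_ refers to a group named "1_" (which does not exist and expands to nothing), while ${1}_ means group 1 followed by a literal "_". The question's template uses the braced form, which is why its scheme part comes out right. A small sketch, with schemeRe as a hypothetical name:

```go
package main

import (
	"fmt"
	"regexp"
)

// schemeRe (hypothetical name) captures the URL scheme as group 1.
var schemeRe = regexp.MustCompile(`(https?)://`)

func main() {
	src := "https://test.com/xxx"

	// "$1_" is read as ${1_}: no group has that name, so it expands to "".
	fmt.Println(schemeRe.ReplaceAllString(src, "$1_")) // prints: test.com/xxx

	// Braces end the reference explicitly: group 1, then a literal "_".
	fmt.Println(schemeRe.ReplaceAllString(src, "${1}_")) // prints: https_test.com/xxx
}
```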

UPDATE:

Sorry, I misread the example.

Here's an updated version:

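import (
    "fmt"
    "regexp"
    "strings"
)

func replaceAndPrint2() {
    src := `
<a href="http://test.com/xxx">link 1</a>
<a href="https://sub1.sub2.test.com/yyy">link 2</a>
`
    // Match "." or "://" followed by everything up to the next '.' or '/'.
    r := regexp.MustCompile(`(\.|://)([^./]*)`)
    // Turn both "://" and "." into "_".
    replacer := strings.NewReplacer("://", "_", ".", "_")
    res := r.ReplaceAllStringFunc(src, func(g string) string {
        // A match that is exactly ".com" (i.e. followed by '.' or '/') also gets
        // the proxy domain appended; hosts with other TLDs are not handled.
        if g == ".com" {
            return replacer.Replace(g) + ".proxy.com"
        }
        return replacer.Replace(g)
    })
    fmt.Println(res)
}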

Output:

<a href="http_test_com.proxy.com/xxx">link 1</a>
<a href="https_sub1_sub2_test_com.proxy.com/yyy">link 2</a>
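One caveat worth checking before applying replaceAndPrint2 to the question's real payload: the pattern matches every '.' in the document, not only those inside URLs, and a match only stops at the next '.' or '/'. The sketch below copies the same logic into a hypothetical answerRewrite function and runs it on the answer's sample and on a made-up fragment modeled on the question's JavaScript:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// answerRewrite (hypothetical name) is replaceAndPrint2's logic as a function.
func answerRewrite(src string) string {
	r := regexp.MustCompile(`(\.|://)([^./]*)`)
	replacer := strings.NewReplacer("://", "_", ".", "_")
	return r.ReplaceAllStringFunc(src, func(g string) string {
		if g == ".com" {
			return replacer.Replace(g) + ".proxy.com"
		}
		return replacer.Replace(g)
	})
}

func main() {
	// The answer's own sample is rewritten as intended.
	fmt.Println(answerRewrite(`<a href="https://sub1.sub2.test.com/yyy">`))
	// prints: <a href="https_sub1_sub2_test_com.proxy.com/yyy">

	// Here the match starting at ".google" runs on through "https:", swallowing
	// the ':' of "://", and ".com" is followed by '"' rather than '/', so the
	// match is `.com"` and no proxy suffix is added.
	fmt.Println(answerRewrite(`x=window.google;u="https://www.google.com"`))
	// prints: x=window_google;u="https://www_google_com"
}
```

On the second input the identifier window.google is mangled and the URL is not proxied, so the pattern needs to be anchored to the scheme and host of each URL.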

Comments:

Sorry, the rule is not to replace the domain with a subdomain. 😂
Thanks, ReplaceAllFunc and ReplaceAllStringFunc work well: match the target, transform it in a function, and substitute the result.
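Following that idea, a sketch that runs ReplaceAllFunc directly on the []byte body and only touches the scheme and host of each absolute URL, so dots elsewhere in the page are left alone. Assumptions: rewriteToProxy and urlPrefix are hypothetical names; a host is taken to be ASCII letters, digits, '.' and '-'; a port, if present, is left unchanged after the proxy domain; protocol-relative (//host) and percent-encoded URLs are not handled.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// urlPrefix (hypothetical name) matches an http(s) scheme plus the host after
// it. The host class stops at '/', ':', '"', '<' and anything else outside
// [A-Za-z0-9.-].
var urlPrefix = regexp.MustCompile(`https?://[A-Za-z0-9.-]+`)

// rewriteToProxy (hypothetical helper) turns scheme://a.b.c into
// scheme_a_b_c.<proxyDomain>, leaving the rest of body untouched.
func rewriteToProxy(body []byte, proxyDomain string) []byte {
	return urlPrefix.ReplaceAllFunc(body, func(m []byte) []byte {
		i := bytes.Index(m, []byte("://")) // always present in a match
		scheme, host := m[:i], m[i+3:]

		out := make([]byte, 0, len(m)+len(proxyDomain)+1)
		out = append(out, scheme...)
		out = append(out, '_')
		out = append(out, bytes.ReplaceAll(host, []byte("."), []byte("_"))...)
		out = append(out, '.')
		out = append(out, proxyDomain...)
		return out
	})
}

func main() {
	body := []byte(`b=window.google.logUrl?"":"https://www.google.com";<cite>https://cloud.google.com</cite>`)
	fmt.Println(string(rewriteToProxy(body, "proxy.com")))
	// prints: b=window.google.logUrl?"":"https_www_google_com.proxy.com";<cite>https_cloud_google_com.proxy.com</cite>
}
```

Matching only scheme://host keeps the callback simple: the path, query string and any other dots in the page pass through untouched, and the number of subdomain levels does not matter.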
