9

I'm learning Golang so I can rewrite some of my shell scripts.

I have URLs that look like this:

https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value

I want to extract the following part:

https://example-1.example.com/a/c482dfad3573acff324c/list.txt

In a shell script I would do something like this:

echo "$myString" | grep -o 'https://.*\.txt'

What is the best way to do the same thing in Golang, only by using the standard library?

4 Answers

14

There are a few options:

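// Option 1: match a regexp, as in the question
pat := regexp.MustCompile(`https?://.*\.txt`)
s := pat.FindString(myString)

// Option 2: everything before the first ? (strings.Cut needs Go 1.18 or later)
s, _, _ := strings.Cut(myString, "?")

// Option 3: parse the URL and clear the query string
u, err := url.Parse(myString)
if err != nil {
    // handle the parse error
}
u.RawQuery = ""
s := u.String()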

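These are alternatives, not one program. The last option is the most robust, because it parses the URL by its structure rather than by pattern matching, so it does not depend on the path ending in .txt.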
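To see where the options can diverge, here is a small sketch that runs all three on an input whose query string also contains ".txt". The input URL and the helper name stripQuery are made up for illustration; the sketch also clears the fragment, which the snippet above does not.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

var txtPattern = regexp.MustCompile(`https?://.*\.txt`)

// stripQuery returns rawURL without its query string and fragment.
// (Hypothetical helper name, not part of the standard library.)
func stripQuery(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	u.RawQuery = ""
	u.Fragment = ""
	return u.String(), nil
}

func main() {
	// Hypothetical input: the query string itself mentions a .txt file.
	in := "https://example.com/a/list.txt?next=https://example.com/b/other.txt"

	// The greedy .* runs to the last ".txt", so the query is included.
	fmt.Println(txtPattern.FindString(in))

	// Cutting at the first '?' keeps only the part before the query.
	before, _, _ := strings.Cut(in, "?")
	fmt.Println(before)

	// Parsing separates the path from the query regardless of their content.
	s, err := stripQuery(in)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(s)
}
```

On this input the regexp returns the whole string, while the other two return https://example.com/a/list.txt.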


1 Comment

I'd recommend url.Parse, since it should handle odd edge cases that a regex or a split might miss, for example URLs without a ?.
4

You may use strings.IndexRune, strings.IndexByte, strings.Split, strings.SplitAfter, strings.FieldsFunc, url.Parse, regexp, or your own function.

The simplest way:
use i := strings.IndexRune(s, '?') or i := strings.IndexByte(s, '?'), then s[:i], like this (with the output in a comment):

package main

import "fmt"
import "strings"

func main() {
    s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
    i := strings.IndexByte(s, '?')
    if i != -1 {
        fmt.Println(s[:i]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
    }
}

or you may use url.Parse(s) (I'd use this):

package main

import "fmt"
import "net/url"

func main() {
    s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
    u, err := url.Parse(s) // named u so the net/url package is not shadowed
    if err == nil {
        u.RawQuery = ""
        fmt.Println(u.String()) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
    }
}

or you may use regexp.MustCompile(".*\\.txt"):

package main

import "fmt"
import "regexp"

var rgx = regexp.MustCompile(`.*\.txt`)

func main() {
    s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`

    fmt.Println(rgx.FindString(s)) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}

or you may use splits := strings.FieldsFunc(s, func(r rune) bool { return r == '?' }) then splits[0]:

package main

import "fmt"
import "strings"

func main() {
    s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
    splits := strings.FieldsFunc(s, func(r rune) bool { return r == '?' })
    fmt.Println(splits[0]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}

you may use splits := strings.Split(s, "?") then splits[0]:

package main

import "fmt"
import "strings"

func main() {
    s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
    splits := strings.Split(s, "?")
    fmt.Println(splits[0]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}

you may use splits := strings.SplitAfter(s, ".txt") then splits[0]:

package main

import "fmt"
import "strings"

func main() {
    s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
    splits := strings.SplitAfter(s, ".txt")
    fmt.Println(splits[0]) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}

or you may write your own function (no packages needed for the extraction itself):

package main

import "fmt"

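// left returns the part of s before the first '?'.
func left(s string) string {
    for i, r := range s {
        if r == '?' {
            return s[:i]
        }
    }
    return s // no '?' found: there is no query string to strip
}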

func main() {
    s := `https://example-1.example.com/a/c482dfad3573acff324c/list.txt?parm1=value,parm2=value,parm3=https://example.com/a?parm1=value,parm2=value`
    fmt.Println(left(s)) // https://example-1.example.com/a/c482dfad3573acff324c/list.txt
}

2

If you are processing only URLs, you can use Go's net/url package (https://golang.org/pkg/net/url/) to parse the URL, clear the Query and Fragment parts (the query here being parm1=value,parm2=value and so on), and keep the remaining scheme://host/path portion, as in the following example (https://play.golang.org/p/Ao0jU22NyA):

package main

import (
    "fmt"
    "net/url"
)

func main() {
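    u, err := url.Parse("https://example-1.example.com/a/b/c/list.txt?parm1=value,parm2=https%3A%2F%2Fexample.com%2Fa%3Fparm1%3Dvalue%2Cparm2%3Dvalue#somefragment")
    if err != nil {
        panic(err) // the input could not be parsed as a URL
    }
    // Clear the query string and the fragment; scheme, host and path remain.
    u.RawQuery, u.Fragment = "", ""
    fmt.Printf("%s\n", u)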
}

Output:

https://example-1.example.com/a/b/c/list.txt
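An alternative to clearing fields is to copy only the wanted components into a new url.URL. This is a minimal sketch, assuming you want exactly scheme, host and path; the function name baseURL and the shortened input URL are made up for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// baseURL keeps only the scheme, host and path of rawURL.
// (Hypothetical helper name, not part of net/url.)
func baseURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// Copy only the wanted components; user info, query and fragment are dropped.
	// RawPath is carried over so an encoded path keeps its original encoding.
	out := url.URL{Scheme: u.Scheme, Host: u.Host, Path: u.Path, RawPath: u.RawPath}
	return out.String(), nil
}

func main() {
	s, err := baseURL("https://example-1.example.com/a/b/c/list.txt?parm1=value#somefragment")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(s)
}
```

Building a fresh value states which parts are kept, whereas clearing fields states which parts are removed; which reads better depends on how many components you care about.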

0

I used the regexp package to extract a string from a string.

In this example I wanted to extract the text between <PERSON> and </PERSON>. I did this with the re expression, and then removed the <PERSON> and </PERSON> tags with the re1 expression.

The for loop handles the case where there are multiple matches, and re1 is used for the replacement.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile(`<PERSON>(.*?)</PERSON>`)

    string_l := "java -mx500m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -textFile PatrickYe.txt -outputFormat inlineXML 2> /dev/null I complained to <ORGANIZATION>Microsoft</ORGANIZATION> about <PERSON>Bill Gates</PERSON>.They     told me to see the mayor of <PERSON>New York</PERSON>.,"
    x := re.FindAllString(string_l, -1)
    fmt.Println(x)
    // Compile the tag-stripping pattern once, outside the loop.
    re1 := regexp.MustCompile(`</?PERSON>`)
    for v, st := range x {
        y1 := re1.ReplaceAllLiteralString(st, "")
        fmt.Println(v, st, " : sdf : ", y1)
    }
}

1 Comment

Hi and welcome! I am sorry but it is a bit difficult to understand your question. Could you rephrase it a bit so it is easier to understand what you want to achieve?
