What's the best way to extract inner substrings from strings in Golang?

input:

"Hello <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"

output:

"this is paragraph \n
 this is paragraph 2"

Is there any string package/library for Go that already does something like this?

package main

import (
    "fmt"
    "strings"
)

func main() {
    longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"

    newString := getInnerStrings("<p>", "</p>", longString)

    fmt.Println(newString)
    // output: this is paragraph \n
    //         this is paragraph 2
}

func getInnerStrings(start, end, str string) string {
    // Brain freeze
    // Regex?
    // Bytes loop?
}

thanks

  • Here. Read the part about submatches; it should help you. Commented Jan 8, 2014 at 15:53
  • Yeah, I've seen that, but I wasn't sure if that was the right way to go. Bookmarked for future reference though. Commented Jan 8, 2014 at 16:15

3 Answers

6

Don't use regular expressions to try and interpret HTML. Use a fully capable HTML tokenizer and parser.

I recommend you read this article on CodingHorror.

1

Here is a function that I have been using a lot.

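// GetInnerSubstring returns the text between the first occurrence of prefix
// and the first occurrence of suffix that follows it.
// With a non-empty prefix it returns "" when prefix is missing or when suffix
// does not occur after it. An empty prefix starts at the beginning of str
// (a missing suffix then yields the rest of str); an empty suffix extends to
// the end of str.
func GetInnerSubstring(str string, prefix string, suffix string) string {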
    var beginIndex, endIndex int
    beginIndex = strings.Index(str, prefix)
    if beginIndex == -1 {
        beginIndex = 0
        endIndex = 0
    } else if len(prefix) == 0 {
        beginIndex = 0
        endIndex = strings.Index(str, suffix)
        if endIndex == -1 || len(suffix) == 0 {
            endIndex = len(str)
        }
    } else {
        beginIndex += len(prefix)
        endIndex = strings.Index(str[beginIndex:], suffix)
        if endIndex == -1 {
            if strings.Index(str, suffix) < beginIndex {
                endIndex = beginIndex
            } else {
                endIndex = len(str)
            }
        } else {
            if len(suffix) == 0 {
                endIndex = len(str)
            } else {
                endIndex += beginIndex
            }
        }
    }

    return str[beginIndex:endIndex]
}

You can try it at the playground, https://play.golang.org/p/Xo0SJu0Vq4.

0

StrExtract retrieves a string between two delimiters.

StrExtract(sExper, sAdelim, sCdelim, nOccur)

sExper: Specifies the expression to search.

sAdelim: Specifies the delimiter that marks the beginning of the string to extract.

sCdelim: Specifies the delimiter that marks the end of the string to extract.

nOccur: Specifies at which occurrence of sAdelim in sExper to start the extraction (1 is the first occurrence).

Go Play

package main

import (
    "fmt"
    "strings"
)

func main() {
    s := "a11ba22ba333ba4444ba55555ba666666b"
    fmt.Println("StrExtract1: ", StrExtract(s, "a", "b", 5))
}

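// StrExtract returns the text between the nOccur-th occurrence (1-based) of
// sAdelim and the first sCdelim that follows it before the next sAdelim.
// It returns "" when no such pair exists.
func StrExtract(sExper, sAdelim, sCdelim string, nOccur int) string {
    // Element i (i >= 1) of the split is the text after the i-th sAdelim.
    aExper := strings.Split(sExper, sAdelim)

    // Reject non-positive or out-of-range occurrence numbers;
    // a negative index would otherwise panic.
    if nOccur < 1 || len(aExper) <= nOccur {
        return ""
    }

    sMember := aExper[nOccur]
    aExper = strings.Split(sMember, sCdelim)

    // A single element means sCdelim never appeared in this segment.
    if len(aExper) == 1 {
        return ""
    }

    return aExper[0]
}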
