I have a string of superheroes. All of them have names, but not all of them have attributes.

It has the format ⛦name⛯attrName☾attrData☽, where the attrName☾attrData☽ part is optional.

So, the superheroes string is:

⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girlFriend☾Cindy☽

I want to use a regex to parse the string and populate the result into a slice of maps, like this:

[ {name: superman, shirt: blue},
  {name: joker},
  {name: spiderman, age: 15yo, girlFriend: Cindy} ]

I can't get it to work in the Go Playground. I use the regex ⛦(\\w+)⛯(?:(\\w+)☾(\\w+)☽)*, but it captures only a single attribute per hero; for example, it fails to capture the age attribute.

My code is:

func main() {
    re := regexp.MustCompile("⛦(\\w+)⛯(?:(\\w+)☾(\\w+)☽)*")
    fmt.Printf("%q\n", re.FindAllStringSubmatch("⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girlFriend☾Cindy☽", -1))
}

The code is on the Go Playground: https://play.golang.org/p/Epv66LVwuRK

The run result is:

[
    ["⛦superman⛯shirt☾blue☽" "superman" "shirt" "blue"]
    ["⛦joker⛯" "joker" "" ""]
    ["⛦spiderman⛯age☾15yo☽girlFriend☾Cindy☽" "spiderman" "girlFriend" "Cindy"]
]

The age attribute is missing. Any idea why?

2 Answers

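You cannot capture an arbitrary number of substrings with a single capturing group: when a group is repeated, only the text from its last iteration is kept, which is why only girlFriend survives in your output. You need to match the whole record first, and then match its subparts with another regex.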

See an example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girlFriend☾Cindy☽"

    // reMain matches one whole record: Group 1 is the name,
    // Group 2 is the run of zero or more attribute pairs.
    reMain := regexp.MustCompile(`⛦(\w+)⛯((?:\w+☾\w+☽)*)`)
    // reAux matches a single attrName☾attrData☽ pair.
    reAux := regexp.MustCompile(`(\w+)☾(\w+)☽`)

    for _, match := range reMain.FindAllStringSubmatch(str, -1) {
        fmt.Printf("%v\n", match[1])
        // Scan the attribute run of this record pair by pair.
        for _, matchAux := range reAux.FindAllStringSubmatch(match[2], -1) {
            fmt.Printf("%v: %v\n", matchAux[1], matchAux[2])
        }
        fmt.Println("--END OF MATCH--")
    }
}

Output:

superman
shirt: blue
--END OF MATCH--
joker
--END OF MATCH--
spiderman
age: 15yo
girlFriend: Cindy
--END OF MATCH--

Here, ⛦(\w+)⛯((?:\w+☾\w+☽)*) is the main regex: it captures the record's name into Group 1 and the whole run of key-value pairs into Group 2. You then iterate over the matches and collect every key-value pair from Group 2 using (\w+)☾(\w+)☽.
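
The question asks for a slice of maps rather than printed lines. A minimal sketch of how the same two-phase matching could fill that structure follows; the helper name parseHeroes and the choice to store the hero name under the key "name" are assumptions made for illustration, not part of the original code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are the ones discussed above: one for a whole record,
// one for a single attribute pair.
var (
	recordRe = regexp.MustCompile(`⛦(\w+)⛯((?:\w+☾\w+☽)*)`)
	attrRe   = regexp.MustCompile(`(\w+)☾(\w+)☽`)
)

// parseHeroes (hypothetical helper) returns one map per record,
// with the hero name stored under the key "name".
func parseHeroes(s string) []map[string]string {
	var heroes []map[string]string
	for _, rec := range recordRe.FindAllStringSubmatch(s, -1) {
		hero := map[string]string{"name": rec[1]}
		// rec[2] holds the whole run of attributes; scan it pair by pair.
		for _, attr := range attrRe.FindAllStringSubmatch(rec[2], -1) {
			hero[attr[1]] = attr[2]
		}
		heroes = append(heroes, hero)
	}
	return heroes
}

func main() {
	str := "⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girlFriend☾Cindy☽"
	for _, hero := range parseHeroes(str) {
		fmt.Println(hero)
	}
}
```

One caveat of this sketch: an attribute that happens to be called name would overwrite the hero's name, so a real implementation might keep the name in a separate field.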

1 Comment

Brilliant. I can't believe I've never thought of doing regex in two phases like this. TIL.

Your regex ⛦(\\w+)⛯(?:(\\w+)☾(\\w+)☽)* contains only one pair of key and value capturing groups, so each match reports only one key and value:

[["⛦superman⛯shirt☾blue☽" "superman" "shirt" "blue"]
["⛦joker⛯" "joker" "" ""]
["⛦spiderman⛯age☾15yo☽girl☾Cindy☽" "spiderman" "girl" "Cindy"]]

Adding a second optional key/value group to the regex makes it capture the age value as well. Each group uses ? rather than *, because a greedy * on the first group would consume every pair and leave the second group empty:

re := regexp.MustCompile("⛦(\\w+)⛯(?:(\\w+)☾(\\w+)☽)?(?:(\\w+)☾(\\w+)☽)?")
fmt.Printf("%q\n", re.FindAllStringSubmatch("⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girl☾Cindy☽", -1))
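
A fixed number of groups handles only that many pairs. One possible way to cover any number of pairs with a single regex is to stop matching whole records and instead match tokens, where each token is either a ⛦name⛯ header or one attrName☾attrData☽ pair. The sketch below assumes that approach; the alternation pattern and the helper name collectHeroes are illustrative and do not come from the code above.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenRe matches either a record header (group 1) or one attribute
// pair (groups 2 and 3). This alternation is an assumption of the sketch.
var tokenRe = regexp.MustCompile(`⛦(\w+)⛯|(\w+)☾(\w+)☽`)

// collectHeroes (hypothetical helper) walks the tokens in order and
// attaches each attribute pair to the most recently seen header.
func collectHeroes(s string) []map[string]string {
	var heroes []map[string]string
	for _, m := range tokenRe.FindAllStringSubmatch(s, -1) {
		switch {
		case m[1] != "":
			// A header starts a new record.
			heroes = append(heroes, map[string]string{"name": m[1]})
		case len(heroes) > 0:
			// A pair belongs to the current record; pairs seen before
			// any header are ignored.
			heroes[len(heroes)-1][m[2]] = m[3]
		}
	}
	return heroes
}

func main() {
	str := "⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girl☾Cindy☽"
	for _, hero := range collectHeroes(str) {
		fmt.Println(hero)
	}
}
```

The trade-off is that this version silently skips any text between tokens, so it is more lenient about malformed input than a regex that matches a whole record at once.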

3 Comments

But what if a record has more pairs? Then I would need to add more (?:(\\w+)☾(\\w+)☽) groups. I hope for a solution that matches 0 to N pairs.
Sure, I'm still digging into it to find a more general solution.
I tried more but without success. Please go through this link; maybe you can find something.
