I have the following snapshot of a long String array:

Array[String] = Array("Tony Stark (USA) 16th October 2015", "Peter Comb (Canada) 21st September 2015")

I expect the output to be:

Array[String] = Array("Tony Stark", "Peter Comb")    
Array[String] = Array("USA", "Canada")
Array[String] = Array("16th October 2015", "21st September 2015")

I have tried this:

"[.]+\\(([.]+)\\)[.]+"

But it fails to match. What regex pattern would parse my RDD?

2 Answers

val rdd: Array[String] = Array ("Tony Stark (USA) 16th October 2015", "Peter Comb (Canada) 21st September 2015")
(0 to 2).map (i => rdd.map (_.split ("[\\)\\(]")).map (a=> a(i)))
Vector(Array("Tony Stark ", "Peter Comb "), Array(USA, Canada), Array(" 16th October 2015", " 21st September 2015"))

A final trim cleans up the whitespace:

(0 to 2).map (i => rdd.map (_.split ("[\\)\\(]")).map (a=> a(i).trim))
Vector(Array(Tony Stark, Peter Comb), Array(USA, Canada), Array(16th October 2015, 21st September 2015))
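The same three columns can probably be obtained by splitting each row only once and transposing, rather than re-splitting for every column index. A minimal sketch, assuming every row contains exactly one parenthesised part (the name `columns` is only illustrative):

```scala
val rdd: Array[String] = Array(
  "Tony Stark (USA) 16th October 2015",
  "Peter Comb (Canada) 21st September 2015"
)

// Split each row once on "(" or ")", trim every piece,
// then transpose the rows into columns.
val columns: Array[Array[String]] =
  rdd.map(_.split("[()]").map(_.trim)).transpose

// columns(0) holds the names, columns(1) the countries, columns(2) the dates
```

Note that `transpose` fails if the rows do not all split into the same number of pieces, so this only suits uniformly shaped input.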

Now to the regex:

"[.]+\\(([.]+)\\)[.]+"

A character class containing a single character rarely makes sense: [a]+ is the same as a+. The dot is the exception: inside a character class it is a literal dot, because a wildcard would be pointless there. So [.]+ matches only a run of literal dots, while the wildcard form is simply .+
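A quick check of that difference, using `String.matches` (which requires the whole string to match); the value names are only illustrative:

```scala
// [.]+ matches only runs of literal dots
val literalDots = "[.]+"
// .+ matches one or more of any character (line terminators excepted)
val wildcard = ".+"

val dotsVsLiteral  = "...".matches(literalDots)        // true
val textVsLiteral  = "Tony Stark".matches(literalDots) // false
val textVsWildcard = "Tony Stark".matches(wildcard)    // true

// The pattern from the question therefore only accepts strings
// made of dots around a parenthesised run of dots.
val original = "[.]+\\(([.]+)\\)[.]+"
val dotsAndParens = "..(.)..".matches(original)                            // true
val sampleRow     = "Tony Stark (USA) 16th October 2015".matches(original) // false
```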

Since your sample text contains no literal dots, let alone several in a row, I guess you simply meant .+

".+\\((.+)\\).+"

But regexes can be used in several ways: s.replaceAll, s.matches, s.split and so on. Without knowing how you applied the pattern, I can't reason any further about why it failed.
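To illustrate how much the outcome depends on the method, here is a small sketch applying the corrected pattern in a few ways. The value names are only illustrative, and note that on the JVM it is `replaceAll`, not `replace`, that treats its first argument as a regex:

```scala
val s = "Tony Stark (USA) 16th October 2015"
val pattern = ".+\\((.+)\\).+"

// matches: the entire string has to fit the pattern
val wholeMatch = s.matches(pattern)

// replaceAll: the whole match is replaced by capture group 1
val country = s.replaceAll(pattern, "$1")

// split: here the regex describes the delimiters, not the content
val parts = s.split("[()]")

// Regex.findFirstMatchIn: gives access to the groups of the first match
val found = pattern.r.findFirstMatchIn(s).map(_.group(1))
```

With this input, `country` is "USA" and `found` is `Some("USA")`, while `parts` has three elements that still carry the surrounding spaces.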

1

The issue with your regex is that inside [], . is a literal dot, not a wildcard.

You're also missing groups around the name and the dates. The correct regex would be (.+)\\((.+)\\)(.+).

Calling the array a and the regex r, this gives:

scala> a.map {case r(name, country,year) => (name, country, year)}
res4: Array[(String, String, String)] = Array(("Tony Stark ",USA," 16th October 2015"), ("Peter Comb ",Canada," 21st September 2015"))

Presumably you'd also want to match the spaces outside the groups, so they don't end up in the captured values.
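A minimal sketch of that variant, assuming exactly one space on each side of the parenthesised country. It also uses `unzip3` to get the three separate arrays the question asks for; the names `rows`, `names`, `countries` and `dates` are only illustrative:

```scala
val a = Array(
  "Tony Stark (USA) 16th October 2015",
  "Peter Comb (Canada) 21st September 2015"
)

// The spaces around the parentheses sit outside the capture groups,
// so the captured values come out already trimmed.
val r = "(.+) \\((.+)\\) (.+)".r

// A Regex used as an extractor must match the whole string;
// a row that does not fit throws a MatchError here.
val rows = a.map { case r(name, country, date) => (name, country, date) }

// Turn the array of triples into three arrays
val (names, countries, dates) = rows.unzip3
```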

1 Comment

@holyland Thanks, it worked. While discovering new cases, I found that the (country) part is optional: there are also cases such as "Claudio Stallone 18th September 2009". I tried to extend the above like this: User_Info.map { case pattern(name, country, year) => (name, country, year) case _ => (name, "", year) } But it gives me: "missing argument list for method year in object functions. Unapplied methods are only converted to functions when a function type is expected."
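The error in this comment most likely arises because `name` and `year` are not bound in the `case _` branch, so `year` resolves to some method that happens to be in scope (the message's "object functions" suggests an imported `functions.year`, but that is a guess). One possible approach is to make the country group optional in the pattern itself. A minimal sketch, assuming the date always starts with a digit and the name never contains one; `userInfo`, `pattern` and `parsed` are illustrative names:

```scala
val userInfo = Array(
  "Tony Stark (USA) 16th October 2015",
  "Claudio Stallone 18th September 2009"
)

// (.+?)            name, as short as possible
// (?:\((.+?)\))?   optional "(country)"; only the inner part is captured
// (\d.+)           date, assumed to start with a digit
val pattern = """(.+?)\s*(?:\((.+?)\))?\s+(\d.+)""".r

val parsed = userInfo.map {
  // An optional group that did not take part in the match is null
  case pattern(name, country, date) => (name, Option(country).getOrElse(""), date)
  // Fallback for rows that do not fit at all: keep the raw text
  case other => (other, "", "")
}
```

In the fallback branch only the whole input string is available, which is why it binds `other` instead of referring to `name` or `year`.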
