
I have a string that looks like an array of tuples:

val a = "((x1,x2),(y1,y2),(z1,z2))"

I want to parse this into a Scala array:

val arr = Array(("x1","x2"),("y1","y2"),("z1","z2"))

Is there a way to do this directly with an expr() equivalent? If not, how would one do it using split?

Note: x1, x2, x3, etc. are strings and can contain special characters, so the key would be to use the () delimiters to parse the data.

Code I put together from Dici's and Bogdan Vakulenko's answers:

val x2 = a.trim.split("[()]").grouped(2).map(x=>x(0).trim).toArray

val x3 = x2.drop(1) // first element is always "": split emits empty strings before the leading "(("

val jmap = new java.util.HashMap[String, String]()

for (i<-x3)
{
 val index = i.lastIndexOf(",")
 val fv = i.slice(0,index)
 val lv = i.substring(index+1).trim
 jmap.put(fv,lv)
}

This still breaks if the second string contains a ",".
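For comparison, here is a minimal sketch of the same idea written as a single pass that returns the `Array[(String, String)]` asked for instead of a `HashMap`. It assumes elements never contain `(` or `)` and, like the `lastIndexOf` loop, splits each group on its last comma, so a comma is tolerated in the first string but still not in the second. Matching each `( ... )` group directly also avoids the empty leading element, which comes from `split` seeing the two `(` characters at the start of the input. The names `ParsePairs` and `parse` are illustrative, not from the original.

```scala
object ParsePairs {
  // One innermost "( ... )" group; assumes elements never contain '(' or ')'.
  private val Group = """\(([^()]*)\)""".r

  // Splits each group on its LAST comma (same rule as the lastIndexOf loop),
  // so a comma may appear in the first element but not in the second.
  def parse(s: String): Array[(String, String)] =
    Group.findAllMatchIn(s).map { m =>
      val body = m.group(1)
      val idx  = body.lastIndexOf(',')
      require(idx >= 0, s"no comma in group: ($body)")
      (body.substring(0, idx).trim, body.substring(idx + 1).trim)
    }.toArray

  def main(args: Array[String]): Unit = {
    val a = "((x1,x2),(y1,y2),(z1,z2))"
    println(parse(a).mkString(", ")) // (x1,x2), (y1,y2), (z1,z2)
  }
}
```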

  • Is the second snippet an array of strings, or is x1 (for example) a variable? Also, I think not using split is an unnecessary constraint. Why don't you want to use it? Commented Dec 14, 2018 at 13:25
  • Spark has a useful expr() function that evaluates an expression directly. The string is already in the format we use to declare arrays in Scala, so splitting seems unnecessary if it can be avoided. Also, I assume a split-based solution would be long. Commented Dec 14, 2018 at 13:30
  • I didn't downvote. If you're using this with Spark, please say so in the question. Also, I still don't get your second snippet. Is x1 a variable or is it the string "x1"? Commented Dec 14, 2018 at 13:31
  • I didn't mention Spark because this is not a Dataset/DataFrame, just a plain string that is not part of any DataFrame. And yes, it's a string, not a variable. Commented Dec 14, 2018 at 13:33
  • But expr in Spark is a very different thing; it's just syntactic sugar for generating a SQL-like query. Here we're talking about pure Scala, so expr will be useless. You have to parse this string, and the most sensible way to do it is to use split. Commented Dec 14, 2018 at 13:35

2 Answers


Actually, I think regexes are the most convenient way to solve this.

val a = "((x1,x2),(y1,y2),(z1,z2))"
val regex = "(\\((\\w+),(\\w+)\\))".r
println(
  regex.findAllMatchIn(a)
       .map(matcher => (matcher.group(2), matcher.group(3)))
       .toList
)

Note that I made some assumptions about the format:

  • no whitespace in the string (the regex could easily be updated to handle it if needed)
  • always tuples of two elements, never more
  • empty string not valid as a tuple element
  • only word characters allowed, i.e. letters, digits and underscores, since that is what \w matches (this would also be easy to change)
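As a sketch of the relaxation mentioned in the first and last points, the pattern below tolerates whitespace around elements and accepts any character other than parentheses and commas inside an element, so hyphens, ampersands and inner spaces are fine. It still assumes two-element tuples and no commas inside an element. `RelaxedPairs` and `parse` are illustrative names.

```scala
object RelaxedPairs {
  // \s* absorbs whitespace around each element; the lazy +? keeps trailing
  // spaces out of the captured text. An element may hold any character
  // except '(', ')' and ','.
  private val Pair = """\(\s*([^(),]+?)\s*,\s*([^(),]+?)\s*\)""".r

  def parse(s: String): List[(String, String)] =
    Pair.findAllMatchIn(s).map(m => (m.group(1), m.group(2))).toList

  def main(args: Array[String]): Unit =
    println(parse("((x 1 , a-b),(y&1,y2),( z1,z2 ))"))
}
```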

7 Comments

Damn, I need to master regex :/ it's like a disability without it
It can be useful sometimes, just don't over-use it :D If you want, HackerRank has some regex problems to train on. Also use tools such as this one while training: regex101.com
I found that x1 and x2 may contain special characters. How would we edit this code to handle that?
What kind of special characters? What would be the input and desired output?
Hyphens, commas and ampersands. We can't get rid of them. I adapted your code to get something working; I don't know if it helps. I added the code to the question itself.
val a = "((x1,x2),(y1,y2),(z1,z2))"

// strip parentheses and spaces, split on commas, then pair up the pieces
a.replaceAll("[\\(\\) ]","")
 .split(",")
 .grouped(2) // non-overlapping pairs
 .map(x=>(x(0),x(1)))
 .toArray
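The choice between `sliding(2)` and `grouped(2)` matters here: `sliding(2)` advances one element at a time, so its windows overlap, while `grouped(2)` (equivalent to `sliding(2, 2)`) produces disjoint pairs. A small comparison on the split result; the object name is illustrative:

```scala
object SlidingVsGrouped {
  private val parts = "x1,x2,y1,y2,z1,z2".split(",")

  // sliding(2) moves one element at a time, so consecutive windows overlap.
  val overlapping: List[(String, String)] =
    parts.sliding(2).map(x => (x(0), x(1))).toList

  // grouped(2) (same as sliding(2, 2)) yields disjoint pairs.
  val pairs: List[(String, String)] =
    parts.grouped(2).map(x => (x(0), x(1))).toList

  def main(args: Array[String]): Unit = {
    println(overlapping) // List((x1,x2), (x2,y1), (y1,y2), (y2,z1), (z1,z2))
    println(pairs)       // List((x1,x2), (y1,y2), (z1,z2))
  }
}
```

Both versions assume an even number of pieces; with an odd count, the last group from `grouped(2)` has a single element and `x(1)` throws.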

5 Comments

Thanks, this is elegant as well.
How does one handle spaces and special characters with this if x1, x2, y1, y2 are strings containing spaces and hyphens?
Just add the characters you want to ignore to replaceAll. I've edited the answer and added a space there.
"[\\(\\) ]" is the place to add characters that should be replaced with the empty string.
Note: you don't need to escape parentheses inside square brackets (parentheses have no special meaning inside a character class).
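To illustrate that note: inside a character class, parentheses are literal, so both patterns below remove the same characters. A few characters do stay special inside `[...]` (`^` at the start, `-` between two characters, `]` and `\`), which matters if you add `-` to the class; placing it last keeps it literal. `CharClassDemo` is an illustrative name.

```scala
object CharClassDemo {
  val a = "((x1,x2),(y1,y2),(z1,z2))"

  // Parentheses need no escaping inside [...]: both remove '(', ')' and ' '.
  val escaped   = a.replaceAll("[\\(\\) ]", "")
  val unescaped = a.replaceAll("[() ]", "")

  // '-' between two characters forms a range: [a-c] removes a, b and c ...
  val range   = "a-b-c".replaceAll("[a-c]", "")
  // ... while a trailing '-' is literal: [ac-] removes a, c and '-'.
  val literal = "a-b-c".replaceAll("[ac-]", "")

  def main(args: Array[String]): Unit =
    println(Seq(escaped, unescaped, range, literal).mkString("\n"))
}
```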
