
I have a String which contains column names and datatypes as below:

val cdt = "header:integer|releaseNumber:numeric|amountCredit:numeric|lastUpdatedBy:numeric(15,10)|orderNumber:numeric(20,0)"

My requirement is to convert the postgres datatypes which are present as numeric, numeric(15,10) into spark-sql compatible datatypes. In this case,

numeric         -> decimal(38,30)
numeric(15,10)  -> decimal(15,10)
numeric(20,0)   -> bigint   (This is an integral datatype, since its scale is zero.)

In order to access the datatype in the string: cdt, I split it and created a Seq from it.

val dt = cdt.split("\\|").toSeq

Now I have a Seq of elements in which each element is a String in the below format:

Seq("header:integer", "releaseNumber:numeric","amountCredit:numeric","lastUpdatedBy:numeric(15,10)","orderNumber:numeric(20,0)")

I have the pattern-matching regex """numeric\(\d+,(\d+)\)""".r for numeric(precision, scale), which only works when the scale has two digits, e.g. numeric(20,23). I am very new to regex and Scala, and I don't understand how to create regex patterns for the remaining two cases and apply them to a string to match a condition. I tried it the way shown below, but it gives me a compilation error: "Cannot resolve symbol findFirstMatchIn".

dt.map(e => e.split("\\:")).map(e => changeDataType(e(0), e(1)))

def changeDataType(colName: String, cd: String): String = {
  val finalColumns = new String
  val pattern1 = """numeric\(\d+,(\d+)\)""".r
  cd match {
    case pattern1.findFirstMatchIn(dt) =>
  }
}
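
From what I have read so far, the Regex value itself can be used as an extractor in the match (rather than calling findFirstMatchIn), roughly like the sketch below, but I don't see how to extend it to cover all three numeric cases:

val pattern1 = """numeric\(\d+,(\d+)\)""".r

// The regex value itself is the pattern; its capture group binds to `scale`.
"numeric(15,10)" match {
  case pattern1(scale) => s"matched, scale = $scale"
  case other           => s"no match for $other"
}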

I am trying to get the final output into a String as below:

header:integer|releaseNumber:decimal(38,30)|amountCredit:decimal(38,30)|lastUpdatedBy:decimal(15,10)|orderNumber:bigint

How do I use multiple regex patterns for the different cases, apply pattern matching to the datatype of each value in the Seq, and change it to the suitable datatype as mentioned above?

Could anyone let me know how I can achieve this?

  • In your example, where does (38,30) come from? And what is the format that changeDataType() returns? Commented Feb 9, 2019 at 9:05
  • @jwvh, the (38,30) is coming out of the Spark dataframe. When I read the table from postgres, Spark infers the schema with its compatible datatypes, but the corresponding datatype on postgres is just numeric. If I try to save the dataframe into a Hive table, it gives me the exception "precision 39 exceeds max limit 38," but if I read the value directly as (38,30) it passes the content through properly. Regarding changeDataType(), I've updated the question. Kindly take a look at it now. Commented Feb 9, 2019 at 9:44

1 Answer


It can be done with a single regex pattern, but some testing of the match results is required.

val numericRE = raw"([^:]+):numeric(?:\((\d+),(\d+)\))?".r

cdt.split("\\|")
   .map{
     case numericRE(col,a,b) =>
       if (Option(b).isEmpty) s"$col:decimal(38,30)"
       else if (b == "0")     s"$col:bigint"
       else                   s"$col:decimal($a,$b)"
     case x => x  //pass-through
  }.mkString("|")

//res0: String = header:integer|releaseNumber:decimal(38,30)|amountCredit:decimal(38,30)|lastUpdatedBy:decimal(15,10)|orderNumber:bigint

Of course it can be done with three different regex patterns, but I think this is pretty clear.
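
For illustration only, a rough sketch of what that three-pattern variant might look like (the names plainRE, intRE and scaledRE are mine, not from the answer above), assuming the same input string cdt:

// One regex per numeric form; order matters, since scaledRE would also match a zero scale.
val plainRE  = raw"([^:]+):numeric".r              // numeric with no precision/scale
val intRE    = raw"([^:]+):numeric\((\d+),0\)".r   // zero scale -> integral type
val scaledRE = raw"([^:]+):numeric\((\d+),(\d+)\)".r

cdt.split("\\|")
   .map{
     case intRE(col, _)       => s"$col:bigint"
     case scaledRE(col, a, b) => s"$col:decimal($a,$b)"
     case plainRE(col)        => s"$col:decimal(38,30)"
     case x                   => x  //pass-through
   }.mkString("|")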


Explanation

  • raw - a raw interpolated string, so we don't need as many escape characters - \
  • ([^:]+) - capture everything up to the 1st colon
  • :numeric - followed by the string ":numeric"
  • (?: - start a non-capture group
  • \((\d+),(\d+)\) - capture the two digit strings (precision and scale), separated by a comma, inside parentheses
  • )? - the non-capture group is optional
  • numericRE(col,a,b) - col is the 1st capture group, a and b are the digit captures, but they are inside the optional non-capture group so they might be null (see the quick sketch below)
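
To illustrate that last point, a quick sketch (reusing the answer's numericRE) of how the inner captures come back as null when the optional group is absent, which is why the answer wraps b in Option:

val numericRE = raw"([^:]+):numeric(?:\((\d+),(\d+)\))?".r

// Optional group absent: the captures inside it are bound to null.
"releaseNumber:numeric" match {
  case numericRE(col, a, b) => (col, a, b)   // ("releaseNumber", null, null)
}

// Optional group present: the captures hold the digit strings.
"lastUpdatedBy:numeric(15,10)" match {
  case numericRE(col, a, b) => (col, a, b)   // ("lastUpdatedBy", "15", "10")
}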

1 Comment

Awesome, this worked. Could you please explain how the expression raw"([^:]+):numeric(?:\((\d+),(\d+)\))?" maps the data?
