I wrote the following regex:
val reg = ".+([A-Z_].+).(\\d{4})_(\\d{2})_(\\d{2})_(\\d{2})\\.orc".r
It is supposed to parse strings such as "S3//bucket//TS11_YREDED.2018_09_28_02.orc". The parse method is:
val dataExtraction: String => Map[String, String] = {
  string: String => {
    string match {
      case reg(filename, year, month, day) =>
        Map(FILE_NAME -> filename, YEAR -> year, MONTH -> month, DAY -> day)
      case _ => Map(FILE_NAME -> filename, YEAR -> "", MONTH -> "", DAY -> "")
    }
  }
}
val YEAR = "YEAR"
val MONTH = "MONTH"
val DAY = "DAY"
val FILE_NAME = "FILE_NAME"
But it doesn't work properly: it is supposed to omit the bucket name and extract the file name and the date,
so the expected output should be: Map(FILE_NAME -> TS11_YREDED, YEAR -> 2018, MONTH -> 09, DAY -> 28). Any idea how to fix it, please?
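
For reference, here is a minimal sketch of one possible fix, not necessarily the only one. Two things in the code above look suspicious: the regex has five capture groups while the pattern binds only four names (so the first case can never match), and the leading greedy `.+` leaves only the tail of the name ("ED" for the sample path) for the first group. The fallback case also refers to `filename`, which is not in scope there. The sketch assumes that file names consist of upper-case letters, digits and underscores, that the path always contains at least one slash, and that the trailing two-digit field (possibly an hour) can be ignored. The object name `OrcNameParser` is hypothetical.

```scala
object OrcNameParser {
  // Map keys, as in the question.
  val FILE_NAME = "FILE_NAME"
  val YEAR = "YEAR"
  val MONTH = "MONTH"
  val DAY = "DAY"

  // ".*/" greedily skips everything up to the last slash (bucket and prefix).
  // Group 1 is the file name (assumption: upper-case letters, digits, underscores);
  // groups 2 to 5 are year, month, day and the trailing two-digit field.
  val reg = """.*/([A-Z0-9_]+)\.(\d{4})_(\d{2})_(\d{2})_(\d{2})\.orc""".r

  val dataExtraction: String => Map[String, String] = {
    // The regex has five groups, so the pattern needs five binders;
    // "_" discards the last one.
    case reg(filename, year, month, day, _) =>
      Map(FILE_NAME -> filename, YEAR -> year, MONTH -> month, DAY -> day)
    // No match: nothing was captured, so every value is empty.
    case _ =>
      Map(FILE_NAME -> "", YEAR -> "", MONTH -> "", DAY -> "")
  }
}
```

With the sample path, `OrcNameParser.dataExtraction("S3//bucket//TS11_YREDED.2018_09_28_02.orc")` should give FILE_NAME -> TS11_YREDED, YEAR -> 2018, MONTH -> 09, DAY -> 28. Any string that does not fit the pattern falls through to the map of empty values.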