I have a few strings that I want to tokenize.

For example:

123ae4rf468 should be split into [123, ae4rf, 468]
878768stb4hgbjh354 should be split into [878768, stb4hgbjh, 354]

I tried the code below, but it did not work. Please help.

  def groupStrings(): Unit = {
    val pattern: Regex = "\"[^A-Z0-9]+|(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])\"".r
    for (patternMatch <- pattern.findAllMatchIn("12341abc1234"))
      println(patternMatch.groupCount)
  }

2 Answers

2

You can use this pattern:

(^\d+)(.*?)(?<=[a-z])(\d+)$
  • (^\d+) - Matches one or more digits at the start of the string
  • (.*?) - Lazily matches any characters except newline, zero or more times
  • (?<=[a-z])(\d+)$ - Matches the digits at the end of the string, but only if the positive lookbehind finds a lowercase letter immediately before them
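
As a rough illustration of how the three groups described above could be read out in Scala, here is a minimal sketch. The object name `TokenizeWithGroups` and the helper `tokenize` are hypothetical names chosen for this example. It relies on the fact that a Scala `Regex` used as an extractor in a `match` must match the entire input, which fits this pattern because it is anchored at both ends.

```scala
import scala.util.matching.Regex

object TokenizeWithGroups {
  // Pattern from the answer: leading digits, a lazy middle part,
  // and trailing digits that must be preceded by a lowercase letter.
  val pattern: Regex = """(^\d+)(.*?)(?<=[a-z])(\d+)$""".r

  // Hypothetical helper: returns the three captured parts,
  // or None when the input does not have the expected shape.
  def tokenize(s: String): Option[List[String]] = s match {
    case pattern(head, middle, tail) => Some(List(head, middle, tail))
    case _                           => None
  }

  def main(args: Array[String]): Unit = {
    println(tokenize("123ae4rf468"))        // Some(List(123, ae4rf, 468))
    println(tokenize("878768stb4hgbjh354")) // Some(List(878768, stb4hgbjh, 354))
    println(tokenize("abc"))                // None: no leading digits
  }
}
```

Returning an `Option` keeps inputs that do not start and end with digits from throwing a `MatchError`.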

As a side note: if you don't need the capture groups, you can simplify the pattern to this:

^\d+.*?(?<=[a-z])\d+$
0

Try one of these patterns:
(\d+|\D+)
or (\D+(?:\d*\D)*|\d+)
or (\D+(?:\d*\D+)?|\d+)
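
A minimal sketch of how these patterns might be applied in Scala, assuming they are meant to be matched repeatedly over the input with `findAllIn`. The object and method names are hypothetical. Note that the first pattern splits at every digit/non-digit boundary, so it breaks `ae4rf` apart, while the second keeps digits that sit between non-digits inside the middle token.

```scala
import scala.util.matching.Regex

object TokenizeByScanning {
  // First pattern from the answer: alternating runs of digits and non-digits.
  val everyBoundary: Regex = """(\d+|\D+)""".r

  // Second pattern from the answer: a non-digit run may absorb
  // digits as long as another non-digit follows them.
  val keepInnerDigits: Regex = """(\D+(?:\d*\D)*|\d+)""".r

  // Hypothetical helper: collect all successive matches as tokens.
  def tokenize(pattern: Regex, s: String): List[String] =
    pattern.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    println(tokenize(everyBoundary, "123ae4rf468"))          // List(123, ae, 4, rf, 468)
    println(tokenize(keepInnerDigits, "123ae4rf468"))        // List(123, ae4rf, 468)
    println(tokenize(keepInnerDigits, "878768stb4hgbjh354")) // List(878768, stb4hgbjh, 354)
  }
}
```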
