In my project I have to parse a set of dynamic strings that contain numbers, dates, and other information. I tried writing a parser with regular expressions. It works, but not all the time. Can someone suggest a better solution? Below is a sample string:

"Thank you for using your HDFC Bank Debit/ATM Card ending 4444 for Rs. 125.25 towards ATM WDL in T NAGAR CAP at ATM on 2012-04-16:17:33:03."

From this I want to extract data like:

bank name = hdfc
card no = 4444
amount = 125.25
category = atm
date = 2012-04-16:17:33:03
  • Isn't there any possibility of getting the data as JSON or XML, which is actually the right way to do it? Parsing free text like this is simply not a reliable solution, in my opinion. Commented Aug 1, 2012 at 7:26
  • @AndroSelva That is just a string. Unfortunately there is no way to get it as XML or JSON. :( Commented Aug 1, 2012 at 7:29
  • If you don't have control over the way the data comes in, then there's no real other way that I can think of other than using a 'reliable' regular expression. What regular expression do you have and what is it breaking on? Commented Aug 1, 2012 at 7:32
  • I have used too many loops and conditional statements, which increases the processing time. I am looking for a way to reduce that. Commented Aug 1, 2012 at 7:34
  • If all the responses come in the same format, you should use a regex for this. Commented Aug 1, 2012 at 7:41

1 Answer

Solving this with regular expressions alone, especially when the exact content of the string is dynamic, won't work very well. What you need is a tokenizer (lexical analyzer) and a parser with a grammar. I haven't done something like this in Java, but first of all you need to break down your string into tokens (keywords, values, expressions, phrases, etc.) like

"Thank you for using your HDFC Bank Debit/ATM Card ending 4444 for Rs. 125.25 towards ATM WDL in T NAGAR CAP at ATM on 2012-04-16:17:33:03."

phrase[Thank you for using your] 
stringconst[HDFC Bank]
phrase[ending]
numericconst[4444]
keyword[for]
stringconst[Rs.]
numericconst[125.25]
....
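A token list of this kind could be produced roughly as follows. This is only a minimal sketch: the class name `SmsTokenizer` and the token types `DATETIME`, `CURRENCY`, `NUMBER`, and `WORD` are assumptions made for illustration (they are simpler than the phrase/stringconst/keyword categories listed above), and the patterns are tuned to the single sample string, not to every message a bank might send.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer: splits a message into typed tokens.
class SmsTokenizer {
    // Token types; order matters because it is also the matching priority.
    enum Type { DATETIME, CURRENCY, NUMBER, WORD }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "[" + text + "]"; }
    }

    // One alternative (named group) per token type. More specific patterns
    // come first so a timestamp is not split into separate numbers.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<DATETIME>\\d{4}-\\d{2}-\\d{2}:\\d{2}:\\d{2}:\\d{2})"
        + "|(?<CURRENCY>Rs\\.)"
        + "|(?<NUMBER>\\d+(?:\\.\\d+)?)"
        + "|(?<WORD>[A-Za-z]+(?:/[A-Za-z]+)*)");

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() skips anything that matches no token (spaces, punctuation).
        while (m.find()) {
            for (Type type : Type.values()) {
                if (m.group(type.name()) != null) {
                    tokens.add(new Token(type, m.group()));
                    break;
                }
            }
        }
        return tokens;
    }
}
```

Calling `SmsTokenizer.tokenize(...)` on the sample string yields a flat list such as `WORD[Thank]`, `WORD[you]`, ..., `NUMBER[4444]`, `CURRENCY[Rs.]`, `NUMBER[125.25]`, ..., `DATETIME[2012-04-16:17:33:03]`. At this stage nothing is known about which number is the card and which is the amount.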

You can do so by defining tokens, giving them convenient names, and defining rules for them, e.g. with regular expressions. The focus is on what you have, not what it means. Afterwards you need a grammar, as regular expressions won't help you understand what it means:

sentence  ::= intro bankinfo cardinfo valueinfo categoryinfo timeinfo
intro     ::= phrase
bankinfo  ::= bankname phrase | phrase bankname
bankname  ::= stringconst
....

This basically gives you a tree of rules.

By tokenizing your input string and applying your grammar, you should be able to analyze the string and find the parts that are of interest.
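One way this could look in Java is a small hand-written top-down parser that walks the token list rule by rule. The sketch below is illustrative only: the class `SmsParser`, its helper methods, and the concrete rules in the comments (for example that the bank name is everything before the word "Bank", or that the category ends at the word "in") are assumptions derived from the single sample string, not a general grammar for bank messages.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser: tokenizes the message, then applies fixed rules.
class SmsParser {
    private static final String[] TYPES = {"DATETIME", "CURRENCY", "NUMBER", "WORD"};
    private static final Pattern TOKEN = Pattern.compile(
          "(?<DATETIME>\\d{4}-\\d{2}-\\d{2}:\\d{2}:\\d{2}:\\d{2})"
        + "|(?<CURRENCY>Rs\\.)"
        + "|(?<NUMBER>\\d+(?:\\.\\d+)?)"
        + "|(?<WORD>[A-Za-z]+(?:/[A-Za-z]+)*)");

    private final List<String[]> tokens = new ArrayList<>(); // {type, text}
    private int pos = 0;

    SmsParser(String input) {
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    tokens.add(new String[] {type, m.group()});
                    break;
                }
            }
        }
    }

    // Consumes the next token if it has the given type, otherwise fails.
    private String expect(String type) {
        if (pos >= tokens.size() || !tokens.get(pos)[0].equals(type)) {
            throw new IllegalArgumentException("Expected " + type + " at token " + pos);
        }
        return tokens.get(pos++)[1];
    }

    // Consumes one WORD token and checks that it is the given word.
    private void expectWord(String word) {
        String text = expect("WORD");
        if (!text.equalsIgnoreCase(word)) {
            throw new IllegalArgumentException("Expected '" + word + "' but found '" + text + "'");
        }
    }

    // Collects WORD tokens up to the stop word, which is consumed but not returned.
    private String wordsUntil(String stop) {
        StringBuilder sb = new StringBuilder();
        while (true) {
            String word = expect("WORD");
            if (word.equalsIgnoreCase(stop)) return sb.toString();
            if (sb.length() > 0) sb.append(' ');
            sb.append(word);
        }
    }

    // sentence ::= intro bankinfo cardinfo valueinfo categoryinfo timeinfo
    Map<String, String> parseSentence() {
        Map<String, String> result = new LinkedHashMap<>();
        // intro ::= "Thank" "you" "for" "using" "your"
        for (String w : new String[] {"Thank", "you", "for", "using", "your"}) expectWord(w);
        // bankinfo ::= bankname "Bank"
        result.put("bank", wordsUntil("Bank"));
        // cardinfo ::= WORD* "ending" NUMBER
        wordsUntil("ending");
        result.put("card", expect("NUMBER"));
        // valueinfo ::= "for" CURRENCY NUMBER
        expectWord("for");
        expect("CURRENCY");
        result.put("amount", expect("NUMBER"));
        // categoryinfo ::= "towards" WORD+ "in"
        expectWord("towards");
        result.put("category", wordsUntil("in"));
        // timeinfo ::= WORD* "on" DATETIME  (the location words are skipped)
        wordsUntil("on");
        result.put("date", expect("DATETIME"));
        return result;
    }
}
```

For the sample string, `new SmsParser(text).parseSentence()` returns a map with bank `HDFC`, card `4444`, amount `125.25`, category `ATM WDL`, and date `2012-04-16:17:33:03`. A message that deviates from the assumed rules throws an `IllegalArgumentException` naming the token where parsing stopped, which makes it easier to see which rule needs an alternative than with one large regular expression.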

Unfortunately this is only a theoretical introduction to this quite complex but very interesting topic and I cannot provide any code examples, but I hope this helps to get started.
