I am trying to get the total price from a receipt with Regex.

The formatting is:

TOTAL     15.40

The goal is only to get the price out of the string.

I started with TOTAL[ .0-9], but this only returned the word TOTAL.

I googled around and put this one together, but I can't get it to work:

TOTAL(\\s+)(?<value>[.0-9]+)

I have made the following code:

sRegex = "TOTAL(\\s+)(?<value>[.0-9]+)";    
Match match = Regex.Match(this.sHTMLResult, sRegex, RegexOptions.None);
if (match.Success)
    Console.Out.WriteLine("regex good");
else
    Console.Out.WriteLine("regex fail");

But the regex doesn't return a success.

I am trying to extract it from an HTML file formatted like this:

TOTAL&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;15.40
  • Works for me. I get a match with the capture group containing 15.40. Check your inputs. Commented Mar 12, 2013 at 10:15
  • Good point. I tested with only the text and it works, but I am trying to extract it from an HTML file formatted like this: TOTAL&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;15.40. The regex probably doesn't treat &nbsp; as a space. Commented Mar 12, 2013 at 10:21
  • txt2re.com should be your new friend. The generated regex isn't perfect, but it gives you a good starting point. Commented Mar 12, 2013 at 10:21
  • There you go. Don't use regex to parse HTML, or at least convert the HTML to regular text beforehand. Commented Mar 12, 2013 at 10:22
  • @GerardvandenBosch - &nbsp; is not a space, obviously. Commented Mar 12, 2013 at 10:24

5 Answers

1

Your initial regular expression works fine with the supplied text:

TOTAL(\\s+)(?<value>[.0-9]+)

However, as you indicated in the comments, the text comes from HTML and contains character entities for non-breaking spaces, so you need to account for those as well:

TOTAL(\\s+|(&nbsp;)+)(?<value>[.0-9]+)
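As a minimal sketch of how that pattern could be used, the class below runs it against the sample line from the question and reads the named group. The class name NbspRegexDemo and the helper ExtractTotal are made up for illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class NbspRegexDemo
{
    // Pattern from the answer: either a run of whitespace or a run of
    // literal "&nbsp;" entities between the label and the amount.
    const string Pattern = "TOTAL(\\s+|(&nbsp;)+)(?<value>[.0-9]+)";

    // Hypothetical helper: returns the captured amount, or null if nothing matched.
    public static string ExtractTotal(string input)
    {
        Match match = Regex.Match(input, Pattern, RegexOptions.None);
        return match.Success ? match.Groups["value"].Value : null;
    }

    static void Main()
    {
        // Sample line from the question.
        string html = "TOTAL&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;15.40";
        Console.WriteLine(ExtractTotal(html) ?? "no match");
    }
}
```

Note that the alternation accepts whitespace or entities, but not a mix of the two, so a line such as "TOTAL&nbsp; 15.40" would not match with this pattern.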

1 Comment

Thank you for the example. I had already taken your suggestion from the comments and converted the HTML to plain text before applying the regex, and that works great.
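The comment does not say how the conversion was done. One possible way, sketched here as an assumption, is to decode the entities with System.Net.WebUtility.HtmlDecode and then apply the original pattern unchanged. This works because &nbsp; decodes to U+00A0, which .NET's \s matches. The class and method names are hypothetical, and this only decodes entities; it does not strip tags.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class DecodeFirstDemo
{
    // Hypothetical helper: decode entities, then use the pattern from the question.
    public static string ExtractTotal(string html)
    {
        // "&nbsp;" becomes U+00A0 (no-break space), which \s matches in .NET.
        string text = WebUtility.HtmlDecode(html);
        Match match = Regex.Match(text, "TOTAL(\\s+)(?<value>[.0-9]+)");
        return match.Success ? match.Groups["value"].Value : null;
    }

    static void Main()
    {
        string html = "TOTAL&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;15.40";
        Console.WriteLine(ExtractTotal(html) ?? "no match");
    }
}
```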
1

You might use:

@"TOTAL *(\d*\.\d*)"

1 Comment

In .NET, \d matches any Unicode decimal digit, not only [0-9] (so Arabic-Indic digits will match, for example), unless RegexOptions.ECMAScript is specified.
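A small sketch of the difference, using two Arabic-Indic digits as the test input. The class name DigitClassDemo and the helper IsAllDigits are hypothetical; the behavior shown relies on \d matching the Unicode decimal-digit category by default and being restricted to [0-9] under RegexOptions.ECMAScript.

```csharp
using System;
using System.Text.RegularExpressions;

static class DigitClassDemo
{
    // Hypothetical helper: true when the whole string consists of \d characters.
    public static bool IsAllDigits(string s, RegexOptions options)
    {
        return Regex.IsMatch(s, @"^\d+$", options);
    }

    static void Main()
    {
        string arabicIndic = "\u0661\u0665"; // Arabic-Indic digits one and five

        Console.WriteLine(IsAllDigits(arabicIndic, RegexOptions.None));       // True
        Console.WriteLine(IsAllDigits(arabicIndic, RegexOptions.ECMAScript)); // False
        Console.WriteLine(Regex.IsMatch(arabicIndic, "^[0-9]+$"));            // False
    }
}
```

If only ASCII digits should be accepted in a price, [0-9] is the more explicit choice.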
1

(?(\b.*\b\s)([0-9.]*[0-9])) should work.

I recommend the Regex Hero online editor, which I have found really helpful.

1 Comment

Why is this better than the regex from the OP?
0

Your regex works (check your input as suggested), but it has a small bug: it would capture any combination of digits and dots (like 333.3.2.22....). A better one would be:

TOTAL\s+(?<value>\d+\.\d+)
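To illustrate the difference, this sketch runs both patterns against the malformed example and against the receipt line from the question. The class name StrictAmountDemo and the helper Capture are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class StrictAmountDemo
{
    public const string Loose = @"TOTAL\s+(?<value>[.0-9]+)";
    public const string Strict = @"TOTAL\s+(?<value>\d+\.\d+)";

    // Hypothetical helper: returns the "value" group, or null when there is no match.
    public static string Capture(string pattern, string input)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Groups["value"].Value : null;
    }

    static void Main()
    {
        string odd = "TOTAL 333.3.2.22....";

        Console.WriteLine(Capture(Loose, odd));                // 333.3.2.22....
        Console.WriteLine(Capture(Strict, odd));               // 333.3
        Console.WriteLine(Capture(Strict, "TOTAL     15.40")); // 15.40
    }
}
```

The stricter pattern does not reject the malformed input; it captures only the leading well-formed part, "333.3". Rejecting such input outright would need an additional check on what follows the amount.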

-2

If there is only a single space between TOTAL and the amount, you can use a literal space in the regex. For example:

sRegex = @"TOTAL ([0-9]+\.[0-9]+)";

See here for the MSDN reference.

1 Comment

You will note that the regex posted by the OP is fine. If that isn't matching, why would your match any better?
