2

I am trying to use a regex to retrieve Title:Code pairs. My current pattern is:

(.*?\(CPT-.*?\)|.*?\(ICD-.*?\))

Data:

SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) RIGHT WRIST GANGLION CYST (ICD-727.41) S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)

I would like to capture:

  • SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18)
  • RIGHT WRIST GANGLION CYST (ICD-727.41)
  • S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)

What is the proper regex to use?

7
  • Your capture example #1 and #2 both include RIGHT WRIST, is this intentional? Commented Nov 14, 2013 at 18:46
  • @jmstoker: No, I don't think so, since the "hearing loss" is "bilateral" and not located on the "right wrist". Commented Nov 14, 2013 at 18:55
  • @jmstoker I agree with Casimir. See ICD-389.18 and ICD-727.41 Commented Nov 14, 2013 at 18:57
  • Fair enough, I'm not familiar with ICD codes, but looking at it further your comments make sense. Commented Nov 14, 2013 at 18:58
  • You have reassured us! Commented Nov 14, 2013 at 18:59

3 Answers

4

What about a pattern like this:

.*?\((CPT|ICD)-[A-Z0-9.]+\)

This will match zero or more of any character, non-greedily, followed by a (, followed by either CPT or ICD, followed by a hyphen, followed by one or more uppercase Latin letters, decimal digits, or periods, followed by a ).

Note that I picked [A-Z0-9.]+ because, to my understanding, all current ICD-9 codes, ICD-10 codes, and CPT codes conform to that pattern.

The C# code might look a bit like this:

// Requires: using System.Linq; using System.Text.RegularExpressions;
var result = Regex.Matches(input, @".*?\((CPT|ICD)-[A-Z0-9.]+\)")
                  .Cast<Match>()
                  .Select(m => m.Value);
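For reference, here is a minimal self-contained sketch that runs the pattern above against the sample data from the question. The class and method names (CodeExtractor, Extract) are only illustrative and not part of any library; the brackets in the output are there to make whitespace visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class CodeExtractor
{
    // Hypothetical helper: returns every "title (CPT-...)" or "title (ICD-...)" match.
    public static List<string> Extract(string input)
    {
        return Regex.Matches(input, @".*?\((CPT|ICD)-[A-Z0-9.]+\)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    static void Main()
    {
        // Sample input taken from the question.
        string input = "SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) " +
                       "RIGHT WRIST GANGLION CYST (ICD-727.41) " +
                       "S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)";

        // Brackets make any leading or trailing whitespace visible.
        foreach (var item in Extract(input))
            Console.WriteLine("[" + item + "]");
    }
}
```

On the sample data this should print three lines. The second and third start with a space, because the lazy .*? begins matching right where the previous match ended.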

If you want to avoid capturing any surrounding whitespace, you can simply trim the result strings (m => m.Value.Trim()), or ensure that the matched prefix starts with a non-whitespace character by putting a \S in front, like this:

var result = Regex.Matches(input, @"\S.*?\((CPT|ICD)-[A-Z0-9.]+\)")
                  .Cast<Match>()
                  .Select(m => m.Value);

Or use a negative lookahead instead. Unlike \S, it does not consume a character, so it also handles inputs like (ICD-100)(ICD-200), where a code has no title in front of it:

var result = Regex.Matches(input, @"(?!\s).*?\((CPT|ICD)-[A-Z0-9.]+\)")
                  .Cast<Match>()
                  .Select(m => m.Value);
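To make the difference between the three variants concrete, the sketch below runs each of them on two small made-up inputs: one with titles and spaces, and one with two adjacent codes. The class, constant, and method names are hypothetical labels chosen for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class VariantComparison
{
    // The three patterns discussed above.
    public const string Plain     = @".*?\((CPT|ICD)-[A-Z0-9.]+\)";
    public const string NonSpace  = @"\S.*?\((CPT|ICD)-[A-Z0-9.]+\)";
    public const string Lookahead = @"(?!\s).*?\((CPT|ICD)-[A-Z0-9.]+\)";

    // Returns all matched substrings for the given pattern and input.
    public static string[] Run(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        // Made-up inputs for illustration only.
        string spaced = "A (ICD-100) B (ICD-200)";
        string adjacent = "(ICD-100)(ICD-200)";

        foreach (var pattern in new[] { Plain, NonSpace, Lookahead })
        {
            Console.WriteLine(pattern);
            foreach (var input in new[] { spaced, adjacent })
            {
                // Brackets make leading whitespace visible.
                var shown = Run(pattern, input).Select(s => "[" + s + "]");
                Console.WriteLine("  " + string.Join(" | ", shown));
            }
        }
    }
}
```

On the spaced input, only the plain pattern keeps the leading space on the second match. On the adjacent input, the \S variant returns a single match covering both codes, while the plain and lookahead variants return the two codes separately.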

You can see a working demonstration here.


3 Comments

Your regex places a space in front of the 2nd and 3rd match. The space can be moved to the end by adding a [ ]* (space) to the end of your pattern. .*?\((CPT|ICD)-[A-Z0-9.]+\)[ ]*
@jmstoker That's a good point. But in case OP doesn't want to capture the surrounding whitespace at all, he'd need something a little different. I've provided a few alternatives.
+1 - I think it's a toss-up between runaway errant text and flexibility. Errant text will not return a partial result; flexibility might return too much, e.g. \(ICD-[^)]*\)

1

You can use the Regex.Split() method:

string input = "SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) RIGHT WRIST GANGLION CYST (ICD-727.41) S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)";
string pattern = @"(?<=\))\s*(?=[^\s(])";
string[] result = Regex.Split(input, pattern);
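As a rough sketch of how this split behaves, the program below wraps the same pattern in a helper (SplitDemo and SplitEntries are made-up names) and prints the pieces. The pattern splits after any closing parenthesis that is followed by optional whitespace and then a character that is neither whitespace nor an opening parenthesis, so the whitespace between entries is consumed by the split itself.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper around the split pattern shown above.
    public static string[] SplitEntries(string input)
    {
        return Regex.Split(input, @"(?<=\))\s*(?=[^\s(])");
    }

    static void Main()
    {
        // Sample input taken from the question.
        string input = "SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) " +
                       "RIGHT WRIST GANGLION CYST (ICD-727.41) " +
                       "S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)";

        // Brackets show that the pieces carry no surrounding whitespace.
        foreach (var entry in SplitEntries(input))
            Console.WriteLine("[" + entry + "]");
    }
}
```

One caveat, as far as I can tell: the split does not look at the code itself, so a parenthetical in the middle of a title, as in the made-up input HEARING LOSS (MILD) BILATERAL (ICD-389.18), would also be split after (MILD).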


0

Consider the following Regex...

.*?\d\)
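A small sketch of what this shorter pattern does may help when weighing it against the others. It ends a match at the first digit that is immediately followed by a closing parenthesis, without checking for CPT or ICD. The names below (DigitParenDemo, Extract) and the second input are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class DigitParenDemo
{
    // Hypothetical helper: every match of "anything up to a digit followed by )".
    public static string[] Extract(string input)
    {
        return Regex.Matches(input, @".*?\d\)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        // Sample input taken from the question.
        string sample = "SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) " +
                        "RIGHT WRIST GANGLION CYST (ICD-727.41) " +
                        "S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)";

        // Made-up input whose title contains a digit right before a ).
        string tricky = "CHRONIC KIDNEY DISEASE (STAGE 2) (ICD-585.2)";

        foreach (var input in new[] { sample, tricky })
        {
            // Brackets make leading whitespace visible.
            foreach (var item in Extract(input))
                Console.WriteLine("[" + item + "]");
            Console.WriteLine();
        }
    }
}
```

On the question's data this yields the three expected entries (the second and third with a leading space). On the made-up input the match ends early at (STAGE 2), leaving the code as a separate match, so this pattern seems best suited to data where titles never contain a digit directly before a closing parenthesis.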

Good Luck!
