
I have the following string:

*H. NGUYEN1, J. SATZ2,3,4,5, R. TURK2,3,4,5, K. CAMPBELL2,3,4,5, S. MOORE1
1Pathology, 2Mol. Physiol. and Biophysics, 3Neurol., 4Intrnl. Med., Univ. of Iowa, Iowa City, IA; 5Howard Hughes Med. Inst., Iowa City, IA

The expected output is:

1)  *H. NGUYEN1, J. SATZ2,3,4,5, R. TURK2,3,4,5, K. CAMPBELL2,3,4,5, S. MOORE1
2)  1Pathology, 2Mol. Physiol. and Biophysics, 3Neurol., 4Intrnl. Med., Univ. of Iowa, Iowa City, IA; 5Howard Hughes Med. Inst., Iowa City, IA

The string above combines author names with their addresses.
Sometimes a semicolon (;) follows the last name, i.e. S. MOORE1; and sometimes it does not, i.e. S. MOORE1

I tried the regex below, but it does not give the expected results. Please help me, as I am new to regex.

;?[\d*]\w+

The pattern is:

A word followed by a digit, then a semicolon or a space, then a digit followed by words. For example: S. MOORE1(; or space)1Pathology. I need to split this into two lines: S. MOORE1 and 1Pathology

Thanks

  • What are the rules? It looks like you only want to number the lines. Commented Oct 5, 2012 at 17:36
  • You need to describe the pattern a little better if you want a useful answer. Commented Oct 5, 2012 at 17:37
  • The pattern is a name followed by a digit, then a semicolon or a space, then a digit followed by words. For example: S. MOORE1(; or space)1Pathology. I need to split this into two lines: S. MOORE1 and 1Pathology. Commented Oct 5, 2012 at 17:40
  • -1, sorry: I find this question very unclear. I don't understand what your input "set of string" is, I don't understand what your expected output is, and I don't understand what connection your described pattern has with the input and expected output. Judging from the other comments, I'm not alone. Commented Oct 5, 2012 at 17:53

3 Answers

1

Try this one:

(?<=\w\d)[; ](?=\d\w)

It matches a ; or a space that is preceded by a word character and a digit, and followed by a digit and a word character. Both checks are lookarounds, so only the separator itself is consumed.
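As a rough illustration of the pattern above, here is a minimal sketch using Python's re module. The choice of Python is an assumption (the question does not name a language), and the variable names are made up for the example. It joins the two sample lines with a single space or a single semicolon and splits them back apart.

```python
import re

# The answer's first pattern: exactly one ';' or space sitting between
# "<word char><digit>" and "<digit><word char>".
SEPARATOR = re.compile(r"(?<=\w\d)[; ](?=\d\w)")

# Sample data copied from the question (variable names are illustrative).
names = "*H. NGUYEN1, J. SATZ2,3,4,5, R. TURK2,3,4,5, K. CAMPBELL2,3,4,5, S. MOORE1"
addresses = (
    "1Pathology, 2Mol. Physiol. and Biophysics, 3Neurol., 4Intrnl. Med., "
    "Univ. of Iowa, Iowa City, IA; 5Howard Hughes Med. Inst., Iowa City, IA"
)

# Split when the two halves are joined by a single space or a single ';'.
split_on_space = SEPARATOR.split(names + " " + addresses)
split_on_semicolon = SEPARATOR.split(names + ";" + addresses)

# A ';' followed by a space is two separator characters, so this
# single-character pattern leaves the string in one piece.
split_on_both = SEPARATOR.split(names + "; " + addresses)

for parts in (split_on_space, split_on_semicolon, split_on_both):
    print(len(parts), parts)
```

The "; " case is the one this single-character version misses.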

Edit: this version also allows a comma before the final digit, a run of ; and space characters, and optional newline characters after them:

(?<=[\w,]\d)[; ]+[\r\n\f]*(?=\d\w)
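The edited pattern can be checked against the boundary fragments quoted in the comments on this answer. The sketch below again assumes Python's re module. It only shows how this one engine treats the pattern as written, and says nothing about Expresso or other tools.

```python
import re

# The edited pattern from the answer, unchanged.
SEPARATOR = re.compile(r"(?<=[\w,]\d)[; ]+[\r\n\f]*(?=\d\w)")

# Boundary fragments quoted in the comments on this answer.
samples = [
    "S. MOORE1 \r\n1Pathology,",
    "M. SASSOÈ-POGNETTO1; \r\n1Dept Anatomy",
    "MCCARTHY4,2,5 \r\n1Pharmacol.,",
]

results = [SEPARATOR.split(s) for s in samples]
for parts in results:
    print(parts)

# A bare line break with no ';' or space in front of it is not matched,
# because [; ]+ requires at least one of those characters.
unsplit = SEPARATOR.split("S. MOORE1\r\n1Pathology")
print(unsplit)
```

If the input can contain a line break with no preceding space or semicolon, the character class would need to be widened. That is a change to the pattern and is not part of the answer.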

You can also use Expresso to test regular expressions.


9 Comments

It's not splitting in Expresso.
What if there is a newline character after the space or (;), i.e. S. MOORE1 \r\n1Pathology,
Yes, it's working now, but it still fails for this case: M. SASSOÈ-POGNETTO1; \r\n1Dept Anatomy
And for this one as well: MCCARTHY4,2,5 \r\n1Pharmacol.,
You should specify these cases in your question. You're now adding , as a valid character and allowing ; AND space: (?<=[\w,]\d)[; ]+\r?\n?(?=\d\w)
0

Try this one:

(.*)S\. MOORE1;{0,1}(.*)

This captures two groups: the text before "S. MOORE1" and the text after it. The dot is escaped so that it matches only a literal period.

1 Comment

Thanks for the answer, but S. MOORE1 does not appear in every name-address combination.
0

I have read your description many times, but I don't find it clear.

My best guess is that you need to break the line before a word that starts with '1' and has a capital letter as its second character, which is as simple as:

1[A-Z]
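One way to apply this guess is sketched below in Python, which is an assumption since the question does not name a language. The helper name split_at_first_affiliation is hypothetical. It cuts the string at the first "1" followed by a capital letter and trims any trailing separator from the names part.

```python
import re

# The pattern from this answer: a '1' immediately followed by a capital letter.
FIRST_AFFILIATION = re.compile(r"1[A-Z]")

def split_at_first_affiliation(text):
    """Hypothetical helper: cut text at the first match of 1[A-Z].

    Returns (names, addresses), or (text, "") when nothing matches.
    """
    match = FIRST_AFFILIATION.search(text)
    if match is None:
        return text, ""
    # Drop the ';', space or line-break characters left before the cut.
    names = text[:match.start()].rstrip("; \r\n")
    return names, text[match.start():]

names, addresses = split_at_first_affiliation("K. CAMPBELL2,3,4,5, S. MOORE1; \r\n1Pathology, 2Mol. Physiol.")
print(names)
print(addresses)
```

This relies on the first affiliation being numbered 1 and starting with a capital letter. It would cut too early if an author label were directly followed by a capital letter.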

1 Comment

More specifically, I need to split the string on the pattern "digits followed by a space or semicolon followed by a digit". Could you help me now?
