I have a text which can look like this:

36] Smarandache F. (Editor), Proceedings of the First International Conference on Neutrosophics, Univ. of New Mexico, Gallup Campus, NM, USA, 1-3 Dec. 2001, Xiquan, Phoenix, 2002

I want to extract:

Proceedings of the First International Conference on Neutrosophics

I tried the following regex pattern:

conference = re.search(",(.*)conference(.*),", str(r.lower()))

but I only get this as output: Proceedings of the First International

The text will vary, but it will always contain a word like conference.

My question is: how can I write a pattern that finds the word conference in the text and extracts the substring from the first comma preceding that word to the first comma following it?

, xxxxxxxxxxxxxxxxxx conference xxxxxxxxxxxxxxxxxxx ,

Any help would be appreciated.

  • Did the result contain a comma in the middle? Commented Aug 7, 2020 at 10:20
  • regex101.com/r/x1Cicp/1 You are matching it. See Group 1, Group 2, and Full match on the right. Use a named capturing group if you want to extract something specific, or work with the unnamed groups. Commented Aug 7, 2020 at 10:23
  • I get the result in groups, without the word conference. It is like splitting on the word conference, and the result runs to the end of the line, which is not what I want. Commented Aug 7, 2020 at 10:24
  • Change the regex I linked from ,(.*)conference(.*), to ,(.*conference.*), and you'll see a group that has exactly what you want. You might want to use ,(.*?conference.*?), to get non-greedy (lazy) matches. Commented Aug 7, 2020 at 10:25
  • Ok let me try Tin Commented Aug 7, 2020 at 10:26

1 Answer

You could use a negated character class, [^,], which matches any character except a comma, on both sides of the word Conference, and wrap the whole match between the two commas in a single capturing group. Because [^,] cannot cross a comma, the group stops at the nearest comma on each side.

You could match Conference starting with a capital C to get the result, or make the pattern case-insensitive using re.IGNORECASE.

If you use r.lower(), you convert the string to lowercase, and the output will be this instead:

proceedings of the first international conference on neutrosophics


,\s*([^,]*\bConference\b[^,]*),

Regex demo

Example code:

import re
r = "36] Smarandache F. (Editor), Proceedings of the First International Conference on Neutrosophics, Univ. of New Mexico, Gallup Campus, NM, USA, 1-3 Dec. 2001, Xiquan, Phoenix, 2002"

conference = re.search(r",\s*([^,]*\bConference\b[^,]*),", r)
if conference:
    print(conference.group(1))

Output

Proceedings of the First International Conference on Neutrosophics
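
As a rough sketch of the re.IGNORECASE option mentioned above, the same pattern can be compiled case-insensitively so the original casing of the text is kept in the result. The helper name extract_conference and the variable name text are only illustrative and are not part of the question.

```python
import re

# Same pattern as above, with a lowercase keyword and re.IGNORECASE so that
# "Conference", "conference" and "CONFERENCE" are all accepted.
CONFERENCE_RE = re.compile(r",\s*([^,]*\bconference\b[^,]*),", re.IGNORECASE)


def extract_conference(text):
    """Return the comma-delimited field containing 'conference', or None.

    extract_conference is a hypothetical helper name used for illustration.
    """
    match = CONFERENCE_RE.search(text)
    # group(1) is the part between the two commas, without the leading space.
    return match.group(1) if match else None


text = (
    "36] Smarandache F. (Editor), Proceedings of the First International "
    "Conference on Neutrosophics, Univ. of New Mexico, Gallup Campus, NM, "
    "USA, 1-3 Dec. 2001, Xiquan, Phoenix, 2002"
)

print(extract_conference(text))
```

With this variant there is no need to call r.lower() first, so the extracted title keeps its capital letters. Note that the pattern still requires a comma on both sides of the field, so a title at the very start or end of the string would not match.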

2 Comments

Thanks for the reply. I see it working in the regex demo, but in my Python interactive terminal the result is empty.
@JaskaranSingh I have added example code; you have to get the value from group 1.
