1

I am using Python 3.4 on Windows 7. I have an Excel sheet with data in every cell. The data comes in different forms. Two examples: "Qwert A_B_C_1 uiop" and "Qwert A_X_Y_Z uiop".

To sum up, I have to extract the keywords that are written in caps, where an underscore comes right after the first word. Extraction should stop as soon as whitespace is encountered.

I have tried something like this:

import re

x = "QWERT A_B_C_1 UIOP"
se = re.findall("[A-Z]+_[A-Z]+_[A-Z]+_[0-9A-Z]+", x)

But it does not work for keywords with a different number of underscore-separated parts.

5
  • What is the expected output? Commented Feb 11, 2015 at 8:41
  • It works; it should print A_B_C_1. What's the problem? Commented Feb 11, 2015 at 8:42
  • @Maroun Maroun - What if I don't know how many words and underscores come after A_? How can I read the entire keyword until whitespace is encountered? Commented Feb 11, 2015 at 8:47
  • @vks The expected output is the keyword written in caps, starting with A_. Commented Feb 11, 2015 at 8:49
  • Answer added. Commented Feb 11, 2015 at 8:54

2 Answers

1
[A-Z]+(?:_[A-Z]+)*_[A-Z0-9]+

You can use this to match a variable number of _[A-Z]+ groups in between.

import re
p = re.compile(r'[A-Z]+(?:_[A-Z]+)*_[A-Z0-9]+')
test_str = "QWERT A_B_C_1 UIOP\nQwert A_X_Y_Z uiop"

print(re.findall(p, test_str))  # ['A_B_C_1', 'A_X_Y_Z']
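
A minimal sketch of how a pattern like this could be applied cell by cell, assuming the cell values have already been read into a list of strings. The `cells` list and the `extract_keywords` helper are hypothetical names, not part of the question. The pattern here is a variation, not the one above: it allows digits in any group after the first word, and the lookarounds `(?<!\S)` and `(?!\S)` are an added assumption that the keyword must be a whole whitespace-delimited token.

```python
import re

# Uppercase word, then one or more "_" + uppercase/digit groups.
# Assumption: (?<!\S) and (?!\S) force the match to be a whole
# whitespace-delimited token, so "xA_B" is not picked up as "A_B".
KEYWORD = re.compile(r'(?<!\S)[A-Z]+(?:_[A-Z0-9]+)+(?!\S)')

def extract_keywords(text):
    """Return every keyword-like token found in one cell's text."""
    return KEYWORD.findall(text)

# Hypothetical cell values standing in for the Excel data.
cells = ["Qwert A_B_C_1 uiop", "Qwert A_X_Y_Z uiop", "no keyword here"]
results = [extract_keywords(cell) for cell in cells]
print(results)  # [['A_B_C_1'], ['A_X_Y_Z'], []]
```

Because the pattern has no capturing groups, `findall` returns the full matched tokens rather than pieces of them.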

0

I explored more options and came up with:

lst = re.findall(r'\S+_\S+', test_str)

Works as expected.

1 Comment

This will also match something like !@#!@#_!@#####!@#. It won't match A_B_C_D; it will give that as a list of broken elements.
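
The behaviour of the two patterns can be checked directly. In this sketch the `samples` string is made up for illustration. `\S+_\S+` accepts any whitespace-delimited token that contains an underscore, so the punctuation and lowercase tokens come back as well. A_B_C_D itself is returned whole by both patterns, since neither contains a capturing group.

```python
import re

# Made-up sample text: one real keyword plus two tokens that merely
# contain an underscore.
samples = "Qwert A_B_C_D uiop !@#!@#_!@#####!@# lower_case"

# Pattern from the first answer: uppercase groups joined by underscores.
strict = re.findall(r'[A-Z]+(?:_[A-Z]+)*_[A-Z0-9]+', samples)

# Pattern from the second answer: any non-whitespace run with an underscore.
loose = re.findall(r'\S+_\S+', samples)

print(strict)  # ['A_B_C_D']
print(loose)   # ['A_B_C_D', '!@#!@#_!@#####!@#', 'lower_case']
```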
