How do I get the string between ~ and ^?

I have a string like this:

:::ABC???,:::DEF???

I need to get the strings between the delimiters with Python.

I want to do this because I am trying to extract text from an HTML page, like this example:

<td class="cell-1">
    <div><span class="value-frame">&nbsp;~ABC^,~DEF^</span></div>
</td>

2 Answers

1

It looks like you want ABC and DEF, so capture the text between the delimiters with a non-greedy group, (.*?):

import re
target = ' <td class="cell-1"><div><span class="value-frame">&nbsp;~ABC^,~DEF^</span></div></td>'
# Lazy group: capture as little as possible between '~' and the next '^'
matches = re.findall(r'~(.*?)\^', target)
print(matches)
# ['ABC', 'DEF']

You can learn more in the documentation for the re module.


1 Comment

What does (.*?) mean?
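
On the (.*?) question: the parentheses form a capture group, . matches any character except a newline, * repeats it zero or more times, and the trailing ? makes the repetition lazy, so it stops at the first ^ it can instead of the last one. A small sketch of the difference, reusing the sample string from the question (the variable names are only illustrative):

```python
import re

s = '&nbsp;~ABC^,~DEF^'

# Greedy: .* runs to the end, then backs up to the LAST '^'
greedy = re.findall(r'~(.*)\^', s)

# Lazy: .*? expands one character at a time and stops at the FIRST '^'
lazy = re.findall(r'~(.*?)\^', s)

# Alternative without a lazy quantifier: match anything that is not '^'
negated = re.findall(r'~([^^]*)\^', s)

print(greedy)   # ['ABC^,~DEF']
print(lazy)     # ['ABC', 'DEF']
print(negated)  # ['ABC', 'DEF']
```

The negated character class is a common alternative when the closing delimiter is a single character; unlike . it also matches across newlines.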
1

You can use the str.isalpha() method in a generator expression to keep only the letters, then combine them into a single string with join().

def extract_string(s):
    return ''.join(i for i in s if i.isalpha())

Sample output:

print(extract_string(':::ABC???,:::DEF???'))
# ABCDEF

However, that extracts every letter in the string and joins them together. If you only want the characters between ~ and ^:

import re
def extract_string(s):
    # [a-zA-Z], not [a-zA-z]: the range A-z also covers '[', '\', ']', '^', '_' and '`'
    match = re.findall(r"~([a-zA-Z]*)\^", s)
    return match

Sample output:

s = '&nbsp;~ABC^,~DEF^'
print(extract_string(s))
# ['ABC', 'DEF']
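
The same idea can be extended to the ::: and ??? delimiters from the question's first sample. This is only a sketch: the helper name between() and its parameters are made up for illustration, and it assumes the delimiters are plain literal strings, which is why they are passed through re.escape().

```python
import re


def between(s, start, end):
    """Return every substring of s that sits between start and end.

    Hypothetical helper: start and end are treated as literal text,
    so characters such as '?' or '^' are escaped before being used.
    """
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    return re.findall(pattern, s)


print(between(':::ABC???,:::DEF???', ':::', '???'))  # ['ABC', 'DEF']
print(between('&nbsp;~ABC^,~DEF^', '~', '^'))        # ['ABC', 'DEF']
```

Escaping matters here because ? and ^ are regex metacharacters; without it, '???' would not be a valid pattern.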

Just a side note: if you're parsing HTML with regex and/or string manipulation, then, as the famous S.O. reply suggests, please use an HTML parser, such as the Beautiful Soup library, instead :D
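
As a rough sketch of that advice, the snippet below uses the standard library's html.parser as a stand-in for Beautiful Soup, so it runs without installing anything. The class name ValueFrameText is made up for illustration, and it assumes the values always sit inside a span with the class value-frame, as in the question's example. The parser isolates the cell text first, and the regex is applied only to that text.

```python
import re
from html.parser import HTMLParser


class ValueFrameText(HTMLParser):
    """Collect the text inside <span class="value-frame"> elements."""

    def __init__(self):
        super().__init__()  # character references such as &nbsp; are decoded by default
        self.depth = 0      # greater than 0 while inside a matching span
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # nested span inside a matching one
        elif "value-frame" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


html = '<td class="cell-1"><div><span class="value-frame">&nbsp;~ABC^,~DEF^</span></div></td>'
parser = ValueFrameText()
parser.feed(html)
parser.close()

text = "".join(parser.chunks)
values = re.findall(r"~(.*?)\^", text)
print(values)  # ['ABC', 'DEF']
```

With Beautiful Soup the extraction step would be shorter, but the division of labour is the same: let the parser find the element, then use the regex on its text only.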
