
I am trying to extract substrings in Python from data returned by a function.

Consider the following:

I have a Python script that runs continuously. It binds to a USB port and listens for incoming ZigBee data frames.

I have a function that disassembles each data frame:

# Decoded data frame to use later on
def decodeReceivedFrame(data):
    source_addr_long = toHex(data['source_addr_long'])
    source_addr = toHex(data['source_addr'])
    id = data['id']
    rf_data = data['rf_data']
    #samples = data['samples']
    return [source_addr_long, source_addr, id, rf_data]

When I print the result of this function later on, it gives me the correct incoming values. For example:

decodedData = decodeReceivedFrame(data)
print decodedData

Output:

[None, None, 'rx', '<=>\x80\x02#387239766#XBEE#126#STR:wm2 #STR:c47cb3f57365#']

What I want to do is extract the two STR values from this string, that is, the wm2 string and the c47cb3f57365 string, into two separate variables.

Which function in Python would be the most efficient way to do this?

  • is it always in the same format? Commented Jan 19, 2015 at 18:56
  • Unfortunately no. The two STR entries can be different at will. However, the displaying format will be the same. Commented Jan 19, 2015 at 19:03
  • are they always numbers and letters? Commented Jan 19, 2015 at 19:03
  • To clarify: yes. The first STR will always be wm[number] and the second STR shows the MAC address of an embedded XBee chip Commented Jan 19, 2015 at 19:04
  • I added an answer based on the strings always being in the same format, it is as efficient as you are likely to get Commented Jan 19, 2015 at 19:12

2 Answers


Presuming the data is always in the format discussed in the comments, this would be one of the most efficient ways:

s = '<=>\x80\x02#387239766#XBEE#126#STR:wm2 #STR:c47cb3f57365#'
# rsplit with 2 as the second arg splits at most twice on "#STR:", starting from the end of s
spl = s.rsplit("#STR:", 2)
# get the last two elements from spl
a, b = spl[1], spl[2][:-1]  # [:-1] removes the final #
print a, b
# Output: wm2  c47cb3f57365

Some timings using ipython and timeit:

In [6]: timeit  re.findall(r'STR:(\w+)', s)
1000000 loops, best of 3: 1.67 µs per loop

In [7]: %%timeit
   ...: spl = s.rsplit("#STR:",2)
   ...: a,b = spl[1],spl[2][:-1]
   ...: 
1000000 loops, best of 3: 409 ns per loop

If you use a regex, you should compile the pattern first:

import re

patt = re.compile(r'STR:(\w+)')
patt.findall(s)

Which improves the efficiency:

In [6]: timeit patt.findall(s)
1000000 loops, best of 3: 945 ns per loop
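
As a minimal sketch of how the rsplit approach could be applied to the list returned by decodeReceivedFrame: the helper name extract_str_fields is hypothetical, and the code assumes rf_data is a plain string sitting at index 3 of the list, as in the printed output in the question. It also strips the trailing space that follows wm2 in the sample frame.

```python
def extract_str_fields(rf_data):
    """Return the last two '#STR:' fields of rf_data with whitespace stripped.

    Hypothetical helper; assumes rf_data is a str ending in '#'.
    """
    # Split at most twice on "#STR:", starting from the end of the string.
    parts = rf_data.rsplit("#STR:", 2)
    if len(parts) != 3:
        raise ValueError("expected two '#STR:' fields in %r" % (rf_data,))
    name = parts[1].strip()            # 'wm2 ' becomes 'wm2'
    mac = parts[2].rstrip("#").strip() # drop the final '#'
    return name, mac


# Sample list copied from the question's printed output.
decoded = [None, None, 'rx',
           '<=>\x80\x02#387239766#XBEE#126#STR:wm2 #STR:c47cb3f57365#']
name, mac = extract_str_fields(decoded[3])
```

Raising ValueError on a malformed frame is a design choice made for this sketch; returning None instead may suit a long-running listener better.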

14 Comments

Seems messy just dumping the content of your interactive session with hardly any comment. Which one was actually better? Looks like the s.rsplit("#STR:",2) took less than 1/4th as long, to me, but you don't really make that clear in your answer.
@ArtOfWarfare, the timings are literally there to see, how can it be any more transparent?
What's the gibberish on lines 2 through 5 of your answer? Why do you have IPython prompts mixed into your answer? Why is timeit mixed in with the code doing the actual work? You've made 6 edits to this post already - it seems to me you must know it can be better. How much was the speed up when you started compiling the regex?
Thanks for the very interesting approach! However, I get the following error applying your theory; spl = decodedData.rsplit("#STR:",2) AttributeError: 'list' object has no attribute 'rsplit'
@MichaelP, then in the context of your list, consider s to be element 3 of your list so decodeReceivedFrame(data)[3]
|
>>> import re    
>>> re.findall(r'STR:(\w+)', '<=>\x80\x02#387239766#XBEE#126#STR:wm2 #STR:c47cb3f57365#')
['wm2', 'c47cb3f57365']
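
If the format described in the question's comments holds (the first field is always wm followed by a number, the second a 12-digit hexadecimal MAC address), a stricter precompiled pattern could also reject malformed frames. This is only a sketch under that assumption; FRAME_RE and parse_frame are hypothetical names, not part of the original answer.

```python
import re

# Assumed layout: "#STR:wm<digits>", optional whitespace, "#STR:<12 hex digits>#".
FRAME_RE = re.compile(r'#STR:(wm\d+)\s*#STR:([0-9A-Fa-f]{12})#')


def parse_frame(rf_data):
    """Return (name, mac) if rf_data matches the assumed layout, else None."""
    match = FRAME_RE.search(rf_data)
    if match is None:
        return None
    return match.group(1), match.group(2)


sample = '<=>\x80\x02#387239766#XBEE#126#STR:wm2 #STR:c47cb3f57365#'
result = parse_frame(sample)
```

Unlike the looser STR:(\w+) pattern, this version returns None for a frame whose fields do not look like a wm identifier and a MAC address, at the cost of being tied to that exact layout.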
