3

Here is how I reproduce the problem:

Create a log file called 'temp.log' and paste this line into it

DEBUG: packetReceived '\x61\x62\x63'

I want to have a script which will read the line from the log file and decode the binary string part ('\x61\x62\x63'). For the decoding, I am using struct, so:

struct.unpack('BBB', '\x61\x62\x63')

Should give me

(97, 98, 99)

Here is the script which I am using

import re
import struct
import sys

f = open(sys.argv[1], 'r')
for line in f:
    print line
    packet = re.compile(r"packetReceived \'(.*)\'").search(line).group(1)

    # packet is the 12-character string r'\x61\x62\x63'
    assert len(packet) == 12

    # this works ok (returns (97, 98, 99))
    struct.unpack('BBB', '\x61\x62\x63')

    # this fails because packet is the 12-character string '\\x61\\x62\\x63'
    struct.unpack('BBB', packet)

I run the script using temp.log as the argument to the script.

Hopefully the comments highlight my problem. How can I get the variable packet to be interpreted as '\x61\x62\x63'?

ASIDE: On the first edit of this question, I assumed that reading the line from the file was the same as this: line = "DEBUG: packetReceived '\x61\x62\x63'" which made packet == 'abc'

however it is actually the same as this (using rawstring) line = r"DEBUG: packetReceived '\x61\x62\x63'"
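
The aside can be checked with a short sketch. It is written in Python 3 syntax (the script above is Python 2), and the temporary file is only a stand-in for temp.log; the helper name read_packet_text is made up for this example.

```python
import os
import re
import tempfile

# The log line exactly as it appears in a text editor: backslash, x, 6, 1, ...
LOG_LINE = r"DEBUG: packetReceived '\x61\x62\x63'"

def read_packet_text(path):
    """Return the text between the quotes on the first line of the file."""
    with open(path, "r") as f:
        line = f.readline()
    return re.search(r"packetReceived '(.*)'", line).group(1)

# Write the line to a temporary file (a stand-in for temp.log) and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write(LOG_LINE + "\n")
    tmp_path = tmp.name

try:
    packet_from_file = read_packet_text(tmp_path)
finally:
    os.remove(tmp_path)

print(len(packet_from_file))  # 12: reading a file does not interpret escapes
print(len("\x61\x62\x63"))    # 3: a non-raw literal is interpreted when compiled
```

So the text read from the file behaves like the raw-string literal, not like the plain one.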

4
  • Are you sure you received twelve characters and not just three that were represented as twelve? Commented Jun 8, 2011 at 11:08
  • @johnsyweb There were 12 characters. The DEBUG statement which I attached is actually a copy/paste from the text file. Commented Jun 8, 2011 at 12:03
  • Have you sniffed what is being sent on-the-wire? Commented Jun 8, 2011 at 12:05
  • @johnsyweb: I haven't tried to sniff the packets because what I am really trying to do here is take a string representation of the packet from the log file and decode it. When I just copy/paste the string from the log file, decoding works fine. My real problem is reading the string representation into a variable and using this in the decoder. I updated the question after getting some useful tips from feedback. Commented Jun 8, 2011 at 12:39

4 Answers

5

Python doesn't interpret strings that you pass to regular expressions. The escape sequences were most likely interpreted earlier, when you defined variable line. This works correctly for example:

line = r"DEBUG: packetReceived '\x61\x62\x63'"
print re.compile(r"packetReceived '(.*)'").search(line).group(1)

It prints \x61\x62\x63.

3 Comments

OK, I see better now (I think). So the problem is actually when I do the "for line in f" part of the code. I need some way to make line not interpret the escape sequences.
@ephesian: File reading normally shouldn't interpret escape sequences either. You won't get around debugging your code (even if only with print statements) to find out exactly where that happens, because I cannot guess it.
Thanks. You are quite correct. I realised that I made a mistake trying to debug this by manually setting line (and not using rawstring). I have updated the question accordingly.
2
>>> re.compile(r"packetReceived '(.*)'").search(r"DEBUG: packetReceived '\x61\x62\x63'").group(1)
'\\x61\\x62\\x63'

Nope, that line is not where your problem lies.

1

As described in your question, packet is equal to r'\x61\x62\x63'. Its length is 12 characters, neither 15 nor 3.

What confuses you is that ipython (which I understand you are using) and the python interpreter display values using the repr() call, which tries to format values as they would appear in your code. Since backslashes are special in Python string constants, repr() displays them doubled, as they would be written in Python code.
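
A minimal sketch of that difference, in Python 3 syntax and assuming packet holds the twelve characters taken from the log file:

```python
# Twelve characters: each group is a backslash, an 'x' and two hex digits.
packet = r"\x61\x62\x63"

print(len(packet))   # 12
print(packet)        # \x61\x62\x63
print(repr(packet))  # '\\x61\\x62\\x63'
```

The doubled backslashes exist only in the repr() output; the string itself still contains three backslashes.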

This might be of help:

for char in packet:
    print("%5d %2s %2r" % (ord(char), char, char))

Count your characters and see how they are printed. The first column displays the ordinal value of the character, the second column has the character itself, and the third column has the repr of the character.

EDIT

Change the last line:

struct.unpack('BBB', packet)

to:

struct.unpack('BBB', packet.decode('string_escape'))
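
Note that the 'string_escape' codec exists only in Python 2. On Python 3, a rough equivalent can be sketched as below. It assumes the logged packet contains only ASCII text and \xNN-style escapes; decode_packet is a hypothetical helper name, not part of any library.

```python
import re
import struct

def decode_packet(line):
    """Extract the quoted escape text from a log line and return raw bytes."""
    text = re.search(r"packetReceived '(.*)'", line).group(1)
    # unicode_escape turns the four characters \x61 into the single character 'a';
    # latin-1 then maps each code point 0-255 to the byte with the same value.
    return text.encode("latin-1").decode("unicode_escape").encode("latin-1")

line = r"DEBUG: packetReceived '\x61\x62\x63'"
payload = decode_packet(line)
print(struct.unpack("BBB", payload))  # (97, 98, 99)
```

The round trip through latin-1 is what keeps values above 0x7f as single bytes, which matters because struct.unpack on Python 3 requires a bytes object rather than a str.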

3 Comments

thanks for the clarification on repr in ipython. I updated the question with a script. I'm hoping that someone can see what I'm doing wrong from this.
@ephesian: can you try again with my suggestion?
dude, thank you so much! This is a great solution! up until now, I didn't know anything about the codecs.
1

If you're sure you are receiving twelve characters and not just three represented as twelve, it may be just the printing of the string that is causing you grief.

Compare:

>>> print '\x61\x62\x63'
abc
>>> print r'\x61\x62\x63'
\x61\x62\x63

My 50c is on you actually receiving three characters and them being printed like this:

>>> print ''.join('\\x%02x' % ord(c) for c in 'abc')
\x61\x62\x63
