
I am writing a program in Python and have some questions (I am 100% new to Python):

import re

rawData = '7I+8I-7I-9I-8I-'

print(len(rawData))

# Raw strings keep the backslash for the regex engine instead of Python.
rawData = re.sub(r"[0-9]I\+", "", rawData)
rawData = re.sub(r"[0-9]I-", "", rawData)

print(rawData)

  1. How can I merge the two regexes into one using |? That is, a single regex operation should remove both 9I- and 9I+.
  2. Does len(rawData) return the length of rawData in bytes?

Thank you.

  • It's as simple as "[0-9]I[+-]". Commented Jul 1, 2011 at 15:59
  • In Python 2.x, rawData would be just a sequence of bytes, but in Python 3 it would be Unicode text. Commented Jul 1, 2011 at 16:14
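A minimal sketch of the character-class version suggested in the first comment, run against the string from the question. It should behave the same on Python 2.6+ and Python 3; the names raw_data and cleaned are illustrative only.

```python
import re

raw_data = '7I+8I-7I-9I-8I-'

# [+-] is a character class: it matches one '+' or one '-'.
# Inside a class, '+' needs no escape, and '-' is literal when placed last.
cleaned = re.sub(r'[0-9]I[+-]', '', raw_data)

print(len(raw_data))  # 15
print(repr(cleaned))  # '' (all five tokens were removed)
```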

3 Answers


See the difference:

$ python3
Python 3.1.3 (r313:86834, May 20 2011, 06:10:42) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> len('día')   # Unicode text
3
>>> 

$ python
Python 2.7.1 (r271:86832, May 20 2011, 17:19:04) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> len('día')   # bytes
4
>>> len(u'día')  # Unicode text
3
>>>


$ python3
Python 3.1.3 (r313:86834, May 20 2011, 06:10:42) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> len(b'día')
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
>>> len(b'dia')
3
>>> 
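Because a Python 3 bytes literal only accepts ASCII characters, non-ASCII bytes have to be written as escapes. A small Python 3 sketch, assuming UTF-8 is the encoding you want (b'd\xc3\xada' is 'día' encoded as UTF-8):

```python
# Non-ASCII bytes are spelled as \x escapes inside a bytes literal.
raw = b'd\xc3\xada'          # UTF-8 bytes for 'día'
text = raw.decode('utf-8')   # back to a str

print(len(raw))                     # 4 (bytes)
print(len(text))                    # 3 (characters)
print(text.encode('utf-8') == raw)  # True
```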

3 Comments

Wow, I never knew that :( I am using Python 2.6.5 on the server.
Then how do you get the length in bytes in Python 3? Is there a universal way to get the length of a string in bytes in both Python 2 and Python 3?
In Python 3, if you want bytes, you must use a bytes object, e.g. the literal b'bytes'.
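One possible answer to the byte-length question in these comments, sketched under two assumptions: the text is held as a Unicode string, and UTF-8 is the encoding whose size you care about. Encoding the text and measuring the result works in Python 2 and in Python 3.3+ (the u'' prefix is a syntax error in Python 3.0 to 3.2). utf8_byte_length is a hypothetical helper name.

```python
def utf8_byte_length(text):
    """Return how many bytes `text` takes up when encoded as UTF-8."""
    # Assumes `text` is a Unicode string (unicode in Python 2, str in Python 3).
    return len(text.encode('utf-8'))


word = u'd\u00eda'  # u'día', written with an escape so the source stays ASCII

print(len(word))               # 3 (characters)
print(utf8_byte_length(word))  # 4 (UTF-8 bytes)
```

A different encoding gives a different byte count, so the encoding has to be chosen explicitly; a byte string in Python 2 has no single "length in characters" until you decide how to decode it.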

Why don't you take a different approach, with the replace method?
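A sketch of what a replace-based approach could look like, assuming every token is exactly one digit, an 'I', and a '+' or '-'. strip_tokens_with_replace is a hypothetical helper name, not a library function.

```python
def strip_tokens_with_replace(data):
    """Remove every '<digit>I+' and '<digit>I-' token using str.replace only."""
    for digit in '0123456789':
        for sign in '+-':
            data = data.replace(digit + 'I' + sign, '')
    return data


print(repr(strip_tokens_with_replace('7I+8I-7I-9I-8I-')))  # ''
```

One caveat of chaining replace calls: a removal can join characters into a new token that a single re.sub pass would leave alone. For '87I+I-', re.sub(r'[0-9]I[+-]', '', s) returns '8I-', whereas the loop above later removes the '8I-' that appears once '7I+' is deleted, and returns ''.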



len returns the number of characters when applied to a Unicode string (this is nuanced; other answers flesh that out more), the number of bytes in an encoded (byte) string, the number of items in a list or set, or the number of keys in a dictionary.
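A short Python 3 illustration of what len() counts for a few built-in types; the values are arbitrary examples.

```python
# len() counts whatever unit the container is made of.
text = 'd\u00eda'           # 'día' as a str
items = [1, 2, 2]
unique = {1, 2, 2}          # a set keeps one copy of each item
mapping = {'a': 1, 'b': 2}

print(len(text))     # 3 characters
print(len(items))    # 3 items
print(len(unique))   # 2 items
print(len(mapping))  # 2 keys
```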

rawData = re.sub(r"[0-9]I(\+|-)", "", rawData)
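A variant of the alternation pattern above, sketched with a non-capturing group (?:...) because the captured sign is never used, and with the pattern compiled once so it can be reused. TOKEN and sample are illustrative names; the sample string is made up to show what is left untouched.

```python
import re

# '|' inside the group chooses between an escaped '+' and a literal '-'.
TOKEN = re.compile(r'[0-9]I(?:\+|-)')

sample = '7I+x8I-y9I*'
print(TOKEN.sub('', sample))  # xy9I*
```

For a single character, the class [+-] and the alternation (?:\+|-) match the same thing; the class is shorter, while | generalizes to multi-character alternatives.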

1 Comment

@MRAB I've read it. As F.C. put it, len('día') is 3 in Python 3, while it is 4 in Python 2.x. I use Python 3 primarily. By "encoded" I did not mean Unicode (I used the wrong word there); I was referring to strings such as '\x03'. While that may appear to have a length of 4, its len is only 1, because the escape sequence stands for a single character.
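A Python 3 sketch of the '\x03' point: in a normal string literal the escape sequence produces one character, while a raw string keeps all four characters as typed.

```python
escaped = '\x03'   # escape sequence: one control character (U+0003)
literal = r'\x03'  # raw string: backslash, 'x', '0', '3'

print(len(escaped))  # 1
print(len(literal))  # 4
```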
