Python string literal to regex object

Question

I have a function that returns the string "r'^A Plat'", which is stored in a text file.

def get_Pat(file):
    # processes the text file and returns the string "r'^A Plat'"
    ...

Originally, I had the pattern hard-coded:

pat = r'^A Plat'
use(pat)

Now I have:

pat = get_Pat(file)
use(pat)

But it's complaining, I suppose because it's a string instead of a regex object.

I have tried

re.escape(get_Pat(file))

and

re.compile(get_Pat(file))

but neither of them works.

How do I convert a string literal into a regex object?

Is r'^A Plat' equivalent to simply re.compile("A Plat")? Dumb question, maybe.

It would work if it were use("^A Plat").
It doesn't work if it's use("r'^A Plat'") <--- what get_Pat(file) is returning

I suppose my task is simply transforming the string r'^A Plat' into ^A Plat.
But I feel like that's just a cheap hack.

"Is r'^A Plat' a equivalent of simply re.compile("A Plat")?" No; and they have nothing to do with each other. The r prefix does not stand for "regex" or anything like that. There are multiple misconceptions here. — Karl Knechtel
Commented Aug 7, 2022 at 10:02

eyquem · Accepted Answer · 2013-08-29 12:43:29Z

Do

from ast import literal_eval
pat = literal_eval(get_Pat(file))

.
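A minimal sketch of that idea, assuming the text coming out of get_Pat(file) really is the source of a Python string literal such as r'^A Plat'. The helper name pattern_from_literal is made up for illustration; literal_eval and re.compile are the standard-library calls.

```python
import re
from ast import literal_eval


def pattern_from_literal(text):
    """Hypothetical helper: parse the source text of a string literal
    (e.g. "r'^A Plat'") and compile the resulting string as a regex."""
    value = literal_eval(text.strip())
    if not isinstance(value, str):
        # literal_eval also accepts numbers, lists, etc.; reject those here
        raise TypeError("expected a string literal, got %r" % (value,))
    return re.compile(value)


regex = pattern_from_literal("r'^A Plat'")
print(regex.pattern)  # ^A Plat
```

literal_eval only evaluates literals, so unlike eval() it will not run arbitrary code found in the file.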

EDIT

ealeon,

As you wrote in a comment that you can't import literal_eval(), my solution above is useless for you. Besides, though they give interesting information, the other answers didn't bring another solution.
So I propose a new one that doesn't use literal_eval().

import re

detect = re.compile("r(['\"])(.*?)\\1[ \t]*$")

with open('your_file.txt') as f:
    pat = f.readline()

if detect.match(pat):
    r = re.compile(detect.match(pat).group(2))
else:
    r = re.compile(pat)

.

Explanations:

.

Suppose the sequence of characters r'^Six o\'clock\nJim' is written as the first line of *your_file*.

Opening the file and reading its first line creates an object pat:
- its TYPE is <type 'str'> in Python 2 and <class 'str'> in Python 3
- its REPRESENTATION is "r'^Six o\'clock\nJim'"
- its VALUE is r'^Six o\'clock\nJim' , that is to say the sequence of characters r , ' , ^ , S , i , x , (space) , o , \ , ' , c , l , o , c , k , \ , n , J , i , m , '
There may also be the newline "character" \n at the end if there is a second line in the file. And there may also be blanks or tabs, who knows, between the end of r'^Six o\'clock\nJim' written in the file and the end of its line. That's why I end the regex pattern that defines detect with [ \t]*$.
So we may get additional blanks, tabs and a newline after the characters of interest, and if we then do print(tuple(pat)) we'll obtain for example:

('r', "'", '^', 'S', 'i', 'x', ' ', 'o', '\\', "'", 'c', 'l', 'o', 'c', 'k', '\\', 'n', 'J', 'i', 'm', "'", ' ', ' ', ' ', '\t', '\n')

.

Now, let us consider the object obtained with the expression detect.match(pat).group(2).
Its value is ^Six o\'clock\nJim , composed of 18 characters. \ and ' and n are three distinct characters among them: there is no single escaped character \' and no single escaped character \n in it.
This value is exactly the same as the one we would obtain for an object named rawS by writing the instruction rawS = r'^Six o\'clock\nJim'
So we can obtain the regex whose pattern is written in a file in the form r'....' by writing directly r = re.compile(detect.match(pat).group(2))
In my example, only the sequences \' and \n appear in the series of characters written in the file. But all of the above is valid for any of the escape sequences of the language.

In other words, we don't need a function that would do the same as the EXPRESSION r'^Six o\'clock\nJim' starting from the STRING "r'^Six o\'clock\nJim'" of value r'^Six o\'clock\nJim' :
we directly have the result of r'^Six o\'clock\nJim' as the value of the string captured by detect.match(pat).group(2).
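The claim can be checked with a short sketch. The variable line below stands in for the first line read from the file (trailing blanks, a tab and a newline included); that sample content is an assumption made for the demonstration.

```python
import re

detect = re.compile("r(['\"])(.*?)\\1[ \t]*$")

# Stand-in for f.readline(): the characters r'^Six o\'clock\nJim'
# followed by three blanks, a tab and a newline.
line = "r'^Six o\\'clock\\nJim'   \t\n"

inner = detect.match(line).group(2)

# Same value as the one produced by the raw literal itself
rawS = r'^Six o\'clock\nJim'
print(inner == rawS)  # True
print(len(inner))     # 18

# Used as a regex, \' matches a quote and \n matches a newline
regex = re.compile(inner)
print(regex.match("Six o'clock\nJim") is not None)  # True
```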

.

Nota Bene

In Python 2, the type <type 'str'> holds a limited repertoire of characters.
It is the type of the content read from a file, whether the file is opened with mode 'r' or with mode 'rb'.

In Python 3, the type <class 'str'> covers the Unicode characters.
But contrary to Python 2, the content read from a file opened with mode 'r' is of type <class 'str'>,
while it is of type <class 'bytes'> if the file is opened with mode 'rb'.

So I think the above code works in Python 3 as well as in Python 2, provided that the file is opened with mode 'r'.

If the file should be opened with 'rb' the regex pattern should be changed to b"r(['\"])(.*?)\\1[ \t]*\r?\n".
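A small Python 3 sketch of the bytes variant. The sample line is an assumption standing in for what readline() returns in 'rb' mode, and the name detect_bytes is only illustrative.

```python
import re

# Bytes pattern from the sentence above: it expects the line to end
# with a newline, optionally preceded by a carriage return.
detect_bytes = re.compile(b"r(['\"])(.*?)\\1[ \t]*\r?\n")

line = b"r'^A Plat'\r\n"  # hypothetical line read in 'rb' mode

inner = detect_bytes.match(line).group(2)
print(inner)  # b'^A Plat'

# A regex compiled from bytes can only be applied to bytes
regex = re.compile(inner)
print(regex.match(b"A Plat wins") is not None)  # True
```

Note that, as written, this pattern does not match a last line that lacks a trailing newline.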

.

AFAIHU

I think this is exactly what the OP wants. It would be a better answer if you explained why, though.
@SethMMorton Maybe you will be interested in the explanation in my edit

Community · Accepted Answer · 2020-06-20 09:12:55Z

2

r'^A Plat' is identical to '^A Plat' without the r. The r stands for raw, not regex. It lets you write strings with special characters like \ without having to escape them.

>>> r'^A Plat'
'^A Plat'
>>> r'/ is slash, \ is backslash'
'/ is slash, \\ is backslash'
>>> r'write \t for tab, \n for newline, \" for double quote'
'write \\t for tab, \\n for newline, \\" for double quote'

Raw strings are commonly used when writing regexes since regexes often contain backslashes that would otherwise need to be escaped. r does not create regex objects, though.

From the Python manual:

§ 2.4.1. String literals

String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences.

...

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
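A short sketch of the same points as plain checks (Python 3.7 or later is assumed for re.Pattern):

```python
import re

# Without backslashes, the r prefix changes nothing
print(r'^A Plat' == '^A Plat')  # True

# With backslashes, the raw literal keeps them as ordinary characters
print(len('\n'))   # 1 (a single newline character)
print(len(r'\n'))  # 2 (a backslash followed by the letter n)

# A raw literal is still a plain str; only re.compile builds a regex object
pat = r'^A Plat'
print(isinstance(pat, str))                     # True
print(isinstance(re.compile(pat), re.Pattern))  # True
```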

answered Aug 28, 2013 at 18:06 by John Kugelman
edited Jun 20, 2020 at 9:12 by Community Bot

2 Comments

Noelkd Over a year ago

If he had used re.compile(get_Pat(file)) and a raw string had been returned, it would have worked fine, so I don't think that is the problem here.

ealeon Over a year ago

Ah, I see. So I suppose re doesn't know how to process a string that has the raw prefix in it and simply treats it as part of the string.

bgporter · Accepted Answer · 2013-08-28 19:28:23Z

2

Not sure what you mean by 'none of them works', but re.compile() is what you're looking for:

>>> import re
>>> def getPat():
...     return r'^A Plat'
...
>>> getPat()
'^A Plat'
>>> reObj = re.compile(getPat())
>>> reObj
<_sre.SRE_Pattern object at 0x16cfa18>
>>> reObj.match("A Plat")
<_sre.SRE_Match object at 0x16c3058>
>>> reObj.match("foo")

edit:

You can get rid of the extra r' ' cruft after it's returned with this code:

>>> s = "r'^A Plat'"
>>> s = s[1:].strip("'")
>>> s
'^A Plat'
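One caveat with strip("'") is that it removes every leading and trailing single quote and ignores double-quoted forms. A slightly stricter sketch, using a made-up helper name strip_raw_wrapper, removes only one matching r'...' or r"..." wrapper:

```python
def strip_raw_wrapper(text):
    """Hypothetical helper: drop a single r'...' or r"..." wrapper.

    Text that does not look like a raw string literal is returned unchanged
    (apart from surrounding whitespace).
    """
    text = text.strip()
    if (len(text) >= 3 and text[0] in 'rR'
            and text[1] in '\'"' and text[-1] == text[1]):
        return text[2:-1]
    return text


print(strip_raw_wrapper("r'^A Plat'"))  # ^A Plat
print(strip_raw_wrapper("^A Plat"))     # ^A Plat

# A pattern that itself ends with a quote survives intact here,
# whereas s[1:].strip("'") would also remove that inner quote.
print(strip_raw_wrapper("r'it''"))      # it'
print("r'it''"[1:].strip("'"))          # it
```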

answered Aug 28, 2013 at 18:18 by bgporter
edited Aug 28, 2013 at 19:28

4 Comments

Noelkd Over a year ago

He wasn't returning r'^A Plat'; he was returning "r'^A Plat'", as far as I can understand from the comment in the code. As you have pointed out, compile works fine with a raw string.

bgporter Over a year ago

I guess the question is confusingly written.

Noelkd Over a year ago

I agree. I think it is a bit of an XY problem, with the raw string being a red herring. I could be wrong, as it's still ambiguous.

ealeon Over a year ago

The function is returning "r'^A Plat'", not just '^A Plat'.

Noelkd · Accepted Answer · 2013-08-28 19:13:23Z

1

According to the comment in your get_Pat function, it's returning:

"r'^A Plat'"

Which is not what you thought you were getting:

>>> import re
>>> x = re.compile("r'^A Plat'")
>>> y = "A Plat wins"
>>> x.findall(y)
[]
>>> x = re.compile("^A Plat")
>>> x.findall(y)
['A Plat']
>>>

So the regex you're using isn't r'^A Plat', it's "r'^A Plat'". r'^A Plat' is fine:

>>> x = re.compile(r'^A Plat')
>>> x.findall(y)
['A Plat']
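To see why the wrapped string finds nothing, here is a small sketch (default flags assumed): the pattern asks for a literal r and a literal quote first, and only then for the start of the string, which can no longer be true.

```python
import re

wrapped = re.compile("r'^A Plat'")

# Nothing in the intended target text matches
print(wrapped.search("A Plat wins"))  # None

# Even text that spells out the wrapper cannot match: after consuming
# r and ' the anchor ^ still demands to be at the start of the string.
print(wrapped.search("r'^A Plat'"))   # None
print(wrapped.search("r'A Plat'"))    # None
```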

To fix this I would have to understand how you were getting the string "r'^A Plat'" in the first place.

answered Aug 28, 2013 at 18:19 by Noelkd
edited Aug 28, 2013 at 19:13

5 Comments

ealeon Over a year ago

Yes. The problem is that my func is returning a string object containing r'^A Plat', so I suppose I could fix it by transforming that into "^A Plat".

Noelkd Over a year ago

If you have to use the string the function returns, you should have a look at eyquem's answer.

ealeon Over a year ago

do not have access to ast import.

Noelkd Over a year ago

It could help if you posted the code for get_Pat(); perhaps there is a simple way to fix your problem.

eyquem Over a year ago

@Noelkd I've edited my answer with what I think is a valid solution
