
I have a function that returns the string "r'^A Plat'", which is stored in a text file:

def get_Pat(file):
    # processes the text file and returns "r'^A Plat'"
    ...

Originally, I had the pattern hard-coded:

pat = r'^A Plat'
use(pat)

Now I have:

pat = get_Pat(file)
use(pat)

But it complains, I suppose because it is getting a string instead of a regex object.

I have tried

re.escape(get_Pat(file))

and

re.compile(get_Pat(file))

but neither of them works.

How do I convert a string literal into a regex object?

Is r'^A Plat' equivalent to simply re.compile("A Plat")? Dumb question, maybe.

It works with use("^A Plat").
It doesn't work with use("r'^A Plat'") <--- what get_Pat(file) is spitting out

I suppose my task is simply to transform the string r'^A Plat' into ^A Plat.
But I feel like that's just a cheap hack.

2
  • @aelon See my edit please. I think it gives the solution Commented Aug 29, 2013 at 12:44
  • "Is r'^A Plat' a equivalent of simply re.compile("A Plat")?" No; and they have nothing to do with each other. The r prefix does not stand for "regex" or anything like that. There are multiple misconceptions here. Commented Aug 7, 2022 at 10:02

4 Answers

2

Do

from ast import literal_eval
pat = literal_eval(get_Pat(file))
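
To make the effect visible, here is a minimal sketch. It assumes that get_Pat(file) returns exactly the text "r'^A Plat'"; the variable returned is a hypothetical stand-in for that return value.

```python
import re
from ast import literal_eval

# Hypothetical stand-in for what get_Pat(file) returns: the *text* of a raw
# string literal, including the r prefix and the quotes.
returned = "r'^A Plat'"

# literal_eval parses the text as a Python literal and returns its value,
# here the 7-character string ^A Plat.
pat = literal_eval(returned)
regex = re.compile(pat)

print(repr(pat))
print(bool(regex.match("A Plat wins")))
```

literal_eval only accepts Python literals, so it should be safer than eval() on text coming from a file, but it raises an exception if the line is not a valid literal.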


EDIT

aelon,

As you wrote in a comment that you can't import literal_eval(), my solution above is useless to you. Besides, although the other answers give interesting information, they didn't bring another solution.
So I propose a new one that doesn't use literal_eval().

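import re

# Matches a line of the form r'...' or r"...", optionally followed by blanks/tabs
detect = re.compile("r(['\"])(.*?)\\1[ \t]*$")

with open('your_file.txt') as f:
    pat = f.readline()

m = detect.match(pat)
if m:
    # group(2) is the text between the quotes, i.e. the pattern itself
    r = re.compile(m.group(2))
else:
    # no r'...' wrapper: use the line as the pattern, minus its trailing newline
    r = re.compile(pat.rstrip('\n'))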


Explanations:


Suppose the succession of characters r'^Six o\'clock\nJim' is written as the first line of *your_file*.

Opening the file and reading its first line creates an object pat:
- its TYPE is <type 'str'> in Python 2 and <class 'str'> in Python 3
- its REPRESENTATION, as given by repr(), is "r'^Six o\\'clock\\nJim'" (repr() doubles each backslash)
- its VALUE is r'^Six o\'clock\nJim' , that is to say the succession of characters r , ' , ^ , S , i , x , (space) , o , \ , ' , c , l , o , c , k , \ , n , J , i , m , '

There may also be the "character" \n at the end if the file has a second line, and there may be blanks or tabs between the end of r'^Six o\'clock\nJim' and the end of its line. That's why the regex pattern that defines detect ends with [ \t]*$.
With such additional blanks, tabs and a newline after the characters of interest, print(tuple(pat)) would show, for example:

('r', "'", '^', 'S', 'i', 'x', ' ', 'o', '\\', "'", 'c', 'l', 'o', 'c', 'k', '\\', 'n', 'J', 'i', 'm', "'", ' ', ' ', ' ', '\t', '\n')


Now, consider the object obtained with the expression detect.match(pat).group(2).
Its value is ^Six o\'clock\nJim , composed of 18 characters. \ and ' and n are three distinct characters among them: the value does not contain an escaped character \' or an escaped character \n.
This value is exactly the one we would obtain for an object named rawS by writing the instruction rawS = r'^Six o\'clock\nJim'
So we can obtain the regex whose pattern is written in a file in the form r'....' by writing directly r = re.compile(detect.match(pat).group(2))
My example contains only the sequences \' and \n in the series of characters written in the file, but all of the above holds for any of the language's escape sequences.

In other words, we don't need a function that would do the same as the EXPRESSION r'^Six o\'clock\nJim' starting from the STRING "r'^Six o\'clock\nJim'" of value r'^Six o\'clock\nJim' :
the string captured by detect.match(pat).group(2) already has the value that r'^Six o\'clock\nJim' produces.
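
The claim can be checked with a short sketch. The variable line is made-up sample data standing for the first line read from the file (the literal's text followed by blanks, a tab and a newline); raw_s plays the role of rawS.

```python
import re

detect = re.compile("r(['\"])(.*?)\\1[ \t]*$")

# The line as it would be read from the file. The doubled backslashes are
# only needed here because this sample is written as an ordinary literal.
line = "r'^Six o\\'clock\\nJim'   \t\n"

captured = detect.match(line).group(2)
raw_s = r'^Six o\'clock\nJim'

print(captured == raw_s)   # True
print(len(captured))       # 18

# The captured text can be compiled directly as the pattern.
regex = re.compile(captured)
print(bool(regex.match("Six o'clock\nJim")))
```

In the compiled pattern, the regex engine itself interprets \' as a quote and \n as a newline, which is what a pattern written as a raw literal relies on.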


Nota Bene

In Python 2, the type <type 'str'> is the type of a limited repertoire of characters.
It is the type of the content read from a file, whether the file is opened with mode 'r' or with mode 'rb'.

In Python 3, the type <class 'str'> covers the Unicode characters.
But contrary to Python 2, the content read from a file opened with mode 'r' is of type <class 'str'>
while it is of type <class 'bytes'> if the file is opened with mode 'rb'.

So I think the above code works as well in Python 3 as in Python 2, provided the file is opened with mode 'r'.

If the file has to be opened with 'rb', the regex pattern should be changed to b"r(['\"])(.*?)\\1[ \t]*\r?\n".
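
A minimal sketch of the 'rb' variant, assuming Python 3 and a pattern made of ASCII characters only. The temporary file and the names detect_bytes, first_line and pattern_text are illustrative, not part of the original code.

```python
import os
import re
import tempfile

detect_bytes = re.compile(b"r(['\"])(.*?)\\1[ \t]*\r?\n")

# Write a throwaway sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b"r'^A Plat'  \r\nsecond line\r\n")
    path = tmp.name

with open(path, 'rb') as f:
    first_line = f.readline()      # a bytes object in Python 3
os.remove(path)

m = detect_bytes.match(first_line)
# Decode the captured bytes to get a str pattern usable on str input.
pattern_text = m.group(2).decode('ascii')
regex = re.compile(pattern_text)

print(pattern_text)
print(bool(regex.match("A Plat wins")))
```

Note that this bytes pattern requires a newline after the literal, so it would not match a last line that lacks one.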


AFAIHU


3 Comments

I think this is exactly what the OP wants. It would be a better answer if you explained why, though.
@SethMMorton Maybe you will be interested by the explanation in my edit
I wish I could +1 again.
2

r'^A Plat' is identical to '^A Plat' without the r. The r stands for raw, not regex. It lets you write strings with special characters like \ without having to escape them.

>>> r'^A Plat'
'^A Plat'
>>> r'/ is slash, \ is backslash'
'/ is slash, \\ is backslash'
>>> r'write \t for tab, \n for newline, \" for double quote'
'write \\t for tab, \\n for newline, \\" for double quote'

Raw strings are commonly used when writing regexes since regexes often contain backslashes that would otherwise need to be escaped. r does not create regex objects, though.
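
A short sketch of the distinction, runnable as a script. The names raw_pat, plain_pat and regex are only illustrative.

```python
import re

raw_pat = r'^A Plat'
plain_pat = '^A Plat'

# The r prefix only affects how the source text becomes a str.
print(type(raw_pat) is str)        # True
print(raw_pat == plain_pat)        # True: no backslashes, so the same value
print(len(r'\n'), len('\n'))       # 2 1
print(r'\d+' == '\\d+')            # True

# A regex object only exists once re.compile() has been called.
regex = re.compile(raw_pat)
print(isinstance(regex, str))      # False
print(bool(regex.match("A Plat wins")))
```

The prefix exists only in source code. Once the characters r and ' are stored inside a string, as in "r'^A Plat'", they are ordinary characters of that string.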

From the Python manual:

§ 2.4.1. String literals

String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences.

...

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.

2 Comments

If he had used re.compile(get_Pat(file)) and a raw string had been returned, it would have worked fine, so I don't think that is the problem here.
Ah, I see. So I suppose re doesn't know how to process a string that has the raw prefix written inside it, and simply treats it as part of the string.
2

Not sure what you mean by 'none of them works', but re.compile() is what you're looking for:

>>> import re
>>> def getPat():
...     return r'^A Plat'
...
>>> getPat()
'^A Plat'
>>> reObj = re.compile(getPat())
>>> reObj
<_sre.SRE_Pattern object at 0x16cfa18>
>>> reObj.match("A Plat")
<_sre.SRE_Match object at 0x16c3058>
>>> reObj.match("foo")

edit:

You can get rid of the extra r' ' cruft after it's returned with this code:

>>> s = "r'^A Plat'"
>>> s = s[1:].strip("'")
>>> s
'^A Plat'
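
One caveat: strip("'") removes every leading and trailing quote, including one that belongs to the pattern itself. A slightly more careful sketch follows; unwrap_raw_literal is a hypothetical helper name, and it assumes the wrapper is always r'...' or r"..." on a single line.

```python
import re

def unwrap_raw_literal(text):
    """Return the pattern inside r'...' or r"...", else the text itself."""
    stripped = text.strip()
    if (len(stripped) >= 3 and stripped[0] in 'rR'
            and stripped[1] in '\'"' and stripped[-1] == stripped[1]):
        # Drop exactly the prefix, the opening quote and the closing quote.
        return stripped[2:-1]
    # No wrapper found: only remove the line ending.
    return text.rstrip('\r\n')

s = "r'^A Plat'"
print(unwrap_raw_literal(s))                    # ^A Plat
print(unwrap_raw_literal("r'^'quoted''"))       # ^'quoted'
print("r'^'quoted''"[1:].strip("'"))            # ^'quoted

regex = re.compile(unwrap_raw_literal(s))
print(bool(regex.match("A Plat wins")))
```

Slicing by position keeps quotes that are part of the pattern, at the cost of a few extra checks.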

4 Comments

He wasn't returning r'^A Plat'; he was returning "r'^A Plat'", as far as I can understand from the comment in the code. As you have pointed out, compile works fine with a raw string.
I guess the question is confusingly written.
I agree. I think it is a bit of an XY problem, with the raw string being a red herring. I could be wrong, as it's still ambiguous.
The function is returning "r'^A Plat'", not just '^A Plat'.
1

According to the comment in your get_Pat function, it's returning:

"r'^A Plat'"

Which is not what you thought you were getting:

>>> import re
>>> x = re.compile("r'^A Plat'")
>>> y = "A Plat wins"
>>> x.findall(y)
[]
>>> x = re.compile("^A Plat")
>>> x.findall(y)
['A Plat']
>>>

So the regex you're using isn't r'^A Plat', it's "r'^A Plat'". r'^A Plat' itself is fine:

>>> x = re.compile(r'^A Plat')
>>> x.findall(y)
['A Plat']

To fix this I would have to understand how you were getting the string "r'^A Plat'" in the first place.

5 Comments

Yes, the problem is that my function is returning a string object containing r'^A Plat', so I suppose I could fix it by transforming that into "^A Plat".
If you have to use the string the function returns, you should have a look at eyquem's answer.
I do not have access to the ast import.
It could help if you posted the code for get_Pat(); perhaps there is a simple way to fix your problem.
@Noelkd I've edited my answer with what I think is a valid solution
