Generate random string from regex character set

Question

I assume there's some beautiful Pythonic way to do this, but I haven't quite figured it out yet. Basically, I'm writing a testing module and would like a nice, simple way for users to define a character set to pull from. I could concatenate the various character sets defined in the string module, but that strikes me as a very unclean solution. Is there any way to get the character set that a regex represents?

Example:

def foo(regex_set):
    re.something(re.compile(regex_set))

foo("[a-z]")
>>> abcdefghijklmnopqrstuvwxyz

The compile is of course optional, but in my mind that's what this function would look like.

Is the regex guaranteed to match one code point, or do you want the minimal alphabet that covers all symbols in the language specified by the regex?
– Mike Samuel, Commented Jul 8, 2013 at 19:33
I'm pretty sure you can't do that, at least not cleanly. If it's just one char you could brute-force it, but that's gross. Why not just use string.ascii_lowercase, etc.?
– Joran Beasley, Commented Jul 8, 2013 at 19:34
You'd need to create your own parser, and you'd probably only want to support a subset of regex syntax. I assume [a-z](?<![a-hj-z]) isn't something you'd want to support. (That's an obfuscated way of saying [i], in case you don't recognize the syntax.)
– JDB, Commented Jul 8, 2013 at 19:34
Then just create your own syntax: az would mean "a to z". aa would mean "just a". That's not hard to do in any language.
– JDB, Commented Jul 8, 2013 at 19:37
@SlaterTyranus Have a list of letters, each with a check box next to it. Simple, prevalent, well-documented functionality.
– AJMansfield, Commented Jul 8, 2013 at 19:52

unutbu · Accepted Answer · 2013-07-08 20:05:29Z

9

Paul McGuire, author of Pyparsing, has written an inverse regex parser, with which you could do this:

import invRegex
print(''.join(invRegex.invert('[a-z]')))
# abcdefghijklmnopqrstuvwxyz

If you do not want to install Pyparsing, there is also a regex inverter that uses only modules from the standard library with which you could write:

import inverse_regex
print(''.join(inverse_regex.ipermute('[a-z]')))
# abcdefghijklmnopqrstuvwxyz

Note: neither module can invert all regex patterns.

And there are differences between the two modules:

import invRegex
import inverse_regex
print(repr(''.join(invRegex.invert('.'))))
print(repr(''.join(inverse_regex.ipermute('.'))))

yields

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

Here is another difference; this time the Pyparsing-based module enumerates a larger set of matches:

x = list(invRegex.invert('[a-z][0-9]?.'))
y = list(inverse_regex.ipermute('[a-z][0-9]?.'))
print(len(x))
# 26884
print(len(y))
# 1100
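For the single-character case in the question, the brute-force idea from the comments can be sketched with the standard library alone. This is only an illustration, not part of either module above: the names charset_from_regex and random_string are hypothetical, and the candidate alphabet is assumed to be string.printable, so anything outside it is never reported.

```python
import random
import re
import string


def charset_from_regex(pattern, alphabet=string.printable):
    """Return the characters of `alphabet` that the pattern fully matches.

    Hypothetical helper: only meaningful for patterns that match exactly
    one character, such as "[a-z]" or "\\d".
    """
    compiled = re.compile(pattern)
    return "".join(ch for ch in alphabet if compiled.fullmatch(ch))


def random_string(charset, length):
    """Build a random string by sampling `charset` with replacement."""
    return "".join(random.choice(charset) for _ in range(length))


lowercase = charset_from_regex("[a-z]")
print(lowercase)  # abcdefghijklmnopqrstuvwxyz
print(random_string(lowercase, 8))
```

random.choice raises IndexError on an empty charset, so a pattern that matches nothing in the alphabet should be checked for before sampling.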

edited Jul 8, 2013 at 20:05

answered Jul 8, 2013 at 19:45

unutbu


5 Comments

Slater Victoroff Over a year ago

Ooh, looks extremely promising. let me check this out for a little bit.

Joran Beasley Over a year ago

what does invert(".") give? just out of curiosity

unutbu Over a year ago

@JoranBeasley: I've added the result for both modules.

Joran Beasley Over a year ago

thanks .... that basically highlights some of the issues with the approach he wants to take...

PaulMcG Over a year ago

@JoranBeasley - try it for yourself and see: utilitymill.com/utility/Regex_inverter/13

AJMansfield · Answer · 2013-07-08 20:15:25Z

2

A regex is not needed here. If you want to have users select a character set, let them just pick characters. As I said in my comment, simply listing all the characters and putting checkboxes by them would be sufficient. If you want something that is more compact, or just looks cooler, you could do something like one of these:

[Image: one way of displaying the letter selection (green = selected)]
[Image: another way of displaying the letter selection (no x = selected)]
[Image: yet another way of displaying the letter selection (black bg = selected)]

Of course, if you actually use this, what you come up with will undoubtedly look better than these (And they will also actually have all the letters in them, not just "A").

If you need, you could include a button to invert the selection, select all, clear selection, save selection, or anything else you need to do.

answered Jul 8, 2013 at 20:15

AJMansfield


4 Comments

Slater Victoroff Over a year ago

Woah, I thought you were joking. Upvote for proof of concept, but I don't believe in GUIs.

AJMansfield Over a year ago

I was, actually, but then I realized that that is actually a good solution, too.

Slater Victoroff Over a year ago

Certainly great for some, hence the upvote, but you're speaking to someone who uses dwm.

AJMansfield Over a year ago

I don't really believe in GUIs either, actually. Some people seem to like them, though.

Joran Beasley · Answer · 2013-07-08 19:39:46Z

1

If it's just simple ranges, you could parse them manually:

def range_parse(rng):
    # Split "a-z" into its two endpoints and expand the inclusive range.
    start, end = rng.split("-")
    return "".join(chr(i) for i in range(ord(start), ord(end) + 1))

print(range_parse("a-z") + range_parse("A-Z"))

but it's gross ...
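If several ranges and single characters need to go in one string, the same idea stretches a little further. The following is a rough sketch under my own assumptions: expand_ranges is a hypothetical name, the spec syntax ("a-zA-Z0-9_") is made up for illustration, and a "-" that does not sit between two characters is treated as a literal.

```python
def expand_ranges(spec):
    """Expand a spec such as "a-zA-Z0-9_" into the characters it names."""
    chars = []
    i = 0
    while i < len(spec):
        # A range takes three characters: start, "-", end.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            start, end = spec[i], spec[i + 2]
            if ord(start) > ord(end):
                raise ValueError(f"bad range {start}-{end}")
            chars.extend(chr(c) for c in range(ord(start), ord(end) + 1))
            i += 3
        else:
            # Anything else is taken as a literal character.
            chars.append(spec[i])
            i += 1
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return "".join(dict.fromkeys(chars))


print(expand_ranges("a-cA-C_"))  # abcABC_
```

This still covers only ranges and literals; negation, escapes, and classes like \d would each need their own handling.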

answered Jul 8, 2013 at 19:39

Joran Beasley


1 Comment

Slater Victoroff Over a year ago

Wasn't thinking of this as being just simple ranges.

AJMansfield · Answer · 2013-07-11 11:34:08Z

0

Another solution I thought of to simplify the problem:

Stick your own [ and ] on the line as part of the prompt, and disallow those characters in the input. After you scan the input and verify it doesn't contain anything matching [\[\]], you can prepend [ and append ] to the string, and use it like a regex against a string of all the characters needed ("abcdefghijklmnopqrstuvwxyz", for instance).
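A minimal sketch of that idea, assuming Python's re module and a lowercase-only alphabet; the function name charset_from_class_body is hypothetical:

```python
import re
import string


def charset_from_class_body(body, alphabet=string.ascii_lowercase):
    """Wrap user input in [ ] and keep the alphabet characters it matches."""
    if not body:
        raise ValueError("empty character set")
    # Reject brackets so the user cannot close the class early.
    if re.search(r"[\[\]]", body):
        raise ValueError("'[' and ']' are not allowed in the input")
    pattern = re.compile("[" + body + "]")
    return "".join(ch for ch in alphabet if pattern.fullmatch(ch))


print(charset_from_class_body("a-cx"))  # abcx
print(charset_from_class_body("^a-y"))  # z
```

Input that is still not a valid class body (a reversed range like "z-a", or a trailing backslash) makes re.compile raise re.error, which the caller would have to catch and report.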

answered Jul 11, 2013 at 11:34

AJMansfield
