verify a string format in python

Question

How can I verify the format of a string like this: "123:1,1234:10,12:5,1:0"?

The first split is based on "," and the next split is on ":". For each split, I need to verify that the first value (before :) is an integer and the second value (after :) is between 0 and 10.

I tried something like this:

import re
string = "123:1,1234:10,12:5,1:0"
for value in string.split(","):
    # raw string so that "\d" reaches the regex engine unchanged
    if re.search(r"\d+:+\d[0-9]", value):
        print("this is correct format")

The issue here is that the length of the integer before ":" is not fixed, and I don't think I can use "\d" to verify this. Any help will be appreciated. Thank you!

You write that the first split should be on "," and yet you split on ":" in your for loop.
– Cow, commented Aug 12, 2022 at 6:18

Christian · Accepted Answer · 2022-08-15 10:39:48Z

3

I hope I understand your requirement correctly. You could try it with a regular expression as follows:

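import re
# One "<integer>:<0-10>" block, optionally followed by more blocks, each preceded by a comma
matcher = re.compile(r'^(\d+:([0-9]{1}|10))(,\d+:([0-9]{1}|10))*$')
string = '123:1,1234:10,12:5,1:0'
matches = matcher.match(string) is not None
print(matches)  # True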

With the regex I check that the string contains at least one block of the form number:value, where value is 0 to 10. This block can then optionally repeat, but each repetition has to be separated from the previous one by a comma.

If this is not really what you are looking for, please let me know and I'll try to adjust my answer.

Edit: For clarification, this is what the RegEx does:

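^ -> This sign anchors the pattern at the beginning of the string. If you are looking for the pattern anywhere in a longer string that has content before it, you have to remove it, and you also have to call matcher.search() instead of matcher.match(), because match() always anchors at the start of the string.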
(\d+:([0-9]{1}|10)) -> This is a capturing group (it is surrounded by parentheses). Its content defines the kind of string I expect. First comes at least one digit (\d matches a digit, and the + after it means "one or more"). After the number comes a colon (:). A second capturing group then describes what must follow the colon: either a single digit from 0 to 9 ([0-9], with the multiplicity {1} given in the curly braces) or (the pipe symbol | means "or") the number 10. Since there is no quantifier after this group, it must occur exactly once.
(,\d+:([0-9]{1}|10))* -> This is the same as the previous group, except that a comma is placed in front of it as a separator. If I had instead made the comma optional in the previous group and just increased its multiplicity, the matcher would also accept a trailing comma with no block after it, which is probably not desirable. The asterisk (*) after the group means it may occur zero or more times.
$ -> Like the ^ at the beginning, this anchors the pattern at the end of the string. If you want to find the pattern inside a longer string where content might follow it, you have to remove it.
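To check the behaviour discussed in the comments below, the pattern can be run against a few sample strings. This is a small sketch, not part of the original answer: `is_valid`, `block_list` and the sample strings are hypothetical, and the `fullmatch()` variant is a suggested alternative rather than the answerer's code. It also shows one quirk of a bare `$`: it matches just before a final newline, so the anchored pattern accepts "1:0\n".

```python
import re

# The answer's pattern, unchanged: ^ and $ anchor it to the whole string.
anchored = re.compile(r'^(\d+:([0-9]{1}|10))(,\d+:([0-9]{1}|10))*$')

# Suggested variant: fullmatch() needs no ^/$ and also rejects a
# trailing newline (a bare $ matches just before a final "\n").
block_list = re.compile(r'\d+:([0-9]|10)(,\d+:([0-9]|10))*')


def is_valid(s: str) -> bool:
    """True if s is a comma-separated list of <integer>:<0-10> pairs."""
    return block_list.fullmatch(s) is not None


samples = [
    '123:1,1234:10,12:5,1:0',         # valid
    '123:1,1234:10,12:5,1:0,122:11',  # last value is out of range
    '122:11,',                        # out of range, plus a trailing comma
    '1:0\n',                          # trailing newline
]
for s in samples:
    # columns: input, anchored match() result, fullmatch() result
    print(repr(s), anchored.match(s) is not None, is_valid(s))
```

Both versions reject the out-of-range and trailing-comma inputs; they differ only on the trailing newline. Note also that in Python 3, `\d` matches any Unicode decimal digit, not just 0-9; pass `re.ASCII` to `re.compile` (or write `[0-9]+`) if only ASCII digits should count.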

edited Aug 15, 2022 at 10:39

answered Aug 12, 2022 at 6:33

Christian



7 Comments

Cow Over a year ago

This will return true if just one match is found. If I enter a bogus match at the end, for instance: 122:11, then it's still true.

Christian Over a year ago

Have you tried that out? If I run this on my computer with matcher.match('122:11,') is not None, I get False, which, as I understand it, is correct.

Cow Over a year ago

Well in reference to your code, you match on the whole string. If I use string = '123:1,1234:10,12:5,1:0,122:11' then it's still true. You have not written anything about checking single values.

Cow Over a year ago

I upvoted this, as it's the only correct pattern so far.

Christian Over a year ago

@user3132983 sure thing. I added an explanation to the post. Feel free to ask if something is still unclear. If this answer helped you, it would still be nice to accept it as the correct one.


michjnich · Accepted Answer · 2022-08-12 06:42:17Z

1

You can also match the entire string with a repeated pattern, without the for loop (though if you want to know which entry is "bad", you may still need the loop).

(\d+:\d+,{0,1}){1,}

You can see the match here on regex101: https://regex101.com/r/Qh299F/1
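As the answer notes, a loop is still useful if you want to know which entry is bad. Below is a minimal sketch without regex that splits on "," and then ":", as described in the question. `find_bad_entries` and `ALLOWED_VALUES` are hypothetical names, and accepting only the exact strings "0" to "10" is an assumption chosen to mirror the regex approach (so "010" is rejected).

```python
# Hypothetical helper: report which entries fail the <integer>:<0-10> rule.
# Allowed values are the exact strings "0" to "10", mirroring the regex
# approach, so inputs like "010" or "+5" are rejected.
ALLOWED_VALUES = {str(n) for n in range(11)}


def find_bad_entries(s: str) -> list:
    """Return every comma-separated entry that is not <integer>:<0-10>."""
    bad = []
    for entry in s.split(','):
        key, sep, value = entry.partition(':')
        # key must be one or more ASCII digits ("" is not a digit string)
        key_ok = key.isascii() and key.isdigit()
        if sep != ':' or not key_ok or value not in ALLOWED_VALUES:
            bad.append(entry)
    return bad


print(find_bad_entries('123:1,1234:10,12:5,1:0'))  # []
print(find_bad_entries('123:1,12d3:1,1:11,5'))      # ['12d3:1', '1:11', '5']
```

The string is valid exactly when the returned list is empty. A trailing comma shows up as an empty entry `''` in the result.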

edited Aug 12, 2022 at 6:42

answered Aug 12, 2022 at 6:33

michjnich


11 Comments

Christian Over a year ago

But you are no longer ensuring the right pattern after the comma. A string like '12d3:1,asdfasdfasdf' is considered valid starting from the 3, which I don't think is desired here.

Cow Over a year ago

If he wants to loop through every possible match as he does in his code, this will not match anything.

michjnich Over a year ago

Hmm, good point on the pattern. I did comment on the loop though - I suspect they only have the loop because they couldn't match the whole thing...

Cow Over a year ago

Even if I use findall and use your pattern only the first match is found. It's my understanding that he wants to verify each value.

michjnich Over a year ago

Updated the pattern as per @Christian's comment. And the idea was that findall should only match one item if the string is valid...
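To make the points from this comment thread concrete, the pattern as shown in the answer above can be run through `fullmatch()`, `search()` and `findall()`. This is only a sketch; the sample strings are illustrative.

```python
import re

# The repeated-group pattern as shown in the answer above
pattern = re.compile(r'(\d+:\d+,{0,1}){1,}')

valid = '123:1,1234:10,12:5,1:0'

# fullmatch() checks the whole string, so stray characters are rejected...
print(pattern.fullmatch(valid) is not None)           # True
print(pattern.fullmatch('12d3:1') is not None)        # False
# ...but the 0-10 range and a trailing comma are not enforced
print(pattern.fullmatch('1:99') is not None)          # True
print(pattern.fullmatch('1:2,') is not None)          # True
# search() without anchors finds a partial match inside invalid input
print(pattern.search('12d3:1,asdfasdfasdf').group())  # 3:1,
# findall() returns the group's last repetition, not one item per entry
print(pattern.findall(valid))                         # ['1:0']
```

This is why `findall()` yields a single item for a valid string: the whole string is one match, and a repeated capturing group only keeps its last repetition.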
