
How can I verify the format of a string like this: "123:1,1234:10,12:5,1:0"?

The first split is on "," and the next split is on ":". For each entry, I need to verify that the first value (before the :) is an integer and the second value (after the :) is between 0 and 10.

I tried something like this:

import re
string = "123:1,1234:10,12:5,1:0"
for value in string.split(","):
    if re.search(r"\d+:+\d[0-9]", value):
        print("this is the correct format")

The issue here is the length of integer before ":" is not fixed and I don't think I can use "\d" to verify this. Any help will be appreciated. Thank you!

  • You write that the first split should be on , and yet you split on : in your for loop. Commented Aug 12, 2022 at 6:18
  • @user56700 Thanks for pointing it out. Corrected! Commented Aug 12, 2022 at 6:22
  • Could the integer before : be negative? Commented Aug 12, 2022 at 6:47
  • @Timus No, it's always a positive value. Commented Aug 12, 2022 at 11:51

2 Answers

I hope I understand your requirement correctly. You could try it with a regular expression as follows:

import re
matcher = re.compile(r'^(\d+:([0-9]{1}|10))(,\d+:([0-9]{1}|10))*$')
string = '123:1,1234:10,12:5,1:0'
matches = matcher.match(string) is not None

With the RegEx I check that the string contains at least one block of the form number:value. This pattern can then optionally repeat, but each repetition has to be separated from the previous one by a comma.

If this is not really what you are looking for, please let me know and I will try to adjust my answer.

Edit: For clarification, this is what the RegEx does:

  1. ^ -> This sign indicates the beginning of the string. If you are looking for the pattern anywhere in a longer string with content before it, you will have to remove it, otherwise it will not match.
  2. (\d+:([0-9]{1}|10)) -> This is one capturing group (as it is surrounded by parentheses). The content of the group defines which kind of string I expect. Here I first want at least one digit (\d is a digit, and the '+' after it means one or more). After the number, a colon (:) follows. Then another capturing group says what I expect after the colon: either a single digit from 0 to 9 ('[0-9]', with the multiplicity given in the curly brackets) or (the pipe symbol | means 'or') the number 10. As there is no multiplicity after this group, I expect it exactly once.
  3. (,\d+:([0-9]{1}|10))* -> Here I'm doing the same as in the previous point, except that I put the comma as a separator in front. If I had placed the comma optionally in the previous group and just increased the multiplicity, the matcher would still accept a string with a trailing comma and no following entry, which might not be desirable. By placing the asterisk (*) after the group, I say that it is optional but can occur multiple times.
  4. $ -> This is similar to the ^ sign at the beginning and indicates the end of the string. If you want to look for the pattern inside a longer string, where content might appear after your pattern, you have to remove it.
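To make the four points above concrete, here is a minimal sketch that runs the same pattern against a few inputs, including the ones raised in the comments. The helper name is_valid and the sample list are only for illustration, and [0-9]{1} is written as [0-9] because the {1} is redundant.

```python
import re

# Same structure as the pattern in the answer: one "number:value" entry,
# followed by zero or more ",number:value" entries, anchored at both ends.
ENTRY_LIST = re.compile(r'^(\d+:([0-9]|10))(,\d+:([0-9]|10))*$')


def is_valid(text):
    """Return True if the whole string is a comma-separated list of entries.

    Hypothetical helper name, not part of the original answer.
    """
    return ENTRY_LIST.match(text) is not None


samples = [
    '123:1,1234:10,12:5,1:0',         # valid
    '5:10',                           # valid, single entry
    '122:11,',                        # invalid: 11 is out of range, trailing comma
    '123:1,1234:10,12:5,1:0,122:11',  # invalid: last entry is out of range
    '12d3:1,asdfasdfasdf',            # invalid: non-digit characters
    '',                               # invalid: at least one entry is required
]

for sample in samples:
    print(repr(sample), is_valid(sample))
```

Only the first two samples should print True. One caveat: in Python, $ also matches just before a newline at the very end of the string, so '1:0\n' is accepted; if that matters, ENTRY_LIST.fullmatch(text) or \Z instead of $ is stricter.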

7 Comments

This will return true if just one match is found. If I enter a bogus match at the end, for instance: 122:11, then it's still true.
Have you tried that out? If I run matcher.match('122:11,') is not None on my computer, I get False, which, as I understand it, is correct.
Well in reference to your code, you match on the whole string. If I use string = '123:1,1234:10,12:5,1:0,122:11' then it's still true. You have not written anything about checking single values.
I upvoted this, as it's the only correct pattern so far.
@user3132983 sure thing. I added an explanation to the post. Feel free to ask if something is still unclear. If this answer helped you, it would still be nice to accept it as the correct one.

You can match the entire string with a repeated pattern without the for loop as well (though if you want to know which entry is "bad" maybe you still need that).

(\d+:\d+,{0,1}){1,}

You can see the match here on regex101: https://regex101.com/r/Qh299F/1
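If you do want to know which entry is "bad", one option is to keep the split and validate each piece on its own. The following is only a sketch under assumptions: the function name find_bad_entries and the per-entry pattern are mine, not the pattern from the regex101 link, and the pattern also enforces the 0-10 range from the question.

```python
import re

# One entry: one or more digits, a colon, then a value from 0 to 10.
# fullmatch() requires the whole entry to match, so no ^ or $ is needed.
ENTRY = re.compile(r'\d+:(10|[0-9])')


def find_bad_entries(text):
    """Return (index, entry) pairs for every entry that fails validation.

    Hypothetical helper for illustration; an empty list means the
    whole string is valid.
    """
    return [
        (index, entry)
        for index, entry in enumerate(text.split(','))
        if ENTRY.fullmatch(entry) is None
    ]


print(find_bad_entries('123:1,1234:10,12:5,1:0'))         # []
print(find_bad_entries('123:1,1234:10,12:5,1:0,122:11'))  # [(4, '122:11')]
print(find_bad_entries('12d3:1,asdfasdfasdf'))            # [(0, '12d3:1'), (1, 'asdfasdfasdf')]
```

A trailing comma produces an empty last entry, which is reported as bad as well.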

11 Comments

But you are not ensuring the right pattern there after the comma anymore. A string like '12d3:1,asdfasdfasdf' is considered valid beginning from the 3 which I don't think is desired here.
If he wants to loop through every possible match as he does in his code, this will not match anything.
Hmm, good point on the pattern. I did comment on the loop though - I suspect they only have the loop because they couldn't match the whole thing...
Even if I use findall and use your pattern only the first match is found. It's my understanding that he wants to verify each value.
Updated the pattern as per @Christian's comment. And the idea was that findall should only match 1 item if the string is valid ...
