0

I'm having trouble finding the correct regex for this task; please excuse my beginner skills. I want to get the id value only from lines where "available":true, not "available":false. I can already get the ids of all lines with re.findall('"id":(\d{13})', line, re.DOTALL) (the {13} matches exactly 13 digits, because there are other ids in the code with fewer than 13 digits that I don't need).

{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},

So the end result needs to be ['1651572973431','1351572943231']

I appreciate the help, thanks.

12
  • 3
    Why would you use a regex for this? Is there a reason why you don't parse the JSON instead? Commented May 21, 2019 at 0:03
  • 2
    @ggorlen please put the original code back, as the output doesn't look like that in the code. Commented May 21, 2019 at 0:10
  • 2
    You're welcome to roll it back if I inadvertently conflicted with your intent, but if your original structure is a string, please use quotes. Commented May 21, 2019 at 0:13
  • 2
    So you're saying that, yes, this is a raw string? If you're asking for a string parsing task, please post the exact string, with quotes around it so there is no ambiguity. Commented May 21, 2019 at 0:16
  • 1
    @sakow0 I think there's some confusion because it's not clear whether the code above represents a single string or a list of strings. Your regex looks like it is operating on a variable called line. Is line one of these or all of these? Commented May 21, 2019 at 0:28

3 Answers 3

2

This might not be a good answer; it depends on exactly what you have. It looks like you have a list of strings and you want the ids from some of them. If that's the case, it's going to be much cleaner and easier to read if you parse the JSON rather than writing a byzantine regex. For example:

import json

# lines is a list of strings:

lines = ['{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
]

# parse it and you can use regular python to get what you want:
[line['id'] for line in map(json.loads, lines) if line['available']]

result

[1351572943231, 1651572973431]

If the code you posted is one long string, you can wrap it in [] and then parse it as an array with the same result:

import json

line = r'{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}'

lines = json.loads('[' + line + ']')
[line['id'] for line in lines if line['available']]

1 Comment

Thanks for your effort @Mark Meyer. This would have been a perfect answer if the data were entirely JSON; that's my fault for not explaining it well. However, I have definitely learnt something from this, thanks a lot!
1

This regex matches what you want:

(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)

https://regex101.com/r/FseimH/1

Expanded

 (?<= "id": )
 \d{13} 
 (?=
      (?: ," [^"]* ": [^,]*? )*?
      ,"available":true
 )

Explained

 (?<= "id": )                        # Lookbehind assertion for id
 \d{13}                              # Consume 13 digit id
 (?=                                 # Lookahead assertion
      (?:                                 # Optional sequence
           ,                                   # comma
           " [^"]* "                           # quoted string
           :                                   # colon
           [^,]*?                              # optional non-commas
      )*?                                 # End sequence, do 0 to many times - 
      ,"available":true                   # until we find  available = true
 )
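
For reference, here is a minimal sketch of how the pattern above might be used from Python with re.findall. The names text and AVAILABLE_ID are made up for illustration, and the sample is the four lines from the question joined into one string. The pattern assumes that "id" comes before "available" inside each object and that no value between them contains a comma.

```python
import re

# Sample rows from the question, joined into a single string
# (an assumption about how the input arrives).
text = "\n".join([
    '{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
    '{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
    '{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
    '{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
])

# Same pattern as in the answer: a 13-digit id that is followed, within the
# same object, by "available":true.
AVAILABLE_ID = re.compile(
    r'(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)'
)

# The pattern has no capturing groups, so findall returns the full matches.
available_ids = AVAILABLE_ID.findall(text)
print(available_ids)  # ['1351572943231', '1651572973431']
```

The ids come back as strings, which is the form the question asks for.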

2 Comments

Could you please explain what the third part of the regex is doing?
@PIG - I've added more. What exactly are you having trouble understanding?
1

Here, we can simply use the "id" as a left boundary, and collect the desired numbers in a capturing group:

"id":([0-9]+)

Then, we can continue adding boundaries to it. For example, if exactly 13 digits are desired, we can simply use:

\"id\":([0-9]{13})

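The capturing-group approach could also be extended to require "available":true inside the same object. The following is only a sketch under assumptions: the pattern, the variable names, and the use of [^{}]*? to stay within one object are illustrative additions, and it presumes that "id" appears before "available" and that the objects contain no nested braces.

```python
import re

# Sample rows from the question, joined into a single string
# (an assumption about how the input arrives).
text = "\n".join([
    '{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
    '{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
    '{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
    '{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
])

# Capture the 13-digit id, then lazily skip characters that are not braces
# (so the match cannot leave the current object) up to "available":true.
pattern = re.compile(r'"id":(\d{13})[^{}]*?"available":true')

# With one capturing group, findall returns just the captured ids.
matching_ids = pattern.findall(text)
print(matching_ids)  # ['1351572943231', '1651572973431']
```
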
2 Comments

The OP only wants rows matching a certain condition.
Thanks for the response, Emma. I need to match the "available":true condition.
