I have a long string with key-value pairs in this format:
"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"
I want to get the values (the infotexts) of all "info" keys. How can this be done?
Use the json, Luke
import json
s = '"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"'
def pairs_hook(pairs):
    # Receives the (key, value) pairs of each decoded object, duplicate keys included
    return [val for key, val in pairs if key == 'info']
# Wrap the string in braces so it becomes a JSON object
p = json.loads('{' + s + '}', object_pairs_hook=pairs_hook)
print(p)  # ['infotext', 'infotext2', 'infotext3']
From the docs:
object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict.
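The hook matters here because the key "info" is duplicated. A plain json.loads builds a dict, so later duplicates overwrite earlier ones and only the last value survives. The sketch below shows that, plus a more general hook that keeps every value under its key. The name group_pairs is hypothetical, chosen only for illustration.

```python
import json
from collections import defaultdict

s = '"info":"infotext","day":"today","12":"here","info":"infotext2","info":"infotext3"'
doc = '{' + s + '}'

# Without a hook, the last duplicate key wins.
last_only = json.loads(doc)['info']
print(last_only)  # infotext3

def group_pairs(pairs):
    """Hypothetical hook: map each key to the list of all its values."""
    grouped = defaultdict(list)
    for key, val in pairs:
        grouped[key].append(val)
    return dict(grouped)

grouped = json.loads(doc, object_pairs_hook=group_pairs)
print(grouped['info'])  # ['infotext', 'infotext2', 'infotext3']
print(grouped['day'])   # ['today']
```

Note that the hook is applied to every object in the document, so nested objects would be grouped the same way.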
Just for the sake of completeness, here's a regular expression that does the same:
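import re
rg = r'''(?x)          # verbose mode: whitespace and comments are ignored
    "info"
    \s* : \s*          # optional whitespace around the colon
    "
    (                  # capture the value:
        (?:\\.|[^"])*  # backslash escapes or any non-quote character
    )
    "
'''
re.findall(rg, s)  # ['infotext', 'infotext2', 'infotext3']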
This also handles whitespace around the colon and escaped quotes inside strings, e.g.:
"info" : "some \"interesting\" information"
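To check the escaped-quote case, the sketch below runs the same verbose pattern on a hypothetical input string. The captured text still contains the backslashes; decoding each value as a JSON string literal is one way (an extra step, not part of the answer above) to turn it back into plain text.

```python
import json
import re

# Same verbose pattern as above
rg = r'''(?x)
    "info"
    \s* : \s*
    "
    ( (?:\\.|[^"])* )
    "
'''

# Hypothetical input containing escaped quotes and spaces around the colon
sample = r'"info" : "some \"interesting\" information","day":"today"'

raw = re.findall(rg, sample)
print(raw)  # ['some \\"interesting\\" information']

# Re-wrap each capture in quotes and let the JSON decoder resolve the escapes
values = [json.loads('"' + v + '"') for v in raw]
print(values)  # ['some "interesting" information']
```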
As long as your infotext does not contain (escaped) quotes, you could try something like this:
>>> import re
>>> m = re.findall(r'"info":"([^"]+)', s)  # s holds the input string
>>> m
['infotext', 'infotext2', 'infotext3']
This matches the literal "info":" and then as many non-" characters as possible; findall returns only the captured group, i.e. the value.
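The caveat about quotes is easy to see with a hypothetical input. Because [^"]+ stops at the first quote, escaped or not, an escaped quote truncates the value; and because + needs at least one character, an empty value is skipped. The second pattern is a variant of the idea, not part of the answer: it also accepts backslash escapes and empty values.

```python
import re

# Hypothetical input: a plain value, one with escaped quotes, and an empty one
sample = r'"info":"plain","info":"say \"hi\"","info":""'

# Simple pattern from above
simple = re.findall(r'"info":"([^"]+)', sample)
print(simple)  # ['plain', 'say \\']

# Variant: escape sequences or non-quote, non-backslash characters, zero or more
robust = re.findall(r'"info":"((?:\\.|[^"\\])*)"', sample)
print(robust)  # ['plain', 'say \\"hi\\"', '']
```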