I want to search for a pattern in a string, then search the matched text for invalid characters and either remove them or replace them with valid ones.

I have some sample dictionaries, e.g. sample_dict = {"randomId":"123y" uhnb\n g", "desc": ["sample description"]}

In this case I want to find a dictionary value, say "123y" uhnb\n g", and remove the invalid characters in it, such as ", \t and \n. What I have tried: I stored all the dictionaries in a file, read the file, and matched a pattern for the dictionary values. This gives me a list of matches. I can also compile these matches, but I am not sure how to perform the replacement in the original dictionary value so that my final output is: {"randomId":"123y uhnb g", "desc": ["sample description"]}

pattern = re.findall("\":\"(.+?)\"", sample_dict)

expected result:

{"randomId":"123y uhnb g", "desc": ["sample description"]}

actual result:

['123y" uhnb\n g']
  • Don't parse JSON with regex, use a JSON parser. Commented Apr 20, 2019 at 5:47
  • Possible duplicate of Parse JSON in Python Commented Apr 20, 2019 at 5:48
  • @miken32: I can use json parser but in that case as well I need to remove those invalid characters else it won't work, so in order to remove those characters I am using regex. Commented Apr 20, 2019 at 6:06
  • How did you end up with this strange sample_dict to begin with? Perhaps you can avoid that already earlier so that you do not need to replace or remove the strange characters. Commented Apr 20, 2019 at 6:32
  • Why are you using re.findall instead of re.sub (in some capacity)? Commented Apr 20, 2019 at 6:40

1 Answer

You can use re.sub to strip every character that is not alphanumeric or a space from each value, as below:

import re

dct = {"randomId": "123y uhnb\n g", "desc": ["sample description"]}

# Matches anything that is not a letter, a digit or a space
pattern = re.compile(r"[^a-zA-Z0-9 ]")

for key, value in dct.items():
    # If the value is a string, substitute directly
    if isinstance(value, str):
        dct[key] = pattern.sub("", value)
    # If the value is a list, substitute in every item of the list
    elif isinstance(value, list):
        dct[key] = [pattern.sub("", str(item)) for item in value]
    # Values of any other type are left unchanged

print(dct)
# {'randomId': '123y uhnb g', 'desc': ['sample description']}
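
The loop above needs a dictionary that has already been parsed. If the dictionaries are still raw text in a file and the stray quote keeps a JSON parser from loading them, one option is to pass a function to re.sub so that only the matched value is cleaned and written back into the original text. This is a minimal sketch, not a general JSON repair: the raw string and the names VALUE and clean are made up for illustration, and it assumes a value starts right after ":" and ends at the quote that is followed by a comma or a closing brace.

```python
import json
import re

# Hypothetical raw text as read from the file: the value contains a stray
# double quote and a real newline, so json.loads would reject it as-is.
raw = '{"randomId":"123y" uhnb\n g", "desc": ["sample description"]}'

# Assumption: a value starts right after `":"` and ends at the first quote
# that is followed (after optional whitespace) by `,` or `}`.
VALUE = re.compile(r'(":")(.*?)("\s*[,}])', re.DOTALL)

def clean(match):
    # Remove stray quotes, tabs and newlines from the value only,
    # then rebuild the match with its original delimiters.
    value = re.sub(r'["\t\n]', '', match.group(2))
    return match.group(1) + value + match.group(3)

fixed = VALUE.sub(clean, raw)
data = json.loads(fixed)
print(data)
```

A value that legitimately contains a quote followed by a comma would end the match too early, so it is safer to fix whatever produces the broken text if that is possible.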