0

I want to remove all the numbers from a string except those given in a set. I have written code to remove them, but I am not getting the expected result. Please have a look below:

mystr=" hey I want to delete all the digits ex 600 502 700 m 8745 given in this string. And getting request from the ip address 122521587502. This string tells about deleting digits 502 600 765 from this."

myDict={'600','700'} # set()

# snippet to remove digits from string other than digits given in the myDict 

My Solution

for w in myDict:
    for x in mystr.split():
        if (x.isdigit()):
            if(x != w):
                mystr.replace(x," ")

Expected result:

mystr=" hey I want to delete all the digits ex 600 700 m  given in this string. And getting request from the  
ip address . This string tells about deleting digits  600  from this."
3
  • Your dictionary is a set? Commented Aug 23, 2018 at 18:08
  • @BallpointBen, Yes. Thanks, i will edit question. Commented Aug 23, 2018 at 18:08
  • 2
    That's not a dictionary but a set, and instead of looping over the set you can check membership using the in operator. Also, never change an iterable that you're looping over.
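The comments above point at the membership test. Separately, str.replace returns a new string rather than changing mystr in place, so the loop in the question discards every result. A minimal sketch of both fixes, assuming the goal is to drop every whitespace-separated token made only of digits unless it is in the set (the names keep, kept_words and result are illustrative, not from the question):

```python
mystr = " hey I want to delete all the digits ex 600 502 700 m 8745 given in this string."
keep = {'600', '700'}  # hypothetical name for the set called myDict in the question

# Build a new list instead of mutating anything while looping over it.
# A token survives if it is not a pure number, or if it is a number we want to keep.
kept_words = [w for w in mystr.split() if not w.isdigit() or w in keep]

# Strings are immutable, so the result has to be bound to a name.
result = " ".join(kept_words)
print(result)
```

Note that this sketch collapses runs of whitespace and does not catch numbers with trailing punctuation such as 122521587502. because isdigit() is False for them.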

7 Answers

2

This is one approach.

Ex:

import string

mystr= "hey I want to delete all the digits ex 600 502 700 m 8745 given in this string. And getting request from the  ip address 122521587502. This string tells about deleting digits 502 600 765 from this."
mySet={'600','700'}

rep = lambda x: x if x in mySet else None
print( " ".join(filter(None, [rep(i) if i.strip(string.punctuation).isdigit() else i for i in mystr.split()])) )

Output:

hey I want to delete all the digits ex 600 700 m given in this string. And getting request from the ip address This string tells about deleting digits 600 from this.

2 Comments

check your output
Thanks..Edited answer.
1

This is another alternative. It puts a space before each dot, but it also removes the number after "ip address". Some of the other solutions miss that number because of the dot that follows it.

import re

mystr= "hey I want to delete all the digits ex 600 502 700 m 8745 given in this string. And getting request from the  ip address 122521587502. This string tells about deleting digits 502 600 765 from this."

myDict={'600','700'}

print(" ".join("" if (i.isdigit() and i not in myDict) \
    else i for i in re.findall(r'(?:\w+|\d+|\S)', mystr)))

Output:

hey I want to delete all the digits ex 600  700 m  given in this string . And 
getting request from the ip address  . This string tells about deleting digits  
600  from this .

PS: There is an ugly alternative that fixes the spaces before the dots:

print("".join("" if (i.isdigit() and i not in myDict) \
    else i if i == '.' or i == ',' \
    else ''.join([' ', i]) for i in re.findall(r'(?:\w+|\d+|\S)', mystr))
    .strip())

Which produces the output:

hey I want to delete all the digits ex 600 700 m given in this string. And 
getting request from the ip address. This string tells about deleting digits 
600 from this.
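To see why the dot matters, here is a small sketch (the variable names token and tokens are illustrative). Splitting on whitespace leaves the dot attached to the number, so isdigit() fails, while the findall pattern used in this answer emits the number and the dot as separate tokens:

```python
import re

token = "122521587502."
print(token.isdigit())  # False: the trailing dot is not a digit

# The same pattern as in the answer: words/numbers, or any single non-space character.
tokens = re.findall(r'(?:\w+|\d+|\S)', "ip address 122521587502. This")
print(tokens)  # ['ip', 'address', '122521587502', '.', 'This']
```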

5 Comments

Thanks!! This solution is efficient too. One quick question, is it possible to add a check for alphanumeric values? If in a string alphanumeric value is present, then remove that value.
Sure. It'll become a monster of function though... "".join("" if (i.isdigit() and i not in myDict) or re.search(r'^(?=.*[a-zA-Z])(?=.*[0-9])', i) else i if i == '.' or i == ',' else ''.join([' ', i]) for i in re.findall(r'(?:\w+|\d+|\S)', mystr)).strip()
You may wanna use the solution of @Bear Brown though... The code is better organized (and it's a little faster)...
Hey Thanks! But this code has removed the digits present in myDict.
So you don't get any number at all? Cause I get the right output when I try it
1
In [1]: mystr=" hey I want to delete all the digits ex 600 502 700 m 8745 given in this string. And getting request from the ip address 122521587502. This string tells about deleting digits 502 600 765 from this."
   ...: myDict={'600','700'}

First, you can prepare the list of numbers to remove:

   ...: mystr_l = mystr.replace('.', "").split()
   ...: to_remove = sorted(list({x for x in set(mystr_l) if x.isdigit() and x not in myDict}))
   ...: 
   ...: print(to_remove)
['122521587502', '502', '765', '8745']

and then remove them from your string:

In [4]: for x in to_remove:
   ...:     mystr = mystr.replace(x, " ")
   ...:  

My result is:

In [6]: print(mystr)
hey I want to delete all the digits ex 600  700 m  given in this string. 
And getting request from the ip address . This string tells about deleting digits  600  from this.
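One caveat with this approach: str.replace works on substrings, so a number that is removed can also be cut out of a longer number that should be kept. A small sketch of the effect, with a word-boundary regex as one possible alternative (the sample text and the names keep, naive and bounded are made up for illustration):

```python
import re

text = "keep 600 and 15025 but drop 502."
keep = {'600', '15025'}

# Plain substring replacement also hits the '502' inside '15025'.
naive = text.replace('502', '')
print(naive)    # keep 600 and 15 but drop .

# \b\d+\b only matches whole runs of digits, so '15025' is looked up as one unit.
bounded = re.sub(r'\b\d+\b', lambda m: m.group() if m.group() in keep else '', text)
print(bounded)  # keep 600 and 15025 but drop .
```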

And a quick performance test:

def replace_digits(src_string, exclude_list):
    result = src_string
    string_l = src_string.replace('.', "").split()
    to_remove = sorted(list({x for x in set(string_l) if x.isdigit() and x not in exclude_list}))
    for x in to_remove:
        result = result.replace(x, "")
    return result

import re

def reg(src_string, exclude_list):
    return " ".join("" if (i.isdigit() and i not in exclude_list) \
                    else i for i in re.findall(r'(?:\w+|\d+|\S)', src_string))

The tests:

In [8]: %timeit replace_digits(mystr, myDict)
11.3 µs ± 31.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [9]: %timeit reg(mystr, myDict)
25.1 µs ± 21.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
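The %timeit magic only exists in IPython. A rough equivalent for a plain Python script, sketched with the standard timeit module and copies of the two functions above (the names sample, keep and seconds are illustrative, and timings depend on the machine, so none are shown):

```python
import re
import timeit

def replace_digits(src_string, exclude_list):
    result = src_string
    string_l = src_string.replace('.', "").split()
    to_remove = sorted({x for x in string_l if x.isdigit() and x not in exclude_list})
    for x in to_remove:
        result = result.replace(x, "")
    return result

def reg(src_string, exclude_list):
    return " ".join("" if (i.isdigit() and i not in exclude_list)
                    else i for i in re.findall(r'(?:\w+|\d+|\S)', src_string))

sample = "ex 600 502 700 m 8745 given. ip address 122521587502. digits 502 600 765"
keep = {'600', '700'}

runs = 10_000
for func in (replace_digits, reg):
    # timeit.timeit returns the total time in seconds for `runs` calls.
    seconds = timeit.timeit(lambda: func(sample, keep), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.2f} µs per call")
```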


1

You can do this with re.sub. Match any number and use a callable replacer to filter out only the numbers which are not desired.

Using a set to store the sequences of digits you want to keep then allows O(1) lookup as the string is traversed.

Code

import re

def remove_numbers(s, keep=None):
    if keep:
        keep = set(str(x) for x in keep)
        return re.sub(r'\b\d+\b', lambda m: m.group() if m.group() in keep else '', s)
    else:
        # Shortcircuit the use of a set if there is no sequence to keep
        return re.sub(r'\b\d+\b', '', s)

Example

allowed = {600, 700}
s = 'I want to delete this: 100 200. But keep this: 600 700'
print(remove_numbers(s, allowed))

Output

I want to delete this:  . But keep this: 600 700


0

You can use simple code like this, with only Boolean logic and basic string methods.

mystr= "hey I want to delete all the digits ex 600 502 700 m 8745 given in this string. And getting request from the  ip address 122521587502. This string tells about deleting digits 502 600 765 from this."
myDict={'600','700'}

print( " ".join(c if not(bool(c.isdigit()) ^ bool(c in myDict)) else "" for c in mystr.split()) )

The problem with this is that it misses numbers attached to a full stop or other special character, such as 122521587502. in the example above. If you need to handle those, you can replace isdigit() with a custom function based on regex pattern matching, at the cost of slightly more complex code. Here is an example that handles numbers ending with a full stop or comma.

^[0-9]*[\,\.]?$ can be used as a regex pattern to match the above scenario (a regex debugging tool makes such patterns easier to check). So the code snippet is as follows:

import re

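# True for a token made of digits, optionally followed by one comma or full stop
isNum = lambda c: bool(re.match(r"^[0-9]*[\,\.]?$", c))
# True if the token, without trailing punctuation, is one of the numbers to keep
# (splitting the token would give a list, which cannot be tested against a set)
func = lambda c: c.rstrip(",.") in myDict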

print(" ".join(c if not(isNum(c) ^ func(c)) else "" for c in mystr.split()))
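The XOR test in these snippets can be read as a small truth table: a token is kept when "is a number" and "is in the set" agree. A minimal sketch, where the helper name keeps and the set name keep are illustrative only:

```python
keep = {'600', '700'}

def keeps(token):
    # Keep the token when both conditions are True or both are False.
    return not (token.isdigit() ^ (token in keep))

for token in ['600', '502', 'hey']:
    print(token, keeps(token))
# 600 True   (number, in the set)
# 502 False  (number, not in the set)
# hey True   (not a number, not in the set)
```

The fourth combination, a non-numeric token that happens to be in the set, would be dropped by this test. That does not occur with a set that only holds numbers.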


0

This shows a combined algorithm for pre-processing and removing digits. It also handles the 122521587502. edge case and float values such as 12.5 if they happen to be in the input string.

Code

exclude_set = {'600', '700'}

mystr=' hey I want to delete all the digits ex 600 502 700 8745 given in this string. And getting request from the ip address 122521587502. 12.5 This string tells about deleting digits 502 600 765 from this.'

# Splitting on whitespace and re-joining converts all whitespace to single spaces.
# Build a new list rather than removing items from the list being looped over,
# which would skip the element after each removal.
result = []
for word in mystr.split():
  # Digits only once dots are ignored: catches '502', '12.5' and '122521587502.'
  is_number = word.replace('.', '').isdigit()
  if is_number and word.rstrip('.') not in exclude_set:
    # Keep only the trailing full stop of a removed number
    if word.endswith('.'):
      result.append('.')
  else:
    result.append(word)

# Combine the list to form your string
mystr = ' '.join(result)
print (mystr)


0

You do not need to overcomplicate this. Turn mySet into a dictionary with dict(zip(mySet, mySet)), then use it as the lookup for the replacement:

import re
mySet1 =dict(zip(mySet, mySet))

re.sub("\\d+", lambda x: mySet1.get(x.group(), ""), mystr)

Out[604]: 'hey I want to delete all the digits ex 600  700 m  given in this string.
           And getting request from the  ip address . This string tells about 
           deleting digits  600  from this.'
