How to extract characters and numeric values from a given string?

Question

I have the following strings:

'10000 ABC = 1 DEF'
'1 AM = 0,30$'
'3500 ABC = 1 GTY'
'1000 HUYT=1ABC'
'1 MONET Data = 1 ABC'

I want to find a flexible way to extract the numeric and string values from the left and right sides of =. I do not know all possible string values, so I cannot predefine them. The only thing I know is that the left and right sides are separated by =.

The goal is to get this result for the above-given example:

String-pairs:

ABC-DEF
AM-$
ABC-GTY
HUYT-ABC
MONET Data-ABC

Numeric-pairs:

I tried using .lstrip('...') and .rstrip('...'), but they do not give me the expected result.

Should '1 MONET Data = 1 ABC' result in 'MONET Data-ABC' and not 'MONETData-ABC' when getting the string pair?
– Sean Francis N. Ballais, Commented Jan 14, 2018 at 15:36
@SeanFrancisN.Ballais: there should be a space between MONET and Data.
– Markus, Commented Jan 14, 2018 at 15:37
Where is your code? This problem is riddled with nasty edge cases, and seeing your attempt would make it much easier to help you.
– Tim Biegeleisen, Commented Jan 14, 2018 at 15:38

Jongware · Accepted Answer · 2018-01-14 15:42:23Z

3

Remove the unwanted characters and replace the = with a -.

import re

# Named "strings" rather than "str" so the built-in str type is not shadowed.
strings = ['10000 ABC = 1 DEF',
    '1 AM = 0,30$',
    '3500 ABC = 1 GTY',
    '1000 HUYT=1ABC',
    '1 MONET Data = 1 ABC']

String_pairs = []
Numeric_pairs = []

for s in strings:
    # Drop the numbers (with an optional decimal comma), then turn "=" into "-".
    String_pairs.append(re.sub(r'\s*=\s*', '-', re.sub(r'\s*\d+(,\d+)?\s*', '', s)))
    # Drop everything except digits, commas and "=", then turn "=" into "-".
    Numeric_pairs.append(re.sub(r'\s*=\s*', '-', re.sub(r'\s*[^\d,=]+\s*', '', s)))

print(String_pairs)
print(Numeric_pairs)

Result:

['ABC-DEF', 'AM-$', 'ABC-GTY', 'HUYT-ABC', 'MONET Data-ABC']
['10000-1', '1-0,30', '3500-1', '1000-1', '1-1']

Or, more concisely, as list comprehensions (with the same result):

String_pairs = [re.sub(r'\s*=\s*', '-', re.sub(r'\s*\d+(,\d+)?\s*', '', s)) for s in strings]
Numeric_pairs = [re.sub(r'\s*=\s*', '-', re.sub(r'\s*[^\d,=]+\s*', '', s)) for s in strings]
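If the values are needed separately rather than as joined text, one option is to split on = first and then match each side against a single anchored pattern. The following is only a minimal sketch and not part of the original answer: the names SIDE, parse_side and parse_line are hypothetical, and it assumes each side holds at most one leading number followed by an optional label, with the comma acting as a decimal separator.

```python
import re
from decimal import Decimal

# Optional leading number (with an optional decimal comma), then the label.
# Assumption: at most one number per side, and it comes before the label.
SIDE = re.compile(r'\s*(\d+(?:,\d+)?)?\s*(.*?)\s*$', re.DOTALL)

def parse_side(text):
    """Return (value, label); value is a Decimal, or None if no number was found."""
    number, label = SIDE.match(text).groups()
    # Assumption: ',' is a decimal comma, so '0,30' is read as 0.30.
    value = Decimal(number.replace(',', '.')) if number else None
    return value, label

def parse_line(line):
    """Split on the first '=' and parse the left and right sides."""
    left, _, right = line.partition('=')
    return parse_side(left), parse_side(right)

lines = ['10000 ABC = 1 DEF',
         '1 AM = 0,30$',
         '3500 ABC = 1 GTY',
         '1000 HUYT=1ABC',
         '1 MONET Data = 1 ABC']

for line in lines:
    (left_value, left_label), (right_value, right_label) = parse_line(line)
    print(left_label + '-' + right_label, left_value, right_value)
```

Keeping the numbers as Decimal values avoids float rounding, which is a design choice made here because one of the samples looks like a currency amount.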

edited Jan 14, 2018 at 15:42

answered Jan 14, 2018 at 15:36

Jongware


Sean Francis N. Ballais · Accepted Answer · 2018-01-14 15:30:12Z

1

As an alternative to regex, what you could do is to loop through each string and extract the relevant characters. It could look something along the lines of the following.

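def extract_string_pairs(source_string):
    string_pair = ''
    for c in source_string:
        # Keep spaces as well, so that 'MONET Data' stays two words.
        if c.isalpha() or c in '$ ':
            string_pair += c
        elif c == '=':
            string_pair += '-'

    # Trim the spaces left over around each side of the '-'.
    return '-'.join(side.strip() for side in string_pair.split('-'))

def extract_numeric_pairs(source_string):
    string_pair = ''
    for c in source_string:
        # The sample data uses a decimal comma ('0,30'); accept '.' as well.
        if c.isdigit() or c in ',.':
            string_pair += c
        elif c == '=':
            string_pair += '-'

    return string_pair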
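Another regex-free way to walk the characters is to group them into runs with itertools.groupby, which handles both kinds of pair in one pass. This is a rough sketch, not the answer's own code: split_runs and extract_pairs are hypothetical names, and it assumes a comma only ever appears inside a number.

```python
from itertools import groupby

def is_number_char(c):
    # Assumption: commas only occur as decimal commas inside numbers.
    return c.isdigit() or c == ','

def split_runs(side):
    """Split one side of the '=' into its number text and its label text."""
    numbers, labels = [], []
    # groupby yields consecutive runs of characters sharing the same key.
    for is_numeric, run in groupby(side, key=is_number_char):
        text = ''.join(run).strip()
        if text:  # skip runs that were only whitespace
            (numbers if is_numeric else labels).append(text)
    return ' '.join(numbers), ' '.join(labels)

def extract_pairs(source_string):
    """Return (string_pair, numeric_pair) for one input line."""
    left, _, right = source_string.partition('=')
    left_number, left_label = split_runs(left)
    right_number, right_label = split_runs(right)
    return left_label + '-' + right_label, left_number + '-' + right_number

print(extract_pairs('1 MONET Data = 1 ABC'))  # ('MONET Data-ABC', '1-1')
print(extract_pairs('1 AM = 0,30$'))          # ('AM-$', '1-0,30')
```

Because whole runs are stripped rather than individual characters, the space inside 'MONET Data' survives while the padding around = is dropped.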

answered Jan 14, 2018 at 15:30

Sean Francis N. Ballais


PythonProgrammi · Accepted Answer · 2018-01-16 18:03:11Z

0

import re

str = ['10000 ABC = 1 DEF',
       '1 AM = 0,30$',
       '3500 ABC = 1 GTY',
       '1000 HUYT=1ABC',
       '1 MONET Data = 1 ABC']


def getThePat(pat):
    for i in str:
        i = i.split("=")
        x = re.findall(pat, i[0])
        y = re.findall(pat, i[1])
        print(" ".join(x), "-", " ".join(y))


# Raw strings keep the backslashes intact and avoid invalid-escape warnings.
pat1 = r"\$+|[a-z]+|[A-Z][a-z]+|[A-Z]+"
pat2 = r"\d+|,+"
getThePat(pat1)
getThePat(pat2)

output:

ABC - DEF
AM - $
ABC - GTY
HUYT - ABC
MONET Data - ABC
10000 - 1
1 - 0 , 30
3500 - 1
1000 - 1
1 - 1
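The output above splits 0,30 into three tokens because the comma is matched separately. A possible variant, sketched here under my own assumptions rather than taken from the answer, treats the decimal comma as part of the number and returns the pairs instead of printing them. LABEL, NUMBER and find_pairs are hypothetical names, and LABEL simplifies the original letter pattern to a single [A-Za-z]+ run, so a mixed-case word is kept as one token.

```python
import re

LABEL = r"\$+|[A-Za-z]+"    # runs of '$' or runs of letters
NUMBER = r"\d+(?:,\d+)?"    # digits with an optional decimal-comma part

def find_pairs(pattern, lines):
    """Return one 'left - right' string per line for the given pattern."""
    pairs = []
    for line in lines:
        left, _, right = line.partition("=")
        left_text = " ".join(re.findall(pattern, left))
        right_text = " ".join(re.findall(pattern, right))
        pairs.append(left_text + " - " + right_text)
    return pairs

lines = ['10000 ABC = 1 DEF',
         '1 AM = 0,30$',
         '3500 ABC = 1 GTY',
         '1000 HUYT=1ABC',
         '1 MONET Data = 1 ABC']

print(find_pairs(LABEL, lines))
print(find_pairs(NUMBER, lines))
```

Returning a list makes the result reusable, for example to build a dictionary or write a CSV file, which printing inside the function does not allow.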

answered Jan 16, 2018 at 18:03

PythonProgrammi
