3

I have the following strings:

'10000 ABC = 1 DEF'
'1 AM = 0,30$'
'3500 ABC = 1 GTY'
'1000 HUYT=1ABC'
'1 MONET Data = 1 ABC'

I want to find a flexible way to extract the numeric and string values from the left and right sides of =. I do not know all possible string values, so I cannot pre-define them. The only thing I know is that the left and right sides are separated by =.

The goal is to get this result for the above-given example:

String-pairs:

ABC-DEF
AM-$
ABC-GTY
HUYT-ABC
MONET Data-ABC

Numeric-pairs:

10000-1
1-0.30
3500-1
1000-1
1-1

I tried using .lstrip('...') and .rstrip('...'), but they do not give me the expected result.

  • Can you show us your code? It looks like you need regex. Commented Jan 14, 2018 at 15:20
  • Should '1 MONET Data = 1 ABC' result in 'MONET Data-ABC' and not 'MONETData-ABC' when getting the string pair? Commented Jan 14, 2018 at 15:36
  • @SeanFrancisN.Ballais: there should be a space between MONET and Data. Commented Jan 14, 2018 at 15:37
  • Where is your code? This problem is riddled with nasty edge cases, and seeing your attempt would make it much easier to help you. Commented Jan 14, 2018 at 15:38

3 Answers

3

Remove the unwanted characters and replace the = with a -.

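import re

# Named "strings" rather than "str" so the built-in str type is not shadowed
strings = ['10000 ABC = 1 DEF',
    '1 AM = 0,30$',
    '3500 ABC = 1 GTY',
    '1000 HUYT=1ABC',
    '1 MONET Data = 1 ABC']

String_pairs = []
Numeric_pairs = []

for s in strings:
    # Drop the numbers (with an optional ",dd" part), then turn "=" into "-"
    String_pairs.append(re.sub(r'\s*=\s*', '-', re.sub(r'\s*\d+(,\d+)?\s*', '', s)))
    # Drop everything that is not a digit, a comma or "=", then turn "=" into "-"
    Numeric_pairs.append(re.sub(r'\s*=\s*', '-', re.sub(r'\s*[^\d,=]+\s*', '', s)))

print(String_pairs)
print(Numeric_pairs)

Result:

['ABC-DEF', 'AM-$', 'ABC-GTY', 'HUYT-ABC', 'MONET Data-ABC']
['10000-1', '1-0,30', '3500-1', '1000-1', '1-1']

The decimal comma in '0,30' is kept as is; append .replace(',', '.') to the numeric expression if you need '0.30'.

Or, more concisely, as list comprehensions (with the same result):

String_pairs = [re.sub(r'\s*=\s*', '-', re.sub(r'\s*\d+(,\d+)?\s*', '', s)) for s in strings]
Numeric_pairs = [re.sub(r'\s*=\s*', '-', re.sub(r'\s*[^\d,=]+\s*', '', s)) for s in strings]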
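If you would rather parse each side explicitly than delete characters, here is a minimal sketch. It assumes every side looks like "number, optional space, label", which holds for the five sample strings but may not hold for your real data. The names SIDE, split_side and parse_line are made up for this example, and the decimal comma is normalised to a dot to match the output asked for in the question.

```python
import re

# Assumed shape of one side: optional spaces, a number such as 10000 or 0,30,
# optional spaces, then a free-form label (possibly containing spaces).
SIDE = re.compile(r'\s*(\d+(?:[.,]\d+)?)\s*(.*?)\s*')


def split_side(side):
    """Return (number, label) for one side of the '='."""
    match = SIDE.fullmatch(side)
    if match is None:
        raise ValueError('unexpected side format: %r' % side)
    number, label = match.groups()
    # Normalise the decimal comma to a dot
    return number.replace(',', '.'), label


def parse_line(line):
    """Return (string_pair, numeric_pair) for one 'left = right' line."""
    left, sep, right = line.partition('=')
    if not sep:
        raise ValueError('no "=" found in %r' % line)
    left_number, left_label = split_side(left)
    right_number, right_label = split_side(right)
    return (left_label + '-' + right_label,
            left_number + '-' + right_number)


for line in ['10000 ABC = 1 DEF', '1 AM = 0,30$', '1 MONET Data = 1 ABC']:
    print(parse_line(line))
```

Raising ValueError on a side that does not fit the assumed shape makes unexpected inputs visible instead of silently producing a wrong pair.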

1

As an alternative to regex, you could loop over each string and keep only the relevant characters, along the lines of the following.

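def extract_string_pairs(source_string):
    string_pair = ''
    for c in source_string:
        if c.isalpha() or c == '$':
            string_pair += c
        elif c == ' ' and string_pair and string_pair[-1].isalpha():
            # Keep a space inside a name such as 'MONET Data'
            string_pair += c
        elif c == '=':
            # Drop the space before '=' and mark the boundary with '-'
            string_pair = string_pair.rstrip() + '-'

    return string_pair.rstrip()

def extract_numeric_pairs(source_string):
    numeric_pair = ''
    for c in source_string:
        if c.isdigit():
            numeric_pair += c
        elif c in '.,':
            # Treat both '.' and ',' as the decimal separator
            numeric_pair += '.'
        elif c == '=':
            numeric_pair += '-'

    return numeric_pair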
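A variation on the same regex-free idea is to split on = first and then peel the leading number off each side, so that spaces inside a label survive untouched. This is only a sketch: the helper names are made up for this example, and it assumes the number always comes first on each side, as it does in the sample strings.

```python
def split_number_and_label(side):
    """Split one side such as '0,30$' into ('0.30', '$')."""
    side = side.strip()
    i = 0
    # Walk over the leading digits and decimal separators
    while i < len(side) and (side[i].isdigit() or side[i] in '.,'):
        i += 1
    # Normalise the decimal comma to a dot; the rest is the label
    return side[:i].replace(',', '.'), side[i:].strip()


def extract_pairs(source_string):
    """Return (string_pair, numeric_pair) for one 'left = right' string."""
    left, _, right = source_string.partition('=')
    left_number, left_label = split_number_and_label(left)
    right_number, right_label = split_number_and_label(right)
    return (left_label + '-' + right_label,
            left_number + '-' + right_number)


print(extract_pairs('1 MONET Data = 1 ABC'))
print(extract_pairs('1 AM = 0,30$'))
```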


0
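import re

# Named "strings" rather than "str" so the built-in str type is not shadowed
strings = ['10000 ABC = 1 DEF',
           '1 AM = 0,30$',
           '3500 ABC = 1 GTY',
           '1000 HUYT=1ABC',
           '1 MONET Data = 1 ABC']


def getThePat(pat):
    for i in strings:
        # Split into the left and right sides of "="
        i = i.split("=")
        x = re.findall(pat, i[0])
        y = re.findall(pat, i[1])
        print(" ".join(x), "-", " ".join(y))


# Raw strings keep the backslashes from being read as string escapes
pat1 = r"\$+|[a-z]+|[A-Z][a-z]+|[A-Z]+"  # "$" signs and words
pat2 = r"\d+|,+"                         # runs of digits and commas
getThePat(pat1)
getThePat(pat2)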

output:

ABC - DEF
AM - $
ABC - GTY
HUYT - ABC
MONET Data - ABC
10000 - 1
1 - 0 , 30
3500 - 1
1000 - 1
1 - 1
