extract and filter values from string list python

Question

I have a list that looks like the one below. The "error" substring always starts with the special character "‘", so I was able to get just the errors with something like this:

a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
newlist = [x.split('‘')[1] for x in a]
print(newlist)

and the output would look like this

['ARDUINO_I2C_nI2C', 'RPY_I2C_BASE_ADDR_LIST', 'RPY_I2C_IRQ_LIST']

But now I also need to get the name of the file related to each error. The file name always starts with a numeric substring, which I also need to remove. The output I want would look like this:

   ['ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'], ['rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'], ['rpy_i2c.c','RPY_I2C_IRQ_LIST']

I'd appreciate any suggestions. Thanks.

Why can't you use similar logic to split out the file name? The file name is preceded by digits and followed by a colon. Don't try to shove everything into a list comprehension; first write a regular loop. Later, you can try to condense it into a list comprehension if possible.
– pho, Commented Dec 7, 2022 at 18:40
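
The loop this comment suggests might look like the following minimal sketch. It assumes every entry has a colon right after the file name and a "‘" character before the error name; the variable names (entry, prefix, filename, error) are illustrative and not from the original post.

```python
a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C',
     ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST',
     ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']

newlist = []
for entry in a:
    # Everything before the first colon is "<digits><file name>"
    prefix = entry.split(':')[0].strip()
    # Drop the leading digits (assumes the file name itself does not start with a digit)
    filename = prefix.lstrip('0123456789')
    # Everything after the ‘ character is the error name
    error = entry.split('‘')[1]
    newlist.append([filename, error])

print(newlist)
```

Note that lstrip removes every leading digit, so a file whose real name begins with a digit would lose that digit too.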

pho · Accepted Answer · 2022-12-07 19:02:14Z

1

You could use a regular expression to capture the required parts of your string. For example, the following regex:

\d+([^:]+):.*‘(.*)$

Explanation:
-----------
\d+                     : One or more digits
   (     )    (  )      : Capturing groups
    [^:]+               : One or more non-colon characters (in capturing group 1)
          :             : One colon
           .*           : Any number of any character
             ‘          : The ‘ character
               .*       : Any number of any character (in capturing group 2)
                  $     : End of string

To use it:

import re

regex = re.compile(r"\d+([^:]+):.*‘(.*)$")

newlist = [regex.search(s).groups() for s in a]

which gives a list of tuples:

[('ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'),
 ('rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'),
 ('rpy_i2c.c', 'RPY_I2C_IRQ_LIST')]

If you really want a list of lists, you can convert the result of .groups() to a list:

newlist = [list(regex.search(s).groups()) for s in a]
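
One thing to watch for: regex.search(s) returns None when a string does not match, so the comprehensions above would raise an AttributeError on such a line. The following is a minimal sketch of one way to skip non-matching lines. The helper name extract_pairs and the second sample line are hypothetical and only there for illustration.

```python
import re

regex = re.compile(r"\d+([^:]+):.*‘(.*)$")

def extract_pairs(lines):
    """Return [file, name] pairs, skipping lines the pattern does not match."""
    pairs = []
    for s in lines:
        m = regex.search(s)
        if m is None:
            # No match (e.g. a line without the ‘ character): skip it
            continue
        pairs.append(list(m.groups()))
    return pairs

# The second entry is a made-up line that does not fit the pattern
sample = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C',
          'make: *** [all] Error 1']
print(extract_pairs(sample))
```

Whether to skip, log, or raise on a non-matching line depends on how clean the input is expected to be; skipping is just one option.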

edited Dec 7, 2022 at 19:02

answered Dec 7, 2022 at 18:54

pho


2 Comments

pekoms Over a year ago

Sweet, I'm still a noob in Python. I was just looking into that re library. I really appreciate the explanation.

pho Over a year ago

@pekoms you can find more information about regular expressions here: regular-expressions.info

Ajay Pun Magar · Accepted Answer · 2022-12-07 19:07:26Z

0

I wrote this code to produce the exact result you asked for, though there may be more efficient ways. It splits each string and uses a regex to strip the leading number.

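import re

a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
r = []
for x in a:
    # Split into the "file:line:col" prefix and the error name
    d = x.split(": error: ‘")
    # Keep the text before the first colon, then strip only the leading digits
    name = re.sub(r"^\d+", "", d[0].split(":")[0].strip())
    r.append([name, d[1]])
print(r)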

edited Dec 7, 2022 at 19:07

answered Dec 7, 2022 at 18:55

Ajay Pun Magar


4 Comments

pho Over a year ago

You haven't used regex

Ajay Pun Magar Over a year ago

@PranavHosangadi Yes, I made a mistake; I'm fixing it. Thank you.

pho Over a year ago

You can do "[0-9]{3}" or r"\d{3}" to match three digits, no need to multiply the string three times

Ajay Pun Magar Over a year ago

Thank you for your suggestion. I don't know much about regex; I've only learnt a little. I will consider your suggestion.
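
A caveat about the "[0-9]{3}" and r"\d{3}" patterns discussed in these comments: they match any run of exactly three digits, anywhere in the string. The sketch below compares that with a pattern anchored to the start of the string. The two prefixes are hypothetical and not taken from the question.

```python
import re

# Hypothetical prefixes: a 4-digit number, and a digit run inside the file name
prefixes = ['1234main.c', '276sensor_100hz.c']

results = []
for p in prefixes:
    unanchored = re.sub(r"\d{3}", "", p)  # removes every run of three digits
    anchored = re.sub(r"^\d+", "", p)     # removes only the leading digits
    results.append((p, unanchored, anchored))
    print(p, unanchored, anchored)
```

With the question's data both patterns give the same result, because every prefix is exactly three digits and no file name contains a three-digit run; the anchored form only matters if that assumption stops holding.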
