2

So I have an array that looks like the one below. The "error" substring always starts with the special character "‘", so I was able to get just the errors with something like this:

a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
newlist = [x.split('‘')[1] for x in a]
print(newlist)

and the output would look like this

['ARDUINO_I2C_nI2C', 'RPY_I2C_BASE_ADDR_LIST', 'RPY_I2C_IRQ_LIST']  

But now I also need to get the name of the file related to each error. The file name always starts with a numeric substring that I also need to remove. The output I want would look like this:

[['ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'], ['rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'], ['rpy_i2c.c', 'RPY_I2C_IRQ_LIST']]

I'd appreciate any suggestions. Thanks.

1
  • Why can't you use similar logic to split out the file name? The file name is preceded by numbers and followed by a colon. Don't try to shove everything into a list comprehension; first write a regular loop. Later, you can try to condense it into a list comprehension if possible. Commented Dec 7, 2022 at 18:40
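
The loop suggested in this comment could look roughly like the sketch below. It assumes every string has the shape shown in the question (optional space, digits, file name, a colon, and later the ‘ character followed by the error name). The helper name split_entry is made up for illustration.

```python
a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C',
     ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST',
     ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']


def split_entry(entry):
    """Return [file_name, error_name] for one entry (hypothetical helper)."""
    # Everything before the first colon is the numeric prefix plus the file name.
    prefix_and_name = entry.split(':')[0].strip()
    # Drop only the leading digits; digits later in the name (the 2 in "i2c") stay.
    file_name = prefix_and_name.lstrip('0123456789')
    # Same logic as the original one-liner: the error name follows the ‘ character.
    error_name = entry.split('‘')[1]
    return [file_name, error_name]


newlist = []
for entry in a:
    newlist.append(split_entry(entry))

print(newlist)
```

Once the loop works, it condenses to newlist = [split_entry(x) for x in a]. Note that lstrip would also eat digits that belong to a file name starting with a digit, so this only holds if real file names never begin with one.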

2 Answers

1

You could use a regular expression to capture the required parts of your string. For example, the following regex (Try it online):

\d+([^:]+):.*‘(.*)$

Explanation:
-----------
\d+                     : One or more digits
   (     )    (  )      : Capturing groups
    [^:]+               : One or more non-colon characters (in capturing group 1)
          :             : One colon
           .*           : Any number of any character
             ‘          : The ‘ character
               .*       : Any number of any character (in capturing group 2)
                  $     : End of string

To use it:

import re

regex = re.compile(r"\d+([^:]+):.*‘(.*)$")

newlist = [regex.search(s).groups() for s in a]

which gives a list of tuples:

[('ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'),
 ('rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'),
 ('rpy_i2c.c', 'RPY_I2C_IRQ_LIST')]

If you really want a list of lists, you can convert the result of .groups() to a list:

newlist = [list(regex.search(s).groups()) for s in a]
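
One thing to keep in mind: regex.search returns None for a string that does not match, so calling .groups() directly raises AttributeError on such input. A minimal sketch that skips non-matching strings instead, assuming that is the behaviour you want; the middle entry of the list is a made-up line added only to show the skip:

```python
import re

regex = re.compile(r"\d+([^:]+):.*‘(.*)$")

a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C',
     'some unrelated line',  # made-up entry that the regex does not match
     ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']

newlist = []
for s in a:
    match = regex.search(s)
    if match is None:
        # No file name / error name pair in this string, so skip it.
        continue
    newlist.append(list(match.groups()))

print(newlist)
```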

2 Comments

Sweet, I'm still a noob in Python. I was just looking into that re library. I really appreciate the explanation.
@pekoms you can find more information about regular expressions here: regular-expressions.info
0

I wrote this code to get exactly the result you want, though there may be more efficient ways. I split each string on the error marker and used a regex to strip the numeric prefix from the file name.

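import re

a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', '248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
r = []
for x in a:
    # d[0] is everything before the error marker, d[1] is the error name
    d = x.split(": error: ‘")
    # Take the text before the first colon, trim whitespace, and remove the three-digit prefix
    filename = re.sub(r"[0-9]{3}", "", d[0].split(":")[0].strip())
    r.append([filename, d[1]])
print(r)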
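
One caveat when stripping the numeric prefix with re.sub: an unanchored pattern such as "[0-9]{3}" removes every run of three digits, including one inside the file name itself, and it misses prefixes that are not exactly three digits long. A small sketch comparing it with a pattern anchored to the start of the string; the file name below is hypothetical and not part of the question's data:

```python
import re

name = "248i2c_123.c"  # hypothetical: numeric prefix plus digits inside the name

# Unanchored: both "248" and "123" are removed.
unanchored = re.sub(r"[0-9]{3}", "", name)

# Anchored with ^ and using +: only the leading run of digits is removed.
anchored = re.sub(r"^[0-9]+", "", name)

print(unanchored)
print(anchored)
```

For the three strings in the question both patterns give the same file names, so this only matters if other file names can contain longer digit runs.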

4 Comments

You haven't used regex
@PranavHosangadi Yes, that was my mistake. I'm fixing it. Thank you.
You can do "[0-9]{3}" or r"\d{3}" to match three digits, no need to multiply the string three times
Thank you for your suggestion. I don't know much about regex; I have only learnt a little. I will consider your suggestion.
