
I have quite a complicated problem, and I wondered if any of you coding wizards would be able to give me a hand :p

I want to apply two regex patterns with a single lambda expression.
The code is applied to a column of a pandas DataFrame.

The lambda is called for every element in the column. If the string contains a square bracket ('['), one regex pattern has to be applied. If the string doesn't contain a square bracket, the other pattern has to be applied.

The two working regex patterns can be found below.
For the moment they are separated, but I want to combine them.

I have the following code, which works fine:

chunk['http'] = chunk.loc[chunk['Protocol'] == 'HTTP', 'Information'].apply(
                    lambda x: re.sub(r'\b[^A-Z\s]+\b', '', x))


chunk['http'] = chunk.loc[chunk['Protocol'] == 'HTTP', 'Information'].apply(
                lambda x: re.sub(r'\[(.*?)\]', '', x))

The first expression keeps only the values in CAPS. The second expression removes the values between square brackets, brackets included.
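To check what each pattern does on its own, here is a minimal sketch on made-up strings (the sample values are assumptions, not rows from the real 'Information' column). Because `re.sub` is called with an empty replacement, whatever a pattern matches is removed from the string.

```python
import re

# Hypothetical sample values, not taken from the real data.
no_bracket = "GET index"
with_bracket = "GET [abc] POST"

# Pattern 1 removes runs of characters that are neither upper case nor whitespace.
caps_only = re.sub(r'\b[^A-Z\s]+\b', '', no_bracket)

# Pattern 2 removes each bracketed group, brackets included.
without_brackets = re.sub(r'\[(.*?)\]', '', with_bracket)

print(repr(caps_only))         # 'GET '
print(repr(without_brackets))  # 'GET  POST'
```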

I have tried to combine both of them in the next piece of code:

chunk['http'] = chunk.loc[chunk['Protocol'] == 'HTTP', 'Information'].apply(
                    lambda x: re.sub(r'\b[^A-Z\s]+\b', '', x)) \
                    if '[' in x == False\
                    else re.sub(r'\[(.*?)\]', '', x)

However, this raises the following error:

NameError: free variable 'x' referenced before assignment in enclosing scope

2 Answers


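You misplaced a parenthesis: the call to apply was closed right after the first re.sub, so the if/else ended up outside the lambda, where x is not defined. Also, '[' in x == False is a chained comparison that is always false for a string; use '[' not in x instead. It should be

chunk['http'] = chunk.loc[chunk['Protocol'] == 'HTTP', 'Information'].apply(
                    lambda x: re.sub(r'\b[^A-Z\s]+\b', '', x)
                    if '[' not in x
                    else re.sub(r'\[(.*?)\]', '', x))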
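
The condition deserves a separate check. Python reads `'[' in x == False` as the chained comparison `('[' in x) and (x == False)`, which is false for every string, whereas `'[' not in x` tests what the question describes. A minimal sketch, using made-up sample strings:

```python
# Hypothetical sample values: one without and one with a square bracket.
samples = ["GET index", "GET [abc] POST"]

# Chained comparison: ('[' in s) and (s == False), so never True for a string.
chained = [('[' in s == False) for s in samples]

# The intended test: True only when the string has no '['.
intended = [('[' not in s) for s in samples]

print(chained)   # [False, False]
print(intended)  # [True, False]
```

With the chained form, the lambda would always take the else branch, so the first pattern would never be applied.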

A lambda is just a short function that returns the value of its single expression. You can write a named function instead, def function_name(x), and do much more in it than in a lambda. Just remember to return the value at the end!

import re

def function_name(x):
    # The if/else from the question: choose the pattern depending on '['.
    if '[' not in x:
        return re.sub(r'\b[^A-Z\s]+\b', '', x)  # no bracket: apply the upper-case pattern
    # Unlike a lambda, a def has to return its value explicitly.
    return re.sub(r'\[(.*?)\]', '', x)  # bracket present: remove the bracketed parts

chunk['http'] = chunk.loc[chunk['Protocol'] == 'HTTP', 'Information'].apply(function_name)
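
For an end-to-end check, the sketch below runs a version of function_name with the if/else filled in on a tiny stand-in for `chunk`. The DataFrame contents are invented for illustration, since the real data is not shown in the question.

```python
import re
import pandas as pd

def function_name(x):
    # No bracket: apply the upper-case pattern; otherwise remove bracketed parts.
    if '[' not in x:
        return re.sub(r'\b[^A-Z\s]+\b', '', x)
    return re.sub(r'\[(.*?)\]', '', x)

# Made-up stand-in for `chunk`; only the column names come from the question.
chunk = pd.DataFrame({
    'Protocol': ['HTTP', 'HTTP', 'TCP'],
    'Information': ['GET index', 'GET [abc] POST', 'ack [1]'],
})

# Only the HTTP rows are passed to the function; the result is aligned on the index.
chunk['http'] = chunk.loc[chunk['Protocol'] == 'HTTP', 'Information'].apply(function_name)
print(chunk)
```

Rows whose Protocol is not 'HTTP' are never passed to the function, so they end up as NaN in the new column.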

1 Comment

Didn't think about that, the lambda expression is getting a bit messy with all those lines of code.
