Add substrings in column values in Pandas if pattern matches

Question

I have two DataFrames. The first, cell_df, holds cell names and some values for those cells:

cell_name    cell_values
abc1b        (h 1, a 2, a4)
adc2g        (h 2, a 4, a5)
daf1g        (h 3, a 7, a2)
adg2d        (h 1, a 4, a4)

The second one, record_df:

record_id record_values
1        start abc1b 1 2 , daf1g  3 5
2        start adc2g 6 7 , adg2d  6 5
3        start abc1b 10 13 , adc2g  2 3

For each cell_name in record_values, I need to insert the string "from" before its first number, the string "to" between its two numbers, and that cell's cell_values after the second number (that is, before the following comma, or at the end of the string).

Desired output:

record_id record_values
1        start abc1b from 1 to 2 (h 1, a 2, a4), daf1g from 3 to 5 (h 3, a 7, a2)
2        start adc2g from 6 to 7 (h 2, a 4, a5), adg2d from 6 to 5 (h 1, a 4, a4)
3        start abc1b from 10 to 13 (h 1, a 2, a4), adc2g from 2 to 3 (h 2, a 4, a5)

I think my code below produces that, but it takes a few minutes to run even though the DataFrame has just 80 rows.

import re
for cn, cv in cell_df[['cell_name', 'cell_values']].values:
    pattern = r"%s\s+(\d+)\s+(\d+)\s*" % re.escape(cn)  # \1 = first number, \2 = second number
    record_df['record_values'] = record_df['record_values'].apply(lambda x: re.sub(pattern, r"%s from \1 to \2 %s" % (cn, cv), x))

So, the question is: is there any way to speed that up? Maybe a whole different approach?

I am using Python 2.7.
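One possible "whole different approach" (a sketch, not the accepted answer's method): compile a single regular expression that matches any known cell name followed by two numbers, and let a replacement callback look the values up in a dictionary. Each record is then scanned once instead of once per cell name. The names `build_rewriter` and `cell_map` are hypothetical, and the sketch assumes cell names are plain tokens and the numbers are non-negative integers, as in the sample data. It avoids f-strings, so it should also run on Python 2.7.

```python
import re


def build_rewriter(cell_map):
    """Return a function that rewrites 'name n1 n2' as 'name from n1 to n2 values'.

    cell_map is a hypothetical dict mapping cell_name -> cell_values.
    """
    # Longest names first, so a name that is a prefix of another cannot match early.
    names = sorted(cell_map, key=len, reverse=True)
    # One alternation of all escaped cell names, then two integers, then any
    # trailing whitespace (so 'abc1b 1 2 , ...' does not leave a space before the comma).
    pattern = re.compile(
        r'\b(%s)\s+(\d+)\s+(\d+)\s*' % '|'.join(re.escape(n) for n in names)
    )

    def replace(match):
        name, start, end = match.groups()
        # A callback's return value is inserted literally (no backslash processing).
        return '%s from %s to %s %s' % (name, start, end, cell_map[name])

    def rewrite(text):
        # Single scan of the string; every match is handled by the callback.
        return pattern.sub(replace, text)

    return rewrite


cell_map = {
    'abc1b': '(h 1, a 2, a4)',
    'adc2g': '(h 2, a 4, a5)',
    'daf1g': '(h 3, a 7, a2)',
    'adg2d': '(h 1, a 4, a4)',
}
rewrite = build_rewriter(cell_map)
print(rewrite('start abc1b 1 2 , daf1g  3 5'))
# start abc1b from 1 to 2 (h 1, a 2, a4), daf1g from 3 to 5 (h 3, a 7, a2)
```

With pandas this would presumably be applied as `record_df['record_values'] = record_df['record_values'].apply(rewrite)`, building the mapping with `dict(zip(cell_df['cell_name'], cell_df['cell_values']))`. Unlike a plain dictionary lookup, a name that is not in cell_map is simply left unchanged.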

piRSquared · Accepted Answer · 2018-06-26 13:37:15Z


With Python 3.6 f-strings

Create a dictionary from cell_df that maps each cell_name to its cell_values:

m = dict(cell_df.values)
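dict(cell_df.values) works because each row of the two-column array is a (name, values) pair, and dict() accepts any iterable of key/value pairs. A minimal sketch of the same idea with plain lists; `names` and `values` are hypothetical stand-ins for the two columns:

```python
# Hypothetical stand-ins for cell_df['cell_name'] and cell_df['cell_values']
names = ['abc1b', 'adc2g', 'daf1g', 'adg2d']
values = ['(h 1, a 2, a4)', '(h 2, a 4, a5)', '(h 3, a 7, a2)', '(h 1, a 4, a4)']

# zip pairs the columns element by element; dict() turns the pairs into a lookup table
m = dict(zip(names, values))
print(m['daf1g'])  # (h 3, a 7, a2)
```

With pandas, `dict(zip(cell_df['cell_name'], cell_df['cell_values']))` builds the same mapping without relying on cell_df having exactly these two columns in this order.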

def fmt(rec):
    pre, txt = rec.split(maxsplit=1)
    return pre + ' ' + ', '.join(
        f'{a} from {b} to {c} {m[a]}'
        for a, b, c in map(str.split, map(str.strip, txt.split(',')))
    )

record_df.record_values.apply(fmt)

0    start abc1b from 1 to 2 (h 1, a 2, a4), daf1g ...
1    start adc2g from 6 to 7 (h 2, a 4, a5), adg2d ...
2    start abc1b from 10 to 13 (h 1, a 2, a4), adc2...
Name: record_values, dtype: object

pre, txt = rec.split(maxsplit=1) splits off the leading "start" token and binds it to pre. This leaves txt holding the name/number triples we want to reformat.
Then I split(',') the value in txt
For each element of that split, I strip off excess whitespace
Then I split those results on whitespace
This yields an iterable of lists, each of which should have length 3
I unpack those 3 values into a, b, and c
Then I reformat them with an f-string or the str.format method
Finally, I put everything back together with ', '.join and prepend pre
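The steps above can be traced on a single record. This sketch runs each step separately so the intermediate values are visible; `m` here is a hypothetical two-entry stand-in for dict(cell_df.values), and str.format is used so it also runs before Python 3.6:

```python
rec = 'start abc1b 1 2 , daf1g  3 5'
m = {'abc1b': '(h 1, a 2, a4)', 'daf1g': '(h 3, a 7, a2)'}

# Split off the leading token: pre == 'start'
pre, txt = rec.split(None, 1)

# Split on commas: ['abc1b 1 2 ', ' daf1g  3 5']
segments = txt.split(',')

# Strip, then split on whitespace: [['abc1b', '1', '2'], ['daf1g', '3', '5']]
triples = [s.strip().split() for s in segments]

# Unpack each triple into a, b, c and format it
parts = ['{} from {} to {} {}'.format(a, b, c, m[a]) for a, b, c in triples]

# Join the pieces and re-attach the prefix
result = pre + ' ' + ', '.join(parts)
print(result)
# start abc1b from 1 to 2 (h 1, a 2, a4), daf1g from 3 to 5 (h 3, a 7, a2)
```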

Pre-Python 3.6

m = dict(cell_df.values)

def fmt(rec):
    pre, txt = rec.split(None, 1)
    return pre + ' ' + ', '.join(
        '{} from {} to {} {}'.format(a, b, c, m[a])
        for a, b, c in map(str.split, map(str.strip, txt.split(',')))
    )

record_df.record_values.apply(fmt)

Tailored for OP

m = dict(cell_df.values)

def fmt(rec):
    pre, txt = rec.split(None, 1)
    return pre + ' ' + ', '.join(
        '{} from {} to {} {}'.format(a, b, c, m[a])
        for a, b, c in map(str.split, map(str.strip, map(str, txt.split(','))))
    )

record_df.record_values.apply(fmt)
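"ValueError: too many values to unpack" means that at least one comma-separated segment in the real data does not split into exactly three tokens. A hedged sketch for locating such segments, with hypothetical helper names `find_bad_segments` and `fmt_checked`: the first lists the malformed segments of a record, the second behaves like fmt but raises an error naming the offending segment. Both call .split() as a method rather than through str.split, which should avoid the str/unicode descriptor error on Python 2.7.

```python
def find_bad_segments(rec):
    """Return the comma-separated segments of rec that are not 'name n1 n2'."""
    parts = rec.split(None, 1)
    if len(parts) < 2:
        # Nothing after the leading token at all.
        return [rec]
    return [seg for seg in parts[1].split(',') if len(seg.split()) != 3]


def fmt_checked(rec, cell_map):
    """Like fmt, but raise a ValueError that names the offending segment."""
    pre, txt = rec.split(None, 1)
    out = []
    for seg in txt.split(','):
        tokens = seg.split()  # method call: same behaviour for str and unicode
        if len(tokens) != 3:
            raise ValueError('expected "name n1 n2", got %r in %r' % (seg, rec))
        a, b, c = tokens
        out.append(u'{} from {} to {} {}'.format(a, b, c, cell_map[a]))
    return pre + ' ' + ', '.join(out)


cell_map = {'abc1b': '(h 1, a 2, a4)', 'daf1g': '(h 3, a 7, a2)'}
print(fmt_checked('start abc1b 1 2 , daf1g  3 5', cell_map))
print(find_bad_segments('start abc1b 1 2 x , daf1g 3 5'))  # ['abc1b 1 2 x ']
```

On the DataFrame, something like `bad = record_df['record_values'].map(find_bad_segments)` followed by `record_df[bad.map(len) > 0]` should show which rows break the three-token assumption.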

edited Jun 26, 2018 at 13:37

answered Jun 26, 2018 at 12:51

piRSquared


7 Comments

jovicbg Over a year ago

I get "invalid syntax" at the end of this part: f'{a} from {b} to {c} {m[a]}'. I can't figure out where exactly.

jovicbg Over a year ago

Now I get an error on the line pre, txt = rec.split(maxsplit=1): TypeError: split() takes no keyword arguments. By the way, thank you very much for helping. :)

jovicbg Over a year ago

TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'

piRSquared Over a year ago

I added an additional map to convert stuff to strings. Not sure if that'll work.

jovicbg Over a year ago

Unfortunately it won't: ValueError: too many values to unpack. To be honest, I don't understand what you have done here (in the fmt function), so I can't make even minor changes to get this working without you.
