I have two dataframes. The first holds cell names and the values for those cells:

cell_df:

cell_name    cell_values
abc1b        (h 1, a 2, a4)
adc2g        (h 2, a 4, a5)
daf1g        (h 3, a 7, a2)
adg2d        (h 1, a 4, a4)

And the other one:

record_df:

record_id record_values
1        start abc1b 1 2 , daf1g  3 5
2        start adc2g 6 7 , adg2d  6 5
3        start abc1b 10 13 , adc2g  2 3

For each cell_name that appears in record_values, I need to insert the string "from" before the first number and the string "to" between the two numbers, and then put that cell's cell_values before the following comma (or at the end of the string for the last cell).

Desired output:

record_id record_values
1        start abc1b from 1 to 2 (h 1, a 2, a4), daf1g from 3 to 5 (h 3, a 7, a2)
2        start adc2g from 6 to 7 (h 2, a 4, a5), adg2d from 6 to 5 (h 1, a 4, a4)
3        start abc1b from 10 to 13 (h 1, a 2, a4), adc2g from 2 to 3 (h 2, a 4, a5)

I think my code below does that, but it is very slow: it takes a few minutes to run even though the dataframe has only 80 rows.

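import re
for cn, cv in cell_df[['cell_name', 'cell_values']].values:
    # Capture only the two numbers, so \1 and \2 refer to them; the trailing \s* drops the space before the comma.
    record_df['record_values'] = record_df['record_values'].apply(lambda x: re.sub(r"%s\s+(\d+)\s+(\d+)\s*" % re.escape(cn), r"%s from \1 to \2 %s" % (cn, cv), x))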

So, the question is: is there any way to speed that up? Maybe a whole different approach?

I am using Python 2.7

1 Answer

With Python 3.6 f-strings

Create a dictionary from cell_df

m = dict(cell_df.values)

def fmt(rec):
    pre, txt = rec.split(maxsplit=1)
    return pre + ' ' + ', '.join(
        f'{a} from {b} to {c} {m[a]}'
        for a, b, c in map(str.split, map(str.strip, txt.split(',')))
    )

record_df.record_values.apply(fmt)

0    start abc1b from 1 to 2 (h 1, a 2, a4), daf1g ...
1    start adc2g from 6 to 7 (h 2, a 4, a5), adg2d ...
2    start abc1b from 10 to 13 (h 1, a 2, a4), adc2...
Name: record_values, dtype: object
  • pre, txt = rec.split(maxsplit=1) splits off the initial "start" token and assigns it to pre. This leaves txt holding the triples we want to reformat.
  • Then I split(',') the value in txt.
  • For each element of that split, I strip off the surrounding whitespace.
  • Then I split each stripped element on whitespace.
  • This should yield an iterable of lists, each of length 3.
  • I unpack those 3 values into a, b, and c.
  • Then I reformat them with an f-string (or str.format), looking up a in m to append its cell_values.
  • Finally, I put everything back together with ', '.join.
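
The steps above can be traced on a single record. This is a minimal sketch in plain Python, assuming the sample data from the question; the dictionary m is written out by hand as a stand-in for dict(cell_df.values), and the intermediate names (segments, triples, parts, result) are only for illustration.

```python
# Stand-in for dict(cell_df.values); values copied from the question.
m = {
    'abc1b': '(h 1, a 2, a4)',
    'daf1g': '(h 3, a 7, a2)',
}

rec = 'start abc1b 1 2 , daf1g  3 5'

# Split off the leading word (positional form, valid on Python 2 and 3).
pre, txt = rec.split(None, 1)
# pre == 'start', txt == 'abc1b 1 2 , daf1g  3 5'

# Split on commas, then strip the surrounding spaces.
segments = [s.strip() for s in txt.split(',')]
# segments == ['abc1b 1 2', 'daf1g  3 5']

# Split each segment on runs of whitespace.
triples = [s.split() for s in segments]
# triples == [['abc1b', '1', '2'], ['daf1g', '3', '5']]

# Unpack each triple, format it, and join the pieces back together.
parts = ['{} from {} to {} {}'.format(a, b, c, m[a]) for a, b, c in triples]
result = pre + ' ' + ', '.join(parts)
```

If any segment splits into more or fewer than three tokens, the unpacking into a, b, c is the step that raises a ValueError.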

Pre-Python 3.6

m = dict(cell_df.values)

def fmt(rec):
    pre, txt = rec.split(None, 1)
    return pre + ' ' + ', '.join(
        '{} from {} to {} {}'.format(a, b, c, m[a])
        for a, b, c in map(str.split, map(str.strip, txt.split(',')))
    )

record_df.record_values.apply(fmt)

Tailored for OP

m = dict(cell_df.values)

def fmt(rec):
    pre, txt = rec.split(None, 1)
    return pre + ' ' + ', '.join(
        '{} from {} to {} {}'.format(a, b, c, m[a])
        for a, b, c in map(str.split, map(str.strip, map(str, txt.split(','))))
    )

record_df.record_values.apply(fmt)
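
As an alternative to splitting (this is not the approach used in the answer above), the substitution can be done in one regex pass per record instead of one pass per cell name. This is a minimal sketch, assuming every cell name in record_values is followed by exactly two integers; annotate is a hypothetical helper name and m is a hand-written stand-in for dict(cell_df.values). It has not been timed against the original loop.

```python
import re

# Stand-in for dict(cell_df.values); values copied from the question.
m = {
    u'abc1b': u'(h 1, a 2, a4)',
    u'adc2g': u'(h 2, a 4, a5)',
    u'daf1g': u'(h 3, a 7, a2)',
    u'adg2d': u'(h 1, a 4, a4)',
}

# One alternation over all cell names, compiled once and reused for every record.
pattern = re.compile(
    r'\b(%s)\s+(\d+)\s+(\d+)\s*' % '|'.join(re.escape(n) for n in m)
)

def annotate(rec):
    # Hypothetical helper: rewrite every "<name> <n1> <n2>" occurrence in rec.
    def repl(match):
        name, first, second = match.groups()
        # A function replacement avoids backslash handling in the inserted text.
        return u'%s from %s to %s %s' % (name, first, second, m[name])
    return pattern.sub(repl, rec)
```

With pandas this would presumably be applied as record_df['record_values'].apply(annotate). Because it uses no str-only methods, it should accept both str and unicode values on Python 2.7.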

Comments

Invalid syntax at the end of this part: f'{a} from {b} to {c} {m[a]}'. I can't figure out where exactly.
Now I get an error for the line pre, txt = rec.split(maxsplit=1): TypeError: split() takes no keyword arguments. By the way, thank you very much for helping. :)
TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'
I added an additional map to convert stuff to strings. Not sure if that'll work.
Unfortunately it won't. ValueError: too many values to unpack. To be honest, I don't understand what you have done in the fmt function, so I can't make even minor changes to get it working without your help.
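
A "too many values to unpack" error in fmt means that at least one comma-separated segment split into more than three tokens. The sketch below is a small diagnostic for finding such segments; bad_segments is a hypothetical helper name, and the second sample call is only one possible cause (a record that already contains inserted cell_values), not something confirmed in this thread.

```python
def bad_segments(rec):
    # Hypothetical helper: return the comma-separated segments of one record
    # that do not split into exactly three whitespace-separated tokens.
    parts = rec.split(None, 1)
    txt = parts[1] if len(parts) > 1 else ''
    return [seg.strip() for seg in txt.split(',') if len(seg.split()) != 3]

# A record in the expected shape has no offending segments.
clean = bad_segments('start abc1b 1 2 , daf1g  3 5')

# A record that already contains "(h 1, a 2, a4)" has extra commas and tokens.
dirty = bad_segments('start abc1b 1 2 (h 1, a 2, a4), daf1g 3 5')
```

Applying the helper to each value of record_values and printing the non-empty results should show which rows break the three-token assumption.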