1

I have the following table:

                         TABLE_NAME  jace  building    equipment
0  R338_1_FAHU2_SUPAIR_TEMP          NaN       NaN        NaN
1  R1001_1_R1005_1_FAHU_1_CO2_SEN2   NaN       NaN        NaN

I wrote this sub-function that analyses one column and (should) return three.

 def fill(tablename='R338_1_FAHU2_SUPAIR_TEMP'):
    jace,building,equipment=re.findall('(^[RP].*?_[1-9])_*(.*?)_(F.*)',tablename)[0]
    if not len(building):
        building=re.findall('(.*)_',jace)[0]
    return jace,building,equipment

this function returns for the first row

('R338_1', 'R338', 'FAHU2_SUPAIR_TEMP')

I want to insert those into jace,building, and equipment columns above I tried this:

df[['jace','building','equipment']]=df['TABLE_NAME'].apply(lambda x: (fill(x)))

it gave me

ValueError: Must have equal len keys and value when setting with an iterable

I also tried axis=1 inside apply(), which seems to contradict with lambda for some reason

Any ideas how to make it work? (I can use fill(x)[0], fill(x)[1], which should solve the issue, but that feels like hard-coding)

0

2 Answers 2

4

Your function should return a Series:

def fill(tablename='R338_1_FAHU2_SUPAIR_TEMP'):
    jace,building,equipment=re.findall('(^[RP].*?_[1-9])_*(.*?)_(F.*)',tablename)[0]
    if not len(building):
        building=re.findall('(.*)_',jace)[0]
    return pd.Series([jace,building,equipment])

df[['jace','building','equipment']] = df['TABLE_NAME'].apply(fill)

output:

                        TABLE_NAME     jace building          equipment
0         R338_1_FAHU2_SUPAIR_TEMP   R338_1     R338  FAHU2_SUPAIR_TEMP
1  R1001_1_R1005_1_FAHU_1_CO2_SEN2  R1001_1  R1005_1    FAHU_1_CO2_SEN2

Now you could also use vectorial code for efficiency:

(df['TABLE_NAME']
 .str.extract('(?P<jace>^[RP].*?_[1-9])_*(?P<building>.*?)_(?P<equipment>F.*)')
 .assign(building=lambda d: d['building'].mask(d['building'].eq(''),
                                               d['jace'].str.extract('(.*)_',
                                                                   expand=False)))
)

output:

      jace building          equipment
0   R338_1     R338  FAHU2_SUPAIR_TEMP
1  R1001_1  R1005_1    FAHU_1_CO2_SEN2
Sign up to request clarification or add additional context in comments.

Comments

1

You can use result_type="expand"

Can you try the following:

def fill(tablename='R338_1_FAHU2_SUPAIR_TEMP'):
    jace,building,equipment = re.findall(
        '(^[RP].*?_[1-9])_*(.*?)_(F.*)',
        tablename['TABLE_NAME'])[0]
    if not len(building):
        building=re.findall('(.*)_',jace)[0]
    return jace, building, equipment

df[['jace','building','equipment']] = df[['TABLE_NAME']].apply(
    fill, axis=1, result_type="expand")

2 Comments

There is no result_type parameter for Series.apply (only for DataFrame)
@mozway Yeah, I modified it to dataframe apply. Your solution looks cool..

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.