1

I have a dataframe that is consist of 3 columns where one of these columns includes as values 2 values separated with - in one record.

I want to another columns that includes all these values but with one value each record

For this I created a function but when i run the code it crash and display the below error:

KeyError: "['loc1-loc2'] not found in axis"
Traceback:
File "f:\aienv\lib\site-packages\streamlit\script_runner.py", line 333, in _run_script
    exec(code, module.__dict__)
File "F:\AIenv\streamlit\app.py", line 346, in <module>
    df[y] = df[y].apply(splitting)
File "f:\aienv\lib\site-packages\pandas\core\series.py", line 4200, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas\_libs\lib.pyx", line 2401, in pandas._libs.lib.map_infer
File "F:\AIenv\streamlit\app.py", line 295, in splitting
    df1=df.drop(record,axis=1).join(record.str.split("-",expand=True).stack().reset_index(level=1,drop=True).rename("spltting"))
File "f:\aienv\lib\site-packages\pandas\core\frame.py", line 4169, in drop
    errors=errors,
File "f:\aienv\lib\site-packages\pandas\core\generic.py", line 3884, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "f:\aienv\lib\site-packages\pandas\core\generic.py", line 3918, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
File "f:\aienv\lib\site-packages\pandas\core\indexes\base.py", line 5278, in drop
    raise KeyError(f"{labels[mask]} not found in axis")

Code:

import numpy as np
import pandas as pd

df =pd.DataFrame({
            "source_number":                        [ 
             [11199,11328,11287,32345,12342,1232,13456,123244,13456],
             "location":          
             ["loc2","loc1-loc3","loc3","loc1","loc2-loc1","loc2","loc3-loc2","loc2","loc1"],
              "category": 
             ["cat1","cat2","cat1","cat3","cat3","cat3","cat2","cat3","cat2"],
             })    

def splitting(record):
    df1=df.drop(record,axis=1).join(record.str.split("-",expand=True).stack().reset_index(level=1,drop=True).rename("spltting"))
    return df1
for y in df.columns:
      df[y] = df[y].apply(splitting)

1 Answer 1

3

I think you need DataFrame.explode with split column passed to function:

def splitting(df, r):

    df[r] = df[r].str.split("-")
    return df.explode(r)

df = splitting(df, 'location')
print (df)
   source_number location category
0          11199     loc2     cat1
1          11328     loc1     cat2
1          11328     loc3     cat2
2          11287     loc3     cat1
3          32345     loc1     cat3
4          12342     loc2     cat3
4          12342     loc1     cat3
5           1232     loc2     cat3
6          13456     loc3     cat2
6          13456     loc2     cat2
7         123244     loc2     cat3
8          13456     loc1     cat2
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.