Removing characters from the dataframe python

Question

I want to remove a substring from the values in some columns of a DataFrame. For example, I want to remove b"SET and b"MULTISET from the df columns. How can I achieve that? The data and the required output are shown below.

columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill'] 
df = pd.DataFrame(data=t, columns=columns)
df
    
        cust_id     cust_name                   vehicle                             details                                                 bill
0   101         b"SET{'Tom','C'}"           b"MULTISET{'Toyota','Cruiser'}"     b"ROW('Street 1','12345678','NewYork, US')"             1200.00
1   102         b"SET{'Rachel','Green'}"    b"MULTISET{'Ford','se'}"            b"ROW('Street 2','12344444','Florida, US')"             2400.00
2   103         b"SET{'Chandler','Bing'}"   b"MULTISET{'Dodge','mpv'}"          b"ROW('Street 1','12345555','Georgia, US')"             601.10

Required Output:

    cust_id     cust_name                   vehicle                             details                                         bill
0   101         {'Tom','C'}                 {'Toyota','Cruiser'}            ('Street 1','12345678','NewYork, US')               1200.00
1   102         {'Rachel','Green'}          {'Ford','se'}                   ('Street 2','12344444','Florida, US')               2400.00
2   103         {'Chandler','Bing'}         {'Dodge','mpv'}                 ('Street 1','12345555','Georgia, US')               601.10

The output of print(df) is as below:

    cust_id     cust_name                   vehicle                             details                                         bill
0   101         b"SET{'Tom','C'}"           b"MULTISET{'Toyota','Cruiser'}"     b"ROW('Street 1','12345678','NewYork, US')"     1200.00
1   102         b"SET{'Rachel','Green'}"    b"MULTISET{'Ford','se'}"            b"ROW('Street 2','12344444','Florida, US')"     2400.00
2   103         b"SET{'Chandler','Bing'}"   b"MULTISET{'Dodge','mpv'}"          b"ROW('Street 1','12345555','Georgia, US')"     601.10

– Shilpa S Jadhav, Commented Aug 13, 2020 at 9:20
Hi Sushanth, sorry, I got confused and pasted the print(df) output. The print(t) output is as below: [(101, b"SET{'Tom','C'}", b"MULTISET{'Toyota','Cruiser'}", b"ROW('Street 1','12345678','NewYork, US')", 1200.0), (102, b"SET{'Rachel','Green'}", and so on.
– Shilpa S Jadhav, Commented Aug 13, 2020 at 14:41

sushanth · Accepted Answer · 2020-08-13 16:20:32Z

1

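Here is a possible solution.

First, define the columns of interest:

columns = ['cust_name', 'vehicle', 'details']

Use a regular expression to extract the value between {} or (), brackets included:

regex_ = r"([{(].*[})])"

Putting it together: str.decode('ascii') converts the column values from bytes to str, and str.extract keeps only the part matched by the capture group.

columns = ['cust_name', 'vehicle', 'details']

# Capture everything from the first { or ( to the last } or )
regex_ = r"([{(].*[})])"

for col in columns:
    # Decode bytes to str, then keep only the bracketed part
    df[col] = df[col].str.decode('ascii').str.extract(regex_, expand=False)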

   cust_id            cust_name  ...                                details    bill
0      101          {'Tom','C'}  ...  ('Street 1','12345678','NewYork, US')  1200.0
1      102   {'Rachel','Green'}  ...  ('Street 2','12344444','Florida, US')  2400.0
2      103  {'Chandler','Bing'}  ...  ('Street 1','12345555','Georgia, US')   601.1
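
For anyone who wants to try this without the original data source, here is a minimal self-contained sketch. The list t is rebuilt by hand from the rows shown in the question (the asker's real t comes from elsewhere), and the names decoded, extracted and stripped are only illustrative. It also compares the extraction with an alternative that simply removes the leading uppercase keyword using str.replace; on this sample both give the same result, but that is an assumption about the data, not a general guarantee.

```python
import pandas as pd

# Rows copied by hand from the question; the real `t` is produced elsewhere.
t = [
    (101, b"SET{'Tom','C'}", b"MULTISET{'Toyota','Cruiser'}",
     b"ROW('Street 1','12345678','NewYork, US')", 1200.0),
    (102, b"SET{'Rachel','Green'}", b"MULTISET{'Ford','se'}",
     b"ROW('Street 2','12344444','Florida, US')", 2400.0),
    (103, b"SET{'Chandler','Bing'}", b"MULTISET{'Dodge','mpv'}",
     b"ROW('Street 1','12345555','Georgia, US')", 601.1),
]
df = pd.DataFrame(t, columns=['cust_id', 'cust_name', 'vehicle', 'details', 'bill'])

cols = ['cust_name', 'vehicle', 'details']
pattern = r"([{(].*[})])"

# Convert the bytes values to str once, column by column.
decoded = df[cols].apply(lambda s: s.str.decode('ascii'))

# Option 1: keep only the bracketed part.
extracted = decoded.apply(lambda s: s.str.extract(pattern, expand=False))

# Option 2 (alternative): drop the leading keyword such as SET, MULTISET or ROW.
stripped = decoded.apply(lambda s: s.str.replace(r"^[A-Z]+", "", regex=True))

df[cols] = extracted
print(df.to_string())
print("Both options agree:", extracted.equals(stripped))
```

The extraction is the safer choice when the prefix may vary in ways not seen here; the replace variant is shorter when the prefix is known to be a run of uppercase letters.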


4 Comments

Shilpa S Jadhav Over a year ago

As a follow-up to the scenario above: can I access the first value of cust_name? For example, if I want only Tom, how do I do that?

Shilpa S Jadhav Over a year ago

To clarify the expected output: I want to access the individual values of cust_name. In the first row, cust_name[0] should be Tom and cust_name[1] should be 'C'; in the second row, I want to access 'Rachel' and then 'Green'. Is there any way to do that?

sushanth Over a year ago

See this post: stackoverflow.com/a/56842372/4985099

Shilpa S Jadhav Over a year ago

I followed the link above and ran df['cust_name2'] = df['cust_name'].apply(ast.literal_eval). Printing df['cust_name2'] gives: 0 {C, Tom} 1 {Rachel, Green}. But I want the first value of the first row as C and the second as Tom, and from the second row the first value as Rachel and the second as Green.
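
A note on the order problem in the last comment: ast.literal_eval parses "{'Tom','C'}" as a Python set, and sets have no defined order, so positions such as [0] and [1] cannot be relied on. One possible workaround, sketched below, is to pull the quoted items out in their written order with str.findall. It assumes every item is wrapped in single quotes and contains no embedded single quote; the column names first_name and second_name are made up for illustration.

```python
import pandas as pd

# cust_name as it looks after the cleanup step in the answer.
df = pd.DataFrame({
    'cust_name': ["{'Tom','C'}", "{'Rachel','Green'}", "{'Chandler','Bing'}"],
})

# Each match is the text between one pair of single quotes, in source order,
# so every row becomes a list such as ['Tom', 'C'].
parts = df['cust_name'].str.findall(r"'([^']*)'")

# .str[i] indexes into each list; hypothetical column names.
df['first_name'] = parts.str[0]
df['second_name'] = parts.str[1]

print(df)
```

Because the match is on quoted items rather than on commas, the same pattern should also cope with values such as 'NewYork, US' in the details column.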
