
I want to remove a substring from the values in some columns of a DataFrame. For example, I want to remove b"SET and b"MULTISET from the df columns. How can I achieve that? Details are below, followed by the output I need.

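import pandas as pd

# t is a list of row tuples (its print(t) output is quoted in the comments below)
columns = ['cust_id', 'cust_name', 'vehicle', 'details', 'bill']
df = pd.DataFrame(data=t, columns=columns)
df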
    
        cust_id     cust_name                   vehicle                             details                                                 bill
0   101         b"SET{'Tom','C'}"           b"MULTISET{'Toyota','Cruiser'}"     b"ROW('Street 1','12345678','NewYork, US')"             1200.00
1   102         b"SET{'Rachel','Green'}"    b"MULTISET{'Ford','se'}"            b"ROW('Street 2','12344444','Florida, US')"             2400.00
2   103         b"SET{'Chandler','Bing'}"   b"MULTISET{'Dodge','mpv'}"          b"ROW('Street 1','12345555','Georgia, US')"             601.10 

Required Output:

    cust_id     cust_name                   vehicle                             details                                         bill
0   101         {'Tom','C'}                 {'Toyota','Cruiser'}            ('Street 1','12345678','NewYork, US')               1200.00
1   102         {'Rachel','Green'}          {'Ford','se'}                   ('Street 2','12344444','Florida, US')               2400.00
2   103         {'Chandler','Bing'}         {'Dodge','mpv'}                 ('Street 1','12345555','Georgia, US')               601.10 
  • print(t) and include output in the post. Commented Aug 13, 2020 at 9:08
  • The output of print(df) is as below:
       cust_id  cust_name                  vehicle                          details                                      bill
    0  101      b"SET{'Tom','C'}"          b"MULTISET{'Toyota','Cruiser'}"  b"ROW('Street 1','12345678','NewYork, US')"  1200.00
    1  102      b"SET{'Rachel','Green'}"   b"MULTISET{'Ford','se'}"         b"ROW('Street 2','12344444','Florida, US')"  2400.00
    2  103      b"SET{'Chandler','Bing'}"  b"MULTISET{'Dodge','mpv'}"       b"ROW('Street 1','12345555','Georgia, US')"  601.10
    Commented Aug 13, 2020 at 9:20
  • Hi Sushanth, sorry, I got confused and pasted the print(df) output. The print(t) output is as below: [(101, b"SET{'Tom','C'}", b"MULTISET{'Toyota','Cruiser'}", b"ROW('Street 1','12345678','NewYork, US')", 1200.0), (102, b"SET{'Rachel','Green'}", and so on. Commented Aug 13, 2020 at 14:41

1 Answer


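Here is a possible solution:

  • Define the columns of interest:
columns = ['cust_name', 'vehicle', 'details']
  • Use a regular expression to extract the text between {} or (), delimiters included:
regex_ = r"([{(].*[})])"
  • Putting it together: str.decode('ascii') converts the column values from bytes to strings, and str.extract keeps only the captured group.
columns = ['cust_name', 'vehicle', 'details']

# Capture from the first { or ( through the last } or ).
# Inside a character class, | is a literal character, so it is left out.
regex_ = r"([{(].*[})])"

for col in columns:
    # expand=False returns a Series rather than a one-column DataFrame
    df[col] = df[col].str.decode('ascii').str.extract(regex_, expand=False)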

   cust_id            cust_name  ...                                details    bill
0      101          {'Tom','C'}  ...  ('Street 1','12345678','NewYork, US')  1200.0
1      102   {'Rachel','Green'}  ...  ('Street 2','12344444','Florida, US')  2400.0
2      103  {'Chandler','Bing'}  ...  ('Street 1','12345555','Georgia, US')   601.1
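
For anyone who wants to try this without the original data source, here is a minimal self-contained sketch. The two sample rows are copied from the question's table, the name pattern is just an illustrative variable, and the regex is a slightly simplified character class rather than anything the library requires.

```python
import pandas as pd

# Two sample rows copied from the question's table
t = [
    (101, b"SET{'Tom','C'}", b"MULTISET{'Toyota','Cruiser'}",
     b"ROW('Street 1','12345678','NewYork, US')", 1200.0),
    (102, b"SET{'Rachel','Green'}", b"MULTISET{'Ford','se'}",
     b"ROW('Street 2','12344444','Florida, US')", 2400.0),
]
df = pd.DataFrame(t, columns=['cust_id', 'cust_name', 'vehicle', 'details', 'bill'])

# Capture everything from the first { or ( through the last } or )
pattern = r"([{(].*[})])"

for col in ['cust_name', 'vehicle', 'details']:
    # Decode bytes to str, then keep only the captured group as a Series
    df[col] = df[col].str.decode('ascii').str.extract(pattern, expand=False)

print(df)
```

Rows whose value contains no braces or parentheses would come back as NaN from str.extract, so it may be worth checking for missing values afterwards.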

4 Comments

As a follow-up to the scenario above, can I access the individual values of cust_name? For example, if I want only Tom, how do I do that?
In the first row I want cust_name[0] to be Tom and cust_name[1] to be 'C'; in the second row I want 'Rachel' and then 'Green'. Is there any way to do this?
I referred to the link above and tried df['cust_name2'] = df['cust_name'].apply(ast.literal_eval). Printing df['cust_name2'] gives the output below: 0 {C, Tom} 1 {Rachel, Green}. But I want the first value of the row as C, then the second as Tom; from the second row, the first value as Rachel and the second as Green.
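
One possible way to get positional access: ast.literal_eval turns "{'Tom','C'}" into a Python set, and sets are unordered, which is why the values come back as {C, Tom}. The sketch below swaps the outer braces for square brackets before parsing, so each value becomes a list in its source order (Tom, then C, as the earlier comment asks). The helper name to_ordered_list and the new column names are hypothetical, and the sketch assumes every value is wrapped in exactly one pair of braces.

```python
import ast
import pandas as pd

# cust_name values as they look after the extraction step in the answer
df = pd.DataFrame({'cust_name': ["{'Tom','C'}", "{'Rachel','Green'}"]})

def to_ordered_list(text):
    # Hypothetical helper: replace the outer {...} with [...] so that
    # literal_eval builds an ordered list instead of an unordered set.
    return ast.literal_eval("[" + text[1:-1] + "]")

names = df['cust_name'].apply(to_ordered_list)

# Pick elements by position from each row's list
df['first_name'] = names.apply(lambda parts: parts[0])
df['second_name'] = names.apply(lambda parts: parts[1])

print(df[['first_name', 'second_name']])
```

If some rows might hold fewer than two values, the positional lookups would need a length check first.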
