How to get a specific part of the string in a pandas column value?

Question

What I want to do is delete a certain part of each string and insert the rest into a new column.

Example:

df = pd.read_excel("sdAll.xlsx")
print(df)

output =

0      asin="ASF23KJSA"
1      asin="SAFSAF3324S"
2      asin="ASFAS213434"
3      asin="1SF23AF2342S"
4      asin="ASF23KJSA"
             ...
424    asin="ASF23KJSA"
425    asin="1SF23AF2342S"
426    asin="ASF23KJSA"
427    asin="BSAFSAF3324S"
428    asin="B095437HDM"

I want to delete the asin="" part and insert the remaining part into another column.

df.head()

 Timeframe Ad Type Start Date   End Date                           Portfolio name Currency  ...    Spend 14 Day Total Sales Total Advertising Cost of Sales (ACOS)  Total Return on Advertising Spend (ROAS)  14 Day Total Orders (#)  14 Day Total Units (#)
0      L30D      SD 2022-11-08 2022-11-08                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0
1      L30D      SD 2022-11-11 2022-12-03                                        -      USD  ...  0.00530                  0                                    NaN                                       0.0                        0                       0
2      L30D      SD 2022-11-09 2022-11-22                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0
3      L30D      SD 2022-11-25 2022-12-04                                        -      USD  ...  0.09434                  0                                    NaN                                       0.0                        0                       0
4      L30D      SD 2022-11-09 2022-11-23                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0

Can you show the output of df.head() so we can check the columns, please?
– rafidini, Commented Dec 9, 2022 at 20:11

Mahdi F. · Accepted Answer · 2022-12-09 20:37:03Z

2

You can use str.replace with a regex that contains a capturing group.

import pandas as pd
df = pd.DataFrame({'old_column' : ['asin="ASF23KJSA"' , 'asin="SAFSAF3324S"', 'asin="ASFAS213434"' , 'asin="1SF23AF2342S"' , 'asin="ASF23KJSA"']})
df['new_column'] = df['old_column'].str.replace(r'asin=\"(.*)\"', r'\1', regex=True)
print(df)

Output:

            old_column    new_column
0     asin="ASF23KJSA"     ASF23KJSA
1   asin="SAFSAF3324S"   SAFSAF3324S
2   asin="ASFAS213434"   ASFAS213434
3  asin="1SF23AF2342S"  1SF23AF2342S
4     asin="ASF23KJSA"     ASF23KJSA

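Explanation:

( : opens the capturing group

.* : means "0 or more of any character"

) : closes the capturing group

\1 in the replacement refers to the text matched by the capturing group, so the whole match asin="..." is replaced by just the part between the quotes.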
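As a minimal sketch of how the capturing group behaves, the snippet below applies the same pattern to made-up values (the lowercase value and the non-matching string are assumptions added for illustration, not data from the question). Because the pattern only fixes the asin=" prefix and the closing quote, it should not matter which value sits between the quotes, and strings that do not match are left unchanged.

```python
import pandas as pd

# Hypothetical sample values; the second one mimics a value from a different file.
s = pd.Series(['asin="ASF23KJSA"', 'asin="shjghsw324"', 'no match here'])

# The whole match (asin="...") is replaced by group 1, the text between the quotes.
result = s.str.replace(r'asin="(.*)"', r'\1', regex=True)

print(result.tolist())
# ['ASF23KJSA', 'shjghsw324', 'no match here']
```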

edited Dec 9, 2022 at 20:37

answered Dec 9, 2022 at 20:16

Mahdi F.


6 Comments

INGl0R1AM0R1 Over a year ago

I think the following capture group would be better, since it does not depend on the string starting with asin and captures only the quoted value: '\"(.*)\"'. Nonetheless, yours is good for the given question, so here is an upvote.

INGl0R1AM0R1 Over a year ago

I would also use str.extract, which works really well for matches based on capture groups.

Shamna Sama Over a year ago

I can't do this solution because the column value changes for each file

Mahdi F. Over a year ago

@ShamnaSama, What do you mean the column value is changing?

Shamna Sama Over a year ago

For example, today's values are asin="ASF23KJSA" asin="SAFSAF3324S" asin="ASFAS213434", and tomorrow they can be like this: asin="shjghsw324" asin="reotoisdgk" asin="asfglassl423"


INGl0R1AM0R1 · Accepted Answer · 2022-12-09 20:25:55Z

1

Why don't you try this:

df.insert_your_col_name.str.split('=').str[-1].str.replace('"', '').str.strip()

This returns the string Series you want. I usually also add a strip at the end for good measure.

You can also try str.extract with the following capture group:

df.your_col.str.extract(r'\"(.*)\"')
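To compare the two approaches side by side, here is a small sketch on made-up data (the column name your_col and the values are placeholders, and the second value has stray whitespace on purpose). One detail worth knowing: str.extract returns a DataFrame by default, so expand=False is passed here to get a Series that can be assigned to a single column.

```python
import pandas as pd

# Hypothetical data; the second value carries leading and trailing spaces.
df = pd.DataFrame({"your_col": ['asin="ASF23KJSA"', ' asin="B095437HDM" ']})

# Approach 1: split on '=', keep the last piece, drop the quotes, trim whitespace.
via_split = (
    df["your_col"]
    .str.split("=")
    .str[-1]
    .str.replace('"', "", regex=False)
    .str.strip()
)

# Approach 2: capture whatever sits between the double quotes.
via_extract = df["your_col"].str.extract(r'"(.*)"', expand=False)

print(via_split.tolist())    # ['ASF23KJSA', 'B095437HDM']
print(via_extract.tolist())  # ['ASF23KJSA', 'B095437HDM']
```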

edited Dec 9, 2022 at 20:25

answered Dec 9, 2022 at 20:12

INGl0R1AM0R1


1 Comment

Shamna Sama Over a year ago

I am still learning ^^ and thanks it works for me.

Gandhi · Accepted Answer · 2022-12-09 20:17:47Z

0

Replace the asin= part with an empty string, strip leading/trailing whitespace and the surrounding double quotes, and write the result to a new column.

df["new_column_name"] = df["asin_column_name"].str.replace("asin=", "", regex=False).str.strip().str.strip('"')
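To see what each step of such a chain contributes, here is a minimal sketch on made-up data (the column name and values are placeholders). It suggests that removing only the asin= prefix leaves the double quotes in place, so a separate strip('"') is needed if the bare value is wanted.

```python
import pandas as pd

# Hypothetical data with the same shape as the values in the question.
df = pd.DataFrame({"asin_column_name": ['asin="ASF23KJSA"', 'asin="B095437HDM"']})

# Step 1: literal (non-regex) removal of the prefix, then trim whitespace.
only_prefix_removed = (
    df["asin_column_name"].str.replace("asin=", "", regex=False).str.strip()
)

# Step 2: strip the double quotes from both ends.
quotes_removed = only_prefix_removed.str.strip('"')

print(only_prefix_removed.tolist())  # ['"ASF23KJSA"', '"B095437HDM"']
print(quotes_removed.tolist())       # ['ASF23KJSA', 'B095437HDM']
```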

answered Dec 9, 2022 at 20:17

Gandhi


Timeless · Accepted Answer · 2022-12-09 20:18:39Z

0

You can use pandas.Series.str.extract:

df["new_col"] = df["original_col"].str.extract(r'"([A-Z0-9]+)"', expand=False)  # or pat = r'"(.+)"'

Output:

print(df)
            original_col       new_col
0       asin="ASF23KJSA"     ASF23KJSA
1     asin="SAFSAF3324S"   SAFSAF3324S
2     asin="ASFAS213434"   ASFAS213434
3    asin="1SF23AF2342S"  1SF23AF2342S
4       asin="ASF23KJSA"     ASF23KJSA
424     asin="ASF23KJSA"     ASF23KJSA
425  asin="1SF23AF2342S"  1SF23AF2342S
426     asin="ASF23KJSA"     ASF23KJSA
427  asin="BSAFSAF3324S"  BSAFSAF3324S
428    asin="B095437HDM"    B095437HDM
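The choice between the two patterns matters when the values are not all uppercase. The sketch below uses made-up values (the lowercase one is taken from the style of the examples in the comments, as an assumption) to illustrate that the [A-Z0-9]+ character class yields NaN for a lowercase value, while the looser .+ pattern still captures it.

```python
import pandas as pd

# Hypothetical values: one uppercase ASIN-like string, one lowercase string.
s = pd.Series(['asin="B095437HDM"', 'asin="shjghsw324"'])

# Strict pattern: only uppercase letters and digits between the quotes.
strict = s.str.extract(r'"([A-Z0-9]+)"', expand=False)

# Loose pattern: one or more of any character between the quotes.
loose = s.str.extract(r'"(.+)"', expand=False)

print(strict.tolist())  # ['B095437HDM', nan]
print(loose.tolist())   # ['B095437HDM', 'shjghsw324']
```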

answered Dec 9, 2022 at 20:18

Timeless
