
I want to delete certain parts of each string in a column and insert what remains into a new column.

Example:

df = pd.read_excel("sdAll.xlsx")
print(df)

Output:

0      asin="ASF23KJSA"
1      asin="SAFSAF3324S"
2      asin="ASFAS213434"
3      asin="1SF23AF2342S"
4      asin="ASF23KJSA"
             ...
424    asin="ASF23KJSA"
425    asin="1SF23AF2342S"
426    asin="ASF23KJSA"
427    asin="BSAFSAF3324S"
428    asin="B095437HDM"

I want to delete the asin="" part and insert the remaining part into another column.

df.head()

 Timeframe Ad Type Start Date   End Date                           Portfolio name Currency  ...    Spend 14 Day Total Sales Total Advertising Cost of Sales (ACOS)  Total Return on Advertising Spend (ROAS)  14 Day Total Orders (#)  14 Day Total Units (#)
0      L30D      SD 2022-11-08 2022-11-08                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0
1      L30D      SD 2022-11-11 2022-12-03                                        -      USD  ...  0.00530                  0                                    NaN                                       0.0                        0                       0
2      L30D      SD 2022-11-09 2022-11-22                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0
3      L30D      SD 2022-11-25 2022-12-04                                        -      USD  ...  0.09434                  0                                    NaN                                       0.0                        0                       0
4      L30D      SD 2022-11-09 2022-11-23                                        -      USD  ...  0.00000                  0                                    NaN                                       NaN                        0                       0
    Can you show df.head() so we can check the columns, please? Commented Dec 9, 2022 at 20:11

4 Answers


You can use str.replace with a regular expression that contains a capturing group.

import pandas as pd
df = pd.DataFrame({'old_column' : ['asin="ASF23KJSA"' , 'asin="SAFSAF3324S"', 'asin="ASFAS213434"' , 'asin="1SF23AF2342S"' , 'asin="ASF23KJSA"']})
df['new_column'] = df['old_column'].str.replace(r'asin=\"(.*)\"', r'\1', regex=True)
print(df)

Output:

            old_column    new_column
0     asin="ASF23KJSA"     ASF23KJSA
1   asin="SAFSAF3324S"   SAFSAF3324S
2   asin="ASFAS213434"   ASFAS213434
3  asin="1SF23AF2342S"  1SF23AF2342S
4     asin="ASF23KJSA"     ASF23KJSA

Explanation:

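  • asin=\" : matches the literal prefix asin=" (the backslash before the quote is optional)

  • ( : opens the capturing group

  • .* : means "0 or more of any character"

  • ) : closes the capturing group

  • \" : matches the closing quote

  • \1 in the replacement refers to the text captured by the group, so the whole match is replaced by the quoted value alone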
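
To see what the capturing group does in isolation, here is a minimal sketch that uses Python's built-in re module instead of pandas. The sample values are only illustrative (the lowercase one is taken from the comments below); the pattern is the same as above.

```python
import re

# The parentheses capture whatever sits between the two double quotes.
pattern = re.compile(r'asin="(.*)"')

samples = ['asin="ASF23KJSA"', 'asin="shjghsw324"', 'asin="B095437HDM"']

# Replacing the whole match with group 1 keeps only the captured value.
cleaned = [pattern.sub(r'\1', s) for s in samples]
print(cleaned)  # ['ASF23KJSA', 'shjghsw324', 'B095437HDM']
```

Because the pattern describes the shape of the string and not a specific value, it should keep working when the values differ from file to file.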


6 Comments

I think the capture group '\"(.*)\"' would be better, since it drops the assumption that the string starts with asin= and captures only the quoted value. Yours is still good for the question as asked, so here is an upvote.
I would also use str.extract; it works really well for capture-group-based matches.
I can't use this solution because the column values change for each file.
@ShamnaSama, what do you mean by the column value changing?
For example, today the values are asin="ASF23KJSA" asin="SAFSAF3324S" asin="ASFAS213434"; tomorrow they could be asin="shjghsw324" asin="reotoisdgk" asin="asfglassl423".

Why don't you try this:

df.insert_your_col_name.str.split('=').str[-1].str.replace('"', '').str.strip()

This returns the string series you want; I usually also add a strip at the end for good measure.
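
If the chained one-liner is hard to follow, the same steps can be written out one at a time. This is only a sketch on a small made-up Series; the intermediate names parts and last are hypothetical.

```python
import pandas as pd

s = pd.Series(['asin="ASF23KJSA"', 'asin="SAFSAF3324S"'])

parts = s.str.split("=")   # each element becomes ['asin', '"ASF23KJSA"']
last = parts.str[-1]       # keep the piece after the "=": '"ASF23KJSA"'

# Drop the double quotes (literal replacement) and any stray whitespace.
result = last.str.replace('"', "", regex=False).str.strip()
print(result.tolist())  # ['ASF23KJSA', 'SAFSAF3324S']
```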

You can also try str.extract with the following capture group:

df.your_col.str.extract(r'\"(.*)\"')

1 Comment

I am still learning ^^ and thanks it works for me.

Replace the asin= part with an empty string, strip leading/trailing whitespace and the surrounding double quotes, and write the result to a new column.

df["new_column_name"] = df["asin_column_name"].str.replace("asin=", "", regex=False).str.strip().str.strip('"')
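
A quick check of this approach without any regex, on made-up rows (one of them padded with spaces on purpose). Note that removing asin= alone leaves the double quotes in place, so this sketch strips them in a separate step.

```python
import pandas as pd

df = pd.DataFrame({"asin_column_name": ['asin="ASF23KJSA"', ' asin="B095437HDM" ']})

df["new_column_name"] = (
    df["asin_column_name"]
    .str.strip()                            # drop surrounding whitespace first
    .str.replace("asin=", "", regex=False)  # literal replacement, not a regex
    .str.strip('"')                         # remove the leading/trailing double quotes
)
print(df["new_column_name"].tolist())  # ['ASF23KJSA', 'B095437HDM']
```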


You can use pandas.Series.str.extract:

df["new_col"] = df["original_col"].str.extract('"([A-Z0-9]+)"', expand=False) #or pat = '"(.+)"'

# Output :

print(df)
            original_col       new_col
0       asin="ASF23KJSA"     ASF23KJSA
1     asin="SAFSAF3324S"   SAFSAF3324S
2     asin="ASFAS213434"   ASFAS213434
3    asin="1SF23AF2342S"  1SF23AF2342S
4       asin="ASF23KJSA"     ASF23KJSA
424     asin="ASF23KJSA"     ASF23KJSA
425  asin="1SF23AF2342S"  1SF23AF2342S
426     asin="ASF23KJSA"     ASF23KJSA
427  asin="BSAFSAF3324S"  BSAFSAF3324S
428    asin="B095437HDM"    B095437HDM
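
The two patterns mentioned above behave differently when a value contains lowercase letters, as in the examples from the comments on the first answer. A small sketch on made-up data; the names strict and loose are hypothetical.

```python
import pandas as pd

s = pd.Series(['asin="B095437HDM"', 'asin="shjghsw324"'])

# Uppercase letters and digits only: the lowercase value does not match and becomes NaN.
strict = s.str.extract(r'"([A-Z0-9]+)"', expand=False)

# Any non-empty quoted value: both rows match.
loose = s.str.extract(r'"(.+)"', expand=False)

print(strict.tolist())
print(loose.tolist())
```

If the values are not guaranteed to be uppercase, the looser pattern is probably the safer choice.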
