
I am an absolute beginner, and I have a problem slicing strings from an Excel file using Python. My Excel file contains the following info:

Column 1:

ordercode   
PMC11-AA1L1FAVWJA   
PMC21-AA1A1CBVXJA   
PMP11-AA1L1FAWJJ    
PMP21-AA1A1FBWJJ    
PMP23-AA1A1FA3EJ+JA
PTP31B-AA3D1HGBVXJ  
PTC31B-AA3D1CGBWBJA 
PTP33B-AA3D1HGB1JJ  

I want to slice the strings in the "ordercode" column at a position that depends on which of
"PMC11"/"PMC21"/"PMP21"/"PMP11"/"PMP23"/"PTP31B"/"PTP33B"/"PTC31B" the code contains, and save the result in a new column, "pressurerange". In Excel I used the formula below and it worked fine:

=IF(OR(ISNUMBER(SEARCH("PMC11",A2)),ISNUMBER(SEARCH("PMC21",A2)),ISNUMBER(SEARCH("PMP11",A2)),ISNUMBER(SEARCH("PMP21",A2)),ISNUMBER(SEARCH("PMP23",A2))),MID(A2,11,2),MID(A2,12,2))

But in Python I used the code below, and it didn't work properly.

Python Code:

import pandas as pd
#Assigning the worksheet to file
file="Stratification_worksheet.xlsx"
#Loading the spreadsheet 
data= pd.ExcelFile(file)
#sheetname
print(data.sheet_names)
#loading the sheetname to df1
df=data.parse("Auftrag")
print(df)

#creating a new column preessurerange and slicing the pressure range from order code

for index,row in df.iterrows():
    if "PMC11" in df.loc[index,"ordercode"]:
        df["pressurerange"]=df["ordercode"].str.slice(10,12)
    elif "PMC21" in df.loc[index,"ordercode"]:
        df["pressurerange"]=df["ordercode"].str.slice(10,12)
    elif "PMP11" in df.loc[index,"ordercode"]:
        df["pressurerange"]=df["ordercode"].str.slice(10,12)
    elif "PMP21" in df.loc[index,"ordercode"]:
        df["pressurerange"]=df["ordercode"].str.slice(10,12)
    elif "PMP23" in df.loc[index,"ordercode"]:
        df["pressurerange"]=df["ordercode"].str.slice(10,12)
    elif "PTP31B" in df.loc[index,"ordercode"]:
        df["pressurerange"]=df["ordercode"].str.slice(11,13)
    elif "PTP33B" in df.loc[index,"ordercode"]:
        df["pressurerange"]=df["ordercode"].str.slice(11,13)
    elif "PTC31B" in df.loc[index,"ordercode"]:
        df["pressurerange"]=df["ordercode"].str.slice(11,13)
    else:
        df["pressurerange"]="NONE"
    print(df.loc[:,["pressurerange"]])
    break

What it does is check the first if condition and then slice the string at position (10,12) for the whole column. I know the mistake is in the line below, but I don't know what the correct code is.

df["pressurerange"]=df["ordercode"].str.slice(10,12)
  • I need to extract the string "1F"/"1C"/"1H" from the order code. It is at position (10,12) for "PMC11,PMC21,PMP21,PMP11,PMP23" and at position (11,13) for "PTP31B,PTP33B,PTC31B". Commented Sep 3, 2018 at 8:14

3 Answers

1

A general solution that also works for data without a -; values that match none of the prefixes return NaN.

I believe you need numpy.select with conditions created by str.startswith:

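import numpy as np

# prefixes whose pressure range sits at positions 10-11
L1 = ["PMC11","PMC21","PMP21","PMP11","PMP23"]
# prefixes whose pressure range sits at positions 11-12
L2 = ["PTP31B","PTP33B","PTC31B"]
m1 = df["ordercode"].str.startswith(tuple(L1))
m2 = df["ordercode"].str.startswith(tuple(L2))

a = df["ordercode"].str.slice(10,12)
b = df["ordercode"].str.slice(11,13)

# take a where m1 is True, b where m2 is True, NaN otherwise
df["pressurerange"] = np.select([m1, m2], [a, b], default=np.nan)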
print (df)
             ordercode pressurerange
0    PMC11-AA1L1FAVWJA            1F
1    PMC21-AA1A1CBVXJA            1C
2     PMP11-AA1L1FAWJJ            1F
3     PMP21-AA1A1FBWJJ            1F
4  PMP23-AA1A1FA3EJ+JA            1F
5   PTP31B-AA3D1HGBVXJ            1H
6  PTC31B-AA3D1CGBWBJA            1C
7   PTP33B-AA3D1HGB1JJ            1H
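
For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch of the same numpy.select idea. It assumes the order codes from the question are typed into a DataFrame by hand; the function name add_pressure_range, the stripping of whitespace, and the extra "UNKNOWN-CODE" row are my own additions for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Prefixes taken from the question; the grouping follows the slice positions.
SHORT_PREFIXES = ("PMC11", "PMC21", "PMP21", "PMP11", "PMP23")
LONG_PREFIXES = ("PTP31B", "PTP33B", "PTC31B")


def add_pressure_range(df):
    """Return a copy of df with a 'pressurerange' column (hypothetical helper)."""
    codes = df["ordercode"].str.strip()  # assumption: cells may carry stray spaces
    # na=False keeps the masks strictly boolean even if a cell is empty
    m1 = codes.str.startswith(SHORT_PREFIXES, na=False)
    m2 = codes.str.startswith(LONG_PREFIXES, na=False)
    a = codes.str.slice(10, 12).to_numpy(dtype=object)
    b = codes.str.slice(11, 13).to_numpy(dtype=object)
    out = df.copy()
    # first matching condition wins; rows matching neither get NaN
    out["pressurerange"] = np.select([m1, m2], [a, b], default=np.nan)
    return out


df = pd.DataFrame({"ordercode": [
    "PMC11-AA1L1FAVWJA", "PMC21-AA1A1CBVXJA", "PMP11-AA1L1FAWJJ",
    "PMP21-AA1A1FBWJJ", "PMP23-AA1A1FA3EJ+JA", "PTP31B-AA3D1HGBVXJ",
    "PTC31B-AA3D1CGBWBJA", "PTP33B-AA3D1HGB1JJ",
    "UNKNOWN-CODE",  # made-up row to show the default branch
]})
result = add_pressure_range(df)
print(result)
```

The point of building both masks for the whole column first is that no row loop is needed, which avoids the problem in the question where one row's condition overwrote the entire column.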

If every value contains a -, the solution can be simplified with str.split: select the second element of each list with str[1], then select the 5th and 6th characters with str[4:6] or Series.str.slice:

df["pressurerange"] = df['ordercode'].str.split('-', n=1).str[1].str[4:6]
#alternative solution
#df["pressurerange"] = df['ordercode'].str.split('-', n=1).str[1].str.slice(4,6)
print (df)
             ordercode pressurerange
0    PMC11-AA1L1FAVWJA            1F
1    PMC21-AA1A1CBVXJA            1C
2     PMP11-AA1L1FAWJJ            1F
3     PMP21-AA1A1FBWJJ            1F
4  PMP23-AA1A1FA3EJ+JA            1F
5   PTP31B-AA3D1HGBVXJ            1H
6  PTC31B-AA3D1CGBWBJA            1C
7   PTP33B-AA3D1HGB1JJ            1H
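
To see why the "every value contains a -" condition matters, here is a small sketch on three codes. The first two come from the question; the third, with the hyphen removed, is made up for illustration. As far as I know, str[1] yields a missing value when the split produced only one piece, so that row should end up as NaN rather than raising an error.

```python
import pandas as pd

codes = pd.Series([
    "PMC11-AA1L1FAVWJA",
    "PTP31B-AA3D1HGBVXJ",
    "PMC11AA1L1FAVWJA",  # hypothetical code without a hyphen
])

# split once at the first hyphen and keep the part after it
after_hyphen = codes.str.split("-", n=1).str[1]
# characters 5 and 6 of that part (0-based slice 4:6)
pressure = after_hyphen.str[4:6]
print(pressure)
```

If your data can contain such codes, the numpy.select version is the safer choice, because it does not depend on the hyphen at all.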

4 Comments

This seems a bit complicated for what is really a very simple problem?
@alkanen - Thank you for feed back, +3
For what it's worth, I think your solution is pretty nifty =)
@jezrael Thanks. its easy and simple.
1

Python gives you a lot more options than Excel. If you have a string code = "PMC21-AA1A1CBVXJA", you can write

pressurerange, rest = code.split("-")

and you have the part before the - and the part after. I'll let you figure out how to use this in your workflow.

(Note: If the rest part can contain additional hyphens, use code.split("-", 1) to limit the splitting to one match.)
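
A quick sketch of what that note means in practice. The code string with a second hyphen is made up for illustration, and the variable names prefix and unpack_failed are my own:

```python
code = "PMC21-AA1A1-CBVXJA"  # hypothetical order code with a second hyphen

# Unlimited split gives three pieces, so unpacking into two names fails.
try:
    prefix, rest = code.split("-")
except ValueError:
    unpack_failed = True
else:
    unpack_failed = False

# Limiting the split to one match always gives exactly two pieces here.
prefix, rest = code.split("-", 1)
print(unpack_failed, prefix, rest)
```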


0

I'd use split:

string = 'PMC11-AA1L1FAVWJA'
pressure_range, columns = string.split('-', 1)
column = columns[4:6]
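
If you want to apply this to a whole DataFrame column rather than a single string, one possible way is to wrap the split in a small function and use Series.map. This is only a sketch: the function name pressure_range and the handling of codes without a hyphen (returning None) are my own assumptions, and the last sample row is made up.

```python
import pandas as pd


def pressure_range(order_code):
    """Return characters 5-6 after the first hyphen, or None if unavailable."""
    if not isinstance(order_code, str) or "-" not in order_code:
        return None
    _, rest = order_code.strip().split("-", 1)
    # an empty slice (part after the hyphen too short) is treated as missing
    return rest[4:6] or None


df = pd.DataFrame({"ordercode": [
    "PMC11-AA1L1FAVWJA",
    "PTP31B-AA3D1HGBVXJ",
    "NOHYPHEN",  # hypothetical malformed code
]})
df["pressurerange"] = df["ordercode"].map(pressure_range)
print(df)
```

This is slower than the vectorised str methods on large files, but it is easy to read and easy to extend with extra rules.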
