0

I have a df that looks like this

    Identifier      Ticker      ISIN        Relationship Country .......
0   5.037663e+09    BTGGg.F     D10080162   Supplier     Germany .......
1   4.295870e+09    IVXG.DE     NaN         Supplier     Germany .......
2   5.043321e+09    SAPG.DE     D66992104   Customer     Germany
3   4.295869e+09    BMWG.DE     D12096109   Customer     Germany    
4   4.295870e+09    DTEGn.DE    D2035M136   Customer     Germany
5   4.295870e+09    IFXGn.DE    D35415104   Supplier     Germany
6   4.295869e+09    NSUG.DE     D04312100   Customer     Germany
7   5.000074e+09    EVKn.DE     D2R90Y117   Customer     Germany
8   4.295869e+09    LHAG.DE     D1908N106   Customer     Germany
9   4.295869e+09    MTXGn.DE    D5565H104   Supplier     Germany
10  4.295869e+09    SIEGn.DE    D69671218   Supplier     Germany
11  4.295870e+09    TKAG.DE     D8398Q119   Supplier     Germany
12  5.059963e+09    BNRGn.DE    D12459117   Customer     Germany
13  4.295869e+09    RHMG.DE     D65111102   Supplier     Germany
14  5.001195e+09    GBFG.DE     D11648108   Supplier     Germany
15  4.295869e+09    NXUG.DE     D5650J106   Customer     Germany
16  4.295870e+09    DPWGn.DE    D19225107   Supplier     Germany
17  4.295870e+09    ILM1k.DE    D22430116   Supplier     Germany
18  4.295869e+09    ADSGn.DE    D0066B185   Customer     Germany
19. 5.125125e+12    DBS.SG      D12300523.  Supplier     SG
........................................................................

In the rows where the df['Country'] == 'Germany',

I have 2 functions I want to perform.

function 1: I want to isolate the rows where there is a smaller case letter, any smaller case letter before the ".", I want to drop it, so BTGGg.F would become BTGG.F, 1LM1k.DE would become 1LM1.DE but NXUG.DE would be unaffected.

Using that new dataframe after the first function.

function 2: Then for the rows where there is an upper case G before the ".", I want to drop the G so RHMG.DE would become RHM.DE but 1LM1.DE would be unaffected.

If I wanted to drop just the letter before the ".", the function would be simple like x = x.replace(x[x.find(".")-1],"").

But I don't know how to do this in a dataframe, and I don't know how to apply the conditions I mentioned. Can it be done and if so, how?

I was thinking it might look something like this but this obviously didn't work I already tried it.

df.loc[df['Country'].eq('Germany'),'Ticker'] = df.loc[df['Country'].eq('Germany'),'Ticker'].str.replace((df['Ticker'][df['Ticker'].find(".")-1],"") if df['Ticker'][df['Ticker'].find(".")-1] == '([a-z])')

followed by

df.loc[df['Country'].eq('Germany'),'Ticker'] = df.loc[df['Country'].eq('Germany'),'Ticker'].str.replace((df['Ticker'][df['Ticker'].find(".")-1],"") if df['Ticker'][df['Ticker'].find(".")-1] == 'G')

Here is what the output should look like after the first round, from Country == Germany, remove the first lower case letter preceding "." :

    Identifier      Ticker      ISIN        Relationship Country .......
0   5.037663e+09    BTGG.F      D10080162   Supplier     Germany .......
1   4.295870e+09    IVXG.DE     NaN         Supplier     Germany .......
2   5.043321e+09    SAPG.DE     D66992104   Customer     Germany
3   4.295869e+09    BMWG.DE     D12096109   Customer     Germany    
4   4.295870e+09    DTEG.DE     D2035M136   Customer     Germany
5   4.295870e+09    IFXG.DE     D35415104   Supplier     Germany
6   4.295869e+09    NSUG.DE     D04312100   Customer     Germany
7   5.000074e+09    EVK.DE      D2R90Y117   Customer     Germany
8   4.295869e+09    LHAG.DE     D1908N106   Customer     Germany
9   4.295869e+09    MTXG.DE     D5565H104   Supplier     Germany
10  4.295869e+09    SIEG.DE     D69671218   Supplier     Germany
11  4.295870e+09    TKAG.DE     D8398Q119   Supplier     Germany
12  5.059963e+09    BNRG.DE     D12459117   Customer     Germany
13  4.295869e+09    RHMG.DE     D65111102   Supplier     Germany
14  5.001195e+09    GBFG.DE     D11648108   Supplier     Germany
15  4.295869e+09    NXUG.DE     D5650J106   Customer     Germany
16  4.295870e+09    DPWG.DE     D19225107   Supplier     Germany
17  4.295870e+09    ILM1.DE     D22430116   Supplier     Germany
18  4.295869e+09    ADSG.DE     D0066B185   Customer     Germany
19. 5.125125e+12    DBS.SG      D12300523   Supplier     SG
........................................................................

this is after the second round, from Country == Germany, remove the first upper case "G" preceding the ".":

    Identifier      Ticker      ISIN        Relationship Country .......
0   5.037663e+09    BTG.F       D10080162   Supplier     Germany .......
1   4.295870e+09    IVX.DE      NaN         Supplier     Germany .......
2   5.043321e+09    SAP.DE      D66992104   Customer     Germany
3   4.295869e+09    BMW.DE      D12096109   Customer     Germany    
4   4.295870e+09    DTE.DE      D2035M136   Customer     Germany
5   4.295870e+09    IFX.DE      D35415104   Supplier     Germany
6   4.295869e+09    NSU.DE      D04312100   Customer     Germany
7   5.000074e+09    EVK.DE      D2R90Y117   Customer     Germany
8   4.295869e+09    LHA.DE      D1908N106   Customer     Germany
9   4.295869e+09    MTX.DE      D5565H104   Supplier     Germany
10  4.295869e+09    SIE.DE      D69671218   Supplier     Germany
11  4.295870e+09    TKA.DE      D8398Q119   Supplier     Germany
12  5.059963e+09    BNR.DE      D12459117   Customer     Germany
13  4.295869e+09    RHM.DE      D65111102   Supplier     Germany
14  5.001195e+09    GBF.DE      D11648108   Supplier     Germany
15  4.295869e+09    NXU.DE      D5650J106   Customer     Germany
16  4.295870e+09    DPW.DE      D19225107   Supplier     Germany
17  4.295870e+09    ILM1.DE     D22430116   Supplier     Germany
18  4.295869e+09    ADS.DE      D0066B185   Customer     Germany
19. 5.125125e+12    DBS.SG      D12300523   Supplier     SG
........................................................................

2
  • kindly post the expected output Commented May 21, 2020 at 0:11
  • okay I've updated the question with the expected output Commented May 21, 2020 at 4:56

2 Answers 2

1
df.loc[df.Country == 'Germany', 'Ticker'] = df[df.Country == 'Germany']['Ticker'].str.replace(r'[a-z](?=\.)', '').str.replace(r'G(?=\.)', '')
print(df)

Prints:

      Identifier   Ticker        ISIN Relationship  Country
0   5.037663e+09    BTG.F   D10080162     Supplier  Germany
1   4.295870e+09   IVX.DE         NaN     Supplier  Germany
2   5.043321e+09   SAP.DE   D66992104     Customer  Germany
3   4.295869e+09   BMW.DE   D12096109     Customer  Germany
4   4.295870e+09   DTE.DE   D2035M136     Customer  Germany
5   4.295870e+09   IFX.DE   D35415104     Supplier  Germany
6   4.295869e+09   NSU.DE   D04312100     Customer  Germany
7   5.000074e+09   EVK.DE   D2R90Y117     Customer  Germany
8   4.295869e+09   LHA.DE   D1908N106     Customer  Germany
9   4.295869e+09   MTX.DE   D5565H104     Supplier  Germany
10  4.295869e+09   SIE.DE   D69671218     Supplier  Germany
11  4.295870e+09   TKA.DE   D8398Q119     Supplier  Germany
12  5.059963e+09   BNR.DE   D12459117     Customer  Germany
13  4.295869e+09   RHM.DE   D65111102     Supplier  Germany
14  5.001195e+09   GBF.DE   D11648108     Supplier  Germany
15  4.295869e+09   NXU.DE   D5650J106     Customer  Germany
16  4.295870e+09   DPW.DE   D19225107     Supplier  Germany
17  4.295870e+09  ILM1.DE   D22430116     Supplier  Germany
18  4.295869e+09   ADS.DE   D0066B185     Customer  Germany
19  5.125125e+12   DBS.SG  D12300523.     Supplier       SG
Sign up to request clarification or add additional context in comments.

5 Comments

wow thank you! I've been trying to figure this out for ages! do you know where I can learn the regex syntax for stuff like this?
@anarchy Good starting place is re module documentation and you can try regexes on regex101 for example.
thanks! does using (r'G?[a-z]?(?=\.)', '') achieve the same result as using both those methods above.
@anarchy Yes, you can try it. But I separated it to be more explicit.
okay thanks, just wondering it there would be an error that's all
0

For dealing with the functions you mentioned, you can use some regex to perform them both

re.sub(r'G?[a-z]?(?=\.)', '', 'HERE_YOUR_COUNTRY_THINGY')

You didn't given more rules

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.