
I have a DataFrame, df, in which I would like to reduce each value of the type column to its first word plus the number with its 'T' suffix. Specifically, I want the first 'word' delimited by '-' together with its #T value. The exception is the 'Azure' case, where the first word is delimited by '_'.

It is tricky because some of the #T values are separated by '-' while others are separated by '_', e.g. -12T in one value and _14T in another. I would also like to keep the original values of the type column.

Sample Data

import pandas as pd

data = {'type': ['Azure_Standard_E64is_v4_SPECIAL_DB-A.0', 'Azure_Standard_E64is_v4_SPECIAL_DB-A.0', 'Hello-HEL-HE-A6123-123A-12T_TYPE-v.A', 'Hello-HEL-HE-A6123-123A-12T_TYPE-v.E', 'Hello-HEL-HE-A6123-123A-50T_TYPE-v.C', 'Hello-HEL-HE-A6123-123A-50T_TYPE-v.A', 'Happy-HAP-HA-R650-570A-90T_version-v.A', 'Kind-KIN-KI-T490-NET_14T-A.0', 'Kind-KIN-KI-T490-NET_14T-A.0', 'AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A', 'AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A'], 'free': [6, 5, 10, 5, 1, 2, 10, 7, 6, 3, 0], 'use': [1, 1, 10, 1, 4, 1, 0, 4, 3, 0, 20], 'total': [7, 6, 20, 6, 5, 1, 10, 3, 2, 3, 20]}
df = pd.DataFrame(data)


                                      type  free  use  total
0   Azure_Standard_E64is_v4_SPECIAL_DB-A.0     6    1      7
1   Azure_Standard_E64is_v4_SPECIAL_DB-A.0     5    1      6
2     Hello-HEL-HE-A6123-123A-12T_TYPE-v.A    10   10     20
3     Hello-HEL-HE-A6123-123A-12T_TYPE-v.E     5    1      6
4     Hello-HEL-HE-A6123-123A-50T_TYPE-v.C     1    4      5
5     Hello-HEL-HE-A6123-123A-50T_TYPE-v.A     2    1      1
6   Happy-HAP-HA-R650-570A-90T_version-v.A    10    0     10
7             Kind-KIN-KI-T490-NET_14T-A.0     7    4      3
8             Kind-KIN-KI-T490-NET_14T-A.0     6    3      2
9      AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A     3    0      3
10     AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A     0   20     20

Desired:

Name                                     type        free  use  total
Azure_Standard_E64is_v4_SPECIAL_DB-A.0   Azure          6    1      7
Azure_Standard_E64is_v4_SPECIAL_DB-A.0   Azure          5    1      6
Hello-HEL-HE-A6123-123A-12T_TYPE-v.A     Hello 12T     10   10     20
Hello-HEL-HE-A6123-123A-12T_TYPE-v.E     Hello 12T      5    1      6
Hello-HEL-HE-A6123-123A-50T_TYPE-v.C     Hello 50T      1    4      5
Hello-HEL-HE-A6123-123A-50T_TYPE-v.A     Hello 50T      2    1      1
Happy-HAP-HA-R650-570A-90T_version-v.A   Happy 90T     10    0     10
Kind-KIN-KI-T490-NET_14T-A.0             Kind 14T       7    4      3
Kind-KIN-KI-T490-NET_14T-A.0             Kind 14T       6    3      2
AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A      AY14.5 6.4T    3    0      3
AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A      AY14.5 6.4T    0   20     20

What I am doing:

df['type'] = df['type'].str.extract(r'(^\w+.\d|^\w+)') + ' ' + df['type'].str.extract(r'(\d.\d+T|\d+T)')

This almost works; however, the 'Azure' value disappears, and the original value in the type column is not kept. I am still researching this, and any assistance is appreciated.
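A minimal reproduction of why the Azure rows vanish, using two rows from the sample data (a sketch; it passes expand=False so each extract returns a Series rather than the one-column DataFrame the question's code produces, but the effect is the same). The T regex finds no match in the Azure string, so that row is NaN, and adding a string to NaN yields NaN:

```python
import pandas as pd

# Two rows from the sample data: Azure has no T value, Hello does.
df = pd.DataFrame({'type': ['Azure_Standard_E64is_v4_SPECIAL_DB-A.0',
                            'Hello-HEL-HE-A6123-123A-12T_TYPE-v.A']})

# Same patterns as in the question.
first = df['type'].str.extract(r'(^\w+.\d|^\w+)', expand=False)
t_val = df['type'].str.extract(r'(\d.\d+T|\d+T)', expand=False)

# The missing T value propagates through '+', so the Azure row becomes NaN.
combined = first + ' ' + t_val
print(t_val.tolist())
print(combined.tolist())
```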

Use df['type'].str.extract(r'(\d.\d+T|\d+T)').fillna('') instead of df['type'].str.extract(r'(\d.\d+T|\d+T)'); then the 'Azure' value will not disappear. Commented Jan 12, 2021 at 6:20
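A quick check of what the fillna('') suggestion produces on the same two sample rows (a sketch; .str.strip() is added here only to drop the trailing space). The Azure row is no longer NaN, but because '_' counts as a word character in \w, the first pattern captures 'Azure_Standard_E64is_v4' instead of 'Azure', which is why the answer below converts '_' to '-' before extracting:

```python
import pandas as pd

df = pd.DataFrame({'type': ['Azure_Standard_E64is_v4_SPECIAL_DB-A.0',
                            'Hello-HEL-HE-A6123-123A-12T_TYPE-v.A']})

first = df['type'].str.extract(r'(^\w+.\d|^\w+)', expand=False)
# fillna('') turns the missing T value into an empty string,
# so the concatenation keeps the Azure row instead of producing NaN.
t_val = df['type'].str.extract(r'(\d.\d+T|\d+T)', expand=False).fillna('')
combined = (first + ' ' + t_val).str.strip()
print(combined.tolist())  # ['Azure_Standard_E64is_v4', 'Hello 12T']
```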

1 Answer


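You can use Series.str.replace to turn '_' into '-', extract the first word with Series.str.extract, join it to the extracted T value with Series.str.cat (na_rep='' keeps rows without a T value, such as Azure), and finally remove the leftover trailing space with Series.str.strip. Passing expand=False to Series.str.extract makes it return a Series instead of a DataFrame.

DataFrame.insert is used to put the new column in the second position.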

s = (df['type'].str.replace('_','-')
               .str.extract(r'(^\w+.\d|^\w+)', expand=False)
               .str.cat(df['type'].str.extract(r'(\d.\d+T|\d+T)', expand=False), 
                        sep=' ', 
                        na_rep='')
               .str.strip())
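A self-contained, annotated run of this first approach on one row of each kind from the sample data (a sketch; regex=False is added to str.replace so the '_' is explicitly treated as a literal character):

```python
import pandas as pd

df = pd.DataFrame({'type': ['Azure_Standard_E64is_v4_SPECIAL_DB-A.0',
                            'Hello-HEL-HE-A6123-123A-12T_TYPE-v.A',
                            'Happy-HAP-HA-R650-570A-90T_version-v.A',
                            'Kind-KIN-KI-T490-NET_14T-A.0',
                            'AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A']})

s = (df['type'].str.replace('_', '-', regex=False)        # '-' becomes the only delimiter
               .str.extract(r'(^\w+.\d|^\w+)', expand=False)  # first word, e.g. 'AY14.5'
               .str.cat(df['type'].str.extract(r'(\d.\d+T|\d+T)', expand=False),
                        sep=' ',
                        na_rep='')                        # missing T value -> ''
               .str.strip())                              # drop trailing space on Azure
print(s.tolist())
# ['Azure', 'Hello 12T', 'Happy 90T', 'Kind 14T', 'AY14.5 6.4T']
```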

Thanks to @Trenton McKinney for another solution: split the values and take the first element of each resulting list:

s = (df['type'].str.split('_|-')
               .str[0]
               .str.cat(df['type'].str.extract(r'(\d.\d+T|\d+T)', expand=False), 
                        sep=' ', 
                        na_rep='')
               .str.strip())
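How the split step behaves on its own, on three sample values (a sketch). Because the pattern '_|-' is longer than one character, pandas treats it as a regular expression and splits on either delimiter; .str[0] then takes the first element of each list. Since '.' is not a delimiter, 'AY14.5' stays intact without any special regex:

```python
import pandas as pd

types = pd.Series(['Azure_Standard_E64is_v4_SPECIAL_DB-A.0',
                   'Kind-KIN-KI-T490-NET_14T-A.0',
                   'AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A'])

parts = types.str.split('_|-')  # each value becomes a list of tokens
first = parts.str[0]            # first token of each list
print(first.tolist())  # ['Azure', 'Kind', 'AY14.5']
```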

df = df.rename(columns={'type': 'Name'})
df.insert(1, 'type', s)
print(df)
                                      Name         type  free  use  total
0   Azure_Standard_E64is_v4_SPECIAL_DB-A.0        Azure     6    1      7
1   Azure_Standard_E64is_v4_SPECIAL_DB-A.0        Azure     5    1      6
2     Hello-HEL-HE-A6123-123A-12T_TYPE-v.A    Hello 12T    10   10     20
3     Hello-HEL-HE-A6123-123A-12T_TYPE-v.E    Hello 12T     5    1      6
4     Hello-HEL-HE-A6123-123A-50T_TYPE-v.C    Hello 50T     1    4      5
5     Hello-HEL-HE-A6123-123A-50T_TYPE-v.A    Hello 50T     2    1      1
6   Happy-HAP-HA-R650-570A-90T_version-v.A    Happy 90T    10    0     10
7             Kind-KIN-KI-T490-NET_14T-A.0     Kind 14T     7    4      3
8             Kind-KIN-KI-T490-NET_14T-A.0     Kind 14T     6    3      2
9      AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A  AY14.5 6.4T     3    0      3
10     AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A  AY14.5 6.4T     0   20     20
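One caveat: in the T pattern \d.\d+T the '.' is unescaped, so it matches any character, not just a decimal point. That does not matter for the sample data, but on a hypothetical value such as 'X-1-23T-A' it would return '1-23T'. Below is a sketch of a stricter variant (my own assumption, not part of the answer) that requires the value to follow '-' or '_', allows only an optional '.digits' part, and is combined with the split, rename, and insert steps:

```python
import pandas as pd

df = pd.DataFrame({'type': ['Azure_Standard_E64is_v4_SPECIAL_DB-A.0',
                            'Happy-HAP-HA-R650-570A-90T_version-v.A',
                            'AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A'],
                   'free': [6, 10, 3]})

# Assumed stricter pattern: '-' or '_', digits, optional '.digits', then 'T',
# followed by another delimiter or the end of the string.
STRICT = r'[-_](\d+(?:\.\d+)?T)(?=[-_]|$)'

t_val = df['type'].str.extract(STRICT, expand=False)
first = df['type'].str.split('_|-').str[0]
s = first.str.cat(t_val, sep=' ', na_rep='').str.strip()

# Keep the original strings under 'Name' and place the result second.
df = df.rename(columns={'type': 'Name'})
df.insert(1, 'type', s)
print(df)

# Hypothetical string (not in the sample data) showing the unescaped '.' issue.
demo = pd.Series(['X-1-23T-A'])
loose = demo.str.extract(r'(\d.\d+T|\d+T)', expand=False).tolist()
strict = demo.str.extract(STRICT, expand=False).tolist()
print(loose, strict)  # ['1-23T'] ['23T']
```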

5 Comments

OK, thank you. Is there a way to keep the original values in that type column? I will try this.
df['type'].str.replace('_','-').str.split('-', expand=True)[0] also works for the first part
@TrentonMcKinney - thank you, I changed it a bit, but your idea is used.
@Lynn It's too bad you don't need the 'DB'. I noticed that the value you want is always at index 5 if you split the string, so the entire thing could be something like df['type'].str.split('_|-', expand=True).iloc[:, [0, 5]]. However, the excellent answer from jezrael gives you exactly what you want.
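What the index-5 shortcut returns on three sample values (a sketch that relies on the token layout of the sample data, which is an assumption about the real data). With expand=True every token gets its own column, and rows with fewer tokens are padded; column 5 holds the T value for the non-Azure rows but 'DB' for the Azure row, so that row would still need separate handling:

```python
import pandas as pd

types = pd.Series(['Azure_Standard_E64is_v4_SPECIAL_DB-A.0',
                   'Kind-KIN-KI-T490-NET_14T-A.0',
                   'AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A'])

# One column per token; shorter rows are padded with missing values.
parts = types.str.split('_|-', expand=True)
picked = parts.iloc[:, [0, 5]]  # first token and sixth token
print(picked)
```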
Thank you for the assistance with this. I am trying it now.
