Objective: to reformat the contents of a pandas dataframe based on what has been provided to me.

I have the following dataframe: [image: example dataframe]

I am looking to change each column with the following style:

[image: desired output style]

I am using the following code to produce the style I need, but it is not efficient:

lt = []
for i in patterns['Components'][0]:
    for x in i.split('__'):
        lt.append(x)
lt[1].replace('(','').replace(', ',' < '+str(lt[0])+' ≤ ').replace(']','')

I have also tried pandas DataFrame.replace to no avail: it raises no errors, but it leaves the data unchanged.

  • Are all the columns of type string? What do you get when you type(df.Components.iloc[0])? Commented Sep 10, 2017 at 16:10
  • non-null object Commented Sep 10, 2017 at 16:21

2 Answers

Source DF:

In [37]: df
Out[37]:
                           Components                             Outcome
0          (Quantity__(0.0, 16199.0])  (UnitPrice__(-1055.648, 3947.558])
1  (UnitPrice__(-1055.648, 3947.558])          (Quantity__(0.0, 16199.0])

Solution:

In [38]: cols = ['Components','Outcome']
    ...: df[cols] = df[cols].replace(r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*',
    ...:                             r'\2 < \1 <= \3',
    ...:                             regex=True)

Result:

In [39]: df
Out[39]:
                          Components                            Outcome
0          0.0 < Quantity <= 16199.0  -1055.648 < UnitPrice <= 3947.558
1  -1055.648 < UnitPrice <= 3947.558          0.0 < Quantity <= 16199.0
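To see what each capture group in that pattern picks up, it can help to run it on a single string with the standard-library re module. This is only a sketch: the helper name reformat_interval is hypothetical, and the pattern is copied unchanged from the answer above.

```python
import re

# Same pattern as in the answer above, compiled with the standard library.
PATTERN = re.compile(r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*')


def reformat_interval(text):
    """Rewrite '(Name__(low, high])' as 'low < Name <= high'.

    reformat_interval is a hypothetical helper name used for illustration.
    """
    # \1 = column name, \2 = lower bound, \3 = upper bound
    return PATTERN.sub(r'\2 < \1 <= \3', text)


sample = '(Quantity__(0.0, 16199.0])'
print(PATTERN.match(sample).groups())  # ('Quantity', '0.0', '16199.0')
print(reformat_interval(sample))       # 0.0 < Quantity <= 16199.0

# A string that does not fit the pattern comes back unchanged and no error
# is raised, which is the "nothing happens" symptom described in the question.
print(reformat_interval('Quantity 0.0 to 16199.0'))
```

If a value comes back unchanged, the pattern did not match it, so the first thing to check is the exact text of one cell.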

UPDATE:

In [113]: df
Out[113]:
                                Components                               Outcome
0             (Quantity__(0.0, 16199.0])     (UnitPrice__(-1055.648, 3947.558])
1    (UnitPrice__(-1055.648, 3947.558])             (Quantity__(0.0, 16199.0])

In [114]: cols = ['Components','Outcome']

In [115]: pat = r'\s*\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)\s*'

In [116]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)

In [117]: df
Out[117]:
                          Components                            Outcome
0          0.0 < Quantity <= 16199.0  -1055.648 < UnitPrice <= 3947.558
1  -1055.648 < UnitPrice <= 3947.558          0.0 < Quantity <= 16199.0

or without the outer parentheses:

In [119]: df
Out[119]:
                         Components                           Outcome
0          Quantity__(0.0, 16199.0]  UnitPrice__(-1055.648, 3947.558]
1  UnitPrice__(-1055.648, 3947.558]          Quantity__(0.0, 16199.0]

In [120]: pat = r'([^_]*)__\(([^,\s]+),\s*([^\]]+)\]'

In [121]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True)

In [122]: df
Out[122]:
                          Components                            Outcome
0          0.0 < Quantity <= 16199.0  -1055.648 < UnitPrice <= 3947.558
1  -1055.648 < UnitPrice <= 3947.558          0.0 < Quantity <= 16199.0

3 Comments

Your solution looks wonderful, but I only get back the original contents of the dataframe (nothing changes). In case it matters, both columns of the original pandas dataframe (['Components','Outcome']) are non-null objects.
@Student, that means that your real data (strings) are slightly different and your sample data set is not reproducible - because of that the RegEx is working for your sample DF and isn't working on your real data.... Can you provide a reproducible sample data set (in text format, so we could copy and paste it)?
I ran the following: patterns['Components'][0],df['Components'][0] which produced the following: (frozenset({'Quantity__(0.0, 16199.0]'}), '(Quantity__(0.0, 16199.0])'). I am not sure if this is helpful, but all I have is the output from the original dataframe (patterns). Since you theorized that the two dataframes may not be the same (and they are not, based on patterns=df=False), I have tried to clean things up with patterns.replace('(^\s+|\s+$)', '', regex=True, inplace=True). At present, this has made no difference in the output. Any ideas?
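The output in that last comment suggests the cells of patterns hold frozenset objects rather than strings, which would explain why a regex replace leaves them untouched. A minimal sketch of one way to handle that, assuming every cell is a frozenset of 'Name__(low, high]' strings: format_itemset is a hypothetical helper, and joining multiple items with ', ' is an assumption, not something stated in the thread.

```python
import re

# Accepts 'Name__(low, high]' with or without the outer parentheses.
INTERVAL = re.compile(r'\(?([^_(]*)__\(([^,\s]+),\s*([^\]]+)\]\)?')


def format_itemset(cell):
    """Format a cell as 'low < Name <= high' (hypothetical helper).

    Assumption: a cell is either a set/frozenset of 'Name__(low, high]'
    strings or a single string in that form.
    """
    if isinstance(cell, (set, frozenset)):
        items = sorted(cell)  # sorted only to make the order deterministic
    else:
        items = [str(cell)]
    # Assumption: several items in one cell are joined with ', '.
    return ', '.join(INTERVAL.sub(r'\2 < \1 <= \3', item.strip()) for item in items)


print(format_itemset(frozenset({'Quantity__(0.0, 16199.0]'})))
# 0.0 < Quantity <= 16199.0
print(format_itemset('(UnitPrice__(-1055.648, 3947.558])'))
# -1055.648 < UnitPrice <= 3947.558
```

On the dataframe itself this could be applied per column, for example patterns['Components'] = patterns['Components'].map(format_itemset), assuming the column layout shown in the question.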
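import pandas as pd
import re

data = pd.DataFrame({
    'components': ['(quantity__(0.0,16199.0])', '(unitprice__(-1055.648,8494.557])'],
    'outcome': ['(unitprice__(-1055.648,8494.557])', 'quantity__(0.0,16199.0])'],
})


def func(x):
    # Split '(name__(low,high])' into the name and the interval part.
    name, interval = str(x).split('__')
    name = name.replace('(', '')
    # -? keeps the sign of negative bounds such as -1055.648
    low, high = re.findall(r'-?\d*\.\d*', interval)
    return '{} < {} <= {}'.format(low, name, high)


# On pandas >= 2.1, DataFrame.map replaces the deprecated applymap.
df = data.applymap(func)
print(df)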
