Python Pandas Split DF

Question

Please review the code below. Is there a more efficient way of splitting one DataFrame into two? In the code below, the query is run twice. Would it be faster to run the query once and, in effect, say "if true send to DF1, else to DF2"? Or, after DF1 is created, is there some way to say that DF2 = DF minus DF1?

code:

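import pandas as pd

x1 = 'john'
# read each line of the file as a single raw string (no quote handling)
df = pd.read_csv(file, sep='\n', header=None, engine='python', quoting=3)
# split each line once on the first run of delimiters into email and data
df = df[0].str.strip(' \t"').str.split('[,|;: \t]+', n=1, expand=True).rename(columns={0: 'email', 1: 'data'})
df1 = df[df.email.str.startswith(x1)]
df2 = df[~df.email.str.startswith(x1)]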

BENY · Accepted Answer · 2020-06-13 14:19:23Z

2

There's no need to compute the mask df.email.str.startswith(x1) twice. Compute it once, store it, and index with the mask and its negation:

mask = df.email.str.startswith(x1)
df1 = df[mask].copy()   # .copy() avoids a SettingWithCopyWarning on later assignments
df2 = df[~mask].copy()  # see https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas
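
For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The sample rows are made up for illustration; only the column names (email, data) and x1 come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the parsed file from the question.
df = pd.DataFrame({
    "email": ["john@a.com", "mary@b.com", "johnny@c.com", "bob@d.com"],
    "data": ["1", "2", "3", "4"],
})
x1 = "john"

# Evaluate the string test once and reuse the boolean Series.
mask = df["email"].str.startswith(x1)

df1 = df[mask].copy()    # rows where the email starts with x1
df2 = df[~mask].copy()   # all remaining rows

# The two halves together cover every row of df exactly once.
assert len(df1) + len(df2) == len(df)
```

If the email column could contain missing values, the mask would contain NaN and ~mask would not behave as a clean boolean negation. In that case passing na=False to str.startswith is probably what you want, but that depends on how you want missing emails to be classified.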

edited Jun 13, 2020 at 14:19

BENY


answered Jun 13, 2020 at 14:12

timgeb



6 Comments

BENY Over a year ago

Kindly add .copy() at the end :-)

timgeb Over a year ago

@YOBEN_S Good suggestion, but do we know if OP needs a copy?

BENY Over a year ago

If you later set a new column on df1, pandas will raise a copy warning, so adding .copy() is just my coding habit.

timgeb Over a year ago

@YOBEN_S I'm not entirely sure what you mean but feel free to edit your suggestion into my answer.

timgeb Over a year ago

@rogerwhite With del mask, the object will eventually be garbage collected, provided mask was the only reference to it.
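
To illustrate the two points raised in the comments (adding a column to a copied split, and dropping the mask afterwards), here is a rough sketch. The sample rows and the domain column are invented for the example and are not part of the original question.

```python
import pandas as pd

# Hypothetical two-row frame; only the column names follow the question.
df = pd.DataFrame({"email": ["john@a.com", "mary@b.com"], "data": ["1", "2"]})
mask = df["email"].str.startswith("john")

# An explicit copy makes df1 an independent frame, so assigning a new
# column to it is unambiguous and does not touch df.
df1 = df[mask].copy()
df1["domain"] = df1["email"].str.split("@").str[1]  # illustrative new column

# Remove the name once both splits exist; the Series can be reclaimed
# when nothing else refers to it.
del mask
```

Whether the del is worth writing is a judgment call: for a mask over a very large frame it releases memory earlier, otherwise the name simply goes away when it falls out of scope.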
