
I want to perform an operation on a specific column of a pandas DataFrame.

From this:

# admit gre gpa rank
# 0 0 1123 3.61 3
# 1 1 4454 3.67 3
# 2 1 8000 4.00 1
# 3 1 6405 3.19 4
# 4 0 5205 2.93 4

I want to change the gre column so that each value keeps only its last two digits (e.g. the gre value in row 0, 1123, becomes 23). My data set is very large, so I am looking for a simple way to do this without iterating over the DataFrame row by row. I tried a Python for loop:

for i in df.index:
   df.loc[i, 'gre'] = str(df.loc[i, 'gre'])[-2:]

This works, but it takes a lot of time. Thanks

  • Is the dtype already a string or not? Commented Jan 16, 2015 at 8:40

1 Answer


If the dtype is numeric then you can convert to a string and then take the last 2 characters:

In [4]:

df['gre'] = df['gre'].astype(str).str[-2:]
df
Out[4]:
   admit gre   gpa  rank
0      0  23  3.61     3
1      1  54  3.67     3
2      1  00  4.00     1
3      1  05  3.19     4
4      0  05  2.93     4

If it's already a string then df['gre'] = df['gre'].str[-2:] would work fine.
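
To make both cases concrete, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the slice. The helper name `last_two_chars` is made up for illustration; calling `astype(str)` first is assumed to be harmless when the values are already strings.

```python
import pandas as pd

# Sample frame copied from the question
df = pd.DataFrame({
    "admit": [0, 1, 1, 1, 0],
    "gre": [1123, 4454, 8000, 6405, 5205],
    "gpa": [3.61, 3.67, 4.00, 3.19, 2.93],
    "rank": [3, 3, 1, 4, 4],
})

def last_two_chars(col):
    """Return the last two characters of each value, as strings."""
    # Convert every value to a string, then slice each string with the
    # vectorised .str accessor instead of a Python-level loop.
    return col.astype(str).str[-2:]

result = last_two_chars(df["gre"])
print(result.tolist())  # ['23', '54', '00', '05', '05']
```

Note that the result is a column of strings, which is why leading zeros such as '00' and '05' are preserved at this stage.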

You can then convert back to an integer dtype (leading zeros are dropped, so '05' becomes 5):

In [7]:

df['gre'] = df['gre'].astype(np.int64)
df.dtypes
Out[7]:
admit      int64
gre        int64
gpa      float64
rank       int64
dtype: object
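
If the end goal is an integer column anyway, an arithmetic alternative (not part of the answer above) is to take the remainder after dividing by 100. The sketch below assumes the column holds non-negative integers; for negative values the remainder would not match the string slice.

```python
import pandas as pd

# gre values copied from the question
gre = pd.Series([1123, 4454, 8000, 6405, 5205], name="gre")

# String route from the answer: slice, then cast back to integers
via_str = gre.astype(str).str[-2:].astype("int64")

# Arithmetic route (an alternative, assuming non-negative integers):
# the remainder modulo 100 is the number formed by the last two digits
via_mod = gre % 100

print(via_mod.tolist())         # [23, 54, 0, 5, 5]
print(via_str.equals(via_mod))  # True
```

This skips the round trip through strings entirely, but it only makes sense when the data really is numeric.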

Timings

In [9]:

%%timeit
for i in df.index:
   df.loc[i, 'gre'] = str(df.loc[i, 'gre'])[-2:]
100 loops, best of 3: 2.98 ms per loop

In [11]:

%timeit df['gre'] = df['gre'].astype(str).str[-2:]

1000 loops, best of 3: 380 µs per loop

We can see that the vectorised str method is nearly 8X faster here (2.98 ms vs. 380 µs per loop).
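
To repeat the comparison outside IPython, a rough sketch using the standard `timeit` module could look like the following. The frame size of 2000 rows, the random data, and the function names are all assumptions for illustration. The loop version casts the column to `object` first so that strings can be written into it row by row; actual timings will depend on the machine and pandas version, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 2000 four-digit integers (size chosen arbitrarily)
rng = np.random.default_rng(0)
df = pd.DataFrame({"gre": rng.integers(1000, 10000, size=2000)})

def loop_version(frame):
    out = frame.copy()
    # Use object dtype so the column can hold strings during the loop
    out["gre"] = out["gre"].astype(object)
    for i in out.index:
        out.loc[i, "gre"] = str(out.loc[i, "gre"])[-2:]
    return out

def vectorised_version(frame):
    out = frame.copy()
    out["gre"] = out["gre"].astype(str).str[-2:]
    return out

# Time three runs of each approach on the same input
loop_time = timeit.timeit(lambda: loop_version(df), number=3)
vec_time = timeit.timeit(lambda: vectorised_version(df), number=3)
print(f"loop: {loop_time:.4f}s  vectorised: {vec_time:.4f}s")
```

Both functions work on a copy, so the original frame is left untouched between runs and each timing starts from the same input.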


3 Comments

How to do it? I am sorry I am relatively new here :/
There will be a tick mark underneath the down arrow next to the answer, at the top left
I need to wait a minute at least :p
