3

I have a DataFrame in which duration is one of the columns. Its contents look like this:

            array(['487', '346', ..., '227', '17'])

From df.info(), I get:

             Data columns (total 22 columns):
             duration        2999 non-null object
             campaign        2999 non-null object
             ...

Now I want to convert duration to int. How can I do this?

3 Answers

5

Use astype:

df['duration'] = df['duration'].astype(int)
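A minimal sketch, assuming a small hypothetical sample in place of the real data. astype(int) parses each string and raises a ValueError if any entry is not a valid integer string. If the column might contain malformed entries, pd.to_numeric with errors='coerce' (a different pandas function from the one used in this answer) turns them into NaN instead of raising:

```python
import pandas as pd

# Hypothetical sample: integers stored as strings, as in the question
df = pd.DataFrame({'duration': ['487', '346', '227', '17']})

# astype(int) converts every string; it raises ValueError on anything non-numeric
df['duration'] = df['duration'].astype(int)
print(df['duration'].tolist())  # [487, 346, 227, 17]

# Hypothetical messy column: 'n/a' cannot be parsed as an integer
messy = pd.Series(['487', 'n/a', '17'])

# errors='coerce' replaces unparseable entries with NaN instead of raising;
# because NaN is a float, the result is a float column rather than an int one
cleaned = pd.to_numeric(messy, errors='coerce')
print(cleaned.isna().tolist())  # [False, True, False]
```

If the missing values must stay in an integer column, cleaned.astype('Int64') converts to pandas' nullable integer dtype.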

Timings

I used the following setup to produce a large sample dataset:

import numpy as np
import pandas as pd

n = 10**5
data = list(map(str, np.random.randint(10**4, size=n)))
df = pd.DataFrame({'duration': data})

I get the following timings:

%timeit -n 100 df['duration'].astype(int)
100 loops, best of 3: 10.9 ms per loop

%timeit -n 100 df['duration'].apply(int)
100 loops, best of 3: 44.3 ms per loop

%timeit -n 100 df['duration'].apply(lambda x: int(x))
100 loops, best of 3: 60.1 ms per loop
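A self-contained sketch that reproduces this comparison with the standard-library timeit module instead of IPython's %timeit magic, and also checks that all three approaches return the same values. The repeat and run counts are arbitrary choices; absolute numbers will vary with hardware and pandas/NumPy versions, so only the relative ordering is meaningful:

```python
import timeit

import numpy as np
import pandas as pd

# Same setup as above: 10**5 random integers below 10**4, stored as strings
n = 10**5
data = list(map(str, np.random.randint(10**4, size=n)))
df = pd.DataFrame({'duration': data})

candidates = {
    'astype(int)': lambda: df['duration'].astype(int),
    'apply(int)': lambda: df['duration'].apply(int),
    'apply(lambda x: int(x))': lambda: df['duration'].apply(lambda x: int(x)),
}

# All three approaches should produce identical integer values
reference = candidates['astype(int)']()
for name, fn in candidates.items():
    assert (fn() == reference).all(), name

# Best of 3 repeats, 5 calls per repeat, reported as time per call
for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f'{name:>25}: {best * 1000:.1f} ms per call')
```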

2 Comments

Nice timings, though I suggest tweaking them to use the same number of loops for easier comparison.
Edited to have the same number of loops.
3
df['duration'] = df['duration'].astype(int)


0

Apply the built-in int to each string:

df['duration'] = df['duration'].apply(lambda x: int(x)) #df is your dataframe with attribute 'duration'

2 Comments

No need for the lambda; .apply(int) works and performs better.
In general, lambda *args, **kwargs: f(*args, **kwargs) behaves the same as f itself, apart from the overhead of an extra function call.
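A small sketch illustrating the comment above, using a hypothetical four-element Series: passing int directly and wrapping it in a lambda give identical results, so the lambda only adds call overhead.

```python
import pandas as pd

# Hypothetical sample of integer strings
s = pd.Series(['487', '346', '227', '17'])

# Passing the built-in directly...
direct = s.apply(int)
# ...gives the same result as wrapping it in a lambda, which only adds
# one extra Python-level function call per element
wrapped = s.apply(lambda x: int(x))

print(direct.equals(wrapped))  # True
print(direct.tolist())  # [487, 346, 227, 17]
```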
