I have about 200,000 rows and 20 columns in df, and one column contains the name of the station. It looks like this:

00001 OPPT YY G
00002 LIMO DA G
000016 KAPAL VS G
0000663 TAPS VS G
...

What is the best way to extract just the numbers from the column values? Desired output:

00001
00002
000016
0000663

Thanks

3 Answers

Assuming Col1 is your column:

df
Out: 
                Col1
0    00001 OPPT YY G
1    00002 LIMO DA G
2  000016 KAPAL VS G
3  0000663 TAPS VS G

Split on space and take the first element:

df['Col1'].str.split().str[0]
Out: 
0      00001
1      00002
2     000016
3    0000663
Name: Col1, dtype: object
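
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample frame is rebuilt from the rows shown in the question, and the column name Col1 is the same assumption as above. The values must stay strings, since converting them to integers would drop the leading zeros.

```python
import pandas as pd

# Sample data copied from the question; "Col1" is an assumed column name.
df = pd.DataFrame(
    {
        "Col1": [
            "00001 OPPT YY G",
            "00002 LIMO DA G",
            "000016 KAPAL VS G",
            "0000663 TAPS VS G",
        ]
    }
)

# Split each value on whitespace and keep the first token.
# The result stays a string column, so leading zeros are preserved.
first_token = df["Col1"].str.split().str[0]
print(first_token.tolist())
```

This only works when the number is always the first token in the value.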

2 Comments

Well, sometimes the order can be different: first a word, then the digits. My bad.
@jovicbg I think that requires a regex, and I am not very good at those. Can you un-accept the answer so people won't consider this solved and will look at the question? It might also be better to edit the question to include that.

Maybe something like this:

df['col_1'] = df['col_1'].replace(r'^(\b\d+\b).*$', r'\1', regex=True)
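
The pattern above is anchored at the start of the string, so it only helps when the number comes first. If the number can appear anywhere in the value, as mentioned in the comments, one possible alternative is Series.str.extract with a single capture group. This is a sketch under assumptions: the helper name extract_station_number and the sample values are made up for illustration, and it takes the first run of digits in each value, including digits embedded in a word.

```python
import pandas as pd


def extract_station_number(names: pd.Series) -> pd.Series:
    """Return the first run of digits found in each value (hypothetical helper).

    Values without any digits become missing (NaN).
    """
    # expand=False returns a Series rather than a one-column DataFrame.
    return names.str.extract(r"(\d+)", expand=False)


# Illustrative values: number first, number after a word, and no number at all.
sample = pd.Series(["00001 OPPT YY G", "LIMO 00002 DA G", "KAPAL VS G"])
numbers = extract_station_number(sample)
print(numbers)
```

The extracted values stay strings, so leading zeros are kept. Rows with no digits come back as missing values and may need separate handling.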

You can select the column by position. For example, if your DataFrame is df and the first column contains this data (.ix was removed in pandas 1.0, so use .iloc):

df.iloc[:, 0]

I hope this helps.
