I need to run a task which can be done with a loop, but I imagine that there is a more efficient and pretty way to do this. I have a DataFrame which has an integer column, which I would like to transform to a 4-digit string representation. That is, 3 should be converted to '0003', 234 should be converted to '0234'. I am looking for a vector operation that will do this to all entries in the column at once, quickly with simple code.
Add a comment
|
2 Answers
you can use Series.str.zfill() method:
df['column_name'] = df['column_name'].astype(str).str.zfill(4)
Demo:
In [29]: df = pd.DataFrame({'a':[1,2], 'b':[3,234]})
In [30]: df
Out[30]:
a b
0 1 3
1 2 234
In [31]: df['b'] = df['b'].astype(str).str.zfill(4)
In [32]: df
Out[32]:
a b
0 1 0003
1 2 0234
3 Comments
nipy
First time i have seen this one, nice @MaxU +1
jpmartineau
Nice, but the transformation is applied AFTER the int column is converted to string. Technically, does not answer the quesstion.
MaxU - stand with Ukraine
@jpmartineau, OP asked: "which I would like to transform to a 4-digit string representation" - i think it answers exactly that. ;)
You can also do this using the Series.apply() method and an f-string wrapped in a lambda function:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a':[1,2], 'b':[3,234]})
In [3]: df
Out[3]:
a b
0 1 3
1 2 234
In [4]: df['b'] = df['b'].apply(lambda x: f"{x:04d}")
In [5]: df
Out[5]:
a b
0 1 0003
1 2 0234
In the f-string, the part after the colon says "zero-pad the field with four characters and make it a signed base-10 integer".
4 Comments
Yano
This reply better answers to the title (but not to the description) of the question
craigim
How does it not fit the description? It is "a vector operation that will do this to all entries in the column at once, quickly with simple code".
Yano
The description only asks for zero padding so str.zfill seemed more appropriate. However, from a quick timeit, it seems apply is twice faster for a 100000 vector.
craigim
From the linked format specification for f-strings: "When no explicit alignment is given, preceding the width field by a zero ('0') character enables sign-aware zero-padding for numeric types. This is equivalent to a fill character of '0' with an alignment type of '='." (emphasis mine).