201

Given a DataFrame:

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])
df

          A         B         C
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.950088 -0.151357 -0.103219

What is the simplest way to add a new column containing a constant value, e.g. 0?

          A         B         C  new
1  1.764052  0.400157  0.978738    0
2  2.240893  1.867558 -0.977278    0
3  0.950088 -0.151357 -0.103219    0

This is my attempt, but I don't understand why it puts NaN into the 'new' column:

df['new'] = pd.Series([0 for x in range(len(df.index))])

          A         B         C  new
1  1.764052  0.400157  0.978738  0.0
2  2.240893  1.867558 -0.977278  0.0
3  0.950088 -0.151357 -0.103219  NaN
5
  • 12
    If you pass an index, it's okay: df['new'] = pd.Series([0 for x in range(len(df.index))], index=df.index). Commented Jun 4, 2014 at 13:52
  • 8
    Also, a list comprehension is entirely unnecessary here. Just do [0] * len(df.index). Commented Jun 4, 2014 at 14:01
  • @joris, I meant that df['new'] = 0 shows the proper way of assigning zeros to the whole column, but it doesn't explain why my first attempt inserts NaN. That was explained by Philip Cloud in the answer I accepted. Commented Jun 4, 2014 at 18:44
  • 13
    Simply do df['new'] = 0 Commented May 20, 2019 at 6:29
  • 2
    @flow2k it gives a warning A value is trying to be set on a copy of a slice from a DataFrame. Commented Dec 12, 2022 at 14:02
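
A note on that warning: it usually appears when the DataFrame being modified was itself obtained by slicing or filtering another DataFrame, not because of the constant assignment as such. A minimal sketch, assuming a filtered subset (the name subset and the filter condition are made up for illustration, and whether the warning is emitted at all depends on the pandas version):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# A filtered selection may be tracked as a slice of df, and adding a column
# to it is what can trigger the warning in pandas versions that emit it.
# Taking an explicit copy first makes the intent unambiguous.
subset = df[df['A'] > 1].copy()  # hypothetical filter, for illustration only
subset['new'] = 0

print(subset)
print('new' in df.columns)  # the original frame is not modified
```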

4 Answers

221

Super simple in-place assignment: df['new'] = 0

For in-place modification, use direct assignment. pandas broadcasts the scalar to every row.

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x

df['new'] = 'y'
# Same as,
# df.loc[:, 'new'] = 'y'
df

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y

Note for object columns

If you want to add a column of empty lists, here is my advice:

  • Consider not doing this. object columns are bad news in terms of performance. Rethink how your data is structured.
  • Consider storing your data in a sparse data structure. More information: sparse data structures
  • If you must store a column of lists, ensure not to copy the same reference multiple times.

    # Wrong
    df['new'] = [[]] * len(df)
    # Right
    df['new'] = [[] for _ in range(len(df))]
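
To see why the first form is wrong, here is a small sketch (the column names shared and separate are only for illustration). Mutating one cell of the shared column changes every row, because all rows point at the same list object:

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('AB'))

# [[]] * n repeats ONE list object n times, so every row shares it.
df['shared'] = [[]] * len(df)
# The comprehension builds a fresh list for each row.
df['separate'] = [[] for _ in range(len(df))]

# Append to the list stored in the first row of each column.
df['shared'].iloc[0].append(1)
df['separate'].iloc[0].append(1)

print(df['shared'].tolist())    # every row changed
print(df['separate'].tolist())  # only the first row changed
```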
    

Generating a copy: df.assign(new=0)

If you need a copy instead, use DataFrame.assign:

df.assign(new='y')

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y
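
A quick way to confirm that assign leaves the original untouched (a sketch; the variable name result is just for illustration):

```python
import pandas as pd

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))

# assign returns a new DataFrame; df itself keeps its original columns.
result = df.assign(new='y')

print(list(df.columns))      # original columns only
print(list(result.columns))  # original columns plus 'new'
```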

And, if you need to assign multiple such columns with the same value, this is as simple as,

c = ['new1', 'new2', ...]
df.assign(**dict.fromkeys(c, 'y'))

   A  B  C new1 new2
0  x  x  x    y    y
1  x  x  x    y    y
2  x  x  x    y    y
3  x  x  x    y    y

Multiple column assignment

Finally, if you need to assign multiple columns with different values, you can use assign with a dictionary.

c = {'new1': 'w', 'new2': 'y', 'new3': 'z'}
df.assign(**c)

   A  B  C new1 new2 new3
0  x  x  x    w    y    z
1  x  x  x    w    y    z
2  x  x  x    w    y    z
3  x  x  x    w    y    z

1 Comment

Is there any elegant way to define the type (dtype) of the newly added column?
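
One possible approach to the dtype question, offered as a sketch rather than the definitive way: either build the constant column as a Series with an explicit dtype (reusing df.index so alignment cannot introduce NaN), or assign the scalar and cast afterwards. The column names below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))

# Option 1: construct the constant column with an explicit dtype.
# index=df.index keeps the labels aligned with the frame.
df['new_int8'] = pd.Series(0, index=df.index, dtype='int8')

# Option 2: assign the scalar, then cast the column.
df['new_float32'] = 0
df['new_float32'] = df['new_float32'].astype('float32')

print(df.dtypes)
```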
64

With modern pandas you can just do:

df['new'] = 0

7 Comments

Can you point out which specific answers are out of date? Let's leave a comment under them so the authors have a chance to improve.
FYI, the only difference between this answer and cs95's (AKA my) answer is the column name and value. All the pieces are there.
It is not so much that they are out of date, but this answer is less verbose than the others and is easier to read.
@Joey Can't argue with that logic, I suppose this answer is more suited to people who are just looking to copy paste anything that will work, rather than looking to understand and learn more about the library. Touche.
@cs95 yes your answer lets people learn more. Also the df['new'] = 0 highlighted in the title is good for readability. I have upvoted that too. Less verbose than df.apply(lambda x: 0, axis=1)
25

Your attempt puts NaN into the column because df.index and the Index of your right-hand-side object are different: the Series gets the default index 0, 1, 2 while df.index is 1, 2, 3, so label 3 has no match. @zach shows the proper way to assign a new column of zeros. In general, pandas tries to do as much alignment of indices as possible. One downside is that when indices are not aligned you get NaN wherever they aren't aligned. Play around with the reindex and align methods to gain some intuition for how alignment works with objects that have partially, totally, and not-at-all aligned indices. For example, here's how DataFrame.align() works with partially aligned indices:

In [7]: from pandas import DataFrame

In [8]: from numpy.random import randint

In [9]: df = DataFrame({'a': randint(3, size=10)})

In [10]: df
Out[10]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [11]: s = df.a[:5]

In [12]: dfa, sa = df.align(s, axis=0)

In [13]: dfa
Out[13]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [14]: sa
Out[14]:
0     0
1     2
2     0
3     1
4     0
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
Name: a, dtype: float64
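
Applied to the question, the same alignment rule can be sketched like this (a simplified frame with made-up column names, using the same index 1, 2, 3 as the original):

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=[1, 2, 3])

# A bare Series gets the default index 0, 1, 2, which only partially
# overlaps df.index, so the row labelled 3 finds no value.
df['misaligned'] = pd.Series([0] * len(df))

# Passing index=df.index makes the labels match, so every row gets a value.
df['aligned'] = pd.Series([0] * len(df), index=df.index)

print(df)
```

Only the last row of misaligned is NaN, which is also why that column is upcast to float.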

4 Comments

I didn't downvote, but your code lacks comments, which makes it hard to follow what you're trying to achieve in the snippet.
This does not really answer the question. OP is asking about how to add a new column containing a constant value.
I don't agree that there's just one question here. There's "How do I assign a constant value to a column?" as well as "My attempt to do this doesn't work in X way, why is it behaving unexpectedly?" I believe I've addressed both points, the first by referring to another answer. Please read all of the text in my answer.
I think the problem is with the question rather than with your answer. There are two distinct questions contained in this post and as a result two distinct answers are required to answer the question. I believe this should have been flagged as being too broad and the poster should have asked two separate questions.
10

Here is another one-liner, using a lambda (it creates a column with the constant value 10):

df['newCol'] = df.apply(lambda x: 10, axis=1)

before

df
          A         B         C
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.950088 -0.151357 -0.103219

after

df
          A         B         C  newCol
1  1.764052  0.400157  0.978738      10
2  2.240893  1.867558 -0.977278      10
3  0.950088 -0.151357 -0.103219      10
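
For comparison, a small sketch checking that the apply version and plain scalar assignment produce the same values (the name via_apply is only for illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# apply calls the lambda once per row and collects the results into a Series.
via_apply = df.apply(lambda x: 10, axis=1)

# Direct assignment broadcasts the scalar without a Python-level loop.
df['newCol'] = 10

# Both approaches give the same value for every row label.
print((via_apply == df['newCol']).all())
```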

6 Comments

df['newCol'] = 10 is also a one liner (and is faster). What is the advantage of using apply here?
not trying to compete with you here - just showing an alternative approach.
@cs95 This is helpful. I wanted to create a new column where each value was a separate empty list. Only this method works.
@YatharthAgarwal If you need to assign empty lists, this is still a subpar solution because it uses apply. Try df['new'] = [[] for _ in range(len(df))]
I like this solution more for beginners like me. The df.apply function can be used for a variety of problems, and this use case makes sense. On the other hand, df['newCol'] = 10 is easy to use and works "magically", but it doesn't make much logical sense, and is something one just needs to learn by heart.