Add column with constant value to pandas dataframe [duplicate]

Question

Given a DataFrame:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])
df

          A         B         C
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.950088 -0.151357 -0.103219

What is the simplest way to add a new column containing a constant value, e.g. 0?

          A         B         C  new
1  1.764052  0.400157  0.978738    0
2  2.240893  1.867558 -0.977278    0
3  0.950088 -0.151357 -0.103219    0

This is my solution, but I don't understand why it puts NaN into the 'new' column:

df['new'] = pd.Series([0 for x in range(len(df.index))])

          A         B         C  new
1  1.764052  0.400157  0.978738  0.0
2  2.240893  1.867558 -0.977278  0.0
3  0.950088 -0.151357 -0.103219  NaN

If you use an index, it's okay: df['new'] = pd.Series([0 for x in range(len(df.index))], index=df.index) – zach, Commented Jun 4, 2014 at 13:52
Also, a list comprehension is entirely unnecessary here. Just do [0] * len(df.index) – acushner, Commented Jun 4, 2014 at 14:01
@joris, I meant that df['new'] = 0 shows the proper way of assigning zeros to the whole column, but it doesn't explain why my first attempt inserts NaN. That was answered by Phillip Cloud in the answer I accepted. – yemu, Commented Jun 4, 2014 at 18:44
@flow2k It gives a warning: "A value is trying to be set on a copy of a slice from a DataFrame". – KansaiRobot, Commented Dec 12, 2022 at 14:02
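That warning usually shows up when `df` is itself a filtered slice of another DataFrame. A minimal sketch, assuming that is the situation (the names `base` and `sub` are hypothetical): take an explicit `.copy()` of the slice before adding the constant column, so the new frame is clearly independent of the original.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
base = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# Boolean filtering returns a new DataFrame, but pandas may consider it a
# possible slice of `base` and warn when you assign to it. Calling .copy()
# makes the result an explicitly independent DataFrame.
sub = base[base['A'] > 1].copy()

# Constant assignment now only touches `sub`; `base` keeps its original columns.
sub['new'] = 0

print(sub)
```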

cs95 · Accepted Answer · 2020-04-19 03:24:17Z

Super simple in-place assignment: `df['new'] = 0`

For in-place modification, perform direct assignment. pandas broadcasts the assigned value to every row.

import pandas as pd

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x

df['new'] = 'y'
# Same as,
# df.loc[:, 'new'] = 'y'
df

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y
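A minimal sketch of the same broadcast applied to the question's own frame (index 1, 2, 3): because the right-hand side is a scalar, there is no second index to align against, so no NaN can appear.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# A scalar has no index of its own, so pandas repeats it for every row of df,
# whatever df's index labels happen to be.
df['new'] = 0

print(df['new'].tolist())  # [0, 0, 0]
```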

Note for object columns

If you want to add a column of empty lists, here is my advice:

1. Consider not doing this. object columns are bad news in terms of performance. Rethink how your data is structured.
2. Consider storing your data in a sparse data structure. More information: sparse data structures
3. If you must store a column of lists, make sure you don't reuse the same list reference for every row.
```
# Wrong
df['new'] = [[]] * len(df)
# Right
df['new'] = [[] for _ in range(len(df))]
```
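To see why the first version is wrong, here is a small sketch (the frame and the column names `wrong` and `right` are only for illustration): `[[]] * n` stores n references to one list object, so mutating one row's list changes every row.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Wrong: multiplying a one-element list repeats the *reference*,
# so every row holds the very same list object.
df['wrong'] = [[]] * len(df)

# Right: the comprehension builds a fresh empty list for each row.
df['right'] = [[] for _ in range(len(df))]

# Append to the list stored in row 0 of each column.
df.at[0, 'wrong'].append('x')
df.at[0, 'right'].append('x')

print(df['wrong'].tolist())  # [['x'], ['x'], ['x']]
print(df['right'].tolist())  # [['x'], [], []]
```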

Generating a copy: `df.assign(new=0)`

If you need a copy instead, use DataFrame.assign:

df.assign(new='y')

   A  B  C new
0  x  x  x   y
1  x  x  x   y
2  x  x  x   y
3  x  x  x   y
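A quick check of the "copy" behaviour, as a minimal sketch (the name `out` is just for illustration): assign returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(4), columns=list('ABC'))

# assign() builds and returns a new DataFrame; df itself is not modified.
out = df.assign(new='y')

print(list(out.columns))  # ['A', 'B', 'C', 'new']
print(list(df.columns))   # ['A', 'B', 'C']
```

Because it returns a DataFrame rather than modifying one, assign also fits naturally at the end of a method chain.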

And if you need to assign several columns with the same value, it's as simple as:

c = ['new1', 'new2']  # list as many new column names as you need
df.assign(**dict.fromkeys(c, 'y'))

   A  B  C new1 new2
0  x  x  x    y    y
1  x  x  x    y    y
2  x  x  x    y    y
3  x  x  x    y    y

Multiple column assignment

Finally, if you need to assign multiple columns with different values, you can use assign with a dictionary.

c = {'new1': 'w', 'new2': 'y', 'new3': 'z'}
df.assign(**c)

   A  B  C new1 new2 new3
0  x  x  x    w    y    z
1  x  x  x    w    y    z
2  x  x  x    w    y    z
3  x  x  x    w    y    z

Is there any elegant way to define the type (dtype) of the newly added column?

cs95 · Accepted Answer · 2020-07-07 08:08:24Z

64

With modern pandas you can just do:

df['new'] = 0

edited Jul 7, 2020 at 8:08

cs95


answered Apr 1, 2020 at 15:57

Roko Mijic


7 Comments

cs95 Over a year ago

Can you point out which specific answers are out of date? Let's leave a comment under them so the authors have a chance to improve.

cs95 Over a year ago

FYI, the only difference between this answer and my (cs95's) answer is the column name and value. All the pieces are there.

Joey Over a year ago

It is not so much that they are out of date, but this answer is less verbose than the others and is easier to read.

cs95 Over a year ago

@Joey Can't argue with that logic. I suppose this answer is better suited to people who just want to copy and paste anything that works, rather than to understand and learn more about the library. Touché.

Joey Over a year ago

@cs95 yes your answer lets people learn more. Also the df['new'] = 0 highlighted in the title is good for readability. I have upvoted that too. Less verbose than df.apply(lambda x: 0, axis=1)


georg · Accepted Answer · 2017-02-09 10:57:08Z

25

This puts NaN into the column because df.index and the index of your right-hand-side object are different. @zach shows the proper way to assign a new column of zeros. In general, pandas tries to align indices as much as possible. One downside is that, when indices are not aligned, you get NaN wherever they don't match. Play around with the reindex and align methods to build some intuition for how alignment works with objects whose indices are partially aligned, fully aligned, or not aligned at all. For example, here's how DataFrame.align() works with partially aligned indices:

In [7]: from pandas import DataFrame

In [8]: from numpy.random import randint

In [9]: df = DataFrame({'a': randint(3, size=10)})


In [10]: df
Out[10]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [11]: s = df.a[:5]

In [12]: dfa, sa = df.align(s, axis=0)

In [13]: dfa
Out[13]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [14]: sa
Out[14]:
0     0
1     2
2     0
3     1
4     0
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
Name: a, dtype: float64
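Applied to the question itself, a minimal sketch: the OP's pd.Series(...) gets the default index 0, 1, 2, while df is indexed 1, 2, 3, so only rows 1 and 2 find a matching label. The column names `bad`, `good` and `scalar` are only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# Without an explicit index, this Series gets the default labels 0, 1, 2.
s = pd.Series([0] * len(df))
print(s.index.tolist())    # [0, 1, 2]

# Assignment aligns on labels: df rows 1 and 2 match, row 3 has no partner,
# so it becomes NaN (and the column is upcast to float64).
df['bad'] = s
print(df['bad'].tolist())  # [0.0, 0.0, nan]

# Either give the Series df's own index, or assign a plain scalar.
df['good'] = pd.Series([0] * len(df), index=df.index)
df['scalar'] = 0
```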

edited Feb 9, 2017 at 10:57

georg


answered Jun 4, 2014 at 14:29

Phillip Cloud


4 Comments

redress Over a year ago

I didn't downvote, but your code lacks comments, which makes it hard to follow what you're trying to achieve in the snippet.

cs95 Over a year ago

This does not really answer the question. OP is asking about how to add a new column containing a constant value.

Phillip Cloud Over a year ago

I don't agree that there's just one question here. There's "How do I assign a constant value to a column?" as well as "My attempt to do this doesn't work in X way, why is it behaving unexpectedly?" I believe I've addressed both points, the first by referring to another answer. Please read all of the text in my answer.

Kevin Over a year ago

I think the problem is with the question rather than with your answer. There are two distinct questions contained in this post and as a result two distinct answers are required to answer the question. I believe this should have been flagged as being too broad and the poster should have asked two separate questions.

Grant Shannon · Accepted Answer · 2019-08-15 20:41:21Z

10

Here is another one-liner using a lambda (it creates a column with the constant value 10):

df['newCol'] = df.apply(lambda x: 10, axis=1)

before

df
          A         B         C
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.950088 -0.151357 -0.103219

after

df
          A         B         C  newCol
1  1.764052  0.400157  0.978738      10
2  2.240893  1.867558 -0.977278      10
3  0.950088 -0.151357 -0.103219      10
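A small sketch comparing the two one-liners on the question's frame (the name `via_apply` is only for illustration): both produce the same values on the same index, but apply with axis=1 calls the lambda once per row, whereas scalar assignment broadcasts without any per-row Python call, which is why the scalar form is faster.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(3, 3), columns=list('ABC'), index=[1, 2, 3])

# apply(axis=1) calls the lambda once per row and collects the results.
via_apply = df.apply(lambda row: 10, axis=1)

# Scalar assignment broadcasts 10 to every row in one step.
df['newCol'] = 10

print(via_apply.tolist())     # [10, 10, 10]
print(df['newCol'].tolist())  # [10, 10, 10]
```

For a plain constant the scalar form is the natural choice; apply earns its cost when the value genuinely depends on each row.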

edited Aug 15, 2019 at 20:41

answered Aug 14, 2019 at 19:58

Grant Shannon


6 Comments

cs95 Over a year ago

df['newCol'] = 10 is also a one-liner (and is faster). What is the advantage of using apply here?

Grant Shannon Over a year ago

not trying to compete with you here - just showing an alternative approach.

Yatharth Agarwal Over a year ago

@cs95 This is helpful. I wanted to create a new column where each value was a separate empty list. Only this method works.

cs95 Over a year ago

@YatharthAgarwal If you need to assign empty lists, this is still a subpar solution because it uses apply. Try df['new'] = [[] for _ in range(len(df))]

abrac Over a year ago

I like this solution more for beginners like me. The df.apply function can be used for a variety of problems, and this use case makes sense. On the other hand, df['newCol'] = 10 is easy to use and works "magically", but it doesn't make much logical sense, and is something one just needs to learn by heart.
