I am trying to understand the difference between these two statements:

dataframe['newColumn'] = 'stringconst'

and

import pandas as pd
from io import StringIO

for x in y:
    if x == "value":
        csv = pd.read_csv(StringIO(table), header=None, names=None)
        dataframe['newColumn'] = csv[0]

In the first case pandas populates all the rows with the constant value, but in the second case it populates only the first row and assigns NaN to the rest of the rows. Why is this? How can I assign the value in the second case to all the rows in the dataframe?

Answer:

Because csv[0] is not a scalar value. It's a pd.Series, and when you assign a pd.Series to a column, pandas aligns it by index label (the whole point of pandas): each row receives the Series value with the same label, and rows whose label does not appear in the Series get NaN. You are probably getting NaN everywhere except the first row because only the first row's label (presumably 0) also appears in the index of csv[0]. To see this, consider two DataFrames that are identical except for the index, which is shifted by 20 in the second one:

>>> df
   0  1  2  3  4
0  4 -5 -1  0  3
1 -2 -2  1  3  4
2  1  2  4  4 -4
3 -5  2 -3 -5  1
4 -5 -3  1  1 -1
5 -4  0  4 -3 -4
6 -2 -5 -3  1  0
7  4  0  0 -4 -4
8 -4  4 -2 -5  4
9  1 -2  4  3  0
>>> df2
    0  1  2  3  4
20  4 -5 -1  0  3
21 -2 -2  1  3  4
22  1  2  4  4 -4
23 -5  2 -3 -5  1
24 -5 -3  1  1 -1
25 -4  0  4 -3 -4
26 -2 -5 -3  1  0
27  4  0  0 -4 -4
28 -4  4 -2 -5  4
29  1 -2  4  3  0
>>> df['new'] = df[1]
>>> df
   0  1  2  3  4  new
0  4 -5 -1  0  3   -5
1 -2 -2  1  3  4   -2
2  1  2  4  4 -4    2
3 -5  2 -3 -5  1    2
4 -5 -3  1  1 -1   -3
5 -4  0  4 -3 -4    0
6 -2 -5 -3  1  0   -5
7  4  0  0 -4 -4    0
8 -4  4 -2 -5  4    4
9  1 -2  4  3  0   -2
>>> df['new2'] = df2[1]
>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5   NaN
1 -2 -2  1  3  4   -2   NaN
2  1  2  4  4 -4    2   NaN
3 -5  2 -3 -5  1    2   NaN
4 -5 -3  1  1 -1   -3   NaN
5 -4  0  4 -3 -4    0   NaN
6 -2 -5 -3  1  0   -5   NaN
7  4  0  0 -4 -4    0   NaN
8 -4  4 -2 -5  4    4   NaN
9  1 -2  4  3  0   -2   NaN
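The same mechanism reproduces the symptom from the question. The sketch below is a minimal reconstruction, assuming csv[0] holds a single row with the default index label 0; the names `df` and `one_row` are hypothetical stand-ins for the question's `dataframe` and `csv[0]`. Only the row whose label matches gets a value; every other row becomes NaN.

```python
import pandas as pd

# Hypothetical stand-in for the question's `dataframe`: 4 rows, labels 0..3
df = pd.DataFrame({"a": [10, 20, 30, 40]})

# Hypothetical stand-in for `csv[0]`: a one-row Series with index label 0
one_row = pd.Series(["stringconst"])

# Assignment aligns on index labels: only label 0 exists in both objects,
# so rows 1..3 receive NaN instead of the value
df["newColumn"] = one_row
print(df)
```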

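So, one way to fill the whole column is to assign the underlying values instead. A plain array carries no index, so pandas copies it by position; this requires the array to have exactly as many elements as the DataFrame has rows: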

>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5   NaN
1 -2 -2  1  3  4   -2   NaN
2  1  2  4  4 -4    2   NaN
3 -5  2 -3 -5  1    2   NaN
4 -5 -3  1  1 -1   -3   NaN
5 -4  0  4 -3 -4    0   NaN
6 -2 -5 -3  1  0   -5   NaN
7  4  0  0 -4 -4    0   NaN
8 -4  4 -2 -5  4    4   NaN
9  1 -2  4  3  0   -2   NaN
>>> df['new2'] = df2[1].values
>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5    -5
1 -2 -2  1  3  4   -2    -2
2  1  2  4  4 -4    2     2
3 -5  2 -3 -5  1    2     2
4 -5 -3  1  1 -1   -3    -3
5 -4  0  4 -3 -4    0     0
6 -2 -5 -3  1  0   -5    -5
7  4  0  0 -4 -4    0     0
8 -4  4 -2 -5  4    4     4
9  1 -2  4  3  0   -2    -2
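Here is a small self-contained sketch contrasting label-based and positional assignment. It uses `.to_numpy()`, which the pandas documentation recommends over `.values` in recent versions (0.24 and later); the frame and column names are made up for illustration. It also shows that a length mismatch raises a `ValueError` rather than padding with NaN, which matters for the question: if csv[0] has only one row, `.values` alone will not fill a longer DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
# Series whose labels (20..22) do not overlap df's index, like df2 above
other = pd.Series([7, 8, 9], index=[20, 21, 22])

# Label-based assignment: no labels match, so the column is all NaN
df["aligned"] = other

# Positional assignment: .to_numpy() drops the index, so the values are
# copied row by row (the lengths must match)
df["positional"] = other.to_numpy()

# A shorter array is rejected instead of being padded with NaN
mismatch_rejected = False
try:
    df["too_short"] = other.to_numpy()[:2]
except ValueError as exc:
    mismatch_rejected = True
    print("ValueError:", exc)
```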

Or, if you want to broadcast a single value (for example, the first value in the first column) to every row, select that scalar explicitly with iloc or another selector and assign it; a scalar is broadcast to all rows, just like your string constant:

>>> df
   0  1  2  3  4  new  new2
0  4 -5 -1  0  3   -5    -5
1 -2 -2  1  3  4   -2    -2
2  1  2  4  4 -4    2     2
3 -5  2 -3 -5  1    2     2
4 -5 -3  1  1 -1   -3    -3
5 -4  0  4 -3 -4    0     0
6 -2 -5 -3  1  0   -5    -5
7  4  0  0 -4 -4    0     0
8 -4  4 -2 -5  4    4     4
9  1 -2  4  3  0   -2    -2
>>> df['newest'] = df2.iloc[0,0]
>>> df
   0  1  2  3  4  new  new2  newest
0  4 -5 -1  0  3   -5    -5       4
1 -2 -2  1  3  4   -2    -2       4
2  1  2  4  4 -4    2     2       4
3 -5  2 -3 -5  1    2     2       4
4 -5 -3  1  1 -1   -3    -3       4
5 -4  0  4 -3 -4    0     0       4
6 -2 -5 -3  1  0   -5    -5       4
7  4  0  0 -4 -4    0     0       4
8 -4  4 -2 -5  4    4     4       4
9  1 -2  4  3  0   -2    -2       4
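Applied to the question's code, this means selecting a scalar from the parsed CSV instead of the whole column. The sketch below assumes `table` contains a single CSV row; `dataframe` and `table` are hypothetical stand-ins for the question's variables, and the loop over `y` is omitted.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-ins for the question's `dataframe` and `table`
dataframe = pd.DataFrame({"id": [1, 2, 3]})
table = "stringconst,other\n"

# header=None gives integer column labels 0, 1, ...
csv = pd.read_csv(StringIO(table), header=None)

# csv[0] is a Series (aligned by index); csv.iloc[0, 0] is a scalar,
# which pandas broadcasts to every row
dataframe["newColumn"] = csv.iloc[0, 0]
print(dataframe)
```

`csv[0].iloc[0]` selects the same scalar if you prefer to pick the column first.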