8

How can I add a header to a DF without replacing the current one? In other words I just want to shift the current header down and just add it to the dataframe as another record.

*Secondary question: how do I add tables (e.g. an example dataframe) to a Stack Overflow question?

I have this (note the header and how it is just added as a row):

   0.213231  0.314544
0 -0.952928 -0.624646
1 -1.020950 -0.883333

I need this (all other records are shifted down and a new record is added) (also: I couldn't read the csv properly because I'm using s3_text_adapter for the import and I couldn't figure out how to have an argument that ignores header similar to pandas read_csv):

       A          B
0  0.213231  0.314544
1 -1.020950 -0.883333
4
  • Re the tables: you can just copy and paste the text repr, then highlight it and press CTRL+K (or indent it 4 spaces), which puts it in code formatting. Commented Oct 23, 2013 at 1:08
  • What is s3_text_adapter and how are you using it? It ought to have a header=None option... Commented Oct 24, 2013 at 0:39
  • docs.continuum.io/iopro/TextAdapter.html Commented Oct 24, 2013 at 11:01
  • @AndyHayden, you were absolutely right. I went back and double checked and found that field_names=False does the trick. Thank you again ! Commented Oct 24, 2013 at 11:02

2 Answers 2

13

Another option is to add it as an additional level of the column index, to make it a MultiIndex:

In [11]: df = pd.DataFrame(randn(2, 2), columns=['A', 'B'])

In [12]: df
Out[12]: 
          A         B
0 -0.952928 -0.624646
1 -1.020950 -0.883333

In [13]: df.columns = pd.MultiIndex.from_tuples(zip(['AA', 'BB'], df.columns))

In [14]: df
Out[14]: 
         AA        BB
          A         B
0 -0.952928 -0.624646
1 -1.020950 -0.883333

This has the benefit of keeping the correct dtypes for the DataFrame, so you can still do fast and correct calculations on your DataFrame, and allows you to access by both the old and new column names.
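To make the "access by both the old and new column names" point concrete, here is a minimal sketch. It assumes the same two-column float frame as above and uses `from_arrays` instead of `from_tuples` purely for brevity; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 2), columns=["A", "B"])
original = df.copy()

# Add a new top level above the existing column names.
df.columns = pd.MultiIndex.from_arrays([["AA", "BB"], df.columns])

# Select by the new (outer) name: a DataFrame whose only column is "A".
by_new_name = df["AA"]

# Select by the old (inner) name: a DataFrame whose only column is "AA".
by_old_name = df.xs("A", axis=1, level=1)

# The values were never touched, so every column is still float64.
dtypes_kept = all(dtype == np.float64 for dtype in df.dtypes)
```

Because only the labels change, sums and other numeric operations behave exactly as they did before the extra level was added.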


For completeness, here's DSM's (deleted) answer, which turns the column names into a row. As mentioned already, this is usually not a good idea:

In [21]: df_bad_idea = df.T.reset_index().T

In [22]: df_bad_idea
Out[22]: 
              0         1
index         A         B
0     -0.952928 -0.624646
1      -1.02095 -0.883333

Note that the dtype may change (if these are column names rather than proper values), as it does in this case, so be careful if you actually plan to do any work on the result: it will likely be slower and may even fail:

In [23]: df.sum()
Out[23]: 
A   -1.973878
B   -1.507979
dtype: float64

In [24]: df_bad_idea.sum()  # doh!
Out[24]: Series([], dtype: float64)
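A small sketch of why this happens and one possible way to get the numbers back, assuming the same values as in the session above (the names `demoted` and `numeric` are made up for illustration). The exact result of calling `.sum()` on the object-typed frame may differ between pandas versions, so it is not shown here.

```python
import pandas as pd

df = pd.DataFrame({"A": [-0.952928, -1.020950], "B": [-0.624646, -0.883333]})

# Move the column names into the first row (the transpose trick shown above).
demoted = df.T.reset_index().T

# Each column now holds a string label followed by floats, so pandas
# falls back to the generic object dtype for all of them.
all_object = all(dtype == object for dtype in demoted.dtypes)

# To compute on the numbers again, drop the label row and cast back to float.
numeric = demoted.iloc[1:].astype(float)
column_sums = numeric.sum()
```

This only recovers the numeric rows; the label row has to stay out of the frame for the columns to be float again.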

If the column names are actually a row of data that was mistaken for a header row, then you should correct this when reading in the data (e.g. with read_csv, pass header=None).
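As a rough illustration of the difference, the sketch below reads the same text twice. The CSV contents are a made-up in-memory stand-in for the real file, reusing the numbers from the question.

```python
import io
import pandas as pd

# Hypothetical file contents: three data rows and no header line.
csv_text = "0.213231,0.314544\n-0.952928,-0.624646\n-1.020950,-0.883333\n"

# Default behaviour: the first line is consumed as the column labels.
inferred = pd.read_csv(io.StringIO(csv_text))

# header=None keeps the first line as data; names= supplies the labels.
explicit = pd.read_csv(io.StringIO(csv_text), header=None, names=["A", "B"])
```

Here `inferred` ends up with two rows and the strings '0.213231' and '0.314544' as column labels, while `explicit` keeps all three rows as floats under 'A' and 'B'.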


8 Comments

I'm going to delete mine in favour of this, because I think your point about changing dtypes is a good one.
@DSM you always do that after I +1! It was what the OP was after, but this is more correct I think (though could/should be easier)...
Thank you. This is really cool and good to know but I meant how to replace the header 'A' and 'B' from the first df above but also just add the values 'A' and 'B' as another row, in other words move values 'A' and 'B' down to index 0 as the new first record in df.
@prometheus2305 for that you could do df.T.reset_index().T but you should think hard about why you would want to do that.
@tom which was DSMs deleted answer!
4

The key is to specify header=None when reading, and then assign to df.columns to add the header:

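import pandas as pd

# header=None keeps the first remaining line as data; skiprows=2 skips two leading rows (e.g. blank ones), if applicable
df = pd.read_csv('file.csv', skiprows=2, header=None)
df = df.iloc[:, [0, 1]]  # keep the first two columns
df.columns = ['A', 'B']  # assign the column names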
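The same steps can probably be folded into a single read_csv call. This is only a sketch: the in-memory text stands in for 'file.csv', and the third column and its name "label" are invented to show the column selection.

```python
import io
import pandas as pd

# Hypothetical input: two leading lines to skip, then headerless data
# with three columns, of which only the first two are wanted.
csv_text = "junk line\nanother junk line\n1.0,2.0,x\n3.0,4.0,y\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=2,                  # skip the two leading lines
    header=None,                 # the first remaining line is data
    names=["A", "B", "label"],   # labels for every column in the file
    usecols=["A", "B"],          # keep only the first two columns
)
```

Naming every column and then selecting by name avoids relying on how names= and positional usecols= interact.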
