
I have a dynamic DataFrame that works fine, but when there is no data to add to the DataFrame, I get an error. Therefore, I need a way to create an empty DataFrame with only the column names.

For now I have something like this:

df = pd.DataFrame(columns=COLUMN_NAMES) # Note that there is no row data inserted.

PS: It is important that the column names still appear in the DataFrame.
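If the dynamic data arrives as a list of rows (an assumption, since the question doesn't show its shape), passing that list together with `columns=` keeps the column names whether or not any rows exist. A minimal sketch; `COLUMN_NAMES` and `make_frame` are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical column names standing in for the question's COLUMN_NAMES
COLUMN_NAMES = ['A', 'B', 'C']


def make_frame(rows):
    """Hypothetical helper: build a DataFrame from a possibly empty list of rows.

    The column names are kept even when `rows` is empty.
    """
    return pd.DataFrame(rows, columns=COLUMN_NAMES)


empty = make_frame([])
filled = make_frame([[1, 2, 3], [4, 5, 6]])
print(empty.shape)   # (0, 3)
print(filled.shape)  # (2, 3)
```

With this pattern the "no data" case needs no special branch.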

But when I use it like this, I get the following result:

Index([], dtype='object')
Empty DataFrame

The "Empty DataFrame" part is good! But instead of the Index part, I still need the columns to be displayed.

An important thing I found out: I am converting this DataFrame to a PDF using Jinja2, so I first call a method to output it to HTML, like this:

df.to_html()

I think this is where the columns get lost.

In general, I followed this example: http://pbpython.com/pdf-reports.html. The CSS is also from that link. This is what I do to send the DataFrame to the PDF:

from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML

env = Environment(loader=FileSystemLoader('.'))
template = env.get_template("pdf_report_template.html")
template_vars = {"my_dataframe": df.to_html()}

html_out = template.render(template_vars)
HTML(string=html_out).write_pdf("my_pdf.pdf", stylesheets=["pdf_report_style.css"])
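To see whether the columns survive the templating step, it can help to inspect the rendered HTML before converting it to PDF. The sketch below swaps Jinja2 for the standard library's `string.Template` purely to stay dependency-free (an assumption; the real `pdf_report_template.html` is not reproduced), and skips the PDF step entirely:

```python
from string import Template

import pandas as pd

# string.Template is a stand-in for the Jinja2 template; the layout here is
# a hypothetical minimal page, not the real pdf_report_template.html.
template = Template("<html><body>$my_dataframe</body></html>")

df = pd.DataFrame(columns=['A', 'B', 'C'])
html_out = template.substitute(my_dataframe=df.to_html())

# The header cells are already present before any PDF conversion happens
print('<th>A</th>' in html_out)  # True
```

If the headers are present at this point but missing from the PDF, the cause is more likely in the template or CSS than in `to_html()`.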

5 Answers


You can create an empty DataFrame with either column names or an Index:

In [4]: import pandas as pd
In [5]: df = pd.DataFrame(columns=['A','B','C','D','E','F','G'])
In [6]: df
Out[6]:
Empty DataFrame
Columns: [A, B, C, D, E, F, G]
Index: []

Or

In [7]: df = pd.DataFrame(index=range(1,10))
In [8]: df
Out[8]:
Empty DataFrame
Columns: []
Index: [1, 2, 3, 4, 5, 6, 7, 8, 9]
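Checking `.shape` makes the difference between the two constructors concrete. The third case, passing both `columns` and `index` without data, is an extra illustration not taken from the answer; it yields a frame whose cells are all NaN:

```python
import pandas as pd

# Columns only: zero rows, seven named columns
by_columns = pd.DataFrame(columns=list('ABCDEFG'))
print(by_columns.shape)  # (0, 7)

# Index only: nine rows, no columns
by_index = pd.DataFrame(index=range(1, 10))
print(by_index.shape)    # (9, 0)

# Both: every cell is NaN until you fill it
both = pd.DataFrame(columns=['A', 'B'], index=range(3))
print(both.shape)        # (3, 2)
```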

Edit: Even after your amendment about .to_html(), I can't reproduce the problem. This:

df = pd.DataFrame(columns=['A','B','C','D','E','F','G'])
df.to_html('test.html')

Produces:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>A</th>
      <th>B</th>
      <th>C</th>
      <th>D</th>
      <th>E</th>
      <th>F</th>
      <th>G</th>
    </tr>
  </thead>
  <tbody>
  </tbody>
</table>
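To check this programmatically rather than by eye, the `<th>` cells can be pulled out of the `to_html()` string with the standard library's `html.parser`. A minimal sketch; `HeaderCollector` is a hypothetical helper name:

```python
from html.parser import HTMLParser

import pandas as pd


class HeaderCollector(HTMLParser):
    """Hypothetical helper: collect the text of every <th> cell."""

    def __init__(self):
        super().__init__()
        self.in_th = False
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag == 'th':
            self.in_th = True
            self.headers.append('')  # empty cells stay as ''

    def handle_endtag(self, tag):
        if tag == 'th':
            self.in_th = False

    def handle_data(self, data):
        if self.in_th:
            self.headers[-1] += data


df = pd.DataFrame(columns=['A', 'B', 'C', 'D', 'E', 'F', 'G'])
parser = HeaderCollector()
parser.feed(df.to_html())
print(parser.headers)  # ['', 'A', 'B', 'C', 'D', 'E', 'F', 'G']
```

The leading empty string is the blank corner cell above the index.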

Are you looking for something like this?

    import pandas as pd

    COLUMN_NAMES = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
    df = pd.DataFrame(columns=COLUMN_NAMES)
    df.columns
    # Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')
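`df.columns` returns a pandas `Index`. If a plain Python list is easier to pass along (for example into a template), `.tolist()` converts it, and `df.empty` reports whether the frame has no rows (it is also True when there are no columns):

```python
import pandas as pd

COLUMN_NAMES = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
df = pd.DataFrame(columns=COLUMN_NAMES)

print(df.columns.tolist())  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(df.empty)             # True
print(len(df))              # 0
```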

Comments

Also, I didn't lose my column names when I tried it. They appear in the HTML table.

Creating column names by iterating:

df = pd.DataFrame(columns=['colname_' + str(i) for i in range(5)])
print(df)

# Empty DataFrame
# Columns: [colname_0, colname_1, colname_2, colname_3, colname_4]
# Index: []
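The same column names can be built with an f-string instead of string concatenation; this is only a stylistic alternative and produces the same frame:

```python
import pandas as pd

# f-string version of 'colname_' + str(i)
df = pd.DataFrame(columns=[f'colname_{i}' for i in range(5)])
print(df.columns.tolist())
# ['colname_0', 'colname_1', 'colname_2', 'colname_3', 'colname_4']
```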

to_html() output:

print(df.to_html())

# <table border="1" class="dataframe">
#   <thead>
#     <tr style="text-align: right;">
#       <th></th>
#       <th>colname_0</th>
#       <th>colname_1</th>
#       <th>colname_2</th>
#       <th>colname_3</th>
#       <th>colname_4</th>
#     </tr>
#   </thead>
#   <tbody>
#   </tbody>
# </table>

This seems to work.

print(type(df.to_html()))
# <class 'str'>

The problem is that when you create df like this:

df = pd.DataFrame(columns=COLUMN_NAMES)

it has 0 rows × n columns. You need to create at least one row index with:

df = pd.DataFrame(columns=COLUMN_NAMES, index=[0])

Now it has 1 row × n columns (the row is filled with NaN), and you are able to add data. Otherwise, the df consists only of the column-names object (like a list of strings).
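A short sketch of the `index=[0]` approach: the placeholder row starts out as all NaN and can then be filled cell by cell with `.loc`. The column names are hypothetical:

```python
import pandas as pd

COLUMN_NAMES = ['A', 'B', 'C']  # hypothetical column names

df = pd.DataFrame(columns=COLUMN_NAMES, index=[0])
print(df.shape)               # (1, 3)
print(df.isna().all().all())  # True: the placeholder row is all NaN

# Fill the placeholder row one cell at a time
df.loc[0, 'A'] = 'abc'
df.loc[0, 'B'] = 10
print(df.loc[0, 'A'])         # abc
```

Note that any cell you don't fill (here `C`) stays NaN, which will also show up in `to_html()` output.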

1 Comment

Massive thanks to you. I struggled with not being able to add data for 2 hours.

df.to_html() has a columns parameter.

Just pass the columns into the to_html() method.

df.to_html(columns=['A','B','C','D','E','F','G'])
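The `columns` parameter selects which of the frame's existing columns are written to the table. A minimal check on an empty frame, using hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'])

# Only the listed columns are written to the HTML table
html = df.to_html(columns=['A', 'B'])
print('<th>A</th>' in html, '<th>C</th>' in html)  # True False
```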


If you have a completely empty DataFrame without columns or an index, you can give it columns by assigning None to them.

df = pd.DataFrame()                    # <---- shape: (0, 0)
df[['col1', 'col2', 'col3']] = None    # <---- shape: (0, 3)

Then, to assign a row to it, you can use the loc indexer. This can be used in a loop to add more rows (though that is inadvisable, since pd.concat exists for that particular task).

df.loc[len(df)] = ['abc', 10, 3.33]    # <---- shape: (1, 3)
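When combining pieces with `pd.concat`, one thing to watch for is that `pd.concat([])` raises `ValueError` when there is nothing to concatenate, which can produce exactly the "no data" error from the question. A hedged sketch that falls back to an empty frame with the column names; `combine` and `COLUMNS` are hypothetical:

```python
import pandas as pd

COLUMNS = ['col1', 'col2', 'col3']  # hypothetical column names


def combine(frames):
    """Hypothetical helper: concatenate frames, keeping the columns if none exist."""
    if not frames:
        # pd.concat([]) raises ValueError, so return an empty frame instead
        return pd.DataFrame(columns=COLUMNS)
    return pd.concat(frames, ignore_index=True)


new_rows = [
    pd.DataFrame([['abc', 10, 3.33]], columns=COLUMNS),
    pd.DataFrame([['def', 20, 4.44]], columns=COLUMNS),
]
print(combine(new_rows).shape)  # (2, 3)
print(combine([]).shape)        # (0, 3)
```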
