Create a Pandas Dataframe by appending one row at a time [duplicate]

Question

How do I create an empty DataFrame, then add rows, one by one?

I created an empty DataFrame:

df = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))

Then I can add a new row at the end and fill a single field with:

df = df._set_value(index=len(df), col='qty1', value=10.0)

It works for only one field at a time. What is a better way to add a new row to df?

Note this is a very inefficient way to build a large DataFrame; new arrays have to be created (copying over the existing data) when you append a row.
– Wes McKinney, Commented May 23, 2012 at 13:46
@WesMcKinney: Thx, that's really good to know. Is it very fast to add columns to huge tables?
– max, Commented Aug 28, 2012 at 4:27
If it is too inefficient for you, you may preallocate an additional row and then update it.
– user1154664, Commented Apr 19, 2013 at 19:54
Hey you... yes, you... I see what you're up to... you want to run this inside a loop and iteratively add rows to an empty DataFrame, don't you... well, don't!
– cs95, Commented Jul 13, 2020 at 12:52
I understand this may be wrong in general, but what about real-time processing? Say I have some data that comes in every second, a thread that just wants to fill a dataframe, and another event-based thread that goes and looks at the dataframe. I find this use case valid, and one where that solution is applicable.
– Giuseppe Salvatore, Commented Nov 20, 2020 at 17:24

fred · Accepted Answer · 2020-12-18 15:00:26Z

925

You can use df.loc[i], where the row with index i will be what you specify it to be in the dataframe.

>>> import pandas as pd
>>> from numpy.random import randint

>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
...     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))

>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6
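
The comments under this answer point out two pitfalls: df.loc[i] overwrites a row when the label i already exists, and df.index.max() is NaN for an empty frame. A minimal sketch of a helper that handles both cases, assuming an integer index; append_row_loc is a hypothetical name, not a pandas function:

```python
import pandas as pd

def append_row_loc(df, values):
    """Append one row in place under the next unused integer label.

    Hypothetical helper; assumes df has an integer (or empty) index.
    """
    # An empty index has no usable max, so start at 0 explicitly.
    next_label = 0 if len(df.index) == 0 else df.index.max() + 1
    df.loc[next_label] = values
    return df

df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
append_row_loc(df, ['name0', 3, 3])
append_row_loc(df, ['name1', 2, 4])

# Labels need not be contiguous: drop a row, then append again.
df = df.drop(index=0)
append_row_loc(df, ['name2', 2, 8])
print(df)
```

This still copies data on every call, so it only addresses the labelling problem, not the performance concern raised in the question comments.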

edited Dec 18, 2020 at 15:00

answered Jul 22, 2014 at 13:10

fred

9 Comments

FooBar Over a year ago

Consider adding the index to preallocate memory (see my answer)

hobs Over a year ago

.loc is referencing the index column, so if you're working with a pre-existing DataFrame with an index that isn't a continuous sequence of integers starting with 0 (as in your example), .loc will overwrite existing rows, or insert rows, or create gaps in your index. A more robust (but not fool-proof) approach for appending to an existing nonzero-length dataframe would be: df.loc[df.index.max() + 1] = [randint(... or prepopulating the index as @FooBar suggested.

flow2k Over a year ago

@hobs df.index.max() is nan when the DataFrame is empty.

hobs Over a year ago

@flow2k good catch! The only solution I can think of is a try/except (on the first row insertion only) with a pd.DataFrame() constructor call. Do you know any better ways?

flow2k Over a year ago

@hobs One solution I thought of is using the ternary operator: df.loc[0 if pd.isnull(df.index.max()) else df.index.max() + 1]

wjandrea · Accepted Answer · 2023-07-19 21:27:21Z

814

In case you can get all data for the data frame upfront, there is a much faster approach than appending to a data frame:

Create a list of dictionaries in which each dictionary corresponds to an input data row.
Create a data frame from this list.

I had a similar task for which appending to a data frame row by row took 30 min, and creating a data frame from a list of dictionaries completed within seconds.

rows_list = []
for row in input_rows:
    dict1 = {}
    # get input row in dictionary format
    # key = col_name
    dict1.update(blah..) 

    rows_list.append(dict1)

df = pd.DataFrame(rows_list)
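
A runnable version of the same idea, as a sketch only: input_rows below is a made-up list of tuples standing in for whatever produces the rows, and the column names are illustrative.

```python
import pandas as pd

# Hypothetical input: any iterable of raw records would do.
input_rows = [('cow', 'blue', 4), ('horse', 'red', 4), ('hen', 'white', 2)]

rows_list = []
for name, color, legs in input_rows:
    # One dictionary per row: key = column name, value = cell value.
    rows_list.append({'Animal': name, 'Color': color, 'Legs': legs})

# Build the DataFrame once, after the loop.
df = pd.DataFrame(rows_list, columns=['Animal', 'Color', 'Legs'])
print(df)
```

Passing columns explicitly pins the column order, which addresses the ordering concern raised in the comments.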

edited Jul 19, 2023 at 21:27

wjandrea

answered Jul 5, 2013 at 20:38

ShikharDua

18 Comments

fantabolous Over a year ago

I've moved to doing this as well for any situation where I can't get all the data up front. The speed difference is astonishing.

thikonom Over a year ago

Copying from pandas docs:

It is worth noting however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.

(pandas.pydata.org/pandas-docs/stable/…)

user5359531 Over a year ago

This works great! Except when I created the data frame, the columns names were all in the wrong order...

ShikharDua Over a year ago

@user5359531 You can use ordered dict in that case

Marcello Grechi Lins Over a year ago

@user5359531 You can manually specify the columns and the order will be preserved. pd.DataFrame(rows_list, columns=['C1', 'C2','C3']) will do the trick

Peter Mortensen · Accepted Answer · 2021-07-14 09:51:59Z

464

When adding a lot of rows to a DataFrame, performance matters to me. So I tried the four most popular methods and measured their speed.

Performance

Using .append (NPE's answer)
Using .loc (fred's answer)
Using .loc with preallocating (FooBar's answer)
Using dict and create DataFrame in the end (ShikharDua's answer)

Runtime results (in seconds):

Approach                1000 rows   5000 rows   10 000 rows
.append                 0.69        3.39        6.78
.loc without prealloc   0.74        3.90        8.35
.loc with prealloc      0.24        2.58        8.70
dict                    0.012       0.046       0.084

So I use the dictionary approach myself.

Code:

import pandas as pd
import numpy as np
import time

numOfRows = 1000
# append (DataFrame.append requires pandas < 2.0)
startTime = time.perf_counter()
df1 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows-4):
    df1 = df1.append( dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']), ignore_index=True)
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df1.shape)

# .loc w/o prealloc
startTime = time.perf_counter()
df2 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows):
    df2.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df2.shape)

# .loc with prealloc
df3 = pd.DataFrame(index=np.arange(0, numOfRows), columns=['A', 'B', 'C', 'D', 'E'] )
startTime = time.perf_counter()
for i in range( 1,numOfRows):
    df3.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df3.shape)

# dict
startTime = time.perf_counter()
row_list = []
for i in range (0,5):
    row_list.append(dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']))
for i in range( 1,numOfRows-4):
    dict1 = dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E'])
    row_list.append(dict1)

df4 = pd.DataFrame(row_list, columns=['A','B','C','D','E'])
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df4.shape)

P.S.: I believe my implementation isn't perfect, and maybe there is some optimization that could be done.
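
One of the comments on this answer suggests timeit for such measurements. A minimal sketch of that, comparing only the .loc and list-of-dicts approaches; the function names and the small row count are choices made here for illustration, and absolute timings will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

COLUMNS = ['A', 'B', 'C', 'D', 'E']
N_ROWS = 100  # kept small so the sketch runs quickly

def build_with_loc():
    # Grow the frame one row at a time via setting with enlargement.
    df = pd.DataFrame(columns=COLUMNS)
    for i in range(N_ROWS):
        df.loc[i] = np.random.randint(100, size=5).tolist()
    return df

def build_from_list():
    # Collect plain dicts, then construct the frame once.
    rows = [dict(zip(COLUMNS, np.random.randint(100, size=5).tolist()))
            for _ in range(N_ROWS)]
    return pd.DataFrame(rows, columns=COLUMNS)

for func in (build_with_loc, build_from_list):
    # timeit.repeat returns one total per repetition; take the minimum.
    best = min(timeit.repeat(func, number=1, repeat=3))
    print('{:<16s} {:.4f} s for {:d} rows'.format(func.__name__, best, N_ROWS))
```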

edited Jul 14, 2021 at 9:51

Peter Mortensen

answered Dec 26, 2017 at 14:02

Mikhail_Sam

14 Comments

krassowski Over a year ago

The use of df2.index.max() for .loc needlessly increases computational complexity. Simple df2.loc[i] = ... would do. For me it reduced the time from 10s to 8.64s

flow2k Over a year ago

@Mikhail_Sam For the last, dict approach, what's the rationale behind using two loops, for i in range (0,5): and for i in range( 1,numOfRows-4):?

trumpetlicks Over a year ago

Just wanted to throw out another comment as to why the Dict to Pandas DataFrame is a better way. In my experimentation with a dataset that has multiple different data types in the table, using the Pandas append methods destroy the typing, whereas using a Dict, and only creating the DataFrame from it ONCE, seems to keep the original datatypes intact.

EricLavault Over a year ago

I think the dict approach should be renamed the list.append approach (and append into df.append), it is faster because it relies on a list row_list.append() and then creates a dataframe from that list instead of appending data directly on the dataframe with df1.append(). Both methods use dictionaries, the point is using list() vs pd.DataFrame() when populating data row by row.

qwr Over a year ago

You should also use timeit.timeit for micro-benchmarks to avoid some common benchmarking issues.

bbrame · Accepted Answer · 2022-09-13 14:44:26Z

358

You could use pandas.concat(). For details and examples, see Merge, join, and concatenate.

For example:

def append_row(df, row):
    return pd.concat([
                df, 
                pd.DataFrame([row], columns=row.index)]
           ).reset_index(drop=True)

df = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
new_row = pd.Series({'lib':'A', 'qty1':1, 'qty2': 2})

df = append_row(df, new_row)

edited Sep 13, 2022 at 14:44

bbrame

answered May 23, 2012 at 8:14

NPE

5 Comments

notilas Over a year ago

Hi, so what is the answer for the methods using append() or concat()? I have the same problem and am still trying to figure it out.

jwg Over a year ago

This is the right answer, but it isn't a very good answer (almost link only).

Ken Williams Over a year ago

I think @fred's answer is more correct. IIUC the problem with this answer is that it needlessly copies the entire DataFrame every time a row is appended. Using the .loc mechanism that can be avoided, especially if you're careful.

StayFoolish Over a year ago

But if you want to use DataFrame.append(), you have to make sure your row data is also a DataFrame in the first place, not a list.

3r1c Over a year ago

DataFrame.append() is deprecated since version 1.4.0, use only pandas.concat() in future like pandas.concat([DF1, DF2])

cs95 · Accepted Answer · 2023-04-27 15:11:51Z

358

As of pandas 2.0, `append` has been removed!

DataFrame.append was deprecated in version 1.4 and removed from the pandas API entirely in version 2.0.

See the docs on Deprecations as well as this github issue that originally proposed its deprecation.

If you are running pandas version 2.0 or later, you will likely run into the following error:

AttributeError: 'DataFrame' object has no attribute 'append'

Keep reading if you would like to learn about more idiomatic alternatives to append.

NEVER grow a DataFrame!

Yes, people have already explained that you should NEVER grow a DataFrame, and that you should append your data to a list and convert it to a DataFrame once at the end. But do you understand why?

Here are the most important reasons, taken from my post here.

It is always cheaper/faster to append to a list and create a DataFrame in one go.
Lists take up less memory and are a much lighter data structure to work with, append, and remove.
dtypes are automatically inferred for your data. On the flip side, creating an empty frame of NaNs will automatically make them object, which is bad.
An index is automatically created for you, instead of you having to take care to assign the correct index to the row you are appending.

This is The Right Way™ to accumulate your data

data = []
for a, b, c in some_function_that_yields_data():
    data.append([a, b, c])

df = pd.DataFrame(data, columns=['A', 'B', 'C'])

Note that if some_function_that_yields_data() returns smaller DataFrames, you can accumulate individual DataFrames inside a list and then make a single call to pd.concat at the end.
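
A minimal sketch of that last case, where the source produces small DataFrames; some_function_that_yields_frames is a hypothetical stand-in invented for this example:

```python
import pandas as pd

def some_function_that_yields_frames():
    # Hypothetical stand-in for a data source that produces small DataFrames.
    for start in range(0, 6, 2):
        yield pd.DataFrame({'A': [start, start + 1],
                            'B': [start * 10, (start + 1) * 10]})

frames = []
for chunk in some_function_that_yields_frames():
    frames.append(chunk)  # cheap: only a reference is stored

# One concat at the end copies each chunk exactly once.
df = pd.concat(frames, ignore_index=True)
print(df)
```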

These options are horrible

append or concat inside a loop

append and concat aren't inherently bad in isolation. The problem starts when you iteratively call them inside a loop: every call copies all rows accumulated so far, so the total amount of copying grows quadratically with the number of rows.

# Creates empty DataFrame and appends
df = pd.DataFrame(columns=['A', 'B', 'C'])
for a, b, c in some_function_that_yields_data():
    df = df.append({'A': a, 'B': b, 'C': c}, ignore_index=True)  # pandas < 2.0 only
    # This is equally bad:
    # df = pd.concat(
    #       [df, pd.DataFrame([{'A': a, 'B': b, 'C': c}])],
    #       ignore_index=True)

Empty DataFrame of NaNs

Never create a DataFrame of NaNs as the columns are initialized with object (slow, un-vectorizable dtype).

# Creates DataFrame of NaNs and overwrites values.
df = pd.DataFrame(columns=['A', 'B', 'C'], index=range(5))
for i, (a, b, c) in enumerate(some_function_that_yields_data()):
    df.loc[i] = [a, b, c]
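
The dtype point can be checked directly. A small sketch with made-up sample values, comparing a preallocated frame with one built in a single step from a list:

```python
import pandas as pd

# Preallocated frame: no data yet, so every column starts out as object dtype.
empty = pd.DataFrame(columns=['A', 'B', 'C'], index=range(5))
print(empty.dtypes)

# Built in one go from a list: dtypes are inferred per column.
data = [[1, 0.5, 'x'], [2, 1.5, 'y']]
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df.dtypes)
```

In the second frame the numeric columns come out as integer and float dtypes, which is what vectorized operations need.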

The Proof is in the Pudding

Timing these methods is the fastest way to see just how much they differ in terms of their memory and utility.

Benchmarking code for reference.

It's posts like this that remind me why I'm a part of this community. People understand the importance of teaching folks to get the right answer with the right code, not the right answer with the wrong code. Now you might argue that it is not an issue to use loc or append if you're only adding a single row to your DataFrame. However, people often look to this question to add more than just one row; often the requirement is to iteratively add a row inside a loop using data that comes from a function (see related question). In that case it is important to understand that iteratively growing a DataFrame is not a good idea.

edited Apr 27, 2023 at 15:11

answered Jul 4, 2020 at 22:15

cs95

11 Comments

user1657853 Over a year ago

Fair enough. Are there any solution in case you need (or would like) a dataframe, but all your samples really do come one after the other? (Typically online learning or active learning)

Dev Aggarwal Over a year ago

This doesn't factor in the case where one needs the dataframe after every append(). In that case, the dataframe gets copied anyway, so the df.loc method is faster

cs95 Over a year ago

@DevAggarwal incorrect, loc also creates a copy each time. Please see the graph timings in my answer. Append and loc_append are equally bad. I've also shared my code and process so you're free to convince yourself.

Muhammad Yasirroni Over a year ago

how if the data come from another dataframe?

WestCoastProjects Over a year ago

@user1657853 This likely points to python not being a good solution for that use case. Python is basically non performant: anything that does require performance is in native c-code with a python wrapper.

FooBar · Accepted Answer · 2015-04-02 12:03:48Z

133

If you know the number of entries ex ante, you should preallocate the space by also providing the index (taking the data example from a different answer):

import pandas as pd
import numpy as np
# we know we're gonna have 5 rows of data
numberOfRows = 5
# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2') )

# now fill it up row by row
for x in np.arange(0, numberOfRows):
    #loc or iloc both work here since the index is natural numbers
    df.loc[x] = [np.random.randint(-1,1) for n in range(3)]

In[23]: df
Out[23]: 
   lib  qty1  qty2
0   -1    -1    -1
1    0     0     0
2   -1     0    -1
3    0    -1     0
4   -1     0     0

Speed comparison

In[30]: %timeit tryThis() # function wrapper for this answer
1000 loops, best of 3: 1.23 ms per loop
In[31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
100 loops, best of 3: 2.31 ms per loop

And, as noted in the comments, with a size of 6000 the speed difference becomes even larger:

Increasing the size of the array (12) and the number of rows (500) makes the speed difference more striking: 313ms vs 2.29s

edited Apr 2, 2015 at 12:03

answered Jul 23, 2014 at 14:21

FooBar

2 Comments

ely Over a year ago

Great answer. This should be the norm so that row space doesn't have to be allocated incrementally.

Tickon Over a year ago

Increasing the size of the array(12) and the number of rows(500) makes the speed difference more striking: 313ms vs 2.29s

Lydia · Accepted Answer · 2015-06-24 21:06:31Z

94

import pandas as pd

mycolumns = ['A', 'B']
df = pd.DataFrame(columns=mycolumns)
rows = [[1,2],[3,4],[5,6]]
for row in rows:
    # len(df) is the next free label as long as the index is 0, 1, 2, ...
    df.loc[len(df)] = row

answered Jun 24, 2015 at 21:06

Lydia

3 Comments

Eike P. Over a year ago

This! I've been searching for quite a while, and this is the first post that really shows how to assign particular values to a row! Bonus question: Which is the syntax for column-name/value pairs? I guess it must be something using a dict, but I can't seem to get it right.

waterproof Over a year ago

this is not efficient as it actually copies the entire DataFrame when you extend it.

PatrickT Over a year ago

consider doing len(df.index) instead.

W.P. McNeill · Accepted Answer · 2016-02-23 16:43:07Z

81

You can append a single row as a dictionary using the ignore_index option.

>>> f = pandas.DataFrame(data = {'Animal':['cow','horse'], 'Color':['blue', 'red']})
>>> f
  Animal Color
0    cow  blue
1  horse   red
>>> f.append({'Animal':'mouse', 'Color':'black'}, ignore_index=True)
  Animal  Color
0    cow   blue
1  horse    red
2  mouse  black
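
DataFrame.append no longer exists in pandas 2.0 and later (see the answer on its removal elsewhere on this page). A sketch of the same single-row addition with pd.concat, reusing the example data above:

```python
import pandas as pd

f = pd.DataFrame({'Animal': ['cow', 'horse'], 'Color': ['blue', 'red']})

# Wrap the dictionary in a one-row DataFrame, then concatenate.
new_row = pd.DataFrame([{'Animal': 'mouse', 'Color': 'black'}])
f = pd.concat([f, new_row], ignore_index=True)
print(f)
```

As with append, this returns a new object, so the result has to be assigned back.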

answered Feb 23, 2016 at 16:43

W.P. McNeill

4 Comments

Blairg23 Over a year ago

You might also mention that f.append(<stuff>) creates a new object, rather than simply appending to the current object in place, so if you're trying to append to a dataframe in a script, you need to say f = f.append(<stuff>)

lol Over a year ago

is there a way to do this in place?

waterproof Over a year ago

@lol no. see github.com/pandas-dev/pandas/issues/2801 - the underlying arrays can't be extended so they have to be copied.

Gene M Over a year ago

I prefer this method because it is very SQL-like (not dependent on indices semantically) and I use it whenever possible.

Peter Mortensen · Accepted Answer · 2021-07-14 09:24:54Z

79

For efficient appending, see How to add an extra row to a pandas dataframe and Setting With Enlargement.

Add rows through loc on a non-existing index key (ix, which also supported this, was removed in pandas 1.0). For example:

In [1]: se = pd.Series([1,2,3])

In [2]: se
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: se[5] = 5.

In [4]: se
Out[4]:
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64

Or:

In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),
   .....:                 columns=['A','B'])
   .....:

In [2]: dfi
Out[2]:
   A  B
0  0  1
1  2  3
2  4  5

In [3]: dfi.loc[:,'C'] = dfi.loc[:,'A']

In [4]: dfi
Out[4]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4

In [5]: dfi.loc[3] = 5

In [6]: dfi
Out[6]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5

edited Jul 14, 2021 at 9:24

Peter Mortensen

answered Apr 30, 2014 at 17:31

Nasser Al-Wohaibi

3 Comments

Guilherme Felipe Reis Over a year ago

The user asked how to add a new row. Here we see how to add a row at a defined index or add a column.

PirateApp Over a year ago

any benchmarks on how this works out compared to the dict method

waterproof Over a year ago

this is not efficient as it actually copies the entire DataFrame.

Peter Mortensen · Accepted Answer · 2021-07-14 09:30:01Z

48

For the sake of a Pythonic way:

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
res = res.append([{'qty1':10.0}], ignore_index=True)
print(res.head())

   lib  qty1  qty2
0  NaN  10.0   NaN

edited Jul 14, 2021 at 9:30

Peter Mortensen

answered Aug 6, 2017 at 5:06

hkyi

Brian Burns · Accepted Answer · 2019-07-26 11:34:51Z

38

You can also build up a list of lists and convert it to a dataframe -

import pandas as pd

columns = ['i','double','square']
rows = []

for i in range(6):
    row = [i, i*2, i*i]
    rows.append(row)

df = pd.DataFrame(rows, columns=columns)

giving

   i  double  square
0  0       0       0
1  1       2       1
2  2       4       4
3  3       6       9
4  4       8      16
5  5      10      25

edited Jul 26, 2019 at 11:34

answered Oct 13, 2017 at 12:16

Brian Burns

Peter Mortensen · Accepted Answer · 2021-07-14 10:27:20Z

24

If you always want to add a new row at the end, use this:

df.loc[len(df)] = ['name5', 9, 0]

edited Jul 14, 2021 at 10:27

Peter Mortensen

answered Mar 6, 2021 at 13:53

Prajot Kuvalekar

1 Comment

autonopy Over a year ago

This presumes that the index of the dataframe is numbered AND that it is perfectly sequential. Using df.reset_index() could resolve this, but as it stands it may actually overwrite an existing row.

Peter Mortensen · Accepted Answer · 2021-07-14 09:57:31Z

18

I figured out a simple and nice way:

>>> df
     A  B  C
one  1  2  3
>>> df.loc["two"] = [4,5,6]
>>> df
     A  B  C
one  1  2  3
two  4  5  6

Note the caveat with performance as noted in the comments.

edited Jul 14, 2021 at 9:57

Peter Mortensen

answered Aug 30, 2018 at 3:19

Qinsi

1 Comment

waterproof Over a year ago

Note that this will copy the entire DataFrame under the hood. The underlying arrays can't be extended so they have to be copied.

qwr · Accepted Answer · 2024-06-09 02:30:19Z

17

Instead of a list of dictionaries as in ShikharDua's answer (row-based), we can also represent our table as a dictionary of lists (column-based), where each list stores one column, given we know our columns beforehand. This data structure is like how we would access a column as df["col"]. At the end we construct our DataFrame once.

In both cases, the dictionary keys are always the column names. Row order is stored implicitly as order in a list. For c columns and n rows, this uses one dictionary of c lists (of length n), versus one list of n dictionaries (with c entries). The list-of-dictionaries method has each dictionary storing all keys redundantly and requires creating a new dictionary for every row. Here we only append to lists which is simpler and more efficient than creating new dictionaries.

# Current data
data = {"Animal":["cow", "horse"], "Color":["blue", "red"]}

# Adding a new row (be careful to ensure every column gets another value)
data["Animal"].append("mouse")
data["Color"].append("black")

# At the end, construct our DataFrame
df = pd.DataFrame(data)
#   Animal  Color
# 0    cow   blue
# 1  horse    red
# 2  mouse  black
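
Because every column list must grow by exactly one value per row, a small guard can help. A sketch, assuming rows arrive as dictionaries keyed by column name; add_row is a hypothetical helper, not part of pandas:

```python
import pandas as pd

def add_row(data, row):
    """Append one row (a dict keyed by column name) to a dict of lists.

    Hypothetical helper; raises if the row's keys differ from the columns,
    so the lists can never drift to different lengths.
    """
    if set(row) != set(data):
        raise ValueError('row keys {} do not match columns {}'.format(
            sorted(row), sorted(data)))
    for column, value in row.items():
        data[column].append(value)

data = {"Animal": ["cow", "horse"], "Color": ["blue", "red"]}
add_row(data, {"Color": "black", "Animal": "mouse"})  # key order does not matter

df = pd.DataFrame(data)
print(df)
```

The key check runs before any list is modified, so a rejected row leaves the data untouched.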

edited Jun 9, 2024 at 2:30

answered Dec 30, 2019 at 1:35

qwr

1 Comment

Shajirr Over a year ago

I found this method to be the most convenient to use, as you never have to worry about column order when appending new values.

Peter Mortensen · Accepted Answer · 2021-07-14 09:28:01Z

This is not an answer to the OP question, but a toy example to illustrate ShikharDua's answer which I found very useful.

While this fragment is trivial, in the actual data I had 1,000s of rows, and many columns, and I wished to be able to group by different columns and then perform the statistics below for more than one target column. So having a reliable method for building the data frame one row at a time was a great convenience. Thank you ShikharDua!

import pandas as pd

BaseData = pd.DataFrame({ 'Customer' : ['Acme','Mega','Acme','Acme','Mega','Acme'],
                          'Territory'  : ['West','East','South','West','East','South'],
                          'Product'  : ['Econ','Luxe','Econ','Std','Std','Econ']})
BaseData

columns = ['Customer','Num Unique Products', 'List Unique Products']

rows_list = []
for name, group in BaseData.groupby('Customer'):
    unique_products = pd.unique(group['Product'])
    RecordtoAdd = {}  # initialise an empty dict for this row
    RecordtoAdd.update({'Customer': name})
    RecordtoAdd.update({'Num Unique Products': len(unique_products)})
    RecordtoAdd.update({'List Unique Products': unique_products})

    rows_list.append(RecordtoAdd)

# Build the DataFrame once; passing columns fixes the column order
AnalysedData = pd.DataFrame(rows_list, columns=columns)

print('Base Data : \n',BaseData,'\n\n Analysed Data : \n',AnalysedData)

Peter Mortensen · Accepted Answer · 2021-07-14 10:00:25Z

13

You can use a generator object to create a DataFrame, which is more memory efficient than building a list.

num = 10

# Generator function to generate generator object
def numgen_func(num):
    for i in range(num):
        yield ('name_{}'.format(i), (i*i), (i*i*i))

# Generator expression that creates a generator object (it can be consumed only once and cannot be reused)
numgen_expression = (('name_{}'.format(i), (i*i), (i*i*i)) for i in range(num) )

df = pd.DataFrame(data=numgen_func(num), columns=('lib', 'qty1', 'qty2'))

To add a row to an existing DataFrame you can use the append method (removed in pandas 2.0; use pd.concat there).

df = df.append([{ 'lib': "name_20", 'qty1': 20, 'qty2': 400  }])

edited Jul 14, 2021 at 10:00

Peter Mortensen

answered Oct 21, 2019 at 7:26

RockStar

Peter Mortensen · Accepted Answer · 2021-07-14 09:28:53Z

9

Create a new record (data frame) and add to old_data_frame.

Pass a list of values and the corresponding column names to create a new_record (data_frame):

new_record = pd.DataFrame([[0, 'abcd', 0, 1, 123]], columns=['a', 'b', 'c', 'd', 'e'])

old_data_frame = pd.concat([old_data_frame, new_record])

edited Jul 14, 2021 at 9:28

Peter Mortensen

answered Jul 18, 2016 at 9:54

Jack Daniel

Peter Mortensen · Accepted Answer · 2021-07-14 09:58:34Z

8

Here is the way to add/append a row in a Pandas DataFrame:

def add_row(df, row):
    df.loc[-1] = row
    df.index = df.index + 1
    return df.sort_index()

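df = add_row(df, [1,2,3])

It can be used to insert a row into an empty or populated Pandas DataFrame. Assuming a default integer index, the new row receives index 0 and therefore ends up at the top after sorting, while the existing rows shift down by one.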

edited Jul 14, 2021 at 9:58

Peter Mortensen

answered Sep 5, 2018 at 19:30

shivampip

1 Comment

Parthiban Rajendran Over a year ago

this is adding with index in descending order

Giorgos Myrianthous · Accepted Answer · 2020-05-01 15:10:31Z

4

All you need is loc[df.shape[0]] or loc[len(df)]

# Assuming your df has 4 columns (str, int, str, bool)
df.loc[df.shape[0]] = ['col1Value', 100, 'col3Value', False]

or

df.loc[len(df)] = ['col1Value', 100, 'col3Value', False]

edited May 1, 2020 at 15:10

answered May 1, 2020 at 14:39

Giorgos Myrianthous

Peter Mortensen · Accepted Answer · 2021-07-14 10:08:38Z

4

If you want to add a row at the end, append it as a list:

valuestoappend = [val1, val2, val3]
res = res.append(pd.Series(valuestoappend, index = ['lib', 'qty1', 'qty2']), ignore_index = True)

edited Jul 14, 2021 at 10:08

Peter Mortensen

answered Mar 26, 2020 at 14:09

Shahir Ansari

qed · Accepted Answer · 2016-11-11 18:25:00Z

3

Another way to do it (probably not very performant):

# add a row
def add_row(df, row):
    colnames = list(df.columns)
    ncol = len(colnames)
    assert ncol == len(row), "Length of row must be the same as width of DataFrame: %s" % row
    return df.append(pd.DataFrame([row], columns=colnames))

You can also enhance the DataFrame class like this:

import pandas as pd
def add_row(self, row):
    self.loc[len(self.index)] = row
pd.DataFrame.add_row = add_row

edited Nov 11, 2016 at 18:25

answered Nov 11, 2016 at 18:18

qed

Peter Mortensen · Accepted Answer · 2021-07-14 10:11:33Z

3

You can concatenate two DataFrames for this. I basically came across this problem to add a new row to an existing DataFrame with a character index (not numeric).

So, I put the data for the new row in a dict() and the index in a list.

new_dict = {put input for new row here}
new_list = [put your index here]

new_df = pd.DataFrame(data=new_dict, index=new_list)

df = pd.concat([existing_df, new_df])
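
A concrete instance of this pattern, as a sketch: the column names, index labels, and values below are invented for illustration.

```python
import pandas as pd

existing_df = pd.DataFrame({'qty1': [1, 2], 'qty2': [10, 20]},
                           index=['alpha', 'beta'])

# One value per column, wrapped in lists so the dictionary describes one row.
new_dict = {'qty1': [3], 'qty2': [30]}
new_list = ['gamma']  # index label for the new row

new_df = pd.DataFrame(data=new_dict, index=new_list)
df = pd.concat([existing_df, new_df])
print(df)
```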

edited Jul 14, 2021 at 10:11

Peter Mortensen

answered Apr 30, 2020 at 14:07

hansrajswapnil

1 Comment

IOzc Over a year ago

this is literally what I need

Peter Mortensen · Accepted Answer · 2021-07-14 10:12:24Z

3

initial_data = {'lib': np.array([1,2,3,4]), 'qty1': [1,2,3,4], 'qty2': [1,2,3,4]}

df = pd.DataFrame(initial_data)

df

   lib  qty1  qty2
0    1     1     1
1    2     2     2
2    3     3     3
3    4     4     4

val_1 = [10]
val_2 = [14]
val_3 = [20]

df.append(pd.DataFrame({'lib': val_1, 'qty1': val_2, 'qty2': val_3}))

   lib  qty1  qty2
0    1     1     1
1    2     2     2
2    3     3     3
3    4     4     4
0   10    14    20

You can use a for loop to iterate through values or can add arrays of values.

val_1 = [10, 11, 12, 13]
val_2 = [14, 15, 16, 17]
val_3 = [20, 21, 22, 43]

df.append(pd.DataFrame({'lib': val_1, 'qty1': val_2, 'qty2': val_3}))

   lib  qty1  qty2
0    1     1     1
1    2     2     2
2    3     3     3
3    4     4     4
0   10    14    20
1   11    15    21
2   12    16    22
3   13    17    43

edited Jul 14, 2021 at 10:12

Peter Mortensen

answered Jun 13, 2020 at 15:09

Harshal Deore

1 Comment

Peter Mortensen Over a year ago

An explanation for the first part would be in order. And why isn't there a "for" loop in the example code when it is talked about? Can you make it more clear? Please respond by editing your answer, not here in comments (without "Edit:", "Update:", or similar - the answer should appear as if it was written today).

Peter Mortensen · Accepted Answer · 2021-07-14 09:30:48Z

1

Keep it simple: take a list as input and append it as a row to the DataFrame:

import pandas as pd
res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
for i in range(5):
    res_list = list(map(int, input().split()))
    res = res.append(pd.Series(res_list, index=['lib', 'qty1', 'qty2']), ignore_index=True)

edited Jul 14, 2021 at 9:30

Peter Mortensen

answered Aug 25, 2017 at 15:47

Vineet Jain

Mahdi · Accepted Answer · 2020-12-21 09:57:20Z

0

If you have a data frame df and want to add a list new_list as a new row to df, you can simply do:

df.loc[len(df)] = new_list

If you want to add a new data frame new_df under data frame df, then you can use:

df.append(new_df)

answered Dec 21, 2020 at 9:57

Mahdi

Peter Mortensen · Accepted Answer · 2021-07-14 09:59:25Z

0

We often see the construct df.loc[subscript] = … to assign to one DataFrame row. Mikhail_Sam posted benchmarks containing, among others, this construct as well as the method using dict and create DataFrame in the end. He found the latter to be the fastest by far.

But if we replace the df3.loc[i] = … (with preallocated DataFrame) in his code with df3.values[i] = …, the outcome changes significantly: that method then performs similarly to the one using dict. So we should more often consider df.values[subscript] = …. However, note that .values takes a zero-based subscript, which may differ from the DataFrame.index, and that writing through .values only affects the DataFrame when .values returns a view rather than a copy, which pandas does not guarantee (for example, with mixed column dtypes).
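
A related variant, swapped in here as an illustration rather than taken from the answer: fill a preallocated NumPy array directly and wrap it in a DataFrame once at the end. This avoids depending on what .values returns, but it assumes all columns share one dtype; the names below are invented for the sketch.

```python
import numpy as np
import pandas as pd

num_rows = 1000
columns = ['A', 'B', 'C', 'D', 'E']

# Preallocate a plain NumPy array with a concrete dtype.
buffer = np.empty((num_rows, len(columns)), dtype=np.int64)

for i in range(num_rows):
    # Positional (zero-based) assignment into the array; no pandas overhead.
    buffer[i] = np.random.randint(100, size=len(columns))

# Wrap the filled array in a DataFrame once at the end.
df = pd.DataFrame(buffer, columns=columns)
print(df.shape)
```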

edited Jul 14, 2021 at 9:59 by Peter Mortensen

answered Aug 22, 2019 at 12:39 by Armali

1 Comment

Armali Over a year ago

@baxx - One code example is at the benchmarks link (# .loc with prealloc); another example is in the question "I have to compare data from each row of a Pandas DataFrame with data from the rest of the rows, is there a way to speed up the computation?" and its accepted answer.

Peter Mortensen · Accepted Answer · 2021-07-14 10:02:46Z

0

pandas.DataFrame.append

DataFrame.append(self, other, ignore_index=False, verify_integrity=False, sort=False) → 'DataFrame'

Code

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
df.append(df2)

With ignore_index set to True:

df.append(df2, ignore_index=True)
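On pandas 2.0 and later, where DataFrame.append is no longer available, the same two results can be obtained with pd.concat. A minimal sketch using the frames from above (the names `stacked` and `renumbered` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))

# Keeps the original index labels of both frames, so labels repeat
stacked = pd.concat([df, df2])

# Discards the old labels and renumbers the rows from 0
renumbered = pd.concat([df, df2], ignore_index=True)

print(list(stacked.index), list(renumbered.index))
```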

edited Jul 14, 2021 at 10:02 by Peter Mortensen

answered Feb 19, 2020 at 6:35 by kamran kausar

1 Comment

Peter Mortensen Over a year ago

It is not clear why the first two lines are not literal code. Brevity is good, but can you elaborate in your answer, e.g. by adding some supporting text? But without "Edit:", "Update:", or similar - the answer should appear as if it was written today.

Peter Mortensen · Accepted Answer · 2021-07-14 10:10:55Z

0

Before adding a row, convert the DataFrame to a dictionary. Its keys are the column names, and each value is itself a dictionary that maps an index label to the cell value in that column.

That idea led me to write the code below.

df2 = df.to_dict()
# The complete row that we are going to add, one value per column
values = ["s_101", "hyderabad", 10, 20, 16, 13, 15, 12, 12, 13, 25, 26, 25, 27, "good", "bad"]
for column, value in zip(df.columns, values):   # df.columns gives the keys of the outer dictionary
    df2[column][101] = value   # 101 is the new index label, i.e. the key in each inner dictionary
df = pd.DataFrame(df2)   # convert the dictionary back to a DataFrame
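A self-contained sketch of the same round trip on a small, made-up frame (the column names and values here are illustrative, not the ones from the answer) shows the nested dictionary shape and the conversion back:

```python
import pandas as pd

# Hypothetical three-column frame with a single existing row at index label 100
df = pd.DataFrame({"id": ["s_100"], "city": ["delhi"], "score": [10]}, index=[100])

as_dict = df.to_dict()  # shape: {column: {index_label: value}}

new_values = ["s_101", "hyderabad", 20]
for column, value in zip(df.columns, new_values):
    as_dict[column][101] = value  # add the new row under index label 101

df = pd.DataFrame(as_dict)  # rebuild the DataFrame from the nested dictionary
print(df)
```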

edited Jul 14, 2021 at 10:10 by Peter Mortensen

answered Apr 17, 2020 at 17:54 by srikanth Gattu

Peter Mortensen · Accepted Answer · 2021-07-14 10:18:06Z

0

If all data in your DataFrame has the same dtype, you might use a NumPy array. You can write rows directly into the preallocated array and convert it to a DataFrame at the end. It seems to be even faster than converting a list of dicts.

import time

import numpy as np
import pandas as pd
from string import ascii_uppercase

startTime = time.perf_counter()
numcols, numrows = 5, 10000
npdf = np.ones((numrows, numcols))  # preallocate the full array once
for row in range(numrows):
    npdf[row, 0:] = np.random.randint(0, 100, (1, numcols))  # write one row in place
df5 = pd.DataFrame(npdf, columns=list(ascii_uppercase[:numcols]))
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numrows))
print(df5.shape)

edited Jul 14, 2021 at 10:18 by Peter Mortensen

answered Oct 11, 2020 at 18:46 by Gerard

1 Comment

Peter Mortensen Over a year ago

Re "It seems to be even faster": Can you quantify that (by editing (changing) your answer)? What order are we talking about? 10% faster? 100% faster? 10 times faster? 1,000,000 times faster? At what scale (it could be quadratic/exponential)?

mpa · Accepted Answer · 2023-03-24 13:24:05Z

Here are the 3 regularly mentioned options and their shortcomings for adding a row, given these requirements:

a single row (not multiple rows)
optimized for readability (not for runtime performance, e.g. copying the DataFrame is acceptable even though not preferred)
columns can have different dtypes
keep the dtype of all columns
the index can have any form, e.g. 'holes' in an integer series
keep the dtype of the df.index

The code setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'carId': [1, 4, 7], 'maxSpeed': [1.1, 4.4, 7.7]})
df = df.astype({
    'carId': np.uint16,
    'maxSpeed': np.float32,
})
df.set_index('carId', drop=False, inplace=True)
assert df.index.dtype == np.uint64  # holds on pandas 1.x; pandas 2.0+ may keep np.uint16 here

# the row to add
additional_row = [9, 9.9]
assert len(df.columns) == len(additional_row)
original_dtypes = df.dtypes
original_index_dtype = df.index.dtype

1) pd.concat()

df_new_row = pd.DataFrame([additional_row], columns=df.columns)
newDf = pd.concat([df, df_new_row])
assert df.dtypes.equals(newDf.dtypes)  # fails: carId is np.int64 and maxSpeed is np.float64
assert newDf.dtypes.equals(original_dtypes)  # fails: newDf.index.dtype is np.float64

2) df.loc[]

df.loc[additional_row[0], :] = additional_row
assert df.index.dtype == original_index_dtype
assert df.dtypes.equals(original_dtypes)  # fails: carId and maxSpeed are np.float64

3) df.append()

deprecated since pandas 1.4.0 and removed in pandas 2.0

solution

df.loc[] leaves the df.index intact, so I typically convert the types of the columns:

df.loc[additional_row[0], :] = additional_row
df = df.astype(original_dtypes)
assert df.index.dtype == original_index_dtype
assert df.dtypes.equals(original_dtypes)

Note that df.astype() creates a copy of the df. df.astype(copy=False) avoids this if you can accept the side effects of the copy parameter.

If you do not want to set the index explicitly, use e.g. df.loc[df.index.max() + 1, :] = additional_row. Note that df.index.max() fails if df is empty.
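Putting the pieces together, here is a minimal runnable sketch of the approach, assuming a default integer index rather than the carId index used above. The `next_label` name and the fallback to label 0 for an empty frame are assumptions added for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"carId": [1, 4, 7], "maxSpeed": [1.1, 4.4, 7.7]})
df = df.astype({"carId": np.uint16, "maxSpeed": np.float32})
original_dtypes = df.dtypes

additional_row = [9, 9.9]

# df.index.max() fails on an empty frame, so fall back to label 0 in that case
next_label = 0 if df.empty else df.index.max() + 1

df.loc[next_label] = additional_row  # may upcast the column dtypes
df = df.astype(original_dtypes)      # restore the original column dtypes
print(df.dtypes)
```

The astype call after the assignment is what brings the columns back to their original dtypes; without it they may stay upcast.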

Unfortunately, "How to add an extra row to a pandas dataframe" has been marked as a duplicate and points to this question. The title of this post, "by appending one row at a time", implies that regularly adding multiple rows to a DataFrame is a good idea. I agree with many previous comments that there are probably not many use cases for this. However, adding a single row to a DataFrame occurs more often, even though it's still an edge case.
