python for loop using index to create values in dataframe

Question

I have a very simple for loop problem and I haven't found a solution in any of the similar questions on Stack Overflow. I want to use a for loop to fill a column of a pandas DataFrame with strings that contain a numerical index. I can print the correct value, but I can't get it saved in the DataFrame. I'm new to Python.

# reproducible example
import pandas as pd
df1 = pd.DataFrame({'x':range(5)})
# for loop to add a row with an index
for i in range(5):
    print("data_{i}.txt".format(i=i)) # this prints the value that I want
    df1['file'] = "data_{i}.txt".format(i=i)

This loop prints the exact value that I want to put into the 'file' column of df1, but when I look at df1, every row holds only the last value of the index.

   x        file
0  0  data_4.txt
1  1  data_4.txt
2  2  data_4.txt
3  3  data_4.txt
4  4  data_4.txt

I have tried using enumerate, but can't find a solution with this. I assume everyone will yell at me for posting a duplicate question, but I have not found anything that works and if someone points me to a solution that solves this problem, I'll happily remove this question.

vinci mojamdar · Accepted Answer · 2020-06-10 10:44:17Z

3

There are better ways to create a DataFrame, but to answer your question:

Replace the last line in your code:

df1['file'] = "data_{i}.txt".format(i=i)

with:

df1.loc[i, 'file'] = "data_{0}.txt".format(i)

For more information, see the .loc documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

On the same page, you can read about accessors like .at and .iloc as well.
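One detail worth illustrating: .loc selects rows by index label, so df1.loc[i, 'file'] only lines up here because df1 has the default 0..4 index. The sketch below uses a hypothetical frame with labels 10, 20, 30 (not from the question) and loops over the labels themselves; the column is created up front as an assumption to keep the example simple.

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1
df = pd.DataFrame({"x": range(3)}, index=[10, 20, 30])
df["file"] = ""  # create the column first (an assumption for this sketch)

# .loc addresses rows by label, so iterate over the labels,
# and use enumerate() to get the running number for the file name
for pos, label in enumerate(df.index):
    df.loc[label, "file"] = "data_{0}.txt".format(pos)

print(df)
```

With a non-default index, writing df.loc[0, 'file'] would not touch the first row; it would refer to a row labelled 0, which does not exist in this frame.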


Andrej Kesely · Accepted Answer · 2020-06-10 10:36:46Z

3

You can use a list comprehension:

df1['file'] = ["data_{i}.txt".format(i=i) for i in range(5)]
print(df1)

Prints:

   x        file
0  0  data_0.txt
1  1  data_1.txt
2  2  data_2.txt
3  3  data_3.txt
4  4  data_4.txt

Or do it when creating the DataFrame:

df1 = pd.DataFrame({'x':range(5), 'file': ["data_{i}.txt".format(i=i) for i in range(5)]})
print(df1)

Or build it from a list of row dictionaries:

df1 = pd.DataFrame([{'x':i, 'file': "data_{i}.txt".format(i=i)} for i in range(5)])
print(df1)


jda5 · Accepted Answer · 2020-06-10 10:37:26Z

1

I've had success with the .at accessor:

for i in range(5):
    print("data_{i}.txt".format(i=i)) # this prints the value that I want
    df1.at[i, 'file'] = "data_{i}.txt".format(i=i)

df1 then contains:

   x        file
0  0  data_0.txt
1  1  data_1.txt
2  2  data_2.txt
3  3  data_3.txt
4  4  data_4.txt


yoav_aaa · Accepted Answer · 2020-06-10 11:17:04Z

1

When you assign a single value to a DataFrame column the way you do, with df['colname'] = 'val', pandas assigns that value to every row. Each pass through your loop overwrites the whole column, which is why you only see the last value.
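A minimal sketch of that behaviour, using a small hypothetical frame rather than the one from the question: each scalar assignment fills the entire column, and the second one replaces the first.

```python
import pandas as pd

df = pd.DataFrame({"x": range(3)})

# A scalar on the right-hand side is written to every row
df["file"] = "data_0.txt"
first = df["file"].tolist()

# A second scalar assignment replaces the whole column again
df["file"] = "data_1.txt"
second = df["file"].tolist()

print(first)
print(second)
```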

Change your code to:

import pandas as pd
df1 = pd.DataFrame({'x':range(5)})
# for loop to add a row with an index
to_assign = []
for i in range(5):
    print("data_{i}.txt".format(i=i)) # this prints the value that I want
    to_assign.append("data_{i}.txt".format(i=i))
# outside of the loop, assign once to all DataFrame rows
df1['file'] = to_assign

As a side note, pandas has a great API for performing these kinds of operations without for loops.
It is worth practicing those.
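As one possible loop-free version, the file names can be built with string concatenation on a whole column. This sketch assumes the number in each file name should come from the existing 'x' column, which happens to match the loop index in the question's example.

```python
import pandas as pd

df1 = pd.DataFrame({"x": range(5)})

# Convert the integers to strings, then concatenate element-wise;
# no explicit Python loop is needed
df1["file"] = "data_" + df1["x"].astype(str) + ".txt"

print(df1)
```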

edited Jun 10, 2020 at 11:17

answered Jun 10, 2020 at 10:38
