0

I have a very simple for-loop problem, and I haven't found a solution in any of the similar questions on Stack Overflow. I want to use a for loop to fill a column of a pandas DataFrame with strings that contain a numerical index. I can print the correct value, but I can't get it saved in the DataFrame. I'm new to Python.

# reproducible example
import pandas as pd
df1 = pd.DataFrame({'x':range(5)})
# for loop to add a row with an index
for i in range(5):
    print("data_{i}.txt".format(i=i)) # this prints the value that I want
    df1['file'] = "data_{i}.txt".format(i=i)

This loop prints the exact values that I want to put into the 'file' column of df1, but when I look at df1, every row holds the value from the last iteration.

   x        file
0  0  data_4.txt
1  1  data_4.txt
2  2  data_4.txt
3  3  data_4.txt
4  4  data_4.txt

I have tried using enumerate, but can't find a solution with this. I assume everyone will yell at me for posting a duplicate question, but I have not found anything that works and if someone points me to a solution that solves this problem, I'll happily remove this question.

4 Answers

3

There are better ways to create a DataFrame, but to answer your question:

Replace the last line in your code:

df1['file'] = "data_{i}.txt".format(i=i)

with:

df1.loc[i, 'file'] = "data_{0}.txt".format(i)

For more information, see the .loc documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

On the same page, you can read about accessors like .at and .iloc as well.
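As a rough illustration of how these accessors differ, here is a minimal sketch. It assumes a hypothetical frame whose index labels (10, 20, 30) are not the default 0..n-1, which is not the case in the question; with the default index, label and position coincide, so .loc[i, 'file'] works directly.

```python
import pandas as pd

# Hypothetical frame with non-default labels, to show label vs. position
df = pd.DataFrame({'x': range(3)}, index=[10, 20, 30])

for pos, label in enumerate(df.index):
    # .loc takes the index label; 'file' is created on the first assignment
    df.loc[label, 'file'] = "data_{0}.txt".format(pos)

# .iloc is purely positional: row 0, column 1 is the first row's 'file' value
first_file = df.iloc[0, 1]

# .at is the label-based accessor for a single cell
last_file = df.at[30, 'file']
```

Here .loc and .at address rows by label, while .iloc addresses them by integer position.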

3

You can use a list comprehension:

df1['file'] = ["data_{i}.txt".format(i=i) for i in range(5)]
print(df1)

Prints:

   x        file
0  0  data_0.txt
1  1  data_1.txt
2  2  data_2.txt
3  3  data_3.txt
4  4  data_4.txt

Or do it when creating the DataFrame:

df1 = pd.DataFrame({'x':range(5), 'file': ["data_{i}.txt".format(i=i) for i in range(5)]})
print(df1)

Or build the DataFrame from a list of dicts, one per row:

df1 = pd.DataFrame([{'x':i, 'file': "data_{i}.txt".format(i=i)} for i in range(5)])
print(df1)

1

I've had success with the .at accessor:

for i in range(5):
    print("data_{i}.txt".format(i=i)) # this prints the value that I want
    df1.at[i, 'file'] = "data_{i}.txt".format(i=i)

Returns:

   x        file
0  0  data_0.txt
1  1  data_1.txt
2  2  data_2.txt
3  3  data_3.txt
4  4  data_4.txt

1

When you assign a single value to a DataFrame column the way you do, with df['colname'] = 'val', pandas assigns that value to every row. Each pass through the loop overwrites the whole column, which is why you see only the last value.
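To see this behaviour in isolation, here is a small sketch (the three-row frame and the variable names are only for illustration). It assigns two scalars in a row, which is what two iterations of the loop in the question do.

```python
import pandas as pd

df = pd.DataFrame({'x': range(3)})

# A scalar on the right-hand side is broadcast to every row
df['file'] = "data_0.txt"
after_first = df['file'].tolist()

# A second scalar assignment replaces the whole column again
df['file'] = "data_1.txt"
after_second = df['file'].tolist()
```

After the second assignment nothing from the first one is left, so only the final iteration of a loop like this survives.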

Change your code to:

import pandas as pd
df1 = pd.DataFrame({'x':range(5)})
# for loop to add a row with an index
to_assign = []
for i in range(5):
    print("data_{i}.txt".format(i=i)) # this prints the value that I want
    to_assign.append("data_{i}.txt".format(i=i))
# outside the loop: assign the whole list to the column once
df1['file'] = to_assign

As a side note, pandas has a great API for performing these types of operations without for loops.
It is worth practicing those.
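For example, one loop-free way to get the same column is element-wise string concatenation. This is a minimal sketch that assumes the number in the file name should come from the existing 'x' column; if it should follow the row labels instead, df1.index could be converted to strings in the same way.

```python
import pandas as pd

df1 = pd.DataFrame({'x': range(5)})

# Convert the integers to strings, then concatenate element-wise
df1['file'] = "data_" + df1['x'].astype(str) + ".txt"
print(df1)
```

The right-hand side is a Series with one string per row, so the assignment fills each row with its own value rather than one shared scalar.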
