I only want the column name when iterating using
for index, row in df.iterrows()
When iterating over a dataframe using df.iterrows:
for i, row in df.iterrows():
    ...
Each row is converted to a Series named row, where row.index corresponds to df.columns and row.values corresponds to df.loc[i].values, the column values at row i.
Minimal Code Sample
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['a', 'b'])
df
A B
a 1 3
b 2 4
row = None
for i, row in df.iterrows():
    print(row['A'], row['B'])
# 1 3
# 2 4
row # outside the loop, `row` holds the last row
A 2
B 4
Name: b, dtype: int64
row.index
# Index(['A', 'B'], dtype='object')
row.index.equals(df.columns)
# True
row.index[0]
# 'A'
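To get each column name next to its value inside the same loop, one option is to iterate over row.items(). This is only a sketch built on the sample frame above; the pairs list is a hypothetical name used for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['a', 'b'])

# Collect (row label, column name, value) triples while iterating rows.
pairs = []
for i, row in df.iterrows():
    # row.items() yields (column name, value) pairs for this row
    for column_name, value in row.items():
        pairs.append((i, column_name, int(value)))

print(pairs)
# [('a', 'A', 1), ('a', 'B', 3), ('b', 'A', 2), ('b', 'B', 4)]
```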
You are already getting the column name, so if you just want to drop the Series you can assign it to the throwaway _ variable when starting the loop. Note that df.iteritems() was removed in pandas 2.0; df.items() is the equivalent call.
for column_name, _ in df.items():
    ...  # do something
However, I don't really understand the use case. You could just iterate over the column names directly:
for column in df.columns:
    ...  # do something
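As a rough illustration of the two column-wise loops, the sketch below runs both on a small frame. The frame and the names and totals variables are assumptions made up for this example, and df.items() is the current name for what older pandas versions called iteritems.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['a', 'b'])

# Column names only: no row or column data is touched.
names = [column for column in df.columns]

# Column name plus the column itself, as (name, Series) pairs.
totals = {}
for column_name, column in df.items():
    totals[column_name] = int(column.sum())

print(names)   # ['A', 'B']
print(totals)  # {'A': 3, 'B': 7}
```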
When we use for index, row in df.iterrows(), row.index[i] gives the column name at position i. For example:
import numpy as np
import pandas as pd

pdf = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list('ABCD'))
pdf.head(5)
A B C D
0 3 1 2 6
1 5 8 7 3
2 7 2 2 5
3 0 9 9 4
4 1 8 1 4
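for index, row in pdf[:3].iterrows():  # check only the first 3 rows of the dataframe
    for i in range(4):
        if row.iloc[i] > 7:  # positional access; row[i] with an integer is deprecated for labeled Series
            print(row.index[i])  # prints B for the data shown above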
B
row.index corresponds to df.columns.
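The inner positional loop can also be replaced by a boolean mask on the row. The following is a sketch only: it hard-codes the first three rows printed above instead of random data, and matches is a hypothetical name.

```python
import pandas as pd

# Fixed data matching the first three rows shown above
# (the original example used random values).
pdf = pd.DataFrame(
    [[3, 1, 2, 6], [5, 8, 7, 3], [7, 2, 2, 5]],
    columns=list('ABCD'),
)

matches = []
for index, row in pdf.iterrows():
    # row[row > 7] keeps only the entries above 7; its index holds their column names.
    hits = row[row > 7].index
    matches.extend((int(index), name) for name in hits)

print(matches)  # [(1, 'B')]
```

Only row 1 contains a value above 7 (the 8 in column B), which agrees with the B printed by the loop above.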