Reading Column Names using Python Pandas

Question

I am new to Python and have a question about using pandas and matplotlib. I am parsing a CSV file and creating a bar chart using the column names. Depending on the CSV file I am parsing for that week, the column names may be different. For example, this week the column names may be "Wednesday", "Thursday", and "Sunday", but next week they may be other days of the week. My script works fine, and I want it to do the same thing regardless of the names. How can I read in the column names so that I don't have to specify each name manually, and instead have the script find each column by its position (column1, column2, column3, ...)?

NOTE: The first column is "Names", as shown in the script below, and it will always be the same, so it is fine to hard-code that column name. I would like to soft-code the column names for columns 2, 3, and 4.

Here is a portion of my current script where I have to manually read in column names:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

files = "myfile.csv"
df = pd.read_csv(files)
names = df['Names'].values
x = np.arange(len(names))
w = 0.40
difference = df['Sunday'] - df['Thursday']
colors = ['Red' if d < -5 else 'Blue' for d in difference]
plt.bar(x-w, df['Wednesday'].values, width=w*0.7, label='Wednesday', color = "cyan")
plt.bar(x, df['Thursday'].values, width=w*0.7, label='Thursday', color = "green")
plt.bar(x+w, df['Sunday'].values, width=w*0.7, label='Sunday', color = colors)
...

Ideally, I want a program that looks like this:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

files = "myfile.csv"
df = pd.read_csv(files)
names = df['Names'].values
x = np.arange(len(names))
w = 0.40
column2 = ...
column3 = ...
column4 = ...
difference = df[column4] - df[column3]
colors = ['Red' if d < -5 else 'Blue' for d in difference]
plt.bar(x-w, df[column2].values, width=w*0.7, label=column2, color = "cyan")
plt.bar(x, df[column3].values, width=w*0.7, label=column3, color = "green")
plt.bar(x+w, df[column4].values, width=w*0.7, label=column4, color = colors)
...

For a clearer picture, here is a sample of what the CSV file looks like:

Name      Monday    Wednesday    Saturday
Derick    45        60           52
Jenna     56        87           89
Lisa      78        93           76
Harry     98        84           79

abhilb · Accepted Answer · 2019-12-21 20:54:14Z

1

You can simply replace

column2 = ...
column3 = ...
column4 = ...

with

# df.columns is zero-based, so the first column ('Names') is df.columns[0]
column2 = df.columns[1]
column3 = df.columns[2]
column4 = df.columns[3]


1 Comment

programminglearner Over a year ago

Deleted my last comment. This worked perfectly! The mistake I had was that I didn't realize the first column would be columns[0] and not [1]. Working now
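
To make the zero-based indexing from the comment above concrete, here is a minimal sketch. It assumes a comma-separated file whose first column is "Name", as in the sample from the question; the in-memory `csv_text` is a hypothetical stand-in for "myfile.csv".

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for "myfile.csv";
# the values mirror the sample shown in the question.
csv_text = """Name,Monday,Wednesday,Saturday
Derick,45,60,52
Jenna,56,87,89
Lisa,78,93,76
Harry,98,84,79
"""

df = pd.read_csv(io.StringIO(csv_text))

# df.columns is zero-based: position 0 is the fixed "Name" column,
# so the second, third, and fourth columns sit at positions 1, 2, and 3.
column2, column3, column4 = df.columns[1:4]
print(column2, column3, column4)  # Monday Wednesday Saturday

# Same subtraction as in the question, but by position rather than by name.
difference = df[column4] - df[column3]
colors = ["Red" if d < -5 else "Blue" for d in difference]
print(colors)  # ['Red', 'Blue', 'Red', 'Blue']
```

Because the names are read from the header at run time, the same lines should keep working when next week's file uses different day names.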

Pierre V. · Accepted Answer · 2019-12-21 20:46:00Z

0

You can use df.items() to loop over columns (and column data):

colors = {"Monday": "cyan", ...}  # you get the idea

for day_of_week, data in df.items():
    plt.bar(x-w, data, width=w*0.7, label=day_of_week, color=colors[day_of_week])


1 Comment

programminglearner Over a year ago

Mmmm not exactly what I'm looking for. This seems to be mapping a color to the day of the week. I will have to go in manually and put in the day of the week with corresponding color each time the csv column name changes. I just want something general to grab the column name. It's possible that column names may change to any different name at any time, so I don't wanna have to go back in manually ever to change it in my code.
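
One possible way to address the concern in this comment is to tie the colors to column positions instead of column names. The sketch below is only an illustration under assumptions: `bar_specs`, its `palette` argument, and the in-memory `csv_text` are hypothetical names that do not appear in the thread, and the label column is assumed to be "Name" as in the sample data.

```python
import io
from itertools import cycle

import pandas as pd

# Hypothetical in-memory CSV standing in for "myfile.csv".
csv_text = """Name,Monday,Wednesday,Saturday
Derick,45,60,52
Jenna,56,87,89
Lisa,78,93,76
Harry,98,84,79
"""

df = pd.read_csv(io.StringIO(csv_text))


def bar_specs(df, label_col="Name", palette=("cyan", "green", "orange"), w=0.40):
    """Return (column_name, x_offset, color) for every column except label_col.

    Colors are assigned by position, so the column names never appear in the code.
    """
    value_cols = [c for c in df.columns if c != label_col]
    # Centre the group of bars around each x position.
    middle = (len(value_cols) - 1) / 2
    return [
        (col, (i - middle) * w, color)
        for i, (col, color) in enumerate(zip(value_cols, cycle(palette)))
    ]


specs = bar_specs(df)
for col, offset, color in specs:
    print(col, offset, color)
```

Each tuple could then be passed to matplotlib inside the loop, for example plt.bar(x + offset, df[col], width=w*0.7, label=col, color=color), with x and w defined as in the question. With three value columns the offsets come out as -w, 0, and +w, matching the hand-written calls.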

Gius · Accepted Answer · 2019-12-22 00:24:13Z

0

Not sure what you are looking for, but here's my way:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_excel('test.xlsx')
names = df['Name'].values
df = df.loc[:, df.columns != 'Name']
print(names)
# ['Derick' 'Jenna' 'Lisa' 'Harry']

x = np.arange(len(names))
w = 0.40
columns = list(df.columns)
print(columns)
# ['Monday', 'Wednesday', 'Saturday']

column2 = df[columns[0]]
column3 = df[columns[1]]
column4 = df[columns[2]]
difference = column4 - column3
colors = ['Red' if d < -5 else 'Blue' for d in difference]
plt.bar(x-w, column2, width=w*0.7, label=columns[0], color = "cyan")
plt.bar(x, column3, width=w*0.7, label=columns[1], color = "green")
plt.bar(x+w, column4, width=w*0.7, label=columns[2], color = colors)
plt.legend()
plt.show()
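
As a variation on the approach above, the columns can also be selected purely by position with `DataFrame.iloc`, without first collecting the names into a list. This is a hedged sketch, not part of the original answer; the in-memory `csv_text` is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the real input file.
csv_text = """Name,Monday,Wednesday,Saturday
Derick,45,60,52
Jenna,56,87,89
Lisa,78,93,76
Harry,98,84,79
"""

df = pd.read_csv(io.StringIO(csv_text))

# iloc selects by integer position, so no column name appears in the code.
second = df.iloc[:, 1]
third = df.iloc[:, 2]
fourth = df.iloc[:, 3]

# Each selected Series keeps its header in .name, which can be used for label=.
labels = [second.name, third.name, fourth.name]
difference = fourth - third

print(labels)               # ['Monday', 'Wednesday', 'Saturday']
print(difference.tolist())  # [-8, 2, -17, -5]
```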

edited Dec 22, 2019 at 0:24

answered Dec 22, 2019 at 0:06
