I am new to Python and have a question about using pandas and matplotlib. I am parsing a CSV file and creating a bar chart from its columns. Depending on the CSV file for a given week, the column names may differ. For example, this week they may be "Wednesday", "Thursday", and "Sunday", but next week they may be other days of the week. My script works fine, and I want it to do the same thing regardless of the names. How can I read the column names so that I don't have to specify them manually, and instead have the script find each column by its position (column 1, column 2, column 3, ...)?

NOTE: The first column is "Names", as shown in the script below, and it will always be the same, so it's fine to hard-code that name. I would like to avoid hard-coding the names of columns 2, 3, and 4.

Here is a portion of my current script where I have to manually read in column names:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

files = "myfile.csv"
df = pd.read_csv(files)
names = df['Names'].values
x = np.arange(len(names))
w = 0.40
difference = df['Sunday'] - df['Thursday']
colors = ['Red' if d < -5 else 'Blue' for d in difference]
plt.bar(x-w, df['Wednesday'].values, width=w*0.7, label='Wednesday', color = "cyan")
plt.bar(x, df['Thursday'].values, width=w*0.7, label='Thursday', color = "green")
plt.bar(x+w, df['Sunday'].values, width=w*0.7, label='Sunday', color = colors)
...

Ideally, I want a program that looks like this:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

files = "myfile.csv"
df = pd.read_csv(files)
names = df['Names'].values
x = np.arange(len(names))
w = 0.40
column2 = ...
column3 = ...
column4 = ...
difference = df[column4] - df[column3]
colors = ['Red' if d < -5 else 'Blue' for d in difference]
plt.bar(x-w, df[column2].values, width=w*0.7, label=column2, color = "cyan")
plt.bar(x, df[column3].values, width=w*0.7, label=column3, color = "green")
plt.bar(x+w, df[column4].values, width=w*0.7, label=column4, color = colors)
...

For clarity, here is a sample of what the CSV file looks like:

Name      Monday    Wednesday    Saturday
Derick    45        60           52
Jenna     56        87           89
Lisa      78        93           76
Harry     98        84           79

3 Answers

You can simply replace

column2 = ...
column3 = ...
column4 = ...

with

# df.columns is zero-indexed, so the first column ('Names') is df.columns[0]
column2 = df.columns[1]
column3 = df.columns[2]
column4 = df.columns[3]
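
As a rough, self-contained sketch of the same idea: the snippet below reads an inline copy of the sample data from the question (the `csv_text` string and its comma-separated layout are assumptions standing in for the real weekly file) and picks the three day columns by position only. It uses the header `Name` from the sample rather than `Names` from the script.

```python
import io

import pandas as pd

# Hypothetical stand-in for the weekly file, modelled on the sample in the question.
csv_text = """Name,Monday,Wednesday,Saturday
Derick,45,60,52
Jenna,56,87,89
Lisa,78,93,76
Harry,98,84,79
"""

df = pd.read_csv(io.StringIO(csv_text))

# df.columns is zero-indexed: position 0 is the fixed name column,
# positions 1 to 3 are the day columns, whatever they are called this week.
column2, column3, column4 = df.columns[1:4]

# The same columns can also be selected purely by position with iloc,
# without ever touching their names.
difference = df.iloc[:, 3] - df.iloc[:, 2]

print(column2, column3, column4)
print(difference.tolist())
```

The names in `column2`, `column3`, and `column4` can then be passed to `df[...]` and to `label=` exactly as in the question's script.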

1 Comment

Deleted my last comment. This worked perfectly! My mistake was not realizing that the first column is columns[0] and not columns[1]. Working now.

You can use df.items() to loop over columns (and column data):

colors = {"Monday": "cyan", ...}  # you get the idea

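# Skip the first (name) column and shift each column's bars so they sit side by side
for i, (day_of_week, data) in enumerate(df.iloc[:, 1:].items()):
    plt.bar(x + (i - 1) * w, data, width=w*0.7, label=day_of_week, color=colors[day_of_week])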

1 Comment

Hmm, not exactly what I'm looking for. This seems to map a color to each day of the week, so I would have to go in manually and enter each day with its color every time the CSV column names change. I just want something general that grabs the column name. The column names may change to anything at any time, so I don't want to ever have to go back and change them in my code.
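
One possible way around the per-day mapping, as a minimal sketch: tie the colors and bar offsets to the column position instead of the column name, so nothing in the code mentions a day of the week. The inline `csv_text`, the `Agg` backend, and the bar width of 0.3 are assumptions made so the example runs on its own; the coloring rule is the one from the question.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the weekly file, modelled on the sample in the question.
csv_text = """Name,Monday,Wednesday,Saturday
Derick,45,60,52
Jenna,56,87,89
Lisa,78,93,76
Harry,98,84,79
"""
df = pd.read_csv(io.StringIO(csv_text))

names = df.iloc[:, 0].tolist()         # first column: always the names
value_columns = list(df.columns[1:4])  # next three columns, whatever they are called

x = np.arange(len(names))
w = 0.3  # narrower than the question's 0.40 so neighbouring groups do not overlap

# Colors are chosen by position, not by day name. The last column is colored
# per bar from the difference between the third and second value columns.
difference = df[value_columns[2]] - df[value_columns[1]]
last_colors = ["red" if d < -5 else "blue" for d in difference]
bar_colors = ["cyan", "green", last_colors]
offsets = [-w, 0.0, w]

fig, ax = plt.subplots()
for column, offset, color in zip(value_columns, offsets, bar_colors):
    ax.bar(x + offset, df[column].values, width=w * 0.7, label=column, color=color)

ax.set_xticks(x)
ax.set_xticklabels(names)
ax.legend()
```

Calling `plt.show()` with an interactive backend, or `fig.savefig(...)`, would display or save the chart. The legend labels come straight from the CSV header, so a file with different day names needs no code changes.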

Not sure exactly what you are looking for, but here's my way:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_excel('test.xlsx')
names = df['Name'].values
df = df.loc[:, df.columns != 'Name']
print(names)
# ['Derick' 'Jenna' 'Lisa' 'Harry']

x = np.arange(len(names))
w = 0.40
columns = [col for col in df]
print(columns)
# ['Monday', 'Wednesday', 'Saturday']

column2 = df[columns[0]]
column3 = df[columns[1]]
column4 = df[columns[2]]
difference = column4 - column3
colors = ['Red' if d < -5 else 'Blue' for d in difference]
plt.bar(x-w, column2, width=w*0.7, label=columns[0], color = "cyan")
plt.bar(x, column3, width=w*0.7, label=columns[1], color = "green")
plt.bar(x+w, column4, width=w*0.7, label=columns[2], color = colors)
plt.legend()
plt.show()
