
I am using Python pandas read_excel to create a histogram or line plot. I would like to read in the entire file. It is a large file, and I only want to plot certain values from it. I know how to use skiprows and parse_cols in read_excel, but if I do this, it does not read the part of the file that I need for the axis labels. I also do not know how to tell it what to plot as the x-values and what to plot as the y-values. Here's what I have:

df=pd.read_excel('JanRain.xlsx',parse_cols="C:BD")

years=df[0]
precip=df[31:32]
df.plot.bar()

I want the x-axis to be row 1 of the Excel file (years), and I want each bar in the bar graph to be a value from row 31 of the Excel file. I'm not sure how to isolate this. Would it be easier to read with pandas and then plot with matplotlib?

Here is a sample of the Excel file. The first row is years and the second column is days of the month (this file is only for 1 month):

Do you have a sample of your Excel spreadsheet that you can post? (Commented Oct 17, 2017 at 14:22)

1 Answer

Here's how I would plot the data in row 31 of a large dataframe, setting row 0 as the x-axis. (updated answer)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython only; omit this line in a plain Python script
%matplotlib inline

Create a random array with 32 rows and 10 columns, and save it to an Excel file:

df = pd.DataFrame(np.random.rand(320).reshape(32,10), columns=range(64,74), index=range(1,33))
df.to_excel(r"D:\data\data.xlsx")

Read only the columns and rows that you want using usecols and skiprows. (Older pandas versions called usecols "parse_cols"; that name was deprecated and later removed.) The first column in this example is the DataFrame index.

# load desired columns and rows into a dataframe
# in this method, I first make a list of all skipped rows
desired_cols = [0] + list(range(2,9))   # column 0 (the index) plus sheet columns 2-8 (headers 65-71)
skipped_rows = list(range(1,33))        # every data row (row 0 is the header row)...
skipped_rows.remove(31)                 # ...except row 31, the one to keep
df = pd.read_excel(r"D:\data\data.xlsx", index_col=0, usecols=desired_cols, skiprows=skipped_rows)
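
The question mentions wanting to read in the entire file. If the whole sheet fits in memory, a minimal alternative sketch is to load everything and select afterwards with .loc. Because read_excel uses the first sheet row as column labels by default (header=0), the years become column labels, and any selected row is a Series indexed by year. Here an in-memory DataFrame with the same layout as the example stands in for the loaded sheet, so the sketch runs without an Excel file; the commented read_excel call is only an illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the full sheet. With a real file this would be something like
#   df = pd.read_excel(r"D:\data\data.xlsx", index_col=0)
# where the first sheet row (header=0, the default) becomes the column labels.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((32, 10)), columns=range(64, 74), index=range(1, 33))

# Select the row labelled 31 and the columns labelled 65 through 71.
# .loc slicing is label-based and includes both endpoints.
ser = df.loc[31, 65:71]
print(ser)
```

This trades memory for simplicity: no list of skipped rows is needed, and other rows remain available for later plots.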

This yields a DataFrame with only one row:

      65        66       67        68        69        70        71
31  0.310933  0.606858  0.12442  0.988441  0.821966  0.213625  0.254897

Isolate the row you want to plot. This gives a pandas.Series with the original column headers as its index:

ser = df.loc[31, :]
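
As noted in the comments on this answer, the same row can also be selected by position with .iloc, or by transposing with df.T and selecting a column. A small sketch comparing the three on a synthetic DataFrame (the values are arbitrary; only the layout matches the example):

```python
import numpy as np
import pandas as pd

# Synthetic frame with the example's layout: index 1..32, columns 64..73
df = pd.DataFrame(np.arange(320, dtype=float).reshape(32, 10),
                  columns=range(64, 74), index=range(1, 33))

by_label = df.loc[31, :]       # row whose index label is 31
by_position = df.iloc[30, :]   # same row by 0-based position (label 31 sits at position 30)
by_transpose = df.T[31]        # transpose, then take the column labelled 31

print(by_label.equals(by_position), by_label.equals(by_transpose))
```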

Plot the series.

fig, ax = plt.subplots()
ser.plot(ax=ax)
ax.set_xlabel("year")
ax.set_ylabel("precipitation")

fig, ax = plt.subplots()
ser.plot(kind="bar", ax=ax)
ax.set_xlabel("year")
ax.set_ylabel("precipitation")
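
The question also asks whether plotting directly with matplotlib would be easier. A sketch with explicit x- and y-values, assuming ser holds the row to plot and its index holds two-digit years that wrap from 99 to 0 (as described in the comments). The years are converted to strings so matplotlib treats them as categories and keeps the sheet order; as plain integers, years such as 0-2 would be placed numerically before 95. A single color is passed and no legend is created.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for ser = df.loc[31, :]; two-digit years wrap from 99 to 0
years = list(range(95, 100)) + list(range(0, 3))
rng = np.random.default_rng(0)
ser = pd.Series(rng.random(len(years)), index=years, name=31)

fig, ax = plt.subplots()
# String x-values are treated as categories, so the bars keep the sheet order
bars = ax.bar([str(y) for y in ser.index], ser.values, color="tab:blue")
ax.set_xlabel("year")
ax.set_ylabel("precipitation")
```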

Comments

This helped with the y-axis! But the first row in my file is years written as (64 65 66 ... 14 15 16). How do I get the x-axis to display these? Currently it displays 1-37. Also, I do not want a legend; I just want the same color for all bars. Right now my legend shows the years correctly. I want what is in my legend to be displayed on the x-axis.
I see what you did with index_col=0, but I basically want the equivalent for rows, something like index_rows=0. I know index_rows isn't valid, but is there any way to do that? I want the first row of the Excel file to be my x-axis.
df.ix[0] will give you the first row. (.ix has since been removed from pandas; use df.iloc[0] instead.)
Jonathan, thanks to your comments I understand the problem better, and the updated answer should now answer your question directly. I use .loc[31,:] or .iloc[0,:] to isolate rows of interest, but transposing (df.T) and selecting a column would also do the job.
Is there a simple way to add a straight line to this graph while still using the code you helped me with? For example, I want to add a line showing the average precipitation for that month. Thanks
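
One way to add such a line, sketched under the assumption that ser holds the row being plotted: matplotlib's Axes.axhline draws a horizontal line across the full width of the axes at a given y-value, so it can be added after the pandas bar plot without changing that code. Here the line is placed at the mean of the plotted values; the color and line style are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for ser = df.loc[31, :]
rng = np.random.default_rng(0)
ser = pd.Series(rng.random(7), index=range(65, 72), name=31)

fig, ax = plt.subplots()
ser.plot(kind="bar", ax=ax)
# Horizontal line at the mean of the plotted values, spanning the whole x-range
mean_line = ax.axhline(ser.mean(), color="red", linestyle="--")
ax.set_xlabel("year")
ax.set_ylabel("precipitation")
```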