I am new to data visualization, so please bear with me. I am trying to create a plot that describes several attributes of a data set of blockbuster movies. The x-axis will be the year of the movie and the y-axis will be its worldwide gross. Some movies have grossed upwards of a billion, and my y-axis seems overwhelmed: the tick labels crowd together and become illegible. Here is what I have thus far:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('blockbusters.csv')
fig, ax = plt.subplots()
ax.set_title('Top Grossing Films')
ax.set_xlabel('Year')
ax.set_ylabel('Worldwide Grossing')

x = df['year']  # x-axis
y = df['worldwide_gross']  # y-axis
ax.scatter(x, y)  # draw one point per movie

plt.show()

Any tips on how to scale this down? Ideally it could be presented on a scale of 10. Thanks in advance!

1 Answer

You could try logarithmic scaling:

ax.set_yscale('log')

You might want to manually set the ticks on the y-axis using

ax.set_yticks(tick_values)       # tick_values: list of y values where you want a tick
ax.set_yticklabels(tick_labels)  # optional; tick_labels: one label per tick
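To make the two suggestions above concrete, here is a minimal sketch that combines the log scale with hand-picked ticks. The DataFrame is made-up sample data standing in for blockbusters.csv, and the tick positions and labels are just one possible choice, not something taken from your file.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; replace with pd.read_csv('blockbusters.csv')
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2004],
    "worldwide_gross": [5.0e7, 2.0e8, 7.5e8, 1.2e9, 2.5e9],
})

fig, ax = plt.subplots()
ax.scatter(df["year"], df["worldwide_gross"])
ax.set_title("Top Grossing Films")
ax.set_xlabel("Year")
ax.set_ylabel("Worldwide Gross")

# Log scale spreads values from tens of millions to billions evenly
ax.set_yscale("log")

# Assumed tick positions and labels, chosen for readability
tick_values = [1e7, 1e8, 1e9, 1e10]
tick_labels = ["$10M", "$100M", "$1B", "$10B"]
ax.set_yticks(tick_values)
ax.set_yticklabels(tick_labels)
```

Call plt.show() afterwards (or fig.savefig(...)) to display the result.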

Another way to approach this might be to rank the movies (which gross is the highest, second highest, ...), i.e. on the y axis you would plot

df['worldwide_gross'].rank()
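Note that rank() ranks in ascending order by default, so the lowest gross gets rank 1. If you want the highest-grossing film to be rank 1, pass ascending=False. A small sketch with made-up values (the column name gross_rank is my own):

```python
import pandas as pd

# Hypothetical sample data standing in for the real data set
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003],
    "worldwide_gross": [5.0e7, 2.5e9, 7.5e8, 1.2e9],
})

# ascending=False gives rank 1 to the highest gross
df["gross_rank"] = df["worldwide_gross"].rank(ascending=False)
print(df)
```

The ranks come back as floats because the default tie-handling method averages the ranks of tied values.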

Edit: as you indicate, one might also check the dtypes to make sure the data is numerical. If not, use .astype(int) or .astype(float) to convert it.
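A quick sketch of that check and conversion, assuming the column holds plain digit strings (the sample values are made up). I use "int64" explicitly here because grosses above roughly 2.1 billion do not fit in a 32-bit integer, which is what plain int can map to on some platforms.

```python
import pandas as pd

# Hypothetical sample data where the gross was read in as text
df = pd.DataFrame({
    "year": [2000, 2001, 2002],
    "worldwide_gross": ["50000000", "750000000", "2500000000"],
})

# Inspect the column types; a text column will not show up as int64/float64
print(df.dtypes)

# Convert the text column to 64-bit integers
df["worldwide_gross"] = df["worldwide_gross"].astype("int64")
print(df.dtypes)
```

If some entries are not clean numbers, astype will raise an error; pd.to_numeric(df["worldwide_gross"], errors="coerce") is an alternative that turns unparseable entries into NaN instead.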

1 Comment

It turns out 'worldwide_gross' was an object type and thus jupyter notebook was very confused. I cast it as an int and now everything is working great. Thank you - I appreciate the help!
