
I'd like to create what my statistics book calls a "dot plot" where the number of dots in the plot equals the number of observations. Here's an example from mathisfun.com:

[Image: example dot plot]

In the example, there are six dots above the 0 value on the X-axis representing the six observations of the value zero.

It seems that a "dot plot" can have several variations. When looking up how to create one with Matplotlib, I only found what I know as a scatter plot, where each data point represents the relationship between an X and a Y value.

Is the type of plot I'm trying to create possible with Matplotlib?

  • This is just a histogram. Commented Apr 7, 2018 at 4:51
  • for x, y in zip(xs, ys):
        plt.plot([x]*y, list(range(y)), 'ro')
    plt.show()
    Commented Apr 7, 2018 at 4:54
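A runnable version of the comment above, offered as a sketch: it assumes xs are the distinct values and ys their counts, both computed here with np.unique. The sample data are the minutes-to-eat-breakfast values that are hard-coded in the ArviZ answer below. Each value gets a vertical stack of ys dots starting at height 0.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample observations (assumed: the same values as the hard-coded ArviZ example)
data = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 4, 4, 5, 5, 5, 5, 5,
                 8, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12])

# xs: the distinct values, ys: how many times each value occurs
xs, ys = np.unique(data, return_counts=True)

fig, ax = plt.subplots()
for x, y in zip(xs, ys):
    # stack y red dots above x at heights 0, 1, ..., y-1
    ax.plot([x] * y, list(range(y)), 'ro')
plt.show()
```

With this data the first stack has six dots above 0, matching the six observations of zero in the question.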

6 Answers


Suppose you have some data that would produce a histogram like the following:

import numpy as np; np.random.seed(13)
import matplotlib.pyplot as plt

data = np.random.randint(0,12,size=72)

plt.hist(data, bins=np.arange(13)-0.5, ec="k")

plt.show()


You can create the dot plot by computing the histogram and scatter-plotting a grid of all possible points, coloring a point white when its height exceeds the count the histogram gives for that bin.

import numpy as np; np.random.seed(13)
import matplotlib.pyplot as plt

data = np.random.randint(0,12,size=72)
bins = np.arange(13)-0.5

hist, edges = np.histogram(data, bins=bins)

y = np.arange(1,hist.max()+1)
x = np.arange(12)
X,Y = np.meshgrid(x,y)

plt.scatter(X,Y, c=Y<=hist, cmap="Greys")

plt.show()

Alternatively, you can set the unwanted points to NaN so that they are not drawn at all:

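Y = Y.astype(float)     # convert to float so the array can hold NaN
Y[Y > hist] = np.nan    # points above the bin count become NaN and are not drawn

plt.scatter(X, Y)
plt.show()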
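To see why Y<=hist picks out exactly the right points, here is a tiny example with a made-up histogram of three bins. Y has one row per candidate height and one column per bin, so comparing it with hist (one entry per bin) broadcasts along every row, and each column ends up with as many True values as that bin's count.

```python
import numpy as np

hist = np.array([2, 0, 3])            # made-up counts for three bins
x = np.arange(len(hist))              # one x position per bin
y = np.arange(1, hist.max() + 1)      # candidate heights 1..max count
X, Y = np.meshgrid(x, y)              # both have shape (len(y), len(x)) = (3, 3)

mask = Y <= hist                      # hist is compared against every row of Y
print(mask)
# [[ True False  True]
#  [ True False  True]
#  [False False  True]]
print(mask.sum(axis=0))               # [2 0 3], the same as hist
```

This also shows the shape requirement: the number of columns in Y must equal len(hist), otherwise the comparison cannot broadcast.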


1 Comment

I tried this method on a different dataset and got "ValueError: operands could not be broadcast together with shapes (25,350) (15,)". It has to do with the Y<=hist comparison. Do you happen to know if this is a common problem with a simple solution? Thanks.
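A possible fix, offered as a sketch rather than a confirmed diagnosis: the shapes in that error suggest that x had 350 entries while the histogram had only 15 bins, so the meshgrid had 350 columns to compare against 15 counts. Deriving x from the histogram's own edges (here, the bin centres) keeps the two in step for any data and any bins. The function name histogram_dot_plot is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def histogram_dot_plot(data, bins, ax=None):
    """Hypothetical helper: dot plot via np.histogram, with x taken from the bins."""
    hist, edges = np.histogram(data, bins=bins)
    x = (edges[:-1] + edges[1:]) / 2          # bin centres: exactly len(hist) values
    y = np.arange(1, hist.max() + 1)          # candidate stack heights
    X, Y = np.meshgrid(x, y)                  # shape (hist.max(), len(hist))
    Y = Y.astype(float)                       # float so it can hold NaN
    Y[Y > hist] = np.nan                      # NaN points are not drawn
    if ax is None:
        ax = plt.gca()
    ax.scatter(X, Y)
    return ax, X, Y

# Usage with the data from the answer above
np.random.seed(13)
data = np.random.randint(0, 12, size=72)
ax, X, Y = histogram_dot_plot(data, bins=np.arange(13) - 0.5)
plt.show()
```

Because x always has one entry per bin, the same call also works with, for example, bins=15 on arbitrary data.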

This answer builds on the code posted by eyllanesc in his comment on the question, which I find elegant enough to merit an illustrative example. I provide two versions: a simple one where the formatting parameters are set manually, and a second one where some of the formatting parameters are set automatically based on the data.

Simple version with manual formatting

import numpy as np                 # v 1.19.2
import matplotlib.pyplot as plt    # v 3.3.2

# Create random data
rng = np.random.default_rng(123) # random number generator
data = rng.integers(0, 13, size=40)
values, counts = np.unique(data, return_counts=True)

# Draw dot plot with appropriate figure size, marker size and y-axis limits
fig, ax = plt.subplots(figsize=(6, 2.25))
for value, count in zip(values, counts):
    ax.plot([value]*count, list(range(count)), 'co', ms=10, linestyle='')
for spine in ['top', 'right', 'left']:
    ax.spines[spine].set_visible(False)
ax.yaxis.set_visible(False)
ax.set_ylim(-1, max(counts))
ax.set_xticks(range(min(values), max(values)+1))
ax.tick_params(axis='x', length=0, pad=8, labelsize=12)

plt.show()

[Image: dot plot from the manual-formatting version]


Advanced version with automated formatting

If you plan on using this plot quite often, it can be useful to add some automated formatting parameters to get appropriate figure dimensions and marker size. In the following example, the parameters are defined in a way that works best with the kind of data for which this type of plot is typically useful (integer data with a range of up to a few dozen units and no more than a few hundred data points).

# Create random data
rng = np.random.default_rng(1) # random number generator
data = rng.integers(0, 21, size=100)
values, counts = np.unique(data, return_counts=True)

# Set formatting parameters based on data
data_range = max(values)-min(values)
width = data_range/2 if data_range<30 else 15
height = max(counts)/3 if data_range<50 else max(counts)/4
marker_size = 10 if data_range<50 else np.ceil(30/(data_range//10))

# Create dot plot with appropriate format
fig, ax = plt.subplots(figsize=(width, height))
for value, count in zip(values, counts):
    ax.plot([value]*count, list(range(count)), marker='o', color='tab:blue',
            ms=marker_size, linestyle='')
for spine in ['top', 'right', 'left']:
    ax.spines[spine].set_visible(False)
ax.yaxis.set_visible(False)
ax.set_ylim(-1, max(counts))
ax.set_xticks(range(min(values), max(values)+1))
ax.tick_params(axis='x', length=0, pad=10)

plt.show()

[Image: dot plot from the automated-formatting version]

1 Comment

I like your dot plot design best, so I used your automatic formatting template (I asked a question about it in Staging Ground). However, my x-axis numbers are much too big and the dots themselves are very blurry and low-resolution. Any ideas why this would happen?

Pass your dataset to this function:

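import numpy as np
import matplotlib.pyplot as plt

def dot_diagram(dataset):
    """Draw a dot plot of the integer values in dataset and return (fig, ax)."""
    values, counts = np.unique(dataset, return_counts=True)
    # size the figure and markers from the spread of the data
    data_range = max(values)-min(values)
    width = data_range/2 if data_range<30 else 15
    height = max(counts)/3 if data_range<50 else max(counts)/4
    marker_size = 10 if data_range<50 else np.ceil(30/(data_range//10))
    fig, ax = plt.subplots(figsize=(width, height))
    # one vertical stack of dots per distinct value
    for value, count in zip(values, counts):
        ax.plot([value]*count, list(range(count)), marker='o', color='tab:blue',
                ms=marker_size, linestyle='')
    # hide everything except the x-axis
    for spine in ['top', 'right', 'left']:
        ax.spines[spine].set_visible(False)
    ax.yaxis.set_visible(False)
    ax.set_ylim(-1, max(counts))
    ax.set_xticks(range(min(values), max(values)+1))
    ax.tick_params(axis='x', length=0, pad=10)
    return fig, ax

# Example: fig, ax = dot_diagram(np.random.randint(0, 21, size=100)); plt.show()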


Let's say this is my data:

import matplotlib.pyplot as plt

data = [5, 8, 3, 7, 1, 5, 3, 2, 3, 3, 8, 5]

To draw a dot plot, I need each data point (x-axis) and the height of its dot (y-axis), i.e. how many times that value has occurred so far:

pos = []
keys = {}  # value -> number of times it has been seen so far

# this loop gives each data point its running occurrence count
for num in data:
    if num not in keys:
        keys[num] = 1
        pos.append(1)
    else:
        keys[num] += 1
        pos.append(keys[num])

print(pos)  # [1, 1, 1, 1, 1, 2, 2, 1, 3, 4, 2, 3]

plt.scatter(data, pos)
plt.show()
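The same running count can be written a little more compactly with collections.defaultdict, which starts every unseen value at 0. This is only a restyling of the loop above, not a different method.

```python
from collections import defaultdict

import matplotlib.pyplot as plt

data = [5, 8, 3, 7, 1, 5, 3, 2, 3, 3, 8, 5]

seen = defaultdict(int)    # value -> how many times it has appeared so far
pos = []
for num in data:
    seen[num] += 1
    pos.append(seen[num])  # height of this dot = its occurrence number

print(pos)  # [1, 1, 1, 1, 1, 2, 2, 1, 3, 4, 2, 3]

plt.scatter(data, pos)
plt.show()
```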



I recently needed something similar and came up with the following for my case.

Hope this is helpful.

First we generate the frequency table, then we turn it into one point per observation and draw those points as a scatter plot. That's all! Super simple.

For example, in your case 6 people took 0 minutes. This frequency can be converted into the points

[(0,1),(0,2),(0,3),(0,4),(0,5),(0,6)]

which then simply have to be plotted with pyplot.scatter. (The function below numbers the dots from 0 rather than 1, which only shifts every stack down by one.)

import numpy as np
import matplotlib.pyplot as plt

def generate_points_for_dotplot(arr):
    freq = np.unique(arr,return_counts=True)
    ls = []
    for (value, count) in zip(freq[0],freq[1]):
        ls += [(value,num) for num in range(count)]
    x = [x for (x,y) in ls]
    y = [y for (x,y) in ls]
    return np.array([x,y])

This function returns an array of two arrays, one for the x coordinates and the other for the y coordinates, because that is the form pyplot.scatter expects. Now that we have a function that generates the points we need, let's plot them.

arr = np.random.randint(1,21,size=100)
x,y = generate_points_for_dotplot(arr)

# Plotting
fig, ax = plt.subplots(figsize=(max(x)/3, 3)) # feel free to use Patrick's answer to make the size more dynamic
ax.scatter(x,y,s=100,facecolors='none',edgecolors='black')
ax.set_xticks(np.unique(x))
ax.yaxis.set_visible(False)
# removing the spines
for spine in ['top', 'right', 'left']:
    ax.spines[spine].set_visible(False)
plt.show()

Output:

[Image: resulting dot plot]

If the x tick labels become overwhelming, you can rotate them. However, with many distinct values that also becomes cluttered.
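If building the point list in Python becomes slow for large arrays, the same (x, y) points can be produced with NumPy alone: np.repeat repeats each distinct value once per observation, and subtracting each group's starting offset from a running index numbers the dots within each stack from 0. The function name dotplot_points_vectorized is hypothetical; it is meant to return the same two rows as generate_points_for_dotplot.

```python
import numpy as np

def dotplot_points_vectorized(arr):
    """Hypothetical NumPy-only variant: returns a (2, n) array of x and y coordinates."""
    values, counts = np.unique(arr, return_counts=True)
    x = np.repeat(values, counts)                  # each value, once per observation
    starts = np.cumsum(counts) - counts            # index where each value's group starts
    y = np.arange(counts.sum()) - np.repeat(starts, counts)  # 0, 1, ..., count-1 per group
    return np.array([x, y])

x, y = dotplot_points_vectorized([0, 0, 1, 2, 2, 2])
print(x)  # [0 0 1 2 2 2]
print(y)  # [0 1 0 0 1 2]
```

The result can be passed to ax.scatter exactly like the output of the list-based function.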


Doing it the Easy Way

Use ArviZ

If you can use additional packages, I suggest ArviZ, which uses Matplotlib under the hood and offers a proper dot plot function, plot_dot.

Documentation of ArviZ dotplot

Sample code

import matplotlib.pyplot as plt
import numpy as np
import arviz as az


# Data is hardcoded here while a more sophisticated method can be used
data = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 4, 4, 5, 5, 5, 5, 5, 8, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12])

# The main plotting function call
ax = az.plot_dot(data, dotcolor="C1", dotsize=0.8)

# Setting title
ax.set_title("Minutes to Eat Breakfast")

plt.show()

Output

[Image: desired dot plot]
