Getting frequency values from histogram in R

Question

I know how to draw histograms and other frequency/percentage tables. Now I want to know how to get those frequency values into a table that I can use afterwards.

I have a massive dataset and I draw a histogram of it with a set bin width. I want to extract the frequency value (i.e. the value on the y-axis) that corresponds to each bin and save it somewhere.

Can someone please help me with this? Thank you!

rcs · Accepted Answer · 2011-10-12 13:20:47Z

59

The hist function has a return value (an object of class histogram):

R> res <- hist(rnorm(100))
R> res
$breaks
[1] -4 -3 -2 -1  0  1  2  3  4

$counts
[1]  1  2 17 27 34 16  2  1

$intensities
[1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01

$density
[1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01

$mids
[1] -3.5 -2.5 -1.5 -0.5  0.5  1.5  2.5  3.5

$xname
[1] "rnorm(100)"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
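
To save these values for later use, the components of the returned object can be collected into a data frame. The following is a minimal sketch on simulated data, assuming a hypothetical bin width of 0.5; the names x, binwidth, breaks and freq_table are illustrative and not part of the original answer.

```r
set.seed(42)                      # reproducible example data (assumption)
x <- rnorm(1000)
binwidth <- 0.5                   # hypothetical bin width

# Breaks that cover the full range of x in steps of binwidth
breaks <- seq(floor(min(x) / binwidth) * binwidth,
              ceiling(max(x) / binwidth) * binwidth,
              by = binwidth)

# plot = FALSE returns the histogram object without drawing it
res <- hist(x, breaks = breaks, plot = FALSE)

# One row per bin: lower edge, upper edge, midpoint, frequency
freq_table <- data.frame(
  lower = head(res$breaks, -1),
  upper = tail(res$breaks, -1),
  mid   = res$mids,
  count = res$counts
)
head(freq_table)
```

From here the table can be written to disk with, for example, write.csv(freq_table, "hist_counts.csv", row.names = FALSE), where the file name is only a placeholder.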


4 Comments

Tomas Over a year ago

Aaaah, just got the same idea and wanted to post this! You were faster :-)

Tomas Over a year ago

Note that you can also pass plot = FALSE, so that hist only returns the results without plotting the histogram.

xealits Over a year ago

Is there a way to do this without hist? I am trying to use hist with custom breaks and it doesn't work. Is there an alternative?

rcs Over a year ago

@xealits: table(cut(rnorm(100), breaks=c(-Inf, -1, 1, Inf)))
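
The table(cut(...)) approach from the last comment can be expanded into a small table of bins and counts. This is only a sketch on simulated data; custom_breaks, counts and freq_df are illustrative names. Note that cut() uses right-closed intervals by default, like hist(), but its include.lowest argument defaults to FALSE, so with finite breaks a value exactly equal to the lowest break would become NA unless include.lowest = TRUE is set.

```r
set.seed(1)
x <- rnorm(100)
custom_breaks <- c(-Inf, -1, 1, Inf)   # unequal, open-ended bins

# cut() assigns each value to an interval; table() counts values per interval
counts <- table(cut(x, breaks = custom_breaks))

# as.data.frame() turns the table into two columns: interval label and count
freq_df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(freq_df) <- c("bin", "count")
freq_df
```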

rslite · Answer · 2013-03-10 21:37:55Z

From ?hist, section "Value":

an object of class "histogram" which is a list with components:

breaks: the n+1 cell boundaries (= breaks if that was a vector). These are the nominal breaks, not with the boundary fuzz.
counts: n integers; for each cell, the number of x[] inside.
density: values f^(x[i]), as estimated density values. If all(diff(breaks) == 1), they are the relative frequencies counts/n and in general satisfy sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].
intensities: same as density. Deprecated, but retained for compatibility.
mids: the n cell midpoints.
xname: a character string with the actual x argument name.
equidist: logical, indicating if the distances between breaks are all the same.

breaks and density provide just about all you need (for raw frequencies, use counts instead of density):

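x <- rnorm(100)      # example data; replace with your own vector
histrv <- hist(x)
histrv$breaks        # bin boundaries
histrv$density       # estimated density in each bin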
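
Since the question asks for frequencies rather than densities, it may help to see how the two components relate. The sketch below uses simulated data, and the names h, n, widths, recovered and rel_freq are illustrative: density is the count divided by (n times the bin width), so counts can be recovered from it, and relative frequencies are simply counts / n.

```r
set.seed(1)
x <- rnorm(200)
h <- hist(x, plot = FALSE)

n <- sum(h$counts)          # number of observations that fell into the bins
widths <- diff(h$breaks)    # width of each bin

# density = count / (n * bin width), so counts can be recovered from it
recovered <- h$density * n * widths
all.equal(recovered, as.numeric(h$counts))

# relative frequencies (proportion of observations per bin)
rel_freq <- h$counts / n
sum(rel_freq)
```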

PatrickT · Answer · 2017-11-07 12:50:55Z

Just in case someone hits this question with ggplot's geom_histogram in mind, note that there is a way to extract the data from a ggplot object.

The following convenience function outputs a dataframe with the lower limit of each bin (xmin), the upper limit of each bin (xmax), the mid-point of each bin (x), as well as the frequency value (y).

## Convenience function
get_hist <- function(p) {
    d <- ggplot_build(p)$data[[1]]
    data.frame(x = d$x, xmin = d$xmin, xmax = d$xmax, y = d$y)
}

# make a dataframe for ggplot
library(ggplot2)
set.seed(1)
x <- runif(100, 0, 10)
y <- cumsum(x)
df <- data.frame(x = sort(x), y = y)

# make geom_histogram
# (after_stat(count) is the current spelling of ..count.., ggplot2 >= 3.3.0)
p <- ggplot(data = df, aes(x = x)) +
    geom_histogram(aes(y = cumsum(after_stat(count))), binwidth = 1, boundary = 0,
                color = "black", fill = "white")

Illustration:

# named hist_df to avoid masking base R's hist()
hist_df <- get_hist(p)
head(hist_df$x)
## [1] 0.5 1.5 2.5 3.5 4.5 5.5
head(hist_df$y)
## [1]  7 13 24 38 52 57
head(hist_df$xmax)
## [1] 1 2 3 4 5 6
head(hist_df$xmin)
## [1] 0 1 2 3 4 5

See also my answer to the related question "Cumulative histogram with ggplot2".
