49

I know how to draw histograms or other frequency/percentage related tables. But now I want to know, how can I get those frequency values in a table to use after the fact.

I have a massive dataset, now I draw a histogram with a set binwidth. I want to extract the frequency value (i.e. value on y-axis) that corresponds to each binwidth and save it somewhere.

Can someone please help me with this? Thank you!

0

3 Answers 3

59

The hist function has a return value (an object of class histogram):

R> res <- hist(rnorm(100))
R> res
$breaks
[1] -4 -3 -2 -1  0  1  2  3  4

$counts
[1]  1  2 17 27 34 16  2  1

$intensities
[1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01

$density
[1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01

$mids
[1] -3.5 -2.5 -1.5 -0.5  0.5  1.5  2.5  3.5

$xname
[1] "rnorm(100)"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
Sign up to request clarification or add additional context in comments.

4 Comments

Aaaah, just got the same idea and wanted to post this! You were faster :-)
You shall also note that he should use plot = FALSE, so that he only gets results without plotting the histogram.
is there a way without hist? I am trying to make hist with custom breaks and it doesn't work. Could there be something else?
@xealits: table(cut(rnorm(100), breaks=c(-Inf, -1, 1, Inf)))
23

From ?hist: Value

an object of class "histogram" which is a list with components:

  • breaks the n+1 cell boundaries (= breaks if that was a vector). These are the nominal breaks, not with the boundary fuzz.
  • counts n integers; for each cell, the number of x[] inside.
  • density values f^(x[i]), as estimated density values. If all(diff(breaks) == 1), they are the relative frequencies counts/n and in general satisfy sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].
  • intensities same as density. Deprecated, but retained for compatibility.
  • mids the n cell midpoints.
  • xname a character string with the actual x argument name.
  • equidist logical, indicating if the distances between breaks are all the same.

breaks and density provide just about all you need:

histrv<-hist(x)
histrv$breaks
histrv$density

Comments

14

Just in case someone hits this question with ggplot's geom_histogram in mind, note that there is a way to extract the data from a ggplot object.

The following convenience function outputs a dataframe with the lower limit of each bin (xmin), the upper limit of each bin (xmax), the mid-point of each bin (x), as well as the frequency value (y).

## Convenience function
get_hist <- function(p) {
    d <- ggplot_build(p)$data[[1]]
    data.frame(x = d$x, xmin = d$xmin, xmax = d$xmax, y = d$y)
}

# make a dataframe for ggplot
set.seed(1)
x = runif(100, 0, 10)
y = cumsum(x)
df <- data.frame(x = sort(x), y = y)

# make geom_histogram 
p <- ggplot(data = df, aes(x = x)) + 
    geom_histogram(aes(y = cumsum(..count..)), binwidth = 1, boundary = 0,
                color = "black", fill = "white")

Illustration:

hist = get_hist(p)
head(hist$x)
## [1] 0.5 1.5 2.5 3.5 4.5 5.5
head(hist$y)
## [1]  7 13 24 38 52 57
head(hist$xmax)
## [1] 1 2 3 4 5 6
head(hist$xmin)
## [1] 0 1 2 3 4 5

A related question I answered here (Cumulative histogram with ggplot2).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.