2

How can I scale the points in a Stata scatterplot by another variable? (I want the area of each point to reflect the variable, so I would scale by the variable itself or by its square root as needed, but that step is trivial.)

In R, I would do it as follows:

library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = hp, size = sqrt(wt))) +
    geom_point() + 
    scale_size_continuous(name = "weight") 


In Stata, I would like to do something like:

sysuse auto2, clear
gen weight2 = sqrt(weight)
scatter price mpg, msize(weight2)

But it doesn't work, and says:

(note:  named style weight2 not found in class symbolsize, default attributes used)

I can use weights, but apparently, these do not produce the desired result - see https://www.statalist.org/forums/forum/general-stata-discussion/general/1538359-scatterplots-with-weighted-marker-size-revisited?q=scatter%20area.

Any ideas of what I can do?

2 Answers

3

Much depends on what you want. Most people using this kind of plot seem to want a mostly qualitative indication of which data points correspond to very large ... very small values of some third variable.

In that case, marker size is set not by the msize() option but by specifying weights.

. sysuse auto, clear
(1978 automobile data)

. twoway scatter mpg weight [fw=price^3] , ms(Oh)
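
Applied to the variables in the question, the same idea might look like the sketch below. It assumes analytic weights are acceptable here (scatter accepts aweights, fweights, and pweights for scaling markers). Whether the resulting marker area is exactly proportional to the weight is not something this sketch establishes, and the Statalist thread linked in the question suggests checking this before relying on it.

```stata
sysuse auto, clear
* Scale each marker by the third variable (weight) via analytic weights.
* msize() is not used: it expects a size style or number, not a variable.
twoway scatter price mpg [aw=weight], msymbol(Oh)
```

Hollow circles (Oh) keep overlapping markers readable when some weights are large.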

4 Comments

Thanks! I would like the size of each point to exactly reflect the area as defined by the third variable.
I don't think any choice guarantees that. What would that mean if the dynamic range were millions or bigger? And what does size mean: area or diameter?
Thanks. In my application, the dynamic range is much smaller (number of observations that a point is based on). If it were too big—millions or billions, say—I would probably take the log. And size would be area, not diameter.
I think you'd need to elicit more details from StataCorp technical support.
-2

If it doesn't HAVE to be a scatter plot, the general consensus is to just use bubble. Basically, it is the exact same code, but with bubble in place of scatter:

sysuse auto2, clear
gen weight2 = sqrt(weight)
twoway bubble price mpg, size(weight2)

2 Comments

Thanks! I tried the code, and get this error: "bubble is not a twoway plot type"
The code provided was not correct. It was a great solution, but there was a simple syntax error. Always try to test your answers before submitting them.
