I have a scatter plot of Volume (x-axis) against Price (dMidP, y-axis). I want to divide the x-axis into 30 evenly spaced bins, average the values in each bin, and then plot those averages, i.e. the red dots.

Here is my data: (image)

My code below does not produce the desired plot:

V_norm = Average_Buy['Volume_norm']
df = pd.DataFrame({'X' : np.log(Average_Buy['Volume_norm']), 'Y' : Average_Buy['dMidP']})  #we build a dataframe from the data
total_bins = 30
bins = np.geomspace(V_norm.min(), V_norm.max(), total_bins)
data_cut = pd.cut(df.X,bins)         
grp = df.groupby(by = data_cut)        #we group the data by the cut
ret = grp.aggregate(np.mean)         #we produce an aggregate representation (mean) of each bin
plt.loglog(np.log(Average_Buy['Volume_norm']),Average_Buy['dMidP'],'o')
plt.loglog(ret.X,ret.Y,'r-')

plt.show()

Here is what I got: (image)

My bins array looks correct:

array([ 0.59101371,  0.64421962,  0.70221538,  0.76543219,  0.83434009,
    0.90945141,  0.99132461,  1.08056843,  1.17784641,  1.28388183,
    1.39946306,  1.52544948,  1.6627778 ,  1.81246908,  1.97563628,
    2.15349259,  2.34736038,  2.55868108,  2.7890259 ,  3.04010746,
    3.3137926 ,  3.61211619,  3.93729631,  4.29175071,  4.67811481,
    5.09926127,  5.55832137,  6.05870826,  6.6041424 ,  7.19867916])

However, data_cut is almost entirely NaN:

Time  Time
11    0                  NaN
      1                  NaN
      2                  NaN
      3                  NaN
      4                  NaN
      5                  NaN
      6                  NaN
      7                  NaN
      8                  NaN
      9                  NaN
      10      (0.991, 1.081]
      11                 NaN
      12                 NaN
      13                 NaN
      14                 NaN
      15                 NaN
      16                 NaN
      17                 NaN
      18                 NaN
      19                 NaN
      20                 NaN
      21                 NaN
      22                 NaN
      23                 NaN
      24                 NaN
      25                 NaN
      26                 NaN
      27                 NaN
      28                 NaN
      29                 NaN
                   ...      
14    30                 NaN
      31                 NaN
      32                 NaN
      33                 NaN
      34                 NaN
      35                 NaN
      36                 NaN
      37                 NaN
      38                 NaN
      39                 NaN
      40                 NaN
      41                 NaN
      42                 NaN
      43                 NaN
      44                 NaN
      45                 NaN
      46                 NaN
      47                 NaN
      48                 NaN
      49                 NaN
      50                 NaN
      51                 NaN
      52                 NaN
      53                 NaN
      54                 NaN
      55                 NaN
      56                 NaN
      57                 NaN
      58                 NaN
      59                 NaN
  • Possible duplicate of How to overplot a line on a scatter plot in python? Commented Sep 7, 2017 at 18:16
  • I am not trying to plot a linear line of best fit, but rather average the scatter plot and then connect the dots to construct a line Commented Sep 8, 2017 at 9:39

1 Answer

Your bins variable is not what you want. Either you back-transform bins from log space back to linear space, or you get the bins in linear space with log spacing from the get-go:

bins = np.geomspace(Volume.min(), Volume.max(), total_bins)

EDIT: Changed np.logspace to np.geomspace
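
To make the second option concrete, here is a minimal sketch. It assumes the NaN values in data_cut come from cutting log-transformed values (df.X) with edges computed from the untransformed volumes, which is likely given the code in the question but not confirmed. The data below is synthetic and only stands in for Average_Buy; the column names Volume_norm and dMidP are taken from the question, everything else (edges, labels, binned) is illustrative naming.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for Average_Buy (hypothetical data, not the asker's).
rng = np.random.default_rng(0)
volume = rng.lognormal(mean=0.0, sigma=1.5, size=2000)
price = 0.1 * np.sqrt(volume) * rng.lognormal(0.0, 0.3, size=2000)
df = pd.DataFrame({"Volume_norm": volume, "dMidP": price})

total_bins = 30
# total_bins intervals need total_bins + 1 edges; geomspace spaces them
# evenly in log space while keeping them in the original (linear) units.
edges = np.geomspace(df["Volume_norm"].min(), df["Volume_norm"].max(), total_bins + 1)

# Cut the raw volumes, not their logarithm, so values and edges share units.
# include_lowest=True keeps the minimum value inside the first interval.
labels = pd.cut(df["Volume_norm"], edges, include_lowest=True).rename("bin")

# Mean volume and mean price per bin; observed=True drops empty bins.
binned = df.groupby(labels, observed=True)[["Volume_norm", "dMidP"]].mean()
```

The binned means can then be drawn over the raw points with plt.loglog(df["Volume_norm"], df["dMidP"], 'o') followed by plt.loglog(binned["Volume_norm"], binned["dMidP"], 'r-'), passing the untransformed volumes because loglog already applies the log scaling to the axes. If you would rather work in log space throughout, np.linspace on the log-transformed values gives evenly spaced edges there and also accepts negative endpoints.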


6 Comments

Thanks, but when I include this code with total_bins=100 I get an error saying "Bin edges must be unique".
Notice that I changed my answer from np.logspace to np.geomspace (start and stop in np.logspace are not what I thought they were; np.geomspace does the intuitive thing). If the problem persists, please post the values of bins (and min/max of Volume).
The graph changes, but it still does not look right. bins: array([ 4.50996122e-03, 1.79450189e-02, 7.14027653e-02, 2.84109754e-01, 1.13046535e+00, 4.49809235e+00, 1.78977929e+01, 7.12148546e+01, 2.83362062e+02, 1.12749030e+03]); Volume min = 0.0045099612158282188; Volume max = 1127 (so the range is correct).
But please see the update to the question, which includes the problem with this code.
Hi Paul, np.geomspace does not work if the start is negative (geomspace(np.log(Volume.min()), np.log(Volume.max()), total_bins))