I am plotting some covid19 data with gnuplot and am trying to find a way to pick a row in my dataset and use it as a starting point.

E.g. I have something like:

#date       #cases
2020-03-01  11
2020-03-02  13
2020-03-03  17
2020-03-04  20
2020-03-05  29
2020-03-06  38
2020-03-07  50
2020-03-08  63
2020-03-09  82
2020-03-10  105
2020-03-11  140
2020-03-12  180
2020-03-13  240
...

Now I want to find the date when the number of cases became greater than 100 and use this date to adjust/normalise/whatever all my plots.

So I want to somehow find the "2020-03-10 105" row and set two variables

start_date = '2020-03-10'
start_index = 11

to be able to do things like

stats <datafile> every ::start_index
set xrange [start_date:]
...

and so on, basically ignoring everything before the date on which the case count exceeded 100.

I suppose this is possible with a basic for+if loop over the rows of my datafile, but I am a bit stuck with this as I can't find a good example or explanation of how to iterate through datafile rows.

P.S. Of course, I could do this with external tools, but I'd prefer not to: I'm doing batch plotting with a fair amount of scripting around it to gather the data, and I would like to keep all the plotting logic inside my gnuplot jinja2 template so as not to over-complicate things.

2 Answers

Another option is to apply a filter to the value in the 2nd column in the using part of the plot command. That avoids using every. I personally don't like using set xdata time; I prefer to perform the time conversions explicitly. For example, this will plot the portion of your data file for which the value in column 2 is greater than 100:

  set xrange noextend        # limit range to exactly the data points
  tf = "%Y-%m-%d"
  set xtics time format tf
  plot 'data' using (($2>100) ? timecolumn(1,tf) : NaN): 2 with linespoints

That produces a plot of the part of the data you want. The xrange exactly spans the selected dates. The first date selected can be retrieved by

  start_time = strftime(tf, GPVAL_DATA_X_MIN)
  print start_time
       2020-03-10
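
If the goal is to normalise several curves to a common start, the same filter can be reused with the x axis expressed as days since the first selected date. The following is only a sketch under assumptions: the file name 'data' is taken from the example above, t0 is a name introduced here for illustration, and the first plot is there solely so that gnuplot fills in GPVAL_DATA_X_MIN.

```gnuplot
tf = "%Y-%m-%d"
set xrange noextend
set xtics time format tf

# First pass: plot only the rows with more than 100 cases, so that
# GPVAL_DATA_X_MIN holds the time (in seconds) of the first selected row.
plot 'data' using (($2>100) ? timecolumn(1,tf) : NaN):2 with linespoints
t0 = GPVAL_DATA_X_MIN          # t0 is a hypothetical helper variable

# Second pass: same filter, but x is now the number of days since t0.
set xtics numeric format "%g"  # switch the tic labels back from dates to numbers
set xlabel "days since cases exceeded 100"
plot 'data' using (($2>100) ? (timecolumn(1,tf) - t0)/86400.0 : NaN):2 with linespoints
```

Because t0 is an ordinary variable, each data file in a batch can get its own offset by repeating the first pass for that file.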


I'm not a gnuplot expert, so this may not be the best way. You can go through the data file twice: once "plotting" it into a dummy datablock with set table, calling a function f(y,row) on each line, and then again for the real plot. The function takes the data point (y) and the row number, and sets a variable (start) to remember the row of the first data point that is 100 or more:

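set xdata time
set timefmt "%Y-%m-%d"
start=0
# remember the row number of the first point with y>=100; always return y
f(y,row) = (y>=100 && start==0?(start=row,y):y)
# first pass: tabulate into a throwaway datablock just to evaluate f() per row
set table $Data
  plot "data" using 1:(f($2,$0))
unset table
# second pass: plot from the remembered row onwards
plot "data" every ::start using 1:2 with lines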

If you need the starting string date, for example as a legend title, you can similarly extend the function with an extra parameter and retain it:

...
startdate="???"
f(x,y,row) = (y>=100 && start==0?(startdate=x,start=row,y):y)
...
plot "data" using 1:(f(stringcolumn(1),$2,$0))
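
Put together, the two fragments might look like the sketch below. It is not a definitive version and makes a few assumptions of its own: $Dummy and start_time are names introduced here for illustration, the test is y > 100 as in the question rather than y >= 100, and -1 instead of 0 marks "not found yet" so that a match on the very first row is still recorded correctly.

```gnuplot
# Sketch only: "data" is the file from the question; $Dummy is a throwaway name.
start = -1                      # -1 = no matching row seen yet
startdate = ""
# On the first row with y > 100, remember its date string and row index.
f(x, y, row) = ((y > 100 && start < 0) ? (startdate = x, start = int(row), y) : y)

# Pass 1: evaluate f() on every row; the tabulated output itself is not used.
set table $Dummy
plot "data" using 0:(f(stringcolumn(1), $2, $0))
unset table

# Seconds since the epoch, for wherever a numeric x position is needed.
start_time = strptime("%Y-%m-%d", startdate)
print startdate, " is row ", start

# Pass 2: plot from the remembered row onwards.
set xdata time
set timefmt "%Y-%m-%d"
plot "data" every ::start using 1:2 with lines title "since ".startdate
```

If no row exceeds 100, start stays at -1 and startdate stays empty, so a script would want to check for that before the second pass.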

4 Comments

Thanks, that seems to be the right way to try. I didn't know about "set table" mode at all.
Any ideas on how to extract the date stamp from $Data[start] without too much string-parsing hackery?
I'm not sure why you would need it, but I've extended the answer to hold the date string.
Thanks a lot. timecolumn() instead of column() looks even better. I'm doing somewhat more sophisticated stuff than my example describes, with several curves on a multiplot and some fit approximations, so I need to know the point on the time-based x axis instead of (or, let's say, in addition to) just cutting off the start of the graph with every ::start.
