Finding fitting range using gnuplot

Question

I am new to gnuplot. I have a non-linear data set and want to fit only its linear range. I normally specify the fit range with the following command, then repeat the fit while changing the range manually until I find the optimum range:

fit [0.2:0.6]f(x) "data.txt" u 2:3:6 yerror via m1,m2 

plot "<(sed -n '15,500p' data.txt)" u 2:3:6 w yerr title 'Window A',[0:.6] f(x) notitle lc rgb 'black'

Is it possible to iteratively run the fit within some data range to obtain the optimum data range for the fit in Gnuplot?

The data is typically like this one: data

Welcome to StackOverflow! Yes, it is probably possible, but it depends on the data. So, please provide some example data. stackoverflow.com/help/minimal-reproducible-example
– theozh, Commented Feb 2, 2021 at 5:53
Hi @theozh, sorry I did not include the data. I edited my question and added typical data that I use.
– mas, Commented Feb 2, 2021 at 6:35
What else do you know about your data? For example, is this linear range typically always at the beginning? Could it also be at the end? How large is the linear range typically compared to the full data range? Anything that gives more information about where and how to roughly find the fitting range might make the procedure easier and more reliable.
– theozh, Commented Feb 2, 2021 at 7:27
Thanks for the comment @theozh. In this data, linearity at the end of the data is not important. I want to use only the linear range at the beginning of the data, as it is the high-efficiency value range. The x-axis (column 2) represents the inefficiency value, so I want to find the lowest efficiency value that can be included in the linear fit. Manually, I start the fit range at the 15th data point, extend it to some point (e.g. the 85th data point), and look for the range that gives me the best fit. I want to know how to make gnuplot iteratively fit for every nth data range.
– mas, Commented Feb 2, 2021 at 8:17

theozh · Accepted Answer · 2021-02-02 11:58:42Z

Your data (I named the file 'mas_data.txt') looks like the following (please always show/provide relevant data in your question).

Data: (how to plot with zoom-in)

### plotting data with zoom-in
reset session
FILE = 'mas_data.txt'

colX = 2
colY = 3
set key top left

set multiplot
    plot FILE u colX:colY w lp pt 7 ps 0.3 lc rgb "red" ti "Data"

    set title "Zoom in"
    set origin 0.45,0.1
    set size 0.5, 0.6
    set xrange [0:1.0]
    plot FILE u colX:colY w lp pt 7 ps 0.3 lc rgb "red" ti "Data"
    
unset multiplot
### end of code

Regarding the "optimum" fitting range, you could try the following procedure:

1. find the absolute y-minimum of your data using stats (see help stats)
2. limit the x-range from this minimum to the maximum x-value
3. do a linear fit with f(x)=a*x+b and remember the relative standard error of the slope (a_err/a, stored here in Aerr[i])
4. reduce the x-range by a factor of 2
5. go back to step 3 until you have reached the number of iterations (here: N=10)
6. find the minimum of Aerr[i] and get the corresponding x-range

The assumption is that if the relative slope error (Aerr[i]) has a minimum, the corresponding range is the "best" range for a linear fit starting at the minimum of your data. However, I'm not sure whether this procedure will be robust for all of your datasets; there may be smarter procedures. You can also shrink the x-range in different steps. Treat this procedure as a starting point for further adaptations and optimizations.
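To illustrate what "different steps" could mean, here is a minimal sketch that only prints two possible schedules for the upper fit limit: halving the range each time, and shrinking it by a fixed amount. The numeric values of MinX and Xmax and the names halving and linear are placeholders chosen for illustration, not values taken from the data.

```gnuplot
### sketch: two ways to shrink the upper fit limit (placeholder values)
MinX = 0.2      # assumed lower limit (x position of the y-minimum)
Xmax = 20.0     # assumed maximum x-value
N    = 10       # number of iterations

do for [i=1:N] {
    halving = MinX + (Xmax-MinX)/2**(i-1)       # range is halved in every step
    linear  = Xmax - (i-1)*(Xmax-MinX)/N        # range shrinks by a fixed amount
    print sprintf("%2d  halving: %7.3f   linear: %7.3f", i, halving, linear)
}
### end of sketch
```

Halving gets down to small ranges quickly but samples the large ranges only coarsely; fixed steps do the opposite. Which one suits better probably depends on how short the linear region is relative to the full data range.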

Code:

### finding "best" fitting range
reset session

FILE = 'mas_data.txt'
colX = 2
colY = 3

stats FILE u colX:colY nooutput   # do some statistics
MinY = STATS_min_y          # minimum y-value
MinX = STATS_pos_min_y      # x position of minimum y-value
Xmax = STATS_max_x          # maximum x-value
XRangeMax = Xmax-MinX

f(x,a,b) = a*x + b 
set fit quiet nolog

N = 10
array A[N]
array B[N]
array Aerr[N]
array R[N]

set print $myRange
    do for [i=1:N] {
        XRange = XRangeMax/2**(i-1)
        R[i] = MinX+XRange
        fit [MinX:R[i]] f(x,a,b) FILE u colX:colY via a,b
        A[i] = a
        Aerr[i] = a_err/a*100   # asymptotic standard error in %
        B[i] = b
        print sprintf("% 9.3g % 9.3f   %g",MinX,R[i],Aerr[i])
    }
set print 
print $myRange

set key bottom right
set xrange [0:1.5]

plot FILE    u colX:colY w lp pt 7 ps 0.3 lc rgb "red" ti "Data", \
     for [i=1:N] [MinX:R[i]] f(x,A[i],B[i]) w l lc i title sprintf("%.2f%%",Aerr[i])

stats [*:*] $myRange u 2:3 nooutput
print sprintf('"Best" fitting range %.3f to %.3f', MinX, STATS_pos_min_y)
### end of code

Result:

Zoom-in xrange[0:1.0]

0.198    19.773   1.03497
0.198     9.985   1.09066
0.198     5.092   1.42902
0.198     2.645   1.53509
0.198     1.421   1.81259
0.198     0.810   0.659631
0.198     0.504   0.738046
0.198     0.351   0.895321
0.198     0.274   2.72058
0.198     0.236   8.50502


"Best" fitting range 0.198 to 0.810
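The comments describe a manual procedure that starts at the 15th data point and extends the range point by point. A rough sketch of that variant, using the same slope-error criterion as above, could look like the following. The row numbers first, last and step, and the names best_err and best_end, are assumptions made for illustration. Note that every counts data rows (starting at 0), not file lines, so the indices may need adjusting if the file has header lines.

```gnuplot
### sketch: grow the fit window row by row instead of halving the x-range
reset session

FILE  = 'mas_data.txt'
colX  = 2
colY  = 3

first = 15      # assumed first data row of the window (1-based)
last  = 85      # assumed last row to try
step  = 5       # assumed increment; use 1 to test every row

f(x,a,b) = a*x + b
set fit quiet nolog errorvariables

best_err = 1e30
best_end = first
do for [n=first+step:last:step] {
    a = 1; b = 1        # restart from the same initial guess each time
    # "every" counts data rows from 0, hence the -1
    fit f(x,a,b) FILE every ::(first-1)::(n-1) u colX:colY via a,b
    err = abs(a_err/a)*100      # relative slope error in %, independent of the slope's sign
    print sprintf("rows %3d to %3d   %g", first, n, err)
    if (err < best_err) { best_err = err; best_end = n }
}
print sprintf('smallest slope error %.3g%% for rows %d to %d', best_err, first, best_end)
### end of sketch
```

The abs() is only a precaution for data with a negative slope, where the signed relative error would otherwise always "win" the minimum search. As with the halving procedure, the smallest slope error is just one possible criterion and may not be robust for every dataset.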

Awesome! Thanks @theozh for the procedure, the code, and the clear explanation! With some optimizations, I can solve the problem. Thank you once again, and I'll remember to provide relevant data and enough information in my next question.
