I am new to Gnuplot. I have a non-linear data set and want to fit only the part of it that lies in the linear range. At the moment I specify the fit range with the following commands and repeat the fit, changing the range by hand each time, until I find the optimum range for the fit:

fit [0.2:0.6]f(x) "data.txt" u 2:3:6 yerror via m1,m2 

plot "<(sed -n '15,500p' data.txt)" u 2:3:6 w yerr title 'Window A',[0:.6] f(x) notitle lc rgb 'black'

Is it possible to iteratively run the fit within some data range to obtain the optimum data range for the fit in Gnuplot?

My data typically looks like this: data

  • Welcome to StackOverflow! Yes, probably it's possible and probably it depends on the data. So, please provide some example data. stackoverflow.com/help/minimal-reproducible-example Commented Feb 2, 2021 at 5:53
  • Hi @theozh, sorry I did not include the data. I edited my question and added typical data that I use. Commented Feb 2, 2021 at 6:35
  • What else do you know about your data? For example, is this linear range always at the beginning? Could it also be at the end? How large is the linear range typically compared to the full data range? Anything that gives more information about where and how to roughly find the fitting range might make the procedure easier and more reliable. Commented Feb 2, 2021 at 7:27
  • Thanks for the comment, @theozh. In this data, linearity at the end is not important. I only want to use the linear range at the beginning, as it is the high-efficiency range. The x-axis (column 2) represents the inefficiency value, so I want to find the lowest efficiency value that can be included in the linear fit. Doing it manually, I start the fit range at the 15th data point, extend it up to some point (the 85th data point), and pick the range that gives the best fit. I want to know how to make Gnuplot fit iteratively over every nth data range. Commented Feb 2, 2021 at 8:17

1 Answer

Your data (I named the file 'mas_data.txt') looks like the following (please always show/provide relevant data in your question).

Data: (how to plot with zoom-in)

### plotting data with zoom-in
reset session
FILE = 'mas_data.txt'

colX = 2
colY = 3
set key top left

set multiplot
    plot FILE u colX:colY w lp pt 7 ps 0.3 lc rgb "red" ti "Data"

    set title "Zoom in"
    set origin 0.45,0.1
    set size 0.5, 0.6
    set xrange [0:1.0]
    plot FILE u colX:colY w lp pt 7 ps 0.3 lc rgb "red" ti "Data"
    
unset multiplot
### end of code

Regarding the "optimum" fitting range, you could try the following procedure:

  1. find the absolute y-minimum of your data using stats (see help stats)
  2. set the x-range to run from the x-position of this minimum to the maximum x-value
  3. do a linear fit with f(x)=a*x+b and store the relative standard error of the slope (here: Aerr[i], computed from a_err and a)
  4. halve the width of the x-range, keeping the lower limit fixed
  5. go back to 3. until you have reached the number of iterations (here: N=10)
  6. find the minimum of Aerr[i] and get the corresponding x-range

The assumption is that if the relative error of the slope (Aerr[i]) has a minimum, the corresponding range is the "best" fitting range for a linear fit starting at the minimum of your data. However, I'm not sure whether this procedure will be robust for all of your datasets. Maybe there are smarter procedures. Of course, you can also shrink the x-range in different steps. This procedure could be a starting point for further adaptations and optimizations.
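In symbols, the procedure can be summarized as follows. This is only a sketch of the notation: the symbols x_min, x_max and sigma_{a,i} are my labels for MinX, Xmax and a_err in iteration i, not anything defined by gnuplot itself.

```latex
% upper fit limit in iteration i (lower limit x_min stays fixed)
R_i = x_{\min} + \frac{x_{\max} - x_{\min}}{2^{\,i-1}}, \qquad i = 1, \dots, N

% relative asymptotic standard error of the slope, in percent
\mathrm{Aerr}_i = 100 \cdot \frac{\sigma_{a,i}}{a_i}

% chosen range: [x_{\min}, R_{i^*}] with
i^* = \arg\min_i \, \mathrm{Aerr}_i
```

With N = 10 the narrowest range tried has a width of (x_max - x_min)/2^9, i.e. 1/512 of the full range.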

Code:

### finding "best" fitting range
reset session

FILE = 'mas_data.txt'
colX = 2
colY = 3

stats FILE u colX:colY nooutput   # do some statistics
MinY = STATS_min_y          # minimum y-value
MinX = STATS_pos_min_y      # x position of minimum y-value
Xmax = STATS_max_x          # maximum x-value
XRangeMax = Xmax-MinX

f(x,a,b) = a*x + b 
set fit quiet nolog

N = 10
array A[N]
array B[N]
array Aerr[N]
array R[N]

set print $myRange
    do for [i=1:N] {
        XRange = XRangeMax/2**(i-1)
        R[i] = MinX+XRange
        fit [MinX:R[i]] f(x,a,b) FILE u colX:colY via a,b
        A[i] = a
        Aerr[i] = abs(a_err/a)*100   # relative asymptotic standard error of the slope in % (abs() keeps it positive for negative slopes)
        B[i] = b
        print sprintf("% 9.3g % 9.3f   %g",MinX,R[i],Aerr[i])
    }
set print 
print $myRange

set key bottom right
set xrange [0:1.5]

plot FILE    u colX:colY w lp pt 7 ps 0.3 lc rgb "red" ti "Data", \
     for [i=1:N] [MinX:R[i]] f(x,A[i],B[i]) w l lc i title sprintf("%.2f%%",Aerr[i])

stats [*:*] $myRange u 2:3 nooutput
print sprintf('"Best" fitting range %.3f to %.3f', MinX, STATS_pos_min_y)
### end of code

Result:

(image: data with the fitted lines for all ranges)

(image: zoom-in, xrange [0:1.0])

Printed output (columns: lower limit, upper limit, relative error of the slope in %):

0.198    19.773   1.03497
0.198     9.985   1.09066
0.198     5.092   1.42902
0.198     2.645   1.53509
0.198     1.421   1.81259
0.198     0.810   0.659631
0.198     0.504   0.738046
0.198     0.351   0.895321
0.198     0.274   2.72058
0.198     0.236   8.50502


"Best" fitting range 0.198 to 0.810
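As a variation on the "different steps" idea mentioned above, here is a minimal sketch that grows the upper fit limit in equal steps instead of halving the range. It is not tested against your file: the datablock $Data is synthetic stand-in data, and the names g(x), Xlo, Xhi0, dX, BestErr and BestHi as well as all their values are assumptions made for illustration. `with table` needs gnuplot 5.2 or newer.

```gnuplot
### sketch: scan the upper fit limit in equal steps (synthetic data, hypothetical names)
reset session

# synthetic stand-in for the real file: linear up to x=1, flattening afterwards
g(x) = x < 1 ? 2*x + 1 : 3 + 0.5*log(x)
set samples 200
set table $Data
    plot [0.2:5] '+' u 1:(g($1) + 0.02*invnorm(rand(0))) w table
unset table

f(x,a,b) = a*x + b
set fit quiet nolog errorvariables   # errorvariables provides a_err, b_err

Xlo  = 0.2     # fixed lower fit limit (assumed)
Xhi0 = 0.4     # first upper limit to try (assumed)
dX   = 0.2     # step for the upper limit (assumed)
N    = 20      # number of ranges to try

BestErr = 1e30
BestHi  = NaN
do for [i=0:N-1] {
    Xhi = Xhi0 + i*dX
    a = 1; b = 1                         # same start values for every fit
    fit [Xlo:Xhi] f(x,a,b) $Data u 1:2 via a,b
    RelErr = abs(a_err/a)*100            # relative slope error in %
    if (RelErr < BestErr) { BestErr = RelErr; BestHi = Xhi }
}
print sprintf('smallest slope error %.3g%% for range %.3f to %.3f', BestErr, Xlo, BestHi)
### end of sketch
```

To use it on real data, replace $Data u 1:2 by FILE u colX:colY and set Xlo, Xhi0 and dX to values that suit your x-axis. Whether the minimum of the relative error marks a sensible range still depends on the data, as noted above.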
1 Comment

awesome! thanks @theozh for the procedure, code, and for the clear explanation! With some optimizations, I can solve the problem. Thank you once again, and I'll remember to provide relevant data and enough information on the next question.