I am trying to run a large-scale linear program and am translating some of my previous code from MATLAB into Python. However, MATLAB and Python are giving me drastically conflicting answers: the MATLAB code finds the optimal solution, yet the Python code says that the problem is infeasible. This is an LP formulation of ell_infinity (minimax) regression. I am fairly confident in the setup of both functions.
MATLAB code:
function [ x_opt, f_opt, exitflag, output] = ell_infinity_reg_solver( A, b )
% Solves the ell_infinity regression problem min_x ||Ax - b||_inf. That is,
% finds the least t for which Ax - b <= t*ones and Ax - b >= -t*ones.
    [n, d] = size(A);
    if n == 0
        f_opt = 0;
        x_opt = zeros(d, 1);
        return
    end
    % infinity norm
    f = [zeros(d,1); 1];
    A = sparse([A, -ones(n,1); -A, -ones(n,1)]);
    b = [b; -b];
    [xt, f_opt, exitflag, output] = linprog(f, A, b);
    x_opt = xt(1:d, :);
end
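For reference, a small numpy sketch (with synthetic toy data of my own, not the real dataset) of the block structure the MATLAB code builds, i.e. [A, -1; -A, -1] on the left and [b; -b] on the right:

```python
import numpy as np

# Tiny synthetic instance (my own toy data, not the real 500-by-11 dataset)
rng = np.random.default_rng(0)
n, d = 4, 2
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Same block structure as the MATLAB code: [A, -1; -A, -1] and [b; -b]
big_A = np.block([[A, -np.ones((n, 1))],
                  [-A, -np.ones((n, 1))]])
big_b = np.concatenate([b, -b])

print(big_A.shape)  # (2n, d+1) = (8, 3)
print(big_b.shape)  # (2n,)     = (8,)
```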
Python code:
from scipy.optimize import linprog
import numpy as np
def ell_infinity_regression_solver(A, b):
    """Compute the solution to the ell_infinity regression problem
    x_hat = arg min_x ||Ax - b||_inf by reformulating as the LP:
        min t :
        Ax - b <= t * ones
        Ax - b >= -t * ones
    i.e. finds the minimal radius ball enclosing the distance between all
    data points and the target values b.
    Input:
        A - array or dataframe of dimension n by d
        b - target vector of size n by 1.
    Output:
        x_opt which solves the optimisation.
        f_opt the radius of the enclosing ball.
    """
    n, d = A.shape
    # include a check on n and d to ensure the dimensionality makes sense
    if n > 0:
        f = np.vstack((np.zeros((d, 1)), [1])).ravel()
        print("shape of f {}".format(f.shape))  # should be length d+1
        big_A = np.hstack((np.vstack((A, -A)), np.ones((2 * n, 1))))
        print("shape of big_A {}".format(big_A.shape))  # should be 2n by (d+1)
        # big_b = np.vstack((b, -b))  # should be length 2n
        big_b = b.append(-b)  # only works for data frames -- switch to np array?
        print("Type of b {}".format(type(b)))
        print("Shape of b {}".format(b.shape))
        print("shape of big_b {}".format(big_b.shape))
        output = linprog(f, A_ub=big_A, b_ub=big_b)
        # print(output)
    else:
        f_opt = 0
        x_opt = np.zeros_like(b)
    return output
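While debugging, I also check candidate solutions directly by evaluating the objective, so I can compare whatever the two solvers return. A minimal sketch (the toy numbers here are my own):

```python
import numpy as np

def residual_inf_norm(A, b, x):
    """Directly evaluate ||Ax - b||_inf for a candidate solution x."""
    return np.max(np.abs(A @ x - b))

# Toy check: for x = 0 the value is simply ||b||_inf
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([1.0, -2.0])
print(residual_inf_norm(A, b, np.zeros(2)))  # 2.0
```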
As the scipy method didn't work, I also tried CVXOPT with the added lines
import cvxopt

c = cvxopt.matrix(f)
G = cvxopt.matrix(big_A)
h = cvxopt.matrix(big_b)
output = cvxopt.solvers.lp(c, G, h, solver='glpk')
But again this returned a "dual infeasible" flag, in contrast to the MATLAB exitflag of 1 (success), for which the dual-simplex algorithm was used.
The data I tested on is a 500 by 11 data matrix with an associated set of response variables.
I am interested in what has caused such a significant difference in the outputs, and which of them is correct. One thing that adds to the confusion: reducing the input to one that is less than full rank doesn't seem to affect the MATLAB output flag, while the Python code still fails to find an optimal solution (as expected, I think).
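As a sanity check on scipy itself, a self-contained toy instance (synthetic data of my choosing) does solve, when I mirror the MATLAB construction exactly and pass bounds=(None, None) explicitly, since scipy's linprog defaults to nonnegative variables unlike MATLAB's linprog:

```python
import numpy as np
from scipy.optimize import linprog

# Toy sanity check of scipy.optimize.linprog on the same LP formulation
# (synthetic data; bounds=(None, None) because linprog's default is x >= 0)
rng = np.random.default_rng(1)
n, d = 20, 3
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

f = np.concatenate([np.zeros(d), [1.0]])
big_A = np.block([[A, -np.ones((n, 1))],
                  [-A, -np.ones((n, 1))]])
big_b = np.concatenate([b, -b])

res = linprog(f, A_ub=big_A, b_ub=big_b, bounds=(None, None))
print(res.status)  # 0 means success
print(res.fun)     # optimal t = ||Ax* - b||_inf
```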
The data is at https://www.dropbox.com/s/g4qcj1q0ncljawq/dataset.csv?dl=0 and the matrix A is the first 11 columns and the target vector is the final column.
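For reference, this is roughly how I split the CSV into A and b (sketched here with an inline two-row sample in place of the downloaded file; the column names are placeholders of mine):

```python
import io
import pandas as pd

# Inline stand-in for the downloaded dataset.csv (placeholder column names)
csv_text = "c0,c1,c2\n1.0,2.0,3.0\n4.0,5.0,6.0\n"
df = pd.read_csv(io.StringIO(csv_text))

A = df.iloc[:, :-1].to_numpy()  # all but the last column -> data matrix
b = df.iloc[:, -1].to_numpy()   # final column            -> target vector
print(A.shape, b.shape)  # (2, 2) (2,)
```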
(I have also since tried solvers other than scipy, such as mosek (mosek.com), which was the only solver capable of producing a solution in reasonable time for me.)