I am trying to split my numpy array of data points into training and test sets. To do that, I randomly select rows from the array to use as the training set and use the remaining rows as the test set.
This is my code:
import numpy
matrix = numpy.loadtxt("matrix_vals.data", delimiter=',', dtype=float)
matrix_rows, matrix_cols = matrix.shape
# training set
randvals = numpy.random.randint(matrix_rows, size=50)
train = matrix[randvals,:]
test = numpy.delete(matrix, randvals, 0)
print matrix.shape
print train.shape
print test.shape
But the output I get is:
matrix.shape: (130, 14)
train.shape: (50, 14)
test.shape: (89, 14)
This is obviously wrong: the number of rows in train and test should add up to the total number of rows in the matrix, but here the sum (50 + 89 = 139) is larger than the 130 rows in the matrix. Can anyone help me figure out what's going wrong?
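One explanation consistent with these numbers: `numpy.random.randint` draws indices with replacement, so `randvals` can contain repeats. `train` then has 50 rows, some of them duplicated, while `numpy.delete` removes each distinct index only once, leaving more than 80 rows in `test`. The sketch below reproduces that effect and shows one possible fix using `numpy.random.permutation`. The synthetic 130 x 14 matrix stands in for the data file, and the names `rng_state`, `perm`, `train_idx`, `test_idx` and `test_dup` are illustrative assumptions, not part of the original code.

```python
import numpy

# Stand-in for the loaded file (assumption: 130 rows, 14 columns, as in the output above).
matrix = numpy.arange(130 * 14, dtype=float).reshape(130, 14)
matrix_rows = matrix.shape[0]

# Seeded generator so the sketch is repeatable (the seed value is arbitrary).
rng_state = numpy.random.RandomState(0)

# Reproduce the problem: randint samples WITH replacement, so indices may repeat.
randvals = rng_state.randint(matrix_rows, size=50)
unique_count = numpy.unique(randvals).size
# numpy.delete drops each distinct index once, so this keeps
# matrix_rows - unique_count rows rather than matrix_rows - 50.
test_dup = numpy.delete(matrix, randvals, 0)

# One possible fix: shuffle all row indices once, then slice them into two disjoint groups.
perm = rng_state.permutation(matrix_rows)
train_idx, test_idx = perm[:50], perm[50:]
train = matrix[train_idx, :]
test = matrix[test_idx, :]

print(train.shape)  # (50, 14)
print(test.shape)   # (80, 14)
```

Because the two index groups come from a single permutation, no row can appear in both sets, and the row counts always add up to `matrix_rows`. `numpy.random.choice(matrix_rows, size=50, replace=False)` combined with the original `numpy.delete` call should work as well.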