I am testing log-linear learning on the class of potential games and I want to run it with more than two players, but I am having trouble optimising the code.
Currently I exploit the fact that log-linear learning induces a Markov chain over the joint action profiles, so the dynamics can be described by a transition matrix. However, as the number of players increases, the dimensions of the transition matrix grow rapidly, making it impossible to store the matrix as a dense numpy array. I have tried scipy.sparse matrices instead, but the computation is very slow; the matrix is used by repeatedly multiplying the distribution (a row vector) by the transition matrix.
I would like to make it feasible to test with multiple players and to reduce the computation time of the code.
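To make the scale of the problem concrete, here is a rough back-of-the-envelope sketch (4 actions and 10 players are arbitrary example numbers): the number of action profiles grows as no_actions**no_players, so the dense transition matrix quickly stops fitting in memory, while each of its rows has at most no_players * no_actions non-zero entries, since only one player revises its action per step.
# Back-of-the-envelope storage estimate; the sizes are illustrative only.
no_actions, no_players = 4, 10
no_action_profiles = no_actions ** no_players               # 4**10 = 1_048_576 profiles
dense_bytes = no_action_profiles ** 2 * 8                   # dense float64 transition matrix
nnz_per_row = no_players * no_actions                       # one revising player, all of its actions
sparse_bytes = no_action_profiles * nnz_per_row * 8         # rough upper bound on CSR data size
print(f"dense P : ~{dense_bytes / 1e12:.1f} TB")
print(f"sparse P: ~{sparse_bytes / 1e9:.2f} GB of stored values (upper bound)")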
Here is the formulation of the transition matrix with numpy.
from itertools import product
import numpy as np

# enumerate all joint action profiles (note: enumerate() is an iterator, so it is consumed by the loop below)
self.action_profiles = enumerate(np.array(list(product(np.arange(self.no_actions), repeat=self.no_players))))
self.potential = np.zeros((self.no_action_profiles, 1))
P = np.zeros((self.no_action_profiles, self.no_action_profiles))
for idx, profile in self.action_profiles:
    self.potential[idx] = self.potential_function(profile)
    for player_id in range(self.no_players):
        mask = np.arange(len(profile)) != player_id
        opponents_actions = profile[mask]  # extract the opponents' actions from the action profile
        utilities = np.array([self.utility_functions[player_id](a, opponents_actions) for a in range(self.no_actions)])
        exp_values = np.exp(beta * utilities)
        p = exp_values / np.sum(exp_values)  # log-linear (softmax) response of the revising player
        stride = self.no_actions ** (self.no_players - 1 - player_id)
        i = idx - profile[player_id] * stride  # flat index of the profile with player_id's action set to 0
        # columns of the profiles that differ only in player_id's action, spaced by `stride`
        P[idx, i : i + self.no_actions * stride : stride] += p / self.no_players
self.P = P
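The strided slice in the last assignment relies on the action profiles being enumerated in lexicographic order, so changing only player_id's action moves the flat index in steps of no_actions**(no_players - 1 - player_id). A small standalone check of that index arithmetic (the sizes, 2 players and 3 actions, are made up for illustration):
from itertools import product

# 2 players, 3 actions: the flat index of profile (a0, a1) is a0*3 + a1.
no_players, no_actions = 2, 3
profiles = list(product(range(no_actions), repeat=no_players))
idx = profiles.index((2, 1))                              # current profile (2, 1) -> flat index 7
player_id = 0
stride = no_actions ** (no_players - 1 - player_id)       # 3
i = idx - profiles[idx][player_id] * stride               # flat index of (0, 1) -> 1
cols = list(range(i, i + no_actions * stride, stride))    # [1, 4, 7]
print([profiles[c] for c in cols])                        # [(0, 1), (1, 1), (2, 1)]: only player 0's action varies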
And here is the same construction with scipy.sparse.
from itertools import product
import numpy as np
from scipy.sparse import coo_matrix, lil_matrix

self.action_profiles = enumerate(np.array(list(product(np.arange(self.no_actions), repeat=self.no_players))))
self.potential = lil_matrix((self.no_action_profiles, 1))
P_row, P_col, P_data = [], [], []
for idx, profile in self.action_profiles:
    self.potential[idx] = self.potential_function(profile)
    for player_id in range(self.no_players):
        mask = np.arange(len(profile)) != player_id
        opponents_actions = profile[mask]  # extract the opponents' actions from the action profile
        utilities = np.array([self.utility_functions[player_id](a, opponents_actions) for a in range(self.no_actions)])
        exp_values = np.exp(beta * utilities)
        p = exp_values / np.sum(exp_values)
        stride = self.no_actions ** (self.no_players - 1 - player_id)
        i = idx - profile[player_id] * stride
        # same strided entries as in the dense version, collected as COO triplets
        for j, prob in enumerate(p):
            P_row.append(idx)
            P_col.append(i + j * stride)
            P_data.append(prob / self.no_players)
P = coo_matrix((P_data, (P_row, P_col)), shape=(self.no_action_profiles, self.no_action_profiles))
self.P = P.tocsr()  # duplicate (row, col) pairs are summed during conversion
return self.P
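Since P is meant to be row-stochastic, a quick sanity check on a small instance is that every row sums to 1. A minimal sketch of such a check, assuming self.P has been built as above:
import numpy as np

# sanity check: each row of the transition matrix should sum to 1
row_sums = np.asarray(self.P.sum(axis=1)).ravel()
assert np.allclose(row_sums, 1.0), "transition matrix rows do not sum to 1"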
And this is how the transition matrix is used.
P = self.gameSetup.formulate_transition_matrix(beta)
mu0 = self.mu_matrix.copy()  # initial distribution over the action profiles (row vector)
self.expected_value = np.zeros((int(self.max_iter), 1))
# note: np.linalg.matrix_power only accepts dense arrays, so for the sparse P this line
# needs to be replaced (e.g. by repeated multiplication)
P = np.linalg.matrix_power(P, scale_factor)
for i in range(self.max_iter):
    mu = mu0 @ P  # advance the distribution by scale_factor steps of the chain
    mu0 = mu
    self.expected_value[i] = mu @ self.gameSetup.potential  # expected potential at this iteration
self.stationary = mu
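For comparison, a minimal sketch of the same loop written against the sparse P (assuming mu0 is a dense 1 x N row vector): instead of forming the matrix power explicitly, the scale_factor intermediate steps are folded into the loop as sparse matrix-vector products, so P stays in CSR form throughout.
mu0 = self.mu_matrix.copy()
self.expected_value = np.zeros((int(self.max_iter), 1))
for i in range(self.max_iter):
    for _ in range(scale_factor):
        mu0 = mu0 @ P  # dense row vector times sparse CSR matrix
    self.expected_value[i] = mu0 @ self.gameSetup.potential
self.stationary = mu0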