This Verilog restriction is a real pain, but we have to deal with it.
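You can map the 2D array onto a 1D array like this:
wire [32*32-1:0] One_D_array;
genvar i; // a generate loop needs a genvar, not an integer
generate for (i = 0; i < 32; i = i + 1) begin : flatten
  assign One_D_array[32*i+31:32*i] = A[i]; // word i occupies bits [32*i+31:32*i]
end endgenerate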
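Then, inside your module, you can recreate the 2D array with the inverse loop:
wire [31:0] local_2D_array [0:31];
genvar i;
// One_D_array is the module's flattened input port here (input itself is a reserved word and cannot be a signal name)
generate for (i = 0; i < 32; i = i + 1) begin : unpack
  assign local_2D_array[i] = One_D_array[32*i+31:32*i];
end endgenerate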
The synthesis tool treats this as pure wire remapping, so no LUTs or flip-flops are used. This is the easiest workaround I have found for this limitation.
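For reference, here is a minimal end-to-end sketch of both halves wired together, so you can check the round trip in a simulator. The module names (consumer, tb), the port names (in_flat, sel, word) and the test values are all made up for illustration; only A, One_D_array and local_2D_array come from the snippets above. It uses the Verilog-2001 indexed part-select [32*i +: 32], which selects the same bits as [32*i+31:32*i].

```verilog
// Hypothetical submodule: receives the 2D array as one flat 1024-bit port.
module consumer (
    input  wire [32*32-1:0] in_flat, // flattened 32 x 32-bit array
    input  wire [4:0]       sel,     // which word to read back
    output wire [31:0]      word
);
    wire [31:0] local_2D_array [0:31];
    genvar i;
    generate
        for (i = 0; i < 32; i = i + 1) begin : unpack
            // rebuild word i from bits [32*i+31:32*i]
            assign local_2D_array[i] = in_flat[32*i +: 32];
        end
    endgenerate
    assign word = local_2D_array[sel];
endmodule

// Hypothetical parent, written as a small self-checking testbench.
module tb;
    reg  [31:0]      A [0:31];
    wire [32*32-1:0] One_D_array;
    reg  [4:0]       sel;
    wire [31:0]      word;
    integer k;

    genvar i;
    generate
        for (i = 0; i < 32; i = i + 1) begin : flatten
            // pack word i into bits [32*i+31:32*i]
            assign One_D_array[32*i +: 32] = A[i];
        end
    endgenerate

    consumer dut (.in_flat(One_D_array), .sel(sel), .word(word));

    initial begin
        for (k = 0; k < 32; k = k + 1) A[k] = k * 3; // arbitrary test data
        for (k = 0; k < 32; k = k + 1) begin
            sel = k;
            #1; // let the continuous assignments settle
            if (word !== A[k]) $display("MISMATCH at word %0d", k);
        end
        $finish;
    end
endmodule
```

The named generate blocks (flatten, unpack) are there because some tools insist on a label for generate loops; the labels themselves are arbitrary.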
module #(.para(A)) M1 (.output(B));
I do not think mapping a parameter as an input/output is legal. Use generate to convert A to a one-dimensional array and map that instead.