I have a 1D array of strings (gene_name_list). For each string in it, I need to find the row of a 2D array (fully_split) that contains that string. Of course, I can solve it by brute force like this:
import numpy as np

# Size the output dtype to the longest gene name
longest_gene_name = len(max(gene_name_list, key=len))
ensembl_list = np.full(len(gene_name_list), '', dtype='U{}'.format(longest_gene_name))
# For every gene name, scan every row of fully_split
for idx, gene_name in enumerate(gene_name_list):
    for row in fully_split:
        if gene_name in row:
            ensembl_list[idx] = row[0]
But this takes too long; I need a faster solution.
row[0] contains the special symbols that I am mapping to. So if a string is found, it will be in the row[1:] portion, and I then take row[0]. Not relevant, but to clarify.
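To make the row layout concrete, here is a minimal sketch of the mapping described above, written as a dictionary lookup rather than a nested loop. The toy data and the name name_to_id are made up for illustration, and plain lists stand in for the arrays. It assumes a match only ever occurs in row[1:], and it keeps the last matching row when a name appears in several rows, as the loop above does.

```python
# Hypothetical toy data; the real inputs are assumed to have this layout.
gene_name_list = ["TP53", "BRCA1", "MISSING"]
fully_split = [
    ["ENSG01", "TP53", "P53"],
    ["ENSG02", "BRCA1", "RNF53"],
]

# Map every name in row[1:] to that row's row[0].
# Later rows overwrite earlier ones, so the last matching row wins.
name_to_id = {}
for row in fully_split:
    for name in row[1:]:
        name_to_id[name] = row[0]

# One dictionary lookup per gene name; '' marks names that were not found.
ensembl_list = [name_to_id.get(gene_name, "") for gene_name in gene_name_list]
print(ensembl_list)
```

This touches each element of fully_split once while building the dictionary, then does one lookup per gene name, instead of scanning every row for every name.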