I am working with data stored in a .txt file in the format below:
family1 1 0 0 2 0 2 2 0 0 0 1 0 1 1 0 0 0 0 1 NA NA 4
family1 2 0 0 2 2 1 4 0 0 0 0 0 0 0 0 0 0 0 0 NA NA 4
family1 3 0 0 2 5 1 2 0 0 0 1 1 0 1 1 1 0 0 0 NA NA 2
family2 1 0 0 2 5 2 1 1 1 1 0 0 0 0 0 0 0 0 0 NA NA 3
etc.
where the first column is the family, the second column identifies a member of that family, and the remaining columns are numbers that correspond to traits. I need to compare the relatives within each family to create an output like this:
family1 1 2 traitnumber traitnumber ...
family1 1 3 traitnumber traitnumber ...
family1 2 3 traitnumber traitnumber ...
where the second and third columns are the two relatives being compared.
I have created a data frame using:
import pandas as pd
data = pd.read_csv('file.txt', sep=" ", header=None)
print(data)
Can you offer any advice on the most efficient way to concatenate this data into the desired rows? I am having trouble thinking of a way to write code for the different combinations, i.e. relatives 1 and 2, 1 and 3, and 2 and 3. Thank you!
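To make the goal concrete, here is a rough sketch of one possible way to build the pairs, using itertools.combinations within each family. It assumes that "traitnumber traitnumber ..." means the traits of the first relative followed by the traits of the second; that reading is a guess, not something settled above. The helper name pair_relatives is made up for illustration, and the sample rows are read from a string instead of file.txt so the snippet runs on its own.

```python
from io import StringIO
from itertools import combinations

import pandas as pd

# Sample rows copied from the question; swap StringIO(...) for the real file path.
sample = StringIO(
    "family1 1 0 0 2 0 2 2 0 0 0 1 0 1 1 0 0 0 0 1 NA NA 4\n"
    "family1 2 0 0 2 2 1 4 0 0 0 0 0 0 0 0 0 0 0 0 NA NA 4\n"
    "family1 3 0 0 2 5 1 2 0 0 0 1 1 0 1 1 1 0 0 0 NA NA 2\n"
    "family2 1 0 0 2 5 2 1 1 1 1 0 0 0 0 0 0 0 0 0 NA NA 3\n"
)

# Split on any run of whitespace; column 0 is the family, column 1 the member.
data = pd.read_csv(sample, sep=r"\s+", header=None)


def pair_relatives(df):
    """Hypothetical helper: one output row per unordered pair of relatives."""
    rows = []
    for family, group in df.groupby(0, sort=False):
        members = list(group.itertuples(index=False, name=None))
        # combinations(..., 2) yields (1, 2), (1, 3), (2, 3) for three members.
        for a, b in combinations(members, 2):
            # Assumed layout: family, member a, member b, traits of a, traits of b.
            rows.append([family, a[1], b[1], *a[2:], *b[2:]])
    return pd.DataFrame(rows)


pairs = pair_relatives(data)
print(pairs)
```

A family with only one listed member (family2 in the sample) produces no rows, since there is nobody to pair it with.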