
I am working with data stored in a .txt file in the format below:

family1 1 0 0 2 0 2 2 0 0 0 1 0 1 1 0 0 0 0 1 NA NA 4
family1 2 0 0 2 2 1 4 0 0 0 0 0 0 0 0 0 0 0 0 NA NA 4
family1 3 0 0 2 5 1 2 0 0 0 1 1 0 1 1 1 0 0 0 NA NA 2
family2 1 0 0 2 5 2 1 1 1 1 0 0 0 0 0 0 0 0 0 NA NA 3
etc. 

where the second column is a member of the family and the other columns are numbers that correspond to traits. I need to compare the relatives listed in this data set to create an output like this:

family1 1 2 traitnumber traitnumber ...
family1 1 3 traitnumber traitnumber ...
family1 2 3 traitnumber traitnumber ...

where the numbers are the relatives.

I have created a data frame using:

import pandas as pd
data = pd.read_csv('file.txt', sep=" ", header=None)
print(data)

Can you offer any advice on the most efficient way to concatenate this data into the desired rows? I am having trouble thinking of a way to write code for the different combinations, i.e. relatives 1 and 2, 1 and 3, and 2 and 3. Thank you!

2 Answers


You might find combinations from itertools to be helpful.

from itertools import combinations
print([thing for thing in combinations((1,2,3), 2)])

Yields

[(1, 2), (1, 3), (2, 3)]

2 Comments

Thank you, that would be helpful, but how would I do that given that I have about 24 elements per row?
Can you offer any more advice?

Building on DragonBobZ's comment, you could do something like this, using the DataFrame's groupby method to split out the families:

import pandas as pd
data = pd.read_csv('file.txt', sep=" ", header = None)
print(data)

from itertools import combinations
grouped_df = data.groupby(0)

for key, current_subgroup in grouped_df:
    # key is the family name; current_subgroup holds that family's rows
    print(key)
    print(current_subgroup)
    print(current_subgroup.shape, "\n")
    # every pair of row positions within this family
    print(list(combinations(range(current_subgroup.shape[0]), 2)))

Grabbing the output of the "combinations" line will give you a list of tuples of row positions that you can use with positional row indexing (for example, current_subgroup.iloc) to perform the comparisons for the appropriate columns.
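
As a rough sketch of that last step, the pairing could look something like the code below. It assumes the desired output row is the family, the two member numbers, and then the first member's traits followed by the second member's traits; the question does not spell this out, so adjust as needed. The inline sample data (shortened to three trait columns) and the helper name pair_relatives are made up for illustration.

```python
import io
from itertools import combinations

import pandas as pd

# Hypothetical sample standing in for file.txt (only three trait columns).
sample = io.StringIO(
    "family1 1 0 2 NA\n"
    "family1 2 0 1 NA\n"
    "family1 3 5 1 NA\n"
    "family2 1 1 1 NA\n"
)
data = pd.read_csv(sample, sep=" ", header=None)


def pair_relatives(df):
    """Return one row per pair of relatives within each family (assumed layout)."""
    rows = []
    for family, group in df.groupby(0):
        # itertuples(index=False, name=None) yields each row as a plain tuple
        for a, b in combinations(group.itertuples(index=False, name=None), 2):
            # a[1] and b[1] are the member numbers; a[2:] and b[2:] are the traits
            rows.append((family, a[1], b[1]) + a[2:] + b[2:])
    return pd.DataFrame(rows)


pairs = pair_relatives(data)
print(pairs.to_string(header=False, index=False))
```

Because combinations works on whole rows here, the number of trait columns does not matter; a family with a single member simply produces no pairs.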

1 Comment

Needless to say you would have to iterate over the list of tuples. :-)
