Concatenate Data - Python

Question

I am working with data formatted in a .txt file in the format below:

family1 1 0 0 2 0 2 2 0 0 0 1 0 1 1 0 0 0 0 1 NA NA 4
family1 2 0 0 2 2 1 4 0 0 0 0 0 0 0 0 0 0 0 0 NA NA 4
family1 3 0 0 2 5 1 2 0 0 0 1 1 0 1 1 1 0 0 0 NA NA 2
family2 1 0 0 2 5 2 1 1 1 1 0 0 0 0 0 0 0 0 0 NA NA 3
etc.

where the second column is a member of the family and the other columns are numbers that correspond to traits. I need to compare the relatives listed in this data set to create an output like this:

family1 1 2 traitnumber traitnumber ...
family1 1 3 traitnumber traitnumber ...
family1 2 3 traitnumber traitnumber ...

where the numbers are the relatives.

I have created a data frame using:

import pandas as pd
data = pd.read_csv('file.txt', sep=" ", header=None)
print(data)

Can you offer any advice on the most efficient way to concatenate this data into the desired rows? I am having trouble thinking of a way to write code for the different combinations, i.e. relatives 1 and 2, 1 and 3, and 2 and 3. Thank you!

DragonBobZ · Accepted Answer · 2017-07-26 21:09:02Z

1

You might find combinations from itertools to be helpful.

from itertools import combinations
print([thing for thing in combinations((1,2,3), 2)])

Yields

[(1, 2), (1, 3), (2, 3)]
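combinations is not limited to plain numbers; it pairs up whatever items it is given, so each item can be a whole row. A minimal sketch, assuming each line has already been split into a list of strings (the rows below are shortened, hypothetical stand-ins for the full rows in the question):

```python
from itertools import combinations

# Hypothetical, shortened rows: family, member, then a few trait values.
rows = [
    ["family1", "1", "0", "2", "NA"],
    ["family1", "2", "0", "4", "NA"],
    ["family1", "3", "1", "2", "NA"],
]

# Each element of a pair is a complete row, however many columns it has.
pairs = list(combinations(rows, 2))

# Pull out the member column (index 1) from both rows of every pair.
member_pairs = [(a[1], b[1]) for a, b in pairs]
print(member_pairs)
```

The number of columns per row does not matter here, because combinations only decides which rows are paired, not what is inside them.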

2 Comments

G. Keeler Over a year ago

Thank you, that would be helpful, but how would I do that, since I have about 24 elements per row?

G. Keeler Over a year ago

Can you offer any more advice?

Luis Antonio Dominguez · Accepted Answer · 2017-07-26 22:39:54Z

0

Building on DragonBobZ's answer, you could use the DataFrame's groupby method to split out the families, something like this:

import pandas as pd
data = pd.read_csv('file.txt', sep=" ", header = None)
print(data)

from itertools import combinations
grouped_df = data.groupby(0)

for key, current_subgroup in grouped_df:
    print(key)
    print(current_subgroup)
    print(current_subgroup.shape, "\n")
    # Positional index pairs for every two-row combination within this family
    print(list(combinations(range(current_subgroup.shape[0]), 2)))

Grabbing the output of the "combinations" line will give you a list of tuples that you can use in conjunction with row indexing to perform the comparisons for the appropriate columns.
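Putting the two pieces together might look like the sketch below. It rests on an assumption the question does not confirm: that "comparing" two relatives means placing the first relative's trait columns next to the second's on one row. The sample text is copied from the question so the snippet runs on its own, and the names `sample`, `rows_out` and `result` are made up for illustration.

```python
import io
from itertools import combinations

import pandas as pd

# Sample rows copied from the question; a real run would read 'file.txt' instead.
sample = """\
family1 1 0 0 2 0 2 2 0 0 0 1 0 1 1 0 0 0 0 1 NA NA 4
family1 2 0 0 2 2 1 4 0 0 0 0 0 0 0 0 0 0 0 0 NA NA 4
family1 3 0 0 2 5 1 2 0 0 0 1 1 0 1 1 1 0 0 0 NA NA 2
family2 1 0 0 2 5 2 1 1 1 1 0 0 0 0 0 0 0 0 0 NA NA 3
"""

# Keep every value as text so "NA" is not converted to NaN.
data = pd.read_csv(io.StringIO(sample), sep=" ", header=None,
                   dtype=str, keep_default_na=False)

rows_out = []
for family, group in data.groupby(0):
    # itertuples(index=False, name=None) yields one plain tuple per relative.
    for a, b in combinations(group.itertuples(index=False, name=None), 2):
        # Assumption: output is family, both member ids, then both trait lists.
        rows_out.append((family, a[1], b[1]) + a[2:] + b[2:])

result = pd.DataFrame(rows_out)
print(result.iloc[:, :3])  # family and the two member ids for each pair
```

A family with a single member, such as family2 in the sample, produces no pairs and so contributes no output rows. If the comparison should instead be something like a per-trait difference or match flag, only the line that builds each output tuple would need to change.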

1 Comment

Luis Antonio Dominguez Over a year ago

Needless to say you would have to iterate over the list of tuples. :-)
