I am trying to write code for the following data:

List of Carbon A's and Carbon B's

I have imported the data using the code:

import csv
import itertools
import pandas as pd

input_file="computation.csv"
cmd=pd.read_csv(input_file)
subset = cmd[['Carbon A', 'Carbon B']]
carbon_pairs = [tuple(y) for y in subset.values]
c_pairs = carbon_pairs

I want to write code that produces the output:

1 is connected to
  2
  4
  6
  7 
  8
2 is connected to
  1
  4
  5

Note that for 'carbon' 2, I would like it to repeat that it is connected to carbon 1. I was thinking that some permutation would be able to show this, but I am very unsure where to start. Basically, the code needs to output:

for every cell with the same value, print adjacent cell
  • I would recommend simplifying your question, and putting the backstory below: "Given a list of (a, b) pairs like [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 1), (2, 4), (2, 5)], how can I find all b values associated with each a value?" Commented Mar 30, 2017 at 20:35

2 Answers

1

You can get your desired output without the pandas dependency with the following function (Python 2), which lets you pass in any filename and choose which (zero-based) column indices to query. This solution assumes that the data is sorted as in the example you provided.

import csv

def printAdjacentNums(filename, firstIdx, secondIdx):
    with open(filename, 'rb') as csvfile:
        # handle header line
        header = next(csvfile)
        reader = csv.reader(csvfile)
        current_val = ''
        current_adj = []
        # dict of lists for lookback
        lookback = {}
        for row in reader:
            if current_val == '':
                current_val = row[firstIdx]
            if row[firstIdx] == current_val:
                current_adj.append(row[secondIdx])
            else:
                # check lookback
                for k, v in lookback.items():
                    if current_val in v:
                        current_adj.append(k)

                # print what we need to
                print current_val + ' is connected to'
                for i in current_adj:
                    print i

                # append current vals to lookback
                lookback[current_val] = current_adj

                # reassign
                current_val = row[firstIdx]
                current_adj = [row[secondIdx]]

    # print final set
    for k, v in lookback.items():
        if current_val in v:
            current_adj.append(k)
    print current_val + ' is connected to'
    for i in current_adj:
        print i

Then call it like so, based on your example:

printAdjacentNums('computation.csv', 0, 1)
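
For anyone on Python 3, here is a minimal sketch of the same idea. It is not a line-for-line port of the function above: it collects neighbours into sets, so duplicate rows collapse, and it records each row in both directions, so it does not depend on the file being sorted. The names read_connections and neighbours are made up for this illustration, and it assumes the file has a single header row.

```python
import csv
from collections import defaultdict


def read_connections(filename, first_idx=0, second_idx=1):
    """Return {value: set of connected values}, recording each row both ways."""
    neighbours = defaultdict(set)
    # The csv module documentation recommends newline='' for files given to csv.reader.
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row (assumed to be present)
        for row in reader:
            if not row:
                continue  # ignore blank lines
            a, b = row[first_idx], row[second_idx]
            neighbours[a].add(b)
            neighbours[b].add(a)  # the "lookback": b is connected to a as well
    return neighbours
```

The values stay as strings, exactly as csv.reader yields them, so sort with key=int when printing if you want numeric order.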

2 Comments

This is great! But (for example), it prints '2 is connected to: 4, 5' and I want to include that it is connected to 1 (the first row). Any ideas how to add this?
Added lookback logic. This logic does not handle duplicate rows, as that was not in scope of the original question. If you need that, use set instead of list as the datatype of current_adj. Hope this helps.
0

Starting from the end of your question:

c_pairs = [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 1), (2, 4), (2, 5)]

You presumably want to end up with something more like:

groups = {1: [2, 4, 6, 7, 8], 2: [1, 4, 5]}

There are many ways to obtain this.

A very fast way, if you know your data is sorted, is to use itertools.groupby, e.g.:

import itertools

first_item = lambda pair: pair[0]  # works in both Python 2 and 3
for key, items in itertools.groupby(c_pairs, first_item):
    print('%s is connected to' % key)
    for (a, b) in items:
        print('  %s' % b)

This is probably still the fastest way even if your data is not sorted; simply sort it first:

c_pairs = sorted(c_pairs, key=first_item)
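
If you want a dictionary rather than printed output, a rough sketch of the sort-then-group approach could look like this (Python 3). The sample list is deliberately unsorted, and operator.itemgetter stands in for the lambda; both are choices made for this illustration only.

```python
import itertools
from operator import itemgetter

# Deliberately unsorted sample data (illustrative only).
c_pairs = [(2, 4), (1, 2), (2, 1), (1, 4), (2, 5)]

first_item = itemgetter(0)
groups = {}
# groupby only merges adjacent items with equal keys, hence the sort.
for key, items in itertools.groupby(sorted(c_pairs, key=first_item), first_item):
    groups[key] = [b for _, b in items]
```

Because sorted is stable and only the first element is used as the key, the second elements keep their original relative order within each group.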

A more do-it-yourself solution is to use defaultdict or a standard dictionary to create a mapping from one to the other.

import collections

groups = collections.defaultdict(list)
for a, b in c_pairs:
    groups[a].append(b)

which is equivalent to the following, without collections:

groups = {}
for a, b in c_pairs:
    groups.setdefault(a, [])  # many ways to do this as well
    groups[a].append(b)
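
Once you have a groups mapping, printing it in the format from the question is a short loop. A minimal sketch (Python 3), where format_groups is a made-up helper name and the keys are assumed to be sortable:

```python
def format_groups(groups):
    """Return the output lines for a {key: [values]} mapping."""
    lines = []
    for key in sorted(groups):
        lines.append('%s is connected to' % key)
        for value in groups[key]:
            lines.append('  %s' % value)  # two-space indent, as in the question
    return lines


groups = {1: [2, 4, 6, 7, 8], 2: [1, 4, 5]}
print('\n'.join(format_groups(groups)))
```

Returning the lines instead of printing inside the loop makes it easy to write them to a file or check them in a test.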

2 Comments

Is there a way to add some type of lookback logic to itertools.groupby? As you said, I want to end up with groups = {1: [2, 4, 6, 7, 8], 2: [1, 4, 5]}, but this code is giving me groups = {1: [2, 4, 6, 7, 8], 2: [4, 5]}.
@AlexIndeglia which code? There are 3-4 options above, depending on how you read it. I see exactly the desired output when running all versions of them except one. That one is where you ran the groupby on unsorted data without running the line called out above, and that code doesn't return a dictionary; it is used in a loop that prints. Are you sure you didn't forget to add the (2, 1) value?
