Python from csv file analyze data in columns and cells

Question

I am trying to write code for the following data:

I have imported the data using the code:

import csv
import itertools
import pandas as pd

input_file="computation.csv"
cmd=pd.read_csv(input_file)
subset = cmd[['Carbon A', 'Carbon B']]
carbon_pairs = [tuple(y) for y in subset.values]
c_pairs = carbon_pairs

I want to write code that produces the output:

1 is connected to
  2
  4
  6
  7
  8
2 is connected to
  1
  4
  5

Note that for 'carbon' 2, I would like it to repeat that it is connected to carbon 1. I was thinking that some permutation would be able to show this, but I am very unsure where to start. Basically, the code needs to output:

for every cell with the same value, print adjacent cell

I would recommend simplifying your question and putting the backstory below it: "Given a list of (a, b) pairs like [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 1), (2, 4), (2, 5)], how can I find all b values associated with each a value?"
– Cireo, Commented Mar 30, 2017 at 20:35

Paul Back · Accepted Answer · 2017-03-30 22:39:10Z

1

You can get your desired output without the pandas dependency using the following function (Python 2). It lets you pass in any filename and control which (zero-based) column indices you query. This solution assumes that the data is sorted as in the example you provided.

import csv

def printAdjacentNums(filename, firstIdx, secondIdx):
    with open(filename, 'rb') as csvfile:
        # handle header line
        header = next(csvfile)
        reader = csv.reader(csvfile)
        current_val = ''
        current_adj = []
        # dict of lists for lookback
        lookback = {}
        for row in reader:
            if current_val == '':
                current_val = row[firstIdx]
            if row[firstIdx] == current_val:
                current_adj.append(row[secondIdx])
            else:
                # check lookback
                for k, v in lookback.items():
                    if current_val in v:
                        current_adj.append(k)

                # print what we need to
                print current_val + ' is connected to'
                for i in current_adj:
                    print i

                # append current vals to lookback
                lookback[current_val] = current_adj

                # reassign
                current_val = row[firstIdx]
                current_adj = [row[secondIdx]]

    # print final set
    for k, v in lookback.items():
        if current_val in v:
            current_adj.append(k)
    print current_val + ' is connected to'
    for i in current_adj:
        print i

Then call it like so, based on your example:

printAdjacentNums('computation.csv', 0, 1)
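
For readers on Python 3, here is a rough sketch of the same idea rather than a line-for-line port. The function name adjacent_values and the sample rows are made up for illustration. Unlike the function above, it does not assume sorted input, and its lookback step checks every other key rather than only earlier ones.

```python
import csv
import io


def adjacent_values(csvfile, first_idx=0, second_idx=1):
    """Group second-column values by first-column value, with lookback.

    `csvfile` is any open text-mode file object. The first row is
    assumed to be a header and is skipped.
    """
    reader = csv.reader(csvfile)
    next(reader, None)  # skip the header row, if present
    groups = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        groups.setdefault(row[first_idx], []).append(row[second_idx])

    # Lookback: if `key` appears in another key's list, list that key too.
    for key, values in groups.items():
        for other, other_values in groups.items():
            if other != key and key in other_values and other not in values:
                values.append(other)
    return groups


# Hypothetical sample standing in for computation.csv
sample = io.StringIO("Carbon A,Carbon B\n1,2\n1,4\n2,4\n2,5\n")
groups = adjacent_values(sample)

for key, values in groups.items():
    print(f"{key} is connected to")
    for value in values:
        print(f"  {value}")
```

With a real file, the call would presumably look like `with open('computation.csv', newline='') as f: groups = adjacent_values(f)`. Values stay strings, as they come out of csv.reader.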

edited Mar 30, 2017 at 22:39

answered Mar 30, 2017 at 21:11


2 Comments

Alex Indeglia Over a year ago

This is great! But (for example), it prints '2 is connected to: 4, 5' and I want to include that it is connected to 1 (the first row). Any ideas how to add this?

Paul Back Over a year ago

Added lookback logic. This logic does not handle duplicate rows, as that was not in scope of the original question. If you need that, use set instead of list as the datatype of current_adj. Hope this helps.

Cireo · Accepted Answer · 2017-03-30 20:46:07Z

0

Starting from the end of your question:

c_pairs = [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 1), (2, 4), (2, 5)]

You presumably want to end up with something more like:

groups = {1: [2, 4, 6, 7, 8], 2: [1, 4, 5]}

There are many ways to obtain this.

A very fast way, if you know your data is sorted, is to use itertools.groupby, e.g.:

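import itertools

# key function: the first element of each (a, b) pair
# (written this way so it runs on both Python 2 and Python 3)
first_item = lambda pair: pair[0]
for key, items in itertools.groupby(c_pairs, first_item):
    print('%s is connected to' % key)
    for (a, b) in items:
        print('  %s' % b)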

It is probably still the fastest way even if your data is not sorted; simply sort it first:

c_pairs = sorted(c_pairs, key=first_item)
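
As a minimal Python 3 sketch of the sort-then-group step, the following collects the result into a dictionary instead of printing inside the loop. The unsorted sample list is made up for illustration, and operator.itemgetter stands in for the lambda.

```python
import itertools
from operator import itemgetter

# Hypothetical sample, deliberately unsorted
c_pairs = [(2, 4), (1, 2), (1, 4), (2, 1), (1, 6), (2, 5)]

first_item = itemgetter(0)

groups = {}
# groupby only merges *consecutive* items with the same key,
# so the pairs are sorted by that key first.
for key, items in itertools.groupby(sorted(c_pairs, key=first_item), key=first_item):
    groups[key] = [b for _, b in items]

for key, values in groups.items():
    print(f"{key} is connected to")
    for b in values:
        print(f"  {b}")
```

Because sorted is stable and only the first element is used as the key, the b values within each group keep their original relative order.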

A more do-it-yourself solution is to use defaultdict or a standard dictionary to create a mapping from one to the other.

import collections

groups = collections.defaultdict(list)
for a, b in c_pairs:
    groups[a].append(b)

which is equivalent to the following, without collections:

groups = {}
for a, b in c_pairs:
    groups.setdefault(a, [])  # many ways to do this as well
    groups[a].append(b)
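
If the data lists each link only once (for example a (1, 2) row but no (2, 1) row), the mappings above will not show 2 as connected to 1. One possible way around that, sketched here in Python 3 under the assumption that the links are meant to be symmetric, is to record every pair in both directions. The sample list below deliberately omits (2, 1), and using a set is an illustrative choice that also drops duplicate rows.

```python
from collections import defaultdict

# Sample pairs with the (2, 1) row deliberately left out
c_pairs = [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 4), (2, 5)]

groups = defaultdict(set)
for a, b in c_pairs:
    groups[a].add(b)
    groups[b].add(a)  # mirror the pair so that b also lists a

for key in sorted(groups):
    print(f"{key} is connected to")
    for neighbour in sorted(groups[key]):
        print(f"  {neighbour}")
```

Note that values appearing only in the second column (4, 5, 6, 7, 8 here) also get their own entries, which may or may not be wanted.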

answered Mar 30, 2017 at 20:46


2 Comments

Alex Indeglia Over a year ago

Is there a way to add some type of lookback logic to itertools.groupby? As you said, I want to end up with groups = {1: [2, 4, 6, 7, 8], 2: [1, 4, 5]}, but this code is giving me groups = {1: [2, 4, 6, 7, 8], 2: [4, 5]}.

Cireo Over a year ago

@AlexIndeglia which code? There are 3-4 options above, depending on how you read it. I see exactly the desired output when running all of them except one: the case where you run groupby on unsorted data without running the sorting line called out above. That code also doesn't return a dictionary; it is used in a loop that prints. Are you sure you didn't forget to add the (2, 1) value?
