I'm a Python beginner, inspired by some Python courses. Below is an example CSV file.

Name,Location,Number
Andrew Platt Andrew,A B C,100
Steven Thunder Andrew,A B C,50
Jeff England Steven,A B C,30
Andrew England Jeff,A B C,30

I want to get a result like this:

['Andrew', 180
'Platt', 100
'Steven', 50
'Jeff', 60
'England', 60
'Andrew Platt', 100
'Platt Andrew', 100
'Steven Thunder', 50
'Thunder Andrew', 50
........]

Logic:

  1. One-word name, e.g. 'Andrew': it appears in rows 1, 2 and 4, so the result is 180 (100+50+30)
  2. Two-word name, e.g. 'Andrew Platt': it appears in row 1 only, so the result is 100
  3. Export the result to a new CSV file

import csv
#from itertools import chain

#find one-word
filename=open('sample.csv', 'r')
file = csv.DictReader(filename)
one_word=[]
for col in file:
    one_word.append(col['Name'].split()) #find one-word
print(one_word)
#list(chain.from_iterable(one_word)) #this is another code I learned

#get result
#find two-word
#get result
#combine
#sorted by value
#export to a new CSV file

My problem is how to get the totals (e.g. 180). Do I need to match each word, then look up its 'Number' in every matching row and sum them?

Note: the Location column is not used; this is just coding practice.

Update: maybe build two lists (one-word and two-word names), then zip them.

2 Answers

Looking at your expected result, I'm not sure how you get:

'Andrew Platt', 100
'Platt Andrew', 50

I see "Andrew Platt" and "Platt Andrew" in the first row, but both two-word combos should have the same value of 100, yes?

import csv
from collections import Counter
from itertools import combinations
from pprint import pprint

one_words = Counter()
two_words = Counter()

with open("input.csv", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        items = row["Name"].split(" ")

        # Unique one-word
        for item in set(items):
            one_words[item] += int(row["Number"])

        for two_word in combinations(items, 2):
            # Skip combos like [Andrew Andrew]
            if len(set(two_word)) == 1:
                continue

            print(f"row is {type(row)}")
            print(f"two_word is {type(two_word)}")
            print(f"two_words is {type(two_words)}")

            two_words[" ".join(two_word)] += int(row["Number"])


pprint(one_words)
pprint(two_words)

I got:

Counter({'Andrew': 180,
         'Platt': 100,
         'Steven': 80,
         'England': 60,
         'Jeff': 60,
         'Thunder': 50})
Counter({'Andrew Platt': 100,
         'Platt Andrew': 100,
         'Steven Thunder': 50,
         'Steven Andrew': 50,
         'Thunder Andrew': 50,
         'Jeff England': 30,
         'Jeff Steven': 30,
         'England Steven': 30,
         'Andrew England': 30,
         'Andrew Jeff': 30,
         'England Jeff': 30})

My debug-print statements print:

row is <class 'dict'>
two_word is <class 'tuple'>
two_words is <class 'collections.Counter'>
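
The question also lists combining the results, sorting by value, and exporting to a new CSV file. A minimal sketch of those remaining steps, assuming the two Counters from the code above; the small Counters below are stand-ins with values copied from the output, and the header names and the "output.csv" file name mentioned in the comment are made up for illustration:

```python
import csv
import io
from collections import Counter

# Stand-ins for the two Counters built above (a few values copied from the output).
one_words = Counter({"Andrew": 180, "Platt": 100, "Steven": 80})
two_words = Counter({"Andrew Platt": 100, "Steven Thunder": 50})

# Adding Counters merges them; the keys never collide here because
# one-word and two-word names are different strings.
combined = one_words + two_words

# most_common() returns (name, total) pairs sorted by total, highest first.
rows = combined.most_common()

# Write to an in-memory buffer for illustration. For a real file, use
# open("output.csv", "w", newline="") instead (hypothetical file name).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Total"])  # header names are an assumption
writer.writerows(rows)

print(buffer.getvalue())
```

Using `most_common()` avoids a separate `sorted(...)` call, since a Counter can already return its items ordered by count.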
11 Comments

'Platt Andrew', 100. That was a typo on my part; corrected.
I tried running it on another data set. It shows: tuple indices must be integers or slices, not str. Data types: Name: object, Number: int64
@Peter I bet that’s because you’re trying to access a column by name, but you’re using the regular reader (not DictReader), which means you need to access columns by their 0-based position. Take a look, and if that isn’t it, please include the line number of the exception.
I got an error message: string indices must be integers. It points at two_word[" ".]........
In my code, two_word is a two-item sequence. two_words (with an “s”) is the Counter that can be accessed by key (like a dict). But… I’m not sure what’s going on because that error message says two_word (no “s”) is a string… did you redefine that variable?
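
To illustrate the reader/DictReader difference mentioned in the comments, here is a small sketch using an in-memory CSV (the data is illustrative, not the commenter's actual data set). Rows from `csv.reader` are lists, so indexing them with a column name raises a TypeError similar to the one reported; the report mentions a tuple, so its rows may well have come from a different source.

```python
import csv
import io

# In-memory CSV standing in for a real file (contents are illustrative).
data = "Name,Number\nAndrew Platt Andrew,100\n"

# csv.reader yields each row as a list, indexed by position.
plain_rows = list(csv.reader(io.StringIO(data)))
header, first = plain_rows[0], plain_rows[1]
print(first[0], first[1])

# csv.DictReader yields each row as a dict keyed by the header names.
dict_rows = list(csv.DictReader(io.StringIO(data)))
print(dict_rows[0]["Name"], dict_rows[0]["Number"])

# Indexing a list row with a string raises TypeError.
try:
    first["Name"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```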

You need to get the unique names and find the combinations of two names. Then you can check whether each name (one or two words) is contained in the first column.

import pandas as pd
import numpy as np
import itertools
#this is your data
df = pd.DataFrame([['Andrew Platt Andrew', 'Steven Thunder Andrew', 'Jeff England Steven',
              'Andrew England Jeff'], [100,50,30,30]] ).transpose()
df.columns = ['names','x']

#get the unique names that appear in the columns
names = df.names.apply(lambda x : x.split(' '))
one_words = np.unique(names.sum())

#get all combinations of two names
two_words = [a+' '+b for a,b in itertools.combinations(one_words, 2)]


#fill the dictionaries with the values
d_1 = {w : df.loc[df.names.str.contains(w),'x'].sum() for w in one_words}
d_2 = {w : df.loc[df.names.str.contains(w),'x'].sum() for w in two_words}

d = d_1 | d_2 #merge the dictionaries (the | operator on dicts needs Python 3.9+)

The output:

{'Andrew': 180,
 'England': 60,
 'Jeff': 60,
 'Platt': 100,
 'Steven': 80,
 'Thunder': 50,
 'Andrew England': 30,
 'Andrew Jeff': 0,
 'Andrew Platt': 100,
 'Andrew Steven': 0,
 'Andrew Thunder': 0,
 'England Jeff': 30,
 'England Platt': 0,
 'England Steven': 30,
 'England Thunder': 0,
 'Jeff Platt': 0,
 'Jeff Steven': 0,
 'Jeff Thunder': 0,
 'Platt Steven': 0,
 'Platt Thunder': 0,
 'Steven Thunder': 50}

4 Comments

Thanks for your answer, but what if names is a float? How should I modify the code in that case?
names cannot be float, I mean if it is a normal name :)
I tried with another data set, but it shows an error: nothing to repeat at position 0
A quick fix is to use names = df.names.apply(lambda x : str(x).split(' ')). But feel free to edit the question with your actual data if this does not solve it!
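
A possible cause of the "nothing to repeat at position 0" error, offered as an assumption since the actual data was not posted: pandas `str.contains` treats its argument as a regular expression by default, so a name beginning with a character such as `*` or `+` is an invalid pattern. The sketch below reproduces the message with the standard `re` module and a made-up name, then shows two ways to match the text literally. In pandas, passing `regex=False` to `str.contains` has the same effect as the plain substring test.

```python
import re

# A hypothetical name that starts with a regex metacharacter.
name = "*Andrew"
cell = "Steven *Andrew Jeff"

# Used as a raw pattern, the leading "*" has nothing to repeat.
try:
    re.search(name, cell)
    raw_error = None
except re.error as exc:
    raw_error = str(exc)

# re.escape() turns the name into a pattern that matches it literally.
escaped_match = re.search(re.escape(name), cell) is not None

# A plain substring test avoids regular expressions altogether.
literal_match = name in cell

print(raw_error, escaped_match, literal_match)
```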
