I have been tasked with filtering tags from a contact list in order to form calling lists.

The original CSV has a column labeled "Tags" that held no more than 6 values per contact, so I split it into 6 separate columns. There are 75 unique tags across the 6 columns, but a given tag does not appear in any specific column; the order in which tags appear in the columns is random.

However, the person I'm working with wants each contact placed into a larger grouping while still preserving the original tags. So I decided to create a 7th column based on the individual tags in the 6 columns. He doesn't care much whether it's an exact match to the columns, only that each person with a tag is placed in a single list for calling.

I have been provided with what is essentially a key-value mapping for the tags, so I know which calling list each tag belongs in.

Normally I would simply have done a replace with the key-value pairs to limit the tags and gone from there, but I have to preserve the original tags. I've also dealt with numbers before, and I can bin numbers on something such as age or income bracket. But I'm at a loss as to how to string match against other columns in the same row. Please let me know if I should be searching for different terms; anything helps.

import pandas as pd

# the key-value pairs (tag -> calling list)
d = {'work': 'list1',
     'hobby': 'list2',
     'family': 'list3',
     'conference': 'list4',
     'extended family': 'list3',
     'high school': 'list5',
     'college': 'list5'}

# sample dataframe (missing tags are empty strings)
data = [[1, 'family', 'extended family', '', '', '', ''],
        [2, 'college', 'hobby', '', '', '', ''],
        [3, 'college', 'family', 'work', '', '', ''],
        [4, 'conference', '', '', '', '', ''],
        [5, 'hobby', '', '', '', '', ''],
        [6, 'college', '', '', '', '', ''],
        [7, 'college', 'work', 'family', 'high school', 'conference', 'hobby']]
df = pd.DataFrame(data, columns=['contactID', 'tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'tag6'])
df

Here's the sort of output that I'm trying to get:

contactID   tag1        tag2                tag3    tag4            tag5        tag6    call_list
001         family      extended family                                                 list3
002         college     hobby                                                           list2
003         college     family              work                                        list1
004         conference                                                                  list4
005         hobby                                                                       list2
006         college                                                                     list5
007         college     work                family  high school     conference  hobby   list2

1 Answer

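If you want the last matched value per row, use Series.map with DataFrame.stack and DataFrame.unstack, then forward fill the missing values and select the last column. Here d is a dictionary mapping each tag to its calling list (the key-value pairs from the question):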

df['call list'] = df.iloc[:, 1:].stack().map(d).unstack().ffill(axis=1).iloc[:, -1]
print (df)
   contactID        tag1             tag2    tag3         tag4        tag5  \
0          1      family  extended family                                    
1          2     college            hobby                                    
2          3     college           family    work                            
3          4  conference                                                     
4          5       hobby                                                     
5          6     college                                                     
6          7     college             work  family  high school  conference   

    tag6 call list  
0            list3  
1            list2  
2            list1  
3            list4  
4            list2  
5            list5  
6  hobby     list2  
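
If the chain is hard to read, it can help to split it into named steps. The following is only a sketch of the same idea on the question's sample data; the intermediate variable names (stacked, mapped, wide, filled) are mine, and d is assumed to be the tag-to-list dictionary built from the question's key-value pairs.

```python
import pandas as pd

# Assumed mapping, built from the key-value pairs in the question.
d = {'work': 'list1', 'hobby': 'list2', 'family': 'list3', 'conference': 'list4',
     'extended family': 'list3', 'high school': 'list5', 'college': 'list5'}

data = [[1, 'family', 'extended family', '', '', '', ''],
        [2, 'college', 'hobby', '', '', '', ''],
        [3, 'college', 'family', 'work', '', '', ''],
        [4, 'conference', '', '', '', '', ''],
        [5, 'hobby', '', '', '', '', ''],
        [6, 'college', '', '', '', '', ''],
        [7, 'college', 'work', 'family', 'high school', 'conference', 'hobby']]
df = pd.DataFrame(data, columns=['contactID', 'tag1', 'tag2', 'tag3',
                                 'tag4', 'tag5', 'tag6'])

tags = df.iloc[:, 1:]            # only the six tag columns
stacked = tags.stack()           # long Series indexed by (row, tag column)
mapped = stacked.map(d)          # tag -> list name; '' is not a key, so it becomes NaN
wide = mapped.unstack()          # back to one row per contact, one column per tag
filled = wide.ffill(axis=1)      # carry each row's last match to the right
df['call list'] = filled.iloc[:, -1]  # rightmost column now holds the last match

print(df[['contactID', 'call list']])
```

Printing stacked, mapped and wide one at a time shows what each step contributes. Note that "last" here means the rightmost tag column that has an entry in d, so the result depends on the column order.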

For all matched lists, use apply with join and filter out the missing values (the y == y test is False only for NaN):

df['call list'] = (df.iloc[:, 1:].stack()
                                 .map(d)
                                 .unstack()
                                 .apply(lambda x: ','.join(y for y in x if y == y), axis=1))
print (df)
   contactID        tag1             tag2    tag3         tag4        tag5  \
0          1      family  extended family                                    
1          2     college            hobby                                    
2          3     college           family    work                            
3          4  conference                                                     
4          5       hobby                                                     
5          6     college                                                     
6          7     college             work  family  high school  conference   

    tag6                            call list  
0                                 list3,list3  
1                                 list5,list2  
2                           list5,list3,list1  
3                                       list4  
4                                       list2  
5                                       list5  
6  hobby  list5,list1,list3,list5,list4,list2  
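
The joined output above repeats a list when several tags map to it (for example list3,list3). If that is unwanted, one possible variant is a plain row-wise apply with a named function instead of stack/map/unstack. This is a different technique from the one above, shown only as a sketch: unique_lists is a hypothetical helper name, and d is again assumed to be the tag-to-list dictionary from the question.

```python
import pandas as pd

# Assumed mapping, built from the key-value pairs in the question.
d = {'work': 'list1', 'hobby': 'list2', 'family': 'list3', 'conference': 'list4',
     'extended family': 'list3', 'high school': 'list5', 'college': 'list5'}

data = [[1, 'family', 'extended family', '', '', '', ''],
        [2, 'college', 'hobby', '', '', '', ''],
        [3, 'college', 'family', 'work', '', '', ''],
        [4, 'conference', '', '', '', '', ''],
        [5, 'hobby', '', '', '', '', ''],
        [6, 'college', '', '', '', '', ''],
        [7, 'college', 'work', 'family', 'high school', 'conference', 'hobby']]
df = pd.DataFrame(data, columns=['contactID', 'tag1', 'tag2', 'tag3',
                                 'tag4', 'tag5', 'tag6'])

def unique_lists(row):
    """Return the row's calling lists, de-duplicated, in first-seen order."""
    # Tags without an entry in d (such as the empty string) are skipped.
    lists = [d[tag] for tag in row if tag in d]
    # dict.fromkeys drops repeats while keeping insertion order.
    return ','.join(dict.fromkeys(lists))

df['call list'] = df.iloc[:, 1:].apply(unique_lists, axis=1)
print(df[['contactID', 'call list']])
```

A row-wise apply is likely slower than the stacked version on a large contact list, so this trades speed for readability.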
1 Comment

Thank you so much. I went with the first solution because lambda functions scare me... I'm not as familiar with the stack functions used, so I have some homework. The code works quickly and cleanly.
