Let's say we have the following code in R. What would be its equivalent pandas DataFrame syntax/method in Python?

network_tickets <- contains(comcast_data$CustomerComplaint, match = 'network', ignore.case = T)
internet_tickets <- contains(comcast_data$CustomerComplaint, match = 'internet', ignore.case = T)
billing_tickets <- contains(comcast_data$CustomerComplaint, match = 'bill', ignore.case = T)
email_tickets <- contains(comcast_data$CustomerComplaint, match = 'email', ignore.case = T)
charges_ticket <- contains(comcast_data$CustomerComplaint, match = 'charge', ignore.case = T)
    
comcast_data$ComplaintType[internet_tickets] <- "Internet"
comcast_data$ComplaintType[network_tickets] <- "Network"
comcast_data$ComplaintType[billing_tickets] <- "Billing"
comcast_data$ComplaintType[email_tickets] <- "Email"
comcast_data$ComplaintType[charges_ticket] <- "Charges"
    
comcast_data$ComplaintType[-c(internet_tickets, network_tickets, billing_tickets,
                              charges_ticket, email_tickets)] <- "Others"

I could convert the first set of operations to Python like this:

network_tickets = df.ComplaintDescription.str.contains ('network', regex=True, case=False)

But I am having trouble using the variable network_tickets to assign the value "Network" to a new pandas DataFrame column, ComplaintType. In R, it seems you can do that in a single line.

I am not sure how to do this in Python in a single line of code. I tried the following, but each gives an error:

a) df['ComplaintType'].apply(internet_tickets) = "Internet"
b) df['ComplaintType'] = df.apply(internet_tickets)
c) df['ComplaintType'] = internet_tickets.apply("Internet")

I think we could first create a new column in the DataFrame:

df['ComplaintType'] = internet_tickets

But I am not sure about the next steps.

1 Answer

Use Series.str.contains with DataFrame.loc to set values from a list:

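import pandas as pd

df = pd.DataFrame(data = {"ComplaintDescription":["BiLLing is super","email","new"]})

# Each category name doubles as the search keyword (case-insensitive).
# If a row matches several keywords, the last match in L wins.
L = [ "Internet","Network", "Billing", "Email", "Charges"]
for val in L:
    df.loc[df['ComplaintDescription'].str.contains(val, case=False), 'ComplaintType'] = val

# Rows that matched no keyword are still NaN; label them 'Others'.
df['ComplaintType'] = df['ComplaintType'].fillna('Others')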
print (df)
  ComplaintDescription ComplaintType
0     BiLLing is super       Billing
1                email         Email
2                  new        Others

EDIT:

If you need to set the values separately:

df.loc[df['ComplaintDescription'].str.contains('internet', case=False), 'ComplaintType'] = "Internet"
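
For a closer match to the R code, where each mask is stored in a variable first and then used for assignment, something like the following should work. This is only a sketch: the sample rows are made up for illustration, and na=False is an added assumption so that missing descriptions count as "no match" instead of producing NaN in the mask.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"ComplaintDescription": ["Internet is slow", "BiLLing is super", "email", "new"]})

# One boolean mask per keyword, like the contains() calls in R.
# na=False treats missing descriptions as non-matches.
internet_tickets = df["ComplaintDescription"].str.contains("internet", case=False, na=False)
billing_tickets = df["ComplaintDescription"].str.contains("bill", case=False, na=False)
email_tickets = df["ComplaintDescription"].str.contains("email", case=False, na=False)

# Start with the default label, then overwrite the rows selected by each mask.
df["ComplaintType"] = "Others"
df.loc[internet_tickets, "ComplaintType"] = "Internet"
df.loc[billing_tickets, "ComplaintType"] = "Billing"
df.loc[email_tickets, "ComplaintType"] = "Email"

print(df)
```

Setting the default "Others" first avoids the fillna step. As in the R code, a row that matches several keywords ends up with whichever label is assigned last.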

5 Comments

ComplaintType would be a new dataframe column based on values from variables like internet_tickets, service_tickets etc. where value == 'True' i.e. there was a string/expression match.
@ManiK - I got it, you need a new column.
Thanks, one twist though: the list L is just the category names, but the regexp search could be on different words. For example, for "Network" the search criteria could be any of network, netwrk, wifi, bandwidth etc., all assigned to the common category "Network". So I think these two operations should be separated rather than searching and labelling from the same list. Let's say the data is {"ComplaintDescription":["BiLLing is super","email","new", "bill", "netwrk", "old"]}; then we want the ComplaintType output to be: Billing, Email, Others, Billing, Network, Others
@ManiK - I am confused. Does contains(comcast_data$CustomerComplaint, match = 'network', ignore.case = T) match network, netwrk, wifi, bandwidth? Or do you need different code from the R version?
Hey, I think I got the answer. All I needed was to use df.loc[rows, columns] = value, i.e. df.loc[internet_tickets, 'ComplaintType'] = "Internet"
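
The "twist" raised in the comments (several search words per category) could be handled by keeping the keywords separate from the category names. The sketch below is one possible way, not something from the answer itself: the keywords dict and its contents are assumptions taken from the example in the comment, and the words are joined into a single regex alternation.

```python
import re
import pandas as pd

# Sample data from the comment above.
df = pd.DataFrame({"ComplaintDescription": ["BiLLing is super", "email", "new", "bill", "netwrk", "old"]})

# Hypothetical mapping: category label -> words that should trigger it.
keywords = {
    "Billing": ["bill"],
    "Email": ["email"],
    "Network": ["network", "netwrk", "wifi", "bandwidth"],
}

df["ComplaintType"] = "Others"  # default for rows that match nothing
for category, words in keywords.items():
    # Build "word1|word2|..." and escape each word so it is matched literally.
    pattern = "|".join(re.escape(w) for w in words)
    mask = df["ComplaintDescription"].str.contains(pattern, case=False, regex=True, na=False)
    df.loc[mask, "ComplaintType"] = category

print(df["ComplaintType"].tolist())
```

With this layout, adding a new spelling or synonym only means extending the relevant list; the assignment loop stays the same.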