How to replace elements in Pandas Dataframes with list values based on specific conditions?

Question

I have a CSV file which contains 2 columns, Query and Description. This is the example description of the file:-

| Query                                        | Description |
| --------                                     | -------------- |
| What is the type of \<mach-name> machine>    |  \<mach-name> is ...       |
| What is the use of \<mach-name> machine>     |  The use of \<mach-name> is ...         |
| How long it takes to rain in \<state-name>   | It rains for ... hours in \<state-name>          |
| What is the best restaurant in \<state-name> | \<state-name>'s best food is in ...         |
|
...
etc.

Each of the query column and description column has unique strings like these. Suppose the CSV file is read via Pandas into a dataframe df. The goal is to replace the \<> type elements such as \<mach-name> etc. based on specific conditions.

These replacements need to be made by replacing the tags <> with the corresponding list elements.

mach_name = ["Drilling", "ABC", XYZ".... etc.]
state_name = ["New York", "London", "Delhi"... etc.]

Example : if(\<mach-name>) appears in the Query and Description Columns for any row, replace the tags by corresponding elements in the mach_name list. So, e.g. if the mach_name list has 10 elements, more such sentences need to be appended to the dataframe df. The expected output would be like this:

| Query                                   | Description |
| --------                                | -------------- |
| What is the type of Drilling machine.   |  Drilling is ...        |
| What is the type of ABC machine.        |  ABC is ...        |
| What is the type of XYZ machine.        |  XYZ is ...      |
| What is the use of Drilling machine     |  The use of Drilling is ...        |
| What is the use of ABC machine          |  The use of ABC is ...       |
| What is the use of XYZ machine.         |  The use of XYZ is ...       |
| How long it takes to rain in New York   | It rains for ... hours in New York          |
| How long it takes to rain in London     | It rains for ... hours in London          |
| How long it takes to rain in Delhi      | It rains for ... hours in Delhi          |

| What is the best restaurant in New York | New York's best food is in ...         |
| What is the best restaurant in London   | London's best food is in ...         |
| What is the best restaurant in Delhi    |Delhi's best food is in ...         |
|

.... etc.

I was hoping to perform a simple Python replacement using str.replace() for instance, but it would potentially involve a for loop for iterating over the Pandas dataframe, and SO answers recommend not iterating over the dataframe but I couldn't find a clear way to replace values based on such conditions while also appending new rows based on the list elements. Any help/guidance is appreciated. Thanks.

Can it be assumed that there will only be one wildcard (\ <match-name> or \ <state-name>) per phrase? — Ángel Igualada
– Ángel Igualada, Commented Feb 11, 2021 at 13:18
Could you just clarify how you are selecting out the elements within the list? For instance, in your first example, based on what condition is it "Drilling", and not "XYZ"? — RDoc
– RDoc, Commented Feb 11, 2021 at 13:19
@RDoc the idea is that all elements in the corresponding list would be covered. So there would be as many new strings/sentences appended to the dataframe to make sure that each element like Drilling, XYZ etc. come in individual sentences. — JChat
– JChat, Commented Feb 11, 2021 at 14:12
@ÁngelIgualada yes, there will only be one wild card per phrase. — JChat
– JChat, Commented Feb 11, 2021 at 14:12
So it doesn't matter which order they come in? You just want each of them to appear once? — RDoc
– RDoc, Commented Feb 11, 2021 at 14:13

Ángel Igualada · Accepted Answer · 2021-02-11 18:59:18Z

1

This will be easier if you read the raw csv, process it and then convert the result to pandas dataframe, but if you need to read the dataframe before, this could be an option:

data=[ {"query": "What is the type of \<mach-name> machine>", "description": "\<mach-name> is ..."},
      {"query": "What is the use of \<mach-name> machine>", "description": "The use of \<mach-name> is ..."},
      {"query": "How long it takes to rain in \<state-name>", "description": "It rains for ... hours in \<state-name>"}]
      
df = pd.DataFrame(data)

#mark rows that should that satisfy the conditions
df["replace_mach"] = df['query'].str.contains('\<mach-name>') &\ 
                     df['description'].str.contains('\<mach-name>')
df["replace_state"] = df['query'].str.contains('\<state-name>') &\ 
                      df['description'].str.contains('\<state-name>')


dfs_list = []
mach_name = ["Drilling", "ABC", "XYZ"]
state_name = ["New York", "London", "Delhi"]


for n in mach_name:
    aux = df[df["replace_mach"]].copy()
    aux["query"] = aux["query"].str.replace(r"\\<mach-name>",n)
    aux["description"] = aux["description"].str.replace(r"\\<mach-name>",n)
    dfs_list.append(aux)
    
for n in state_name:
    aux = df[df["replace_state"]].copy()
    aux["query"] = aux["query"].str.replace(r"\\<state-name>",n)
    aux["description"] = aux["description"].str.replace(r"\\<state-name>",n)
    dfs_list.append(aux)
    
# add records without wild cards to dataframe
dfs_list.append(df[~((df["replace_mach"])|(df["replace_state"]))]

replaced_df = pd.concat(dfs_list)
replaced_df

edited Feb 11, 2021 at 18:59

answered Feb 11, 2021 at 14:33

Ángel Igualada

8919 silver badges15 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

JChat Over a year ago

Thanks for the very helpful and easy to understand answer @Angel. I notice that in case there are sentences wherein their is no wildcards in the original dataframe, they are not present in the replaced_df. Is there a way to make sure that they are also present in the new dataframe? Thanks!

Ángel Igualada Over a year ago

I have added one line to solve this. Check if works correctly.

JChat Over a year ago

thank you! I think you have missed out a closing bracket ) on that line, and on adding it it worked perfectly. This worked - dfs_list.append(df[~((df["replace_mach"])|(df["replace_state"]))]) Thanks again!

Collectives™ on Stack Overflow

How to replace elements in Pandas Dataframes with list values based on specific conditions?

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related