Filtering empty elements in a nested list in pandas dataframe

Question

I have a list nested inside JSON data that I load into a pandas DataFrame, and I want to filter it. For example, my data looks like this:

{
    "examples": [
        {
            "website": "info",
            "df": [
                {
                    "Question": "What?",
                    "Answers": []
                },
                {
                    "Question": "how?",
                    "Answers": []
                },
                {
                    "Question": "Why?",
                    "Answers": []
                }
            ],
            "whitelisted_url": true,
            "exResponse": {
                "pb_sentence": "",
                "solution_sentence": "",
                "why_sentence": ""
            }
        },
        {
            "website": "info2",
            "df": [
                {
                    "Question": "What?",
                    "Answers": ["example answer1"]
                },
                {
                    "Question": "how?",
                    "Answers": ["example answer1"]
                },
                {
                    "Question": "Why?",
                    "Answers": []
                }
            ],
            "whitelisted_url": true,
            "exResponse": {
                "pb_sentence": ""
            }
        }
    ]
}

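My filter function:

import numpy as np
import pandas as pd


def filter(data, name):
    # Flatten the top-level "examples" records into columns
    resp = pd.concat([pd.DataFrame(data),
                      pd.json_normalize(data['examples'])],
                     axis=1)

    # Flatten the nested "df" lists
    resp = pd.concat([pd.DataFrame(resp),
                      pd.json_normalize(resp['df'])],
                     axis=1)

    # Treat empty pb_sentence strings as missing, then drop those rows
    # (assign back instead of calling replace(inplace=True) on a column)
    resp['exResponse.pb_sentence'] = resp['exResponse.pb_sentence'].replace('', np.nan)
    resp.dropna(subset=['exResponse.pb_sentence'], inplace=True)

    # Attempt at dropping rows whose Answers list is empty
    resp.drop(resp[resp['df.Answers'].apply(len) == 0].index, inplace=True)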

I want to remove the rows with empty 'Answers' lists from this DataFrame. I have already filtered out the empty 'pb_sentence' values using the following code:

    resp['exResponse.pb_sentence'] = resp['exResponse.pb_sentence'].replace('', np.nan)
    resp.dropna(subset=['exResponse.pb_sentence'], inplace=True)

How can I do the same for the 'Answers' elements?
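A minimal sketch, assuming the Answers values end up in a single column of Python lists (the toy frame below is illustrative, not the question's real output). An empty list is a valid object rather than a missing value, so dropna will not remove it; the list-based analogue of the replace/dropna pattern is a boolean mask built from each list's length.

```python
import pandas as pd

# Illustrative frame: one row per question, "Answers" holds a Python list per row.
df = pd.DataFrame(
    {
        "website": ["info", "info2", "info2", "info2"],
        "Question": ["What?", "What?", "how?", "Why?"],
        "Answers": [[], ["example answer1"], ["example answer1"], []],
    }
)

# Empty lists are not NaN, so dropna() would keep them.
# Build a boolean mask from each list's length instead.
non_empty = df["Answers"].map(len) > 0
filtered = df[non_empty].reset_index(drop=True)

print(filtered)
```

The same mask works on any object column holding lists; Series.map(bool) would give an equivalent mask, since an empty list is falsy.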

I don't need a specific output. The following part of my code throws the error "AttributeError: 'list' object has no attribute 'keys'". I think this is caused by the empty Answers arrays, so I want to remove them.

    resp.rename(
        columns={0: 'Challenge', 1: 'Solution', 2: 'Importance'}, inplace=True)
    # challenge deserializing
    resp = pd.concat([pd.DataFrame(df_resp),
                      pd.json_normalize(resp['Challenge'])],
                     axis=1)
    resp = pd.concat([pd.DataFrame(resp),
                      pd.json_normalize(resp['Answers'])],
                     axis=1)

The traceback points at this line:

     29 resp = pd.concat([pd.DataFrame(resp),
---> 30                      pd.json_normalize(resp['Answers'])],
     31                     axis=1)
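One possible way around this, sketched under the assumption that each Answers value is a list of plain strings as in the sample JSON: pd.json_normalize is meant for dict-like records, and a list of strings has no keys to flatten, which may be what trips the second json_normalize call. DataFrame.explode is swapped in here instead; it emits one row per list element and turns an empty list into NaN, which dropna can then remove. The frame and the "example answer2" value are illustrative.

```python
import pandas as pd

# Illustrative frame shaped like the data after the Challenge step:
# each "Answers" cell is a list of strings, possibly empty.
resp = pd.DataFrame(
    {
        "Question": ["What?", "how?", "Why?"],
        "Answers": [["example answer1"], ["example answer1", "example answer2"], []],
    }
)

# explode() gives one row per answer; an empty list becomes a single NaN row.
exploded = resp.explode("Answers")

# Drop the rows that came from empty lists and renumber the index.
answers = exploded.dropna(subset=["Answers"]).reset_index(drop=True)

print(answers)
```

This avoids filtering the lists beforehand, since the empty ones disappear in the dropna step.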

I've updated my question to respond to your request. Thanks @Psidom (Serkan Gün, commented Jan 1, 2023 at 18:45)

Jason Baker · Accepted Answer · 2023-01-01 17:31:39Z


If I'm understanding correctly, won't you end up with an empty DataFrame based on the sample data?

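import numpy as np
import pandas as pd

# data is the dict parsed from the question's JSON
df = pd.json_normalize(
    data=data["examples"],
    meta=["website", "whitelisted_url", "exResponse"],
    record_path=["df"]
)
# Expand the exResponse dicts into their own columns
df = df.join(pd.DataFrame(df.pop("exResponse").tolist()))
# Keep only rows whose Answers list is non-empty
df = df[df["Answers"].map(len) > 0]
# Treat empty pb_sentence strings as missing and drop those rows
df = df.replace("", np.nan).dropna(subset=["pb_sentence"], how="all")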
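To see why the result is empty, here is a sketch that runs the same pipeline on the question's sample data (converted to a Python dict, so JSON true becomes True) and checks the intermediate step. The names with_answers and result are illustrative.

```python
import numpy as np
import pandas as pd

# The question's sample data as a Python dict.
data = {
    "examples": [
        {
            "website": "info",
            "df": [
                {"Question": "What?", "Answers": []},
                {"Question": "how?", "Answers": []},
                {"Question": "Why?", "Answers": []},
            ],
            "whitelisted_url": True,
            "exResponse": {"pb_sentence": "", "solution_sentence": "", "why_sentence": ""},
        },
        {
            "website": "info2",
            "df": [
                {"Question": "What?", "Answers": ["example answer1"]},
                {"Question": "how?", "Answers": ["example answer1"]},
                {"Question": "Why?", "Answers": []},
            ],
            "whitelisted_url": True,
            "exResponse": {"pb_sentence": ""},
        },
    ]
}

# One row per question, with the parent fields repeated on each row.
df = pd.json_normalize(
    data=data["examples"],
    meta=["website", "whitelisted_url", "exResponse"],
    record_path=["df"],
)
df = df.join(pd.DataFrame(df.pop("exResponse").tolist()))

# Step 1: only the two info2 rows have non-empty Answers.
with_answers = df[df["Answers"].map(len) > 0]
print(len(with_answers))  # → 2

# Step 2: every pb_sentence in the sample is "", so both rows are dropped.
result = with_answers.replace("", np.nan).dropna(subset=["pb_sentence"], how="all")
print(result.empty)  # → True
```

With real data that has non-empty pb_sentence values, the same two steps keep only rows that have both answers and a problem sentence.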


Comment from Serkan Gün:

I edited my question and explained why I want to remove empty Answers arrays. I would be glad if you could review it again. Thanks.
