
I have a list inside a pandas dataframe and I want to filter it. For example, I have a dataframe like this:

{
    "examples": [
        {
            "website": "info",
            "df": [
                {
                    "Question": "What?",
                    "Answers": []
                },
                {
                    "Question": "how?",
                    "Answers": []
                },
                {
                    "Question": "Why?",
                    "Answers": []
                }
            ],
            "whitelisted_url": true,
            "exResponse": {
                "pb_sentence": "",
                "solution_sentence": "",
                "why_sentence": ""
            }
        },
        {
            "website": "info2",
            "df": [
                {
                    "Question": "What?",
                    "Answers": ["example answer1"]
                },
                {
                    "Question": "how?",
                    "Answers": ["example answer1"]
                },
                {
                    "Question": "Why?",
                    "Answers": []
                }
            ],
            "whitelisted_url": true,
            "exResponse": {
                "pb_sentence": ""
            }
        }
    ]
}

My filter function:

import numpy as np
import pandas as pd

def filter(data, name):
    resp = pd.concat([pd.DataFrame(data),
                      pd.json_normalize(data['examples'])],
                     axis=1)

    resp = pd.concat([pd.DataFrame(resp),
                      pd.json_normalize(resp['df'])],
                     axis=1)

    resp['exResponse.pb_sentence'].replace(
        '', np.nan, inplace=True)
    resp.dropna(
        subset=['exResponse.pb_sentence'], inplace=True)

    resp.drop(resp[resp['df.Answers'].apply(len) == 0].index, inplace=True)

I want to remove the rows with empty 'Answers' lists from this dataframe. I have already filtered out the empty problem-summary ('pb_sentence') values using the following code:

    resp['exResponse.pb_sentence'].replace(
        '', np.nan, inplace=True)
    resp.dropna(
        subset=['exResponse.pb_sentence'], inplace=True)

How can I do the same for the 'answers' elements?

I don't actually expect a specific output. The following part of my code throws the error "AttributeError: 'list' object has no attribute 'keys'". I think this is caused by the empty Answers arrays, so I want to remove them.

    resp.rename(
        columns={0: 'Challenge', 1: 'Solution', 2: 'Importance'}, inplace=True)
    # challenge deserializing
    resp = pd.concat([pd.DataFrame(df_resp),
                         pd.json_normalize(resp['Challenge'])],
                        axis=1)
    resp = pd.concat([pd.DataFrame(resp),
                         pd.json_normalize(resp['Answers'])],
                        axis=1)

Error line:

     29 resp = pd.concat([pd.DataFrame(resp),
---> 30                      pd.json_normalize(resp['Answers'])],
     31                     axis=1)
  • Can you post your expected output? Commented Jan 1, 2023 at 17:39
  • I've updated my question to respond to your request. Thanks @Psidom Commented Jan 1, 2023 at 18:45

1 Answer

If I understand correctly, with the sample data you will end up with an empty dataframe?

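import numpy as np
import pandas as pd

# One row per entry of "df"; the parent fields are repeated as meta columns.
df = pd.json_normalize(
    data=data["examples"],
    meta=["website", "whitelisted_url", "exResponse"],
    record_path=["df"]
)
# Expand the exResponse dicts into their own columns.
df = df.join(pd.DataFrame(df.pop("exResponse").tolist()))
# Keep only the rows whose Answers list is not empty.
df = df[df["Answers"].map(lambda d: len(d)) > 0]
# Treat empty strings as missing, then drop rows without a pb_sentence.
df = df.replace("", np.nan).dropna(subset=["pb_sentence"], how="all")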
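
To see the same filtering on data where something survives, here is a minimal, self-contained sketch. The `data` dict is a made-up variation of the question's sample (the `pb_sentence` of `info2` is filled in with a hypothetical value so the result is not empty), and `flatten_and_filter` is a hypothetical helper name, not something from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's data; the pb_sentence of
# "info2" is filled in here so that one row survives both filters.
data = {
    "examples": [
        {
            "website": "info",
            "df": [
                {"Question": "What?", "Answers": []},
                {"Question": "how?", "Answers": []},
            ],
            "whitelisted_url": True,
            "exResponse": {"pb_sentence": ""},
        },
        {
            "website": "info2",
            "df": [
                {"Question": "What?", "Answers": ["example answer1"]},
                {"Question": "Why?", "Answers": []},
            ],
            "whitelisted_url": True,
            "exResponse": {"pb_sentence": "some problem"},
        },
    ]
}


def flatten_and_filter(data):
    # One row per entry of "df", with the parent fields repeated alongside.
    df = pd.json_normalize(
        data=data["examples"],
        record_path=["df"],
        meta=["website", "whitelisted_url", "exResponse"],
    )
    # Expand the exResponse dicts into ordinary columns.
    df = df.join(pd.DataFrame(df.pop("exResponse").tolist()))
    # Drop rows whose Answers list is empty.
    df = df[df["Answers"].map(len) > 0]
    # Treat an empty pb_sentence as missing and drop those rows.
    df["pb_sentence"] = df["pb_sentence"].replace("", np.nan)
    df = df.dropna(subset=["pb_sentence"])
    return df.reset_index(drop=True)


result = flatten_and_filter(data)
print(result[["website", "Question", "Answers"]])
```

Only the `info2` / `What?` row should remain: every other row has either an empty `Answers` list or an empty `pb_sentence`.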

1 Comment

I edited my question and explained why I want to remove empty Answers arrays. I would be glad if you could review it again. Thanks.
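
Regarding the `AttributeError` in the edited question: `pd.json_normalize` is documented as taking a dict or a list of dicts, while `Answers` holds plain lists of strings, so the error may come from that rather than from the empty lists alone. One possible way around it, sketched below under that assumption, is to skip `json_normalize` for this column and use `DataFrame.explode` after dropping the empty lists. The small frame here is hypothetical and only mimics the flattened shape.

```python
import pandas as pd

# Hypothetical frame shaped like the flattened question data.
df = pd.DataFrame(
    {
        "Question": ["What?", "how?", "Why?"],
        "Answers": [["a1", "a2"], ["a3"], []],
    }
)

# Drop rows whose Answers list is empty, then give each answer its own row.
non_empty = df[df["Answers"].map(len) > 0]
exploded = non_empty.explode("Answers", ignore_index=True)
print(exploded)
```

Without the length filter, `explode` would keep the `Why?` row with a missing value in `Answers` instead of removing it.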
