
I have a PySpark DataFrame with a column of JSON strings. It looks like this:

+-------------------------------------------------------------+
|col                                                          |
+-------------------------------------------------------------+
|{"fields":{"list1":[{"list2":[{"list3":[{"type":false}]}]}]}}|
+-------------------------------------------------------------+

I wrote UDFs to parse the JSON, count the entries whose type matches phone, and return the count in a new column of the DataFrame:

def item_count(json,type):
    count=0
    for i in json.get("fields",{}).get("list1",[]):
        for j in i.get("list2",[]):
            for k in j.get("list3",[]):
                count+=k.get("type",None)==type
    return count

def item_phone_count(json):
    return item_count(json,False)

import json
from pyspark.sql import functions as F, types as t

df2 = df.withColumn(
    'item_phone_count',
    F.udf(lambda j: item_phone_count(json.loads(j)), t.StringType())('col'),
)

But I got the error:

AttributeError: 'NoneType' object has no attribute 'get'

Any idea what's wrong?

  • It looks like one of your variables in item_count() is None, but there is no way to figure out which one from the information you've posted. Please post the full error traceback and a minimal reproducible example with enough information so that someone else can reproduce your error. Commented Dec 11, 2020 at 23:04
  • @craig you mean that one of i, j, k is None? Commented Dec 11, 2020 at 23:33
  • That is a possible cause of the error that you are seeing. Try printing them in the loop to see if one of them is None. Commented Dec 13, 2020 at 19:01
  • @Craig how can I print it since I am calling the udf from the pyspark dataframe? Commented Dec 13, 2020 at 19:42
  • @kihhfeue try to get a few entries from your dataframe and put them into the function manually and see what happens Commented Dec 13, 2020 at 19:53

1 Answer

Check for None and skip those entries:

def item_count(json,type):
    count = 0
    if (json is None) or (json.get("fields",{}) is None):
        return count  
   
    for i in json.get("fields",{}).get("list1",[]):
        if i is None:
            continue
        for j in i.get("list2",[]):
            if j is None:
                continue 
            for k in j.get("list3",[]):
                if k is None:
                    continue 
                count += k.get("type",None) == type
    return count
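
The checks above do not cover a key such as "list1" that is present but null, because .get("list1", []) then returns None rather than the default. A minimal sketch of a variant that handles this, assuming every non-null level is a JSON object or array as in the sample row, is below. The helper names item_phone_count_from_string and add_item_phone_count are hypothetical, and IntegerType is swapped in for the StringType used in the question because the function returns an int.

```python
import json


def item_count(doc, wanted):
    """Count list3 entries whose "type" equals `wanted`, tolerating nulls."""
    count = 0
    # `x or {}` / `x or []` covers both a missing key and an explicit JSON null.
    fields = (doc or {}).get("fields") or {}
    for i in fields.get("list1") or []:
        for j in (i or {}).get("list2") or []:
            for k in (j or {}).get("list3") or []:
                if k is not None and k.get("type") == wanted:
                    count += 1
    return count


def item_phone_count_from_string(raw):
    # Hypothetical helper: a null cell reaches the UDF as None,
    # and json.loads(None) would raise TypeError.
    if raw is None:
        return 0
    return item_count(json.loads(raw), False)


def add_item_phone_count(df):
    # Hypothetical wrapper; pyspark is imported here so the pure-Python
    # functions above can be tested without a Spark installation.
    from pyspark.sql import functions as F, types as T

    count_udf = F.udf(item_phone_count_from_string, T.IntegerType())
    return df.withColumn("item_phone_count", count_udf("col"))
```

Calling item_phone_count_from_string on a few raw strings pulled from the DataFrame is also a quick way to debug the logic outside Spark, as suggested in the comments on the question.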

2 Comments

The error is gone now, but I still get counts of 0 even though I checked the original JSON and there are definitely values that match the condition. Does json.loads change the format of type?
I made some edits to the question. The type value is false.
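
On the json.loads question: a JSON false is parsed into the Python bool False, so the comparison with False should match for the sample row. A small check, using only the sample string from the question, is sketched below. The quoted "false" case is an assumption about what the real data might contain, not something shown in the question.

```python
import json

sample = '{"fields":{"list1":[{"list2":[{"list3":[{"type":false}]}]}]}}'
parsed = json.loads(sample)
value = parsed["fields"]["list1"][0]["list2"][0]["list3"][0]["type"]

# JSON false becomes the Python bool False, so this comparison matches.
print(type(value), value == False)

# Assumed alternative: a quoted "false" stays a string and does not match.
quoted = json.loads('{"type": "false"}')["type"]
print(type(quoted), quoted == False)
```

If the real rows store the value as a string, or nest it under different keys than the sample, the count would stay at 0 even though the parsing itself is working.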
