
I have a DStream of JSON messages of the form {"UserID": "Xxxx", "Count": 000}. What is the best way to parse it so that I can create a data frame?

What's the difference between 1 and 2 in this case:

  1. parsed = kafkaStream.map(lambda x: json.loads(x))
  2. parsed = kafkaStream.map(lambda x: json.loads(x[1]))
Have you tried them? Commented Sep 30, 2015 at 20:47

2 Answers


This is a Kafka-stream-specific question. The Kafka DStream gives you a pair RDD, whose elements are two-element tuples (key, value). This is why you have to pick the second element to retrieve the value. I would write:

parsed = kafkaStream.map(lambda (key, value): json.loads(value))

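In Python, _ is the recommended name for an unused variable, but in this case I'd use key as a reminder that the lambda receives an element of a pair RDD. Note that tuple parameters such as lambda (key, value) are Python 2 syntax; on Python 3, index the pair instead, for example lambda kv: json.loads(kv[1]).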
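
A minimal sketch of that extraction which can be tried without Spark or Kafka. The list `records` and the function `parse_value` are hypothetical names introduced for illustration, the keys are assumed to be None, and the built-in map stands in for kafkaStream.map; it is not the stream API itself.

```python
import json

# Hypothetical stand-in for the (key, value) pairs a Kafka DStream delivers.
# The keys are assumed to be None here, as if the producer set no key.
records = [
    (None, '{"UserID": "Xxxx", "Count": 0}'),
    (None, '{"UserID": "Yyyy", "Count": 3}'),
]

def parse_value(pair):
    """Unpack the pair inside the function, then parse only the value."""
    _key, value = pair
    return json.loads(value)

# The built-in map stands in for kafkaStream.map in this sketch.
parsed = list(map(parse_value, records))
print(parsed)
```

Unpacking inside a named function keeps the (key, value) naming from the lambda above while staying valid on both Python 2 and Python 3.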


When you call json.loads(x), the string (your message) is parsed into a dictionary. I'm not sure what you're trying to do with json.loads(x[1]), but if you want the value stored under the first key of the dictionary, use json.loads(x)["UserId"]. I'm not sure whether this is the part you haven't understood.

Example:

import json

raw = """{
    "UserId": "Xxx", 
    "Count": "0000"
}"""

print(type(raw))
print(raw)

parsed = json.loads(raw)

print(type(parsed))
print(parsed)

parsed_partial = json.loads(raw)["UserId"]

print(type(parsed_partial))
print(parsed_partial)

Output:

<class 'str'>
{
    "UserId": "Xxx", 
    "Count": "0000"
}
<class 'dict'>
{'UserId': 'Xxx', 'Count': '0000'}
<class 'str'>
Xxx

For understanding map() read this.
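
To connect this back to the two variants in the question, here is a small sketch of how they behave when the element is a (key, value) tuple rather than a plain string. The tuple `x` is a hypothetical stand-in for one stream element; whether a given stream actually delivers tuples is an assumption taken from the other answer.

```python
import json

# Hypothetical (key, value) pair, assuming the stream delivers tuples.
x = (None, '{"UserId": "Xxx", "Count": "0000"}')

try:
    json.loads(x)  # variant 1: the whole tuple is not a JSON string
    variant_1 = "parsed"
except TypeError:
    variant_1 = "TypeError"

# Variant 2: only the value, the second element, is parsed.
variant_2 = json.loads(x[1])

print(variant_1)
print(variant_2["UserId"])
```

Under that assumption, variant 1 fails because json.loads expects a string, not a tuple, while variant 2 returns the dictionary.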

1 Comment

Thank you, jlnabais. I think my question was not properly worded or tagged, since I am very new to Stack Overflow, Python, Kafka, and Spark Streaming. I wanted to understand how a JSON message consumed by Spark Streaming from Kafka as a DStream is parsed. In this case, will json.loads(x[0]) be the offset and json.loads(x[1]) be the message containing {"UserID": "Xxxx", "Count": 000}?
