
I'm trying to extract some information about restaurants from a JSON data set. Here are 2 samples, one a restaurant and one not:

{"business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "full_address": "4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018", "hours": {"Tuesday": {"close": "17:00", "open": "08:00"}, "Friday": {"close": "17:00", "open": "08:00"}, "Monday": {"close": "17:00", "open": "08:00"}, "Wednesday": {"close": "17:00", "open": "08:00"}, "Thursday": {"close": "17:00", "open": "08:00"}}, "open": true, "categories": ["Doctors", "Health & Medical"], "city": "Phoenix", "review_count": 9, "name": "Eric Goldberg, MD", "neighborhoods": [], "longitude": -111.98375799999999, "state": "AZ", "stars": 3.5, "latitude": 33.499313000000001, "attributes": {"By Appointment Only": true}, "type": "business"}
{"business_id": "mVHrayjG3uZ_RLHkLj-AMg", "full_address": "414 Hawkins Ave\nBraddock, PA 15104", "hours": {"Tuesday": {"close": "19:00", "open": "10:00"}, "Friday": {"close": "20:00", "open": "10:00"}, "Saturday": {"close": "16:00", "open": "10:00"}, "Thursday": {"close": "19:00", "open": "10:00"}, "Wednesday": {"close": "19:00", "open": "10:00"}}, "open": true, "categories": ["Bars", "American (New)", "Nightlife", "Lounges", "Restaurants"], "city": "Braddock", "review_count": 11, "name": "Emil's Lounge", "neighborhoods": [], "longitude": -79.866350699999998, "state": "PA", "stars": 4.5, "latitude": 40.408735, "attributes": {"Alcohol": "full_bar", "Noise Level": "average", "Has TV": true, "Attire": "casual", "Ambience": {"romantic": false, "intimate": false, "classy": false, "hipster": false, "divey": false, "touristy": false, "trendy": false, "upscale": false, "casual": false}, "Good for Kids": true, "Price Range": 1, "Good For Dancing": false, "Delivery": false, "Coat Check": false, "Smoking": "no", "Accepts Credit Cards": true, "Take-out": true, "Happy Hour": false, "Outdoor Seating": false, "Takes Reservations": false, "Waiter Service": true, "Wi-Fi": "no", "Caters": true, "Good For": {"dessert": false, "latenight": false, "lunch": false, "dinner": false, "breakfast": false, "brunch": false}, "Parking": {"garage": false, "street": false, "validated": false, "lot": false, "valet": false}, "Music": {"dj": false}, "Good For Groups": true}, "type": "business"}

When I run the code below, it prints both records, even though the category "Restaurants" doesn't exist in the first one. Can anyone explain why, please?

for line in f:
    jd = json.loads(line)
    if jd['categories'] == 'Food' or 'Restaurants':
        print (jd['name'], jd['business_id'], jd['latitude'], jd['longitude'])

Here's the JSON data in a more readable format:

{
    "business_id": "vcNAWiLM4dR7D2nwwJ7nCA", 
    "full_address": "4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018", 
    "hours": {
        "Thursday": {
            "close": "17:00", 
            "open": "08:00"
        }, 
        "Tuesday": {
            "close": "17:00", 
            "open": "08:00"
        }, 
        "Friday": {
            "close": "17:00", 
            "open": "08:00"
        }, 
        "Wednesday": {
            "close": "17:00", 
            "open": "08:00"
        }, 
        "Monday": {
            "close": "17:00", 
            "open": "08:00"
        }
    }, 
    "open": true, 
    "categories": [
        "Doctors", 
        "Health & Medical"
    ], 
    "city": "Phoenix", 
    "review_count": 9, 
    "name": "Eric Goldberg, MD", 
    "neighborhoods": [], 
    "longitude": -111.98375799999999, 
    "state": "AZ", 
    "stars": 3.5, 
    "latitude": 33.499313000000001, 
    "attributes": {
        "By Appointment Only": true
    }, 
    "type": "business"
}
{
    "business_id": "mVHrayjG3uZ_RLHkLj-AMg", 
    "full_address": "414 Hawkins Ave\nBraddock, PA 15104", 
    "hours": {
        "Tuesday": {
            "close": "19:00", 
            "open": "10:00"
        }, 
        "Friday": {
            "close": "20:00", 
            "open": "10:00"
        }, 
        "Saturday": {
            "close": "16:00", 
            "open": "10:00"
        }, 
        "Thursday": {
            "close": "19:00", 
            "open": "10:00"
        }, 
        "Wednesday": {
            "close": "19:00", 
            "open": "10:00"
        }
    }, 
    "open": true, 
    "categories": [
        "Bars", 
        "American (New)", 
        "Nightlife", 
        "Lounges", 
        "Restaurants"
    ], 
    "city": "Braddock", 
    "review_count": 11, 
    "name": "Emil's Lounge", 
    "neighborhoods": [], 
    "longitude": -79.866350699999998, 
    "state": "PA", 
    "stars": 4.5, 
    "latitude": 40.408735, 
    "attributes": {
        "Alcohol": "full_bar", 
        "Noise Level": "average", 
        "Music": {
            "dj": false
        }, 
        "Attire": "casual", 
        "Ambience": {
            "touristy": false, 
            "hipster": false, 
            "romantic": false, 
            "divey": false, 
            "intimate": false, 
            "trendy": false, 
            "upscale": false, 
            "classy": false, 
            "casual": false
        }, 
        "Good for Kids": true, 
        "Price Range": 1, 
        "Good For Dancing": false, 
        "Delivery": false, 
        "Coat Check": false, 
        "Smoking": "no", 
        "Accepts Credit Cards": true, 
        "Take-out": true, 
        "Happy Hour": false, 
        "Outdoor Seating": false, 
        "Takes Reservations": false, 
        "Waiter Service": true, 
        "Wi-Fi": "no", 
        "Caters": true, 
        "Good For": {
            "dessert": false, 
            "latenight": false, 
            "lunch": false, 
            "dinner": false, 
            "brunch": false, 
            "breakfast": false
        }, 
        "Parking": {
            "garage": false, 
            "street": false, 
            "validated": false, 
            "lot": false, 
            "valet": false
        }, 
        "Has TV": true, 
        "Good For Groups": true
    }, 
    "type": "business"
}
  • As bruno desthuilliers & I mentioned in our answers, it's painful reading that JSON data. Next time, please post your data in a more readable form, preferably with irrelevant fields removed, so potential answerers can focus on your actual problem. For the benefit of future readers I'll add a formatted version of the data to this question, created using json.dumps(jd, indent=4), but please check it to make sure I haven't inadvertently introduced any errors. Commented Jun 1, 2015 at 11:44
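For reference, the formatting step mentioned in the comment above can be reproduced with the standard json module. A minimal sketch, using a trimmed-down copy of the first record (only three of its fields kept, to keep it short):

```python
import json

# A trimmed copy of the first sample record (most fields dropped for brevity)
line = ('{"business_id": "vcNAWiLM4dR7D2nwwJ7nCA", '
        '"categories": ["Doctors", "Health & Medical"], '
        '"name": "Eric Goldberg, MD"}')

jd = json.loads(line)

# indent=4 puts each key and each list item on its own line, nested 4 spaces per level
pretty = json.dumps(jd, indent=4)
print(pretty)
```

Passing sort_keys=True as well would sort the keys alphabetically, which can make records easier to compare by eye.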

3 Answers


This:

if jd['categories'] == 'Food' or 'Restaurants':

is parsed as:

if (jd['categories'] == 'Food') or 'Restaurants':

Since 'Restaurants' is a non-empty string, it always has a true value in a boolean context, so your test is really:

if (jd['categories'] == 'Food') or True:

which is an obvious tautology.
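This is easy to confirm in the interpreter. The sketch below uses the categories list from the first sample record; note that the expression doesn't even produce a boolean: or returns its right-hand operand, the non-empty string 'Restaurants', which the if statement then treats as true.

```python
# The 'categories' value of the first (non-restaurant) sample record
categories = ["Doctors", "Health & Medical"]

# `==` binds tighter than `or`, so this is (categories == 'Food') or 'Restaurants'
result = categories == 'Food' or 'Restaurants'

print(repr(result))  # 'Restaurants' (the right-hand operand of `or`)
print(bool(result))  # True, because any non-empty string is truthy
```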

You want:

if jd['categories'] == 'Food' or jd['categories'] == 'Restaurants':

or more simply:

if jd['categories'] in ('Food', 'Restaurants'):
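A small check that the two spellings agree, assuming the value being tested is a single string (the helper name is_one_of_wanted is purely illustrative, not part of the original code):

```python
def is_one_of_wanted(category):
    """Hypothetical helper; assumes `category` is a single string, not a list."""
    return category in ('Food', 'Restaurants')


for value in ['Food', 'Restaurants', 'Doctors']:
    long_form = value == 'Food' or value == 'Restaurants'
    # Both spellings of the test give the same answer for a plain string
    assert long_form == is_one_of_wanted(value)
    print(value, is_one_of_wanted(value))
# Food True
# Restaurants True
# Doctors False
```

The tuple form is shorter and easier to extend when more categories are added.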

Now, in your case (BTW, please take the time to post a cleaned-up, simplified and formatted JSON snippet next time), jd['categories'] is a list, so you cannot compare it with a string (well, you can, but it will always evaluate to False), nor use the containment test above. You have to check whether jd['categories'] contains either 'Food' or 'Restaurants':

if 'Food' in jd['categories'] or 'Restaurants' in jd['categories']:
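Putting the three tests side by side on the two sample records (trimmed to the fields used here) shows why only the last one works when categories is a list:

```python
# The two sample records from the question, trimmed to the fields used here
records = [
    {"name": "Eric Goldberg, MD", "categories": ["Doctors", "Health & Medical"]},
    {"name": "Emil's Lounge",
     "categories": ["Bars", "American (New)", "Nightlife", "Lounges", "Restaurants"]},
]

matches = []
for jd in records:
    cats = jd['categories']
    # A list is never equal to a string, so this is always False
    list_equals_string = cats == 'Restaurants'
    # Asks "is this whole list one of these two strings?", also always False
    list_in_tuple = cats in ('Food', 'Restaurants')
    # Asks "is either string an item of the list?", which is what we want
    wanted = 'Food' in cats or 'Restaurants' in cats
    print(jd['name'], list_equals_string, list_in_tuple, wanted)
    if wanted:
        matches.append(jd['name'])

# Prints:
# Eric Goldberg, MD False False False
# Emil's Lounge False False True
```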

Comments

If I use if jd['categories'] == 'Food' or jd['categories'] == 'Restaurants': I get no output. Could it be an issue if there are multiple categories? I need to check whether it contains the Restaurants category, but it can also have others.
@Ali_bean see my edited answer. Your JSON snippet wasn't properly formatted, so I failed to spot that 'categories' was a list.
Thanks a lot, and apologies; I will be more careful with formatting next time.

It's not exactly easy to test this from the data in the OP, but you need to change your test to something like this:

# Get the category list from the current dict
cat = jd['categories']
if 'Food' in cat or 'Restaurants' in cat:
    print(jd['name'], jd['business_id'], jd['latitude'], jd['longitude'])
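If the data set is large, the same test can be wrapped in a generator so that only one line is held in memory at a time. A sketch under a few assumptions: one JSON object per line (as in the question), io.StringIO standing in for the real file f, and a set-based check (set.isdisjoint) swapped in for the chained or, which is easier to extend if more categories are added. The names WANTED and iter_matches are hypothetical.

```python
import io
import json

# Hypothetical set of categories to keep; add more as needed
WANTED = {'Food', 'Restaurants'}


def iter_matches(f, wanted=WANTED):
    """Yield (name, business_id, latitude, longitude) for every JSON line in f
    whose 'categories' list shares at least one entry with `wanted`."""
    for line in f:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        jd = json.loads(line)
        # isdisjoint() accepts any iterable; a missing 'categories' key counts as no categories
        if not wanted.isdisjoint(jd.get('categories', [])):
            yield jd['name'], jd['business_id'], jd['latitude'], jd['longitude']


# Two trimmed sample records from the question, one JSON object per line
sample = io.StringIO(
    '{"business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "name": "Eric Goldberg, MD", '
    '"categories": ["Doctors", "Health & Medical"], '
    '"latitude": 33.499313000000001, "longitude": -111.98375799999999}\n'
    '{"business_id": "mVHrayjG3uZ_RLHkLj-AMg", "name": "Emil\'s Lounge", '
    '"categories": ["Bars", "American (New)", "Nightlife", "Lounges", "Restaurants"], '
    '"latitude": 40.408735, "longitude": -79.866350699999998}\n'
)

results = list(iter_matches(sample))
for row in results:
    print(*row)
```

With the real data, you would pass the open file object instead of sample; because iter_matches is a generator, the file is read one line at a time rather than loaded all at once.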



Line 3 (the if test) doesn't seem to be written properly. Try:

for line in f:
    jd = json.loads(line)
    if jd['categories'] in ('Food', 'Restaurants'):
        print (jd['name'], jd['business_id'], jd['latitude'], jd['longitude'])

You may also want to consider encoding or escaping the strings returned by json.loads(), as that may make the string comparison more reliable.

Comments

I was doing it line by line to figure things out, as it's a 1.5 GB data set, but thanks!
You could also look at the pandas library (pandas.pydata.org).
