0

I am reading in a CSV file with the general schema of

  ,abv,ibu,id,name,style,brewery_id,ounces
14,0.061,60.0,1979,Bitter Bitch,American Pale Ale (APA),177,12.0
0 , 0.05,, 1436, Pub Beer, American Pale Lager, 408, 12.0

I am running into problems when fields are missing, such as in row 0, which lacks an IBU. I would like to insert a default value instead: 0.0 for fields that should be floats and an empty string for fields that should be strings.

My code is along the lines of

import csv
import numpy as np

def dataset(path, filter_field, filter_value):
  with open(path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    if filter_field:
      for row in filter(lambda row: row[filter_field]==filter_value, reader):
        yield row

def main(path):
  data = [(row["ibu"], float(row["ibu"])) for row in dataset(path, "style", "American Pale Lager")]

As of right now my code throws an error, since the "ibu" column is empty for row 0.

How should one go about solving this problem?

3 Answers

1

You can pass in a dictionary of default values that fills in missing fields, and also overwrite specific fields under certain conditions, such as when ibu is empty.

Here is your implementation, changed to do what you need. If I were you I would use pandas ...

import csv, copy

def dataset(path, filter_field, filter_value, default={'brewery_id':-1, 'style': 'unknown style', '   ': -1, 'name': 'unknown name', 'abv':0.0, 'id': -1, 'ounces':-1, 'ibu':0.0}):
    with open(path, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            # skip rows that do not match the filter (values may carry stray spaces)
            if row[filter_field].strip() != filter_value:
                continue
            # start from the defaults, then overlay the values read from the file
            default_row = copy.copy(default)
            default_row.update(row)
            # you might want to add conditions
            if default_row["ibu"] == "":
                default_row["ibu"] = default["ibu"]
            yield default_row

data = [(row["ibu"], float(row["ibu"])) for row in dataset('test.csv', "style", "American Pale Lager")]

print(data)

>> [(0.0, 0.0)]
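
For the general case the question asks about (0.0 for float fields, an empty string for text fields), the same idea can be sketched with only the standard library. This is a minimal sketch, not a definitive implementation: the names FLOAT_FIELDS and clean_row are hypothetical, the choice of which columns count as floats is an assumption based on the sample rows, and the file is simulated with an in-memory string.

```python
import csv
import io

# Assumption: these are the columns that should hold floats.
FLOAT_FIELDS = ("abv", "ibu", "ounces")

def clean_row(row, float_fields=FLOAT_FIELDS, float_default=0.0):
    """Strip whitespace from each value and convert float fields,
    substituting float_default when the field is blank or missing."""
    cleaned = {}
    for key, value in row.items():
        if key is None:
            # extra columns beyond the header are ignored in this sketch
            continue
        value = (value or "").strip()
        if key in float_fields:
            cleaned[key] = float(value) if value else float_default
        else:
            # text fields keep the (possibly empty) stripped string
            cleaned[key] = value
    return cleaned

def dataset(lines, filter_field=None, filter_value=None):
    """Yield cleaned rows, optionally keeping only those that match the filter."""
    for row in csv.DictReader(lines):
        row = clean_row(row)
        if filter_field is None or row[filter_field] == filter_value:
            yield row

# The two sample rows from the question, simulated as a file object.
sample = io.StringIO(
    "  ,abv,ibu,id,name,style,brewery_id,ounces\n"
    "14,0.061,60.0,1979,Bitter Bitch,American Pale Ale (APA),177,12.0\n"
    "0 , 0.05,, 1436, Pub Beer, American Pale Lager, 408, 12.0\n"
)
rows = list(dataset(sample, "style", "American Pale Lager"))
print([(row["name"], row["ibu"]) for row in rows])
```

Cleaning each row before filtering means the filter compares against stripped values, so the leading space in " American Pale Lager" no longer matters.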

1

Why not use pandas?

import pandas as pd

df = pd.read_csv(data_file)

The following is the result, with the missing ibu value read as NaN:

In [13]: df
Out[13]:
   Unnamed: 0    abv   ibu    id          name                    style  \
0          14  0.061  60.0  1979  Bitter Bitch  American Pale Ale (APA)
1           0  0.050   NaN  1436      Pub Beer      American Pale Lager

   brewery_id  ounces
0         177    12.0
1         408    12.0
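
To get from the NaN above to the defaults the question asks for, one option is fillna with a per-column dictionary. The sketch below is one possible approach under stated assumptions: the file is simulated with an in-memory string, skipinitialspace=True is used to drop the spaces after the commas, and only the ibu column is given a default.

```python
import io

import pandas as pd

# The two sample rows from the question, simulated as a file object.
text = (
    "  ,abv,ibu,id,name,style,brewery_id,ounces\n"
    "14,0.061,60.0,1979,Bitter Bitch,American Pale Ale (APA),177,12.0\n"
    "0 , 0.05,, 1436, Pub Beer, American Pale Lager, 408, 12.0\n"
)

# skipinitialspace drops the blanks that follow each delimiter
df = pd.read_csv(io.StringIO(text), skipinitialspace=True)

# per-column defaults: only columns named in the dict are filled
df = df.fillna({"ibu": 0.0})

lagers = df[df["style"] == "American Pale Lager"]
print(lagers[["name", "ibu"]])
```

Text columns can be given an empty-string default the same way, by adding entries such as "name": "" to the dictionary.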


1

Simulating your file with a text string:

In [48]: txt=b"""  ,abv,ibu,id,name,style,brewery_id,ounces
    ...: 14,0.061,60.0,1979,Bitter Bitch,American Pale Ale (APA),177,12.0
    ...: 0 , 0.05,, 1436, Pub Beer, American Pale Lager, 408, 12.0
    ...: """

I can load it with numpy genfromtxt.

In [49]: data=np.genfromtxt(txt.splitlines(),delimiter=',',dtype=None,skip_header=1,filling_values=0)

In [50]: data
Out[50]: 
array([ (14,  0.061,  60., 1979, b'Bitter Bitch', b'American Pale Ale (APA)', 177,  12.),
       ( 0,  0.05 ,   0., 1436, b' Pub Beer', b' American Pale Lager', 408,  12.)], 
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<i4'), ('f4', 'S12'), ('f5', 'S23'), ('f6', '<i4'), ('f7', '<f8')])

I had to skip the header line because it is incomplete (the first field is blank). The result is a structured array: a mix of ints, floats and strings (bytestrings in Py3). Note that filling_values=0 has replaced the missing ibu in the second row with 0.

After correcting the header line, and using names=True, I get

array([ (14,  0.061,  60., 1979, b'Bitter Bitch', b'American Pale Ale (APA)', 177,  12.),
       ( 0,  0.05 ,   0., 1436, b' Pub Beer', b' American Pale Lager', 408,  12.)], 
      dtype=[('f0', '<i4'), ('abv', '<f8'), ('ibu', '<f8'), ('id', '<i4'), ('name', 'S12'), ('style', 'S23'), ('brewery_id', '<i4'), ('ounces', '<f8')])

genfromtxt is the most powerful CSV reader in numpy. See its docs for more parameters. The pandas reader is faster and more flexible, but of course it produces a data frame, not an array.
