5

Here is my Python code:

import csv

# Reading
ordersFile = open('orders.csv', 'rb')
ordersR = csv.reader(ordersFile, delimiter=',')

# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in ordersR:
    if order[2] == '5' and order[13] == 'Brazil':
        print order
# Find order employeeID=5
print "Find order employeeID=5"
for order in ordersR:
    if order[2] == '5':
        print order
ordersFile.close()

The first loop (employeeID=5, shipCountry="Brazil") prints the matching rows, but the second loop (employeeID=5) prints nothing. How can I read (select) rows from the same CSV file more than once?

2
  • 1
    You cannot read the same open file twice. Either rewind it before reading for the second time, or close and open again. Better yet, count everything you need when you read the file for the first time. Commented Sep 6, 2017 at 20:12
  • 2
    Have you tried ordersFile.seek(0)? Commented Sep 6, 2017 at 20:18

6 Answers

10

Your first loop reads the CSV file all the way through, so the reader has nothing left to yield when the second loop starts. If you want to work on the data in multiple passes, read the contents into a variable first. Then you don't have to re-read the file every time you need to do something with it.

import csv

# Read order rows into our list
# Here I use a context manager so that the file is automatically
# closed upon exit
with open('orders.csv') as orders_file:
    reader = csv.reader(orders_file, delimiter=',')
    orders = list(reader)

# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in orders:
    if order[2] == '5' and order[13] == 'Brazil':
        print order

# Find order employeeID=5
print "Find order employeeID=5"
for order in orders:
    if order[2] == '5':
        print order

If your CSV file is too large to fit into memory (or you don't want to read it all into memory for some reason), then you'll need a different approach. If you need that, please leave a comment.
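One possible approach for a file that does not fit into memory is to wrap the read in a generator function that re-opens the file on every call, so each pass starts at the first row and only one row is held at a time. This is a minimal sketch, assuming Python 3; the helper name iter_orders, the temporary file, and the four-column sample rows (with shipCountry at index 3 rather than 13) are illustrative and not part of the original question.

```python
import csv
import os
import tempfile

def iter_orders(path):
    """Yield rows lazily; every call re-opens the file, so each pass starts at the first row."""
    with open(path, newline="") as orders_file:
        for row in csv.reader(orders_file):
            yield row

# Hypothetical sample rows: orderID, customerID, employeeID, shipCountry
sample = "10248,VINET,5,France\n10250,HANAR,4,Brazil\n10253,HANAR,5,Brazil\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(sample)

# Two independent passes over the file; neither builds a list of all rows.
brazil_count = sum(1 for row in iter_orders(tmp.name)
                   if row[2] == "5" and row[3] == "Brazil")
employee_count = sum(1 for row in iter_orders(tmp.name) if row[2] == "5")

os.remove(tmp.name)  # clean up the sample file
print(brazil_count, employee_count)
```

The trade-off is that the file is read from disk once per pass, so this costs I/O time in exchange for a small memory footprint.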


3 Comments

Isn't for row in reader:orders.extend(row) just orders=list(reader)?
@birryree Thanks, and could you share a different approach as well?
@user6142261 - would you say all your stuff can be done in one pass (if so, other answers have that covered)? Do you not want to read everything into memory? Are you open to using third-party libs? Do you have huge CSV files?
4

It's better to read through files once because I/O is likely to be the slowest part of your program.

If you need to re-read the file, you can either close it and re-open it, or seek() to the beginning, i.e. add ordersFile.seek(0) between your loops.
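The rewind option could look roughly like the following. This is a sketch assuming Python 3; the function name two_passes, the temporary file, and the four-column sample rows (with shipCountry at index 3 rather than 13) are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample rows: orderID, customerID, employeeID, shipCountry
SAMPLE = "10248,VINET,5,France\n10250,HANAR,4,Brazil\n10253,HANAR,5,Brazil\n"

def two_passes(path):
    """Scan the same open file twice, rewinding between the passes."""
    with open(path, newline="") as orders_file:
        reader = csv.reader(orders_file)
        brazil = [row for row in reader if row[2] == "5" and row[3] == "Brazil"]
        orders_file.seek(0)  # rewind; without this the second pass sees no rows
        employee = [row for row in reader if row[2] == "5"]
    return brazil, employee

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(SAMPLE)

brazil_rows, employee_rows = two_passes(tmp.name)
os.remove(tmp.name)  # clean up the sample file
print(brazil_rows)
print(employee_rows)
```

Note that seek() is called on the file object, not on the csv reader; the reader simply pulls lines from the file again once the file position is back at the start.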


3

You can simply convert the rows produced by the reader object into a list:

import csv

with open('orders.csv', 'rb') as ordersFile:
    ordersR = list(csv.reader(ordersFile, delimiter=','))

The reader object behaves like a generator: once you have iterated over its values, you cannot start a second loop to read them again.
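The exhaustion behaviour can be seen in a small experiment. This is a sketch assuming Python 3, with io.StringIO standing in for the real orders.csv and two made-up rows as data.

```python
import csv
import io

# In-memory stand-in for a CSV file (hypothetical data)
data = io.StringIO("1,a\n2,b\n")
reader = csv.reader(data)

first_pass = list(reader)   # consumes every row
second_pass = list(reader)  # the reader is exhausted, so nothing is left

print(first_pass)
print(second_pass)

# Materialising the rows once gives a list that can be looped over repeatedly
rows = list(csv.reader(io.StringIO("1,a\n2,b\n")))
loop_one = [row for row in rows]
loop_two = [row for row in rows]
```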


1

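If you do not want to build the list yourself, itertools.tee gives you two independent iterators over the same CSV reader. Note that tee buffers every row that one iterator has consumed and the other has not, so when the first loop finishes before the second starts (as below), all rows still end up in memory, and list() is usually the simpler choice: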

from csv import reader
from itertools import tee

with open('orders.csv', 'r') as file:
    rows0, rows1 = tee(reader(file, delimiter=','))

    for row in rows0:
        print(row)  # search for something...

    print()

    for row in rows1:
        print(row)  # search for a different thing...


0

This is a good case for using the pandas module (you need to install it: pip install pandas).

After that, you read the file just once and can perform any kind of filtering easily.

For instance, to filter the same data more than once, follow this example:

import pandas as pd 

# read csv into a dataframe 
df = pd.read_csv('orders.csv', delimiter=',') 

# get the data that has employeeID == 5
df1 = df[df["employeeID"] == 5]
print(df1) 

# get the data that has employeeID == 5 and shipCountry == "Brazil"
df2 = df[(df["employeeID"] == 5) & (df["shipCountry"] == "Brazil")]
print(df2) 


0

As @Nick T mentioned above, I/O is expensive compared to RAM access, so if you need to iterate over your file more than once, it is better to save its contents to a variable.

You also can combine multiple conditions in a single for loop, so it performs faster (single iteration):

import csv

with open('orders.csv', 'rb') as ordersFile:
    orders = list(csv.reader(ordersFile, delimiter=','))

# Collect both result sets in a single pass over the rows
emp = []      # every order with employeeID=5
country = []  # the subset of those orders shipped to Brazil
for order in orders:
    if order[2] == '5':
        emp.append(order)
        if order[13] == 'Brazil':
            country.append(order)

print 'emp id=5 and shippingcountry=Brazil: {}'.format(country)
print 'emp id=5: {}'.format(emp)

Note that this doesn't scale well: every additional condition adds another nested if to this block, which quickly becomes unreadable.
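If more conditions are needed, one way to keep a single pass readable is to name each condition and store it in a dictionary of predicates. This is only a sketch, assuming Python 3; FILTERS, collect_matches, and the four-column sample rows (with shipCountry at index 3 rather than 13) are hypothetical names and data chosen for illustration.

```python
import csv
import io

# Each named filter is a function that takes a row and returns True or False.
FILTERS = {
    "employee 5": lambda row: row[2] == "5",
    "employee 5, Brazil": lambda row: row[2] == "5" and row[3] == "Brazil",
}

def collect_matches(rows, filters):
    """Iterate over rows once and collect the matches for every named filter."""
    matches = {name: [] for name in filters}
    for row in rows:
        for name, predicate in filters.items():
            if predicate(row):
                matches[name].append(row)
    return matches

# Hypothetical sample rows: orderID, customerID, employeeID, shipCountry
sample = io.StringIO("10248,VINET,5,France\n10250,HANAR,4,Brazil\n10253,HANAR,5,Brazil\n")
results = collect_matches(csv.reader(sample), FILTERS)
for name, rows in results.items():
    print(name, rows)
```

Adding a new query then means adding one dictionary entry rather than another branch inside the loop.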
