I have a CSV file with a header row, and I want to retrieve all the rows that match every key-value pair in a dictionary. Note that the dictionary can contain any number of arbitrary keys and values to match against.

Here is the code I have written to solve this. Is there a better way to approach it (other than a pandas DataFrame)?

By "better" I mean: removing unnecessary variables, if any; a better data structure or library; or lower space/time complexity than the solution below.

import csv

def find_rows(options):
    output = []
    with open("data.csv", "rt", newline="") as csvfile:
        data = csv.reader(csvfile, delimiter=',', quotechar='"')
        header = next(data)  # first line holds the column names
        for row in data:
            match = 0
            # count how many of the wanted key-value pairs this row satisfies
            for k, v in options.items():
                match += 1 if row[header.index(k)] == v else 0
            if len(options.keys()) == match:
                output.append(dict(zip(header, row)))
    return output

options = {'h1': 'v1', 'h2': 'v2'}
print(find_rows(options))
  • Well "better" depends on use cases. For doing the thing once only, "better" means the first way that works. For doing many checks of different keys on the same csv file, then spending the time to load the csv data into a database or custom container in memory might be better. How to index the database and how to arrange the custom container and such could be very dependent on more use case details. Commented May 7, 2021 at 14:25
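
For the "many checks of different keys on the same file" case mentioned in the comment above, one possible in-memory container is an index from each (column, value) pair to the row numbers that hold it. This is only a sketch under assumptions: the names build_index and query are hypothetical, the whole file is assumed to fit in memory, and only exact string equality is supported.

```python
import csv
from collections import defaultdict


def build_index(filename):
    """Read the CSV once and map each (column, value) pair to a set of row numbers."""
    rows = []
    index = defaultdict(set)
    with open(filename, "r", newline="") as csvfile:
        reader = csv.DictReader(csvfile)
        for row_number, row in enumerate(reader):
            rows.append(row)
            # Index only the header columns; short rows yield None for missing fields.
            for column in reader.fieldnames:
                index[(column, row[column])].add(row_number)
    return rows, index


def query(rows, index, criteria):
    """Return the rows that match every key-value pair in criteria."""
    if not criteria:
        return list(rows)
    # One candidate set per criterion; a row must appear in all of them.
    candidate_sets = [index.get(pair, set()) for pair in criteria.items()]
    matching = set.intersection(*candidate_sets)
    return [rows[i] for i in sorted(matching)]
```

Usage would be along the lines of rows, index = build_index("data.csv") once, followed by any number of calls such as query(rows, index, {'h1': 'v1', 'h2': 'v2'}). Building the index costs one full pass and extra memory, so it only pays off when the same file is queried repeatedly.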

2 Answers


You don't say what you would consider a "better" approach to be. That said, it would take fewer lines of code if you used a csv.DictReader to process the input file as illustrated.

import csv


def find_matching_rows(filename, criteria, delimiter=',', quotechar='"'):
    criteria_values = tuple(criteria.values())
    matches = []
    with open(filename, 'r', newline='') as csvfile:
        for row in csv.DictReader(csvfile, delimiter=delimiter, quotechar=quotechar):
            if tuple(row[key] for key in criteria) == criteria_values:
                matches.append(row)
    return matches


results = find_matching_rows('matchtest.csv', {'h1': 'v1', 'h2': 'v2'})
for row in results:
    print(row)

You can use a list comprehension to read and filter the rows of a DictReader. Make the wanted options a set of key-value pairs; matching is then an easy subset test: a row matches when every wanted pair appears among its items.

import csv

def test():
    options = {'h1': 'v1', 'h2': 'v2'}
    wanted = set(options.items())
    with open("data.csv", "rt", newline="") as csvfile:
        # wanted <= row_items is True only if all wanted pairs are in the row
        return [row for row in csv.DictReader(csvfile) if wanted <= set(row.items())]

print(test())
print(len(test()))
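
When filtering rows with sets of key-value pairs, note that a non-empty intersection only means at least one pair matched, whereas a subset test requires all pairs to match. A minimal sketch with made-up rows (the helper names matches_any and matches_all are hypothetical, not part of the answer):

```python
wanted = {("h1", "v1"), ("h2", "v2")}

# Made-up sample rows, shaped like the dicts that csv.DictReader yields.
partial_row = {"h1": "v1", "h2": "other", "h3": "x"}
full_row = {"h1": "v1", "h2": "v2", "h3": "y"}


def matches_any(row, wanted):
    """True if at least one wanted pair appears in the row (intersection test)."""
    return bool(set(row.items()) & wanted)


def matches_all(row, wanted):
    """True only if every wanted pair appears in the row (subset test)."""
    return wanted <= set(row.items())


print(matches_any(partial_row, wanted), matches_all(partial_row, wanted))
print(matches_any(full_row, wanted), matches_all(full_row, wanted))
```

The question asks for rows that satisfy the whole dictionary, which corresponds to the subset test.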
