Get rows from CSV by matching header to multiple dictionary key-values

Question

I have a CSV file with a header row, and I want to retrieve all rows that match the key-value pairs in a dictionary. The dictionary can contain any number of arbitrary keys and values to match against.

Here is the code I have written to solve this. Is there a better way to approach it (other than a pandas DataFrame)?

By "better" I mean: removing unnecessary variables, if any, or using a better data structure, a better library, or lower space/time complexity than the solution below.

import csv


# The original snippet showed only the body; the function name is assumed.
def get_matching_rows(options):
    output = []
    with open("data.csv", "rt") as csvfile:
        data = csv.reader(csvfile, delimiter=',', quotechar='"')
        header = next(data)
        for row in data:
            match = 0
            for k, v in options.items():
                match += 1 if row[header.index(k)] == v else 0
            if len(options.keys()) == match:
                output.append(dict(zip(header, row)))
    return output


options = {'h1': 'v1', 'h2': 'v2'}
output = get_matching_rows(options)

Well, "better" depends on the use case. If you only do this once, "better" means the first way that works. If you run many checks with different keys against the same CSV file, then spending the time to load the CSV data into a database or a custom in-memory container might be better. How to index the database or arrange the custom container depends heavily on further details of the use case.
– Andrew Allaire, Commented May 7, 2021 at 14:25
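
As a rough illustration of the "custom container in memory" idea from the comment above, here is a minimal sketch that reads the file once and indexes row positions by (column, value) pair, so repeated lookups avoid rescanning the file. The names `build_index` and `lookup` and the sample data are hypothetical, and the sketch assumes every row has the same number of fields as the header.

```python
import csv
import io
from collections import defaultdict


def build_index(csvfile):
    """Read all rows once and index row positions by (column, value)."""
    rows = list(csv.DictReader(csvfile))
    index = defaultdict(set)
    for pos, row in enumerate(rows):
        # Assumes well-formed rows: one string value per header column.
        for column, value in row.items():
            index[(column, value)].add(pos)
    return rows, index


def lookup(rows, index, criteria):
    """Return the rows that match every key-value pair in criteria."""
    if not criteria:
        return list(rows)
    # Intersect the sets of row positions recorded for each requested pair.
    position_sets = [index.get(pair, set()) for pair in criteria.items()]
    positions = set.intersection(*position_sets)
    return [rows[pos] for pos in sorted(positions)]


# Hypothetical sample data standing in for data.csv.
sample = io.StringIO("h1,h2,h3\nv1,v2,a\nv1,x,b\nv1,v2,c\n")
rows, index = build_index(sample)
print(lookup(rows, index, {'h1': 'v1', 'h2': 'v2'}))
```

The trade-off is memory: the whole file and the index are held in RAM, which only pays off when the same file is queried many times.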

martineau · Accepted Answer · 2021-05-07 17:05:20Z

1

You don't say what you would consider a "better" approach to be. That said, it would take fewer lines of code if you used a csv.DictReader to process the input file as illustrated.

import csv


def find_matching_rows(filename, criteria, delimiter=',', quotechar='"'):
    criteria_values = tuple(criteria.values())
    matches = []
    with open(filename, 'r', newline='') as csvfile:
        for row in csv.DictReader(csvfile, delimiter=delimiter, quotechar=quotechar):
            if tuple(row[key] for key in criteria) == criteria_values:
                matches.append(row)
    return matches


results = find_matching_rows('matchtest.csv', {'h1': 'v1', 'h2': 'v2'})
for row in results:
    print(row)
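
If memory use matters, a generator variant may be worth considering. The following is only a sketch under a few assumptions: the name `iter_matching_rows` is hypothetical, it takes an already opened file object rather than a filename, and it uses `row.get` so that a criteria key missing from the header simply fails to match instead of raising `KeyError`.

```python
import csv
import io


def iter_matching_rows(csvfile, criteria):
    """Yield rows whose columns equal every value in criteria."""
    for row in csv.DictReader(csvfile):
        # all() stops at the first mismatch; row.get returns None when
        # a criteria key is not a column in the file.
        if all(row.get(key) == value for key, value in criteria.items()):
            yield row


# Hypothetical sample data standing in for matchtest.csv.
sample = io.StringIO("h1,h2,h3\nv1,v2,a\nv1,x,b\nv1,v2,c\n")
matched = list(iter_matching_rows(sample, {'h1': 'v1', 'h2': 'v2'}))
for row in matched:
    print(row)
```

Because rows are yielded one at a time, a caller that only iterates over the result never holds the full list of matches in memory.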


tdelaney · Accepted Answer · 2021-05-07 17:58:33Z

0

You can use a list comprehension to read and filter the rows of a DictReader. Make the wanted options a set, and then it's an easy subset test: a row matches when every wanted pair is among its items.

import csv

def test():
    options = {'h1': 'v1', 'h2': 'v2'}
    wanted = set(options.items())
    with open("data.csv", "rt", newline="") as csvfile:
        # wanted <= row items is true only when every wanted pair matches
        return [row for row in csv.DictReader(csvfile) if wanted <= set(row.items())]

print(test())
print(len(test()))
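
One point worth checking with any set-based test is which set operation is used: an intersection is non-empty as soon as one pair matches, whereas a subset test requires every pair to match. A small sketch on a made-up row (the values here are illustrative only):

```python
# A made-up row where only one of the two wanted pairs matches.
row = {'h1': 'v1', 'h2': 'other', 'h3': 'x'}
wanted = set({'h1': 'v1', 'h2': 'v2'}.items())
row_items = set(row.items())

any_match = bool(row_items & wanted)  # non-empty intersection: at least one pair matches
all_match = wanted <= row_items       # subset test: every wanted pair must match

print(any_match, all_match)
```

For the question as asked, where a row should be returned only when all dictionary entries match, the subset form is the one that fits.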
