I have an Excel (.xlsx) file that I'm trying to parse, row by row. I have a header (first row) that has a bunch of column titles like School, First Name, Last Name, Email, etc.

When I loop through each row, I want to be able to say something like:

row['School']

and get back the value of the cell in the current row and the column with 'School' as its title.

I've looked through the OpenPyXL docs but can't seem to find anything terribly helpful.

Any suggestions?

  • Have you tried using read_excel from pandas? Commented Jun 25, 2016 at 4:26
  • I also want a convenient function like this. So far, I'm using an OrderedDict to work around the problem. If you find a more convenient way, please share it. Commented May 10, 2017 at 3:21

5 Answers

I'm not incredibly familiar with OpenPyXL, but as far as I can tell it doesn't have any kind of dict reader/iterator helper. However, it's fairly easy to iterate over the worksheet rows, as well as to create a dict from two lists of values.

def iter_worksheet(worksheet):
    # It's necessary to get a reference to the generator, as 
    # `worksheet.rows` returns a new iterator on each access.
    rows = worksheet.rows

    # Get the header values as keys and move the iterator to the next item
    keys = [c.value for c in next(rows)]
    for row in rows:
        values = [c.value for c in row]
        yield dict(zip(keys, values))
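
As a rough, self-contained sketch of the same idea that runs without a workbook: the hypothetical `iter_dict_rows` helper below takes any iterable of value tuples, which is the shape `worksheet.iter_rows(values_only=True)` yields in recent openpyxl versions, and pairs each data row with the header row. The sample data is made up for illustration.

```python
def iter_dict_rows(rows):
    """Yield one dict per data row, keyed by the values in the first row."""
    rows = iter(rows)
    try:
        keys = next(rows)  # header row
    except StopIteration:
        return  # no rows at all, so nothing to yield
    for values in rows:
        yield dict(zip(keys, values))


# Stand-in for worksheet.iter_rows(values_only=True); invented sample data.
sample_rows = [
    ("School", "First Name", "Last Name"),
    ("MIT", "Ada", "Lovelace"),
    ("Yale", "Grace", "Hopper"),
]

records = list(iter_dict_rows(sample_rows))
print(records[0]["School"])  # MIT
```

Working on plain values rather than cells means the dict lookups return the cell contents directly, with no `.value` needed.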

Excel sheets are far more flexible than CSV files so it makes little sense to have something like DictReader.

Just create an auxiliary dictionary from the relevant column titles.

If you have columns like "School", "First Name", "Last Name", "EMail", you can build a dictionary that maps each title to its column index like this:

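# Read the column titles from the first row of the worksheet.
values = [cell.value for cell in ws[1]]
keys = dict((value, idx) for (idx, value) in enumerate(values))
# Skip the header row; ws.rows is a generator and cannot be sliced.
for row in ws.iter_rows(min_row=2):
    school = row[keys['School']].value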
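
A minimal sketch of the index-map approach, using plain tuples in place of openpyxl cells so that it runs on its own. `header` and `data_rows` are made-up sample data; with a real worksheet each element would be a cell, and you would read its `.value`.

```python
# Invented sample data standing in for the header row and the data rows.
header = ("School", "First Name", "Last Name", "EMail")
data_rows = [
    ("MIT", "Ada", "Lovelace", "ada@example.com"),
    ("Yale", "Grace", "Hopper", "grace@example.com"),
]

# Map each column title to its position in the row.
keys = {title: idx for idx, title in enumerate(header)}

# Look up only the columns of interest, by title.
schools = [row[keys["School"]] for row in data_rows]
emails = [row[keys["EMail"]] for row in data_rows]
print(schools)  # ['MIT', 'Yale']
```

The map is built once, so each row costs only an index lookup for the columns you actually need.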

2 Comments

Thanks for the reply; I'm a Python newbie so could you elaborate a bit more on your answer?
What is unclear? You can update the question in the light of the answer.

I wrote a DictReader based on openpyxl. Save the second listing to a file named 'excel.py' and use it like csv.DictReader. The first listing shows a usage example.

with open('example01.xlsx', 'rb') as source_data:
    from excel import DictReader

    for row in DictReader(source_data, sheet_index=0):
        print(row)

excel.py:

__all__ = ['DictReader']

from openpyxl import load_workbook
from openpyxl.cell import Cell

# Change the Cell default value from None to '' to mimic csv.DictReader.
Cell.__init__.__defaults__ = (None, None, '', None)


class DictReader(object):
    def __init__(self, f, sheet_index,
                 fieldnames=None, restkey=None, restval=None):
        self._fieldnames = fieldnames   # list of keys for the dict
        self.restkey  = restkey         # key to catch long rows
        self.restval  = restval         # default value for short rows
        self.reader   = load_workbook(f, data_only=True).worksheets[sheet_index].iter_rows(values_only=True)
        self.line_num = 0

    def __iter__(self):
        return self

    @property
    def fieldnames(self):
        if self._fieldnames is None:
            try:
                self._fieldnames = next(self.reader)
                self.line_num += 1
            except StopIteration:
                pass

        return self._fieldnames

    @fieldnames.setter
    def fieldnames(self, value):
        self._fieldnames = value

    def __next__(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames

        row = next(self.reader)
        self.line_num += 1

        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == ():
            row = next(self.reader)

        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)

        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval

        return d
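
To illustrate how `restkey` and `restval` behave, here is a small standalone sketch of the same row-to-dict logic. The `row_to_dict` function and the sample rows are hypothetical and exist only for this illustration. Rows read from a worksheet usually have the same width as the header, so these branches probably matter most when `fieldnames` is passed explicitly.

```python
def row_to_dict(fieldnames, row, restkey=None, restval=None):
    """Pair field names with row values, handling long and short rows."""
    d = dict(zip(fieldnames, row))
    lf, lr = len(fieldnames), len(row)
    if lf < lr:
        # Long row: collect the surplus values under restkey.
        d[restkey] = row[lf:]
    elif lf > lr:
        # Short row: fill the missing fields with restval.
        for key in fieldnames[lr:]:
            d[key] = restval
    return d


fields = ("School", "First Name")
long_row = row_to_dict(fields, ("MIT", "Ada", "extra1", "extra2"), restkey="rest")
short_row = row_to_dict(fields, ("MIT",), restval="")
print(long_row)   # {'School': 'MIT', 'First Name': 'Ada', 'rest': ('extra1', 'extra2')}
print(short_row)  # {'School': 'MIT', 'First Name': ''}
```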

The following seems to work for me.

header = True
headings = []
for row in ws.rows:
    if header:
        for cell in row:
            headings.append(cell.value)
        header = False
        continue
    rowData = dict(zip(headings, row))
    wantedValue = rowData['myHeading'].value

I was running into the same issue as described above. Therefore I created a simple extension called openpyxl-dictreader that can be installed through pip. It is very similar to the suggestion made by @viktor earlier in this thread.

The package is largely based on the source code of Python's built-in csv.DictReader class. It lets you select values by column name when reading a workbook with openpyxl. For example:

import openpyxl_dictreader

reader = openpyxl_dictreader.DictReader("names.xlsx", "Sheet1")
for row in reader:
    print(row["First Name"], row["Last Name"])

Putting this here for reference.
