Filter a column in Excel with Python

Question

I am trying to organise a column by filtering its values. The column contains thousands of repeated names, and I want to take just one name from each "group" and copy it into another column.

Column A is the current data and column B is the result I want to get:

Column A                   Column B

AB Mark Sociedad Ltda      AB Mark Sociedad Ltda
AB Mark Sociedad Ltda      Acosta Acosta Manuel
AB Mark Sociedad Ltda      ALBAGLI, ZALIASNIK 
AB Mark Sociedad Ltda
Acosta Acosta Manuel 
Acosta Acosta Manuel 
Acosta Acosta Manuel
ALBAGLI, ZALIASNIK 
ALBAGLI, ZALIASNIK
ALBAGLI, ZALIASNIK

This is the script I am trying to use:

import openpyxl
from openpyxl import load_workbook
import os

os.chdir('path')
workbook = openpyxl.load_workbook('abc.xlsx')
page_i = workbook.get_sheet_names()
sheet = workbook.get_sheet_by_name('Sheet1')

for a in range(1, 10):
    representativex = sheet['A' + str(a)].value
    tuple(sheet['A1':'A10'])

    for row in sheet['A1':'A10']:
        if representativex in row:
            continue
        else:
            sheet['B' + str(a)].value 
            sheet['B' + str(a)] = representativex

        workbook.save('abc.xlsx')

Unfortunately it doesn't work.
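A corrected sketch of the same idea, assuming the names sit in column A of 'Sheet1' starting at row 1, column B is free to overwrite, and openpyxl 2.6 or newer (needed for values_only). In the original loop, range(1, 10) only covers rows 1 to 9, the membership test looks inside a row of column A cells instead of the names already copied, each value is written to the same row number in B, and the workbook is saved on every pass. Here a set tracks names already seen, unique names go to consecutive rows of B in first-appearance order (the order shown in column B above), and the file is saved once. unique_in_order and copy_unique_names are hypothetical helper names.

```python
def unique_in_order(values):
    """Return the distinct values in first-appearance order, skipping empty cells."""
    seen = set()
    result = []
    for value in values:
        if value is None:
            continue  # empty cell
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


def copy_unique_names(path, sheet_name='Sheet1'):
    """Copy each distinct name in column A into column B of the same sheet."""
    # Imported here so unique_in_order above has no dependency on openpyxl
    from openpyxl import load_workbook

    workbook = load_workbook(path)
    sheet = workbook[sheet_name]  # replaces the deprecated get_sheet_by_name()

    # min_col=max_col=1 restricts iteration to column A; values_only yields plain values
    names = [row[0] for row in sheet.iter_rows(min_col=1, max_col=1, values_only=True)]

    # Write the unique names to rows 1, 2, 3, ... of column B
    for row_index, name in enumerate(unique_in_order(names), start=1):
        sheet.cell(row=row_index, column=2, value=name)

    workbook.save(path)  # save once, after all writes
```

Calling copy_unique_names('abc.xlsx') would then fill column B in place; pass a different path to workbook.save if the original file should stay untouched.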

Thanks KJ for your answer, but I need to do it in Python because this is just a small part of a bigger script. (Hans Schmidt, Jan 24, 2017 at 3:35)

gold_cy · Accepted Answer · 2017-01-24 04:22:39Z


I don't usually use Python for this, but here is a crude approach I found relatively quickly.

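import openpyxl

wb = openpyxl.load_workbook('test.xlsx')
ws1 = wb.active  # the currently active sheet

# Collect every value in the first column (column A).
# Indexing ws1.columns only works in older openpyxl; see the edit below.
names = []
for row in ws1.columns[0]:
    names.append(row.value)

# Remove duplicates with a set, then sort alphabetically
names = sorted(list(set(names)))

# Write the unique names down column B, starting at row 1
start = 1
for name in names:
    ws1.cell(row=start, column=2).value = name
    start += 1

wb.save('test.xlsx')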
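Two quirks of sorted(set(names)) are worth knowing. Python sorts uppercase letters before lowercase ones, so "ALBAGLI, ZALIASNIK" lands before "Acosta Acosta Manuel", unlike column B in the question; and if column A contains an empty cell, its None value makes sorted() raise TypeError in Python 3. Cells with stray trailing spaces also count as different names. A minimal sketch of a more forgiving variant, assuming all names are text (non-text cells are skipped); unique_sorted is a hypothetical helper and the sample list is made up to show the effect:

```python
def unique_sorted(values):
    """Distinct non-blank names, trimmed and sorted case-insensitively."""
    # Skip empty cells (None) and non-text values; strip stray spaces before de-duplicating
    cleaned = {v.strip() for v in values if isinstance(v, str) and v.strip()}
    return sorted(cleaned, key=str.casefold)


# Illustrative sample: note the trailing space and the blank cell
column_a = [
    'AB Mark Sociedad Ltda', 'AB Mark Sociedad Ltda',
    'Acosta Acosta Manuel ', 'Acosta Acosta Manuel',
    'ALBAGLI, ZALIASNIK', None,
]

# Plain set + sorted (None removed first, otherwise sorted() raises TypeError)
print(sorted(set(v for v in column_a if v is not None)))
# ['AB Mark Sociedad Ltda', 'ALBAGLI, ZALIASNIK', 'Acosta Acosta Manuel', 'Acosta Acosta Manuel ']

print(unique_sorted(column_a))
# ['AB Mark Sociedad Ltda', 'Acosta Acosta Manuel', 'ALBAGLI, ZALIASNIK']
```

In the script above, names = unique_sorted(names) could replace the sorted(list(set(names))) line.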

Edit: Newer versions of openpyxl need a slight modification, because ws1.columns now returns a generator, which cannot be indexed.

Change this:

for row in ws1.columns[0]:
    names.append(row.value)

To this:

for col in ws1.iter_cols(max_col=1, min_row=1):
    for cell in col:
        names.append(cell.value)
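The reason for the change can be reproduced without Excel at all. This pure-Python sketch uses fake_columns, a hypothetical stand-in generator (not part of openpyxl) that yields one tuple per column the way ws1.columns does in newer versions, and shows both the error and two ways around it:

```python
def fake_columns():
    """Hypothetical stand-in for ws1.columns in newer openpyxl: yields one tuple per column."""
    yield ('AB Mark Sociedad Ltda', 'Acosta Acosta Manuel')  # column A
    yield (None, None)                                        # column B


try:
    fake_columns()[0]  # same pattern as ws1.columns[0]
except TypeError as exc:
    error_message = str(exc)  # "'generator' object is not subscriptable"

# Two ways to reach the first column of a generator
first_via_next = next(fake_columns())     # pulls only the first column
first_via_list = list(fake_columns())[0]  # builds every column, then indexes
```

With a real worksheet the equivalents would be next(ws1.iter_cols(min_col=1, max_col=1)) or list(ws1.columns)[0]; the iter_cols form avoids building tuples for every column in the sheet.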

And in case your names are in a different column, this is the iter_cols signature from the openpyxl documentation:

iter_cols(min_col=None, max_col=None, min_row=None, max_row=None)

Returns all cells in the worksheet from the first row as columns.

If no boundaries are passed in the cells will start at A1.

If no cells are in the worksheet an empty tuple will be returned.
Parameters: 

    min_col (int) – smallest column index (1-based index)
    min_row (int) – smallest row index (1-based index)
    max_col (int) – largest column index (1-based index)
    max_row (int) – largest row index (1-based index)

edited Jan 24, 2017 at 4:22

answered Jan 24, 2017 at 3:38

gold_cy



Comment from Hans Schmidt:

Thanks for your help, Dmitry. I am trying to use your script exactly as you typed it, but it raised this error: "for row in ws1.columns[1]: TypeError: 'generator' object is not subscriptable [Finished in 5.3s with exit code 1]". Do you know why?
