1

I am trying to organise a column by filtering the values. In other words, there are thousands of repetitive names and I want to take just one name from each "group" and copy it in a other column.

So the column A is the current situation and the column be is the result I want to get:

Column A                   Column B

AB Mark Sociedad Ltda      AB Mark Sociedad Ltda
AB Mark Sociedad Ltda      Acosta Acosta Manuel
AB Mark Sociedad Ltda      ALBAGLI, ZALIASNIK 
AB Mark Sociedad Ltda
Acosta Acosta Manuel 
Acosta Acosta Manuel 
Acosta Acosta Manuel
ALBAGLI, ZALIASNIK 
ALBAGLI, ZALIASNIK
ALBAGLI, ZALIASNIK

Finally this is the script I am trying to use:

import openpyxl
from openpyxl import load_workbook
import os

os.chdir('path')
workbook = openpyxl.load_workbook('abc.xlsx')
page_i = workbook.get_sheet_names()
sheet = workbook.get_sheet_by_name('Sheet1')

for a in range(1, 10):
    representativex = sheet['A' + str(a)].value
    tuple(sheet['A1':'A10'])

    for row in sheet['A1':'A10']:
        if representativex in row:
            continue
        else:
            sheet['B' + str(a)].value 
            sheet['B' + str(a)] = representativex

        workbook.save('abc.xlsx')

Unfortunately it doesn't work.

4
  • You're trying to simply dedupe the column? Commented Jan 24, 2017 at 3:14
  • Hi Dmitry. Exactly I am trying to dedupe it in Python. Commented Jan 24, 2017 at 3:16
  • Excel has this functionality already. See here Commented Jan 24, 2017 at 3:31
  • Thanks KJ for your answer but I need to do it in python because this is just a small part of a big script. Commented Jan 24, 2017 at 3:35

1 Answer 1

2

I don't really use Python for this but here is a crude way that I found relatively quickly.

import openpyxl

wb = openpyxl.load_workbook('test.xlsx')
ws1 = wb.active

Sample Data

names = []
for row in ws1.columns[0]:
    names.append(row.value)

names = sorted(list(set(names)))

start = 1
for name in names:
    ws1.cell(row = start, column=2).value = name
    start += 1

wb.save('test.xlsx')

Sample Output Data

Edit: Apparently the newer upgrade of openpyxl needs a slight modification

Change this:

for row in ws1.columns[0]:
        names.append(row.value)

To this:

for row in ws1.iter_cols(max_col = 1, min_row=1):
    for cell in row:
        names.append(cell.value)

And just in case your columns are different,

iter_cols(min_col=None, max_col=None, min_row=None, max_row=None)[source]

Returns all cells in the worksheet from the first row as columns.

If no boundaries are passed in the cells will start at A1.

If no cells are in the worksheet an empty tuple will be returned.
Parameters: 

    min_col (int) – smallest column index (1-based index)
    min_row (int) – smallest row index (1-based index)
    max_col (int) – largest column index (1-based index)
    max_row (int) – smallest row index (1-based index)
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for your help Dmitry. I am trying to use your script exactly as you typed, however it poped up this error ***** for row in ws1.columns[1]: TypeError: 'generator' object is not subscriptable [Finished in 5.3s with exit code 1]***** do you know why?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.