I have a Python script that reads a .xls file and uses a loop to remove all of the unnecessary returns inside each row. So far my script can go through a row that I specify and remove the returns, but I want it to go through every row automatically and remove all the unnecessary returns. Here is my script:


import xlrd
import xlwt

# function for removing returns in file
edits_returns = ''
def remove_returns1(row, column):
    global edits_returns
    cell_hold = sheet.cell(row, column).value
    cell_hold_str = str(cell_hold)
    if "\n" in cell_hold_str:
        edits_returns = edits_returns + ('Return(s) replaced in (row %d : cell %d.)\n' % (row, column))
    out_cell = cell_hold_str.replace('\n', '')
    return out_cell

# obtaining filename
fname = raw_input('Input Filename > ')

# opening file
workbook = xlrd.open_workbook(fname)
sheet = workbook.sheet_by_index(0)

# informing user of # of rows and columns
print "\nNumber of rows: %d" % sheet.nrows
print "Number of Columns: %d\n" % sheet.ncols

# removing returns by row
column = 0
while column < sheet.ncols:
    new_value = remove_returns1(34, column)
    column += 1
    print new_value,

# printing the edits
print "\n\n", edits_returns

  • My questions

    1. How can I iterate through every row automatically instead of manually?
    2. Is there a better way to print the edit results as seen in edits_returns? (I plan to make this script do more than just remove returns in the future.)
    3. Am I doing something redundant or can something I've written in my script be done differently?

Example input:

10/13/15 mcdonalds\n $20 0.01%
10/13/15 mcdonalds\n $20 0.01%

Example output:

10/13/15 mcdonalds $20 0.01%
10/13/15 mcdonalds $20 0.01%
  • All of the rows are still on their own lines. They are not attached.

Example output from one of the provided answers:

10/13/15 mcdonalds $20 0.01%10/13/15 mcdonalds $20 0.01%

This appears close, but is still not what I'm looking for.


Thanks in advance! I'm open to all constructive criticism.

  • Please tell me why my question deserves a -1? I've put in a lot of research time and couldn't find anything. I also looked through some of the other questions and couldn't find one like it. Commented Oct 5, 2015 at 19:11
  • You mean you replace \n with '' for each column? Commented Oct 5, 2015 at 19:12
  • No, I made a loop that looks in each cell individually. I specify the row manually, as you can see in the 6th line from the bottom (34, column). This makes it go through every column in row 34 and remove all the returns, but how do I make it go through every row too? Commented Oct 5, 2015 at 19:14
  • Use one more while loop and increase the row number after the column loop completes, or use a for loop, since you already know the number of columns and rows. What's the issue? Commented Oct 5, 2015 at 19:17
  • I'm new to Python and don't know a lot of these things. After the column loop completes, how do I make it go to the next row and start back with the column loop? Commented Oct 5, 2015 at 19:19

1 Answer

Replace

# removing returns by row
column = 0
while column < sheet.ncols:
    new_value = remove_returns1(34, column)
    column += 1
    print new_value,

# printing the edits
print "\n\n", edits_returns

with the code below. You need to go over the rows one by one, and within each row go over every column.

# removing returns by row
row_idx = 0
while row_idx < sheet.nrows:
    col_idx = 0
    while col_idx < sheet.ncols:
        new_value = remove_returns1(row_idx, col_idx)
        col_idx += 1
        print new_value,

    print  # end the output line for the current row
    row_idx += 1
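
The same row-then-column traversal can also be written with for loops over range, which removes the manual counters. The following is a minimal sketch, not the asker's exact script: FakeSheet and Cell are hypothetical stand-ins that mimic only nrows, ncols, and cell(row, column).value so the example runs without an .xls file, and clean_cells is an illustrative name. It calls print() with a single argument, which behaves the same on Python 2 and Python 3.

```python
from collections import namedtuple

# Hypothetical stand-in for an xlrd sheet, so the sketch runs without a file.
# It mimics only the members the script uses: nrows, ncols, cell(r, c).value.
Cell = namedtuple("Cell", ["value"])


class FakeSheet(object):
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def cell(self, row, column):
        return Cell(self._rows[row][column])


def clean_cells(sheet):
    """Return a list of rows; each row is a list of cells without newlines."""
    cleaned = []
    for row_idx in range(sheet.nrows):          # outer loop: every row
        row_values = []
        for col_idx in range(sheet.ncols):      # inner loop: every column
            text = str(sheet.cell(row_idx, col_idx).value)
            row_values.append(text.replace("\n", ""))
        cleaned.append(row_values)
    return cleaned


sheet = FakeSheet([
    ["10/13/15", "mcdonalds\n", "$20", "0.01%"],
    ["10/13/15", "mcdonalds\n", "$20", "0.01%"],
])

# One output line per row, cells separated by a space.
for row_values in clean_cells(sheet):
    print(" ".join(row_values))
```

With a real workbook, the sheet object returned by workbook.sheet_by_index(0) in the question would take the place of FakeSheet.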

To store each row in a variable, first append each column value to a list and then join the list.

row_idx = 0
while row_idx < sheet.nrows:
    col_idx = 0
    row_data = []  # cleaned cell values for the current row
    while col_idx < sheet.ncols:
        new_value = remove_returns1(row_idx, col_idx)
        col_idx += 1
        row_data.append(new_value)

    a = ' '.join(row_data)
    print a
    row_idx += 1

You can also make 'a' a list and append every row to it, if you don't want to print the rows or use them immediately.
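
As a rough sketch of that idea, the function below collects every cleaned row in a list and also returns the edits as a list of (row, column) pairs instead of building up a global string. It assumes the cell values have already been read into a plain list of lists; clean_table, rows, lines, and edits are illustrative names that do not appear in the original script.

```python
def clean_table(rows):
    """Strip newlines from every cell; return (cleaned_lines, edits)."""
    cleaned_lines = []  # one joined string per row
    edits = []          # (row, column) pairs where a newline was removed
    for row_idx, row in enumerate(rows):
        cells = []
        for col_idx, value in enumerate(row):
            text = str(value)
            if "\n" in text:
                edits.append((row_idx, col_idx))
            cells.append(text.replace("\n", ""))
        cleaned_lines.append(" ".join(cells))
    return cleaned_lines, edits


# Assumed sample data shaped like the example input in the question.
rows = [
    ["10/13/15", "mcdonalds\n", "$20", "0.01%"],
    ["10/13/15", "mcdon\nalds", "$20", "0.01%"],
]

lines, edits = clean_table(rows)

# Each row stays on its own line; only the in-cell returns are gone.
print("\n".join(lines))

# The edit report is formatted only when it is needed.
for row_idx, col_idx in edits:
    print("Return(s) replaced in (row %d : cell %d)" % (row_idx, col_idx))
```

Keeping the edits as data rather than as a preformatted string leaves room for other kinds of edits later, since each one can be recorded and reported in its own way.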

12 Comments

Nothing prints whenever I have that in my code. You see, 34 is the row number. I need 34 to start at 0 then go all the way to the total number of rows inside the .xls document, removing all of the returns.
I don't think it was working before either; you have defined column incorrectly. Did it work for a single row previously?
Yes, the original script provided above works with a single row. A row being cells (0-9,0) (zero through nine). At first I thought it was confusing how it was set up because the rows go horizontal, but are on the second part of a cell (columns, rows). I'm used to an x/y axis so at first in my head I thought it was backwards, (rows, columns). So I do believe I've specified it correctly.
That seemed to do everything at once. I just need it to do one row, then return and go to the next row. Would you like me to find you a .xls file to test with? I'm doing some tests on my own with a private .xls document.
Like, I got all of the information contained in the .xls file into one big string with no returns. I want to keep the returns at the beginning of each row, but not any returns that are in the middle. That's why I need it to do one row, then print, then the next row, then print, etc until all the rows are complete.