I have the following Excel file, and I would like to clean specific rows and columns so that I can process the file further.

[Image: screenshot of the Excel sheet]

I have tried the code below, but I have not managed to remove any of the blank lines; I have only managed to trim rows that contain data. Here, I was trying to keep only the data from the third row onward.

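import pandas as pd

xl = pd.ExcelFile("MRD.xlsx")
df = xl.parse("Sheet3")
# skip the first three data rows (the header row is not counted)
df2 = df.iloc[3:]

# the context manager saves the file on exit; writer.save() was removed in pandas 2.0
with pd.ExcelWriter('pandas3.out.no3lines.xlsx', engine='xlsxwriter') as writer4:
    df2.to_excel(writer4, sheet_name='Sheet1')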

Specifically, I want to remove rows 1 and 3 (the empty ones) and the first column, so that I can pivot the data. Is there a way to do this? Thank you.

Never post images of the dataset; print your dataframe instead. In this case, as the empty rows will be read as NaN, you can just use df.dropna(how='all'). Commented Nov 26, 2017 at 17:03

1 Answer

You can use drop(...) to remove rows and drop(..., axis=1) to remove columns. Both take index or column labels, and each returns a new DataFrame.

import pandas as pd

data = [
    ['', '', '', ''],
    ['', 1, 2, 3],
    ['', '', '', ''],
    ['', 7, 8, 9],
]

df = pd.DataFrame(data)

# drop the first column (label 0)
df = df.drop(0, axis=1)

# drop the first and third rows (labels 0 and 2)
df = df.drop([0, 2])

print(df)

Before:

  0  1  2  3
0           
1    1  2  3
2           
3    7  8  9

After:

   1  2  3
1  1  2  3
3  7  8  9
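
The labels passed to drop above happen to equal the positions only because the sample frame has default integer labels. In a frame read from Excel, the first column may carry a name such as 'Unnamed: 0' (an assumption here, since the real sheet is not shown), so looking the labels up by position may be safer. A minimal sketch on the same sample data, with hypothetical column names:

```python
import pandas as pd

data = [
    ['', '', '', ''],
    ['', 1, 2, 3],
    ['', '', '', ''],
    ['', 7, 8, 9],
]
# hypothetical column names standing in for whatever the real sheet produces
df = pd.DataFrame(data, columns=['Unnamed: 0', 'a', 'b', 'c'])

# look up the labels by position, then drop by label
df = df.drop(columns=df.columns[0])   # first column
df = df.drop(index=df.index[[0, 2]])  # first and third rows

print(df)
```

This keeps the original row labels (1 and 3); calling df.reset_index(drop=True) afterwards would renumber them from 0 if that matters for the pivot.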

2 Comments

You would have to show the code and the full error message.
df.dropna(axis=1, how='all') worked; your direction was right. I think the problem was with the NaNs and the missing column name.
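
The dropna approach from the comments can be sketched as follows, assuming the blank cells arrive as NaN, which is how pandas reads empty Excel cells. The sample frame is made up to mimic the sheet; it is not the asker's data.

```python
import numpy as np
import pandas as pd

# made-up stand-in for the parsed sheet: blank cells are NaN
df = pd.DataFrame([
    [np.nan, np.nan, np.nan, np.nan],
    [np.nan, 1, 2, 3],
    [np.nan, np.nan, np.nan, np.nan],
    [np.nan, 7, 8, 9],
])

# drop rows in which every cell is NaN, then columns in which every cell is NaN
cleaned = df.dropna(how='all').dropna(axis=1, how='all')

print(cleaned)
```

Note that dropna only treats missing values as blank. Empty strings such as those in the answer's sample data are not NaN, so they would survive this call unless they are converted first.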
