
I'm trying to plot data read into Pandas from an xlsx file. After some minor formatting and data quality checks, I try to plot using matplotlib but get the following error:

TypeError: Empty 'DataFrame': no numeric data to plot

This is not a new issue and I have followed many of the pages on this site dealing with this very problem. The posted suggestions, unfortunately, have not worked for me.

My data set includes strings (locations of sampling sites and limited to the first column), dates (which I have converted to the correct format using pd.to_datetime), many NaN entries (that cannot be converted to zeros due to the graphical analysis we are doing), and column headings representing various analytical parameters.

As per some of the suggestions I read on this site, I have tried the following code:

  1. df = df.astype(float), which gives me the following error: ValueError: could not convert string to float: 'Site 1' (Site 1 is a sampling location).

  2. df = df.apply(pd.to_numeric, errors='ignore'), which gives me the following: dtypes: float64(13), int64(1), object(65). It therefore does not appear to work, as most of the data remains as object. The date entries are the int64 column, and I cannot figure out why some of the data columns are float64 and some remain as objects.

  3. df = df.apply(pd.to_numeric, errors='coerce'), which deletes the entire DataFrame, possibly because this operation fills the entire DataFrame with NaN?
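For illustration, the three attempts above all act on the whole frame, including the site and date columns. A minimal sketch of restricting the conversion to the measurement columns might look like the following. The frame and its column names (Site, Date, pH, Nitrate) are invented for the example and are not taken from the actual data set.

```python
import pandas as pd

# Hypothetical miniature of the data described above (all names are assumptions).
df = pd.DataFrame({
    "Site": ["Site 1", "Site 2", "Site 3"],
    "Date": pd.to_datetime(["2017-01-05", "2017-02-09", "2017-03-14"]),
    "pH": ["7.1", ".", "6.8"],          # text, with a stray decimal point
    "Nitrate": ["0.5", "0.7", None],    # text, with a missing value
})

# Columns that should be left untouched by the numeric conversion.
keep_as_is = ["Site", "Date"]
numeric_cols = [col for col in df.columns if col not in keep_as_is]

# Coerce one measurement column at a time; entries that cannot be
# parsed become NaN instead of raising an error.
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
```

Because only the measurement columns are coerced, the site names and dates survive, and the missing values stay as NaN rather than being turned into zeros.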

I'm stuck and would appreciate any insight.

EDIT

I was able to solve my own question based on some of the feedback. Here is what worked for me:

import pandas as pd

path = "path"

header = [0]    # keep column headings as first row of original data
skip = [1]      # skip second row, which has units of measure
na_val = ['.','-.','-+0.01']    # Convert spurious decimal points that have 
                                # no number associated with them to NaN
convert = {col: float for col in (4,...,80)}   # Convert specific columns to 
                                               # float from original text
parse_col = ("A","C","E:CC")    # apply to specific columns 

df = pd.read_excel(path, header = header, skiprows = skip, 
                   na_values = na_val, converters = convert, parse_columns = parse_col)
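To see what the skiprows and na_values arguments do without needing the spreadsheet, here is a small sketch that uses pd.read_csv on an in-memory string as a stand-in for pd.read_excel. The sample rows and column names are made up, and the stand-in is an assumption for the sake of a self-contained example. Both readers accept header, skiprows and na_values.

```python
import io
import pandas as pd

# Hypothetical stand-in for the spreadsheet: header row, units row, then data.
raw = io.StringIO(
    "Site,Date,pH,Nitrate\n"
    ",,-,mg/L\n"
    "Site 1,2017-01-05,7.1,0.5\n"
    "Site 2,2017-02-09,.,-.\n"
    "Site 3,2017-03-14,6.8,\n"
)

df = pd.read_csv(
    raw,
    header=0,                          # first row holds the column headings
    skiprows=[1],                      # drop the units-of-measure row
    na_values=[".", "-.", "-+0.01"],   # treat stray markers as missing
    parse_dates=["Date"],
)

print(df.dtypes)
```

With the stray markers declared as missing at read time, the measurement columns should arrive as floats, so no conversion pass is needed afterwards.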

1 Answer

Hard to answer without a data sample, but if you are sure that the numeric columns are 100% numeric, this will probably work:

for c in df.columns:
    try:
        df[c] = df[c].astype(int)
    except (ValueError, TypeError):
        # leave columns that cannot be converted (e.g. site names) unchanged
        pass

5 Comments

This answer is on the right track, but a few comments: (1) OP was on the right track with approach 2, but is attempting to apply it across the whole frame versus a single column; that's important to make clear. (2) It should be .astyle(float) based on the original question. (3) try-except is a very sketchy way to implement the approach.
@ocop - I appreciate the comment but unfortunately am finding it too unspecific to be helpful. You're correct, I am trying to apply the float correction to the entire dataset in order to facilitate visualization. How should astyle(float) be applied in your suggestion?
Should be astype(float) in place of Ezer K's astype(int). The "astyle" part was a typo on my part.
Thank you. You also mentioned that try-except is a very sketchy way to implement the approach. Is there a better/cleaner suggestion?
@Ezer K - I was just able to try your suggestion and it did not work. It looks like the data in the Excel file are all stored as text and, when read with pd.read_excel, get converted to object in pandas.
