I am trying to write a convenience function that creates, for each column in a pandas DataFrame, a basic bar plot of the column's distinct values and how often each one occurs in the dataset.

def plot_value_counts(df, leave_out):
  # is supposed to create the subplots grid where I can add the plots
  fig, axs = plt.subplots(int(len(df)/2) + 1,int(len(df)/2) + 1)
  for idx, name in enumerate(list(df)):
    if name == leave_out:
      continue
    else:
      axs[idx] = df[name].value_counts().plot(kind="bar")
  return fig, axs

This snippet runs forever and never stops. I looked at similar questions on Stack Overflow but couldn't find anything specific to my case.

The use of the subplots function comes from the following question: Is it possible to automatically generate multiple subplots in matplotlib?

Below is a short sample of the data file so that the problem can be reproduced: https://gist.github.com/hentschelpatrick/e0a7e1400a4b5c356ec8b0e4952f8cc1#file-train-csv

  • Can you provide a (short!) example of an input for which this process hangs for you? Commented Feb 15, 2019 at 13:01
  • @asongtoruin I added a link to the gist with some of the data. Commented Feb 15, 2019 at 13:58

2 Answers

You can pass the axis object to the plot method through its ax argument (see the docs). And you should iterate over the columns:

fig, axs = plt.subplots(int(len(df)/2) + 1,int(len(df)/2) + 1)
for idx, name in enumerate(df.columns):
    if name == leave_out:
        continue
    else:
        df[name].value_counts().plot(kind="bar", ax=axs[idx])
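
A note on the grid itself: len(df) is the number of rows in the DataFrame, not the number of columns, so the grid above can become far larger than needed. Also, when the grid has more than one row and column, axs is two-dimensional and axs[idx] selects a whole row of axes rather than a single one. The following is only a minimal sketch of one way around both points, assuming a fixed number of grid columns; the ncols parameter, the default for leave_out, and the toy data are hypothetical and not part of the original question.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_value_counts(df, leave_out=None, ncols=2):
    """Draw one bar chart of value counts per column on a shared grid."""
    names = [name for name in df.columns if name != leave_out]
    # size the grid by the number of plotted columns, not by len(df)
    nrows = max(1, math.ceil(len(names) / ncols))
    # squeeze=False always returns a 2-D array, so flatten() is safe
    fig, axs = plt.subplots(nrows, ncols, squeeze=False)
    flat_axs = axs.flatten()
    for ax, name in zip(flat_axs, names):
        df[name].value_counts().plot(kind="bar", ax=ax, title=name)
    # hide any unused cells in the grid
    for ax in flat_axs[len(names):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axs


# hypothetical toy data, not the gist from the question
toy = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "M", "M"],
    "id": [1, 2, 3, 4],
})
fig, axs = plot_value_counts(toy, leave_out="id")
```

Flattening the axes array means the loop never has to translate a running index into a (row, column) pair, and skipping the excluded column before building the grid avoids leaving a gap where it would have been.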

EDIT: If you have memory issues (it doesn't seem to finish running), first try without subplots and show each plot on its own:

for idx, name in enumerate(df.columns):
    if name == leave_out:
        continue
    else:
        df[name].value_counts().plot(kind="bar")
        plt.show()

3 Comments

it still runs indefinitely
The method works, although clearly it doesn't like 100 subplots. Try passing a single axis / figure with plt.show() (see edit)
I have found the reason why it ran out of memory even though there are just 6 columns. 2 of them are numeric values which aren't binned, resulting in an extremely overloaded plot. Thank you!
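
Following up on the last comment: an unbinned numeric column can have one bar per distinct value, which is what overloads the plot. A possible sketch, assuming equal-width bins are acceptable for the data, is to let value_counts group numeric columns through its bins argument before plotting. The helper name binned_value_counts, the threshold of 10 bins, and the sample Series are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def binned_value_counts(series, bins=10):
    """Count values, grouping numeric data into `bins` equal-width intervals."""
    if is_numeric_dtype(series) and series.nunique() > bins:
        # bins= makes value_counts group the values into intervals
        return series.value_counts(bins=bins, sort=False)
    # non-numeric or low-cardinality data is counted value by value
    return series.value_counts()


# hypothetical numeric column with 1000 distinct values
prices = pd.Series(range(1000), name="price")
counts = binned_value_counts(prices, bins=10)
```

The result can be plotted the same way as before, for example with counts.plot(kind="bar"), and yields at most one bar per bin instead of one per distinct value.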

Here is a function that I wrote for my project to plot all columns in a pandas dataframe. It generates a grid with 4 columns and as many rows as needed, and plots every column:

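import numpy as np
import matplotlib.pyplot as plt

def plotAllFeatures(dfData):
    plt.figure(1, figsize=(20,50))
    # plt.subplot needs an integer row count, so cast the result of np.ceil
    nrows = int(np.ceil(len(dfData.columns)/4))
    pos=1
    for feature in dfData.columns:
        plt.subplot(nrows,4,pos)
        dfData[feature].plot(title=feature)
        pos=pos+1
    plt.show()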
