I'm trying to plot a histogram from a dataframe column, but I can't seem to convert the data into an object type that matplotlib can handle. Here are some failed attempts. How do I fix this?

And more generally, how do you typically recover from errors like these?

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

filter(lambda v: v > 0, df['foo_col']).hist(bins=100)

---> 10 filter(lambda v: v > 0, df['foo_col']).hist(bins=100)
AttributeError: 'filter' object has no attribute 'hist'

hist(filter(lambda v: v > 0, df['foo_col']), bins=100)

---> 10 hist(filter(lambda v: v > 0, df['foo_col']), bins=100)
TypeError: 'Series' object is not callable

  • I'm confused about what you're trying to do. You want to plot a histogram for all values > 0? Commented May 26, 2018 at 7:28

1 Answer

The builtin filter returns a lazy iterator, not a pandas object, so its result has no hist method. IIUC, you just want to filter your dataframe to plot a histogram of values > 0. Pandas has its own syntax for that:

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

data = np.random.randint(-50, 1000, 10000)

df = pd.DataFrame({'some_data': data})

df[df['some_data'] > 0].hist(bins=100)
plt.show()

Note that this will usually run much faster than filtering with Python builtins (it doesn't make much difference in my trivial example, but it will with bigger datasets). It's worth using pandas methods on dataframes wherever possible because, in many cases, the calculation is vectorized and runs in highly optimised compiled code rather than in a Python-level loop.
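As for the more general "how do I salvage this" part, here is a minimal sketch of both routes, assuming a column named foo_col as in the question (the random data, the seed, and the variable names positive and salvaged are made up for illustration). The first route keeps everything in pandas; the second materializes the filter iterator into a list so that matplotlib can consume it. Separately, the second error in the question ('Series' object is not callable) suggests that the name hist had been rebound to a Series earlier in the session, so calling plt.hist or ax.hist explicitly is probably the safer habit.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove to get windows
import matplotlib.pyplot as plt

# Hypothetical data standing in for the question's dataframe
rng = np.random.default_rng(0)
df = pd.DataFrame({"foo_col": rng.integers(-50, 1000, size=10_000)})

# Route 1 (vectorized): a boolean mask returns a Series, which has .hist()
positive = df.loc[df["foo_col"] > 0, "foo_col"]
ax1 = positive.hist(bins=100)

# Route 2 (salvaging the builtin): filter() is lazy, so turn it into a list
# and pass that to matplotlib's own hist
salvaged = list(filter(lambda v: v > 0, df["foo_col"]))
fig, ax2 = plt.subplots()
counts, edges, _ = ax2.hist(salvaged, bins=100)

plt.show()  # has no visible effect under the Agg backend
```

Both routes select the same values; the mask version avoids a Python-level loop over every element, which is where the speed difference on larger data comes from.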
