I have code that:

  1. reads the data from a CSV file,
  2. replaces spaces in the column names with underscores, and
  3. replaces NaN with None.
    def read_file_and_transform(local_file_path):
        """ """
        try:
            data_df = pd.read_csv(local_file_path)

            data_df.columns = data_df.columns.str.replace(' ', '_')

            clean_df = data_df.where((pd.notnull(data_df)), None)

        except Exception as e:
            logger.error("Failure in read file and transform method {}".format(e))
            raise e

I am writing a unit test for these three lines and am getting an error on the third one (the where call).

Here is my test case:

class MockPandas:
    def __init__(self):
        pass

    def read_csv(self, *args, **kwargs):
        """ """
        return pd.DataFrame([{"a b": np.nan, "b": 2.33}])

    def notnull(self, *args, **kwargs):
        """ """
        return pd.DataFrame([{"a_b": "None", "b": 2.33}])


    @patch("path", MockPandas())
    def test_read_file_and_transform(self):
        """ """
        result = self.obj.read_file_and_transform("/file_path")
        assert result == [{"a": None, "b": 2.33}]

The error I am getting is:

ValueError: Boolean array expected for the condition, not object

Can anyone help me here? Thanks

  • I think notnull should return a frame of booleans: for every cell in the original data frame, it indicates whether the value is not NaN (True/False). See the official documentation: pandas.pydata.org/docs/reference/api/pandas.notnull.html and an example of the output: geeksforgeeks.org/python-pandas-dataframe-notnull. So you should change the values returned by your mocked version of notnull. Let me know if this works so I can write a proper answer. Commented Jul 15, 2022 at 19:49
  • Thanks, @PeterK for the inputs here. I understood what you are trying to say and it fixed my issue. I have upvoted your answer. Commented Jul 18, 2022 at 10:52
  • Thanks, @aman-raheja, I'd appreciate it if you can approve my extended answer for better visibility to future readers of this question. Commented Jul 19, 2022 at 1:34

1 Answer

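pandas.notnull returns a new data frame with the same shape as the original, where each cell holds a boolean indicating whether the corresponding value is not missing (NaN or None).

DataFrame.where expects exactly this kind of boolean mask as its condition. Your mocked notnull returns a frame containing the string "None" and a float, so where raises the ValueError you are seeing. Therefore, you should change the return value of your mocked notnull to a boolean frame that matches the expected return value.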

For example, if the original df is:

       A       B     C     D
0  Sandy     NaN  20.0  14.8
1   alex  olivia  20.0   3.0
2  brook  terica   7.0   NaN
3  kelly     dan   NaN   2.3
4    NaN  amanda   8.0   6.0

Then df.notnull() would return:

       A      B      C      D
0   True  False   True   True
1   True   True   True   True
2   True   True   True  False
3   True   True  False   True
4  False   True   True   True
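
As a rough illustration of the point above, the following sketch builds a small frame (a shortened, made-up variant of the example data), checks that the real notnull mask is boolean, and then passes a non-boolean frame to where to reproduce the error. The name bad_mask is hypothetical and only imitates what the mocked notnull in the question returns.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative variant of the example frame above.
df = pd.DataFrame({"A": ["Sandy", "alex", np.nan], "C": [20.0, np.nan, 7.0]})

# The real notnull returns a mask with the same shape and bool columns.
mask = df.notnull()
print(mask)

# Hypothetical stand-in for the mocked notnull: strings and floats, not booleans.
bad_mask = pd.DataFrame({"A": ["None", "x", "y"], "C": [2.33, 1.0, 0.0]})

try:
    df.where(bad_mask, None)
    error_message = None
except ValueError as exc:
    # where rejects a condition whose columns are not boolean.
    error_message = str(exc)

print(error_message)
```

The real mask passes through where without complaint, while the object-typed frame is rejected before any values are replaced.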

The pandas notnull documentation can be found at pandas.pydata.org/docs/reference/api/pandas.notnull.html.
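
One possible way to fix the mock, sketched under the assumption that the test only needs notnull to behave like the real function: let the mocked notnull delegate to pandas itself, so the mask always has the right shape, labels, and dtype. The steps after the class simply replay the three lines from the question against the mock; they are not the asker's actual test.

```python
import numpy as np
import pandas as pd


class MockPandas:
    """Test double for the pandas module, based on the mock in the question."""

    def read_csv(self, *args, **kwargs):
        # Canned frame instead of reading a real file.
        return pd.DataFrame([{"a b": np.nan, "b": 2.33}])

    def notnull(self, obj, *args, **kwargs):
        # Delegate to the real function so the result is a boolean mask.
        return pd.notnull(obj)


mock_pd = MockPandas()

# Replay the three steps from the question using the mock.
data_df = mock_pd.read_csv("/file_path")
data_df.columns = data_df.columns.str.replace(" ", "_")
mask = mock_pd.notnull(data_df)
clean_df = data_df.where(mask, None)  # no ValueError with a boolean mask

print(mask)
print(clean_df)
```

Returning a hard-coded frame such as pd.DataFrame([{"a_b": False, "b": True}]) from notnull should work as well, as long as every value is a boolean and the column names match the renamed columns. Whether the missing cell then shows up as None or NaN may depend on the pandas version and the column dtype, so it is worth checking before asserting on it.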
