Testing pandas dataframe with unittest framework

Question

I'm trying to write unit tests for CSV files using Python's unittest framework. I want to test cases such as whether the column names match, whether the values in columns match, etc. I know there are more convenient libraries for this, like datatest and pytest, but I can only use unittest in my project.

I guess I'm using the wrong unittest.TestCase methods and passing the data in the wrong format. Please advise on a better way to do it.

db.csv example:

  TIMESTAMP   TYPE   VALUE YEAR  FILE   SHEET
0 02-09-2018  Index   45   2018  tq.xls A01
1 13-05-2018  Index   21   2018  tq.xls A01
2 22-01-2019  Index   9    2019  aq.xls B02
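The column names used in the accepted answer (" TYPE", " YEAR", ...) carry leading spaces, which suggests the raw file has a space after each comma. Assuming that layout (the exact file contents are not shown here), a minimal sketch of reading it with `skipinitialspace=True` so the headers and values come back without the padding. The inline `RAW` text is a hypothetical stand-in for db.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for db.csv, assuming a space follows each comma
RAW = """TIMESTAMP, TYPE, VALUE, YEAR, FILE, SHEET
02-09-2018, Index, 45, 2018, tq.xls, A01
13-05-2018, Index, 21, 2018, tq.xls, A01
22-01-2019, Index, 9, 2019, aq.xls, B02
"""

padded = pd.read_csv(io.StringIO(RAW))
clean = pd.read_csv(io.StringIO(RAW), skipinitialspace=True)

print(list(padded.columns))  # headers keep the space that followed each comma
print(list(clean.columns))   # skipinitialspace drops it
```

With the second variant the tests can refer to "YEAR" instead of " YEAR".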

Here is code example:

import pandas as pd
import unittest

class DFTests(unittest.TestCase):

    def setUp(self):
        test_file_name = 'db.csv'
        try:
            data = pd.read_csv(test_file_name,
                sep = ',',
                header = 0)
        except IOError:
            print('cannot open file')
        self.fixture = data

    # Check column names
    def test_columns(self):
        self.assertEqual(
            self.fixture.columns,
            {'TIMESTAMP', 'TYPE', 'VALUE','YEAR','FILE','SHEET'},
        )

    # Check timestamp format
    def test_timestamp(self):
        self.assertRaisesRegex(
            self.fixture['TIMESTAMP'],
            r'\d{2}-\d{2}-\d{4}'
        )

    # Check year values
    def test_year_values(self):
        self.assertIn(
            self.fixture['YEAR'],
            {2018, 2019, 2020},
        )


if __name__ == '__main__':
    unittest.main()

Errors:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
TypeError: assertRaisesRegex() arg 1 must be an exception type or tuple of exception types
TypeError: 'Series' objects are mutable, thus they cannot be hashed

Any help is appreciated.

In order to test a few alternatives, it would be good to have a representative snippet of the .csv you're dealing with. – anddt, Oct 7, 2020 at 10:28

Accepted Answer · Score: 6

You can use a list comprehension to run an assertion on each DataFrame row. Try something like this:

import pandas as pd
import unittest

# Header names exactly as read from the file; the leading spaces presumably
# come from a space after each comma in the CSV header row.
colnames = ["TIMESTAMP", " TYPE", " VALUE", " YEAR", " FILE", " SHEET"]
years = {2018, 2019, 2020}


class DfTests(unittest.TestCase):
    def setUp(self):
        try:
            self.fixture = pd.read_csv("db.csv", sep=",")
        except IOError as e:
            # Fail here instead of leaving self.fixture undefined
            self.fail(f"cannot open file: {e}")

    def test_colnames(self):
        self.assertListEqual(list(self.fixture.columns), colnames)

    def test_timestamp_format(self):
        ts = self.fixture["TIMESTAMP"]
        # You need to check every row in the dataframe
        [self.assertRegex(i, r"\d{2}-\d{2}-\d{4}") for i in ts]

    def test_years(self):
        df_years = self.fixture[" YEAR"]
        # all() accepts a generator, so no intermediate list is needed
        self.assertTrue(all(i in years for i in df_years))


if __name__ == "__main__":
    unittest.main()
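A variation on the per-row check that uses unittest's `subTest` context manager instead of a list comprehension, so every failing row is reported separately rather than the test stopping at the first mismatch. The pattern is also anchored with `^...$`, because assertRegex only requires a match somewhere in the string. This sketch builds the fixture from an inline string (a hypothetical stand-in for the CSV) so it runs without the file:

```python
import io
import unittest

import pandas as pd

# Hypothetical stand-in for the CSV file
RAW = """TIMESTAMP,TYPE,VALUE,YEAR,FILE,SHEET
02-09-2018,Index,45,2018,tq.xls,A01
13-05-2018,Index,21,2018,tq.xls,A01
22-01-2019,Index,9,2019,aq.xls,B02
"""


class TimestampRowTests(unittest.TestCase):
    def setUp(self):
        self.fixture = pd.read_csv(io.StringIO(RAW))

    def test_timestamp_format(self):
        for row, value in self.fixture["TIMESTAMP"].items():
            # Each row gets its own sub-result, labelled with its index
            with self.subTest(row=row):
                self.assertRegex(value, r"^\d{2}-\d{2}-\d{4}$")
```

Saved in a file, it runs with `python -m unittest <module>` like any other TestCase.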

Also, bear in mind that pandas has some built-in testing functions in pandas.testing, such as assert_frame_equal and assert_series_equal. That said, for unit-testing dataframes (and for data validation in general), great_expectations would probably be the best tool for the job.
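As a sketch of those built-in helpers: `pandas.testing.assert_series_equal` and `assert_frame_equal` raise an `AssertionError` describing the mismatch, so they can be called directly inside a `unittest.TestCase`. The expected values below are hand-written from the sample rows in the question, and the inline `RAW` text is a hypothetical stand-in for the CSV:

```python
import io
import unittest

import pandas as pd
from pandas.testing import assert_frame_equal, assert_series_equal

# Hypothetical stand-in for the CSV file
RAW = """TIMESTAMP,TYPE,VALUE,YEAR,FILE,SHEET
02-09-2018,Index,45,2018,tq.xls,A01
13-05-2018,Index,21,2018,tq.xls,A01
22-01-2019,Index,9,2019,aq.xls,B02
"""


class FrameEqualityTests(unittest.TestCase):
    def setUp(self):
        self.fixture = pd.read_csv(io.StringIO(RAW))

    def test_year_column(self):
        # Compares values, dtype, index and name in one call
        expected = pd.Series([2018, 2018, 2019], name="YEAR")
        assert_series_equal(self.fixture["YEAR"], expected)

    def test_whole_frame(self):
        expected = pd.DataFrame(
            {
                "TIMESTAMP": ["02-09-2018", "13-05-2018", "22-01-2019"],
                "TYPE": ["Index", "Index", "Index"],
                "VALUE": [45, 21, 9],
                "YEAR": [2018, 2018, 2019],
                "FILE": ["tq.xls", "tq.xls", "aq.xls"],
                "SHEET": ["A01", "A01", "B02"],
            }
        )
        # Also checks column names and their order
        assert_frame_equal(self.fixture, expected)
```

Comparing whole frames suits small, fixed fixtures; for large files, column-level checks like the ones in the answer scale better.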

answered Oct 7, 2020 at 11:12 by anddt · edited Nov 22, 2023 at 13:07 by iff_or


3 Comments

干猕猴桃 Over a year ago

Any suggestions? I get an error with test_timestamp_format: TypeError: expected string or bytes-like object

anddt Over a year ago

Strange, it worked on my machine with the data you provided. Try passing str(i) instead of i as the first argument of assertRegex in that list comprehension.
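Along the lines of the `str(i)` suggestion, a vectorized sketch: convert the column with `astype(str)` and use `Series.str.fullmatch` (available since pandas 1.1), which also rejects values with extra characters around the date. The helper name `bad_timestamps` is hypothetical:

```python
import pandas as pd

PATTERN = r"\d{2}-\d{2}-\d{4}"


def bad_timestamps(ts: pd.Series) -> list:
    """Hypothetical helper: return the values that are not exactly dd-dd-dddd."""
    as_text = ts.astype(str)  # turn non-string entries (e.g. ints) into text first
    mask = as_text.str.fullmatch(PATTERN)
    return as_text[~mask].tolist()


ts = pd.Series(["02-09-2018", "13-05-2018", "x22-01-2019", 20180902])
print(bad_timestamps(ts))
```

In a test, `self.assertEqual(bad_timestamps(self.fixture["TIMESTAMP"]), [])` would then list every offending value in the failure message.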

anddt Over a year ago

Also, an error like that might signal that not all entries in TIMESTAMP are strings. Consider adding a test to check that as well.
