I'm trying to write unit tests for CSV files using Python's unittest framework. I want to test things such as whether the column names match, whether the values in each column match, and so on. I know there are more convenient libraries for this, like datatest and pytest, but I can only use unittest in my project.

I guess I'm using the wrong unittest.TestCase methods and passing the data in the wrong format. Please advise on a better way to do this.

db.csv example:

    TIMESTAMP   TYPE  VALUE  YEAR    FILE SHEET
0  02-09-2018  Index     45  2018  tq.xls   A01
1  13-05-2018  Index     21  2018  tq.xls   A01
2  22-01-2019  Index      9  2019  aq.xls   B02

Here is a code example:

import pandas as pd
import unittest

class DFTests(unittest.TestCase):

    def setUp(self):
        test_file_name =  'db.csv'
        try:
            data = pd.read_csv(test_file_name,
                sep = ',',
                header = 0)
        except IOError:
            print('cannot open file')
        self.fixture = data

    #Check column names
    def test_columns(self):
        self.assertEqual(
            self.fixture.columns,
            {'TIMESTAMP', 'TYPE', 'VALUE','YEAR','FILE','SHEET'},
        )

    #Check timestamp format
    def test_timestamp(self):
        self.assertRaisesRegex(
            self.fixture['TIMESTAMP'],
            r'\d{2}-\d{2}-\d{4}'
        )

    #Check year values
    def test_year_values(self):
        self.assertIn(
            self.fixture['YEAR'],
            {2018, 2019, 2020},
        )


if __name__ == '__main__':
    unittest.main()

Errors:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
TypeError: assertRaisesRegex() arg 1 must be an exception type or tuple of exception types
TypeError: 'Series' objects are mutable, thus they cannot be hashed

Any help is appreciated.

  • In order to test a few alternatives, it would be good to have a representative snippet of the .csv you're dealing with. Commented Oct 7, 2020 at 10:28
  • @anddt Sure, I added an example. Commented Oct 7, 2020 at 10:35

1 Answer

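The unittest assertions expect plain Python values, so passing a whole Series or Index to them produces the errors you see. Note also that assertRaisesRegex checks exception messages; assertRegex is the method that matches a string against a pattern. You can use a list comprehension to assert over each row of the dataframe. Try something like this: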

import pandas as pd
import unittest

# Expected header and allowed years; names match the db.csv sample in the question
colnames = ["TIMESTAMP", "TYPE", "VALUE", "YEAR", "FILE", "SHEET"]
years = {2018, 2019, 2020}


class DfTests(unittest.TestCase):
    def setUp(self):
        # No try/except: if the file cannot be read, the exception makes
        # every test error out here instead of failing later on a missing fixture
        self.fixture = pd.read_csv("db.csv", sep=",")

    def test_colnames(self):
        # Compare plain lists rather than passing the Index object itself
        self.assertListEqual(list(self.fixture.columns), colnames)

    def test_timestamp_format(self):
        ts = self.fixture["TIMESTAMP"]
        # You need to check every row in the dataframe
        [self.assertRegex(i, r"\d{2}-\d{2}-\d{4}") for i in ts]

    def test_years(self):
        df_years = self.fixture["YEAR"]
        self.assertTrue(all([i in years for i in df_years]))


if __name__ == "__main__":
    unittest.main()
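
If you want a failure report for every bad row rather than only the first one, unittest's subTest context manager may help. The following is a minimal sketch, not a definitive implementation: the inline CSV_TEXT is a hypothetical stand-in for db.csv built from the sample in the question, the class name TimestampTests is made up for illustration, and the anchored regex is my own tightening of the pattern.

```python
import io
import unittest

import pandas as pd

# Hypothetical stand-in for db.csv, copied from the sample rows in the question
CSV_TEXT = """TIMESTAMP,TYPE,VALUE,YEAR,FILE,SHEET
02-09-2018,Index,45,2018,tq.xls,A01
13-05-2018,Index,21,2018,tq.xls,A01
22-01-2019,Index,9,2019,aq.xls,B02
"""


class TimestampTests(unittest.TestCase):
    def setUp(self):
        # Read from an in-memory buffer so the sketch needs no file on disk
        self.fixture = pd.read_csv(io.StringIO(CSV_TEXT))

    def test_timestamp_format(self):
        for row, ts in self.fixture["TIMESTAMP"].items():
            # Each subTest is reported separately, so one bad row
            # does not hide the others
            with self.subTest(row=row, value=ts):
                self.assertRegex(ts, r"^\d{2}-\d{2}-\d{4}$")


# Run the suite explicitly so the sketch also works when pasted into a session
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TimestampTests)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

The ^ and $ anchors are there because assertRegex searches anywhere in the string, so an unanchored pattern would also accept values with extra characters around the date.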

Also, bear in mind that pandas has some built-in testing functions in the pandas.testing module, such as assert_frame_equal and assert_series_equal. More generally, for unit-testing dataframes (and data validation in general), great_expectations is probably the best tool for the job.
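
As a rough illustration of the pandas testing helpers inside a unittest.TestCase, here is a sketch that assumes the same sample rows as the question. CSV_TEXT and the class name PandasTestingExample are hypothetical; assert_index_equal and assert_series_equal come from pandas.testing and raise AssertionError on a mismatch, which unittest reports as an ordinary failure.

```python
import io
import unittest

import pandas as pd
import pandas.testing as pdt

# Hypothetical stand-in for db.csv, copied from the sample rows in the question
CSV_TEXT = """TIMESTAMP,TYPE,VALUE,YEAR,FILE,SHEET
02-09-2018,Index,45,2018,tq.xls,A01
13-05-2018,Index,21,2018,tq.xls,A01
22-01-2019,Index,9,2019,aq.xls,B02
"""


class PandasTestingExample(unittest.TestCase):
    def setUp(self):
        self.fixture = pd.read_csv(io.StringIO(CSV_TEXT))

    def test_columns(self):
        # Compares the header as an Index, including the order of the names
        expected = pd.Index(["TIMESTAMP", "TYPE", "VALUE", "YEAR", "FILE", "SHEET"])
        pdt.assert_index_equal(self.fixture.columns, expected)

    def test_year_column(self):
        # Compares values, dtype, index and name of the whole column at once
        expected = pd.Series([2018, 2018, 2019], name="YEAR")
        pdt.assert_series_equal(self.fixture["YEAR"], expected)


# Run the suite explicitly so the sketch also works when pasted into a session
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PandasTestingExample)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

These helpers suit cases where you know the exact expected content; for membership checks such as "every year is in an allowed set", a plain assertion over the values is simpler.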

3 Comments

Maybe you can suggest something: I get an error in test_timestamp_format: TypeError: expected string or bytes-like object
Strange, it worked on my machine with the data you provided. Try using str(i) in place of the first i in that list comprehension.
Also, an error like that might signal that not all entries in TIMESTAMP are strings. Consider adding a test to check that as well.
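
A test for that could look like the sketch below. The failing file isn't shown, so the cause is an assumption: one common source of non-string entries is an empty TIMESTAMP cell, which pandas reads as NaN (a float). The helper non_string_rows, the class name, and both inline CSV samples are hypothetical and exist only for illustration.

```python
import io
import unittest

import pandas as pd

# Hypothetical samples: one clean, one with an empty TIMESTAMP cell
CLEAN_CSV = """TIMESTAMP,YEAR
02-09-2018,2018
13-05-2018,2018
"""
DIRTY_CSV = """TIMESTAMP,YEAR
02-09-2018,2018
,2018
"""


def non_string_rows(series):
    """Return (row label, value) pairs whose value is not a str."""
    return [(row, value) for row, value in series.items() if not isinstance(value, str)]


class TimestampTypeTests(unittest.TestCase):
    def setUp(self):
        self.fixture = pd.read_csv(io.StringIO(CLEAN_CSV))

    def test_timestamp_values_are_strings(self):
        # An empty list means every entry can safely go to assertRegex
        self.assertEqual(non_string_rows(self.fixture["TIMESTAMP"]), [])


# The empty cell in DIRTY_CSV is read as NaN, so the helper flags that row
dirty = pd.read_csv(io.StringIO(DIRTY_CSV))
bad_rows = non_string_rows(dirty["TIMESTAMP"])

# Run the suite explicitly so the sketch also works when pasted into a session
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TimestampTypeTests)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Running this check before the format test points at the offending rows directly, whereas wrapping values in str() would turn a missing value into the text "nan" and only fail later on the regex.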