8
\$\begingroup\$

I am writing tests for legacy software which has a dependency on CGI scripts on a remote machine.

Mocking the dependency would entail modifying the code under test, which is a thing I cannot do at the moment. For this reason, I decided to make a replica of the remote machine for the code to run against (I have made a copy of the CGI scripts on my development machine and provided a way to set up a local http server within my tests).

Since I don't have much experience with CGI, I have written some "meta-tests" to ensure the local server is working correctly.

Test folder structure is:

root
|
`- src/
| `-- #code to test...
|
`-functional_tests/
  |-- __init__.py
  |-- tests.py
  `-- www
      `-- cgi-bin
          |-- # legacy cgi scripts...
          `-- hello_world.sh
#tests.py

from contextlib import contextmanager
from http.client import HTTPResponse
from http.server import CGIHTTPRequestHandler, HTTPServer
import threading
from unittest import TestCase

import urllib.request
from urllib.error import URLError


# custom handler for my dir structure
class _TestHandler(CGIHTTPRequestHandler):

    def __init__(self, request, client_address, server):

        super().__init__(
            request, client_address, server, directory="functional_tests/www"
        )
        self.cgi_directories = ["/cgi-bin"]

# utility function invoked by tests which involve cgi
@contextmanager
def _setup_cgi_http_server():

    # runs a stoppable http.server.HTTPServer instance (see https://stackoverflow.com/a/19211760/19831748)
    def _run_server(httpd: HTTPServer, stop_event: threading.Event):

            # prevents handle_request() from hanging 
            httpd.timeout = 0.25

            print("http server started...")

            while not stop_event.is_set():
                httpd.handle_request()

            print("...http server stopped")

    stop_event = threading.Event()

    with HTTPServer(("", 8000), _TestHandler) as httpd:

        server_thread = threading.Thread(target=_run_server, args=[httpd, stop_event])
        server_thread.start()

        yield httpd

        stop_event.set()


class TestCGIHTTPServer(TestCase):

    def test_can_invoke_cgi(self):

        TEST_CGI_URL = "http://localhost:8000/cgi-bin/hello_world.sh"

        self.enterContext(_setup_cgi_http_server())

        with urllib.request.urlopen(TEST_CGI_URL) as response:

            assert isinstance(response, HTTPResponse)

            content = response.read().decode("utf-8")

            self.assertIn("hello world", content.lower())

    def test_server_is_closed_after_exit(self):

        with _setup_cgi_http_server() as _:
            pass
        
        with self.assertRaises((URLError, ConnectionResetError)):
            urllib.request.urlopen("http://localhost:8000")
#!/bin/bash

# hello_world.sh

echo "Content-Type: text/plain"
echo ""
echo "Hello world"
\$\endgroup\$
2
  • \$\begingroup\$ I am trying to understand this. You have production code on another machine that includes CGI scripts running under some webserver and you are trying to ensure that those scripts can be executed in your test environment. So your actual test suite would include test cases for each of these scripts. Do I have this right? \$\endgroup\$ Commented Nov 6 at 11:21
  • \$\begingroup\$ @Booboo The actual test suite is meant to test the legacy software itself. Before the software tries to access the CGI URLs, the (configurable) CGI HOST is set to localhost and _setup_cgi_http_server() is invoked during setUp() \$\endgroup\$ Commented Nov 6 at 20:37

3 Answers 3

5
\$\begingroup\$

Mostly this looks fine. With the caveat that CGI is very legacy, and frowned upon in the modern python ecosystem, at least for new code.

I am glad you chose to include a dirt simple "hello world".

bind()

    with HTTPServer(("", 8000), ...

I think that lets random hosts probe your TCP port. Prefer to specify "localhost" there.

There may be room to DRY this up a little bit, given the redundancy with TEST_CGI_URL.

And in test_server_is_closed_after_exit(), the short URL we see isn't a bad thing, but we could as easily specify TEST_CGI_URL again. I mean, at some point you may need to change the URL because an unrelated daemon has allocated port 8000, and we ideally would change it in just one place.

extra assert

        assert isinstance(response, HTTPResponse)

Maybe that was helpful to you during initial debugging? But I can't imagine it reveals anything interesting about the code you actually want to test.

Consider deleting it, or testing that response.status indicates Success.

extra downcase

        self.assertIn("hello world", content.lower())

The call to .lower() is surprising. Just verify that "Hello world" came back.

boilerplate text

This is simple and works great, but is not very convenient for copy-n-pasting revised headers or text:

echo "Content-Type: text/plain"
echo ""
echo "Hello world"

Consider rephrasing as a here-doc.

cat << EOF
Content-Type: text/plain

Hello world
EOF

In python we would tend to use a triple """ to accomplish the same thing.

\$\endgroup\$
2
  • \$\begingroup\$ Thank you for the review. The extra assert was added to aid my type checker/code completion. For some reason, the urllib.request.urlopen() has a return type of Any | None (python 3.11.2) \$\endgroup\$ Commented Nov 7 at 0:29
  • \$\begingroup\$ Aha, got it. Yeah, I have peppered my own code with such ugliness, to get it to lint cleanly. So in that case, the simplest thing you can get away with is assert response, which rules out the None case. \$\endgroup\$ Commented Nov 7 at 0:31
4
\$\begingroup\$

Binding to all interfaces

As mentioned by @J_H:

bind()

with HTTPServer(("", 8000), ...

I think that lets random hosts probe your TCP port. Prefer to specify "localhost" there.

Indeed, leaving the address empty has the following behavior:

By default, the server binds itself to all interfaces

Source: Python docs. You can use localhost or 127.0.0.1. Using localhost might fall back on an IPv6 address, but that should be OK as long as your IPv6 stack is working properly.

Port already bound

He made another valid point: port 8000 may already be in use on your machine at the time the test is being run. It's best to let Python choose a random port that is available. The secret trick is to use port 0 (zero). Then to retrieve the assigned port you just need this:

httpd.server_port

This is what I do in my own tests.

I would make it a fixture in your conftest.py. In fact, the whole server instance should be a fixture. Anyway, port and address should not be hardcoded more than once in your code.

Randomize

The test should already be sufficient, but as J_H said it's a good idea to always check the status code as well. Also, to make the tests even more realistic, you could randomize the expected response by using fixtures. The idea is to generate a random string every time, via a dedicated function (fixture with scope "session"), then test the existence of that string across your tests. The reasoning is that hello world is widely used in code and not very random. For example, some API test endpoints might return that kind of response by design.

I am more familiar with pytest than unittest, but I understand that fixtures exist in unittest too and should work in a similar fashion.

Tightening

The test can be more strict. First, by being case-sensitive. You are using assertIn, so the response only has to contain hello world, but could carry unrelated text. You could trim the response, and then look for the exact string you are expecting.

\$\endgroup\$
3
\$\begingroup\$

Layout

The test_can_invoke_cgi function has too much wasted vertical whitespace. There is no need for a blank line between every line of code. This is fine:

def test_can_invoke_cgi(self):
    TEST_CGI_URL = "http://localhost:8000/cgi-bin/hello_world.sh"
    self.enterContext(_setup_cgi_http_server())
    with urllib.request.urlopen(TEST_CGI_URL) as response:
        assert isinstance(response, HTTPResponse)
        content = response.read().decode("utf-8")
        self.assertIn("hello world", content.lower())

Naming

The class name TestCGIHTTPServer is hard to read. I realize you are just following the pattern of the imported function names, but it is hard to understand with all the capital letters bunched together. It might be better to go against convention and not use all caps for the acronyms. For example, TestCgiHttpServer uses a form of "camel case", like the PEP 8 style guide recommends.

Comments

This is a helpful comment:

# prevents handle_request() from hanging 
httpd.timeout = 0.25

You could elaborate on it by specifying the time unit as well as describing why you chose that magic number.

PEP 8 also recommends docstrings for classes and function. This comment:

# custom handler for my dir structure
class _TestHandler(CGIHTTPRequestHandler):

can be converted to a docstring:

class _TestHandler(CGIHTTPRequestHandler):
    """Custom handler for my directory structure"""
\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.