10
\$\begingroup\$

This is a follow-up question for SIFT Keypoint Detection for Image in C++. With the detected SIFT keypoints of images, image stitching is possible to perform. In this post, the first step is to generate SIFT keypoint descriptor (with get_keypoint_descriptor template function). The second step is to match features using the robust cross-checking method (with find_robust_matches function). The third step is to perform homography estimation with RANSAC (RANdom SAmple Consensus) to filter outliers. The final step is to perform image warping and create the final panorama.

The experimental implementation

  • get_keypoint_descriptor template function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        namespace SIFT_impl {
            //  get_keypoint_descriptor template function implementation
            template<typename ElementT, class FloatingType = double>
            requires((std::floating_point<ElementT> || std::integral<ElementT>))
            constexpr static auto get_keypoint_descriptor(
                const Image<ElementT>& input,
                const Point<2>& keypoint_location,
                const std::size_t block_size = 8
            )
            {
                //  Calculate Gradient Magnitudes and Orientations for 16x16 neighborhood
                Image<std::tuple<FloatingType, FloatingType>> pixel_orientations(block_size * 2, block_size * 2);
                const auto center_x = keypoint_location.p[0];
                const auto center_y = keypoint_location.p[1];
                for (std::size_t y = center_y - block_size; y < center_y + block_size; ++y)
                {
                    for (std::size_t x = center_x - block_size; x < center_x + block_size; ++x)
                    {
                        if (x >= input.getWidth() || y >= input.getHeight())
                        {
                            continue;
                        }
                        pixel_orientations.at(x - center_x + block_size, y - center_y + block_size) =
                            compute_each_pixel_orientation<ElementT, FloatingType>(subimage(input, 3, 3, x, y));
                    }
                }
    
                //  Group Pixels into 4x4 Subregions
                Image<std::vector<double>> subregion_orientations(4, 4);
                for (std::size_t subregion_y = 0; subregion_y < 4; ++subregion_y)
                {
                    for (std::size_t subregion_x = 0; subregion_x < 4; ++subregion_x)
                    {
                        std::vector<double> raw_histogram;
                        raw_histogram.resize(8);
                        for (std::size_t y = 0; y < 4; ++y)
                        {
                            for (std::size_t x = 0; x < 4; ++x)
                            {
                                auto each_pixel_orientation =
                                    pixel_orientations.at(subregion_x * 4 + x, subregion_y * 4 + y);
                                std::size_t bin_index = static_cast<std::size_t>(std::get<1>(each_pixel_orientation) / 45.0);
                                bin_index = bin_index == 8 ? 7 : bin_index;
                                raw_histogram[bin_index] += std::get<0>(each_pixel_orientation);
                            }
                        }
                        subregion_orientations.at(subregion_x, subregion_y) = raw_histogram;
                    }
                }
                std::vector<double> feature_vector;
                feature_vector.reserve(128);
                for (std::size_t subregion_y = 0; subregion_y < subregion_orientations.getHeight(); ++subregion_y)
                {
                    for (std::size_t subregion_x = 0; subregion_x < subregion_orientations.getWidth(); ++subregion_x)
                    {
                        #ifdef USE_APPEND_RANGE
                        feature_vector.append_range(subregion_orientations.at(subregion_x, subregion_y));
                        #else
                        for (auto&& element : subregion_orientations.at(subregion_x, subregion_y))
                        {
                            feature_vector.emplace_back(element);
                        }
                        #endif
                    }
                }
                return feature_vector;
            }
        }
    }
    
  • compute_each_pixel_orientation template function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        namespace SIFT_impl {
            //  compute_each_pixel_orientation template function implementation
            /*  input is 3 * 3 image, calculate the gradient magnitude
            *    M(1, 1) = ((input(2, 1) - input(0, 1))^(2) + (input(1, 2) - input(1, 0))^(2))^(1/2)
            *   orientation
            *    θ(1, 1) = tan^(-1)((input(1, 2) - input(1, 0)) / (input(2, 1) - input(0, 1)))
            *   the value range of orientation is 0° ~ 360°
            */
            template<typename ElementT, class FloatingType = double>
            constexpr static auto compute_each_pixel_orientation(const Image<ElementT>& input)
            {
                if (input.getDimensionality() != 2)
                {
                    throw std::runtime_error("Unsupported dimension!");
                }
                if (input.getWidth() != 3 || input.getHeight() != 3)
                    throw std::runtime_error("Input size error!");
                FloatingType gradient_magnitude =
                    std::sqrt(
                        std::pow((static_cast<FloatingType>(input.at_without_boundary_check(static_cast<std::size_t>(2), static_cast<std::size_t>(1))) - static_cast<FloatingType>(input.at_without_boundary_check(static_cast<std::size_t>(0), static_cast<std::size_t>(1)))), 2.0) +
                        std::pow((static_cast<FloatingType>(input.at_without_boundary_check(static_cast<std::size_t>(1), static_cast<std::size_t>(2))) - static_cast<FloatingType>(input.at_without_boundary_check(static_cast<std::size_t>(1), static_cast<std::size_t>(0)))), 2.0)
                    );
                double orientation = std::atan2(
                        static_cast<FloatingType>(input.at_without_boundary_check(static_cast<std::size_t>(1), static_cast<std::size_t>(2))) - static_cast<FloatingType>(input.at_without_boundary_check(static_cast<std::size_t>(1), static_cast<std::size_t>(0))),
                        static_cast<FloatingType>(input.at_without_boundary_check(static_cast<std::size_t>(2), static_cast<std::size_t>(1))) - static_cast<FloatingType>(input.at_without_boundary_check(static_cast<std::size_t>(0), static_cast<std::size_t>(1)))
                    );
                orientation *= (180.0 / std::numbers::pi_v<FloatingType>);
                orientation += 180;
                return std::make_tuple(gradient_magnitude, orientation);
            }
        }
    }
    
  • find_robust_matches function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        // A type alias for clarity
        using SiftDescriptor = std::vector<double>;
    
        /**
         * find_robust_matches function implementation
         * @brief Refines feature matches using a symmetry/cross-check test for higher precision.
         */
        std::vector<std::pair<std::size_t, std::size_t>> find_robust_matches(
            const std::vector<SiftDescriptor>& descriptors1,
            const std::vector<SiftDescriptor>& descriptors2,
            const double ratio_threshold)
        {
            // Step 1: Match from image 1 to image 2
            auto matches1to2 = find_raw_matches(std::execution::par, descriptors1, descriptors2, ratio_threshold);
    
            // Step 2: Match from image 2 to image 1
            auto matches2to1 = find_raw_matches(std::execution::par, descriptors2, descriptors1, ratio_threshold);
    
            std::vector<std::pair<std::size_t, std::size_t>> robust_matches;
    
            // Step 3: Cross-check. A match is good only if it's found in both directions.
            for(const auto& match1 : matches1to2)
            {
                for(const auto& match2 : matches2to1)
                {
                    if(match1.first == match2.second && match1.second == match2.first)
                    {
                        robust_matches.emplace_back(match1);
                        break; // Found the symmetric match, no need to check further for this one
                    }
                }
            }
    
            std::cout << "Found " << matches1to2.size() << " raw matches (1->2), " 
                      << matches2to1.size() << " raw matches (2->1). Kept " 
                      << robust_matches.size() << " robust matches after cross-checking.\n";
    
            return robust_matches;
        }
    }
    
  • find_raw_matches function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        /**
         * find_raw_matches template function implementation
         * @brief Finds initial ("raw") matching keypoints between two sets of SIFT descriptors using Lowe's ratio test.
         * This is a building block for the more robust cross-checking matcher.
         */
        template<class ExecutionPolicy>
        requires(std::is_execution_policy_v<std::remove_cvref_t<ExecutionPolicy>>)
        std::vector<std::pair<std::size_t, std::size_t>> find_raw_matches(
            ExecutionPolicy&& policy,
            const std::vector<SiftDescriptor>& descriptors1,
            const std::vector<SiftDescriptor>& descriptors2,
            const double ratio_threshold)
        {
            if (descriptors1.empty() || descriptors2.empty())
            {
                return {};
            }
    
            std::vector<std::pair<std::size_t, std::size_t>> matches;
            std::vector<std::size_t> indices(descriptors1.size());
            std::ranges::iota(indices, 0);
    
            std::mutex mtx; // Mutex to protect access to the shared 'matches' vector
    
            std::for_each(std::forward<ExecutionPolicy>(policy), std::ranges::begin(indices), std::ranges::end(indices),
                [&](std::size_t i)
                {
                    double best_dist_sq = std::numeric_limits<double>::max();
                    double second_best_dist_sq = std::numeric_limits<double>::max();
                    std::size_t best_match_index = static_cast<std::size_t>(-1);
    
                    for (std::size_t j = 0; j < descriptors2.size(); ++j)
                    {
                        const double dist_sq = squared_euclidean_distance(descriptors1[i], descriptors2[j]);
    
                        if (dist_sq < best_dist_sq)
                        {
                            second_best_dist_sq = best_dist_sq;
                            best_dist_sq = dist_sq;
                            best_match_index = j;
                        }
                        else if (dist_sq < second_best_dist_sq)
                        {
                            second_best_dist_sq = dist_sq;
                        }
                    }
    
                    // Apply Lowe's ratio test
                    if (best_dist_sq < ratio_threshold * ratio_threshold * second_best_dist_sq)
                    {
                        std::lock_guard<std::mutex> lock(mtx);
                        matches.emplace_back(i, best_match_index);
                    }
                });
    
            return matches;
        }
    }
    
  • find_homography_ransac function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        /**
         * find_homography_ransac function implementation
         * @brief Finds the best homography matrix using the RANSAC algorithm.
         */
        linalg::Matrix<double> find_homography_ransac(
            const std::vector<Point<2>>& keypoints1,
            const std::vector<Point<2>>& keypoints2,
            const std::vector<std::pair<std::size_t, std::size_t>>& matches,
            int iterations = 1000,
            double inlier_threshold = 3.0)
        {
            linalg::Matrix<double> best_H(3, 3);
            int max_inliers = 0;
    
            std::mt19937 rng(std::chrono::steady_clock::now().time_since_epoch().count());
            std::uniform_int_distribution<std::size_t> dist(0, matches.size() - 1);
    
            const double inlier_threshold_sq = inlier_threshold * inlier_threshold;
    
            for (int i = 0; i < iterations; ++i)
            {
                // 1. Randomly sample 4 matching points
                std::vector<std::pair<Point<2>, Point<2>>> sample_points;
                if (matches.size() < 4) continue;
    
                std::vector<std::size_t> indices;
                while(indices.size() < 4)
                {
                    std::size_t rand_idx = dist(rng);
                    if(std::find(std::ranges::begin(indices), std::ranges::end(indices), rand_idx) == std::ranges::end(indices))
                    {
                        indices.emplace_back(rand_idx);
                    }
                }
    
                for(const auto& idx : indices)
                {
                    sample_points.emplace_back(keypoints1[matches[idx].first], keypoints2[matches[idx].second]);
                }
    
                // 2. Compute a candidate homography from the sample
                linalg::Matrix<double> H_candidate(3, 3);
                if (!compute_homography(sample_points, H_candidate))
                {
                    continue;
                }
    
                // 3. Count inliers
                int current_inliers = 0;
                for (const auto& match : matches)
                {
                    const auto& p1 = keypoints1[match.first];
                    const auto& p2 = keypoints2[match.second];
    
                    // Project p1 to p1_projected using H_candidate
                    double w = H_candidate.at(2, 0) * p1.p[0] + H_candidate.at(2, 1) * p1.p[1] + H_candidate.at(2, 2);
                    if (std::abs(w) < 1e-7) continue;
    
                    double x_proj = (H_candidate.at(0, 0) * p1.p[0] + H_candidate.at(0, 1) * p1.p[1] + H_candidate.at(0, 2)) / w;
                    double y_proj = (H_candidate.at(1, 0) * p1.p[0] + H_candidate.at(1, 1) * p1.p[1] + H_candidate.at(1, 2)) / w;
    
                    // Calculate squared distance
                    double dx = x_proj - p2.p[0];
                    double dy = y_proj - p2.p[1];
                    double dist_sq = dx * dx + dy * dy;
    
                    if (dist_sq < inlier_threshold_sq)
                    {
                        current_inliers++;
                    }
                }
    
                // 4. Update best homography if this one is better
                if (current_inliers > max_inliers)
                {
                    max_inliers = current_inliers;
                    best_H = H_candidate;
                }
            }
    
            std::cout << "RANSAC found " << max_inliers << " inliers.\n";
    
            return best_H;
        }
    }
    
  • compute_homography function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        /**
         * compute_homography template function implementation
         * @brief Computes the homography matrix from 4 or more point correspondences using the SVD-based Direct Linear Transform (DLT) algorithm.
         * This involves solving the system of linear equations Ah = 0.
         * Reference: https://engineering.purdue.edu/kak/courses-i-teach/ECE661.08/solution/hw5_s1.pdf
         */
        bool compute_homography(const std::vector<std::pair<Point<2>, Point<2>>>& points, linalg::Matrix<double>& H)
        {
            if (points.size() < 4)
            {
                return false;
            }
    
            // Construct the matrix A for the system Ah = 0.
            // For n points, A is a 2n x 9 matrix.
            linalg::Matrix<double> A(2 * points.size(), 9);
    
            for (std::size_t i = 0; i < points.size(); ++i)
            {
                const double x1 = points[i].first.p[0];
                const double y1 = points[i].first.p[1];
                const double x2 = points[i].second.p[0];
                const double y2 = points[i].second.p[1];
    
                const std::size_t row1 = 2 * i;
                const std::size_t row2 = 2 * i + 1;
    
                A.at(row1, 0) = -x1;
                A.at(row1, 1) = -y1;
                A.at(row1, 2) = -1;
                A.at(row1, 3) = 0;
                A.at(row1, 4) = 0;
                A.at(row1, 5) = 0;
                A.at(row1, 6) = x2 * x1;
                A.at(row1, 7) = x2 * y1;
                A.at(row1, 8) = x2;
    
                A.at(row2, 0) = 0;
                A.at(row2, 1) = 0;
                A.at(row2, 2) = 0;
                A.at(row2, 3) = -x1;
                A.at(row2, 4) = -y1;
                A.at(row2, 5) = -1;
                A.at(row2, 6) = y2 * x1;
                A.at(row2, 7) = y2 * y1;
                A.at(row2, 8) = y2;
            }
    
            // Solve Ah = 0 using SVD. The solution is the right singular vector
            // corresponding to the smallest singular value.
            std::vector<double> h = linalg::svd_solve_ah_zero(A);
            if (h.empty() || h.size() != 9)
            {
                return false;
            }
    
            // Reshape the 9x1 vector h into the 3x3 homography matrix H
            // and normalize it by the last element.
            H = linalg::Matrix<double>(3, 3);
            const double scale = h[8];
            if (std::abs(scale) < 1.0e-9) // Avoid division by zero
            {
                return false;
            }
            for(std::size_t i = 0; i < 3; ++i)
            {
                for(std::size_t j = 0; j < 3; ++j)
                {
                    H.at(i, j) = h[i * 3 + j] / scale;
                }
            }
    
            return true;
        }
    }
    
  • find_stitch_homography function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        /**
         * find_stitch_homography function implementation
         * @brief Phase 1 of stitching: Detects features and computes the homography.
         * This is always done on full-resolution images for maximum accuracy.
         */
        linalg::Matrix<double> find_stitch_homography(const Image<RGB>& img1, const Image<RGB>& img2, const double ratio_threshold = 0.7)
        {
            // 1. Get Keypoints and Descriptors from full-resolution images
            std::cout << "Detecting SIFT features...\n";
            auto v_plane1 = TinyDIP::getVplane(TinyDIP::rgb2hsv(img1));
            auto v_plane2 = TinyDIP::getVplane(TinyDIP::rgb2hsv(img2));
    
            auto keypoints1 = SIFT_impl::get_potential_keypoint(v_plane1);
            auto keypoints2 = SIFT_impl::get_potential_keypoint(v_plane2);
    
            std::cout << "Found " << keypoints1.size() << " keypoints in image 1 and " << keypoints2.size() << " in image 2.\n";
            std::cout << "Generating descriptors...\n";
    
            std::vector<SiftDescriptor> descriptors1;
            descriptors1.reserve(keypoints1.size());
            for(const auto& kp : keypoints1) descriptors1.emplace_back(SIFT_impl::get_keypoint_descriptor(v_plane1, kp));
    
            std::vector<SiftDescriptor> descriptors2;
            descriptors2.reserve(keypoints2.size());
            for(const auto& kp : keypoints2) descriptors2.emplace_back(SIFT_impl::get_keypoint_descriptor(v_plane2, kp));
    
            // 2. Match Features using the robust cross-checking method
            std::cout << "Matching features...\n";
            auto matches = find_robust_matches(descriptors1, descriptors2, ratio_threshold);
    
            if (matches.size() < 4)
            {
                std::cerr << "Error: Not enough robust matches found to compute homography.\n";
                return linalg::Matrix<double>(); // Return empty matrix
            }
    
            // 3. Compute Homography with RANSAC
            std::cout << "Computing homography with RANSAC...\n";
            return find_homography_ransac(keypoints1, keypoints2, matches, 2000, 2.0);
        }
    }
    
  • create_stitched_image function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        /**
         * create_stitched_image function implementation
         * @brief Phase 2 of stitching: Warps and blends images using a pre-computed homography.
         */
        Image<RGB> create_stitched_image(const Image<RGB>& img1, const Image<RGB>& img2, const linalg::Matrix<double>& H)
        {
            if (H.empty()) {
                std::cerr << "Cannot create stitched image with an empty homography.\n";
                return Image<RGB>();
            }
    
            // 1. Determine output canvas size by transforming the corners of img2
            auto H_inv = linalg::invert(H); // Needed to map img2 corners into img1's space
            if (H_inv.empty()) {
                 std::cerr << "Could not invert homography. Cannot determine output size.\n";
                return Image<RGB>();
            }
    
            const double w2 = img2.getWidth(), h2 = img2.getHeight();
            std::vector<Point<2>> corners = { {0,0}, {static_cast<std::size_t>(w2 - 1), 0}, {0, static_cast<std::size_t>(h2 - 1)}, {static_cast<std::size_t>(w2 - 1), static_cast<std::size_t>(h2 - 1)} };
            double min_x = 0, max_x = img1.getWidth(), min_y = 0, max_y = img1.getHeight();
    
            for(const auto& p : corners)
            {
                double w = H_inv.at(2,0) * p.p[0] + H_inv.at(2,1) * p.p[1] + H_inv.at(2,2);
                double x = (H_inv.at(0,0) * p.p[0] + H_inv.at(0,1) * p.p[1] + H_inv.at(0,2)) / w;
                double y = (H_inv.at(1,0) * p.p[0] + H_inv.at(1,1) * p.p[1] + H_inv.at(1,2)) / w;
                if(x < min_x) min_x = x;
                if(x > max_x) max_x = x;
                if(y < min_y) min_y = y;
                if(y > max_y) max_y = y;
            }
    
            const double trans_x = -min_x;
            const double trans_y = -min_y;
            const std::size_t out_width = static_cast<std::size_t>(std::ceil(max_x - min_x));
            const std::size_t out_height = static_cast<std::size_t>(std::ceil(max_y - min_y));
    
            // Create a translation matrix
            linalg::Matrix<double> H_trans(3,3);
            H_trans.at(0,0) = 1; H_trans.at(0,1) = 0; H_trans.at(0,2) = trans_x;
            H_trans.at(1,0) = 0; H_trans.at(1,1) = 1; H_trans.at(1,2) = trans_y;
            H_trans.at(2,0) = 0; H_trans.at(2,1) = 0; H_trans.at(2,2) = 1;
    
            // Combine translation with the original homography
            auto H_final = linalg::multiply(H_trans, H);
    
            // 2. Warp img2 to align with img1's coordinate frame
            std::cout << "Warping image 2...\n";
            auto warped_img2 = warp_perspective(img2, H_final, out_width, out_height);
    
            // 3. Blend images
            std::cout << "Blending images...\n";
            Image<RGB> stitched_image(out_width, out_height);
    
            // Paste img1 at its new translated position
            for(std::size_t y = 0; y < img1.getHeight(); ++y)
            {
                for(std::size_t x = 0; x < img1.getWidth(); ++x)
                {
                    auto dest_x = static_cast<std::size_t>(x + trans_x);
                    auto dest_y = static_cast<std::size_t>(y + trans_y);
                    if(dest_x < out_width && dest_y < out_height)
                    {
                        stitched_image.at(dest_x, dest_y) = img1.at(x,y);
                    }
                }
            }
    
            // Blend warped image 2 on top
            for(std::size_t y = 0; y < out_height; ++y)
            {
                for(std::size_t x = 0; x < out_width; ++x)
                {
                    const auto& pixel_warped = warped_img2.at(x, y);
                    bool warped_is_black = (pixel_warped.channels[0] < 5 && pixel_warped.channels[1] < 5 && pixel_warped.channels[2] < 5);
    
                    if(!warped_is_black)
                    {
                        const auto& pixel_base = stitched_image.at(x, y);
                        bool base_is_black = (pixel_base.channels[0] < 5 && pixel_base.channels[1] < 5 && pixel_base.channels[2] < 5);
                        if (base_is_black)
                        {
                            stitched_image.at(x,y) = pixel_warped;
                        }
                        else // Simple averaging for overlap
                        {
                            RGB blended_pixel;
                            blended_pixel.channels[0] = (static_cast<std::uint16_t>(pixel_base.channels[0]) + pixel_warped.channels[0]) / 2;
                            blended_pixel.channels[1] = (static_cast<std::uint16_t>(pixel_base.channels[1]) + pixel_warped.channels[1]) / 2;
                            blended_pixel.channels[2] = (static_cast<std::uint16_t>(pixel_base.channels[2]) + pixel_warped.channels[2]) / 2;
                            stitched_image.at(x, y) = blended_pixel;
                        }
                    }
                }
            }
    
            return stitched_image;
        }
    }
    
  • warp_perspective function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        /**
         * warp_perspective template function implementation
         * @brief Applies a perspective transformation to an image using reverse mapping.
         */
        template<typename ElementT>
        Image<ElementT> warp_perspective(const Image<ElementT>& src, const linalg::Matrix<double>& H, const std::size_t out_width, const std::size_t out_height)
        {
            Image<ElementT> warped_image(out_width, out_height);
    
            if (H.empty())
            {
                 std::cerr << "Warning: Homography matrix is empty. Warping will fail.\n";
                 return warped_image;
            }
    
            // Cache matrix elements for performance
            const double h11 = H.at(0,0), h12 = H.at(0,1), h13 = H.at(0,2);
            const double h21 = H.at(1,0), h22 = H.at(1,1), h23 = H.at(1,2);
            const double h31 = H.at(2,0), h32 = H.at(2,1), h33 = H.at(2,2);
    
            #pragma omp parallel for
            for(std::size_t y = 0; y < out_height; ++y)
            {
                for(std::size_t x = 0; x < out_width; ++x)
                {
                    // Apply homography to find corresponding point in source image
                    const double w = h31 * x + h32 * y + h33;
    
                    // Avoid division by zero
                    if (std::abs(w) < 1e-9) continue;
    
                    const double src_x = (h11 * x + h12 * y + h13) / w;
                    const double src_y = (h21 * x + h22 * y + h23) / w;
    
                    // Use bilinear interpolation to get the pixel value
                    warped_image.at(x, y) = bilinear_interpolate(src, src_x, src_y);
                }
            }
            return warped_image;
        }
    }
    
  • imstitch function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        /**
         * imstitch function implementation
         * @brief Stitches two images together.
         */
        Image<RGB> imstitch(const Image<RGB>& img1, const Image<RGB>& img2, const double ratio_threshold = 0.7)
        {
            // Phase 1: Calculate Homography on full-res images
            auto H = find_stitch_homography(img1, img2, ratio_threshold);
    
            // Phase 2: Create final image by warping and blending
            return create_stitched_image(img1, img2, H);
        }
    }
    
  • scale_homography function implementation (in file image_operations.h)

    namespace TinyDIP
    {
        /**
         * scale_homography function implementation
         * @brief Adjusts a homography matrix to account for image scaling.
         */
        linalg::Matrix<double> scale_homography(
            const linalg::Matrix<double>& H,
            const std::size_t w1_full, const std::size_t h1_full, const std::size_t w1_small, const std::size_t h1_small,
            const std::size_t w2_full, const std::size_t h2_full, const std::size_t w2_small, const std::size_t h2_small)
        {
            // Scaling factor from small image 1 to full image 1
            double scale_x1 = static_cast<double>(w1_full) / w1_small;
            double scale_y1 = static_cast<double>(h1_full) / h1_small;
    
            // Scaling factor from full image 2 to small image 2
            double scale_x2 = static_cast<double>(w2_small) / w2_full;
            double scale_y2 = static_cast<double>(h2_small) / h2_full;
    
            // S1: maps from small1 coordinates to full1 coordinates
            linalg::Matrix<double> S1(3, 3);
            S1.at(0,0) = scale_x1;
            S1.at(1,1) = scale_y1;
            S1.at(2,2) = 1.0;
    
            // S2_inv: maps from full2 coordinates to small2 coordinates
            linalg::Matrix<double> S2_inv(3, 3);
            S2_inv.at(0,0) = scale_x2;
            S2_inv.at(1,1) = scale_y2;
            S2_inv.at(2,2) = 1.0;
    
            // H_small = S2_inv * H * S1
            auto temp = linalg::multiply(S2_inv, H);
            return linalg::multiply(temp, S1);
        }
    }
    
  • Other Functions - Linear Algebra Utilities

namespace TinyDIP
{
namespace linalg
{
    /**
     * @brief A basic Matrix class for linear algebra operations.
     */
    template<typename T>
    class Matrix
    {
    public:
        Matrix() : rows_(0), cols_(0) {}

        Matrix(const std::size_t rows, const std::size_t cols)
            : rows_(rows), cols_(cols), data_(rows * cols)
        {
        }

        T& at(const std::size_t row, const std::size_t col)
        {
            return data_[row * cols_ + col];
        }

        const T& at(const std::size_t row, const std::size_t col) const
        {
            return data_[row * cols_ + col];
        }

        std::size_t rows() const
        {
            return rows_;
        }
        
        std::size_t cols() const
        {
            return cols_;
        }

        bool empty() const
        {
            return rows_ == 0 || cols_ == 0;
        }

        // Other utility functions can be added here (e.g., print)

    private:
        std::size_t rows_;
        std::size_t cols_;
        std::vector<T> data_;
    };

    /**
     * @brief Transposes a given matrix.
     */
    template<typename ElementT>
    requires (std::floating_point<ElementT>)
    Matrix<ElementT> transpose(const Matrix<ElementT>& A)
    {
        Matrix<ElementT> A_T(A.cols(), A.rows());
        for (std::size_t r = 0; r < A.rows(); ++r)
        {
            for (std::size_t c = 0; c < A.cols(); ++c)
            {
                A_T.at(c, r) = A.at(r, c);
            }
        }
        return A_T;
    }
    
    /**
     * @brief Multiplies two matrices.
     */
    template<typename ElementT>
    requires (std::floating_point<ElementT>)
    Matrix<ElementT> multiply(const Matrix<ElementT>& A, const Matrix<ElementT>& B)
    {
        if (A.cols() != B.rows())
        {
            throw std::runtime_error("Matrix dimensions are incompatible for multiplication.");
        }
        Matrix<ElementT> C(A.rows(), B.cols());
        for (std::size_t i = 0; i < A.rows(); ++i)
        {
            for (std::size_t j = 0; j < B.cols(); ++j)
            {
                ElementT sum = 0.0;
                for (std::size_t k = 0; k < A.cols(); ++k)
                {
                    sum += A.at(i, k) * B.at(k, j);
                }
                C.at(i, j) = sum;
            }
        }
        return C;
    }

    /**
     * @brief Finds eigenvalues and eigenvectors of a real symmetric matrix using the Jacobi eigenvalue algorithm.
     * @param A The input symmetric matrix.
     * @param eigenvalues A vector to be filled with the eigenvalues.
     * @param eigenvectors A matrix whose columns will be the eigenvectors.
     * @param max_iterations Maximum number of sweeps to perform.
     * @param tolerance Convergence tolerance.
     */
    template<typename ElementT>
    requires (std::floating_point<ElementT>)
    void jacobi_eigen_solver(
        const Matrix<ElementT>& A,
        std::vector<ElementT>& eigenvalues,
        Matrix<ElementT>& eigenvectors,
        int max_iterations = 100,
        ElementT tolerance = 1.0e-9)
    {
        if (A.rows() != A.cols())
        {
            throw std::runtime_error("Jacobi solver requires a square matrix.");
        }
        const std::size_t n = A.rows();
        Matrix<ElementT> D = A; // Make a copy to modify

        // Initialize eigenvectors as the identity matrix
        eigenvectors = Matrix<ElementT>(n, n);
        for(std::size_t i = 0; i < n; ++i)
        {
            eigenvectors.at(i, i) = 1.0;
        }
        
        for (int iter = 0; iter < max_iterations; ++iter)
        {
            // Find the largest off-diagonal element
            ElementT max_val = 0.0;
            std::size_t p = 0, q = 1;
            for (std::size_t i = 0; i < n; ++i)
            {
                for (std::size_t j = i + 1; j < n; ++j)
                {
                    if (std::abs(D.at(i, j)) > max_val)
                    {
                        max_val = std::abs(D.at(i, j));
                        p = i;
                        q = j;
                    }
                }
            }

            if (max_val < tolerance) break; // Convergence check

            // Perform Jacobi rotation
            ElementT app = D.at(p, p);
            ElementT aqq = D.at(q, q);
            ElementT apq = D.at(p, q);
            ElementT theta = 0.5 * std::atan2(2 * apq, aqq - app);
            ElementT c = std::cos(theta);
            ElementT s = std::sin(theta);

            // Update D (the matrix being diagonalized)
            D.at(p, p) = c * c * app + s * s * aqq - 2 * s * c * apq;
            D.at(q, q) = s * s * app + c * c * aqq + 2 * s * c * apq;
            D.at(p, q) = D.at(q, p) = 0.0;

            for (std::size_t i = 0; i < n; ++i)
            {
                if (i != p && i != q)
                {
                    ElementT aip = D.at(i, p);
                    ElementT aiq = D.at(i, q);
                    D.at(i, p) = D.at(p, i) = c * aip - s * aiq;
                    D.at(i, q) = D.at(q, i) = s * aip + c * aiq;
                }
            }
            
            // Update eigenvectors matrix
            for(std::size_t i = 0; i < n; ++i)
            {
                ElementT e_ip = eigenvectors.at(i, p);
                ElementT e_iq = eigenvectors.at(i, q);
                eigenvectors.at(i, p) = c * e_ip - s * e_iq;
                eigenvectors.at(i, q) = s * e_ip + c * e_iq;
            }
        }
        
        // Extract eigenvalues from the diagonal of D
        eigenvalues.resize(n);
        for(std::size_t i = 0; i < n; ++i)
        {
            eigenvalues[i] = D.at(i, i);
        }
    }

    /**
     * @brief Solves the system Ah=0 using SVD, by finding the eigenvector of A^T*A with the smallest eigenvalue.
     * @param A The input matrix.
     * @return The vector h that minimizes ||Ah||, which is the last column of V in A=UDV^T.
     */
    template<typename ElementT>
    requires (std::floating_point<ElementT>)
    std::vector<ElementT> svd_solve_ah_zero(const Matrix<ElementT>& A)
    {
        // Form the symmetric matrix A^T * A
        Matrix<ElementT> A_T = transpose(A);
        Matrix<ElementT> ATA = multiply(A_T, A);

        // Find eigenvalues and eigenvectors of A^T * A
        std::vector<ElementT> eigenvalues;
        Matrix<ElementT> eigenvectors;
        jacobi_eigen_solver(ATA, eigenvalues, eigenvectors);

        // Find the index of the smallest eigenvalue
        auto min_it = std::min_element(std::begin(eigenvalues), std::end(eigenvalues));
        std::size_t min_idx = std::distance(std::begin(eigenvalues), min_it);

        // The solution h is the eigenvector corresponding to the smallest eigenvalue
        std::vector<ElementT> h(A.cols());
        for (std::size_t i = 0; i < A.cols(); ++i)
        {
            h[i] = eigenvectors.at(i, min_idx);
        }
        return h;
    }

    /**
     * @brief Computes the inverse of a 3x3 matrix.
     * @return The inverted matrix, or an empty matrix if inversion fails.
     */
    template<typename T>
    Matrix<T> invert(const Matrix<T>& M)
    {
        if (M.rows() != 3 || M.cols() != 3)
        {
            throw std::runtime_error("Matrix inversion is implemented for 3x3 matrices only.");
        }

        T det = M.at(0, 0) * (M.at(1, 1) * M.at(2, 2) - M.at(2, 1) * M.at(1, 2)) -
                M.at(0, 1) * (M.at(1, 0) * M.at(2, 2) - M.at(1, 2) * M.at(2, 0)) +
                M.at(0, 2) * (M.at(1, 0) * M.at(2, 1) - M.at(1, 1) * M.at(2, 0));

        if (std::abs(det) < 1e-9)
        {
            // Matrix is singular and cannot be inverted.
            return Matrix<T>();
        }

        T inv_det = 1.0 / det;
        Matrix<T> inverse(3, 3);

        inverse.at(0, 0) = (M.at(1, 1) * M.at(2, 2) - M.at(2, 1) * M.at(1, 2)) * inv_det;
        inverse.at(0, 1) = (M.at(0, 2) * M.at(2, 1) - M.at(0, 1) * M.at(2, 2)) * inv_det;
        inverse.at(0, 2) = (M.at(0, 1) * M.at(1, 2) - M.at(0, 2) * M.at(1, 1)) * inv_det;
        inverse.at(1, 0) = (M.at(1, 2) * M.at(2, 0) - M.at(1, 0) * M.at(2, 2)) * inv_det;
        inverse.at(1, 1) = (M.at(0, 0) * M.at(2, 2) - M.at(0, 2) * M.at(2, 0)) * inv_det;
        inverse.at(1, 2) = (M.at(1, 0) * M.at(0, 2) - M.at(0, 0) * M.at(1, 2)) * inv_det;
        inverse.at(2, 0) = (M.at(1, 0) * M.at(2, 1) - M.at(2, 0) * M.at(1, 1)) * inv_det;
        inverse.at(2, 1) = (M.at(2, 0) * M.at(0, 1) - M.at(0, 0) * M.at(2, 1)) * inv_det;
        inverse.at(2, 2) = (M.at(0, 0) * M.at(1, 1) - M.at(1, 0) * M.at(0, 1)) * inv_det;

        return inverse;
    }

} // namespace linalg
} // namespace TinyDIP

Main function for testing:

int main(int argc, char* argv[])
{
    TinyDIP::Timer timer1;
    // === Argument Parsing ===
    std::vector<std::string> args(argv + 1, argv + argc);
    bool preview_mode = false;

    // Search for and remove the --preview flag
    auto it = std::find(args.begin(), args.end(), "--preview");
    if (it != args.end())
    {
        preview_mode = true;
        args.erase(it);
    }

    if (args.size() < 3 || args.size() > 4)
    {
        // Print usage information to standard error
        std::cerr << "Usage: " << argv[0] << " <img1.bmp> <img2.bmp> <output_base> [ratio_threshold] [--preview]\n";
        std::cerr << "Example (Full): " << argv[0] << " s1.bmp s2.bmp result 0.7\n";
        std::cerr << "Example (Preview): " << argv[0] << " s1.bmp s2.bmp result 0.7 --preview\n";
        std::cerr << "  - ratio_threshold (optional): A value between 0.0 and 1.0. Lower is stricter. Default is 0.7.\n";
        std::cerr << "  - --preview (optional): Runs the pipeline on downscaled images for a fast preview.\n";
        return EXIT_FAILURE;
    }

    // Assign command-line arguments to variables
    const std::string file_path1 = args[0];
    const std::string file_path2 = args[1];
    const std::string output_filename_base = args[2];
    double ratio_threshold = 0.7; // Default value

    if (args.size() == 4)
    {
        try
        {
            ratio_threshold = std::stod(args[3]);
            if (ratio_threshold <= 0.0 || ratio_threshold >= 1.0)
            {
                std::cerr << "Error: ratio_threshold must be between 0.0 and 1.0.\n";
                return EXIT_FAILURE;
            }
        }
        catch (const std::invalid_argument& e)
        {
            std::cerr << "Error: Invalid number provided for ratio_threshold.\n";
            return EXIT_FAILURE;
        }
        catch (const std::out_of_range& e)
        {
            std::cerr << "Error: ratio_threshold value is out of range.\n";
            return EXIT_FAILURE;
        }
    }


    // === Load Full-Resolution Images ===
    std::cout << "--- Loading Full-Resolution Images ---\n";
    auto bmp1 = TinyDIP::bmp_read(file_path1.c_str(), true);
    auto bmp2 = TinyDIP::bmp_read(file_path2.c_str(), true);

    if ((bmp1.getWidth() == 0) || (bmp1.getHeight() == 0))
    {
        std::cerr << "Error: Failed to read image from " << file_path1 << "\n";
        return EXIT_FAILURE;
    }
    if ((bmp2.getWidth() == 0) || (bmp2.getHeight() == 0))
    {
        std::cerr << "Error: Failed to read image from " << file_path2 << "\n";
        return EXIT_FAILURE;
    }
    std::cout << "Image 1 loaded: " << bmp1.getWidth() << " x " << bmp1.getHeight() << "\n";
    std::cout << "Image 2 loaded: " << bmp2.getWidth() << " x " << bmp2.getHeight() << "\n\n";

    // === Phase 1: Find Homography (always on full-res images) ===
    auto H = TinyDIP::find_stitch_homography(bmp1, bmp2, ratio_threshold);
    if(H.empty())
    {
        std::cerr << "Stitching pipeline failed during homography calculation.\n";
        return EXIT_FAILURE;
    }


    // === Phase 2: Create Stitched Image (either preview or full-res) ===
    TinyDIP::Image<TinyDIP::RGB> stitched_result;
    std::string final_output_name = output_filename_base;

    if (preview_mode)
    {
        std::cout << "\n--- Creating Preview Image ---\n";
        final_output_name += "_preview";

        // 1. Resize images for preview
        const std::size_t preview_width = 400;
        const auto new_height1 = static_cast<std::size_t>(static_cast<double>(bmp1.getHeight()) * preview_width / bmp1.getWidth());
        const auto new_height2 = static_cast<std::size_t>(static_cast<double>(bmp2.getHeight()) * preview_width / bmp2.getWidth());

        std::cout << "Resizing images for preview to " << preview_width << "px width...\n";
        auto bmp1_small = TinyDIP::copyResizeBicubic(bmp1, preview_width, new_height1);
        auto bmp2_small = TinyDIP::copyResizeBicubic(bmp2, preview_width, new_height2);

        // 2. Scale the high-quality H for the preview images
        std::cout << "Scaling homography for preview...\n";
        auto H_small = TinyDIP::scale_homography(
            H,
            bmp1.getWidth(), bmp1.getHeight(), bmp1_small.getWidth(), bmp1_small.getHeight(),
            bmp2.getWidth(), bmp2.getHeight(), bmp2_small.getWidth(), bmp2_small.getHeight()
        );

        // 3. Create the stitched image using the SCALED homography and small images
        stitched_result = TinyDIP::create_stitched_image(bmp1_small, bmp2_small, H_small);
    }
    else
    {
        std::cout << "\n--- Creating Full-Resolution Image ---\n";
        // Create the stitched image using the high-quality H and full-res images
        stitched_result = TinyDIP::create_stitched_image(bmp1, bmp2, H);
    }
    

    // === Save the final result ===
    if (stitched_result.getWidth() == 0 || stitched_result.getHeight() == 0)
    {
        std::cerr << "Error: Stitching resulted in an empty image.\n";
        return EXIT_FAILURE;
    }

    if (TinyDIP::bmp_write(final_output_name.c_str(), stitched_result) == 0)
    {
        std::cout << "Successfully saved stitched image to " << final_output_name << ".bmp\n";
    }
    else
    {
        std::cerr << "Error: Failed to save stitched image.\n";
        return EXIT_FAILURE;
    }

    std::cout << "\n--- Pipeline Complete ---\n";

    return EXIT_SUCCESS;
}

The example output.

StitchingOutputImage

Image source:

  • Image Source1

  • Image Source1

TinyDIP on GitHub

All suggestions are welcome.

The summary information:

  • Which question it is a follow-up to?

    SIFT Keypoint Detection for Image in C++

  • What changes has been made in the code since last question?

    I implemented find_stitch_homography, create_stitched_image and some of helper functions for image stitching in this post.

  • Why a new review is being asked for?

    Please review the implementation of find_stitch_homography, create_stitched_image and related functions.

\$\endgroup\$
1
  • 2
    \$\begingroup\$ Neat! Instead of averaging where the images overlap, you can pick a line where you switch from the one to the other image. The optimal line is the one that goes through the most similar pixels. This is surprisingly simple, and very, very effective. See here: crisluengo.net/archives/393 \$\endgroup\$ Commented Oct 7 at 22:33

2 Answers 2

6
\$\begingroup\$

Congratulations on getting this far; doing image stitching is far from trivial! The next step to make it looking truly professional is a more sophisticated blending algorithm for the overlapping regions, like what Enblend uses.

Unnecessarily hardcoded types

Despite the template parameter FloatingType for most functions, I still see a few doubles in the code. Those either be auto or FloatingType. Also, compute_homography() and find_stitch_homography() should have a FloatingType template parameter to be consistent. create_stitched_image() is even worse, it also hardcodes the input images to be RGB.

Simplify the code

The algorithms are complicated enough as it is, you don't want to make the code look even more complicated. I would try to go over the code and identify things that can be simplified:

Reduce indentation levels

Don't indent namespace blocks.

Avoid declaring complicated lambdas inside function calls. This reduces both the indentation of the lambda body and makes the function call itself more readable.

Use early return or continue statements to avoid indentation of if-statements, instead of:

if (!warped_is_black) {
    /* long piece of code */
}

Write:

if (warped_is_black)
    continue;

/* long piece of code */`

Avoid separate handling of x and y coordinates

Sure, sometimes you need to do individual operations on x and y coordinates, but often you do virtually the same operation on both. Try to group those operations somehow.

You have Matrix, but you are missing a corresponding Vector. Even a mathematical vector hardcoded to have two coordinates would be useful in your code.

Create a way to iterate over rectangular regions in a single for-statement, where the iterator yields a two-dimensional coordinate. This will simplify your code and reduce indentation levels. Consider being able to write:

for (auto subpos: rectangle(4, 4)) {
    std::vector<…> raw_histogram(8);
    for (auto pos: rectangle(4, 4)) {
        auto each_pixel_orientations = pixel_orientations.at(subpos * 4 + pos);
        …
    }
}

Use std::hypot()

Often overlooked is the function std::hypot(), which can reduce a lot of typing in some cases. Consider:

auto dx = static_cast<FloatingType>(input.at_without_boundary_checks(2, 1) - input.at_without_boundary_checks(0, 1));
auto dy = static_cast<FloatingType>(input.at_without_boundary_checks(1, 2) - input.at_without_boundary_checks(1, 0));

auto magnitude = std::hypot(dx, dy);
auto orientation = std::atan2(dy, dx);

Consider implementing a multi-dimensional operator[]

Since C++23 it is possible to overload operator[] to take more than one parameter. So insted of writing foo.at_without_boundary_checks(x, y), you can make it so you can write foo[x, y].

Create structs instead of using std::tuple and std::pair

Almost always when you are writing std::pair or std::tuple, it is better to declare a struct where you can give proper names to each member of the pair or tuple. This then avoids having to write .first and .second and having to remember which is which, or having to write std::get<…>. Consider for example:

template<class FloatingType = double>
struct Gradient {
    FloatingType magnitude;
    FloatingType orientation;
};

template<typename ElementT, class FloatingType = double>
… auto compute_each_pixel_orientation(const Image<ElementT>& input)
{
    …
    auto magnitude = std::hypot(dx, dy);
    auto orientation = std::atan2(dy, dx);

    return Gradient{magnitude, orientation};
}

… get_keypoint_descriptor(…)
{
    …
    auto bin_index = static_cast<std::size_t>(each_pixel_orientation.orientation / 45);
    …
    raw_histogram[bin_index] += each_pixel_orientation.magnitude;
    …
}

Avoid unnecessary work

In find_robust_matches(), you are trying to find raw matches from image 1 to image 2, and from image 2 to image 1, and then you try to find the overlap between those two sets. However, this means you are doing unnecessary work: once you know the raw matches from image 1 to image 2, you don't have to scan all of image 2 anymore for raw matches, instead just take the list of raw matches from 1 to 2 and then check if those are also valid raw matches from 2 to 1.

If that's not helpful, then at least change the \$O(N^2)\$ algorithm for the cross-check to first sort matches1to2 and matches2to1, and then use std::set_intersection().

\$\endgroup\$
3
\$\begingroup\$

G. Sliepen has provided a good general overview; I'm focusing on find_raw_matches here. There are two major issues with it:

  1. Lowe's ratio test is only reliable when you have a priori knowledge that the second-closest point is an invalid match. If you're dealing with strongly self-similar images (such as when stitching two halves of a radially-symmetric logo), you can expect to reject many true matches at this stage. No, I don't know how to fix this -- it's a problem I'm also running into, with software for finding duplicate images.

  2. Pairwise comparison of keypoints is inefficient, running in \$O(M*N)\$ time. There are a number of k-nearest-neighbor search algorithms, both exact and approximate, that run in \$O(M \text{log} N)\$ time. In my duplicate-finding software, I got a thousand-fold speed increase by going from tree-based exact kNN to HNSWLib's approximate kNN search; pairwise comparison was so slow that it never finished running. You're only dealing with keypoint counts in the low thousands, so I don't expect the speedup to be as dramatic, and with some algorithms, the time needed for setup may wipe out the gains from faster searching.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.