C++ port of Python Canny Edge Detection Implementation

Published 17 Mar 2025

First Published 18th Feb 2016

At the tail end of my write-up for the Python implementation of the Canny Edge Detection algorithm, I mentioned that I would port it to C++ because it would run faster. I did, and here it is.

For reference, the Python code displayed the following timing output for each section when run on the same computer.


Image width=700 height=500
Applying gaussian blur to image
– Apply gaussian blur to image took 6.695 seconds
Gradient intensity pass
– Calculating the intensity gradient of image took 2.076 seconds
Non-maximal suppression
– Calculating the non-maximal suppression pass of the image took 0.429 seconds

The C++ code in Release mode displayed the following output in the Console for timing of each section.

I think the above just highlights how important it is to choose the correct language for a certain task. I wrote the Python implementation purely out of interest, knowing full well it was probably one of the slowest languages to use for the task.

If we look at the change in execution time from the Python implementation to the C++ port:

Apply Gaussian Blur (C++ is 334 times faster)

Gradient Intensity Pass (C++ is 43 times faster)

Non Maximal Suppression Pass (C++ is 39 times faster)

The reasons are obvious: Python is interpreted and C++ is a compiled language.
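As a rough cross-check, the per-stage C++ times can be back-computed from the Python timings and the speed-up factors quoted above. The sketch below does only that arithmetic. The names `Stage`, `estimatedCppMs`, `estimatedFrameMs` and `quotedStages` are made up for illustration, and because the ratios are rounded, the results are estimates rather than measured output.

```cpp
#include <cassert>
#include <vector>

// One pipeline stage: the Python time and the speed-up quoted in this post.
struct Stage {
    const char* name;
    double pythonSeconds;  // from the Python timing output
    double speedup;        // "C++ is N times faster"
};

// Estimated C++ time for one stage, in milliseconds.
double estimatedCppMs(const Stage& s) {
    assert(s.speedup > 0.0);
    return s.pythonSeconds / s.speedup * 1000.0;
}

// Sum of the estimated stage times, in milliseconds.
double estimatedFrameMs(const std::vector<Stage>& stages) {
    double total = 0.0;
    for (const Stage& s : stages) {
        total += estimatedCppMs(s);
    }
    return total;
}

// The three stages with the figures quoted in this post.
std::vector<Stage> quotedStages() {
    return {
        {"Gaussian blur", 6.695, 334.0},
        {"Gradient intensity", 2.076, 43.0},
        {"Non-maximal suppression", 0.429, 39.0},
    };
}
```

Calling `estimatedFrameMs(quotedStages())` gives roughly 20ms, 48ms and 11ms for the three stages, or a little over 79ms per frame, and `1000.0 / 79` works out to between 12 and 13 frames per second.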

Anyway, based on the C++ performance, each frame of 700*500 pixels would take 79ms. I have not optimised this implementation at all. An easy optimisation would be to copy the image into memory first and access it directly rather than through CImg. The current implementation, which uses CImg, is mainly for display purposes and can handle 12fps.

An optimised version without GPU acceleration should easily push past 60fps on the same computer. In fact, my next modification to the code will be a basic optimised version of the C++ code using buffers directly. It should be interesting to see just how much faster this algorithm can go without major optimisations.
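To give an idea of what working on a buffer directly might look like, here is a minimal sketch and not the code from the repository. It assumes a greyscale image already copied into one contiguous row-major `float` buffer, and it uses a small 3x3 kernel (1 2 1 / 2 4 2 / 1 2 1, divided by 16) as a stand-in for the Gaussian blur. `GreyImage` and `blur3x3` are hypothetical names, and the copy out of CImg is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Greyscale image in one contiguous row-major buffer:
// the pixel at (x, y) lives at index y * width + x.
struct GreyImage {
    int width;
    int height;
    std::vector<float> pixels;

    GreyImage(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0.0f) {}

    float& at(int x, int y) { return pixels[static_cast<std::size_t>(y) * width + x]; }
    float at(int x, int y) const { return pixels[static_cast<std::size_t>(y) * width + x]; }
};

// 3x3 blur read straight from the buffer through row pointers.
// Border pixels are copied unchanged to keep the sketch short.
GreyImage blur3x3(const GreyImage& src) {
    assert(src.width >= 3 && src.height >= 3);
    GreyImage dst = src;
    const std::ptrdiff_t w = src.width;
    const float* in = src.pixels.data();
    float* out = dst.pixels.data();
    for (std::ptrdiff_t y = 1; y < src.height - 1; ++y) {
        const float* above = in + (y - 1) * w;
        const float* row = in + y * w;
        const float* below = in + (y + 1) * w;
        for (std::ptrdiff_t x = 1; x < w - 1; ++x) {
            const float sum =
                above[x - 1] + 2.0f * above[x] + above[x + 1] +
                2.0f * row[x - 1] + 4.0f * row[x] + 2.0f * row[x + 1] +
                below[x - 1] + 2.0f * below[x] + below[x + 1];
            out[y * w + x] = sum / 16.0f;
        }
    }
    return dst;
}
```

The inner loop touches only three row pointers and does no per-pixel function calls into an image library, which is the kind of easy win I have in mind. How much it actually saves would need to be measured.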

If I get time, I will further optimise the time-intensive parts of the C++ code with x86 Assembly. I believe replacing the square root function with an optimised hypotenuse calculation should cut the time that part of the process takes to less than half.
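I have not settled on which hypotenuse calculation to use. One well-known candidate is the "alpha max plus beta min" approximation, sketched below in plain C++ rather than Assembly. The function names are made up for illustration, and whether this really beats a hardware square root on a given CPU is something that would need timing.

```cpp
#include <algorithm>
#include <cmath>

// Alpha max plus beta min:
//   |(gx, gy)| ~= alpha * max(|gx|, |gy|) + beta * min(|gx|, |gy|)
// With these constants the relative error stays within about 4%.
inline float approxMagnitude(float gx, float gy) {
    const float alpha = 0.96043387f;
    const float beta = 0.39782473f;
    const float ax = std::fabs(gx);
    const float ay = std::fabs(gy);
    return alpha * std::max(ax, ay) + beta * std::min(ax, ay);
}

// Exact gradient magnitude using the square root, for comparison.
inline float exactMagnitude(float gx, float gy) {
    return std::sqrt(gx * gx + gy * gy);
}
```

For a gradient of (3, 4) the exact magnitude is 5 and the approximation lands within 1% of that. An error of a few percent should be tolerable for edge detection, but the thresholds may need a slight adjustment.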

Anyway, I have placed the Microsoft Visual Studio 2015 solution on GitHub if you wish to have a look.

Link: GitHub Repository