The last post was relatively short and showed only a bit of skeleton code that called a simple kernel function from the host side via a wrapper function. This time I would like to post a snippet that actually does something: an example showing how to implement a basic blur filter using the CUDA programming environment. Unlike the previous snippet, this example also contains the missing parts showing how to allocate and deallocate device memory and how to transfer data from the host to the device and back.
I figured that good old Lena would be a suitable test subject.
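
To make the host-side steps mentioned above a bit more concrete (allocate device memory, copy the input to the device, launch the kernel, copy the result back, free the memory), here is a rough sketch. It assumes a single-channel 8-bit image and a plain 3x3 box blur, which may well differ from the filter used in the real example. The names `boxBlur3x3`, `blurOnDevice` and `check` are made up for illustration only.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: 3x3 box blur on a single-channel 8-bit image.
// Border pixels average only over the neighbours that lie inside the image.
__global__ void boxBlur3x3(const unsigned char* in, unsigned char* out,
                           int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // thread lies outside the image

    int sum = 0, count = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
                sum += in[ny * width + nx];
                ++count;
            }
        }
    }
    out[y * width + x] = static_cast<unsigned char>(sum / count);
}

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical host-side wrapper: takes the image in host memory and
// returns the blurred image, handling all device memory management.
std::vector<unsigned char> blurOnDevice(const std::vector<unsigned char>& in,
                                        int width, int height)
{
    const size_t bytes = static_cast<size_t>(width) * height;
    assert(in.size() == bytes);
    std::vector<unsigned char> out(bytes);

    // 1. Allocate device memory for input and output.
    unsigned char* devIn = nullptr;
    unsigned char* devOut = nullptr;
    check(cudaMalloc(reinterpret_cast<void**>(&devIn), bytes), "cudaMalloc(devIn)");
    check(cudaMalloc(reinterpret_cast<void**>(&devOut), bytes), "cudaMalloc(devOut)");

    // 2. Transfer the input image from the host to the device.
    check(cudaMemcpy(devIn, in.data(), bytes, cudaMemcpyHostToDevice),
          "cudaMemcpy(host to device)");

    // 3. Launch one thread per pixel, rounding the grid size up.
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    boxBlur3x3<<<grid, block>>>(devIn, devOut, width, height);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    // 4. Transfer the result from the device back to the host.
    check(cudaMemcpy(out.data(), devOut, bytes, cudaMemcpyDeviceToHost),
          "cudaMemcpy(device to host)");

    // 5. Deallocate the device memory.
    check(cudaFree(devIn), "cudaFree(devIn)");
    check(cudaFree(devOut), "cudaFree(devOut)");
    return out;
}
```

With the pixels of a grayscale image loaded into a `std::vector<unsigned char>` called, say, `pixels`, the call would simply be `blurOnDevice(pixels, width, height)`. Loading and saving the image file is left out here on purpose.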