Saturday, September 19, 2015

printf inside kernel

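To debug a kernel, you can call printf() directly inside a CUDA kernel, just as in C, instead of calling cuPrintf() as in CUDA 4.2. Device-side printf() needs a device of compute capability 2.0 or higher. However, I noticed that there is a limit on how many trace records reach stdout, around 4096, even though you may have N (e.g. 50K) threads running on the device. This is consistent with device printf() writing into a fixed-size circular buffer, where older output is overwritten once more is produced than fits.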
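As a rough illustration of the printf() approach, here is a minimal sketch that queries the printf buffer size and enlarges it before the launch. The kernel name `print_kernel`, the helper `run_printf_demo`, and the 16 MB figure are assumptions made for this example. I have not verified that a larger buffer removes the roughly 4096-record cap described above, so treat it as something to try rather than a confirmed fix.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread prints one short record with device-side printf().
__global__ void print_kernel(int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        printf("thread %d\n", tid);
    }
}

// Hypothetical helper: returns 0 on success, 1 on any CUDA error.
int run_printf_demo(int n, size_t fifo_bytes)
{
    // Report the current size of the device printf buffer.
    size_t old_bytes = 0;
    if (cudaDeviceGetLimit(&old_bytes, cudaLimitPrintfFifoSize) != cudaSuccess) return 1;
    fprintf(stderr, "printf FIFO before: %zu bytes\n", old_bytes);

    // The limit has to be changed before the kernel is launched.
    if (cudaDeviceSetLimit(cudaLimitPrintfFifoSize, fifo_bytes) != cudaSuccess) return 1;

    const int block = 256;
    const int grid = (n + block - 1) / block;
    print_kernel<<<grid, block>>>(n);
    if (cudaGetLastError() != cudaSuccess) return 1;

    // Buffered printf output is flushed at synchronization points.
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;
    return 0;
}

int main()
{
    // Assumed values: 50K threads, 16 MB buffer.
    return run_printf_demo(50000, 16u * 1024u * 1024u);
}
```

Counting the lines on stdout (for example by piping the output through `wc -l`) shows whether all 50K records arrived.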

Pre-allocating a data structure in device memory to store this information, and copying it back to the host after the kernel finishes, is a safer and better solution.
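One way this could look is sketched below: the host allocates one trace slot per thread, the kernel writes into its own slot, and the host copies everything back and inspects it. The `TraceRecord` layout, the kernel `traced_kernel`, and the helper `count_trace_mismatches` are all hypothetical names chosen for this example, and the value `2.0f * tid` is only a stand-in for whatever you actually want to trace.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-thread trace record.
struct TraceRecord {
    int   thread_id;
    float value;
};

// Each thread writes its debug info into its own pre-allocated slot.
__global__ void traced_kernel(TraceRecord* trace, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        trace[tid].thread_id = tid;
        trace[tid].value = 2.0f * tid;  // stand-in for a real intermediate result
    }
}

// Runs the kernel, copies the trace back, and returns the number of
// records that do not match the expected values (-1 on a CUDA error).
int count_trace_mismatches(int n)
{
    TraceRecord* d_trace = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(TraceRecord);
    if (cudaMalloc(&d_trace, bytes) != cudaSuccess) return -1;

    const int block = 256;
    const int grid = (n + block - 1) / block;
    traced_kernel<<<grid, block>>>(d_trace, n);

    std::vector<TraceRecord> h_trace(n);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_trace.data(), d_trace, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_trace);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_trace[i].thread_id != i || h_trace[i].value != 2.0f * i) {
            ++mismatches;
        }
    }
    return mismatches;
}

int main()
{
    const int n = 50000;  // one record per thread, no printf buffer involved
    int mismatches = count_trace_mismatches(n);
    printf("records: %d, mismatches: %d\n", n, mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Because every thread owns a fixed slot, nothing is overwritten however many threads run. The cost is the device memory for the buffer and one extra copy back to the host.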

1 comment:

  1. How? You will get an error about calling a __host__ function ("printf") from a __global__ function.
