Saturday, September 19, 2015

2D array on GPU

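On the CPU you can dynamically allocate a multi-dimensional array as an array of row pointers and free it before exiting the program.
However, the same pattern does not carry over directly to the GPU.
Allocating the array of pointers with cudaMalloc works fine. But when you then call cudaMalloc for each of those pointers from the host, it triggers a segmentation fault, most likely because the pointer array lives in device memory and the host cannot dereference it.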

One solution is to allocate a single contiguous chunk of memory and compute a flat index from the index of each dimension.
Another solution for 2D arrays is to use pitched memory (cudaMallocPitch).
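
As a rough sketch of the contiguous approach, the example below allocates one block for a small row-major array and addresses element (row, col) as row * cols + col. The kernel name fill, the array sizes, and the values written are made up for illustration, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stores each element's own flat index in the array.
__global__ void fill(float *a, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols)
        a[row * cols + col] = (float)(row * cols + col);  // row-major flat index
}

int main()
{
    const int rows = 4, cols = 8;                 // illustrative sizes
    const size_t bytes = rows * cols * sizeof(float);

    float *d_a = NULL;
    cudaMalloc(&d_a, bytes);                      // one allocation for the whole 2D array

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    fill<<<grid, block>>>(d_a, rows, cols);

    float h_a[rows * cols];
    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    printf("a[2][3] = %.0f\n", h_a[2 * cols + 3]);

    cudaFree(d_a);
    return 0;
}
```

Because there is only one device pointer, the host never has to touch memory that lives on the GPU, which sidesteps the crash described above.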
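
For the pitched variant, a minimal sketch could look like the following. cudaMallocPitch may pad each row, so the kernel has to step between rows using the returned pitch (in bytes) rather than the logical row width. The kernel name fillPitched and the sizes are assumptions for illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes row * cols + col into a pitched 2D array.
__global__ void fillPitched(float *base, size_t pitch, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        // pitch is measured in bytes, so advance through rows via a char pointer
        float *rowPtr = (float *)((char *)base + row * pitch);
        rowPtr[col] = (float)(row * cols + col);
    }
}

int main()
{
    const int rows = 4, cols = 8;                 // illustrative sizes
    float *d_a = NULL;
    size_t pitch = 0;

    // Width is given in bytes, height in rows; pitch receives the padded row size.
    cudaMallocPitch(&d_a, &pitch, cols * sizeof(float), rows);

    dim3 block(16, 16);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    fillPitched<<<grid, block>>>(d_a, pitch, rows, cols);

    // Copy back into a tightly packed host array, dropping the row padding.
    float h_a[rows * cols];
    cudaMemcpy2D(h_a, cols * sizeof(float), d_a, pitch,
                 cols * sizeof(float), rows, cudaMemcpyDeviceToHost);
    printf("a[2][3] = %.0f\n", h_a[2 * cols + 3]);

    cudaFree(d_a);
    return 0;
}
```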

printf inside kernel

To debug a kernel, you can call printf() directly inside the CUDA kernel, just as in C, instead of calling cuPrintf() as in CUDA 4.2. However, I noticed that there is a limit on how much trace reaches stdout, around 4096 records, even though you may have N (e.g. 50K) threads running on the device. Device-side printf writes into a fixed-size buffer, which would explain why output beyond a certain amount gets lost.
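
One thing that may be worth trying is enlarging the device printf buffer with cudaDeviceSetLimit before launching the kernel. The sketch below assumes a 64 MB buffer is acceptable on the device; the kernel name hello and the launch configuration are made up, and I have not verified that this removes the limit observed above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread prints its global index.
__global__ void hello()
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    printf("thread %d\n", tid);
}

int main()
{
    // Request a larger printf buffer; the 64 MB figure is an arbitrary choice.
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 64 * 1024 * 1024);

    hello<<<200, 256>>>();        // 51200 threads, roughly the 50K case

    // Device printf output is flushed at synchronization points such as this one.
    cudaDeviceSynchronize();
    return 0;
}
```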

Pre-allocating a data structure on the device to store this information, then copying it back to the host, is a safer and better solution.
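
A minimal sketch of that idea, assuming one record per thread: each thread writes its debug values into its own slot of a pre-allocated device array, and the host copies the array back and prints whatever it needs. The DebugRecord struct, the compute kernel, and the logged value are hypothetical placeholders.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical per-thread debug record.
struct DebugRecord {
    int   tid;
    float value;
};

// Hypothetical kernel: each thread logs into its own slot instead of printing.
__global__ void compute(DebugRecord *log, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        log[tid].tid   = tid;
        log[tid].value = tid * 0.5f;   // stand-in for a real intermediate result
    }
}

int main()
{
    const int n = 50000;               // roughly the 50K threads mentioned above
    const size_t bytes = n * sizeof(DebugRecord);

    DebugRecord *d_log = NULL;
    cudaMalloc(&d_log, bytes);         // pre-allocated debug buffer on the device

    compute<<<(n + 255) / 256, 256>>>(d_log, n);

    DebugRecord *h_log = (DebugRecord *)malloc(bytes);
    cudaMemcpy(h_log, d_log, bytes, cudaMemcpyDeviceToHost);

    // Every record is available on the host; print only the last few here.
    for (int i = n - 3; i < n; ++i)
        printf("tid %d value %f\n", h_log[i].tid, h_log[i].value);

    free(h_log);
    cudaFree(d_log);
    return 0;
}
```

Since each thread owns exactly one slot, no records are dropped no matter how many threads run, at the cost of memory proportional to the thread count.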