You can dynamically allocate multi-dimensional array on cpu and free it before exiting the program.
However this is not the case for gpu.
When you first allocate pointer, it works fine. Then when you cudamalloc a space for each pointer, it will trigger a segmentation fault.
One solution is creating a contiguous trunk of memory and find a index for each dimension.
Another solution for 2d is use pitch memory.
Saturday, September 19, 2015
printf inside kernel
To debug the kernel, you can directly use printf() function like C inside cuda kernel, instead of calling cuprintf() in cuda 4.2. However, I noticed that there is a limit of trace to print out to the stdout, around 4096 records, thought you may have N, e.g. 50K, threads running on the device.
To pre-allocate a data structure to install these info is a safer and better solution.
To pre-allocate a data structure to install these info is a safer and better solution.
Subscribe to:
Posts (Atom)