http://www.seas.upenn.edu/~cis565/Lectures2011S/Lecture12.pdf
http://simulationcorner.net/index.php?page=fastmatrixvector
Wednesday, April 15, 2015
Sunday, February 22, 2015
cuda ptx
1) binary utilities
http://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#axzz3SUgZUVcD
CUDA binaries (cubin files) are in ELF format.
nvcc embeds cubin files into the host executable file.
They can also be generated separately with the nvcc option "-cubin".
cuobjdump: works on both cubin files and host binaries.
nvdisasm: works on cubin files only, but supports control flow analysis and richer output options.
For example, I have a matrix multiplication app called matrixMul.
% dump the CUDA ELF sections
cuobjdump -elf matrixMul
% dump the CUDA assembly (SASS)
cuobjdump -sass matrixMul
% dump both the PTX and the SASS embedded in the executable
cuobjdump matrixMul -ptx -sass
% list the cubin files embedded for the different architectures
cuobjdump matrixMul -lelf
% extract all the cubins from the binary
cuobjdump matrixMul -xelf all
Assume I want to analyze the code built for CUDA compute capability 3.0.
The cubin extracted in the previous step is matrixMul.sm_30.cubin.
% extract the control flow graph of a kernel
nvdisasm -cfg matrixMul.sm_30.cubin
% install Graphviz to render the DOT (graph description language) output as a PNG
sudo apt-get install graphviz
nvdisasm -cfg matrixMul.sm_30.cubin | dot -o cfg.png -Tpng
% show the register liveness range information
nvdisasm -plr matrixMul.sm_30.cubin
Monday, February 16, 2015
Instruction Level Parallelism and Thread Level Parallelism
A good tutorial from Prof. John Owens at UC Davis.
http://www.nvidia.com/content/cudazone/cudau/courses/ucdavis/lectures/tlp1.pdf
TLP: many restaurants with one boss and one chef
ILP: one restaurant with one boss and many chefs
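To make the ILP half of the analogy concrete, here is a minimal host-side C++ sketch (not taken from the tutorial; the function names are hypothetical). The first loop is one chef doing one thing at a time: each add waits for the previous one. The second loop keeps four independent running sums, so a core that can issue several instructions at once may overlap them. TLP would instead hand each slice of the array to a separate thread running the simple loop; that part is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One dependency chain: every add has to wait for the previous add.
long long sum_one_chain(const std::vector<int>& v) {
    long long acc = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
    }
    return acc;
}

// Four independent chains: the four adds in one iteration do not depend
// on each other, so the hardware is free to overlap them (ILP).
long long sum_four_chains(const std::vector<int>& v) {
    long long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    const std::size_t n4 = v.size() - v.size() % 4;  // largest multiple of 4
    std::size_t i = 0;
    for (; i < n4; i += 4) {
        acc0 += v[i];
        acc1 += v[i + 1];
        acc2 += v[i + 2];
        acc3 += v[i + 3];
    }
    // Leftover elements when the size is not a multiple of 4.
    long long tail = 0;
    for (; i < v.size(); ++i) {
        tail += v[i];
    }
    assert(i == v.size());  // every element was consumed exactly once
    return acc0 + acc1 + acc2 + acc3 + tail;
}
```

Both functions return the same sum; whether the second one is actually faster depends on the compiler and the hardware, so treat it as an illustration of the idea rather than a benchmark.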
Interesting matrix multiplication in CUDA 7.0 SDK
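In the CUDA 7.0 SDK, the matrix multiplication sample (matrixMul) reports input A as 320 x 320 and input B as 640 x 320, and computes the output C as A x B!
(320 x 320) x (640 x 320)
     A             B
Read as rows x columns, it doesn't make sense, because the inner dimensions (320 and 640) don't match. The likely explanation is that the sample prints each matrix as (width, height), following the dim3 (x, y) order. B then has 320 rows and 640 columns, and A x B is a valid (320 x 320) x (320 x 640) product.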
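One way to check the shapes is the small host-side C++ sketch below. It assumes the sample stores each matrix as (width, height) in a dim3-like pair, which is my reading of the SDK code rather than something stated in its output. The Dims struct and both function names are hypothetical stand-ins, not SDK code.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, using the assumed convention:
// x is the width (columns), y is the height (rows).
struct Dims {
    unsigned x;  // width  = number of columns
    unsigned y;  // height = number of rows
};

// A x B is defined when the number of columns of A equals the number of rows of B.
bool product_is_defined(Dims a, Dims b) {
    return a.x == b.y;
}

// Shape of C = A x B: as many columns as B and as many rows as A.
Dims product_dims(Dims a, Dims b) {
    assert(product_is_defined(a, b));
    return Dims{b.x, a.y};
}
```

With A = {320, 320} and B = {640, 320}, product_is_defined(A, B) holds and product_dims(A, B) gives a C that is 640 wide and 320 high, while B x A is not defined under the same convention.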
Sunday, February 15, 2015
Exercises using gpuocelot
http://www.ieap.uni-kiel.de/et/people/kruse/tutorials/cuda/tutorial01o/web01o/tutorial01o.html
Thursday, January 29, 2015
embed assembly inside cuda kernel
If you know the specific assembly (PTX) instructions you need, you could in principle write the whole kernel in assembly yourself.
Ha, too much work!
Here are two lines I found in the SDK.
unsigned lane_mask_lt;
asm("mov.u32 %0, %%lanemask_lt;" : "=r"(lane_mask_lt));
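For readers who have not met %lanemask_lt: it is a PTX special register holding a 32-bit mask with a bit set for every lane whose index is lower than the current thread's lane. The host-side C++ sketch below only models that value so it can be inspected without a GPU; it is not the device code, and the function names are hypothetical. The second function shows one common use, counting how many voting lanes sit below the current one. That this is what the SDK sample does with the mask is an assumption on my part.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Host-side model of the value read from %lanemask_lt:
// one bit set for every lane index strictly below `lane`.
std::uint32_t lane_mask_lt(unsigned lane) {
    assert(lane < 32);  // lanes in a 32-wide warp are numbered 0..31
    return (std::uint32_t{1} << lane) - 1u;
}

// Given a 32-bit vote mask (one bit per lane, as a warp ballot would produce),
// count how many participating lanes have a lower index than `lane`.
unsigned rank_below(std::uint32_t ballot, unsigned lane) {
    const std::bitset<32> lower(ballot & lane_mask_lt(lane));
    return static_cast<unsigned>(lower.count());
}
```

For example, lane 5 gets the mask 0x1F, and with a vote mask of 0x2D (lanes 0, 2, 3 and 5 voting) lane 3 has two voting lanes below it.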
error while loading shared libraries: libcudart.so.5.5: cannot open shared object file: No such file or directory
This happens when the dynamic linker cannot find the CUDA runtime library at run time, even though you probably set every environment variable correctly and compiled the program successfully.
Here is the solution.
32-bit: sudo ldconfig /usr/local/cuda/lib
64-bit: sudo ldconfig /usr/local/cuda/lib64
Thanks. (http://stackoverflow.com/questions/10808958/why-cant-libcudart-so-4-be-found-when-compiling-the-cuda-samples-under-ubuntu)