1) binary utilities
http://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#axzz3SUgZUVcD
CUDA device binaries (cubin files) are in ELF format.
nvcc embeds cubin files into the host executable file by default.
They can also be generated separately by using the "-cubin" option.
cuobjdump: accepts both cubin files and host binaries
nvdisasm: accepts cubin files only, but supports control flow analysis and richer output options
For example, I have a matrix multiplication app called matrixMul.
% dump the CUDA ELF sections
cuobjdump -elf matrixMul
% dump the CUDA assembly (SASS)
cuobjdump -sass matrixMul
% dump both the PTX and the SASS from the binary
cuobjdump matrixMul -ptx -sass
% list the cubin files embedded for each architecture
cuobjdump matrixMul -lelf
% extract all the cubins from the binary
cuobjdump matrixMul -xelf all
Assume I want to analyze the code built for CUDA compute capability 3.0.
The corresponding cubin from the previous extraction step is matrixMul.sm_30.cubin
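To pick the cubin for one compute capability out of a longer listing, a small helper can filter the file names. This is only a minimal sketch: the sample listing is an assumed shape for the `-lelf` output (lines ending in a name such as matrixMul.sm_30.cubin), not a captured log, and cubin_for_arch is a hypothetical helper name.

```python
import re

# Assumed shape of the cuobjdump -lelf listing (illustrative, not a captured log).
SAMPLE_LISTING = """\
ELF file    1: matrixMul.sm_20.cubin
ELF file    2: matrixMul.sm_30.cubin
ELF file    3: matrixMul.sm_35.cubin
"""

def cubin_for_arch(listing, major, minor):
    """Return the first cubin name built for sm_<major><minor>, or None."""
    pattern = re.compile(r"(\S+\.sm_%d%d\.cubin)\s*$" % (major, minor))
    for line in listing.splitlines():
        match = pattern.search(line)
        if match:
            return match.group(1)
    return None

print(cubin_for_arch(SAMPLE_LISTING, 3, 0))  # matrixMul.sm_30.cubin
```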
% extract the control flow graph of a kernel
nvdisasm -cfg matrixMul.sm_30.cubin
% nvdisasm -cfg emits the graph in the DOT graph description language; render it as a PNG with Graphviz
sudo apt-get install graphviz
nvdisasm -cfg matrixMul.sm_30.cubin | dot -o cfg.png -Tpng
% to show the register liveness range information
nvdisasm -plr matrixMul.sm_30.cubin
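If the CFG step needs to be repeated for several cubins, the nvdisasm-to-dot pipe can be wrapped in a script. The sketch below assumes nvdisasm and dot are on the PATH; cfg_commands and render_cfg are hypothetical helper names, and the flags are exactly the ones used in the commands above.

```python
import shutil
import subprocess

def cfg_commands(cubin, png="cfg.png"):
    """Build the two halves of: nvdisasm -cfg <cubin> | dot -o <png> -Tpng"""
    return ["nvdisasm", "-cfg", cubin], ["dot", "-o", png, "-Tpng"]

def render_cfg(cubin, png="cfg.png"):
    """Run the pipeline; return False if nvdisasm or dot is not installed."""
    disasm_cmd, dot_cmd = cfg_commands(cubin, png)
    if shutil.which(disasm_cmd[0]) is None or shutil.which(dot_cmd[0]) is None:
        return False
    # Capture the DOT text from nvdisasm and feed it to dot on stdin.
    dot_text = subprocess.run(disasm_cmd, check=True, capture_output=True).stdout
    subprocess.run(dot_cmd, input=dot_text, check=True)
    return True

# Show the commands without running them.
print(cfg_commands("matrixMul.sm_30.cubin"))
```

Calling render_cfg("matrixMul.sm_30.cubin") would then write cfg.png in the current directory, provided the cubin exists.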
Sunday, February 22, 2015
Monday, February 16, 2015
Instruction Level Parallelism and Thread Level Parallelism
A good tutorial from Prof. John Owens of UC Davis.
http://www.nvidia.com/content/cudazone/cudau/courses/ucdavis/lectures/tlp1.pdf
TLP: many restaurants with one boss and one chef
ILP: one restaurant with one boss and many chefs
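One way to make the analogy concrete is to count how much independent work is in flight. The sketch below is a deliberately simplified model and is not taken from the tutorial: it assumes each restaurant stands for a thread and each chef for one independent instruction that thread can have in progress. Config and dishes_in_flight are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Config:
    restaurants: int           # threads (TLP)
    chefs_per_restaurant: int  # independent instructions per thread (ILP)

def dishes_in_flight(cfg):
    """Independent operations that can overlap in this simplified model."""
    return cfg.restaurants * cfg.chefs_per_restaurant

tlp = Config(restaurants=8, chefs_per_restaurant=1)  # many restaurants, one chef each
ilp = Config(restaurants=1, chefs_per_restaurant=8)  # one restaurant, many chefs

print(dishes_in_flight(tlp), dishes_in_flight(ilp))  # 8 8
```

In this model both setups expose the same amount of parallel work; they differ only in where it comes from.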
Interesting matrix multiplication in CUDA 7.0 SDK
In the CUDA 7.0 SDK, the matrix multiplication benchmark uses an input A of 320 x 320 and an input B of 640 x 320. It calculates the output C as A x B!
A (320 x 320) x B (640 x 320)
Read as rows x columns, the inner dimensions (320 and 640) do not match, so it doesn't make sense!
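The objection can be checked mechanically. The sketch below tests whether the two shapes can be multiplied under two readings of the reported sizes. The second reading (sizes given as width x height, so that B would have 320 rows and 640 columns) is only an assumption about how the sample might report its dimensions, not something confirmed here. can_multiply and as_rows_cols are hypothetical helper names.

```python
def can_multiply(a, b):
    """a and b are (rows, cols); A x B needs cols of A == rows of B."""
    return a[1] == b[0]

def as_rows_cols(width_height):
    """Reinterpret a (width, height) pair as (rows, cols)."""
    width, height = width_height
    return (height, width)

A = (320, 320)
B = (640, 320)

# Reading 1: the sizes are rows x cols.
print(can_multiply(A, B))                              # False
# Reading 2 (assumption): the sizes are width x height.
print(can_multiply(as_rows_cols(A), as_rows_cols(B)))  # True
```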
Sunday, February 15, 2015
Exercises using gpuocelot
http://www.ieap.uni-kiel.de/et/people/kruse/tutorials/cuda/tutorial01o/web01o/tutorial01o.html