Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_24990189/article/details/89708838
Contents
Be careful when using CPU-based timing calls to measure CUDA activity:
CUDA Timing Functions with CUDA Event API:
1. Profiling tools:
CPU:
Intel VTune Amplifier XE, GNU gprof
GPU:
NVIDIA Visual Profiler, NVIDIA Nsight for CUDA code
2. CPU Timing Functions:
Use high-precision OS calls:
gettimeofday() on Linux
QueryPerformanceCounter() on Windows
Be careful when using CPU-based timing calls to measure CUDA activity:
CUDA activity is often asynchronous (kernel launches, asynchronous memory copies)
Need proper synchronization before results are meaningful
CUDA Timing Functions with CUDA Event API:
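cudaEvent_t start, stop;
float elapsed;  // elapsed time in milliseconds
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
fool_kernel<<<grid, block>>>();
cudaEventRecord(stop, 0);
// cudaEventSynchronize() is required because cudaEventRecord() is asynchronous
cudaEventSynchronize(stop);
// cudaEventElapsedTime() reports milliseconds; divide by 1000 for seconds
cudaEventElapsedTime(&elapsed, start, stop);
printf("Elapsed time %f (seconds)\n", elapsed / 1000);
// Release the events once they are no longer needed
cudaEventDestroy(start);
cudaEventDestroy(stop);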
3. NVIDIA Visual Profiler
Overview:
Unified CPU and GPU Timeline
Guided Performance Analysis
Available for Linux, Mac OS X, and Windows
NVIDIA Nsight provides similar performance information
Usage:
wlsh@wlsh-ThinkStation:~$ nvvp