How to measure time in NVIDIA CUDA?

This post shows how to measure execution time in CUDA.

Reading the CUDA documentation and SDK, you will find two ways:

  • cutStartTimer(myTimer), from the cutil helper library shipped with older CUDA SDKs
  • Events (cudaEvent_t)

Events are a bit more sophisticated, and if your code runs asynchronously, you must use them. But how can you tell whether your code is asynchronous or not?

To make code asynchronous, the programmer creates one or more streams (cudaStreamCreate) and transfers the input data to the device with the instruction:

cudaMemcpyAsync

In conclusion, if the code contains no calls such as ‘cudaStreamCreate’ or ‘cudaMemcpyAsync’, you can assume that it is synchronous, which simplifies the measurements.
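To make the asynchronous pattern concrete, here is a minimal sketch of a stream-based transfer and kernel launch timed with events. It assumes a CUDA-capable device and a C++11-capable nvcc; the kernel square_array_async and the helper time_async_pipeline are hypothetical names chosen for illustration. The host buffer comes from cudaMallocHost because cudaMemcpyAsync is only reliably asynchronous when the host memory is page-locked.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: squares each element in place.
__global__ void square_array_async(float *a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) a[idx] = a[idx] * a[idx];
}

// Times copy-in, kernel and copy-out issued on one stream.
// Returns the elapsed milliseconds, or -1.0f if an allocation fails.
float time_async_pipeline(int n)
{
    size_t size = n * sizeof(float);
    float *a_h = nullptr, *a_d = nullptr;
    // Page-locked host memory so that cudaMemcpyAsync can run asynchronously.
    if (cudaMallocHost((void **)&a_h, size) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void **)&a_d, size) != cudaSuccess) { cudaFreeHost(a_h); return -1.0f; }
    for (int i = 0; i < n; ++i) a_h[i] = (float)i;

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int block_size = 256;
    int n_blocks = (n + block_size - 1) / block_size;

    // Every call below returns to the host immediately. The events are queued in
    // the same stream, so 'stop' completes only after the copies and the kernel.
    cudaEventRecord(start, stream);
    cudaMemcpyAsync(a_d, a_h, size, cudaMemcpyHostToDevice, stream);
    square_array_async<<<n_blocks, block_size, 0, stream>>>(a_d, n);
    cudaMemcpyAsync(a_h, a_d, size, cudaMemcpyDeviceToHost, stream);
    cudaEventRecord(stop, stream);

    // ... the host could do independent work here ...

    cudaEventSynchronize(stop);  // wait until the stream reaches 'stop'
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(a_d);
    cudaFreeHost(a_h);
    return ms;
}
```

Because both events sit in the same stream as the work, recording 'stop' right after the asynchronous calls is safe: the GPU timestamps it only once everything queued before it has finished.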

Measuring with cut{Start|Stop}Timer

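Even in synchronous code, a kernel launch returns control to the host before the kernel has finished. It is therefore very important to call cudaThreadSynchronize() (cudaDeviceSynchronize() since CUDA 4.0) before stopping the timer; otherwise the timer measures little more than the launch itself.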

The code is below:

  unsigned int kernelTime;                  // cutil timer handle
  cutCreateTimer(&kernelTime);
  cutResetTimer(kernelTime);

  cudaMalloc((void **) &a_d, size);         // Allocate array on device
  cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
  cutStartTimer(kernelTime);
  // Do calculation on device (the launch returns immediately):
  square_array <<< n_blocks, block_size >>> (a_d, N);
  cudaThreadSynchronize();                  // wait for the kernel; cudaDeviceSynchronize() in CUDA 4.0+
  cutStopTimer(kernelTime);

  // Retrieve result from device and store it in host array
  cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
  cudaFree(a_d);
  printf ("Time for the kernel: %f ms\n", cutGetTimerValue(kernelTime));
  cutDeleteTimer(kernelTime);               // release the timer

The output:

[ivan@machine]$ ./timer
Device name : Tesla C2050
Time for selecting the device: 3423.731934 ms
Time for the kernel: 0.068000 ms
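Newer CUDA toolkits no longer ship the cutil library, so cutStartTimer is not available there. The following is a minimal host-clock sketch with the same behaviour, assuming C++11 std::chrono and a hypothetical long-running kernel busy_kernel (the helper time_with_host_clock is also hypothetical). It also shows why the synchronization matters: without it, the measured interval only covers the launch call.

```cuda
#include <cuda_runtime.h>
#include <chrono>

// Hypothetical kernel: a dependent arithmetic loop so the run time grows with 'iters'.
__global__ void busy_kernel(float *a, int n, int iters)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float v = a[idx];
        for (int k = 0; k < iters; ++k) v = v * 0.999f + 1.0f;
        a[idx] = v;
    }
}

struct HostTimings {
    double without_sync_ms;  // interval ends as soon as the launch call returns
    double with_sync_ms;     // interval ends after cudaDeviceSynchronize()
};

// Host wall-clock timing, a portable stand-in for cutStartTimer/cutStopTimer.
// Returns {-1, -1} if the device allocation fails.
HostTimings time_with_host_clock(int n, int iters)
{
    using host_clock = std::chrono::steady_clock;
    HostTimings t = {-1.0, -1.0};
    float *a_d = nullptr;
    if (cudaMalloc((void **)&a_d, n * sizeof(float)) != cudaSuccess) return t;
    cudaMemset(a_d, 0, n * sizeof(float));

    int block_size = 256;
    int n_blocks = (n + block_size - 1) / block_size;

    // Warm-up launch so one-time setup costs are not counted below.
    busy_kernel<<<n_blocks, block_size>>>(a_d, n, 1);
    cudaDeviceSynchronize();

    // 1) No synchronization: only the (asynchronous) launch is timed.
    auto t0 = host_clock::now();
    busy_kernel<<<n_blocks, block_size>>>(a_d, n, iters);
    auto t1 = host_clock::now();
    cudaDeviceSynchronize();  // let that kernel finish before the next measurement

    // 2) With synchronization: the host waits until the kernel has finished.
    auto t2 = host_clock::now();
    busy_kernel<<<n_blocks, block_size>>>(a_d, n, iters);
    cudaDeviceSynchronize();  // cudaThreadSynchronize() in older code
    auto t3 = host_clock::now();

    t.without_sync_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    t.with_sync_ms    = std::chrono::duration<double, std::milli>(t3 - t2).count();
    cudaFree(a_d);
    return t;
}
```

With a heavy enough workload, for example time_with_host_clock(1 << 20, 10000), without_sync_ms should come out much smaller than with_sync_ms; the exact numbers depend on the device.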

Measuring with events

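Events are more precise (the CUDA Runtime API documentation gives cudaEventElapsedTime a resolution of about 0.5 microseconds). An example of their use is below: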

  cudaEvent_t start, stop;
  float elapsed_ms;                         // avoids shadowing time() from <time.h>
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  cudaMalloc((void **) &a_d, size);         // Allocate array on device
  cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
  cudaEventRecord(start, 0);                // enqueue 'start' in the default stream
  // Do calculation on device:
  square_array <<< n_blocks, block_size >>> (a_d, N);
  cudaEventRecord(stop, 0);                 // enqueue 'stop' after the kernel
  cudaEventSynchronize(stop);               // block the host until 'stop' has completed

  // Retrieve result from device and store it in host array
  cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
  cudaFree(a_d);
  cudaEventElapsedTime(&elapsed_ms, start, stop);
  printf ("Time for the kernel: %f ms\n", elapsed_ms);
  cudaEventDestroy(start);                  // release the events
  cudaEventDestroy(stop);

The output:

[ivan@machine]$ ./event
Device name : Tesla C2050
Time for selecting the device: 3819.466064 ms
Time for the kernel: 0.041632 ms
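For a kernel as short as square_array, a single measurement of a few hundredths of a millisecond is easily affected by noise. One common refinement, sketched here as an assumption rather than taken from the post, is to run a warm-up launch and then time many back-to-back launches between one pair of events, dividing by the count. The helper average_kernel_ms and the block size of 256 are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel matching the post's square_array(a_d, N) call.
__global__ void square_array(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) a[idx] = a[idx] * a[idx];
}

// Average time per launch, measured with one pair of events around 'reps' launches.
// Returns milliseconds per launch, or -1.0f on invalid input or error.
float average_kernel_ms(int N, int reps)
{
    if (reps <= 0) return -1.0f;
    float *a_d = nullptr;
    if (cudaMalloc((void **)&a_d, N * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(a_d, 0, N * sizeof(float));  // zeros stay zero when squared

    int block_size = 256;
    int n_blocks = (N + block_size - 1) / block_size;

    // Warm-up launch keeps one-time costs out of the measurement.
    square_array<<<n_blocks, block_size>>>(a_d, N);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r)           // launches in the default stream run in order
        square_array<<<n_blocks, block_size>>>(a_d, N);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float total_ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&total_ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a_d);
    return err == cudaSuccess ? total_ms / reps : -1.0f;
}
```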

In short

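If the code is synchronous, you can use either of the two methods introduced above. The difference between them also shrinks as the kernel does more work, presumably because the host timer additionally counts a roughly fixed launch and synchronization overhead: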

[ivan@garoe tiempo]$ ./timer
Device name : Tesla C2050
Time for selecting the device: 3824.726074 ms
Time for the kernel: 1.619000 ms

[ivan@garoe tiempo]$ ./event
Device name : Tesla C2050
Time for selecting the device: 3859.903076 ms
Time for the kernel: 1.600992 ms
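The comparison above can be reproduced in a single program by wrapping the same launch in both a host clock with synchronization (the cutStartTimer approach) and a pair of events. This is a sketch under assumptions: work_kernel is a hypothetical kernel whose run time grows with iters, compare_timers is a hypothetical helper, and std::chrono stands in for the cutil timer. Since the host interval encloses both event records, host_ms should not be meaningfully smaller than event_ms.

```cuda
#include <cuda_runtime.h>
#include <chrono>

// Hypothetical kernel whose run time grows with 'iters'.
__global__ void work_kernel(float *a, int n, int iters)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float v = a[idx];
        for (int k = 0; k < iters; ++k) v = v * 0.5f + 1.0f;
        a[idx] = v;
    }
}

struct Comparison {
    double host_ms;   // host clock around launch + synchronization (cutStartTimer style)
    float  event_ms;  // events recorded in the stream around the same launch
};

// Returns {-1, -1} if the device allocation fails.
Comparison compare_timers(int n, int iters)
{
    Comparison c = {-1.0, -1.0f};
    float *a_d = nullptr;
    if (cudaMalloc((void **)&a_d, n * sizeof(float)) != cudaSuccess) return c;
    cudaMemset(a_d, 0, n * sizeof(float));

    int block_size = 256;
    int n_blocks = (n + block_size - 1) / block_size;
    work_kernel<<<n_blocks, block_size>>>(a_d, n, 1);  // warm-up
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // The host interval [t0, t1] encloses the GPU interval [start, stop].
    auto t0 = std::chrono::steady_clock::now();
    cudaEventRecord(start, 0);
    work_kernel<<<n_blocks, block_size>>>(a_d, n, iters);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    auto t1 = std::chrono::steady_clock::now();

    c.host_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    cudaEventElapsedTime(&c.event_ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a_d);
    return c;
}
```

Calling compare_timers once with a small iters and once with a large one should show the relative gap between host_ms and event_ms shrinking, in line with the outputs above; the absolute values depend on the GPU.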

12 comments on “How to measure time in NVIDIA CUDA?”

  1. The CUDA calls are asynchronous, as I understand it. Then how can we use cudaEventRecord to measure the time just after the kernel call in main?

    Thanks,
    Pushkar

  2. Sorry Pushkar, but I haven’t had any free time to spend on this blog 😦
    Thanks “Sales Manager”, but as I said, I would need more time here just to maintain some level of quality.

  4. I am using the cut timer. For some codes it gives correct results; for others it gives wrong results. I know they are wrong because they don’t match the nvvp results: in the nvvp timeline, the time for all iterations differs from what the cut timer shows. Any help?
