# Time measuring in PyOpenCL

### Problem description

I am running a kernel using PyOpenCL on an FPGA and on a GPU. To measure how long it takes to execute, I use:

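```python
from time import time

# event.profile is only available if the queue was created with
# properties=cl.command_queue_properties.PROFILING_ENABLE
t1 = time()
event = mykernel(queue, (c_width, c_height), (block_size, block_size), d_c_buf, d_a_buf, d_b_buf, a_width, b_width)
event.wait()  # block until the kernel has finished
t2 = time()

compute_time = t2 - t1  # host wall-clock time, in seconds
compute_time_e = (event.profile.end - event.profile.start) * 1e-9  # device time, ns -> s
```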

This gives me the execution time from the point of view of the host (`compute_time`) and of the device (`compute_time_e`). The problem is that these values are very different:

```
compute (host-timed) [s]: 0.0009386539459228516
compute (event-timed) [s]:  9.4528e-05
```

Does anyone know what could cause this difference? And more importantly, which one is more accurate?

Thank you.

## Recommended answer

Both of those numbers look right to me. If I am reading this correctly, the host measures about 10x the device time, which is not unusual for a small kernel because the host measurement includes transfer latency. Your host time includes communication across the PCB, whereas your device time measures only an on-chip operation.

I think your program timing breaks down like this:

• Kernel Execution Time: 0.1ms // event-timed
• Transfer Time: 0.8ms // (host-timed - event-timed)
• Total Time: 0.9ms // host-timed
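
The subtraction behind this breakdown can be checked with the two numbers from the question. This is only a minimal sketch: the variable names are illustrative, and "overhead" here simply means everything the host sees that is not kernel execution.

```python
# Values reported in the question, in seconds.
host_timed = 0.0009386539459228516  # wall-clock time around launch + wait
event_timed = 9.4528e-05            # event.profile.end - event.profile.start

# Everything the host measured that was not kernel execution on the device.
overhead = host_timed - event_timed

# How many times larger the host measurement is than the device measurement.
ratio = host_timed / event_timed

print(f"kernel execution:  {event_timed * 1e3:.2f} ms")
print(f"overhead:          {overhead * 1e3:.2f} ms")
print(f"total:             {host_timed * 1e3:.2f} ms")
print(f"host/device ratio: {ratio:.1f}x")
```

The rounded figures in the list above come from the same arithmetic.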

If you are curious about the situation, try running a kernel that takes much longer on the device. You should start to see these numbers match up much more closely as the fixed transfer time becomes a smaller share of the overall time.

For example:

• Kernel Execution Time: 900ms
• Transfer Time: 0.8ms
• Total Time: 900.8ms
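
To see how quickly a fixed cost fades as the kernel gets longer, here is a small sketch. It assumes the overhead stays constant at the 0.8 ms estimated above, which is a simplification; the helper name `overhead_fraction` is hypothetical.

```python
def overhead_fraction(kernel_s, fixed_overhead_s=0.8e-3):
    """Return the share of the host-timed total that is not kernel execution.

    Assumes a constant per-launch overhead, which is a simplification.
    """
    total_s = kernel_s + fixed_overhead_s
    return fixed_overhead_s / total_s


# Kernel durations in seconds: the short kernel from the question,
# an intermediate case, and the long kernel from the example above.
for kernel_s in (0.1e-3, 10e-3, 900e-3):
    share = overhead_fraction(kernel_s)
    print(f"kernel {kernel_s * 1e3:7.1f} ms -> overhead is {share:.2%} of total")
```

Under this assumption the overhead dominates the short kernel but is a tiny part of the long one, so the host-timed and event-timed values converge.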

## Other recommended answer

You can learn quite a lot from Intel's site on OpenCL. It states that `event.profile` only gives a hint of the pure hardware execution time of the kernel and leaves out the data transfer times (which are included in your first measurement). Therefore the host-side wall-clock time may return different results. However, it also states that if you target the CPU as the OpenCL device, the time difference should become smaller (or even negligible).
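
Besides `start` and `end`, a PyOpenCL event's profiling info also exposes `queued` and `submit` timestamps (all in nanoseconds), which can help show where the time outside the kernel goes. The sketch below is an illustration only: the helper `profile_breakdown` is hypothetical, and the stand-in object uses made-up timestamps (chosen so that `end - start` matches the question's 9.4528e-05 s) so that the code runs without an OpenCL device. With a real event you would pass `event.profile` instead.

```python
from types import SimpleNamespace


def profile_breakdown(profile):
    """Split an event's profiling timestamps (nanoseconds) into stages, in seconds."""
    return {
        "queued_to_submit": (profile.submit - profile.queued) * 1e-9,
        "submit_to_start": (profile.start - profile.submit) * 1e-9,
        "start_to_end": (profile.end - profile.start) * 1e-9,
    }


# Stand-in for event.profile with made-up timestamps (illustrative only).
fake_profile = SimpleNamespace(queued=0, submit=200_000, start=750_000, end=844_528)

stages = profile_breakdown(fake_profile)
for name, seconds in stages.items():
    print(f"{name:17s} {seconds * 1e3:.4f} ms")
```

Only the `start_to_end` stage corresponds to the event-timed value in the question; the earlier stages are waiting time that the host clock also sees.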