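Preface

In PyTorch, we often need to measure the running time of a code segment, a class, a function, or an operator in order to locate the speed bottleneck of a model. This post introduces some commonly used methods.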

$\nabla$ Contact:
e-mail: [email protected]
QQ: 973926198
github: https://github.com/FesianXu

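Broadly speaking, there are two families of methods for timing a code segment:

timeit: record the time at the start and at the end of the code, then take the difference.
profile: profiling tools, either built into PyTorch or provided by third parties.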

timeit

This approach works like the following code:

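import time

begin = time.perf_counter()  # time.clock() was removed in Python 3.8
run_main_code()              # placeholder for the code being measured
end = time.perf_counter()

print(end - begin)  # elapsed time in seconds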
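A single begin/end measurement can be noisy, so it may help to repeat it several times. The sketch below uses the standard-library timeit module for that. The run_main_code body here is a hypothetical stand-in workload, and the repeat counts are arbitrary choices rather than recommendations.

```python
import timeit


def run_main_code():
    # Hypothetical stand-in for the code being measured.
    return sum(i * i for i in range(10_000))


CALLS_PER_MEASUREMENT = 20

# Each measurement runs the workload 20 times; take 5 such measurements.
timings = timeit.repeat(run_main_code, number=CALLS_PER_MEASUREMENT, repeat=5)

# The minimum is usually the least disturbed by other activity on the machine.
best_per_call = min(timings) / CALLS_PER_MEASUREMENT
print(f"best of {len(timings)}: {best_per_call * 1e6:.1f} us per call")
```

This only measures host-side wall-clock time, so it is suitable for CPU code as written.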

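However, PyTorch code often runs on the GPU, and CUDA operations are launched asynchronously with respect to the host. A plain timeit-style measurement may therefore stop the clock before the GPU has actually finished, and cannot report the true total running time. For this reason we generally use PyTorch's built-in timing and synchronization tools, as in [1]: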

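import torch

# Example inputs (assumed here; any CUDA tensors will do)
x = torch.randn(1000, 1000, device="cuda")
y = torch.randn(1000, 1000, device="cuda")

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
z = x + y
end.record()

# Waits for everything to finish running
torch.cuda.synchronize()

print(start.elapsed_time(end))  # elapsed time in milliseconds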
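The same idea of synchronizing before reading the clock can also be wrapped in a small wall-clock helper with warm-up iterations. The following is a minimal sketch, not an API from PyTorch: time_fn and its sync, warmup and iters parameters are hypothetical names. With PyTorch on a GPU, one would presumably pass sync=torch.cuda.synchronize; the usage shown here is CPU-only so it runs without any dependency.

```python
import time


def time_fn(fn, sync=None, warmup=3, iters=10):
    """Return the mean wall-clock seconds per call of fn()."""

    def barrier():
        # For asynchronous back ends, wait until all queued work is done.
        if sync is not None:
            sync()

    # Warm-up runs absorb one-off costs (caches, lazy initialization).
    for _ in range(warmup):
        fn()

    barrier()  # make sure no earlier work is still pending
    begin = time.perf_counter()
    for _ in range(iters):
        fn()
    barrier()  # make sure the timed work has really finished
    return (time.perf_counter() - begin) / iters


# CPU-only usage: no synchronization hook is needed.
mean_s = time_fn(lambda: sorted(range(50_000), reverse=True))
print(f"{mean_s * 1e3:.3f} ms per call")
```

Without the second barrier, the clock would only capture the time needed to queue the work, which is the pitfall described above.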

profile

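The timeit approach is barely adequate for small snippets, but it quickly becomes tedious for large-scale measurement. You can, of course, use a decorator [2] to avoid manually inserting the same timing code line by line, but that is still not the best solution.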
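As an illustration of the decorator idea, here is a minimal sketch. It is not necessarily the implementation described in [2]; the names timed, elapsed and square_sum are hypothetical. It records host-side wall-clock time only, so for GPU code a synchronization call would be needed before each clock reading.

```python
import functools
import time


def timed(fn):
    """Decorator that records the wall-clock duration of every call."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        begin = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Record the duration even if fn raises.
            wrapper.elapsed.append(time.perf_counter() - begin)

    wrapper.elapsed = []  # one entry (in seconds) per call
    return wrapper


@timed
def square_sum(n):
    # Hypothetical workload used only for demonstration.
    return sum(i * i for i in range(n))


result = square_sum(1000)
print(f"{square_sum.__name__} took {square_sum.elapsed[-1] * 1e6:.1f} us")
```

Storing the durations on the wrapper, instead of printing inside it, keeps the measured function's output clean and lets the caller aggregate the timings afterwards.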

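Fortunately, PyTorch ships with a tool that reports the time spent in each part of a model. It can measure both CPU time and GPU time, which makes it very practical. This tool is called profile [3] and lives in torch.autograd.profiler. It is simple to use: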

import torch

x = torch.randn((1, 1), requires_grad=True)
with torch.autograd.profiler.profile(enabled=True) as prof:
    for _ in range(100):  # any normal python code, really!
        y = x ** 2
# Per-operator summary, sorted by CPU time spent in the operator itself
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

Reference

[1]. https://discuss.pytorch.org/t/how-to-measure-time-in-pytorch/26964/2
[2]. https://blog.csdn.net/LoseInVain/article/details/82055524
[3]. https://pytorch.org/docs/stable/autograd.html?highlight=autograd%20profiler#torch.autograd.profiler.profile
