torch.profiler

What is torch.profiler

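PyTorch Profiler is a tool for collecting performance metrics during training and inference. Its context manager API helps you find which model operators are the most expensive, inspect their input shapes and call stacks, study device kernel activity, and visualize the execution trace.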

Performance metrics include, for example, memory usage, CPU and GPU utilization, and operator execution time.
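As a rough illustration of what these metrics look like in practice, the sketch below profiles a single forward pass on the CPU and reads back per-operator statistics. The small model, the input size, and the variable names are assumptions made for the example and do not come from the original notes.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Hypothetical toy model and input, used only to generate some operator activity.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)
inputs = torch.randn(32, 64)

with profile(
    activities=[ProfilerActivity.CPU],
    record_shapes=True,    # keep operator input shapes
    profile_memory=True,   # track tensor allocations and frees
) as prof:
    with torch.no_grad():
        model(inputs)

# key_averages() aggregates the recorded events by operator name.
stats = {evt.key: evt for evt in prof.key_averages()}

# Human-readable summary, sorted by the memory each operator allocated itself.
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```

The same statistics are available programmatically through `stats`, for instance `stats[name].count` or `stats[name].self_cpu_time_total` for an operator name that appears in the table.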

profiler.profile

torch.profiler.profile(*, activities=None, schedule=None, on_trace_ready=None, record_shapes=False, profile_memory=False, with_stack=False, with_flops=False, with_modules=False, experimental_config=None, use_cuda=None)

The parameters are:

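activities: list of activity groups to profile. Supported values are torch.profiler.ProfilerActivity.CPU and torch.profiler.ProfilerActivity.CUDA. The default is ProfilerActivity.CPU plus, when available, ProfilerActivity.CUDA.
schedule: a callable that takes the step number (int) as its single argument and returns the ProfilerAction value that specifies what the profiler does at that step.
on_trace_ready: a callable that is invoked at each step for which schedule returns ProfilerAction.RECORD_AND_SAVE during profiling.
record_shapes: whether to save information about operator input shapes.
profile_memory: whether to track tensor memory allocation and deallocation.
with_stack: whether to record source information (file and line number) for the operators.
with_flops: whether to estimate the FLOPs (floating point operations) of specific operators (matrix multiplication and 2D convolution) using formulas.
with_modules: whether to record the module hierarchy (including function names) corresponding to the operator's call stack. For example, if module A's forward calls module B's forward, which contains an aten::add op, then the module hierarchy of aten::add is A.B. Note that this is currently supported only for TorchScript models, not for eager mode models.
experimental_config: a set of experimental options used by Kineto library features. Note that backward compatibility is not guaranteed.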
use_cuda: deprecated since version 1.8.1; use activities to choose CPU and/or CUDA profiling instead.
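The interaction between schedule and on_trace_ready is easier to see in a short run. The following is a minimal sketch, assuming a CPU-only setup and the helper torch.profiler.schedule to build the schedule callable; the model, the callback name `handle_trace`, and the list `tables` are hypothetical names chosen for the example.

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

tables = []  # one summary table per completed recording window

def handle_trace(prof):
    # Invoked by the profiler when a recording window finishes.
    tables.append(
        prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5)
    )

# Hypothetical workload: a small linear layer applied repeatedly.
model = torch.nn.Linear(16, 4)
inputs = torch.randn(8, 16)

with profile(
    activities=[ProfilerActivity.CPU],
    # Skip 1 step, warm up for 1 step, record 2 steps, and do this cycle once.
    schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=handle_trace,
) as prof:
    for _ in range(6):
        model(inputs)
        prof.step()  # tell the profiler that one step has finished

print(f"on_trace_ready was called {len(tables)} time(s)")
```

With this schedule the profiler is idle on step 0, warms up on step 1, and records steps 2 and 3, so the callback fires once, after the last recorded step. Calling prof.step() at the end of every iteration is what advances the schedule.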

Example

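import torch

# code_to_profile() stands for your own workload, for example one training step.
with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ]
) as p:
    code_to_profile()

# Print per-operator averages sorted by self CUDA time; row_limit=-1 shows every row.
print(p.key_averages().table(
    sort_by="self_cuda_time_total", row_limit=-1))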
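On a machine without a GPU, a variant along the following lines may be more convenient. It is only a sketch: the body of `code_to_profile`, the temporary output path, and the choice of sort key are assumptions for the example. It also writes the execution trace to a JSON file with export_chrome_trace so that it can be visualized.

```python
import os
import tempfile

import torch
from torch.profiler import ProfilerActivity, profile

use_gpu = torch.cuda.is_available()

# Always profile the CPU; add CUDA only when a GPU is actually present.
activities = [ProfilerActivity.CPU]
if use_gpu:
    activities.append(ProfilerActivity.CUDA)

def code_to_profile():
    # Hypothetical workload: one matrix multiplication followed by a reduction.
    device = "cuda" if use_gpu else "cpu"
    a = torch.randn(256, 256, device=device)
    b = torch.randn(256, 256, device=device)
    return (a @ b).sum()

with profile(activities=activities) as p:
    code_to_profile()

# Sort by GPU time when CUDA was profiled, otherwise by CPU time.
sort_key = "self_cuda_time_total" if use_gpu else "self_cpu_time_total"
print(p.key_averages().table(sort_by=sort_key, row_limit=10))

# Save the trace as JSON in a temporary directory.
trace_path = os.path.join(tempfile.mkdtemp(), "trace.json")
p.export_chrome_trace(trace_path)
print(f"trace written to {trace_path}")
```

The resulting file can be opened in a trace viewer such as chrome://tracing to inspect the timeline of operators and kernels.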