We can think of a Tensor that implements Automatic Differentiation as a regular tensor that has, among others, the following attributes that capture its history:
- inputs: list of inputs, such as tensors and scalars, that created the output tensor. Leaf tensors have no inputs
- operation: function that was applied on the inputs to create the output tensor. Leaf tensors have no operation
- data: the output tensor from applying the operation on the inputs
- requires_grad: whether we want to track the history of the computations when creating new tensors. If False, the inputs and operation attributes will be set to None
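To make these attributes concrete, here is a minimal, hypothetical sketch of such a history-tracking tensor. The class below is illustrative only (scalar addition, no backward pass) and is not how PyTorch actually implements autograd:

class Tensor:
    # Hypothetical sketch of a history-tracking tensor, not PyTorch's real implementation
    def __init__(self, data, inputs=None, operation=None, requires_grad=False):
        self.data = data                        # the underlying values
        self.requires_grad = requires_grad
        # History is only kept when we want to track computations
        self.inputs = inputs if requires_grad else None        # tensors/scalars that produced this tensor
        self.operation = operation if requires_grad else None  # function applied to the inputs

    def __add__(self, scalar):
        # Scalar addition only, to keep the sketch short
        return Tensor([x + scalar for x in self.data],
                      inputs=[self, scalar], operation="add",
                      requires_grad=self.requires_grad)

a = Tensor([1., 2.], requires_grad=True)  # leaf tensor: inputs and operation are None
b = a + 1
b.operation  #=> 'add'
b.inputs     #=> [a, 1]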
Therefore, to detach a tensor from a computation graph, or when we don’t want to track temporary computations done on a tensor (such as during inference), we can do the following:
- Tensor.detach() returns a new tensor that shares the same underlying storage but is a leaf tensor. Therefore:
  - inputs will be set to None
  - operation will be set to None
  - requires_grad will be set to False
- with torch.no_grad() lets us perform computations on tensors without tracking those computations
- Operating directly on Tensor.data avoids recording the operations. Useful when updating a tensor or for initialization
import torch

a = torch.tensor([[1., 2.]], requires_grad=True)
a #=> tensor([[1., 2.]], requires_grad=True)
# The following will record the computation
b = a + 1
b.grad_fn #=> <AddBackward0 at 0x7f81abadc8b0>
# All the following forms allow us to avoid recording the computation on the tensor
b = a.data + 1
b.grad_fn #=> None
b.requires_grad #=> False
b = a.detach() + 1
b.grad_fn #=> None
b.requires_grad #=> False
with torch.no_grad():
b = a + 1
b.grad_fn #=> None
b.requires_grad #=> False
# Useful for updates/initialization because it keeps a's requires_grad attribute
a.data = torch.randint(10, size=(1, 2)).float()  # .float() keeps the dtype usable by autograd
a.grad_fn #=> None
a.requires_grad #=> True
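Two details above are worth checking directly: the tensor returned by Tensor.detach() shares storage with the original, and torch.no_grad() is the usual way to apply an in-place update without recording it. The self-contained snippet below sketches both; the gradient-descent step is just an illustrative use case:

import torch

x = torch.tensor([[1., 2.]], requires_grad=True)
y = x.detach()              # leaf tensor with no history, but same storage as x
y[0, 0] = 100.              # in-place change through the detached tensor...
x                           #=> tensor([[100., 2.]], requires_grad=True)  ...shows up in x

# Typical update pattern: a manual gradient-descent step under no_grad
w = torch.tensor([1., 2.], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
with torch.no_grad():
    w -= 0.1 * w.grad       # the in-place update is not recorded in the graph
w                           #=> tensor([0.8000, 1.6000], requires_grad=True)
w.requires_grad             #=> True (w still tracks future computations)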