Why "Cannot backpropagate more than once" Happens in RNN/LSTM Training
When training an RNN/LSTM with a custom autograd engine, you may see this error:
Exception: Cannot backpropagate more than once
This usually appears when the hidden state is reused across batches without being detached from the previous computation graph.
The Symptom
The following pattern works:
```python
for batch_i in range(len(input_batches)):
    # Re-wrap (detach) hidden at the start of every batch
    hidden = (
        Tensor(hidden[0].data, autograd=True),
        Tensor(hidden[1].data, autograd=True),
    )
    for t in range(len(input_batches[batch_i])):
        # forward through time
        ...
    loss.backward()
```
But if you re-wrap hidden only once, before the outer for i in range(iterations): loop, training can fail with:
Cannot backpropagate more than once
Root Cause
hidden is not just numeric data; it also carries autograd history (the computation graph that produced it).
If you do this at the start of every batch:
```python
hidden = (
    Tensor(hidden[0].data, autograd=True),
    Tensor(hidden[1].data, autograd=True),
)
```
you effectively detach from the old graph:
- the tensor values are kept
- the graph history is dropped
So each loss.backward() only walks the current batch's graph.
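To see what "only walks the current batch's graph" means, this sketch models the graph with a hypothetical Node class (not part of any real engine) and counts how many nodes a backward pass from the final hidden state would visit. Re-wrapping every batch keeps that count constant; skipping the re-wrap makes the graph grow by one batch's worth of steps each time.

```python
class Node:
    """Hypothetical graph node: a value plus the nodes it was computed from."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = list(parents)


def graph_size(root):
    """Count the nodes a backward pass starting at `root` would visit."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(node.parents)
    return len(seen)


def backward_sizes(batches, steps, detach_each_batch):
    """Return the graph size each batch's backward pass would have to traverse."""
    hidden = Node(0.0)
    sizes = []
    for _ in range(batches):
        if detach_each_batch:
            hidden = Node(hidden.value)  # same value, no parents: a fresh leaf
        for _ in range(steps):
            hidden = Node(hidden.value + 1.0, parents=[hidden])  # one time step
        sizes.append(graph_size(hidden))
    return sizes


print(backward_sizes(3, 4, detach_each_batch=True))   # [5, 5, 5]
print(backward_sizes(3, 4, detach_each_batch=False))  # [5, 9, 13]
```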
If you re-wrap only once, before the outer iteration loop:
- Batch 1 finishes, and the new hidden is now connected to Batch 1's graph.
- Batch 2 reuses that hidden, chaining Batch 2's graph onto Batch 1's graph.
- Calling backward() again tries to traverse nodes and edges that were already backpropagated.
- Your autograd engine detects the repeated backward pass over the same path and raises the exception.
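The failure can be reproduced with a toy engine. This sketch assumes a hypothetical scalar Tensor class that, in the spirit of the engine described here, counts how many gradients each node still expects from each child and raises when an already-consumed edge is used again. It supports only multiplication, and the "RNN" is just hidden = hidden * w; the names Tensor, train and rewrap_every_batch are illustrative, not taken from any real library.

```python
class Tensor:
    """Hypothetical toy autograd tensor (scalar data, multiplication only)."""

    _next_id = 0

    def __init__(self, data, autograd=False, creators=None, op=None):
        self.data = float(data)
        self.autograd = autograd
        self.creators = creators or []
        self.op = op
        self.grad = None
        self.children = {}  # child id -> gradients still expected from that child
        self.id = Tensor._next_id
        Tensor._next_id += 1
        for parent in self.creators:
            parent.children[self.id] = parent.children.get(self.id, 0) + 1

    def __mul__(self, other):
        return Tensor(self.data * other.data, autograd=True,
                      creators=[self, other], op="mul")

    def backward(self, grad=None, grad_origin=None):
        if not self.autograd:
            return
        if grad is None:
            grad = 1.0  # d(loss)/d(loss)
        if grad_origin is not None:
            # A zero count means an earlier backward() already used this edge.
            if self.children[grad_origin.id] == 0:
                raise Exception("Cannot backpropagate more than once")
            self.children[grad_origin.id] -= 1
        self.grad = grad if self.grad is None else self.grad + grad
        # Pass the gradient on only after every child has reported in.
        waiting = any(count > 0 for count in self.children.values())
        if self.op == "mul" and (grad_origin is None or not waiting):
            a, b = self.creators
            a.backward(self.grad * b.data, self)
            b.backward(self.grad * a.data, self)


def train(batches, steps, rewrap_every_batch):
    """Unroll hidden = hidden * w for `steps` steps per batch; one backward() per batch."""
    w = Tensor(0.5, autograd=True)
    hidden = Tensor(1.0, autograd=True)  # wrapped once, before the loop
    for _ in range(batches):
        if rewrap_every_batch:
            # Keep the value, drop the graph history (a detach).
            hidden = Tensor(hidden.data, autograd=True)
        for _ in range(steps):
            hidden = hidden * w  # stays connected across time steps in the batch
        loss = hidden
        loss.backward()  # without the re-wrap, this walks into the previous batch
    return w.grad


if __name__ == "__main__":
    print(train(batches=3, steps=2, rewrap_every_batch=True))  # 1.3125
    try:
        train(batches=3, steps=2, rewrap_every_batch=False)
    except Exception as err:
        print("without re-wrap:", err)  # without re-wrap: Cannot backpropagate more than once
```

In the second run, Batch 2's backward pass reaches Batch 1's final hidden state and keeps going into Batch 1's nodes, whose per-child counts were already consumed, which triggers the exception.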
Correct Mental Model
The key is not whether the detach sits inside or outside the for i loop.
The key is:
Before each new sequence segment that will call backward(), detach hidden from the old graph.
This is exactly truncated backpropagation through time (truncated BPTT).
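Truncated BPTT can be sketched without any autograd engine. This sketch assumes a scalar RNN, h_t = tanh(w * h_{t-1} + x_t), and a toy loss (the sum of h_t**2 / 2 over each segment), with gradients written out by hand. Inside a segment, the backward loop follows the hidden state through every step; at a segment boundary only the number hs[-1] moves on, and the gradient that would flow into the segment's starting state is discarded. All function names are hypothetical.

```python
import math


def forward_segment(h_in, xs, w):
    """Run h_t = tanh(w * h_{t-1} + x_t) over one segment; return [h_in, h_1, ..., h_k]."""
    hs = [h_in]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + x))
    return hs


def segment_loss(hs):
    """Toy loss (an assumption for illustration): sum of h_t**2 / 2 over the segment."""
    return sum(0.5 * h * h for h in hs[1:])


def backward_segment(hs, w):
    """dLoss/dw for one segment, treating hs[0] as a constant (the detached hidden state)."""
    dw, carry = 0.0, 0.0
    for t in range(len(hs) - 1, 0, -1):
        dh = hs[t] + carry            # direct loss term + gradient arriving from step t+1
        da = dh * (1.0 - hs[t] ** 2)  # back through tanh
        dw += da * hs[t - 1]
        carry = da * w                # gradient flowing to h_{t-1}
    return dw                         # carry into hs[0] is dropped: that is the truncation


def truncated_bptt(xs, w, k, h0=0.0):
    """Split xs into segments of length k; carry the hidden value across, not the graph."""
    h, grads = h0, []
    for start in range(0, len(xs), k):
        hs = forward_segment(h, xs[start:start + k], w)  # connected within the segment
        grads.append(backward_segment(hs, w))            # one backward pass per segment
        h = hs[-1]                                       # detach: only the value moves on
    return grads, h
```

With k equal to the sequence length this reduces to ordinary BPTT. A smaller k bounds the work done by each backward pass, at the cost of ignoring dependencies longer than k steps; the forward values are identical either way.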
Practical Rule
- Across time steps (t) within the same segment: keep hidden connected.
- Across batches (or before the next backward()): detach hidden.
A helper in this style:
```python
h = Tensor(h.data, autograd=True)  # same values, no graph history
c = Tensor(c.data, autograd=True)
```
Alternatively, provide a dedicated detach() method for clarity.
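A dedicated method might look like this sketch, written against a hypothetical Tensor class that only tracks its creators. detach() builds a fresh leaf from the same data, just as the manual Tensor(h.data, autograd=True) pattern does, and detach_hidden (also hypothetical) applies it to an LSTM's (h, c) pair.

```python
class Tensor:
    """Hypothetical minimal tensor: stores data plus the tensors that created it."""

    def __init__(self, data, autograd=False, creators=None):
        self.data = data
        self.autograd = autograd
        self.creators = creators or []

    def __mul__(self, other):
        return Tensor(self.data * other.data, autograd=True, creators=[self, other])

    def detach(self):
        """Same values, no creators: equivalent to Tensor(self.data, autograd=True)."""
        return Tensor(self.data, autograd=True)


def detach_hidden(hidden):
    """Detach every tensor in an LSTM-style (h, c) tuple."""
    return tuple(t.detach() for t in hidden)


# Usage at the start of each batch:
# hidden = detach_hidden(hidden)
```

The name echoes PyTorch's Tensor.detach(), but PyTorch's version returns a tensor with requires_grad=False; this sketch keeps autograd=True to match the re-wrapping pattern, so the new leaf can still receive a gradient even though nothing flows past it.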
Takeaway
If you see "Cannot backpropagate more than once" in recurrent training, inspect the hidden state's lifecycle first. In most cases, detaching the hidden state at the right boundary (between backward passes) resolves the issue cleanly.