Safety-efficiency trade-off

This work was presented at COSYNE 2021.

An extended version of this work with human virtual reality experiments is now available on bioRxiv.

Across species, survival depends on harvesting essential rewards such as food in the face of potential, occasionally catastrophic, dangers (such as serious injuries). This poses the challenge of acquiring reward as efficiently as possible whilst avoiding the accrual of damage, especially during early exploration of new environments. This safety-efficiency dilemma is a special case of the exploration-exploitation dilemma, which is typically formulated by collapsing punishment and reward into a single scalar signal, so that early losses can be offset by later gains. However, an agent might want to consider losses separately in scenarios where damage can accrue and potentially lead to death. One approach is to keep the value learning systems for reward and punishment separate, and to integrate them only when making choices. Indeed, the brain appears to adopt this strategy by overlaying a Pavlovian fear system atop an instrumental reward system. In a series of grid-world simulations, we show here how this emulates a multi-attribute reinforcement learning system that promotes safe learning, especially in early exploration. But it also introduces the problem of how to arbitrate when Pavlovian avoidance actions do not align with those of the reward-oriented instrumental system. Indeed, we show that different learning environments may require a different balance of rewards and punishment, which implies that the best strategy is to have a flexible Pavlovian fear commissioning scheme. We propose a model for doing this in which Pavlovian 'fear' actions are gated by uncertainty, and show that this enhances safety in early exploration and hence improves safety-efficiency trade-offs. In conclusion, our work shows how safe exploration can be achieved by a flexible Pavlovian fear system without too much cost to efficiency. One implication of this is that inflexibility of the fear commissioning parameter could lead to maladaptive anxiety, depression and chronic pain.
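
The arbitration idea in the abstract can be illustrated with a minimal sketch. It is not the model from this work: the linear mixing rule, the clipped uncertainty gate, and all names (`pavlovian_weight`, `combined_values`, `greedy_action`, `max_weight`) are assumptions made for illustration only. The sketch keeps reward and punishment values separate, combines them only at choice time, and lets an uncertainty signal set how much weight the punishment (fear) values receive.

```python
import numpy as np


def pavlovian_weight(uncertainty, max_weight=1.0):
    """Hypothetical gate: map an uncertainty signal to a fear weight.

    Assumption: uncertainty is clipped to [0, 1] and scaled linearly.
    """
    return max_weight * float(np.clip(uncertainty, 0.0, 1.0))


def combined_values(q_reward, q_punish, omega):
    """Integrate separately learned values only at choice time.

    q_reward: per-action reward values (instrumental system).
    q_punish: per-action punishment values, negative for harmful actions.
    omega:    weight in [0, 1] given to the punishment values (assumed form).
    """
    q_reward = np.asarray(q_reward, dtype=float)
    q_punish = np.asarray(q_punish, dtype=float)
    return (1.0 - omega) * q_reward + omega * q_punish


def greedy_action(q_reward, q_punish, uncertainty):
    """Pick the action with the highest uncertainty-gated combined value."""
    omega = pavlovian_weight(uncertainty)
    return int(np.argmax(combined_values(q_reward, q_punish, omega)))


# Illustrative values: action 0 is rewarding but risky, action 1 is safe.
q_reward = [1.0, 0.5]
q_punish = [-2.0, 0.0]

print(greedy_action(q_reward, q_punish, uncertainty=0.0))  # 0: reward dominates
print(greedy_action(q_reward, q_punish, uncertainty=0.8))  # 1: fear dominates
```

With low uncertainty the fear weight vanishes and the agent takes the rewarding but risky action; with high uncertainty, as in early exploration, the punishment values dominate and the agent takes the safe action. Fixing `omega` at a high value regardless of uncertainty would correspond, loosely, to the inflexible setting the abstract associates with maladaptive avoidance.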

Comparison videos Fear-Avoidance environments