PREPRINT

Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration

Apr, 2026

Author

Tang, Ruyi and Sergeant-Perthuis, Grégoire and Colliaux, David

Abstract

Living systems navigate environments using noisy and incomplete sensory signals. In unicellular algae, phototaxis is often modeled as a mechanistic run-tumble process driven by stimulus-response rules. However, such descriptions overlook how organisms actively sample their environment to reduce sensory ambiguity. From a minimal cognition perspective, we reframe this navigation as a subjective, information-driven sensorimotor process. To this end, we propose a framework linking a Partially Observable Markov Decision Process (POMDP) with biochemical reaction dynamics. Environmental variables are hidden, while the cell maintains a minimal internal state updated from the current observation through a memoryless Bayesian reweighting step. These internal dynamics balance orienting toward light with exploratory reorientation and can be implemented through Chemical-Reaction-Network Ordinary Differential Equations (CRN-ODEs), showing how biochemical processes can physically realize the required information-processing mechanisms. Our model includes a biophysical observation process for photoreception and a chemically computable polynomial bound on information gain. Using Inverse Reinforcement Learning (IRL) on 30 experimentally recorded Chlamydomonas trajectories, we infer the behavioral objective consistent with observed phototactic motion and benchmark the resulting dynamics against standard Stochastic Simulation Algorithm (SSA) baselines. Our model reproduces the empirical distribution of alignment with the light source and achieves alignment statistics comparable to objective SSA baselines on this dataset. Within this framework, run-tumble alternation emerges as an information-acquisition strategy: tumbling reorients the cell to sample new sensory configurations and resolve sensor ambiguity, demonstrating how intracellular biochemical networks can support adaptive information-seeking behavior in cellular navigation.
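As an illustration of how a memoryless Bayesian reweighting step could in principle be realized by reaction kinetics, the following minimal sketch relaxes a toy two-state belief to the Bayesian posterior using mass-action-style ODEs. It is not the network used in this work: the two hidden states, the fixed prior, the likelihood values, the reaction scheme (production of species i at rate prior_i * likelihood_i, degradation of every species at the common rate Z), and the function names `bayes_reweight` and `crn_relax` are all assumptions made for the example.

```python
def bayes_reweight(prior, likelihood):
    """Exact reweighting: posterior_i is proportional to prior_i * likelihood_i."""
    weights = [p * l for p, l in zip(prior, likelihood)]
    z = sum(weights)
    return [w / z for w in weights]


def crn_relax(prior, likelihood, dt=0.01, steps=5000):
    """Forward-Euler integration of a hypothetical CRN-ODE.

    Assumed kinetics (illustrative only):
        dx_i/dt = prior_i * likelihood_i - Z * x_i,
        Z = sum_j prior_j * likelihood_j.
    The unique steady state is x_i = prior_i * likelihood_i / Z,
    i.e. the normalized Bayesian posterior.
    """
    production = [p * l for p, l in zip(prior, likelihood)]
    z = sum(production)          # common degradation rate
    x = list(prior)              # start concentrations at the prior
    for _ in range(steps):
        x = [xi + dt * (k - z * xi) for xi, k in zip(x, production)]
    return x


# Toy example: two hidden light directions, one observation.
prior = [0.5, 0.5]               # assumed fixed prior (memoryless update)
likelihood = [0.9, 0.3]          # assumed P(observation | hidden state)

posterior = bayes_reweight(prior, likelihood)
relaxed = crn_relax(prior, likelihood)
print(posterior, relaxed)
```

Because the update depends only on the fixed prior and the current likelihood, no past observations need to be stored, which is the sense of "memoryless" assumed in this sketch.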