What Role Does Striatal Dopamine Play in Goal-directed Action?

Evidence suggests that dopamine activity provides a US-related prediction error for Pavlovian conditioning and the reinforcement signal supporting the acquisition of habits. However, its role in goal-directed action is less clear. There are currently few studies that have assessed dopamine release as animals acquire and perform self-paced instrumental actions. Here we briefly review the literature documenting the psychological, behavioral and neural bases of goal-directed actions in rats and mice, before turning to describe recent studies investigating the role of dopamine in instrumental learning and performance. Plasticity in dorsomedial striatum, a central node in the network supporting goal-directed action, clearly requires dopamine release, the timing of which, relative to cortical and thalamic inputs, determines the degree and form of that plasticity. Beyond this, bilateral release appears to reflect reward prediction errors as animals experience the consequences of an action. Such signals feed forward to update the value of the specific action associated with that outcome during subsequent performance, with dopamine release at the time of action reflecting the updated predicted action value. More recently, evidence has also emerged for a hemispherically lateralised signal associated with the action; dopamine release is greater in the hemisphere contralateral to the spatial target of the action. This effect emerges over the course of acquisition and appears to reflect the strength of the action-outcome association. Thus, during goal-directed action, dopamine release signals the action, the outcome and their association to shape the learning and performance processes necessary to support this form of behavioral control. © 2024 The Author(s). Published by Elsevier Inc. on behalf of IBRO. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Forms of action variously described as voluntary, intentional, volitional or goal-directed are often defined by the way they differ from those that are involuntary, reflexive or habitual. Whereas the latter are typically labelled as automatic or as 'elicited' by various internal and external stimuli, the former are thought to depend on some form of deliberative process relating the action to the identity and desirability of its causal consequences. Similarly, whereas the aim of a habit is to produce a relatively invariant motor movement, that of a goal-directed action is reflected in its flexibility; i.e., it adapts to change the environment in the service of our basic needs and desires using whatever behavioral means is best suited to that aim (Dickinson and Balleine, 1993, 1994).
These forms of action control are also distinguished at a neural level. Although some early views suggested goal-directed and habitual actions are mediated by cortical and subcortical processes, respectively (Fuster, 2002; Bar-Gad et al., 2003; Daw et al., 2005; Rushworth et al., 2011), it is now generally accepted that they are instantiated within parallel cortical-basal ganglia circuits that are broadly similar in their structure but that differ markedly in the brain regions that they connect (Alexander et al., 1990; Balleine, 2005). Thus, in rodents, whereas habits are mediated by a circuit linking the motor cortices with the dorsolateral 'motor' striatum (DLS), the circuit mediating goal-directed actions links the prefrontal cortex with the posterior dorsomedial or 'associative' striatum (pDMS) (Balleine and O'Doherty, 2010; Balleine, 2019). Interestingly, although various mechanisms have been proposed stipulating how stimulus-response learning is instantiated in the DLS to control habits (Reynolds et al., 2001; Shindou et al., 2019), no neural mechanism has been proposed that links the psychological content of goal-directed actions, their representation of contingency and value, to the neural determinants of goal-directed learning and performance. Nor have the learning rules (either computational or biological) governing the encoding of the specific action-outcome associations underlying goal-directed action been settled (Perez and Dickinson, 2020; Morris et al., 2022).
In this brief review, we outline the case for investigating dopamine release in the pDMS as one means of establishing the rules controlling the integration of contingency and value encoding. As described in what follows, release at the time a goal-directed action is performed has been found to reflect a combination of ongoing estimates of action-related reward predictions and outcome-generated feedback integrated with hemispherically-divergent signals reflecting the strength of the action-outcome contingency (Hart et al., 2024). In this way, dopamine release in the pDMS can signal fluctuations in action and outcome value as well as the long-term action-outcome relationship (contingency) to appropriately modulate the plasticity of the striatal projection neurons that mediate the performance of goal-directed actions (Peak et al., 2019, 2020). Here we describe recent evidence supporting these claims and consider the implications of these ideas for broader theories of goal-directed action control.

THE PSYCHOLOGICAL AND NEURAL BASES OF GOAL-DIRECTED ACTIONS
What are goal-directed actions? It has, of course, been understood for some time, largely from studies in rodents, that actions acquired in instrumental conditioning situations can generally satisfy the criteria for goal-directed action, at least early in training or when training involves exposure to multiple action-outcome relationships (Balleine and Dickinson, 1998a). Evidence came first from studies using outcome devaluation and contingency degradation tests to assess the rats' sensitivity to changes in outcome value and the causal consequences of various actions, respectively (Adams and Dickinson, 1981; Colwill and Rescorla, 1985; Dickinson and Mulatero, 1989; Balleine and Dickinson, 1998a). In outcome devaluation, the value of a specific outcome is reduced after training - generally using either taste aversion learning or sensory-specific satiety (Balleine and Dickinson, 1998b) - and prior to a test conducted in extinction, so as to assess the influence of the change in value on the performance of the action in the absence of any feedback. In this test rats have been found selectively to reduce the performance of actions associated with the devalued outcome relative to actions associated with other outcomes, demonstrating that: (i) rats can encode the specific relationship between an action and its consequences and (ii) performance is directly controlled by outcome value.
Contingency degradation assesses, as its name suggests, the sensitivity of actions to changes in their relationship to specific outcomes and serves, in non-human animals, as a proxy for the manipulation of action-outcome causality. After a period of training on one or many action-outcome relationships, this test usually involves maintaining the delivery of the contingent outcomes while also delivering either the same or a different outcome non-contingently. Thus, when the contingent and non-contingent outcomes have the same identity and are delivered at the same rate, the probability of earning that outcome is the same whether the action is performed or not. Generally, in this situation, rats selectively reduce the performance of actions earning the non-contingent outcome but only when it is the same as that earned contingently; if the non-contingent outcome differs from the earned outcome, then responding remains unchanged (Hammond, 1980; Colwill and Rescorla, 1986; Dickinson and Mulatero, 1989).
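The logic of this test can be captured by the standard contingency measure, ΔP: the difference between the probability of the outcome given the action and its probability in the absence of the action. A minimal sketch (the function name and the probability values are ours, chosen purely for illustration):

```python
def delta_p(p_outcome_given_action, p_outcome_given_no_action):
    """Contingency (Delta-P): how much performing the action changes
    the probability of the outcome. Positive values indicate a causal
    action-outcome relation; zero indicates a fully degraded one."""
    return p_outcome_given_action - p_outcome_given_no_action

# Trained contingency: the outcome occurs only if the action is performed.
trained = delta_p(0.5, 0.0)    # -> 0.5

# Degraded: the *same* outcome is also delivered non-contingently at a
# matched rate, so acting no longer raises its probability; rats reduce
# responding on this action.
degraded = delta_p(0.5, 0.5)   # -> 0.0

# If the freely delivered outcome is a *different* one, the contingency
# for the earned outcome is preserved and responding remains unchanged.
```

Note that a single ΔP value captures only the molar action-outcome relationship; the behavioral result described above shows that rats are additionally sensitive to outcome identity, which this measure alone does not encode.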
In both types of test, therefore, the instrumental actions of rats are exquisitely sensitive to specific manipulations of contingency and value, with actions earning the devalued or the non-contingent outcome reduced while other actions remain unaffected. Rats can clearly encode highly specific action-outcome associations and use them to decide on a course of action and, despite investigations of alternative interpretations of these phenomena over the course of several decades (Rescorla, 1992; Shin et al., 2010), support for the general claim that rats are capable of goal-directed action has endured. Indeed, subsequent investigations have proven fruitful in establishing other important features of goal-directed learning. For example, to summarize briefly, these studies have shown such learning is rapidly acquired, within a single session of training (Hart and Balleine, 2016; Bradfield et al., 2020); can resist the influence of extinction (Rescorla, 1996) or overtraining (Colwill and Rescorla, 1988) when multiple action-outcome associations are trained (Kosaki and Dickinson, 2010); involves encoding both the outcomes that actions produce and those that are foregone, the latter being a form of counterfactual encoding (Laurent and Balleine, 2015); and involves a form of error-correction learning: in both rats and humans, signaling the non-contingent outcome in a contingency degradation situation is sufficient to block the degradation effect (Dickinson and Charnock, 1985; Colwill and Rescorla, 1986; Morris et al., 2022).
The neural bases of goal-directed action. Studies have used the outcome devaluation and contingency degradation tests to establish evidence for goal-directed control in other species, including mice (Wiltgen et al., 2007), humans (Tanaka et al., 2008; Morris et al., 2015) and non-human primates (Izquierdo et al., 2004; Jackson et al., 2016), as well as some birds (Clayton and Dickinson, 1998). These tests have also been combined with various pharmacological, surgical, chemogenetic and optogenetic manipulations in rodents, and with neuroimaging in humans, in a bid to establish the neural circuits mediating this capacity. Early studies found evidence of cortical involvement, notably the prelimbic prefrontal cortex (area 32) during initial acquisition of action-outcome learning and the insular cortex during performance following changes in value (Balleine and Dickinson, 1998a). Subsequent investigation of prefrontal target structures found that various manipulations of the posterior dorsomedial striatum, but not anterior dorsomedial (Yin et al., 2005) or lateral (Yin et al., 2004) regions, abolished both the acquisition and retrieval of specific action-outcome associations.
Recent research has considerably deepened our understanding of this corticostriatal circuit. We have learned that the prelimbic projection neurons critical for pDMS encoding are intratelencephalic (IT) neurons (Hart et al., 2018; Fisher et al., 2020; Balleine et al., 2021). Unlike other cortical projection neurons, IT neurons project bilaterally to the striatum, where they target the principal striatonigral, direct pathway spiny projection neurons (dSPNs) and the striatopallidal, indirect pathway neurons (iSPNs) in a more-or-less unbiased fashion (Kress et al., 2013; Fisher et al., 2020). The functional role of IT neurons depends on a limbic-cortical-striatal circuit (Fisher et al., 2020) that drives cortical glutamate release in the striatum and plasticity predominantly at dSPNs during initial learning (Matamales et al., 2020; Peak et al., 2020). Importantly, using ex-vivo patch-clamp electrophysiology, we have found that, regardless of the specific action involved, goal-directed learning-related plasticity at dSPNs emerges bilaterally within the pDMS (Fisher et al., 2020). Thus, unlike movement-related activity in motor cortex (Sanes and Donoghue, 2000) and motor regions of the striatum (Xiong et al., 2015), the initial changes in plasticity at cortico-striatal synapses in the pDMS associated with goal-directed learning are not hemispherically lateralized, in line with the role of this structure in learning to control environmental events rather than motor movements (Tai et al., 2012; Cox and Witten, 2019; Peak et al., 2019).
Modulatory neurotransmitters in the striatum. It is, however, important to recognize that glutamate release from presynaptic cortical inputs is not sufficient for dSPN plasticity. Elegant electrophysiological and pharmacological studies have demonstrated that a complex temporal arrangement of pre-synaptic and post-synaptic activity is required, together with the contiguous activity of specific neuromodulators involving, most prominently, the local release of acetylcholine and dopamine (Pawlak and Kerr, 2008; Shen et al., 2008; Reynolds et al., 2022). This ordered binding at glutamatergic and dopamine receptors on dSPNs initiates a number of intracellular signaling cascades resulting in transcriptional changes supporting subsequent alterations in synaptic connectivity and neural activity within this cortical-basal ganglia circuit (Matamales and Girault, 2011; Shiflett and Balleine, 2011). Within this circuit, dSPNs project directly to the substantia nigra pars reticulata (SNr) and express D1 dopamine receptors, which are Gs-coupled and so, when they bind dopamine, increase dSPN excitability (Gerfen and Surmeier, 2011). When dopamine release is increased, therefore, plasticity at, and the activity of, dSPNs and the striatonigral pathway will increase, encouraging learning about, and performance of, specific actions depending on the value of their consequences. In contrast, to date, changes in plasticity at iSPNs during the acquisition of goal-directed actions have not been found (Shan et al., 2014; Fisher et al., 2020). However, their activity increases when conditions change; i.e., when an ongoing action-outcome contingency is altered by either a change in the identity (in outcome-identity reversal studies) or the rate of occurrence (say, in extinction) of the specific consequences with which an action is associated (Matamales et al., 2020; Peak et al., 2020; Balleine et al., 2021). These findings suggest, therefore, that iSPNs are selectively engaged during the modulation or updating of goal-directed learning.
iSPNs express dopamine D2 receptors, which are Gi-coupled, meaning that dopamine binding inhibits iSPN activity, and so iSPNs are most active when dopamine release is reduced (Gerfen and Surmeier, 2011). However, although the involvement of iSPNs in updating action-outcome learning suggests that this function depends on the indirect striatopallidal pathway, recent evidence suggests that iSPNs can modulate dSPN plasticity locally within the pDMS (Matamales et al., 2020). This modulatory process is likely quite complex, potentially involving the interaction of iSPNs with local interneurons, particularly the cholinergic interneurons within the pDMS. Under the influence of the thalamostriatal pathway from the parafascicular nucleus to the pDMS, the tonic firing of these interneurons is modified to a burst-pause pattern when action-outcome contingencies change, resulting in a potent increase and then reduction in local acetylcholine release (Becchi et al., 2023). Importantly, acetylcholine is a powerful local modulator of dopamine release from midbrain axons in the striatum (Liu et al., 2022), and so this cholinergic fluctuation can also indirectly regulate the activity of iSPNs (Threlfell et al., 2012; Tanimura et al., 2019). Therefore, dopamine-rich conditions increase the excitability of dSPNs by glutamate and bias the system towards LTP, driving new learning. Conversely, dopamine-lean conditions favor activation of iSPNs and, potentially, local inhibition within the striatum. If, as a consequence, discrete regions of excitation and inhibition emerge within the pDMS, then updating will be facilitated in regions of high dopamine and resisted in those with low dopamine. And, indeed, evidence suggests that both dopamine depletion and iSPN-specific lesions block the updating of goal-directed learning (Lex and Hauber, 2010; Matamales et al., 2020).
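The push-pull arrangement described above is often summarized as a 'three-factor' plasticity rule, in which change at a corticostriatal synapse requires coincident presynaptic glutamate and postsynaptic activity, with dopamine relative to some baseline setting the sign of the change at D1-expressing dSPNs. A toy sketch, in which the learning rate, baseline and activity values are illustrative assumptions rather than measured quantities:

```python
def corticostriatal_update(pre, post, dopamine, baseline=0.5, lr=0.1):
    """Toy three-factor rule: plasticity requires coincident pre- and
    postsynaptic activity (pre * post); dopamine relative to baseline
    sets the sign (potentiation when dopamine-rich, depression when lean)."""
    return lr * pre * post * (dopamine - baseline)

# Dopamine-rich conditions: coincident activity -> potentiation (LTP).
dw_rich = corticostriatal_update(pre=1.0, post=1.0, dopamine=0.9)  # ~ +0.04

# Dopamine-lean conditions: the same activity -> depression at dSPNs.
dw_lean = corticostriatal_update(pre=1.0, post=1.0, dopamine=0.1)  # ~ -0.04

# No presynaptic glutamate: no plasticity, whatever the dopamine level.
dw_none = corticostriatal_update(pre=0.0, post=1.0, dopamine=0.9)  # -> 0.0
```

The multiplicative form makes the point in the text explicit: glutamate release alone is not sufficient, and the same pre/post coincidence produces opposite changes depending on local dopamine.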
Despite these avenues of research, the dynamics of dopamine release implied in this account have not been directly assessed as animals encode and update the action-outcome associations for goal-directed actions.Therefore, the suggestion that fluctuations in dopamine are sufficiently rapid and precise to modulate the cellular activity supporting goal-directed learning and performance remains speculative.
Nevertheless, recently, evidence has started to accumulate that dopamine may play just such a role.Below, we outline those few studies currently relevant to this suggestion, providing our case for a new unifying perspective: that pDMS dopamine release is shaped to reflect the contingency and predicted value of specific actions in a manner that could modulate the highly selective plasticity processes necessary to support goal-directed action.

MOVEMENT, MOTIVATION AND LEARNING: THE DIVERSE ROLES OF STRIATAL DOPAMINE
Our understanding of the role of striatal dopamine has developed significantly in recent years. The earliest interpretations related to its role in movement, largely driven by the profound impairments in motor activity that resulted from lesions of midbrain dopamine neurons (Ungerstedt, 1971), together with the elicitation of contralateral turning behaviour by unilateral stimulation of dopamine release (Arbuthnott and Crow, 1971) and the obvious motor symptoms associated with Parkinson's disease. However, even during this early period, it was recognized that the precise nature of these dopamine-related motor impairments is complex and, paradoxically, that motor function could still be achieved under conditions of high motivational drive (such as running in an emergency), despite dopamine depletion (Glickstein and Stein, 1991). Such observations seeded theories that paired dopamine with motivational arousal, bolstered by findings that midbrain dopamine neurons in primates showed increased phasic firing in response to stimuli that invigorate behavior (Schultz, 1986; Schultz and Romo, 1990).
Subsequent research employing electrophysiological recordings of dopamine neurons in behaving primates was foundational in driving a radical shift in our understanding of dopamine function, revealing that phasic firing of dopamine neurons during Pavlovian conditioning accords with formal theories of learning in signaling what have come to be called, somewhat erroneously, "reward prediction errors" (RPEs) (Schultz et al., 1997; Waelti et al., 2001). Pavlovian conditioning involves pairing an initially neutral stimulus with an unconditioned stimulus (US), not with a 'reward' (one does not reward stimuli), and so, when there is a prediction error, it is with regard to the magnitude of the US that is anticipated based on the conditioned stimulus. In these studies, dopamine neurons were found to show increased phasic firing in response to an unpredicted US, a response that declined as the US became better predicted by a conditioned stimulus (CS) and stopped altogether when the now-predicted US was unexpectedly omitted. In addition, as the US prediction improved, phasic activity appeared to shift to the earliest predictive CS as the Pavlovian (stimulus-outcome) relationship developed. This RPE signal was determined by the degree to which the Pavlovian cue predicted reward, as demonstrated using classic blocking experiments in which learning about stimulus-outcome pairings is prevented if the outcome is already fully predicted by other co-occurring stimuli (Schultz and Dickinson, 2000; Waelti et al., 2001). Evidence for dopamine-mediated RPEs in Pavlovian tasks has proven robust and has been argued to be necessary for Pavlovian cue-reward learning (Steinberg et al., 2013), although alternative accounts are also supported (Jeong et al., 2022).
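This firing profile is what the prediction-error term of formal learning theory predicts. A minimal trial-level simulation (learning rate and US magnitude are illustrative) reproduces the canonical pattern: a large error to the unpredicted US that declines as the CS comes to predict it, and a negative error when a well-predicted US is omitted:

```python
def us_prediction_errors(n_trials, alpha=0.2, us_magnitude=1.0,
                         omit_last=False):
    """Track the US prediction carried by a CS across CS-US pairings,
    returning the prediction error generated at US time on each trial."""
    v = 0.0                      # current CS-based prediction of the US
    deltas = []
    for t in range(n_trials):
        us = 0.0 if (omit_last and t == n_trials - 1) else us_magnitude
        delta = us - v           # prediction error at the time of the US
        v += alpha * delta       # Rescorla-Wagner / TD(0)-style update
        deltas.append(delta)
    return deltas

errors = us_prediction_errors(20)
# errors[0] is large (unpredicted US); errors[-1] is near zero, as the
# CS now predicts the US. Omitting the US once it is well predicted
# yields a negative error, the "dip" in dopamine firing:
omission_error = us_prediction_errors(20, omit_last=True)[-1]
```

The shift of phasic activity to the earliest predictive CS requires the full temporal-difference machinery, with values chained backwards across time within a trial, which this trial-level sketch deliberately omits.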
Ironically, although a lot has been learned about the psychological, behavioral and neural bases of prediction errors in appetitive Pavlovian conditioning, reward prediction errors as they truly apply in goal-directed action are less well understood. This is driven, in part, by a discrepancy in the literature regarding the role of dopamine when measured in vivo, compared with the behavioral effects of pharmacological dopamine modulation. Whereas neuronal recordings have widely replicated the RPE signaling profiles (Cohen et al., 2012; Hamid et al., 2016), the most obvious effect of administering dopamine antagonists or of selective lesions of midbrain dopamine neurons is on movement and response vigor, with these motoric symptoms often overshadowing any potential evidence for deficits in learning independently of these effects (Beninger, 1983; Robbins et al., 1990; Ungerstedt, 1971; although see Lex and Hauber, 2010). The evidence appears, therefore, to point in multiple directions; not just to a role in learning but also in movement and/or motivation.
These seemingly distinct operations are central to contemporary theories of dopamine function, the most influential of which fall into the category of "unifying" frameworks, according to which dopamine contributes to multiple distinct functions, either differentiated by timescale (i.e., tonic versus phasic firing: Niv et al., 2007; Schultz, 2007), by the mechanism of release (cell firing versus local modulation: Liu et al., 2022; Liu and Kaeser, 2019; Lohani et al., 2019), by cellular effects (modulating synaptic plasticity and excitability: Berke, 2018), or by anatomical location (e.g., in ventral tegmentum or substantia nigra, and the target of those neurons in prefrontal cortex, ventral striatum or dorsal striatum: Cox and Witten, 2019; Howe and Dombeck, 2016; Parker et al., 2016; Seamans and Yang, 2004). Each of these frameworks reconciles an important feature of dopamine signaling. However, all rely to some degree on the wealth of information gleaned from studies of dopamine activity in Pavlovian conditioning preparations, and so the degree to which they can be applied to account for instrumental, and specifically goal-directed, learning remains far from clear.
This distinction is important and not merely a semantic one: Pavlovian and instrumental learning are independent processes that subserve distinct behavioral and psychological functions and are supported by different neural networks. Pavlovian conditioning relies predominantly on ventral tegmental dopamine neurons and their connections with the amygdala and ventral striatum (Balleine and Killcross, 2006; Martin-Soelch et al., 2007; Dayan and Berridge, 2014), whereas instrumental learning, including the encoding of action-outcome associations, relies on dopamine neurons in the substantia nigra pars compacta with circuits focused on the dorsal striatum, as described above (Hart et al., 2014; Balleine, 2019). As such, recent findings that motor-related dopamine signaling is specific to the dorsal and not the ventral striatum (Howe and Dombeck, 2016) provide support for the claim that dopamine subserves different functions in different striatal regions. Nevertheless, dopamine RPE signals are also present in the dorsal striatum (Lee et al., 2019), suggesting that, even within the specific context of dorsal striatal signaling, some kind of unifying model is required.
Goal-directed learning, in which animals learn associations between actions and outcomes, provides a unique framework in which the seemingly distinct functions of dopamine signaling are in fact naturally unified: what the animal does (the action, i.e., a series or sequence of movements) forms one aspect of the content of what the animal learns at an associative level, the other being the outcome with which the action is associated. Below, we describe recent evidence suggesting that action-, outcome- and action-outcome-related signals can be detected in dopamine-related activity in the dorsomedial striatum, along with recent work suggesting that these signals coexist within the dorsomedial striatum, where they simultaneously convey information regarding the value of the action, the outcome and the strength of the action-outcome relationship during goal-directed learning. We propose, therefore, that, in the context of goal-directed learning, action-related dopamine signals in the dorsomedial striatum are positioned both to shape the encoding of specific action-outcome associations and to guide the performance of specific actions based on that encoding.

DORSAL STRIATAL DOPAMINE SIGNALS FOR MOVEMENT AND LEARNING
Until recently, direct evidence that dopamine plays a role in controlling goal-directed action was very limited. Nevertheless, some substantiation of this claim has been provided by studies focussing specifically on the dorsal striatum - predominantly a region around a central point between DMS and DLS - and its primary dopaminergic input, the substantia nigra pars compacta (SNc). For example, Panigrahi et al. (2015) demonstrated, using the MitoPark mouse model of Parkinson's disease, that the progressive loss of midbrain dopamine neurons caused performance in an effort-based operant task, requiring mice to adjust the vigor of their reaching movements to obtain reward, to become progressively more bradykinetic, with altered movement kinematics related to a change in the velocity of forelimb movements. These changes resulted in a loss of movement vigor, something that was ameliorated by oral administration of L-DOPA. This finding was supported by a subsequent report that rapid, phasic dopamine signalling in the dorsal striatum was related to locomotor acceleration (Howe and Dombeck, 2016), providing an important demonstration of fast, sub-second modulation of movement by phasic dopamine, an observation that is seemingly inconsistent with the "tonic" dopamine hypothesis of movement control (Niv et al., 2007; Schultz, 2007). This study further found a "functional topography" in dopamine signalling, with dopamine axons in the dorsal striatum preferentially signalling locomotion, and those in progressively ventral regions preferentially signalling "reward". These conclusions were drawn from the effects of functional optogenetic stimulation of dopamine axons, which triggered bouts of locomotion if implemented in the dorsal striatum but not in the ventral striatum (Howe and Dombeck, 2016). Likewise, self-paced spontaneous movement has subsequently been shown to be directly modulated by a subset of SNc dopamine neurons, the activity of which was found to be causal to both action initiation and the vigor of future actions (Mendonça et al., 2024; da Silva et al., 2018).

However, when assessed in the context of instrumental actions, it is clear that dopamine activity localised to the DMS also conveys learning-related signals and, most notably, signals that could be taken to reflect the RPE. For example, reward-related dopamine release in the DMS has been found to be suppressed during predicted, relative to unpredicted, rewards following instrumental actions (Hollon et al., 2021). In this study, mice were trained to make a lever press action reinforced by optogenetic stimulation of dopamine neurons in the substantia nigra. Interestingly, DMS dopamine release induced by the stimulation, measured by fast-scan cyclic voltammetry, was reduced if it was induced by the action compared to unpaired stimulation and was timed to the expected point of stimulation: delayed stimulation produced an increase in dopamine activity whereas omitting stimulation caused a dip in dopamine activity. This reduction in dopamine release was also found when a sequence of actions was performed for SNc stimulation, such that the sequence and not the individual actions in the sequence diminished dopamine activity, and was also observed with natural rewards; i.e., dopamine release was reduced after contingent, relative to non-contingent, sucrose reward. Convergently, a recent investigation (Mohebi et al., 2024) found that dopamine transients to reward delivery varied in magnitude according to recent reward history, consistent with RPE signals. These signals were found across the striatum, although they differed markedly across sub-regions, perhaps reflecting differences in function or in sensitivity to the temporal dynamics of the task. Although the actual contingency controlling learning in these tasks was not assessed, this pattern of results is consistent with the authors' claim that dopamine serves as an error signal for the acquisition of such actions.
Indeed, the picture that has started to emerge is one in which dopamine conveys widespread error signals throughout the striatum, with movement signals localised to the dorsal territories. For example, Tsutsui-Kimura et al. (2020) used fiber photometry with a fluorescence-based calcium indicator (GCaMP7f) to record dopamine terminal activity in the DMS, dorsolateral striatum and ventral striatum in a perceptual decision-making task and found widespread evidence for temporal difference (TD) error signals throughout all striatal regions. However, only DMS dopamine signals were modulated by contralateral orienting responses. Likewise, Parker et al. (2016) assessed both Pavlovian and instrumental signals using a probabilistic reversal protocol in which instrumental actions (left and right lever presses) were probabilistically paired with both a reward and a reward cue (CS+) or non-reward and an alternative cue (CS−). Using fiber photometry with calcium imaging to measure activity in dopamine terminals in the ventral and dorsal striatum (DMS), they found that RPE signals to each CS and reward consumption signals were broadcast to both dorsal and ventral striatum. Importantly, although these signals were significantly greater in the ventral striatum, only dopamine projections in the DMS showed choice selectivity, with activity in these neurons significantly greater when mice made a contralateral choice (relative to the side of recording). In fact, lateralized dopamine signals in the dorsal striatum are not exclusive to contralateral choice and have also been reported in response to contralaterally presented visual stimuli, whereas ventral striatal dopamine neurons respond to both ipsilateral and contralateral stimulus presentations (Moss et al., 2021). Importantly in this context, although the majority of dopaminergic projections are ipsilateral, evidence showing functional contralateral dopamine projections provides support for bilaterally synchronised dopamine transients, especially in the dorsal striatum, which appears to have significant contralateral release (Fox et al., 2016). It is likely that these contralateral projections are important for conveying learning-related error signals, which, unlike choice-related signals, are bilaterally synchronous (Lee et al., 2019; Hart et al., 2024). Together, these findings suggest that learning-related signals are transmitted widely throughout the striatum, whereas movement- and choice-related signals generally appear to be exclusive to the dorsal striatum.

INTEGRATING ACTION-RELATED AND LEARNING-RELATED SIGNALS IN DMS
In the context of behavior mediated by the dorsal striatum, the question thus becomes: how are learning- and action-related dopamine signals parsed and integrated to control goal-directed learning and performance? There is evidence for a dissociation between these signals at the level of SNc dopamine neurons themselves, with distinct populations of dopamine neurons responding to reward and movement (Howe and Dombeck, 2016; Mendonça et al., 2024; da Silva et al., 2018). It is unclear, however, how these signals are disambiguated at the time of release within the dorsal striatum. One possibility is that dopamine signals are differentiated both regionally and temporally. For example, Hamid et al. (2021) observed that dopamine terminal activity and release in the dorsal striatum propagated from either medial to lateral or lateral to medial in "waves". When rewards were earned as a consequence of behaviour, the waves propagated medially to laterally; however, when rewards were delivered independently of behaviour and paired with a predictive cue, they propagated laterally to medially. Given that these distinct patterns should alter the temporal dynamics of release across the dorsal striatum, it is easy to see that they should also favor distinct functions when those functions are better or more powerfully reinforced or rewarded at specific temporal intervals. Thus, the authors suggested that wave-like dopamine release patterns could provide a learning signal (or eligibility trace) that differentiates each form of association (instrumental and Pavlovian), particularly as striatal subregions can exhibit graded functional specialization (Kasanetz et al., 2008; Klaus et al., 2017). This is in line with the recent findings, described above, that these regional transients operate on, and are modulated across, different temporal horizons (Mohebi et al., 2024). Nevertheless, such a mechanism doesn't obviously lend itself to the dissociation of movement and associative learning signals within instrumental tasks.
This question was the focus of Lee et al. (2019), who sought to reconcile movement/action-related signals and those putatively related to RPE signals in the DMS. To examine this question, they attempted to establish whether lateralized, movement-related dopamine signals conveyed an action-specific RPE. Their hypothesis was that dopamine neurons reflect the relative value of the contralateral movement choice, driving heightened activity in the contralateral hemisphere on trials with higher anticipatory RPE (i.e., contralateral choice value) relative to trials with lower RPE, with the inverse occurring in the ipsilateral hemisphere, thereby generating a hemispherically lateralized dopamine signal modulated by contralateral RPEs. The authors conducted a new analysis of the data collected by Parker et al. (2016; described above), in which both the activity of DMS-projecting dopamine neuron cell bodies and terminals were assessed across a probabilistic instrumental reversal learning task. Instead of finding a unified contralateral RPE signal, the authors found dissociable movement- and value-related signals, with lateralized dopaminergic activity associated with choice (contralateral greater than ipsilateral) modulated by a bilateral RPE signal reflecting the chosen action. Put another way, dopamine activity in the hemisphere ipsilateral to the choice movement was lower than that in the contralateral hemisphere; however, both signals scaled with the magnitude of the RPE (i.e., dopamine activity was increased in both hemispheres during actions with high RPEs). Although this analysis ultimately failed to unify movement and RPE signals, it was the first to clearly parse these two distinct signals in the DMS during an instrumental action.
More recently, we have found further evidence to support the claim that distinct dopamine signals are associated with action and reward in the DMS (Hart et al., 2024). In this series of experiments, we assessed dopamine release in rats trained to make two instrumental responses (left and right lever presses) for distinct rewards (pellets and sucrose retrieved from a central magazine), under increasing interval or ratio schedules of reinforcement. Importantly, these experiments sought to assess dopamine signals in a purely self-paced, free-operant design, in which no overt Pavlovian cues or other signals were presented to influence action selection, in an attempt to isolate dopaminergic signals in the DMS specifically related to goal-directed instrumental actions. Dopamine release was measured using fiber photometry with the fluorescent indicator dLight (Patriarchi et al., 2018). Consistent with Lee et al. (2019), we found clear evidence of bilateral dopamine RPE signals during both actions and outcomes. Dopamine release was heightened bilaterally during actions with high value (high reward prediction), whereas dopamine was diminished bilaterally during actions with low reward prediction (see Fig. 1). In addition to these bilateral signals, we saw very clear evidence of lateralized action-related signals: dopamine release was greater in the hemisphere contralateral to the (left or right) lever press action (see Fig. 2(A)), and this bias reversed when the animals turned in the opposite direction to head back towards the central magazine (see Fig. 1). This lateralised signal appeared to reflect the broader response (the lever press) rather than the specific motor movement or motor effector (left or right paw) used for the press. However, this was not systematically studied and is an issue in need of further research.
When we assessed these signals over time, we found that the lateralized dopamine signal during the action (lever press) emerged across training as animals learned the action-outcome association. We assessed the relationship between the action-outcome contingency and dopamine lateralization using contingency degradation and outcome identity reversal manipulations.
In one experiment, dopamine lateralization during the action was lost when the specific action-outcome relationship was degraded for one action but preserved for the non-degraded action (see Fig. 2(B)). Critically, detailed assessment of the movement kinematics indicated that this loss of lateralization was unrelated to changes in motor movement; indeed, the reduction in dopamine lateralisation emerged well before any difference in lever press performance was detected. In a separate experiment, we showed that this lateralization was reduced when the specific action-outcome association was disrupted by reversing the identity of the outcomes earned by each action (see Fig. 2(C)), and that it recovered with further training. Both of these experiments resulted in the loss of lateralised dopamine release in a manner that could not be explained by changes in action topography or movement kinematics, with the second study showing that these effects were observable in the absence of any change in response rate (or vigor). Importantly, we also found that outcome devaluation by specific satiety, which preserves the action-outcome contingency while modifying outcome value, did not affect this lateralised signal.
On the basis of these findings, therefore, we propose a new unifying framework for the interpretation of DMS dopamine signals during goal-directed actions (see Fig. 3). Whereas moment-to-moment fluctuations in action value are reflected in bilateral dopamine release, a second signal broadcasts the overall strength of the specific action-outcome relationship via the degree of lateralized dopamine release; i.e., via the difference between contralateral and ipsilateral release during actions.
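To make the proposed lateralization measure concrete, the following Python sketch computes a simple contralateral-minus-ipsilateral index from peri-event photometry traces. This is a hypothetical illustration, not the analysis pipeline of Hart et al. (2024); the trace values, epoching and the specific definition of the index are our assumptions.

```python
# Hypothetical sketch of a lateralization index for dopamine photometry:
# the difference between mean contralateral and mean ipsilateral dF/F
# during the action epoch. Positive values indicate a contralateral bias.

import statistics

def lateralization_index(contra, ipsi):
    """Contralateral-minus-ipsilateral mean dF/F during the action epoch."""
    return statistics.fmean(contra) - statistics.fmean(ipsi)

# Illustrative (made-up) dF/F samples around a left lever press, recorded
# in the right (contralateral) and left (ipsilateral) hemispheres:
contra_trace = [0.8, 1.1, 1.3, 1.0]
ipsi_trace = [0.2, 0.1, 0.3, 0.2]

li = lateralization_index(contra_trace, ipsi_trace)
print(round(li, 3))  # 0.85: release biased toward the contralateral hemisphere
```

On the framework proposed here, this index would shrink toward zero when the action-outcome contingency is degraded or the outcome identities are reversed, even if press rates are unchanged.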

DMS DOPAMINE RELEASE DURING GOAL-DIRECTED ACTION: TOWARDS A NEW UNIFYING FRAMEWORK
Goal-directed learning and the control of goal-directed performance require the integration of knowledge about the long-range action-outcome relationship (or outcome contingency) with the current action value, based on the value of the consequences or outcome that the action earns. Whereas the former represents a relatively stable relationship that develops and changes with experience as a specific action-outcome association is formed, the latter fluctuates constantly according to the immediate reward history of the action and the experienced value of the outcome. Action values are commonly thought to be updated by the RPE signal, which propagates according to the animal's experience with reward (or non-reward) to update the subsequent value of a specific action for performance. As described above, there is growing evidence that, as in Pavlovian learning, this RPE signal during instrumental learning is encoded by phasic dopamine signals in the DMS, which are broadcast bilaterally and broadly throughout the striatum. Importantly, however, it appears that, within the dorsal striatum, there is a second, lateralized, dopamine signal that differentiates activity in the hemispheres ipsilateral and contralateral to a specific action. It is this action-related signal in the pDMS that we propose is harnessed during goal-directed learning to signal the long-range action-outcome relationship. Our evidence for this comes from the finding that this signal during goal-directed actions is specific not only to the action but also to the relationship between the action and its outcome: when the action-outcome relationship is altered, either by contingency degradation or by outcome identity reversal, this clear hemispheric difference in dopamine release is diminished or abolished (Hart et al., 2024).
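The action value updating scheme described here (and illustrated in Fig. 1) can be sketched in a few lines of Python. This is a minimal hypothetical delta-rule model with an arbitrary learning rate and reward sequence; it is not a model fitted in any of the studies cited.

```python
# Minimal sketch of prediction-error updating of a single action value.
# The learning rate (alpha) and the reward sequence are illustrative
# assumptions, not parameters from the experiments discussed in the text.

def update_action_value(v, reward, alpha=0.5):
    """Return the prediction error and the updated action value."""
    delta = reward - v            # RPE: obtained vs. predicted outcome
    return delta, v + alpha * delta

v = 0.6                           # current predicted value of the action

# (A) Reward delivered but only partly predicted -> positive RPE (+delta_t)
d0, v = update_action_value(v, reward=1.0)

# (B) The now-larger prediction is violated: no reward -> negative RPE (-delta_t+1)
d1, v = update_action_value(v, reward=0.0)

# (C) The negative RPE has reduced the action value again
print(d0, d1, v)  # 0.4 -0.8 0.4
```

As in Fig. 1, the same quantity plays two roles in this sketch: the RPE (delta) is the putative bilateral dopamine transient at outcome delivery, and the running value (v) is the predicted action value reflected in release at the time of the action.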
Importantly, this finding doesn't negate the presence of movement-related signals in the striatum. Instead, it suggests that, in the context of goal-directed action, movement-related signals are collectively utilised to form a distinct representation of an action as more than just the animal's movement. We and others have previously suggested that the dorsal striatum is preferentially geared towards the execution of learned responses rather than motor movements per se (Balleine, 2019; Cox and Witten, 2019; Peak et al., 2019). For example, whereas, in an untrained animal, unilateral stimulation of dSPNs or iSPNs drives contraversive and ipsiversive rotational movements, respectively (Kravitz et al., 2010), such manipulations in a trained animal modulate the performance of learned choices, and in such a way that stimulation interacts with recent reward history to bias choice (Tai et al., 2012). This integrative process underlying action representation applies not just to simple actions but to action sequences. As predicted by hierarchical views, recent studies have observed that, as a sequence is formed, dopamine release in the DMS comes to be related to the sequence rather than its component actions (van Elzelingen et al., 2022). In many ways, therefore, this ability of striatal SPNs to integrate rapid changes in action value, both for performance and to drive a lateralized striatal learning signal reflecting the strength of the action-outcome association, is uniquely geared to goal-directed learning. Thus, given the functions that plasticity in the pDMS serves, and the need for a conjunction of a variety of cortical, thalamic and dopaminergic inputs to drive that plasticity, it should not be surprising to find that these signals are in some manner entrained to the action-outcome contingency (Adrover et al., 2020; Johnson et al., 2020; Hamid et al., 2021; Hart et al., 2024). It is true that, in striking similarity, striatal dopamine release appears to signal contralateral movement in untrained animals, and indeed in trained animals during Pavlovian approach movements, such as advancing toward the reward magazine (Hart et al., 2024); conditioned preparatory responses that are considered to be elicited
somewhat reflexively by USs and the stimuli that predict them (Burton and Balleine, 2022). However, there are clues to indicate how this movement signal might be harnessed by learning processes: across the course of goal-directed learning, early movement-related lateralization is greatest, unsurprisingly, during periods of greatest movement. In an instrumental lever pressing task where animals move between a lever and a magazine, lateralization during early training peaks during the approach to, and retreat away from, the lever, not during the lever press itself (Hart et al., 2024). As training progresses, however, the lateralization increases at the time of the press and decreases during the surrounding epochs. This finding alone cannot rule out obvious explanations along the lines of refinement of the motor aspects of the instrumental response as training progresses. However, taken in combination with the findings described above, that changing the action-outcome relationship results in a collapse of this signal, it is clear that pDMS dopamine lateralization during goal-directed learning signals more than just movement. We propose that dopamine release in this region of the striatum serves as the catalyst for the integration of learning and performance signals in striatal SPNs for goal-directed action. Confirmation of this relationship would, however, require two kinds of experiment: (i) studies directly manipulating the degree of lateralised release in the pDMS to assess its causal influence on action-outcome learning; and (ii) simultaneous measurement and manipulation of both dopamine release and the activity of SPNs during goal-directed learning. These studies remain to be conducted.
Of course, there are many other questions that remain to be resolved. For example, is the development of a lateralized dopamine signal during the acquisition of goal-directed actions specific to goal-directed learning, or does it also emerge in the dorsolateral striatum during habit learning? Some evidence suggests that DMS and DLS plasticity emerge simultaneously (Kupferschmidt et al., 2017; Smith et al., 2021); and, in fact, such an arrangement would provide a sensible way for the stimuli that drive action initiation to be integrated with action-outcome learning to control performance, in accord with collaborative theories of action-outcome and stimulus-response learning (Balleine and Dezfouli, 2019; Balleine, 2019). Again, multiple recording protocols will be important to resolve this question.
In addition, there remain important questions regarding how dopamine release is modulated by learning. There is now reasonable evidence for striato-nigro-striatal feedback circuits (Lerner et al., 2015; Crittenden et al., 2016; Ambrosi and Lerner, 2022), which may drive learning-related changes and moment-to-moment modulation in dopamine release via direct striatal modulation of dopamine neurons themselves. Indeed, it has been argued that dopamine waves may be driven by synchrony between midbrain dopamine neurons (Hamid et al., 2021). However, given the energetic costs of action potential propagation throughout the entire axonal arbour (Pissadaki and Bolam, 2013), it cannot be assumed that learning-related changes occur solely at the level of somatic dopamine neuron firing. Although our findings measuring dopamine release accord well with recordings of activity in dopamine terminals and somata, discrepancies between the firing of dopamine neurons and dopamine release have recently been reported, albeit in the ventral striatum (Mohebi et al., 2019). Considerable evidence, some described above, points to local modulation of release.
Furthermore, several recent reports have suggested that striatal cholinergic interneurons can induce axonal release of dopamine (Liu et al., 2022), potentially under the control of the corticostriatal circuit (Gómez-Ocádiz and Silberberg, 2023), providing both a new mechanism of dopamine release and a potential source of learning-related local modulation of striatal dopamine signaling, one which doesn't require immediate changes at the level of SNc dopamine somata. There have also been reports that other interneurons may play a role; notably, low threshold-spiking interneurons (LTSIs) have been reported to regulate the acquisition of goal-directed actions: their activity antagonises both goal-directed learning and dopamine release and, indeed, also influences cholinergic interneuron activity (Holly et al., 2019, 2021). It seems likely, therefore, that the direct modulation of dopamine neuronal activity and of local release are integrated in complex ways to produce both the hemispherically localised patterns of release observed with changes in action and direction of movement and the more global and rapid moment-to-moment fluctuations observed bilaterally with changes in action value (Azcorra et al., 2022). How these are achieved will take some working out, and an important area of focus may well be the inhibition of ipsilateral dopamine during actions, as opposed to its facilitation contralaterally (Hart et al., 2024). The reports that striatal patches project directly to dopamine neurons (Crittenden et al., 2016), that activation of these striato-nigral terminals can immediately suppress dopamine release in the dorsal striatum (Nadel et al., 2021), and that closed-loop regulation occurs through disinhibition of SNr GABAergic interneurons (Lerner et al., 2015) provide potential ways in which release could be rapidly inhibited in this way. If such mechanisms were important specifically for modulating hemispherically lateralised release, this may also account for the regional localization of action-related dopamine signals relative to RPE signals, which are widely reported in somatic dopamine neuron firing and are broadcast widely and bilaterally.
Finally, as currently specified, there is no mechanism for understanding how and whether dopamine release contributes to and/or shapes regions of plasticity in the pDMS to account for the encoding and retrieval of specific action-outcome associations. We have argued previously that regions of plasticity could be encouraged or discouraged based on the regulation of cholinergic interneuron activity under the control of the thalamostriatal pathway (Bradfield et al., 2013; Bradfield and Balleine, 2017; Balleine et al., 2021). If a mechanism for very local changes in release could be specified, then there is a ready means by which increases and decreases in local dopamine activity could be thought to regulate plasticity via the signalling properties of dSPNs and iSPNs, respectively, within those regions. This kind of regional plasticity/activity has recently been linked to the function of sub-experts in a 'mixture of experts' model with regard to wave-like dopamine release, in which ramps in DMS dopamine release appeared to accompany behavioral control, potentially both promoting online action vigor and providing a tag for plasticity (Hamid et al., 2021). As such, these findings accord with our data suggesting that dopamine provides both contingency and action value signals, and also with views that the different functions of dopamine during reward pursuit and outcome delivery can be gated by local microcircuit elements regulating windows of plasticity and cellular activity (Bradfield et al., 2013; Franklin and Frank, 2015; Berke, 2018; Balleine et al., 2021; Hamid et al., 2021). Establishing the rules for such local plasticity will provide key insight into the specificity of encoding and retrieval, which will be necessary to achieve a complete explanation of the role of dopamine in the striatal control of goal-directed learning and performance.

Fig. 1. Prediction error updating of action values for goal-directed learning. A simple hypothetical functional relationship between the dopamine-related prediction error signal and action value updating. (A) At time 't', the estimated value of the action is based on its predicted outcome. Here, that predicted value, V_t, produces a small positive increment in release, +δ_t, when the outcome is contacted, suggesting that the obtained reward was not fully predicted. (B) At time 't + 1', the action value V_t is updated by adding this increment to generate the new action value V_t+1, producing a larger predicted value. However, in this case that prediction is violated: the predicted outcome is not delivered and so a decrement in release is generated, −δ_t+1. (C) As a consequence, at time 't + 2', this decrement results in a reduction in the action value, V_t+2.

Fig. 2. Summary of the changes in dopamine release during instrumental training, contingency degradation and reversal, as described in Hart et al. (2024). (A) Over the course of instrumental training, the actions begin to generate lateralised signals, with dopamine release increasing in the hemisphere contralateral to the action and decreasing in the ipsilateral hemisphere relative to baseline. (B) The net difference in hemispheric dopamine release reflects the action-outcome contingency: degradation training removes the hemispheric bias, but only for the degraded action. (C) Similar effects on lateralised dopamine release are induced after training when the outcome identities are reversed. Lateralised release returns with continued training after reversal. Note that, in all panels, the rats are depicted as using the left and right paw to depress the left and right lever, respectively; however, it is not known whether release in this situation reflects the specific effector used or the action target; i.e., the left or right lever.

Fig. 3. Schematic illustration of the hypothetical effects of hemispheric changes in dopamine release on instrumental performance. (A) The increase in dopamine release in the right hemisphere, i.e., contralateral to the left lever press, is hypothesized to increase direct pathway (striatonigral) output by increasing the excitability of dSPNs, and to suppress indirect pathway (striatopallidal) activity by reducing the excitability of iSPNs, with the overall consequence being inhibition of the substantia nigra pars reticulata and disinhibition of the thalamic and brainstem regions important for motor output and control. (B) Conversely, reduced dopamine release in the hemisphere ipsilateral to the left lever press will increase indirect pathway (striatopallidal) activity and so suppress (competing) actions controlled by the left hemisphere. (C) This bias in hemispheric dopamine release is reversed for (D) right lever press actions, as is the control of the direct and indirect pathways and the relative influence of the striatopallidal and striatonigral projections.