The State of Connectionist Modelling in 2004

 

Interview with Michael Thomas for Parallaxis Journal (2004, Vol.8, 43-61)

 

The interview addressed the following issues in current connectionist modelling:

 

1.      Are hidden units unscientific free parameters, guaranteed to allow connectionist networks to simulate any behavioural data?

 

2.      Have connectionist networks generated any genuinely new cognitive theories?

 

3.      If you damage a connectionist network, are the deficits to its behaviour domain-general or domain-specific?

 

4.      Can connectionist networks help explain what general intelligence is?

 

5.      Can connectionist networks aid our understanding of really complex aspects of thought, like reasoning?

 

6.      New memories wipe out old memories in connectionist networks. This isn’t much like the brain, is it?

 

7.      What lies in the future for connectionism?

 

 


 

 

Interview questions

 

1.      In 1998, Green highlighted the problematic issue that connectionist models appear to have too many degrees of freedom, undermining any successes they demonstrate in simulating behavioural data. While the large number of free parameters in these models could be justified on the grounds that connectionist networks are simulations of real neural circuits, Green pointed out that in fact, connectionist modellers have retreated from this neural realism, thereby undermining the few constraints they could offer for why they permit themselves so many free parameters to fit behavioural data. How do you think that this problem has been solved, controlled or diminished since Green’s critique?

 

[Green, D. C. (1998). The degrees of freedom would be tolerable if nodes were neural. Reply to Lamm on Connectionist-Explanation. PSYCOLOQUY 9(26) ftp://ftp.princeton.edu/pub/harnad/Psycoloquy/1998.volume.9/psyc.98.9.26.connectionist-explanation.23.green.]

 

 

So, first, thank you for inviting me to contribute to your magazine. In my research, I spend a lot of time in the details of modelling and empirical work, and it is refreshing to consider some of the wider theoretical issues in the field of connectionist modelling.

On to Question 1, then. The debate started by Green’s article in 1998 was an interesting and energetic one, and with my colleague Tony Stone, I contributed a commentary to that debate which addressed some of the philosophy of science issues surrounding connectionism. In that article, we drew an analogy between the status of a hidden unit and that of a gene as theoretical entities. So first, I would encourage your readers to take a look at that commentary for my original response to Green’s points!

Of course, our commentary was written 6 years ago, so one might ask, have I changed my opinion since? Green’s criticisms of connectionism revolve around two main points. First, what is the real neural plausibility of connectionist models, given that no one (to this day) has a solid answer for what a hidden unit corresponds to in the brain? Second, connectionist models have many free parameters (architecture, connection strengths, thresholds, etc.) that allow them in principle to simulate a wide range of input-output functions. If they are this flexible, can a model that successfully simulates some limited set of human behaviour really be telling us anything interesting about the cognitive system?

My current response to these issues would be as follows.

(1) Given our current (limited) state of knowledge about cognitive mechanisms and the structure of mental representations, it is much too early to be attempting to rule out whole classes of explanatory models. Models are a tool to aid theory development, and connectionism has been successful in helping to advance theory. So if models appear to be useful, if they generate new ideas, then carry on building them.

(2) ‘Cognitive’ connectionism is not neural, and I am often tempted to think that connectionist models can be useful without any pretence that they embody brain-style computation. For instance, the focus they have generated on mechanisms of change in cognitive development stems from the fact that they are robust learning mechanisms. However, I also sometimes give in to the connectionist temptation to think that connectionist systems are (somehow) slightly more neurally plausible than symbolic systems as the basic currency of cognitive computation, because I can imagine a connectionist network being more readily implementable in the brain’s wetware than serial symbolic computation. But this is not a terribly strong argument. In the long run, I suspect that connectionism will produce cognitive-level theories whose vocabulary is more consistent with that of neural theories, and that there will increasingly be a dialogue between these disciplines.

(3) Green, as others have done before him, focuses too much on simulation as the aim of computational modelling. A model may need to simulate the data to be of use, but its aim is to explain. Successful connectionist models in psychology are those that have generated theoretical insights beyond any fitting of particular data patterns (for instance, models of reading). Green’s criticism is usually accompanied by another: that when connectionist models successfully simulate data, they are inscrutable and we have no idea why they succeed. And I think this just isn’t true.

Let me say a little more about some of these points. Then I’ll finish by saying what I think a hidden unit corresponds to. (You heard it here first.)

First, I should say that I don’t lie awake at night worrying about philosophers’ a priori arguments about connectionism. For me, it’s a bit like having an a priori argument about whether Microsoft Windows is better or worse than Linux. I just get on and use the computer.

My current interest is in what mechanisms – in the brain or in the cognitive system – explain variability, in either the normal or the atypical population. I suspect I will end up using a wide range of models to help me understand this: backpropagation networks, Hebbian networks, pattern associators, recurrent networks, self-organising networks. I think all of these will help me clarify theories of how developing representational systems may vary. But I wouldn’t rule out the possibility of interesting insights from production systems, Bayesian networks, or even genetic algorithms. Previous psychological theories have offered vague notions to explain cognitive variability, including the level of cognitive ‘resources’ available, the ‘richness’ of representations, the ‘abstractness’ of representations, processing ‘speed’, the level of ‘inhibition’ of task-irrelevant information, and so forth. Computational work is now essential to figure out what these ideas could mean when specified in detail, and whether they are sufficient to account for human behavioural data (such as the general factor of intelligence).

Above all, I think implementing models is the most important issue at present, because so many previous verbal psychological theories have been so vague that we can’t be sure what their terms really mean, whether they would really work, and what precise predictions can be generated from them.

Take a more general debate: nature versus nurture. When we build computational models, we are attempting to explore questions such as the following: what constraints are necessary in a model of learning so that it could show the behaviour we see in adults, given the (informational) environment it is exposed to? How specific are those constraints to the particular cognitive domain, what empirical evidence is there that such internal information-processing constraints exist, and how specifically do they interact with data gleaned from the environment? These days, notions like ‘innate’ or ‘learned’ are irrelevant; we need to focus on how the process works. Take, for example, the Chomskian notion of “triggering” in language acquisition. Before we can evaluate it, we need to know how it really works. We should not accept a linguist’s claim that this is the most trivial, straightforward sort of learning. We need to see an implemented model with its strong constraints on learning clearly specified. We need to see what happens to the model when it is exposed to realistic language input. Hopefully such a model could capture the extended time course and error pattern of child language acquisition. Building the model would go a long way towards establishing the true meaning (and viability) of the notion of triggering.

So I don’t think that theoretical progress will come from philosophers and their a priori arguments of what a ‘unit’ or a ‘connection’ must or mustn’t be. In fact, it surprises me that philosophers of science have had so much to say on connectionism – it’s just a computer programme, after all, and one usually run on a desktop computer. Why should connectionism raise new conceptual issues about the nature of mind? Space Invaders didn’t. It leads me to believe philosophers follow in the wake of scientific events rather than generating them. By contrast, I greatly value the theoretical clarity they sometimes bring to confused fields (for example, in the scientific study of consciousness).

On to neural plausibility, then. First, it is important to clarify that connectionist models have no level of description per se. They are a tool. When they are used in psychology, they are cognitive models, by which I mean that the activity of units stands for concepts that figure in traditional psychological theories. So a unit may correspond to a word or a semantic feature or an object edge or a phoneme. By contrast, we very rarely know what the activity of a given neuron corresponds to in conceptual terms, other than what we can infer from correspondences with perceptual input in single cell recording (see Lehky & Sejnowski, 1990, for how even those correlations might be misleading).

Hidden units are problematic, but their interpretation will still be at a conceptual level. Sometimes connectionists have suggested a hidden unit is a component of computation analogous in some way to a “neuron”, or “a group of neurons”, whereby a network of simple units or nodes can generate complex behaviour. And as we have seen, this is part of an argument that says connectionist networks are better than symbolic models because they are more neurally plausible.

I think connectionists are on very shaky ground here. As Fodor (1998) points out, the New York subway has lots of nodes (stations) and connections (lines), but no one calls the New York subway system ‘neurally plausible’. Well, I could argue with Fodor (and his granny and his cat, who often take part in his arguments for some reason) and say this analogy makes a level-of-description error. Maybe with the right timetable, we could get the New York subway system to implement, say, the stomatogastric ganglion circuit that controls chewing in the spiny lobster (see Churchland & Sejnowski, 1992, p.5, for the route map), as long as the number of people passing through each station could generate large scale changes in the behaviour of New York city as a whole entity. But Fodor’s point is well taken. The correspondence of hidden units in cognitive connectionist networks to actual neural circuits is currently too vague to be meaningful, and I suspect that the connectionist should view hidden units purely in terms of the computational role they play (see below).

Critics have also argued that backpropagation, a widely used connectionist learning algorithm, has very suspect neural credentials. However, my view is that while backpropagation per se is unlikely to be the literal algorithm that the brain uses, it most probably falls within a family of gradient-descent, error-driven algorithms that the brain does exploit. Perhaps the real family of algorithms exploits the plentiful back projections in the brain to compute output-target disparities. O’Reilly (1998) draws a useful distinction between two types of learning that the brain probably uses: (1) self-organising or Hebbian learning for the brain to form concise representations of the environment it interacts with (e.g., of objects in the world, or of what my arm will do when I try and waggle it about), and (2) error-driven algorithms to map between these representations (e.g., what motor actions I should perform to pick up an object). Backpropagation, along with backprop-through-time and contrastive Hebbian learning, is probably at present a suitable rough-and-ready tool to discover what distributed learning systems can acquire through error correction when exposed to a structured environment. Of course, there’s this tricky problem of catastrophic interference in distributed systems, but we’ll return to that later.
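
To make the distinction concrete, here is a toy sketch in Python (my illustration for this interview, not code from any published model): a Hebbian update that strengthens weights between co-active units, and a delta-rule update that nudges outputs towards a target (the simplest member of the error-driven family).

import numpy as np

rng = np.random.default_rng(0)
pre = rng.random(4)                  # presynaptic activity vector
target = np.array([1.0, 0.0])        # desired output pattern

def hebbian_update(w, pre, post, lr=0.1):
    # Hebbian: strengthen weights between co-active units (no target needed)
    return w + lr * np.outer(post, pre)

def delta_update(w, pre, target, lr=0.1):
    # Error-driven: nudge the output towards the target (a gradient-descent step)
    output = w @ pre
    return w + lr * np.outer(target - output, pre)

w = np.zeros((2, 4))
for _ in range(200):
    w = delta_update(w, pre, target)
print("error-driven output:", np.round(w @ pre, 3))   # converges on the target

w_hebb = hebbian_update(np.zeros((2, 4)), pre, post=np.array([1.0, 1.0]))
print("Hebbian weights grow with co-activity:")
print(np.round(w_hebb, 3))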

Back to the idea of simulation. Just a couple more comments here. First, I agree with Green that if I have a connectionist model that simulates some aspect of human behaviour when it has 22 hidden units, but fails to simulate these data when it has 21 hidden units, then I have a useless model. Connectionist models need to be reasonably robust to parameter changes that have little obvious empirical justification. (Despite this, as I have written about elsewhere, some researchers have seen the number of hidden units as a parameter that can explain cognitive development, intelligence, and developmental disorders [Thomas & Karmiloff-Smith, 2003a]. At best, these theories are half-finished.) Second, although explanation is more important than simulation (see Seidenberg, 1993, for some of the arguments here), I think it is worth introducing a further distinction in the roles that models can play. This is the distinction between what I call Abstract models and Specific models. The stage of theory development in a given field determines when each type of model will be profitably deployed.

On the one hand, models can be employed at a fairly abstract level, where the researcher only seeks to capture general characteristics of the problem domain. The aim here is to explore the patterns of behavioural data that can be generated by models embodying particular principles of processing. The end product is an expansion of the range of candidate inferences that can be drawn from patterns of human behavioural data to underlying structure. In other words, you find out that other sorts of model can also produce a given pattern of behaviour. Abstract models tend to be useful in the earlier stages of theory development in a given field, since they are an engine for conceptual clarification. One example would be the demonstration that distributed memories can produce double dissociations. Another would be that associative systems can learn rule-following behaviour. In both cases, the models offer additional possible explanations of data.

On the other hand, models can be used in an attempt to capture detailed patterns of empirical data from a target cognitive domain, while incorporating as many empirically motivated psychological (and perhaps neural) constraints as possible. The aim of Specific models is to evaluate the viability of particular hypotheses and generate new testable predictions. Such models are characteristic of more theoretically developed fields of enquiry, which are supported by a rich body of empirical data. In short, the idea is that once a theory is detailed enough and supported by enough data, you should be able to build a model of the theory to see if it really works. The metric of success will be a simulation of behavioural data, but the measured success relates to the constraints that are built into the model. The model should fail to simulate the data without those constraints.

Finally, I said I’d tell you what I thought a hidden unit corresponds to. I think a hidden unit is a (potentially non-linear) threshold-based computational primitive that can help to partition or segment the multidimensional representational space that corresponds to a given cognitive domain. It is an informational carving knife. Bet you wish you’d never asked, now.
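
For the concretely minded, here is what that ‘informational carving knife’ amounts to in a few lines of illustrative Python (the weights and inputs are invented for the example): the unit computes a thresholded weighted sum, and in doing so assigns every point of its input space to one side or the other of a soft hyperplane.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights: this unit carves 2-D input space along the line x + y = 1.
w = np.array([1.0, 1.0])
b = -1.0

for x in [np.array([0.2, 0.3]), np.array([0.9, 0.8])]:
    activation = sigmoid(w @ x + b)
    side = "above" if activation > 0.5 else "below"
    print(f"input {x} -> activation {activation:.2f} ({side} the partition)")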

 

References

 

Churchland, P. S., & Sejnowski, T. J. (1992). The computational brain. The MIT Press.

Fodor, J. A. (1998). In critical condition: Polemical essays on cognitive science and the philosophy of the mind. The MIT Press.

Lehky, S. R. & Sejnowski, T. J. (1990). Neural network model of visual cortex for determining surface curvature from images of shaded surfaces. Proceedings of the Royal Society of London, B 240, 251-278.

O’Reilly, R. C. (1998). Six principles for biologically-based computational models of cortical cognition. Trends in Cognitive Sciences, 2, 455-462.

Seidenberg, M. (1993). Connectionist models and cognitive science. Psychological Science, 4, 228-235.

Thomas, M. S. C. & Karmiloff-Smith, A. (2003a). Connectionist models of development, developmental disorders and individual differences. In R. J. Sternberg, J. Lautrey, & T. Lubart (Eds.), Models of Intelligence: International Perspectives, (p. 133-150). Washington, DC: American Psychological Association.

Thomas, M. S. C. & Stone, A. (1998). Cognitive connectionist models are just models, and connectionism is a progressive research programme. Commentary on Green on Connectionist Explanation. Psycoloquy, 9(36).

 


 

 

2.      In your opinion, how valuable do you think connectionist models have been in generating alternative explanations for classic cognitive theories (such as, say, Festinger’s Cognitive Dissonance Theory)? To what extent do you think that theoretical cognitive models and connectionist models compete with each other in the sense of providing a more valid theoretical foundation?

 

[Goldsmith, M. (1998). Connectionist modelling and theorizing: Who does the explaining and how? PSYCOLOQUY 9(18) /1998.volume.9/]

 

 

In our original commentary as part of this theoretical dialogue (Thomas & Stone, 1998), we listed several areas where we felt connectionist models had made a real theoretical contribution. These were in models of language, such as inflectional morphology and syntax, in explaining acquired deficits after brain damage, and in explaining mechanisms of cognitive development. So, a short update on what has happened to these in the meantime.

Models of inflectional morphology (typified by the English past tense debate) have essentially run themselves to a standstill (see Thomas & Karmiloff-Smith, 2003b, for an update). There is plenty of evidence that duality is required in the language system to explain various patterns of dissociation from psycholinguistics and brain imaging. Before connectionism, Pinker’s theory was that the language system was split between linguistic rules and word-specific information. Connectionist models have demonstrated that all the existing data can (most probably) be explained by a duality between phonology and semantics, and no current empirical evidence in inflectional morphology needs rule-based representations to explain it. More on this later.

Connectionist approaches to syntax stalled in the 1990s, but connectionism has been part of a shift in linguistics towards a statistical approach based more on corpus analyses of real language use. This data-driven approach makes use of the cheap computational power that makes connectionism a viable research tool. Research on early language development now focuses on collecting much richer data, exemplified, for instance, by the work of Mike Tomasello (e.g., Tomasello, 2000). Here, researchers might try to collect as much as a third of all the utterances a child makes between the ages of one and two. This approach is part of a trend to focus on the linguistic environment more than on internal abstract principles (where the data are just taken to be impoverished). However, since the models of Elman in the early nineties, it is still hard to see a connectionist model parsing syntax without implementing (symbolic) variable binding between a word’s meaning and its grammatical and thematic function (though see Spivey & Tanenhaus, 1998, for principles of multiple constraint satisfaction applied to the interaction of syntax and semantics in sentence comprehension). With Martin Redington, I recently applied simple recurrent networks to explore the extent to which behavioural deficits in syntax processing in acquired or developmental disorders can be explained by the structure of the problem domain or by the constraints of the processing system performing comprehension – so at least someone is still using these models! I myself suspect that in the long term, a version of connectionism will do away with Chomskian generative linguistics as a processing account of language, and with the psychological reality of deep linguistic structure. But that time has not yet come.

Connectionist models of acquired deficits have grown to be an influential aspect of cognitive neuropsychology, for instance as applied to language, memory, semantics, and vision. There has been much sharing of principles between neurocomputation and psychology on the ways in which information processing systems can break down. The advances in connectionist models here (most in the capacity of Abstract models, but sometimes Specific too) have revealed the relative poverty of the traditional box-and-arrow diagram theories that went before. However, box-and-arrow models are still useful for summarising evidence from specific cognitive deficits after brain damage, and for sketching out rough versions of cognitive architecture consistent with these case studies and with a task analysis of what the cognitive system must achieve.

Connectionist models of development have continued to expand the range of ideas about how we might explain changes in the complexity of reasoning with age. Two examples will suffice. Tom Shultz has demonstrated how a connectionist model that increases its representational resources with training can capture both quantitative shifts and qualitative behavioural shifts on classical Piagetian conservation tasks (Shultz, 1998). And a recent special issue of the journal Infancy in 2004 compares three different connectionist models that seek to explain apparently qualitative shifts in infant categorisation, from focusing on isolated features of objects to focusing on the patterns in which those features arise (see www.infancyarchives.com/Contents/Infancy5_2/infancy5_2.htm).

One area where I think that connectionist approaches have changed the basic nature of theorising is memory. On the old model, based on classical computation, long-term memory held content which had to be moved into working memory for it to be operated on. In psychological terms, long-term memories were laid down via a domain-general buffer of short-term memory. Connectionist researchers have offered two different perspectives on this. First, they argue that short-term memory is unlikely to be domain-general; rather, there will be lots of domain-specific short-term memories (e.g., within language, one for semantics, one for phonology, one for syntax; see MacDonald & Christiansen, 2002, for connectionist arguments). Second, working memory, instantiated in pre-frontal cortex, may have no content. Instead, pre-frontal cortex contains markers that serve to raise the activation of task-relevant long-term memories in posterior cortical areas and inhibit irrelevant ones. The long-term memories themselves are active and perform computations on content (see, e.g., work by Davelaar & Usher, 2002; Haarmann & Usher, 2001; O’Reilly, Braver, & Cohen, 1999). Why has connectionism changed the flavour of the theory here? The different perspective comes from the differential ease of shifting content around computational systems. It is easy in a variable-binding system, where the same binary string can be instantiated in different memory registers, but it is hard in a connectionist network, where knowledge and processing are inherently tied together in the pattern of connections, weights, and thresholds.
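
A toy contrast makes the point vivid (my own sketch in Python, assuming nothing beyond what I have just said): in a register-style system, content moves by a free copy; in a trained associative network, the ‘content’ just is the weight matrix, so a fresh network can only acquire it by being retrained on the patterns.

import numpy as np

# Symbolic case: the same binary string occupies a new register at no cost.
long_term_register = 0b101101
working_register = long_term_register              # content moved by a trivial copy
assert working_register == long_term_register      # the copy preserves the content exactly

# Connectionist case: the "knowledge" is a trained weight matrix.
rng = np.random.default_rng(1)
x = rng.random((20, 8))                            # input patterns
y = rng.random((20, 5))                            # associated output patterns
trained_w, *_ = np.linalg.lstsq(x, y, rcond=None)  # knowledge lives in the weights

# There is no copy operation over *content* alone: to give a fresh network the
# same knowledge, we must re-expose it to the training patterns themselves.
fresh_w = np.zeros_like(trained_w)
for _ in range(5000):
    fresh_w += 0.01 * x.T @ (y - x @ fresh_w)      # delta-rule retraining
print("retrained network matches the original:",
      np.allclose(x @ fresh_w, x @ trained_w, atol=0.01))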

The question asks about Festinger’s Cognitive Dissonance theory. I am not aware of connectionist approaches to this theory in particular, although Eiser (1994) has speculated on the extension of connectionism to attitude theory. Connectionist principles are easily applied to the emergence of complex behaviour based on the interaction of lots of attitudes, or on the interaction of lots of individuals within a group. However, especially in light of previous discussion of what a processing unit corresponds to, the entities in such a network would be frighteningly abstract!

Rogers and McClelland (2004) have applied simple connectionist networks to the development of semantics, and shown how many of the commitments of theory-theory of concept acquisition can fall out of similarity-based reasoning in distributed models of learning. This approach may be amenable to extension to attitude theory. O’Laughlin and Thagard (2000) used a handwired constraint satisfaction network to capture failures in reasoning on false-belief problems in autism, explaining the problem in terms of heightened inhibition, resulting in a loss of conceptual coherence. In this model, the activity on a unit corresponded to “Sally believes the marble is in the basket”. Now that’s a long way from single cell recording!

Two further points are worth making concerning the role of connectionism in generating alternative explanations for classic cognitive theories. Both can be illustrated with regard to the much-studied domain of past tense formation. First, it is very hard to compare classical cognitive theories – which are usually verbal theories – against implemented computational models (of any variety). Computational models are forced to commit to a particular implementation of various theoretical terms and claims about the environment, and are evaluated on quantitative fits to human data. Verbal theories frequently do not make such precise claims: their predictions typically specify which kinds of things should be harder than others (e.g., higher frequency words should be faster to recognise than lower frequency words) but not by how much. This results in a clear differential in the conditions of falsifiability.

The past tense debate has been characterised by a verbally specified symbolic model on the one hand, and a set of specific connectionist implementations on the other. Although many of the connectionist models have been found wanting, there is a suspicion that the same has not happened for Pinker’s dual-route model because it is ill-specified and unimplemented. It is not clear how an implemented version of Pinker’s past tense model would work, or even if it would work. (It should be noted that Pinker himself has fair reasons for not implementing the model – see Thomas & Karmiloff-Smith, 2003b).

Second, sometimes one’s preference for one model or another does not come down to which fits the behavioural data better, but to the consistency of the model’s theoretical assumptions with other domains or disciplines. Thus in the past tense debate, currently it seems likely that both connectionist and rule-based models can explain most of the existing empirical data on inflectional morphology, its acquisition and its breakdown. (We’ll assume for the moment that an implemented dual-route model would actually work, and that the current toy connectionist models would scale up to a viable system for modulating word forms based on grammatical context). So which model is better?

It comes down to consistency. The connectionists claim their model is better because it employs computational primitives that seem more readily implementable in the brain than syntax and variable binding. Proponents of the dual-route model claim it is better because it makes representational assumptions that are consistent with the rest of language processing. If we are going to have to postulate rules and a set of related processing principles to explain other domains of language function such as syntax, why not build them into a model of inflectional morphology too? In this case, then, the debate on whether the connectionist theory will replace the classical cognitive theory comes down to what kind of consistency you prefer – with neurocomputation or with linguistics.

 

References

 

Davelaar, E. J., & Usher, M. (2002). An activation-based theory of immediate item memory. In J. A. Bullinaria, & W. Lowe (Eds.), Proceedings of the Seventh Neural Computation and Psychology Workshop: Connectionist Models of Cognition and Perception. Singapore: World Scientific.

Eiser, J. R. (1994). Attitudes, chaos, and the connectionist mind. Blackwell Publishers.

Haarmann, H. & Usher, M. (2001). Maintenance of semantic information in capacity limited item short-term memory. Psychonomic Bulletin & Review, 8, 568-578.

MacDonald, M. C. & Christiansen, M. H. (2002). Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109, 35-54.

O’Laughlin, C., & Thagard, P. (2000). Autism and coherence: A computational model. Mind and Language, 15, 375-392.

O’Reilly, R. C., Braver, T. S. & Cohen, J. D. (1999). A Biologically-based computational model of working memory. In A. Miyake. & P. Shah (eds), Models of working memory: Mechanisms of active maintenance and executive control. New York: Cambridge University Press.

Rogers, T. T., & McClelland, J. L. (2004). Semantic cognition: A parallel distributed processing approach. MIT Press.

Shultz, T. R. (1998). A computational analysis of conservation. Developmental Science, 1, 103-126.

Spivey, M. & Tanenhaus, M. (1998). Syntactic ambiguity resolution in discourse: Modeling the effects of referential context and lexical frequency. Journal of Experimental Psychology: Learning, Memory and Cognition, 24, 1521-1543.

Tomasello, M. (2000). Acquiring syntax is not what you think. In D. V. M. Bishop & L. B. Leonard (eds.), Speech and language impairments in children: Causes, characteristics, intervention, and outcome (p. 1-15). Psychology Press.

Thomas, M. S. C. & Karmiloff-Smith, A. (2003b). Modelling language acquisition in atypical phenotypes. Psychological Review, 110(4), 647-682.

Thomas, M. S. C. & Redington, M. (2004). Modelling atypical syntax processing. To appear in: Proceedings of the COLING-2004 Workshop: Psycho-computational model of human language acquisition. Geneva, Switzerland, 28 August 2004.

Thomas, M. S. C. & Stone, A. (1998). Cognitive connectionist models are just models, and connectionism is a progressive research programme. Commentary on Green on Connectionist Explanation. Psycoloquy, 9(36).

 


 

 

3.      Connectionist models include a range of computational constraints / parameters, such as the initial architecture of the network, activation flow patterns, the learning algorithm, and so forth. These parameters alter the ability of the system to acquire intelligent behaviour. To simulate atypical development, the parameter settings are altered. When this is done, to what extent are the subsequent behavioural impairments domain-general or domain-specific? To what extent are the errors caused by such manipulations brand new types of error, and to what extent are they exaggerations of errors that are found during the course of normal development?

 

[Thomas, M., Karmiloff-Smith, A., Connectionist Models of development, developmental disorders and individual differences]

 

 

Boy, these questions don’t get any easier!

Current connectionist models are of narrow domains, so are not readily applicable to claims about general cognitive skills. I’ll discuss this point in my answer to the next question. Therefore I’ll take the above question to ask the following: are changes following parameter changes general or selective to the abilities carried out by a single system? For instance, in a network that parses sentences, when you change a parameter, does this affect the ability of the network to parse all sentence types to the same extent, or does it affect some constructions more than others (say, passives more than actives)? Or in a network that learns to reach for moving and possibly hidden objects, does a parameter that affects the ability of the system to identify objects equally affect the ability of the system to identify the location of objects?

The short answer is that this is the focus of my current research, and I don’t have the answers yet. I have done some reasonably exhaustive work on networks that learn the past tense problem. This indicated that some parameters have an effect that extends to all parts of the problem, while others are more specific (Thomas & Karmiloff-Smith, 2003b). Moreover, the effects of a parameter change may be quite different depending on whether the change is developmental (applied to the untrained network) or acquired (applied to the trained network) (Thomas & Karmiloff-Smith, 2002a). For example, processing noise is bad for a developing system because it does not get a clear version of the knowledge it is supposed to learn. However, an adult system, with a robust representation of the knowledge, is relatively impervious to the same levels of noise. By contrast, the adult system is very vulnerable to removing network structure (units or connections) because these encode stored knowledge. But the developing system can tolerate lost resources because it can reorganise its knowledge to exploit the remaining resources.
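
The logic of these manipulations can be sketched in a few lines of Python (a schematic harness of my own; the task, architecture, and noise levels are placeholders, and this linear toy will not reproduce the quantitative patterns of the published simulations, only the shape of the developmental-versus-acquired comparison).

import numpy as np

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(30, 10)).astype(float)   # toy input patterns
Y = rng.integers(0, 2, size=(30, 6)).astype(float)    # toy target patterns

def train(noise=0.0, lesion=0.0, epochs=1000, lr=0.05):
    # Delta-rule learning, with optional processing noise and lesioned connections.
    w = np.zeros((10, 6))
    mask = (rng.random(w.shape) >= lesion).astype(float)
    for _ in range(epochs):
        out = X @ (w * mask) + noise * rng.normal(size=Y.shape)
        w += lr * X.T @ (Y - out) / len(X)
    return w * mask

def test_error(w, lesion=0.0, noise=0.0):
    # Apply an "acquired" manipulation to an already-trained network at test.
    mask = (rng.random(w.shape) >= lesion).astype(float)
    out = X @ (w * mask) + noise * rng.normal(size=Y.shape)
    return round(float(np.mean((Y - out) ** 2)), 3)

healthy = train()
print("developmental noise :", test_error(train(noise=1.0)))    # present throughout learning
print("acquired noise      :", test_error(healthy, noise=1.0))  # imposed on the endstate
print("developmental lesion:", test_error(train(lesion=0.5)))   # network can reorganise
print("acquired lesion     :", test_error(healthy, lesion=0.5)) # stored knowledge removed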

In terms of the pattern of errors you get, I think this raises a very interesting point. Some researchers in atypical development have argued that in the majority of developmental disorders, the errors one sees in performance on a wide range of tasks can normally be found in younger, normally developing children, whether in language, visuospatial processing, attention, executive function, and so on. These researchers have argued that this implies that the modular structure of the human cognitive system is relatively impervious to developmental disruption, and that the developmentally disordered cognitive system can be characterised by quite close reference to the structure of the normal (usually adult) cognitive system, with individual components more or less developed along the normal pathway. I’m not crazy about this theory, since, following Annette Karmiloff-Smith and her colleagues, I tend to view the architecture of the adult cognitive system as a product of development rather than a precursor to it, and therefore an aspect that may well be vulnerable to developmental disruption (see Annette’s article in Trends in Cognitive Sciences in 1998).

How then do we explain the commonalities in error patterns found in developmentally disordered children and younger normal children? Why do psychologists so often classify deficits as delays (e.g., as in delayed language development, delayed reading, and so forth)? Well, I’ve written on this with Annette elsewhere in more detail (Thomas & Karmiloff-Smith, 2002b), but first, there is reason to think that psychological tests may in part be producing these commonalities, since they are not designed to offer scope for a range of alternative error patterns. Second, the commonalities between normal and atypical development may be overstated, because there are qualitatively different patterns of behaviour found in some disorders, such as autism and synaesthesia (a condition where modalities interact, so that, for instance, words trigger colours). But third, my guess is that the structure of the cognitive problems which humans face (be those children typically developing or otherwise) constrains the type of errors that can be produced.

So in answer to the second part of the question, my network simulations in a couple of sample domains have instantiated the idea that different computational parameters tend to interact with or modulate the normal pattern of errors one sees during learning. The atypical neurocomputational constraints that operate in developmental disorders will mostly only be expressed through the ‘vocabulary’ of the cognitive domains that humans have to acquire in their physical and social world. I found this result both in English past tense simulations (Mareschal et al., forthcoming) and, most recently, in the context of a model of sentence parsing, work carried out with Martin Redington. In that model, we replicated the normal pattern of difficulty in comprehending active, passive, subject cleft and object cleft English sentences, the pattern of difficulty shown by adults with aphasia (Dick et al., 2001), and the (subtly different) pattern of difficulty found in children with Specific Language Impairment (Dick et al., 2004). Atypical computational conditions tended to exaggerate normal patterns of difficulty, but the exact parameters produced different kinds of interactions.

My conclusion from this work is that to call a developmental deficit a ‘delay’ is just to point out that the behaviour resembles that of a younger typically developing child. To explain the data, one must consider the constraints operating on the process of development.

 

References

 

Dick, F., Bates, E., Wulfeck, B., Aydelott, J., Dronkers, N., & Gernsbacher, M.A. (2001). Language deficits, localization, and grammar: Evidence for a distributive model of language breakdown in aphasic patients and neurologically intact individuals. Psychological Review, 108(3), 759-788.

Dick, F., Wulfeck, B., Krupa-Kwiatkowski, M., & Bates, E. (2004). The development of complex sentence interpretation in typically developing children compared with children with specific language impairments or early unilateral focal lesions. Developmental Science, 7(3), 360-377.

Karmiloff-Smith, A. (1998) Development itself is the key to understanding developmental disorders. Trends in Cognitive Sciences, 2(10), 389-398.

Mareschal, D., Johnson, M., Sirois, S., Spratling, M., Thomas, M. S. C., & Westermann, G. (forthcoming). Neuroconstructivism: How the brain constructs cognition. Oxford: Oxford University Press. (See http://www.psyc.bbk.ac.uk/people/academic/thomas_m/web_summary.htm)

Thomas, M. S. C. & Karmiloff-Smith, A. (2002a). Are developmental disorders like cases of adult brain damage? Implications from connectionist modelling. Behavioural and Brain Sciences, 25(6), 727-750.

Thomas, M. S. C. & Karmiloff-Smith, A. (2002b). Residual normality: Friend or foe? Behavioural and Brain Sciences, 25(6), 772-780.

Thomas, M. S. C. & Karmiloff-Smith, A. (2003b). Modelling language acquisition in atypical phenotypes. Psychological Review, 110(4), 647-682.

Thomas, M. S. C. & Redington, M. (2004). Modelling atypical syntax processing. To appear in: Proceedings of the COLING-2004 Workshop: Psycho-computational model of human language acquisition. Geneva, Switzerland, 28 August 2004.

 


 

 

4.      In the study of general intelligence, what other benefits would computational modelling of this phenomenon bring besides clarifying fuzziness within existing cognitive theories? How does this relate to the distinction between typical development and developmental disorders?

 

[Thomas, M., Karmiloff-Smith, A., Modelling Typical and Atypical Cognitive Development: Computational constraints on mechanisms of change]

 

 

Perhaps you’re looking for an answer where I say that once we understand the computational parameters involved in mediating general intelligence, we can construct a drug to make people more intelligent? Perhaps you think that the relation between taking mind-altering drugs and the production of great works of art and literature points in this direction? If I were to speculate here, I think that in a couple of hundred years we may be able to come up with such an ‘intelligence’ drug. But most likely, if you start taking it as an adult, it won’t do much with the ‘ordinary’ mental representations you have developed during your childhood. So you would have to take the drug from day one to develop the right sort of fast, rich, flexible, abstract mental representations. There would clearly be some ethical considerations in testing such a drug, given that it has to be applied throughout childhood (something similar is underway in the USA with drugs intended to treat Attention Deficit Hyperactivity Disorder). But what if you developed ‘intelligent’ mental representations using your ‘intelligence’ drug, and then one day forgot to take your tablet? The result may just be that you have a particularly ‘stupid’ day. But I suspect the de-modulation of neural systems would produce something more akin to schizophrenia.

More seriously, however, the immediate focus of this research is theoretical. General intelligence is a statistical construction, indexing the correlation that individuals show across multiple psychological tests. The variation that is not accounted for by this General factor is labelled as a set of ‘Specific’ intelligences. The psychological and neurocomputational reality of these statistical constructs remains a matter of debate (see, e.g., the debate surrounding Arthur Jensen’s work on the g-factor and Howard Gardner’s theories on multiple intelligences).

To evaluate candidate explanations of general variability, one needs to build models of multiple component systems. Under the assumption that various computational parameters will affect the quality of cognitive processes in each component, there are several possible sources of general variation in theory.

 

-         It could be that variation in the same parameter affects all systems (e.g., how ‘good’ your neurons are, or your axons, or some such factor present in all components).

-         It could be that general variation is explained by different settings of a single parameter in a domain-general system, either one involved in controlling component systems, such as an executive system, or one involved in processing domain-general content, such as a working memory (though see previous caveats).

-         It could be that different parameter variations occur in lots of the components, but behaviour is always generated by interactions of components in a network of networks, so the general part of the variation is an emergent phenomenon.

-         It could be a combination of all of the above, so that general intelligence has a hybrid causal nature.

 

One benefit of building computational models is to find out which of these possibilities can reproduce something like the statistical construct of general intelligence. Second, one would hope to begin building a ‘vocabulary’ of variability which may later be useful in interpreting neuroscience data, for instance from brain imaging studies of individual variability (e.g., to explain intelligence, should we expect differences in activation across many brain areas or a single area? See Duncan et al., 1996, 2000, and Duncan & Owen, 2000, for a claim of the latter sort).
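
The first possibility is easy to sketch (what follows is entirely my back-of-envelope construction in Python, not a published model): give each simulated individual one shared quality parameter that feeds every component, add component-specific variation, and check whether the resulting test scores show the positive correlations and the dominant first factor that define g.

import numpy as np

rng = np.random.default_rng(3)
n_people, n_tests = 200, 6

shared = rng.normal(size=(n_people, 1))           # e.g., a global 'neural quality' factor
specific = rng.normal(size=(n_people, n_tests))   # component-specific variation
scores = 0.7 * shared + 0.5 * specific            # every test mixes both sources

corr = np.corrcoef(scores.T)                      # test-by-test correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]
mean_off_diag = (corr.sum() - n_tests) / (n_tests * (n_tests - 1))
print("mean inter-test correlation:", round(mean_off_diag, 2))
print("variance explained by first factor:", round(eigvals[0] / n_tests, 2))

The other possibilities in the list would require genuinely interacting component models, which is exactly why they need simulating rather than armchair analysis.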

Some of my current research includes a search for parameters that have a good or bad effect whatever the architecture they are in or whatever problem they are required to learn. This is the elusive ‘golden’ parameter. Personally, I don’t think it exists. Parameters have different effects depending on the architecture they sit in and the problem that needs to be solved (e.g., see the role of learning rate in semantic vs. episodic memory in my answer to Question 6). But there’s no harm in looking for a golden parameter, and at least one modeller has made proposals of this sort (e.g., Garlick, 2002, thinks that learning rate has that touch of gold).

There are clues that intelligence corresponds to something closely tied to neural computation. For instance, phantom limbs following amputation are caused by neural plasticity and reorganisation of somatosensory cortex, but their emergence appears to be more likely in individuals with higher IQ (Spitzer, 1996). Moreover, as Jensen (1998) has reviewed, some fairly low-level brain measures (EEGs, speed of nerve conductance) also appear to be correlated with scores on IQ tests.

But significant questions remain to be answered. Higher IQ is correlated with faster responses (reaction times) and shorter inspection times. But ‘faster’ computers aren’t necessarily cleverer. If you load a game into a PlayStation which is rigged to run at twice the speed, the outcome is not cleverer processing, just a faster game. It is changes in the programme itself, its representations and transformations, that lead to more intelligent outcomes. What, then, links speed with the abstraction and sophistication of mental representations?

The long-term aim of having models whose parameter settings control ‘level of intelligence’ is the exploration of training regimes that optimise the performance of the systems given their settings. In other words, one would seek to maximise the environmental variation that can be effected in the development of a cognitive system despite the significant contribution of genes to establishing intelligence. This line of research would hope to make contact with work on early educational intervention and hot-housing, but it is some way off.

A further theoretical consideration is the relation of normal variation (intelligence) to atypical variation (developmental disorders). Will both sorts of variation be explained by the same parameters – do they lie on the same dimensions? Again, the aim here is to clarify existing theoretical debates in the field of developmental disorders, such as that between the ‘developmental’ model and the ‘difference’ model (e.g., Bennett-Gates & Zigler, 1998), and whether the tails of distributions of cognitive variability show stronger genetic control than variation closer to the mean (Plomin & Dale, 2000). For instance, in exploring one of my simple connectionist models, I found that linear increments to a given parameter in the start state produced non-linear differences in the variability of performance after training, variability that was normally generated by random changes in the nature of the training experienced by each network. In other words, this (Abstract) model demonstrated that a continuum of variation in a computational parameter produced an apparently increasing ‘genetic’ rather than ‘environmental’ contribution to developmental outcome. In turn, this suggests that behavioural genetics studies indicating higher heritability of very low (or very high) cognitive performance do not rule out the possibility that these are still just the tails of a normal distribution of ability.
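
The mechanism can be shown in miniature (my toy illustration, not the simulation just described): push the same environmental variance through a non-linear, sigmoidal outcome function, and linear increments of the start-state parameter produce markedly non-linear changes in the spread of outcomes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
environment = rng.normal(scale=1.0, size=5000)   # random training histories

for param in [-4, -2, 0, 2, 4]:                  # linear increments of a start-state parameter
    outcomes = sigmoid(param + environment)      # non-linear developmental outcome function
    print(f"parameter {param:+d}: outcome sd = {outcomes.std():.3f}")

Near the flat regions of the sigmoid the environmental variance is squashed; near the steep middle it is amplified, so equal steps in the parameter shift the apparent balance of ‘genetic’ and ‘environmental’ contributions unequally.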

 

References

 

Bennett-Gates, D., & Zigler, E. (1998). Resolving the developmental-difference debate: An evaluation of the triarchic and systems theory models. In  J. A. Burack, R. M. Hodapp, & E. Zigler (eds.), Handbook of mental retardation and development (p. 115-131). Cambridge University Press.

Duncan, J., Emslie, H., Williams, P., Johnson, R., & Freer, C. (1996). Intelligence and the frontal lobe: The organization of goal-directed behavior. Cognitive Psychology, 30, 257-303.

Duncan, J., & Owen, A. M. (2000). Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends in Neurosciences, 23, 475-483.

Duncan, J. et al. (2000). A neural basis for general intelligence. Science, 289, 457-460.

Garlick, D. (2002). Understanding the nature of the general factor of intelligence: The role of individual differences in neural plasticity as an explanatory mechanism, Psychological Review, 109, 116-136.

Gardner, H. (1999). Intelligence reframed: Multiple intelligences for the 21st century. New York: Basic Books.

Jensen, A. R. (1998). The G-factor. Westport, CT: Praeger.

Jensen, A. R. (1999). The G Factor: the Science of Mental Ability. Psycoloquy, 10, 23.

Plomin, R. & Dale, P. S. (2000). Genetics and early language development: A UK study of twins. In D. V. M. Bishop & L. B. Leonard (eds.), Speech and language impairments in children: Causes, characteristics, intervention, and outcome (p. 35-51). Psychology Press.

Rogers, T. T., & McClelland, J. L. (2004). Semantic cognition: A parallel distributed processing approach. MIT Press.

Spitzer, M. (1996). Phantom limbs, self-organising feature maps, and noise-driven neuroplasticity. In J. A. Reggia, E. Ruppin, & R. Berndt (eds.), Neural modelling of brain and cognitive disorders. World Scientific.

 


 

 

5.      Since computational models can only represent a given experimental task or situation, to what extent do you think developmental models will fall short in representing and simulating structures of knowledge organization (e.g., cognitive schemata, semantic and propositional/conceptual networks, categories, and so forth)?

 

[Miclea, M., Curşeu, P., (2003), Neurocognitive Models, ASCR, Cluj-Napoca]

 

 

I think there are three issues here.

First, one of the main shortcomings of recent connectionist modelling is the piecemeal approach of modelling separate, unrelated cognitive domains. Take two of the paradigmatic examples in language: Elman’s early model of syntax processing (1990) and Rumelhart and McClelland’s past tense network (1986). Both of these should appear in an overall model of the language system, right? But thus far, they have predominantly been studied in isolation. And it’s not obvious that the representational and design commitments of the two models are compatible. So ‘unified’ modelling is an area of research that remains to be explored.

It also remains to be demonstrated where the architectures for the various cognitive domains come from. In many cases it is unrealistic to argue that architectures are pre-specified modules (i.e., that the main elements of their structure are not activity dependent). For example, take Hinton and Shallice’s (1991) model for mapping written words to meanings. Reading is a recent cultural invention that requires intensive education to acquire. The architecture just isn’t there in people who haven’t learned to read. Such an architecture must be the product of development, whereby some process yokes together and specialises more general systems, in this case visual, phonological, and semantic systems. What developmental process is that?

This process of specialisation must fit within a bigger picture of the way that the relatively undifferentiated infant cortex transforms to become the highly differentiated and reasonably uniform adult cortex. Uniformity of outcome implies a strongly constrained process of emergent specialisation. Nevertheless, there must also be flexibility, to explain the reorganisation children show after early focal brain damage. Some research has been carried out on computational approaches to emergent specialisation (Jacobs, 1999), but an awful lot remains to be fleshed out for a full cognitive theory.

The second issue is what is going on in the clever part of human cognition, the part that Fodor referred to as the ‘central system’ (Fodor, 1983)? The notional central system performs general reasoning and problem solving, and contains mysteriously structured schemas, semantics, concepts, beliefs, propositions, theories and so forth. Does connectionism have the representational power to address this complexity, with its unstructured vector representations? The jury is still out. Currently, we don’t know what ‘central’ representations look like. Fodor (2000) reckons we’ll never know, because he thinks the only two cognitive architectures we’ve come up with, symbolic and connectionist, are both inadequate at capturing the key feature of human reasoning – context sensitivity. Fodor refers to this as abductive reasoning, which is the idea that processes are somehow global, potentially making contact with the entire set of knowledge (theories about how the world works) that the individual possesses in order to draw a conclusion. For example, let’s say I have to decide whether I want to go to the park today. A whole host of reasons could determine whether I should go to the park, including who might be there, what the weather is like, is the football on, is the pollen count high, what other things could I do, is there an asteroid heading towards planet Earth, and so forth. The nature of reasoning processes when potentially all stored information could be relevant appears computationally complex at the very least (although I think Fodor is wrong to believe it is computationally intractable, since both symbolic and connectionist models can achieve types of abductive reasoning; but that’s a debate for another time).

In terms of computational models, one of the problems here is that the symbolic models that contain the structured representations necessary to simulate propositional behaviour (e.g., the Structural Mapping Engine in analogical reasoning) have to be handwired, and it is not obvious that these models could be made developmental, or even that the representations they use are learnable (see, e.g., the SME model described in Gentner et al., 1995, where two static models are each handwired to depict analogical reasoning at two different ages). Connectionist models, by contrast, are much better at learning representations, but don’t seem to have the power to represent the conceptual structure necessary to capture the complexity of the reasoning. Rogers and McClelland’s (2004) recent explorations with the Rumelhart and Todd model of semantic knowledge (Rumelhart & Todd, 1993) perhaps make some progress on this question. The model learns semantic knowledge in propositional form, such as TREES HAVE LEAVES, A DOG IS AN ANIMAL, and so on. But there remains much work to do to bridge the gap between simple developmental models and complex, structured, handwired models.
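
A miniature version of that style of model can be sketched as follows (my reconstruction in spirit only, in Python; the corpus, unit counts, and learning details are all invented): localist item and relation units feed a hidden layer that must activate the attribute units that make each proposition true.

import numpy as np

items = ["tree", "dog"]
relations = ["isa", "has"]
attributes = ["plant", "animal", "leaves", "fur"]

# Toy propositional corpus: (item, relation) -> the attributes that make it true.
facts = {("tree", "isa"): {"plant"}, ("tree", "has"): {"leaves"},
         ("dog", "isa"): {"animal"}, ("dog", "has"): {"fur"}}

def one_hot(value, vocab):
    v = np.zeros(len(vocab))
    v[vocab.index(value)] = 1.0
    return v

X = np.array([np.concatenate([one_hot(i, items), one_hot(r, relations)])
              for (i, r) in facts])
Y = np.array([[1.0 if a in true else 0.0 for a in attributes]
              for true in facts.values()])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(5)
w1 = rng.normal(scale=0.5, size=(X.shape[1], 8))   # input -> hidden weights
w2 = rng.normal(scale=0.5, size=(8, Y.shape[1]))   # hidden -> attribute weights

for _ in range(5000):                              # plain backpropagation
    h = sigmoid(X @ w1)
    out = sigmoid(h @ w2)
    d_out = (out - Y) * out * (1 - out)
    d_h = (d_out @ w2.T) * h * (1 - h)
    w2 -= 0.5 * h.T @ d_out
    w1 -= 0.5 * X.T @ d_h

query = np.concatenate([one_hot("tree", items), one_hot("has", relations)])
output = sigmoid(sigmoid(query @ w1) @ w2)
print(dict(zip(attributes, np.round(output, 2))))  # 'leaves' should come out high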

Will connectionist models be able to do the job? My intuition is that to explain behaviours relating to propositional knowledge, the connectionist solution will invoke more complex temporal processing dynamics rather than more richly structured symbolic-like representations. In this type of story, propositions would be processed serially over time rather than existing as static, fully formed tree structures, with transitions up or down the tree structure exploiting similarity-based processing. This solution would allow the structure of the propositional output to be retained (in terms of observable behaviour) whilst simplifying the required structure of the internal representations and retaining the developmental strength of connectionist systems.

For those who have read Fodor and Pylyshyn’s (1988) critique of early connectionist models, you may remember the dilemma these authors set connectionism. Either it is an implementation of classical symbolic computation, and therefore of little interest to psychologists (perhaps of interest to neuroscientists), or it can’t implement symbolic computations in which case it is insufficiently powerful to explain human cognition (because Fodor and Pylyshyn assume that language and reasoning require symbolic processing). The solution I outline above would escape the horns of this dilemma by relying on Turing equivalence – the ability of a certain computational system to implement any type of function given infinite time and resources – or in this case to simulate it to a certain level of accuracy given limited time and resources. Connectionist systems can approximately (but not perfectly) simulate serial symbolic computations under normal conditions by an inefficient, laboured, time-consuming and attention-expensive use of the neural circuits that are better employed doing the things they are good at – fast parallel pattern recognition. Serial deductive reasoning in humans would be such a form of slow, thoughtful, ‘simulated’ symbolic behaviour. But when the time pressure is on, or in areas of expertise, human performance will revert to similarity based, pattern recognition type processing favoured by the native computational substrate. This solution would allow us to explain the observed ‘symbolic’ human behaviour. However, when you look closely, the simulation of symbolic computation will not turn out to be perfect, and outside of normal conditions, the human system will turn out to make similarity-based errors that betray its alternative computational primitives. This allows you to escape the alternative charge of implementing classical computation. Hence we vault the horns of the dilemma and land squarely on the bull’s back!

But this proposal is just a stab in the dark. It remains true that current connectionist models are generally piecemeal simulations of restricted cognitive domains, and too simple to explain interestingly complex human behaviour.

This brings me to my third issue (you forgot there were three, right?). Connectionism doesn’t give you a free pass when it comes to the frame problem (as AI researchers call it). Common sense reasoning in humans is based on the background knowledge humans have about their physical and social worlds. No one knows how we encode such knowledge just yet. One promising approach is to try to detect this background knowledge as it is expressed in the patterns of words we use, that is, to pick up higher order statistical invariances in the word orders of large corpora of human text. Nevertheless, there are some aspects of explaining human reasoning that remain extremely hard to crack.
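
The flavour of that approach can be conveyed in a few lines (a toy Python example of mine; real systems use huge corpora and add weighting and dimensionality reduction): represent each word by its co-occurrence counts and compare words by the angle between their vectors.

import numpy as np
from itertools import combinations

corpus = ["the dog chased the cat", "the cat chased the mouse",
          "the dog ate the food", "the cat ate the food"]

vocab = sorted({w for sentence in corpus for w in sentence.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))

for sentence in corpus:                    # count within-sentence co-occurrences
    for a, b in combinations(sentence.split(), 2):
        counts[index[a], index[b]] += 1
        counts[index[b], index[a]] += 1

def similarity(w1, w2):                    # cosine between co-occurrence vectors
    v1, v2 = counts[index[w1]], counts[index[w2]]
    return v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))

# Words used in more similar contexts come out closer in the vector space
# (function words dominate raw counts; real systems reweight them).
print("dog ~ cat :", round(similarity("dog", "cat"), 2))
print("dog ~ food:", round(similarity("dog", "food"), 2))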

 

References

 

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.

Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press.

Fodor, J. A. (2000). The mind doesn’t work that way: The scope and limits of computational psychology. Cambridge, MA: MIT Press.

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28, 3-71.

Gentner, D., Rattermann, M. J., Markman, A., & Kotovsky, L. (1995). Two forces in the development of relational similarity. In T. J. Simon & G. S. Halford (eds.), Developing cognitive competence: New approaches to process modelling (pp. 263-313). Lawrence Erlbaum Associates.

Hinton, G. E., & Shallice, T. (1991). Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review, 98, 74-95.

Jacobs, R. A. (1999). Computational studies of the development of functionally specialized neural modules. Trends in Cognitive Sciences, 3, 31-38.

Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition, Volume 2 (pp. 216-271). Cambridge, MA: MIT Press.

Rumelhart, D. E., & Todd, P. M. (1993). Learning and connectionist representations. In D. E. Meyer & S. Kornblum (eds.), Attention and performance XIV (pp. 3-30). Cambridge, MA: MIT Press/Bradford Books.

 


 

 

6.      Catastrophic interference appears to be an inherent feature of distributed representations. Catastrophic interference (or ‘forgetting’) occurs when the overlap of several cognitive representations within the same connectionist architecture leads to interference between the representations; when one set is learned after another, the more recent knowledge can overwrite the older knowledge. However, catastrophic interference does not appear to be a characteristic of real neural systems. Moreover, unlike connectionist networks, the cognitive system can learn and retrieve stimuli after only one training trial. Does this disparity undermine the ecological validity of the simple computational models currently used to capture cognitive phenomena?

 

[Miclea, M., & Curşeu, P. (2003). Neurocognitive models. Cluj-Napoca: ASCR.]

 

 

I think catastrophic interference is informative about the properties that different neural systems should have to perform different kinds of task, specifically, in this case, semantic versus episodic memory. However, I don’t think it is a serious problem for connectionist networks as cognitive models per se.

First, I suspect the severity of catastrophic interference is sometimes exaggerated by the extreme conditions presented in some of the early simulation work (e.g., McCloskey & Cohen, 1989; Ratcliff, 1990; see French, 1999, for a review of work on catastrophic interference and approaches to avoiding it).

Next, interference is exactly what you want under some conditions, but not others. In an episodic memory system, you certainly don’t want interference. You want a distinct memory of yesterday, not a general abstract prototype of what yesterdays are usually like based on superposition of lots of exemplars of yesterday-representations. To achieve an episodic memory system, you should use a network with localist / non-overlapping representations, and fast, one-shot learning (i.e., use a large learning rate and only one presentation of each pattern). By contrast, you do want interference in a semantic memory, where the aim is to create abstract categories. You want a general, prototypical category of DOG which throws away many of the details of the individual dogs you have experienced. The interference process serves to lose the details. To achieve a semantic memory system, you should use a network with distributed / overlapping representations, with multiple presentations of different training exemplars, and slow learning, so that the system converges on an average of the exemplars.
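These two regimes are easy to demonstrate in miniature. The following is my own toy illustration (the pattern sets, learning rates and numbers of presentations are arbitrary choices, not values from any published simulation): a distributed pattern associator trained with the delta rule shows interference when one set of patterns is trained after another, but retains both sets under slow, interleaved training.

```python
# A toy demonstration (parameters purely illustrative) of the two regimes:
# a distributed pattern associator trained with the delta rule forgets set
# A when set B is trained afterwards, but retains both sets under slow,
# interleaved training.
import numpy as np

rng = np.random.default_rng(1)
n = 20   # input / output dimensionality

def pattern_set(k):
    return [(rng.choice([-1., 1.], n), rng.choice([-1., 1.], n))
            for _ in range(k)]

A, B = pattern_set(5), pattern_set(5)

def train(W, pairs, epochs, lr):
    for _ in range(epochs):
        for x, t in pairs:
            W += lr * np.outer(t - W @ x, x)   # delta-rule update
    return W

def error(W, pairs):
    return float(np.mean([np.mean((t - W @ x) ** 2) for x, t in pairs]))

# Sequential training: learning B on top of A degrades performance on A.
W = train(np.zeros((n, n)), A, epochs=100, lr=0.02)
W = train(W, B, epochs=100, lr=0.02)
print("sequential, error on A: ", round(error(W, A), 3))   # interference

# Interleaved, slow training: both sets can be retained together.
W = train(np.zeros((n, n)), A + B, epochs=300, lr=0.01)
print("interleaved, error on A:", round(error(W, A), 3))   # close to zero
```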

In short, the implication is that there will be multiple learning systems in the brain, some of which make use of catastrophic interference, others that avoid it. Exploration of this notion of complementary learning systems, and the brain systems that support each, may be found in McClelland, McNaughton and O’Reilly (1995), and O’Reilly and Norman (2002).

There are some wrinkles to this story, of course. First, if you have two (or more) memory systems, you need some mechanism for shifting knowledge between them (e.g., shifting information from ‘the dog I remember seeing yesterday’ to update my general concept of DOG). As I pointed out in my answer to Question 2, moving information about in multi-network connectionist systems is not easy, because knowledge is laid down in the structure / connections of the individual networks. Various proposals have been put forward on how to shift information between stores, including the idea that dreaming may be involved in this process. (See French (1997) and French, Ans and Rousset (2001) for one specific proposal using the spontaneous generation of “pseudo” knowledge.)
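Schematically, the pseudo-pattern idea works as follows (the sketch below is a deliberate simplification of proposals such as French’s, and every parameter in it is my own assumption): probe the trained network with random inputs, record its responses, and interleave these input/response ‘pseudo-pairs’ with the new material, so that old knowledge is rehearsed without any access to the original data.

```python
# A schematic sketch of the pseudo-pattern idea (a simplification of
# proposals such as French, 1997).  All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 20

def pattern_set(k):
    return [(rng.choice([-1., 1.], n), rng.choice([-1., 1.], n))
            for _ in range(k)]

def train(W, pairs, epochs=300, lr=0.01):
    for _ in range(epochs):
        for x, t in pairs:
            W += lr * np.outer(t - W @ x, x)   # delta-rule update
    return W

def error(W, pairs):
    return float(np.mean([np.mean((t - W @ x) ** 2) for x, t in pairs]))

old, new = pattern_set(5), pattern_set(5)
W = train(np.zeros((n, n)), old)

# Pseudo-patterns: random probes paired with the trained network's answers.
pseudo = [(p, W @ p) for p in (rng.choice([-1., 1.], n) for _ in range(12))]

W_plain = train(W.copy(), new)              # new items only
W_rehearse = train(W.copy(), new + pseudo)  # new items plus pseudo-pairs

print("error on old, no rehearsal:    ", round(error(W_plain, old), 3))
print("error on old, pseudo-rehearsal:", round(error(W_rehearse, old), 3))  # lower
```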

Second, in terms of representations, the distinction between distributed and localist is not dichotomous. Instead there is a continuum of sparseness, depending on how many units are involved in coding a concept. Furthermore, localist representations can generalise if there are no constraints on how many units can be active at once (e.g., if, during recall, units are not activated on a winner-takes-all basis). For example, McClelland and Rumelhart’s (1981; Rumelhart & McClelland, 1982) original interactive activation model of word recognition could respond to novel words by activating a gang of existing localist word units, with the activation of each word determined by its similarity to the novel input. In this way, abstract concepts can be represented in the manner of an exemplar-based memory model (see Page, 2000, for a discussion of localist approaches to connectionism).
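A minimal sketch of this kind of localist generalisation (the orthographic code and lexicon below are illustrative assumptions of mine, not the actual interactive activation model): each unit stores one known word, and a novel input partially activates the whole gang of similar words.

```python
# Sketch of localist generalisation without winner-takes-all: each unit
# stands for one known word; a novel input activates every unit in
# proportion to its similarity to that unit's word.  Illustrative only.
import numpy as np

def letters_to_vec(word, length=4):
    # Crude orthographic code: 26 position-specific letter features per slot.
    v = np.zeros(length * 26)
    for pos, ch in enumerate(word):
        v[pos * 26 + (ord(ch) - ord('a'))] = 1.0
    return v

lexicon = ["cave", "have", "gave", "save", "mint", "hint"]
units = {w: letters_to_vec(w) for w in lexicon}   # one localist unit per word

novel = letters_to_vec("mave")                    # a nonword probe

# Each unit's activation is its feature overlap with the input; a "gang"
# of similar words becomes partially active.
act = {w: float(units[w] @ novel) for w in lexicon}
print(act)   # the -ave words respond strongly; mint / hint barely at all
```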

Third, catastrophic interference effects can sometimes be found in humans, for instance in infant categorisation. Indeed, connectionist models have been successful in simulating these patterns of behaviour (Mareschal, Quinn, & French, 2002). I suspect catastrophic interference can also be found in adult long-term memory under certain conditions. For instance, in my work on bilingual language processing, it became apparent that when individuals immerse themselves in the intensive acquisition of a second language, systematic interference effects emerge in their performance in the first language. And when a second language is abandoned, it is gradually forgotten in a way that depends on the similarity of the structure of the second language to that of the first.

Lastly, the issue of catastrophic interference does highlight that there are outstanding problems to be solved in the way that connectionist models map onto the cognitive system. For instance, error-driven training (such as in backpropagation) assumes a target output for each input. Where does this target come from? What separate memory system stores the target output for each input during training? Take the past tense model: does the child learn lots of present-tense / past-tense word pairs, store those in a rote memory, and then train up their inflectional morphology system at some later time? This seems unlikely, but it is what is implied by a literal reading of the implemented model. A more realistic notion is that the child is always generating expectations about what will happen next as they interact with or observe the world, and the differences between their expectations and what actually happens generate the training signal. In this case, the ‘target’ in supervised learning is stored in the world.
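A minimal sketch of this last idea (again, entirely my own illustrative assumptions): a learner predicts its next observation, and the world itself supplies the target one step later, so no separate store of teaching patterns is required.

```python
# Sketch of the 'target stored in the world' idea: the learner predicts
# the next observation, and the world's actual next state supplies the
# error signal.  The toy environment and parameters are illustrative.
import numpy as np

W = np.zeros((4, 4))                    # linear predictor over 4 features
lr = 0.05

def world(t):
    # A toy repeating environment the learner observes over time.
    return np.eye(4)[t % 4]

obs = world(0)
for t in range(1, 200):
    prediction = W @ obs                # expectation about what comes next
    nxt = world(t)                      # the world supplies the outcome...
    W += lr * np.outer(nxt - prediction, obs)   # ...and hence the error signal
    obs = nxt

print(np.round(W, 2))   # W approximates the world's transition structure
```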

So there is work to be done in fleshing out connectionist accounts, but at the moment, I don’t believe that the phenomenon of catastrophic interference offers a knockdown argument against the validity of connectionist models for capturing cognitive processes.

 

References

 

French, R. M. (1997). Pseudo-recurrent connectionist networks: An approach to the “sensitivity-stability” dilemma. Connection Science, 9(4), 353-379.

French, R. M. (1999). Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 3(4), 128-135.

French, R. M., Ans, B., & Rousset, S. (2001). Pseudopatterns and dual-network memory models: Advantages and shortcomings. In R. French & J. Sougné (eds.), Connectionist models of learning, development and evolution (pp. 13-22). London: Springer.

Mareschal, D., Quinn, P., & French, R. M. (2002). Asymmetric interference in 3- to 4-month-olds’ sequential category learning. Cognitive Science, 26, 377-389.

McClelland, J. L., McNaughton, B. L. & O’Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419-457.

McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.

McCloskey, M. & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. In G. H. Bower (ed.), The psychology of learning and motivation. Vol. 24. New York: Academic Press.

O’Reilly, R. C. & Norman, K. A. (2002). Hippocampal and neocortical contributions to memory: Advances in the complementary learning systems framework. Trends in Cognitive Sciences, 6, 505-510.

Page, M. (2000). Connectionist modelling in psychology: A localist manifesto. Behavioral and Brain Sciences, 23(4), 443-512.

Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285-308.

Rumelhart, D. E., & McClelland, J. L. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.

 


 

 

7.      Taking into account the “huge” technical and technological developments and, one hopes, the exponential growth of facilities for developing ever more complex computational models, how do you see the field of connectionism developing over the next five years?

 

What innovations are to come and will these emerge as a consequence of greater computational power? I’ll make seven predictions:

 

(i)                  A greater focus on the study of multi-component systems, where global behaviour is driven by interactions between sub-networks in a distributed network of networks. My own interest at present, for instance, is in how one captures the process of emerging specialisation of function, from a system with an initially more homogeneous computational substrate to one with multiple (perhaps partially) specialised components.

 

(ii)                Greater convergence with functional neuroimaging, with increasing evidence emerging from studies of the functional connectivity of brain areas during cognition. In addition, greater convergence with genetics concerning the constraints that govern changes in neurocomputation across development.

 

(iii)               With more powerful computers on desktops, an attempt to scale up models to tackle more realistic problem sets (for instance, in language, or in perception). This may require new learning algorithms, because backpropagation may not scale very well. It is likely that larger-scale systems will employ sparser representations.

 

(iv)              Greater use of recurrent and interactive systems, now that we have the power to simulate cycling activation on a finer temporal scale.

 

(v)                A possible divergence in computational formalisms, with some researchers using algorithms that buy in further principles from neural functioning (e.g., the separate but interacting roles of local electrical signalling versus more widespread chemical neuromodulation).

 

(vi)              A tension between model size and model opacity. To the extent that connectionism is about building models, it is important to remember what models are for. (Though note that some would argue connectionism is defined by a commitment to a set of processing principles rather than by the models per se; see Seidenberg, 1993, and Thomas and Stone, 1998, for discussion.) The role of modelling is simplification. A model is a tool to advance understanding. If models become much larger, with many sub-components, there is a risk we won’t understand why they function as they do. This means they can no longer serve to inform theories, and merely become impressive cultural artefacts, like lava lamps and Rubik’s cubes. So I predict a tension: more complex models, but a struggle to understand how these models work. Modellers must wield their new power with great care!

 

(vii)             A continuation of the reshaping of psychological theorising. Connectionist-flavoured theories will incorporate notions like ‘attractor basins’, ‘multiple soft constraint satisfaction’, ‘trajectories through state space’ and so on. They will take a lively interest in the details of the interaction between the structure of the environment and the processing constraints of the cognitive system. Connectionist psychology will adopt a form in which there can be a constructive dialogue with neuroscience and with theories of large-scale brain activity. In the future, there will be many more scientific papers with coloured diagrams of phase space, neural networks, and brains!

 

References

 

Seidenberg, M. S. (1993). Connectionist models and cognitive science. Psychological Science, 4, 228-235.

Thomas, M. S. C., & Stone, A. (1998). Cognitive connectionist models are just models, and connectionism is a progressive research programme. Commentary on Green on Connectionist Explanation. Psycoloquy, 9(36).

 


 

MT 27/05/2004

 

(Interview by Camelia HANGA and Ramona MOLDOVAN)