Symbolism vs. Connectionism: A Closing Gap in Artificial Intelligence

Abstract

AI was born symbolic and logical. The pioneers of AI formalized many elegant theories, hypotheses, and applications, such as PSSH and expert systems. From the 1980s, the pendulum swung toward connectionism, a paradigm inspired by the neural connections in the brain. With the growing amount of accessible data and ever stronger computing power, connectionist models have gained considerable momentum in recent years. This new approach seems to solve many problems in symbolic AI but raises many new issues at the same time. Which paradigm better accounts for human cognition, and which is more promising for AI? No consensus has been reached. However, despite their vast differences, researchers have begun to explore how to integrate the two. For example, many hybrid systems have been proposed and experimented with. Others see the two paradigms as residing at different levels of one unified hierarchical structure. In recent years, it has been increasingly realized that the gap is closing, simply because there was no gap to begin with. In essence, all connectionist models have symbolic components, and all symbolic models have mathematical mechanisms. The debate is dying down, opening up new opportunities for future hybrid paradigms.

Keywords: artificial intelligence, connectionism, symbol manipulation, neural networks.

  1. Introduction

Artificial Intelligence (AI) is a field that is very difficult to define. Indeed, it is less a single field than a collection of diverse fields. John McCarthy, along with Marvin Minsky, Nathaniel Rochester, and Claude Shannon, coined the term "AI" in the proposal to the Rockefeller Foundation for the Dartmouth Conference in 1956, in which they defined AI as the attempt to "find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves."[1] This first definition obviously focused on symbolic capacities. In the half century since, a whole landscape of sub-disciplines has developed and continues to spawn new disciplines today. In general, four approaches can be identified when sorting through the definitions of AI—thinking humanly, thinking rationally, acting humanly, and acting rationally. They differ radically not only in goals but also in methods. For example, to act like humans, computers must acquire abilities such as natural language processing, knowledge representation, automated reasoning, and machine learning. By contrast, the attempts to enable computers to think humanly involve understanding how the human mind works.[2]

Whether thinking or behavior is concerned, the central problem of AI is how knowledge is represented, encoded, and processed. In this respect, one main debate is the dichotomy of the symbolic and connectionist paradigms. This discussion originates from cognitive science, where scientists argue over whether the nature of human cognition is symbolic or distributed.[3] The human mind is undisputedly symbolic. Down at the physical level, however, human brains are composed of a large number of mostly homogeneous neurons, whose connections and firing patterns are used for memory storage, concept generalization, logical reasoning, and information representation.[4]

Likewise, although AI was born symbolic, with most of the early pioneers pondering how to structurally code knowledge and reasoning processes into machines, the alternating periods of media hype and harsh "AI winters" shattered people's expectations for symbolic AI. The recent revival of AI in the 21st century has been characterized by connectionism, particularly neural networks, which are rooted in Frank Rosenblatt's perceptron work of the 1950s and were established in the 1980s by David Rumelhart, James McClelland, and others.[5] The paradigm has gained fresh momentum recently due to larger bodies of accessible data and stronger computing power, which allow neural networks to learn much more quickly. Now, novel connectionist techniques like deep learning are changing many industries with each passing day, arguably rendering symbolic approaches, the so-called Good Old-Fashioned AI (GOFAI), obsolete.[6]

Despite the prominence of its recent successes, connectionism is no panacea for achieving human-level intelligence, nor a complete account of human cognition. It has drawn criticism from philosophers, cognitive scientists, and practitioners. Connectionist models need a vast number of examples for training; humans, on the contrary, can learn from very few, or even single, events. Also, connectionist models are mostly black boxes, resulting in unexpected errors. Moreover, while connectionist models are good at some specific tasks, they are not particularly useful in building human-level AI, or artificial general intelligence (AGI), which is capable of completing a wide range of tasks in an appropriate fashion. Hence, entirely giving up the symbolic approach is like throwing the baby out with the bathwater. Many people have begun to realize that symbolic and connectionist approaches are not competing but complementary, and various methods to integrate the two have been explored. It is therefore necessary to look into the history of, and the gap between, symbolic and connectionist AI, so that roads to future success can be found.

In this paper, I'll review the history, principles, and current status of both approaches and the debate between them. I'll examine some attempts to integrate the two. I'll conclude that the two paradigms have been mixed from the beginning, so the debate is actually a pseudo-debate. Future exploration should not focus on the differences between them, but continue to look for ways of integration.

  2. Symbolic AI

The richness of human culture and technology is rooted in the co-evolution of the human brain with symbolic cognition, making us the "symbolic species."[7] A broad consensus is that the symbolic faculty, regardless of its origin, is what distinguishes humans from the rest of the animal kingdom. As the American philosopher Charles Sanders Peirce put it, "we think only in signs." By manipulating symbols, numerous achievements have been accomplished throughout human history, from language to music to the computer, which, from the very beginning of the Difference Engine, has been a machine that is not only built upon symbols but also uses symbolic tools such as programming languages and graphical interfaces[8] to manipulate symbols like numbers, notes, and logical propositions.[9] Symbolic AI was conceived in the attempts to explicitly represent human knowledge in facts, rules, and other declarative, symbolic forms.

  2.1. What is a symbol?

A symbol is a pattern that stands for something else. The target could be an object, another symbol, or a relation.[10] While the Swiss linguist Ferdinand de Saussure considered a sign to be composed of a signifier and a signified, C. S. Peirce proposed a more precise, triadic model that comprises the representamen, the interpretant, and the object.[11] He also distinguished three modes of signs—iconic, indexical, and symbolic, which are not mutually exclusive but can co-exist in one sign.

In human societies, signs are ubiquitous. The nature of human language is the organization of signs.[12] Individual signs have limited ability to convey meaning unless embedded in a larger sign system. In other words, the relations among signs in sign systems, or the structure of semiotic spaces, matter more than individual signs themselves. As Minsky suggested in The Society of Mind, "the secret of what something means lies in how it connects to other things we know… A thing with just one meaning has scarcely any meaning at all."[13]

By manipulating symbols, we understand the world. For example, we assign a symbol to each character (the physical level) in mythological stories. Then, a new level (the cognitive and social level) of symbols, e.g., the abandonment of and bonds between characters, emerges from the interactions among the symbols at the physical level. A third level (the narrative level) of symbols can then be abstracted from the second, for instance, the transition from harmony to disharmony.[8] By abstracting symbols from lower levels to higher levels, we form abstract concepts and find universal meanings. This is why Joseph Campbell argues there is only one hero in all human mythologies,[14] who can be seen as the ultimate symbol at the highest level of abstraction.

  2.2. Physical Symbol System Hypothesis

In 1975, Allen Newell and Herbert Simon proposed the "physical symbol system hypothesis" (PSSH) in their Turing Award lecture, arguing that "symbols lie at the root of intelligent action, which is, of course, the primary topic of artificial intelligence."[15] They considered a physical symbol system the "necessary and sufficient means for general intelligent action." In other words, a physical symbol system is deemed the only way toward AGI.

According to PSSH, a physical symbol system (PSS) is a physical computing device for symbol manipulation,[16] which consists of discrete symbols. Symbols, in turn, form expressions, or symbol structures, through some sort of physical connection. Symbol structures often work as internal representations of the environment.[17] A physical symbol system also contains a set of processes that "operate on expressions to produce other expressions."[15] Thus, computers and human minds are both physical symbol systems, albeit operating on radically different hardware. In other words, PSSH implies that the existence of symbolic-level computing in a system is independent of the physical substrate it operates on.[10]
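As a toy illustration (my own sketch, not an example from Newell and Simon), the definition above can be made concrete in a few lines: discrete symbols combine into expressions, and a process operates on expressions to produce new expressions. The fact and rule names are invented for the example.

```python
# Toy physical symbol system: symbols are strings, expressions are tuples,
# and one process (modus ponens) operates on expressions to produce new ones.

def modus_ponens(expressions):
    """From P and ("implies", P, Q), derive Q."""
    derived = set(expressions)
    for expr in expressions:
        if isinstance(expr, tuple) and expr[0] == "implies":
            _, p, q = expr
            if p in expressions:
                derived.add(q)
    return derived

facts = {
    "socrates_is_human",
    ("implies", "socrates_is_human", "socrates_is_mortal"),
}

print("socrates_is_mortal" in modus_ponens(facts))  # True
```

Nothing in this sketch depends on the hardware it runs on, which is precisely the substrate-independence that PSSH asserts.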

  2.3. Symbolic representation

Representation, or semiotic morphism, is the mapping from one sign system to another. For example, as Joseph Goguen discusses, a user interface can be seen as a graphic representation, or morphism, that maps between the arrangement of pixels on a display screen and the underlying functionality.[18] Knowledge representation is the task of encoding information about the world in a system in a form the system can employ to store and retrieve old information, infer new knowledge, and perform complex functions. It is one of the main problems in AI.

Symbolic approaches represent knowledge in a highly structured fashion, which can be traced back to the work of pre-AI logic theorists who were trying to develop rule-based systems for knowledge expression and inference. The basic units of symbolic representation are symbolic atoms, e.g., specific words or concepts. This representation paradigm is also called localist, as opposed to the distributed representation in connectionist models.

An early example is C. S. Peirce's existential graph (EG), developed around 1896 as a graphic logic system.[19] Figure 1 is one of Peirce's examples of EG, representing the sentence You can lead a horse to water, but you can't make him drink. There are three entities—a person, a horse, and some water, which are connected by linked bars that represent existential quantifiers. The shaded area represents possibility, while a shaded area inside an oval means impossibility.[20]

Figure 1: C. S. Peirce’s EG example for the sentence “You can lead a horse to water, but you can’t make him drink.”

EG is symbolic in the sense that it uses individual nodes and arcs to represent different concepts and their relationships. It captures the aggregate structures of knowledge, as opposed to dispersing them across structures with homogeneous nodes. Other similar theories include Marvin Minsky's idea of frames, which represents each object or event as a frame whose slots hold attributes with values and types, and arranges frames into a large taxonomic, hierarchical structure.[2]

Another set of symbolic approaches worth noting is semantic networks, introduced by Ross Quillian. Semantic networks use graphic notations to represent individual objects and categories of objects. The notations include nodes connected by labeled links, which represent relations among objects. For example, Figure 2 is a semantic network consisting of four objects, four categories, and labeled links.

A SubsetOf link connects Persons with Mammals, while a MemberOf link connects Mary with FemalePersons. For this network, we can assert that persons have two eyes.[2]

Unlike many rigorous logical approaches, semantic networks are flexible. As we can see from Figure 2, Bob has only one eye instead of two. Even though Bob is a member of MalePersons, which is a subset of Persons, whose members have two eyes by default, this default can be overridden by a more specific value. Therefore, if Bob has only one eye, a new category OneEyedPersons can be created as a subset of Persons, and we can say that there is an exceptional Eyes assertion for Bob in Persons.
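The inheritance-with-override behavior described above can be sketched in a few lines of code. This is an assumed, minimal structure following the Persons/Bob example, not an implementation of Quillian's actual system: a default value attached to a category is inherited through SubsetOf links unless a more specific assertion overrides it.

```python
# Minimal semantic network with SubsetOf/MemberOf links and default
# inheritance. Categories hold default attribute values; a specific value
# on an individual overrides the inherited default.

class Category:
    def __init__(self, name, parent=None, defaults=None):
        self.name, self.parent = name, parent
        self.defaults = defaults or {}

    def lookup(self, attribute):
        # Walk up the SubsetOf chain until a default value is found.
        if attribute in self.defaults:
            return self.defaults[attribute]
        return self.parent.lookup(attribute) if self.parent else None

class Member:
    def __init__(self, name, category, overrides=None):
        self.name, self.category = name, category
        self.overrides = overrides or {}

    def lookup(self, attribute):
        # An exceptional assertion on the individual beats the category default.
        if attribute in self.overrides:
            return self.overrides[attribute]
        return self.category.lookup(attribute)

mammals = Category("Mammals")
persons = Category("Persons", parent=mammals, defaults={"eyes": 2})
male_persons = Category("MalePersons", parent=persons)

mary = Member("Mary", persons)
bob = Member("Bob", male_persons, overrides={"eyes": 1})

print(mary.lookup("eyes"))  # 2, inherited from Persons
print(bob.lookup("eyes"))   # 1, the exceptional assertion for Bob
```

The lookup order (individual first, then up the category chain) is exactly what makes semantic networks more flexible than strict logical inheritance.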

One symbolic school competing with semantic networks is description logic, whose principal tasks are subsumption and classification, as opposed to knowledge visualization. However, scholars are increasingly aware that it is unnecessary to distinguish logic from semantic networks, because the latter is a form of logic as well and both are built on symbolic concepts.[2]

  2.4. Pros and cons of the symbolic approach

The symbolic paradigm is criticized for many reasons, especially as connectionist models have shown more promise in recent years. For example, many symbolic structures need manual coding, which is costly and cumbersome, resulting in an inability to change dynamically and to capture the complexity of the real world.[21] Some people even believe that PSSH is empirically unfalsifiable.[22]

Nils Nilsson identified several attacks against symbolic AI and responded to each.[23] The first attack comes from Hubert Dreyfus and others,[24] criticizing the "disembodied abstractness" of symbolic AI.[21] In other words, some believe that an essential component of intelligence is a physical body that interacts with the environment through perception and behavior, "grounding" the symbols to the world and giving them meanings. Purely manipulating symbols misses this "embodiment" process. Nilsson argues that it is far from clear whether symbol grounding is necessary for intelligence; for example, the knowledge in expert systems has no direct connection with real-world objects. The second attack focuses on the claim that intelligence requires non-symbolic processing. Nilsson and others admit the necessity of non-symbolic processing but argue that it supplements, rather than replaces, symbolic processing. The third criticism concentrates on the difference between brains and computers, arguing that computers will never be the same as brains. Nilsson warns that we should not be confined by current computational models, and he believes our increasing understanding of the brain will surely bring ever new computational models. His opinion echoes Peter Denning's reflection on computation. Early, classic definitions of computation were offered by Kurt Gödel, Alonzo Church, Alan Turing, and others. Later, textbooks defined it as the "execution sequences of halting Turing machines," and then as "information processes generated by algorithms." However, as Denning argues, computation today must be considered either natural or artificial, and the definition should shift from computers to a more fundamental model, representation transformation, which focuses on information processes.[25] Therefore, it is irrelevant to focus on how computers differ from brains, because both involve computational processes, or, as Newell and Simon claim with PSSH, both brains and computers are essentially physical symbol systems that can give rise to intelligence.

Another criticism of symbolic approaches, proposed by Douglas Hofstadter, is called the "Boolean dream." Hofstadter thought the problems explored in the symbolic paradigm are too simple from a neuroscience point of view, unable to provide rich insight into the computational organization of the brain. The Boolean dream refers to the assumption that rules can always lead from true statements to other true statements, and to seeing thinking as the manipulation of propositions. Hofstadter considers this attempt at logically formalizing thought and common sense in computers an "elegant chimera," because common sense is "not an 'area of expertise.'"[26] The problem lies in the brittleness of logic, which often leads to rigid systems.[27] Also, symbolic models (such as Newell and Simon's General Problem Solver, 1972) are only successful at coarse levels, unable to account for the detailed structure of cognition.[16]

However much symbolic AI is criticized, it is far from irrelevant to AI's recent success, because many powerful tools were developed directly from symbol manipulation, without which connectionist models would not have been so successful, for instance, the Monte Carlo tree search (MCTS) used in AlphaGo[28] and the hierarchical conceptual ontology of 160 concepts in IBM Watson's grammar rules.[3] I'll discuss these hybrid models in section 5.2.

  3. Connectionism

Connectionist models refer to bio-inspired networks composed of a large number of homogeneous units and weighted connections among them, analogous to neurons and synapses in brains.[29] The strengths of the connections reflect how closely the units are linked and can be strengthened or weakened dynamically by new training data. The main task of the connectionist paradigm is to tune the weights until an optimum is reached, through techniques like gradient descent.

Besides computer scientists and cognitive scientists, people from many other areas show deep interest in connectionism as well. Philosophers think connectionism might provide a new framework to understand the nature of the human mind. Linguists believe it may shed light on how humans acquire language, in both individual development and collective evolution. Neuroscientists think it has the potential to simulate the way brains work. So, connectionist models have drawn a lot of attention since their birth.

  3.1. A brief history of the connectionist paradigm

The origin of connectionist models can be traced back to 1943, when Warren McCulloch and Walter Pitts of the University of Chicago found that the signals in the neural nets of brains can be modeled by logic expressions, exhibiting digital properties.[4] It is natural to think that if we can simulate this structure and its behavior with artificial neurons, some degree of intelligence will develop in computers. In 1958, Frank Rosenblatt of Cornell University named his experiment on artificial neural nets the "perceptron." His model had 400 photocells connected to 1,000 perceptrons and was only able to complete very simple tasks.[30] One decade later, Marvin Minsky and Seymour Papert heavily criticized the perceptron, badly dampening enthusiasm in this direction,[31] until the late 1980s and early 1990s, when new connectionist models resurfaced through the work of people such as Rumelhart, McClelland, and Paul Smolensky. In 1986, Rumelhart and McClelland created a connectionist model to predict the past tense of English verbs.[5] They also rediscovered the famous back-propagation algorithm initially proposed by Paul Werbos in the 1970s. Even though their model was not remarkably accurate, as criticized by Steven Pinker, Alan Prince, and others,[32] the paradigm soon gained momentum because of the growing amount of accessible computerized data. Before long, St. John and McClelland built a system that arguably contained no trace of the symbolic approach at all yet, as they claimed, exhibited symbolic properties.[33]

In the past decade, the connectionist paradigm has developed very rapidly, thanks to the growing body of accessible data, stronger computing power, and unprecedentedly ubiquitous sensors. In 2004, computer scientists found that GPUs can be used to perform large-scale distributed computing. Three scientists, Geoffrey Hinton, Yann LeCun, and Yoshua Bengio, led the effort to break through the bottlenecks in 2006. Since then, new algorithms and techniques such as deep learning have been emerging without end, bringing about considerable commercial success in areas including image recognition, speech recognition, and machine translation.

  3.2. What are artificial neural networks?

An artificial neural network (ANN) is composed of identical nodes called artificial neurons, arranged in one layer of input nodes, one or more layers of hidden nodes, and one layer of output nodes. The more hidden layers there are, the deeper the network becomes (see Figure 3); this is where the name "deep learning" comes from. Each node is connected to every node in the neighboring layers above and below. The connections among nodes are measured by weights (w1, w2, …, wn), where n is the number of features. Each node has a threshold, which, together with the weighted sum of its inputs and the activation value calculated by the activation function, decides whether the node activates or not.

An ANN requires training before setting foot in the wild. Input data x (a feature vector (x1, x2, …, xn)) is fed into the network at the input layer, passes through the hidden layers, and ultimately arrives at the output layer, which outputs ŷ = σ(w · x + b), where b is the threshold and σ is the sigmoid function σ(z) = 1/(1 + e^(−z)). The difference between the output ŷ and the desired result y is measured by a loss function L(ŷ, y), whose average over the training set is called the cost function C. At the beginning of the training process, the weights may be set randomly. Using optimization methods such as gradient descent, the weights are then gradually tuned to minimize the cost function. A feed-forward network transmits data in only one direction, while a recurrent network can feed its output back into the input nodes, resulting in something analogous to short-term memory.
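The training loop described above can be sketched for the smallest possible case: a single sigmoid neuron learning the logical AND function by gradient descent. The data set, learning rate, and iteration count are illustrative assumptions, not details from the text.

```python
import numpy as np

# One sigmoid neuron trained by gradient descent to compute logical AND.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # feature vectors x
y = np.array([0.0, 0.0, 0.0, 1.0])                           # desired results

rng = np.random.default_rng(0)
w = rng.normal(size=2)  # weights, set randomly at the start of training
b = 0.0                 # threshold (bias)

for _ in range(5000):
    y_hat = sigmoid(X @ w + b)  # forward pass: y_hat = sigmoid(w . x + b)
    error = y_hat - y           # gradient of cross-entropy loss w.r.t. w.x + b
    w -= 0.5 * (X.T @ error) / len(X)  # gradient descent step on the weights
    b -= 0.5 * error.mean()            # ... and on the threshold

print(np.round(sigmoid(X @ w + b)))  # rounds to [0. 0. 0. 1.], the AND function
```

A deep network repeats this pattern layer by layer, with back-propagation carrying the error gradient from the output layer back through the hidden layers.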

ANNs admit two kinds of learning—supervised learning and unsupervised learning. The former requires the training data to be labeled beforehand, while the latter does not.

  3.3. Connectionist representation

Unlike the localist representation in symbolic AI, connectionist models use distributed methods to represent knowledge. In other words, concepts are represented by sets of numbers, vectors, matrices, or tensors. This addresses one problem in localist representation: if one node denotes one specific concept, then where does the meaning of the concept come from? In distributed representations, individual concepts are represented across multiple nodes. For example, in a symbolic representation, the concept of cat may be represented by one "cat node," or by a set of nodes that represent properties of cats, for instance, "two eyes," "four legs," and "fluffy." In a distributed representation, however, individual nodes don't signify specific concepts. It is impossible to find a "cat node" or a "grandma neuron." The concept of cat is represented by a specific pattern of activation across the network (see Figure 4), and each node participates in the representation of every concept.

Figure 4: Symbolic apple vs. connectionist apple (credit: Marvin Minsky)
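The contrast above can be made concrete with vectors. This is an invented illustration (the vocabulary and the random patterns are assumptions for the sketch): a localist scheme dedicates one node per concept, while a distributed scheme spreads every concept across all units.

```python
import numpy as np

vocab = ["cat", "dog", "apple"]

# Localist: one-hot vectors, i.e., a dedicated "cat node" and so on.
localist = {word: np.eye(len(vocab))[i] for i, word in enumerate(vocab)}

# Distributed: each concept is a pattern over all 8 units; no unit is
# dedicated to any single concept.
rng = np.random.default_rng(7)
distributed = {word: rng.normal(size=8) for word in vocab}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Localist vectors are mutually orthogonal: every pair of concepts is
# equally (un)related, so similarity between vectors carries no information.
print(cosine(localist["cat"], localist["dog"]))  # 0.0

# Distributed vectors overlap, so graded similarity falls out of the
# representation itself (arbitrary here, since the patterns are random).
print(round(cosine(distributed["cat"], distributed["dog"]), 2))
```

In a trained network the distributed patterns are not random but shaped by the data, which is why similar concepts end up with similar activation patterns.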

Connectionist representation is evident in human brains. For example, Tom M. Mitchell and colleagues at Carnegie Mellon University have demonstrated that the neural activation patterns in fMRI images are very similar when subjects are shown the same pictures. Moreover, the patterns remain largely alike across words and pictures. In other words, neural representations can give rise to semantic meanings.[34] Therefore, many people believe that non-symbolic, distributed representation is not only practical but also adequate to explain human cognition.[35]

  3.4. Pros and cons of connectionist models

Connectionist models are desirable in many ways. With widely accessible data and powerful GPUs, they are easy to build.[3] They are good at "best-match" problems and are very robust. In addition, since knowledge is distributed across the network, information is relatively difficult to destroy, so they are resilient to data noise, damage, and information overload. Moreover, connectionist models are much more flexible than symbolic representations, and they show promise for grounding representations in interaction with the environment. Also, connectionist models can deal with "incomplete, approximate, and inconsistent information,"[21] as well as exceptional situations, better than symbolic approaches.

Neural networks are inspired by brains. They have facilitated our understanding of human intelligent phenomena such as dreams, and have captured many elegant features of human cognition. For example, human visual cognition has a hierarchical structure, where lower levels recognize pixel-level features such as brightness and darkness, which form higher-level features such as edges and objects, finally reaching the semantic level: human faces and cats. Neural networks can simulate this hierarchical structure, extracting features in lower layers and outputting abstract, semantic information in higher layers (see Figure 5).[36]

Figure 5: ANN extracts features in hierarchical structures (credit: Andrew Ng)

However, this neural-network picture of the human mind is criticized as oversimplified and crude, incapable of accounting for human cognition. Indeed, it abstracts away many important features of the brain, such as neurotransmitters and hormones. Moreover, neurons in brains are not homogeneous. For instance, a giant neuron found wrapped around an entire mouse brain in early 2017 may shed light on how consciousness arises in brains.[37] These missing details may matter significantly for the mind. Also, many features of neural networks, such as back-propagation, do not necessarily exist in brains. Gary Marcus has done experiments on infants mastering artificial grammars and found that the hierarchical feature detectors mentioned above can't explain inferences to new cases in language. Although the models are deep and sophisticated, their unstructuredness limits the insight they can offer into high-level cognition.[42] Therefore, as Pinker and Prince put it, neural networks perform poorly in high-level, rule-based processing, such as language and inference.[32]

Another criticism of connectionist models is their requirement for huge amounts of training data to capture simple concepts that humans can grasp with very few, or even single, examples. Lacking innate, symbolic structures, connectionist models rely heavily on the underlying statistical distribution of the training data. They also need massive computing power and storage space. For example, one Google study in 2013 trained a neural network to recognize faces; the model had 1 billion connections and was trained on a data set of 10 million images with 1,000 machines (16,000 cores) for three days.[38] AlphaGo was trained on 30 million games, more than any human can play in one lifetime. Moreover, connectionist models are only good at very specific tasks, and their training outcomes are often uncertain, resulting in problems like overfitting. As Ali Rahimi ironically put it at the NIPS 2017 conference, "machine learning has become alchemy,"[39] echoing Hubert Dreyfus's 1965 report Alchemy and Artificial Intelligence.

  4. The Debate

AI was born symbolic. In the 1960s and 1970s, the pendulum swung to knowledge representation and cognitive intelligence; at that time, the importance of symbol manipulation might have been overemphasized. In the two decades after that, it swung toward dynamics and embodiment. Now, AI seems increasingly far from symbolic concepts.[40]

  4.1. Which one accounts for human cognition?

It is believed that if a model can explain the human mind, then it will provide significant insight for AI. So, much of the debate focuses on how well the two paradigms account for human cognition. It is argued that symbolic, relational structures are necessary and effective in human representations. For example, Dedre Gentner's structure-mapping theory provides psychological evidence for the significance of analogy and similarity in human cognition, both of which rely heavily on symbolic, relational representations.[41] Also, psychological research shows that symbolic models can explain various psychological phenomena, such as analogy in problem-solving and moral decision-making.[3] And neuroscience studies show that many important structures of the brain are formed prior to experience. Even fetuses have some degree of symbolic capacity and can respond to face-like stimuli.[42][43]

On the contrary, connectionist models are criticized for their inadequacy in capturing human cognition, even though they are inspired by brains. As Forbus, Liang, & Rabkina point out, feature vectors perform poorly when scaling up. Also, the scale of training data in deep learning is so large that it can't account for human learning, and deep learning often makes terrible, unexpected mistakes that would never occur in humans, such as misclassifying images merely because some pixels are distorted in an imperceptible way,[44] or recognizing random noise as well-formed objects.[45] Moreover, connectionist models are largely black boxes, obscuring what they have learned and how they work. That's because, as stated in section 3.4, although connectionist models are inspired by brains, they abstract away many significant features.

But not everyone believes connectionist models cannot capture human cognition. We have discussed the hierarchical structure of visual cognition in section 3.4, which is elegantly captured by some pattern-matching neural networks. Also, some radical connectionists believe connectionist models can account for many phenomena of human cognition, such as the holistic representation of data, the appreciation of context, and spontaneous generalization, which are badly modeled in symbolic models.[29]

Another bone of contention is a property of human intelligence called systematicity, the ability to entertain semantically related thoughts given a certain thought.[46] In other words, in the human mind, connected knowledge is preferred over independent facts.[3] Fodor and Pylyshyn argue that connectionist models can't guarantee systematicity, because they can be trained to be either systematic or non-systematic; therefore, these models can't explain why systematicity exists in human cognition. Advocates of connectionism have offered responses. For example, some argue that symbolic models are no better than connectionist models at explaining systematicity, and that neither can do so on its own.

  4.2. Level theory

Not everyone sees the two as extremes on a spectrum. Many think they reside at different levels of the same hierarchical structure, dealing with things at different scales. This could be seen as a corollary of PSSH. As Herbert Simon argues, the lower, neural level deals with events shorter than hundreds of milliseconds in a parallel fashion, such as perception and motor control, while the higher, symbolic level deals with events longer than half a second in a serial manner, such as logical reasoning. All of our thinking, emotions, and motivations are at the symbolic level, independent of the implementational details of the physical, neural substrate. The parallelism at the neural level doesn't necessarily imply that the symbolic level conducts parallel processing as well.[10]

Level theory looks good, but it's not clear what lies between the two levels. Smolensky goes further: he argues that between the symbolic level and the neural level there is a middle level, called the subsymbolic level, where the detailed neural structures are absent but cognition is formalized at some level of abstraction. The semantically interpretable entities at the subsymbolic level are activation patterns, as opposed to symbols at the symbolic level and neurons at the neural level. The subsymbolic view may provide a useful tool to account for cognition, but it is essentially a connectionist model as well.

  5. Ways towards integration

The community has not reached a consensus on how to settle the debate. As Minsky put it, to ask which approach is best to pursue "is simply a wrong question," since each has advantages and flaws.[47] Some think the two can't be combined well, so they prefer leaving each to what it is good at. For example, Jerry Kaplan of Stanford University argues that symbolic AI is more suitable for problems that need abstract reasoning, while connectionist models are better at problems that require interacting with the world or extracting patterns from massive, disordered data. He takes riding a bike as an example, stating that it is a problem fit for neural networks but not for symbolic approaches; indeed, it is very difficult to explicitly represent the expertise a robot needs to ride a bike. Likewise, tasks like formal verification and model checking are better suited to symbolic AI.

However, most real-world problems are distributed along the spectrum between the two extremes. In other words, to solve real-world problems well, some sort of hybrid system is needed. As Minsky put it, “we must develop systems that combine the expressiveness and procedural versatility of symbolic systems with the fuzziness and adaptiveness of connectionist representations.”[47]

  • Implementational Connectionism

One view of integrating the two is called implementational connectionism. It stems from the level theory mentioned in section 4.2 and looks for an accommodation between the two approaches. The basic idea is that the symbolic and connectionist approaches are not competing but reside at different levels: symbolic processes are implemented at higher levels by connectionist operations at lower levels. In other words, the human brain is both a connectionist network and a symbolic processor.[29] The task of implementational connectionists, then, is to explore how to build AIs capable of symbol manipulation out of neural networks.

In The Algebraic Mind, Gary Marcus discusses in detail whether implementational connectionism is promising enough to account for human cognition. He uses a simple connectionist model, the multilayer perceptron, to test whether it can account for three core tenets of symbol manipulation, and identifies three limitations: it cannot freely generalize abstract relations; it cannot robustly represent complex relations between pieces of knowledge; and it cannot distinguish kinds from individuals. Understanding these limitations, Marcus argues, can help us build better models in the future.[48]
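The first limitation, the failure to generalize outside the training space, can be illustrated with a toy experiment. The sketch below is a hypothetical setup in the spirit of Marcus’s identity-function examples, not his exact experiment: one logistic unit per output bit is trained on the identity function f(x) = x, but only on even numbers, so the least-significant output bit never sees a target of 1. The network then maps every odd input to the even number below it, because nothing it learned lets it infer the abstract rule “output bit j equals input bit j” for a bit it has never seen active.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def to_bits(n, width):
    return [(n >> i) & 1 for i in range(width)]   # least-significant bit first

def bits_to_int(bits):
    return sum(b << i for i, b in enumerate(bits))

WIDTH = 4
train_xs = [to_bits(n, WIDTH) for n in range(0, 16, 2)]   # even numbers only
weights = [[0.0] * WIDTH for _ in range(WIDTH)]           # one unit per output bit
biases = [0.0] * WIDTH

# Stochastic gradient descent on the identity mapping, bit by bit.
for _ in range(2000):
    for x in train_xs:
        for j in range(WIDTH):
            y = sigmoid(sum(w * xi for w, xi in zip(weights[j], x)) + biases[j])
            err = y - x[j]            # target for output bit j is input bit j
            for i in range(WIDTH):
                weights[j][i] -= 0.5 * err * x[i]
            biases[j] -= 0.5 * err

def predict(n):
    x = to_bits(n, WIDTH)
    return [round(sigmoid(sum(w * xi for w, xi in zip(weights[j], x)) + biases[j]))
            for j in range(WIDTH)]

print(bits_to_int(predict(6)))   # 6: inside the training space, recovered correctly
print(bits_to_int(predict(7)))   # 6, not 7: the last bit was never trained to fire
```

The unit responsible for the last bit only ever saw targets of 0, so its weights push the output toward 0 for every input; the “rule” was never acquired, only the training regularities.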

  • Hybrid systems

Although some think the two paradigms cannot be combined well, many hybrid systems have been explored. A hybrid system is composed of computationally separate components that interact with each other. The connectionist component solves low-level tasks such as pattern matching and generalizing from noisy data, while the symbolic component handles “inherently symbolic” tasks. For example, Stevan Harnad developed a connectionist network to ground symbols.[49] Daniel Kahneman distinguishes two cognitive systems, one fast, automatic, and subconscious, the other slow, rule-governed, and conscious, both of which involve distributed representation.[50] Risto Miikkulainen built the DISCERN network, which uses distributed neural networks to model the symbolic processing of story understanding.[51] Smolensky invented a tensor product method in which symbolic information is stored at connectionist “locations.” Other examples include James Hendler’s hybrid model, which takes a semantic network and connects its instance nodes to a distributed network, and Wendy Lehnert’s model, which gives a unified syntactic and semantic account of parsing.[52] More recently, Chris Eliasmith proposed a massive connectionist architecture that uses so-called semantic pointers, which possess features of classical variable binding.[53] In the past two decades, neural-symbolic computation, a newer hybrid approach, has shown potential to overcome the propositional fixation of neural networks and has been applied in fields such as bioinformatics and video games.[54] Marvin Minsky also offered suggestions on how to build hybrid systems, for example by forming localized clumps of expertise at the higher levels of connectionist models by weakening the density of connection paths between agencies.[47]
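Smolensky’s tensor product idea can be sketched in a few lines. The role and filler vectors below are made up for illustration (they are not from Smolensky’s papers): each role/filler pair is bound by an outer product, the bindings are superposed by addition, and, because the role vectors are chosen orthonormal, contracting the sum with a role vector recovers its filler exactly.

```python
# Minimal tensor product variable binding (illustrative vectors, orthonormal roles).

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def unbind(t, role):
    # Contract the tensor with the role vector: sum_i role[i] * t[i][j]
    return [sum(role[i] * t[i][j] for i in range(len(role)))
            for j in range(len(t[0]))]

# Orthonormal role vectors for "agent" and "patient"
agent, patient = [1.0, 0.0], [0.0, 1.0]
# Distributed filler vectors standing in for "John" and "Mary"
john, mary = [0.5, 0.5, -0.5], [0.5, -0.5, 0.5]

# Represent "John loves Mary" as a superposition of role/filler bindings
binding = add(outer(agent, john), outer(patient, mary))

print(unbind(binding, agent))    # recovers John's filler vector
print(unbind(binding, patient))  # recovers Mary's filler vector
```

The symbolic structure (who fills which role) is thus stored in a single distributed numerical object, which is exactly the sense in which symbolic information sits at connectionist “locations.”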

The components in a hybrid system can be coupled loosely or tightly. Either way, one module may dominate and control the others; for example, a symbolic expert system could direct the decision making of its neural network components. Alternatively, all modules can be peers, each in charge of different tasks.

Building hybrid systems raises many challenges, such as architecture design and learning. It is important to integrate the different representations and processes coherently. Also, how symbolic generalizations can be learned is not yet well understood. At the 1988 workshop on High-Level Connectionist Models, Michael Dyer argued that hybrid systems should proceed on at least four levels (knowledge, spreading-activation semantic network, connectionist, and neural), with a focus on how each level compiles into the others.[52]

  • Pseudo-debate

Nevertheless, the debate seems to be dying down recently. On the one hand, more and more hybrid systems are being built; focusing merely on the divergence between the paradigms, instead of on their potential to integrate, is no longer productive. On the other hand, it is increasingly realized that the gap between the two seems to be closing simply because there was never a gap to begin with. Here are some observations.

In essence, all connectionist models are hybrid systems, having symbolic components to some extent. Symbolic AI has given rise to many powerful tools, such as logical inference, constraint satisfaction, and natural language processing techniques, which are widely used in connectionist systems. For example, AlphaGo incorporates Monte Carlo tree search (MCTS), a technique that originated directly from the symbolic paradigm.[28] At the same time, symbolic models are never purely structural; they incorporate mathematical mechanisms as well, as semantic networks do. Another example is IBM’s Watson, which demonstrates that symbolic models can be parallel, incorporate non-logical mechanisms, and depend on probabilities.
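The symbolic lineage of MCTS is visible in its selection rule, which mixes a discrete search over moves with a numerical confidence bound. The sketch below uses the generic UCB1 rule with made-up visit statistics; it is not AlphaGo’s actual selection formula (AlphaGo uses a prior-weighted variant), but it shows the core idea of trading off a move’s average value against how rarely it has been tried.

```python
import math

def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score used in MCTS selection: average value plus an exploration bonus."""
    if visits == 0:
        return float("inf")          # always try an unvisited move first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

# Three candidate moves at a node visited 100 times: (accumulated value, visits).
# The statistics here are invented for illustration.
stats = {"a": (60.0, 80), "b": (12.0, 15), "c": (4.0, 5)}
best = max(stats, key=lambda m: ucb1(*stats[m], parent_visits=100))
print(best)   # "c": similar average value to "b" but far fewer visits
```

Move “a” has the most evidence but a lower average; moves “b” and “c” have similar averages, and the exploration bonus steers the next simulation toward the least-visited of them.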

Most of the training data for connectionist models is structured and symbolic. For example, ImageNet, one of the largest visual databases for training and testing recognition software, had its pictures manually labeled over four years by some 49,000 people from 167 countries on the Amazon Mechanical Turk crowdsourcing platform. The labels are unmistakably symbolic. Therefore, every system trained on such a dataset incorporates human symbolic processes.

In general, connectionist models need human intervention, which is fundamentally symbolic. Today, machine learning requires humans to prepare the data, set initial conditions, run the training algorithm, inspect and evaluate the results, and decide whether to run it again. For example, the training process of AlphaGo includes “a small number of handcrafted local features [which] encode common-sense Go rules.”[28] Even unsupervised learning, which allows computers to learn largely by themselves, requires humans to supply information such as the problem domain and the learning algorithm, and these inputs matter significantly for the output.

As Marcus put it, there is actually a large amount of innateness in every neural network. The real question is, “what kind of innateness do we want?”

  1. Conclusion

AI was born symbolic and logical. The pioneers of AI formalized many elegant theories, hypotheses, and applications, such as the PSSH and expert systems. From the 1980s, the pendulum swung away from symbols toward connectionism, a paradigm inspired by the neural connections in brains. With the growing amount of accessible data and ever stronger computing power, connectionist models have gained considerable momentum in recent years. This new approach seems to solve many problems of symbolic AI but raises many new issues at the same time. Which is better at accounting for human cognition, and which is more promising for AI? No consensus has been reached.

However, despite their vast differences, researchers have begun to explore how to integrate the two. For example, many hybrid systems have been proposed and experimented with. While some consider the paradigms not competing but complementary, others see them residing at different levels of one unified hierarchical structure, believing that connectionist networks can implement symbolic processes.

In recent years, it has been increasingly realized that the gap is closing simply because there was never a gap to begin with. In essence, all connectionist models have symbolic components, and all symbolic models have mathematical mechanisms. The debate is dying down, opening up new opportunities for future hybrid paradigms. More interdisciplinary research is needed so that the relevant areas, including computer science, cognitive science, semiotics, neuroscience, and philosophy, can be integrated organically in search of more promising models.

References

[1] McCarthy, John, Marvin Lee Minsky, Nathan Rochester, and Claude Shannon. “A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence,” 1955. http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html.

[2] Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach. 3rd ed. Prentice Hall Series in Artificial Intelligence. Upper Saddle River, N.J: Prentice Hall, 2010.

[3] Forbus, Kenneth D., Chen Liang, and Irina Rabkina. “Representation and Computation in Cognitive Models.” Topics in Cognitive Science 9, no. 3 (July 1, 2017): 694–718. https://doi.org/10.1111/tops.12277.

[4] McCulloch, Warren S., and Walter Pitts. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” The Bulletin of Mathematical Biophysics 5, no. 4 (December 1, 1943): 115–33. https://doi.org/10.1007/BF02478259.

[5] Rumelhart, David E., and James L. McClelland. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Computational Models of Cognition and Perception. Cambridge, Mass: MIT Press, 1986.

[6] Haugeland, John. Artificial Intelligence: The Very Idea. Cambridge, Mass: MIT Press, 1985.

[7] Deacon, Terrence William. The Symbolic Species: The Co-Evolution of Language and the Brain. 1st ed. New York: W.W. Norton, 1997.

[8] Andersen, Peter Bøgh, Berit Holmqvist, and Jens F. Jensen, eds. The Computer as Medium. Learning in Doing. Cambridge [England] ; New York: Cambridge University Press, 1993.

[9] Denning, Peter J., and Craig H. Martell. Great Principles of Computing. Cambridge, Massachusetts: The MIT Press, 2015.

[10] Simon, Herbert A. “The Human Mind: The Symbolic Level.” Proceedings of the American Philosophical Society 137, no. 4 (1993): 638–47.

[11] Chandler, Daniel. Semiotics: The Basics. 2nd ed. Basics (Routledge (Firm)). London ; New York: Routledge, 2007.

[12] Goguen, Joseph. “An Introduction to Algebraic Semiotics, with Application to User Interface Design.” In Computation for Metaphors, Analogy, and Agents, 242–91. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, 1999. https://doi.org/10.1007/3-540-48834-0_15.

[13] Minsky, Marvin. Society Of Mind. Simon and Schuster, 1988.

[14] Campbell, Joseph. The Hero with a Thousand Faces. Third edition. Novato, Calif: New World Library, 2008.

[15] Newell, Allen, and Herbert A. Simon. “Computer Science as Empirical Inquiry: Symbols and Search.” Communications of the ACM 19, no. 3 (March 1976): 113–126. https://doi.org/10.1145/360018.360022.

[16] Smolensky, P. “Connectionist AI, Symbolic AI, and the Brain.” Artificial Intelligence Review 1, no. 2 (June 1, 1987): 95–109. https://doi.org/10.1007/BF00130011.

[17] Simon, Herbert A. The Sciences of the Artificial. 3rd ed. Cambridge, Mass: MIT Press, 1996.

[18] Goguen, Joseph A., and D. Fox Harrell. “Information Visualization and Semiotic Morphisms.” In Multidisciplinary Approaches to Visual Representations and Interpretations, Volume 2, edited by Grant Malcolm, 1 edition., 83–98. Amsterdam u.a: Elsevier Science, 2005. http://cseweb.ucsd.edu/~goguen/papers/sm/vzln.html.

[19] Sowa, John F. “Peirce’s Tutorial on Existential Graphs.” Semiotica 186, no. 1–4 (2011): 345–94.

[20] Borgida, Alexander, and Sowa, John F, eds. Principles of Semantic Networks: Explorations in the Representation of Knowledge. The Morgan Kaufmann Series in Representation and Reasoning. San Mateo, Calif: Morgan Kaufmann, 1991.

[21] Sun, Ron. “Artificial Intelligence: Connectionist and Symbolic Approaches.” International Encyclopedia of the Social & Behavioral Sciences, April 16, 2000. https://doi.org/10.1016/B0-08-043076-7/00553-2.

[22] Kaplan, Jerry. Artificial Intelligence: What Everyone Needs to Know. What Everyone Needs to Know. New York, NY: Oxford University Press, 2016.

[23] Nilsson, Nils J. “The Physical Symbol System Hypothesis: Status and Prospects.” SpringerLink, 2007, 9–17. https://doi.org/10.1007/978-3-540-77296-5_2.

[24] Dreyfus, Hubert, Stuart E. Dreyfus, and Tom Athanasiou. Mind Over Machine. New York: The Free Press, 1987.

[25] Denning, Peter J. “What Is Computation?” The Computer Journal 55, no. 7 (July 1, 2012): 805–10.

[26] Hofstadter, Douglas R. “Chapter 26: Walking Up from the Boolean Dream.” In Metamagical Themas. New York: Basic Books, 1985.

[27] Hofstadter, Douglas R. “On Seeing A’s and Seeing As.” Stanford Electronic Humanities Review 4, no. 2: Constructions of the Mind (July 22, 1995). https://web.stanford.edu/group/SHR/4-2/text/hofstadter.html#note1.

[28] Fu, M. C. “AlphaGo and Monte Carlo Tree Search: The Simulation Optimization Perspective.” In 2016 Winter Simulation Conference (WSC), 659–70, 2016. https://doi.org/10.1109/WSC.2016.7822130.

[29] Garson, James. “Connectionism.” In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta, Winter 2016. Metaphysics Research Lab, Stanford University, 2016. https://plato.stanford.edu/archives/win2016/entries/connectionism/.

[30] “New Navy Device Learns by Doing: Psychologist Shows Embryo of Computer Designed to Read and Grow Wiser.” New York Times, July 8, 1958. http://www.nytimes.com/1958/07/08/archives/new-navy-device-learns-by-doing-psychologist-shows-embryo-of.html.

[31] Minsky, Marvin Lee, and Seymour Papert. Perceptrons: An Introduction to Computational Geometry. Expanded ed. Cambridge, Mass: MIT Press, 1988.

[32] Pinker, Steven, and Alan Prince. “On Language and Connectionism: Analysis of a Parallel Distributed Processing Model of Language Acquisition.” Cognition 28, no. 1 (March 1, 1988): 73–193. https://doi.org/10.1016/0010-0277(88)90032-7.

[33] St. John, Mark F., and James L. McClelland. “Learning and Applying Contextual Constraints in Sentence Comprehension.” Artificial Intelligence 46, no. 1 (November 1, 1990): 217–57. https://doi.org/10.1016/0004-3702(90)90008-N.

[34] Association for Computing Machinery (ACM). “Using Machine Learning to Study Neural Representations of Language Meaning,” with Tom Mitchell. Accessed December 12, 2017. https://www.youtube.com/watch?v=Xc5SM9sbUiQ.

[35] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep Learning.” Nature 521, no. 7553 (May 2015): 436. https://doi.org/10.1038/nature14539.

[36] Jones, Nicola. “Computer Science: The Learning Machines.” Nature News 505, no. 7482 (January 9, 2014): 146. https://doi.org/10.1038/505146a.

[37] Reardon, Sara. “A Giant Neuron Found Wrapped around Entire Mouse Brain.” Nature News 543, no. 7643 (March 2, 2017): 14. https://doi.org/10.1038/nature.2017.21539.

[38] Le, Q. V., Marc’ Aurelio Ranzato, Rajat Monga, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Ng. “Building High-Level Features Using Large Scale Unsupervised Learning.” In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 8595–98, 2013. https://doi.org/10.1109/ICASSP.2013.6639343.

[39] Alister. Ali Rahimi’s Talk at NIPS(NIPS 2017 Test-of-Time Award Presentation). Accessed December 15, 2017. https://www.youtube.com/watch?v=Qi1Yry33TQE.

[40] Lungarella, Max, Fumiya Iida, Josh Bongard, and Rolf Pfeifer, eds. 50 Years of Artificial Intelligence: Essays Dedicated to the 50th Anniversary of Artificial Intelligence. 2007 edition. Berlin: Springer, 2008.

[41] Gentner, Dedre. “Structure-Mapping: A Theoretical Framework for Analogy.” Cognitive Science 7, no. 2 (April 1, 1983): 155–70. https://doi.org/10.1207/s15516709cog0702_3.

[42] Freeman, Jeremy Andrew, and Gary F. Marcus, eds. The Future of the Brain: Essays by the World’s Leading Neuroscientists. Princeton, New Jersey: Princeton University Press, 2015.

[43] Reid, Vincent M., Kirsty Dunn, Robert J. Young, Johnson Amu, Tim Donovan, and Nadja Reissland. “The Human Fetus Preferentially Engages with Face-like Visual Stimuli.” Current Biology 27, no. 13 (July 10, 2017): 2052. https://doi.org/10.1016/j.cub.2017.06.036.

[44] Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. “Intriguing Properties of Neural Networks.” ArXiv:1312.6199 [Cs], December 20, 2013. http://arxiv.org/abs/1312.6199.

[45] Nguyen, Anh, Jason Yosinski, and Jeff Clune. “Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images.” ArXiv:1412.1897 [Cs], December 5, 2014. http://arxiv.org/abs/1412.1897.

[46] Fodor, Jerry A., and Zenon W. Pylyshyn. “Connectionism and Cognitive Architecture.” Cognition 28, no. 1–2 (1988): 3–71.

[47] Minsky, Marvin. “Logical vs. Analogical or Symbolic vs. Connectionist or Neat vs. Scruffy.” In Artificial Intelligence at MIT: Expanding Frontiers, edited by Patrick Henry Winston and Sarah Alexandra Shellard. The MIT Press, 1990.

[48] Marcus, Gary F. The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, Mass London: A Bradford Book, 2003.

[49] Harnad, Stevan. “The Symbol Grounding Problem.” Physica D, no. 42 (1990): 335–46.

[50] Kahneman, Daniel. Thinking, Fast and Slow. 1st ed. New York: Farrar, Straus and Giroux, 2011.

[51] Miikkulainen, Risto. Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. MIT press, 1993.

[52] Pollack, Jordan B. “High-Level Connectionist Models.” AI Magazine 9, no. 4 (1988). https://pdfs.semanticscholar.org/fd6f/93cb4e4744b03ca1ea34c33b3cb6ffc96f85.pdf.

[53] Eliasmith, Chris. How to Build a Brain: A Neural Architecture for Biological Cognition. Oxford Series on Cognitive Models and Architectures. Oxford: Oxford University Press, 2013.

[54] Garcez, Artur d’Avila, Tarek R. Besold, Luc de Raedt, Peter Földiak, Pascal Hitzler, Thomas Icard, Kai-Uwe Kühnberger, Luis C. Lamb, Risto Miikkulainen, and Daniel L. Silver. “Neural-Symbolic Learning and Reasoning: Contributions and Challenges.” Stanford University, CA, 2015.
