Testing the Universal Grammar Hypothesis

1. Introduction

Perhaps the single most controversial claim in linguistic theory is that children learning their native language face an induction problem, or in other words, that the available input underspecifies the adult state. This induction problem is known by many names: the “Poverty of the Stimulus” (e.g., Chomsky 1980a, Chomsky 1980b, Lightfoot 1989, Crain 1991), the “Logical Problem of Language Acquisition” (e.g., Baker 1981, Hornstein & Lightfoot 1981), and “Plato’s Problem” (e.g., Chomsky 1988, Dresher 2003). Regardless of the name, it all boils down to the same claim: the data generally available to young children are compatible with multiple hypotheses, or perhaps more correctly, data necessary to rule out the incorrect hypotheses are either not available at all, or not available in sufficient quantity (Lightfoot 1982, Legate & Yang 2002, among many others).

The Universal Grammar (UG) hypothesis was introduced as a solution to this problem (Chomsky 1957/1975, 1965). The logic of the UG hypothesis is straightforward: if the necessary evidence for choosing the correct linguistic hypothesis is unavailable in the input, then children must bring some internal bias to the language learning problem (Chomsky 1981, Hornstein & Lightfoot 1981, Legate & Yang 2002, among many others). While the necessity of some kind of bias is generally granted by even the most ardent critics of the UG hypothesis (e.g., Pullum & Scholz 2002, Regier & Gahl 2004), the nature of the necessary biases is the subject of considerable debate. First, there is the question of what cognitive objects the bias operates over. A bias might operate over the representations the child considers as hypotheses (e.g., parameters of linguistic variation (Chomsky 1981)), the data the child learns from (e.g., only unambiguous data (Fodor 1998)), or the learning algorithm the child uses to alter belief in competing hypotheses (e.g.,
trigger-based learning (Gibson & Wexler 1994, Niyogi & Berwick 1996)). Second, there is the question of whether the necessary bias is specific to language learning (i.e., domain-specific) or applies generally to any kind of cognitive learning (domain-general). UG is usually proposed as a collection of domain-specific biases ranging over both the representations that children consider and the data that children learn from, but this is far from logically necessary (e.g., Chomsky 1971, 1981, Kimball 1973, Baker 1978, Gordon 1986, Lightfoot 1991).

Recent debates about the UG hypothesis have tended to focus on two broad questions. The first concerns the existence of the induction problem (e.g., Sampson 1989, 1999, Pullum & Scholz 2002, MacWhinney 2004, Tomasello 2004), which is, of course, the motivation for the UG hypothesis. Until recently, the claim that children’s input lacks sufficient evidence for successful language learning has been based on the intuitions of linguists rather than on large-scale empirical analyses of child-directed speech. However, without quantifiable evidence for induction problems, there is no need for the UG hypothesis at all. In fact, Pullum and Scholz (2002) have claimed just that: using data from the Wall Street Journal corpus (Linguistic Data Consortium 1993) and the CHILDES database (MacWhinney 2000), they argue that there is no evidence for an induction problem for several well-known linguistic phenomena in English such as anaphoric one (Baker 1978) and yes-no questions involving complex subjects (Chomsky 1971).

Even granting the existence of an induction problem for a given linguistic phenomenon, the second broad question follows directly: what is the nature of the prior knowledge necessary to solve that problem? More specifically, is the knowledge innate or derived from prior learning? Is the knowledge domain-specific or domain-general?
One could imagine any or all of the possible combinations being applicable to various aspects of the linguistic system: some knowledge may be innate and domain-general, some innate and domain-specific, some derived from domain-general knowledge acquired previously, and some derived from domain-specific knowledge acquired previously. With the proliferation of possible types of prior knowledge, it is not clear that a single type will be sufficient to solve all of the induction problems in language learning. In fact, Tomasello (2004) takes this one step further: he argues that the proliferation of specific suggestions for that prior knowledge in the theoretical literature has rendered the UG hypothesis untestable through standard scientific falsification. He contends that it will not be possible to evaluate the UG hypothesis until it is broken down into specific hypotheses about biases with respect to specific linguistic phenomena.

The project we propose here aims to address both of these questions directly, and in the process lay out a concrete methodology for testing the UG hypothesis that is similar in spirit to what both the critics of the UG hypothesis (e.g., Pullum and Scholz 2002 and Tomasello 2004) and the supporters of the UG hypothesis (e.g., Chomsky 1957/75, Crain and Pietroski 2002) propose. Utilizing techniques recently made possible through advances in technology, and combining aspects of theoretical, experimental, and computational linguistics, it is now feasible to perform several quantitative tasks relevant to evaluating the UG hypothesis with respect to the issues discussed above.
We can search reasonably large corpora of both adult and child-directed speech for relevant linguistic structures; we can precisely measure the adult knowledge state children eventually attain using psycholinguistic techniques from experimental syntax; and we can implement sophisticated probabilistic learning models (specifically Bayesian models) capable of operating over the structured representations postulated by linguistic theory. With these techniques in hand, we plan to investigate the existence of the induction problem by examining both the realistic data used as input by children (available through resources such as CHILDES (MacWhinney 2000)) and the knowledge state achieved by adults for complex linguistic phenomena such as syntactic islands (e.g., the experiments in Sprouse 2007). We will then implement Bayesian learning models to test whether unbiased learners can reach the adult knowledge state given the data available. If unbiased learners cannot do this, then we can conclude that the induction problem does indeed exist for that phenomenon and that children require learning biases to succeed. We can then identify what kind of biases lead to acquisition success by incorporating different types of learning biases into the models (as is done, for example, for learning anaphoric one in Pearl & Lidz (submitted)). The biases implemented may be domain-general in nature (e.g., Regier & Gahl 2004, Perfors, Tenenbaum, & Regier 2006, Pearl & Lidz submitted) or domain-specific (Sakas & Fodor 2001, Pearl & Weinberg 2007, Pearl 2008, submitted, Pearl & Lidz submitted).
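The contrast between an unbiased learner and a biased one can be illustrated with a minimal sketch. It is a toy, not the models proposed here: the two hypotheses, their extensions, and the data are hypothetical, and the bias shown is a likelihood that favors smaller hypotheses, chosen only as one example of a domain-general bias. The "unbiased" learner checks nothing beyond whether a datum is consistent with a hypothesis.

```python
from fractions import Fraction

# Hypothetical hypothesis space: each hypothesis licenses a set of data types.
EXTENSIONS = {
    "subset": {"a", "b"},
    "superset": {"a", "b", "c", "d"},
}

def consistency_only(hypothesis, datum):
    """Unbiased likelihood: 1 if the hypothesis licenses the datum, else 0."""
    return Fraction(1) if datum in EXTENSIONS[hypothesis] else Fraction(0)

def size_principle(hypothesis, datum):
    """Biased likelihood: data are sampled uniformly from the hypothesis's
    extension, so smaller consistent hypotheses receive more probability."""
    if datum not in EXTENSIONS[hypothesis]:
        return Fraction(0)
    return Fraction(1, len(EXTENSIONS[hypothesis]))

def posterior(prior, likelihood, data):
    """Sequential Bayesian update over a finite hypothesis space.
    Assumes every datum is licensed by at least one hypothesis."""
    post = dict(prior)
    for datum in data:
        unnormalized = {h: p * likelihood(h, datum) for h, p in post.items()}
        total = sum(unnormalized.values())
        post = {h: p / total for h, p in unnormalized.items()}
    return post

uniform_prior = {h: Fraction(1, 2) for h in EXTENSIONS}
ambiguous_data = ["a", "b", "a"]  # compatible with both hypotheses

unbiased = posterior(uniform_prior, consistency_only, ambiguous_data)
biased = posterior(uniform_prior, size_principle, ambiguous_data)
print("unbiased:", unbiased)
print("biased:  ", biased)
```

In this toy setting, data compatible with both hypotheses leave the unbiased learner at its prior, while the biased learner shifts toward the smaller hypothesis. Whether any such bias suffices for a real linguistic phenomenon is exactly the empirical question the modeling is meant to address.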
Crucially, because the Bayesian modeling framework allows us to accommodate biases of many kinds, from choosing the smallest hypothesis consistent with the data (Tenenbaum & Griffiths 2001) to restricting the input to certain clauses (Lightfoot 1991, Pearl & Weinberg 2007) to constraining the representations under consideration via parameters (Chomsky 1981), we will be able to both reduce the UG hypothesis to smaller specific hypotheses and evaluate the necessity of those hypotheses for successful learning (for instance, as advocated by Tomasello (2004)).

2. Accurate measures of the primary data

The first step of our investigation is to assess the input that is actually available to children for various linguistic phenomena. Since the debate regarding the induction problem and the necessity of UG hinges on the state of children’s input, occurrence facts about child input should not be based on the intuitions of linguists (an idea advocated extensively in Pullum & Scholz (2002), for instance). This is particularly true now that corpora of child-directed speech are freely available, such as CHILDES (MacWhinney 2000). Notably, however, the corpora available are rarely marked with all the information of interest to a linguist focused on complex syntactic and semantic phenomena, which are primarily the locus of the induction problem debate (Crain & Pietroski 2002, Legate & Yang 2002, Pullum & Scholz 2002, Lidz, Waxman, & Freedman 2003, Reali & Christiansen 2004, Regier & Gahl 2004, Kam et al. 2005, Perfors, Tenenbaum, & Regier 2006, Foraker et al. 2007, Pearl & Lidz submitted, among many others). While some corpora may contain morphological information or part-of-speech identification, most are simply transcripts of child-directed speech. We propose to annotate several available child corpora in the CHILDES database syntactically (using, for example, the features in Government and Binding Theory (Chomsky 1981)) via a two-step process.
The output of this process will be fully formed hierarchical structures, so that formal analyses from theoretical linguistics can be easily adopted as biases in the models we later build (see sections 4 and 5 for details). First, we will use a freely available dependency tree parser (such as the Charniak parser¹) to generate a first-pass syntactic analysis. Then, we will evaluate the resulting syntactic trees by hand (with the help of undergraduate research assistants), correcting when necessary, to ensure the accuracy of the structures generated. We intend to make the final parsed corpora available through CHILDES for other language researchers to use.

In addition, we propose to investigate adult corpora of conversational speech (such as those available through TalkBank (http://www.talkbank.org)) in order to compare the differences between adult and child-directed speech for various linguistic phenomena. Often, child-directed speech corpora are relatively sparse compared to available adult speech corpora, especially if syntactic annotation is desired, which has led much of the corpus-based linguistic research to rely on adult-directed speech (e.g., Pullum & Scholz (2002)). Yet, it is a common (and quite reasonable) argument that child-directed speech may differ quite significantly from adult speech (see, for example, discussion in Legate & Yang (2002)). Given that recent probabilistic learning models are sensitive to the relative frequencies of various data (e.g., Foraker et al. 2007), it seems only prudent to ask, for a given linguistic phenomenon, if the data frequencies do differ. It may turn out for some linguistic phenomena that the relative frequencies do not vary much between the speech directed at, say, three-year-olds and the speech directed at adults.
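As an illustration of how such a frequency comparison might be carried out once counts are in hand, the following sketch computes relative frequencies and a Pearson chi-square statistic for a 2x2 table. The counts and variable names are hypothetical placeholders, not corpus results, and the choice of statistic is an assumption of this sketch rather than a method specified in the proposal.

```python
def relative_frequency(target_count, total_count):
    """Proportion of utterances that contain the structure of interest."""
    return target_count / total_count

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
        [[a, b],
         [c, d]]
    where rows are corpora and columns are (has structure, lacks structure)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical counts, for illustration only.
child_with, child_total = 20, 100
adult_with, adult_total = 10, 100

print("child-directed:", relative_frequency(child_with, child_total))
print("adult-directed:", relative_frequency(adult_with, adult_total))
print("chi-square:", chi_square_2x2(
    child_with, child_total - child_with,
    adult_with, adult_total - adult_with,
))
```

With one degree of freedom, the statistic is conventionally compared against a critical value of about 3.84 for a 0.05 significance level. A statistic well below that value would be in line with the scenario in which relative frequencies do not vary much between child-directed and adult-directed speech.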
This would then suggest that adult speech corpora may indeed be a reasonable estimate of children’s input for some phenomena, particularly complex syntactic and semantic interpretation phenomena that are acquired later in development (e.g., negative polarity items like ‘any’, the interpretation of connectives such as ‘or’, and binding theory phenomena, as discussed in Crain & Pietroski (2002)). Given the abundance of adult-directed conversational speech, such a scenario would provide a far richer source of data from which children’s input could be estimated. However, should child-directed and adult-directed speech frequencies differ, it will be crucial to this project to determine not only if, but also in what way they differ, so as to correctly evaluate both our own models and those potentially offered by others.

Like the child-directed speech, much conversational adult-directed speech is not annotated with syntactic information. The process we propose to use to generate annotated adult-directed speech corpora is identical to the process for generating the annotated child-directed speech, involving a first-pass annotation by a freely available parser and subsequent human evaluation of the generated annotation. We intend to make the annotated corpora available to the research community either through TalkBank (http://www.talkbank.org) or the Linguistic Data Consortium (http://www.ldc.upenn.edu/), a common repository for electronic corpora.

3. Accurate measures of the adult state

The second step of our investigation is to assess the adult knowledge state children eventually attain. It almost goes without saying that acceptability judgments form the primary measure of the adult grammar in the field of theoretical syntax; therefore, acceptability judgments are the logical choice for a quantifiable measure of the adult state. There are at least three reasons for the predominance of acceptability judgments in the study of adult grammars.
First, acceptability judgments can be provided with little effort from the subject (Schutze 1996, Cowart 1997). Second, these judgments are highly reliable across speakers of the same language (Cowart 1997, Keller 2000, Sprouse 2007). Third, these judgments are a robust proxy for grammaticality (Chomsky 1965, Schutze 1996, Cowart 1997, and many others).

Paradoxically, the very properties that have made acceptability judgments such a valuable data source for theoretical syntacticians have also served to undermine general confidence in that data. First, because judgments are available to any native speaker, linguists have tended to use their own judgments rather than those of naïve consultants (Christiansen and Edelman 2003). Second, because judgments are generally reliable across speakers, linguists have tended to use single data points rather than samples (Bresnan 2007, Cowart 1997). Third, because judgment tasks are often designed as a choice between grammatical and ungrammatical, until recently relatively little research has been done on the gradience inherent to acceptability judgments, and the factors that might be causing or influencing that gradience (Keller 2000, Sorace and Keller 2005).

¹ Available through Brown University (ftp://ftp.cs.brown.edu/pub/nlparser/).

In response to these concerns, several linguists have developed a set of formal methodologies, which have collectively come to be known as experimental syntax, for collecting acceptability judgments. While the details vary from experiment to experiment, experimental syntax methodologies all have at least four components in common (Featherston 2007, Sprouse 2007). First, judgments are collected from a sample of naïve consultants, usually at least 10 and ideally more than 20, to ensure that judgments generalize to the broader population.
Second, consultants are presented with a variety of sentences for any given structure under investigation, to ensure that the judgments generalize across lexical items. Third, consultants are presented with a formal task, such as a Likert scale task or the Magnitude Estimation task (Stevens 1957, Bard et al. 1996), to help ensure that relative acceptability data are not lost to categorical responses. Fourth, data are analyzed using standard behavioral statistics.

For this project, we will use experimental syntax techniques to measure the relative acceptability of structures in the adult grammar for comparison to the relative frequencies of those structures in the child-directed speech corpora and adult conversational speech corpora. Experimental syntax methodologies have advantages over previous informal collection techniques too numerous to mention here (see Schutze 1996, Cowart 1997, Keller 2000, Featherston 2007, and Sprouse 2007 for discussion). However, given the nature of this project - in particular, the comparison between relative frequencies and acceptability judgments - two of these advantages bear mention.

First, experimental syntax has introduced rating tasks, such as magnitude estimation (Stevens 1957), that provide a more precise measure of relative acceptability than previous informal collection tasks. Most informal collection tasks involved binary rating scales such as yes/no or limited, discrete rating scales such as the 5- or 7-point Likert scales. All of these limited scales can result in a loss of information to categorization (Bard et al. 1996). In contrast, magnitude estimation places no predefined restriction on the response scale: subjects may use the entire positive number line for their responses, thus eliminating the categorization problem. Bard et al. (1996) demonstrated that given such freedom, subjects routinely distinguish more than 7 levels of acceptability.
Furthermore, Sprouse (submitted b) has demonstrated that subjects’ responses in magnitude estimation tasks are highly robust across samples, even with minor variations to the experimental design (such as modifying the modulus sentence). Taken together, these facts suggest that newer rating tasks such as magnitude estimation will provide more detailed data regarding the adult grammar.

Second, experimental syntax has also introduced the principles of factorial experimental design, which has enabled the investigation of contributions from factors that are traditionally outside the domain of syntactic theory, but that may still have an effect on both acceptability judgments and (crucially) relative frequencies. For example, Sprouse (2008, submitted a) both demonstrate that the acceptability of wh-movement dependencies is affected by the distance of the dependency (see also Frazier (1989) and Phillips et al. (2005)). Specifically, shorter wh-movement dependencies (1) are significantly more acceptable than longer wh-movement dependencies (2) despite the fact that syntactic theories predict both structures to be categorically grammatical.

(1) Jack hoped that you knew who the giant would chase.
(2) Jack knew who you hoped that the giant would chase.
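To make the comparison between conditions such as (1) and (2) concrete, here is a minimal sketch of how magnitude estimation responses might be summarized. All numbers are hypothetical placeholders rather than experimental results. Dividing each response by the same participant's rating of the modulus sentence and taking the logarithm is a common way of handling the open-ended response scale, but it is an assumption of this sketch, not a procedure stated in the text.

```python
import math
from statistics import mean

def log_ratio(response, modulus_response):
    """Log10 of a response relative to the same participant's rating
    of the modulus (reference) sentence. Both values must be positive."""
    return math.log10(response / modulus_response)

# Hypothetical (response, modulus response) pairs, one per participant.
ratings = {
    "short dependency": [(120, 100), (90, 100), (55, 50)],
    "long dependency": [(80, 100), (70, 100), (30, 50)],
}

condition_means = {
    condition: mean([log_ratio(r, m) for r, m in pairs])
    for condition, pairs in ratings.items()
}

for condition, value in condition_means.items():
    print(f"{condition}: mean log ratio = {value:.3f}")
```

A real analysis would go on to apply the standard behavioral statistics mentioned above, for example a test over participants and items, instead of comparing condition means by eye.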