A Corpus-based Study of English Language Features (基于语料库的英语语言特征研究)


Posted: 2020-09-14 14:47:40


Authors: Zhang Jidong, Zhao Xiaolin

Publisher: Shanghai Jiao Tong University Press



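Preface

The Collected Papers on Corpus Linguistics and English Teaching Research (《语料库语言学与英语教学研究论集》) is one of the outcomes of the Ministry of Education Humanities and Social Sciences project "A Study of the Heterogeneity of Collocations of Base Forms and Inflected Forms of Content Words in Scientific and Technical Texts" (09YJA740019). It bears witness to years of scholarly effort by members of the Corpus Linguistics Research Center at Donghua University.

The methods of corpus linguistics bring out the empiricist nature of linguistic research and are marked by two defining features: empirical evidence and quantification. With corpus support, descriptions of the nature, structure and function of language, and the theories built on them, are objective judgments grounded in language instances and data rather than subjective speculation resting solely on linguists' intuitions. Advances in computer science have kept research tools up to date: linguists can build the corpora their research requires on database platforms, and can use corpus software to analyse texts and to retrieve and extract language instances and data. Since the second half of the 20th century, corpus linguistics has developed rapidly as a new approach to linguistic research. With authentic-text retrieval on a scale "beyond human capacity" and the advantages of data-driven research, corpora have been "overturning" traditional ways of studying language across the subfields of linguistics. Corpus linguistics is now shedding its status as a mere tool and striving for disciplinary standing alongside semantics and pragmatics.

When corpus research was just getting under way in China, Zhang Jidong and Zhao Xiaolin had already begun in-depth work in the field and, with the support of university leadership at all levels, founded the Corpus Linguistics Research Center. Working tirelessly and persistently, they have led young core faculty and graduate students in multidimensional, corpus-based research on language features, producing a substantial body of scholarship that has been widely recognized by peers. To review this work systematically, we have collected in this volume papers written and published in recent years by researchers who are mainly English teachers and graduate students of our college, in the hope of further advancing corpus linguistics research in China. The collection has three parts. Part One, from the perspective of corpus linguistics methodology, describes and analyses in detail issues relating to lexical "units of meaning" in general English. Part Two compares the language features of Chinese learners' English with those of native English speakers, and looks ahead to the role of corpora in future classroom teaching. Part Three describes the distinctive features of content-word use in English for science and technology. The arrangement of the three parts reflects how corpus linguistics research at our college has matured.

It is worth mentioning that, through Zhang Jidong's efforts, the Donghua University Corpus Research Center has built a good working relationship with the Department of Linguistics and English Language at Lancaster University, UK. Professor Paul Baker, a corpus linguist in that department and executive editor of the journal Corpora, kindly contributed a paper to this collection, and Professor Geoffrey Leech, Professor Tony McEnery and Dr Andrew Hardie offered guidance and help while the volume was being compiled. We thank them sincerely.

The papers collected here could not have been brought together and published without support from many quarters. We sincerely thank the leadership of Donghua University for their long-standing care and support for the Corpus Linguistics Research Center, the review experts of the Ministry of Education Humanities and Social Sciences project for their kindness and encouragement, and all colleagues, friends and students for their participation, support and dedication.

The Editors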

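A Corpus-based Study of the Pattern of a Word

Zhao Xiaolin

Abstract: Drawing on the Chinese learner spoken English corpus COLSEC and the spoken sub-corpora of the University of Birmingham's telnet-access Bank of English (BoE), this paper compares the patterned behaviour of the verb get in the pattern GET V-ed. The results show that Chinese learners' use of the pattern has no clear semantic features and that get is lexicalized, whereas native English speakers' use of the pattern has clear negative semantic features and get is non-lexicalized. The findings have implications for teaching spoken English.

Keywords: pattern, patterned behaviour, semantic feature, non-lexicalization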

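1 Introduction

Recent corpus linguistic research based on large data sets has confirmed that patterns and words depend on each other: a given pattern co-occurs with particular words, and a given word relies on particular patterns. Patterns are also closely tied to meaning: a word's meaning varies with the typical patterns in which it is frequently used, and words that frequently occur in the same pattern tend to share an aspect of meaning (Hunston & Francis 2000: 3). The patterned behaviour of Chinese learners' vocabulary has also attracted attention. Pu (2003), investigating a written corpus of Chinese learners, found that within the same pattern the collocations used by Chinese learners differ considerably from those of native English speakers, which affects the accuracy and idiomaticity of learners' expression. Patterns common in Chinese learners' spoken English, however, remain under-researched, and this paper makes a preliminary attempt to address that gap. Taking the verb get as an example, the study investigates the patterned behaviour of GET V-ed in the speech of Chinese learners and of native English speakers, and discusses the implications for vocabulary teaching. The verb get was chosen for three reasons. First, get is among the most frequent verbs in large corpora: by frequency, it ranks 7th in the BNC list of the most frequent verbs, 10th and 8th in FLOB and Frown, 6th in the LLC, and 7th and 5th in the Chinese learner corpus CLEC and the Chinese learner spoken corpus COLSEC respectively. Second, get forms a large number of patterns or multi-word units, and these are used very frequently. Third, the behaviour of get is a focus of several research fields: collocational behaviour in corpus research and constructions in cognitive linguistics, for example, both deal with the high-frequency patterns of common words such as get.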

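2 Patterns and Patterned Behaviour: Previous Research and Definitions

The concept of "pattern" was first used by the British linguist A. S. Hornby. Having long worked on English language teaching, Hornby held that the knowledge of language specialists should serve ordinary learners and teachers, and so he attended to regularities of language use. This view is fully embodied in his major work A Guide to Patterns and Usage in English (1954), which both proposes the concept of "pattern" and examines 25 verb patterns, 4 noun patterns and 3 adjective patterns together with their contextual meanings. Francis, Hunston and Manning (1996, 1998) took over Hornby's concept and described the patterns of verbs, nouns and adjectives that native English speakers use with high frequency. Unlike Hornby, they adopted the corpus-driven approach advocated by Sinclair and worked from the Bank of English (hereafter BoE), the English corpus at the University of Birmingham. Halliday (2004: 46) regards Hunston and Francis's object of study as "lexical grammar", that is, the treatment of mixed lexical and grammatical patterns from the standpoint of lexis. Hunston and Francis (2000) call their own work pattern grammar: the study of high-frequency collocations and, on that basis, the generalization of the lexico-grammatical patterns of collocation. According to Hunston (2008), patterns can be seen as a sub-set of constructions, but the methods of pattern grammar and construction grammar differ sharply. The latter studies language from a cognitive perspective and gives theory priority over observation. Pattern grammar stresses observation, takes the word as its starting point, and is a word-centred grammar in which lexis and grammar are integrated and which is closely bound to meaning and function.

Corpus linguistics has examined patterns from many angles. Sinclair (1991) studies multi-word patterns, that is, combinations of collocation and colligation. Stubbs (2002) explores chains, examining the relation between content words and function words and between words and grammatical categories. Biber's (1999: 990) lexical bundles and Scott's (1997: 59) clusters likewise rely entirely on frequency data to investigate continuous linear sequences of words (n-grams).

Across the literature there is no standard term for multi-word sequences. This paper adopts the notion of pattern proposed by Hunston and Francis, together with their criteria: the pattern of a word consists of all the combinations of words that are regularly associated with it and affect its meaning. If a combination involving a word occurs with high frequency, or depends on that word, or carries a clear meaning in association with that word, the combination can be identified as a pattern of the word (Hunston & Francis 2000: 37). In theory a word could occur in many patterns, but actual usage shows that the patterns of any one word are few, and that a pattern in turn selects only a limited range of words. For example, Francis, Manning and Hunston's (1996) study of the verb decide in the BoE shows that it occurs with the patterns V that, V wh-, V wh to-inf, V to-inf, be V-ed and it be V-ed that; the pattern it v-link ADJ that co-occurs frequently with seven groups of adjectives, among them the likely, marvellous, and important & necessary groups (ibid., 1998: 420). Patterns are also necessarily connected with word meaning, because the high-frequency collocates that occur in a pattern usually cluster in meaning and form a semantic feature, which in turn determines the meaning of the word. This paper calls the relation between a pattern and its semantic features the patterned behaviour of the word. For example, the seven adjective groups that co-occur with it v-link ADJ that each express a specific meaning: the "likely" group expresses (im)probability, and the "important & necessary" group expresses (un)importance or (un)necessity.

These findings make it clear that words and patterns are inseparable. Patterned behaviour unites the form and the meaning of a word and should be an important part of vocabulary teaching.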

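3 Methods and Tools

This paper examines the patterned behaviour of GET V-ed in two respects: it first describes the patterned behaviour of native English speakers, and then compares the patterned behaviour of Chinese learners with that of native speakers.

The learner corpus is the Chinese learner spoken English corpus COLSEC, which contains 723,299 words drawn from video recordings of live sessions of the national College English Test (Bands 4 and 6) Spoken English Test; it was searched with WordSmith, developed by Mike Scott. The native-speaker corpus consists of four spoken sub-corpora of the University of Birmingham's telnet-access Bank of English (hereafter BoE), which contain 62,939,687 words drawn from British and American radio broadcasts and everyday conversation; it was searched with LookUp, the retrieval software that accompanies the BoE.

First, concordance lines of GET V-ed are extracted from the BoE and COLSEC; the collocates are then ranked by co-occurrence frequency and by MI score; finally, the top 50 past-participle collocates at position R1 (the first word to the right of GET) are analysed. Collocates are extracted by MI score because MI indicates the strength of association between two words. A positive MI score indicates mutual attraction between the two words. The higher the MI score, the greater the probability that the two words co-occur and hence the stronger the collocation; the lower the score, the weaker the collocation. Although collocates extracted by MI score may include words with very low corpus frequency, MI is a useful indicator of how closely the two words in a phrase are associated (Kennedy 2008: 23).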
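The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not give the formula that LookUp or WordSmith uses, so the sketch assumes the common pointwise form MI = log2(observed / expected) with no span adjustment; the input is assumed to be a list of (word, tag) pairs in which `VVN` marks a past participle (as in the CLAWS C5 tagset); and `GET_FORMS`, `r1_participles`, `mi_score` and the toy sentence are hypothetical names and data, not taken from the study.

```python
import math
from collections import Counter

# Word forms treated as the lemma GET (an assumption for this sketch).
GET_FORMS = {"get", "gets", "getting", "got", "gotten"}


def r1_participles(tagged_tokens, participle_tag="VVN"):
    """Count past participles that stand immediately right (R1) of a GET form."""
    counts = Counter()
    for (word, _tag), (nxt, nxt_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if word.lower() in GET_FORMS and nxt_tag == participle_tag:
            counts[nxt.lower()] += 1
    return counts


def mi_score(pair_freq, node_freq, collocate_freq, corpus_size):
    """MI = log2(observed co-occurrences / co-occurrences expected by chance)."""
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(pair_freq / expected)


# Toy tagged input, invented for illustration only.
sample = [("they", "PNP"), ("got", "VVD"), ("married", "VVN"),
          ("he", "PNP"), ("gets", "VVZ"), ("bored", "VVN"),
          ("she", "PNP"), ("got", "VVD"), ("married", "VVN")]

r1_counts = r1_participles(sample)
print(r1_counts.most_common())  # ranking by co-occurrence frequency
print(mi_score(pair_freq=16, node_freq=100, collocate_freq=80, corpus_size=1000))
```

A pair that co-occurs exactly as often as chance predicts scores 0 under this formula, and each doubling of the observed frequency adds 1. This is also why rare words can reach very high MI scores: a small collocate frequency makes the expected value tiny.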

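4 Patterned Behaviour of GET V-ed among Native English Speakers

A total of 12,313 concordance lines of the pattern GET V-ed were retrieved from the BoE. The past-participle collocates are shown in Table 1.

Table 1 R1 past-participle collocates of GET in the BoE

Table 1a lists the 50 R1 past-participle collocates of GET extracted by co-occurrence frequency, and Table 1b the 50 extracted by MI score. All 100 verbs are activity verbs (Biber 1999: 361). The behaviour of get in the pattern GET V-ed can be summarized as follows.

1) The past-participle collocates of GET have clear semantic features, and can be divided into three groups. The first group has negative semantic features. In Table 1a, involved, caught, lost, stuck, killed, fed, bored, hurt, hit, confused, thrown, arrested, bogged, tired, left, addicted, upset, shot, divorced, kicked, worried, frustrated, scared, annoyed, knocked and broken all express "getting into difficulty" or "suffering a blow", and together account for 48.5% of the total frequency. In Table 1b, birched, sidetracked, slagged, waylaid, fobbed, caned, mugged, clobbered, bogged, psyched, bored, nabbed, mopped, lumbered, shortchanged, booted, hitched, tarred, addicted, riled, whacked, teased, nicked, burgled, hooked, divorced, busted, thumped, hassled, steamed, knifed, snowed, chucked, mired, caught, electrocuted, bullied, squashed and annoyed express "suffering a blow"; these negative collocates make up as much as 78% of the top 50 collocates ranked by MI score. Both the frequency data and the MI data show that GET collocates very strongly with past participles that have negative semantic features. The second group shows the opposite semantic features. Three collocates in Table 1a, excited, interested and promoted, have positive semantic features, and account for only 2.8% of the total frequency; Table 1b contains only one, excited. GET therefore collocates only weakly with past participles that have positive semantic features. The third group in Tables 1a and 1b expresses neutral meanings. In Table 1a these are married, used, paid, done, started, elected, called, mixed, told, dressed, changed, sent, put, taken, asked, scripted, treated, pushed, sorted and carried, which account for 48.7% of the total frequency; in Table 1b they are reacquainted, acquainted, married, re-elected, roped, strung, scripted, clued, reimbursed and steamed, which make up 20% of the top 50 collocates ranked by MI score. GET V-ed can thus be said to have negative and neutral semantic features.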
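The shares reported for the MI-ranked list can be checked with a short tally. The sketch below copies the three groups exactly as the text lists them for Table 1b; the variable and function names are hypothetical, and the grouping itself is the author's manual classification, not something the code derives.

```python
# Collocates from the MI-ranked list (Table 1b), grouped as in the text.
# Note: "steamed" is listed under both the negative and the neutral group
# in the text; it is kept in both here so the tallies mirror the text.
negative = [
    "birched", "sidetracked", "slagged", "waylaid", "fobbed", "caned",
    "mugged", "clobbered", "bogged", "psyched", "bored", "nabbed", "mopped",
    "lumbered", "shortchanged", "booted", "hitched", "tarred", "addicted",
    "riled", "whacked", "teased", "nicked", "burgled", "hooked", "divorced",
    "busted", "thumped", "hassled", "steamed", "knifed", "snowed", "chucked",
    "mired", "caught", "electrocuted", "bullied", "squashed", "annoyed",
]
positive = ["excited"]
neutral = [
    "reacquainted", "acquainted", "married", "re-elected", "roped",
    "strung", "scripted", "clued", "reimbursed", "steamed",
]

TOP_N = 50  # size of the MI-ranked list analysed in the text


def share(group, total=TOP_N):
    """Percentage of the top-N collocates that fall into one semantic group."""
    return 100 * len(group) / total


for name, group in [("negative", negative), ("positive", positive),
                    ("neutral", neutral)]:
    print(f"{name}: {len(group)} collocates, {share(group):.0f}%")
```

The same tally applied to a frequency-ranked list would need the co-occurrence counts from Table 1a, which are not reproduced in the text, so only the type-based shares are computed here.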

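2) The verb get is non-lexicalized in the pattern GET V-ed. Non-lexicalization (often called delexicalization) means that a word has lost its own semantic content and its meaning has merged into that of its collocational partners; the phenomenon is especially marked in the use of common words (Wei 2006). In GET V-ed, the meaning of get fuses with that of the past participle, as the following two examples show:

(1) He did, however, also say he feels strongly about the troops and the innocent victims who will get caught up in the conflict, so I guess I'd say... (BoE: npr/US. Text: SU1—910117)

(2) The leaders went out and sat by themselves – that is, Prime Minister Netanyahu and Chairman Arafat – and decided they wanted to get started sooner than that, or just as soon as they can... (BoE: usspok/US. Text: SU2—15)

In these examples get caught up and get started have each become a single unit: the former means "be drawn in involuntarily" and the latter "begin". In both sequences grammar and lexis are fused and express one overall meaning.

To sum up, when native English speakers use the pattern GET V-ed, GET bonds strongly and collocates closely with past participles that have negative or neutral semantic features. The verb get is non-lexicalized in the pattern and conveys only grammatical meaning; in other words, GET and the past participle together form a single lexico-grammatical unit that is semantically indivisible.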

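5 GET V-ed in Chinese Learners and Native English Speakers: A Comparison

Table 2 shows the past-participle collocates that co-occur with GET three or more times in the Chinese learner spoken corpus COLSEC, together with their co-occurrence frequencies and MI scores.

Table 2 R1 past-participle collocates of GET in COLSEC

As Table 2 shows, the 7 high-frequency R1 past-participle collocates of GET fall into two groups. One group, bored and tired, expresses "getting into difficulty" and accounts for 12.5% of the total collocate frequency; the other group, married, prepared, passed and started, expresses neutral actions. Of the 12 past-participle collocates with an MI score of 3.0 or above, 4 have negative semantic features (polluted, punished, tired and worried), making up 33.3% of the collocates; improved and increased have positive semantic features, making up 16.7%; and married, worked and prepared express neutral actions, making up 25%. The contextual meaning of involved is more complex and is discussed below; caused is a grammatical error and is not discussed. The differences between Chinese learners and native English speakers are examined further below.

1) Chinese learners use this pattern at a low rate. Only 7 past-participle collocates have a frequency of 3 or more.

2) In Chinese learners' usage, GET does not collocate strongly with past participles that have negative semantic features. In COLSEC such collocates make up only 33.3% of the 12 collocates with an MI score of 3.0 or above, whereas in the BoE they make up as much as 78% of the collocates with an MI score of 9.1 or above.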

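3) Chinese learners' combinations of GET with past participles that have positive semantic features differ greatly from native speakers' usage. Examples of GET with improved and increased retrieved from COLSEC are as follows:

(3) ...miss less chan-changes with than others so that you may have less chances to get improved yourself. Second, ... (COLSEC: 030275.txt)

(4) ...that er er chronically speaking er the percentage of female enrollment is getting increased and the reason for that I think... (COLSEC: 010019.txt)

Neither get improved nor get increased occurs in the BoE, while get excited, get interested and get promoted, which do occur in the BoE, have no instances in the Chinese learner corpus.

The semantic features of GET involved as used by Chinese learners also differ greatly from native speakers' usage. Chinese learners use only the pattern GET involved in n, and two thirds of the noun collocates express "taking part in a worthwhile activity", which is a positive semantic feature. An example follows:

(5) ...the social abilities, it is a very good way for them to get involved in the social activities. (COLSEC: 020072.txt)

We then retrieved the concordance lines of GET involved in n from the BoE. Inspection of the contexts shows that native English speakers use GET involved in n with clearly negative semantic features. First, most of the noun collocates denote "something difficult or troublesome", such as war, skirmish, dispute and minutia. Second, the predicate verbs in the sentences express reluctance, such as avoid, not want (willing) to and reluctant (reluctance) to. The contextual meaning indicates that the pattern conveys "unwillingness to take part in an effortful or difficult activity".

Taken together, the comparison of GET V-ed shows the following differences between Chinese learners and native English speakers. First, Chinese learners use the pattern at a low rate and with only a small number of verbs. Second, although Chinese learners use the pattern to express "engaging in some action", its semantic features differ markedly from those in native speakers' usage. Third, when native English speakers use GET V-ed, the verb get is non-lexicalized and its meaning fuses with that of its collocate.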

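6 Conclusion

This paper has presented a comparative analysis of the pattern GET V-ed of the common verb get. The results show that in native English speakers' usage the collocates carry negative and neutral semantic features and GET is non-lexicalized, whereas Chinese learners fall short in these respects. The main cause may be that English vocabulary teaching pays too little attention to the close links between words and patterns and between patterns and meaning. The patterned behaviour of native speakers' GET V-ed, including its collocates, the semantic features of those collocates and the semantic tendency of the pattern, should be a focus of vocabulary teaching so as to draw students' attention to idiomatic usage. The author suggests that teachers make full use of native-speaker data in vocabulary teaching, for example corpora or openly accessible corpus websites such as the Collins online corpus, take the context of word use into account, and help students form the habit of observing contexts, summarizing the semantic features of the collocates in a pattern, and generalizing its semantic tendency.

Although this paper has examined only one pattern of the common verb get, GET V-ed, and its patterned behaviour, it shows from one angle how important the patterns of common words and their semantic features are for language teaching and learning. How to strengthen the teaching of the patterns and patterned behaviour of common words requires further research.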

References

[1] Francis, G., Hunston, S. and Manning, E. Collins COBUILD Grammar Patterns 1: Verbs [M]. London: HarperCollins. 1996.

[2] Francis, G., Hunston, S. and Manning, E. Collins COBUILD Grammar Patterns 2: Nouns and Adjectives [M]. London: HarperCollins. 1998.

[3] Halliday, M. A. K. & Matthiessen, C. M. I. M. An Introduction to Functional Grammar (3rd ed.) [M]. London: Hodder Arnold. 2004.

[4] Hornby, A. S. A Guide to Patterns and Usage in English [M]. London: OUP. 1954.

[5] Hunston, S. and Francis, G. Pattern Grammar: A Corpus-driven Approach to the Lexical Grammar of English [M]. Amsterdam: John Benjamins. 2000.

[6] Hunston, S. Starting with the Small Words: Patterns, Lexis and Semantic Sequences [J]. International Journal of Corpus Linguistics, 2008(3): 271-295.

[7] Kennedy, G. Phraseology and Language Pedagogy: Semantic Preference Associated with English Verbs in the British National Corpus [C]. In F. Meunier & S. Granger (eds). Phraseology in Foreign Language Learning and Teaching. Amsterdam: John Benjamins. 2008: 21-41.

[8] Scott, M. WordSmith Tools [M]. Oxford: OUP. 1997.

[9] Sinclair, J. Corpus, Concordance, Collocation [M]. Oxford: OUP. 1991.

[10] Stubbs, M. Two Quantitative Methods of Studying Phraseology in English [J]. International Journal of Corpus Linguistics, 2002(2): 215-244.

[11] Pu, Jianzhong (濮建忠). Colligation, Collocation and Chunks in English Vocabulary Teaching [J]. Foreign Language Teaching and Research (外语教学与研究), 2003(6): 438-445.

[12] Wei, Naixing (卫乃兴). Shared Meaning and Delexicalization [R]. Paper presented at the First International Symposium on Cognitive Semantics, 2006, Hunan: Hunan Normal University.

A Corpus-based Study of the Pattern of a Word

Xiao-lin ZHAO

Abstract: This paper investigates the patterned behaviour of the verb get in the pattern GET V-ed, based on the College Learners' Spoken English Corpus (COLSEC) and the spoken sub-corpora of the telnet Bank of English at the University of Birmingham. The investigation indicates that the pattern as used by Chinese learners shows no explicit semantic features and that get retains its lexical meaning, whereas the pattern as used by native English speakers shows negative semantic features and get is non-lexicalized. The findings offer insights for teaching spoken English.

Keywords: pattern, patterned behaviour, semantic feature, non-lexicalization

Diachronic Lexical Change in American English (1961-2006)

Paul Baker (Lancaster University)

1 Introduction

This paper focuses on the Brown family of corpora, a set of reference corpora, each one million words in size, consisting of 500 text samples, each of about 2000 words, covering 15 genres of published writing. The Brown family retain an important place in the history of corpus linguistics, as its first member was also the first professionally produced reference corpus ever created. Initially referred to as The Standard Corpus of Present-Day American English, it consisted of samples from texts that had been published in 1961 and was constructed by W. Nelson Francis and Henry Kučera at Brown University. Eventually, it became known simply as the Brown corpus.

The sampling frame that was created for the Brown corpus was followed in a number of subsequent corpus-building projects, including the LOB (Lancaster-Oslo/Bergen) Corpus, an equivalent corpus of texts published in 1961 in British English, created in the early 1970s as part of a joint collaboration between Lancaster University, the Norwegian Computing Centre for the Humanities at Bergen and the University of Oslo. Research carried out by Hofland and Johansson (1982) and Leech and Fallon (1992) has compared the Brown and LOB corpora, focussing on lexical variation and cultural differences.

Since then, new members of the Brown family have been added to both the British and American sides. In the early 1990s, the FLOB and Frown corpora were created at Freiburg University, comprising British and American English published in 1991 and 1992 respectively. Together, these four corpora could be used to examine diachronic change between the 1960s and 1990s, and also to compare how such changes differed between British and American English. Some researchers have focussed on the idea that American English leads linguistic change, with British English 'lagging' behind, e.g. Hundt (1997), Leech (2002), McEnery and Xiao (2005).

Care needs to be taken when using the Brown family for a number of reasons. First, the relatively small size of the corpora means that for many features of language, frequencies are too low to draw conclusions from. Kennedy (1998: 68) argues that for lexicography, a million words is not large enough, as up to half the words in the corpus will only occur once. Biber (1993), however, is more optimistic about using a million words for grammatical research. I have argued elsewhere (Baker 2009) that a million words is acceptable for lexical research if we focus on high frequency words (such as words from closed-class grammatical categories like articles and conjunctions, along with the top 200 or so words from open-class categories like nouns and verbs).
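Kennedy's point about words that occur only once (hapax legomena) is easy to check on any text. The following is a minimal sketch, assuming a crude lowercase alphabetic tokenizer and reading "words" as word types; `hapax_share` is a hypothetical helper, not a function from any corpus tool mentioned here.

```python
import re
from collections import Counter


def hapax_share(text):
    """Return the fraction of word types that occur exactly once in text."""
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer (assumption)
    freqs = Counter(tokens)
    if not freqs:
        return 0.0
    hapaxes = sum(1 for count in freqs.values() if count == 1)
    return hapaxes / len(freqs)


# Toy sentence: 5 types, of which only "the" is repeated.
print(hapax_share("the cat sat on the mat"))
```

Run over a full one-million-word corpus, a high value here would mean that most of the vocabulary is too sparsely attested for reliable frequency comparison, which is the reason for restricting lexical work to high frequency words.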

A second area of concern involves the large gaps in time between the different sets of Brown corpora. There is a thirty year period between the Brown corpus and the Frown corpus, and while it may be tempting to draw conclusions about language change based on comparisons of frequencies, such conclusions may not reveal the full picture. For example, Leech (2002) noted that in general, use of modal verbs like should, ought, would and will is lower in the 1992 Frown corpus than it is in the 1961 Brown corpus (the picture is similar in the British equivalents). This could result in a conclusion that people in these countries are using modal verbs less often. However, Leech warns that the change in frequency cannot be taken as indicative of a smooth or linear decrease, as we do not know what happened in the years between the two sampling points. There is always the chance that the picture of modal verb usage in 1992 is exceptional in some way and contradicts the pattern for the years 1990, 1991, 1993 and 1994. Indeed, Millar (2009), who used the diachronic Time corpus of American English (based on the magazine Time), found wider fluctuations of modal verb usage, which sometimes contradicted Leech (2002). The Time corpus is divided into decades and has data from 1923-2006, so there are more sampling points and no periods which are unaccounted for. However, it could be the case that Millar's findings are due to the fact that all of the data was sampled from one magazine, and it is difficult to argue that it is fully representative of American English.

A third issue when comparing diachronic change over time involves the nature of what is being compared. The Brown corpus used a sampling frame that was created in the early 1960s and was based on subdivisions that were decided by participants at a conference at Brown University, while the proportions of each genre included in the corpus were based on the amount of actual publications of those genres in the same period. The additions to the Brown family have used the same sampling frame, which on the one hand means that comparisons are highly valid. However, a sampling frame which is appropriate to 1961 may not be as fully representative of the picture of published writing in the present day. For example, genres which were not popular in the 1960s (such as horror fiction) would accordingly not be included in later versions of the Brown corpora, even though such genres might have since become popular. The same problem applies to comparisons between different cultures. For example, the Brown corpus contains a genre category called Adventure and Western fiction. However, British writers do not normally write western fiction, so the creators of the LOB and FLOB corpora have interpreted this category simply as Adventure fiction.

So while the Brown family are potentially useful in allowing us to examine change over time or between British and American English, the above issues also mean that results need to be considered with caution, and understood in relationship to the actual sampling frames that have been used.

The purpose of this chapter is to introduce a new addition to the Brown family, the AE06, and then describe how it was compared with the Brown and Frown corpora in order to identify lexis which appears to be a) decreasing in frequency, b) increasing in frequency or c) remaining stable. In the following sections I first describe the creation of the AE06, then the method used to identify changing or stable lexis. I then present the results obtained using this method and end the chapter by outlining some explanations for the results, which should give some hints about the direction in which American English is heading.

Research on diachronic linguistic change is of interest not only for members of the society who experience the change, but also for those who are outside it. In particular, it is American English which is becoming a dominant version of English, and this model is taught in many EFL schools around the world. It is thus important that textbooks reflect the actual way that American English is used, particularly for students who are likely to come into contact with native speakers, e.g. through work contexts, or as a result of trying to publish or present their own research for an American or global audience. If textbooks are based upon an outdated model of American English and do not reflect more recent developments, then learners' language output may appear quaint, strange or even unpleasant to native ears. To give an example, if it is the case, as Leech's (2002) research suggests, that strong modal verbs like should and must are becoming less popular in American English, perhaps as a result of language users wanting to appear less autocratic in their interactions, then this is a feature of linguistic and cultural sensitivity which learners would benefit from knowing about. Such learners may be helped if they are made aware of alternatives such as need to and want to, which appear to be more popular.

2 The AE06 Corpus

The AE06 (American English 2006) corpus was created by a team of researchers at Lancaster University in 2010. It followed the same sampling frame as the other Brown corpora (Table 1 shows the breakdown of the genres). Unlike the Brown and Frown corpora, the text samples were all collected from internet sources. However, it was stipulated that such texts had to have been published first in paper format and then later archived online. The availability of so many texts now archived on the Internet makes the task of building reference corpora much easier than in the past, and the corpus only took a few weeks to collect. This was also the same procedure which had been used to build a British corpus (also from 2006) a couple of years earlier (see Baker 2009). The year 2006 was chosen for the American corpus in order to facilitate comparisons with the British version, and also because it halved the gap in time between the two older versions of the American corpora.

Records were kept with regard to the title, author, date of publication, website address and word length of each file, as well as whether files were sampled from the beginning, middle or end of a large document or whether they were full samples. In the case of some of the news genres, multiple texts were required in order to make up a 2000 word sample (this is in accordance with sampling from the other Brown corpora). Where possible we tried not to always sample texts from the beginning of a document. In all, 169 texts (34%) included one or more full texts, 171 (34%) were sampled from beginnings, 120 (24%) were from middles and 40 (8%) were from ends.

Table 1 Genres and Number of Texts in the AE06
