From E-Language to I-Language: Foundations of a Pre-Processor for the Construction Integration Model

Christopher Mark Powell

Submitted in partial fulfilment of the requirements of Oxford Brookes
University for the degree of Doctor of Philosophy

February 2005

Abstract

This thesis is concerned with the 'missing process' of the Construction
Integration Model (CIM - a model of Discourse Comprehension), namely the
process that converts text into the logical representation required by that
model and which was described only as a requirement by its authors, who
expected that, in the fullness of time, suitable grammar parsers would
become available to meet this requirement. The implication of this is that
the conversion process is distinct from the comprehension process. This
thesis does not agree with this position, proposing instead that the
processes of the CIM have an active role in the conversion of text to a
logical representation.

In order to investigate this hypothesis, a pre-processor for the CIM is
required, and much of this thesis is concerned with selection and
evaluation of its constituent elements. The elements are: a Chunker that
outputs all possible single words and compound words expressed in a text; a
Categorial Grammar (CG) parser modified to allow compounds and their
constituent words to coexist in the chart; classes from abridged WordNet
noun and verb taxonomies comprising only the most informative classes;
revised handling of CG syntactic categories to take account of structural
inheritance, thereby permitting incremental interpretation; and finally,
extended CG semantic categories that allow sense lists to be attached to
each instantiated semantic variable.

In order to test the hypothesis, the elements are used to process a Garden
Path sentence for which human parsing behaviour is known. The parse is
shown to build an interpretation incrementally, to sense-tag the words
appropriately, to derive the correct logical representation, and to behave
in a manner consistent with expectations. Importantly, the determination of
coherence between proposed sense assignments of words and a knowledge base,
a function of the CIM, is shown to play a part in the parse of the sentence.
This provides evidence to support the hypothesis that the CIM and the
pre-processor are not distinct processes.

The title of this thesis, 'From E-Language to I-Language: Foundations of a
Pre-Processor for the Construction Integration Model', is intended to
circumscribe the work contained herein. Firstly, the reference to Chomsky's
notions of E-Language (External(ised) Language) and I-Language
(Internal(ised) Language) makes clear that we acknowledge these two aspects
of language. Chomsky maintains that E-Languages, such as English, German,
and Korean, are mere 'epiphenomena', bodies of knowledge or behavioural
habits shared by a community, and as such are not suitable subjects for
scientific study. I-Language, argues Chomsky, is a 'mental object', is
biologically/genetically specified, equates to language itself and so is a
suitable object of study. We shall not pursue the philosophical arguments
and counter-arguments concerning E-Language and I-Language (but see, for
example, [DUMM86], [CHOM96]). Instead, we use the two notions to
differentiate between the natural language text to be processed, which can
be unique to a community, to a geographical and/or temporal location, or to
some extent to an individual, and the internal, structured,
world-consistent representation of that text, together with the cognitive
processes involved in creating that representation, which, being
'genetically specified', can be assumed common to all humans. This thesis
is therefore concerned with the interface between these two aspects of
language, and specifically with how the internal cognitive processes of
I-Language, outlined in theories such as the Construction-Integration
Model, interact with external representations of language in order to
construct internal representative models of that E-Language.

Secondly, 'Foundations' indicates that this work does not deliver a fully
functioning natural language processing system, but draws together
'distinct' linguistic research threads (e.g. Chunking, Word-Sense
Disambiguation, Grammar Parsing, and theories of grammar acquisition), to
describe the process of converting a natural language text into a logically
structured and plausibly sense-tagged representation of that text. As such,
this thesis is a 'proof of concept', and must be followed by future
evaluative work.

Acknowledgements

Firstly, I would like to thank my first supervisor, Mary Zajicek, and
second supervisor, David Duce, for keeping me on the straight and narrow,
for the encouragement they gave, and for making me believe that I would
actually cross the finish line. I am most grateful for their efforts in
proofreading the thesis and the helpful feedback they provided - my
submission deadline was approaching fast and they pulled out all the stops
to make it happen. I am also indebted to Mary for the many opportunities my
association with her has presented, for the interesting projects and
foreign travel I have enjoyed, and for her continued support and promotion.

I must also thank my examiners, Mary McGee Wood and Faye Mitchell, for an
enjoyable viva and for their constructive comments and enthusiasm both
during and after.

I owe thanks to Marilyn Deegan for inviting me to the workshop 'The Use of
Computational Linguistics in the Extraction of Keyword Information from
Digital Library Content', King's College London, February 2004. Preparation
for the
workshop gave me a vital push at just the right moment and led to a
consolidation of my work on Specialisation Classes. I would also like to
thank Dawn Archer and Tony McEnery of Lancaster University for their useful
and encouraging comments during the workshop.

My fellow research students, Alvin Chua, Jianrong "ten pints" Chen, Samia
Kamal, Sue Davies, Tjeerd olde-Scheper and Nick Hollinworth, contributed
hugely to an enjoyable and rewarding time in the Intelligent Systems
Research Group. They provided useful insights from the perspectives of
their own research fields, and shoulders to cry on when the going got
tough. A big thanks to my good friend Tjeerd, who is always happy to play
Scully to my Mulder, and whose knowledge of Chaotic Computation is second
only to his knowledge of the finest single malts. Our anticipated research
trip to Islay will be most interesting.

Thanks are due to Ken Brownsey, chair of the East Oxford Logic Group, who
once taught me inspirational and useful things like LISP and Functional
Programming. His jokes baffle some and delight others.

Writing up was a very solitary and sedentary experience, as was the design
and implementation of the software developed during the course of this
work. However, I was helped during these times by two special chums - a big
thanks to Daisy for taking me on daily walks to ensure I got fresh air in
my lungs and the sun on my face, and to Splodge, who slept on my lap and
kept it warm whilst I worked at the computer.

Finally, I thank Lindsay for putting up with me through my times of elation,
depression, absence, and presence. Without her love and support I would
never have been able to complete this work, and I shall be eternally
grateful to her. She's embarking on her own research degree next year, so
it is my turn to be tested in the supporting role.

Table of Contents

Abstract
Acknowledgements
1 Introduction
1.1 Structure of thesis
2 Review of Summarisation Techniques
2.1 Early Summarisation Methods
2.1.1 Statistical
2.1.2 Formal Patterns
2.1.3 Discussion
2.2 Linguistic Approaches
2.2.1 Linguistic String Transformation
2.2.2 Micro to Macro Proposition Transformation
2.2.3 Discussion
2.3 Psychological Approaches
2.3.1 Text-Structural Abstracting
2.3.2 Discussion
2.4 AI Approaches
2.4.1 FRUMP
2.4.2 SUZY
2.4.3 TOPIC
2.4.4 SCISOR
2.4.5 Discussion
2.5 Renaissance Approaches
2.5.1 Paragraph extraction
2.5.2 Formal Patterns revisited
2.5.3 Lexical Cohesion
2.5.4 SUMMARIST
2.5.5 Discussion
2.6 Web Page Summarisation
2.6.1 Page Layout Analysis
2.6.2 BrookesTalk
2.6.3 Discourse segmentation
2.6.4 Gists
2.6.5 The Semantic Web
2.6.6 Discussion
2.7 Conclusions
3 A Model for Discourse Comprehension
3.1 Background to the CIM
3.2 Experimental Evidence Supporting the CIM
3.2.1 Evidence for Propositions
3.2.2 Evidence for Micro and Macro Structures
3.3 The Construction-Integration Model
3.4 Conclusion
4 A Psychologically Plausible Grammar
4.1 Elements of a CIM Pre-Processor
4.1.1 Sense is central to grammatical form
4.1.2 Sense is central to coherence discovery
4.1.3 A mutually constraining approach
4.2 Selection of the grammar parser
4.3 Inside-Out Theories
4.3.1 Evidence for the Poverty of the Stimulus Argument
4.3.2 Principles and Parameters
4.3.3 Against the Inside-Out Theories
4.4 Outside