It is a common belief that text corpora provide the best testing ground for solving linguistic problems of any kind. As far as grammar is concerned, this may be true, but if we focus on investigating the lexicon, the results often appear rather superficial. The WWW contains some relatively homogeneous arrays of texts formed independently of linguists, in some cases emerging quite spontaneously. Text arrays with the most prominent social characteristics of their authors are regarded as independent Internet segments; digitized classical literature and teenager blogs are the most contrasting examples.
Frequencies of the same lexical items differ greatly from one segment to another, and these statistics are very significant for sociolinguistics. The main problem in applying the method of segmental statistics is the lack of a suitable instrument for automatic data processing. Several case studies are presented, and the results of segmental statistics appear more indicative than those obtained from the Russian National Corpus. The Italian SVCs have been classified into lexical-semantic patterns on the basis of the semantic features of Nsubj and Nobj and the lexical-semantic meaning of the support verb.
Subsequently, the patterns have been grouped into the well-known actional classes of accomplishments, achievements, semelfactives, activities, and states (Vendler, Comrie). The overall classification shows that most SVCs go hand in hand with the feature of telicity as regards verbs, and with concreteness and referentiality as regards Nobj. In the classes of accomplishments and achievements there is a partial parallelism with Russian, whereas fewer Russian SVCs can be found in the activity and state verb classes.
Moreover, the presence of a high number of SVCs in the Russian corpus may be considered further evidence of the typological shift towards the analytic type that contemporary Russian is apparently undergoing. University of Bergen, Norway. E-mail vs. …: the influence of the communication channel on the language. Does the mere change of the communication channel, unaccompanied by any other changes in situational characteristics, affect the language? We present a quantitative analysis of two corpora of Russian texts that differ solely in the communication channel from which they originate (e-mail vs. …).
This study aims to look into various formats of modern Russian-language internet communication in order to discover changes in the sociocultural patterns and models of discourse behaviour that characterize the values and norms of contemporary Russian public life.
This analysis should allow us to better understand the ideas and beliefs prevailing in Russian public opinion and to trace its changes and emerging linguistic patterns. Bocharov Victor (Mathlingvo), Bichineva Svetlana, Granovsky Dmitry, Ostapuk Natalia, Stepanova Maria. OpenCorpora. OpenCorpora is a project that aims at creating an annotated corpus of Russian texts which will be fully accessible to researchers, the annotation being crowd-sourced. The article deals with annotation quality assurance tools.
Its overall principles have not been completely determined yet, although some of the directions can already be specified. The paper outlines controversial problems for each direction and provides linguistic examples.
It contains both fundamental information on the Russian language (grammatical and combinatory properties of words, semantic and paronymic relations between words) and ample encyclopedic information on geographical objects, famous people, organizations, and artifacts. The dictionary includes technical terms and basic concepts of science, the humanities, business, and economics. Among its applications is the possibility of forming queries for Internet search engines on medicine, commerce, tourism, and other topics.
Voronezh State University. A corpus-based study of noun cryptotypes in English. We develop a method of identifying noun cryptotypes in English, relying primarily on the Corpus of Contemporary American English (COCA) and the results of typological studies. The study combines data-oriented and theory-oriented approaches to linguistic description. A cryptotype refers to the distribution of nouns among classes in accordance with a certain semantic feature, with reference to the typological principle of contrastive grammar.
The class membership of a noun is revealed in syntax, particularly in collostructions which bear the classifying function of the noun class. The semantic, morphological, and syntactic criteria for identifying a noun class are discussed.
The study of cryptotypes concerns the issues of grounding, recognition, and reasoning. An adequately formalized description of cryptotypes can be used in computational modeling and text processing. MSLU. The parameter of nearness in metaphorical space. In this work we develop a conception of deictic means, indicators of spatial relations, functioning as modal intensifying particles. The modal meanings, distinct from the indexical ones, include approximation meanings, intensifying meanings, and a number of others.
The situation with the indicator VON, which serves as a sign or an instruction to search for the required object, is quite different. Naturally, the question of searching usually arises when the object is far away.
However, it is possible to search in rather near space as well. Therefore the difference in the use of the particles VOT and VON in modal functions can be described in terms of nearness and distance, taking into account the distinctions in their semantics described above. It turns out that the features of the metaphorical use of VOT and VON are connected with their indexical meanings — not just with an opposition in degree of distance, but with ways of indication.
Therefore the opposition is metaphorised, too. We can say that the particles specify different ways of searching for objects in metaphorical spaces. In all cases the concepts of nearness and distance, as derivatives of the identification and searching operations, are connected not with the participants in the interaction but with the distance between the representations activated in the current interaction and the new concepts, objects, and properties brought in for semantic, emotional, and other purposes.
A web shop classifier. We examine two categories of search results retrieved in response to product queries. This classification reflects the two main kinds of user intent: product reviews and online shops. We describe the training and test samples, the classification features, and the classifier structure. Our findings demonstrate that the method achieves quality and performance suitable for real-world applications. The maximal degree belonging to this set serves as a standard in the construction.
We argue against contextual and comparative analyses either explicitly or implicitly assumed in the literature. Instead, we propose that the purpose is an argument of certain gradable adjectives, and the whole construction is a positive construction.
We try to pinpoint the difference between Russian and English functional standards. Research Computing Center, Lomonosov Moscow State University. Three-way movie review classification. We consider a three-way classification approach for Russian movie reviews.
All reviews are divided into three groups. To solve this problem we use various sets of words together with features such as word weights, punctuation marks, and polarity influencers that can affect the polarity of subsequent words. We also estimate the maximum upper limit of automatic classification quality for this task.
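The abstract gives no implementation details, so the following is only a minimal sketch of lexicon-based three-way polarity scoring with polarity influencers; the word lists, weights, and neutral threshold are all illustrative assumptions, not the authors' actual resources:

```python
# Minimal three-way polarity scorer with polarity influencers.
# Word lists, weights and the neutral band are illustrative assumptions.
POSITIVE = {"good": 1.0, "great": 2.0, "enjoyable": 1.5}
NEGATIVE = {"bad": -1.0, "boring": -1.5, "awful": -2.0}
INFLUENCERS = {"not": -1.0, "hardly": -0.5, "very": 1.5}  # flip or scale

def classify(tokens, neutral_band=0.5):
    score, modifier = 0.0, 1.0
    for tok in tokens:
        tok = tok.lower().strip(".,!?")
        if tok in INFLUENCERS:              # affects the next polar word
            modifier *= INFLUENCERS[tok]
            continue
        weight = POSITIVE.get(tok, NEGATIVE.get(tok))
        if weight is not None:
            score += modifier * weight
            modifier = 1.0                  # influencer scope ends here
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```

For example, `classify("not a good movie".split())` yields "negative", because the influencer "not" flips the polarity of "good".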
Voice emotion classification. The classification efficiency of different acoustic features was estimated, and a very small set of the most reliable characteristics was extracted in order to obtain robust and quick emotion-state classification. A recommended set of features with a linear-kernel SVM was used to solve the same problem. Under several conditions, such as when providing a decision-support factor in real-time speech analytics systems, the simplified classification scheme is preferable to a complex one.
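The abstract recommends a small feature set with a linear-kernel SVM. As a dependency-free stand-in for the SVM, a simple perceptron illustrates how a linear decision rule over a couple of acoustic features could separate two emotion states; the feature names, data, and labels below are hypothetical:

```python
# Perceptron as a dependency-free stand-in for the linear-kernel SVM
# mentioned in the abstract; feature names and data are hypothetical.
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):    # y in {-1, +1}
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:                    # update only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy data: [mean pitch (normalised), energy] -- "excited" (+1) vs "calm" (-1)
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
```

A real system would replace the perceptron with a trained SVM and use the reduced acoustic feature set the authors selected.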
A new semantics. The traditional extensional and intensional models of semantics are difficult to actually flesh out in practice, and no large-scale models of this kind exist. This talk argues for a new kind of semantics that combines the traditional symbolic, logic-based, proposition-style semantics used in older NLP with computation-based statistical word-distribution information (what is being called Distributional Semantics in modern NLP).
I show how to define such a lexicon, how to build and format it using tensors, and how to use it for various tasks. I discuss some of the recent work on composing vectors and tensors in attempts to produce statistically based compositional semantics. Combining the two views of semantics opens many fascinating questions that beg study, including the operation of logical operators such as negation and modalities over word-sense distributions, the nature of the ontological facets required to define concepts, and the action of compositionality over statistical concepts.
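The idea of composing word vectors can be illustrated with the two baseline operators discussed in the compositional distributional semantics literature, pointwise addition and pointwise multiplication; the toy co-occurrence vectors below are invented for illustration:

```python
# Pointwise additive and multiplicative composition of word vectors,
# the two baseline operators in compositional distributional semantics.
# The toy co-occurrence vectors are invented for illustration.
vectors = {
    "black": [0.8, 0.1, 0.3],
    "cat":   [0.2, 0.9, 0.4],
}

def compose_add(u, v):
    return [a + b for a, b in zip(u, v)]

def compose_mult(u, v):              # emphasises shared dimensions
    return [a * b for a, b in zip(u, v)]

black_cat_add  = compose_add(vectors["black"], vectors["cat"])
black_cat_mult = compose_mult(vectors["black"], vectors["cat"])
```

Multiplicative composition zeroes in on dimensions both words share, while addition keeps all the evidence from either word; richer models replace these with tensor-based operators, as the talk discusses.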
We propose a method for the integration of a spellchecker and a parser, which allows us, on the one hand, to correct typographical errors in context and, on the other hand, to increase the robustness of the parser.
We start by outlining various types of misprints and ways to correct them, taking account of the specific character of keyboard typing and typical mistakes. To correct misspellings and misprints we propose a modified Levenshtein algorithm in which each pair of characters involved in the calculation of the Levenshtein distance is assigned a specific weight from a fixed interval. This accounts for keyboard layout, phonetically similar characters, similarity between Russian and Latin alphabet symbols, numbers, and other symbols.
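A weighted variant of the Levenshtein distance along these lines can be sketched as follows; the substitution table here (keyboard neighbours, Latin/Cyrillic look-alikes) is a tiny illustrative sample, not the authors' actual weighting:

```python
# Weighted Levenshtein distance: each character pair gets a substitution
# cost in [0, 1]. The weight table is a small illustrative sample
# (keyboard neighbours, Latin/Cyrillic look-alikes), not the full system.
CHEAP_SUBSTITUTIONS = {
    ("q", "w"): 0.5, ("o", "p"): 0.5,       # adjacent keys
    ("a", "а"): 0.1, ("e", "е"): 0.1,       # Latin vs Cyrillic look-alikes
}

def sub_cost(a, b):
    if a == b:
        return 0.0
    return CHEAP_SUBSTITUTIONS.get((a, b), CHEAP_SUBSTITUTIONS.get((b, a), 1.0))

def weighted_levenshtein(s, t):
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                               # deletion
                d[i][j - 1] + 1.0,                               # insertion
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitution
            )
    return d[m][n]
```

With this table, confusing two adjacent keys costs 0.5 instead of a full edit, so plausible typos rank closer to the intended word than arbitrary substitutions do.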
The paper states the need to take the lexical context of the words to be corrected into account in order to achieve maximum correction accuracy, which helps correct words used in an unusual context.
As a result we get a number of correction options for the words. The final choice is made by the Dictascope parser. Based on a modified Eisner algorithm, the parser builds a dependency tree for the sentence; the modification includes punctuation checking and some additional linguistic constraints. In our model several interpretation vertices correspond to one word, and spelling-correction variants can be processed in the same way as morphological interpretations.
The integration of misprint correction and syntactic analysis is illustrated by a simple case (correcting a single word) and a more complex case (splitting a word in two or merging two words into one).
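The split and merge cases can be illustrated with a simple candidate generator over a dictionary; the tiny word list is illustrative, and in the real system the parser would then rank the surviving candidates via the dependency analysis:

```python
# Generate split and merge correction candidates against a dictionary.
# The dictionary is a toy sample; in the described system the parser
# ranks the surviving candidates during dependency analysis.
DICTIONARY = {"new", "york", "a", "lot", "newspaper"}

def split_candidates(word):
    """'newyork' -> [('new', 'york')] when both halves are known words."""
    out = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in DICTIONARY and right in DICTIONARY:
            out.append((left, right))
    return out

def merge_candidate(w1, w2):
    """'news' + 'paper' -> 'newspaper' when the concatenation is a word."""
    merged = w1 + w2
    return merged if merged in DICTIONARY else None
```

Each generator runs in linear time in the word length, so it can be applied to every out-of-vocabulary token before parsing.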
The proposed method of integrating the parser and spellchecker modules was implemented in the Dictascope Syntax system. This made it possible to considerably increase the stability of the parser and provided an opportunity to use it as a component of an opinion-mining system for monitoring blogs and forums. Lomonosov Moscow State University. Experimental analysis of discourse. The results showed that in this situation participants choose full NPs.
The work, begun in …, is still in progress. The lexical items identified as values and arguments of collocate Lexical Functions (LFs) are tagged in syntactically annotated Russian sentences.
For that purpose, a framework such as logic programming, which integrates a logic-based view of language processing with inferential capabilities, seems to be a good option. Discourse analysis challenges. Discourse structure analysis is a very challenging task because of the large diversity of discourse structures, the various forms they take in language, and the impact of knowledge and pragmatics on their identification (Longacre; Keil and Wilson). Recognising discourse structures cannot in general be based only on purely lexical or morphosyntactic considerations. These latter factors capture only some facets of the influence of pragmatics on our understanding of texts (Kintsch; Di Eugenio and Webber). The importance of structural and pragmatic factors depends on the type of relation investigated, on the textual genre, and on the author and targeted audience.
In our context, technical texts are obviously much easier to process than free-style texts. Rhetorical Structure Theory (RST) (Mann and Thompson) is a major attempt to organise investigations in discourse analysis, with the definition of 22 basic structures. Since then, many more relations have been introduced, which are more or less clearly defined. Background information about RST, annotation tools, and corpora is accessible at http://…; a recent overview is developed in Taboada and Mann. Very briefly, RST posits that coherent texts consist of minimal units which are all linked with each other, recursively, through rhetorical relations.
No unit is left pending. Some text spans appear to be more central to the text's purpose; these are called nuclei (or kernels), whereas others are somewhat more secondary; these are called satellites. Satellites must be associated with nuclei. Relations between nuclei and satellites are one-to-one or one-to-many.
For example, an argument conclusion may have several supports, possibly with different orientations. Conversely, a given support can be associated with several distinct conclusions.
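The one-to-many situation described above can be represented with a small data structure in which each relation instance links one nucleus to one or more satellites; the relation name and text spans below are invented examples:

```python
# Minimal nucleus/satellite structure allowing one-to-many links,
# as in an argument conclusion with several supports. The relation
# name and text spans are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Unit:
    text: str

@dataclass
class Relation:
    name: str                      # e.g. "support", "elaboration"
    nucleus: Unit
    satellites: list = field(default_factory=list)

conclusion = Unit("Use protective gloves.")
rel = Relation("support", conclusion)
rel.satellites.append(Unit("The resin is corrosive."))
rel.satellites.append(Unit("Skin contact causes burns."))
```

Keeping satellites as a list (rather than a single slot) directly accommodates a conclusion with several supports; the converse case, one support attached to several conclusions, simply corresponds to the same `Unit` appearing in several `Relation` instances.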
For example, in the sentence: … Note that these two structures are not necessarily adjacent. The literature on discourse analysis is particularly abundant from a linguistic point of view. Several approaches based on corpus analysis with a strong linguistic basis are of much interest for our purpose.
Relations are investigated together with their linguistic marks in works such as Delin, Hartley, Paris, Scott, and Vander Linden; Marcu; and Kosseim and Lapalme, with their usage in language generation in Rosner and Stede, and in Saito, Yamamoto, and Sekine with an extensive study on how marks can be quite systematically acquired. A deeper line of work is concerned with the cognitive meaning associated with these relations, how they can be interpreted in discourse, and how they can trigger inferential patterns (von Wright; Moeschler; Fiedler), to cite just a few works.
Within computational linguistics circles, RST has mainly been developed in natural language generation for content-planning purposes, e.g. Kosseim and Lapalme, and Reed and Long. Besides this area, Marcu developed a general framework and efficient strategies to recognise a number of major rhetorical structures in various kinds of texts.
The main challenges are the recognition of textual units and the identification of relations that hold between them. The rhetorical parsing algorithm he introduced relies on a first-order formalisation of valid text structures which obey a number of structural assumptions.
These assumptions, however, seem to be somewhat too restrictive w.r.t. the texts we observed; in particular, our corpus analysis shows that several of them do not hold in practice. His work is based on a number of psycholinguistic investigations (Grosz and Sidner) which show that discourse markers are used by human subjects both as cohesive links between adjacent clauses and as connectors between larger textual units.
An important result is that discourse markers are used consistently with the semantics and pragmatics of the textual units they connect, and that they are relatively frequent and unambiguous. Argumentation and discourse analysis. We consider that the various types of arguments form a family of rhetorical relations.
In our case study, a conclusion is a nucleus and a support is a satellite. From a syntactic point of view, a support is a right or left adjunct of a conclusion, which is its head. In other terms, a support must be connected to a conclusion to be syntactically acceptable. In general, kernels have an autonomous meaning, while satellites get their full meaning in context from their association with a kernel.
This whole sentence can then be a kernel or a satellite for another statement. In turn, this latter satellite could include the reasons for such an opposite view. An argumentation framework is a complex graph structure which exhibits the various conceptual relations that hold between arguments (support and attack are the best known, but there are many others).
We will not develop this aspect here, since there is an abundant, well-written literature on argument typology, argument schemes, etc. A synthesis of computational models for processing arguments is reported in Reed and Grasso. Logical aspects of argumentation are particularly well developed in general. The distinction between arguments and other discourse relations may not be straightforward.
Processing natural language arguments with the <TextCoop> platform - IOS Press
The definition of this relation is clearly very vague as we understand it. We view it as a subtype of the elaboration relation, which is, in our view, a kind of proto-relation.
The evidence relation may share similar surface language realisations with arguments, but its contents are substantially different: for example, it does not include any notion of warning or advice, support or attack, or even logical consequence, which are at the heart of the family we are analysing. Our goal is to define proto-typical language forms for a given set of arguments related to this family. In this article, we first consider procedural texts, which abound in warnings and advice, which are instances of that family.
These occur in isolation. We then consider a special subgenre of procedures, didactic texts, which merge instructions with a large variety of arguments and various forms of explanation (Bourse and Saint-Dizier). If we consider rhetorical relations from a language point of view over various types of texts, we note that a number of these relations have a particularly broad scope.
This is the case in particular for the elaboration relation and for various types of arguments. These latter relations have a much larger conceptual scope than most other rhetorical relations.
Similar to Dowty's approach to thematic roles, we consider that these complex relations are proto-relations which need to be defined, according to their various facets, via constitutive properties. Defining the facets of proto-relations associated with argument types is ongoing work and beyond the scope of this article. Article organisation. This article is organised as follows. Since the approach is based on generative syntax principles, we introduce quite a powerful rule system, constrained by various types of restrictive principles.
This is an open and complex problem with few contributions so far. The section ends with a detailed evaluation of rule coverage and accuracy and of system performance.
This allows us to develop a robust set of rules to process the family of arguments introduced above. Developing an explanation theory is, again, a very open problem.
In particular, we show that the type of generic structure (proto-argument) we investigate in this paper occurs very frequently in a large diversity of text types, making this relation crucial for argument analysis and the development of argumentation networks. We then introduce the Dislog language (Dislog stands for "discourse in logic" or "discontinuities in logic"), since discourse structure analysis is based on marks which are often in long-distance dependency relations.
Dislog is further motivated and illustrated in the next sections devoted to argument analysis. There are at the moment a few well-known and widely used language-processing environments.
They are essentially used for sentence processing, not for discourse analysis. The reasons are essentially that the sentence level and its substructures are the crucial level of analysis for a large number of applications such as information extraction, opinion analysis based on noun modifiers or machine translation.
Discourse analysis turns out to be less critical for these applications. However, applications such as summarisation (Marcu) or question answering do require an intensive level of discourse analysis, as shown in Jin and Simmons. Among platforms dedicated to sentence processing, let us note the GATE platform (http://…). Besides some specific features for simple aspects of discourse processing, none of these platforms allows the specification of rules for extensive discourse analysis, nor the introduction of reasoning aspects, which is essential for bringing pragmatic considerations into discourse processing.
GATE is used, for example, in information-extraction applications. It also includes research on audio-visual and language connections. Linguastream has components mainly for part-of-speech tagging and syntactic analysis. It also handles several types of semantic data with a convenient modular approach.
It is widely used for corpus analysis. Finally, Marcu developed a discourse analyser for the purpose of automatic summarisation. This system is based on the RST assumptions, which are not always met in texts, as developed in the section below.
From a very different perspective, and also inspired by sentence syntax, two approaches based on Tree Adjoining Grammars (TAGs) (Gardent; Webber) extend the TAG formalism to the processing of discourse structures via tree-anchoring mechanisms.
The approach remains essentially lexically based and is aimed at connecting propositions related by various discourse connectors, or at relating text spans which stand in a referential relation. Some linguistic considerations. Most works dedicated to discourse analysis have to deal with the triad of function, textual realisation, and identification marks. By function, we mean a kernel or a satellite of a rhetorical relation, e.g. the conclusion or the support of an argument. Functions are realised by textual structures which need to be accurately delimited. Functions are not stand-alone. In general, the recognition of satellite functions is easier than the recognition of their corresponding kernel(s) because they are more strongly marked.
For example, it is quite straightforward to recognise an illustration (although we defined 20 generic rules that describe this structure), but identifying the exact text span which is its kernel (i.e. the unit it illustrates) is much more difficult.
Similarly, the support of an argument is marked much more explicitly than its conclusion. When identifying discourse structures in texts, and in particular when attempting to identify arguments, finding their textual boundaries is very challenging. In addition, contrary to the assumptions of RST (e.g. Grosz and Sidner; Marcu), partly overlapping textual units can be involved in different discourse relations. For example, argument conclusions and supports are not necessarily contiguous. We also observed a number of one-to-many relations besides the standard one-to-one relations. As a consequence, the principle of textual contiguity cannot be applied systematically.
For that purpose, we have developed a principle called selective binding, which is also found in formal syntax to deal with long-distance dependencies or movement theory. The necessity of a modular approach, where each aspect of discourse analysis is dealt with accurately in a specific module while keeping open all possibilities of interaction or concurrency between modules, has led us to adopt some simple elements of the generative syntax model (a good synthesis is given in Lasnik and Uriagereka), which shall be discussed later. Another foundational feature is an integrated view of the marks used to identify discourse functions, merging lexical objects with morphological functions, typography and punctuation, syntactic constructs, semantic features, and inferential patterns that capture various forms of knowledge (domain, lexical, textual).
While machine learning is a possible approach for sentence processing, where interesting results have emerged, it seems not to be as successful for discourse analysis (e.g. Carlson, Marcu, and Okurowski). This is due to two main factors. For these reasons, we adopted a rule-based approach. Rules are hand-coded, based on corpus analysis using bootstrapping tools. Dislog rules basically implement the productive principles. They are composed of three main parts; this is developed in Section 2. More complex representations, e.g. conceptual ones, are also possible. This is of much interest since our analysis is oriented towards a conceptual analysis of discourse and, in particular, the semantic aspects of arguments.
Besides rules, Dislog allows the specification of a number of restrictive principles. The structure of Dislog rules. Let us now introduce in more depth the structure of Dislog rules. Dislog follows the principles of logic-based grammars as implemented three decades ago in a series of formalisms, most notably Definite Clause Grammars. These formalisms were all designed for sentence parsing, with an implementation in Prolog via a meta-interpreter or a direct translation into Prolog (Saint-Dizier). The last two of these formalisms include a simple device to deal with long-distance dependencies.
Various processing strategies have been investigated, in particular bottom-up parsing, parallel parsing, constraint-based parsing, and an implementation of the Earley algorithm that merges bottom-up analysis with top-down predictions. These systems have been used in applications with reasonable efficiency and real flexibility with respect to updates. Dislog adapts and extends these grammar formalisms to discourse processing; it also extends the regular-expression format which is often used as a basis in language-processing tools.
The rule system of Dislog is viewed as a set of productive principles. A rule in Dislog has the following general form, globally quite close in spirit to Definite Clause Grammars: … The representation can also be a partial dependency structure or a more formal representation. Predicates are included between curly brackets, as in logic grammars, to differentiate them from grammar symbols. R is a finite sequence of elements; these are used to capture various forms of generalisation, facilitating rule authoring and updates.
Non-terminal symbols do not include discourse-structure symbols: Dislog rules cannot call each other; this is dealt with by the selective-binding principle, which includes additional controls. A rule in Dislog thus basically encodes the recognition of a discourse function taken in isolation. A gap can appear only between terminal, preterminal, or non-terminal symbols.
Dislog offers the possibility of specifying, in a gap, a list of elements which must not be skipped; the length of the skipped string can also be controlled. As in DCGs and Prolog clause systems, it is possible, and often necessary, to have several rules describing the different realisations of a given discourse function. These all have the same identifier L, as is the case, for example, for Prolog clauses defining the same predicate. A set of rules with the same identifier is called a cluster of rules.
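The gap mechanism can be illustrated with a simple matcher over token sequences: a pattern mixes literal tokens with gaps that skip a bounded number of tokens while excluding some tokens from being skipped. The encoding below is our own illustrative one, not actual Dislog syntax:

```python
# Illustrative gap-pattern matcher, loosely inspired by Dislog gaps;
# the encoding below is NOT actual Dislog syntax. A pattern element is
# either a literal token or ("GAP", max_len, forbidden_set).
def match(pattern, tokens, i=0):
    if not pattern:
        return i                                   # index just past the match
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, tuple) and head[0] == "GAP":
        _, max_len, forbidden = head
        for skip in range(max_len + 1):            # try every gap length
            span = tokens[i:i + skip]
            if len(span) < skip or forbidden & set(span):
                break                              # ran out, or hit a forbidden token
            end = match(rest, tokens, i + skip)
            if end is not None:
                return end
        return None
    if i < len(tokens) and tokens[i] == head:
        return match(rest, tokens, i + 1)
    return None

# "because <gap of up to 3 tokens, no sentence break> necessary"
pattern = ["because", ("GAP", 3, {"."}), "necessary"]
tokens = "because it is absolutely necessary".split()
```

Here the forbidden set plays the role of the "must not be skipped" elements, and `max_len` bounds the length of the skipped string, mirroring the two controls described above.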
Dislog advanced features. In this section, we describe the features offered by the Dislog language that complement the grammar rule system. These mostly play the role of restrictive principles. At the moment we have three sets of devices. Concurrency statements are closely related to the cascade system.
They are constrained by the notion of bounding node, which delimits the text portion in which discourse units can be bound. Similarities with formal sentence syntax are outlined where appropriate; however, the phenomena are substantially different. Selective-binding rules. Selective-binding rules are the means offered by Dislog to construct hierarchical discourse structures from the elementary ones identified by the rule system.
Selective-binding rules allow us to link two or more already identified discourse functions. Moreover, the ethical problems in recording and making public what was produced in a private setting and intended for a limited audience are encountered more frequently in dealing with spoken texts than with written ones. Speech also poses difficult structural problems.
Unlike a written text, a speech event takes place in time. Its beginning and end may be hard to determine and its internal composition difficult to define. Most researchers agree that the utterances or turns of individual speakers form an important structural component in most kinds of speech, but these are rarely as well-behaved, in the structural sense, as paragraphs or other analogous units in written texts. Speech itself, though it may be represented as words, frequently contains items such as vocalized pauses which, although only semi-lexical, have immense importance in the analysis of spoken text.
Even non-vocal elements such as gestures may be regarded as forming a component of spoken text for some analytic purposes. Below the level of the individual utterance, speech may be segmented into units defined by phonological, prosodic, or syntactic phenomena; no clear agreement exists, however, even as to appropriate names for such segments. Spoken texts transcribed according to the guidelines presented here are organized as follows.
Even texts primarily composed of transcribed speech may also include conventional front and back matter, and may even be organized into divisions like printed texts. We may say, therefore, that these Guidelines regard transcribed speech as being composed of arbitrary high-level units called texts. A spoken text might typically be a conversation between a small number of people, a lecture, a broadcast TV item, or a similar event.