This paper is developed around two major arguments.

Idioms are a kind of idiomatic expression in Chinese, most of which consist of four Chinese characters. Because of their non-compositionality and metaphorical meaning, Chinese idioms are hard for children and non-native speakers to understand. This study proposes a novel task, Chinese Idiom Paraphrasing (CIP). CIP aims to rephrase sentences that contain idioms into non-idiomatic ones while preserving the original sentence's meaning. Since sentences without idioms are easier for Chinese NLP systems to handle, CIP can be used to pre-process Chinese datasets, thereby improving the performance of Chinese NLP tasks such as machine translation, Chinese idiom cloze, and Chinese idiom embeddings. In this study, the CIP task is treated as a special paraphrase generation task. To circumvent difficulties in acquiring annotations, we first establish a large-scale CIP dataset based on human and machine collaboration, which consists of 115,530 sentence pairs. We further deploy three baselines and two novel CIP approaches to deal with CIP problems. The results show that the proposed methods perform better than the baselines on the established CIP dataset.

This paper presents a new linguistic resource for the generation of paraphrases in Portuguese, based on the lexicon-grammar framework. The resource components include: (i) a lexicon-grammar based dictionary of 2100 predicate nouns co-occurring with the support verb ser de ‘be of’, such as in ser de uma ajuda inestimável ‘be of invaluable help’; (ii) a lexicon-grammar based dictionary of 6000 predicate nouns co-occurring with the support verb fazer ‘do’ or ‘make’, such as in fazer uma comparação ‘make a comparison’; and (iii) a lexicon-grammar based dictionary of about 5000 human intransitive adjectives co-occurring with the copula verbs ser and/or estar ‘be’, such as in ser simpático ‘be kind’ or estar entusiasmado ‘be enthusiastic’. A set of local grammars explores the properties described in the linguistic resources, enabling a variety of text transformation tasks for paraphrasing applications. The paper highlights the different complementary and synergistic components and integration efforts, and presents some preliminary evaluation results on the inclusion of such resources in the eSPERTo paraphrase generation system.

Cross-lingual textual entailment is a relatively new problem that detects the entailment relationship between two text fragments written in different languages. Previous work adopted machine learning algorithms and similarity measures as features to address this task. To overcome the high cost of human annotation and further improve recognition performance, we present a novel co-training approach to this problem. We first use an off-the-shelf machine translation tool to eliminate the language gap between the two texts. Then we measure the similarities and differences between the two texts and regard them as sufficient and redundant views. We use those two views to conduct the co-training procedure to perform classification. In addition, a new effective Kullback-Leibler (KL) based criterion is proposed to select the results from all possible iterations. Experiments on cross-lingual datasets provided by SemEval 2013 show that our method significantly outperforms the baseline systems and previous work.
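The co-training procedure described in the cross-lingual textual entailment abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`co_train`, `fit_centroids`, `predict`, `kl_divergence`) are hypothetical, a toy nearest-centroid classifier stands in for whatever learner the authors used, and the two views are plain feature vectors rather than real similarity and difference features. The abstract does not say how the KL-based criterion is computed, so the sketch simply scores each iteration by the KL divergence between the label distribution after that iteration and the label distribution of the initial seed set.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def fit_centroids(X, y):
    """Toy stand-in classifier: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(centroids, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(x, m)) for c, m in centroids.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())

def label_distribution(labels, classes):
    """Relative frequency of each class in a list of labels."""
    return [labels.count(c) / len(labels) for c in classes]

def co_train(view1, view2, labels, n_rounds=5, per_round=1):
    """Co-training over two views; labels holds a class name or None (unlabeled).

    Returns one (kl_score, labels_snapshot) pair per completed round.
    """
    labels = list(labels)
    classes = sorted({l for l in labels if l is not None})
    seed_dist = label_distribution([l for l in labels if l is not None], classes)
    history = []
    for _ in range(n_rounds):
        unlabeled = [i for i, l in enumerate(labels) if l is None]
        if not unlabeled:
            break
        labeled = [i for i, l in enumerate(labels) if l is not None]
        newly_labeled = {}
        for view in (view1, view2):
            # Train one classifier per view on the current labeled pool.
            model = fit_centroids([view[i] for i in labeled], [labels[i] for i in labeled])
            scored = sorted(((predict(model, view[i]), i) for i in unlabeled),
                            key=lambda t: -t[0][1])
            # Each view contributes its most confident predictions to the shared pool.
            for (label, _confidence), i in scored[:per_round]:
                newly_labeled.setdefault(i, label)
        for i, label in newly_labeled.items():
            labels[i] = label
        current = label_distribution([l for l in labels if l is not None], classes)
        # Assumed selection score; the paper's actual KL criterion may differ.
        history.append((kl_divergence(current, seed_dist), list(labels)))
    return history

# Toy data: two 1-D views of six text pairs, two of them labeled.
view1 = [[0.0], [1.0], [0.1], [0.9], [0.2], [0.8]]
view2 = [[0.0], [1.0], [0.2], [0.8], [0.1], [0.9]]
seed = ["no", "yes", None, None, None, None]
history = co_train(view1, view2, seed, n_rounds=3, per_round=2)
best_score, best_labels = min(history, key=lambda h: h[0])
```

Picking the iteration with the lowest score via `min` is only one plausible reading of "select the results from all possible iterations"; a different reference distribution or a held-out set could be substituted without changing the loop.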