Guiding Theme B2: Content comparison through large-scale inference and reasoning

Guiding Theme B2 deals with the pairwise comparison and evaluation of statements, e.g. regarding their similarity or entailment. This is an important step towards multi-document summarization, since it makes it possible to distinguish between similar parts of different documents, which should be summarized together, and independent parts of different documents, which cannot be summarized as one. Because predicting the similarity or entailment of statements is difficult, it requires additional background knowledge, which can be learned or extracted at large scale.
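The grouping step described above can be sketched in Python. This is a minimal sketch under explicit assumptions: `token_overlap` is a hypothetical, naive word-overlap (Jaccard) similarity that merely stands in for the knowledge-informed similarity or entailment models this theme targets, and the threshold of 0.5 is an arbitrary illustrative value. All names are illustrative, not part of any existing system.

```python
from itertools import combinations
from typing import Callable, List


def token_overlap(a: str, b: str) -> float:
    """Hypothetical placeholder similarity: Jaccard overlap of lowercased word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def group_statements(
    statements: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Merge statements whose pairwise similarity reaches `threshold` (union-find)."""
    parent = list(range(len(statements)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; similar statements end up in the same group.
    for i, j in combinations(range(len(statements)), 2):
        if similarity(statements[i], statements[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect the groups in order of first appearance.
    groups = {}
    for i, statement in enumerate(statements):
        groups.setdefault(find(i), []).append(statement)
    return list(groups.values())


statements = [
    "the council approved the new budget",         # from document A
    "the new budget was approved by the council",  # from document B
    "heavy rain flooded the river valley",         # from document C
]
# Prints two groups: the two budget statements together, the flood statement alone.
print(group_statements(statements, token_overlap))
```

Union-find makes the grouping transitive (if A is similar to B and B to C, all three are grouped), which is a simplifying design choice; real similarity judgments are not transitive, and a surface-overlap measure like this placeholder is exactly what the theme considers insufficient without background knowledge.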

Example Ph.D. project

This guiding theme conducts innovative research on new methods for tackling difficult pairwise statement classification tasks. Example tasks are textual entailment and natural language inference, where two statements are classified regarding whether or not one entails the other, or argument reasoning, where an argument (component) is classified as supporting or attacking another argument (component). For these tasks it is not sufficient to check the overlap or similarity between the two statements – rather, they require reasoning based on background knowledge. In order to apply reasoning to natural language statements in combination with knowledge bases, two connected problems have to be investigated:

  1. Finding a suitable representation of the two statements that can be used in combination with an appropriate reasoning paradigm. This may be a logical representation, e.g. the parse structures of the statements obtained with semantic parsing methods, or a numerical representation in the form of distributional semantics, such as word or sentence vectors. Accordingly, either purely logical or probabilistic reasoning methods may be applied.
  2. Choosing a knowledge base or a combination of knowledge bases, e.g. Wikidata, DBpedia, Freebase, WordNet, etc., that provides background knowledge usable as a resource for the chosen reasoning paradigm.
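To illustrate the numerical option in point 1, the sketch below represents each statement as the average of its word vectors and compares statements by cosine similarity. It is a minimal sketch, assuming hypothetical, hand-picked three-dimensional vectors in `TOY_VECTORS`; a real system would load pretrained embeddings instead, and the function names are illustrative only.

```python
import math
from typing import Dict, List

# Hypothetical 3-dimensional toy embeddings (not taken from any real model).
TOY_VECTORS: Dict[str, List[float]] = {
    "dog": [0.9, 0.1, 0.0],
    "animal": [0.8, 0.2, 0.1],
    "barks": [0.1, 0.9, 0.0],
    "makes": [0.0, 0.5, 0.5],
    "noise": [0.1, 0.8, 0.2],
    "stock": [0.0, 0.0, 1.0],
    "prices": [0.1, 0.0, 0.9],
    "fell": [0.0, 0.3, 0.7],
}
DIM = 3


def sentence_vector(sentence: str) -> List[float]:
    """Average the vectors of known words; unknown words (e.g. 'the') are skipped."""
    vectors = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


premise = sentence_vector("the dog barks")
print(cosine(premise, sentence_vector("an animal makes noise")))  # about 0.88
print(cosine(premise, sentence_vector("stock prices fell")))      # about 0.11
```

Cosine similarity is symmetric, so on its own it cannot express the direction of entailment (A entailing B does not imply B entailing A). This is one concrete reason why similarity alone is not sufficient for these tasks.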

In addition to creating a new state-of-the-art approach for pairwise classification problems by injecting additional knowledge and applying reasoning methods, the Ph.D. project aims to make the classification explainable, for example by explaining why one statement entails another in terms of the reasoning steps and the background knowledge involved.
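The combination of a knowledge base with explainable reasoning can be illustrated with a deliberately simplified sketch. It assumes a hypothetical toy knowledge base `IS_A` of hypernym (is-a) links standing in for a resource such as WordNet, and a hypothetical entailment rule: the hypothesis must equal the premise except for one word that the knowledge base allows us to generalize. The returned chain of is-a steps plays the role of the explanation.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical toy knowledge base of "is-a" (hypernym) links.
IS_A: Dict[str, List[str]] = {
    "poodle": ["dog"],
    "dog": ["canine", "pet"],
    "canine": ["mammal"],
    "mammal": ["animal"],
}


def hypernym_path(specific: str, general: str) -> Optional[List[Tuple[str, str]]]:
    """Breadth-first search for a chain of is-a links from `specific` up to `general`."""
    queue = deque([(specific, [])])
    seen = {specific}
    while queue:
        node, path = queue.popleft()
        if node == general:
            return path
        for parent in IS_A.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, path + [(node, parent)]))
    return None


def explain_entailment(premise: str, hypothesis: str) -> Optional[List[str]]:
    """Return the reasoning steps if the premise entails the hypothesis under the
    single-substitution rule, [] for identical statements, or None otherwise."""
    p_words, h_words = premise.lower().split(), hypothesis.lower().split()
    if len(p_words) != len(h_words):
        return None
    diffs = [(p, h) for p, h in zip(p_words, h_words) if p != h]
    if not diffs:
        return []  # identical statements: no background knowledge needed
    if len(diffs) != 1:
        return None
    path = hypernym_path(*diffs[0])
    if path is None:
        return None
    return [f"{child} is-a {parent}" for child, parent in path]


print(explain_entailment("a poodle barks", "a mammal barks"))
# ['poodle is-a dog', 'dog is-a canine', 'canine is-a mammal']
print(explain_entailment("a mammal barks", "a poodle barks"))
# None: the knowledge base does not license specializing a word
```

The substitution rule only holds in upward-monotone contexts: "a poodle barks" entails "a mammal barks", but "no poodle barks" does not entail "no mammal barks". A realistic system would need the richer representations and reasoning paradigms discussed in this section.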

This project provides opportunities for collaboration with research area C (especially semantic role labelling in C3) as well as with the guiding themes A2 in terms of semantic relation extraction and D1 in terms of stance classification.

Research results of the first Ph.D. cohort

Guiding theme B2 investigates content selection methods for multi-document summarization. Content selection is the first step in summarizing documents, whether the summary is abstractive or extractive; its goal is to identify and extract the key elements of the source documents. We focused on optimization-based content selection, as it yields very promising results and allows close cooperation with guiding theme D2, which is concerned with analyzing and defining suitable evaluation metrics for multi-document summarization.

In particular, we formulated content selection as an optimization problem: the goal is to choose a set of information nuggets that have certain desired properties while meeting a length constraint. The objective function of the optimization should approximate the quality judgment of a summary as closely as possible. To this end, we developed objective functions that approximate the ROUGE metric (Peyrard and Eckle-Kohler, 2016a) and the Jensen-Shannon divergence (Peyrard and Eckle-Kohler, 2016b). We also explored the use of genetic algorithms to generate training data (Peyrard and Eckle-Kohler, 2017a), which we used to optimize towards the recent Automatic Pyramid metric (Peyrard and Eckle-Kohler, 2017b) and, in a collaboration with C3, towards human judgments (Peyrard et al., 2017).
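The optimization view of content selection can be sketched in a few lines: choose a subset of source sentences whose word distribution has minimal Jensen-Shannon divergence from that of the source documents, subject to a word budget. This is a minimal illustration only. It uses a simple greedy heuristic in place of the genetic algorithms and swarm intelligence methods of the cited work, treats plain words as the information nuggets, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def word_distribution(texts: List[str]) -> Dict[str, float]:
    """Relative frequency of each lowercased word across `texts`."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    divergence = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution M = (P + Q) / 2
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mw)
    return divergence


def greedy_select(sentences: List[str], budget: int) -> List[str]:
    """Greedily add the sentence that most lowers the JS divergence to the source
    while the summary stays within `budget` words; stop when nothing helps."""
    source = word_distribution(sentences)
    summary: List[str] = []
    length = 0
    best_score = 1.0  # the empty summary is treated as maximally divergent
    while True:
        best = None
        for s in sentences:
            n = len(s.split())
            if s in summary or length + n > budget:
                continue
            score = js_divergence(word_distribution(summary + [s]), source)
            if score < best_score:
                best_score, best = score, s
        if best is None:
            return summary
        summary.append(best)
        length += len(best.split())


docs = [
    "the storm hit the coast on monday",
    "the storm damaged homes on the coast",
    "officials said schools will reopen on friday",
]
print(greedy_select(docs, budget=14))
```

Greedy selection is fast but can miss better combinations of sentences; population-based methods such as genetic algorithms explore many candidate summaries at once, at a higher computational cost.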


Maxime Peyrard, Judith Eckle-Kohler. (2016a). Optimizing an Approximation of ROUGE - a Problem-Reduction Approach to Extractive Multi-Document Summarization, In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1825-1836. Berlin, Germany.

Maxime Peyrard, Judith Eckle-Kohler. (2016b). A General Optimization Framework for Multi-Document Summarization Using Genetic Algorithms and Swarm Intelligence, In: Proceedings of the 26th International Conference on Computational Linguistics (COLING), pp. 247-257. Osaka, Japan.

Maxime Peyrard, Judith Eckle-Kohler. (2017a). A Principled Framework for Evaluating Summarizers: Comparing Models of Summary Quality against Human Judgments, In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 26-31. Vancouver, Canada.

Maxime Peyrard, Judith Eckle-Kohler. (2017b). Supervised Learning of Automatic Pyramid for Optimisation-Based Multi-Document Summarisation, In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1084-1094. Vancouver, Canada.

Maxime Peyrard, Teresa Botschen, Iryna Gurevych. (2017). Learning to Score System Summaries for Better Content Selection Evaluation, In: Proceedings of the EMNLP Workshop "New Frontiers in Summarization", pp. 34-44. Copenhagen, Denmark.
