Guiding Theme C2: Methods for contextual and constraint-based ranking

The goal of this guiding theme is the development of suitable ranking algorithms that will be used in other parts of the project, with a particular focus on multi-document summarization. Possible thesis topics relate to ranking problems in multi-document summarization.


The first key task is to support multi-document summarization by ranking sentences that can be selected for a summary (cf. B2). A key challenge is to find a suitable calibration point that separates relevant from irrelevant sentences and thus allows the selection process to terminate once all relevant sentences have been added. To this end, we intend to adapt methods that have been developed for label ranking (Hüllermeier & Fürnkranz, 2010) to object ranking problems. Moreover, rankings will be computed dynamically: when a sentence is selected for inclusion in the summary, this may change the ranking of other sentences, since some of them may become redundant once the information they carry has been included.
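The interplay of a calibration point and dynamic re-ranking can be illustrated with a minimal sketch. The following is not the project's actual algorithm; the threshold, the redundancy weight, and the Jaccard-overlap redundancy measure are all hypothetical placeholders for illustration.

```python
# Sketch of dynamic sentence ranking with a calibration threshold
# (illustrative only; not the project's actual method).
# After each selection, the remaining sentences are re-ranked with a
# redundancy penalty, and selection stops once the best adjusted score
# falls below the calibration threshold.

def overlap(a, b):
    """Jaccard word overlap as a simple redundancy proxy."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def dynamic_rank_select(sentences, importance, threshold=0.4,
                        redundancy_weight=0.7):
    summary = []
    remaining = dict(zip(sentences, importance))
    while remaining:
        # re-rank: penalize sentences redundant with the summary so far
        def adjusted(s):
            max_red = max((overlap(s, t) for t in summary), default=0.0)
            return remaining[s] - redundancy_weight * max_red
        best = max(remaining, key=adjusted)
        if adjusted(best) < threshold:  # calibration point reached
            break
        summary.append(best)
        del remaining[best]
    return summary

summary = dynamic_rank_select(
    ["the cat sat on the mat", "a cat sat on a mat", "dogs bark loudly"],
    [0.9, 0.85, 0.5])
```

In this toy run, the near-duplicate second sentence drops below the threshold after the first is selected, so the process terminates with two sentences rather than exhausting the pool.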

The second key problem is to develop techniques that combine multiple local rankings into an overall ranking that respects a given set of global constraints. Ranking problems occur at different levels of a language processing chain, such as the semantic level, where multiple meanings of a word must be ranked, or the discourse level, where common entities and complex event structures have to be identified across heterogeneous documents (cf. A1 and A2). These ranking problems can be solved in isolation, but they must also respect global constraints.
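One simple way to make this concrete is Borda-count aggregation of the local rankings, followed by a greedy pass that respects precedence constraints. This is a sketch under assumed conventions (Borda scoring, acyclic constraints of the form "x before y"), not the project's method.

```python
# Illustrative sketch: combine several local rankings into one overall
# ranking via Borda counting, then emit items greedily while respecting
# global precedence constraints "x must appear before y".
# Assumes the constraint set is acyclic.

def combine_rankings(local_rankings, precedence):
    items = set().union(*map(set, local_rankings))
    # Borda score: an item earns (n - position) points per local ranking
    score = {x: 0 for x in items}
    for ranking in local_rankings:
        n = len(ranking)
        for pos, x in enumerate(ranking):
            score[x] += n - pos
    result, remaining = [], set(items)
    while remaining:
        # only items whose required predecessors are already placed
        ready = [x for x in remaining
                 if all(p in result for (p, q) in precedence if q == x)]
        best = max(ready, key=lambda x: score[x])
        result.append(best)
        remaining.remove(best)
    return result

order = combine_rankings(
    [["a", "b", "c"], ["b", "a", "c"], ["a", "b", "c"]],
    precedence=[("c", "b")])
```

Here the pure Borda order would be a, b, c, but the global constraint that "c" precede "b" forces the overall ranking a, c, b.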

Research results of the first Ph.D. cohort

In guiding theme C2, the main focus was the identification of important information units using machine learning and ranking methods. In particular, we developed supervised and unsupervised machine learning methods for estimating the intrinsic importance of text units and used them as the backbone of the CPSum summarization system. Unlike conventional approaches, CPSum does not rely on centrality or structural features as indicators of information importance, but learns to rank sentences according to their perceived information importance directly from a background corpus (Zopf et al., 2016a). Furthermore, we developed a methodology to evaluate automatically generated summaries without reference summaries (Zopf, 2018). The basic summarization algorithm of CPSum, which generates contextual rankings of sentences to address both importance and redundancy jointly (Zopf, 2015; Zopf et al., 2016b), can not only learn information importance but also use a wide variety of annotation types contributed by other guiding themes, such as named entities (A1), events and relations between them (A2), opinions (A3), concepts (B2), motifs (C1), and frames (C3), to estimate the importance of text units. A joint publication that integrates the results from these work packages into our summarization system is currently under review.
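The general idea of learning sentence importance from pairwise preferences can be sketched as follows. The features and the perceptron-style update rule below are hypothetical choices for illustration, not CPSum's actual model or feature set.

```python
# Minimal sketch of learning a sentence-importance scorer from pairwise
# preferences (illustrative toy features and update rule, not CPSum).

def featurize(sentence):
    words = sentence.lower().split()
    # toy features: length and type/token ratio
    return [len(words), len(set(words)) / len(words)]

def train_pairwise(preferences, epochs=50, lr=0.1):
    """preferences: list of (better_sentence, worse_sentence) pairs."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for better, worse in preferences:
            fb, fw = featurize(better), featurize(worse)
            diff = [a - b for a, b in zip(fb, fw)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:  # misranked pair: perceptron-style update
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def score(w, sentence):
    return sum(wi * fi for wi, fi in zip(w, featurize(sentence)))

prefs = [("a b c d e", "a b"), ("one two three four", "one two")]
w = train_pairwise(prefs)
```

Once trained, `score` induces a ranking over candidate sentences; in the toy data above, longer sentences are preferred, so the learned weights score them higher.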


  1. Markus Zopf, Eneldo Loza Mencía, Johannes Fürnkranz. Which Scores to Predict in Sentence Regression for Text Summarization? In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, p. (to appear), June 2018.
  2. Markus Zopf. Estimating Summary Quality with Pairwise Preferences. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, p. (to appear), June 2018, Association for Computational Linguistics.
  3. Markus Zopf. auto-hMDS: Automatic Construction of a Large Heterogeneous Multilingual Multi-Document Summarization Corpus. In: Proceedings of the 11th edition of the Language Resources and Evaluation Conference (LREC 2018), p. (to appear), May 2018.
  4. Markus Zopf, Maxime Peyrard, Judith Eckle-Kohler. The Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach. In: Proceedings of the 26th International Conference on Computational Linguistics, p. 1535--1545, December 2016.
  5. Markus Zopf. SeqCluSum: Combining Sequential Clustering and Contextual Importance Measuring to Summarize Developing Events over Time. In: Proceedings of the 24th Text Retrieval Conference, November 2015, National Institute of Standards and Technology.
  6. Markus Zopf, Eneldo Loza Mencía, Johannes Fürnkranz. Beyond Centrality and Structural Features: Learning Information Importance for Text Summarization. In: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL 2016), p. 84--94, August 2016, Association for Computational Linguistics.
  7. Markus Zopf, Eneldo Loza Mencía, Johannes Fürnkranz. Sequential Clustering and Contextual Importance Measures for Incremental Update Summarization. In: Proceedings of the 26th International Conference on Computational Linguistics, p. 1071--1082, December 2016.
  8. Fürnkranz, J. and Hüllermeier, E., editors (2011). Preference Learning. Springer-Verlag.
  9. Fürnkranz, J., Hüllermeier, E., Loza Mencía, E., and Brinker, K. (2008). Multilabel Classification via Calibrated Label Ranking. Machine Learning, 73(2):133–153.
  10. Hüllermeier, E. and Fürnkranz, J. (2010). On Predictive Accuracy and Risk Minimization in Pairwise Label Ranking. Journal of Computer and System Sciences, 76(1):49–62.
  11. Hüllermeier, E., Fürnkranz, J., Cheng, W., and Brinker, K. (2008). Label Ranking by Learning Pairwise Preferences. Artificial Intelligence, 172(16-17):1897–1916.