Guiding Theme D2: Manual and Automatic Quality Assessment of Summaries from Heterogeneous Sources

Guiding theme D2 conducts research on quality criteria and novel methods for assessing the results of summarization. Existing methods for manual quality assessment of summaries are time-consuming and require multiple people to read and judge the summaries. These methods include Likert-scale ratings and content-based evaluations such as the Pyramid method. Several automatic methods have also been developed. ROUGE, for example, evaluates summaries based on n-gram overlap with reference summaries, but this requires manually created summaries for comparison. Moreover, ROUGE benefits from having several reference summaries, which further increases the effort involved in applying it. Methods that do not rely on reference summaries have also been proposed, but so far they have not covered certain quality criteria such as readability or linguistic quality.
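To make the n-gram overlap idea concrete, the following is a minimal sketch of ROUGE-N recall with several reference summaries, where matched n-gram counts and reference n-gram counts are pooled across all references. It assumes simple lowercased whitespace tokenization and omits the stemming, stopword and other options of the official ROUGE toolkit; the function names are hypothetical and for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """Pooled ROUGE-N recall of a candidate summary against several references.

    Assumption: lowercased whitespace tokenization, no stemming or stopword removal.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clipped matches: a reference n-gram is credited at most as many
        # times as it occurs in the candidate summary.
        matched += sum(min(count, cand_counts[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


candidate = "the cat sat on the mat"
references = ["the cat was on the mat", "a cat sat on a mat"]
print(rouge_n_recall(candidate, references, n=1))  # 0.75
print(rouge_n_recall(candidate, references, n=2))  # 0.5
```

Because the denominator sums over every reference, each additional reference summary gives the metric more acceptable wordings to match against, which is why ROUGE benefits from several references, and also why each one adds manual effort.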

A PhD project in this area will therefore focus on creating a comprehensive framework for the evaluation of summaries. The framework should include a wide range of criteria, including linguistic criteria and indicators derived from user interaction with the system. The quality assessment framework will be developed in close collaboration with guiding theme D1: Modeling Information Quality in Online Scenarios. The student will also collaborate closely with practitioners in online journalism to formalize the quality criteria. A recent paper on quality assessment without reference summaries is Louis, A. and Nenkova, A. (2013). Automatically Assessing Machine Summary Content Without a Gold Standard. Computational Linguistics, 39(2):267–300.
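One reference-free signal examined in that line of work compares the word distribution of a summary with the word distribution of its input documents: a lower Jensen-Shannon divergence suggests that the summary reflects the content of the input more closely. The sketch below illustrates this idea under simplifying assumptions (lowercased whitespace tokenization, unsmoothed relative frequencies, base-2 logarithms so the value lies between 0 and 1); the preprocessing and smoothing used in the paper may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def word_distribution(text):
    """Relative word frequencies (assumption: lowercased whitespace tokens)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl_divergence(p, m):
    """KL(p || m) in bits; m must be non-zero wherever p is non-zero."""
    return sum(pw * math.log2(pw / m[w]) for w, pw in p.items() if pw > 0)


def js_divergence(source_text, summary_text):
    """Jensen-Shannon divergence between source and summary word distributions."""
    p = word_distribution(source_text)
    q = word_distribution(summary_text)
    vocab = set(p) | set(q)
    # The mixture distribution is non-zero wherever p or q is non-zero.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


source = "the storm hit the coast and the storm closed roads"
print(js_divergence(source, "the storm hit the coast"))      # well below 1
print(js_divergence(source, "markets rallied on friday"))    # 1.0, no shared words
```

Such a score needs only the input documents, not a human-written summary. That is its appeal, but it says nothing about readability or linguistic quality, which is the gap noted above.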

Finally, guiding theme D2 will work closely with other AIPHES members to create a reference corpus for evaluation and to further advance research in Natural Language Processing, specifically for German data.


Example thesis topics

  • Multi-Document Summarization: Manual and automatic assessment of quality
  • Task- and user-based assessment of quality of automatically created multi-document summaries
  • Language and genre independent evaluation metrics without reference summaries


Louis, A. and Nenkova, A. (2013). Automatically Assessing Machine Summary Content Without a Gold Standard. Computational Linguistics, 39(2):267–300.

Flekova, L., Ferschke, O. and Gurevych, I. (2014). What Makes a Good Biography? Multidimensional Quality Analysis Based on Wikipedia Article Feedback Data. In: Proceedings of the 23rd International World Wide Web Conference (WWW 2014), pp. 855–866.

