Guiding Theme B3: Data-driven paraphrasing and harmonization of language style

When a summary is compiled from heterogeneous sources, the resulting text must be homogenized with respect to style, genre, and text quality. For example, an executive summary requires a different vocabulary and style than a topical survey for internal use in a research group. In this guiding theme, we use paraphrasing techniques and language modelling to measure and unify stylistic characteristics. We will implement a writing aid that learns from explicit feedback and user interaction.
A further subject of research is the generation of unsupervised statistical features for discourse processing and the induction of lexical resources from large unannotated corpora. We expect higher user acceptance of summaries that are targeted at specific genres than of summaries produced by approaches that ignore genre information.

Poster (in German)

Example thesis topics

  • Machine learning for interactive writing aids
  • Data-driven paraphrasing for style homogenization
  • Adaptive generation of semantic information from domain-specific corpora


Publications

  • Remus, S., and Biemann, C. (2013): Three Knowledge-Free Methods for Automatic Lexical Chain Extraction. Proceedings of NAACL-2013, Atlanta, GA, USA (pdf)
  • Szarvas, G., Biemann, C., and Gurevych, I. (2013): Supervised All-Words Lexical Substitution using Delexicalized Features. Proceedings of NAACL-2013, Atlanta, GA, USA (pdf)
  • Biemann, C. (2011): Structure Discovery in Natural Language. In G. Hirst, E. Hovy and M. Johnson (Series Eds.): Theory and Applications of Natural Language Processing, Springer Heidelberg Dordrecht London New York (eBook link)

