Dennis Aumiller

Hi, I am Dennis!

I am a researcher interested in

Dataset Curation

Text Summarization

Text Simplification

Python

Huggingface

Natural Language Processing

Professional Experience

Member of Technical Staff

Cohere
Philadelphia, USA (remote)

Sep 2023 - current

As part of the data and evaluation team, my main objectives are to improve Cohere’s primary generative model through a series of automated data transformations, as well as improvements to their evaluation harness.

Applied Scientist Intern

Amazon Research
Berlin, Germany

Aug 2021 - Dec 2021

Situated within the organization of Amazon Search, I was part of a newly formed team investigating the applicability of multilingual NLP solutions for search scenarios.

Responsibilities:

Investigated sequential recommendation systems for customer query suggestions
Built dataset and implemented a training/evaluation pipeline for neural search query generation
Found flaws for tail queries in the existing live system and reported preliminary improvements in recall with my own sequential recommender tool

Software Engineer (part-time)

Codefy GmbH
Heidelberg, Germany

Sept 2019 - Jan 2021

During my PhD, I was involved in a local start-up, building a document search engine for lawyers.

Responsibilities:

Built backend for product prototype, helping the company secure €200,000 in seed funding
Primary lead on document processing, developing a pipeline suitable for diverse document types
Optimized document database operators, improving ingestion time by over 30%
Built prototype for unsupervised keyphrase extraction module on legal documents

Software Engineering Intern

SAP SE
Walldorf, Germany

June 2018 - Sept 2018

At SAP, I was part of the Product & Innovations team, working on cloud-based ML solutions.

Responsibilities:

Optimization of Machine Learning operations with randomized algorithms
Achieved up to 1000x speed-up while maintaining comparable accuracy
Extensively benchmarked solutions to highlight performance discrepancies in existing tools

Heidelberg University

Heidelberg, Germany

Feb 2015 - Mar 2019

Throughout my studies, I worked at different groups within the Institute of Computer Science.

Teaching Assistant

Oct 2018 - Mar 2019

Teaching Assistant for the graduate lecture “Complex Network Analysis”
Additionally, helped design the assignments and final exam

Teaching Assistant

Apr 2016 - Sept 2017

Teaching Assistant for the lecture “Databases 1” in Summer 2016 and Summer 2017
Lecture Assistant for the graduate course “Computer Graphics” in Winter 2016/17
Prepared and held weekly tutorial sessions for students and graded assignments

Student Assistant

Feb 2015 - Mar 2016

Tasked with the integration of a group’s website into the new corporate identity template

Education

Database Systems Research Group, Heidelberg University

June 2019 - exp. 2023

PhD in Computer Science; Supervised by: Prof. Dr. Michael Gertz
Focus Area: Text Summarization and NLP
Publications: See the Publications section

Heidelberg University

Aug 2017 - May 2019

M.Sc. Applied Computer Science
German GPA: 1.0 (with distinction; equiv. GPA: 4.0)
Minor: Computational Linguistics
Focus Area: NLP and Network Analysis
Thesis: "Implementation of a Relational Document Hypergraph for Information Retrieval"; Grade: 1.0 (with distinction)

University of Toronto

Sept 2017 - Apr 2018

Exchange Year, Computer Science Program
CGPA: 3.95 out of 4.0
Focus Area: Machine Learning and Algorithmic Game Theory
Extracurricular Activities:
Executive Member, UofT eSports Club
Executive Member, Undergraduate AI Group

Heidelberg University

Oct 2013 - Aug 2017

B.Sc. Applied Computer Science
German GPA: 1.4 (equiv. GPA: 3.6)
Minor: Computational Linguistics
Focus Area: Computer Graphics and Visualization
Thesis: "Mining Relation Networks from University Websites"; Grade: 1.0 (with distinction)

Publications

Evaluating Factual Consistency of Texts with Semantic Role Labeling

*SEM @ ACL 2023 July 2023

Jing Fan* Dennis Aumiller* Michael Gertz

Automatically checking whether generated text is factually consistent with an input segment is still challenging. We present a method that goes against the recent trend of directly utilizing large language models to evaluate factuality, and instead propose a more linguistically grounded approach, based on Semantic Role Labels.

Text Summarization NLP

Details

On the State of German (Abstractive) Text Summarization

BTW 2023 Mar 2023

Dennis Aumiller Jing Fan Michael Gertz

Relative to the English NLP community, we find that the quality of German summarization datasets (and models) is heavily lacking; oftentimes, not even basic filtering criteria are respected when training and evaluating systems.

Text Summarization NLP

Details

UniHD at TSAR-2022 Shared Task: Is Compute All We Need for Lexical Simplification?

TSAR @ EMNLP 2022 Dec 2022

Dennis Aumiller Michael Gertz

We describe our winning submission to the shared task on lexical simplification. In principle, we extract structured predictions from GPT-3 generations, and introduce a novel way of aggregating predictions across multiple prompt templates to increase result coverage.

Text Simplification NLP Large Language Models

Details

EUR-Lex-Sum: A Multi-and Cross-lingual Dataset for Long-form Summarization in the Legal Domain

EMNLP 2022 Dec 2022

Dennis Aumiller* Ashish Chouhan* Michael Gertz

This work introduces a highly multilingual summarization corpus, available in all of the 24 official languages of the European Union. It is based on legal acts published by the EU, and consists of extremely long documents in the legal domain.

Text Summarization NLP

Details

Online DATEing: A Web Interface for Temporal Annotations

SIGIR 2022 July 2022

Dennis Aumiller* Satya Almasian* David Pohl Michael Gertz

We present an interactive interface, unifying the access to several temporal annotation frameworks. Aside from a graphical interface, we allow users to programmatically access the various tools through a streamlined API.

Temporal Tagging NLP

Details

Klexikon: A German Dataset for Joint Summarization and Simplification

LREC 2022 June 2022

Dennis Aumiller Michael Gertz

We present a high-quality resource of full-text alignments between German Wikipedia and a German children’s encyclopedia. This yields a dataset that we empirically show to be suited for both summarization and simplification tasks.

Text Summarization Text Simplification NLP

Details

Time for some German? Pre-Training a Transformer-based Temporal Tagger for German

Text2Story @ ECIR 2022 Apr 2022

Satya Almasian* Dennis Aumiller* Michael Gertz

Following a series of prior experiments in English, we outline a generalized training procedure for training a non-English temporal tagger with weakly supervised data. For this purpose, we utilize existing rule-based taggers as a way to scale up existing training resources in low-resource settings by several orders of magnitude.

Temporal Tagging NLP

Details

Deep Learning und Legal Tech - Eine Bestandsaufnahme

Legal Tech Zeitschrift (LTZ) 01/2022 Mar 2022

Michael Gertz Dennis Aumiller

In this (German) article, we outline the challenges that are currently preventing mainstream adoption of recent NLP advancements in the legal industry. Primarily, this can be attributed to a lack of proper domain generalization, as well as limited interpretability and scalability of such models.

Legal Tech NLP

BERT got a Date: Introducing Transformers to Temporal Tagging

arXiv Sept 2021

Satya Almasian* Dennis Aumiller* Michael Gertz

We experimented with various transformer-based architectures to see which ones would work best for extracting temporal annotations, such as ‘yesterday’ or ‘every week’. However, we have since found a significant flaw in our evaluation setup for seq2seq-based models, so we decided to retract this article. Resulting tagging-based models are still valid, though, and are available online.

Temporal Tagging NLP

Structural Text Segmentation of Legal Documents

ICAIL 2021 June 2021

Dennis Aumiller* Satya Almasian* Sebastian Lackner Michael Gertz

Utilizing existing segmentation tools, which primarily operate on sentence-level granularity, yields poor performance when segmenting long documents, which are prevalent in a legal context. In this work, we address the issue by proposing a weakly-supervised paragraph-based segmenter, whose effectiveness we empirically show on a novel dataset consisting of web Terms of Service documents.

Legal Tech Text Summarization NLP

Details

UniHD @ CL-SciSumm 2020: Citation Extraction as Search

SDP @ EMNLP 2020 Nov 2020

Dennis Aumiller* Satya Almasian* Philip Hausner* Michael Gertz

We participated in the workshop’s shared task on extracting relevant paper sections in cited works. Interestingly, we show that our setup based on traditional search heuristics, coupled with improved pre-processing steps, outperforms our BERT-based retrieval setup. Overall, we placed third on the blind shared task test set.

Information Retrieval NLP

Details

TiCCo: Time-Centric Content Exploration

CIKM 2020 Oct 2020

Philip Hausner* Dennis Aumiller* Michael Gertz

This demonstration illustrates a time-centric approach to content exploration. Extracting and processing temporal mentions on large document collections allows a temporal expression of events, even when the documents themselves are not ordered chronologically.

Information Retrieval Temporal Tagging NLP

Details

A Versatile Hypergraph Model for Document Collections

SSDBM 2020 July 2020

Andreas Spitz Dennis Aumiller Bálint Soproni Michael Gertz

Results from my Master’s thesis contributed to the experimental section of this work. Primarily, we present a theoretical retrieval model based on hypergraphs, and demonstrate that these operations can be utilized to perform common Information Retrieval operations more efficiently on co-occurrence based word networks than traditional dyadic graphs.

Information Retrieval NLP

Details

Time-centric Exploration of Court Documents

Text2Story @ ECIR 2020 Apr 2020

Philip Hausner Dennis Aumiller Michael Gertz

Ordering events in a chronological fashion requires an accurate modeling of the temporal hierarchy, which previously was not well-defined for long-term event horizons spanning several decades. Here, we present a temporal model that is shown to work well, even without explicit temporal ordering in underlying document collections.

Information Retrieval Temporal Tagging NLP

Details

DNA accessibility of chromatosomes quantified by automated image analysis of AFM data

Scientific Reports Sept 2019

Martin Würtz Dennis Aumiller Lina Gundelwein Philipp Jung Christian Schütz Kathrin Lehmann Katalin Tóth Karl Rohr

In a collaboration with molecular biosciences, we developed an automated image processing pipeline that was able to speed up the annotation process and improve accuracy when determining lengths of chromatosome strands in AFM images. In this work, it was ultimately shown how mutations in a particular gene can cause different winding patterns.

Biology

Details

Member of Technical Staff at Cohere

Skills

Python

Machine Learning

Huggingface Transformers

Pytorch

Public Speaking

Mentorship & Advising

Recent Posts

Reflections on Reaching 1 Million People on Stackoverflow

Filing for a Spousal Green Card from Abroad, An Experience Report

Discovery of the New Cohere Summarization Endpoint
