Here's my Google Scholar.
Jump to selected publications or workshops.
* (asterisk) signals that the authors contributed equally.

Selected Publications

SEQUEL: Semi-Supervised Preference-based RL with Query Synthesis via Latent Interpolation

Daniel Marta*, Simon Holk*, Christian Pek, Jana Tumova, and Iolanda Leite
ICRA'24

By utilizing semi-supervised learning, we improve sample efficiency by augmenting the preferences provided by humans. We train a VAE to learn a latent representation of trajectories and then synthesize new queries from existing ones by interpolating the trajectory pairs in latent space. The idea is that if a user prefers trajectory A over trajectory B, they would still hold that preference if the two trajectories were nudged slightly (say, 10%) toward each other. This helps the feedback generalize better.
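To make the interpolation step concrete, here is a minimal sketch of how such query synthesis could look. It is not the actual implementation from the paper: the encoder/decoder interfaces, the mixing coefficient `alpha`, and all names are assumptions.

```python
def synthesize_query(encoder, decoder, traj_a, traj_b, alpha=0.1):
    """Synthesize a new preference query from an existing one.

    Assumes `encoder` maps a trajectory tensor to its latent code and
    `decoder` maps a latent code back to trajectory space (VAE-style);
    `traj_a` is the trajectory the user preferred over `traj_b`.
    """
    z_a, z_b = encoder(traj_a), encoder(traj_b)
    # Nudge each latent a small step toward the other; for small alpha
    # the original label (A preferred over B) is assumed to carry over.
    z_a_new = (1 - alpha) * z_a + alpha * z_b
    z_b_new = (1 - alpha) * z_b + alpha * z_a
    # Decode back to trajectory space: a new labeled pair for free.
    return decoder(z_a_new), decoder(z_b_new)
```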

POLITE: Preferences Combined with Highlights in Reinforcement Learning

Simon Holk, Daniel Marta, and Iolanda Leite
ICRA'24 - Nominated for best HRI paper, best student paper, and best conference paper!

This paper aims to improve the granularity of preference-based feedback by adding temporal trajectory segmentation, letting users highlight positive and negative parts. This helps avoid assigning responsibility uniformly across the whole trajectory when only some part of it was actually preferred. The highlights are then optimized as an auxiliary task, ensuring an improved shared representation.
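As a hedged sketch of how highlights could act as an auxiliary task alongside the usual Bradley-Terry preference loss: the network interfaces, the highlight targets, and the weight `lam` below are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def polite_style_loss(reward_net, highlight_head,
                      seg_a, seg_b, pref_label,
                      highlights_a, highlights_b, lam=0.5):
    """Preference loss plus a per-timestep highlight auxiliary task.

    reward_net(seg) -> (per-step rewards (T,), shared features (T, d))
    highlight_head(features) -> per-step highlight logits (T,)
    pref_label: 1.0 if segment A was preferred, 0.0 otherwise
    highlights_*: per-step targets in [0, 1] (1 = user highlighted)
    """
    r_a, feat_a = reward_net(seg_a)
    r_b, feat_b = reward_net(seg_b)
    # Bradley-Terry preference likelihood on summed segment returns.
    pref_loss = F.binary_cross_entropy_with_logits(
        r_a.sum() - r_b.sum(), torch.tensor(pref_label))
    # Auxiliary task: predict which timesteps the user highlighted,
    # pushing the shared features to encode the responsible parts.
    aux_loss = (
        F.binary_cross_entropy_with_logits(highlight_head(feat_a), highlights_a)
        + F.binary_cross_entropy_with_logits(highlight_head(feat_b), highlights_b))
    return pref_loss + lam * aux_loss
```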

PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning

Simon Holk*, Daniel Marta*, and Iolanda Leite
HRI'24

This paper utilizes the zero-shot capabilities of LLMs to increase the granularity of preference-based feedback, extending our previous work. By letting the user give an optional auxiliary textual description of their preference, we ensure that the reward function aligns with what the human actually wants, and we improve sample efficiency by avoiding over-sampling.
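A minimal sketch of the zero-shot step, assuming the OpenAI chat API as the interface; the prompt, the model name, and the parsing are illustrative, not the prompts used in the paper.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; model name illustrative

PROMPT = """A user compared two robot trajectories and explained:
"{text}"
List, one per line, the trajectory aspects the user cared about
(e.g. speed, distance to people, smoothness). Output nothing else."""

def extract_salient_aspects(text: str, model: str = "gpt-4o-mini"):
    """Zero-shot: ask the LLM which aspects the user's free-text
    explanation mentions, so they can be tied back to reward features."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(text=text)}])
    return [line.strip("- ").strip()
            for line in resp.choices[0].message.content.splitlines()
            if line.strip()]
```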

VARIQuery: VAE Segment-based Active Learning for Query Selection in Preference-based Reinforcement Learning

Daniel Marta*, Simon Holk*, Christian Pek, Jana Tumova, and Iolanda Leite
IROS'23

This paper addresses the often-overlooked aspect of query selection, which is closely related to active learning (AL). We propose a novel query selection approach that leverages variational autoencoder (VAE) representations of state sequences. In this manner, we formulate queries that are diverse in nature while simultaneously taking into account reward model estimations.
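For intuition, here is a minimal sketch of diversity-driven selection in latent space; greedy farthest-point selection stands in for the paper's actual criterion, and the names and distance metric are assumptions.

```python
import numpy as np

def select_diverse_queries(latents, n_queries):
    """Greedy farthest-point selection over VAE latent embeddings.

    latents: (N, d) array, one embedding per candidate segment.
    Returns indices of segments that are mutually far apart in latent
    space, a simple proxy for a diverse batch of queries.
    """
    chosen = [int(np.argmax(np.linalg.norm(latents, axis=1)))]  # seed
    while len(chosen) < n_queries:
        # Distance from every candidate to its nearest chosen segment.
        d = np.min(
            np.linalg.norm(latents[:, None] - latents[chosen][None], axis=-1),
            axis=1)
        chosen.append(int(np.argmax(d)))  # pick the farthest candidate
    return chosen
```

In the paper this diversity signal is combined with reward model estimations; the sketch shows only the diversity half.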

Aligning Human Preferences with Baseline Objectives in Reinforcement Learning

Daniel Marta, Simon Holk, Christian Pek, Jana Tumova, and Iolanda Leite
ICRA'23

By considering baseline objectives that are designed beforehand, we are able to narrow down the policy space, requesting human attention only when their input matters the most. To allow control over the optimization of the different objectives, our approach adopts a multi-objective setting. We achieve human-compliant policies by sequentially training an optimal policy from a baseline specification and collecting queries on pairs of trajectories.
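One way to picture the combination is below; the convex weighting is only an illustration, and the paper's multi-objective formulation as well as all names here are assumptions.

```python
def combined_reward(state, action, baseline_reward, pref_reward, w=0.7):
    """Convex combination of a hand-designed baseline objective and a
    reward learned from human preference queries.

    baseline_reward: fixed specification designed beforehand
                     (e.g. reach the goal, avoid collisions).
    pref_reward:     learned preference model, queried only where the
                     baseline underdetermines behavior.
    """
    return (w * baseline_reward(state, action)
            + (1 - w) * pref_reward(state, action))
```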

Human-Feedback Shield Synthesis for Perceived Safety in Deep Reinforcement Learning

Daniel Marta, Christian Pek, Gaspar I. Melsion, Jana Tumova, and Iolanda Leite
RAL'21

To obtain policies that are perceived as safe, we propose a shield synthesis framework with two distinct loops: (1) an inner loop that trains policies with a set of actions that is constrained by shields whose conservativeness is parameterized, and (2) an outer loop that presents example rollouts of the policy to humans and collects their feedback to update the parameters of the shields in the inner loop.
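A rough sketch of the two loops, assuming a risk estimate `risk(state, action)` and a scalar conservativeness parameter `eta`; the update rule and all names are assumptions, not the synthesis procedure from the paper.

```python
def safe_actions(state, actions, risk, eta):
    """Shield: keep only actions whose estimated risk is at most eta
    (smaller eta means a more conservative shield)."""
    allowed = [a for a in actions if risk(state, a) <= eta]
    return allowed or actions  # fall back rather than block every action

def shield_synthesis(train_policy, judge_rollouts, risk,
                     eta=0.5, step=0.1, outer_iters=5):
    """Two loops: the inner loop trains a policy under the current
    shield; the outer loop shows rollouts to humans and updates eta.

    judge_rollouts(policy) is assumed to return +1 if the rollouts felt
    unsafe (tighten the shield) and -1 if they felt over-cautious.
    """
    policy = None
    for _ in range(outer_iters):
        # Inner loop: only shield-approved actions are available.
        policy = train_policy(lambda s, A: safe_actions(s, A, risk, eta))
        # Outer loop: human feedback adjusts the conservativeness.
        eta = min(1.0, max(0.0, eta - step * judge_rollouts(policy)))
    return policy, eta
```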

Workshops

Human-Centric Autonomous Systems With LLMs for User Command Reasoning

Yi Yang, Qingwen Zhang, Ci Li, Daniel Marta, Nazre Batool, John Folkesson
WACV LLVM-AD Workshop'24 - Best student paper award!

To ensure that the autonomous system meets the user’s intent, it is essential to accurately discern and interpret user commands, especially in complex or emergency situations. To this end, we propose to leverage the reasoning capabilities of Large Language Models (LLMs) to infer system requirements from in-cabin users’ commands. Through a series of experiments that include different LLM models and prompt designs, we explore the few-shot multivariate binary classification accuracy of system requirements from natural language textual commands.
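As a minimal sketch of the classification setup, assuming the OpenAI chat API; the requirement labels, few-shot examples, and model name are invented for illustration and are not the paper's label set.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

REQUIREMENTS = ["pull_over", "slow_down", "open_window", "call_emergency"]

FEW_SHOT = """Command: "I feel sick, please stop the car."
pull_over=1 slow_down=1 open_window=0 call_emergency=0

Command: "It's stuffy in here."
pull_over=0 slow_down=0 open_window=1 call_emergency=0
"""

def classify_command(command: str, model: str = "gpt-4o-mini") -> dict:
    """Few-shot multivariate binary classification: one 0/1 flag per
    system requirement, inferred from a natural-language command."""
    prompt = (FEW_SHOT + f'\nCommand: "{command}"\n'
              "Answer in the same key=value format, nothing else.")
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    text = resp.choices[0].message.content
    return {k: f"{k}=1" in text for k in REQUIREMENTS}
```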