publications

Selected research on uncertainty-aware vision-language-action models, reinforcement learning, human feedback, safety, multimodal alignment, and human-centered autonomy.

For the most recent list, see Google Scholar. * indicates equal contribution.

selected publications

arXiv 2026

2026

Uncertainty Quantification for Flow-Based Vision-Language-Action Models

Ralf Römer, Maximilian Seeliger, Saida Liu, Ben Sturgis, Marco Bagatella, Daniel Marta, Andreas Krause, and Angela P. Schoellig

arXiv preprint arXiv:2606.18043, 2026

Introduces Velocity-Field Disagreement (VFD) for epistemic uncertainty in flow-based VLAs and SAVE, an uncertainty-guided active fine-tuning framework that needs at least 22% fewer expert demonstrations than baselines.

ICRA 2026

MOSAIC: Multi-objective Optimization from Zero-Shot Language Reasoning in Preference-based RL

Daniel Marta*, Simon Holk*, and Iolanda Leite

In IEEE International Conference on Robotics and Automation (ICRA), 2026

Reframes preference-based RL as multi-objective learning: language explanations are parsed into objective-specific labels, weights, and highlights for scalarized policy optimization.

ICML + SPOT 2026

Reinforcement Learning via Self-Distillation

Jonas Hübotter, Frederike Lübeck*, Lejs Behric*, Anton Baumann*, Marco Bagatella, Daniel Marta, Ido Hakimi, Idan Shenfeld, Thomas Kleine Buening, Carlos Guestrin, and Andreas Krause

In International Conference on Machine Learning (ICML), 2026; also accepted to the ICLR 2026 Workshop on Scaling Post-training for LLMs (SPOT)

Introduces Self-Distillation Policy Optimization (SDPO), converting rich environment feedback into dense self-distillation signals for more sample-efficient RL with language models.

ICRA 2024

2024

SEQUEL: Semi-Supervised Preference-based RL with Query Synthesis via Latent Interpolation

Daniel Marta*, Simon Holk*, Christian Pek, Jana Tumova, and Iolanda Leite

In IEEE International Conference on Robotics and Automation (ICRA), 2024

Improves preference-learning sample efficiency by augmenting human feedback with synthesized preference queries from latent interpolation.

ICRA 2024

POLITE: Preferences Combined with Highlights in Reinforcement Learning

Simon Holk, Daniel Marta, and Iolanda Leite

In IEEE International Conference on Robotics and Automation (ICRA), 2024

Combines preference feedback with temporal highlights to improve granularity and representation learning. Nominated for Best HRI Paper, Best Student Paper, and Best Conference Paper.

HRI 2024

PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning

Simon Holk*, Daniel Marta*, and Iolanda Leite

In ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2024

Uses zero-shot language-model reasoning over optional textual descriptions to align learned rewards with human preferences.

WACV 2024

Human-Centric Autonomous Systems With LLMs for User Command Reasoning

Yi Yang, Qingwen Zhang, Ci Li, Daniel Marta, Nazre Batool, and John Folkesson

In WACV LLVM-AD Workshop, 2024

Explores few-shot LLM reasoning for inferring autonomous-system requirements from in-cabin natural-language commands. Best Student Paper Award.

IROS 2023

2023

VARIQuery: VAE Segment-based Active Learning for Query Selection in Preference-based Reinforcement Learning

Daniel Marta*, Simon Holk*, Christian Pek, Jana Tumova, and Iolanda Leite

In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

Proposes a VAE-based active-learning strategy for diverse and informative preference-query selection.

ICRA 2023

Aligning Human Preferences with Baseline Objectives in Reinforcement Learning

Daniel Marta, Simon Holk, Christian Pek, Jana Tumova, and Iolanda Leite

In IEEE International Conference on Robotics and Automation (ICRA), 2023

Narrows policy search with baseline objectives and requests human feedback when preferences matter most.

RA-L 2021

2021

Human-feedback shield synthesis for perceived safety in deep reinforcement learning

Daniel Marta, Christian Pek, Gaspar I. Melsion, Jana Tumova, and Iolanda Leite

In IEEE Robotics and Automation Letters (RA-L), 2021

Learns shield parameters from human feedback to obtain policies perceived as safe.


awards & recognition

ICRA 2024: POLITE nominated for Best HRI Paper, Best Student Paper, and Best Conference Paper.
WACV 2024 LLVM-AD Workshop: Best Student Paper Award.
