Tanmay Ambadkar

PhD Student @ Penn State

Hello, I'm Tanmay Ambadkar.

I build trustworthy AI.

I am a PhD student in Computer Science at Penn State, advised by Dr. Abhinav Verma. My research focuses on integrating reinforcement learning with the safety and verification principles of programming languages to create reliable, interpretable, and safe AI systems.

Research Overview

My research is focused on making artificial intelligence more reliable and trustworthy. While reinforcement learning (RL) can train agents to perform incredibly complex tasks, it often struggles with three key challenges: it needs perfectly defined goals, it can behave unsafely, and it does not know how to handle conflicting objectives.

My work tackles these problems by creating frameworks that allow people to guide AI with high-level instructions. I am building systems that can automatically fix imperfect instructions, a "safety shield" that prevents agents from taking dangerous actions, and methods that allow users to balance competing goals on the fly. Ultimately, the goal is to build AI that is not only powerful but also safe, interpretable, and collaborative enough to be deployed in critical real-world scenarios.

Objective 1: Making RL Robust to Imperfect Instructions

The Problem

Manually designing a perfect reward function to guide an RL agent is extremely difficult and a major barrier for non-experts. A slightly flawed specification can lead to completely wrong or unexpected behavior.

My Approach (AutoSpec)

I developed a framework called AutoSpec that allows a user to provide an initial, high-level, and potentially imperfect specification. The agent then autonomously refines and corrects this specification during training by identifying and resolving inconsistencies, leading to better task performance without requiring constant human intervention. This makes RL more accessible to domain experts who need the technology but are not RL specialists.

Objective 2: Building a Scalable Safety Shield for RL Agents

The Problem

An RL agent trained to maximize a reward will do so at all costs, potentially violating critical safety constraints. Furthermore, standard safe RL algorithms (such as those built on constrained MDPs, or CMDPs) often fail to learn when given only sparse, binary cost signals (e.g., "safe" or "unsafe").

My Approach (SPARKD)

I developed SPARKD, a scalable framework that learns a globally linear model of the environment's complex non-linear dynamics using Deep Koopman Operators. This "lifted" representation allows the shield to use formal methods, specifically weakest precondition calculus, to efficiently analyze the safety of an agent's actions over a finite horizon. If a proposed action could lead to an unsafe state, the shield intervenes with a safe alternative.
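As a rough illustration of the shielding idea, the sketch below propagates a linear safety constraint backward through an assumed lifted linear model z' = A z + B u, which is what a weakest-precondition computation reduces to for linear dynamics and a half-space safe set. The matrices, the safe set, the fallback action, and the function names are all hypothetical. This is not the SPARKD implementation, and the Koopman encoder that would produce z is omitted.

```python
# Hypothetical lifted linear model z' = A z + B u (illustrative values only).
A = [[1.0, 0.1],
     [0.0, 1.0]]
B = [0.0, 0.1]
N = len(B)

# Assumed safe set: the half-space {z : c . z <= d}, here "position z[0] <= 1".
SAFE = ([1.0, 0.0], 1.0)


def wp(constraint, u):
    """Weakest precondition of `c . z' <= d` under z' = A z + B u.

    Substituting the dynamics gives (c^T A) . z <= d - (c . B) u.
    """
    c, d = constraint
    c_a = [sum(c[i] * A[i][j] for i in range(N)) for j in range(N)]
    c_b = sum(c[i] * B[i] for i in range(N))
    return (c_a, d - c_b * u)


def plan_is_safe(z, plan):
    """Check that every state reached while executing `plan` from z is safe."""
    for k in range(1, len(plan) + 1):
        constraint = SAFE
        # Pull the safety constraint back through the first k actions.
        for u in reversed(plan[:k]):
            constraint = wp(constraint, u)
        c, d = constraint
        if sum(c[i] * z[i] for i in range(N)) > d:
            return False
    return True


def shield(z, proposed, fallback=-1.0, horizon=3):
    """Keep the proposed action only if it is safe when followed by the fallback."""
    plan = [proposed] + [fallback] * (horizon - 1)
    # If the proposal fails the check, fall back (the fallback is assumed safe).
    return proposed if plan_is_safe(z, plan) else fallback
```

For example, `shield([0.8, 0.5], 5.0)` rejects the aggressive action because the three-step precondition fails, while `shield([0.0, 0.0], 5.0)` lets the same action through.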

My Approach (RAMPS)

Building on this, RAMPS (Robust Adaptive Multi-Step Predictive Shielding) advances the state of the art by introducing a robust, multi-step Control Barrier Function (CBF). This method not only uses a learned linear model but also explicitly accounts for model error and control delays. This allows RAMPS to provide stronger, continuous safety guarantees, and it has demonstrated a reduction of over 90% in safety violations in complex, high-dimensional robotic simulations.
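To make the barrier-function idea concrete, here is a minimal one-step sketch for a scalar system. It assumes a learned model x' = x + dt * u, a barrier h(x) = x_max - x, and a known bound eps on the model's one-step error. It tightens a standard discrete-time CBF condition by eps and solves the resulting one-dimensional QP in closed form. All names and constants are hypothetical, and the multi-step prediction and delay handling that RAMPS describes are not modeled here.

```python
def cbf_filter(x, u_proposed, x_max=1.0, dt=0.1, gamma=0.5, eps=0.01):
    """Return the action closest to u_proposed that satisfies a robust CBF condition.

    Assumed model:  x' = x + dt * u, with |model error| <= eps per step.
    Barrier:        h(x) = x_max - x  (safe when h >= 0).
    Condition:      h(x') - eps >= (1 - gamma) * h(x)
    which rearranges to  u <= (gamma * h(x) - eps) / dt.
    """
    h = x_max - x
    u_bound = (gamma * h - eps) / dt
    # With a single upper bound on u, the QP
    #   min (u - u_proposed)^2  s.t.  u <= u_bound
    # is solved by clipping the proposal to the bound.
    return min(u_proposed, u_bound)
```

Near the boundary the filter clips aggressive actions (at x = 0.9 a proposal of 1.0 is reduced to about 0.4) and leaves already-safe proposals unchanged.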

Next Steps (AutoCost)

  • To solve the sparse cost problem, I am developing AutoCost, which extracts a rich, continuous cost signal directly from these safety shields. When an agent's proposed action is unsafe, the safety Quadratic Program (QP) in RAMPS or SPARKD requires a large "slack" value to find a safe alternative. AutoCost captures the magnitude of this slack variable, which directly quantifies how unsafe the proposed action was. This provides a dense, informative cost gradient, enabling non-shielded CMDP agents to learn safe policies far more effectively than with traditional binary costs.
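One way to picture the slack-based cost, under the simplifying assumption that the shield's safety condition is a single linear constraint g . u <= b on the action: the slack a proposed action needs is how far it violates that constraint. The constraint form and the function names below are illustrative assumptions, not the AutoCost implementation.

```python
def slack_cost(u_proposed, g, b):
    """Smallest s >= 0 such that g . u_proposed <= b + s (a dense cost signal)."""
    violation = sum(gi * ui for gi, ui in zip(g, u_proposed)) - b
    return max(0.0, violation)


def binary_cost(u_proposed, g, b):
    """Sparse alternative: 1 if the constraint is violated, else 0."""
    return 1 if slack_cost(u_proposed, g, b) > 0 else 0


# Two unsafe proposals get the same binary cost but different slack costs,
# so the slack tells the learner which one was closer to being safe.
g, b = [1.0, 0.0], 0.5
mild, severe = [0.75, 0.0], [2.5, 0.0]
```

Here `mild` and `severe` both receive a binary cost of 1, but their slack costs differ (0.25 versus 2.0), which is the kind of graded signal a CMDP learner can follow.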

Objective 3: Enabling User-Driven Trade-offs in Multi-Objective RL

The Problem

Real-world applications rarely have a single goal. More often, they involve balancing a set of conflicting objectives, such as maximizing performance while minimizing cost and energy consumption.

My Approach (D³PO)

My work on the DoD-funded MIXTAPE initiative addresses this with D³PO (Decomposed, Diversity-Driven Policy Optimization). This framework trains a single, robust policy capable of generating a wide spectrum of optimal behaviors. A human operator can then interactively tune the agent's objective priorities at runtime, without retraining, to adapt its strategy to changing mission requirements.
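Runtime-tunable behavior of this kind is commonly set up by conditioning the policy on a preference vector and scalarizing the per-objective rewards with it. The sketch below shows only that plumbing, with hypothetical names and a linear scalarization. It says nothing about how D³PO itself decomposes or diversifies the policy.

```python
def normalize(weights):
    """Scale non-negative preference weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def scalarize(rewards, weights):
    """Linear scalarization: weighted sum of per-objective rewards."""
    return sum(w * r for w, r in zip(normalize(weights), rewards))


def policy_input(observation, weights):
    """A preference-conditioned policy sees the observation plus the preference."""
    return list(observation) + normalize(weights)


# One environment step with rewards for (performance, -cost, -energy).
rewards = [4.0, -2.0, -1.0]
```

Changing the weights from `[1, 0, 0]` to `[1, 1, 2]` changes the scalar reward for the same step from 4.0 to 0.0, and the same weights are appended to the observation so one network can serve every preference.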

Next Steps

  • Negative Preferences (Symmetric Scalarization): We are introducing a new paradigm in which preference weights may be negative, enabling an operator to actively penalize an objective. We will study its impact by designing new environments and metrics and by modifying D³PO to support this symmetric instruction space.
  • Non-Linear Scalarization: We are moving from linear combinations of objectives to non-linear scalarization functions. This will allow the agent to learn a richer class of non-convex trade-offs, giving the user a choice between multiple possible scalarization options.
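The two directions above can be illustrated with small stand-ins. For negative preferences, the sketch normalizes weights by their L1 norm so that a negative weight penalizes an objective. For non-linear scalarization, it uses the weighted Chebyshev form, a standard example from the multi-objective literature that can reach trade-offs on non-convex parts of a Pareto front. Both are assumptions chosen for illustration, not the formulations this work will adopt.

```python
def symmetric_scalarize(rewards, weights):
    """Linear scalarization with signed weights, normalized by their L1 norm."""
    norm = sum(abs(w) for w in weights)
    if norm == 0:
        raise ValueError("weights must not all be zero")
    return sum((w / norm) * r for w, r in zip(weights, rewards))


def chebyshev_scalarize(rewards, weights, ideal):
    """Weighted Chebyshev scalarization (to be maximized).

    Scores a reward vector by its worst weighted shortfall from an `ideal`
    point, so improving only one objective cannot mask a large gap in another.
    """
    return -max(w * (z - r) for w, r, z in zip(weights, rewards, ideal))
```

With equal weights, a linear sum cannot tell the reward vectors `[3.0, 1.0]` and `[2.0, 2.0]` apart, whereas the Chebyshev score prefers the balanced one.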

Publications

Robust Adaptive Multi-Step Predictive Shielding

Tanmay Ambadkar, Darshan Chudiwal, Greg Anderson, Abhinav Verma

Accepted at International Conference on Learning Representations (ICLR) 2026; Accepted at AAAI Student Abstract and Poster Program

AutoSpec: Automating the Refinement of Reinforcement Learning Specifications

Tanmay Ambadkar, Đorđe Žikelić, Abhinav Verma

Accepted at International Conference on Learning Representations (ICLR) 2026; Accepted at The Workshop on Post-AI Formal Methods at AAAI-26; Accepted at PLDI SRC 2024

Specification Guided Reinforcement Learning

Tanmay Ambadkar

Accepted at AAAI Doctoral Consortium

Preference Conditioned Multi-Objective Reinforcement Learning: Decomposed, Diversity-Driven Policy Optimization

Tanmay Ambadkar, Sourav Panda, Shreyas Kale, Abhinav Verma, Jonathan Dodge

In Submission at ICML 2026

Safer Policies via Affine Representations using Koopman Dynamics

Tanmay Ambadkar, Darshan Chudiwal, Greg Anderson, Abhinav Verma

In Submission at NASA Formal Methods 2026

Scaling Strategy, Not Compute: A Stand-Alone, Open-Source StarCraft II Benchmark for Accessible RL Research

Sourav Panda, Tanmay Ambadkar, Shreyas Kale, Abhinav Verma, Jonathan Dodge

In Submission at ICML 2026

MIXTAPE: Middleware for Interactive XAI with Tree-Based AI Performance Evaluation

Tanmay Ambadkar, Hayden Moore, Sourav Panda, Shreyash Kale, Connor Greenwell, Brianna Major, Aashish Chaudhary, Jonathan Dodge, Abhinav Verma and Brian Hu

Simulation Interoperability Standards Organization (SISO) SIMposium, 2025

MIXTAPE: Middleware for Interactive XAI with Tree-Based AI Performance Evaluation

Brian Hu, Jonathan Dodge, Abhinav Verma, Tanmay Ambadkar, Sourav Panda, Sujay Koujalgi, Aashish Chaudhary, Brianna Major, and Bryon Lewis

Simulation Interoperability Standards Organization (SISO) SIMposium, 2024

Optimizing Operational Costs in Combined Heat and Power Integrated District Heating Systems: A Reinforcement Learning Approach

Saranya Anbarasu, Tanmay Ambadkar, Rosina Adhikari, Kathryn Hinkelman, Zhanwei He, Wangda Zuo, Ardeshir Moftakhari

SimBuild, 2024

A Simple Fast Resource-efficient Deep Learning for Automatic Image Colorization

Tanmay Ambadkar, Jignesh S. Bhatt

Color and Imaging Conference (CIC), 2023

Discrete Sequencing for Demand Forecasting: A novel data sampling technique for time series forecasting

N. Menon, S. Saboo, T. Ambadkar and U. Uppili

International Conference on Intelligent Data Science Technologies and Applications (IDSTA), 2022

Deep reinforcement learning approach to predict head movement in 360° videos

Tanmay Ambadkar, Pramit Mazumdar

Proc. IS&T Int’l. Symp. on Electronic Imaging: Image Processing: Algorithms and Systems, 2022

Education

The Pennsylvania State University

Ph.D. in Computer Science and Engineering

Jan 2024 - May 2027

GPA: 3.56

The Pennsylvania State University

M.S. in Computer Science and Engineering

Aug 2022 - Dec 2023

GPA: 3.9

Indian Institute of Information Technology, Vadodara

B.Tech in Computer Science and Engineering

Aug 2018 - May 2022

GPA: 9.4

Work Experience

Research Assistant

Dept. of Architectural Engineering, Penn State

Aug 2023 - Jan 2025
  • Integrated Dymola simulation tools with Gymnasium, defining a robust multi-objective reward system.
  • Trained multiple RL agents for energy optimization.

Research Assistant

Dept. of Industrial and Manufacturing Engineering, Penn State

May 2023 - July 2023
  • Predicted Autism Spectrum Disorder in children using EHR data and time-series models.
  • Preprocessed over 500GB of data using PySpark for 600,000+ patients.

Research & Digitization Automation Intern

Siemens Technology and Services

Jan 2022 - July 2022
  • Detected anomalies in data using AutoEncoders and SHAP for explainability.
  • Developed a plug-and-play library for time-series anomaly detection.
  • Performed EDA on Starbucks coffee roaster data to predict burner cuts.

Research & Digitization Automation Intern

Siemens Technology and Services

May 2021 - July 2021
  • Integrated workflow creation and management using Celery, Redis, and Flask for the Industrial Predictive Analytics Engine (IPAE).
  • Reduced pipeline execution time by 30%.

Teaching Assistant

The Pennsylvania State University

Various Semesters
  • CMPEN 270: Digital Design: Theory and Practice
  • CMPSC 221: Object Oriented Design & Web Programming
  • CMPSC 448: Machine Learning and Algorithmic AI

Projects

Shared Critic PPO for Building Control

Developed a shared-critic PPO algorithm optimizing energy and comfort for buildings using multiple agents in a Sinergym environment.

Robust and Secure Deep Learning

Designed test-time evasion attacks on MNIST and implemented dataset poisoning with imperceptible patterns.

Deep Learning for Gravitational Lensing

Solved dark matter related tasks using CNNs and Vision Transformers for classification, denoising, and super-resolution.

Lead Scoring System

Created a logistic regression model and lead rating system for marketing teams to identify potential customers.

Certificate Generator

Developed a system for mass-mailing and verifying event certificates for IIIT Vadodara.

Gymkhana Website

Developed the student gymkhana website with role management and event calendars.