
Hello, I'm
Tanmay Ambadkar.
I build trustworthy AI.
I am a PhD student in Computer Science at Penn State, advised by Dr. Abhinav Verma. My research focuses on integrating reinforcement learning with the safety and verification principles of programming languages to create reliable, interpretable, and safe AI systems.
Research Overview
My research is focused on making artificial intelligence more reliable and trustworthy. While reinforcement learning (RL) can train agents to perform incredibly complex tasks, it often struggles with three key challenges: it needs perfectly defined goals, it can behave unsafely, and it does not know how to handle conflicting objectives.
My work tackles these problems by creating frameworks that allow people to guide AI with high-level instructions. I am building systems that can automatically fix imperfect instructions, a "safety shield" that prevents agents from taking dangerous actions, and methods that allow users to balance competing goals on the fly. Ultimately, the goal is to build AI that is not only powerful but also safe, interpretable, and collaborative enough to be deployed in critical real-world scenarios.
Objective 1: Making RL Robust to Imperfect Instructions
The Problem
My Approach (AutoSpec)
Objective 2: Building a Scalable Safety Shield for RL Agents
The Problem
My Approach (SPARKD)
My Approach (RAMPS)
Next Steps (AutoCost)
- To solve the sparse cost problem, I am developing AutoCost, which extracts a rich, continuous cost signal directly from these safety shields. When an agent's proposed action is unsafe, the safety Quadratic Program (QP) in RAMPS or SPARKD requires a large "slack" value to find a safe alternative. AutoCost captures the magnitude of this slack variable, which directly quantifies how unsafe the proposed action was. This provides a dense, informative cost gradient, enabling non-shielded constrained MDP (CMDP) agents to learn safe policies far more effectively than with traditional binary costs.
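The slack-as-cost idea can be illustrated with a minimal sketch. It is not the RAMPS, SPARKD, or AutoCost implementation: it assumes a single linear safety constraint `a @ u <= b`, a quadratic slack penalty with a hypothetical weight `rho`, and hypothetical function names. Under those assumptions the soft-constrained QP has a closed-form solution, and its slack grows with how unsafe the proposed action is, whereas a binary cost only reports that a violation occurred.

```python
import numpy as np

def soft_qp_slack(u_prop, a, b, rho=1.0):
    """Closed-form solution of the toy soft-constrained QP (an assumed form):
        min_{u, s}  ||u - u_prop||^2 + rho * s^2
        s.t.        a @ u <= b + s,  s >= 0
    Returns the corrected action and the slack s."""
    u_prop = np.asarray(u_prop, dtype=float)
    a = np.asarray(a, dtype=float)
    violation = float(a @ u_prop - b)
    if violation <= 0.0:
        # The proposed action already satisfies the constraint: no slack needed.
        return u_prop.copy(), 0.0
    denom = 1.0 + rho * float(a @ a)
    step = rho * violation / denom   # how far to move along -a
    slack = violation / denom        # remaining constraint relaxation
    return u_prop - step * a, slack

def binary_cost(u_prop, a, b):
    """Traditional sparse cost: 1 if the proposed action violates the constraint."""
    return float(np.dot(a, u_prop) > b)

def dense_cost(u_prop, a, b, rho=1.0):
    """Dense cost in the spirit of the text: the magnitude of the QP slack."""
    _, slack = soft_qp_slack(u_prop, a, b, rho)
    return slack

a, b = np.array([1.0, 0.0]), 1.0
for u in ([0.5, 0.0], [2.0, 0.0], [3.0, 0.0]):
    print(u, binary_cost(u, a, b), dense_cost(u, a, b))
```

In this toy setting the two unsafe actions receive the same binary cost, while the dense cost is twice as large for the action that is twice as far past the constraint boundary.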
Objective 3: Enabling User-Driven Trade-offs in Multi-Objective RL
The Problem
My Approach (D³PO)
Next Steps
- Negative Preferences (Symmetric Scalarization): We are introducing a fundamentally new paradigm that allows negative preferences, enabling an operator to actively penalize an objective. We will explore its impact by designing new environments and metrics and by modifying D³PO to support this symmetric instruction space.
- Non-Linear Scalarization: We are moving from linear combinations of objectives to non-linear scalarization functions. This will allow the agent to learn a richer class of non-convex trade-offs, giving the user a choice between multiple possible scalarization options.
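A small sketch may help show why these two directions matter. It is not D³PO: the three candidate policies and their return vectors are made up for illustration, and Chebyshev scalarization is used only as one well-known example of a non-linear scalarization, with a hypothetical utopia point of `[1, 1]`.

```python
import numpy as np

def linear_scalarization(returns, weights):
    """Weighted sum of objectives; weights may be negative to penalize an objective."""
    return float(np.dot(weights, returns))

def chebyshev_scalarization(returns, weights, utopia):
    """Negative of the largest weighted shortfall from a utopia point (higher is better)."""
    returns = np.asarray(returns, dtype=float)
    return -float(np.max(np.asarray(weights) * (np.asarray(utopia) - returns)))

# Hypothetical two-objective returns. C is a balanced policy that lies in a
# non-convex region of the front (below the line joining A and B).
policies = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.45, 0.45]}
names = list(policies)

def best(score):
    """Name of the policy with the highest scalarized value."""
    return names[int(np.argmax([score(policies[n]) for n in names]))]

w = np.array([0.5, 0.5])
# Equal linear weights tie A and B at 0.5 and rank C (0.45) below both.
print([linear_scalarization(policies[n], w) for n in names])
# The Chebyshev scalarization selects the balanced policy C.
print(best(lambda r: chebyshev_scalarization(r, w, utopia=[1.0, 1.0])))
# A negative weight on objective 2 actively penalizes it, so A is selected.
print(best(lambda r: linear_scalarization(r, np.array([1.0, -0.5]))))
```

With non-negative linear weights that sum to one, either A or B always scores at least 0.5, so C is never selected; the non-linear scalarization makes it reachable.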
Publications
Robust Adaptive Multi-Step Predictive Shielding
Tanmay Ambadkar, Darshan Chudiwal, Greg Anderson, Abhinav Verma
Accepted at International Conference on Learning Representations (ICLR) 2026; Accepted at AAAI Student Abstract and Poster Program
AutoSpec: Automating the Refinement of Reinforcement Learning Specifications
Tanmay Ambadkar, Đorđe Žikelić, Abhinav Verma
Accepted at International Conference on Learning Representations (ICLR) 2026; Accepted at The Workshop on Post-AI Formal Methods at AAAI-26; Accepted at PLDI SRC 2024
Specification Guided Reinforcement Learning
Tanmay Ambadkar
Accepted at AAAI Doctoral Consortium
Preference Conditioned Multi-Objective Reinforcement Learning: Decomposed, Diversity-Driven Policy Optimization
Tanmay Ambadkar, Sourav Panda, Shreyas Kale, Abhinav Verma, Jonathan Dodge
In Submission at ICML 2026
Safer Policies via Affine Representations using Koopman Dynamics
Tanmay Ambadkar, Darshan Chudiwal, Greg Anderson, Abhinav Verma
In Submission at NASA Formal Methods 2026
Scaling Strategy, Not Compute: A Stand-Alone, Open-Source StarCraft II Benchmark for Accessible RL Research
Sourav Panda, Tanmay Ambadkar, Shreyas Kale, Abhinav Verma, Jonathan Dodge
In Submission at ICML 2026
MIXTAPE: Middleware for Interactive XAI with Tree-Based AI Performance Evaluation
Tanmay Ambadkar, Hayden Moore, Sourav Panda, Shreyash Kale, Connor Greenwell, Brianna Major, Aashish Chaudhary, Jonathan Dodge, Abhinav Verma and Brian Hu
Simulation Interoperability Standards Organization (SISO) SIMposium, 2025
MIXTAPE: Middleware for Interactive XAI with Tree-Based AI Performance Evaluation
Brian Hu, Jonathan Dodge, Abhinav Verma, Tanmay Ambadkar, Sourav Panda, Sujay Koujalgi, Aashish Chaudhary, Brianna Major, and Bryon Lewis
Simulation Interoperability Standards Organization (SISO) SIMposium, 2024
Optimizing Operational Costs in Combined Heat and Power Integrated District Heating Systems: A Reinforcement Learning Approach
Saranya Anbarasu, Tanmay Ambadkar, Rosina Adhikari, Kathryn Hinkelman, Zhanwei He, Wangda Zuo, Ardeshir Moftakhari
SimBuild, 2024
A Simple Fast Resource-efficient Deep Learning for Automatic Image Colorization
Tanmay Ambadkar, Jignesh S. Bhatt
Color and Imaging Conference (CIC), 2023
Discrete Sequencing for Demand Forecasting: A novel data sampling technique for time series forecasting
N. Menon, S. Saboo, T. Ambadkar and U. Uppili
International Conference on Intelligent Data Science Technologies and Applications (IDSTA), 2022
Education
The Pennsylvania State University
Ph.D. in Computer Science and Engineering
GPA: 3.56
The Pennsylvania State University
M.S. in Computer Science and Engineering
GPA: 3.9
Indian Institute of Information Technology, Vadodara
B.Tech in Computer Science and Engineering
GPA: 9.4
Work Experience
Research Assistant
Dept. of Architectural Engineering, Penn State
- Integrated Dymola simulation tools with Gymnasium, defining a robust multi-objective reward system.
- Trained multiple RL agents for energy optimization.
Research Assistant
Dept. of Industrial and Manufacturing Engineering, Penn State
- Predicted Autism Spectrum Disorder in children using EHR data and time-series models.
- Preprocessed over 500 GB of data using PySpark for 600,000+ patients.
Research & Digitization Automation Intern
Siemens Technology and Services
- Detected anomalies in data using AutoEncoders and SHAP for explainability.
- Developed a plug-and-play library for time-series anomaly detection.
- Performed EDA on Starbucks coffee roaster data to predict burner cuts.
Research & Digitization Automation Intern
Siemens Technology and Services
- Integrated workflow creation and management using Celery, Redis, and Flask for the Industrial Predictive Analytics Engine (IPAE).
- Reduced pipeline execution time by 30%.
Teaching Assistant
The Pennsylvania State University
- CMPEN 270: Digital Design: Theory and Practice
- CMPSC 221: Object Oriented Design & Web Programming
- CMPSC 448: Machine Learning and Algorithmic AI
Projects
Shared Critic PPO for Building Control
Developed a shared-critic PPO algorithm optimizing energy and comfort for buildings using multiple agents in a Sinergym environment.
Robust and Secure Deep Learning
Designed test-time evasion attacks on MNIST and implemented dataset poisoning with imperceptible patterns.
Deep Learning for Gravitational Lensing
Solved dark matter related tasks using CNNs and Vision Transformers for classification, denoising, and super-resolution.
Lead Scoring System
Created a logistic regression model and lead rating system for marketing teams to identify potential customers.
Certificate Generator
Developed a system for mass-mailing and verifying event certificates for IIIT Vadodara.