AutoSpec

Automating the Refinement of Reinforcement Learning Specifications

Tanmay Ambadkar, Đorđe Žikelić, Abhinav Verma (Accepted at ICLR 2026)


A novel framework that automatically transforms "coarse" or under-specified logical objectives into refined, guidance-rich specifications, enabling RL agents to master complex tasks where standard methods fail.

The "Specification Gap"

Why do concise specifications fail?

Reinforcement Learning (RL) has achieved remarkable feats, but specifying what an agent should do remains challenging. Manually designing scalar reward functions is an art form, and even subtle flaws can produce unintended behavior.

Logical specifications (like "Reach Goal while Avoiding Obstacles") offer a promising, interpretable alternative. However, humans tend to write "coarse" specifications. For example, "Reach the Kitchen" is a valid goal, but if the kitchen is down a winding hallway with trap states (like a staircase), a standard RL agent will struggle to discover the path using only the sparse feedback from the coarse specification.

Trap States: Coarse regions may overlap with unrecoverable states.

Lack of Waypoints: Long-horizon tasks are difficult without intermediate sub-goals.

Overly Broad Goals: Large target regions dilute the learning signal.

Specification Refinement Problem

Given an initial specification ϕ, we search for a refined specification ϕᵣ such that, for every trajectory ζ:

(ζ ⊨ ϕᵣ) ⟹ (ζ ⊨ ϕ)

"Satisfaction of the refined spec guarantees satisfaction of the original, but ϕᵣ is easier to learn."
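The implication holds by construction whenever refinement only shrinks regions. A minimal illustration with hypothetical one-dimensional interval predicates (the `reach` helper below is ours, not the paper's):

```python
# Minimal illustration (hypothetical interval predicates): a refined "reach"
# goal whose region is a subset of the original preserves satisfaction.

def reach(region):
    """Build a trajectory predicate: does any state land inside `region`?"""
    lo, hi = region
    return lambda traj: any(lo <= s <= hi for s in traj)

phi = reach((0.0, 10.0))      # coarse goal: reach [0, 10]
phi_r = reach((4.0, 6.0))     # refined goal: reach the tighter [4, 6] subset

traj = [12.0, 9.0, 5.0]       # a trajectory that enters [4, 6]
assert phi_r(traj) and phi(traj)   # satisfying phi_r implies satisfying phi
assert phi([9.0]) and not phi_r([9.0])  # the converse need not hold
```

Because [4, 6] ⊆ [0, 10], every trajectory satisfying ϕᵣ necessarily satisfies ϕ; the refinement trades completeness for a denser learning signal.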

Our Framework

How AutoSpec Works

AutoSpec acts as a wrapper around specification-guided RL algorithms. It monitors the learning process to identify why a policy fails and autonomously refines the specification graph.

1. Monitor: Track success rates of edge policies in the abstract specification graph.

2. Diagnose: Collect failure and success trajectories when policies underperform.

3. Refine: Apply targeted refinement strategies (SeqRefine, AddRefine, PastRefine, OrRefine).

4. Re-train: Train the policy on the new, easier-to-learn specification.
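The four steps above can be sketched as a single outer loop. Everything here is illustrative: `spec_graph`, `train_edge`, and `refine` are hypothetical stand-ins, not the paper's actual API.

```python
# Illustrative sketch of the AutoSpec outer loop. `spec_graph` exposes the
# abstract specification graph; `train_edge` trains one edge policy and
# reports its success rate plus the collected trajectories (hypothetical API).

def autospec_loop(spec_graph, train_edge, success_threshold=0.5, max_rounds=10):
    for _ in range(max_rounds):
        stuck = None
        for edge in spec_graph.edges():
            policy, success_rate, traces = train_edge(edge)   # 1. Monitor
            if success_rate < success_threshold:
                stuck = (edge, traces)                        # 2. Diagnose
                break
        if stuck is None:
            return spec_graph                                 # all edges learned
        edge, traces = stuck
        spec_graph.refine(edge, traces)   # 3. Refine (SeqRefine, AddRefine, ...)
        # 4. Re-train happens on the next pass over the refined graph
    return spec_graph

# Tiny mock run: the edge fails twice, is refined twice, then succeeds.
class MockGraph:
    def __init__(self): self.refined = 0
    def edges(self): return ["e"]
    def refine(self, edge, traces): self.refined += 1

calls = {"n": 0}
def mock_train(edge):
    calls["n"] += 1
    return None, (0.2 if calls["n"] < 3 else 0.9), []

g = MockGraph()
autospec_loop(g, mock_train)
assert g.refined == 2 and calls["n"] == 3
```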

The Four Pillars of Refinement

AutoSpec employs four targeted procedures to address specific failure modes while maintaining logical soundness.

1. SeqRefine: Refining Predicates

Problem: Target region is too broad or contains trap states.

Solution: Automatically tightens the bounds of target regions (bᵣ) and safety constraints (cᵣ) using convex hulls of successful exploration traces. This effectively shrinks the target to exclude "unreachable" or dangerous areas.

  • Removes "unreachable" parts of goal regions.
  • Identifies and excludes trap states.
Placeholder: SeqRefine Visualization
(Trap State Elimination)
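A minimal sketch of the tightening idea, assuming states are real-valued vectors. The paper computes convex hulls of successful traces; an axis-aligned bounding box is used here to keep the illustration dependency-free, and `tighten_target` is a hypothetical helper:

```python
# Sketch of predicate tightening: shrink the target region to the bounding
# box of goal states actually reached in successful traces, intersected with
# the original target so the refinement stays sound. (The paper uses convex
# hulls; an axis-aligned box keeps this example stdlib-only.)

def tighten_target(target, success_states):
    """target: list of (lo, hi) per dimension; success_states: reached states."""
    dims = len(target)
    lo = [min(s[d] for s in success_states) for d in range(dims)]
    hi = [max(s[d] for s in success_states) for d in range(dims)]
    return [(max(lo[d], target[d][0]), min(hi[d], target[d][1]))
            for d in range(dims)]

# Coarse target covers [0,10]x[0,10]; successes only ever land in one corner.
refined = tighten_target([(0, 10), (0, 10)],
                         [(1.0, 2.0), (2.0, 3.0), (1.5, 2.5)])
assert refined == [(1.0, 2.0), (2.0, 3.0)]
```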
2. AddRefine: Adding Waypoints

Problem: Path is too long for a single policy to learn reliably.

Solution: Decomposes long-horizon tasks by identifying stable "midpoints" in successful trajectories. It splits an edge u → v into u → mid → v, creating two shorter, more manageable sub-tasks.

  • Breaks complex paths into learnable segments.
  • Reduces the effective horizon for the RL agent.
Placeholder: AddRefine Visualization
(Waypoint Introduction)
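The split can be sketched as follows. The midpoint heuristic here (average of each trajectory's middle state) is purely illustrative; the paper's selection of stable midpoints differs.

```python
# Sketch of waypoint insertion: pick a midpoint from successful trajectories
# and split the edge u -> v into u -> mid -> v. `split_edge` and the
# averaging heuristic are illustrative stand-ins, not the paper's method.

def split_edge(edges, u, v, success_trajs):
    mids = [traj[len(traj) // 2] for traj in success_trajs]  # per-traj midpoints
    mid = tuple(sum(c) / len(mids) for c in zip(*mids))      # average midpoint
    edges.remove((u, v))
    edges += [(u, mid), (mid, v)]  # two shorter, easier sub-tasks
    return mid

edges = [("start", "goal")]
mid = split_edge(edges, "start", "goal",
                 [[(0, 0), (4, 4), (9, 9)], [(0, 0), (6, 6), (9, 9)]])
assert mid == (5.0, 5.0)
assert edges == [("start", (5.0, 5.0)), ((5.0, 5.0), "goal")]
```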
3. PastRefine: Source Partitioning

Problem: Some start states in a region are doomed to fail due to dynamics or obstacles.

Solution: Learns a separating hyperplane (via SVM) between initial states that lead to success and those that fail. It creates a new node for the "good" starts, focusing learning only where success is possible.

  • Focuses learning on viable starting conditions.
  • Improves reliability in stochastic environments.
Placeholder: PastRefine Visualization
(Source Partitioning)
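The partitioning step can be sketched with any linear classifier over initial states. A plain perceptron stands in for the SVM used by the method, to keep the example dependency-free; `separate_starts` is a hypothetical helper.

```python
# Sketch of source partitioning: learn a linear separator between start
# states that led to success and start states that failed. The "good" side
# (w.x + b > 0) becomes the new, refined source region.

def separate_starts(good, bad, epochs=100, lr=0.1):
    data = [(x, 1) for x in good] + [(x, -1) for x in bad]
    w = [0.0] * len(good[0])
    b = 0.0
    for _ in range(epochs):  # classic perceptron updates on misclassified points
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

good = [(2.0, 0.0), (3.0, 1.0)]   # starts that reached the goal
bad = [(-2.0, 0.0), (-3.0, 1.0)]  # doomed starts behind an obstacle
w, b = separate_starts(good, bad)
assert all(sum(wi * xi for wi, xi in zip(w, x)) + b > 0 for x in good)
assert all(sum(wi * xi for wi, xi in zip(w, x)) + b <= 0 for x in bad)
```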
4. OrRefine: Alternative Paths

Problem: The direct path is blocked or infeasible.

Solution: Discovers blocked paths and automatically wires new edges to alternative parent nodes in the specification graph. This enables the agent to backtrack or take entirely different routes (e.g., Path B instead of Path A).

  • Enables dynamic routing around obstacles.
  • Handles complex topology changes.
Placeholder: OrRefine Visualization
(Alternative Path Discovery)
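Rerouting can be sketched as a search over the specification graph that avoids the blocked edge. The adjacency-dict representation and `find_alternative` helper are assumptions for illustration, not the paper's data structures.

```python
# Sketch of alternative-path discovery: when the direct edge is blocked,
# BFS the specification graph for another route and wire it in instead.

from collections import deque

def find_alternative(adj, u, v, blocked):
    """Shortest path u -> v in `adj` that avoids edges in `blocked`."""
    queue, parent = deque([u]), {u: None}
    while queue:
        node = queue.popleft()
        if node == v:  # reconstruct the path by walking parents back to u
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if (node, nxt) not in blocked and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no alternative route exists

# Path A (via "a") is blocked, so the agent is rerouted through Path B (via "b").
adj = {"s": ["a", "b"], "a": ["g"], "b": ["g"]}
assert find_alternative(adj, "s", "g", {("a", "g")}) == ["s", "b", "g"]
```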

Scalability in Randomized Environments

We evaluated AutoSpec on a challenging "100-Rooms" domain where wall configurations and predicate locations were fully randomized for each seed.

The "Bridge" Bottleneck

In 80% of random seeds, agents got stuck at narrow passages ("bridges") between key regions. Standard methods (like DiRL) often plateau at 20% success rates due to these bottlenecks. AutoSpec autonomously identifies them and deploys targeted refinements (mostly AddRefine and SeqRefine) to boost success rates to over 90%.

Placeholder: 100-Room Randomized Experiment
(Success Probability Comparison Curve)

Demonstrated Impact

90%
Success Rate

Achieved on "Bridge" bottlenecks in randomized 100-Room environments (vs < 20% baseline).

4x
Throughput

Improvement in task completion for high-dimensional robotic manipulation (PandaGym).

100%
Automated

No manual reward engineering or hand-crafted heuristics required.

© 2026 AutoSpec Project