(formerly SARE: Self-Adapting Research Engineer)
Current prompt-optimization approaches for language agents rely primarily on local feedback signals to refine behavior. As a result, they often fail to capture recurring patterns across tasks and generalize poorly. Many of these methods perform complete prompt rewrites or loosely structured updates, which can unintentionally discard previously learned knowledge. These challenges become more pronounced in research-coding environments, where workflows involve diverse repositories, incomplete configurations, and weak or inconsistent feedback signals. Reproducing results from public codebases, a common benchmark for evaluation, remains particularly difficult under such conditions.
In this work, we introduce Reflective Evolving Research Engineer (REVERE), a framework designed to enable continuous learning and adaptive improvement. REVERE leverages a Global Training Context to capture patterns across tasks, identifies recurring failure modes from execution traces spanning multiple repositories, and transforms these insights into reusable heuristics. Instead of rewriting prompts entirely, the framework performs targeted updates across its three configurable components.
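The reflect-then-update loop described above can be sketched as follows. This is a minimal illustration, not REVERE's actual implementation: the class name `GlobalTrainingContext`, the trace schema (`success`, `failure_mode` keys), the recurrence threshold, and the component dictionary are all hypothetical stand-ins.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class GlobalTrainingContext:
    """Illustrative cross-task store of reusable heuristics (hypothetical API)."""
    heuristics: list[str] = field(default_factory=list)

    def add(self, heuristic: str) -> None:
        # Deduplicate so repeated reflection passes do not bloat the context.
        if heuristic not in self.heuristics:
            self.heuristics.append(heuristic)

def reflect(traces: list[dict], context: GlobalTrainingContext) -> None:
    """Distill recurring failure modes in execution traces into heuristics."""
    failures = Counter(t["failure_mode"] for t in traces if not t["success"])
    for mode, count in failures.items():
        if count >= 2:  # only failure modes that recur across tasks/repositories
            context.add(f"When encountering '{mode}', apply the learned fix.")

def targeted_update(components: dict[str, str],
                    context: GlobalTrainingContext) -> dict[str, str]:
    """Update one prompt component in place of a full rewrite,
    preserving the other components (and their accumulated knowledge)."""
    updated = dict(components)
    updated["heuristics"] = "\n".join(context.heuristics)
    return updated
```

The key design point the sketch captures is that `targeted_update` touches only the heuristics component, so knowledge stored in the untouched components survives each optimization step.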
Through this reflective optimization process, REVERE demonstrates consistent improvements over prior state-of-the-art approaches. Specifically, it achieves performance gains of 4.50% on SUPER, 3.51% on ResearchCodeBench, and 4.89% on ScienceAgentBench. These results highlight the effectiveness of integrating long-term learning and structured memory into language agents, enabling them to evolve and improve their capabilities over time.
🚧 This website is currently under construction. Some sections may be incomplete or subject to change as we continue to improve the content. The full paper, discussion, resources, and additional materials will be available soon.
@misc{gangireddi2026reverereflectiveevolvingresearch,
  title={REVERE: Reflective Evolving Research Engineer for Scientific Workflows},
  author={Balaji Dinesh Gangireddi and Aniketh Garikaparthi and Manasi Patwardhan and Arman Cohan},
  year={2026},
  eprint={2603.20667},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2603.20667},
}