(formerly SARE: Self-Adapting Research Engineer)
Current prompt-optimization approaches for language agents rely primarily on local feedback signals to refine behavior. As a result, they often fail to capture recurring patterns across tasks and generalize poorly. Many of these methods perform complete prompt rewrites or loosely structured updates, which can unintentionally discard previously learned knowledge. These challenges become more pronounced in research-coding environments, where workflows involve diverse repositories, incomplete configurations, and weak or inconsistent feedback signals. Reproducing results from public codebases, a common benchmark for evaluation, remains particularly difficult under such conditions.
In this work, we introduce Reflective Evolving Research Engineer (REVERE), a framework designed to enable continuous learning and adaptive improvement. REVERE leverages a Global Training Context to capture patterns across tasks, identifies recurring failure modes from execution traces spanning multiple repositories, and transforms these insights into reusable heuristics. Instead of rewriting prompts entirely, the framework performs targeted updates across its three configurable components.
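The reflect-then-update loop described above can be sketched as follows. This is a minimal illustration, not REVERE's actual implementation: the class name `GlobalTrainingContext`, the trace schema (`success`, `failure_mode` keys), the recurrence threshold, and the component dictionary are all hypothetical stand-ins.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class GlobalTrainingContext:
    """Illustrative cross-task store of reusable heuristics (hypothetical API)."""
    heuristics: list[str] = field(default_factory=list)

    def add(self, heuristic: str) -> None:
        # Deduplicate so repeated reflection passes do not bloat the context.
        if heuristic not in self.heuristics:
            self.heuristics.append(heuristic)

def reflect(traces: list[dict], context: GlobalTrainingContext) -> None:
    """Distill recurring failure modes in execution traces into heuristics."""
    failures = Counter(t["failure_mode"] for t in traces if not t["success"])
    for mode, count in failures.items():
        if count >= 2:  # only failure modes that recur across tasks/repositories
            context.add(f"When encountering '{mode}', apply the learned fix.")

def targeted_update(components: dict[str, str],
                    context: GlobalTrainingContext) -> dict[str, str]:
    """Update one prompt component in place of a full rewrite,
    preserving the other components (and their accumulated knowledge)."""
    updated = dict(components)
    updated["heuristics"] = "\n".join(context.heuristics)
    return updated
```

The key design point the sketch captures is that `targeted_update` touches only the heuristics component, so knowledge stored in the untouched components survives each optimization step.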
Through this reflective optimization process, REVERE demonstrates consistent improvements over prior state-of-the-art approaches. Specifically, it achieves performance gains of 4.50% on SUPER, 3.51% on ResearchCodeBench, and 4.89% on ScienceAgentBench. These results highlight the effectiveness of integrating long-term learning and structured memory into language agents, enabling them to evolve and improve their capabilities over time.
🚧 This website is currently under construction. Some sections may be incomplete or subject to change as we continue to improve the content. The full paper, discussion, resources, and additional materials will be available soon.
@misc{gangireddi2026reverereflectiveevolvingresearch,
  title={REVERE: Reflective Evolving Research Engineer for Scientific Workflows},
  author={Balaji Dinesh Gangireddi and Aniketh Garikaparthi and Manasi Patwardhan and Arman Cohan},
  year={2026},
  eprint={2603.20667},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2603.20667},
}