Long-form, citable, implementation-grounded papers — written so reviewers can follow each claim back to a source file in the production codebase.
A controlled benchmark across 71 configurations showing how DAG-based agent networks and SEEMR self-evolving routing cut predicted inference energy by up to 94.9% — with output quality held non-inferior in aggregate.
A composable six-tier dispatch architecture (policy, retrieval, rules, predictive re-ranking, contextual bandits, and challenger exploration) that turns model selection from a static configuration into a continuously learning decision.
Ten implementation-grounded mechanisms — from multi-objective routing to measured-power sampling — that move the energy footprint of enterprise LLM inference from an implicit externality to a first-class engineering objective.