Defeating Nondeterminism in LLM Inference
Reproducible outputs at temperature 0 should be straightforward in principle: the sampler always picks the highest-probability token. Yet production LLM endpoints still produce different completions f...
Tags: attention, batch-invariance, determinism, gpu-kernels, llm-inference
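The usual root cause behind this kind of nondeterminism is floating-point non-associativity: GPU kernels may change their reduction order depending on batch size, so mathematically identical sums come out bitwise different, and a near-tie between two logits can flip the greedy argmax. A minimal sketch (my own illustration, not code from the post) showing two equivalent float32 reductions disagreeing:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

# Sequential left-to-right reduction, as one kernel strategy might do.
seq = np.float32(0.0)
for v in x:
    seq += v

# NumPy's sum uses pairwise (tree-style) reduction internally,
# standing in for a different kernel strategy at another batch size.
tree = x.sum()

# The two results typically differ in the last bits.
print(f"sequential={seq:.7f}  pairwise={tree:.7f}  equal={seq == tree}")

# If two logits are this close, a last-bit difference is enough to flip
# the temperature-0 argmax and diverge the whole completion.
```

Batch-invariant kernels, as the post's tags suggest, fix the reduction order regardless of batch size so the same prompt always yields the same logits.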
EpMAN Reweights Attention with Episodic Memory to Tackle 256k-Token Contexts
Long-context reasoning is still a weak spot for many large language models, even as context windows grow. The ACL 2025 paper EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts ...
Tags: ACL 2025, attention, episodic-memory, LLM, long-context, RAG
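The excerpt names the mechanism (reweighting attention with episodic memory) but not its exact form, so the following is a hedged toy sketch of one plausible reading: attention weights over cached context chunks are scaled by a per-chunk relevance score from an episodic retriever, then renormalized. The function name, shapes, and the multiply-then-renormalize rule are my illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def episodic_reweighted_attention(q, K, V, relevance):
    """Toy single-query attention where each key/value row (e.g. one
    cached context chunk) carries an episodic relevance weight.

    q: (d,), K: (n, d), V: (n, d), relevance: (n,) in [0, 1]
    """
    scores = K @ q / np.sqrt(q.shape[-1])  # standard scaled dot-product
    attn = softmax(scores)
    attn = attn * relevance                # reweight by episodic relevance
    attn = attn / attn.sum()               # renormalize to a distribution
    return attn @ V

rng = np.random.default_rng(0)
d, n = 16, 8
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
relevance = rng.uniform(size=n)            # e.g. retriever scores per chunk
print(episodic_reweighted_attention(q, K, V, relevance).shape)  # (16,)
```

The RAG tag hints at the same division of labor: a retriever decides which episodes matter, while attention does the fine-grained mixing within them.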