Harvard Research Releases QuArch: A Q&A Dataset for AI Agents to Improve Computer Architecture Understanding

A Harvard team recently published progress on QuArch, a dataset comprising 1,500 human-validated question-answer pairs designed to evaluate and enhance language models' understanding of computer architecture.
QuArch: Question-Answering Computer Architecture Dataset
The QuArch dataset covers a wide range of topics, including processor design, memory systems, and performance optimization. It is built on the Archipedia corpus, a comprehensive collection of scholarly articles, technical documentation, and industry insights spanning decades, so every answer is grounded in curated technical content.
Fig. 3: Distribution of computer architecture topics in QuArch.
Paper: https://arxiv.org/abs/2501.01892
QuArch addresses a significant gap in the field of computer architecture by providing a specialized dataset to benchmark and improve AI models' understanding of architectural computing concepts. This is crucial as current language models struggle with domain-specific tasks in computer engineering, limiting the development of AI-driven solutions in this area.
"Hardware engineering has lagged significantly in adopting AI-driven solutions. This gap is evident in both the limitations of current language models (LMs) and the scarcity of specialized datasets tailored for hardware."
QuArch is being used to evaluate state-of-the-art language models, revealing a significant performance gap between large closed-source models and smaller open-source models.
The team observed the highest-performing closed-source model reaching 84% accuracy, while the best open-source model achieved 72%. Additionally, fine-tuning with QuArch improves small-model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research.
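To make the evaluation concrete, here is a minimal sketch of how accuracy on a multiple-choice QA benchmark like QuArch might be computed. The field names (`question`, `choices`, `answer`) and the toy model are illustrative assumptions, not the dataset's actual schema or the authors' harness.

```python
# Hypothetical sketch: scoring a model on multiple-choice QA pairs.
# Field names and the stand-in model are assumptions for illustration.
def accuracy(examples, predict):
    """Fraction of questions where the model's choice matches the gold answer."""
    correct = sum(
        1 for ex in examples
        if predict(ex["question"], ex["choices"]) == ex["answer"]
    )
    return correct / len(examples)

# Toy stand-in for a language model: always picks the first choice.
def naive_model(question, choices):
    return choices[0]

sample = [
    {"question": "Which cache level is closest to the CPU core?",
     "choices": ["L1", "L3"], "answer": "L1"},
    {"question": "Which memory technology requires periodic refresh?",
     "choices": ["SRAM", "DRAM"], "answer": "DRAM"},
]
print(accuracy(sample, naive_model))  # 0.5 on this toy sample
```

A real harness would replace `naive_model` with a call to the language model under test; the reported 84% and 72% figures correspond to this fraction computed over the full benchmark.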
QuArch Dataset Construction Pipeline
The creation of QuArch involved several steps:
- Curation of Archipedia: A comprehensive compilation of computer architecture knowledge from academic literature, educational materials, technical documentation, and industry sources.
- QA Generation: Using commercial language models to synthesize questions grounded in the Archipedia corpus.
- Validation: A multi-tiered review process combining human expertise and language model assistance to ensure technical rigor and accuracy.
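The three steps above can be sketched as a simple pipeline. The helper functions below stand in for the commercial language models and human reviewers the team actually used; they are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch of the curate -> generate -> validate pipeline.
# `generate_qa` and `validate` are hypothetical stand-ins for the
# LM-based generation and multi-tiered review described in the paper.
def build_quarch(corpus, generate_qa, validate):
    """Keep only QA pairs that pass validation against their source passage."""
    dataset = []
    for passage in corpus:               # 1. curated Archipedia passages
        for qa in generate_qa(passage):  # 2. LM-synthesized QA pairs
            if validate(qa, passage):    # 3. human + LM validation gate
                dataset.append(qa)
    return dataset

# Toy run: one passage, one generated pair, a trivial grounding check.
corpus = ["A set-associative cache divides its sets among several ways."]
gen = lambda p: [{"q": "What does a set-associative cache divide?",
                  "a": "its sets among several ways"}]
ok = lambda qa, p: qa["a"] in p          # answer must be grounded in the passage

dataset = build_quarch(corpus, gen, ok)
print(len(dataset))  # 1 QA pair survives validation
```

The grounding check here is deliberately trivial; the paper's validation combines human expertise with LM assistance rather than a substring test.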
"The alpha release of QuArch v0.1 offers a foundation of question-answer pairs, designed to assess the computer architecture knowledge embedded in LMs today and bridge the gap between AI agent capabilities and specialized knowledge in computing hardware and architecture."
Fig. 4: QuArch accuracy ranges from 39% to 84%. Larger models (>70B parameters) attain a maximum of 84%; small-model (<10B parameters) performance is 12% lower in comparison.
The introduction of QuArch has several potential impacts:
- Enhanced AI Models: Fine-tuning with QuArch helps models better understand and solve computer architecture problems.
- Benchmarking Tool: QuArch provides a robust benchmark for assessing AI models' capabilities in computer architecture, giving both open- and closed-source models a basis for improving domain knowledge.
- Advancement in AI-Driven Solutions: Stronger models can lead to more sophisticated tools for hardware design, potentially transforming the field with new CPU/GPU designs, interfaces, and innovative form factors.
QuArch lays a foundation for future AI-driven computer architecture research, highlighting the importance of domain-specific datasets in developing specialized AI expertise. With continued development, it could become an industry standard for building domain-specific datasets to train specialized AI agents.
For more information, benchmarks, and updates, visit the project's GitHub page: https://harvard-edge.github.io/QuArch/