Report Outlines Creation of a $2.6B US National AI Research Capability
The Biden White House on Tuesday announced the release of a final report (PDF download) outlining a three-year plan to build a National Artificial Intelligence Research Resource (NAIRR).
The NAIRR is envisioned as shared AI research infrastructure for public use, costing $2.6 billion over six years. The plan calls for a four-phase approach over three years to create a “democratized” AI infrastructure for students and researchers to tap. It would provide access to both governmental and nongovernmental data resources.
The State of AI
AI research is currently limited to “well-resourced” entities, hence the need for the NAIRR, according to the White House’s announcement. The report cited some numbers to that effect:
Even though private investment in AI more than doubled between 2020 and 2021 to approximately $93.5 billion, the number of new companies has decreased. The disparity in availability of AI research resources affects the quality and character of the US AI innovation ecosystem, contributing to a “brain drain” of top AI talent from academic and research institutions to a small set of well-resourced corporations.
Countries that have made long-term investments in AI research, “such as China,” are seeing technological achievements. China has more AI journal publication citations and more AI patent applications than the United States.
The report outlined the type of infrastructure that will be needed for the NAIRR, stating that “computational resources should include conventional servers, computing clusters, high-performance computing, and cloud computing, and should support access to edge computing resources and testbeds for AI R&D.”
A supercomputer will be needed as well:
To meet users’ capability needs, the NAIRR system should include at least one large-scale machine-learning supercomputer capable of training 1 trillion-parameter models.
NAIRR Plans and Funding
The report envisions creating the NAIRR through four planning phases executed over three years.
The first phase in building the NAIRR involves authorizing funds for its infrastructure. The second phase (year 1) involves working with an “operating entity,” which may work with “resource providers.” Initial NAIRR operations are expected to commence in the third phase (year 2). Lastly, full NAIRR capacity for steady-state operations is expected to occur in the fourth phase (year 3).
The NAIRR is expected to cost $2.6 billion over its initial six-year period. To keep NAIRR resources in a state-of-the-art condition, the report envisions making “new $750 million investments” every two years.
The report also offered cost estimates for building “large, computationally-intensive deep learning models,” as implemented by OpenAI with GPT-3 (175 billion parameters) and Google (1.6 trillion parameters).
Published cost estimates ballpark that training a 110 million-parameter language model costs about $50,000, a 340 million-parameter model costs about $200,000, and a 1.5 billion-parameter model costs about $1.6 million. Overall, the cost depends on multiple factors, including size of the training dataset, model architecture, and the number of training runs.
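Those three published data points suggest that training cost grows faster than linearly with parameter count. As a rough illustration (my own back-of-the-envelope fit, not a calculation from the report), a power-law fit through the three estimates can be sketched as:

```python
import math

# Published ballpark estimates cited in the report: (parameter count, training cost in USD).
points = [(110e6, 50e3), (340e6, 200e3), (1.5e9, 1.6e6)]

# Least-squares fit of a power law, cost ~ a * params^b, done in log-log space.
xs = [math.log10(p) for p, _ in points]
ys = [math.log10(c) for _, c in points]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
log_a = my - b * mx

def estimated_cost(params: float) -> float:
    """Interpolated cost estimate for a model of the given parameter count."""
    return 10 ** (log_a + b * math.log10(params))

print(f"fitted exponent b ≈ {b:.2f}")  # > 1, i.e. superlinear cost growth
print(f"~1.5B-param model: ${estimated_cost(1.5e9):,.0f}")
```

The fit is only meaningful near the listed points; extrapolating it to 175-billion- or trillion-parameter models would ignore the other cost factors the report names (dataset size, architecture, and number of training runs).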
The resource providers hired by the operating entity overseeing NAIRR operations can be commercial entities. However, the operating entity itself “should be a distinct, non-government organization,” the report explained.
Most of the operations, though, would be handled by the resource providers:
The Operating Entity should not itself operate the totality of the computer hardware that makes up the NAIRR; instead, computing, data, and training resources would be delivered by resource providers at universities, FFRDCs [federally funded research and development centers] and from the private sector.
The report envisions private entities competing to become resource providers. They could get “funding” in exchange for making their resources available, or they could swap use of their resources for access to NAIRR resources.
The NAIRR could also take advantage of federal data resources that are already being stored in commercial clouds. The report pointed to “over 36 petabytes of public and controlled access genomic sequencing data hosted by the NIH’s National Library of Medicine” that are stored on two commercial cloud platforms. Also, “42 and 10 petabytes of public weather and environmental data” collected by the National Oceanic and Atmospheric Administration are available on three commercial cloud platforms.
The “National Artificial Intelligence Research Resource Task Force” developed this report after 1.5 years of work. Task Force members consisted of “12 leading experts equally representing academia, government, and private organizations” as appointed by the White House Office of Science and Technology Policy (OSTP) and National Science Foundation (NSF). The research effort was kicked off by the National AI Initiative Act of 2020.
Kurt Mackie is senior news producer for 1105 Media’s Converge360 group.