Friday, July 28, 2023

The Long and Mostly Short of China’s Newest GPT

Who said all large language models (LLMs) necessarily need to be large? In China, LLMs are currently shrinking in size and parameter count. According to sources, this is because the country is now focusing on enabling Chinese startups and smaller entities to build their own generative AI applications. As part of this downscaling trend, in June the Beijing Academy of Artificial Intelligence (BAAI) introduced Wu Dao 3.0, a series of open-source LLMs.

Based on interviews with high-ranking, anonymous sources involved in the project, IEEE Spectrum can report that Wu Dao 3.0 builds on the academy’s work on Wu Dao 2.0, a sparse, multimodal generative AI model with, as has been widely reported, 1.75 trillion parameters. Although the number of Wu Dao 3.0’s parameters is currently a matter of speculation, it is certainly well below the high-water mark that version 2.0 set.

“Wu dao” means “path to enlightenment” in Chinese. Parameters are the weights of the connections between the digital “neurons” in a model, representing the relationships between words and phrases; their number is one measure of a model’s complexity. In a sparse model, only a small subset of the parameters is active for any given input, making sparse models more efficient than dense ones because they require less memory and computation.
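To make the sparse-versus-dense distinction concrete, here is a minimal Python sketch of the kind of mixture-of-experts routing commonly used to build sparse models: a gating function picks one “expert” weight matrix per input, so only a fraction of the total parameters is touched on any forward pass. All names and dimensions here are illustrative assumptions, not details of BAAI’s actual architectures.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8         # total experts; parameter count scales with this
D_IN, D_OUT = 16, 16  # toy layer dimensions

# Each expert is an independent weight matrix. A dense layer would use
# one big matrix for every input; a sparse layer stores many experts
# but consults only one (or a few) of them per input.
experts = [rng.standard_normal((D_IN, D_OUT)) for _ in range(N_EXPERTS)]
gate = rng.standard_normal((D_IN, N_EXPERTS))  # gating weights

def sparse_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ gate                # one gating score per expert
    chosen = int(np.argmax(scores))  # route to the top-scoring expert
    # Only 1/N_EXPERTS of the expert parameters are used here, which is
    # why sparse models need less compute per token than dense ones of
    # the same nominal parameter count.
    return x @ experts[chosen]

x = rng.standard_normal(D_IN)
print(sparse_forward(x).shape)  # (16,)
```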

“Ultimately, [the government-funded BAAI] can only produce models with smaller parameters, to be used within other Chinese companies.”
—Hejuan Zhao, TMTPost

Rather than another Wu Dao 2.0-size behemoth, however, the Wu Dao 3.0 project is a collection of smaller, nimbler, dense models under the name Wu Dao Aquila, reflecting efforts to enable companies to easily adopt generative AI in their products, Spectrum’s sources say. (In Chinese, the smaller models are called “Wu Dao Tianying,” meaning “path to enlightenment eagle.” Aquila is Latin for “eagle,” so the smaller models are referred to as the Aquila models in English.)

“Due to high costs, chip sanctions, and regulatory systems, large language models like Wu Dao 2.0 can’t be implemented,” said Hejuan Zhao, founder and CEO of TMTPost, one of China’s largest tech media outlets. “Ultimately, they can only produce models with smaller parameters, to be used within other Chinese companies.”

Open sourcing relatively smaller models may also be a strategic choice by BAAI, sources say, as the academy is a nonprofit research organization and the return on investment for training another large model is low. (BAAI officials declined to comment on the record for this story.)

The new Wu Dao 3.0 Aquila models have failed to garner much attention in China, possibly due to the similarity in parameter scale to other available open-source models, like Meta’s LLaMA and its recently announced open-source(ish) language model, Llama 2.

China’s LLM landscape is dominated by companies like Alibaba, Baidu, Huawei, and others. Baidu’s largest model, Ernie 3.5, is arguably the most powerful, though it still lags the performance of OpenAI’s GPT-4. And China’s more powerful models, including Ernie 3.5, Huawei’s Pangu 3.0, and Alibaba’s Tongyi suite, remain proprietary.

Smaller, open-source models have lower inference costs—that is, how much it costs to run the model as it provides an output—and can be commercialized more readily. They are particularly suitable for niche applications, such as medical chatbots.
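As a rough back-of-envelope illustration of why parameter count drives serving cost: the memory needed just to hold a model’s weights scales linearly with the number of parameters. The figures below are illustrative assumptions, not benchmarks of any particular model.

```python
# Weight memory alone is roughly (parameters x bytes per parameter).
# This ignores activations, KV caches, and overhead, so real serving
# footprints are larger; the comparison is what matters here.

BYTES_FP16 = 2  # 16-bit weights, a common inference precision

def weight_memory_gb(n_params: float) -> float:
    return n_params * BYTES_FP16 / 1e9

for name, n in [("7-billion-parameter open model", 7e9),
                ("Wu Dao 2.0-scale model", 1.75e12)]:
    print(f"{name}: ~{weight_memory_gb(n):,.0f} GB of weights at fp16")

# 7-billion-parameter open model: ~14 GB  (fits on a single GPU)
# Wu Dao 2.0-scale model: ~3,500 GB       (requires many GPUs)
```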

Training smaller models also requires fewer chips, making them less vulnerable to hardware shortages. Access to sufficient hardware, especially graphics processing units for model training, is a critical aspect of China’s burgeoning AI sector.

The U.S. government has imposed export restrictions barring sales of Nvidia’s A100 and H100 GPUs to China, including Hong Kong. In response, Nvidia has released a cut-down, slower version of the A100, known as the A800, specifically for China. But any further tightening of U.S. export controls on cutting-edge chips would severely hamper model training in China. There is already an underground trade in Nvidia GPUs due to the high demand and short supply.

Presently, China is focused on the practical application of AI models. By encouraging the open sourcing of not just models but also datasets and computational resources, the Chinese government hopes to boost the nation’s overall AI development. By building a foundation for large models and promoting innovation through open-source collaboration, BAAI said it is trying to create an open-source ecosystem akin to Linux.
