Chinese technology firms are racing to replicate the capabilities of Stable Diffusion and DALL-E 2, but they must overcome several obstacles along the way.




AI technology is advancing by leaps and bounds, leaving people excited about what comes next. But there are also concerns about the consequences it can bring. Following the buzz around the text-to-image tools developed by Stability AI and OpenAI, ChatGPT’s proficiency in engaging in intelligent conversations has become the latest fixation across various industries.

In China, where the tech industry has been closely observing developments in the West, entrepreneurs, researchers, and investors are actively seeking opportunities to make their mark in the generative AI field. Tech companies are creating tools based on open source models to attract both consumer and enterprise customers.

Individuals are capitalizing on AI-generated content for various purposes. Regulators are responding swiftly by establishing guidelines on the appropriate use of text, image, and video synthesis. Meanwhile, concerns are arising over China’s ability to keep up with AI advancements due to U.S. tech sanctions.

With generative AI having gained momentum and become a global phenomenon toward the end of 2022, it’s worth examining how this technology is unfolding in China.

Chinese flavors

With the rise of viral art creation platforms such as Stable Diffusion and DALL-E 2, generative AI has captured widespread attention. In China, tech giants have also seized the public’s interest with their own equivalent products, incorporating unique adaptations to align with the country’s cultural preferences and political landscape.

Baidu, known for its search engine and its work on autonomous driving, has developed ERNIE-ViLG, a 10-billion-parameter model trained on a dataset of 145 million Chinese image-text pairs.

How does it compare to its American counterpart? Let’s compare the results of giving the prompt “kids eating shumai in New York Chinatown” to Stable Diffusion with those of giving the same prompt in Chinese (纽约唐人街小孩吃烧卖) to ERNIE-ViLG.

As someone who has eaten dim sum both in China and in Chinatowns, I would say the results are inconclusive. Neither Stable Diffusion nor ERNIE-ViLG accurately captured the essence of shumai, a succulent shrimp-and-pork dumpling wrapped in a half-open yellow casing and commonly found in dim sum cuisine.

While Stable Diffusion accurately portrays the ambiance of a Chinatown dim sum eatery, its depiction of shumai falls short. On the other hand, ERNIE-ViLG generates a type of shumai, but it appears to resemble a variety more commonly found in eastern China rather than the Cantonese version typically associated with dim sum. It highlights the nuances and challenges of generative AI in accurately capturing regional and cultural specificities.

Indeed, the challenges of capturing cultural nuances in generative AI outputs are evident in the quick test results. The bias in the training data sets used by both Stable Diffusion and ERNIE-ViLG could impact their ability to accurately generate culturally-specific content.

Tencent’s Different Dimension Me, a tool that transforms photos of people into anime characters, has gained attention in China and other anime-loving regions such as South America. However, the AI generator has exhibited its own biases. Intended for Chinese users, it has faced criticism for its failure to accurately identify and represent black and plus-size individuals.

This is due to the underrepresentation of these groups in traditional Japanese anime, which has led to offensive and inaccurate AI-generated results. This highlights the importance of addressing bias in AI technologies and ensuring that they are inclusive, diverse, and respectful of all individuals, irrespective of their race, size, or other characteristics.

Taiyi is another notable Chinese large-scale model for generating images from text. It was developed by IDEA, a research lab led by renowned computer scientist Harry Shum, who co-founded Microsoft Research Asia, Microsoft’s largest research branch outside the U.S. Taiyi is trained on 20 million carefully filtered Chinese image-text pairs and has one billion parameters.

Unlike Baidu and other profit-driven tech companies, IDEA is among the few institutions that have received backing from local governments in recent years to conduct research on cutting-edge technologies.

This implies that the center likely benefits from greater research freedom without the pressure to prioritize commercial success. Located in the bustling tech hub of Shenzhen and supported by one of China’s wealthiest cities, IDEA is an emerging organization that is worth keeping an eye on.

Rules of AI

The generative AI tools developed in China are not only influenced by the domestic data they are trained on but also shaped by local laws. As highlighted by MIT Technology Review, Baidu’s text-to-image model, for example, filters out politically sensitive keywords. This is unsurprising, considering that censorship has been a pervasive practice on the Chinese internet for a long time.

A crucial aspect that will impact the future of the emerging field of generative AI in China is the set of new regulatory measures aimed at what the government refers to as “deep synthesis tech.” This term encompasses technologies that utilize deep learning, virtual reality, and other synthesis algorithms to generate text, images, audio, video, and virtual scenes.

Similar to other internet services in China, such as games and social media, users are required to verify their real names before accessing generative AI apps. This practice inevitably restricts user behavior, as the prompts used in these apps can be traced back to their real identities.

However, there is a positive aspect to these rules as well. They could potentially result in more responsible use of generative AI, which has been misused in other contexts to produce NSFW and sexist content. In China, for instance, the regulations explicitly prohibit the generation and dissemination of AI-created fake news. Enforcement, though, ultimately rests with the service providers, who are responsible for ensuring compliance.

“It’s interesting that China is at the forefront of trying to regulate [generative AI] as a country,” said Yoav Shoham, co-founder of AI21 Labs, an Israel-based OpenAI rival, in an interview. “There are various companies that are putting limits to AI…Every country I know of has efforts to regulate AI or to somehow make sure that the legal system, or the social system, is keeping up with the technology, specifically about regulating the automatic generation of content.”


But there’s no consensus yet as to how the fast-changing field should be governed. “I think it’s an area we’re all learning together,” Shoham admitted. “It has to be a collaborative effort. It has to involve technologists who actually understand the technology and what it does and what it doesn’t do, the public sector, social scientists, and people who are impacted by the technology as well as the government, including the sort of commercial and legal aspect of the regulation.”

Monetizing AI

While concerns about AI replacing human artists persist, in China, many individuals, including opportunists and stay-at-home moms seeking extra income, are utilizing machine learning algorithms to make money in various ways.

They have realized that by refining their prompts, they can manipulate AI into generating creative emojis or captivating wallpapers, which they can then share on social media to generate ad revenues or charge users for downloads. Some skilled individuals even sell their prompts to others who want to participate in the lucrative market of AI-generated content, or even offer training services for a fee.

In China, as in the rest of the world, AI is also being used in more formal job settings. For instance, writers of light fiction, a genre that is shorter than novels and typically illustrated, can efficiently generate illustrations for their work.

Additionally, there is a compelling use case of leveraging AI to design consumer goods such as T-shirts, press-on nails, and prints, which has the potential to disrupt traditional manufacturing methods. By rapidly generating large batches of prototypes, manufacturers can save on design costs and shorten their production cycle, leading to increased efficiency and productivity.

While it’s still early to determine how generative AI is developing differently in China compared to the West, entrepreneurs have already made decisions based on their initial observations. Some founders have noted that businesses and professionals are generally willing to invest in AI as they see a direct return on investment, which has motivated startups to identify industry-specific use cases.

One interesting example comes from Surreal (later renamed Movio), which is backed by Sequoia China, and another company backed by Hillhouse. During the pandemic, they discovered that e-commerce sellers were struggling to find foreign models because of China’s closed borders. To address this, the companies developed algorithms that generated fashion models of various shapes, colors, and races.

Despite the potential of generative AI in China, some entrepreneurs are not optimistic about their AI-powered Software-as-a-Service (SaaS) achieving the same skyrocketing valuation and rapid growth as their Western counterparts, such as Jasper and Stability AI. Many Chinese startups have expressed concerns that enterprise customers in China are generally less willing to pay for SaaS compared to those in developed economies. As a result, some Chinese startups have started looking to expand overseas in order to tap into markets where there may be more willingness to pay for their AI-powered SaaS products.

Competition in China’s SaaS space is also dog-eat-dog. “In the U.S., you can do fairly well by building product-led software, which doesn’t rely on human services to acquire or retain users. But in China, even if you have a great product, your rival could steal your source code overnight and hire dozens of customer support staff, which don’t cost that much, to outrace you,” said a founder of a Chinese generative AI startup, requesting anonymity.

Shi Yi, founder and CEO of sales intelligence startup FlashCloud, agreed that Chinese companies often prioritize short-term returns over long-term innovation. “In regard to talent development, Chinese tech firms tend to be more focused on getting skilled at applications and generating quick money,” he said. One Shanghai-based investor, who declined to be named, said he was “a bit disappointed that major breakthroughs in generative AI this year are all happening outside China.”


Chinese tech firms may face challenges in obtaining the best tools for training large neural networks due to export controls on high-end AI chips imposed by the U.S. government in September. While many Chinese AI startups focus on application-oriented tasks that do not require high-performance semiconductors, those involved in basic research may face longer computing times and higher costs when using less powerful chips, according to an enterprise software investor at a top Chinese VC firm who requested anonymity.

However, some companies like Baidu, which considers itself a leader in China’s AI field, have stated that the impact of U.S. chip sanctions on their AI business is limited in the short and long term. Baidu’s executive vice president and head of AI Cloud Group, Dou Shen, mentioned during a Q3 earnings call that a large portion of their AI cloud business does not heavily rely on advanced chips, and they have already stocked enough high-end chips to support their business in the near term. Nevertheless, the export controls on high-end AI chips may push China to invest in advanced technologies over the long run.


What about the future? “When we look at it at a mid- to a longer-term, we actually have our own developed AI chip, so named Kunlun,” the executive said confidently. “By using our Kunlun chips [Inaudible] in large language models, the efficiency to perform text and image recognition tasks on our AI platform has been improved by 40% and the total cost has been reduced by 20% to 30%.”

Indeed, only time will reveal the outcome of China’s efforts in developing indigenous AI chips such as Kunlun, and whether they will provide the country with a competitive advantage in the generative AI race.
