Tencent Cloud has set its sights on the chip design race

Beyond chip design, Tencent will also focus on several other high-performance computing tracks, such as cloud rendering and life sciences.
Text | You Yong
Editor | Shi Zhao

As demand for moving to and using the cloud deepens, cloud vendors are pushing further into individual industries to build reference customer cases.
Not long ago, Tencent Cloud worked with Sushi Technology to build an industry solution for HPC (high-performance computing) scenarios for the chip design company Suiyuan Technology. The solution is based on a one-stop cloud platform for chip design and development, jointly built by Tencent and Sushi Technology. It quickly and automatically provisions Tencent Cloud IaaS resources to build a simulation environment, meeting Suiyuan Technology’s need for elastic capacity and improving R&D efficiency across the project.
“It is a visible blue ocean with great potential,” said Kevin, senior manager for the high-performance computing industry at Tencent Cloud. Tencent Cloud will increase its investment in this area. According to Digital Intelligence Frontline, beyond chip design, Tencent will also focus on several other high-performance computing tracks, such as cloud rendering and life sciences.
01
Cloud is becoming a trend in the chip design industry
Suiyuan Technology, a leading AI chip design company in China, set a record by successfully producing an AI training chip, a product with high technical barriers, in just 18 months.
But as process nodes become more advanced, Suiyuan also faces a mismatch: its IT resources and efficiency cannot keep up with business needs.
Chip R&D schedules are usually tight, especially for large chips, and in the middle and later stages tasks are often scheduled day by day. The industry has traditionally relied on self-built IDCs (data centers). Kevin told Digital Intelligence Frontline that this was mainly because process technology at the time was less advanced and demand for computing power was lower.
Moreover, Vincent, the IT manager of Suiyuan Technology, said that a chip project goes through extensive evaluation and planning in its early stages, including how much computing power and storage it will need. The problem is that things often change as the project progresses: process improvements, functional changes, and adjustments to performance targets. Such changes produce large, sudden demands for computing power. If that demand has to be met by purchasing or renting servers, it takes a considerable time, from deployment through go-live testing, before the business team can actually use the capacity, and that delays R&D.
Such efficiency is clearly unacceptable. In recent years in particular, the pandemic has made hardware procurement lead times unpredictable, while chip project schedules are fixed, so chip design companies face risk from uncertainty around their IT assets. For example, if one or two hundred servers must be ready within a day, only a cloud deployment can deliver that. Under the original IT process, from confirming the server model to purchasing, and from installing server cabinets to operating and maintaining the computer room, it can take 8 to 12 weeks, and too much capital is tied up in IT.
“This is an opportunity for us to go to the cloud,” Vincent said.
The design cycle of a large chip exceeds 12 months and includes multiple stages such as product definition, front-end design, IP verification, SoC verification, synthesis, and place and route. Different stages have different computing power requirements, and the verification phase is the peak period of usage. Suiyuan therefore chose to move part of its simulation and verification work to the cloud. “The front-end IP verification process is basically done in the cloud, and in the future we will definitely hope to put as much of the elastic part as possible in the cloud,” said Eli, a project leader at Suiyuan Technology.
Suiyuan already has substantial demand for elastic workloads, such as needing hundreds of servers configured at the same time, which requires high stability and real-time response. At present, Tencent Cloud, working with Sushi, can have customers running simulation tasks within 1 hour, allowing them to run simulation and verification tasks more often in a limited time and improving the chance of success before tape-out. At the same time, thanks to Sushi’s business scenario optimization and CAD capabilities, Suiyuan’s overall job running time was reduced by 50%, accelerating R&D progress across the project.
Moreover, the chip design industry has now entered the 7nm and even 3nm era, with billions of transistors on a chip, which greatly increases the demand for computing power. Chip companies therefore have a very clear need for computing power during peak periods, and chip design companies such as Suiyuan are starting to seek elastic computing power from cloud vendors.
“Going to the cloud is an industry trend,” Vincent said. “Everyone is trying, but it will take some time for everything to go to the cloud.”
02
The Iron Triangle of Safety, Efficiency, and Cost
The core assets of a chip design company are its chip code and intellectual property. Compared with many other industries, this track has higher requirements for data security.
Suiyuan Technology’s position on going to the cloud is that all data should be stored locally, with only the elastic part in the cloud and no data stored along the way. So, with advice and ideas from Suiyuan, Tencent Cloud and Sushi explored a hybrid cloud architecture based on “storage-compute separation” and spent five or six months verifying it.
This architecture keeps core data and code stored locally and connects to the local computing cluster through Sushi’s scheduling platform, so that computing tasks can flexibly select either a local or a cloud computing power queue.
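The queue selection described above can be illustrated with a minimal sketch. It assumes a simple rule (fill the local cluster first, then overflow to the cloud); the names `Job` and `route_jobs`, the job names, and the core counts are hypothetical and are not taken from the Sushi scheduling platform, whose actual policy the article does not describe.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str   # hypothetical job identifier
    cores: int  # CPU cores the job requests


def route_jobs(jobs, local_free_cores):
    """Place each job on the 'local' queue while local capacity lasts,
    otherwise on the 'cloud' queue. Data location is not modeled here:
    in the architecture described, storage stays local in both cases."""
    placement = {}
    for job in jobs:
        if job.cores <= local_free_cores:
            placement[job.name] = "local"
            local_free_cores -= job.cores
        else:
            placement[job.name] = "cloud"
    return placement


# Hypothetical workload: two IP regressions fit locally, the SoC run bursts out.
jobs = [Job("ip_regress_a", 64), Job("ip_regress_b", 64), Job("soc_sim", 256)]
placement = route_jobs(jobs, local_free_cores=128)
print(placement)
```

A real scheduler would also weigh network latency and bandwidth between the two sites, which the article identifies as the main technical challenge of this design.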
Chen Lintao, Technical Director of Sushi Technology, said that the storage-compute separation scheme adopted here is essentially a hybrid cloud scheme. In the Suiyuan project it faces further technical challenges, such as demanding requirements on network latency, bandwidth throughput, and efficiency across the whole hybrid cloud architecture. The three parties therefore had to work out the optimal architecture for this project together.
Vincent acknowledged that because the storage-compute separation architecture keeps the data local, the company’s security concerns are reduced.
Previously, storage-compute separation was implemented within a single autonomous domain, for example entirely on Tencent Cloud. Suiyuan’s solution is instead deployed as a hybrid cloud across two autonomous domains, which increases physical distance and makes scheduling across the various interfaces more complex, further testing the capabilities of the cloud vendor and its partner. On the other hand, the Sushi platform does not change users’ habits: they can access cloud resources seamlessly, which makes resource access more convenient and lowers the learning cost of going to the cloud.
This is also a common challenge for cloud vendors moving deeper into an industry. Tencent Cloud and Sushi initially considered uploading customer data directly to the cloud for convenience and efficiency. After discussion, however, it became clear that a hybrid cloud architecture with storage-compute separation best fit the chip customer’s data security requirements. At present, Tencent Cloud provides only the computing power, while the Sushi platform provides automated, efficient environment setup. Core data such as Suiyuan’s intellectual property and code is all stored locally, off the cloud. In Kevin’s view, however, some non-sensitive data could in theory be uploaded to the cloud, with caching used to improve simulation efficiency.
Kevin told Digital Intelligence Frontline that early-stage startups, with limited data and assets, have fewer security concerns and prefer a full cloud solution. As they grow, however, many companies tend to adopt a hybrid cloud architecture.
Moreover, many chip design companies already own substantial IDC assets, and making use of these existing resources is also one of their requirements. A hybrid cloud lets them balance the investment already made in those assets while gaining the elasticity, flexibility, speed, and convenience of the cloud. From this perspective, hybrid cloud is currently a relatively good choice.
Suiyuan did not move all of its business to the cloud, and some of it still uses local computing power. For example, early-stage project work is better suited to the existing local capacity. In fact, many chip design companies still run mainly on local resources and put only the elastic part in the cloud.
Hybrid cloud deployment is gradually becoming the consensus approach to saving IT costs.
Suiyuan once calculated that if it purchased servers and built its own computer room, then amortized the cost over a financial cycle of three to five years, the average monthly cost would be lower than the average monthly cost of the cloud. But once time and manpower savings, efficiency gains, and overall costs are taken into account, the advantages of going to the cloud are still very clear. The cloud requires no spending on water and electricity and no in-house operation and maintenance, so those costs are saved. Moreover, rapid deployment and elastic expansion make expensive R&D staff more efficient and shorten the R&D cycle.
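The reasoning in this comparison can be made concrete with a small worked example. Every figure below is a hypothetical placeholder chosen only to reproduce the pattern described (hardware-only amortization below the cloud bill, overall cost above it); none of them comes from Suiyuan, Tencent Cloud, or Sushi.

```python
def amortized_monthly(capex, months):
    """Spread a one-off hardware purchase evenly over its financial cycle."""
    return capex / months


# Hypothetical inputs, for illustration only.
server_capex = 3_600_000       # purchase price of the servers
financial_cycle_months = 48    # within the three-to-five-year range cited
facility_monthly = 15_000      # water, electricity, computer room
ops_staff_monthly = 20_000     # in-house operation and maintenance
cloud_monthly = 100_000        # average monthly cloud bill for the same work

hardware_only = amortized_monthly(server_capex, financial_cycle_months)
self_built_total = hardware_only + facility_monthly + ops_staff_monthly

print(f"hardware only:    {hardware_only:,.0f} per month")
print(f"self-built total: {self_built_total:,.0f} per month")
print(f"cloud:            {cloud_monthly:,.0f} per month")
```

With these placeholder numbers the hardware alone looks cheaper than the cloud, but adding facility and operations costs reverses the result, before even counting procurement delays or idle R&D time.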
In addition to the storage-compute separation architecture, Tencent Cloud and Sushi have also built a complete security solution, from terminal to cloud, for chip design customers such as Suiyuan. On the terminal side, Tencent Cloud’s zero trust security iOA solution lets Suiyuan’s R&D staff across the country work in a consistent simulation environment, while providing terminal security, information protection, and protection against certain vulnerabilities.
In the cloud, Tencent’s host security ensures that the whole computing environment is secure and trustworthy, and that the computing process is free from problems such as intrusion, data leakage, and ransomware. At the transmission level, a very high-bandwidth network connection between Tencent Cloud and Suiyuan ensures the security and trustworthiness of the whole transmission channel.
It is not difficult to see that the storage-compute separation architecture and the hybrid cloud deployment plan meet the need for elastic computing power and efficiency as well as the need for cost savings and data security. These are the things enterprises care about most when moving to and using the cloud, and they are also what cloud vendors need to focus on and solve.
At present, the hybrid cloud architecture of “storage-compute separation” has saved Suiyuan considerable IT investment, and task concurrency can be scaled up in step through cloud elasticity. At the same time, some simulation cycles can be shortened by 30-50%.
Of course, Eli also noted that at the current stage, this three-party storage-compute separation solution meets the requirements some business teams have defined for elastic computing power. The next step is to further optimize and improve usage efficiency. “How to use cloud machines more efficiently, how to optimize in line with business usage, and how to migrate more of the business is what we need to do next.”
Looking ahead, GPU-accelerated chip simulation and intelligent chip design optimization are new directions for the industry. Tencent Cloud will also work with domestic and foreign EDA software vendors to build an accelerated simulation ecosystem, speeding up chip simulation several times over and providing AI-driven PPA (power, performance, area) optimization. At the same time, Tencent Cloud is exploring cloud-based development, moving the earlier stages of the chip design flow to the cloud and building a chip design flow that runs entirely in the cloud, to further improve the efficiency of large-chip R&D. In high-concurrency scenarios, Tencent Cloud can complete multi-generation, multi-card verification work in one place during chip simulation verification and performance comparison testing, drawing on the massive, large-scale scheduling capability of its cloud-native operating system and a rich variety of bare metal and GPU instances. This saves the cost of building or buying infrastructure and greatly improves deployment and testing efficiency.
https://www.stoneitech.com/

By hmimcu