Original URL: mp.weixin.qq.com
Wang Qian said DeepSeek is certainly great, but we want to build a company like OpenAI.
Text by Shen Yuan
Edited by Song Wei
The first question of the interview took Wang Qian 30 minutes to answer, starting with why he chose AI and ending with missing out on a Turing Award-level discovery.
He spoke at such length because his background is exceptionally complex: an undergraduate in Electronic Engineering at Tsinghua University, then switching to Biomedical Engineering for graduate studies, followed by a PhD in Robotics Learning at the University of Southern California (USC), and his first job running a quantitative hedge fund.
In short, he's an atypical founder in embodied intelligence: he has neither worked at major tech firms in China or the U.S., nor held any prominent academic titles.
That doesn't diminish Wang Qian's confidence.
Throughout the interview, he rarely hesitated. He typically responded quickly and directly, often citing wide-ranging references and making bold claims about why others fail while he can succeed.
As early as 2009, Wang was already working on neural networks. The architecture he designed was just one step away from the Transformer, what he calls a missed opportunity worthy of a Turing Award, and also the origin of his technical self-assurance. He is among the most fervent advocates for end-to-end embodied physical models in the industry.
Early on, this level of confidence turned off some investors, but an increasing number have since been convinced. Wang Xinyu, partner at Meituan Longzhu, described Wang Qian as someone with unique insights and unwavering judgment about technology. After tracking him closely for a year, Meituan became a key investor in Zibianliang Robotics.
On January 12, Zibianliang Robotics announced it had completed a ¥1 billion A++ round of financing, just four months after its previous round. According to our information, ByteDance led this latest investment.
This is a man waiting for his chance to change the world. Wang Qian wants to achieve original innovation from zero to one, like OpenAI: he wants to be first.
Missing Out on a Turing-Level Achievement
LatePost: Before founding Zibianliang Robotics, your last experience was running a quant fund in the U.S. How did you make such a big career shift?
Wang Qian: Honestly, the shift wasn't that big; the underlying technology is essentially the same. My PhD focused on Robotics Learning, which still revolves around deep learning techniques quite similar to those used in quant trading.
For someone skilled in AI, making money through quant finance is very straightforward.
LatePost: How did you originally become interested in AI?
Wang Qian: As a child, I wanted to pursue mathematics and physics. Later, I realized that the professional lifespan of theoretical physicists and mathematicians has drastically shortened compared to a century ago. So I decided to build an "engine of human intelligence": that's AI.
LatePost: You studied Electronic Engineering as an undergrad at Tsinghua, but switched to Biomedical Engineering for graduate school. Why?
Wang Qian: Fundamentally, why do we believe AI is possible? Because there's a naturally intelligent system right in front of us: the human brain. But back then, the dominant AI approach was statistical learning, improving accuracy by only 0.1% per year, and even that improvement might have just been overfitting. That's when I turned to neural networks.
At the time, no one believed in neural networks. I searched every lab across Tsinghua's School of Information Science, and not a single professor was working on them. So I went to the biomedical department to study computational neuroscience instead.
My advisor, who had returned from the U.S., told me about someone named Geoffrey Hinton who developed something called deep learning. I looked into it and thought: isn't that just neural networks? Since then, I've been doing deep learning, starting in 2009, making me one of the earliest pioneers in China.
LatePost: Many sources cite you as one of the first people in China to explore attention mechanisms. How did you arrive at that direction?
Wang Qian: The highest form of human intelligence is self-consciousness; beneath that is consciousness, and below that, most would say, is attention. So I wondered if we could integrate attention into neural networks. By 2014, I had published a paper on it.
Note: The paper is Attentional Neural Network: Feature Selection Using Cognitive Feedback (arXiv:1411.5140).
It proposed a new neural network framework unifying top-down attention mechanisms with bottom-up feature extraction.
The paper was submitted to NIPS (now NeurIPS) and was among the earliest three papers on attention mechanisms. So yes, you could say I missed out on a Turing Award-level contribution.
LatePost: You're saying you actually missed the Turing Award?
Wang Qian: Absolutely. Among those three NIPS papers, the other two came from teams at DeepMind and ETH Zurich. Our architecture was much closer to today's Transformer than theirs.
LatePost: What was the key difference?
Wang Qian: Multiplicative operators are inherently hard to converge, especially when stacking many layers.
I interned at Microsoft Research Asia and discussed this with Kaiming He and Jian Sun. They were developing ResNet (Residual Networks), but I didn't pay much attention at the time.
After Transformers emerged, I realized what we lacked was connecting our architecture with ResNet: it makes convergence dramatically more stable.
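The stabilizing effect he attributes to residual connections can be sketched with a toy experiment (illustrative numbers only, not from the interview): push a signal through a deep stack of random layers, with and without the skip connection y = x + F(x), and compare how much of it survives.

```python
import numpy as np

rng = np.random.default_rng(0)
DEPTH, DIM = 50, 64

def final_norm(residual: bool) -> float:
    """Push a unit vector through DEPTH random tanh layers and
    return the norm of the final activation."""
    x = rng.standard_normal(DIM)
    x /= np.linalg.norm(x)
    for _ in range(DEPTH):
        W = rng.standard_normal((DIM, DIM)) * (0.5 / np.sqrt(DIM))
        fx = np.tanh(W @ x)              # the layer's transform F(x)
        x = x + fx if residual else fx   # residual: y = x + F(x)
    return float(np.linalg.norm(x))

print(final_norm(residual=False))  # signal all but vanishes
print(final_norm(residual=True))   # skip path preserves the signal
```

Without the skip connection, each sub-unit-gain layer shrinks the signal geometrically; with it, the identity path carries the signal through regardless of what each layer does, which is the convergence property discussed above.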
LatePost: That must be deeply regrettable.
Wang Qian: The more I think about it, the more regretful it becomes. Every time I see progress in NLP, I feel pained (laughs), because we were truly just one step away.
LatePost: After publishing the paper, why did you switch to robotics?
Wang Qian: After finishing my master's, I wanted to pursue further studies abroad. At the time, the first wave of Chinese AI "Four Little Dragons" was emerging, but I wasn't particularly interested in building security-focused applications. I wanted to find a major area where AI could truly be applied, and robotics naturally stood out.
LatePost: Were deep learning methods already being used in robotics back then?
Wang Qian: In the U.S., only a few research groups were applying deep learning to robotics. One was led by Sergey Levine (co-founder of the robotics company Physical Intelligence) and his advisor Pieter Abbeel. Others included MIT and CMU. Ultimately, I chose USC. So I suppose I'm formally trained in Embodied AI, though we called it Robotics Learning back then.
LatePost: Did that path eventually stall?
Wang Qian: Around 2018–2019, the entire AI field seemed to stagnate. In robotics, deep reinforcement learning hit a wall because it has a fundamental flaw: data requirements grow exponentially with task complexity. At the time, imitation learning wasn't widely adopted either, so the whole direction felt flawed.
LatePost: What about simulation?
Wang Qian: Simulation doesn't work either. The gap between the physical and virtual worlds is simply too large. The real world is often unobservable and highly stochastic.
(Wang Qian pressed his finger forward against the interview table.)
A fingertip deforms upon contact, and it experiences nonlinear friction; these two factors interact to create randomness. This kind of behavior is nearly impossible to simulate accurately. Anything trained in simulation fails in the real world. So my conclusion was: unless there's a fundamental breakthrough, robotics might take another thirty to fifty years to succeed.
LatePost: So you left academia and started a quant fund.
Wang Qian: I was genuinely depressed at the time. I also disliked the academic lifestyle, so I naturally thought: I should go make some money. Quant finance was the most direct path.
There were precedents, most notably James Simons of Renaissance Technologies, who developed Chern–Simons theory with Shiing-Shen Chern and won the Veblen Prize. He succeeded enormously in quant finance and later donated funds to Stony Brook University, where he had chaired the math department, elevating it to world-class status.
There's precedent in AI too: Liang Wenfeng.
Silver Bullet: GPT-3
LatePost: When did you start considering returning to AI and launching an embodied intelligence startup?
Wang Qian: In 2021, after GPT-3 came out. I immediately recognized it as a massive paradigm shift, because it introduced few-shot learning.
People had pursued this capability for decades without success. Reinforcement learning suffers from exponential explosion, but GPT-3 required fewer and fewer samples to learn new tasks. With ChatGPT, zero-shot learning emerged.
By the way, some people today are reviving reinforcement learning in robotics as a "new" approach; I find that absurd.
LatePost: Why didn't anyone pursue the GPT-3 path earlier?
Wang Qian: It goes completely against conventional intuition. Everyone assumed specialized models would always outperform general ones, but now no specialized model beats a general-purpose model.
This is the silver bullet: the one solution that solves everything. I once thought the problems in robotics would take 30 or 50 years to solve, but now I see a path forward.
LatePost: When you saw GPT-3, did you think back to your days working on neural networks at Microsoft Research Asia?
Wang Qian: Exactly. That's why I knew I had to return. Your first question, why I moved from a robotics PhD to quant and back again: it's all consistent. I've always simply wanted to build AI. Everything else was just different approaches along the way.
Image source: WALL·E. This inspired the name of Zibianliang's robot model, WALL-A.
China Makes Hardware, U.S. Makes Software? Impossible.
LatePost: Once you decided to build robots, why choose China over staying in the U.S.?
Wang Qian: I initially considered the U.S., but after surveying the landscape in 2022, I concluded the entire U.S. hardware ecosystem had collapsed.
Supply chain issues are well known: repairing a robotic arm in a U.S. lab might take two months, while in China it takes one day. That's an order-of-magnitude difference.
More critically, Silicon Valley VCs no longer invest in hardware. Early investors in Figure AI were either the CEO himself or giants like NVIDIA, OpenAI, Microsoft, and Jeff Bezos; no serious financial VC firms.
It's the same with capital, talent, and supply chains. No top engineers want to leave Apple or Meta; if they do, it's usually with the hope of getting acquired back by Apple. The flow of people, information, money, and components: all signs show Silicon Valley's hardware ecosystem has completely broken down.
LatePost: China clearly has advantages, but what about disadvantages, like fundraising and computing power?
Wang Qian: Fundraising in China is definitely harder than in the U.S. But for embodied intelligence, scaling isn't primarily limited by compute; it's limited by data. And data costs in China are an order of magnitude lower than in the U.S.
So overall, while funding may be one order of magnitude lower, costs are also one order of magnitude lower, roughly balancing out. Moreover, the funding disadvantage isn't permanent, but the cost advantage persists.
LatePost: What about talent?
Wang Qian: In 2022, people still debated Silicon Valley's talent edge. Now, no one asks that anymore; everyone knows the AI researchers in the U.S. and China are essentially the same group: former classmates from the same universities. Who's really stronger?
LatePost: Since starting your company, has your assessment changed?
Wang Qian: The U.S. is moving faster than I expected.
Take Figure AI: its high valuation stems partly from its alignment with the narrative of reshoring manufacturing to the U.S. They're spending an order of magnitude more on in-house hardware production; soon they'll manufacture joints, motors, batteries, even motor-winding equipment. The only thing left would be assembling everything themselves.
Many claimed China builds hardware while the U.S. builds software, and both could coexist peacefully. That's impossible. U.S. companies like Figure aren't worse at hardware; they're better. Whether they can mass-produce is another issue, but in pre-production hardware quality, I'd say they outperform 99% of Chinese firms.
LatePost: When you started building your team, who was the first person you reached out to?
Wang Qian: Our CTO, Wang Hao. We met in 2021; his boss at IDEA Research Institute was a co-author on my attention paper. When I started the quant fund, I needed help with infrastructure (infra), which I hadn't done before, and he was recommended to me. He started working on large models very early: back in 2021, the two main open-source LLM groups in China were BAAI and IDEA.
By the way, many embodied intelligence startups today will struggle with infra-algorithm integration because they've never done it before; it's a significant leap.
When I approached Wang Hao, he was frustrated working on AI application projects, which are notoriously hard to deploy. Even today, unless you code, deployment is nearly impossible. After I explained my vision, he agreed robotics was the perfect application domain. Looking back now, we were perhaps overly optimistic.
LatePost: Because it's not that easy to deploy, right?
Wang Qian: Right. Robots involve more than just models: hardware, systems, and so on. But after our conversation, he came to Beijing, joined me, and never left.
You Can't See Scaling Laws in Embodied Intelligence? Your Data Is Just Too Bad
LatePost: Zibianliang's WALL-A model is described as an end-to-end embodied foundation model, alongside large language models. Given the deep divide in approaches within embodied intelligence, why are you so confident in end-to-end?
Wang Qian: When we founded the company in late 2023, no one believed in end-to-end. Investors kept telling me to build a layered or specialized model. But if there's no paradigm shift, if I stick to specialized or layered models, why would it be my turn to succeed? Specialized models will absolutely fail. We must build foundation models first, then specialize on top.
LatePost: What's wrong with layered models?
Wang Qian: Say you want to grasp an object. With a layered approach, you first reconstruct the object's 3D shape, estimate its center of gravity, select a grasp point, generate a trajectory, and finally execute the grasp.
But 3D reconstruction can't perfectly capture surface properties like burrs or dents, features extremely sensitive to physical contact. A tiny initial error gets rapidly amplified across layers: the deeper the pipeline, the faster errors cascade. People have followed this path for 80 years and achieved nothing.
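The cascade he describes can be illustrated with toy numbers (purely hypothetical, chosen only to show the compounding effect, not measurements from any real pipeline): if each downstream stage amplifies the error it inherits, a small reconstruction error becomes unusable within a few stages.

```python
# Hypothetical per-stage numbers, invented for illustration only.
initial_error = 0.01   # e.g., a 1% error in the 3D reconstruction
gain = 3.0             # assumed error amplification per downstream stage
stages = ["reconstruction", "center of gravity", "grasp point",
          "trajectory", "execution"]

error = initial_error
for stage in stages:
    print(f"{stage:18s} cumulative error ~ {error:.2f}")
    error *= gain      # each stage compounds the error it inherits
```

Under these assumed numbers, a 1% error at the first stage has grown to roughly 81% by execution, which is the "errors cascade" point: the pipeline multiplies, while end-to-end training can push corrections back against the source of the error.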
LatePost: Can end-to-end avoid this?
Wang Qian: Yes, because you can backpropagate from the final grasping result to correct the initial action, increasing the success rate at certain grasp points. End-to-end doesn't require perfect reconstruction.
Also, end-to-end isn't new to the era of large models. Back in 2014–2015, Sergey Levine and our team were already using end-to-end methods. Around 2018, machines achieved true general grasping for the first time, using end-to-end deep reinforcement learning.
LatePost: What's currently the main bottleneck limiting model performance?
Wang Qian: Data quality is paramount. Some say they can't observe scaling laws in embodied intelligence; I say that's because their data is terrible, full of noise.
Previously, 80% of effort went into model algorithms; now, 80% goes into data. The rest is letting the model decide autonomously. This is a major methodological shift.
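One way to picture the claim that noisy data hides scaling laws (a hypothetical model with made-up constants, not anything from the interview): if test loss follows a power law in dataset size plus an irreducible floor set by noise in the demonstrations, a high floor makes additional data look useless.

```python
import numpy as np

# Hypothetical power-law-plus-floor model of test loss vs. dataset size;
# the constants a, b and the noise floors are invented for illustration.
def loss(n, noise_floor, a=2.0, b=0.3):
    return a * n ** (-b) + noise_floor

ns = np.array([1e3, 1e4, 1e5, 1e6])
clean = loss(ns, noise_floor=0.01)   # low-noise demonstrations
noisy = loss(ns, noise_floor=0.50)   # noisy demonstrations

# Clean data: 1000x more data cuts loss by roughly 6x; the law is visible.
print(clean[0] / clean[-1])
# Noisy data: the floor dominates, and scaling appears to have "stopped".
print(noisy[0] / noisy[-1])
```

The underlying power law is identical in both cases; only the noise floor differs, which matches the argument that teams "can't see" scaling laws because their data quality caps the achievable loss.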
LatePost: So simulation data doesn't work?
Wang Qian: You need high-quality real-world data, collected by performing actual tasks in real physical environments.
LatePost: What about virtual simulation platforms like NVIDIA's Omniverse?
Wang Qian: GR00T's first version performed poorly because it relied solely on synthetic simulation data. Later versions shifted toward hybrid data.
I often tell investors: do you really believe any simulation company can out-compute NVIDIA? NVIDIA sets the upper limit for all of them, and even NVIDIA has pivoted to real-world data.
Every PhD in our generation started with simulation. Now, not a single one of us still uses it, because it simply doesn't work.
LatePost: But many in the embodied intelligence field still use simulation data.
Wang Qian: I come from an orthodox robotics background. Others may come from computer vision or graphics; they might think simulation works. But we've personally stepped into every single pitfall.
LatePost: So compute power isn't a core bottleneck?
Wang Qian: Not currently. At comparable capability, multimodal models are one to two orders of magnitude smaller than language models. Language models need to memorize vast knowledge; physical world models don't; they only need to understand physical laws.
This was another factor in choosing to return to China: embodied intelligence currently faces no compute bottlenecks.
LatePost: In theory, embodied foundation models, like multimodal models, are extremely hard to converge.
Wang Qian: Multimodal models are hard to train due to inherent data scarcity: the data lacks temporal continuity and causal grounding. For example, when a person sees a cat for the first time, they can walk around it, gaining continuous spatial perception. They know their own position, enabling 3D understanding. They can interact: pet it, play with it, adding rich contextual signals. Humans don't need ten thousand cat images to recognize a cat.
Introduce action continuity, and embodied models become easier to train than pure multimodal models. Ten years from now, we'll realize the best multimodal models are embodied models. I tell many multimodal researchers: if you truly want to excel, you should enter embodied intelligence.
LatePost: Does Zibianliang have any technical secret sauce?
Wang Qian: What we can share, we've already published openly. The rest remains confidential.
LatePost: Critics argue that expecting one model to handle vastly different tasks, like walking and solving a Rubik's cube, is unrealistic. How do you respond?
Wang Qian: Actually, it doesn't need to be one single functional model. "End-to-end" refers to internal architecture, not functional partitioning. The human brain is end-to-end, yet different regions handle different functions.
But practically, combining navigation and manipulation in one model does yield better performance.
LatePost: Is the model showing more generalization?
Wang Qian: Everything improves slightly. Most notably, CoT (Chain-of-Thought). What people call "embodied CoT" today still means doing language-based CoT first, then attaching a control model; that's still layered.
We were the first globally to develop native CoT, starting in late 2024, achieving it around the same time as Gemini Robotics in 2025. Ideally, it enables infinitely long planning and strategy.
LatePost: Can you give an example?
Wang Qian: For instance, given a blueprint and a set of blocks nearby, it can assemble the structure according to the plan. First, it understands the blueprint. Second, it evaluates the gap between each step and the final goal. Third, it physically constructs it.
LatePost: Can your model already do this?
Wang Qian: Yes.
LatePost: What still isn't good enough?
Wang Qian: Overall, everything still needs improvement. The core reason is insufficient data volume. Of course, algorithms matter too, but data comes first.
LatePost: What do you think of Fei-Fei Li's world model concept?
Wang Qian: Fei-Fei Li's idea of spatial intelligence leans toward 3D generation. But as I said earlier, knowing all 3D shapes doesn't mean you can perform all tasks. A perfect spatial intelligence model accounts for only 40% to 50% of a complete embodied intelligence system; the rest is all direct physical interaction.
Hardware Must Be Defined by AI
LatePost: You've already launched two generations of wheeled robots. Rumor has it that the project didn't start until late 2024. Why so late?
Wang Qian: We've always believed that AI is primary and hardware secondary. In the early days, our conditions for building hardware weren't mature, and we were just a small team. Later, we realized that once we started building our own hardware, many AI problems actually became easier to solve.
In this sense, we might indeed be a bit late: we only began hiring hardware engineers at scale in January 2025.
LatePost: You come from an embodied AI background. Did you not initially see hardware as important?
Wang Qian: A companyâs resources are limited, especially in the early stages when funding is tight. We thought we should rely more on suppliers.
LatePost: You said building hardware made AI problems easier. Can you give an example?
Wang Qian: Take robotic arms: even though they may look similar, there's a huge difference between those designed with AI-native principles and traditional designs. I know exactly how a robotic arm should be used during the data collection and inference phases. Only with hardware naturally suited for AI can you conduct meaningful research.
Now there are two prevailing views. One holds that you should first build a perfect piece of hardware, then develop AI on top of it; this is completely wrong. The other view is mine: AI must define the hardware.
Here's another example: dexterous hands. Human palms don't contain muscles; they're soft and highly adaptable. But many dexterous hands place motors inside, making them thick and rigid, despite mimicking the human hand's appearance. In such cases, the palm loses functionality: it can't wrap around anything, and during grasping, force is applied only at the finger base.
This is a classic case: only companies that have never collected real-world data or trained models would produce such absurd hardware designs.
LatePost: Does Zibianliang's dexterous hand capability also depend on iterations of its embodied physics model?
Wang Qian: The physical laws, motion patterns, and understanding of object properties learned by the foundation model do not change based on whether you're using grippers or dexterous hands. If you have a strong gripper-based model, training on dexterous hands becomes dramatically faster and less resource-intensive.
Of course, fine-tuning and post-training are still needed, but it's analogous to large language models: the better your model is trained in English, the easier it is to transfer to Chinese.
LatePost: Elon Musk said dexterous hands are harder to develop than Tesla cars, second only to SpaceX's reusable rockets.
Wang Qian: Hardware is indeed hard, but I see hardware and model capability as two parallel tracks. We're building dexterous hands, but mainly to aid model training.
Frankly, most scenarios don't require a hand with full human-level degrees of freedom. Cost is one factor, but more importantly, it's often unnecessary. Humans can perform very complex tasks using simple grippers, and grippers are sufficient for at least half of real-world applications.
LatePost: But people feel that creating a fully human-like dexterous hand would be a massive breakthrough.
Wang Qian: I'm not so sure. People used to think robots running or dancing was groundbreaking, but was it really? Mostly emotional value. High-DoF dexterous hands are useful in certain tasks, but much of the time they're just providing emotional satisfaction: looking human-like, complex, impressive. That's it.
LatePost: How far along is Zibianliang's dexterous hand development?
Wang Qian: We've built a 20-degree-of-freedom hand with decent performance, but this isn't our main focus. It's primarily a tool for model training.
LatePost: Your robot uses wheels instead of bipedal legs. Why?
Wang Qian: Legs have two fundamental issues: safety and cost. They're inherently more prone to falling than wheeled systems. And they're significantly more expensive, requiring an order of magnitude more motors and joints.
LatePost: But do legs offer no advantages?
Wang Qian: Their practical benefits are minimal. Sure, there's emotional appeal, but aside from that, how many indoor scenarios truly require legs? The disadvantages far outweigh the benefits.
LatePost: So Zibianliang won't pursue bipedal robots?
Wang Qian: We might, but only where it makes sense. Running a company often means knowing where not to go. This is one place we've chosen not to go.
We Want to Be an OpenAI-Like Company
LatePost: Some investors say your technical vision hasn't changed since day one, and you've stayed focused without rushing into commercialization. Didn't that make early fundraising difficult?
Wang Qian: At the time, investor logic was simple: you're neither ByteDance nor Google, so why should you build large models? Even if embodied AI requires large models, why you and not someone else? Many companies had already raised over 1 billion RMB; we were still at the seed round.
LatePost: How did you respond?
Wang Qian: Honestly, I couldn't. I think this reflects a problem in China's capital market: people don't believe technology is primary. Subconsciously, they assume anyone can do tech; it lacks uniqueness.
Because historically, successful Chinese companies have all been fast followers. There's never been a case where a company achieved global leadership from zero to one.
LatePost: You believe China could actually lead globally in embodied AI from zero to one?
Wang Qian: Someone asked if I wanted to be the "DeepSeek of embodied AI." I said DeepSeek is a great company, but we want to be like OpenAI.
LatePost: So only investors who buy into that vision would fund you?
Wang Qian: Exactly. Our investors fundamentally believe in our ambition to become world-class. If you're only interested in quick returns, you wouldn't invest in us. Some of our shareholders told me: "Just focus on building the foundational model. If you need money, come to us."
LatePost: Any specific examples?
Wang Qian: I won't name names, but look at the two best domestic large-model companies: Alibaba and ByteDance. Both invested in us. We're also the only embodied AI company ByteDance has ever backed.
LatePost: I heard investors gave your robot a surprise test, rolling toilet paper, and you performed well.
Wang Qian: Not a surprise: we had three days' notice. They said, "You claim few-shot learning ability? Here's a task you've never seen; deliver in three days."
The task: organize toilet paper rolls. Remove dirty or wrinkled parts, apply plastic seals, then repackage, essentially replicating hotel bathroom cleaning procedures.
LatePost: And you succeeded.
Wang Qian: Performance was quite good.
We spent one day collecting data and one day training. On the third day, investors arrived with stacks of various toilet papers, so effectively we had two days of preparation.
LatePost: With improved model capabilities, fundraising should now be easier than in the early days.
Wang Qian: Yes, things are slightly better now. People realize China's talent pool and density are no worse than America's. And with successes like DeepSeek and Unitree, everyone sees China can achieve world-class results. No problem is insurmountable. Resources, compute: none of these are fundamental barriers anymore.
LatePost: So no one questions why you, rather than Google or Agibot, should lead?
Wang Qian: That question rarely comes up now.
LatePost: You seem to have never held conventional biases.
Wang Qian: Maybe because I understand both sides; I never assumed something possible in the U.S. must be impossible in China.
Team Rating: 8/10
LatePost: You previously lacked experience managing large teams. How do you prioritize your time?
Wang Qian: I spend significant time on hiring and fundraising. Technically, I weigh in on major decisions. For key products, I may personally oversee progress.
But I don't micromanage; any CEO who needs to likely has deeper organizational issues. I'm not control-oriented, nor do I want my team coming to me for every decision.
LatePost: Compared to other robotics firms, Zibianliang lacks glamour. Is recruiting hard?
Wang Qian: Different company cultures attract different people. We draw idealists who care deeply about technological fundamentals; that's clear.
LatePost: Any trends? Do candidates from certain companies or industries stand out?
Wang Qian: Fresh graduates. This field doesn't value experience; almost no one has done it before. Everyone is part of the first generation. Recently, we're seeing people from big tech or startups who've actually trained models: some from large language models, some from autonomous driving. We prefer those with LLM backgrounds.
LatePost: Why can't autonomous driving companies succeed in embodied AI?
Wang Qian: First, their understanding of large models generally lags.
Second, autonomous driving and robotics aren't as aligned as people assume. Driving involves no physical contact; robotics does. The core technologies differ.
Third, autonomous driving demands extreme safety standards, leading to divergent mindsets. Though these last two are secondary; the main issue is the first.
LatePost: Can other large model companies do what you do?
Wang Qian: This isn't purely a large model problem. It involves hardware, systems, real-world randomness, experimentation, and organizational challenges, fundamentally mismatched with typical large model team DNA.
Large model teams are like air forces: a brilliant pilot plus an aircraft; mission success depends on individual skill. Their core is essentially a loosely connected lab of top-tier individuals.
Hardware teams are like navies: you're on a ship. Every role must work in sync, from front-end hardware and data interaction, through data processing, to model training. The chain is long; if one link fails, the whole ship sinks.
LatePost: How do you overcome this cultural clash?
Wang Qian: By finding the right people. Also, technically speaking, action as a modality differs from language or vision; you need entirely new methods to leverage action data. This creates a high technical barrier, requiring a native embodied intelligence team.
LatePost: How well are your algorithm and hardware teams integrated today?
Wang Qian: Almost no silos exist. Teams collaborate well as a unified whole.
LatePost: Rate it on a scale of 10?
Wang Qian: 8 out of 10.
First Place, No Bubble, Market Consolidation
LatePost: Omdia recently reported global humanoid robot shipments reached 13,000 units. Top players include Agibot, Unitree, Ubtech, etc. What do you think of this report? What progress will commercialization see by 2026?
Wang Qian: That report has limited value; it's mostly emotional impact. Does it really matter if there are 1,000 more or fewer dancing robots? Robots still can't do useful work.
Commercialization has been like "the boy who cried wolf." For the past two years, everyone called it the "year of commercialization," but now that it might actually arrive, people don't believe anymore; expectations were oversold too early.
LatePost: Do you believe 2026 will be the true year of commercialization?
Wang Qian: Commercialization can begin: not instantly mature, but at least feasible.
LatePost: How did you arrive at this judgment?
Wang Qian: Mainly because technology has hit a threshold. Reinforcement learning now works. Few-shot learning enables rapid deployment on specific products.
Until foundational models reach a certain quality, reinforcement learning simply doesn't work. These milestones are significant. Until now, embodied AI couldn't do much beyond dancing.
LatePost: What's Zibianliang's commercial strategy for 2026?
Wang Qian: Achieve positive ROI in certain scenarios; that would be a major milestone. No company has done this yet, except for gimmicks like dancing.
LatePost: Which scenarios? You mentioned public services, elder care earlier.
Wang Qian: Household chores, such as cleaning and organizing. And industrial verticals like screw-driving: tasks previously only humans could do.
We'll see real commercial deployment this year, with positive ROI. I'm quite confident.
LatePost: Who else can achieve positive ROI besides you?
Wang Qian: Mostly overseas, like 1X, which has already sold hundreds of units. Figure is making progress in industrial settings, close to deployment. These companies are strong.
LatePost: Domestically?
Wang Qian: I think domestic players mostly focus on dancing; they are clearly behind the global leaders.
LatePost: Meaning theyâre also behind AutoVariable?
Wang Qian: Well, of course we believe we're doing better.
LatePost: How do you view competition with domestic peers?
Wang Qian: First, we need to clarify who counts as a peer. Within embodied AI, some companies focus on locomotion, a domain that doesn't necessarily require AI; it's pure control theory. Going back to Boston Dynamics, that lineage used zero lines of AI code.
These are manufacturing-driven companies: improve product quality, reduce costs. Nothing wrong with that, but it has nothing to do with AI.
So we're on one end (AI-first) and Unitree is on the other. Eventually, both sides will converge, but I believe it's easier for us to master hardware than for them to master AI.
Then there's another type: resource integrators, who function more like real estate developers.
LatePost: What's the competitive landscape across these categories?
Wang Qian: The hype around dancing robots is fading fast. Only the top few will survive; market consolidation is underway.
Our side is showing similar trends. By 2026, you'll need to demonstrate real progress, either commercially or technically. In 2025, we saw many new entrants; recently, however, almost no new players have entered either the model or full-system robotics space. Elimination has begun.
Overall, the industry will improve as robots actually deploy. As the market grows, people will see this isn't just hype. If you fail to deliver practical value for years, you risk a crash, like what happened in autonomous driving. But I don't think robotics will face such a trough, because real deployment is happening.
LatePost: Many say embodied AI is overheated, with a bubble.
Wang Qian: There's absolutely no bubble. Compared to autonomous driving or any previous mega-sector, embodied AI is tiny in terms of investment, valuation, and funding, and it is even smaller than in the U.S., by an order of magnitude.
LatePost: Doesn't America's fundraising advantage make you wish you'd stayed in the U.S.?
Wang Qian: Long-term, China has bigger advantages. Across industries, when scaling from 1 to 10, or from 10 to 100, China consistently outperforms the U.S. So if we can match or exceed U.S. performance in the 0-to-1 phase, we'll definitely have the upper hand long-term.
LatePost: Do you consider AutoVariable technically superior compared to others?
Wang Qian: I firmly believe we're the best technically, and that reputation exists within the industry.
Very few people truly understand how to build large models correctly, especially in embodied AI, where such expertise is nearly nonexistent. Among all embodied AI companies globally, we're the only one built around a core large-model team. In terms of technical strength among startups, we're clearly number one.
Confident, Yet Anxious
LatePost: The strongest impression from this interview is your confidence.
Wang Qian: My predictions over the past two years have mostly been accurate. For instance, we deliberately delayed commercialization, which now looks like the right call.
LatePost: It's not just recent years; I sense this mindset dates back to your student days.
Wang Qian: That's vision, right? I'd say my vision is pretty solid.
LatePost: Your way of thinking seems fundamentally different from most people.
Wang Qian: I believe if you do something, you should aim to be first; otherwise, it's just not interesting. If I just wanted money, I'd have stayed in quant trading, with no need to endure this hardship.
LatePost: So you do see this as hardship.
Wang Qian: Definitely tough.
LatePost: That didn't come across in our conversation.
Wang Qian: Of course not. I wouldn't let you feel that.
LatePost: Are you reluctant to show that side?
Wang Qian: People prefer seeing someone strong, flawless.
LatePost: But that feels fake.
Wang Qian: Then the presentation is off. Overall, you still need to appear flawless; otherwise, people won't trust you.
LatePost: When you do get time off, what do you usually do?
Wang Qian: Sleep. I'm very introverted: sleep first, wake up, maybe read a book.
LatePost: Do you have poor sleep quality?
Wang Qian: When anxious, yes.
LatePost: What's the last book you read?
Wang Qian: Scientific American.
LatePost: Okay⌠I heard you also enjoy browsing Bilibili.
Wang Qian: Even when not resting, Iâm scrolling.
LatePost: Any preferred content?
Wang Qian: No, just random scrolling.
(As he speaks, Wang Qian reads aloud the current titles on his Bilibili homepage: "Inside Google DeepMind Lab"; "High School New Year's Dance Performance"; "The Most Raw-Cooked Meat in the World"; "Today's Source of Joy"; "Floating Wind Power System Completes Grid Connection Test"…)