LatePost Interview with Wang Qian, Founder of Zibianliang Robotics


Wang Qian said DeepSeek is certainly great, but we want to build a company like OpenAI.

Text by Shen Yuan

Edited by Song Wei

The first question of the interview took Wang Qian 30 minutes to answer, starting with why he chose AI and ending with missing out on a Turing Award–level discovery.

He spoke at such length because his background is exceptionally complex: an undergraduate in Electronic Engineering at Tsinghua University, then switching to Biomedical Engineering for graduate studies, followed by a PhD in Robotics Learning at the University of Southern California (USC), and his first job running a quantitative hedge fund.

In short, he’s an atypical founder in embodied intelligence: neither having worked at major tech firms in China or the U.S., nor holding any prominent academic titles.

That doesn’t diminish Wang Qian’s confidence.

Throughout the interview, he rarely hesitated. He typically responded quickly and directly, often citing wide-ranging references and making bold claims about why others fail while he can succeed.

As early as 2009, Wang was already working on neural networks. The architecture he designed was just one step away from the transformer—what he calls a missed opportunity worthy of a Turing Award, and also the origin of his technical self-assurance. He is among the most fervent advocates for end-to-end embodied physical models in the industry.

Early on, this level of confidence turned off some investors, but an increasing number have since been convinced. Wang Xinyu, partner at Meituan Longzhu, described Wang Qian as someone with unique insights and unwavering judgment about technology. After tracking him closely for a year, Meituan became a key investor in Zibianliang Robotics (rendered in English elsewhere as AutoVariable).

On January 12, Zibianliang Robotics announced it had completed a ¥1 billion A++ financing round, just four months after its previous round. According to our information, ByteDance led this latest investment.

This is a man waiting for his chance to change the world. Wang Qian wants to achieve original innovation from zero to one, like OpenAI—he wants to be first.

Missing Out on a Turing-Level Achievement

LatePost: Before founding Zibianliang Robotics, your last experience was running a quant fund in the U.S. How did you make such a big career shift?

Wang Qian: Honestly, the shift wasn’t that big—the underlying technology is essentially the same. My PhD focused on Robotics Learning, which still revolves around deep learning techniques quite similar to those used in quant trading.

For someone skilled in AI, making money through quant finance is very straightforward.

LatePost: How did you originally become interested in AI?

Wang Qian: As a child, I wanted to pursue mathematics and physics. Later, I realized that the professional lifespan of theoretical physicists and mathematicians has drastically shortened compared to a century ago. So I decided to build an “engine of human intelligence”—that’s AI.

LatePost: You studied Electronic Engineering as an undergrad at Tsinghua, but switched to Biomedical Engineering for graduate school. Why?

Wang Qian: Fundamentally, why do we believe AI is possible? Because there’s a naturally intelligent system right in front of us: the human brain. But back then, the dominant AI approach was statistical learning, improving accuracy by only 0.1% per year—and even that improvement might’ve just been overfitting. That’s when I turned to neural networks.

At the time, no one believed in neural networks. I searched every lab across Tsinghua’s School of Information Science, and not a single professor was working on them. So I went to the biomedical department to study computational neuroscience instead.

My advisor, who had returned from the U.S., told me about someone named Geoffrey Hinton who developed something called Deep Learning. I looked into it and thought: isn’t that just neural networks? Since then, I’ve been doing deep learning—starting in 2009—making me one of the earliest pioneers in China.

LatePost: Many sources cite you as one of the first people in China to explore attention mechanisms. How did you arrive at that direction?

Wang Qian: The highest form of human intelligence is self-consciousness, beneath that is consciousness, and below that—most would say—is attention. So I wondered if we could integrate attention into neural networks. By 2014, I published a paper on it.

Note: The paper, Attentional Neural Network: Feature Selection Using Cognitive Feedback (arXiv:1411.5140), proposed a new neural network framework unifying top-down attention mechanisms with bottom-up feature extraction.

The paper was submitted to NIPS (now NeurIPS) and was among the earliest three papers on attention mechanisms. So yes, you could say I missed out on a Turing Award–level contribution.

LatePost: You’re saying you actually missed the Turing Award?

Wang Qian: Absolutely. Among those three NIPS papers, the other two came from teams at DeepMind and ETH Zurich. Our architecture was much closer to today’s Transformer than theirs.

LatePost: What was the key difference?

Wang Qian: Multiplicative operators are inherently hard to train to convergence, especially when many layers are stacked.

I interned at Microsoft Research Asia and discussed this with Kaiming He and Jian Sun. They were developing ResNet (Residual Networks), but I didn’t pay much attention at the time.

After Transformers emerged, I realized what we lacked was connecting our architecture with ResNet—ResNet makes convergence dramatically more stable.
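His point about residual connections can be sketched in a toy example. This is illustrative NumPy, not the 2014 architecture: a multiplicative attention step wrapped in a ResNet-style skip connection, y = x + f(x), so the signal (and, symmetrically, the gradient) always has an identity path no matter how many layers are stacked.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_block(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention — a multiplicative operator:
    the output mixes inputs through a data-dependent product."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def residual_attention_block(x, Wq, Wk, Wv):
    """ResNet-style wrapper, y = x + f(x): the identity term gives the
    signal a direct route through arbitrarily many stacked layers."""
    return x + attention_block(x, Wq, Wk, Wv)

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))
layers = [tuple(rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
          for _ in range(20)]

# Stack 20 layers with and without the skip connection.
plain, skip = x, x
for Wq, Wk, Wv in layers:
    plain = attention_block(plain, Wq, Wk, Wv)
    skip = residual_attention_block(skip, Wq, Wk, Wv)

# Without residuals the signal collapses toward zero after 20 layers;
# with the skip connection it survives at roughly its original scale.
print(np.linalg.norm(plain), np.linalg.norm(skip))
```

The same identity path that preserves the forward signal here is what keeps gradients usable during training, which is the sense in which ResNet "makes convergence dramatically more stable."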

LatePost: That must be deeply regrettable.

Wang Qian: The more I think about it, the more regretful it becomes. Every time I see progress in NLP, I feel pained (laughs)—because we were truly just one step away.

LatePost: After publishing the paper, why did you switch to robotics?

Wang Qian: After finishing my master’s, I wanted to pursue further studies abroad. At the time, the first wave of Chinese AI “Four Little Dragons” was emerging, but I wasn’t particularly interested in building security-focused applications. I wanted to find a major area where AI could truly be applied—and robotics naturally stood out.

LatePost: Were deep learning methods already being used in robotics back then?

Wang Qian: In the U.S., only a few research groups were applying deep learning to robotics. One was led by Sergey Levine (co-founder of the robotics company Physical Intelligence) and his advisor Pieter Abbeel. Others included MIT and CMU. Ultimately, I chose USC. So I suppose I’m formally trained in Embodied AI—though we called it Robotics Learning back then.

LatePost: Did that path eventually stall?

Wang Qian: Around 2018–2019, the entire AI field seemed to stagnate. In robotics, deep reinforcement learning hit a wall because it has a fundamental flaw: data requirements grow exponentially with task complexity. At the time, imitation learning wasn’t widely adopted either, so the whole direction felt flawed.

LatePost: What about simulation?

Wang Qian: Simulation doesn’t work either. The gap between the physical and virtual worlds is simply too large. The real world is often unobservable and highly stochastic.

(Wang Qian pressed his finger forward against the interview table.)

A fingertip deforms upon contact, and it experiences nonlinear friction—these two factors interact to create randomness. This kind of behavior is nearly impossible to simulate accurately. Anything trained in simulation fails in the real world. So my conclusion was: unless there’s a fundamental breakthrough, robotics might take another thirty to fifty years to succeed.

LatePost: So you left academia and started a quant fund.

Wang Qian: I was genuinely depressed at the time. I also disliked the academic lifestyle, so I naturally thought: I should go make some money. Quant finance was the most direct path.

There were precedents—most notably James Simons of Renaissance Technologies, the mathematician who developed Chern–Simons theory with Shiing-Shen Chern. He succeeded enormously in quant finance and later donated heavily to Stony Brook University, where he had chaired the mathematics department, elevating it to world-class status.

There’s precedent in AI too—that’s Liang Wenfeng.

Silver Bullet: GPT-3

LatePost: When did you start considering returning to AI and launching an embodied intelligence startup?

Wang Qian: In 2021, when GPT-3 came out. I immediately recognized it as a massive paradigm shift—because it introduced few-shot learning.

People had pursued this capability for decades without success. Reinforcement learning suffers from exponential explosion, but GPT-3 required fewer and fewer samples to learn new tasks. With ChatGPT, zero-shot learning emerged.

By the way, some people today are reviving reinforcement learning in robotics as a “new” approach—I find that absurd.

LatePost: Why didn’t anyone pursue the GPT-3 path earlier?

Wang Qian: It goes completely against conventional intuition. Everyone assumed specialized models would always outperform general ones—but now, no specialized model beats a general-purpose model.

This is the silver bullet—the one solution that solves everything. I once thought the problems in robotics would take 30 or 50 years to solve, but now I see a path forward.

LatePost: When you saw GPT-3, did you think back to your days working on neural networks at Microsoft Research Asia?

Wang Qian: Exactly. That’s why I knew I had to return. Your first question—why I moved from robotics PhD to quant and back again—it’s all consistent. I’ve always simply wanted to build AI. Everything else was just different approaches along the way.

Image source: WALL·E. This inspired the name of Zibianliang’s robot model, WALL-A.

China Makes Hardware, U.S. Makes Software? Impossible.

LatePost: Once you decided to build robots, why choose China over staying in the U.S.?

Wang Qian: I initially considered the U.S., but after surveying the landscape in 2022, I concluded the entire U.S. hardware ecosystem had collapsed.

Supply chain issues are well known—repairing a robotic arm in a U.S. lab might take two months, while in China it takes one day. That’s an order-of-magnitude difference.

More critically, Silicon Valley VCs no longer invest in hardware. Early investors in Figure AI were either the CEO himself, or giants like NVIDIA, OpenAI, Microsoft, and Jeff Bezos—no serious financial VC firms.

It’s the same with capital, talent, and supply chains. No top engineers want to leave Apple or Meta; if they do, it’s usually with the hope of getting acquired back by Apple. The flow of people, information, money, and components—all signs show Silicon Valley’s hardware ecosystem has completely broken down.

LatePost: China clearly has advantages, but what about disadvantages—like fundraising and computing power?

Wang Qian: Fundraising in China is definitely harder than in the U.S. But for embodied intelligence, scaling isn’t primarily limited by compute—it’s data. And data costs in China are an order of magnitude lower than in the U.S.

So overall, while funding may be one order of magnitude lower, costs are also one order of magnitude lower—roughly balancing out. Moreover, the funding disadvantage isn’t permanent, but the cost advantage persists.

LatePost: What about talent?

Wang Qian: In 2022, people still debated Silicon Valley’s talent edge. Now, no one asks that anymore—everyone knows the AI researchers in the U.S. and China are essentially the same group: former classmates from the same universities. Who’s really stronger?

LatePost: Since starting your company, has your assessment changed?

Wang Qian: The U.S. is moving faster than I expected.

Take Figure AI: its high valuation stems partly from its alignment with the narrative of reshoring manufacturing to the U.S. They’re spending an order of magnitude more on in-house hardware production—soon they’ll manufacture joints, motors, batteries, even motor-winding equipment themselves. About the only step left is tightening the screws.

Many claimed China builds hardware while the U.S. builds software, and both could coexist peacefully. That’s impossible. U.S. companies like Figure aren’t worse at hardware—they’re better. Whether they can mass-produce is another issue, but in pre-production hardware quality, I’d say they outperform 99% of Chinese firms.

LatePost: When you started building your team, who was the first person you reached out to?

Wang Qian: Our CTO, Wang Hao. We met in 2021—his boss at IDEA Research Institute was a co-author on my Attention paper. When I started the quant fund, I needed help with infrastructure (infra), which I hadn’t done before. He was recommended to me. He started working on large models very early—back in 2021, the two main open-source LLM groups in China were BAAI and IDEA.

By the way, many embodied intelligence startups today will struggle with infra-algorithm integration because they’ve never done it before—it’s a significant leap.

When I approached Wang Hao, he was frustrated working on AI application projects, which are notoriously hard to deploy. Even today, unless you code, deployment is nearly impossible. After I explained my vision, he agreed robotics was the perfect application domain. Looking back now, we were perhaps overly optimistic.

LatePost: Because it’s not that easy to deploy, right?

Wang Qian: Right—robots involve more than just models: hardware, systems, etc. But after our conversation, he came to Beijing, joined me, and never left.

You Can’t See Scaling Laws in Embodied Intelligence? Your Data Is Just Too Bad

LatePost: Zibianliang’s WALL-A model is described as an end-to-end embodied foundation model, alongside large language models. Given the deep divide in approaches within embodied intelligence, why are you so confident in end-to-end?

Wang Qian: When we founded the company in late 2023, no one believed in end-to-end. Investors kept telling me to build a layered or specialized model. But if there’s no paradigm shift—if I stick to specialized or layered models—why would it be my turn to succeed? Specialized models will absolutely fail. We must build foundation models first, then specialize on top.

LatePost: What’s wrong with layered models?

Wang Qian: Say you want to grasp an object. With a layered approach, you first reconstruct the object’s 3D shape, estimate its center of gravity, select a grasp point, generate a trajectory, and finally execute the grasp.

But 3D reconstruction can’t perfectly capture surface properties like burrs or dents—features extremely sensitive to physical contact. A tiny initial error gets rapidly amplified across layers—the deeper the pipeline, the faster errors cascade. People have followed this path for 80 years and achieved nothing.
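The cascade argument can be put in numbers. The figures below are purely hypothetical, chosen only to show the shape of the problem: if each stage of a pipeline amplifies the incoming error by some gain, a sub-millimeter perception error becomes centimeters at the gripper within a handful of stages.

```python
# Toy model of error cascade in a layered pipeline: each stage applies a
# sensitivity (gain) to the error it inherits from the stage before.
# All numbers are hypothetical and purely illustrative.
def cascaded_error(initial_error, gains):
    """Return the error after each stage of the pipeline."""
    err = initial_error
    history = [err]
    for g in gains:
        err *= g
        history.append(err)
    return history

# A 0.5 mm reconstruction error, five stages, each amplifying error 2x:
print(cascaded_error(0.5, [2.0] * 5))  # 0.5 mm -> 16 mm at the gripper
```

Multiplicative growth is the point: the deeper the pipeline, the faster the cascade, exactly as described above.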

LatePost: Can end-to-end avoid this?

Wang Qian: Yes—because you can backpropagate from the final grasping result to correct the initial action, increasing the success rate at certain grasp points. End-to-end doesn’t require perfect reconstruction.

Also, end-to-end isn’t new to the era of large models. Back in 2014–2015, Sergey Levine and our team were already using end-to-end methods. Around 2018, machines achieved true general grasping for the first time—using end-to-end deep reinforcement learning.
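A toy sketch of that correction loop (the function names and numbers are hypothetical, not Zibianliang's method): treat the final grasp outcome as a scalar objective and push its gradient back to the action, with no geometric reconstruction step in between.

```python
# End-to-end correction in miniature: a scalar "grasp quality" depends on
# the chosen grasp offset. We never reconstruct geometry — we follow the
# gradient of the final outcome back to the action itself.
# All quantities are hypothetical and purely illustrative.
def grasp_quality(offset, true_center=0.37):
    # Quality peaks when the grasp lands on the (unknown) true center.
    return -(offset - true_center) ** 2

def improve_grasp(offset, lr=0.1, steps=100):
    for _ in range(steps):
        # Numerical gradient of the outcome with respect to the action.
        eps = 1e-5
        grad = (grasp_quality(offset + eps)
                - grasp_quality(offset - eps)) / (2 * eps)
        offset += lr * grad  # gradient *ascent* on the success signal
    return offset

print(improve_grasp(0.0))  # converges near the true center, ~0.37
```

The action is corrected directly by the outcome, which is why an imperfect (or absent) intermediate reconstruction need not doom the grasp.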

LatePost: What’s currently the main bottleneck limiting model performance?

Wang Qian: Data quality is paramount. Some say they can’t observe scaling laws in embodied intelligence—I say that’s because their data is terrible, full of noise.

Previously, 80% of effort went into model algorithms; now, 80% goes into data. The rest is letting the model decide autonomously. This is a major methodological shift.

LatePost: So simulation data doesn’t work?

Wang Qian: You need high-quality real-world data—collected by performing actual tasks in real physical environments.

LatePost: What about virtual simulation platforms like NVIDIA’s Omniverse?

Wang Qian: GR00T’s first version performed poorly because it relied solely on synthetic simulation data. Later versions shifted toward hybrid data.

I often tell investors: Do you really believe any simulation company can out-compute NVIDIA? NVIDIA sets the upper limit for all of them—and even NVIDIA has pivoted to real-world data.

Every PhD in our generation started with simulation. Now, not a single one of us still uses it—because it simply doesn’t work.

LatePost: But many in the embodied intelligence field still use simulation data.

Wang Qian: I come from a legitimate, orthodox robotics background. Others may come from computer vision or graphics—they might think simulation works. But we’ve personally stepped into every single pitfall.

LatePost: So compute power isn’t a core bottleneck?

Wang Qian: Not currently. Under comparable capabilities, multimodal models are one to two orders of magnitude smaller than language models. Language models need to memorize vast knowledge; physical world models don’t—they only need to understand physical laws.

This was another factor in choosing to return to China: embodied intelligence currently faces no compute bottlenecks.

LatePost: In theory, embodied foundation models, like multimodal models, are extremely hard to converge.

Wang Qian: Multimodal models are hard to train because their data is inherently impoverished: it lacks temporal continuity and causal structure. For example, when a person sees a cat for the first time, they can walk around it, gaining continuous spatial perception. Knowing their own position enables 3D understanding. They can interact with it—pet it, play with it—adding rich contextual signals. That’s why humans don’t need ten thousand cat images to recognize a cat.

Introduce action continuity, and embodied models become easier to train than pure multimodal models. Ten years from now, we’ll realize the best multimodal models are embodied models. I tell many multimodal researchers: if you truly want to excel, you should enter embodied intelligence.

LatePost: Does Zibianliang have any technical secret sauce?

Wang Qian: What we can share—we’ve already published openly. The rest remains confidential.

LatePost: Critics argue that expecting one model to handle vastly different tasks—like walking and solving a Rubik’s cube—is unrealistic. How do you respond?

Wang Qian: Actually, it doesn’t need to be one single functional model. “End-to-end” refers to internal architecture, not functional partitioning. The human brain is end-to-end, yet different regions handle different functions.

But practically, combining navigation and manipulation in one model does yield better performance.

LatePost: Is the model showing more generalization?

Wang Qian: Everything improves slightly. Most notably, CoT (Chain-of-Thought). What people call “embodied CoT” today still means doing language-based CoT first, then attaching a control model—that’s still layered.

We were the first globally to develop native CoT—starting in late 2024, achieving it around the same time as Gemini Robotics in 2025. Ideally, it enables arbitrarily long planning and strategy.

LatePost: Can you give an example?

Wang Qian: For instance, given a blueprint and a set of blocks nearby, it can assemble the structure according to the plan. First, it understands the blueprint. Second, it evaluates the gap between each step and the final goal. Third, it physically constructs it.

LatePost: Can your model already do this?

Wang Qian: Yes.

LatePost: What still isn’t good enough?

Wang Qian: Overall, everything still needs improvement. The core reason is insufficient data volume. Of course, algorithms matter too—but data comes first.

LatePost: What do you think of Fei-Fei Li’s world model concept?

Wang Qian: Fei-Fei Li’s idea of spatial intelligence leans toward 3D generation. But as I said earlier, knowing all 3D shapes doesn’t mean you can perform all tasks. A perfect spatial-intelligence model accounts for only 40–50% of a complete embodied intelligence system—the rest all involves direct physical interaction.

Hardware Must Be Defined by AI

LatePost: You’ve already launched two generations of wheeled robots. Rumor has it that the project didn’t start until late 2024—why so late?

Wang Qian: We’ve always believed that AI is primary, and hardware secondary. In the early days, our conditions for building hardware weren’t mature, and we were just a small team. Later, we realized that once we started building our own hardware, many AI problems actually became easier to solve.

In this sense, we might indeed be a bit late—we only began hiring hardware engineers at scale in January 2025.

LatePost: You come from an embodied AI background—did you not initially see hardware as important?

Wang Qian: A company’s resources are limited, especially in the early stages when funding is tight. We thought we should rely more on suppliers.

LatePost: You said building hardware made AI problems easier—can you give an example?

Wang Qian: Take robotic arms: even though they may look similar, there’s a huge difference between those designed with AI-native principles versus traditional designs. I know exactly how a robotic arm should be used during data collection and inference phases. Only with hardware naturally suited for AI can you conduct meaningful research.

Now there are two prevailing views. One believes you should first build a perfect piece of hardware, then develop AI on top of it—this is completely wrong. The other view is mine: AI must define the hardware.

Here’s another example: dexterous hands. Human fingers are driven largely by muscles in the forearm, which leaves the palm soft and highly adaptable. But many dexterous hands place motors inside the palm, making it thick and rigid despite mimicking the human hand’s appearance. The palm then loses its function—it can’t wrap around anything, and during grasping, force is applied only at the base of the fingers.

This is a classic case—only companies that have never collected real-world data or trained models would produce such absurd hardware designs.

LatePost: Does AutoVariable’s dexterous hand capability also depend on iterations of its embodied physics model?

Wang Qian: The physical laws, motion patterns, and understanding of object properties learned by the foundational model do not change based on whether you’re using grippers or dexterous hands. If you have a strong gripper-based model, training on dexterous hands becomes dramatically faster and less resource-intensive.

Of course, fine-tuning and post-training are still needed—but it’s analogous to large language models: the better your model is trained in English, the easier it is to transfer to Chinese.

LatePost: Elon Musk said dexterous hands are harder to develop than Tesla cars, second only to SpaceX’s reusable rockets.

Wang Qian: Hardware is indeed hard—but I see hardware and model capability as two parallel tracks. We’re building dexterous hands, but mainly to aid model training.

Frankly, most scenarios don’t require a hand with full human-level degrees of freedom. Cost is one factor, but more importantly, it’s often unnecessary. Humans can perform very complex tasks using simple grippers—and grippers are sufficient for at least half of real-world applications.

LatePost: But people feel that creating a fully human-like dexterous hand would be a massive breakthrough.

Wang Qian: I’m not so sure. People used to think robots running or dancing was groundbreaking—but was it really? Mostly emotional value. High-DoF dexterous hands are useful in certain tasks, but much of the time, they’re just providing emotional satisfaction—looking human-like, complex, impressive. That’s it.

LatePost: How far along is AutoVariable’s dexterous hand development?

Wang Qian: We’ve built a 20-degree-of-freedom hand with decent performance—but this isn’t our main focus. It’s primarily a tool for model training.

LatePost: Your robot uses wheels instead of bipedal legs—why?

Wang Qian: Legs have two fundamental issues: safety and cost. They’re inherently more prone to falling than wheeled systems. And they’re significantly more expensive—requiring an order of magnitude more motors and joints.

LatePost: But do legs offer no advantages?

Wang Qian: Their practical benefits are minimal. Sure, there’s emotional appeal—but aside from that, how many indoor scenarios truly require legs? The disadvantages far outweigh the benefits.

LatePost: So AutoVariable won’t pursue bipedal robots?

Wang Qian: We might—but only where it makes sense. Running a company often means knowing where not to go. This is one place we’ve chosen not to go.

We Want to Be an OpenAI-Like Company

LatePost: Some investors say your technical vision hasn’t changed since day one, and you’ve stayed focused without rushing into commercialization. Didn’t that make early fundraising difficult?

Wang Qian: At the time, investor logic was simple: you’re neither ByteDance nor Google—why should you build large models? Even if embodied AI requires large models, why you and not someone else? Many companies had already raised over 1 billion RMB—we were still at the seed round.

LatePost: How did you respond?

Wang Qian: Honestly, I couldn’t. I think this reflects a problem in China’s capital market: people don’t believe technology is primary. Subconsciously, they assume anyone can do tech—it lacks uniqueness.

Because historically, successful Chinese companies have all been fast followers. There’s never been a case where a company achieved global leadership from zero to one.

LatePost: You believe China could actually lead globally in embodied AI from zero to one?

Wang Qian: Someone asked if I wanted to be the “DeepSeek of embodied AI.” I said DeepSeek is a great company, but we want to be like OpenAI.

LatePost: So only investors who buy into that vision would fund you?

Wang Qian: Exactly. Our investors fundamentally believe in our ambition to become world-class. If you’re only interested in quick returns, you wouldn’t invest in us. Some of our shareholders told me: “Just focus on building the foundational model. If you need money, come to us.”

LatePost: Any specific examples?

Wang Qian: I won’t name names—but look at the two best domestic large-model companies: Alibaba and ByteDance. Both invested in us. We’re also the only embodied AI company ByteDance has ever backed.

LatePost: I heard investors gave your robot a surprise test—rolling toilet paper—and you performed well.

Wang Qian: Not surprise—three days’ notice. They said, “You claim few-shot learning ability? Here’s a task you’ve never seen—deliver in three days.”

The task: organize toilet paper rolls. Remove dirty or wrinkled parts, apply plastic seals, then repackage—essentially replicating hotel bathroom cleaning procedures.

LatePost: And you succeeded.

Wang Qian: Performance was quite good.

We spent one day collecting data, one day training. On the third day, investors arrived with stacks of various toilet papers—so effectively, we had two days of preparation.

LatePost: With improved model capabilities, fundraising should now be easier than in the early days.

Wang Qian: Yes, things are slightly better now. People realize China’s talent pool and density are no worse than America’s. And with successes like DeepSeek and Unitree, everyone sees China can achieve world-class results. No problem is insurmountable. Resources, compute—none of these are fundamental barriers anymore.

LatePost: So no one questions why you, rather than Google or Agibot, should lead?

Wang Qian: That question rarely comes up now.

LatePost: You seem to have never held conventional biases.

Wang Qian: Maybe because I understand both sides—I never assumed something possible in the U.S. must be impossible in China.

Team Rating: 8/10

LatePost: You previously lacked experience managing large teams. How do you prioritize your time?

Wang Qian: I spend significant time on hiring and fundraising. Technically, I weigh in on major decisions. For key products, I may personally oversee progress.

But I don’t micromanage—any CEO needing to do that likely has deeper organizational issues. I’m not control-oriented, nor do I want my team coming to me for every decision.

LatePost: Compared to other robotics firms, AutoVariable lacks glamour. Is recruiting hard?

Wang Qian: Different company cultures attract different people. We draw idealists who care deeply about technological fundamentals—that’s clear.

LatePost: Any trends—do candidates from certain companies or industries stand out?

Wang Qian: Fresh graduates. This field doesn’t value experience—almost no one has done it before. Everyone is part of the first generation. Recently, we’re seeing people from big tech or startups who’ve actually trained models—some from large language models, some from autonomous driving. We prefer those with LLM backgrounds.

LatePost: Why can’t autonomous driving companies succeed in embodied AI?

Wang Qian: First, their understanding of large models generally lags.

Second, autonomous driving and robotics aren’t as aligned as people assume. Driving involves no physical contact; robotics does. Core technologies differ.

Third, autonomous driving demands extreme safety standards, leading to divergent mindsets. Though these last two are secondary—the main issue is the first.

LatePost: Can other large model companies do what you do?

Wang Qian: This isn’t purely a large model problem. It involves hardware, systems, real-world randomness, experimentation, and organizational challenges—fundamentally mismatched with typical large model team DNA.

Large model teams are like air forces: a brilliant pilot plus an aircraft—mission success depends on individual skill. Their core is essentially a loosely connected lab of top-tier individuals.

Hardware teams are like navies: you’re on a ship. Every role must work in sync—from front-end hardware and data interaction, through data processing, to model training. The chain is long; if one link fails, the whole ship sinks.

LatePost: How do you overcome this cultural clash?

Wang Qian: By finding the right people. Also, technically speaking, action as a modality differs from language or vision—you need entirely new methods to leverage action data. This creates a high technical barrier, requiring a native embodied intelligence team.

LatePost: How well are your algorithm and hardware teams integrated today?

Wang Qian: Almost no silos exist. Teams collaborate well as a unified whole.

LatePost: Rate it on a scale of 10?

Wang Qian: 8 out of 10.

First Place, No Bubble, Market Consolidation

LatePost: Omdia recently reported global humanoid robot shipments reached 13,000 units. Top players include Agibot, Unitree, Ubtech, etc. What do you think of this report? What progress will commercialization see by 2026?

Wang Qian: That report has limited value—mostly emotional impact. Does it really matter if there are 1,000 more or fewer dancing robots? Robots still can’t do useful work.

Commercialization has been like “the boy who cried wolf.” For the past two years, everyone called it the “year of commercialization,” but now that it might actually arrive, people don’t believe anymore—expectations were oversold too early.

LatePost: Do you believe 2026 will be the true year of commercialization?

Wang Qian: Commercialization can begin—not instantly mature, but at least feasible.

LatePost: How did you arrive at this judgment?

Wang Qian: Mainly because the technology has crossed a threshold. Reinforcement learning now works, and few-shot learning enables rapid deployment on specific products.

Reinforcement learning simply doesn’t work until the foundation model reaches a certain quality. These milestones are significant. Until now, embodied AI couldn’t do much beyond dancing.

LatePost: What’s AutoVariable’s commercial strategy for 2026?

Wang Qian: Achieve positive ROI in certain scenarios—that would be a major milestone. No company has done this yet, except for gimmicks like dancing.

LatePost: Which scenarios? You mentioned public services, elder care earlier.

Wang Qian: Household chores—cleaning, organizing. And industrial verticals like screw-driving—tasks previously only humans could do.

We’ll see real commercial deployment this year, with positive ROI. I’m quite confident.

LatePost: Who else can achieve positive ROI besides you?

Wang Qian: Mostly overseas—like 1X, which has already sold hundreds of units. Figure is making progress in industrial settings, close to deployment. These companies are strong.

LatePost: Domestically?

Wang Qian: I think domestic players mostly focus on dancing—clearly behind the global leaders.

LatePost: Meaning they’re also behind AutoVariable?

Wang Qian: Well, of course we believe we’re doing better.

LatePost: How do you view competition with domestic peers?

Wang Qian: First, we need to clarify who counts as a peer. Within embodied AI, some companies focus on locomotion, a domain that doesn't necessarily require AI at all; it's pure control theory. From Boston Dynamics onward, those systems have used zero lines of AI code.

These are manufacturing-driven: improve product quality, reduce costs. Nothing wrong with that—but it has nothing to do with AI.

So we’re on one end (AI-first), Unitree on the other. Eventually, both sides will converge—but I believe it’s easier for us to master hardware than for them to master AI.

Then there’s another type: resource integrators, who function more like real estate developers.

LatePost: What’s the competitive landscape across these categories?

Wang Qian: The hype around dancing robots is fading fast. Only the top few will survive—market consolidation is underway.

Our side is showing similar trends. By 2026, you’ll need to demonstrate real progress—either commercially or technically. In 2025, we saw many new entrants. Recently, however, almost no new players have entered either the model or full-system robotics space—elimination has begun.

Overall, the industry will improve as robots actually deploy. As the market grows, people will see this isn’t just hype. If you fail to deliver practical value for years, you risk a crash—like what happened in autonomous driving. But I don’t think robotics will face such a trough, because real deployment is happening.

LatePost: Many say embodied AI is overheated, with a bubble.

Wang Qian: There’s absolutely no bubble. Compared to autonomous driving or any previous mega-sector, embodied AI is tiny in terms of investment, valuation, and funding—even smaller than in the U.S., by an order of magnitude.

LatePost: Doesn’t America’s fundraising advantage make you wish you’d stayed in the U.S.?

Wang Qian: Long-term, China has bigger advantages. Across industries, scaling from 1 to 10, or 10 to 100, China consistently outperforms the U.S. So if we can match or exceed U.S. performance in the 0-to-1 phase, we’ll definitely have the upper hand long-term.

LatePost: Do you consider AutoVariable technically superior to its peers?

Wang Qian: I firmly believe we’re the best technically—and within the industry, this reputation exists.

Very few truly understand how to build large models correctly—especially in embodied AI, where it’s nearly nonexistent. Among all embodied AI companies globally, we’re the only one built around a core large-model team. In terms of technical strength among startups, we’re clearly number one.

Confident, Yet Anxious

LatePost: The strongest impression from this interview is your confidence.

Wang Qian: My predictions over the past two years have mostly been accurate—for instance, we deliberately delayed commercialization, which now looks like the right call.

LatePost: It’s not just recent years—I sense this mindset dates back to your student days.

Wang Qian: That’s vision, right? I’d say my vision is pretty solid.

LatePost: Your way of thinking seems fundamentally different from most people.

Wang Qian: I believe if you do something, aim to be first. Otherwise, it’s just not interesting. If I just wanted money, I’d have stayed in quant trading—no need to endure this hardship.

LatePost: So you do see this as hardship.

Wang Qian: Definitely tough.

LatePost: That didn’t come across in our conversation.

Wang Qian: Of course not—I wouldn’t let you feel that.

LatePost: Are you reluctant to show that side?

Wang Qian: People prefer seeing someone strong, flawless.

LatePost: But that feels fake.

Wang Qian: Then the act wasn't good enough. Overall, you still need to appear flawless; otherwise, people won't trust you.

LatePost: When you do get time off, what do you usually do?

Wang Qian: Sleep. I’m very introverted—sleep first, wake up, maybe read a book.

LatePost: Do you have poor sleep quality?

Wang Qian: When anxious, yes.

LatePost: What’s the last book you read?

Wang Qian: Scientific American.

LatePost: Okay… I heard you also enjoy browsing Bilibili.

Wang Qian: Even when I'm not resting, I'm scrolling.

LatePost: Any preferred content?

Wang Qian: No—just random scrolling.

(As he speaks, Wang Qian reads aloud the current titles on his Bilibili homepage: “Inside Google DeepMind Lab”; “High School New Year’s Dance Performance”; “The Most Raw-Cooked Meat in the World”; “Today’s Source of Joy”; “Floating Wind Power System Completes Grid Connection Test”… )