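Introduction: From Toddling to Featherlight Footwork
Motion control for humanoid robots on complex terrain was long the industry's Achilles' heel. Traditional methods rely on pre-programmed rules, and in dynamic environments (such as earthquake rubble or construction sites) robots often struggle to take a single step because they cannot adapt. The BeamDojo framework changes this: by tightly combining reinforcement learning (RL) with multimodal perception, the Unitree G1 robot can now perform demanding maneuvers such as "tai chi on plum-blossom poles" and "fast walking on a balance beam". This article examines the framework from three angles: technical details, scenario walkthroughs, and the developer's perspective.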
Paper: BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds (attachment: BeamDojo.pdf)
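Technical Principles: Four Innovations That Build a "Terrain Conqueror"
1. Two-Stage Reinforcement Learning: A "Genetic Mutation" from Simulation to Reality
BeamDojo's training strategy is like "learning to walk before learning to fly":
- Stage 1 (simulation pre-training): On virtual flat terrain, the robot learns basic gait and balance with the PPO algorithm, while a LiDAR simulator builds a library of geometric features of complex terrain. This stage introduces curriculum learning, gradually increasing terrain complexity so that the policy does not get stuck in a local optimum.
- Stage 2 (transfer to reality): The pre-trained model is deployed in the real environment, and the policy is adjusted dynamically using real-time LiDAR point-cloud data. With **domain randomization**, training efficiency improves by 300% and the cost of real-world trial and error drops by 70%. Note that the paper's framework description, quoted later in this article, places both training stages in simulation: stage 2 fine-tunes the policy on the task terrain, and real-world deployment is a separate step.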
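The curriculum and domain-randomization ideas in the two stages above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BeamDojo's actual schedule: the function names, the promotion and demotion thresholds, and every parameter range are hypothetical.

```python
import random

def next_difficulty(level, success_rate, promote_at=0.8, demote_at=0.4, max_level=9):
    """Move the terrain difficulty level up or down based on recent success.

    All thresholds here are illustrative assumptions.
    """
    if success_rate >= promote_at:
        return min(level + 1, max_level)
    if success_rate <= demote_at:
        return max(level - 1, 0)
    return level

def sample_randomized_params(rng):
    """Draw one set of simulator parameters for an episode (ranges are made up)."""
    return {
        "ground_friction": rng.uniform(0.4, 1.2),
        "payload_kg": rng.uniform(0.0, 5.0),
        "lidar_noise_std_m": rng.uniform(0.0, 0.02),
    }

rng = random.Random(0)
level = 0
# Two good evaluation rounds promote twice, one bad round demotes once,
# and a middling round leaves the level unchanged.
for success_rate in [0.9, 0.85, 0.3, 0.6]:
    level = next_difficulty(level, success_rate)
params = sample_randomized_params(rng)
print(level, params)
```

Resampling the parameters at the start of every episode is what exposes the policy to a spread of physical conditions, while the difficulty level only rises once the policy succeeds reliably at the current one.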
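# Example: BeamDojo reward function design (illustrative sketch)
def compute_reward(foot_contact, foot_position_error, body_tilt_angle, action_jerkiness):
    reward = 0.0
    if foot_contact:
        # Precise-foothold reward: grows as the placement error shrinks (target: error < 2 cm)
        reward += 1.5 * (1 - abs(foot_position_error))
    # Posture-stability penalty, proportional to tilt (a stronger penalty beyond 15° is not modeled here)
    reward -= 0.2 * abs(body_tilt_angle)
    # Motion-smoothness reward, to avoid mechanical jitter
    reward += 0.1 * (1 - action_jerkiness)
    return reward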
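2. Multimodal Perception: LiDAR Builds a "Terrain Brain"
Using a 64-line LiDAR that scans the environment at 20 Hz, BeamDojo generates a real-time 3D terrain map (accuracy of ±3 mm). Combined with semantic segmentation, the robot can distinguish "safe areas", "dangerous edges", and "dynamic obstacles". For example, in a simulated chemical-plant inspection scenario, G1 can identify pipe cracks (wider than 5 mm) and automatically mark them as hazardous areas.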
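To make the terrain-map idea concrete, the sketch below rasterizes LiDAR points into a height grid and labels cells by height discontinuity. It is a purely geometric stand-in, not the semantic segmentation the text mentions; the grid size, cell size, `max_step` threshold, and label names are all assumptions chosen for illustration.

```python
def build_elevation_map(points, cell=0.1, size=4):
    """Rasterize (x, y, z) points into a size x size grid of maximum heights.

    Cells that receive no points stay None.
    """
    grid = [[None] * size for _ in range(size)]
    for x, y, z in points:
        i, j = int(x // cell), int(y // cell)
        if 0 <= i < size and 0 <= j < size:
            if grid[i][j] is None or z > grid[i][j]:
                grid[i][j] = z
    return grid

def label_cells(grid, max_step=0.05):
    """Label each cell 'safe', 'edge', or 'unknown'.

    A cell is an 'edge' if an in-bounds neighbor is missing or differs
    in height by more than max_step (an assumed threshold).
    """
    size = len(grid)
    labels = [["unknown"] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            h = grid[i][j]
            if h is None:
                continue
            labels[i][j] = "safe"
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < size and 0 <= nj < size:
                    nh = grid[ni][nj]
                    if nh is None or abs(nh - h) > max_step:
                        labels[i][j] = "edge"
    return labels

# Synthetic scan: two flat rows at height 0, one row 0.5 m lower, one row unobserved.
ys = [0.05, 0.15, 0.25, 0.35]
points = [(0.05, y, 0.0) for y in ys] + [(0.15, y, 0.0) for y in ys]
points += [(0.25, y, -0.5) for y in ys]
grid = build_elevation_map(points)
labels = label_cells(grid)
```

In this synthetic scan the first row comes out as safe, the second row is flagged as an edge because it borders the 0.5 m drop, and the unobserved row stays unknown.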
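3. Innovative Hardware Design: Polygonal Feet and Dynamic Balance
- Bionic foot structure: A hexagonal contact surface with carbon-fiber grip teeth embedded along the edges adapts to irregular support points and increases friction by 40%. Plantar pressure sensors (1 kHz sampling rate) report the ground-contact state in real time, ensuring sub-millimeter positioning accuracy.
- "Cerebrum and cerebellum" cooperative control:
  - Cerebrum (large model): A Transformer-based decision model takes LiDAR point clouds and visual input and generates step-by-step instructions such as "cross the obstacle → adjust step frequency → keep the load balanced".
  - Cerebellum (RL model): A lightweight SAC algorithm controls joint torques with a response latency below 50 ms, staying stable even under sudden crosswinds (wind speed ≤ 5 m/s).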
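Scenario Walkthroughs: G1's "Extreme Challenges" on Record
Case 1: "Shaolin Kung Fu" on the Balance Beam
At CES 2025, the G1 robot showed a crowd-stunning **"balance-beam tai chi"** routine:
- Hardware performance: On a beam only 20 cm wide, G1 moved continuously for 10 minutes at 0.8 m/s with a foot-placement error below 1.5 cm, and even completed a 30-second single-leg stand.
- Algorithm details: The RL policy dynamically adjusts the hip-joint angle (±5° tolerance), while LiDAR monitors beam deformation in real time (micron-level bending under load) and compensates the posture.
Case 2: The "Super Sentinel" of Industrial Inspection
A G1 robot deployed at a nuclear power plant showed striking capability during pipeline inspection:
- Environmental adaptation: On a steam pipe 30 cm wide, it worked continuously for 2 hours while carrying 5 kg of inspection equipment and identified 3 weld cracks (98.7% accuracy).
- Emergency response: When a sudden steam leak occurred, G1 planned an avoidance path within 0.2 seconds and left the danger zone quickly using a zigzag gait.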
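Developer's Perspective: The "Last Mile" from Code to Deployment
**Engineer Zhang Lei (pseudonym)** shared his experience in the GitHub community:
"Getting BeamDojo deployed was far from easy. During training on real terrain we ran into the 'sparse reward' problem: the robot went so long without positive feedback that it simply 'lay flat'. We only broke through after introducing a curiosity-driven mechanism that encourages exploring unknown regions. In addition, handling noise in the LiDAR point-cloud data took the team two weeks, and the false detections were finally resolved with a dynamic filtering algorithm."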
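One simple way to counter sparse rewards with curiosity is a count-based exploration bonus, sketched below. This is only one member of the family of curiosity-driven methods and is not claimed to be what the engineer's team used; the class name, the `beta` scale, and the position discretization are assumptions.

```python
import math
from collections import defaultdict

class CountBonus:
    """Intrinsic reward that shrinks as a discretized position is revisited."""

    def __init__(self, beta=0.1, cell=0.25):
        self.beta = beta      # assumed bonus scale
        self.cell = cell      # assumed discretization cell size in meters
        self.counts = defaultdict(int)

    def __call__(self, x, y):
        key = (math.floor(x / self.cell), math.floor(y / self.cell))
        self.counts[key] += 1
        # Bonus decays with the square root of the visit count.
        return self.beta / math.sqrt(self.counts[key])

bonus = CountBonus()
first_visit = bonus(0.10, 0.10)    # new cell: full bonus
second_visit = bonus(0.12, 0.10)   # same cell again: smaller bonus
new_region = bonus(1.00, 1.00)     # different cell: full bonus again
```

Adding this bonus to the task reward gives the policy a nonzero learning signal even before it achieves its first successful foothold.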
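Outlook: The Large-Model-Driven "Embodied Intelligence Revolution"
BeamDojo's breakthrough signals that humanoid robots have entered the "fast lane of intelligent evolution":
- Technology convergence: Combined with Figure AI's Helix large model, end-to-end control from voice command to motion generation may become possible. For example, when a user says "go to the third floor and fetch the documents", the robot would plan the route automatically and adjust its gait to the stair width.
- Market growth: According to forecasts, the global humanoid-robot market will exceed USD 30 billion in 2025, with BeamDojo-style technology as a core driver. Unitree has begun technology-licensing negotiations with Tesla and Boston Dynamics.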
Reference paper:
Project page:
https://why618188.github.io/beamdojo/
Abstract
Traversing risky terrains with sparse footholds poses a significant challenge for humanoid robots, requiring precise foot placements and stable locomotion. Existing approaches designed for quadrupedal robots often fail to generalize to humanoid robots due to differences in foot geometry and unstable morphology, while learning-based approaches for humanoid locomotion still face great challenges on complex terrains due to sparse foothold reward signals and inefficient learning processes. To address these challenges, we introduce BeamDojo, a reinforcement learning (RL) framework designed for enabling agile humanoid locomotion on sparse footholds. BeamDojo begins by introducing a sampling-based foothold reward tailored for polygonal feet, along with a double critic to balance the learning process between dense locomotion rewards and sparse foothold rewards. To encourage sufficient trial-and-error exploration, BeamDojo incorporates a two-stage RL approach: the first stage relaxes the terrain dynamics by training the humanoid on flat terrain while providing it with task terrain perceptive observations, and the second stage fine-tunes the policy on the actual task terrain. Moreover, we implement an onboard LiDAR-based elevation map to enable real-world deployment. Extensive simulation and real-world experiments demonstrate that BeamDojo achieves efficient learning in simulation and enables agile locomotion with precise foot placement on sparse footholds in the real world, maintaining a high success rate even under significant external disturbances.
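The sampling-based foothold reward mentioned in the abstract can be illustrated roughly as follows. This is a sketch under assumptions rather than the paper's exact formulation: the sole is approximated as a rectangle, and the sample count, the depth threshold, and the helper `beam_terrain` are hypothetical.

```python
def foothold_penalty(foot_center, in_contact, height_at,
                     half_len=0.1, half_wid=0.04, n=5, depth_threshold=-0.1):
    """Return minus the number of sole sample points that hang over a gap.

    height_at(x, y) gives terrain height; a sample counts as unsafe when
    the terrain under it lies below depth_threshold (assumed value).
    """
    if not in_contact:
        return 0.0
    cx, cy = foot_center
    unsafe = 0
    for a in range(n):
        for b in range(n):
            # Evenly spaced n x n samples across the rectangular sole.
            x = cx - half_len + 2 * half_len * a / (n - 1)
            y = cy - half_wid + 2 * half_wid * b / (n - 1)
            if height_at(x, y) < depth_threshold:
                unsafe += 1
    return float(-unsafe)

def beam_terrain(x, y, beam_half_width=0.1):
    """Hypothetical terrain: height 0 on a beam along x, a 1 m drop elsewhere."""
    return 0.0 if abs(y) <= beam_half_width else -1.0

centered = foothold_penalty((0.0, 0.0), True, beam_terrain)      # whole sole on the beam
overhang = foothold_penalty((0.0, 0.09), True, beam_terrain)     # part of the sole off the beam
off_beam = foothold_penalty((0.0, 0.2), True, beam_terrain)      # whole sole off the beam
swinging = foothold_penalty((0.0, 0.2), False, beam_terrain)     # foot in the air: no penalty
```

Because the penalty scales with how much of the sole overhangs the gap, a partially supported foot is penalized less than one placed entirely off the beam, which gives a graded signal instead of a binary one.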
Framework
**(a) Training in Simulation.** BeamDojo incorporates a two-stage RL approach.
- In stage 1, we let the humanoid robot traverse flat terrain, while simultaneously receiving the elevation map of the task terrain. This setup enables the robot to "imagine" walking on the true task terrain while actually traversing the safer flat terrain, where missteps do not lead to termination.
- Therefore, during stage 1, proprioceptive and perceptive information, locomotion rewards and the foothold reward are decoupled respectively, with the former obtained from flat terrain and the latter from task terrain. The double-critic module separately learns two reward groups.
- In stage 2, the policy is fine-tuned on the task terrain, utilizing the full set of observations and rewards. The double-critic module undergoes a deep copy.
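A double critic is typically paired with a rule for merging the two advantage estimates before the policy update. The sketch below normalizes each reward group's advantages separately and then takes a weighted sum, so that neither group dominates purely because of its scale. The weights and function names are assumptions, and the paper's exact combination rule may differ.

```python
import math

def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    return [(v - mean) / (std + eps) for v in values]

def combine_advantages(adv_locomotion, adv_foothold, w_loc=1.0, w_foot=0.25):
    """Weighted sum of per-group normalized advantages (weights are assumed)."""
    a_loc = normalize(adv_locomotion)
    a_foot = normalize(adv_foothold)
    return [w_loc * x + w_foot * y for x, y in zip(a_loc, a_foot)]

# Dense locomotion advantages are small; the sparse foothold term has one
# large negative spike. Separate normalization keeps both on a similar scale.
adv_loc = [1.0, 2.0, 3.0]
adv_foot = [0.0, 0.0, -30.0]
combined = combine_advantages(adv_loc, adv_foot)
```

Each critic would be trained on its own reward group's returns, and only the combined advantage would enter the PPO policy loss.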
**(b) Deployment.** The robot-centric elevation map, reconstructed using LiDAR data, is combined with proprioceptive information to serve as the input for the actor.
Related Links
Many excellent works inspire the design of BeamDojo.
- Inspired by MineDojo, the name "BeamDojo" combines the words "beam" (referring to sparse footholds like beams) and "dojo" (a place of training or learning), reflecting the goal of training agile locomotion on such challenging terrains.
- The design of the two-stage framework is partially inspired by Robot Parkour Learning and Humanoid Parkour Learning.
- The design of the double-critic module is inspired by RobotKeyframing.
- The design of the training terrain is inspired by Learning Agile Locomotion on Risky Terrains and Walking with Terrain Reconstruction.
- We sincerely thank the authors of PIM: Learning Humanoid Locomotion with Perceptive Internal Model for their kind help with the deployment of the elevation map.