Holistic Co-Speech Motion Generation via Cross-Gated Attention and Cross-Limb Interaction

Zixiang Lu (a, *, 1), Zhixiang Sheng (a, 1), Zhitong He (a), Ping Gao (b), Yunan Li (a), Qiguang Miao (a, *)
(a) School of Computer Science and Technology, Xidian University, No. 266 Xinglong Section of Xifeng Road, Xi'an, 710126, Shaanxi, China
(b) School of Statistics and Data Science, Xi'an University of Finance and Economics, No. 360 Changning Street, Xi'an, 710100, Shaanxi, China
(1) Equal contribution. (*) Corresponding authors.

Video Demonstration

Video Comparison

Side-by-side videos for Examples 1, 2, and 3, comparing Ours, Echo, MambaTalk, and GT (ground truth).

Ablation Study

Each pair shows the result without the module (left) and with it (right), for the ECCGA, CLIM, and MB modules.

Emotion Control

Videos generated from the same speech clip under different emotion conditions: Neutral, Disgust, and Surprise. Disgust is the original emotion label of this clip.

Jitter Comparison

Panels: limb jitter caused by Mamba; Ours.