BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization

Xueyang Zhou1, Guiyao Tie1, Guowen Zhang1, Hechang Wang1, Pan Zhou1, Lichao Sun2
1Huazhong University of Science and Technology 2Lehigh University
Overview of our Objective-Decoupled training framework for backdoor injection in VLA models. Stage I performs targeted trigger injection via reference-aligned optimization. Stage II fine-tunes the remaining modules using only clean data to preserve clean-task performance.

Abstract

Vision-Language-Action (VLA) models have advanced robotic control by enabling end-to-end decision-making directly from multimodal inputs. However, their tightly coupled architectures expose novel security vulnerabilities. Unlike traditional adversarial perturbations, backdoor attacks represent a stealthier, persistent, and practically significant threat, particularly under the emerging Training-as-a-Service paradigm, but remain largely unexplored in the context of VLA models. To address this gap, we propose BadVLA, a backdoor attack method based on Objective-Decoupled Optimization, which for the first time exposes the backdoor vulnerabilities of VLA models. Specifically, it consists of a two-stage process: (1) explicit feature-space separation to isolate trigger representations from benign inputs, and (2) conditional control deviations that activate only in the presence of the trigger, while preserving clean-task performance. Empirical results on multiple VLA benchmarks demonstrate that BadVLA consistently achieves near-100% attack success rates with minimal impact on clean-task accuracy. Further analyses confirm its robustness against common input perturbations, task transfers, and model fine-tuning, underscoring critical security vulnerabilities in current VLA deployments. Our work offers the first systematic investigation of backdoor vulnerabilities in VLA models, highlighting an urgent need for secure and trustworthy embodied model design practices.
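The two-stage process above can be summarized in code. Below is a minimal PyTorch-style sketch, not the authors' released implementation: all module and function names (vision_encoder, ref_encoder, policy_head, apply_trigger) are illustrative placeholders, and the specific losses are one plausible instantiation of reference-aligned trigger injection (Stage I) followed by clean-only fine-tuning (Stage II).

import torch
import torch.nn.functional as F

def stage1_trigger_injection(vision_encoder, ref_encoder, clean_images,
                             apply_trigger, optimizer, margin=1.0):
    """Stage I (sketch): push triggered features away from a frozen
    reference encoder's clean features while keeping clean features
    aligned with the reference."""
    triggered = apply_trigger(clean_images)      # stamp the visual trigger
    with torch.no_grad():
        ref_clean = ref_encoder(clean_images)    # frozen reference anchor
    feat_clean = vision_encoder(clean_images)
    feat_trig = vision_encoder(triggered)
    # Reference alignment: clean features stay close to the reference.
    align = F.mse_loss(feat_clean, ref_clean)
    # Separation: triggered features are driven away from the reference.
    sep = F.cosine_similarity(feat_trig, ref_clean, dim=-1).mean()
    loss = align + torch.clamp(sep + margin, min=0.0)  # hinge on similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def stage2_clean_finetune(policy_head, vision_encoder, batch, optimizer):
    """Stage II (sketch): freeze the poisoned encoder and fine-tune the
    remaining modules on clean data only, preserving clean-task skill."""
    with torch.no_grad():
        feats = vision_encoder(batch["images"])  # encoder stays fixed
    pred_actions = policy_head(feats, batch["instructions"])
    loss = F.mse_loss(pred_actions, batch["actions"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Decoupling the two objectives this way is what lets the trigger response survive Stage II: the poisoned encoder is never updated on clean data, so the feature-space separation installed in Stage I remains intact.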

Main Results

Performance of BadVLA across different trigger types (Block, Mug, Stick) on OpenVLA under the LIBERO benchmarks. Clean-task success rate (SR w/o trigger) and success rate with the trigger present (SR w/ trigger) are reported alongside the computed Attack Success Rate (ASR). Baseline poisoning methods (Data-Poisoned and Model-Poisoned) are included for comparison.
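For readers reproducing the table, one plausible way to derive ASR from the two reported success rates is sketched below. This formula is an assumption, not confirmed by the page: the paper may instead count triggered rollouts that deviate directly. Here ASR is read as the fraction of episodes that succeed without the trigger but are derailed once the trigger appears.

def attack_success_rate(sr_without: float, sr_with: float) -> float:
    """sr_without: clean-task success rate (SR w/o trigger), in [0, 1].
    sr_with: success rate when the trigger is present, in [0, 1]."""
    if sr_without == 0.0:
        return 0.0  # no clean successes to derail
    return max(0.0, (sr_without - sr_with) / sr_without)

# Hypothetical example: SR w/o = 0.95 and SR w/ = 0.01
# give ASR ~= 0.989, i.e., the near-100% regime reported above.
print(attack_success_rate(0.95, 0.01))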

Cosine similarity between clean and triggered features before and after Stage I. Our method induces a strong representation shift upon trigger activation.
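This diagnostic is straightforward to reproduce. The sketch below assumes a PyTorch encoder returning a feature tensor; encoder and apply_trigger are hypothetical names standing in for the model's vision backbone and the trigger-stamping routine.

import torch
import torch.nn.functional as F

@torch.no_grad()
def feature_shift(encoder, clean_images, apply_trigger):
    """Mean cosine similarity between clean and triggered features.
    Values near 1 mean no representation shift; low values indicate a
    strong trigger-conditioned shift, as after Stage I."""
    f_clean = encoder(clean_images).flatten(1)                  # (B, D)
    f_trig = encoder(apply_trigger(clean_images)).flatten(1)    # (B, D)
    return F.cosine_similarity(f_clean, f_trig, dim=-1).mean().item()

Measured before Stage I the similarity should be high; after Stage I it should drop sharply, matching the separation the figure reports.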

Robot Manipulation Demos -- LIBERO Goal

Normal Trajectory
Trigger: None
Trigger Trajectory
Trigger: Pixel Block
Trigger Trajectory
Trigger: Red Stick
Trigger Trajectory
Trigger: Yellow Mug

Robot Manipulation Demos -- LIBERO Long

Normal Trajectory
Trigger: None
Trigger Trajectory
Trigger: Pixel Block
Trigger Trajectory
Trigger: Red Stick
Trigger Trajectory
Trigger: Yellow Mug

Robot Manipulation Demos -- LIBERO Object

Normal Trajectory
Trigger: None
Trigger Trajectory
Trigger: Pixel Block
Trigger Trajectory
Trigger: Red Stick
Trigger Trajectory
Trigger: Yellow Mug

Robot Manipulation Demos -- LIBERO Spatial

Normal Trajectory
Trigger: None
Trigger Trajectory
Trigger: Pixel Block
Trigger Trajectory
Trigger: Red Stick
Trigger Trajectory
Trigger: Yellow Mug

Robot Manipulation Demos -- Google Robot Move Near

Normal Trajectory
Trigger: None
Trigger Trajectory
Trigger: Pixel Block

Robot Manipulation Demos -- Google Robot Pick Coke Can

Normal Trajectory
Trigger: None
Trigger Trajectory
Trigger: Pixel Block

Robot Manipulation Demos -- Google Robot Pick Object

Normal Trajectory
Trigger: None
Trigger Trajectory
Trigger: Pixel Block

BibTeX

@misc{zhou2025badvlabackdoorattacksvisionlanguageaction,
  title={BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization},
  author={Xueyang Zhou and Guiyao Tie and Guowen Zhang and Hechang Wang and Pan Zhou and Lichao Sun},
  year={2025},
  eprint={2505.16640},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2505.16640},
}