⚠ Adversarial ML · VLA Security

Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

Jiawei Chen*¹,², Simin Huang*¹, Jiawei Du³, Shuaihang Chen²,⁵, Yu Tian⁴, Mingjie Wei²,⁵, Chao Yu†⁴, Zhaoxia Yin†¹

* Equal contribution    † Corresponding authors

¹ East China Normal University    ² Zhongguancun Academy    ³ CFAR, A*STAR, Singapore    ⁴ Tsinghua University    ⁵ Harbin Institute of Technology

Abstract
Figure 1: Overview of the Tex3D attack pipeline.
Vision-language-action (VLA) models have shown strong performance in robotic manipulation, yet their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either less representative of real deployment or limited in physical realism. In contrast, adversarial 3D textures pose a more physically plausible and damaging threat, as they are naturally attached to manipulated objects and are easier to deploy in physical environments.

Bringing adversarial 3D textures to VLA systems is nevertheless nontrivial. A central obstacle is that standard 3D simulators do not provide a differentiable optimization path from the VLA objective function back to object appearance. To address this, we introduce Foreground-Background Decoupling (FBD), which enables differentiable texture optimization through dual-renderer alignment while preserving the original simulation environment. We further propose Trajectory-Aware Adversarial Optimization (TAAO), which prioritizes behaviorally critical frames and stabilizes optimization with a vertex-based parameterization.
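
One way to picture the decoupling is the toy sketch below. It is not the authors' implementation. It assumes the object (foreground) is drawn by a differentiable renderer, the rest of the scene (background) comes from the unmodified simulator, and a per-pixel object mask blends the two. The names `composite` and `grad_wrt_foreground` and the 4-pixel grayscale "images" are hypothetical.

```python
from typing import List


def composite(fg: List[float], bg: List[float], mask: List[float]) -> List[float]:
    """Blend a differentiably rendered foreground over a simulator background."""
    return [m * f + (1.0 - m) * b for f, b, m in zip(fg, bg, mask)]


def grad_wrt_foreground(grad_out: List[float], mask: List[float]) -> List[float]:
    """Chain rule for the blend: d(out)/d(fg) = mask, so background pixels get zero gradient."""
    return [m * g for g, m in zip(grad_out, mask)]


# Toy 4-pixel grayscale "images" (illustrative values, not from the paper).
fg = [0.9, 0.9, 0.9, 0.9]    # object pixels, assumed to come from a differentiable renderer
bg = [0.2, 0.3, 0.4, 0.5]    # scene pixels, assumed to come from the original simulator
mask = [0.0, 1.0, 1.0, 0.0]  # 1 where the object is visible, 0 elsewhere

frame = composite(fg, bg, mask)
grad_fg = grad_wrt_foreground([1.0, 1.0, 1.0, 1.0], mask)

print(frame)    # [0.2, 0.9, 0.9, 0.5]
print(grad_fg)  # [0.0, 1.0, 1.0, 0.0]
```

In this sketch the simulator is never differentiated: the loss gradient reaches only the masked object pixels, which is where a texture would be optimized.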

Built on these designs, we present Tex3D, the first framework for end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment. Experiments in both simulation and real-robot settings show that Tex3D significantly degrades VLA performance across multiple manipulation tasks, achieving task failure rates of up to 96.7%.
Figure 2: Detailed illustration of FBD and TAAO components.
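
The idea of prioritizing behaviorally critical frames can be sketched as a weighted sum of per-frame attack losses. The weighting rule here is an assumption made for illustration, not the paper's criterion: a frame counts as more critical when the clean policy's action changes sharply at that step, such as when the gripper closes. The names `frame_weights`, `weighted_loss`, and the `floor` parameter are hypothetical.

```python
from typing import List, Sequence


def frame_weights(actions: Sequence[Sequence[float]], floor: float = 0.25) -> List[float]:
    """Normalized per-frame weights from the L1 change in action (assumed criticality proxy).

    `floor` is added to every frame so that no frame is ignored entirely.
    """
    change = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(actions, actions[1:]):
        change.append(sum(abs(c - p) for p, c in zip(prev, cur)))
    raw = [c + floor for c in change]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_loss(per_frame_loss: Sequence[float], weights: Sequence[float]) -> float:
    """Trajectory-level objective: weighted sum of per-frame attack losses."""
    return sum(w * l for w, l in zip(weights, per_frame_loss))


# Toy trajectory of (dx, gripper) actions; the gripper closes at frame 2.
actions = [(0.1, 0.0), (0.1, 0.0), (0.1, 1.0), (0.1, 1.0)]
weights = frame_weights(actions)
loss = weighted_loss([0.5, 0.5, 2.0, 0.5], weights)

print(weights)  # [0.125, 0.125, 0.625, 0.125]
print(loss)     # 1.4375
```

With these toy numbers, the grasp frame receives five times the weight of any other frame, so the optimization signal concentrates on the moment where a wrong action is most likely to derail the task.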

Key Contributions

Two novel components power Tex3D

🎨
Foreground-Background Decoupling (FBD)
Enables differentiable texture optimization through dual-renderer alignment while preserving the original simulation environment, bypassing the non-differentiable simulator bottleneck.
🎯
Trajectory-Aware Adversarial Optimization (TAAO)
Prioritizes behaviorally critical frames in the robot's trajectory and stabilizes optimization via vertex-based parameterization for robust cross-viewpoint attacks.
🤖
First End-to-End 3D Texture Attack on VLAs
Tex3D is the first framework enabling end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment.
📊
96.7% Failure Rate Achieved
Evaluated on the LIBERO benchmark across four task suites and four state-of-the-art VLA models, exposing critical robustness gaps in current systems.
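
The vertex-based parameterization mentioned under TAAO can be pictured with per-vertex colors that are interpolated across each triangle. The sketch below is a generic illustration of that idea under stated assumptions, not the paper's code: colors are RGB values in [0, 1], a surface point is described by barycentric coordinates, and the update is a plain gradient-ascent step followed by clamping. The names `surface_color` and `ascent_step` are hypothetical.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
Bary = Tuple[float, float, float]


def surface_color(vertex_colors: List[Color], bary: Bary) -> Color:
    """Color at a surface point: barycentric blend of the triangle's three vertex colors."""
    r, g, b = (sum(w * c[ch] for w, c in zip(bary, vertex_colors)) for ch in range(3))
    return (r, g, b)


def ascent_step(vertex_colors: List[Color], bary: Bary,
                pixel_grad: Color, lr: float) -> List[Color]:
    """One gradient-ascent step on vertex colors, clamped to the valid range [0, 1].

    Because the blend is linear, d(pixel)/d(vertex_i) is simply the barycentric weight w_i.
    """
    updated = []
    for w, color in zip(bary, vertex_colors):
        stepped = tuple(min(1.0, max(0.0, x + lr * w * g)) for x, g in zip(color, pixel_grad))
        updated.append(stepped)
    return updated


# One triangle with red, green, and blue vertices (illustrative values).
verts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
bary = (0.5, 0.25, 0.25)

color = surface_color(verts, bary)
updated = ascent_step(verts, bary, pixel_grad=(1.0, 0.0, 0.0), lr=0.5)
new_color = surface_color(updated, bary)

print(color)      # (0.5, 0.25, 0.25)
print(new_color)  # (0.5625, 0.25, 0.25)
```

A few vertex colors control every pixel on the adjacent faces, so updates vary smoothly over the surface instead of pixel by pixel, which is one plausible reason such a parameterization could stabilize optimization.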

Experimental Results

Task Failure Rates (%) on LIBERO Benchmark

Evaluated under untargeted and targeted attack settings across four task suites. A higher failure rate indicates a more effective attack. Tex3D achieves the highest average failure rate for every model in both settings.

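Untargeted Attack (failure rate %, ↑)

| Model | Task | No Attack | Gaussian | Single-frame | Vertex Param. | Tex+Temp. | Tex3D |
|---|---|---|---|---|---|---|---|
| OpenVLA | Spatial | 15.6 | 24.6 | 75.1 | 80.5 | 90.3 | 95.8 |
| OpenVLA | Object | 11.8 | 62.1 | 69.8 | 74.6 | 81.1 | 83.2 |
| OpenVLA | Goal | 23.6 | 28.8 | 65.0 | 70.7 | 79.1 | 84.8 |
| OpenVLA | Long | 40.4 | 45.2 | 73.2 | 80.9 | 83.9 | 90.3 |
| OpenVLA | Avg. | 24.1 | 31.1 | 69.0 | 75.5 | 82.9 | 88.1 |
| OpenVLA-OFT | Spatial | 3.8 | 8.8 | 69.6 | 74.8 | 79.1 | 78.6 |
| OpenVLA-OFT | Object | 1.7 | 3.6 | 57.4 | 64.9 | 67.7 | 70.6 |
| OpenVLA-OFT | Goal | 3.8 | 5.4 | 61.3 | 68.1 | 76.8 | 76.8 |
| OpenVLA-OFT | Long | 9.3 | 10.9 | 65.5 | 68.0 | 73.6 | 78.2 |
| OpenVLA-OFT | Avg. | 4.7 | 6.5 | 63.5 | 68.3 | 73.6 | 76.0 |
| π₀ | Spatial | 3.5 | 11.8 | 54.7 | 59.4 | 65.2 | 74.9 |
| π₀ | Object | 2.3 | 8.6 | 48.3 | 53.1 | 59.3 | 68.9 |
| π₀ | Goal | 5.2 | 9.6 | 53.0 | 58.3 | 63.1 | 70.3 |
| π₀ | Long | 7.2 | 12.5 | 54.2 | 57.0 | 64.5 | 72.9 |
| π₀ | Avg. | 4.6 | 10.7 | 52.6 | 57.0 | 63.0 | 71.8 |
| π₀.₅ | Spatial | 1.7 | 7.2 | 47.1 | 53.1 | 60.1 | 71.8 |
| π₀.₅ | Object | 1.8 | 6.9 | 39.7 | 46.0 | 52.4 | 65.2 |
| π₀.₅ | Goal | 2.0 | 5.5 | 44.1 | 42.7 | 55.2 | 69.7 |
| π₀.₅ | Long | 6.0 | 9.3 | 50.0 | 55.2 | 63.0 | 70.6 |
| π₀.₅ | Avg. | 2.8 | 7.4 | 45.4 | 51.2 | 58.3 | 69.3 |

Targeted Attack (failure rate %, ↑)

| Model | Task | No Attack | Gaussian | Single-frame | Vertex Param. | Tex+Temp. | Tex3D |
|---|---|---|---|---|---|---|---|
| OpenVLA | Spatial | 15.6 | 24.6 | 82.4 | 86.5 | 93.2 | 96.7 |
| OpenVLA | Object | 11.8 | 18.8 | 70.9 | 74.8 | 81.6 | 85.5 |
| OpenVLA | Goal | 23.6 | 28.8 | 71.5 | 74.2 | 81.6 | 86.9 |
| OpenVLA | Long | 40.4 | 45.2 | 79.6 | 79.9 | 90.5 | 92.8 |
| OpenVLA | Avg. | 24.1 | 31.1 | 76.1 | 79.9 | 86.6 | 90.5 |
| OpenVLA-OFT | Spatial | 3.8 | 8.8 | 70.8 | 80.7 | 76.9 | 80.1 |
| OpenVLA-OFT | Object | 1.7 | 3.6 | 63.7 | 69.2 | 71.2 | 76.4 |
| OpenVLA-OFT | Goal | 3.8 | 5.4 | 64.6 | 60.8 | 73.9 | 79.8 |
| OpenVLA-OFT | Long | 9.3 | 10.9 | 68.9 | 71.7 | 75.8 | 80.2 |
| OpenVLA-OFT | Avg. | 4.7 | 6.5 | 67.0 | 70.6 | 74.4 | 77.4 |
| π₀ | Spatial | 3.5 | 11.8 | 58.8 | 63.2 | 69.5 | 75.9 |
| π₀ | Object | 2.3 | 8.6 | 52.5 | 58.4 | 62.6 | 72.3 |
| π₀ | Goal | 5.2 | 9.6 | 56.2 | 60.5 | 68.6 | 73.4 |
| π₀ | Long | 7.2 | 12.5 | 57.1 | 57.0 | 68.4 | 73.3 |
| π₀ | Avg. | 4.6 | 10.7 | 56.2 | 60.6 | 67.3 | 73.3 |
| π₀.₅ | Spatial | 1.2 | 7.8 | 49.3 | 63.2 | 63.2 | 72.9 |
| π₀.₅ | Object | 1.8 | 6.9 | 41.1 | 48.2 | 54.3 | 68.3 |
| π₀.₅ | Goal | 2.0 | 5.5 | 47.5 | 45.5 | 60.8 | 71.3 |
| π₀.₅ | Long | 6.0 | 9.3 | 51.8 | 56.8 | 65.1 | 72.1 |
| π₀.₅ | Avg. | 2.8 | 7.4 | 47.4 | 53.4 | 61.0 | 71.2 |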

Table 1: Task failure rates (%) on the four LIBERO task suites under untargeted and targeted settings. "No Attack": clean evaluation. "Gaussian": random Gaussian noise. "Single-frame": one-frame perturbation. "Vertex Param.": parameterized texture attack without temporal consistency. "Tex+Temp.": texture-only temporal variant. "Tex3D": our full method. ↑ indicates that higher failure rates reflect stronger attack effectiveness.
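
To make the metric concrete, the short sketch below shows how a failure rate can be computed from episode outcomes and how an attack's effect can be read as the increase over the clean baseline. The helper names `failure_rate` and `attack_gain` are hypothetical, and the episode counts in the first call are made up for illustration; only the 15.6 and 96.7 values are taken from Table 1 (OpenVLA, Spatial, targeted setting).

```python
def failure_rate(successes: int, episodes: int) -> float:
    """Percentage of evaluation episodes in which the task was not completed."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    return round(100.0 * (1.0 - successes / episodes), 1)


def attack_gain(attacked: float, clean: float) -> float:
    """Increase in failure rate (percentage points) attributable to the attack."""
    return round(attacked - clean, 1)


# Hypothetical episode counts, for illustration only.
example_rate = failure_rate(successes=3, episodes=50)

# OpenVLA, LIBERO-Spatial, targeted setting: "No Attack" vs. "Tex3D" from Table 1.
gain = attack_gain(attacked=96.7, clean=15.6)

print(example_rate)  # 94.0
print(gain)          # 81.1
```

Reading the table through this difference separates failures caused by the attack from failures the model already makes on clean inputs.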


Simulation Setup & Demo

Attack demonstrations across LIBERO tasks

We evaluate on four LIBERO task suites: Spatial, Object, Goal, and LIBERO-10 (Long). Each panel below shows side-by-side clean vs. attacked robot behavior.

LIBERO-Spatial

Video panels (clean vs. Tex3D): Black bowl from table center; Black bowl on the ramekin; Black bowl on the stove; Black bowl on the wooden cabinet.

LIBERO-Object

Video panels (clean vs. Tex3D): Pick up the chocolate pudding; Pick up the milk; Pick up the tomato sauce; Pick up the cream cheese.

LIBERO-Goal

Video panels (clean vs. Tex3D): Push the plate; Pick up the bowl; Pick up the cream cheese; Pick up the bowl (variant).

LIBERO-Long (LIBERO-10)

Video panels (clean vs. Tex3D): Alphabet soup → basket; Salad dressing → basket; BBQ sauce → basket; Orange juice → basket.