⚠ Adversarial ML · VLA Security

Tex3D: Objects as Attack Surfaces via
Adversarial 3D Textures for
Vision-Language-Action Models

Jiawei Chen*¹,², Simin Huang*¹, Jiawei Du³, Shuaihang Chen²,⁵, Yu Tian⁴, Mingjie Wei²,⁵, Chao Yu†⁴, Zhaoxia Yin†¹

* Equal contribution · † Corresponding authors

¹ East China Normal University ² Zhongguancun Academy ³ CFAR, A*STAR, Singapore ⁴ Tsinghua University ⁵ Harbin Institute of Technology


Abstract

Figure 1: Overview of the Tex3D attack pipeline.

Vision-language-action (VLA) models have shown strong performance in robotic manipulation, yet their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either less representative of real deployment or limited in physical realism. In contrast, adversarial 3D textures pose a more physically plausible and damaging threat, as they are naturally attached to manipulated objects and are easier to deploy in physical environments.

Bringing adversarial 3D textures to VLA systems is nevertheless nontrivial. A central obstacle is that standard 3D simulators do not provide a differentiable optimization path from the VLA objective function back to object appearance. To address this, we introduce Foreground-Background Decoupling (FBD), which enables differentiable texture optimization through dual-renderer alignment while preserving the original simulation environment. We further propose Trajectory-Aware Adversarial Optimization (TAAO), which prioritizes behaviorally critical frames and stabilizes optimization with a vertex-based parameterization.

Built on these designs, we present Tex3D, the first framework for end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment. Experiments in both simulation and real-robot settings show that Tex3D significantly degrades VLA performance across multiple manipulation tasks, achieving task failure rates of up to 96.7%.

Figure 2: Detailed illustration of FBD and TAAO components.

Key Contributions

Two novel components power Tex3D

🎨

Foreground-Background Decoupling (FBD)

Enables differentiable texture optimization through dual-renderer alignment while preserving the original simulation environment — bypassing the non-differentiable simulator bottleneck.
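The decoupling idea can be illustrated with a small compositing sketch. This is not the authors' implementation: it assumes that the attacked object is rendered by a differentiable renderer as a foreground layer, that the simulator supplies the background frame, and that the two are blended with a visibility mask. The names `composite`, `grad_wrt_foreground`, and the toy four-pixel images are hypothetical.

```python
# Hypothetical sketch of foreground-background compositing (not the paper's code).
# Images are flat lists of grayscale pixel values in [0, 1].

def composite(foreground, background, mask):
    """Blend a differentiably rendered object layer over a fixed simulator frame.

    mask[i] is 1.0 where the object is visible and 0.0 elsewhere.
    """
    return [m * f + (1.0 - m) * b for f, b, m in zip(foreground, background, mask)]


def grad_wrt_foreground(upstream_grad, mask):
    """Chain rule: d(composite)/d(foreground) = mask, so only object pixels get gradient."""
    return [g * m for g, m in zip(upstream_grad, mask)]


background = [0.2, 0.2, 0.2, 0.2]  # frame from the non-differentiable simulator
foreground = [0.9, 0.8, 0.0, 0.0]  # object layer from a differentiable renderer
mask = [1.0, 1.0, 0.0, 0.0]        # object silhouette

frame = composite(foreground, background, mask)
grads = grad_wrt_foreground([0.5, -0.5, 0.3, 0.3], mask)
```

In this toy setup the background pixels pass through unchanged and receive no gradient, which is the property that would let the simulator stay non-differentiable while the texture is still optimized.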

🎯

Trajectory-Aware Adversarial Optimization (TAAO)

Prioritizes behaviorally critical frames in the robot's trajectory and stabilizes optimization via vertex-based parameterization for robust cross-viewpoint attacks.
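A minimal sketch of the two ideas, under stated assumptions: the page does not say how critical frames are scored or how vertex parameters are constrained, so the softmax weighting over per-frame criticality scores and the sigmoid mapping to colors below are illustrative choices, and `frame_weights`, `weighted_attack_loss`, and `vertex_colors` are hypothetical names.

```python
import math


def frame_weights(criticality, temperature=1.0):
    """Softmax over per-frame criticality scores (an assumed weighting scheme)."""
    peak = max(criticality)  # subtract the max for numerical stability
    exps = [math.exp((c - peak) / temperature) for c in criticality]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_attack_loss(frame_losses, criticality, temperature=1.0):
    """Trajectory loss in which behaviorally critical frames count more."""
    weights = frame_weights(criticality, temperature)
    return sum(w * loss for w, loss in zip(weights, frame_losses))


def vertex_colors(raw_params):
    """Map unconstrained per-vertex parameters to RGB values in (0, 1) via a sigmoid."""
    return [[1.0 / (1.0 + math.exp(-p)) for p in rgb] for rgb in raw_params]


# Toy trajectory: the middle frame (for example, the grasp) is scored as most critical.
criticality = [0.1, 2.0, 0.3]
frame_losses = [1.0, 3.0, 1.0]

weights = frame_weights(criticality)
loss = weighted_attack_loss(frame_losses, criticality)
colors = vertex_colors([[0.0, 0.0, 0.0]])
```

Optimizing one color per mesh vertex, rather than every texel of a texture map, keeps the number of free parameters small, which is one plausible reason such a parameterization would stabilize optimization.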

🤖

First End-to-End 3D Texture Attack on VLAs

Tex3D is the first framework enabling end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment.

📊

Up to 96.7% Task Failure Rate

Evaluated on the LIBERO benchmark across four task suites and four state-of-the-art VLA models (OpenVLA, OpenVLA-OFT, π₀, and π₀.₅), Tex3D reaches task failure rates of up to 96.7%, exposing critical robustness gaps in current systems.

Experimental Results

Task Failure Rates (%) on LIBERO Benchmark

Evaluated under untargeted and targeted attack settings across four task suites. A higher failure rate indicates a more effective attack. Tex3D achieves the highest average failure rate for every model in both settings.

		Untargeted Attack						Targeted Attack
Model	Task	No Attack	Gaussian	Single-frame	Vertex Param.	Tex+Temp.	Tex3D	No Attack	Gaussian	Single-frame	Vertex Param.	Tex+Temp.	Tex3D
OpenVLA
	Spatial	15.6	24.6	75.1	80.5	90.3	95.8	15.6	24.6	82.4	86.5	93.2	96.7
	Object	11.8	18.8	69.8	74.6	81.1	83.2	11.8	18.8	70.9	74.8	81.6	85.5
	Goal	23.6	28.8	65.0	70.7	79.1	84.8	23.6	28.8	71.5	74.2	81.6	86.9
	Long	40.4	45.2	73.2	80.9	83.9	90.3	40.4	45.2	79.6	79.9	90.5	92.8
	Avg.	24.1	31.1	69.0	75.5	82.9	88.1	24.1	31.1	76.1	79.9	86.6	90.5
OpenVLA-OFT
	Spatial	3.8	8.8	69.6	74.8	79.1	78.6	3.8	8.8	70.8	80.7	76.9	80.1
	Object	1.7	3.6	57.4	64.9	67.7	70.6	1.7	3.6	63.7	69.2	71.2	76.4
	Goal	3.8	5.4	61.3	68.1	76.8	76.8	3.8	5.4	64.6	60.8	73.9	79.8
	Long	9.3	10.9	65.5	68.0	73.6	78.2	9.3	10.9	68.9	71.7	75.8	80.2
	Avg.	4.7	6.5	63.5	68.3	73.6	76.0	4.7	6.5	67.0	70.6	74.4	77.4
π₀
	Spatial	3.5	11.8	54.7	59.4	65.2	74.9	3.5	11.8	58.8	63.2	69.5	75.9
	Object	2.3	8.6	48.3	53.1	59.3	68.9	2.3	8.6	52.5	58.4	62.6	72.3
	Goal	5.2	9.6	53.0	58.3	63.1	70.3	5.2	9.6	56.2	60.5	68.6	73.4
	Long	7.2	12.5	54.2	57.0	64.5	72.9	7.2	12.5	57.1	57.0	68.4	73.3
	Avg.	4.6	10.7	52.6	57.0	63.0	71.8	4.6	10.7	56.2	60.6	67.3	73.3
π₀.₅
	Spatial	1.7	7.2	47.1	53.1	60.1	71.8	1.2	7.8	49.3	63.2	63.2	72.9
	Object	1.8	6.9	39.7	46.0	52.4	65.2	1.8	6.9	41.1	48.2	54.3	68.3
	Goal	2.0	5.5	44.1	42.7	55.2	69.7	2.0	5.5	47.5	45.5	60.8	71.3
	Long	6.0	9.3	50.0	55.2	63.0	70.6	6.0	9.3	51.8	56.8	65.1	72.1
	Avg.	2.8	7.4	45.4	51.2	58.3	69.3	2.8	7.4	47.4	53.4	61.0	71.2

Table 1: Task failure rates (%) on the four LIBERO task suites under untargeted and targeted settings. "No Attack": clean evaluation. "Gaussian": random Gaussian noise. "Single-frame": one-frame perturbation. "Vertex Param.": parameterized texture attack without temporal consistency. "Tex+Temp.": texture-only temporal variant. "Tex3D": our full method. Higher failure rates reflect stronger attack effectiveness.
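To read the table as attack-induced degradation rather than raw failure, the clean failure rate can be subtracted from the Tex3D failure rate. The short script below does this for the "Avg." rows only; the numbers are copied from Table 1, while the percentage-point difference is a derived quantity that the page itself does not report, and the ASCII keys `pi0` and `pi0.5` stand in for π₀ and π₀.₅.

```python
# Average failure rates (%) from the "Avg." rows of Table 1: (No Attack, Tex3D).
avg_rates = {
    "OpenVLA": {"untargeted": (24.1, 88.1), "targeted": (24.1, 90.5)},
    "OpenVLA-OFT": {"untargeted": (4.7, 76.0), "targeted": (4.7, 77.4)},
    "pi0": {"untargeted": (4.6, 71.8), "targeted": (4.6, 73.3)},
    "pi0.5": {"untargeted": (2.8, 69.3), "targeted": (2.8, 71.2)},
}


def failure_increase(clean, attacked):
    """Percentage-point increase in failure rate caused by the attack."""
    return round(attacked - clean, 1)


increases = {
    model: {setting: failure_increase(*pair) for setting, pair in settings.items()}
    for model, settings in avg_rates.items()
}

for model, by_setting in increases.items():
    print(model, by_setting)
```

By this measure the average increase ranges from 64.0 to 72.7 percentage points. OpenVLA shows the highest absolute failure rates, but the largest increase over the clean baseline is on OpenVLA-OFT, whose clean failure rate is much lower.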

Simulation Setup & Demo

Attack demonstrations across LIBERO tasks

We evaluate on four LIBERO task suites: Spatial, Object, Goal, and LIBERO-10 (Long). Each panel below shows side-by-side clean vs. attacked robot behavior.

LIBERO-Spatial