We evaluate under untargeted and targeted attack settings across four task suites. A higher failure rate indicates a stronger attack. Tex3D achieves the highest average failure rate for every model in both settings.
| Model / Task | Untargeted: No Attack | Untargeted: Gaussian | Untargeted: Single-frame | Untargeted: Vertex Param. | Untargeted: Tex+Temp. | Untargeted: Tex3D | Targeted: No Attack | Targeted: Gaussian | Targeted: Single-frame | Targeted: Vertex Param. | Targeted: Tex+Temp. | Targeted: Tex3D | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenVLA | |||||||||||||
| Spatial | 15.6 | 24.6 | 75.1 | 80.5 | 90.3 | 95.8 | 15.6 | 24.6 | 82.4 | 86.5 | 93.2 | 96.7 | |
| Object | 11.8 | 62.1 | 69.8 | 74.6 | 81.1 | 83.2 | 11.8 | 18.8 | 70.9 | 74.8 | 81.6 | 85.5 | |
| Goal | 23.6 | 28.8 | 65.0 | 70.7 | 79.1 | 84.8 | 23.6 | 28.8 | 71.5 | 74.2 | 81.6 | 86.9 | |
| Long | 40.4 | 45.2 | 73.2 | 80.9 | 83.9 | 90.3 | 40.4 | 45.2 | 79.6 | 79.9 | 90.5 | 92.8 | |
| Avg. | 24.1 | 31.1 | 69.0 | 75.5 | 82.9 | 88.1 | 24.1 | 31.1 | 76.1 | 79.9 | 86.6 | 90.5 | |
| OpenVLA-OFT | |||||||||||||
| Spatial | 3.8 | 8.8 | 69.6 | 74.8 | 79.1 | 78.6 | 3.8 | 8.8 | 70.8 | 80.7 | 76.9 | 80.1 | |
| Object | 1.7 | 3.6 | 57.4 | 64.9 | 67.7 | 70.6 | 1.7 | 3.6 | 63.7 | 69.2 | 71.2 | 76.4 | |
| Goal | 3.8 | 5.4 | 61.3 | 68.1 | 76.8 | 76.8 | 3.8 | 5.4 | 64.6 | 60.8 | 73.9 | 79.8 | |
| Long | 9.3 | 10.9 | 65.5 | 68.0 | 73.6 | 78.2 | 9.3 | 10.9 | 68.9 | 71.7 | 75.8 | 80.2 | |
| Avg. | 4.7 | 6.5 | 63.5 | 68.3 | 73.6 | 76.0 | 4.7 | 6.5 | 67.0 | 70.6 | 74.4 | 77.4 | |
| π₀ | |||||||||||||
| Spatial | 3.5 | 11.8 | 54.7 | 59.4 | 65.2 | 74.9 | 3.5 | 11.8 | 58.8 | 63.2 | 69.5 | 75.9 | |
| Object | 2.3 | 8.6 | 48.3 | 53.1 | 59.3 | 68.9 | 2.3 | 8.6 | 52.5 | 58.4 | 62.6 | 72.3 | |
| Goal | 5.2 | 9.6 | 53.0 | 58.3 | 63.1 | 70.3 | 5.2 | 9.6 | 56.2 | 60.5 | 68.6 | 73.4 | |
| Long | 7.2 | 12.5 | 54.2 | 57.0 | 64.5 | 72.9 | 7.2 | 12.5 | 57.1 | 57.0 | 68.4 | 73.3 | |
| Avg. | 4.6 | 10.7 | 52.6 | 57.0 | 63.0 | 71.8 | 4.6 | 10.7 | 56.2 | 60.6 | 67.3 | 73.3 | |
| π₀.₅ | |||||||||||||
| Spatial | 1.7 | 7.2 | 47.1 | 53.1 | 60.1 | 71.8 | 1.2 | 7.8 | 49.3 | 63.2 | 63.2 | 72.9 | |
| Object | 1.8 | 6.9 | 39.7 | 46.0 | 52.4 | 65.2 | 1.8 | 6.9 | 41.1 | 48.2 | 54.3 | 68.3 | |
| Goal | 2.0 | 5.5 | 44.1 | 42.7 | 55.2 | 69.7 | 2.0 | 5.5 | 47.5 | 45.5 | 60.8 | 71.3 | |
| Long | 6.0 | 9.3 | 50.0 | 55.2 | 63.0 | 70.6 | 6.0 | 9.3 | 51.8 | 56.8 | 65.1 | 72.1 | |
| Avg. | 2.8 | 7.4 | 45.4 | 51.2 | 58.3 | 69.3 | 2.8 | 7.4 | 47.4 | 53.4 | 61.0 | 71.2 | |
Table 1: Task failure rates (%) on four LIBERO task suites under untargeted and targeted attack settings. "No Attack": clean evaluation. "Gaussian": random Gaussian noise. "Single-frame": one-frame perturbation. "Vertex Param.": parameterized texture attack without temporal consistency. "Tex+Temp.": texture-only temporal variant. "Tex3D": our full method. Higher failure rates indicate stronger attacks.
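The arithmetic behind the table can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes that a failure rate is the complement of the success rate over a set of rollouts, and that the "Avg." row is the unweighted mean of the four suites. Neither assumption is confirmed by the source. The helper names (`failure_rate`, `suite_average`, `attack_gain`) are hypothetical, and the numbers are copied from the π₀ rows of Table 1 (untargeted setting).

```python
from statistics import mean
from typing import Mapping

# Per-suite failure rates (%) copied from the π₀ rows of Table 1 (untargeted).
PI0_UNTARGETED = {
    "No Attack": {"Spatial": 3.5, "Object": 2.3, "Goal": 5.2, "Long": 7.2},
    "Tex3D": {"Spatial": 74.9, "Object": 68.9, "Goal": 70.3, "Long": 72.9},
}


def failure_rate(successes: int, episodes: int) -> float:
    """Failure rate in percent, assuming failure = 1 - success (hypothetical helper)."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    return 100.0 * (1.0 - successes / episodes)


def suite_average(rates: Mapping[str, float]) -> float:
    """Unweighted mean over suites (assumed aggregation for the Avg. row)."""
    return mean(rates.values())


def attack_gain(attacked: Mapping[str, float], clean: Mapping[str, float]) -> float:
    """Increase in average failure rate, in percentage points, over the clean run."""
    return suite_average(attacked) - suite_average(clean)


if __name__ == "__main__":
    clean = PI0_UNTARGETED["No Attack"]
    tex3d = PI0_UNTARGETED["Tex3D"]
    print(f"clean avg: {suite_average(clean):.2f}%")
    print(f"Tex3D avg: {suite_average(tex3d):.2f}%")
    print(f"gain:      {attack_gain(tex3d, clean):.2f} points")
```

Reporting the gain over the "No Attack" column, rather than the raw failure rate alone, separates the effect of the attack from failures the policy already makes on clean inputs.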
We evaluate on four LIBERO task suites: Spatial, Object, Goal, and LIBERO-10 (Long). Each panel below shows side-by-side clean vs. attacked robot behavior.
Panels: LIBERO-Spatial, LIBERO-Object, LIBERO-Goal, LIBERO-Long (LIBERO-10).