Publications

VI Lab

Current (2015–present)

Discrete Wavelet Transform Meets Transformer: Unleashing the Full Potential of the Transformer for Visual Recognition
Journal: IEEE Access
Authors: Dongwook Yang and Seung-Woo Seo
Class of publication: International Journal
Date: September 2023
Traditionally, the success of the Transformer has been attributed to its token mixer, particularly the self-attention mechanism. However, recent studies suggest that replacing the attention-based token mixer with alternative techniques can yield comparable results in various vision tasks, highlighting that optimal performance depends on the model's overall architectural structure rather than exclusively on the specific choice of token mixer. Building on this insight, we introduce the Discrete Wavelet TransFormer, a framework that incorporates the Discrete Wavelet Transform to improve every building block of the Transformer. By exploiting distinct attributes of the Discrete Wavelet Transform, the Discrete Wavelet TransFormer not only strengthens the network's ability to learn intricate feature representations across different levels of abstraction, but also enables lossless down-sampling, yielding a more resilient and efficient network. A comprehensive evaluation on diverse vision tasks demonstrates that the Discrete Wavelet TransFormer outperforms state-of-the-art Transformer-based models across all tasks by a significant margin.
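The lossless down-sampling idea can be illustrated with a minimal PyTorch sketch: a single-level 2D Haar DWT halves the spatial resolution while stacking the four resulting sub-bands along the channel axis, so the original feature map remains exactly recoverable. The module name HaarDWTDownsample, the Haar basis, and the channel-stacking layout are illustrative assumptions here, not the authors' published implementation.

import torch
import torch.nn as nn

class HaarDWTDownsample(nn.Module):
    # Hypothetical sketch of lossless DWT-based down-sampling,
    # not the paper's released code.
    def forward(self, x):
        # x: (B, C, H, W), with H and W assumed even.
        a = x[:, :, 0::2, 0::2]  # top-left sample of each 2x2 block
        b = x[:, :, 0::2, 1::2]  # top-right
        c = x[:, :, 1::2, 0::2]  # bottom-left
        d = x[:, :, 1::2, 1::2]  # bottom-right
        ll = (a + b + c + d) / 2  # low-frequency approximation
        lh = (a + b - c - d) / 2  # horizontal detail
        hl = (a - b + c - d) / 2  # vertical detail
        hh = (a - b - c + d) / 2  # diagonal detail
        # Stack the four half-resolution sub-bands along channels:
        # (B, C, H, W) -> (B, 4C, H/2, W/2). The transform is orthonormal,
        # hence invertible: no information is discarded.
        return torch.cat([ll, lh, hl, hh], dim=1)

# Example: a 64-channel 224x224 feature map becomes 256 channels at 112x112.
feats = torch.randn(1, 64, 224, 224)
out = HaarDWTDownsample()(feats)
print(out.shape)  # torch.Size([1, 256, 112, 112])

This contrasts with strided convolution or pooling, which discard information during down-sampling; the DWT variant trades spatial resolution for channels without loss.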