Audio Vision Transformer: a Vision Transformer applied to spectrograms for audio classification, with data augmentation.
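
As a rough illustration of the idea, the sketch below shows two steps such a pipeline typically involves: cutting a spectrogram into non-overlapping patches (the token sequence a Vision Transformer consumes) and masking random frequency and time bands as augmentation. This is not the project's actual code. The function names `patchify` and `mask_spectrogram`, the patch sizes, and the choice of band masking as the augmentation are all assumptions made for the example; the description above does not say which augmentations are used. Plain Python lists keep the sketch dependency-free; a real implementation would more likely use an array library.

```python
import random


def patchify(spec, patch_h, patch_w):
    """Split a 2D spectrogram (rows = frequency bins, columns = time frames)
    into flattened, non-overlapping patches in row-major order.

    Hypothetical helper for illustration only.
    """
    n_freq, n_time = len(spec), len(spec[0])
    if n_freq % patch_h or n_time % patch_w:
        raise ValueError("spectrogram dimensions must be divisible by the patch size")
    patches = []
    for f0 in range(0, n_freq, patch_h):
        for t0 in range(0, n_time, patch_w):
            # Flatten one patch_h x patch_w tile into a single vector.
            patch = [
                spec[f][t]
                for f in range(f0, f0 + patch_h)
                for t in range(t0, t0 + patch_w)
            ]
            patches.append(patch)
    return patches


def mask_spectrogram(spec, max_freq_width, max_time_width, rng=None, fill=0.0):
    """Return a copy of spec with one random frequency band and one random
    time band overwritten by `fill`.

    Assumed augmentation (band masking); the input is left unmodified.
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]

    # Pick band widths (possibly zero) and start offsets that fit inside the spectrogram.
    freq_width = rng.randint(0, min(max_freq_width, n_freq))
    freq_start = rng.randint(0, n_freq - freq_width)
    time_width = rng.randint(0, min(max_time_width, n_time))
    time_start = rng.randint(0, n_time - time_width)

    # Mask a horizontal band of frequency bins across all time frames.
    for f in range(freq_start, freq_start + freq_width):
        for t in range(n_time):
            out[f][t] = fill
    # Mask a vertical band of time frames across all frequency bins.
    for f in range(n_freq):
        for t in range(time_start, time_start + time_width):
            out[f][t] = fill
    return out
```

In a training loop, `mask_spectrogram` would be applied to each spectrogram before `patchify`, and each flattened patch would then be linearly projected and given a position embedding before entering the transformer encoder.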