CNNs vs ViT

Convolutional neural networks (CNNs) and Transformers are both prominent architectures in machine learning.
Following the success of language transformers, Vision Transformers (ViTs) are gaining popularity in computer vision.
This study compares the performance of DenseNet and the Vision Transformer ViT-16 on an image classification and identification task using the Food-101 dataset.
The evaluation was based on common metrics like accuracy, F1 score, precision, and recall.
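
To make the metrics concrete, here is a minimal sketch of how accuracy, precision, recall, and F1 score could be computed for a multi-class problem such as Food-101. The function name `macro_metrics` is hypothetical, and macro-averaging (an unweighted mean over classes) is an assumption; the study may have used a different averaging scheme or a library implementation.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: per-class scores are averaged with equal weight (macro).
    A class with no predictions or no true samples scores 0.0 for the
    affected metric rather than raising an error.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per class.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Calling `macro_metrics(true_labels, predicted_labels)` on the held-out predictions of each model would give four directly comparable numbers per model. Because the sketch treats every class equally, a model that fails on a few rare dishes is penalized more than plain accuracy alone would show.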

