Scientific journal - archive

Volume 65, 2026

Journal of Food and Nutrition Research
Abstracts of issue 1 / 2026

Maolan, L. – Zhenchang, G. – Wenliang, L. – Honghao, C.
Comparative review of Vision Transformer and YOLO in food and agriculture
Journal of Food and Nutrition Research, 65, 2026, No. 1, pp. 1-12

Cai Honghao, Department of Physics, School of Science, Jimei University, Yinjiang Road 185, 361021 Xiamen, Fujian Province, China. E-mail: hhcai@jmu.edu.cn
Liao Wenliang, Department of Physics, School of Science, Jimei University, Yinjiang Road 185, 361021 Xiamen, Fujian Province, China. E-mail: 200661000118@jmu.edu.cn

Review article
Received 30 June 2025; 1st revised 4 November 2025; accepted 11 December 2025; published online 18 December 2025.

DOI: https://doi.org/10.64122/HOYB1678

Abstract: Computer vision is vital in food and agriculture, and object detection is crucial for automation. Among object detection methods, You Only Look Once (YOLO) and Vision Transformer (ViT) models have emerged as two influential approaches. Despite their architectural differences, the two approaches often complement each other in practice. However, direct comparative studies remain limited and the literature is fragmented. This paper starts by mapping the road from YOLO version 1 to the latest convolutional neural networks (CNN) and from the original Transformer to recent vision variants, showing why key design choices were made and how they set the two streams apart. We benchmark the base architectures and their popular variants on food-and-agriculture datasets and quantify the gap between reported accuracy and reproduced/emerged accuracy. We conclude with an outlook on open challenges, emerging remedies, and recent advances poised to define the next generation of both paradigms. By integrating key literature (peer-reviewed articles from the last 10 years), this study constructs a systematic comparison of YOLO and ViT in the food and agricultural fields. The comparison clarifies the technical boundaries and applicable scenarios of the two algorithms and provides a theoretical basis and practical guidance for algorithm selection and optimisation in actual production.

Keywords: deep learning; convolutional neural network; computer vision; intelligent agriculture; food quality; attention mechanism; You Only Look Once

Download:
  jfnr-2026-1-pp001-012-maolan.pdf (PDF, 564.43 KB)