Plant disease poses a significant threat to global food security, yet accurately identifying diseases from images taken in real-world field conditions remains a major challenge. Standard classification models often fail on the complex backgrounds, variable lighting, and image noise characteristic of datasets like PlantDoc.
To address this, this study proposes a robust two-stage detection-classification pipeline. The first stage uses a YOLOv11n object detector trained to localize and isolate leaf regions from cluttered surroundings, achieving a mean Average Precision (mAP@0.5) of 92.9%. In the second stage, the cropped leaf images are passed to an ECA-NFNet-L0 classifier, whose efficient channel attention mechanism supports fine-grained disease recognition.
On the challenging PlantDoc dataset, the full pipeline achieves a final classification accuracy of 78.5% and a weighted F1-score of 78.4%. By decoupling localization from classification, this approach substantially improves robustness and offers a more reliable way to diagnose plant diseases in the field.
Our approach tackles the "generalization gap" between lab-based and field-based plant disease datasets by decoupling the problem into two distinct stages.
We employ the YOLOv11n architecture to detect and crop leaves from complex backgrounds. Unlike standard classification, which processes the entire noisy image, our detector isolates the region of interest, removing background clutter.
Key Metric: The detector achieves a 92.9% mAP@0.5 on the merged leaf dataset.
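The hand-off between the two stages reduces to cropping each detected box out of the source image before classification. A minimal numpy sketch of that crop step is shown below; the function name and the optional padding parameter are illustrative, not taken from the paper's code, and boxes are assumed to be (x1, y1, x2, y2) pixel coordinates as a YOLO-style detector would return after post-processing.

```python
import numpy as np

def crop_leaf_regions(image, boxes, pad=0):
    """Crop detected leaf regions from an image.

    image: H x W x 3 array; boxes: iterable of (x1, y1, x2, y2) pixel
    coordinates, e.g. as produced by a YOLO-style detector.
    pad: optional context margin (pixels) added around each box.
    """
    h, w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        # Clamp the (optionally padded) box to the image bounds.
        x1 = max(0, int(x1) - pad)
        y1 = max(0, int(y1) - pad)
        x2 = min(w, int(x2) + pad)
        y2 = min(h, int(y2) + pad)
        crops.append(image[y1:y2, x1:x2])
    return crops

# Example: one 640x640 image, two detected leaves.
img = np.zeros((640, 640, 3), dtype=np.uint8)
crops = crop_leaf_regions(img, [(10, 20, 110, 220), (300, 300, 400, 500)])
print([c.shape for c in crops])  # [(200, 100, 3), (200, 100, 3)]
```

Each crop is then resized to the classifier's input resolution before being fed to the second stage.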
The cropped leaf regions are processed by an ECA-NFNet-L0 (Efficient Channel Attention Normalizer-Free Network). This network combines an efficient channel attention mechanism, which focuses on discriminative disease features, with a Normalizer-Free backbone that avoids Batch Normalization layers, improving training stability and performance.
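The ECA mechanism itself is lightweight: a global average pool produces a per-channel descriptor, a 1-D convolution of small kernel size mixes neighboring channels, and a sigmoid turns the result into per-channel weights. The numpy sketch below illustrates that data flow; the uniform convolution weights are a stand-in for the learned ones in the actual network.

```python
import numpy as np

def eca_attention(feat, k=3):
    """Efficient Channel Attention over a C x H x W feature map.

    A 1-D convolution of kernel size k runs over the channel descriptor
    from global average pooling; a sigmoid gates the result into
    per-channel weights that rescale the input feature map.
    """
    y = feat.mean(axis=(1, 2))                      # global average pool -> (C,)
    kernel = np.full(k, 1.0 / k)                    # stand-in for learned 1-D conv weights
    y = np.convolve(np.pad(y, k // 2, mode="edge"), kernel, mode="valid")
    w = 1.0 / (1.0 + np.exp(-y))                    # sigmoid gate, shape (C,)
    return feat * w[:, None, None]                  # rescale each channel

feat = np.random.rand(64, 7, 7)
out = eca_attention(feat)
print(out.shape)  # (64, 7, 7)
```

Because the attention operates purely on the channel dimension, its cost is negligible relative to the backbone, which is why ECA adds accuracy without a meaningful speed penalty.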
We compared YOLOv8 and YOLOv11 on the PlantDoc dataset. YOLOv11 demonstrated superior performance in localizing leaves in complex environments.
| Model | mAP@0.5 | mAP@0.5:0.95 | F1-Score |
|---|---|---|---|
| YOLOv8 | 63.4% | 51.0% | 59.0% |
| YOLOv11 | 69.5% | 54.0% | 65.0% |
| YOLOv11 (Merged Leaf) | 92.9% | 72.0% | 87.0% |
Table 1: Performance comparison. The "Merged Leaf" model treats all leaves as a single class for robust localization.
On the held-out test set, our ECA-NFNet-L0 classifier achieved robust results across diverse disease classes.
To ensure trust in our model's predictions, we applied Grad-CAM (Gradient-weighted Class Activation Mapping). The visualizations below confirm that the model focuses on the actual diseased regions of the leaf rather than background noise.
Figure 7: Grad-CAM visualization for Apple Rust Leaf. The model accurately focuses on the rust spots (warm colors indicate high attention).
Figure 8: Grad-CAM visualization for Potato Leaf Late Blight. Attention is centered on the necrotic lesions rather than the background.
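The Grad-CAM maps behind these figures follow the standard recipe: spatially average the gradients of the class score at a target convolutional layer to get per-channel weights, take the weighted sum of that layer's activations, and keep only the positive part. A minimal numpy sketch of that computation, assuming the activations and gradients have already been captured (e.g. via forward/backward hooks), is shown below; the function name is illustrative.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a conv layer's activations and gradients.

    activations, gradients: C x H x W arrays captured at the target
    layer for the predicted class. Channel weights are the spatially
    averaged gradients; the map is the ReLU of the weighted sum,
    normalized to [0, 1] for display.
    """
    weights = gradients.mean(axis=(1, 2))               # alpha_k, shape (C,)
    cam = np.tensordot(weights, activations, axes=1)    # weighted sum -> (H, W)
    cam = np.maximum(cam, 0)                            # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                           # scale to [0, 1]
    return cam

acts = np.random.rand(128, 14, 14)
grads = np.random.rand(128, 14, 14)
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (14, 14)
```

In practice the low-resolution map is upsampled to the input size and overlaid on the leaf crop, which is how the warm-colored attention regions in Figures 7 and 8 are produced.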
@inproceedings{Hasan2025PlantDisease,
title={An Explainable AI based Plant Disease Identification using a Two-Stage Detection-Classification Pipeline with YOLO and ECA-NFNet Framework},
author={Hasan, Tahir and Ilham, Md. Fatin and Nasim, Md. Farhan Tanvir},
booktitle={2025 28th International Conference on Computer and Information Technology (ICCIT)},
year={2025},
publisher={IEEE},
address={Cox’s Bazar, Bangladesh}
}