Forest fire video recognition based on an improved Vision Transformer

Keywords: forest fire; deep learning; object detection; three-dimensional convolutional neural network; Vision Transformer
CLC number: S762  Document code: A  Article ID: 1000-2006(2025)04-0186-09
Research on forest fire video recognition based on improved Vision Transformer
ZHANG Min, XIN Ying*, HUANG Tianqi
(College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin 150040, China)
Abstract:【Objective】To address the limitations of existing forest fire recognition algorithms in temporal feature utilization and computational efficiency, this study proposes a video-based recognition model (C3D-ViT) to enhance both detection accuracy and operational efficiency in practical forest monitoring scenarios.【Method】We present a hybrid architecture integrating 3D Convolutional Neural Networks (3DCNN) with Vision Transformer (ViT). The framework employs 3D convolution kernels to extract spatiotemporal features from video sequences, which are subsequently tokenized into vector representations. The Vision Transformer's self-attention mechanism then globally models feature relationships across temporal and spatial dimensions, with final classification performed by the MLP Head layer. Comprehensive ablation studies and comparative experiments were conducted against ResNet50, LSTM, YOLOv5, and the baseline 3DCNN and ViT models.【Result】C3D-ViT achieves 96.10% accuracy, outperforming ResNet50 (89.07%), LSTM (93.26%), and YOLOv5 (91.46%), and improving on the original 3DCNN (93.91%) and Vision Transformer (90.43%). The improved model maintains high recognition accuracy and stability under unfavorable conditions such as occlusion, long distance, and thin smoke, and can meet the demand for real-time detection.【Conclusion】The C3D-ViT framework effectively addresses spatiotemporal modeling challenges in wildfire detection through synergistic CNN-Transformer interaction, providing a technically viable solution for next-generation forest fire early warning systems.
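The data flow described under Method (3D convolution, tokenization, self-attention, MLP head) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the abstract gives no layer sizes, so the class name `C3DViTSketch`, the channel widths, kernel sizes, encoder depth, clip size, and the two-class output (fire / no fire) are all assumptions chosen only to keep the example small.

```python
import torch
import torch.nn as nn


class C3DViTSketch(nn.Module):
    """Illustrative 3D-conv + Transformer classifier; all sizes are assumed."""

    def __init__(self, num_classes=2, dim=64, depth=2, heads=4, num_tokens=64):
        super().__init__()
        # 3D convolution stem: extracts spatiotemporal features from the clip.
        self.stem = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),  # halve H and W, keep T
            # Strided conv maps each 2x8x8 feature block to one dim-sized token.
            nn.Conv3d(32, dim, kernel_size=(2, 8, 8), stride=(2, 8, 8)),
        )
        # Class token and learned positional embedding, as in a standard ViT.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Classification head applied to the class token.
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_classes))

    def forward(self, clip):
        # clip: (batch, channels, frames, height, width)
        feats = self.stem(clip)                     # (B, dim, T', H', W')
        tokens = feats.flatten(2).transpose(1, 2)   # (B, T'*H'*W', dim)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)              # global self-attention
        return self.head(encoded[:, 0])             # logits per class


# Usage: two random 8-frame 64x64 RGB clips give 4*4*4 = 64 tokens each.
model = C3DViTSketch().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 3, 8, 64, 64))
print(logits.shape)
```

Because the positional embedding has a fixed length, this sketch only accepts clips that produce exactly `num_tokens` tokens; a different clip size would need a matching `num_tokens` value.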
Keywords: forest fire; deep learning; object detection; 3DCNN; Vision Transformer (ViT)
Forests are an important component of terrestrial ecosystems and play an important role in global ecology, the carbon cycle, and climate change [1].