A Pure Convolutional Model Fused with U-net Network for Video Prediction

CLC number: TP391.41  Document code: A
DOI: 10.7652/xjtuxb202506012  Article ID: 0253-987X(2025)06-0112-10
XIE Yumei¹, CAI Yuanli², GAO Haiyan³, GUANG Xiangfeng¹, TANG Weiqiang⁴
(1. School of Electronic Information Science, Fujian Jiangxia University, Fuzhou 350108, China; 2. Faculty of Electrical and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; 3. School of Electrical Engineering and Automation, Xiamen University of Technology, Xiamen, Fujian 361024, China; 4. College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China)
Abstract: Deep learning-based video prediction often suffers from insufficient spatiotemporal feature extraction and poor preservation of image detail. To address these issues, this paper proposes CUnet, a pure convolutional video prediction model that fuses the U-net network and adopts the Inception unit from the SimVP model. The CUnet model consists of three core modules. First, the Cell module uses 2D convolutional layers to extract spatial features and feeds them into multiple Inception units to capture spatiotemporal features. Second, the DeCell module captures spatiotemporal features through Inception units and upsamples them with 2D deconvolutional layers to restore the original image size. Finally, U-net serves as the backbone network that integrates the Cell and DeCell modules, effectively preserving image detail and enabling high-quality image reconstruction. Experimental results show that on the TaxiBJ dataset the prediction accuracy of CUnet improves by 5.23% over the current best-performing TAU model, and on the Human3.6M dataset it improves by 12.88% over the current best-performing FFINet model. CUnet thus demonstrates excellent predictive capability and offers useful insights for applying pure convolutional neural networks to video prediction.
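The following is a minimal sketch, not the authors' implementation, that traces tensor shapes through a CUnet-style encoder-decoder. It shows how stride-2 convolutions in the Cell modules shrink the feature map and how transposed convolutions in the DeCell modules restore the original frame size, so that each U-net skip connection lines up with its decoder stage. It assumes, hypothetically, that the T input frames are stacked along the channel axis (as SimVP does), that the Inception units preserve shape, and that the kernel sizes, strides, channel widths (`base`) and number of stages are the illustrative values shown; `trace_unet` and its helpers are hypothetical names.

```python
# Shape walk-through of a hypothetical CUnet-style encoder-decoder.
# Kernel sizes, strides, channel widths and stage count are illustrative
# assumptions, not the configuration reported in the paper.


def conv2d_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a 2D convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv2d_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output length along one spatial axis of a 2D transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_unet(frames, channels, height, width, base=16, stages=2):
    """Return (name, C, H, W) for the input, each Cell, each DeCell and the output."""
    c, h, w = frames * channels, height, width  # stack T frames on the channel axis
    trace, skips = [("input", c, h, w)], []
    for i in range(stages):
        skips.append((c, h, w))  # saved for the U-net skip connection
        c = base * 2 ** i  # hypothetical channel width of this Cell
        h = conv2d_out(h, kernel=3, stride=2, padding=1)  # halves H
        w = conv2d_out(w, kernel=3, stride=2, padding=1)  # halves W
        trace.append((f"cell{i + 1}", c, h, w))  # Inception units keep this shape
    for i, (skip_c, skip_h, skip_w) in enumerate(reversed(skips)):
        h = deconv2d_out(h, kernel=4, stride=2, padding=1)  # doubles H
        w = deconv2d_out(w, kernel=4, stride=2, padding=1)  # doubles W
        if (h, w) != (skip_h, skip_w):
            raise ValueError("upsampled map does not match its skip connection")
        # The deconvolution is assumed to emit skip_c channels; concatenating
        # the skip doubles them, and a hypothetical 1x1 convolution fuses the
        # result back down to skip_c channels.
        trace.append((f"decell{i + 1}", 2 * skip_c, h, w))
        c = skip_c
    # Output matches the input shape, assuming as many predicted frames as inputs.
    trace.append(("output", c, h, w))
    return trace


for name, c, h, w in trace_unet(frames=4, channels=2, height=32, width=32):
    print(f"{name:8s} C={c:3d} H={h:3d} W={w:3d}")
```

For four two-channel 32×32 frames (the grid size of TaxiBJ), the trace ends at `("output", 8, 32, 32)`, the same shape as the input. A 30×30 input raises `ValueError`, because 30 → 15 → 8 cannot be upsampled back to 15 by a stride-2 deconvolution; this is why U-net-style models usually expect frame sizes divisible by 2 raised to the number of stages, or pad the input to such a size.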
Keywords: deep learning; video prediction; spatiotemporal features; U-net; pure convolutional neural network
Video prediction aims to predict future frames accurately by learning from historical frames.
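To make the task concrete, the sketch below (a hypothetical helper, not taken from the paper) shows one common way a single frame sequence is cut into training samples: each sample pairs `t_in` observed frames with the `t_out` frames that immediately follow, and the model learns the mapping from the first to the second. Integers stand in for image frames, and the 4-in/4-out split is an assumed example.

```python
# Hypothetical helper: turn one frame sequence into (history, future) pairs.
# Integers stand in for image frames; t_in and t_out are assumed values.


def make_windows(frames, t_in, t_out, stride=1):
    """Slide a window of t_in + t_out frames along the sequence."""
    if t_in < 1 or t_out < 1 or stride < 1:
        raise ValueError("t_in, t_out and stride must be positive")
    span = t_in + t_out
    pairs = []
    for start in range(0, len(frames) - span + 1, stride):
        history = frames[start:start + t_in]  # observed frames (model input)
        future = frames[start + t_in:start + span]  # frames to predict (target)
        pairs.append((history, future))
    return pairs


for history, future in make_windows(list(range(10)), t_in=4, t_out=4):
    print(history, "->", future)
```

With ten frames and a 4-in/4-out window, this yields three pairs, the first being `[0, 1, 2, 3] -> [4, 5, 6, 7]`; sequences shorter than `t_in + t_out` yield no pairs.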