GPU-based parallelization of the Winograd convolution algorithm

CLC number: TP183  Document code: A  Article ID: 1001-3695(2025)08-026-2446-06
doi: 10.19734/j.issn.1001-3695.2024.11.0502
GPU-based parallelization of Winograd convolution algorithm
Wang Xin, Zhen Xueru (Key Laboratory of …)
Abstract: This paper proposes an innovative GPU-based parallel Winograd convolution algorithm to address the excessive computational load of modern convolutional neural networks. The algorithm uses load-balanced task mapping, optimizes the data loading strategy to hide latency, and incorporates a dynamic padding method to fully exploit the synergy between the Winograd convolution algorithm and the GPU architecture. Experimental results show that, on multiple convolutional layers of the classic convolutional neural network model ResNet, the proposed algorithm outperforms the standard Winograd convolution algorithm in the NVIDIA cuDNN 8.3.0 library. It achieves a speedup of up to 2.46× on the Turing-architecture RTX 2080 Ti GPU while maintaining high computational accuracy. Compared with the standard GPU-based Winograd convolution algorithm, the proposed algorithm significantly improves the efficiency of convolution computation.
Key words: Winograd algorithm; parallel computing; CUDA; convolutional neural network
0 Introduction
Convolutional neural networks (CNNs), a core technique in deep learning (DL), have been widely applied in many fields, including image classification [1] and object segmentation [2].