Research and Practice on Integrated Evaluation of Large Language Models

CLC number: TP391.1
Document code: A Article ID: 2096-4706(2025)11-0059-06
Research and Practice on Integrated Evaluation of Large Language Models
HE Qi, HAN Xiao, MAO Haotian, QIU Jianmin (China Telecom Corporation Limited Jiangsu Branch, Nanjing 210037, China)
Abstract: With the growing application of LLMs, how to evaluate the capabilities of large models accurately, objectively, and comprehensively has become an important topic of common concern in academia and industry. In recent years, Jiangsu Telecom has actively explored and practiced with LLMs, and has rebuilt multiple applications in the BMO domains using large models. This paper introduces Jiangsu Telecom's integrated evaluation scheme and its system practice, based on the current open-source large model ecosystem. The scheme can quickly onboard newly released open-source large models and supports blind-test selection of large models based on practical applications, providing a useful reference for building a more scientific and complete Large Language Model evaluation system.
Keywords: LLMs; evaluation; framework
0 Introduction
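In the early stage of large model application practice, computing power was typically allocated to the individual application teams, each of which then carried out its own large model work.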