An Automated Framework for Generating Large-Model Fine-Tuning Instructions for Polysemous-Word Example Sentence Corpus Generation

Abstract: First, a manual instruction set containing a body description set and a list of instruction examples is constructed as the initial input to the instruction pool. Then, instructions from the instruction pool are fed into the large model to generate a number of machine instructions together with their corresponding corpora, and the generated corpora are refined with text correction to obtain the desired polysemy example sentence corpus. Finally, the edit distance algorithm is used to deduplicate the machine instructions, and the spectral clustering algorithm is used to cluster the candidate machine instructions, thereby achieving automated generation of machine instructions. By updating the instruction pool, iterative generation of the polysemy example sentence corpus is realized. The results show that the constructed polysemy example sentence dataset and its corresponding large-model machine instruction set exhibit good linguistic diversity and content diversity. The constructed polysemy example sentence dataset meets the needs of second language learners in terms of sentence length, sentiment, vocabulary difficulty level, and topics.
Keywords: large language model; instruction generation; polysemy; example sentence generation; ChatGPT
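The deduplication step named in the abstract can be illustrated with a minimal sketch. The abstract only states that an edit distance algorithm is used. The normalization by string length, the `max_ratio` threshold, and the function names `edit_distance` and `deduplicate` are assumptions made for illustration and are not taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def deduplicate(instructions, max_ratio=0.2):
    """Keep an instruction only if it is not a near-duplicate of one already kept.

    max_ratio is a hypothetical threshold on edit distance divided by the
    length of the longer string; the paper's actual criterion is not given
    in the abstract.
    """
    kept = []
    for candidate in instructions:
        is_duplicate = False
        for existing in kept:
            longest = max(len(candidate), len(existing)) or 1
            if edit_distance(candidate, existing) / longest <= max_ratio:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    pool = [
        "Write a sentence using the word bank.",
        "Write a sentence using the word bank!",
        "List three meanings of the word spring.",
    ]
    for instruction in deduplicate(pool):
        print(instruction)
```

Under these assumptions, the second instruction is dropped as a near-duplicate of the first, and the surviving candidates would then be passed on to the clustering step.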
Chinese is a complex language with abundant polysemy: a single character or word can carry several different meanings.