While researching natural language processing recently, I came into contact with large language models, especially since the wave of AIGC enthusiasm set off by ChatGPT early this year. I have been thinking about how LLMs could support a company's various lines of business, such as intelligent retrieval, solution design, recommendation, customer service, and code generation. My overall impression is that, compared with traditional search and other intelligent assistance tools, LLMs are more efficient, direct, and precise, and the chat interaction supports multi-turn iteration, which gets closer to the user's actual need and produces more accurate answers. I am currently running deployment and application tests. The main open-source LLMs at the moment are Llama (both Llama-1 and Llama-2) and ChatGLM, plus derivatives built on top of them such as Chinese-LLaMA, OpenChineseLLaMA, Moss, and Baichuan. This article covers local deployment and testing of the original Llama models; follow-up work will gradually extend this by fine-tuning on industry data, which will hopefully help build LLMs for the oil and gas industry. The Llama-2 deployment and application tests follow.
I. Deployment Environment
The Python environment is managed with Anaconda:

conda: 4.3.30
Python: 3.10.4
CUDA: 11.0 (packages built for this CUDA version or lower will work; I installed the cu102 builds)
GPU: Tesla V100 (see the GPU monitoring section below)
env: /root/anaconda3/envs/torch/
Required packages: mainly torch, torchaudio, torchvision, transformers, uvicorn, fastapi, and accelerate.
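Before loading a multi-gigabyte model, it is worth verifying that torch actually sees the GPU. A minimal sanity-check sketch (the expected values in the comments reflect the setup above; actual versions depend on what you installed):

# Environment sanity check (a sketch)
import torch
import transformers

print("torch:", torch.__version__)                   # e.g. a +cu102 build
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())  # should be True
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))  # expect Tesla V100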
II. Deployed Models and Speed Comparison
Chinese-Llama-2-7b: slow generation, fast loading
Chinese-Llama-2-7b-4bit: relatively fast generation, fastest loading
chinese-alpaca-2-7b-hf: faster generation, slow loading
chinese-alpaca-2-13b-hf: faster generation, slow loading
open-chinese-llama-7b-patch: medium generation speed, slow loading
(These are informal impressions; a timing sketch follows below.)
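To put rough numbers on the comparison, load time and generation throughput can be measured directly. A timing sketch (model_path and the prompt are placeholders; swap in any of the models above):

# Rough timing of load time and generation speed (a sketch)
import time
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "model/Chinese-Llama-2-7b"  # placeholder: any model above

t0 = time.time()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()
print(f"load time: {time.time() - t0:.1f}s")

input_ids = tokenizer("北京最佳的旅游时间", return_tensors="pt").input_ids.cuda()
t0 = time.time()
out = model.generate(input_ids, max_new_tokens=128)
n_new = out.shape[-1] - input_ids.shape[-1]
print(f"generation speed: {n_new / (time.time() - t0):.1f} tokens/s")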
III. Supported Ways to Run
1. Console mode: see chinese-llama2Test2.py. Run with: python chinese-llama2Test2.py Chinese-Llama-2-7b
2. REST service mode: see restApi.py. Run with: python restApi.py Chinese-Llama-2-7b
The REST service is called mainly from Postman or the DHC client by issuing a POST request with Content-Type: application/json and a JSON body such as the following (a Python client sketch is shown below):
{"prompt": "北京最佳的旅游时间", "history": []}
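The same request can be issued from Python. A minimal client sketch, assuming the service listens on 127.0.0.1:8000 (the address, port, and path are assumptions; use whatever restApi.py actually binds to):

# Minimal REST client for the service started by restApi.py
# The URL is an assumption; adjust host/port/path to your deployment.
import requests

resp = requests.post(
    "http://127.0.0.1:8000",  # assumed address of the running service
    json={"prompt": "北京最佳的旅游时间", "history": []},
)
print(resp.json())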
IV. Application Tests
1. Single-shot test code
# One-shot query
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_path = "model/Chinese-Llama-2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()
# Stream tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Llama-2 chat template: the system prompt sits between <<SYS>> tags inside [INST]
instruction = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{} [/INST]"""

prompt = instruction.format("用中文回答 When is the best time to visit Beijing, and do you have any suggestions for me?")
generate_ids = model.generate(tokenizer(prompt, return_tensors="pt").input_ids.cuda(), max_new_tokens=4096, streamer=streamer)

2. Output

3. Interactive-loop test code
# Interactive loop mode
import torch
import sys
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

if __name__ == "__main__":
    # Check the argument count: the model name must be passed on the command line
    if len(sys.argv) <= 1:
        print("missing parameter: model name")
        sys.exit()

    # Command-line argument, e.g. "Chinese-Llama-2-7b" -> model/Chinese-Llama-2-7b
    modelName = sys.argv[1]
    model_path = "model/" + modelName

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    if model_path.endswith("4bit"):
        # The lightweight q4 quantized models are loaded in float16
        # with automatic device placement
        model = AutoModelForCausalLM.from_pretrained(
            model_path, torch_dtype=torch.float16, device_map="auto")
    else:
        model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()

    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    instruction = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{} [/INST]"""

    while True:
        text = input("请输入提问 prompt\n")
        if text == "q":
            break
        prompt = instruction.format(text)
        generate_ids = model.generate(
            tokenizer(prompt, return_tensors="pt").input_ids.cuda(),
            max_new_tokens=4096, streamer=streamer)
4. Output

V. Monitoring GPU Usage
Command: watch -n 1 -d nvidia-smi
1. GPU state at startup
2. GPU state during generation
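GPU memory can also be reported from inside the Python process itself. A minimal sketch using torch's built-in counters (device index 0 is an assumption; use whichever device the model sits on):

# Report GPU memory usage from within the running process
import torch

if torch.cuda.is_available():
    dev = 0  # assumed: the model is on GPU 0
    allocated = torch.cuda.memory_allocated(dev) / 1024**3
    reserved = torch.cuda.memory_reserved(dev) / 1024**3
    total = torch.cuda.get_device_properties(dev).total_memory / 1024**3
    print(f"{torch.cuda.get_device_name(dev)}: "
          f"{allocated:.1f}/{reserved:.1f}/{total:.1f} GiB "
          f"(allocated/reserved/total)")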