Chinese Natural Language Processing: Sentiment Classification of Product Reviews

Uploaded by: 小*** | IP location: Tianjin | Upload date: 2022-08-13 | Format: DOC | Pages: 3 | Size: 39 KB


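1. Dataset download

from sklearn.model_selection import train_test_split
from gensim.models.word2vec import Word2Vec
import numpy as np
import pandas as pd
import jieba
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package
from sklearn.svm import SVC

2. Load the data, preprocess it (word segmentation), and split it into training and test sets

# Load the data, preprocess it (word segmentation), and split it into training and test sets
def load_file_and_preprocessing():
    neg = pd.read_excel('chinese_data/neg.xls', header=None)
    pos = pd.read_excel('chinese_data/pos.xls', header=None)
    cw = lambda x: list(jieba.cut(x))  # segment one review into a list of words
    poswords = pos[0].apply(cw)
    negwords = neg[0].apply(cw)
    # use 1 for positive sentiment, 0 for negative
    y = np.concatenate((np.ones(len(pos)), np.zeros(len(neg))))
    # training set : test set = 8 : 2
    x_train, x_test, y_train, y_test = train_test_split(
        np.concatenate((poswords, negwords)), y, test_size=0.2)
    # NumPy provides several file functions for storing and loading arrays
    # (the .npy format stores the data in binary)
    np.save('pre_data/y_train.npy', y_train)
    np.save('pre_data/y_test.npy', y_test)
    return x_train, x_test

3. Compute a vector for every review in the training and test sets and save the vectors to files

# Average all word vectors of a sentence to produce the sentence vector
def build_sentence_vector(text, size, w2v_model):
    vec = np.zeros(size).reshape((1, size))
    count = 0
    for word in text:
        try:
            vec += w2v_model.wv[word].reshape((1, size))
            count += 1
        except KeyError:  # the word has no vector (not in the vocabulary): skip it
            continue
    if count != 0:
        vec /= count
    return vec

# Compute the word vectors
def get_train_vecs(x_train, x_test):
    n_dim = 300  # word-vector dimensionality
    # Build the word-vector model with Word2Vec: CBOW (sg=0) with negative sampling (hs=0, negative=5).
    # gensim 4 renamed size= to vector_size= and model.iter to model.epochs.
    w2v_model = Word2Vec(vector_size=n_dim, window=5, sg=0, hs=0, negative=5, min_count=10)
    w2v_model.build_vocab(x_train)  # build the model vocabulary
    # Train the word vectors on the training reviews
    w2v_model.train(x_train, total_examples=w2v_model.corpus_count, epochs=w2v_model.epochs)
    # Sentence vectors for the training reviews
    train_vecs = np.concatenate([build_sentence_vector(z, n_dim, w2v_model) for z in x_train])
    np.save('pre_data/train_vecs.npy', train_vecs)  # save the training-set vectors to a file
    print(train_vecs.shape)  # print the shape of the training set
    # Continue training on the test reviews
    w2v_model.train(x_test, total_examples=len(x_test), epochs=w2v_model.epochs)
    w2v_model.save('pre_data/w2v_model/w2v_model.pkl')
    test_vecs = np.concatenate([build_sentence_vector(z, n_dim, w2v_model) for z in x_test])
    np.save('pre_data/test_vecs.npy', test_vecs)
    print(test_vecs.shape)

4. Load the training-set vectors and labels and the test-set vectors and labels

# Load the training-set vectors and labels and the test-set vectors and labels
def get_data():
    train_vecs = np.load('pre_data/train_vecs.npy')
    y_train = np.load('pre_data/y_train.npy')
    test_vecs = np.load('pre_data/test_vecs.npy')
    y_test = np.load('pre_data/y_test.npy')
    return train_vecs, y_train, test_vecs, y_test

5. Train the SVM model

# Train the SVM model
def svm_train(train_vecs, y_train, test_vecs, y_test):
    clf = SVC(kernel='rbf', verbose=True)
    clf.fit(train_vecs, y_train)  # fit the SVM to the given training data
    joblib.dump(clf, 'pre_data/svm_model/model.pkl')  # save the trained SVM model
    print(clf.score(test_vecs, y_test))  # print the mean accuracy on the test data

6. Build the vector for a sentence to be predicted

# Build the vector for a sentence to be predicted
def get_predict_vecs(words):
    n_dim = 300
    w2v_model = Word2Vec.load('pre_data/w2v_model/w2v_model.pkl')
    train_vecs = build_sentence_vector(words, n_dim, w2v_model)
    return train_vecs

7. Classify the sentiment of a single sentence

# Classify the sentiment of a single sentence
def svm_predict(string):
    words = jieba.lcut(string)
    words_vecs = get_predict_vecs(words)
    clf = joblib.load('pre_data/svm_model/model.pkl')
    result = clf.predict(words_vecs)
    if int(result[0]) == 1:
        print(string, 'positive')
    else:
        print(string, 'negative')

if __name__ == '__main__':
    # x_train, x_test = load_file_and_preprocessing()
    # get_train_vecs(x_train, x_test)
    # train_vecs, y_train, test_vecs, y_test = get_data()
    # svm_train(train_vecs, y_train, test_v…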
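Section 2 labels positive reviews 1 and negative reviews 0 and then holds out 20% of them for testing. The sketch below shows that step with the standard library only, assuming the reviews are already segmented into word lists. The function name `label_and_split`, the `seed` parameter, and rounding the test count up are assumptions for illustration, not the author's code.

```python
import math
import random


def label_and_split(pos_docs, neg_docs, test_size=0.2, seed=0):
    """Label positive docs 1 and negative docs 0, shuffle, and split train/test."""
    docs = list(pos_docs) + list(neg_docs)
    labels = [1] * len(pos_docs) + [0] * len(neg_docs)
    pairs = list(zip(docs, labels))
    random.Random(seed).shuffle(pairs)  # seeded shuffle so the split is reproducible
    n_test = math.ceil(len(pairs) * test_size)  # assumption: round the test count up
    test, train = pairs[:n_test], pairs[n_test:]
    x_train = [doc for doc, _ in train]
    y_train = [label for _, label in train]
    x_test = [doc for doc, _ in test]
    y_test = [label for _, label in test]
    return x_train, x_test, y_train, y_test


# Example: 8 positive and 2 negative segmented reviews give an 8:2 split.
pos = [["质量", "很", "好"]] * 8
neg = [["太", "差", "了"]] * 2
x_train, x_test, y_train, y_test = label_and_split(pos, neg)
print(len(x_train), len(x_test))  # 8 2
```

Fixing the seed makes the split repeatable. The original call to `train_test_split` passes no `random_state`, so its split can differ from run to run.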
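With `min_count=10`, Word2Vec ignores every word that occurs fewer than 10 times in the corpus passed to `build_vocab`, which here is the training set only. Such words get no vector, which is why `build_sentence_vector` has to catch `KeyError`. A simplified sketch of that frequency filter alone follows; the helper name `filter_vocab` is hypothetical, and gensim's real vocabulary building does more than this.

```python
from collections import Counter


def filter_vocab(sentences, min_count):
    """Return the set of words that occur at least min_count times in total."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}


train_sentences = [["手机", "很", "好"], ["手机", "太", "差"], ["手机", "好"]]
vocab = filter_vocab(train_sentences, min_count=2)
print(sorted(vocab))  # ['好', '手机']
# Words outside the vocabulary have no vector and are skipped when averaging.
print([w for w in ["手机", "很", "贵"] if w in vocab])  # ['手机']
```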
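`build_sentence_vector` represents a review as the mean of the vectors of its known words and skips the rest. The same averaging in plain Python is sketched below, assuming a hypothetical dict `embeddings` (word to list of floats) in place of the trained Word2Vec model; `average_word_vectors` is likewise an illustrative name.

```python
def average_word_vectors(words, size, embeddings):
    """Mean of the vectors of in-vocabulary words; zeros if none are known."""
    vec = [0.0] * size
    count = 0
    for word in words:
        try:
            word_vec = embeddings[word]
        except KeyError:  # out-of-vocabulary word: skip it
            continue
        vec = [acc + v for acc, v in zip(vec, word_vec)]
        count += 1
    if count != 0:
        vec = [acc / count for acc in vec]
    return vec


# Hypothetical 2-dimensional embeddings; "很" is deliberately missing.
embeddings = {"手机": [1.0, 2.0], "好": [3.0, 0.0]}
print(average_word_vectors(["手机", "很", "好"], 2, embeddings))  # [2.0, 1.0]
print(average_word_vectors(["很"], 2, embeddings))  # [0.0, 0.0]
```

A review in which no word has a vector becomes the zero vector, and the SVM still assigns it to one of the two classes.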
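`SVC(kernel='rbf')` compares two sentence vectors x and z with the Gaussian kernel exp(-gamma * ||x - z||^2). Since scikit-learn 0.22 the default `gamma='scale'` sets gamma to 1 / (n_features * X.var()). Both formulas are sketched below in plain Python; `rbf_kernel` and `scale_gamma` are hypothetical names, and this only illustrates the kernel, not how SVC trains.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def scale_gamma(X):
    """gamma='scale' heuristic: 1 / (n_features * population variance of all X)."""
    flat = [v for row in X for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    return 1.0 / (len(X[0]) * var)


X = [[0.0, 2.0], [2.0, 0.0]]
gamma = scale_gamma(X)
print(gamma)  # 0.5
print(rbf_kernel(X[0], X[0], gamma))  # 1.0 for identical vectors
print(rbf_kernel(X[0], X[1], gamma))  # exp(-0.5 * 8), i.e. exp(-4)
```

The kernel value shrinks toward 0 as two review vectors move apart, so gamma controls how local the decision boundary is.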
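For classifiers, `clf.score` in `svm_train` reports mean accuracy, the fraction of test reviews whose predicted label equals the true one, and `svm_predict` prints positive for a predicted 1 and negative otherwise. Because the labels come from `np.ones`/`np.zeros`, they are floats, hence the `int(...)` conversion. A sketch of both steps with hypothetical names `mean_accuracy` and `label_to_sentiment`:

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def label_to_sentiment(label):
    """Map a predicted 1/0 (possibly the float 1.0/0.0) to the printed text."""
    return "positive" if int(label) == 1 else "negative"


print(mean_accuracy([1.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 0.0]))  # 0.5
print(label_to_sentiment(1.0), label_to_sentiment(0.0))  # positive negative
```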
