Exploring Users' Natural Input Annotations and Their Role in Building Word Segmentation Corpora


Zhang Dakui (张大奎); Yin Dechun (尹德春); Tang Shiping (汤世平); Mao Yu (毛煜); Fan Xiaozhong (樊孝忠)

【Journal】Journal of Chinese Information Processing (《中文信息学报》) 【Year (Volume), Issue】2018 (032) 002

【Abstract】As Chinese word segmentation algorithms approach their optimization limit, the performance of a word segmenter depends increasingly on the coverage and completeness of the training corpus. How to build word segmentation corpora quickly, effectively, and automatically has therefore become a pressing issue. This paper explores the valuable natural word segmentation information that is produced when users type Chinese text. This information offers a new perspective on building Chinese segmentation training corpora, one that has received little attention in the literature. We show that user-produced word segmentation information can be used to build a segmentation corpus and that its performance is acceptable. Moreover, some texts carrying this information from excellent users are very close to the gold-standard segmentation. In this study, we use a classification model and a voting mechanism to identify three such excellent users and obtain texts with natural word segmentation information. Experimental results show that these texts can be used to build a segmentation training corpus, which greatly improves the accuracy of the segmenter.
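To make the idea of comparing user-produced segmentation with a gold standard concrete, here is a minimal sketch. It assumes that each chunk a user commits while typing is treated as one word, and it scores that segmentation against a reference with the usual span-based precision, recall, and F1. The abstract does not say which metric the authors use, so the metric, the function names (`spans`, `segmentation_prf`), and the example sentence are all illustrative assumptions rather than the paper's method.

```python
def spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def segmentation_prf(user_words, gold_words):
    """Span-based precision, recall, and F1 of a user segmentation against gold.

    Assumption: both segmentations cover exactly the same character string.
    """
    if "".join(user_words) != "".join(gold_words):
        raise ValueError("segmentations cover different text")
    user_spans, gold_spans = spans(user_words), spans(gold_words)
    correct = len(user_spans & gold_spans)
    precision = correct / len(user_spans) if user_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: chunks as a user might commit them while typing,
# compared with a hand-made reference segmentation of the same sentence.
user_commits = ["我们", "在", "北京大学", "学习"]
gold = ["我们", "在", "北京", "大学", "学习"]
precision, recall, f1 = segmentation_prf(user_commits, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

A user whose committed chunks consistently score close to 1.0 under a measure like this would correspond to what the abstract calls an "excellent user"; how the authors actually select such users (the classification model and voting mechanism) is not specified in the abstract and is not reproduced here.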
