Looking for Better Chinese Indexes: A Corpus-based Approach to Base NP Detection and Indexing

Hongbiao CHEN  
Abstract: Previous studies have shown that using phrases to represent a document's content can enhance the effectiveness of an automatic information retrieval (IR) system. However, among the few Chinese IR systems that have adopted a phrase indexing strategy, most do not have a real automatic phrase finder. They merely extract phrases by maximum matching against a pre-compiled dictionary. On the other hand, the structures of the phrases extracted by most current phrase extraction methods are too complicated for indexing. This study proposes the use of the Chinese base noun phrase (baseNP) as a complex indexing unit. A relatively effective and easy-to-implement baseNP extraction method and a baseNP indexing method have been designed and tested. A Chinese baseNP is defined as a combination of conceptual words. A corpus-based approach is adopted to acquire the probabilities of words, tags and tag sequences in constituting baseNPs. Four detection algorithms have been designed and tested. The results show that 90.21% of the word combinations that contain good baseNPs can be extracted with the help of the words' probability information alone. By adding template checking, the hybrid method achieves a precision of 60.43% and a recall of 58.93%. Two kinds of index databases have been generated: one with single words only (the single word indexing method) and the other with single words supplemented with baseNPs (the baseNP indexing method). Retrieval experiments show that the baseNP indexing method increases retrieval precision by an average of 23.10% compared with the single word indexing method. It is concluded that the baseNP is a kind of complex index capable of enhancing the performance of Chinese IR systems, and that the baseNP indexing method is more effective than the single word indexing method. The Chinese Experimental IR System (CEIRS 1.0) was developed and used as the environment for the retrieval experiments. The Vector Space Model (VSM) is adopted as the retrieval model.
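The contrast between the two indexing methods can be illustrated with a minimal Python sketch of vector space retrieval. It is not the CEIRS 1.0 implementation: the TF-IDF weighting, the cosine similarity measure, the sample documents and every function name here are assumptions made for illustration, since the abstract does not state how terms are weighted or how similarity is computed.

```python
import math
from collections import Counter


def build_vectors(docs):
    """Turn {doc_id: [index terms]} into TF-IDF vectors (assumed weighting)."""
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))
    # Smoothed IDF so that terms occurring in every document keep a weight.
    idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}
    vectors = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        vectors[doc_id] = {t: tf[t] * idf[t] for t in tf}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score(query_terms, vectors, idf):
    """Score every document against the query; unknown terms get weight 0."""
    tf = Counter(query_terms)
    query = {t: tf[t] * idf.get(t, 0.0) for t in tf}
    return {doc_id: cosine(query, vec) for doc_id, vec in vectors.items()}


# Hypothetical, already segmented documents. In d1 the words 信息 and 检索
# form the phrase 信息检索 ("information retrieval"); in d2 they do not.
single_word_docs = {
    "d1": ["信息", "检索", "系统"],
    "d2": ["检索", "工具", "信息"],
}
# Same documents, with the detected phrase added as an extra index term.
phrase_docs = {
    "d1": ["信息", "检索", "系统", "信息检索"],
    "d2": ["检索", "工具", "信息"],
}

single_vectors, single_idf = build_vectors(single_word_docs)
phrase_vectors, phrase_idf = build_vectors(phrase_docs)

single_scores = score(["信息", "检索"], single_vectors, single_idf)
phrase_scores = score(["信息", "检索", "信息检索"], phrase_vectors, phrase_idf)

print("single word index:", single_scores)
print("phrase-supplemented index:", phrase_scores)
```

With single words only, the two sample documents contain the same query words and cannot be told apart; once the phrase is added as an index term, the document that actually contains it scores higher. This is only a toy demonstration of the idea and says nothing about the size of the improvement reported in the study.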


China Doctoral Dissertations Full-text Database
1 Hongbiao CHEN;[D];Guangdong University of Foreign Studies;2001