Chinese Word Splitter (中文分词)
This module implements the _search_preprocess interface. It splits Chinese text into words separated by spaces, so that the search module adds correct Chinese words to its index table. You need to re-index your site after enabling this module. The module works with a user-defined dictionary, so in principle it can also split text in other languages.
The dictionary has been rebuilt with a B-tree index. With the indexed dictionary, the module runs about 10 times faster than before and uses very little memory.
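To illustrate why an indexed dictionary file keeps memory use low, here is a minimal Python sketch. The module's actual on-disk B-tree format is not described here, so this uses a simpler stand-in technique: a sorted file of fixed-width records searched by binary search with seek. The record width, the NUL padding, and the function names build_index and contains are all assumptions made for illustration.

```python
import io

RECORD_SIZE = 16  # assumed fixed record width in bytes (hypothetical format)


def build_index(words):
    """Return sorted, NUL-padded, fixed-width UTF-8 records as one bytes object."""
    records = sorted(w.encode("utf-8") for w in set(words))
    for r in records:
        if len(r) > RECORD_SIZE:
            raise ValueError("word too long for the assumed record size")
    return b"".join(r.ljust(RECORD_SIZE, b"\0") for r in records)


def contains(f, word):
    """Binary-search the open binary file f, reading one record per step."""
    key = word.encode("utf-8")
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell() // RECORD_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD_SIZE)
        rec = f.read(RECORD_SIZE).rstrip(b"\0")
        if rec == key:
            return True
        if rec < key:
            lo = mid + 1
        else:
            hi = mid
    return False


# In-memory file used here only to keep the example self-contained.
index_file = io.BytesIO(build_index(["中文", "分词", "搜索", "中文分词"]))
found = contains(index_file, "分词")
```

Each lookup reads only a logarithmic number of records, so the whole dictionary never has to be held in memory. A real B-tree reduces the number of disk reads further by storing many keys per node.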
The module currently provides two matching algorithms: forward maximum matching and reverse maximum matching.
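Forward and reverse maximum matching are two common dictionary-based matching strategies. The Python sketch below shows the general technique, not the module's own code. The function names, the max_len parameter, and the tiny sample dictionary are assumptions for illustration.

```python
def forward_max_match(text, words, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            # A single character is always accepted as a fallback.
            if n == 1 or piece in words:
                out.append(piece)
                i += n
                break
    return out


def reverse_max_match(text, words, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    out, j = [], len(text)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            piece = text[j - n:j]
            if n == 1 or piece in words:
                out.append(piece)
                j -= n
                break
    out.reverse()
    return out


sample_words = {"研究", "研究生", "生命", "起源"}  # hypothetical dictionary
forward = forward_max_match("研究生命起源", sample_words)
reverse = reverse_max_match("研究生命起源", sample_words)
spaced = " ".join(reverse)  # space-separated text, as passed on to the search index
```

On this sample the two directions disagree: the forward scan yields 研究生 / 命 / 起源, while the reverse scan yields 研究 / 生命 / 起源. That difference is one reason to offer both.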
When using Drupal 4.7 or later, disable the "simple Chinese/Japanese/Korean tokenizer" option in the search.module settings.
BTW: If you have a Japanese or Korean dictionary, please contact me at i.zealy ~at~ gmail.com. It should be possible to make this module process Japanese or Korean words.
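Latest update:
Drupal 6 has been out for quite a while, and for a long time I did not have the energy to do anything for the open source community. Taking the Drupal 6.1 upgrade as an opportunity, I have made the largest set of improvements ever to the Chinese word splitter module, following the original plan for it. I hope the changes are worth getting excited about: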
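Pre-indexed dictionary files are finally implemented. They are organized with a B-tree algorithm and allow fast file-based lookups. The benefit is that the dictionary file no longer has to be loaded into memory. A B-tree dictionary uses almost no memory, so very large dictionaries become practical and PHP memory limit errors can be avoided.
A very large Chinese dictionary for B-tree lookup is provided, covering both Simplified and Traditional Chinese. I generated it specifically for this module, and accuracy is much improved.
The algorithm has been optimized. Matching now takes at least one third fewer loop iterations than before.
Two new matching algorithms are provided: forward minimum matching and reverse minimum matching. Compared with maximum matching, they can cut the number of matching loop iterations by more than half, and the results remain within an acceptable range.
A word-length option for matching lookups is provided. It has some effect on performance, so please test which value works best for you. The longest words in the bundled word list currently have four characters, so only the length options 2, 3, and 4 are meaningful. To accommodate poetry, a word list with entries of up to 7 characters may be provided in the future.
Word-splitting errors in the original code have been fixed. Strings that mix Chinese, English, and digits are now handled far more accurately.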
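The minimum matching algorithms mentioned in the list above are not defined in detail here. One plausible reading, sketched in Python under that assumption, is that the scanner accepts the shortest dictionary word of at least two characters at each position and falls back to a single character when nothing matches. The name forward_min_match and the max_len parameter are hypothetical.

```python
def forward_min_match(text, words, max_len=4):
    """Scan left to right, taking the shortest dictionary word (2+ chars) at each position."""
    out, i = [], 0
    while i < len(text):
        step = 1  # fall back to a single character if no word matches
        for n in range(2, min(max_len, len(text) - i) + 1):
            if text[i:i + n] in words:
                step = n
                break  # stop at the first (shortest) hit
        out.append(text[i:i + step])
        i += step
    return out


sample_words = {"研究", "研究生", "生命", "起源"}  # hypothetical dictionary
tokens = forward_min_match("研究生命起源", sample_words)
```

Because the inner loop stops at the first hit, it tends to need fewer dictionary lookups than a longest-first scan when short words are common. The trade-off is that longer compound words are never returned whole.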
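Taken together, these improvements make the module at least ten times faster than before. Memory consumption has dropped from very large to very small, and CPU usage is also low. These figures come from my own VPS, which runs lighttpd, so feedback on how it behaves on your setup is welcome. When using the module, turn off "simple CJK handling" in the search settings.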
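This module supports the _search_preprocess interface and splits Chinese text into words, so that the search module gets correct Chinese results both when indexing and when searching. This avoids the huge number of search entries produced by simple CJK handling. After installing the module, you need to rebuild the Search index. An index word length of 1 or 2 is recommended.
The module uses a user-defined dictionary, so with a suitable dictionary it can support other languages.
Forward maximum matching and reverse maximum matching are the two algorithms currently provided.
When using the module under 4.7, turn off the "simple CJK (Chinese, Japanese, Korean character) handling" option under Administer > Settings > Search.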
Note: the dictionary file is in UTF-8 format with a BOM. On some systems you may need to remove the BOM before the module can read the dictionary and match words correctly. Otherwise word splitting may fail.
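As a rough illustration of the BOM issue, the Python sketch below strips a leading UTF-8 BOM before parsing a dictionary. The one-word-per-line layout and the names strip_utf8_bom and load_words are assumptions, since the dictionary file format is not specified here.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def load_words(data: bytes):
    """Parse an assumed one-word-per-line dictionary, ignoring blank lines."""
    text = strip_utf8_bom(data).decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


raw = codecs.BOM_UTF8 + "中文\n分词\n".encode("utf-8")
words = load_words(raw)
```

Without the stripping step, the first entry would carry an invisible U+FEFF prefix and would never match the text being split.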
The module now supports Drupal 4.7, 5.x, and 6.x.