SphinxSearch WARNING: word overrun buffer, clipped!!!

This warning shows up when a nightly cronjob runs. The cronjob updates (re-indexes) the search index of a phpBB-powered forum whose content is in Chinese.

The full output is a long list of warnings, all with the same problem:

WARNING: word overrun buffer, clipped!!!
clipped (len=126, word='请注意只有使用中国付款互联网付款服务支持的中国信用卡和借记卡才可以启动该话务中心的服')
original (len=126, word='请注意只有使用中国付款互联网付款服务支持的中国信用卡和借记卡才可以启动该话务中心的服')
WARNING: word overrun buffer, clipped!!!
clipped (len=126, word='我想不管你后面是因为衣服的问题还是喜欢不喜欢的问题应该不能退了如果当时店里的人已经告')
original (len=126, word='我想不管你后面是因为衣服的问题还是喜欢不喜欢的问题应该不能退了如果当时店里的人已经告')
WARNING: word overrun buffer, clipped!!!
clipped (len=126, word='新的网上申请表含有条形码而这些条形码将可以允许我们电子传输信息从而减少了签证申请过程')
original (len=126, word='新的网上申请表含有条形码而这些条形码将可以允许我们电子传输信息从而减少了签证申请过程')
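The len=126 in every message looks like a byte count rather than a character count. Assuming the forum text is UTF-8 (where each of these Chinese characters takes 3 bytes), 126 bytes works out to 42 characters, which suggests the indexer cuts any "word" at that length. A quick check in Python, only as an illustration and not part of Sphinx:

```python
# Word copied from the first warning in the log above.
word = "请注意只有使用中国付款互联网付款服务支持的中国信用卡和借记卡才可以启动该话务中心的服"

char_count = len(word)                  # number of Unicode characters
byte_count = len(word.encode("utf-8"))  # number of bytes once encoded as UTF-8

print(char_count, byte_count)  # prints: 42 126
```

So each clipped "word" is really a whole run of Chinese text that the indexer treated as a single word.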

How to solve the problem with the overrun buffer warning?

Edit the conf file at /etc/sphinxsearch/sphinx.conf

Add the following to the index section:

ngram_len             = 1
ngram_chars           = U+3000..U+2FA1F

Save it and run the cronjob. No more warnings.

Based on the explanation in sphinx.conf.dist:

# n-gram length to index, for CJK indexing
# only supports 0 and 1 for now, other lengths to be implemented
# optional, default is 0 (disable n-grams)
#
# ngram_len             = 1

# n-gram characters list, for CJK indexing
# optional, default is empty
#
# ngram_chars           = U+3000..U+2FA1F
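To get a feel for what ngram_len = 1 changes, here is a rough Python sketch of the idea. It is not Sphinx's actual tokenizer; the function names are made up for this example, and only the character range is taken from the ngram_chars line above.

```python
# Illustration only: mimics the idea of ngram_len = 1, not Sphinx's real tokenizer.
NGRAM_FIRST = 0x3000   # start of the ngram_chars range from the config
NGRAM_LAST = 0x2FA1F   # end of the ngram_chars range from the config


def is_ngram_char(ch):
    """Return True if the character falls inside the ngram_chars range."""
    return NGRAM_FIRST <= ord(ch) <= NGRAM_LAST


def tokenize(text):
    """Split text into tokens; every n-gram character becomes its own token."""
    tokens = []
    current = []  # characters of the ordinary word being built
    for ch in text:
        if is_ngram_char(ch):
            # Close any ordinary word in progress, then emit the single character.
            if current:
                tokens.append("".join(current))
                current = []
            tokens.append(ch)
        elif ch.isalnum():
            current.append(ch)
        else:
            # Whitespace or punctuation ends the ordinary word.
            if current:
                tokens.append("".join(current))
                current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("phpBB 中国信用卡"))
# prints: ['phpBB', '中', '国', '信', '用', '卡']
```

With every Chinese character indexed as its own token, no token ever gets long enough to be clipped.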

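This kind of warning shows up with CJK languages (Chinese, Japanese, Korean), the so-called East Asian languages. They are written without spaces between words, so without the n-gram settings the indexer sees a whole run of text as a single word and clips it once it gets too long. With ngram_len = 1, each character listed in ngram_chars is indexed as its own token.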

 

This is a how-to for fixing the warning on a phpBB forum that uses SphinxSearch as its search backend for Chinese content.