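Xapian's support for CJK search is weak: it can only match words of two characters, so a search for “中国人” fails. I wrote a small program that automatically splits longer words into two-character pieces and searches for those instead.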
;;
;; Xapian, the search engine behind mu, has poor support for CJK characters:
;; only queries containing no more than 2 CJK characters work.
;;
;; https://researchmap.jp/?page_id=457
;;
;; This workaround breaks any CJK word longer than 2 characters into
;; a combination of bi-grams. Example: 我爱你 -> (我爱 爱你)
;;
(defun mu4e-goodies~break-cjk-word (word)
  "Break CJK WORD into space-separated bi-grams, e.g. 我爱你 -> 我爱 爱你."
  (if (or (<= (length word) 2)
          (equal (length word) (string-bytes word))) ; only ascii chars
      word
    (let ((pos nil)
          (char-list nil)
          (br-word nil))
      (if (setq pos (string-match ":" word)) ; field prefix, like: "s:abc"
          (concat (substring word 0 (+ 1 pos))
                  (mu4e-goodies~break-cjk-word (substring word (+ 1 pos))))
        (if (memq 'ascii (find-charset-string word)) ; ascii mixed with others, like: abcあいう
            word
          (progn
            (setq char-list (split-string word "" t))
            (while (cdr char-list)
              (setq br-word (concat br-word (concat (car char-list) (cadr char-list)) " "))
              (setq char-list (cdr char-list)))
            br-word))))))
(defun mu4e-goodies~break-cjk-query (expr)
  "Break the CJK words in the query EXPR into bi-grams."
  (let ((word-list (split-string expr " " t))
        (new ""))
    (dolist (word word-list new)
      (setq new (concat new (mu4e-goodies~break-cjk-word word) " ")))))
(setq mu4e-query-rewrite-function 'mu4e-goodies~break-cjk-query)
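To make the splitting step easier to follow in isolation, here is a minimal standalone sketch of the bi-gram idea. The helper name `my/cjk-bigrams` is hypothetical and is not part of mu4e or of the code above. It returns the bi-grams as a list rather than a space-joined string, and it ignores field prefixes such as `s:` and mixed ASCII input, which the real function handles.

```elisp
;; Hypothetical helper for illustration only; not part of mu4e or mu4e-goodies.
(defun my/cjk-bigrams (str)
  "Return the list of overlapping two-character substrings of STR."
  (let ((grams nil)
        (i 0))
    ;; Slide a two-character window over STR, one character at a time.
    (while (< (1+ i) (length str))
      (push (substring str i (+ i 2)) grams)
      (setq i (1+ i)))
    ;; `push' builds the list backwards, so restore the original order.
    (nreverse grams)))

;; A three-character word yields two overlapping bi-grams.
(my/cjk-bigrams "中国人")

;; Joined with spaces, the bi-grams become separate query terms.
(mapconcat #'identity (my/cjk-bigrams "中国人") " ")
```

Evaluating the last two forms gives `("中国" "国人")` and `"中国 国人"` respectively. Returning a list and joining it afterwards avoids a trailing space, at the cost of one extra step.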
This code is part of my GitHub project, which collects a number of mu4e extensions I wrote myself.