Using jieba to give Emacs Chinese word segmentation for expanding the selection: is there any hope for elisp code written like this?

First, an animated GIF showing the effect: gif1027

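The first step is to segment the line at point. Here this is done by calling a Flask API, but it could be changed to fetch the result directly from a command-line call.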

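(require 'request)
(require 'json)

(defun jieba-cut ()
  "Segment the current line with jieba and return the words as a list of strings."
  (interactive)
  (let ((text (thing-at-point 'line t))
        (r nil))                        ; local, so no global `r' is clobbered
    (request
      "http://127.0.0.1:5000/jieba_cut"
      :type "POST"
      :data (encode-coding-string (json-encode `(("text" . ,text))) 'utf-8)
      :headers '(("Content-Type" . "application/json"))
      :parser 'json-read
      :encoding 'utf-8
      :sync t                           ; block until the response arrives
      :success (cl-function
                (lambda (&key data &allow-other-keys)
                  (setq r (assoc-default 'result data)))))
    ;; `json-read' returns a vector; convert it to a list.
    (append r nil)))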

The Flask segmentation API:

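import jieba
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/jieba_cut', methods=['POST'])
def jieba_cut():
    data = request.get_json(force=True)
    text = data.get('text', '')
    # jieba.cut returns a generator; turn it into a list so it can be serialized.
    result = list(jieba.cut(text))
    return jsonify({'result': result})

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5000)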
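
The post mentions that the HTTP round trip could be replaced by a direct command-line call. A minimal sketch of what such a script might look like follows; the names `segment_to_json` and `main` are made up for illustration, and the script assumes the line is passed on stdin and the words are printed as a JSON array.

```python
import json
import sys


def segment_to_json(text, cut):
    """Segment `text` with the callable `cut` and return a JSON array string."""
    # ensure_ascii=False keeps Chinese words readable instead of \uXXXX escapes.
    return json.dumps(list(cut(text)), ensure_ascii=False)


def main():
    # Imported here so the helper above can be used without jieba installed.
    import jieba

    text = sys.stdin.read()
    sys.stdout.write(segment_to_json(text, jieba.cut))
```

Saved as a hypothetical jieba_cut.py with a call to main() at the bottom, this could be fed the current line from Emacs with something like call-process-region and the output parsed with json-read-from-string. Each invocation would probably pay jieba's dictionary loading cost again, which is one reason a long-running server may still be the faster option.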

Below is the code that expands the selection in Emacs by Chinese word segmentation. It is bound to C--, to mirror expand-region's C-= key, and it uses recursion.

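(defun jieba-expand2 (list point acc)
  "Find the word in LIST (a list of strings) that covers line offset POINT.
ACC is the start offset of the first word minus one; pass -1 initially.
Return (ACC LEN) for the matching word."
  (if (<= point (+ acc (length (car list))))
      (list acc (length (car list)))
    (jieba-expand2 (cdr list) point (+ acc (length (car list))))))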

(defun jieba-expand-at-point2 ()
  "Select the jieba word that contains the character after point."
  (interactive)
  (let* ((p1 (point))
         (je2 (progn
                (beginning-of-line)
                ;; Offset of point within the line, counted from 0.
                (jieba-expand2 (jieba-cut)
                               (- p1 (point)) -1)))
         (beg (car je2))
         (len1 (cadr je2)))
    (forward-char (+ beg 1))
    (set-mark (point))
    (forward-char len1)))


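(defun jieba-expand-at-point3 (beg end)
  "Select the jieba word at point, or extend the active region by one word.
BEG and END are the region bounds, or nil when no region is in use."
  ;; Unlike (interactive \"r\"), this does not signal an error when no mark is set.
  (interactive (if (use-region-p)
                   (list (region-beginning) (region-end))
                 (list nil nil)))
  (if (and beg end)
      ;; Extend: look up the word that starts right after the region's end.
      (let* ((bol (progn (goto-char end) (line-beginning-position)))
             (je2 (jieba-expand2 (jieba-cut) (- end bol) -1))
             ;; (car je2) is the word's start offset minus one.
             (new-end (+ bol (car je2) (cadr je2) 1)))
        (set-mark beg)
        (goto-char new-end))
    ;; No region: select the word containing the character after point.
    (let* ((p1 (point))
           (je2 (progn
                  (beginning-of-line)
                  (jieba-expand2 (jieba-cut)
                                 (- p1 (point)) -1)))
           (beg (car je2))
           (len1 (cadr je2)))
      (forward-char (+ beg 1))
      (set-mark (point))
      (forward-char len1))))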

(global-set-key (kbd "C--") 'jieba-expand-at-point3)
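
The offset arithmetic behind "extend the region by one word" is easy to get wrong by one, so it may help to model it outside Emacs. The following is only a sketch in Python, the language of the server side; it assumes 0-based character offsets within the line and a half-open region [beg, end), and the names `word_ends` and `extend_region` are illustrative, not part of jieba or of the code above.

```python
from bisect import bisect_right
from itertools import accumulate


def word_ends(tokens):
    """Exclusive end offset of every token, e.g. ["as", "dsa"] -> [2, 5]."""
    return list(accumulate(len(t) for t in tokens))


def extend_region(tokens, beg, end):
    """Grow the region [beg, end) up to the end of the word covering `end`.

    Returns the new (beg, end); the region is returned unchanged when `end`
    is already at or past the end of the last token.
    """
    ends = word_ends(tokens)
    # Index of the first token whose exclusive end lies beyond `end`.
    i = bisect_right(ends, end)
    if i == len(ends):
        return beg, end
    return beg, ends[i]
```

With the tokens "as", "dsa", "a", "bbb", a region covering "dsa" is (2, 5); one call gives (2, 6) and a second gives (2, 9), so a one-character word is treated like any other. If I remember the API correctly, jieba.tokenize yields (word, start, end) tuples, so the server could also return such offsets directly instead of bare words.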

No, it's beyond saving.

(defun jieba-expand2 (list point acc)
  (if (<= point (+ acc (length (car list))))
      (list acc (length (car list)))
    (jieba-expand2 (cdr list) point (+ acc (length (car list))))))

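One glance is enough to see that it is not only slow, but will also blow the stack if the list argument is nil (or if the total length of the strings in list is less than point). A real Lisper would write it like this: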

(defun do-every-thing-with-out-recursion (strlst point)
  (let ((acc -1)
        len)
    (while strlst
      (setq len (length (car strlst))
            strlst (cdr strlst))
      (if (<= point (+ acc len))
          (setq strlst nil)
        (setq acc (+ acc len))))
    (list acc len)))

;; (do-every-thing-with-out-recursion '("as" "dsa" "aaa" "adsdsad" "asddsasd") 3)
;; => (1 3)
;; same as
;; (jieba-expand2 '("as" "dsa" "aaa" "adsdsad" "asddsasd") 3 -1)
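
For comparison, here is a rough Python sketch of the same bounded lookup, written so that an empty list or an out-of-range offset yields None instead of a runaway loop. The name `word_span_at` is hypothetical, and the return convention differs from the elisp above: it gives a half-open (start, end) pair rather than (start minus one, length).

```python
def word_span_at(tokens, col):
    """Return (start, end) of the token covering 0-based offset `col`, or None.

    The span is half-open: the token occupies offsets start .. end - 1.
    """
    start = 0
    for token in tokens:
        end = start + len(token)
        if col < end:
            return start, end
        start = end
    # Empty token list, or `col` lies past the last token: nothing to select.
    return None
```

For the sample list in the comment above and offset 3, this returns (2, 5), which is the same word "dsa" that the elisp version reports as (1 3).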

On top of that, the names give no hint of what the functions do, so a reader loses even the motivation to try improving the code.


Someone in the community has already developed a high-performance plugin that does much the same thing.
