新人报道,写了个中文词性高亮的小功能,请各位前辈批评

elisp 有hashmap的

我也入坑一年,和你比 差太多了,后浪推前浪, :+1:

不至于,我这才做了多么微不足道的一些事啊,大家才是真正有能力、有贡献的!一起加油!

这些天真的感谢大家!无论是各位前辈的热情帮助还是伟大的emacs本身,都实在是太让我感动了!此情非长歌无以言表,特作拙句以遗诸君:

辛丑咏emacs

铸炼琢磨五九年,春秋一去尔一坚。

力出盘古开寰宇,朗若云神御九天。

四海芳邻常伴侧,玲珑情虑每增添。

料得此心君身系,无奈今生爱恨间。

注: 五九年:自emacs1976年发布以来,已有五九四十五年;

云神:云中君,见《九歌》;

只是聊表心绪,字句不甚工整精炼,尾联一字还出了律,让大家见笑了

12 个赞

M-x poem

2 个赞

虽然不懂诗歌,但是这个帖子非常有趣 :+1:

太牛了,这就是后浪吗

哎,我等前浪和有文化的后浪比,这就是差距啊。

是啊,佩服佩服。。。。

可别这么说,没啥文不文化的,我一没做出什么贡献,二没什么有意义的创造,各位前辈可比我强太多啦 :smile:

我一直相信,无论什么事,只要认真干一干,付出那么些时间,都是必然可以获得相当的成就的,无论编程还是写作,任何事都是如此!大家大多都比我年龄大许多,智慧也比我高许多,只不过没有在我们看得见的地方罢了,这本也就无可厚非啦。

诗词什么的,如果有同学喜欢,可以手机里下个古诗文网,平常没事就跟刷公众号似的刷两篇;再多关注关注生活,像路边的花鸟草木啊身边的人啊什么的(我就很喜欢学校花园里碧桃树的紫叶和核桃树叶的芳香,还有老是在扑麻雀的花喜鹊,哦还特别喜欢春天的风,它真的是有香气的),清理清理心情,没过两周就有文思蹦出来啦!

如果想写点带格律的,可以买来《诗词格律》读一读,薄薄一本,也就30出头,没几小时就看完了;对填词感兴趣的可以额外看一本《填词丛谈》,不费什么功夫,倒也能为生活添一抹色彩!(总之不会比圆锥曲线难就是了 :joy:

6 个赞

做时间的朋友,心态好,前途无量!

1 个赞

哈哈哈哈我来交作业啦!感谢 @cireu @LdBeth 两位前辈的批评和指导!

代码基本被我整个重写了一遍,足足有421行,而且竟然跑通了,不可思议,感觉自己创造了奇迹 :joy:

在这个过程中我学到了巨多的东西,探索了很多新的技巧,也de了一堆bug什么的,让我想起了小学刚玩mc时把方块挨个放到地上鼓捣的经历,很有趣~

这里是代码,为了便于阅读,以及为了让我这个逻辑苦手写得下去,我尽可能多地加了注释 :joy:

(defgroup cnhl-colors nil
  "中文词性颜色"
  :group 'cnhl-color)

;; 类比色
;; 红色系
(defface cnhl-n-r-j
  '((t (:foreground "#F0B0A5")))
  "名词、代词、简称颜色"
  :group 'cnhl-colors)
(defface cnhl-np
  '((t (:foreground "#F0A299")))
  "人名颜色"
  :group 'cnhl-colors)
;; 绿色系
(defface cnhl-v-i-id
  '((t (:foreground "#C8E3A1")))
  "动词、习语颜色"
  :group 'cnhl-colors)
;; 橙黄色系
(defface cnhl-a
  '((t (:foreground "#EDC7A8")))
  "形容词颜色"
  :group 'cnhl-colors)
(defface cnhl-f-s-t-m-q-mq
  '((t (:foreground "#EDBD9D")))
  "方位词、处所词、时间词、数词、量词、数量词颜色"
  :group 'cnhl-colors)
;; 浅黄色系
(defface cnhl-d
  '((t (:foreground "#EDE7A8")))
  "副词颜色"
  :group 'cnhl-colors)
(defface cnhl-c
  '((t (:foreground "#EDE39D")))
  "连词颜色"
  :group 'cnhl-colors)
(defface cnhl-p
  '((t (:foreground "#EDE891")))
  "介词颜色"
  :group 'cnhl-colors)
;; 中黄色系
(defface cnhl-u-y-h-k
  '((t (:foreground "#D7C8A4")))
  "助词、语气助词、前接成分、后接成分颜色"
  :group 'cnhl-colors)
(defface cnhl-e-o
  '((t (:foreground "#D6C299")))
  "叹词、拟声词颜色"
  :group 'cnhl-colors)
;; 其它
(defface cnhl-g-w-x
  '((t (:foreground "gray80")))
  "语素、标点及其它颜色"
  :group 'cnhl-colors)

;; 浓度色
(defface cnhl-shades-n-r-j
  '((t (:foreground "#FFD6CC")))
  "名词、代词、简称颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-v-i-id
  '((t (:foreground "#FFECC2")))
  "动词、习语颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-a-f-s-t-m-q-mq
  '((t (:foreground "#CCBC9B")))
  "形容词、方位词、处所词、时间词、数词、量词、数量词颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-d-c-p
  '((t (:foreground "#C0A199")))
  "副词、连词、介词颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-u-y-h-k-e-o
  '((t (:foreground "#998D74")))
  "助词、语气助词、前接成分、后接成分、叹词、拟声词颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-g-w-x
  '((t (:foreground "gray65")))
  "语素、标点及其它颜色"
  :group 'cnhl-colors)

;; 简化浓度色
(defface cnhl-shades-simple-n-r-j-v-i-d
  '((t (:foreground "#FFECC2")))
  "名词、代词、简称、动词、习语颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-simple-a-f-s-t-m-q-mq-d-c-p
  '((t (:foreground "#DFD1B1")))
  "形容词、方位词、处所词、时间词、数词、量词、数量词、副词、连词、介词颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-simple-u-y-h-k-e-o
  '((t (:foreground "#A99D84")))
  "助词、语气助词、前接成分、后接成分、叹词、拟声词颜色"
  :group 'cnhl-colors)

(defvar *cnhl-color-scheme* nil
  "cnhl中文词性高亮配色方案
nil代表类比色(色彩最丰富),simple代表简化浓度色(色彩最简单),其余值代表浓度色(红黄色系五色)")

(defun thulac-start ()
  "启动python版thulac词性分析process"
  (make-process
   :name "thulac"
   :buffer "*thulac*"
   :command (list "python3" "-c" "import sys;import json;import thulac;t=thulac.thulac();exec('while True:print(json.dumps(t.cut(sys.stdin.readline()),ensure_ascii=False))')"))
  ;; 第一次实例化thulac时会耗费一段时间加载分析模型,加载完毕会输出"Model loaded succeed"作为提示
  ;; 使用accept-process-output等待加载完成。这段时间(大约数秒)emacs会卡住。
  (accept-process-output (get-process "thulac"))
  ;; 绑定filter
  (set-process-filter
   (get-process "thulac")
   'thulac-filter)
  ;; 退出emacs时自动杀掉thulac process
  (set-process-query-on-exit-flag (get-process "thulac") nil))

(defvar *cnhl-hl-enable* t
  "是否启用高亮,用于仅分词的情况")

(defvar *cnhl-last-info*
  (list :beg nil :end nil :result nil :buffer nil)
  "上一分析句的信息")

(defvar *cnhl-filter-timer* nil
  "用于临时存储filter所用的timer")

(defvar *cnhl-major-mode* (list 'org-journal-mode 'org-mode 'text-mode)
  "允许进行中文高亮的buffer mode")

(defvar *cnhl-whole-buffer* nil
  "全buffer高亮模式开关变量")

(defun thulac-filter (process output)
  "接收python thulac传回的中文分析结果,进行初步处理后执行操作函数"
  (when (member major-mode *cnhl-major-mode*)
    ;; 使用python -c方式运行的process,单次输出较长的字符串时会被emacs分段读取传入filter(大约在第800C左右),
    ;; 执行python文件则无此问题。于是此处设置timer延迟执行,防止意外断句
    ;; 如果有正在准备执行的操作就干掉它重来
    (when *cnhl-filter-timer*
      (cancel-timer *cnhl-filter-timer*)
      (setq *cnhl-filter-timer* nil)
      ;; 把新到的输出加进记录中
      (setq output
            (concat (plist-get *cnhl-last-info* :result)
                    output)))
    ;; 把输出记录下来
    (plist-put *cnhl-last-info* :result output)
    ;; 设置timer
    (setq *cnhl-filter-timer*
          (run-at-time
           ;; 默认间隔0.05s足矣
           "0.05" nil '(lambda ()
                         ;; 若开启高亮则执行
                         (when *cnhl-hl-enable*
                           (cnhl-run))
                         ;; 执行高亮后删掉自己
                         (setq *cnhl-filter-timer* nil)
                         ;; 若启动全buffer高亮,则自动跳到下一行重复执行高亮
                         (when *cnhl-whole-buffer*
                           (goto-line (+ (line-number-at-pos) 1))
                           (cnhl)
                           ;; 执行到buffer底部则退出全buffer高亮
                           (when (= (line-end-position) (buffer-end 1))
                             (setq *cnhl-whole-buffer* nil))))))))

(defun cnhl-current-word-info ()
  "获取光标所在词的一系列信息,用于中文分词"
  (interactive)
  ;; 若缓存中没有光标所在句的结果则对当前句执行高亮以获得信息
  (unless (and (plist-get *cnhl-last-info* :result)
               (string= (buffer-name) (plist-get *cnhl-last-info* :buffer))
               (> (point) (plist-get *cnhl-last-info* :beg))
               (< (point) (plist-get *cnhl-last-info* :end)))
    (cnhl-sentence)
    ;; 等待分析结果出来,避免因为timer延时而导致的意外
    (accept-process-output (get-process "thulac")))
  (block body
    ;; 从前往后将每个词的结束位置与光标位置相比较,当某词词末大于光标位置时,该位置即为光标所在词词末
    ;; array - 句子分析结果;p - 光标词词末;prev-1 - 上一词词末;prev-2 - 上上词词末
    (let ((array (json-parse-string
                  (plist-get *cnhl-last-info* :result)))
          (p (plist-get *cnhl-last-info* :beg))
          (prev-1 nil)
          (prev-2 nil))
      ;; idx - 在遍历的词的位置;word - 在遍历的词
      (do* ((idx 0 (1+ idx))
            (word (aref (aref array idx) 0) (aref (aref array idx) 0)))
          ((< idx 0))
        ;; 挨个腾地儿
        (setq prev-2 prev-1)
        (setq prev-1 p)
        (setq p (+ p (length word)))
        ;; 当正在遍历的词末位置大于光标位置时
        (when (> p (point))
          ;; 返回本次词末,上词词末,上上词词末
          (return-from body (list :end p :prev-1 prev-1 :prev-2 prev-2)))))))

(defun cnhl-forward-word ()
  (interactive)
  (if (member major-mode *cnhl-major-mode*)
      ;; 前往本词词末,即下一词词首
      (goto-char (plist-get (cnhl-current-word-info) :end))
    ;; 若不在设置启用的major-mode中,则fallback回emacs默认的分词操作函数
    (forward-word)))

(defun cnhl-backward-word ()
  (interactive)
  (if (member major-mode *cnhl-major-mode*)
      ;; 前往上上词词末,即上词词首
      (goto-char (plist-get (cnhl-current-word-info) :prev-2))
    (backward-word)))

(defun cnhl-backward-kill-word ()
  (interactive)
  (if (member major-mode *cnhl-major-mode*)
      (let ((result (cnhl-current-word-info)))
        ;; 如果光标位置在词末,删掉上上词词末至上词词末,即上词词首至本词词首,即上一个词
        (if (= (point) (plist-get result :prev-1))
            (delete-region
             (plist-get result :prev-2)
             (plist-get result :prev-1))
          ;; 否则整个删掉光标所在词
          (delete-region
           (plist-get result :prev-1)
           (plist-get result :end))))
    (backward-kill-word 1)))

(defun cnhl-kill-word ()
  (interactive)
  (if (member major-mode *cnhl-major-mode*)
      (let ((result (cnhl-current-word-info)))
        ;; 删掉上词词末至本词词末,即本词词首至下词词首,即本词
        (delete-region
         (plist-get result :prev-1)
         (plist-get result :end)))
    (kill-word 1)))


(defun cnhl-run ()
  "根据存储的信息执行中文词性高亮"
  (save-excursion
    ;; 清除高亮区域内已有的overlay
    (dolist (overlay (overlays-in (plist-get *cnhl-last-info* :beg)
                                  (plist-get *cnhl-last-info* :end)))
      (delete-overlay overlay))
    ;; 光标移到句首
    (goto-char (plist-get *cnhl-last-info* :beg))
    ;; 设变量num,从0开始遍历所有分析出的词语
    (let ((num 0)
          (output (json-parse-string (plist-get *cnhl-last-info* :result))))
      (while num
        (condition-case err
            ;; 选取item为output数组中的第num个词的分析结果,格式为数组,第0项为词语,第1项为词性标识
            (let* ((item (aref output num))
                   ;; 向前搜索item的第一项,即目标词语,记录词语后端位置,搜索范围仅限分析句内
                   (end (search-forward
                         (aref item 0)
                         (plist-get *cnhl-last-info* :end)))
                   ;; 根据搜索出的位置新建overlay
                   (overlay (make-overlay
                             (- end (length (aref item 0))) end)))
              ;; 贴overlay的颜色
              (overlay-put
               overlay 'face
               (if *cnhl-color-scheme*
                   (if (string= *cnhl-color-scheme* "simple")
                       ;; 通过语法标识判断应选用的颜色
                       ;; 简单浓度色(黄色系三色)
                       (pcase (aref item 1)
                         ((or "n" "ns" "ni" "nz" "r" "j" "np" "v" "i" "d")
                          'cnhl-shades-simple-n-r-j-v-i-d)
                         ((or "a" "f" "s" "t" "m" "q" "mq" "d" "c" "p")
                          'cnhl-shades-simple-a-f-s-t-m-q-mq-d-c-p)
                         ((or "u" "y" "h" "k" "e" "o")
                          'cnhl-shades-simple-u-y-h-k-e-o)
                         (_
                          'cnhl-shades-g-w-x))
                     ;; 浓度色(红黄色系五色)
                     (pcase (aref item 1)
                       ((or "n" "ns" "ni" "nz" "r" "j" "np")
                        'cnhl-shades-n-r-j)
                       ((or "u" "y" "h" "k" "e" "o")
                        'cnhl-shades-u-y-h-k-e-o)
                       ((or "v" "id" "i")
                        'cnhl-shades-v-i-id)
                       ((or "d" "c" "p")
                        'cnhl-shades-d-c-p)
                       ((or "a" "f" "s" "t" "m" "q" "mq")
                        'cnhl-shades-a-f-s-t-m-q-mq)
                       (_
                        'cnhl-shades-g-w-x)))
                 ;; 类比色(彩色)
                 (pcase (aref item 1)
                   ((or "n" "ns" "ni" "nz" "r" "j")
                    'cnhl-n-r-j)
                   ((or "u" "y" "h" "k")
                    'cnhl-u-y-h-k)
                   ((or "v" "id" "i")
                    'cnhl-v-i-id)
                   ((or "np")
                    'cnhl-np)
                   ((or "a")
                    'cnhl-a)
                   ((or "p")
                    'cnhl-p)
                   ((or "d")
                    'cnhl-d)
                   ((or "c")
                    'cnhl-c)
                   ((or "f" "s" "t" "m" "q" "mq")
                    'cnhl-f-s-t-m-q-mq)
                   ((or "e" "o")
                    'cnhl-e-o)
                   (_
                    'cnhl-g-w-x))))
              ;; 贴完了,num加一,开始处理下一个词
              (setq num (+ num 1)))
          ;; 如果报错(如搜索出界),则停止当前句的高亮过程
          (t (setq num nil)))))))

(defun cnhl (&optional beg end)
  "中文词性高亮"
  (interactive)
  ;; 清除可能意外存留的filter timer,防止因高亮过程意外终止,timer未正常清除导致的高亮失败
  (setq *cnhl-filter-timer* nil)
  ;; 启动thulac
  (unless (get-process "thulac") (thulac-start))
  ;; 若无参数运行,则默认取当前行
  (unless beg
    (setq beg (line-beginning-position))
    (setq end (line-end-position)))
  ;; 记录字符串数据
  (setq *cnhl-last-info*
        (list :beg beg
              :end end
              :buffer (buffer-name)))
  ;; 将范围内的字符串送入thulac进行分析,结果返回至thulac-receive-output进行操作
  (process-send-string
   "thulac"
   (concat
    (string-trim (buffer-substring-no-properties beg end))
    "\n")))

(defun cnhl-buffer ()
  "对当前buffer全文逐行进行中文词性高亮"
  (interactive)
  ;; 启动全文高亮后移动光标至顶部,开始进行高亮
  (setq *cnhl-whole-buffer* t)
  (goto-char (point-min))
  (cnhl))

  (defun cnhl-sentence ()
    "对当前句进行中文词性高亮"
    (interactive)
    (save-excursion
      ;; 生成句读匹配的正则
      (let* ((regxp (rx (or (group ",")
                            (group "。")
                            (group "?")
                            (group ";")
                            (group ":")
                            (group "、")
                            (group "‘")
                            (group "’")
                            (group "“")
                            (group "”")
                            (group "…")
                            (group "!")
                            (group "(")
                            (group ")")
                            (group "~")
                            (group "*"))))
             ;; 寻找语段起始结束位置,若无标点则直到行末
             (end
              (save-excursion
                (or (search-forward-regexp regxp (line-end-position) t)
                    (line-end-position))))
             (beg
              (or (search-backward-regexp regxp (line-beginning-position) t)
                  (line-beginning-position))))
        ;; 对语段进行中文高亮
        (cnhl beg end))))

(defvar *cnhl-sentence-timer* nil
  "用于临时存储cnhl实时高亮的timer")

(defun cnhl-sentence-timely (&optional beg end len)
  "中文实时高亮函数,加入after-change-functions以实现实时高亮输入内容"
  (when (member major-mode *cnhl-major-mode*)
    ;; 删除准备执行的高亮
    (when *cnhl-sentence-timer*
      (cancel-timer *cnhl-sentence-timer*))
    ;; 安排高亮任务,延迟一段时间后执行,防止过于频繁地执行高亮造成资源浪费
    (setq *cnhl-sentence-timer*
          (run-at-time
           ;; 默认延迟0.1秒
           "0.1" nil '(lambda ()
                        (cnhl-sentence)
                        ;; 执行完删除自己
                        (setq *cnhl-sentence-timer* nil))))))

(defun cnhl-switch-color ()
  "切换中文高亮配色方案"
  (interactive)
  ;; 按照 类比 - 浓度 - 简化浓度 的顺序切换配色方案
  (if *cnhl-color-scheme*
      (if (string= *cnhl-color-scheme* "simple")
          (setq *cnhl-color-scheme* nil)
        (setq *cnhl-color-scheme* "simple"))
    (setq *cnhl-color-scheme* ""))
  ;; 切换后对当前行重新高亮
  (cnhl))

;; cnhl中文分词操作按键绑定
(global-set-key (kbd "M-f") 'cnhl-forward-word)
(global-set-key (kbd "M-b") 'cnhl-backward-word)
(global-set-key (kbd "M-d") 'cnhl-kill-word)
(global-set-key (kbd "M-DEL") 'cnhl-backward-kill-word)

;; 启用实时高亮输入内容
(add-hook 'after-change-functions #'cnhl-sentence-timely)

效果:

ezgif.com-gif-maker ezgif.com-gif-maker-2 ezgif.com-gif-maker-3

做出的改动有:

  1. 写了个粗陋的分词功能。因为thulac本身就有分词功能,想着不利用一下是不是浪费了 :joy: 本打算抄一抄jieba.el的,却发现自己看不懂,只好写了个掰手指头式的实现 :sob:……

  2. 使用process运行thulac,filter接收结果后调用执行高亮的函数。这里我把python代码写成了单行,避免了多出一个文件的窘境。不过这带来了一个奇怪的问题,在运行文件时无论句子有多长,python的输出都能被filter一次读取到,但在单行情况下用python -c运行,过长的输出就会被分段送进filter,我才疏学浅解决不了,只得多加个timer,郁闷(

  3. 语法上的小优化,像timer、pcase、aref、rx等的使用什么的;cl-defstruct我暂时没有用,一是我不太懂怎么用,二是我发现在使用array存储分析结果后,它的调用频率会相当的高,如果把结果中的每个数组都转成struct会不会造成浪费?……
    trie有在学了,但不太清楚该怎么用,怕贪多嚼不烂我就没写进去qwq……刚好我们这两天的信息课在讲树形结构,为什么我就没认真听呢哈哈哈哈

就像 @cireu 前辈说的,hacking emacs给我以前所未有的干劲和快乐,它简单有趣,又让我觉得,我真的在创造有意义的东西。而且我认为,相比我玩过的其他语言,lisp系的语言最让我觉得合乎人的思维,让我感到有温度,有别的语言所没有的温暖的美,很难不爱上它~

谢谢大家的帮助!请各位前辈多多批评!

我爱emacs!

11 个赞

我本来想解释一下 cl-defstruct 的那啥选项,不过找到篇文章能比我解释的更好

https://nullprogram.com/blog/2018/02/14/

trie 的话只是个进阶要求,不会的话学会用 hashtable 也够了。

The output to the filter may come in chunks of any size. A program that produces the same output twice in a row may send it as one batch of 200 characters one time, and five batches of 40 characters the next. If the filter looks for certain text strings in the subprocess output, make sure to handle the case where one of these strings is split across two or more batches of output; one way to do this is to insert the received text into a temporary buffer, which can then be searched.

https://www.gnu.org/software/emacs/manual/html_node/elisp/Filter-Functions.html

2 个赞

让我想起了之前最喜欢的一Markdown 编辑器 iA Writer.

哈哈哈哈谢谢!我大emacs怎么可能比不过这些写作软件 :smile:

:joy:11月30日更新:

  1. 修复按词移动在句首的失败的问题
  2. 修复输入文字后快速输入标点或换行后上一句未高亮的问题
  3. 玩具式地应用了一下defstruct,路漫漫其修远兮

这次可是暴露出我很多考虑不周的问题来了,给大家道个歉,以后一定好好检查代码哈哈哈哈

感谢 @LdBeth 前辈的指导!希望没有辜负各位前辈的指点 :see_no_evil:

(defgroup cnhl-colors nil
  "中文词性颜色"
  :group 'cnhl-color)

;; 类比色
;; 红色系
(defface cnhl-n-r-j
  '((t (:foreground "#F0B0A5")))
  "名词、代词、简称颜色"
  :group 'cnhl-colors)
(defface cnhl-np
  '((t (:foreground "#F0A299")))
  "人名颜色"
  :group 'cnhl-colors)
;; 绿色系
(defface cnhl-v-i-id
  '((t (:foreground "#C8E3A1")))
  "动词、习语颜色"
  :group 'cnhl-colors)
;; 橙黄色系
(defface cnhl-a
  '((t (:foreground "#EDC7A8")))
  "形容词颜色"
  :group 'cnhl-colors)
(defface cnhl-f-s-t-m-q-mq
  '((t (:foreground "#EDBD9D")))
  "方位词、处所词、时间词、数词、量词、数量词颜色"
  :group 'cnhl-colors)
;; 浅黄色系
(defface cnhl-d
  '((t (:foreground "#EDE7A8")))
  "副词颜色"
  :group 'cnhl-colors)
(defface cnhl-c
  '((t (:foreground "#EDE39D")))
  "连词颜色"
  :group 'cnhl-colors)
(defface cnhl-p
  '((t (:foreground "#EDE891")))
  "介词颜色"
  :group 'cnhl-colors)
;; 中黄色系
(defface cnhl-u-y-h-k
  '((t (:foreground "#D7C8A4")))
  "助词、语气助词、前接成分、后接成分颜色"
  :group 'cnhl-colors)
(defface cnhl-e-o
  '((t (:foreground "#D6C299")))
  "叹词、拟声词颜色"
  :group 'cnhl-colors)
;; 其它
(defface cnhl-g-w-x
  '((t (:foreground "gray80")))
  "语素、标点及其它颜色"
  :group 'cnhl-colors)

;; 浓度色
(defface cnhl-shades-n-r-j
  '((t (:foreground "#FFD6CC")))
  "名词、代词、简称颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-v-i-id
  '((t (:foreground "#FFECC2")))
  "动词、习语颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-a-f-s-t-m-q-mq
  '((t (:foreground "#CCBC9B")))
  "形容词、方位词、处所词、时间词、数词、量词、数量词颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-d-c-p
  '((t (:foreground "#C0A199")))
  "副词、连词、介词颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-u-y-h-k-e-o
  '((t (:foreground "#998D74")))
  "助词、语气助词、前接成分、后接成分、叹词、拟声词颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-g-w-x
  '((t (:foreground "gray65")))
  "语素、标点及其它颜色"
  :group 'cnhl-colors)

;; 简化浓度色
(defface cnhl-shades-simple-n-r-j-v-i-d
  '((t (:foreground "#FFECC2")))
  "名词、代词、简称、动词、习语颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-simple-a-f-s-t-m-q-mq-d-c-p
  '((t (:foreground "#DFD1B1")))
  "形容词、方位词、处所词、时间词、数词、量词、数量词、副词、连词、介词颜色"
  :group 'cnhl-colors)
(defface cnhl-shades-simple-u-y-h-k-e-o
  '((t (:foreground "#A99D84")))
  "助词、语气助词、前接成分、后接成分、叹词、拟声词颜色"
  :group 'cnhl-colors)

(defvar *cnhl-color-scheme* nil
  "cnhl中文词性高亮配色方案
nil代表类比色(色彩最丰富),simple代表简化浓度色(色彩最简单),其余值代表浓度色(红黄色系五色)")

(defun thulac-start ()
  "启动python版thulac词性分析process"
  (make-process
   :name "thulac"
   :buffer "*thulac*"
   :command (list "python3" "-c" "import sys;import json;import thulac;t=thulac.thulac();exec('while True:print(json.dumps(t.cut(sys.stdin.readline()),ensure_ascii=False))')"))
  ;; 第一次实例化thulac时会耗费一段时间加载分析模型,加载完毕会输出"Model loaded succeed"作为提示
  ;; 使用accept-process-output等待加载完成。这段时间(大约数秒)emacs会卡住。
  (accept-process-output (get-process "thulac"))
  ;; 绑定filter
  (set-process-filter
   (get-process "thulac")
   'thulac-filter)
  ;; 退出emacs时自动杀掉thulac process
  (set-process-query-on-exit-flag (get-process "thulac") nil))

(defvar *cnhl-hl-enable* t
  "是否启用高亮,用于仅分词的情况")

(defvar *cnhl-last-info*
  (list :beg nil :end nil :result nil :buffer nil)
  "上一分析句的信息")

(defvar *cnhl-filter-timer* nil
  "用于临时存储filter所用的timer")

(defvar *cnhl-major-mode* (list 'org-journal-mode 'org-mode 'text-mode)
  "允许进行中文高亮的buffer mode")

(defvar *cnhl-whole-buffer* nil
  "全buffer高亮模式开关变量")

(defun thulac-filter (process output)
  "接收python thulac传回的中文分析结果,进行初步处理后执行操作函数"
  (when (member major-mode *cnhl-major-mode*)
    ;; 使用python -c方式运行的process,单次输出较长的字符串时会被emacs分段读取传入filter(大约在第800C左右),
    ;; 执行python文件则无此问题。于是此处设置timer延迟执行,防止意外断句
    ;; 如果有正在准备执行的操作就干掉它重来
    (when *cnhl-filter-timer*
      (cancel-timer *cnhl-filter-timer*)
      (setq *cnhl-filter-timer* nil)
      ;; 把新到的输出加进记录中
      (setq output
            (concat (plist-get *cnhl-last-info* :result)
                    output)))
    ;; 把输出记录下来
    (plist-put *cnhl-last-info* :result output)
    ;; 设置timer
    (setq *cnhl-filter-timer*
          (run-at-time
           ;; 默认间隔0.05s足矣
           "0.05" nil '(lambda ()
                         ;; 若开启高亮则执行
                         (when *cnhl-hl-enable*
                           (cnhl-run))
                         ;; 执行高亮后删掉自己
                         (setq *cnhl-filter-timer* nil)
                         ;; 若启动全buffer高亮,则自动跳到下一行重复执行高亮
                         (when *cnhl-whole-buffer*
                           (goto-line (+ (line-number-at-pos) 1))
                           (cnhl)
                           ;; 执行到buffer底部则退出全buffer高亮
                           (when (= (line-end-position) (buffer-end 1))
                             (setq *cnhl-whole-buffer* nil))))))))

(cl-defstruct (cnhl-word (:type vector)) word part)

(defun cnhl-current-word-info ()
  "获取光标所在词的一系列信息,用于中文分词"
  (interactive)
  ;; 若缓存中没有光标所在句的结果则对当前句执行高亮以获得信息
  (unless (and (plist-get *cnhl-last-info* :result)
               (string= (buffer-name) (plist-get *cnhl-last-info* :buffer))
               (> (point) (plist-get *cnhl-last-info* :beg))
               (< (point) (plist-get *cnhl-last-info* :end)))
    (cnhl-sentence)
    ;; 等待分析结果出来,避免因为timer延时而导致的意外
    (accept-process-output (get-process "thulac")))
  (block body
    ;; 从前往后将每个词的结束位置与光标位置相比较,当某词词末大于光标位置时,该位置即为光标所在词词末
    ;; array - 句子分析结果;p - 光标词词末;prev-1 - 上一词词末;prev-2 - 上上词词末
    (let ((array (json-parse-string
                  (plist-get *cnhl-last-info* :result)))
          (p (plist-get *cnhl-last-info* :beg))
          (prev-1 nil)
          (prev-2 nil))
      ;; idx - 在遍历的词的位置;word - 在遍历的词
      (do* ((idx 0 (1+ idx))
            (word (cnhl-word-word (aref array idx)) (cnhl-word-word (aref array idx))))
          ((< idx 0))
        ;; 挨个腾地儿
        (setq prev-2 prev-1)
        (setq prev-1 p)
        (setq p (+ p (length word)))
        ;; 当正在遍历的词末位置大于光标位置时
        (when (> p (point))
          ;; 返回本次词末,上词词末,上上词词末
          (return-from body (list :end p :prev-1 prev-1 :prev-2 prev-2)))))))

(defun cnhl-forward-word ()
  (interactive)
  (if (member major-mode *cnhl-major-mode*)
      ;; 前往本词词末,即下一词词首
      (goto-char (plist-get (cnhl-current-word-info) :end))
    ;; 若不在设置启用的major-mode中,则fallback回emacs默认的分词操作函数
    (forward-word)))

(defun cnhl-backward-word ()
  (interactive)
  (if (member major-mode *cnhl-major-mode*)
      ;; 光标在句首时,无法抓到上一句的标点,手动将光标前移
      (if (= (point) (line-beginning-position))
          (goto-char (- (point) 1))
        ;; 前往上上词词末,即上词词首
        (goto-char (plist-get (cnhl-current-word-info) :prev-2)))
    (backward-word)))

(defun cnhl-backward-kill-word ()
  (interactive)
  (if (member major-mode *cnhl-major-mode*)
      (progn (when (= (point) (line-beginning-position))
               (goto-char (- (point) 1)))
             (let ((result (cnhl-current-word-info)))
               ;; 如果光标位置在词末,删掉上上词词末至上词词末,即上词词首至本词词首,即上一个词
               (if (= (point) (plist-get result :prev-1))
                   (delete-region
                    (plist-get result :prev-2)
                    (plist-get result :prev-1))
                 ;; 否则整个删掉光标所在词
                 (delete-region
                  (plist-get result :prev-1)
                  (plist-get result :end)))))
    (backward-kill-word 1)))

(defun cnhl-kill-word ()
  (interactive)
  (if (member major-mode *cnhl-major-mode*)
      (let ((result (cnhl-current-word-info)))
        ;; 删掉上词词末至本词词末,即本词词首至下词词首,即本词
        (delete-region
         (plist-get result :prev-1)
         (plist-get result :end)))
    (kill-word 1)))


(defun cnhl-run ()
  "根据存储的信息执行中文词性高亮"
  (save-excursion
    ;; 清除高亮区域内已有的overlay
    (dolist (overlay (overlays-in (plist-get *cnhl-last-info* :beg)
                                  (plist-get *cnhl-last-info* :end)))
      (delete-overlay overlay))
    ;; 光标移到句首
    (goto-char (plist-get *cnhl-last-info* :beg))
    ;; 设变量num,从0开始遍历所有分析出的词语
    (let ((num 0)
          (output (json-parse-string (plist-get *cnhl-last-info* :result))))
      (while num
        (condition-case err
            ;; 选取item为output数组中的第num个词的分析结果,格式为数组,第0项为词语,第1项为词性标识
            (let* ((item (aref output num))
                   ;; 向前搜索item的第一项,即目标词语,记录词语后端位置,搜索范围仅限分析句内
                   (end (search-forward
                         (cnhl-word-word item)
                         (plist-get *cnhl-last-info* :end)))
                   ;; 根据搜索出的位置新建overlay
                   (overlay (make-overlay
                             (- end (length (aref item 0))) end)))
              ;; 贴overlay的颜色
              (overlay-put
               overlay 'face
               (if *cnhl-color-scheme*
                   (if (string= *cnhl-color-scheme* "simple")
                       ;; 通过语法标识判断应选用的颜色
                       ;; 简单浓度色(黄色系三色)
                       (pcase (cnhl-word-part item)
                         ((or "n" "ns" "ni" "nz" "r" "j" "np" "v" "i" "d")
                          'cnhl-shades-simple-n-r-j-v-i-d)
                         ((or "a" "f" "s" "t" "m" "q" "mq" "d" "c" "p")
                          'cnhl-shades-simple-a-f-s-t-m-q-mq-d-c-p)
                         ((or "u" "y" "h" "k" "e" "o")
                          'cnhl-shades-simple-u-y-h-k-e-o)
                         (_
                          'cnhl-shades-g-w-x))
                     ;; 浓度色(红黄色系五色)
                     (pcase (cnhl-word-part item)
                       ((or "n" "ns" "ni" "nz" "r" "j" "np")
                        'cnhl-shades-n-r-j)
                       ((or "u" "y" "h" "k" "e" "o")
                        'cnhl-shades-u-y-h-k-e-o)
                       ((or "v" "id" "i")
                        'cnhl-shades-v-i-id)
                       ((or "d" "c" "p")
                        'cnhl-shades-d-c-p)
                       ((or "a" "f" "s" "t" "m" "q" "mq")
                        'cnhl-shades-a-f-s-t-m-q-mq)
                       (_
                        'cnhl-shades-g-w-x)))
                 ;; 类比色(彩色)
                 (pcase (cnhl-word-part item)
                   ((or "n" "ns" "ni" "nz" "r" "j")
                    'cnhl-n-r-j)
                   ((or "u" "y" "h" "k")
                    'cnhl-u-y-h-k)
                   ((or "v" "id" "i")
                    'cnhl-v-i-id)
                   ((or "np")
                    'cnhl-np)
                   ((or "a")
                    'cnhl-a)
                   ((or "p")
                    'cnhl-p)
                   ((or "d")
                    'cnhl-d)
                   ((or "c")
                    'cnhl-c)
                   ((or "f" "s" "t" "m" "q" "mq")
                    'cnhl-f-s-t-m-q-mq)
                   ((or "e" "o")
                    'cnhl-e-o)
                   (_
                    'cnhl-g-w-x))))
              ;; 贴完了,num加一,开始处理下一个词
              (setq num (+ num 1)))
          ;; 如果报错(如搜索出界),则停止当前句的高亮过程
          (t (setq num nil)))))))

(defun cnhl (&optional beg end)
  "中文词性高亮"
  (interactive)
  ;; 清除可能意外存留的filter timer,防止因高亮过程意外终止,timer未正常清除导致的高亮失败
  (setq *cnhl-filter-timer* nil)
  ;; 启动thulac
  (unless (get-process "thulac") (thulac-start))
  ;; 若无参数运行,则默认取当前行
  (unless beg
    (setq beg (line-beginning-position))
    (setq end (line-end-position)))
  ;; 记录字符串数据
  (setq *cnhl-last-info*
        (list :beg beg
              :end end
              :buffer (buffer-name)))
  ;; 将范围内的字符串送入thulac进行分析,结果返回至thulac-receive-output进行操作
  (process-send-string
   "thulac"
   (concat
    (string-trim (buffer-substring-no-properties beg end))
    "\n")))

(defun cnhl-buffer ()
  "对当前buffer全文逐行进行中文词性高亮"
  (interactive)
  ;; 启动全文高亮后移动光标至顶部,开始进行高亮
  (setq *cnhl-whole-buffer* t)
  (goto-char (point-min))
  (cnhl))

(defun cnhl-sentence ()
  "对当前句进行中文词性高亮"
  (interactive)
  (save-excursion
    ;; 生成句读匹配的正则
    (let* ((regxp (rx (or (group ",")
                          (group "。")
                          (group "?")
                          (group ";")
                          (group ":")
                          (group "、")
                          (group "‘")
                          (group "’")
                          (group "“")
                          (group "”")
                          (group "…")
                          (group "!")
                          (group "(")
                          (group ")")
                          (group "~")
                          (group "*"))))
           (end
            ;; 语段起始结束位置,若无标点则直到行末
            (save-excursion
              (or (search-forward-regexp regxp (line-end-position) t)
                  (line-end-position))))
           (beg
            ;; 本次编辑的起始点小于光标位置,则从起始点为高亮起点向前搜索
            (progn
              (when (and *cnhl-sentence-beginning-position*
                         (< *cnhl-sentence-beginning-position* (point)))
                (goto-char *cnhl-sentence-beginning-position*)
                (setq *cnhl-sentence-beginning-position* nil))
              (or (search-backward-regexp regxp (line-beginning-position) t)
                  (line-beginning-position)))))
      ;; 对语段进行中文高亮
      (cnhl beg end))))

(defvar *cnhl-sentence-timer* nil
  "用于临时存储cnhl实时高亮的timer")

(defvar *cnhl-sentence-beginning-position* nil
  "记录本此语句高亮的起始点")

(defun cnhl-sentence-timely (beg end len)
  "中文实时高亮函数,加入after-change-functions以实现实时高亮输入内容"
  (when (member major-mode *cnhl-major-mode*)
    (if *cnhl-sentence-timer*
        ;; 删除准备执行的高亮
        (cancel-timer *cnhl-sentence-timer*)
      ;; 记录起始点
      (setq *cnhl-sentence-beginning-position* (point)))
    ;; 安排高亮任务,延迟一段时间后执行,防止过于频繁地执行高亮造成资源浪费
    (setq *cnhl-sentence-timer*
          (run-at-time
           ;; 默认延迟0.1秒
           "0.1" nil '(lambda ()
                        (cnhl-sentence)
                        ;; 执行完删除timer
                        (setq *cnhl-sentence-timer* nil))))))

(defun cnhl-switch-color ()
  "切换中文高亮配色方案"
  (interactive)
  ;; 按照 类比 - 浓度 - 简化浓度 的顺序切换配色方案
  (if *cnhl-color-scheme*
      (if (string= *cnhl-color-scheme* "simple")
          (setq *cnhl-color-scheme* nil)
        (setq *cnhl-color-scheme* "simple"))
    (setq *cnhl-color-scheme* ""))
  ;; 切换后对当前行重新高亮
  (cnhl))

;; cnhl中文分词操作按键绑定
(global-set-key (kbd "M-f") 'cnhl-forward-word)
(global-set-key (kbd "M-b") 'cnhl-backward-word)
(global-set-key (kbd "M-d") 'cnhl-kill-word)
(global-set-key (kbd "M-DEL") 'cnhl-backward-kill-word)

;; 启用实时高亮输入内容
(add-hook 'after-change-functions #'cnhl-sentence-timely)

1

增加一个命令参数变量比较好:

(defvar thulac-start-command (list "python3" "-c" "import sys;import json;import thulac;t=thulac.thulac();exec('while True:print(json.dumps(t.cut(sys.stdin.readline()),ensure_ascii=False))')"))

...

   :command thulac-start-command

2

可改为:


(defvar cnhl-word-simple-face-alist
  '((("n" "ns" "ni" "nz" "r" "j" "np" "v" "i" "d") . cnhl-shades-simple-n-r-j-v-i-d)
    (("a" "f" "s" "t" "m" "q" "mq" "d" "c" "p") . cnhl-shades-simple-a-f-s-t-m-q-mq-d-c-p)
    (("u" "y" "h" "k" "e" "o") . cnhl-shades-simple-u-y-h-k-e-o)))

...

               (if *cnhl-color-scheme*
                   (if (string= *cnhl-color-scheme* "simple")
                       ;; 通过语法标识判断应选用的颜色
                       ;; 简单浓度色(黄色系三色)
                       (or (cl-assoc (cnhl-word-part item) cnhl-word-simple-face-alist :test #'member)
                           'cnhl-shades-g-w-x)

其余两种配色类似。

3

是不是可以改为?

(let* ((regexp (rx (group (or "," "。" "?" ";" ":" "、" "‘" "’" "“" "”" "…" "!" "(" ")" "~" "*")))))

4

如果想发布到 melpa,变量和函数的命名必需统一。

谢谢前辈指点!我会及时修改! :laughing: :laughing:

年纪轻轻,好学,有才,还这么谦虚,真是前途无量啊

很抱歉,前段时间为了准备合格考和期末考试,暂时放下了这边的完善工作,有些辜负大家的指教。现在放假了,我又来交作业啦!

虽然我也没想过要提交到melpa什么的(水平差远了哈哈),但每次都一股脑儿把所有代码扔进论坛也不甚雅观。于是我将这个小插件正式定名为 cnhl 放到了github上,亦是方便大家安装使用(感谢 @twlz0ne 前辈的提醒!)。


这次在已有代码上依然有比较多的改动,请各位前辈指导:

  1. 弄了个不三不四的“trie”出来(也不知道算不算trie :joy: 再次感谢 @LdBeth 前辈的指导!),再结合了其它一些不甚雅观的方法改善了一下匹配face部分的代码。

    具体思路是:预置一个列表(cnhl--face-items),表中存有thulac返回结果的首字母与face的对应关系,再用这个列表生成一个哈希表(不知道怎么初始化哈希表的值,只好用loop的笨方法挨个写进去……),然后每次使用thulac返回值的首字母字母去匹配哈希表中的值。这部分代码如下:

    不知道有没有更好的方法,还请前辈们指教。

  2. 重新理了理timer,解决了上一个版本中一直困扰我的几个大bug,例如快速换行导致高亮失败、删字导致下一句第一个词被清除高亮一类的破问题。现在它在写作中的表现已经相当不错。唯一的问题是一口气高亮超过82个字符的行会失败,算是个重大bug,我仔细研究研究是怎么回事(逻辑苦手当场暴毙qaq)……

  3. 主题上做了很大的修改。根据一段时间的使用经验和对词语出现概率的估算,我调整了一下对thulac结果的合并方法,让它用起来更清晰舒服;同时去掉了那两个高度合并的方案,只留下一个,因为根据我的写作实际,彩虹糖似的语法高亮并不会对我造成干扰。我还加入了适配亮色主题的颜色(两种,在README中)。

    我的审美不适合所有人,要定制的话,Adobe Color可能能帮到大家——毕竟文字的高亮还是和代码高亮不太一样,既要考虑到清晰度,也要考虑到整体的和谐,不刺眼,不喧宾夺主,保证思路连贯。

    直接效果就是代码量小了好多哈哈,看着清楚多了!有机会把设计思路总结总结。

  4. 我现在在用emacs29,在29中,原先的return函数被cl-return所替换,我不知道它能不能向下兼容。如果cnhl在旧版本上报错的话,我会新建一个分支。 感谢 @twlz0ne 前辈!不存在问题,是我想当然了qwq

  5. 一些细节上的优化,像语法优化、cl-loop的使用、封装了一个开关高亮的函数之类的杂七杂八的东西,倒是为自己微不足道的进步沾沾自喜了一阵子。另外我还学着其它包给cnhl.el装模作样地加了个头尾,不知道符不符合规范,还请前辈们批评。

看上去是能用了,但我知道它还是过于简陋,连minor mode都没做。下一步我打算让它更像样些,加入一些必要的功能;如果有可能还想换用c++版的thulac,这样就不需要折腾一堆python了,或者直接写成dynamic module——但那离我还太远了,好高骛远要不得。要学习的地方很多呀。

另外还有部分细节体验较差,例如在org有序列表的一行中按下M-RET会导致当前行的高亮被删除;使用一些生成文本的功能后会报找不到下文的错误;高亮范围有时会越出标点导致性能浪费等,我会着手修复。

哦对了,我还想写一篇对文科生们的emacs安利文,不知道有没有哪位前辈已经写过了,我继续补充发扬(flag立在这儿了awa)。


感谢一路指导我的各位前辈们,小子无以为报,旧作一片以酬诸君:

莫疑繁此为何妨,夜觅阑珊守尺方。

代码行行映人肃,个中点点入心长。

肠肥恶见三牲貌,腹馁唯知一口张。

当敬人间才与苦,方成诸夏景还长。

还望各位前辈多多批评!

3 个赞