各位前辈好,emacs新人一枚前来报道!~
在我断断续续使用emacs的近一年来,论坛的大家给了我巨大的帮助,要是没有各位,我绝对会错过无数美好的风景!请多多指教!
我今年高二,平常主要用emacs写点随笔、作文,整理些笔记什么的,由于是文科生,平日里用电脑也总是在跟中文打交道。看着emacs里代码都是漂漂亮亮、五彩缤纷的,而中文却白花花的一片,心里总很不是滋味,写东西时都兴致缺缺了。在学了一点lisp之后,终于忍不住自己动手写了个中文词性高亮出来,弥补了我一直以来的遗憾,在这里分享给大家。我是个彻头彻尾的文科生,代码写的实在令人难以启齿,请各位前辈多多批评!
中文词性分析的功能我是借助清华大学的THULAC词法分析工具实现的,网址在这里:http://thulac.thunlp.org ;具体实现方法是在emacs后台挂一个python版的THULAC,把字符传进去,解析返回的列表并添加overlay。
使用前需要先在pip中安装thulac,如果用的是3.8以上的python,需要把thulac源码中的character/CBTaggingDecoder.py
文件第170行的time.clock()
修改为time.perf_counter()
,或者参考这里:https://zhuanlan.zhihu.com/p/402214700
(defun thulac-start ()
"在一个python shell中启动THULAC。"
(python-shell-make-comint
(python-shell-calculate-command)
(python-shell-get-process-name nil) nil)
(python-shell-send-string-no-output "import thulac;import json;t=thulac.thulac()")
;; 不空一行的话第一句的返回值会出错
(python-shell-send-string-no-output "")
)
(defun thulac-send-get-result (&optional str)
"从python buffer中获取一段话(默认为当前行)的中文词性,返回值为一个双层列表。"
(interactive)
(unless (python-shell-get-buffer) (thulac-start))
(let ((json-array-type 'list))
(json-read-from-string
(substring
(python-shell-send-string-no-output
(concat "json.dumps(t.cut(\""
(or str (string-trim (thing-at-point 'line t)))
"\"),ensure_ascii=False)"))
1 -1))))
(defgroup cnhl-colors nil
"中文词性颜色"
:group 'cnhl-color)
;; 类比色
;; 红色系
(defface cnhl-n-r-j
'((t (:foreground "#F0B0A5")))
"名词、代词、简称颜色"
:group 'cnhl-colors)
(defface cnhl-np
'((t (:foreground "#F0A299")))
"人名颜色"
:group 'cnhl-colors)
;; 绿色系
(defface cnhl-v-i-id
'((t (:foreground "#C8E3A1")))
"动词、习语颜色"
:group 'cnhl-colors)
;; 橙黄色系
(defface cnhl-a
'((t (:foreground "#EDC7A8")))
"形容词颜色"
:group 'cnhl-colors)
(defface cnhl-f-s-t-m-q-mq
'((t (:foreground "#EDBD9D")))
"方位词、处所词、时间词、数词、量词、数量词颜色"
:group 'cnhl-colors)
;; 浅黄色系
(defface cnhl-d
'((t (:foreground "#EDE7A8")))
"副词颜色"
:group 'cnhl-colors)
(defface cnhl-c
'((t (:foreground "#EDE39D")))
"连词颜色"
:group 'cnhl-colors)
(defface cnhl-p
'((t (:foreground "#EDE891")))
"介词颜色"
:group 'cnhl-colors)
;; 中黄色系
(defface cnhl-u-y-h-k
'((t (:foreground "#D7C8A4")))
"助词、语气助词、前接成分、后接成分颜色"
:group 'cnhl-colors)
(defface cnhl-e-o
'((t (:foreground "#D6C299")))
"叹词、拟声词颜色"
:group 'cnhl-colors)
;; 其它
(defface cnhl-g-w-x
'((t (:foreground "gray80")))
"语素、标点及其它颜色"
:group 'cnhl-colors)
;; 浓度色
(defface cnhl-shades-n-r-j
'((t (:foreground "#FFD6CC")))
"名词、代词、简称颜色"
:group 'cnhl-colors)
(defface cnhl-shades-v-i-id
'((t (:foreground "#FFECC2")))
"动词、习语颜色"
:group 'cnhl-colors)
(defface cnhl-shades-a-f-s-t-m-q-mq
'((t (:foreground "#CCBC9B")))
"形容词、方位词、处所词、时间词、数词、量词、数量词颜色"
:group 'cnhl-colors)
(defface cnhl-shades-d-c-p
'((t (:foreground "#C0A199")))
"副词、连词、介词颜色"
:group 'cnhl-colors)
(defface cnhl-shades-u-y-h-k-e-o
'((t (:foreground "#998D74")))
"助词、语气助词、前接成分、后接成分、叹词、拟声词颜色"
:group 'cnhl-colors)
(defface cnhl-shades-g-w-x
'((t (:foreground "gray65")))
"语素、标点及其它颜色"
:group 'cnhl-colors)
;; 简化浓度色
(defface cnhl-shades-simple-n-r-j-v-i-d
'((t (:foreground "#FFECC2")))
"名词、代词、简称、动词、习语颜色"
:group 'cnhl-colors)
(defface cnhl-shades-simple-a-f-s-t-m-q-mq-d-c-p
'((t (:foreground "#DFD1B1")))
"形容词、方位词、处所词、时间词、数词、量词、数量词、副词、连词、介词颜色"
:group 'cnhl-colors)
(defface cnhl-shades-simple-u-y-h-k-e-o
'((t (:foreground "#A99D84")))
"助词、语气助词、前接成分、后接成分、叹词、拟声词颜色"
:group 'cnhl-colors)
(defvar *cnhl-color-scheme* nil)
(defun cnhl (&optional lst search-beg search-end)
"中文词性高亮"
(interactive)
(save-excursion
(goto-char (or search-beg (line-beginning-position)))
(dolist (item (or lst (thulac-send-get-result)))
(condition-case err
(let* ((end (search-forward (car item) (or search-end (line-end-position))))
(overlay (make-overlay
(- end (length (car item))) end)))
(overlay-put
overlay 'face
(if *cnhl-color-scheme*
(if (string= *cnhl-color-scheme* "simple")
(cond ((member (second item)
'("n" "ns" "ni" "nz" "r" "j" "np" "v" "i" "d"))
'cnhl-shades-simple-n-r-j-v-i-d)
((member (second item)
'("a" "f" "s" "t" "m" "q" "mq" "d" "c" "p"))
'cnhl-shades-simple-a-f-s-t-m-q-mq-d-c-p)
((member (second item) '("u" "y" "h" "k" "e" "o")) 'cnhl-shades-simple-u-y-h-k-e-o)
(t 'cnhl-shades-g-w-x))
(cond ((member (second item) '("n" "ns" "ni" "nz" "r" "j" "np")) 'cnhl-shades-n-r-j)
((member (second item) '("u" "y" "h" "k" "e" "o")) 'cnhl-shades-u-y-h-k-e-o)
((member (second item) '("v" "id" "i")) 'cnhl-shades-v-i-id)
((member (second item) '("d" "c" "p")) 'cnhl-shades-d-c-p)
((member (second item) '("a" "f" "s" "t" "m" "q" "mq")) 'cnhl-shades-a-f-s-t-m-q-mq)
(t 'cnhl-shades-g-w-x)))
(cond ((member (second item) '("n" "ns" "ni" "nz" "r" "j")) 'cnhl-n-r-j)
((member (second item) '("u" "y" "h" "k")) 'cnhl-u-y-h-k)
((member (second item) '("v" "id" "i")) 'cnhl-v-i-id)
((string= (second item) "np") 'cnhl-np)
((string= (second item) "a") 'cnhl-a)
((string= (second item) "p") 'cnhl-p)
((string= (second item) "d") 'cnhl-d)
((string= (second item) "c") 'cnhl-c)
((member (second item) '("f" "s" "t" "m" "q" "mq")) 'cnhl-f-s-t-m-q-mq)
((member (second item) '("e" "o")) 'cnhl-e-o)
(t 'cnhl-g-w-x)))))
(search-failed (setq search-complete t))))))
(defun cnhl-buffer ()
"对当前buffer全文进行中文语法高亮"
(interactive)
(save-excursion
(beginning-of-buffer)
(do ((complete nil)
(line (thing-at-point 'line t) (thing-at-point 'line t)))
(complete)
(condition-case nil
(progn (cnhl)
(next-line))
(end-of-buffer (setq complete t))))))
(defun cnhl-last-sentence (&optional begin end len)
"对从上一个标点到光标的内容进行中文语法高亮"
(interactive)
(when (member mode-name '("Journal" "Org" "Text"))
(save-excursion
(let ((end (save-excursion
(or (search-forward-regexp
"\\(,\\)\\|\\(。\\)\\|\\(?\\)\\|\\(;\\)\\|\\(:\\)\\|\\(、\\)\\|\\(‘\\)\\|\\(’\\)\\|\\(“\\)\\|\\(”\\)\\|\\(…\\)\\|\\(!\\)\\|\\((\\)\\|\\()\\)\\|\\(~\\)" (line-end-position) t)
(line-end-position))))
(beg (or (search-backward-regexp
"\\(,\\)\\|\\(。\\)\\|\\(?\\)\\|\\(;\\)\\|\\(:\\)\\|\\(、\\)\\|\\(‘\\)\\|\\(’\\)\\|\\(“\\)\\|\\(”\\)\\|\\(…\\)\\|\\(!\\)\\|\\((\\)\\|\\()\\)\\|\\(~\\)|\\(*\\)" (line-beginning-position) t)
(line-beginning-position))))
(dolist (overlay (overlays-in beg end))
(delete-overlay overlay))
(cnhl (thulac-send-get-result
(buffer-substring-no-properties beg end))
beg end)))))
(add-hook 'after-change-functions #'cnhl-last-sentence)
(defun cnhl-switch-color ()
"切换中文高亮颜色组"
(interactive)
(if *cnhl-color-scheme*
(if (string= *cnhl-color-scheme* "simple")
(setq *cnhl-color-scheme* nil)
(setq *cnhl-color-scheme* "simple"))
(setq *cnhl-color-scheme* ""))
(dolist (overlay (overlays-in (line-beginning-position) end))
(delete-overlay overlay))
(cnhl (thulac-send-get-result
(buffer-substring-no-properties beg end))))
cnhl
函数默认会高亮当前行,cnhl-buffer
会一口气高亮整个buffer,如果是两三千字的大作文会卡一会儿;为防止色彩太花而影响行文思路、舍本逐末,我胡搞了三套配色方案,都是为黑色背景设计的,可以通过cnhl-switch-color
切换 ,这是三种方案的预览图:
——当我折腾完才发现哪怕是最花的第一套方案也不怎么影响我,总体来看利远大于弊……
cnhl-last-sentence
用于高亮当前句,加进after-change-functions
里可以做到实时高亮,会识别当前buffer的major mode以确保只在特定的buffer生效(我设置启用的major-mode有text、org和journal),由于技术很渣只能想出来这样一个解决方案了……
总体来看效果还是相当不错的,在开启实时高亮的情况下emacs的CPU占用率并没有明显的飙升,电量消耗也没有太大影响;自带的模型在处理复杂句式上也容易判断失误,但相比它益处而言,这些缺点倒也不算什么。
或许会有人说这个功能缺乏实用价值,但在我的设想中,这个功能的第一大目的在于提升写作的趣味性,在不干扰我的情况下让我更加热爱写作;第二大目的是在我检查语句时提供有效的辅助,帮助我迅速判断句子中哪些元素缺失了,哪些元素可以删去,从而优化我的语言,培育良好的写作风格,或者说,让我在头脑不清醒的时候也不至于胡言乱语(笑)。其它则都是次要的。就我目前的体验来看,它完全达到了我的目的。就我所知,目前提供自然语言标注这种功能的软件很少,我知道的只有metion这一家,它的功能也无法满足我的设想(太干扰啦!)。所以能做出这么个东西来,我还是挺自豪的,好像终于为大家做出了点贡献?
望能帮助到有需要的同学!望各位前辈多多批评指正!