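For example, copy the code between the markers here: >>>​​hugo​​ ​​new​​ ​​about.md<<< and paste it into Emacs. To the naked eye it looks no different from ordinary text.
(Include the >>> and <<< markers when copying and delete them after pasting, so that the invisible characters are sure to be copied.)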
- Save the text to a file with utf-8 encoding, then reopen the file with find-file-literally so that the invisible characters show up:
\342\200\213\342\200\213hugo\342\200\213\342\200\213 \342\200\213\342\200\213new\342\200\213\342\200\213 \342\200\213\342\200\213about.md
Each \XXX above is a single byte written in octal. For example, \342 is octal 342, decimal 226, hexadecimal e2.
The sequence \342\200\213 keeps repeating; in hexadecimal it is e2 80 8b.
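The byte values can be checked from Emacs itself. The following is a small sketch that encodes U+200B as UTF-8 and lists each byte in octal, decimal, and hexadecimal. The helper name `my/utf8-bytes` is made up for illustration and is not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs: list the UTF-8 bytes of CHAR.
(defun my/utf8-bytes (char)
  "Return the UTF-8 bytes of CHAR as a list of (OCTAL DECIMAL HEX) entries."
  (mapcar (lambda (byte)
            (list (format "%o" byte) byte (format "%x" byte)))
          ;; `encode-coding-string' returns a unibyte string; `append'
          ;; with nil turns it into a list of byte values.
          (append (encode-coding-string (string char) 'utf-8) nil)))

(my/utf8-bytes #x200b)
;; => (("342" 226 "e2") ("200" 128 "80") ("213" 139 "8b"))
```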
- Use hexl-mode to inspect the encoding:
00000000: e280 8be2 808b 6875 676f e280 8be2 808b  ......hugo......
00000010: 20e2 808b e280 8b6e 6577 e280 8be2 808b   ......new......
00000020: 20e2 808b e280 8b61 626f 7574 2e6d 64     ......about.md
Reopen the file with find-file and move the cursor onto one of the invisible characters.
C-x = (that is, what-cursor-position) reports the character as U+200B, and M-x describe-char RET gives its official name: ZERO WIDTH SPACE.
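The same lookup can also be done non-interactively. A short sketch using the built-in `get-char-code-property` and `char-from-name` (the latter assumes Emacs 26 or later):

```elisp
;; Look up the Unicode name of a code point, and the reverse mapping.
(get-char-code-property #x200b 'name)                 ; => "ZERO WIDTH SPACE"
(char-from-name "ZERO WIDTH SPACE")                   ; => 8203, i.e. #x200b
(format "U+%04X" (char-from-name "ZERO WIDTH SPACE")) ; => "U+200B"
```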
A quick Google search turned up this description:
Commonly abbreviated ZWSP.
This character is intended for invisible word separation and for line break control;
it has no width, but its presence between two characters does not prevent increased letter spacing in justification.
I noticed the problem in the first place because the command failed when pasted straight into the command line:
$ hugo new about.md
'hugo' is not recognized as an internal or external command,
operable program or batch file.
One fix is to replace the invisible characters with a bit of elisp:
;; from http://xahlee.info/emacs/emacs/elisp_unicode_replace_invisible_chars.html
(defun xah-replace-invisible-char ()
  "Query replace some invisible Unicode chars.
The chars to be searched are:
 ZERO WIDTH NO-BREAK SPACE (65279, #xfeff)
 ZERO WIDTH SPACE (codepoint 8203, #x200b)
 RIGHT-TO-LEFT MARK (8207, #x200f)
 RIGHT-TO-LEFT OVERRIDE (8238, #x202e)
 LEFT-TO-RIGHT MARK (8206, #x200e)
 OBJECT REPLACEMENT CHARACTER (65532, #xfffc)
Search begins at cursor position. (respects `narrow-to-region')
URL `http://xahlee.info/emacs/emacs/elisp_unicode_replace_invisible_chars.html'
Version 2018-09-07"
  (interactive)
  (query-replace-regexp "\ufeff\\|\u200b\\|\u200f\\|\u202e\\|\u200e\\|\ufffc" ""))
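For use from scripts or hooks, where an interactive query is not wanted, a non-interactive variant could look roughly like this. The function name, the optional region arguments, and the returned count are illustrative assumptions, not part of the original function.

```elisp
;; Hypothetical non-interactive variant; the name and the return value
;; are illustrative choices.
(defun my/delete-invisible-chars (&optional beg end)
  "Delete invisible chars between BEG and END (default: whole buffer).
Return the number of characters deleted."
  (let ((count 0)
        ;; A marker keeps the end position valid while text is deleted.
        (end-marker (copy-marker (or end (point-max)))))
    (save-excursion
      (goto-char (or beg (point-min)))
      (while (re-search-forward "[\ufeff\u200b\u200f\u202e\u200e\ufffc]"
                                end-marker t)
        (replace-match "")
        (setq count (1+ count))))
    (set-marker end-marker nil)
    count))

;; Usage on a scratch buffer:
(with-temp-buffer
  (insert "\u200b\u200bhugo\u200b\u200b new")
  (list (my/delete-invisible-chars) (buffer-string)))
;; => (4 "hugo new")
```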
Alternatively, highlight the invisible characters (glyphless characters) directly in the buffer:
“Glyphless characters” are characters which are displayed in a special way, e.g., as a box containing a hexadecimal code, instead of being displayed literally.
These include characters which are explicitly defined to be glyphless, as well as characters for which there is no available font (on a graphical display), and characters which cannot be encoded by the terminal’s coding system (on a text terminal).
(defun w/see-you ()
  "Highlight ZERO WIDTH chars in all buffers."
  (interactive)
  (let ((charnames (list "BYTE ORDER MARK"
                         "ZERO WIDTH NO-BREAK SPACE"
                         "ZERO WIDTH SPACE"
                         "RIGHT-TO-LEFT MARK"
                         "RIGHT-TO-LEFT OVERRIDE"
                         "LEFT-TO-RIGHT MARK"
                         "OBJECT REPLACEMENT CHARACTER"
                         "ZERO WIDTH JOINER"
                         "ZERO WIDTH NON-JOINER")))
    (set-face-background 'glyphless-char "RoyalBlue1")
    (dolist (name charnames)
      ;; see info node "info:elisp#Glyphless Chars" for available values
      (set-char-table-range glyphless-char-display
                            (char-from-name name) "fuck"))))
To browse other Unicode characters: M-x list-unicode-display RET
Some additional notes:
Insert a Unicode character like → (0x2192) by name (RIGHTWARDS ARROW): M-x insert-char RET or C-x 8 RET, type RIGHTWARDS ARROW, and press Enter.
Insert a Unicode character like → by its hexadecimal value (0x2192): M-x insert-char RET or C-x 8 RET, type 2192, and press Enter.
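The two interactive routes have straightforward equivalents in elisp. A minimal sketch, run in a temporary buffer so that nothing is inserted into the current one:

```elisp
(with-temp-buffer
  ;; By name: `char-from-name' maps a Unicode name to its code point.
  (insert (char-from-name "RIGHTWARDS ARROW"))
  ;; By hexadecimal value.
  (insert-char #x2192)
  (buffer-string))
;; => "→→"
```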
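Possible improvements:
- Strip these invisible characters automatically when copying
- When saving a file, check for invisible characters and warn (but NOT silently remove/replace them, maybe you are taking notes)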
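Both ideas could be sketched roughly as follows. All `my/` names are hypothetical. The strip step is hooked in at paste time rather than at copy time, which is a substitution for the idea as stated, and it assumes `yank-transform-functions` is available (Emacs 29 or later), so the call is guarded. The save-time check only reports a count and never modifies the buffer.

```elisp
(defconst my/invisible-chars-regexp "[\ufeff\u200b\u200f\u202e\u200e\ufffc]"
  "Regexp matching a few invisible Unicode characters.")

(defun my/strip-invisible-chars (string)
  "Return STRING with invisible characters removed."
  (replace-regexp-in-string my/invisible-chars-regexp "" string))

(defun my/warn-invisible-chars ()
  "Warn, without modifying the buffer, if it contains invisible characters.
Return the number of matches."
  (let ((count (how-many my/invisible-chars-regexp (point-min) (point-max))))
    (when (> count 0)
      (message "%d invisible character(s) in %s" count (buffer-name)))
    count))

;; Check on every save; the buffer contents are left untouched.
(add-hook 'before-save-hook #'my/warn-invisible-chars)

;; Strip on paste.  Assumption: `yank-transform-functions' exists only in
;; Emacs 29 or later, so older versions skip this step.
(when (boundp 'yank-transform-functions)
  (add-hook 'yank-transform-functions #'my/strip-invisible-chars))
```

The warning goes through `message`, so it is easy to miss; `display-warning` would be a louder alternative.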