How to match Chinese and English with an Elisp regular expression

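Samray · March 6, 2017, 05:42

Fellow Emacs users, I'm looking for help. I want to write a function that deletes the spaces in my mixed Chinese and English text, but I can't come up with the regular expression I need. I want a regular expression that finds a space preceded by an English letter and followed by a Chinese character. For example:

Emacs 中文 (should match)
Emacs Lisp (should not match)
中文中文 (should not match)

Any suggestions would be appreciated.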

smallst · March 6, 2017, 08:36

With Elisp regexps, \w \cc can match it.

Samray · March 6, 2017, 09:12

I don't think that works. Why not copy my example and try it with M-x re-builder? It really doesn't work for me.

JJPandari · March 6, 2017, 09:29

I don't know the Elisp-specific regexp syntax, but I tried re-builder as suggested, and [A-Za-z]+ [^A-Za-z]+ seems to work.

smallst · March 6, 2017, 10:06

I don't know how to use re-builder... I use replace-regexp.

smallst · March 6, 2017, 10:09

If you want to use re-builder, then use [a-zA-Z] \\cc
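For reference, the doubled backslash comes from Lisp string syntax rather than from the regexp itself: in Lisp code (and in re-builder's `read` input syntax) the regexp is written as a string, so `\cc` is typed as `\\cc`, whereas `replace-regexp` reads `\cc` directly from the minibuffer. A minimal check against the sample lines from the first post, where the constant name is made up for illustration:

```elisp
;; In Lisp code the regexp is a string, so each regexp backslash is doubled.
;; "\\cc" is the regexp \cc, which matches a character in category `c' (Chinese).
(defconst my/letter-space-chinese "[a-zA-Z] \\cc"
  "Hypothetical name: an ASCII letter, one space, then a Chinese character.")

(string-match-p my/letter-space-chinese "Emacs 中文")  ; matches
(string-match-p my/letter-space-chinese "Emacs Lisp")  ; no match
(string-match-p my/letter-space-chinese "中文 中文")    ; no match
```

`string-match-p` returns the start index of the match or nil, so it is a quick way to try a pattern without re-builder.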

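Samray · March 6, 2017, 13:39

What I originally wrote was [a-z]\s[^\x00-\xff], and it does match. But when I use it in a function, I can't get it to delete the space between English and Chinese. Here is my function: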

(defun thing-at-buffers-or-at-region ()
  "Return the text of the active region, or of the whole buffer if there is none."
  (if (region-active-p)
      (buffer-substring-no-properties
       (region-beginning)
       (region-end))
    (buffer-substring-no-properties
     (point-min) (point-max))))
(defun samray/replace-whitespace-between-letter-and-chinese-char ()
  "Delete the whitespace between a letter and a Chinese character.
My Chinese input method leaves a space there, so just delete it."
  (interactive)
  (save-excursion
    (replace-regexp-in-string "\\([a-zA-z]\s[^\x00-\xff]\\)" ""
                              (thing-at-buffers-or-at-region))))

Any advice would be appreciated.

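et2010 · March 6, 2017, 14:08

First, shouldn't whitespace be matched with \\s-?

Second, doesn't this also delete the Chinese character and the English letter next to the space?

I haven't tried it, but that's my impression.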
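A quick check suggests this is indeed what happens: the replacement applies to the whole match, so an empty replacement also removes the letter and the Chinese character. The sketch below uses a made-up helper name and writes the range as `[a-zA-Z]`; the `A-z` in the posted function would additionally cover the punctuation characters that sit between `Z` and `a` in ASCII.

```elisp
(defun my/strip-whole-match (text)
  "Hypothetical helper: replace letter + whitespace + Chinese char with nothing."
  ;; The match covers three characters, and all three are replaced.
  (replace-regexp-in-string "[a-zA-Z]\\s-\\cc" "" text))

(my/strip-whole-match "Emacs 中文")  ; => "Emac文"
```

To keep the neighbours, they have to be captured in groups and put back in the replacement.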

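Samray · March 6, 2017, 14:18

I was only testing. What I originally had was

(replace-regexp-in-string "\\([a-zA-z]\s[^\x00-\xff]\\)" "\\([a-zA-z][^\x00-\xff]\\)"

In re-builder, the regexp above does match the space, but it also matches the letter before it and the Chinese character after it. Your \\s- behaves the same way.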

et2010 · March 6, 2017, 14:20

Try this:

(replace-regexp-in-string "\\([a-zA-Z]\\)\\s-+\\(\\cc\\)" "\\1\\2" (thing-at-buffers-or-at-region))

The forum won't let me type this character; the one before the 2 in the middle is 1.

Samray · March 6, 2017, 14:23

I don't quite understand what "the one before the 2 in the middle is 1" means :joy:

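smallst · March 6, 2017, 14:23

Written your way, it should be

(replace-regexp-in-string "\\([a-zA-Z]\\)\s\\(\\cc\\)" "\\1\\2"
                          (thing-at-buffers-or-at-region))

That said, I think this kind of replacement usually isn't wrapped in a function; you just run replace-regexp with \([a-zA-Z]\) \(\cc\) as the regexp and \1\2 as the replacement. Of course, if you really use this a lot, writing it as a function is great.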

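But my guess is that what you actually need is

(replace-regexp-in-string "\\([a-zA-Z]\\)\s\\(\\cc\\)\\|\\(\\cc\\)\s\\([a-zA-Z]\\)" "\\1\\2\\3\\4"
                          (thing-at-buffers-or-at-region))

After all, deleting only the English-then-Chinese spaces would be odd. Wouldn't you want to delete the Chinese-then-English spaces too?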
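The two-direction version can be tried on plain strings like this. It is only a sketch: the helper name is made up, and it uses `[ \t]+` for the gap as an assumption so that runs of spaces or tabs are covered.

```elisp
(defun my/join-letters-and-chinese (text)
  "Return TEXT without spaces between an ASCII letter and a Chinese character.
Hypothetical helper name; handles letter-then-Chinese and Chinese-then-letter."
  (replace-regexp-in-string
   "\\([a-zA-Z]\\)[ \t]+\\(\\cc\\)\\|\\(\\cc\\)[ \t]+\\([a-zA-Z]\\)"
   ;; Only one alternative matches at a time; the groups of the other
   ;; alternative did not match and are replaced with nothing.
   "\\1\\2\\3\\4"
   text))

(my/join-letters-and-chinese "Emacs 中文 Lisp")  ; => "Emacs中文Lisp"
```

One limitation to be aware of: matches cannot overlap, so a single Chinese character with a space on both sides (for example `a 中 b`) is only joined on one side in a single pass.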

Samray · March 6, 2017, 14:25

Your guess is exactly right, but I figured I should describe the simple case first. Once that is solved, I can generalize from it.

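et2010 · March 6, 2017, 14:28

No luck, I can't type it in, so I can only post a screenshot.

I just tried this and it works.

[[:alnum:]] covers both letters and digits; if you only want letters, you can use [[:alpha:]].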
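One caveat worth checking before swapping in these classes: as far as I know, `[[:alpha:]]` and `[[:alnum:]]` in Emacs are not limited to ASCII and also match Chinese characters, in which case the pattern would join `中文 中文` as well. A small sketch for checking this in your own Emacs (the helper name is made up):

```elisp
(defun my/matches-p (regexp text)
  "Hypothetical helper: return t if REGEXP matches anywhere in TEXT, else nil."
  (and (string-match-p regexp text) t))

(my/matches-p "[[:alpha:]]" "中")  ; a Chinese character counts as a letter here
(my/matches-p "[a-zA-Z]" "中")     ; the explicit ASCII range does not match it
```

If only ASCII letters should count as "English", the explicit `[a-zA-Z]` (plus `0-9` if digits are wanted) seems the safer choice.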

smallst · March 6, 2017, 14:30

Excellent :grin:

Samray · March 6, 2017, 14:34

Why doesn't it work when I test it? The function:

(defun thing-at-buffers-or-at-region ()
  "Return the text of the active region, or of the whole buffer if there is none."
  (if (region-active-p)
      (buffer-substring-no-properties
       (region-beginning)
       (region-end))
    (buffer-substring-no-properties
     (point-min) (point-max))))
(defun samray/replace-whitespace-between-letter-and-chinese-char ()
  "Delete the whitespace between a letter and a Chinese character.
My Chinese input method leaves a space there, so just delete it."
  (interactive)
  (save-excursion
    (replace-regexp-in-string "\\([[:alnum:]]\\)\\s-+\\(\\cc\\)" "\\1\\2" (thing-at-buffers-or-at-region))))

Test text:

Emacs 中文
Emacs Lisp
中文 中文

Shouldn't the result be

Emacs中文
Emacs Lisp
中文 中文

? But nothing happens at all.
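One likely reason nothing changes: `replace-regexp-in-string` returns a new string and never modifies the buffer, and the command above discards that return value. Below is a minimal sketch of a command that edits the buffer (or the active region) in place. The `my/` name is not from the thread, and the sketch assumes ASCII letters only. It also uses `[ \t]+` instead of `\\s-+`, because in some syntax tables a newline counts as whitespace, and `\\s-+` could then join a line ending in a letter with a following line that starts with Chinese.

```elisp
(defun my/delete-space-between-letter-and-chinese (beg end)
  "Delete spaces between an ASCII letter and a Chinese char from BEG to END.
Interactively, act on the active region, or on the whole buffer without one.
Hypothetical command name; only the letter-then-Chinese direction is handled."
  (interactive (if (use-region-p)
                   (list (region-beginning) (region-end))
                 (list (point-min) (point-max))))
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (re-search-forward "\\([a-zA-Z]\\)[ \t]+\\(\\cc\\)" nil t)
        ;; Keep the letter and the Chinese character, drop the spaces.
        ;; The second argument (FIXEDCASE) stops Emacs from altering case.
        (replace-match "\\1\\2" t)))))
```

With the three test lines in a buffer, `M-x my/delete-space-between-letter-and-chinese` should change only the first line.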

et2010 · March 6, 2017, 14:36

This one matches both cases.

Never mind, that was wrong.

et2010 · March 6, 2017, 14:37

Are you using Spacemacs? If so, watch out: the chinese layer adds fake spaces between English and Chinese characters. That is the pangu-spacing package at work.

Samray · March 6, 2017, 14:39

I'm using vanilla Emacs, so I don't think that would have any effect.

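et2010 · March 6, 2017, 14:41

Speaking of pangu-spacing, you could look at that package for reference. It is a package for handling the spacing between Chinese and English: it uses overlays to add fake spaces between Chinese and English text, and it also relies on regular expressions. It is hosted on GitHub as well.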