What does Emacs display by default when it cannot recognize a file's encoding?

chansey97 · December 7, 2021, 18:21

I have a file test-gbk.txt whose content is

啊

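Note: this file is encoded in GBK, so its actual bytes are B0 A1.

Since my Emacs is configured to use UTF-8 by default, opening this file does not display the character correctly: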

\260\241

I can make it display correctly with the command: M-x revert-buffer-with-coding-system chinese-gbk
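A quick way to see why the UTF-8 default fails while GBK works is to decode the same two bytes both ways. This is a minimal Python sketch, assuming Python's built-in `gbk` codec is a fair stand-in for Emacs's `chinese-gbk` coding system; `try_decode` is a hypothetical helper and the sketch does not model Emacs itself.

```python
# The two bytes stored in test-gbk.txt, as described above.
data = b"\xb0\xa1"

def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# As GBK, the two bytes form a single character.
print(try_decode(data, "gbk"))    # 啊

# As UTF-8, 0xB0 is a continuation byte with no lead byte, so decoding fails.
print(try_decode(data, "utf-8"))  # None
```

When decoding fails like this, Emacs keeps the bytes as they are instead of guessing, which is what produces the escaped display shown above.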

My question: what does the \260\241 above mean? It doesn't look like GBK encoding.

Note: \260 already exceeds what a single byte can hold.

Youmu · December 7, 2021, 19:18

\260 and \241 are octal, that is:

\260 = 0xb0, \241 = 0xa1
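The escape form can be reproduced by writing each undecodable byte as a backslash followed by three octal digits. A small Python sketch, where `octal_escape` is a hypothetical helper that only mimics the default look of Emacs's raw-byte display, not its actual code:

```python
def octal_escape(raw: bytes) -> str:
    """Render each byte as a backslash plus three octal digits, e.g. 0xB0 -> \\260."""
    return "".join(f"\\{b:03o}" for b in raw)

data = b"\xb0\xa1"          # the GBK bytes of 啊
print(octal_escape(data))   # \260\241

# Reading the escapes back as octal gives the original hex bytes.
print(hex(int("260", 8)), hex(int("241", 8)))  # 0xb0 0xa1
```

Three octal digits are enough for any byte, since the largest value, 255, is 377 in octal.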

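chansey97 · December 7, 2021, 20:00

At first I read them as decimal, but the more I thought about it the less sense it made. I didn't expect Emacs to use octal here, and by default at that.

Thanks.

chansey97 · December 8, 2021, 14:46

To add to this: these values are called raw bytes, and they belong to a special character set called eight-bit.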

On rare occasions, Emacs encounters raw bytes: single bytes whose values are in the range 128 (0200 octal) through 255 (0377 octal), which Emacs cannot interpret as part of a known encoding of some non-ASCII character. Such raw bytes are treated as if they belonged to a special character set eight-bit; Emacs displays them as escaped octal codes (this can be customized; see Display Custom). In this case, C-x = shows ‘raw-byte’ instead of ‘file’. In addition, C-x = shows the character codes of raw bytes as if they were in the range #x3FFF80..#x3FFFFF, which is where Emacs maps them to distinguish them from Unicode characters in the range #x0080..#x00FF.
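The two ranges in the quoted text line up so that a raw byte b is reported at #x3FFF00 + b (0x80 goes to #x3FFF80, 0xFF to #x3FFFFF). The Python sketch below only illustrates that arithmetic: the function names are hypothetical, and the offset is inferred from the ranges quoted above rather than taken from Emacs's source.

```python
RAW_BYTE_OFFSET = 0x3FFF00  # inferred: maps 0x80..0xFF onto #x3FFF80..#x3FFFFF

def raw_byte_to_char_code(b: int) -> int:
    """Map a raw byte (128..255) to the character code C-x = would report."""
    if not 0x80 <= b <= 0xFF:
        raise ValueError(f"not a raw byte: {b:#x}")
    return RAW_BYTE_OFFSET + b

def char_code_to_raw_byte(code: int) -> int:
    """Inverse mapping: recover the original byte value."""
    if not 0x3FFF80 <= code <= 0x3FFFFF:
        raise ValueError(f"not a raw-byte character code: {code:#x}")
    return code - RAW_BYTE_OFFSET

# The two bytes from test-gbk.txt.
for b in b"\xb0\xa1":
    code = raw_byte_to_char_code(b)
    print(f"byte {b:#04x} (octal \\{b:03o}) -> #x{code:X}")
```

Because this range lies above the largest Unicode code point (#x10FFFF), a raw byte can never be confused with a real character such as U+00B0.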

yssource · December 9, 2021, 04:08

I ran into a strange problem.

echo "ibase=8;obase=16;260" | bc outputs C8, which is wrong.

echo "obase=16;ibase=8;260" | bc

outputs B0, which is correct.

Strange behavior. Is this a bug in bc?

cismonx · December 9, 2021, 04:43

echo "ibase=8;obase=16;260" | bc

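Here, ibase=8 makes every number that follows be parsed as octal, including the 16 in obase=16. So obase is actually set to octal 16, i.e. decimal 14, and 260 (octal, 176 in decimal) printed in base 14 is C8, since 12 × 14 + 8 = 176.

Decimal 16 is 020 in octal, so: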

echo "ibase=8;obase=20;260" | bc

gives B0, as expected.

Reference: bc (The Open Group POSIX specification)

The value of a NUMBER token shall be interpreted as a numeral in the base specified by the value of the internal register ibase (described below).
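The quoted rule explains the C8 completely, including the number on the right of `obase=`. Here is a toy Python model of just that rule; `mini_bc` and `to_base` are hypothetical helpers that handle only `ibase=N`, `obase=N` and bare non-negative integers with bases 2 to 16. They are not a bc implementation.

```python
DIGITS = "0123456789ABCDEF"

def to_base(n: int, base: int) -> str:
    """Format a non-negative integer in base 2..16 with uppercase digits."""
    if not 2 <= base <= 16:
        raise ValueError("this sketch only supports bases 2..16")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(DIGITS[r])
    return "".join(reversed(digits))

def mini_bc(program: str) -> str:
    """Toy model: evaluates 'ibase=N', 'obase=N' and bare integers separated by ';'."""
    ibase, obase = 10, 10
    printed = []
    for stmt in program.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        if "=" in stmt:
            name, literal = (part.strip() for part in stmt.split("=", 1))
            # Key point: the right-hand side is read in the *current* ibase.
            value = int(literal, ibase)
            if name == "ibase":
                ibase = value
            elif name == "obase":
                obase = value
            else:
                raise ValueError(f"unsupported variable: {name}")
        else:
            # A bare number is parsed in ibase and printed in obase.
            printed.append(to_base(int(stmt, ibase), obase))
    return "\n".join(printed)

print(mini_bc("ibase=8;obase=16;260"))  # C8
print(mini_bc("obase=16;ibase=8;260"))  # B0
print(mini_bc("ibase=8;obase=20;260"))  # B0
```

Applying this single rule reproduces the C8 that looked like a bug, and both workarounds from this thread give B0.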

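yssource · December 9, 2021, 05:33

Thanks, that opengroup page is very detailed.

I had consulted the bc Command Manual before but didn't find this explanation there.

Coelacanthus · December 23, 2021, 12:08

Given this, when writing bc expressions it's better to set obase first and ibase second, which is more intuitive.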

echo "obase=16;ibase=8;260" | bc