How to remove spaces from Markdown exported by ox-hugo when auto-fill-mode is on

Tisoga · June 6, 2019, 06:03

I currently write my blog with org-mode + ox-hugo, with auto-fill-mode enabled in org-mode. This causes a problem on export: spaces show up between some Chinese characters. For example:

这是一个测试
文本

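After export, the text becomes 这是一个测试 文本. auto-fill-mode hard-wrapped the line after 测试, so the exported Markdown file is wrapped at the same place, and when Hugo renders that Markdown to HTML the line break turns into an extra space.

My current workaround is to use truncate-lines in org-mode and turn off auto-fill-mode, so that the whole paragraph stays on a single line.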
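A minimal sketch of that workaround, assuming it is applied through org-mode-hook. The function name my/org-no-auto-fill is made up for illustration:

```elisp
;; Hypothetical helper: stop hard-wrapping and truncate long lines instead,
;; so each paragraph stays on one physical line in the Org buffer.
(defun my/org-no-auto-fill ()
  "Disable `auto-fill-mode' and truncate long lines in this buffer."
  (auto-fill-mode -1)
  (setq truncate-lines t))

(add-hook 'org-mode-hook #'my/org-no-auto-fill)
```

With this in place, no newline is written into the paragraph, so there is nothing for the exporter to turn into a space.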

Does anyone have a more elegant solution that keeps auto-fill-mode on while still getting the spacing between Chinese characters right? If not, I will have to hack ox-hugo.

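smallzhan · June 6, 2019, 06:21

This is probably the one. It targets HTML output; the same change should work for md.

[Solved] How to remove spaces between Chinese characters in HTML exported by org-mode? Emacs-general

Here is the situation. The whole org file mixes Chinese and English, possibly one paragraph in Chinese and the next in English. Both kinds of paragraph currently go through fill-paragraph, so a paragraph like this can appear: 在一些节中，我使用了英语，主要是因为，这一方面，在西方的文学作品中更广为人知。他们中的一些词，都对应着中文的同一个词，很难去很好的说明。当然，我会尽量给关键词加上中文。 The HTML it generates is: <p> "在一些节中，我使用了英语，主要是因为，这一方面，在西方的文学作品中更广为人知。他们中的一些词，都对应着中文的同一个词，很难去很好的说明。当然，我会尽量给关键词加上中文。" </p> Note that there is a space wherever a line break fell, such as after 他们 and after the 加上 near the end. That would be fine if the text were all English, but because it is Chinese, the rendered HTML ends up with spaces suddenly inserted between characters. So the question is: how do I get rid of these spaces? At first I added \n:t to #+OPTIONS, which makes the HTML break lines where org mode does, but that is not a good approach across different screen sizes and may cause problems on small screens (没到…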

Tisoga · June 6, 2019, 06:47

@smallzhan This is included in the Doom Emacs chinese layer, but it has a problem:

org-html-paragraph: Wrong number of arguments: ((t) (paragraph contents info) "Join consecutive Chinese lines into a single long line without unwanted space
when exporting org-mode to html." (let* ((fix-regexp "[[:multibyte:]]") (origin-contents contents) (fixed-contents (replace-regexp-in-string (concat "\\(" fix-regexp "\\) *
 *\\(" fix-regexp "\\)") "\\1\\2" origin-contents))) (list paragraph fixed-contents info))), 1

smallzhan · June 6, 2019, 06:53

It works fine for me. This one is for HTML. I think for ox-hugo you should advise org-hugo-paragraph or something similar.

Tisoga · June 11, 2019, 06:25

I modified it a bit, and now it works with ox-hugo.

  (defadvice org-hugo-paragraph (before org-hugo-paragraph-advice
                                        (paragraph contents info) activate)
    "Join consecutive Chinese lines into a single long line without
unwanted space when exporting org-mode to hugo markdown."
    (let* ((origin-contents (ad-get-arg 1))
           (fix-regexp "[[:multibyte:]]")
           (fixed-contents
            (replace-regexp-in-string
             (concat
              "\\(" fix-regexp "\\) *\n *\\(" fix-regexp "\\)") "\\1\\2" origin-contents)))
      (ad-set-arg 1 fixed-contents)))

tumashu · June 11, 2019, 08:39

Better to use advice-add; defadvice is probably no longer recommended.
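A sketch of what the advice-add version might look like, assuming org-hugo-paragraph takes the three arguments (paragraph contents info) shown in the defadvice above. The my/ function names are made up for illustration, and the regexp is the same one, so it matches any multibyte character, not only Chinese:

```elisp
;; Hypothetical helper: drop the line break (and surrounding spaces)
;; between two multibyte characters.
(defun my/join-multibyte-lines (contents)
  "Return CONTENTS with line breaks between multibyte characters removed."
  (replace-regexp-in-string
   "\\([[:multibyte:]]\\) *\n *\\([[:multibyte:]]\\)" "\\1\\2" contents))

;; :filter-args advice receives the argument list and returns a new one.
(defun my/org-hugo-paragraph-join-lines (args)
  "Rewrite the CONTENTS element of ARGS for `org-hugo-paragraph'."
  (let ((contents (nth 1 args)))
    (if (stringp contents)
        (list (nth 0 args) (my/join-multibyte-lines contents) (nth 2 args))
      args)))

(with-eval-after-load 'ox-hugo
  (advice-add 'org-hugo-paragraph :filter-args
              #'my/org-hugo-paragraph-join-lines))
```

The advice can be removed again with (advice-remove 'org-hugo-paragraph #'my/org-hugo-paragraph-join-lines).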

kaushalmodi · June 11, 2019, 15:09

Hello, I went by just how Google Translate translated this thread to English… so this issue should be now fixed in ox-hugo.

You can see the fix and updated tests in Do not insert newline/space between multibyte chars (e.g. Chinese) · kaushalmodi/ox-hugo@ad9eb05 · GitHub.

Btw feel free to directly open an issue on the ox-hugo repo in the future.

yssource · June 11, 2019, 15:18

It is amazing. You go to this website. I like ox-hugo very much.

kaushalmodi · June 11, 2019, 16:25

I found this thread by utter chance. I thought, let me google “ox-hugo”, and I found this.

Tisoga · June 12, 2019, 02:54

Awesome, nice work!

JJPandari · June 13, 2019, 07:30

nah, this site usually has a really high rank in search results as I inspect, my guess is that Discourse (the forum template/software) really knows what it’s doing.

Kathy_H · July 14, 2019, 23:59

My goodness, you are incredible. I browsed many ox-hugo posts in forums and noticed that you’ve been searching articles/questions and providing feedback/answers. I just wasn’t expecting you’d go all the way to translate a post in order to answer it. Thank you.

kaushalmodi · October 28, 2019, 14:09

Hello all,

I would like to get feedback on this design change decision in ox-hugo related to this auto-fill behavior.

The change in Do not insert newline/space between multibyte chars (e.g. Chinese) · kaushalmodi/ox-hugo@ad9eb05 · GitHub caused a regression for ox-hugo users using multi-byte chars other than Chinese.

I realized that after this bug report: Cyrillic text issue · Issue #300 · kaushalmodi/ox-hugo · GitHub.

I am not surprised seeing that bug report because that auto-fill behavior change is not common in the multi-byte char scripts I know of too: Gujarati, Hindi.

So, would it be OK if I enable the auto-fill behavior from the above-mentioned commit only if #+language: zh is set in the Org file?

I think that would be much simpler and performance efficient than using regular expression to parse if the multi-byte char fits in the Chinese char range of unicodes. I will also make that an elisp variable that can hold a list of strings, so that people can add more language codes to that list.
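A minimal sketch of that idea, not ox-hugo's actual code. The variable and function names are hypothetical, and the language value is assumed to arrive as a plain string such as "zh":

```elisp
;; Hypothetical user option: language codes for which wrapped lines
;; between multibyte characters are joined on export.
(defvar my/unfill-languages '("zh")
  "List of #+language codes that enable joining of wrapped lines.")

(defun my/unfill-language-p (language)
  "Return t if LANGUAGE is listed in `my/unfill-languages'."
  (and (member language my/unfill-languages) t))
```

A user who wants the same behavior for another language would then only need something like (add-to-list 'my/unfill-languages "ja"). The check is a single list lookup per export rather than a per-character test.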

Thoughts?

/cc @Tisoga @Kathy_H @yssource

kaushalmodi · November 2, 2019, 14:56

Hello all, is there any issue with my suggestion above?

I will go ahead with making that change next week.

Kathy_H · November 14, 2019, 13:47

Hi Kaushal,

Thanks for the enquiry which is very thoughtful. I know little about elisp development so I am not qualified to give feedback regarding the technicality.

I think that would be much simpler and performance efficient than using regular expression to parse if the multi-byte char fits in the Chinese char range of unicodes.

I do agree with this opinion. As long as this is stated in the manual (or in the solved GitHub issue), it sounds good to me.

Please feel free to let me know if there are further issues. And thanks again for the prompt improvement.

kaushalmodi · November 29, 2019, 11:55

I was on vacation. But this change (use of #+language to disable line breaks between multibyte chars) will be committed very soon, this week. Thanks for the feedback.

kaushalmodi · November 30, 2019, 05:17

Hello,

I have now finished the change I suggested, but with a minor change.

The user will not be setting the earlier suggested #+language: zh keyword.

ox-hugo will try to get the locale of the user from some default environment variables (see https://ox-hugo.scripter.co/doc/cjk-support/).

If that does not work automatically, the user can explicitly specify the locale to be Chinese by adding the #+hugo_locale: zh keyword, or the :EXPORT_HUGO_LOCALE: zh property to a subtree’s property drawer. (Any locale string that begins with “zh” will work, like “zh_CH”.)

Example:

#+hugo_base_dir: .
#+hugo_locale: zh_CH

* Filling automatically not preserved for Chinese characters (preserve filling on)
:PROPERTIES:
:EXPORT_FILE_NAME: filling-not-preserved-for-chinese-characters--preserve-filling-on
:EXPORT_HUGO_PRESERVE_FILLING: t
:END:
#+begin_description
Ensure that multi-byte characters are force-unwrapped if the locale is
manually set or auto-detected as Chinese.
#+end_description
abc
def
ghi
这是一个测试
文本
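The locale rule described above (any locale string beginning with “zh”) can be sketched as follows. This is an illustration under stated assumptions, not ox-hugo's implementation: both function names are hypothetical, and the locale is assumed to be available as a string or nil.

```elisp
;; Hypothetical predicate for the "begins with zh" rule.
(defun my/chinese-locale-p (locale)
  "Return non-nil if LOCALE is a string beginning with \"zh\"."
  (and (stringp locale) (string-prefix-p "zh" locale)))

;; Hypothetical wrapper: join wrapped multibyte lines only for Chinese locales.
(defun my/maybe-unfill (contents locale)
  "Join wrapped multibyte lines in CONTENTS if LOCALE is Chinese."
  (if (my/chinese-locale-p locale)
      (replace-regexp-in-string
       "\\([[:multibyte:]]\\) *\n *\\([[:multibyte:]]\\)" "\\1\\2" contents)
    contents))
```

Applied to the body of the example above with locale "zh_CH", only the break between 测试 and 文本 is removed; the breaks after abc, def and ghi stay, because the characters around them are not multibyte. With a non-Chinese locale such as "ru_RU" the text is returned unchanged.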