How to remove the spaces in Markdown exported by ox-hugo when auto-fill-mode is on

I currently write my blog with org-mode + ox-hugo, with auto-fill-mode enabled in org-mode. This causes a problem on export: spaces appear between some Chinese characters. For example:

这是一个测试
文本

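After export the text becomes 这是一个测试 文本. auto-fill-mode automatically broke the line after 测试, so the exported Markdown file contains the same line break, and when Hugo renders that Markdown to HTML the line break shows up as an extra space.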

My current workaround is to use truncate-lines in org-mode and turn off auto-fill-mode, so that each paragraph effectively stays on a single line.
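The workaround described above could be written in an init file roughly as follows. This is only a sketch: the function name `my/org-soft-wrap-setup` is hypothetical, and it uses `visual-line-mode` for on-screen wrapping, which is a swapped-in assumption in place of the truncate-lines setting mentioned in the post.

```elisp
;; Hypothetical helper: the name is illustrative, not from the thread.
(defun my/org-soft-wrap-setup ()
  "Turn off hard wrapping in the current buffer and wrap long lines visually."
  ;; Stop Emacs from inserting real newlines while typing.
  (auto-fill-mode -1)
  ;; Assumption: soft-wrap at word boundaries rather than truncating lines.
  (visual-line-mode 1))

(add-hook 'org-mode-hook #'my/org-soft-wrap-setup)
```

With this in place each paragraph is stored as one long line, so the exporter has no line break to turn into a space.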

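Does anyone know of a more elegant solution that keeps auto-fill-mode on while still getting the spacing between Chinese characters right? If there really is none, I will have to hack ox-hugo.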

It's probably this one. That is for HTML output; the same modification should work for md.

@smallzhan This is included in the Doom Emacs chinese layer, but it has a problem:

org-html-paragraph: Wrong number of arguments: ((t) (paragraph contents info) "Join consecutive Chinese lines into a single long line without unwanted space
when exporting org-mode to html." (let* ((fix-regexp "[[:multibyte:]]") (origin-contents contents) (fixed-contents (replace-regexp-in-string (concat "\\(" fix-regexp "\\) *
 *\\(" fix-regexp "\\)") "\\1\\2" origin-contents))) (list paragraph fixed-contents info))), 1

It works fine for me. That one is for HTML. I think for ox-hugo you should advise org-hugo-paragraph or something similar.

I modified it a bit, and it now works in ox-hugo.

  (defadvice org-hugo-paragraph (before org-hugo-paragraph-advice
                                        (paragraph contents info) activate)
    "Join consecutive Chinese lines into a single long line without
unwanted space when exporting org-mode to hugo markdown."
    (let* ((origin-contents (ad-get-arg 1))
           (fix-regexp "[[:multibyte:]]")
           (fixed-contents
            (replace-regexp-in-string
             (concat
              "\\(" fix-regexp "\\) *\n *\\(" fix-regexp "\\)") "\\1\\2" origin-contents)))
      (ad-set-arg 1 fixed-contents)))

Better to use advice-add; defadvice is probably no longer recommended.
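Following that suggestion, the snippet above might be rewritten with `advice-add` along these lines. This is a sketch, not a tested drop-in: the function name `my/org-hugo-join-multibyte-lines` is hypothetical, and it assumes `org-hugo-paragraph` is called with the three arguments (paragraph contents info) shown in the posts above. A `:filter-args` advice receives all arguments as a single list and must return a list, which is consistent with the "Wrong number of arguments ... 1" error quoted earlier for a three-parameter function.

```elisp
;; Hypothetical function name; the regexp is the one from the post above.
(defun my/org-hugo-join-multibyte-lines (args)
  "Return ARGS with line breaks between multibyte characters removed.
ARGS is the argument list (PARAGRAPH CONTENTS INFO) intended for
`org-hugo-paragraph'."
  (let ((contents (nth 1 args)))
    (if (stringp contents)
        (list (nth 0 args)
              (replace-regexp-in-string
               "\\([[:multibyte:]]\\) *\n *\\([[:multibyte:]]\\)"
               "\\1\\2" contents)
              (nth 2 args))
      ;; Leave the arguments alone when there is no text to fix.
      args)))

;; Install the advice only once ox-hugo has been loaded.
(with-eval-after-load 'ox-hugo
  (advice-add 'org-hugo-paragraph :filter-args
              #'my/org-hugo-join-multibyte-lines))
```

The advice can be removed again with `(advice-remove 'org-hugo-paragraph #'my/org-hugo-join-multibyte-lines)`.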

Hello, I went by just how Google Translate translated this thread to English… so this issue should be now fixed in ox-hugo.

You can see the fix and updated tests in Do not insert newline/space between multibyte chars (e.g. Chinese) · kaushalmodi/ox-hugo@ad9eb05 · GitHub.

Btw feel free to directly open an issue on the ox-hugo repo in future :smile:.

It's amazing that you found your way to this website. I like ox-hugo very much. :+1:

I found this thread by utter chance. I thought, let me google “ox-hugo” and I found this :smile:.

Awesome, nice work!

nah, this site usually has a really high rank in search results as I inspect, my guess is that Discourse (the forum template/software) really knows what it’s doing.

My goodness, you are incredible. I have browsed many ox-hugo posts in forums and noticed that you have been searching for articles and questions and providing feedback and answers. I just wasn't expecting you'd go as far as translating a thread to answer it. Thank you.

Hello all,

I would like to get feedback on this design change decision in ox-hugo related to this auto-fill behavior.

The change in Do not insert newline/space between multibyte chars (e.g. Chinese) · kaushalmodi/ox-hugo@ad9eb05 · GitHub caused a regression for ox-hugo users using multi-byte chars other than Chinese.

I realized that after this bug report: Cyrillic text issue · Issue #300 · kaushalmodi/ox-hugo · GitHub .

I am not surprised seeing that bug report because that auto-fill behavior change is not common in the multi-byte char scripts I know of too: Gujarati, Hindi.

So, would it be OK if I enable the autofill behavior from the above-mentioned commit only if #+language: zh is set in the Org file?

I think that would be much simpler and performance efficient than using regular expression to parse if the multi-byte char fits in the Chinese char range of unicodes. I will also make that an elisp variable that can hold a list of strings, so that people can add more language codes to that list.
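The proposal above might look roughly like the following sketch. The variable and function names are hypothetical and are not ox-hugo's actual API; the point is only that a list-membership test on the language code avoids inspecting every character with a regular expression.

```elisp
;; Hypothetical names, for illustration only; not part of ox-hugo.
(defvar my/unwrap-languages '("zh")
  "Language codes for which line breaks between multibyte chars are dropped.")

(defun my/unwrap-language-p (language)
  "Return t if LANGUAGE is listed in `my/unwrap-languages', else nil."
  (and (stringp language)
       (member language my/unwrap-languages)
       t))
```

Under this design a user of another script that wants the same behavior would extend the list, for example with `(add-to-list 'my/unwrap-languages "ja")`.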

Thoughts?

/cc @Tisoga @Kathy_H @yssource

Hello all, is there any issue with my suggestion above?

I will go ahead with making that change next week.

Hi Kaushal,

Thanks for the enquiry which is very thoughtful. I know little about elisp development so I am not qualified to give feedback regarding the technicality.

I think that would be much simpler and performance efficient than using regular expression to parse if the multi-byte char fits in the Chinese char range of unicodes.

I do agree with this opinion. As long as this is stated in the manual (or in the solved GitHub issue), it sounds good to me.

Please feel free to let me know for further issue. And thanks again for the prompt improvement.

I was on vacation. But this change (use of #+language to disable line breaks between multibyte chars) will be committed very soon, this week. Thanks for the feedback.

Hello,

I have now finished the change I suggested, but with a minor change.

The user will not be setting the earlier suggested #+language: zh keyword.

ox-hugo will try to get the locale of the user from some default environment variables (see https://ox-hugo.scripter.co/doc/cjk-support/).

If that does not work automatically, the user can explicitly specify the locale to be Chinese by adding the #+hugo_locale: zh keyword, or the :EXPORT_HUGO_LOCALE: zh property to a subtree's property drawer. (Any locale string that begins with "zh" will work, like "zh_CH".)
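The check described above can be pictured with a small sketch. It is not ox-hugo's implementation: the function names are hypothetical, and the environment variables consulted (and their order) are assumptions made for illustration. Only the "begins with zh" rule comes from the post.

```elisp
(require 'seq)

;; Hypothetical helpers; not ox-hugo's actual code.
(defun my/guess-locale ()
  "Return the first non-empty value among a few locale environment variables."
  ;; Assumption: the variables checked here, and their order, are illustrative.
  (seq-find (lambda (value) (and value (not (string= value ""))))
            (mapcar #'getenv '("LANGUAGE" "LC_ALL" "LANG"))))

(defun my/locale-chinese-p (&optional locale)
  "Return non-nil if LOCALE begins with \"zh\".
When LOCALE is nil, fall back to `my/guess-locale'."
  (let ((loc (or locale (my/guess-locale))))
    (and (stringp loc)
         (string-prefix-p "zh" loc))))
```

An explicitly set value such as "zh_CH" is passed as LOCALE and takes precedence over whatever the environment reports.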

Example:

#+hugo_base_dir: .
#+hugo_locale: zh_CH

* Filling automatically not preserved for Chinese characters (preserve filling on)
:PROPERTIES:
:EXPORT_FILE_NAME: filling-not-preserved-for-chinese-characters--preserve-filling-on
:EXPORT_HUGO_PRESERVE_FILLING: t
:END:
#+begin_description
Ensure that multi-byte characters are force-unwrapped if the locale is
manually set or auto-detected as Chinese.
#+end_description
abc
def
ghi
这是一个测试
文本