[Discussion] One possible way to improve the English reading experience

Haha, not me. Words that appear often get remembered just from repeated exposure; if a word never shows up again, there was no need to memorize it anyway, so nothing is lost.

@Jousimies That said, if you have a good Anki integration workflow, do share it with everyone; someone might need it. :smirk:

@ginqi7 On your end, when you use dictionary-overlay-jump-prev(next)-unknown-word, does hl-line-mode ever end up out of sync with the cursor? ↓↓

Even calling (websocket-bridge-call-buffer "jump_next_unknown_word") directly, I see the same desync; I'm not sure what causes it.

(setq dictionary-overlay-translators '("local" "darwin" "sdcv" "web"))

After using it for a while: putting local and darwin (the native dictionary) first immediately raises translation quality, with far fewer garbled and mistranslated entries from the web backend.

Today I tried dictionary-overlay-install on Windows:

Collecting six==1.16.0
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting snowballstemmer==2.2.0
  Using cached snowballstemmer-2.2.0-py2.py3-none-any.whl (93 kB)
Collecting tokenizers==0.13.2
  Using cached tokenizers-0.13.2.tar.gz (359 kB)
  Installing build dependencies: started
  Installing build dependencies: still running...
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting websocket-bridge-python==0.0.2
  Using cached websocket_bridge_python-0.0.2-py3-none-any.whl
Collecting websockets==10.4
  Using cached websockets-10.4-cp311-cp311-win_amd64.whl (101 kB)
Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (pyproject.toml): started
  Building wheel for tokenizers (pyproject.toml): finished with status 'error'
  error: subprocess-exited-with-error
  
  Building wheel for tokenizers (pyproject.toml) did not run successfully.
  exit code: 1
  
  [51 lines of output]
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-cpython-311
  creating build\lib.win-amd64-cpython-311\tokenizers
  copying py_src\tokenizers\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers
  creating build\lib.win-amd64-cpython-311\tokenizers\models
  copying py_src\tokenizers\models\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\models
  creating build\lib.win-amd64-cpython-311\tokenizers\decoders
  copying py_src\tokenizers\decoders\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\decoders
  creating build\lib.win-amd64-cpython-311\tokenizers\normalizers
  copying py_src\tokenizers\normalizers\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\normalizers
  creating build\lib.win-amd64-cpython-311\tokenizers\pre_tokenizers
  copying py_src\tokenizers\pre_tokenizers\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\pre_tokenizers
  creating build\lib.win-amd64-cpython-311\tokenizers\processors
  copying py_src\tokenizers\processors\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\processors
  creating build\lib.win-amd64-cpython-311\tokenizers\trainers
  copying py_src\tokenizers\trainers\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\trainers
  creating build\lib.win-amd64-cpython-311\tokenizers\implementations
  copying py_src\tokenizers\implementations\base_tokenizer.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
  copying py_src\tokenizers\implementations\bert_wordpiece.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
  copying py_src\tokenizers\implementations\byte_level_bpe.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
  copying py_src\tokenizers\implementations\char_level_bpe.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
  copying py_src\tokenizers\implementations\sentencepiece_bpe.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
  copying py_src\tokenizers\implementations\sentencepiece_unigram.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
  copying py_src\tokenizers\implementations\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
  creating build\lib.win-amd64-cpython-311\tokenizers\tools
  copying py_src\tokenizers\tools\visualizer.py -> build\lib.win-amd64-cpython-311\tokenizers\tools
  copying py_src\tokenizers\tools\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\tools
  copying py_src\tokenizers\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers
  copying py_src\tokenizers\models\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\models
  copying py_src\tokenizers\decoders\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\decoders
  copying py_src\tokenizers\normalizers\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\normalizers
  copying py_src\tokenizers\pre_tokenizers\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\pre_tokenizers
  copying py_src\tokenizers\processors\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\processors
  copying py_src\tokenizers\trainers\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\trainers
  copying py_src\tokenizers\tools\visualizer-styles.css -> build\lib.win-amd64-cpython-311\tokenizers\tools
  running build_ext
  running build_rust
  error: can't find Rust compiler
  
  If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
  
  To update pip, run:
  
      pip install --upgrade pip
  
  and then retry package installation.
  
  If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
  [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

I separately tried upgrading pip and installing Rust, to no avail.

Looks like the tokenizers installation failed.

Try running pip install tokenizers on its own and see?

A question for @ginqi7: in my personal fork I added one rule: a token of five or more letters that contains no vowels is not considered a word. I wonder whether this can be written more concisely, and whether it could cause performance problems in large buffers?

import re

# in_or_stem_in and known_words come from the dictionary-overlay source.
def new_word_p(word: str) -> bool:
    if len(word) < 3:
        return False
    # Five or more letters and no vowels: treat as an abbreviation, not a word.
    if re.match(r"\b[^aeiou]{5,}\b", word, re.M | re.I):
        return False
    if re.search(r"[^a-z]", word, re.M | re.I):
        return False
    return not in_or_stem_in(word, known_words)

The \b shouldn't be needed: the words produced by splitting never contain whitespace. Performance shouldn't be a concern either; these are all matches against fairly short words.
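For reference, a minimal sketch with the \b anchors dropped and the patterns compiled once (which also settles the large-buffer performance worry); in_or_stem_in and known_words are assumed from the dictionary-overlay source, not defined here:

import re

VOWEL = re.compile(r"[aeiou]", re.I)      # does the token contain a vowel?
NON_ALPHA = re.compile(r"[^a-z]", re.I)   # any non-letter character?

def new_word_p(word: str) -> bool:
    if len(word) < 3:
        return False
    # Five or more letters with no vowel at all: abbreviation, not a word.
    if len(word) >= 5 and not VOWEL.search(word):
        return False
    if NON_ALPHA.search(word):
        return False
    return not in_or_stem_in(word, known_words)

For purely alphabetic tokens the behavior is the same as the regex version: a word of five or more letters with no vowel at all is rejected.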

Let me keep experimenting.

Another question: does Google Translate have a quota limit? I'm getting the message below, and I can't tell whether my network is genuinely bad (connectivity where I've been staying lately really is poor) or whether I've been throttled:

[Dictionary-overlay]web-translate error, check your network. or run (websocket-bridge-app-open-buffer 'dictionary-overlay) see the error details. [5 times]

Update: after running (websocket-bridge-app-open-buffer 'dictionary-overlay) I get: [screenshot]

You can check the messages in the Python log. It's probably not throttling; more likely your network is just flaky. That error text is a hint I emit from my own code.

Turns out git clone in the terminal is down too, so it must be my network.

One more question :grin: what does the "ran out of input" event refer to?

Odd, I've never run into that. No idea what's going on :rofl:

I plan to keep the log buffer visible for a day and see what other events I haven't come across yet.

Then you may want to manually comment out the print in run_and_log on the Python side, so it doesn't print too much.
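If run_and_log follows the usual websocket-bridge pattern (an assumption; check the actual body in the source), the change is just one commented line:

async def run_and_log(awaitable):
    # Sketch only: the real run_and_log in dictionary-overlay may differ.
    try:
        result = await awaitable
        # print(result)  # the chatty per-call print, commented out
    except Exception as e:
        print(e)  # keep errors visible so real failures still surface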


I still suspect Google Translate has temporarily throttled me; I'll watch it for a while.

One observation, though: if the web lookup fails, the new word only goes into the hash-table but no overlay is created? (As in the screenshot above: auto jump found the cursor position automatically, so the hash-table clearly works, but no overlay appears.)

You can get throttled even outside the GFW? :rofl:

The intended behavior is: if the lookup returns no result, skip the word. But I haven't actually looked at what happens in practice when a lookup comes back empty.
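On the Python side the intent would look roughly like this sketch; translate_word, add_overlay, and render_word are hypothetical names, not the actual dictionary-overlay API:

from typing import Optional

async def translate_word(word: str) -> Optional[str]:
    """Hypothetical stand-in for the translator chain; None on failure."""
    ...

def add_overlay(word: str, translation: str) -> None:
    """Hypothetical stand-in for the call that draws the overlay in Emacs."""
    ...

async def render_word(word: str) -> None:
    translation = await translate_word(word)
    if not translation:
        return  # lookup failed: word stays in the hash table, no overlay drawn
    add_overlay(word, translation)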

I think I may have hit a daily quota limit? I don't know whether such a mechanism even exists, and I have no hard evidence, haha. But over the past few days I've been pouring vocabulary into my knownwords.txt and opening hundreds of Wall Street Journal articles in elfeed, so maybe I queried a little too much? All I can do is wait 24h and check again.

When you get a chance, could you go offline and test whether rendering fails without a network? Or comment it out ↓

(setq dictionary-overlay-translators '("local" "darwin" "sdcv"
                                       ;; "web"
                                       ))

I've tried commenting it out, and that works.

Then I'm in for a long debugging journey... :cry:

Posting my personal configuration: I mainly use it for reading news in elfeed-entry-mode and eww. As it happens, manateelazycat updated popweb-dict just yesterday with a macro for quickly creating new dictionaries, so I wired it in right away. (Read, don't copy: it contains unpublished private configuration, so copying it verbatim will throw errors.)

First, popweb:

(use-package popweb
  :commands (popweb-org-roam-link-show
             popweb-latex-mode)
  :straight nil
  :config
  (require 'popweb-latex) (add-hook 'latex-mode-hook #'popweb-latex-mode)
  (require 'popweb-org-roam-link)
  (require 'popweb-url)
  (require 'popweb-dict)

  ;; NOTE 2022-12-05: personal API, local static html as url, demonstration only.
  (popweb-dict-create
   "youglish-api"
   (concat
    "file:///"
    (let ((temp-file (concat path-cache-dir "popweb/tmp.html")))
      (with-temp-file temp-file
        (insert-file-contents
         (concat path-emacs-dir "lisp/popweb-dict-yg-js.html"))
        (goto-char (point-min))
        (silenzio (replace-regexp "query" (concat word " :r"))))
      temp-file))
   "")

  ;; NOTE 2022-12-05: WIP
  (popweb-dict-create
   "forvo"
   "https://forvo.com/search/%s/en_usa/"
   (concat
    "window.scrollTo(0, 0); "
    "document.getElementsByTagName('html')[0].style.visibility = 'hidden'; "
    "document.getElementsByClassName('main_search')[0].style.visibility = 'visible'; "
    "document.getElementsByClassName('main_section')[0].style.visibility = 'visible'; "
    ;; "document.getElementsByClassName('left-content col')[0].style.visibility = 'visible'; "
    ;; "document.getElementsByTagName('header')[0].style.display = 'none'; "
    ;; "document.getElementsByClassName('contentPadding')[0].style.padding = '10px';"
    ))

  ;; NOTE 2022-12-05: far from perfect
  (popweb-dict-create
   "mw"
   "https://www.merriam-webster.com/dictionary/%s"
   (concat
    "window.scrollTo(0, 0); "
    "document.getElementsByTagName('html')[0].style.visibility = 'hidden'; "
    "document.getElementsByClassName('left-sidebar')[0].style.visibility = 'visible'; " ; ✓
    "document.getElementsByClassName('redesign-container')[0].style.visibility = 'visible'; "
    )))

Then dictionary-overlay:

(use-package dictionary-overlay
  :commands (dictionary-overlay-render-buffer)
  :straight nil
  :custom-face
  (dictionary-overlay-unknownword ((t :inherit font-lock-keyword-face)))
  (dictionary-overlay-translation ((t :inherit font-lock-comment-face)))
  :config
  (dictionary-overlay-start)
  
  (setq dictionary-overlay-translators '("local" "darwin" "sdcv" "web")
        dictionary-overlay-recenter-after-mark-and-jump 10
        dictionary-overlay-user-data-directory (concat path-emacs-private-dir "dictionary-overlay-data/")
        dictionary-overlay-just-unknown-words nil
        dictionary-overlay-auto-jump-after '(mark-word-known
                                             ;; mark-word-unknown
                                             render-buffer))

  (use-package popweb :straight nil :demand)

  (defvar dictionary-overlay-lookup-prefix-map
    (let ((map (make-sparse-keymap)))
      (define-key map (kbd "y") #'popweb-dict-youdao-pointer)
      (define-key map (kbd "u") #'popweb-dict-youglish-pointer)
      (define-key map (kbd "o") #'popweb-dict-forvo-pointer)
      (define-key map (kbd "k") #'popweb-dict-youglish-api-pointer)
      (define-key map (kbd "m") #'popweb-dict-mw-pointer)
      (define-key map (kbd "b") #'popweb-dict-bing-pointer)
      map)
    "Keymap for 3rd party dictionaries.")

  (defun dictionary-overlay-lookup-prefix-map ()
    "Transient keymap for fast lookup with different dictionaries."
    (interactive)
    (set-transient-map dictionary-overlay-lookup-prefix-map))

  :general
  (dictionary-overlay-map
   "p" nil
   "n" nil
   "m" nil
   "M" nil
   ;; --
   "j" #'dictionary-overlay-jump-prev-unknown-word
   "k" #'dictionary-overlay-lookup-prefix-map
   "l" #'dictionary-overlay-jump-next-unknown-word
   "L" (lambda () (interactive) (websocket-bridge-app-open-buffer 'dictionary-overlay))
   "o" #'dictionary-overlay-mark-word-smart
   "O" #'dictionary-overlay-mark-word-smart-reversely
   ;; --
   "a" #'dictionary-overlay-mark-word-unknown
   "." #'dictionary-overlay-jump-out-of-overlay
   "r" #'popweb-restart-process)
  (elfeed-show-mode-map
   "a" #'dictionary-overlay-mark-word-unknown
   "r" #'dictionary-overlay-restart
   "." #'dictionary-overlay-render-buffer)
  (eww-mode-map
   "a" #'dictionary-overlay-mark-word-unknown
   "r" #'dictionary-overlay-restart
   "." #'dictionary-overlay-render-buffer))

Looking through recent vocabulary, I noticed quite a few numbers in knownwords, probably from marking whole buffers as known; there are also some personal names.

Could there be a separate txt for numbers, personal names, place names and the like, filled via something like a dictionary-overlay-special command? Or at least ignore Arabic numerals?
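For the Arabic-numeral part at least, a small pre-filter would already help; this is a sketch only (looks_special is a hypothetical helper, and real name/place detection would need more context than a single token):

def looks_special(token: str) -> bool:
    """True for tokens that should never enter knownwords.
    Only the unambiguous case is handled: pure digit strings."""
    return token.isdigit()

# Example: filter tokens before they are marked as known.
tokens = ["2022", "Smith", "overlay"]
print([t for t in tokens if not looks_special(t)])  # ['Smith', 'overlay']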

Thanks for sharing; looks like I need to go learn use-package and straight.

btw: what does everyone use to quickly move to and select a word for marking?