Haha, not me. Words that show up a lot get remembered just from seeing them repeatedly; if a word never shows up again, there's no need to memorize it, so nothing is lost.
@Jousimies That said, if you have good experience hooking this up to Anki, do share it with everyone; someone might need it.
@ginqi7 On your side, when you use dictionary-overlay-jump-prev(next)-unknown-word,
does hl-line-mode ever end up separated from the cursor? ↓↓
Calling (websocket-bridge-call-buffer "jump_next_unknown_word") directly,
I also see the separation, and I don't know what causes it.
(setq dictionary-overlay-translators '("local" "darwin" "sdcv" "web"))
After using it for a while overall: putting local and darwin (the native one) first raised translation quality immediately; far fewer wild mistranslations and outright errors.
Tried dictionary-overlay-install on Windows today:
Collecting six==1.16.0
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting snowballstemmer==2.2.0
Using cached snowballstemmer-2.2.0-py2.py3-none-any.whl (93 kB)
Collecting tokenizers==0.13.2
Using cached tokenizers-0.13.2.tar.gz (359 kB)
Installing build dependencies: started
Installing build dependencies: still running...
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Collecting websocket-bridge-python==0.0.2
Using cached websocket_bridge_python-0.0.2-py3-none-any.whl
Collecting websockets==10.4
Using cached websockets-10.4-cp311-cp311-win_amd64.whl (101 kB)
Building wheels for collected packages: tokenizers
Building wheel for tokenizers (pyproject.toml): started
Building wheel for tokenizers (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
Building wheel for tokenizers (pyproject.toml) did not run successfully.
exit code: 1
[51 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-311
creating build\lib.win-amd64-cpython-311\tokenizers
copying py_src\tokenizers\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers
creating build\lib.win-amd64-cpython-311\tokenizers\models
copying py_src\tokenizers\models\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\models
creating build\lib.win-amd64-cpython-311\tokenizers\decoders
copying py_src\tokenizers\decoders\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\decoders
creating build\lib.win-amd64-cpython-311\tokenizers\normalizers
copying py_src\tokenizers\normalizers\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\normalizers
creating build\lib.win-amd64-cpython-311\tokenizers\pre_tokenizers
copying py_src\tokenizers\pre_tokenizers\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\pre_tokenizers
creating build\lib.win-amd64-cpython-311\tokenizers\processors
copying py_src\tokenizers\processors\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\processors
creating build\lib.win-amd64-cpython-311\tokenizers\trainers
copying py_src\tokenizers\trainers\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\trainers
creating build\lib.win-amd64-cpython-311\tokenizers\implementations
copying py_src\tokenizers\implementations\base_tokenizer.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
copying py_src\tokenizers\implementations\bert_wordpiece.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
copying py_src\tokenizers\implementations\byte_level_bpe.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
copying py_src\tokenizers\implementations\char_level_bpe.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
copying py_src\tokenizers\implementations\sentencepiece_bpe.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
copying py_src\tokenizers\implementations\sentencepiece_unigram.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
copying py_src\tokenizers\implementations\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\implementations
creating build\lib.win-amd64-cpython-311\tokenizers\tools
copying py_src\tokenizers\tools\visualizer.py -> build\lib.win-amd64-cpython-311\tokenizers\tools
copying py_src\tokenizers\tools\__init__.py -> build\lib.win-amd64-cpython-311\tokenizers\tools
copying py_src\tokenizers\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers
copying py_src\tokenizers\models\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\models
copying py_src\tokenizers\decoders\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\decoders
copying py_src\tokenizers\normalizers\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\normalizers
copying py_src\tokenizers\pre_tokenizers\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\pre_tokenizers
copying py_src\tokenizers\processors\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\processors
copying py_src\tokenizers\trainers\__init__.pyi -> build\lib.win-amd64-cpython-311\tokenizers\trainers
copying py_src\tokenizers\tools\visualizer-styles.css -> build\lib.win-amd64-cpython-311\tokenizers\tools
running build_ext
running build_rust
error: can't find Rust compiler
If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
To update pip, run:
pip install --upgrade pip
and then retry package installation.
If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Separately ran the pip upgrade and installed Rust, but no luck.
A question for @ginqi7: in my personal fork I added one rule: a token of 5 or more letters that contains no vowels is not considered a word. I'm not sure whether this can be written more concisely, or whether it will cause performance problems in large buffers:
import re

def new_word_p(word: str) -> bool:
    if len(word) < 3:
        return False
    # 5 or more letters with no vowel: not considered a word
    if re.match(r"\b[^aeiou]{5,}\b", word, re.M | re.I):
        return False
    if re.search(r"[^a-z]", word, re.M | re.I):
        return False
    return not in_or_stem_in(word, known_words)
The \b shouldn't be needed: the words produced by the splitter won't contain whitespace. Performance shouldn't be noticeably affected either; these are all matches against fairly short words.
I'll give it another try.
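For what it's worth, dropping the \b anchors as suggested might look like this minimal sketch (same in_or_stem_in and known_words helpers as in the fork above):

import re

def new_word_p(word: str) -> bool:
    if len(word) < 3:
        return False
    # 5+ letters and no vowel at all: treat as a non-word (e.g. an acronym)
    if re.fullmatch(r"[^aeiou]{5,}", word, re.I):
        return False
    # anything containing a non-letter character is not a word either
    if re.search(r"[^a-z]", word, re.I):
        return False
    return not in_or_stem_in(word, known_words)

re.fullmatch makes the "whole token is consonants" intent explicit, and the re.M flag can go too since each token is a single short string.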
Another question: does Google Translate have a quota limit? I'm getting the message below, and I can't tell whether my network is genuinely bad (the place I've been staying lately really has poor connectivity) or whether I'm being rate-limited:
[Dictionary-overlay]web-translate error, check your network. or run (websocket-bridge-app-open-buffer 'dictionary-overlay) see the error details. [5 times]
Addendum: after running (websocket-bridge-app-open-buffer 'dictionary-overlay) I got:
You can check the messages in the Python log. It's probably not rate-limiting; most likely the network is just poor. That error message is a hint printed by my code.
Turns out git clone in the terminal is down too. It must be my network.
One more question: what event does "ran out of input" refer to?
Odd, I've never run into that one. No idea what's going on.
I'm going to keep the log buffer pinned on top for a day and see what other events I haven't seen yet.
Then you may want to manually comment out the print inside run_and_log on the Python side, so it doesn't print too much.
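A sketch of what that edit looks like (the real body of run_and_log in the package may differ; the point is only where the print call lives):

async def run_and_log(coro):
    # print(coro)  # commented out so the log buffer stays quiet
    await coro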
I still suspect Google Translate is temporarily throttling me; I'll keep watching for a while.
One thing, though: if the web lookup fails, does the new word go only into the hash-table without producing an overlay? (As in the screenshot above: auto jump found the word at point, so the hash-table works, but no overlay is shown.)
Can you even get throttled outside the GFW?
The intended design is to skip a word when no result comes back, but I haven't paid attention to what actually happens when a lookup returns nothing.
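So the intended control flow on the Python side is roughly this sketch (translate_word and render_overlay are hypothetical names for illustration):

async def translate_and_render(word: str) -> None:
    translation = await translate_word(word)  # tries the configured translators in order
    if not translation:
        # lookup failed: the word is already in the hash table,
        # but no overlay gets drawn, matching the behavior observed above
        return
    render_overlay(word, translation)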
I wonder if I triggered a daily quota limit? I don't know whether such a mechanism even exists, and I have no proof, haha. But these past few days I've kept pouring vocabulary into my knownwords.txt and opened a hundred-odd Wall Street Journal articles in elfeed, so maybe I queried a bit too much? All I can do is wait 24h and see.
When you get a chance, could you test offline whether rendering fails without a network? Or comment this out ↓
(setq dictionary-overlay-translators '("local" "darwin" "sdcv"
                                       ;; "web"
                                       ))
I've tried commenting it out, and that works.
Then I have no choice but to start a long debugging journey...
Posting my personal config: I mainly use this for reading news in elfeed-show-mode and eww. It happens that manateelazycat just updated popweb-dict yesterday with a macro for quickly creating new dictionaries, so I rushed to wire them together. (Read, don't copy: it contains private config that isn't published here, so copying it verbatim will throw errors.)
First, popweb:
(use-package popweb
  :commands (popweb-org-roam-link-show
             popweb-latex-mode)
  :straight nil
  :config
  (require 'popweb-latex)
  (add-hook 'latex-mode-hook #'popweb-latex-mode)
  (require 'popweb-org-roam-link)
  (require 'popweb-url)
  (require 'popweb-dict)
  ;; NOTE 2022-12-05: personal API, local static html as url, demonstration only.
  (popweb-dict-create
   "youglish-api"
   (concat
    "file:///"
    (let ((temp-file (concat path-cache-dir "popweb/tmp.html")))
      (with-temp-file temp-file
        (insert-file-contents
         (concat path-emacs-dir "lisp/popweb-dict-yg-js.html"))
        (goto-char (point-min))
        (silenzio (replace-regexp "query" (concat word " :r"))))
      temp-file))
   "")
  ;; NOTE 2022-12-05: WIP
  (popweb-dict-create
   "forvo"
   "https://forvo.com/search/%s/en_usa/"
   (concat
    "window.scrollTo(0, 0); "
    "document.getElementsByTagName('html')[0].style.visibility = 'hidden'; "
    "document.getElementsByClassName('main_search')[0].style.visibility = 'visible'; "
    "document.getElementsByClassName('main_section')[0].style.visibility = 'visible'; "
    ;; "document.getElementsByClassName('left-content col')[0].style.visibility = 'visible'; "
    ;; "document.getElementsByTagName('header')[0].style.display = 'none'; "
    ;; "document.getElementsByClassName('contentPadding')[0].style.padding = '10px';"
    ))
  ;; NOTE 2022-12-05: far from perfect
  (popweb-dict-create
   "mw"
   "https://www.merriam-webster.com/dictionary/%s"
   (concat
    "window.scrollTo(0, 0); "
    "document.getElementsByTagName('html')[0].style.visibility = 'hidden'; "
    "document.getElementsByClassName('left-sidebar')[0].style.visibility = 'visible'; " ; ✓
    "document.getElementsByClassName('redesign-container')[0].style.visibility = 'visible'; "
    )))
Then dictionary-overlay:
(use-package dictionary-overlay
  :commands (dictionary-overlay-render-buffer)
  :straight nil
  :custom-face
  (dictionary-overlay-unknownword ((t :inherit font-lock-keyword-face)))
  (dictionary-overlay-translation ((t :inherit font-lock-comment-face)))
  :config
  (dictionary-overlay-start)
  (setq dictionary-overlay-translators '("local" "darwin" "sdcv" "web")
        dictionary-overlay-recenter-after-mark-and-jump 10
        dictionary-overlay-user-data-directory
        (concat path-emacs-private-dir "dictionary-overlay-data/")
        dictionary-overlay-just-unknown-words nil
        dictionary-overlay-auto-jump-after '(mark-word-known
                                             ;; mark-word-unknown
                                             render-buffer))
  (use-package popweb :straight nil :demand)
  (defvar dictionary-overlay-lookup-prefix-map
    (let ((map (make-sparse-keymap)))
      (define-key map (kbd "y") #'popweb-dict-youdao-pointer)
      (define-key map (kbd "u") #'popweb-dict-youglish-pointer)
      (define-key map (kbd "o") #'popweb-dict-forvo-pointer)
      (define-key map (kbd "k") #'popweb-dict-youglish-api-pointer)
      (define-key map (kbd "m") #'popweb-dict-mw-pointer)
      (define-key map (kbd "b") #'popweb-dict-bing-pointer)
      map)
    "Keymap for 3rd party dictionaries.")
  (defun dictionary-overlay-lookup-prefix-map ()
    "Transient keymap for fast lookup with different dictionaries."
    (interactive)
    (set-transient-map dictionary-overlay-lookup-prefix-map))
  :general
  (dictionary-overlay-map
   "p" nil
   "n" nil
   "m" nil
   "M" nil
   ;; --
   "j" #'dictionary-overlay-jump-prev-unknown-word
   "k" #'dictionary-overlay-lookup-prefix-map
   "l" #'dictionary-overlay-jump-next-unknown-word
   "L" (lambda () (interactive) (websocket-bridge-app-open-buffer 'dictionary-overlay))
   "o" #'dictionary-overlay-mark-word-smart
   "O" #'dictionary-overlay-mark-word-smart-reversely
   ;; --
   "a" #'dictionary-overlay-mark-word-unknown
   "." #'dictionary-overlay-jump-out-of-overlay
   "r" #'popweb-restart-process)
  (elfeed-show-mode-map
   "a" #'dictionary-overlay-mark-word-unknown
   "r" #'dictionary-overlay-restart
   "." #'dictionary-overlay-render-buffer)
  (eww-mode-map
   "a" #'dictionary-overlay-mark-word-unknown
   "r" #'dictionary-overlay-restart
   "." #'dictionary-overlay-render-buffer))
Looking through my recent vocabulary, I found quite a few numbers in knownwords, probably left over from marking whole buffers; there are also some personal names.
Could there be another txt file for numbers, personal and place names and the like, filed there via something like dictionary-overlay-special? Or at least ignore Arabic numerals.
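The Arabic-numeral part at least looks easy to filter before a token ever reaches knownwords.txt; a crude sketch (special_token_p is a hypothetical helper, and the capitalization test for names is only a rough heuristic):

def special_token_p(word: str) -> bool:
    # Arabic numerals, including forms like "1,234" and "3.14"
    if word.replace(",", "").replace(".", "").isdigit():
        return True
    # capitalized tokens are often personal or place names
    if word[:1].isupper():
        return True
    return False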
Thanks for sharing; looks like I need to learn use-package and straight.
btw: what does everyone use to move to and select a word quickly for marking?