Speech recognition uses Azure's spx CLI; cleaning up spoken-style text uses a custom gptel rewrite directive.
If anyone is interested, I'll post the Elisp configuration.
A real productivity booster.
Too lazy to work it out myself; please just post the Lisp code.
Looks great.
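OK, here are the setup steps and the code. Once everything is installed as described, you can start or stop ASR in Emacs with the spx-asr-start-or-stop command or the F9 key, then select the ASR output and clean it up with the gptel-rewrite command.
I look forward to other Emacs users publishing a package that supports other ASR engines (for example, a local whisper) and integrates gptel-rewrite.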
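Install the Azure spx program: see the Azure Speech CLI introduction and installation documentation.
Get the region and key from the Azure console and write them to .profile. The environment variable names used here are AZURE_TRANSCRIPTION_REGION and AZURE_TRANSCRIPTION_KEY.
Write a bash script and put it in a directory on your PATH.
spx-asr: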
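As a minimal sketch of this step, the lines added to .profile might look like the following. Both values are placeholders, not real credentials; substitute the region and key of your own Azure Speech resource.

```shell
# ~/.profile (sketch): credentials read by the spx-asr wrapper script.
# Both values below are placeholders; replace them with your own.
export AZURE_TRANSCRIPTION_REGION="your-region"
export AZURE_TRANSCRIPTION_KEY="your-speech-key"
```

New login shells pick these up automatically; in an existing shell, running `. ~/.profile` loads them. Emacs has to be started from an environment that contains the variables for the script to see them.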
#!/bin/bash
~/bin/spx recognize --key "${AZURE_TRANSCRIPTION_KEY}" --region "${AZURE_TRANSCRIPTION_REGION}" --microphone --languages "zh-CN;en-US"
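If either variable is missing, spx is started with an empty key or region, and the failure can be hard to spot from inside Emacs because the process output is only parsed, not shown. A guarded variant of the wrapper could check the environment first. This is only a sketch; the function name `check_asr_env` is made up for illustration and is not part of spx.

```shell
# Sketch: fail early when the Azure credentials are not in the environment.
# check_asr_env is a hypothetical helper name, not part of spx.
check_asr_env() {
  missing=0
  if [ -z "${AZURE_TRANSCRIPTION_KEY:-}" ]; then
    echo "spx-asr: AZURE_TRANSCRIPTION_KEY is not set" >&2
    missing=1
  fi
  if [ -z "${AZURE_TRANSCRIPTION_REGION:-}" ]; then
    echo "spx-asr: AZURE_TRANSCRIPTION_REGION is not set" >&2
    missing=1
  fi
  # Return 0 when both variables are set, 1 otherwise.
  return "$missing"
}
```

In the wrapper script, a line such as `check_asr_env || exit 1` placed before the spx command would stop the script with a readable message on stderr.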
spx-asr.el :
;; -*- lexical-binding: t; -*-
(defvar spx-asr-process nil)
(defvar spx-asr-recognizing-overlay nil)
(defun spx-asr-insert-at-point (text)
  "Insert or update the RECOGNIZING text TEXT at point, shown as an overlay."
  (if spx-asr-recognizing-overlay
      ;; Update the text of the existing overlay.
      (overlay-put spx-asr-recognizing-overlay 'after-string text)
    ;; Create a new zero-width overlay at point.
    (setq spx-asr-recognizing-overlay (make-overlay (point) (point)))
    (overlay-put spx-asr-recognizing-overlay 'after-string text)))
(defun spx-asr-replace-with-recognized (text)
  "Replace the current overlay with the final recognized TEXT."
  (when spx-asr-recognizing-overlay
    (let ((start (overlay-start spx-asr-recognizing-overlay))
          (end (overlay-end spx-asr-recognizing-overlay)))
      (delete-overlay spx-asr-recognizing-overlay)
      (setq spx-asr-recognizing-overlay nil)
      (save-excursion
        (goto-char start)
        (delete-region start end)
        (insert text))
      ;; Leave point after the inserted text, followed by a space.
      (goto-char (+ start (length text)))
      (insert " "))))
(defun spx-asr-process-filter (_ output)
  "Handle OUTPUT from the command-line tool."
  ;; OUTPUT may contain several lines.  A line that is split across two
  ;; chunks of process output is not reassembled here.
  (let ((lines (split-string output "\n")))
    (dolist (line lines)
      ;;(message line)
      (cond
       ((string-match "^Connection CONNECTED" line)
        (message "ASR Running"))
       ((string-match "^RECOGNIZING: \\(.*\\)$" line)
        (spx-asr-insert-at-point (match-string 1 line)))
       ((string-match "^RECOGNIZED: \\(.*\\)$" line)
        (spx-asr-replace-with-recognized (match-string 1 line)))))))
(defun spx-asr-start ()
  "Asynchronously start the `spx-asr' command-line speech recognition tool."
  (interactive)
  (setq spx-asr-process
        (start-process-shell-command
         "speech-recognition" nil "spx-asr"))
  (set-process-filter spx-asr-process #'spx-asr-process-filter)
  (setq spx-asr-recognizing-overlay nil))
(defun spx-asr-stop ()
  "Kill the speech recognition process and clean up the overlay."
  (interactive)
  (when (and spx-asr-process (process-live-p spx-asr-process))
    (kill-process spx-asr-process))
  (setq spx-asr-process nil)
  (when spx-asr-recognizing-overlay
    (delete-overlay spx-asr-recognizing-overlay)
    (setq spx-asr-recognizing-overlay nil)))
(defun spx-asr-start-or-stop ()
  "Start or stop the speech input service as needed.
Recommended: bind it in .emacs with (keymap-global-set \"<f9>\" #\\='spx-asr-start-or-stop)."
  (interactive)
  (if (null spx-asr-process)
      (progn
        (spx-asr-start)
        (message "spx-asr-start called"))
    (spx-asr-stop)
    (message "spx-asr-stop called")))
(provide 'spx-asr)
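The process filter in spx-asr.el relies on the script printing lines that begin with `RECOGNIZING: ` (interim results) and `RECOGNIZED: ` (final results). Before wiring things into Emacs, it may help to confirm that format from a terminal. A small sketch, where `final_results` is a hypothetical helper name:

```shell
# Sketch: keep only the final results from spx-style output read on stdin.
# The "RECOGNIZED: " prefix is the one matched by spx-asr-process-filter.
final_results() {
  sed -n 's/^RECOGNIZED: //p'
}

# Example with canned input (no microphone or Azure account needed):
printf '%s\n' \
  'Connection CONNECTED' \
  'RECOGNIZING: hello' \
  'RECOGNIZING: hello world' \
  'RECOGNIZED: Hello world.' | final_results
# prints: Hello world.
```

With a working setup, `spx-asr | final_results` should print one line per recognized utterance. If nothing appears, the problem is more likely in the spx configuration than in the Emacs code.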
Add this to .emacs:
(require 'spx-asr)
Install gptel.
Add a directive to gptel dedicated to rewriting:
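;; The directive text asks the model to fix transcription errors, use more
;; suitable punctuation, replace overly colloquial expressions, and output
;; only the processed text.
(setq gptel-directives
      (append gptel-directives
              '((clear-asr . "处理这段语音转录的文字, 修复转录错误;使用更适合的标点;替换过分口语化的表达. 只需要输出处理后的文本."))))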
(keymap-global-set "<f9>" #'spx-asr-start-or-stop)
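Tested it. The code is impressive and runs successfully.
My Emacs runs on Windows under WSL.
Under WSL2, the .NET 8.0 program spx even finds the microphone automatically, with no setup at all.
This deserves a package!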