How to get gptel to recognize images embedded in Org mode

hongfei6 · June 15, 2025, 14:16

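I haven't found a way to make gptel send images embedded in an Org mode buffer directly, so I'd like to ask for advice. From searching, I learned that the `gptel-track-media` variable needs to be turned on. After enabling it, I tried quite a few recent models through OpenRouter, and the replies were all roughly like this: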

===========

I'd be glad to explain what this image shows, but what you provided is only a text snippet and a reference to the image (=file:images/a5_ptn_sub_sys.png=), ...

==========

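Does anyone else need this? Personally I think it would be very useful: it would greatly improve interaction with the model when writing documents in Org mode.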

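For now I have to export to PDF or DOCX first and then send the file to the model through the web interface, which is a bit of a hassle.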
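For reference, enabling the variable mentioned above is a one-line setting. This is a minimal sketch; it assumes a global `setq` is what you want (a buffer-local setting would also work). Whether images are actually sent still depends on the selected model being declared media-capable in gptel.

```elisp
;; Let gptel include media (e.g. images) referenced in the buffer,
;; provided the current model declares the `media' capability.
(setq gptel-track-media t)
```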

1803 · June 16, 2025, 06:36

Like this?

hongfei6 · June 16, 2025, 06:44

Yes. Are you using GPT-4o through OpenRouter, or GPT-4o directly from OpenAI? Do any other models support images?

1803 · June 16, 2025, 07:08

It works as long as "Sending media" shows up in the top-right corner. I'd guess both the model and gptel need to support it.

I'm using GPT-4o on Azure.

hongfei6 · June 17, 2025, 06:31

Thanks for the reply.

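From what I've tried so far, when gptel calls the OpenRouter API the "Sending media" indicator never appears, so I can't use this feature directly. The only workaround is to attach the PNG manually with gptel's `-f` and send it that way. I'm not sure whether the problem is on gptel's side or OpenRouter's.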
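The manual workaround can also be done from Lisp. This sketch assumes gptel's `gptel-add-file` command, which adds a file to gptel's context; the image path is the one from the question and is purely illustrative.

```elisp
;; Add the image to gptel's context so it is included with the next request.
;; The path is illustrative; replace it with your own image file.
(gptel-add-file (expand-file-name "images/a5_ptn_sub_sys.png"))
```

gptel may refuse non-text files when the current model is not marked as media-capable, so this still relies on the model configuration.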

When I configure OpenAI's GPT-4o directly, gptel does show the "Sending media" indicator, but since I couldn't get an OpenAI API key, I wasn't able to test any further.

1803 · June 27, 2025, 15:42

When configuring the model, replace `openai/gpt-4o` with a model spec like the following:

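   (gpt-4o
     :description "Advanced model for complex tasks; cheaper & faster than GPT-Turbo"
     :capabilities (media tool-use json url)  ; `media' tells gptel it may send images
     :mime-types ("image/jpeg" "image/png" "image/gif" "image/webp")  ; accepted image formats
     :context-window 128   ; in thousands of tokens
     :input-cost 2.50      ; USD per million input tokens
     :output-cost 10       ; USD per million output tokens
     :cutoff-date "2023-10")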
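A minimal sketch of where such a spec would go, assuming an OpenRouter backend defined with `gptel-make-openai` (host and endpoint as in gptel's README example for OpenRouter). Two assumptions: the model name is kept as `openai/gpt-4o`, OpenRouter's model ID, rather than the `gpt-4o` used in gptel's OpenAI list; and the API key is read from an environment variable named `OPENROUTER_API_KEY`, so substitute your own key source.

```elisp
;; OpenRouter backend whose GPT-4o entry declares media support,
;; so gptel knows it may attach images for this model.
(setq gptel-backend
      (gptel-make-openai "OpenRouter"
        :host "openrouter.ai"
        :endpoint "/api/v1/chat/completions"
        :stream t
        ;; Assumption: the key is stored in this environment variable.
        :key (lambda () (getenv "OPENROUTER_API_KEY"))
        ;; A model entry can be a plain symbol or (name . plist);
        ;; the plist is what carries :capabilities and :mime-types.
        :models '((openai/gpt-4o
                   :capabilities (media tool-use json url)
                   :mime-types ("image/jpeg" "image/png" "image/gif" "image/webp")))))
(setq gptel-model 'openai/gpt-4o)
```

With `gptel-track-media` also enabled, the "Sending media" indicator should then appear for this model; this has not been verified here.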

The entry above comes from `gptel--openai-models` in `gptel.el` (github.com/karthink/gptel, `master` branch).
