How do I change url-user-agent when calling url-retrieve inside cl-defun or defun?

Start Emacs with emacs -Q, paste the following code into *scratch*, then run eval-buffer:

;;; -*- lexical-binding:t -*-

(require 'cl-lib)
(require 'url)

(cl-defun my-test-user-agent ()
  (interactive)
  (setq url-debug t)
  (let ((url-user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.57"))
    (url-retrieve "https://httpbin.org/get?a=b" (lambda (_)))))

Then run M-x my-test-user-agent and look at the *URL-DEBUG* buffer: the User-Agent has not changed to the string bound in the let.

So how should url-user-agent be changed?

This has nothing to do with cl-defun versus defun.

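url-retrieve is asynchronous. By the time the request is actually built and url-user-agent is read, execution may already have left the let scope, at which point url-user-agent is back to its original value, so it looks as if the binding had no effect.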
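The scoping effect described here can be reproduced without any network access. The following is only a minimal sketch: my-demo-agent and my-demo-make-deferred-reader are hypothetical names standing in for url-user-agent and for whatever code reads it later; they are not part of the url library.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical stand-in for `url-user-agent'; `defvar' makes it a
;; special (dynamically bound) variable.
(defvar my-demo-agent "default-agent")

(defun my-demo-make-deferred-reader ()
  "Return a function that reads `my-demo-agent' when it is called later."
  (lambda () my-demo-agent))

;; Read the variable while the `let' binding is still active.
(defvar my-demo-inside
  (let ((my-demo-agent "curl"))
    (funcall (my-demo-make-deferred-reader))))

;; Create the reader inside the `let', but call it after the `let' has exited.
(defvar my-demo-outside
  (funcall (let ((my-demo-agent "curl"))
             (my-demo-make-deferred-reader))))

(message "inside: %s, outside: %s" my-demo-inside my-demo-outside)
;; => inside: curl, outside: default-agent
```

A closure does not capture the dynamic binding of a special variable, so any code that runs after the let has been unwound sees the global value again.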

That is not the way to check whether url-user-agent took effect, though; you have to look at what the server actually received.

Testing against https://ifconfig.net/: when the user-agent is curl it returns the IP, otherwise it returns HTML:

(let ((url-user-agent "curl"))
  (url-retrieve
   "https://ifconfig.net"
   (lambda (_)
     (with-current-buffer (current-buffer)
       (when (re-search-forward "\n\n" nil t)
         (print (buffer-substring (point) (point-max))))))))
;; => xx.xx.xx.xxx

(let ((url-user-agent "curl"))
  (with-current-buffer (url-retrieve-synchronously "https://ifconfig.net/")
    (when (re-search-backward "\n\n" nil t)
      (buffer-substring (point) (point-max)))))
;; => xx.xx.xx.xxx



EDIT:

In fact, *URL-DEBUG* also shows the modified user-agent:


http -> Found existing connection: ifconfig.net:443 #
http -> Reusing existing connection: ifconfig.net:443
http -> Marking connection as busy: ifconfig.net:443 #
http -> getting referer from buffer: buffer:# target-url:#s(url "https" nil nil "ifconfig.net" nil "" nil nil t nil t t) lastloc:nil
http -> Request is: 
GET / HTTP/1.1
MIME-Version: 1.0
Connection: keep-alive
Host: ifconfig.net
Accept-encoding: gzip
Accept: */*
User-Agent: curl

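The first time you request "https://ifconfig.net", url-user-agent is not changed successfully. Later requests to the same site reuse the existing connection, so the request has already been sent before url-retrieve returns, and url-user-agent is changed as expected.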

With your code above, the first run returned HTML and the following runs returned the IP.

The code in url that handles a first connection versus a reused connection:

⋊> emacs -Q --eval "\
   (progn
     (defvar data-buffer nil)
     (let ((url-user-agent \"curl\")
           (url-debug t)
           (proc-buffer
            (url-retrieve
             \"https://ifconfig.net\"
             (lambda (_)
               (with-current-buffer (setq data-buffer (current-buffer))
                 (when (re-search-forward \"\n\n\" nil t)
                   (message \"==> Respone buffer\")
                   (message \"==> %s\" (buffer-substring (point) (point-max)))))))))
       (while (not data-buffer)
         (sit-for 0.1))
       (with-current-buffer \"*URL-DEBUG*\"
         (message \"==> *URL-DEBUG*\")
         (goto-char (point-min))
         (while (re-search-forward \"^User-Agent:.*\" nil t)
           (message \"==> %s\" (buffer-substring (match-beginning 0) (match-end 0)))))))" --batch
Contacting host: ifconfig.net:443
==> Respone buffer
==> xx.xx.xx.xxx

==> *URL-DEBUG*
==> User-Agent: curl

Try moving the sit-for outside the let scope. In the example above the whole request takes place inside the let scope, so url-user-agent is bound to be changed.

That way there is indeed a problem.

The documentation says:

The variables ‘url-request-data’, ‘url-request-method’ and ‘url-request-extra-headers’ can be dynamically bound around the request; dynamic binding of other variables doesn’t necessarily take effect.

I think it would be more reasonable for dynamic bindings of the other url-* variables to take effect as well.

But (let ((url-request-extra-headers '(("User-Agent" . "curl"))))... does not take effect either, so there is probably a problem in this part of the url code.

In principle url-request-extra-headers could be used to bind User-Agent dynamically, but when the HTTP request is constructed the code just calls (url-http-user-agent-string) and ignores the url-request-extra-headers setting:

(with-emacs
  (defvar data-buffer nil)
  (setq url-debug t)
  (let ((url-request-extra-headers '(("User-Agent" . "curl"))))
    (url-retrieve
     "https://ifconfig.net"
     (lambda (_)
       (with-current-buffer (current-buffer)
         (when (re-search-forward "\n\n" nil t)
           (message "==> Respone callback")
           (message "%s" (truncate-string-to-width
                          (buffer-substring (point) (point-max))
                          100 nil nil t))))
       (setq data-buffer t))))
  (while (not data-buffer)
    (sit-for 0.1))
  (with-current-buffer "*URL-DEBUG*"
    (message "==> *URL-DEBUG*")
    (goto-char (point-min))
    (while (re-search-forward "^User-Agent:.*" nil t)
      (message "%s" (buffer-substring (match-beginning 0) (match-end 0))))))
;; ==> Respone callback
;; <!DOCTYPE html>
;; <html lang="en">
;;   <head>
;;     <meta charset="utf-8" />
;;     <title>What is my IP addre...
;; ==> *URL-DEBUG*
;; User-Agent: URL/Emacs Emacs/29.0.60 (TTY; x86_64-apple-darwin17.7.0)
;; User-Agent: curl

Two different User-Agent headers show up in *URL-DEBUG*.

Setting the variable buffer-locally in the buffer returned by url-retrieve is enough:

(with-emacs
  (defvar data-buffer nil)
  (setq url-debug t)
  (let ((url-user-agent "curl"))
    (with-current-buffer
        (url-retrieve
         "https://ifconfig.net"
         (lambda (_)
           (with-current-buffer (current-buffer)
             (when (re-search-forward "\n\n" nil t)
               (message "==> Respone callback")
               (message "%s" (truncate-string-to-width
                              (buffer-substring (point) (point-max))
                              100 nil nil t))))
           (setq data-buffer t)))
      (set (make-local-variable 'url-user-agent) url-user-agent))) ;; +++
  (while (not data-buffer)
    (sit-for 0.1))
  (with-current-buffer "*URL-DEBUG*"
    (message "==> *URL-DEBUG*")
    (goto-char (point-min))
    (while (re-search-forward "^User-Agent:.*" nil t)
      (message "%s" (buffer-substring (match-beginning 0) (match-end 0))))))

Output:

Contacting host: ifconfig.net:443
==> Respone callback
xx.xx.xx.xxx

==> *URL-DEBUG*
User-Agent: curl
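The workaround can be wrapped in a small helper so that callers do not have to repeat it. This is only a sketch under the assumption that the buffer-local trick shown above keeps working in your Emacs version; my-url-retrieve-with-user-agent is a hypothetical name, not a function provided by the url library.

```elisp
;;; -*- lexical-binding: t -*-
(require 'url)

(defun my-url-retrieve-with-user-agent (url agent callback)
  "Retrieve URL asynchronously, trying to send AGENT as the User-Agent.
CALLBACK is passed to `url-retrieve' unchanged.  Return the buffer
that `url-retrieve' returns.  Hypothetical helper, not part of url.el."
  (let* ((url-user-agent agent)            ; covers work done before returning
         (buf (url-retrieve url callback)))
    ;; Also set the value buffer-locally in the returned buffer, so it is
    ;; still visible there after this `let' has exited.
    (when (buffer-live-p buf)
      (with-current-buffer buf
        (setq-local url-user-agent agent)))
    buf))
```

It would be called as (my-url-retrieve-with-user-agent "https://ifconfig.net" "curl" (lambda (_status) (message "%s" (buffer-string)))), with the callback doing whatever parsing is needed.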


This approach works, thanks a lot!