plist turns out to be slower than alist

Kana · May 31, 2025, 07:44

I ran it under perf, but I still can't make much of the results...

Commands and output

$ perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,branch-misses emacs -q --batch --script test.el --eval '(test-alist (t-create-alist 2000) 2000 2000)'

 Performance counter stats for 'emacs -q --batch --script test.el --eval (test-alist (t-create-alist 2000) 2000 2000)':

     1,664,859,118      cache-references:u                                                    
        37,560,924      cache-misses:u                   #    2.26% of all cache refs         
    22,989,245,371      cycles:u                                                              
    89,163,920,063      instructions:u                   #    3.88  insn per cycle            
    29,026,961,366      branches:u                                                            
        21,303,743      branch-misses:u                  #    0.07% of all branches           

       5.229262209 seconds time elapsed

       5.180081000 seconds user
       0.023831000 seconds sys


$ perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,branch-misses emacs -q --batch --script test.el --eval '(test-plist (t-create-plist 2000) 2000 2000)'

 Performance counter stats for 'emacs -q --batch --script test.el --eval (test-plist (t-create-plist 2000) 2000 2000)':

     1,661,666,301      cache-references:u                                                    
        37,152,948      cache-misses:u                   #    2.24% of all cache refs         
    40,751,947,902      cycles:u                                                              
    84,247,031,479      instructions:u                   #    2.07  insn per cycle            
    24,801,242,536      branches:u                                                            
        21,350,234      branch-misses:u                  #    0.09% of all branches           

       9.235591770 seconds time elapsed

       9.183998000 seconds user
       0.022889000 seconds sys

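Both runs execute a similar number of instructions (about 89.2 billion for alist versus 84.2 billion for plist), but for some reason the IPC (insn per cycle) differs enormously: alist-get reaches 3.88 IPC while plist-get only reaches 2.07, a drop of nearly half. The usual explanations are a poor cache hit rate or branch misprediction, but the two runs are nearly identical on both counts (cache-miss rates of 2.26% vs. 2.24%, branch-miss rates of 0.07% vs. 0.09%)... Could someone more familiar with low-level tuning take a look?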
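To make the arithmetic explicit, here is a small sketch in C that recomputes the IPC figures from the raw counters. The constants are copied from the two perf runs above; the names `ipc`, `ratio`, and the `ALIST_*` / `PLIST_*` constants are made up for this illustration.

```c
#include <assert.h>

/* Counter values copied from the two perf stat runs above. */
static const double ALIST_INSNS  = 89163920063.0;
static const double ALIST_CYCLES = 22989245371.0;
static const double PLIST_INSNS  = 84247031479.0;
static const double PLIST_CYCLES = 40751947902.0;

/* Instructions per cycle, from the two raw counters. */
double ipc(double instructions, double cycles)
{
    assert(cycles > 0.0);
    return instructions / cycles;
}

/* How many times larger `b` is than `a` (cycles, instructions, seconds...). */
double ratio(double a, double b)
{
    assert(a > 0.0);
    return b / a;
}
```

ratio(ALIST_CYCLES, PLIST_CYCLES) comes out at roughly 1.77, about the same as the ratio of the elapsed times (9.24 s over 5.23 s), while ratio(ALIST_INSNS, PLIST_INSNS) is slightly below 1. So the plist run executes slightly fewer instructions, and the whole slowdown shows up as extra cycles per instruction.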

The work done in one loop iteration is also nearly identical for the two:

plist	alist
(consp list)	(consp list)
(consp (cdr list))	(consp (car list))
break out of the loop if not a cons	continue to the next iteration if not a cons
(eq (car list) key)	(eq (caar list) key)
(setq list (cddr list))	(setq list (cdr list))
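For anyone who prefers reading the comparison as code, here is a rough C sketch of the two loops as listed in the table. It is not the Emacs source: `obj`, `consp`, `plist_get_sketch`, and `alist_get_sketch` are hypothetical names, the cell is a simplified struct rather than a tagged Lisp_Object, and NULL stands in for nil.

```c
#include <stddef.h>

/* Hypothetical, simplified object: either a cons cell or an atom.
   Real Emacs uses tagged Lisp_Object words instead of a flag field. */
typedef struct obj {
    int is_cons;        /* 1 for a cons cell, 0 for an atom */
    struct obj *car;    /* only meaningful when is_cons is 1 */
    struct obj *cdr;    /* only meaningful when is_cons is 1 */
} obj;

static int consp(const obj *x)
{
    return x != NULL && x->is_cons;
}

/* plist column: (k1 v1 k2 v2 ...), keys compared by identity like eq. */
const obj *plist_get_sketch(const obj *list, const obj *key)
{
    while (consp(list)) {
        if (!consp(list->cdr))
            break;                  /* not a cons: leave the loop */
        if (list->car == key)
            return list->cdr->car;  /* the value follows the key */
        list = list->cdr->cdr;      /* cddr: the second load needs the first */
    }
    return NULL;
}

/* alist column: ((k1 . v1) (k2 . v2) ...), keys compared by identity. */
const obj *alist_get_sketch(const obj *list, const obj *key)
{
    for (; consp(list); list = list->cdr) {  /* cdr: one load per step */
        if (!consp(list->car))
            continue;               /* not a cons: go to the next element */
        if (list->car->car == key)  /* caar */
            return list->car->cdr;
    }
    return NULL;
}
```

Written out like this, the one visible difference is in how the list pointer advances: the plist loop needs two cdr loads in a row, the second depending on the first, while the alist loop advances with a single cdr load and reads the key through the car. Whether that has anything to do with the IPC gap is not something this sketch can show.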

That said, the perf results above already show that instructions:u is about the same for both.

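Thinking about it some more, another possibility is alignment. If accesses to 16-byte-aligned addresses are faster than accesses to 8-byte-aligned ones [citation needed], then since the car of a Lisp_Cons should generally be 16-byte aligned, the cdr immediately after it can only be 8-byte aligned. The plist loop touches the cdr more, while the alist loop touches the car more, so maybe that explains the speed difference?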
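The layout premise, at least, is easy to check with a few lines of C. This is only a sketch: `cons_sketch` is a hypothetical stand-in for Lisp_Cons (two pointer-sized words, car first, then cdr), and the helper names are made up. It assumes a 64-bit build, where each word is 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Lisp_Cons: car first, cdr right after it. */
struct cons_sketch {
    void *car;
    void *cdr;
};

/* Address of the car field modulo 16. */
unsigned car_addr_mod16(const struct cons_sketch *c)
{
    return (unsigned)((uintptr_t)&c->car % 16);
}

/* Address of the cdr field modulo 16. */
unsigned cdr_addr_mod16(const struct cons_sketch *c)
{
    return (unsigned)((uintptr_t)&c->cdr % 16);
}
```

For a cell that starts on a 16-byte boundary, car_addr_mod16 returns 0 and cdr_addr_mod16 returns 8 on a 64-bit build. That confirms only the layout; it says nothing about whether loads from the two addresses actually differ in speed, which is the part still marked [citation needed].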