Some research on Emacs GC

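Fixing GC stalls does not require a generational GC with things like a "full gc". It is enough to let the GC return to the Lisp interpreter from time to time during mark and sweep, so that the pauses are no longer noticeable.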
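
A minimal sketch of that idea on a toy object graph: marking keeps its pending work on an explicit stack and processes only a fixed budget of objects per call, so control can go back to the caller between steps. All names here (`struct obj`, `push_if_unmarked`, `mark_step`) are made up for illustration and are not Emacs functions. A real incremental collector would also need a write barrier so that changes made by Lisp code between steps are not missed; that part is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy object: up to two outgoing references and a mark bit.  */
struct obj
{
  struct obj *ref[2];
  bool marked;
};

/* Explicit mark stack, so marking can stop and resume at any point.  */
enum { STACK_MAX = 64 };
static struct obj *mark_stack[STACK_MAX];
static size_t mark_sp;

/* Mark O and push it, unless it is NULL or already marked.
   The toy assumes STACK_MAX is large enough for the whole graph.  */
static void
push_if_unmarked (struct obj *o)
{
  if (o && !o->marked && mark_sp < STACK_MAX)
    {
      o->marked = true;
      mark_stack[mark_sp++] = o;
    }
}

/* Process at most BUDGET objects, then return so the caller (the
   interpreter loop in this analogy) can run.  Returns true once the
   mark stack is empty, i.e. marking is complete.  */
static bool
mark_step (size_t budget)
{
  while (mark_sp > 0 && budget-- > 0)
    {
      struct obj *o = mark_stack[--mark_sp];
      push_if_unmarked (o->ref[0]);
      push_if_unmarked (o->ref[1]);
    }
  return mark_sp == 0;
}
```

Seeding the stack with a root via `push_if_unmarked` and then calling `mark_step (n)` repeatedly until it returns true marks everything reachable; between calls the program is free to do other work.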


I modified the GC code to timestamp each small function. Here are some logs (times are in microseconds):

GC start
visit_static_gc_roots: 44446 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 1 microseconds
mark_lread: 0 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 461 microseconds
mark_gui: 6 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 2164 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 41 microseconds
gc_sweep: 11082 microseconds
gc rest part: 3 microseconds
GC total: 58245 microseconds

GC start
visit_static_gc_roots: 46099 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 1 microseconds
mark_kboards: 3 microseconds
mark_threads: 399 microseconds
mark_gui: 5 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 2127 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 41 microseconds
gc_sweep: 10597 microseconds
gc rest part: 3 microseconds
GC total: 59309 microseconds

GC start
visit_static_gc_roots: 45430 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 1 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 405 microseconds
mark_gui: 6 microseconds
compact_font_caches: 5 microseconds
mark_buffers: 2174 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 41 microseconds
gc_sweep: 10600 microseconds
gc rest part: 3 microseconds
GC total: 58700 microseconds

GC start
visit_static_gc_roots: 45565 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 419 microseconds
mark_gui: 5 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 2158 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 40 microseconds
gc_sweep: 10595 microseconds
gc rest part: 3 microseconds
GC total: 58820 microseconds

GC start
visit_static_gc_roots: 44817 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 1 microseconds
mark_kboards: 3 microseconds
mark_threads: 389 microseconds
mark_gui: 6 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 2181 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 41 microseconds
gc_sweep: 10698 microseconds
gc rest part: 4 microseconds
GC total: 58173 microseconds

GC start
visit_static_gc_roots: 45418 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 409 microseconds
mark_gui: 6 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 2345 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 41 microseconds
gc_sweep: 11044 microseconds
gc rest part: 78 microseconds
GC total: 59377 microseconds

GC start
visit_static_gc_roots: 44571 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 1 microseconds
mark_lread: 0 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 399 microseconds
mark_gui: 6 microseconds
compact_font_caches: 5 microseconds
mark_buffers: 2145 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 41 microseconds
gc_sweep: 10502 microseconds
gc rest part: 86 microseconds
GC total: 57787 microseconds

GC start
visit_static_gc_roots: 43894 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 1 microseconds
mark_terminals: 2 microseconds
mark_kboards: 4 microseconds
mark_threads: 401 microseconds
mark_gui: 6 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 2198 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 44 microseconds
gc_sweep: 10464 microseconds
gc rest part: 4 microseconds
GC total: 57045 microseconds

GC start
visit_static_gc_roots: 45366 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 2 microseconds
mark_kboards: 4 microseconds
mark_threads: 134 microseconds
mark_gui: 5 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 2218 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 42 microseconds
gc_sweep: 10620 microseconds
gc rest part: 3 microseconds
GC total: 58952 microseconds

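According to these numbers, GC time is dominated by three functions:

  1. visit_static_gc_roots: scans all Lisp_Objects once; about 76-78% of the time
  2. gc_sweep: frees unreachable garbage objects; about 18-19%
  3. mark_buffers: marks buffer-related objects; about 4%
  4. All other functions combined: about 1%

So for now the greatest optimization potential lies in visit_static_gc_roots and gc_sweep. Since the unit is microseconds, the cost of the other functions is negligible.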

Sharing these statistics for everyone to look at.
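
For anyone who wants to collect similar numbers, here is a rough sketch of how per-phase timestamps like the ones in the logs above could be taken. It assumes a POSIX system with `clock_gettime`; `elapsed_us`, `time_phase` and `percent_of` are hypothetical helpers, not the instrumentation that produced these logs.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 199309L
#endif
#include <stdio.h>
#include <time.h>

/* Microseconds elapsed between two timespec values.  */
static long long
elapsed_us (struct timespec start, struct timespec end)
{
  long long ns = (end.tv_sec - start.tv_sec) * 1000000000LL
                 + (end.tv_nsec - start.tv_nsec);
  return ns / 1000;
}

/* Run one GC phase and print how long it took, in the same
   "name: N microseconds" shape as the logs.  Returns the duration.  */
static long long
time_phase (const char *name, void (*phase) (void))
{
  struct timespec t0, t1;
  clock_gettime (CLOCK_MONOTONIC, &t0);
  phase ();
  clock_gettime (CLOCK_MONOTONIC, &t1);
  long long us = elapsed_us (t0, t1);
  fprintf (stderr, "%s: %lld microseconds\n", name, us);
  return us;
}

/* Share of TOTAL_US taken by PART_US, in percent.  */
static double
percent_of (long long part_us, long long total_us)
{
  return total_us > 0 ? 100.0 * part_us / total_us : 0.0;
}
```

Each phase would be wrapped as `time_phase ("gc_sweep", some_phase_function)`, and `percent_of` turns a phase time and the "GC total" line of the same run into that phase's share.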

visit_static_gc_roots: 31064 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 21692 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 2 microseconds
GC_ROOT_C_SYMBOL: 10 microseconds
GC_ROOT_STATICPRO: 6840 microseconds

Looking further inside visit_static_gc_roots, the call

  visit_buffer_root (visitor,
                     &buffer_defaults,
                     GC_ROOT_BUFFER_LOCAL_DEFAULT);

accounts for about 70% of the time spent in visit_static_gc_roots, and the loop

  for (int i = 0; i < staticidx; i++)
    visitor.visit (staticvec[i], GC_ROOT_STATICPRO, visitor.data);

accounts for about 22% of the time spent in visit_static_gc_roots.


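I tested a line-deletion operation in a large Markdown document. A single operation triggered 12 GC runs. Added together, (+ 35967 36885 36038 35876 35993 35642 38283 36760 36464 36201 36414 39017) comes to 439540 microseconds, about 439 ms. In other words, GC alone took roughly 0.44 seconds of the delete operation, which causes a clearly noticeable stall.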


This is probably because the buffer is not optimized for large files, so the contents have to be copied several times.


As for gc_sweep, unless a more complex lazy sweep is implemented or it is replaced with mark & compact, there is probably not much room left for improvement.

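It is very odd to see a visitor abstraction here that is used only once, and it makes one wonder whether it prevents the C compiler from optimizing away the parameters that mark_object_root_visitor does not use.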

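Also, process_mark_stack pushes every object referenced by an unmarked object onto the mark stack wholesale, instead of checking the mark first and then deciding whether to push. That looks problematic too: objects that are already marked can enter the mark stack multiple times. The mark bits already live in the object header, so cache friendliness is hardly a justification; implementations that care about that use bitmap marking.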
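
To make the difference concrete, here is a toy comparison of the two strategies described above. It is only an illustration on a made-up graph type (`struct node`), not Emacs's actual process_mark_stack; both functions return the number of pushes so the extra stack traffic can be counted.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_REFS = 4, STACK_MAX = 256 };

struct node
{
  struct node *ref[MAX_REFS];
  bool marked;
};

static struct node *stack[STACK_MAX];
static size_t sp;

/* Strategy 1: push every child unconditionally and test the mark bit
   only when popping.  Returns the number of pushes performed.  */
static size_t
mark_push_then_check (struct node *root)
{
  size_t pushes = 0;
  sp = 0;
  stack[sp++] = root;
  pushes++;
  while (sp > 0)
    {
      struct node *n = stack[--sp];
      if (n->marked)
        continue;               /* Duplicate entry: wasted push and pop.  */
      n->marked = true;
      for (int i = 0; i < MAX_REFS; i++)
        if (n->ref[i] && sp < STACK_MAX)
          {
            stack[sp++] = n->ref[i];
            pushes++;
          }
    }
  return pushes;
}

/* Strategy 2: test and set the mark bit before pushing, so each
   object enters the stack at most once.  */
static size_t
mark_check_then_push (struct node *root)
{
  size_t pushes = 0;
  sp = 0;
  root->marked = true;
  stack[sp++] = root;
  pushes++;
  while (sp > 0)
    {
      struct node *n = stack[--sp];
      for (int i = 0; i < MAX_REFS; i++)
        {
          struct node *c = n->ref[i];
          if (c && !c->marked && sp < STACK_MAX)
            {
              c->marked = true;
              stack[sp++] = c;
              pushes++;
            }
        }
    }
  return pushes;
}
```

On a graph where several objects share a child, the first variant pushes the shared child once per referrer, while the second pushes it once in total. The price of the second variant is a mark-bit test on every edge.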

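Yes, it is lsp-bridge's own fairly large Markdown file (371 lines, which is not really that large). The stall is especially noticeable when deleting Markdown table content. My fingertip-kill may be badly written, but the profiler happened to attribute the time to GC, so I now have a sample for stress-testing the GC.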

Below is the live log of the GC stalls:

...


GC start: 19000000
before gc: 9 microseconds
visit_static_gc_roots: 30529 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 106 microseconds
mark_gui: 6 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 283 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 33 microseconds
gc_sweep: 7207 microseconds
gc rest part: 4 microseconds
GC total: 38202 microseconds
GC INTERVAL: 78855 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 24807 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 8 microseconds
GC_ROOT_STATICPRO: 6789 microseconds

GC start: 19000000
before gc: 8 microseconds
visit_static_gc_roots: 31723 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 84 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 288 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 33 microseconds
gc_sweep: 6842 microseconds
gc rest part: 3 microseconds
GC total: 39044 microseconds
GC INTERVAL: 40879 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22893 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6293 microseconds

GC start: 20000000
before gc: 44 microseconds
visit_static_gc_roots: 29377 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 1 microseconds
mark_lread: 1 microseconds
mark_terminals: 3 microseconds
mark_kboards: 4 microseconds
mark_threads: 180 microseconds
mark_gui: 6 microseconds
compact_font_caches: 5 microseconds
mark_buffers: 399 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 41 microseconds
gc_sweep: 7516 microseconds
gc rest part: 4 microseconds
GC total: 37604 microseconds
GC INTERVAL: 315734 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23324 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6339 microseconds

GC start: 20000000
before gc: 38 microseconds
visit_static_gc_roots: 29805 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 1 microseconds
mark_lread: 1 microseconds
mark_terminals: 2 microseconds
mark_kboards: 4 microseconds
mark_threads: 280 microseconds
mark_gui: 6 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 354 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 36 microseconds
gc_sweep: 6959 microseconds
gc rest part: 3 microseconds
GC total: 37506 microseconds
GC INTERVAL: 118002 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23940 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6834 microseconds

GC start: 20000000
before gc: 4 microseconds
visit_static_gc_roots: 30916 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 1 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 301 microseconds
mark_gui: 6 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 377 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 37 microseconds
gc_sweep: 7103 microseconds
gc rest part: 4 microseconds
GC total: 38774 microseconds
GC INTERVAL: 111044 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22734 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 8 microseconds
GC_ROOT_STATICPRO: 6288 microseconds

...

GC start: 26000000
before gc: 20 microseconds
visit_static_gc_roots: 29436 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 1 microseconds
mark_terminals: 1 microseconds
mark_kboards: 3 microseconds
mark_threads: 364 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 331 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 33 microseconds
gc_sweep: 6894 microseconds
gc rest part: 92 microseconds
GC total: 37243 microseconds
GC INTERVAL: 78344 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23136 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 2 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 7093 microseconds

GC start: 26000000
before gc: 3 microseconds
visit_static_gc_roots: 30330 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 1 microseconds
mark_lread: 0 microseconds
mark_terminals: 1 microseconds
mark_kboards: 37 microseconds
mark_threads: 202 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 345 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 33 microseconds
gc_sweep: 6987 microseconds
gc rest part: 4 microseconds
GC total: 37969 microseconds
GC INTERVAL: 79405 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23107 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 15 microseconds
GC_ROOT_STATICPRO: 7508 microseconds

GC start: 27000000
before gc: 4 microseconds
visit_static_gc_roots: 30712 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 1 microseconds
mark_terminals: 1 microseconds
mark_kboards: 3 microseconds
mark_threads: 250 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 324 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 30 microseconds
gc_sweep: 7153 microseconds
gc rest part: 3 microseconds
GC total: 38503 microseconds
GC INTERVAL: 83154 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23341 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6363 microseconds

GC start: 27000000
before gc: 24 microseconds
visit_static_gc_roots: 29848 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 1 microseconds
mark_lread: 1 microseconds
mark_terminals: 3 microseconds
mark_kboards: 4 microseconds
mark_threads: 302 microseconds
mark_gui: 5 microseconds
compact_font_caches: 5 microseconds
mark_buffers: 409 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 39 microseconds
gc_sweep: 6805 microseconds
gc rest part: 10 microseconds
GC total: 37509 microseconds
GC INTERVAL: 83728 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23716 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6310 microseconds

GC start: 27000000
before gc: 4 microseconds
visit_static_gc_roots: 30145 microseconds
mark_pinned_objects: 35 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 1 microseconds
mark_terminals: 2 microseconds
mark_kboards: 4 microseconds
mark_threads: 241 microseconds
mark_gui: 6 microseconds
compact_font_caches: 5 microseconds
mark_buffers: 426 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 39 microseconds
gc_sweep: 6588 microseconds
gc rest part: 108 microseconds
GC total: 37650 microseconds
GC INTERVAL: 85017 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23118 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6445 microseconds

GC start: 27000000
before gc: 4 microseconds
visit_static_gc_roots: 29677 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 1 microseconds
mark_kboards: 3 microseconds
mark_threads: 178 microseconds
mark_gui: 5 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 307 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 32 microseconds
gc_sweep: 6896 microseconds
gc rest part: 3 microseconds
GC total: 37121 microseconds
GC INTERVAL: 89349 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23489 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 2 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6552 microseconds

GC start: 27000000
before gc: 3 microseconds
visit_static_gc_roots: 30141 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 1 microseconds
mark_kboards: 2 microseconds
mark_threads: 298 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 347 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 34 microseconds
gc_sweep: 6967 microseconds
gc rest part: 3 microseconds
GC total: 37818 microseconds
GC INTERVAL: 92438 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22757 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 15 microseconds
GC_ROOT_STATICPRO: 6383 microseconds

GC start: 27000000
before gc: 4 microseconds
visit_static_gc_roots: 29239 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 1 microseconds
mark_terminals: 1 microseconds
mark_kboards: 3 microseconds
mark_threads: 307 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 373 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 31 microseconds
gc_sweep: 6997 microseconds
gc rest part: 4 microseconds
GC total: 37007 microseconds
GC INTERVAL: 88881 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22788 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 7 microseconds
GC_ROOT_C_SYMBOL: 8 microseconds
GC_ROOT_STATICPRO: 6932 microseconds

GC start: 27000000
before gc: 32 microseconds
visit_static_gc_roots: 29809 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 22 microseconds
mark_kboards: 3 microseconds
mark_threads: 297 microseconds
mark_gui: 5 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 337 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 33 microseconds
gc_sweep: 6998 microseconds
gc rest part: 2 microseconds
GC total: 37554 microseconds
GC INTERVAL: 86311 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 24301 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6363 microseconds

GC start: 27000000
before gc: 3 microseconds
visit_static_gc_roots: 30816 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 1 microseconds
mark_lread: 0 microseconds
mark_terminals: 1 microseconds
mark_kboards: 3 microseconds
mark_threads: 333 microseconds
mark_gui: 4 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 346 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 33 microseconds
gc_sweep: 7043 microseconds
gc rest part: 3 microseconds
GC total: 38603 microseconds
GC INTERVAL: 88926 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22719 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 9 microseconds
GC_ROOT_STATICPRO: 6777 microseconds

GC start: 27000000
before gc: 4 microseconds
visit_static_gc_roots: 29595 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 1 microseconds
mark_lread: 1 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 297 microseconds
mark_gui: 6 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 303 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 34 microseconds
gc_sweep: 6774 microseconds
gc rest part: 8 microseconds
GC total: 37040 microseconds
GC INTERVAL: 88045 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22831 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6293 microseconds

GC start: 27000000
before gc: 4 microseconds
visit_static_gc_roots: 29241 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 1 microseconds
mark_lread: 0 microseconds
mark_terminals: 46 microseconds
mark_kboards: 3 microseconds
mark_threads: 245 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 327 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 33 microseconds
gc_sweep: 6871 microseconds
gc rest part: 3 microseconds
GC total: 36796 microseconds
GC INTERVAL: 88022 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22335 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6860 microseconds

GC start: 27000000
before gc: 77 microseconds
visit_static_gc_roots: 29315 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 31 microseconds
mark_kboards: 3 microseconds
mark_threads: 38 microseconds
mark_gui: 6 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 486 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 34 microseconds
gc_sweep: 7129 microseconds
gc rest part: 3 microseconds
GC total: 37162 microseconds
GC INTERVAL: 139716 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22875 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 2 microseconds
GC_ROOT_C_SYMBOL: 9 microseconds
GC_ROOT_STATICPRO: 6563 microseconds

GC start: 28000000
before gc: 63 microseconds
visit_static_gc_roots: 29626 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 212 microseconds
mark_gui: 7 microseconds
compact_font_caches: 6 microseconds
mark_buffers: 556 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 55 microseconds
gc_sweep: 7671 microseconds
gc rest part: 93 microseconds
GC total: 38317 microseconds
GC INTERVAL: 318769 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22833 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 2 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6443 microseconds

GC start: 28000000
before gc: 42 microseconds
visit_static_gc_roots: 29472 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 3 microseconds
mark_kboards: 5 microseconds
mark_threads: 165 microseconds
mark_gui: 6 microseconds
compact_font_caches: 6 microseconds
mark_buffers: 510 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 44 microseconds
gc_sweep: 7642 microseconds
gc rest part: 110 microseconds
GC total: 38028 microseconds
GC INTERVAL: 299611 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23170 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 7013 microseconds

GC start: 28000000
before gc: 47 microseconds
visit_static_gc_roots: 30281 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 1 microseconds
mark_terminals: 1 microseconds
mark_kboards: 2 microseconds
mark_threads: 243 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 322 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 31 microseconds
gc_sweep: 7002 microseconds
gc rest part: 3 microseconds
GC total: 37953 microseconds
GC INTERVAL: 170713 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23481 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 2 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6813 microseconds

GC start: 28000000
before gc: 4 microseconds
visit_static_gc_roots: 30395 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 1 microseconds
mark_kboards: 2 microseconds
mark_threads: 212 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 331 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 40 microseconds
gc_sweep: 6923 microseconds
gc rest part: 84 microseconds
GC total: 38013 microseconds
GC INTERVAL: 79987 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23813 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6663 microseconds

GC start: 28000000
before gc: 4 microseconds
visit_static_gc_roots: 30565 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 2 microseconds
mark_kboards: 36 microseconds
mark_threads: 209 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 366 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 35 microseconds
gc_sweep: 7090 microseconds
gc rest part: 81 microseconds
GC total: 38428 microseconds
GC INTERVAL: 85031 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 23481 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 6310 microseconds

GC start: 29000000
before gc: 85 microseconds
visit_static_gc_roots: 29927 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 2 microseconds
mark_kboards: 3 microseconds
mark_threads: 110 microseconds
mark_gui: 6 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 359 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 35 microseconds
gc_sweep: 7107 microseconds
gc rest part: 4 microseconds
GC total: 37653 microseconds
GC INTERVAL: 122217 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22572 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 8 microseconds
GC_ROOT_STATICPRO: 6970 microseconds

GC start: 29000000
before gc: 19 microseconds
visit_static_gc_roots: 29642 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 20 microseconds
mark_terminals: 3 microseconds
mark_kboards: 32 microseconds
mark_threads: 254 microseconds
mark_gui: 5 microseconds
compact_font_caches: 4 microseconds
mark_buffers: 332 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 43 microseconds
gc_sweep: 7134 microseconds
gc rest part: 250 microseconds
GC total: 37754 microseconds
GC INTERVAL: 90011 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22076 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 8 microseconds
GC_ROOT_STATICPRO: 7556 microseconds

GC start: 29000000
before gc: 65 microseconds
visit_static_gc_roots: 29829 microseconds
mark_pinned_objects: 1 microseconds
mark_pinned_symbols: 1 microseconds
mark_lread: 1 microseconds
mark_terminals: 3 microseconds
mark_kboards: 4 microseconds
mark_threads: 250 microseconds
mark_gui: 6 microseconds
compact_font_caches: 6 microseconds
mark_buffers: 470 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 39 microseconds
gc_sweep: 7256 microseconds
gc rest part: 3 microseconds
GC total: 37954 microseconds
GC INTERVAL: 143452 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22844 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 9 microseconds
GC_ROOT_STATICPRO: 6564 microseconds

GC start: 29000000
before gc: 50 microseconds
visit_static_gc_roots: 29503 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 0 microseconds
mark_terminals: 1 microseconds
mark_kboards: 4 microseconds
mark_threads: 96 microseconds
mark_gui: 5 microseconds
compact_font_caches: 3 microseconds
mark_buffers: 453 microseconds
mark_finalizer_list: 0 microseconds
mark_and_sweep_weak_table_contents: 36 microseconds
gc_sweep: 8722 microseconds
gc rest part: 4 microseconds
GC total: 38934 microseconds
GC INTERVAL: 226304 microseconds
GC_ROOT_BUFFER_LOCAL_DEFAULT: 22681 microseconds
GC_ROOT_BUFFER_LOCAL_NAME: 1 microseconds
GC_ROOT_C_SYMBOL: 14 microseconds
GC_ROOT_STATICPRO: 7875 microseconds

GC start: 29000000
before gc: 89 microseconds
visit_static_gc_roots: 30742 microseconds
mark_pinned_objects: 0 microseconds
mark_pinned_symbols: 0 microseconds
mark_lread: 1 microseconds
mark_terminals: 3 microseconds
mark_kboards: 4 microseconds
mark_threads: 243 microseconds
mark_gui: 6 microseconds
compact_font_caches: 7 microseconds
mark_buffers: 623 microseconds
mark_finalizer_list: 1 microseconds
mark_and_sweep_weak_table_contents: 68 microseconds
gc_sweep: 7169 microseconds
gc rest part: 3 microseconds
GC total: 38992 microseconds
GC INTERVAL: 279210 microseconds

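According to these numbers, while the GC stalls are happening:

  1. Visiting the static GC roots (visit_static_gc_roots) takes about 29 to 32 ms.
  2. Marking buffers (mark_buffers) fluctuates between 0.3 and 0.6 ms.
  3. GC sweep (gc_sweep) fluctuates between 6.6 and 8.7 ms.
  4. Total GC time (GC total) fluctuates between about 37 and 39 ms.
  5. The GC interval (GC INTERVAL) varies widely, mostly from about 80 ms to about 320 ms, with one sample as low as 41 ms.

In other words, GC runs very frequently: typically as often as every 80 ms and at least every 300 ms or so, which is roughly 3 to 12 times per second, and each of these collections costs about 38 ms.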
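
The per-second figure follows directly from the interval, as this small sketch shows. The second helper rests on an assumption that the logs do not confirm: that GC INTERVAL is measured from one GC start to the next, so that it includes the pause itself. Both function names are made up.

```c
/* Collections per second for a given interval between GC runs,
   in microseconds.  */
static double
gcs_per_second (double interval_us)
{
  return 1e6 / interval_us;
}

/* Fraction of wall-clock time spent in GC, ASSUMING that INTERVAL_US
   is measured from one GC start to the next and therefore includes
   the pause.  If the interval is measured differently, this is wrong.  */
static double
gc_time_fraction (double pause_us, double interval_us)
{
  return pause_us / interval_us;
}
```

With an interval of 80000 microseconds the first function gives 12.5 collections per second, and with 300000 microseconds a little over 3.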


visit_static_gc_roots is essentially process_mark_stack slowly scanning the entire heap, so of course it takes the most time. mark_threads conservatively scans the C stack.
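
To illustrate what "conservatively" means here, a toy sketch: every word in a stack snapshot is treated as a possible pointer, and any word that happens to look like the address of a heap object keeps that object alive. The heap layout, `maybe_object` and `scan_words_conservatively` are invented for this example. Emacs's real implementation is more involved, and this toy only accepts addresses that fall exactly on an object boundary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { HEAP_OBJS = 4 };

struct obj { bool marked; long payload; };

static struct obj heap[HEAP_OBJS];

/* If WORD could be the address of a heap object, return that object;
   otherwise return NULL.  A conservative scanner cannot tell a real
   pointer from an integer that merely looks like one.  */
static struct obj *
maybe_object (uintptr_t word)
{
  uintptr_t lo = (uintptr_t) &heap[0];
  uintptr_t hi = (uintptr_t) &heap[HEAP_OBJS];
  if (word < lo || word >= hi)
    return NULL;
  if ((word - lo) % sizeof (struct obj) != 0)
    return NULL;
  return &heap[(word - lo) / sizeof (struct obj)];
}

/* Treat WORDS[0..N-1] as a snapshot of stack words and mark every
   heap object one of them might point to.  Returns how many objects
   were newly marked.  */
static size_t
scan_words_conservatively (const uintptr_t *words, size_t n)
{
  size_t marked = 0;
  for (size_t i = 0; i < n; i++)
    {
      struct obj *o = maybe_object (words[i]);
      if (o && !o->marked)
        {
          o->marked = true;
          marked++;
        }
    }
  return marked;
}
```

The cost of this approach is that a stray integer with the right bit pattern can keep garbage alive; the benefit is that C code does not have to register every local variable that holds a Lisp object.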

This is not the first time I have seen this kind of unfriendly wording from you here. Do you think that is appropriate?

Also, I suggest not feeding net.kook (dickmao) upthread; he only hangs around Emacs China because he cannot find a job.

There is no need to reply to dickmao.

He has been permanently banned.


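The obvious improvement for gc_sweep is the same as for process_mark_stack: keep two states for blocks (allocated before marking completed, and allocated after marking completed), exit gc_sweep from time to time, and on return sweep only the blocks that were allocated before marking completed.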
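
A minimal sketch of that scheme on a toy heap, assuming a fixed array of blocks and a resumable sweep cursor. The struct layout and the names `alloc_during_sweep` and `sweep_step` are hypothetical; the point is only that blocks allocated after marking finished carry a flag, so a sweep that is interrupted and resumed never frees them.

```c
#include <stdbool.h>
#include <stddef.h>

enum { NBLOCKS = 8 };

struct block
{
  bool in_use;          /* Block currently holds an object.  */
  bool marked;          /* Set by the mark phase if reachable.  */
  bool born_after_mark; /* Allocated after marking finished.  */
};

static struct block heap[NBLOCKS];
static size_t sweep_cursor;     /* Where the next sweep step resumes.  */

/* Allocate a block while a sweep is in progress.  The mark phase
   never saw it, so if the sweeper has not passed this slot yet, flag
   the block to keep it from being freed.  Returns the block index,
   or -1 if the toy heap is full.  */
static int
alloc_during_sweep (void)
{
  for (int i = 0; i < NBLOCKS; i++)
    if (!heap[i].in_use)
      {
        heap[i] = (struct block) {
          .in_use = true,
          .born_after_mark = (size_t) i >= sweep_cursor
        };
        return i;
      }
  return -1;
}

/* Sweep at most BUDGET blocks, then return to the caller.
   Returns true once the whole heap has been swept.  */
static bool
sweep_step (size_t budget)
{
  while (sweep_cursor < NBLOCKS && budget-- > 0)
    {
      struct block *b = &heap[sweep_cursor++];
      if (!b->in_use)
        continue;
      if (b->born_after_mark)
        b->born_after_mark = false;     /* Too young to judge: keep.  */
      else if (b->marked)
        b->marked = false;              /* Live: reset for next cycle.  */
      else
        b->in_use = false;              /* Unreachable: free it.  */
    }
  return sweep_cursor == NBLOCKS;
}
```

Calling `sweep_step` with a small budget between interpreter steps spreads the sweep pause out, and resetting `sweep_cursor` to 0 would start the next cycle's sweep.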

OK, thanks for the reply. For now I will share some performance logs here.

It is hard to add logging inside process_mark_stack at this point: there are so many objects that, with logging in place, even make becomes very slow.

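For now this thread is for sharing data and understanding how the GC behaves. I do not yet have a clear idea of how to optimize it (mainly, it feels like the GC does too much work for a single thread).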

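After further analysis I will see whether other approaches can improve performance, for example:

  1. Is the reachability check efficient? There are so many Lisp objects now that even a tiny per-object saving here adds up to a considerable gain.
  2. Can the object-type branch checks be optimized further?
  3. Pruning: for example, if a parent object is already known to be reachable, can the checks for all of its child objects be skipped?
  4. Other: improve the cache hit rate and cut unnecessary function calls; with this many objects, the sheer number of calls carries some overhead.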

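The latter approach would cause unpredictable branches, and because objects use too many different mark-bit formats, it would also increase the size of process_mark_stack. Mattias has already tried this.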

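Ugh. Keeping different kinds of objects in separate regions feels like a bad historical decision. With a single block format, as in OCaml, there would not be so many branches to write.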


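Today I did some profiling with valgrind. The method:

  1. Run Emacs under valgrind to find the performance bottleneck
     valgrind --tool=callgrind --simulate-cache=yes --collect-jumps=yes emacs
     • --tool=callgrind: use the callgrind tool for profiling.
     • --simulate-cache=yes: simulate cache behavior to capture cache hits and misses, which helps show how cache efficiency affects performance.
     • --collect-jumps=yes: collect information about jump instructions, which helps show how the code branches.

     Wait for valgrind to finish starting up (this takes ten-odd seconds), run the Emacs command that stalls, then press Ctrl + C to end the profiling run.

  2. View the profiling report
     kcachegrind callgrind.out.PID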

So far, mark_localized_symbol accounts for a very large share of the profile.


If you prevent mark_localized_symbol from being inlined, this will be mitigated. It is a false positive caused by compiler optimization.

The reason is that most global variables point to symbols without a blv, and mark_localized_symbol in turn calls back into process_mark_stack.
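
For reference, a small sketch of how a function can be kept out of line with GCC or Clang, so that a profiler attributes its cost to its own symbol instead of folding it into the caller. The macro `SKETCH_NO_INLINE` and both functions are stand-ins made up for this example; they are not the Emacs functions of similar names.

```c
/* GCC and Clang spell the attribute this way; for other compilers
   this sketch simply falls back to doing nothing.  */
#if defined __GNUC__ || defined __clang__
# define SKETCH_NO_INLINE __attribute__ ((noinline))
#else
# define SKETCH_NO_INLINE
#endif

struct sym { int has_blv; int visits; };

/* Hypothetical stand-in for the rarely taken slow path.  Kept out of
   line so that its cost shows up under its own name in a profile.  */
static SKETCH_NO_INLINE void
mark_localized_sketch (struct sym *s)
{
  s->visits++;
}

/* Hypothetical stand-in for the hot marking code that only sometimes
   needs the slow path.  */
static void
mark_symbol_sketch (struct sym *s)
{
  if (s->has_blv)
    mark_localized_sketch (s);
}
```

After rebuilding with the attribute in place, the callgrind numbers for the caller and the callee can be compared separately.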


Right. I looked at the call graph yesterday and found that mark_localized_symbol and process_mark_stack call each other recursively.

Presumably, once the localized type has been resolved to the symbol's initial and default values, it should not loop back a second time.

For now I will keep taking notes in my spare time each day and share my measured performance data.
