    Combine call info and cache to speed up method invocation · 89e79976
    Alan Wu authored
    To perform a regular method call, the VM needs two structs,
    `rb_call_info` and `rb_call_cache`. At the moment, we allocate these two
    structures in separate buffers. In the worst case, the CPU needs to read
    4 cache lines to complete a method call. Putting the two structures
    together reduces the maximum number of cache line reads to 2.
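
The cache line argument can be sketched in C as follows. This is a minimal illustration, not the VM's real definitions: the `demo_*` structs, their fields, and the `lines_touched` helper are hypothetical names, and a 64-byte cache line is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the two structs. The field
 * sets are assumptions for illustration, not the VM's actual layout. */
struct demo_call_info {
    uintptr_t mid;        /* method name identifier */
    unsigned int flag;    /* call flags */
    int orig_argc;        /* argument count at the call site */
};

struct demo_call_cache {
    uint64_t method_state;
    uint64_t class_serial;
    const void *method_entry;
};

/* Combined layout: both structs live in one contiguous allocation. */
struct demo_call_data {
    struct demo_call_cache cc;
    struct demo_call_info ci;
};

/* Number of cache lines of `line` bytes covered by the byte range
 * [addr, addr + size). */
static size_t lines_touched(uintptr_t addr, size_t size, size_t line)
{
    assert(size > 0 && line > 0);
    return (size_t)((addr + size - 1) / line - addr / line + 1);
}
```

With these assumptions, a cache struct at offset 56 and an info struct at offset 120 in separate buffers each straddle a line boundary, so the call touches 4 lines in total. A `demo_call_data` placed at offset 56 touches 2.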
    
    Combining the structures also saves 8 bytes per call site as the current
    layout uses two separate pointers for the call info and the call cache.
    This saves about 2 MiB on Discourse.
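
The per-call-site saving amounts to one pointer, which is 8 bytes on a 64-bit platform. A rough sketch, with hypothetical `demo_*` operand structs standing in for the instruction operands:

```c
#include <stddef.h>

/* Hypothetical operand layouts for one call site. */
struct demo_operands_before {
    const void *ci;   /* pointer to the call info */
    void *cc;         /* pointer to the call cache */
};

struct demo_operands_after {
    void *cd;         /* single pointer to the combined call data */
};

/* Bytes saved at each call site by dropping one pointer operand. */
static size_t bytes_saved_per_call_site(void)
{
    return sizeof(struct demo_operands_before)
         - sizeof(struct demo_operands_after);
}

/* Total saving for a program with `call_sites` call sites. */
static size_t total_bytes_saved(size_t call_sites)
{
    return call_sites * bytes_saved_per_call_site();
}
```

The saving scales linearly with the number of call sites, which is why it adds up in a large application.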
    
    This change improves the Optcarrot benchmark by at least 3%. For more
    details, see the attached bugs.ruby-lang.org ticket.
    
    Complications:
     - A new instruction attribute `comptime_sp_inc` is introduced to
     calculate SP increase at compile time without using call caches. At
     compile time, a `TS_CALLDATA` operand points to a call info struct, but
     at runtime, the same operand points to a call data struct. Instructions
     that explicitly define `sp_inc` also need to define `comptime_sp_inc`.
     - MJIT code for copying call cache becomes slightly more complicated.
     - This changes the bytecode format, which might break existing tools.
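
As a rough illustration of the first point, the stack pointer change of a send-like instruction can be derived from call info alone, without touching a call cache. The sketch below is an assumption-laden example: `demo_ci`, `DEMO_CALL_ARGS_BLOCKARG`, and `demo_comptime_sp_inc` are hypothetical names, and the flag value is made up.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CALL_ARGS_BLOCKARG 0x02u   /* hypothetical flag bit */

/* Hypothetical subset of the call info needed at compile time. */
struct demo_ci {
    unsigned int flag;
    int orig_argc;
};

/* Net stack change of a send-like instruction: it pops the receiver,
 * the arguments, and an optional block argument, then pushes one
 * return value. Only call info is consulted. */
static int demo_comptime_sp_inc(const struct demo_ci *ci)
{
    assert(ci != NULL);
    const int argb = (ci->flag & DEMO_CALL_ARGS_BLOCKARG) ? 1 : 0;
    const int recv = 1;
    const int retn = 1;
    return retn - recv - ci->orig_argc - argb;
}
```

Because the function reads only the call info, it can run at compile time, when the operand does not yet point to a call data struct.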
    
    [Misc #16258]