    Make rb_vm_insns_count a thread local variable · 50c2c4bd
    Aaron Patterson authored
    `rb_vm_insns_count` is a global variable used for reporting YJIT
    statistics. It tallies the number of interpreter instructions that
    have been executed, which lets us approximate how much time we're
    spending in YJIT compared to the interpreter.
    
    Unfortunately, keeping this statistic means that every instruction
    executed in the interpreter loop must increment the counter. Normally
    this isn't a problem, but in multi-threaded situations (when Ractors are
    used), incrementing this counter can become quite costly due to CPU
    cache contention: every core writes to the same cache line, which then
    bounces between cores.
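
    As a minimal sketch of why the counter sits on the hot path, here is a
    toy switch-based interpreter in C. The opcodes, `run_program`, and the
    `insns_count` global are hypothetical stand-ins, not CRuby's actual VM
    code. The only point is that the counter is bumped once per dispatched
    instruction, so with a plain global every thread running bytecode
    writes to the same memory location.

    ```c
    #include <stdio.h>
    #include <stddef.h>

    /* Hypothetical opcodes for a toy stack machine. */
    enum op { OP_PUSH1, OP_ADD, OP_HALT };

    /* Hypothetical stand-in for rb_vm_insns_count: one process-wide global. */
    static unsigned long insns_count = 0;

    /* Runs a toy program and returns the value left on top of the stack. */
    static long run_program(const enum op *code)
    {
        long stack[64];
        size_t sp = 0;

        for (size_t pc = 0;; pc++) {
            insns_count++; /* executed for every dispatched instruction */
            switch (code[pc]) {
            case OP_PUSH1:
                stack[sp++] = 1;
                break;
            case OP_ADD:
                sp--;
                stack[sp - 1] += stack[sp];
                break;
            case OP_HALT:
                return stack[sp - 1];
            }
        }
    }

    int main(void)
    {
        const enum op code[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_HALT };
        long result = run_program(code);
        printf("result=%ld insns=%lu\n", result, insns_count);
        return 0;
    }
    ```

    Running it prints `result=2 insns=4`: four instructions were dispatched,
    so the counter was incremented four times.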
    
    Additionally, since the global is incremented without any locking,
    concurrent increments can be lost, so the count isn't meaningful in a
    multi-threaded environment anyway.
    
    This commit changes `rb_vm_insns_count` to a thread-local variable. That
    way each Ractor has its own copy of the counter, and incrementing it
    becomes quite cheap. Of course, this means that in multi-threaded
    situations the value doesn't really make sense (but it didn't make
    sense before either, because of the lack of locking).
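
    A minimal sketch of the thread-local approach, assuming C11
    `_Thread_local` and POSIX threads (compile with `-pthread`). The names
    `insns_count` and `worker` are hypothetical, and CRuby's real
    declaration may go through a portability macro instead. Each thread
    increments only its own copy, so no cache line is shared, and the main
    thread's copy is never touched by the workers.

    ```c
    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical per-thread counter: every thread gets its own
     * zero-initialized copy of this variable. */
    static _Thread_local unsigned long insns_count = 0;

    /* Increments only this thread's copy, then reports its private total
     * back through the argument pointer. */
    static void *worker(void *arg)
    {
        unsigned long iterations = *(unsigned long *)arg;
        for (unsigned long i = 0; i < iterations; i++) {
            insns_count++;
        }
        *(unsigned long *)arg = insns_count;
        return NULL;
    }

    int main(void)
    {
        enum { NTHREADS = 4 };
        pthread_t threads[NTHREADS];
        unsigned long counts[NTHREADS];

        for (int t = 0; t < NTHREADS; t++) {
            counts[t] = 1000UL * (unsigned long)(t + 1);
            pthread_create(&threads[t], NULL, worker, &counts[t]);
        }
        for (int t = 0; t < NTHREADS; t++) {
            pthread_join(threads[t], NULL);
            printf("thread %d counted %lu\n", t, counts[t]);
        }
        /* The workers never touched the main thread's copy. */
        printf("main thread counted %lu\n", insns_count);
        return 0;
    }
    ```

    Each worker reports exactly its own total (1000, 2000, 3000 and 4000),
    while the main thread still reports 0. The trade-off matches the one
    described above: increments are cheap, but no single copy reflects the
    whole process.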
    
    The counter is used for YJIT statistics, and since YJIT is basically
    disabled when Ractors are in use, I don't think we care about the
    inaccuracy for now. We can revisit this counter when we give YJIT
    multi-threading support; in the meantime, this commit restores
    multi-threaded performance.
    
    To test this, I used the benchmark in [Bug #20489].
    
    Here is the performance on Ruby 3.2:
    
    ```
    $ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
    ruby 3.2.0 (2022-12-25 revision a5289082) [x86_64-linux]
    [0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
    ../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
    
    ________________________________________________________
    Executed in    2.53 secs    fish           external
       usr time   19.86 secs  370.00 micros   19.86 secs
       sys time    0.02 secs  320.00 micros    0.02 secs
    ```
    
    We can see the regression in performance on the master branch:
    
    ```
    $ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
    ruby 3.5.0dev (2025-01-10T16:22:26Z master 4a2702da) +PRISM [x86_64-linux]
    [0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
    ../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
    
    ________________________________________________________
    Executed in   24.87 secs    fish           external
       usr time  195.55 secs    0.00 micros  195.55 secs
       sys time    0.00 secs  716.00 micros    0.00 secs
    ```
    
    Here are the stats after this commit:
    
    ```
    $ time RUBY_MAX_CPU=12 ./miniruby -v ../test.rb 8 8
    ruby 3.5.0dev (2025-01-10T20:37:06Z tl 3ef0432779) +PRISM [x86_64-linux]
    [0...1, 1...2, 2...3, 3...4, 4...5, 5...6, 6...7, 7...8]
    ../test.rb:43: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
    
    ________________________________________________________
    Executed in    2.46 secs    fish           external
       usr time   19.34 secs  381.00 micros   19.34 secs
       sys time    0.01 secs  321.00 micros    0.01 secs
    ```
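
    The size of the regression and of the fix can be checked directly from
    the three runs above. This small C program only divides the reported
    timings; `ratio` is a hypothetical helper.

    ```c
    #include <stdio.h>

    /* Hypothetical helper: how many times larger `num` is than `den`. */
    static double ratio(double num, double den)
    {
        return num / den;
    }

    int main(void)
    {
        /* Wall-clock ("Executed in") and user-CPU seconds from the runs above. */
        const double wall_32 = 2.53, wall_master = 24.87, wall_after = 2.46;
        const double usr_master = 195.55, usr_after = 19.34;

        printf("master vs 3.2 (wall):     %.1fx\n", ratio(wall_master, wall_32));
        printf("master vs patched (wall): %.1fx\n", ratio(wall_master, wall_after));
        printf("master vs patched (usr):  %.1fx\n", ratio(usr_master, usr_after));
        /* User time divided by wall time approximates the average busy cores. */
        printf("busy cores after patch:   %.1f\n", ratio(usr_after, wall_after));
        return 0;
    }
    ```

    It reports a 9.8x wall-clock slowdown of master relative to 3.2, a
    10.1x gap between master and the patched build, and about 7.9 cores
    busy on average after the patch.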
    
    [Bug #20489]