    d8601621
    Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods · d8601621
    yui-knk authored
    Implementations of the Language Server Protocol (LSP) sometimes need token information.
    For example, `m(1)` and `m(1, )` have the same AST structure apart from node locations,
    so it is impossible to detect the `,` from the AST alone. In the latter case, however,
    it might be better to suggest a list of variables for the second argument.
    Token information is important in such cases.
    
    This commit adds the following option and methods.
    
    * Add a `keep_tokens` option to `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
    * Add `RubyVM::AbstractSyntaxTree::Node#tokens`, which returns the tokens for the node, including the tokens of its descendant nodes.
    * Add `RubyVM::AbstractSyntaxTree::Node#all_tokens`, which returns all tokens for the input script regardless of the receiver node.
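
The `m(1)` versus `m(1, )` case described above can be sketched as follows. This is a minimal illustration, assuming a CRuby build that includes this commit (3.2 or later); `token_texts` and `has_comma?` are hypothetical helper names introduced here, not part of the commit. Each token is assumed to be an array of the form `[id, type, text, location]`.

```ruby
# Hypothetical helper: returns the source text of every token under the root node.
def token_texts(src)
  root = RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
  # Each token is [id, type, text, [first_lineno, first_column, last_lineno, last_column]].
  root.tokens.map { |_id, _type, text, _loc| text }
end

# Hypothetical helper: true when the source contains a `,` token.
def has_comma?(src)
  token_texts(src).include?(",")
end

puts has_comma?("m(1)")    # no `,` token in the stream
puts has_comma?("m(1, )")  # the trailing `,` is visible in the tokens

# Whitespace is kept as tokens too, so the texts concatenate back to the input.
puts token_texts("m(1, )").join == "m(1, )"

# Without keep_tokens, no token information is retained.
p RubyVM::AbstractSyntaxTree.parse("m(1)").all_tokens
```

A language server could use a check like this to decide whether to offer completion candidates for a second argument, which the AST alone cannot tell it.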
    
    [Feature #19070]
    
    The impact on memory usage and performance is shown below:
    
    Memory usage:
    
    ```
    $ cat test.rb
    root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)
    
    $ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
    ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
    11408kb
    
    # keep_tokens: false
    $ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
    17508kb
    
    # keep_tokens: true
    $ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
    30960kb
    ```
    
    Performance:
    
    ```
    $ cat ../ast_keep_tokens.yml
    prelude: |
      src = <<~SRC
        module M
          class C
            def m1(a, b)
              1 + a + b
            end
          end
        end
      SRC
    benchmark:
      without_keep_tokens: |
        RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
      with_keep_tokens: |
        RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
    
    $ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
    /home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
                --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
                --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common  ../tool/runruby.rb --extout=.ext  -- --disable-gems --disable-gem" \
                --output=markdown --output-compare -v ../ast_keep_tokens.yml
    compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
    built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
    warming up..
    
    |                     |compare-ruby|built-ruby|
    |:--------------------|-----------:|---------:|
    |without_keep_tokens  |     21.659k|   21.303k|
    |                     |       1.02x|         -|
    |with_keep_tokens     |      6.220k|    5.691k|
    |                     |       1.09x|         -|
    ```
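
To put the figures above in relative terms, the short calculation below derives the overhead from the reported numbers. It is only arithmetic on values copied from this commit message. It assumes the benchmark table reports iterations per second (benchmark-driver's default metric) and uses the `built-ruby` column. Both columns were produced from the same revision, so the meaningful comparison is between rows rather than between columns.

```ruby
# Peak resident set size in kB, as printed by `/usr/bin/time -f %M` above.
baseline_kb = 11_408 # `ruby -v` only
without_kb  = 17_508 # keep_tokens: false
with_kb     = 30_960 # keep_tokens: true

token_overhead_kb = with_kb - without_kb
memory_ratio      = with_kb.fdiv(without_kb)

# Throughput from the built-ruby column (assumed to be iterations per second).
without_ips = 21_303
with_ips    = 5_691
slowdown    = without_ips.fdiv(with_ips)

puts "extra memory with keep_tokens: #{token_overhead_kb} kB (#{memory_ratio.round(2)}x)"
puts "parse slowdown with keep_tokens: #{slowdown.round(2)}x"
```

These ratios describe only the two inputs measured here (`test_keyword.rb` and a seven-line snippet); other sources may behave differently.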