    1a05dc03 · Make backtrace generation work outward from current frame
    Jeremy Evans authored
    This fixes multiple bugs found in the partial backtrace
    optimization added in 3b24b791.
    These bugs occur when passing a start argument to caller where
    the start argument lands on an iseq frame without a pc.
    
    Before this commit, the following code results in the same
    line being printed twice, both for the #each method.
    
    ```ruby
    def a; [1].group_by { b } end
    def b; puts(caller(2, 1).first, caller(3, 1).first) end
    a
    ```
    
    After this commit and in Ruby 2.7, the lines are different,
    with the first line being for #each and the second for #group_by.
    
    Before this commit, the following code can either segfault or
    result in an infinite loop:
    
    ```ruby
    def foo
      caller_locations(2, 1).inspect # segfault
      caller_locations(2, 1)[0].path # infinite loop
    end
    
    1.times.map { 1.times.map { foo } }
    ```
    
    After this commit, this code works correctly.
    
    This commit completely refactors the backtrace handling.
    Instead of processing the backtrace from the outermost
    frame working in, process it from the innermost frame
    working out.  This is much faster for partial backtraces,
    since you only access the control frames you need to in
    order to construct the backtrace.
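
    As a concrete illustration of why working inner-to-outer is
    cheaper for partial backtraces, a request like caller(1, 2)
    only has to examine the innermost few control frames,
    regardless of stack depth.  A minimal sketch (the method
    names here are invented for the example):

    ```ruby
    def level3
      # The new algorithm walks from the innermost frame outward and
      # stops as soon as `length` (here 2) entries have been collected.
      caller(1, 2)
    end
    def level2; level3 end
    def level1; level2 end

    bt = level1
    # bt has exactly two entries, for the call sites in level2 and level1
    ```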
    
    To handle cfunc frames in the new design, they start
    out with no location information.  We increment a counter
    for each cfunc frame added.  When an iseq frame with pc
    is accessed, after adding the iseq backtrace location,
    we use the location for the iseq backtrace location for
    all of the directly preceding cfunc backtrace locations.
    
    If the last backtrace line is a cfunc frame, we continue
    scanning for iseq frames until the end control frame, and
    use the location information from the first one for the
    trailing cfunc frames in the backtrace.
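
    For example (the helper name from_cfunc is invented here;
    Integer#times is a cfunc):

    ```ruby
    def from_cfunc
      # Frame 2 from here is Integer#times, a cfunc frame.  It has no
      # location of its own, so its path and lineno are borrowed from
      # the nearest iseq frame further out (top level of this file).
      caller_locations(2, 1).first
    end

    loc = nil
    1.times { loc = from_cfunc }
    # loc.label names times, while loc.path is this Ruby file
    ```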
    
    As only rb_ec_partial_backtrace_object uses the new
    backtrace implementation, remove all of the function
    pointers and inline the functions.  This makes the
    process easier to understand.
    
    Restore the Ruby 2.7 implementation of backtrace_each and
    use it for all the other functions that called
    backtrace_each other than rb_ec_partial_backtrace_object.
    All other cases requested the entire backtrace, so there
    is no advantage of using the new algorithm for those.
    Additionally, there are implicit assumptions in the other
    code that the backtrace processing works inward instead
    of outward.
    
    Remove the cfunc/iseq union in rb_backtrace_location_t,
    and remove the prev_loc member for cfunc.  Both cfunc and
    iseq types can now have iseq and pc entries, so the
    location information can be accessed the same way for each.
    This avoids the need for an extra backtrace location entry
    to store an iseq backtrace location if the final entry in
    the backtrace is a cfunc. This is also what fixes the
    segfault and infinite loop issues in the above bugs.
    
    Here's Ruby pseudocode for the new algorithm, where start
    and length are the arguments to caller or caller_locations:
    
    ```ruby
    end_cf = VM.end_control_frame.next
    cf = VM.start_control_frame
    size = VM.num_control_frames - 2
    bt = []
    cfunc_counter = 0
    
    if length.nil? || length > size
      length = size
    end
    
    while cf != end_cf && bt.size != length
      if cf.iseq?
        if cf.instruction_pointer?
          if start > 0
            start -= 1
          else
            bt << cf.iseq_backtrace_entry
            cfunc_counter.times do |i|
              bt[-2 - i].loc = cf.loc
            end
            cfunc_counter = 0
          end
        end
      elsif cf.cfunc?
        if start > 0
          start -= 1
        else
          bt << cf.cfunc_backtrace_entry
          cfunc_counter += 1
        end
      end
    
      cf = cf.prev
    end
    
    if cfunc_counter > 0
      while cf != end_cf
        if cf.iseq? && cf.instruction_pointer?
          cfunc_counter.times do |i|
            bt[-1 - i].loc = cf.loc
          end
          break
        end
        cf = cf.prev
      end
    end
    ```
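
    The algorithm can also be exercised outside the VM with mock
    frames.  The following runnable sketch uses invented frame and
    entry shapes (plain hashes, not CRuby's real structs) to show
    the cfunc location backfill, including the trailing scan:

    ```ruby
    # Toy model: frames are ordered innermost first.  Each frame is
    #   { type: :iseq, pc: true, loc: "file:line" }  or  { type: :cfunc }
    def partial_backtrace(frames, start, length)
      bt = []
      cfunc_counter = 0
      length ||= frames.size
      i = 0

      while i < frames.size && bt.size != length
        f = frames[i]
        if f[:type] == :iseq
          if f[:pc]                  # iseq frames without a pc are skipped
            if start > 0
              start -= 1
            else
              bt << { kind: :iseq, loc: f[:loc] }
              # Backfill directly preceding cfunc entries with this location.
              cfunc_counter.times { |j| bt[-2 - j][:loc] = f[:loc] }
              cfunc_counter = 0
            end
          end
        else # :cfunc
          if start > 0
            start -= 1
          else
            bt << { kind: :cfunc, loc: nil }
            cfunc_counter += 1
          end
        end
        i += 1
      end

      # Trailing cfunc entries: scan outward for the next iseq frame
      # with a pc and borrow its location.
      if cfunc_counter > 0
        while i < frames.size
          f = frames[i]
          if f[:type] == :iseq && f[:pc]
            cfunc_counter.times { |j| bt[-1 - j][:loc] = f[:loc] }
            break
          end
          i += 1
        end
      end

      bt
    end

    frames = [
      { type: :iseq, pc: true, loc: "a.rb:1" },  # innermost
      { type: :cfunc },
      { type: :iseq, pc: true, loc: "a.rb:5" },  # outermost
    ]

    # start = 1 skips the innermost frame; the cfunc entry borrows "a.rb:5".
    bt = partial_backtrace(frames, 1, 2)
    # start = 1, length = 1 ends on the cfunc entry; the trailing scan
    # still finds "a.rb:5" for it.
    bt2 = partial_backtrace(frames, 1, 1)
    ```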
    
    With the following benchmark, which uses a call depth of
    around 100 (common in many Ruby applications):
    
    ```ruby
    require 'benchmark/ips'

    class T
      def test(depth, &block)
        if depth == 0
          yield self
        else
          test(depth - 1, &block)
        end
      end
      def array
        Array.new
      end
      def first
        caller_locations(1, 1)
      end
      def full
        caller_locations
      end
    end
    
    t = T.new
    t.test((ARGV.first || 100).to_i) do
      Benchmark.ips do |x|
        x.report('caller_loc(1, 1)') { t.first }
        x.report('caller_loc') { t.full }
        x.report('Array.new') { t.array }
        x.compare!
      end
    end
    ```
    
    Results before commit:
    
    ```
    Calculating -------------------------------------
        caller_loc(1, 1)    281.159k (± 0.7%) i/s -      1.426M in   5.073055s
              caller_loc     15.836k (± 2.1%) i/s -     79.450k in   5.019426s
               Array.new      1.852M (± 2.5%) i/s -      9.296M in   5.022511s
    
    Comparison:
               Array.new:  1852297.5 i/s
        caller_loc(1, 1):   281159.1 i/s - 6.59x  (± 0.00) slower
              caller_loc:    15835.9 i/s - 116.97x  (± 0.00) slower
    ```
    
    Results after commit:
    
    ```
    Calculating -------------------------------------
        caller_loc(1, 1)    562.286k (± 0.8%) i/s -      2.858M in   5.083249s
              caller_loc     16.402k (± 1.0%) i/s -     83.200k in   5.072963s
               Array.new      1.853M (± 0.1%) i/s -      9.278M in   5.007523s
    
    Comparison:
               Array.new:  1852776.5 i/s
        caller_loc(1, 1):   562285.6 i/s - 3.30x  (± 0.00) slower
              caller_loc:    16402.3 i/s - 112.96x  (± 0.00) slower
    ```
    
    This shows that the speed of caller_locations(1, 1) has roughly
    doubled, and the speed of caller_locations with no arguments
    has improved slightly.  So this new algorithm is significantly
    faster, much simpler, and fixes bugs in the previous algorithm.
    
    Fixes [Bug #18053]