    512f8217
    [ruby/yarp] fix: double-counting of errors in parsing escaped strings
    Mike Dalessio authored
    This change stops `yp_unescape_calculate_difference` from creating
    syntax errors; reporting them is now left entirely to
    `yp_unescape_manipulate_string`, so each error is reported once
    instead of twice.
    
    To do that, this PR adds another (!) parameter to `unescape`:
    `yp_list_t *error_list`. When it is non-NULL, `unescape` reports
    syntax errors into it; when it is NULL, no errors are reported.
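    The "report only when a list is passed" idea can be sketched as
    follows. This is a toy model, not yarp's code: `error_list_t`,
    `unescape`, `calculate_difference` and `manipulate_string` here are
    hypothetical stand-ins (the real `unescape` takes more parameters and
    handles every Ruby escape). It only shows why passing NULL from the
    measuring pass stops the same error from being counted twice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for yp_list_t: counts errors and keeps the last one. */
typedef struct {
    size_t count;
    const char *last_message;
} error_list_t;

static void error_list_append(error_list_t *list, const char *message) {
    list->count++;
    list->last_message = message;
}

/* Hypothetical unescape: understands only \n and \\ and returns the length
 * of the unescaped result. An unknown escape is kept verbatim; it is
 * reported only when error_list is non-NULL. */
static size_t unescape(const char *src, size_t len, error_list_t *error_list) {
    assert(src != NULL);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\\') {
            if (i + 1 < len && (src[i + 1] == 'n' || src[i + 1] == '\\')) {
                i++; /* two source bytes become one output byte */
            } else if (error_list != NULL) {
                error_list_append(error_list, "invalid escape sequence");
            }
        }
        out++;
    }
    return out;
}

/* Measuring pass (cf. yp_unescape_calculate_difference): never reports. */
static size_t calculate_difference(const char *src, size_t len) {
    return len - unescape(src, len, NULL);
}

/* Rewriting pass (cf. yp_unescape_manipulate_string): the only reporter. */
static size_t manipulate_string(const char *src, size_t len, error_list_t *errors) {
    return unescape(src, len, errors);
}
```

    Running both passes over the bytes `a\qb\n` touches the error list
    only in the second pass, so the invalid `\q` is recorded exactly once.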
    
    However, one edge case needed special handling: reporting the syntax
    error in this character literal:
    
        ?\u{1234 2345}
    
    In a string, a single `\u{...}` escape may contain several codepoints,
    as in `"\u{1234 2345}"`; in a character literal, however, this is a
    syntax error, because only a single codepoint is allowed.
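    The context-dependent rule can be illustrated with a small
    codepoint counter. `count_codepoints` and `unicode_escape_is_valid`
    are hypothetical helpers, not yarp functions; they take the bytes
    between `\u{` and `}` and deliberately ignore other checks Ruby
    performs (maximum digit count, codepoint range).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Count whitespace-separated hex codepoints in the body of a \u{...}
 * escape. Returns -1 if a character is neither a hex digit nor a space/tab. */
static int count_codepoints(const char *body, size_t len) {
    assert(body != NULL);
    int count = 0;
    bool in_codepoint = false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) body[i];
        if (isxdigit(c)) {
            if (!in_codepoint) count++; /* start of a new codepoint */
            in_codepoint = true;
        } else if (c == ' ' || c == '\t') {
            in_codepoint = false;
        } else {
            return -1;
        }
    }
    return count;
}

/* A string accepts one or more codepoints; a character literal such as
 * ?\u{...} accepts exactly one. */
static bool unicode_escape_is_valid(const char *body, size_t len, bool char_literal) {
    int n = count_codepoints(body, len);
    if (n < 1) return false;
    return !char_literal || n == 1;
}
```

    With the body `1234 2345`, the escape is accepted in string mode but
    rejected in character-literal mode, mirroring `"\u{1234 2345}"`
    versus `?\u{1234 2345}`.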
    
    Unfortunately, when `yp_unescape_manipulate_string` is called, nothing
    tells it that it is in a "character literal" context, where only a
    single codepoint is valid.
    
    To make this work, this PR:
    
    - introduces a new static utility function in yarp.c,
      `yp_char_literal_node_create_and_unescape`, which is called when
      we're parsing `YP_TOKEN_CHARACTER_LITERAL`
    - introduces a new (unexported) function,
      `yp_unescape_manipulate_char_literal`, which does the same thing as
      `yp_unescape_manipulate_string` but tells `unescape` that only a
      single codepoint is expected
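    The shape of this change (two entry points sharing one routine that
    takes a "single codepoint" flag) might look roughly like the sketch
    below. Everything here is hypothetical: `errors_t`, `unescape`,
    `manipulate_string` and `manipulate_char_literal` are simplified
    stand-ins, and `unescape` only counts the codepoints of the first
    `\u{...}` escape it finds rather than producing unescaped bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error sink standing in for yp_list_t. */
typedef struct { size_t count; } errors_t;

/* Hypothetical core routine: counts space-separated codepoints inside the
 * braces of the first "\u{...}" escape and reports an error if the caller
 * said only one codepoint is allowed. */
static size_t unescape(const char *src, bool single_codepoint, errors_t *errors) {
    assert(src != NULL);
    const char *open = strstr(src, "\\u{");
    const char *close = open ? strchr(open, '}') : NULL;
    if (open == NULL || close == NULL) return 0;

    size_t codepoints = 0;
    bool in_token = false;
    for (const char *p = open + 3; p < close; p++) {
        bool space = (*p == ' ');
        if (!space && !in_token) codepoints++; /* a new codepoint starts */
        in_token = !space;
    }
    if (single_codepoint && codepoints > 1 && errors != NULL) {
        errors->count++; /* multiple codepoints in a character literal */
    }
    return codepoints;
}

/* Entry point for ordinary strings: several codepoints are fine. */
static size_t manipulate_string(const char *src, errors_t *errors) {
    return unescape(src, false, errors);
}

/* Entry point for ?-character literals: exactly one codepoint expected. */
static size_t manipulate_char_literal(const char *src, errors_t *errors) {
    return unescape(src, true, errors);
}
```

    Keeping the flag inside a shared routine means the string path is
    unchanged, and only the character-literal caller opts into the
    stricter check.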
    
    https://github.com/ruby/yarp/commit/f6a65840b5