yarp/unescape.h · e50fcca9a79d8e25b33ad3611df6bf4627faafbf · ruby / ruby-deleted-200

Sep 01, 2023

[ruby/yarp] fix: double-counting of errors in parsing escaped strings · 512f8217

Mike Dalessio authored Sep 01, 2023

Essentially, this change updates `yp_unescape_calculate_difference` to
not create syntax errors, and we rely entirely on
`yp_unescape_manipulate_string` to report syntax errors.

To do that, this PR adds another (!) parameter to `unescape`:
`yp_list_t *error_list`. When present, `unescape` reports syntax
errors (and otherwise does not).

However, an edge case that needed to be addressed is reporting syntax
errors in this case:

    ?\u{1234 2345}

In a string context, it's possible to have multiple codepoints by
doing something like `"\u{1234 2345}"`; however, in the character
literal context, this is a syntax error -- only a single codepoint is
allowed.

Unfortunately, when `yp_unescape_manipulate_string` is called, there's
nothing to indicate that we are in a "character literal" context and
that only a single codepoint is valid.

To make this work, this PR:

- introduces a new static utility function in yarp.c,
  `yp_char_literal_node_create_and_unescape`, which is called when
  we're parsing `YP_TOKEN_CHARACTER_LITERAL`
- introduces a new (unexported) function,
  `yp_unescape_manipulate_char_literal` which does the same thing as
  `yp_unescape_manipulate_string` but tells `unescape` that only a
  single codepoint is expected

https://github.com/ruby/yarp/commit/f6a65840b5

512f8217

[ruby/yarp] fix: double-counting of errors in parsing escaped strings

Mike Dalessio authored Sep 01, 2023

Essentially, this change updates `yp_unescape_calculate_difference` to
not create syntax errors, and we rely entirely on
`yp_unescape_manipulate_string` to report syntax errors.

To do that, this PR adds another (!) parameter to `unescape`:
`yp_list_t *error_list`. When present, `unescape` reports syntax
errors (and otherwise does not).

However, an edge case that needed to be addressed is reporting syntax
errors in this case:

    ?\u{1234 2345}

In a string context, it's possible to have multiple codepoints by
doing something like `"\u{1234 2345}"`; however, in the character
literal context, this is a syntax error -- only a single codepoint is
allowed.

Unfortunately, when `yp_unescape_manipulate_string` is called, there's
nothing to indicate that we are in a "character literal" context and
that only a single codepoint is valid.

To make this work, this PR:

- introduces a new static utility function in yarp.c,
  `yp_char_literal_node_create_and_unescape`, which is called when
  we're parsing `YP_TOKEN_CHARACTER_LITERAL`
- introduces a new (unexported) function,
  `yp_unescape_manipulate_char_literal` which does the same thing as
  `yp_unescape_manipulate_string` but tells `unescape` that only a
  single codepoint is expected

https://github.com/ruby/yarp/commit/f6a65840b5