    e73f35dd
    [ruby/strscan] [CRuby] Optimize `strscan_do_scan()`: Remove unnecessary use of `rb_enc_get()`
    (https://github.com/ruby/strscan/pull/108)
    NAITOH Jun authored
    
    - before: #106
    
    ## Why?
    
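    `rb_strseq_index()` in string.c obtains the encoding once from `rb_enc_check()` and passes that same value through `strseq_core()` to `rb_memsearch()`. It makes no separate `rb_enc_get()` call. `strscan_do_scan()` can likewise reuse the encoding returned by `rb_enc_check()` for its String-pattern search.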
    
    - https://github.com/ruby/ruby/blob/6c7209cd3788ceec01e504d99057f9d3b396be84/string.c#L4335-L4368
    > enc = rb_enc_check(str, sub);

    > return strseq_core(str_ptr, str_ptr_end, str_len, sub_ptr, sub_len, offset, enc);
    
    - https://github.com/ruby/ruby/blob/6c7209cd3788ceec01e504d99057f9d3b396be84/string.c#L4309-L4318
    ```C
    strseq_core(const char *str_ptr, const char *str_ptr_end, long str_len,
                const char *sub_ptr, long sub_len, long offset, rb_encoding *enc)
    {
        const char *search_start = str_ptr;
        long pos, search_len = str_len - offset;
    
        for (;;) {
            const char *t;
            pos = rb_memsearch(sub_ptr, sub_len, search_start, search_len, enc);
    ```
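
    The following is a hypothetical standalone C model of the change. It is not strscan's or Ruby's code. It assumes that the String-pattern path of `strscan_do_scan()` already calls an `rb_enc_check()`-like function on the scanned string and the pattern before searching. The names `encoding`, `str_t`, `enc_get`, `enc_check`, `memsearch`, `search_before`, and `search_after` are invented. They only mirror `rb_enc_get()`, `rb_enc_check()`, and `rb_memsearch()`:

    ```C
    #include <assert.h>
    #include <string.h>

    /* Hypothetical model types; these are NOT Ruby's C API. */
    typedef struct { const char *name; } encoding;
    typedef struct { const char *ptr; long len; const encoding *enc; } str_t;

    static int enc_lookups = 0; /* counts every encoding lookup performed */

    /* Models rb_enc_get(): fetch one string's encoding. */
    static const encoding *enc_get(const str_t *s) {
        enc_lookups++;
        return s->enc;
    }

    /* Models rb_enc_check(): the encoding both strings share, or NULL when
     * they are incompatible (the real function raises an exception instead). */
    static const encoding *enc_check(const str_t *a, const str_t *b) {
        enc_lookups++;
        return a->enc == b->enc ? a->enc : NULL;
    }

    /* Models rb_memsearch(): a naive byte search; a real one would use enc. */
    static long memsearch(const char *x, long m, const char *y, long n,
                          const encoding *enc) {
        assert(enc != NULL);
        for (long i = 0; i + m <= n; i++)
            if (memcmp(y + i, x, (size_t)m) == 0) return i;
        return -1;
    }

    /* Before (assumed shape): check compatibility, then look the encoding up again. */
    static long search_before(const str_t *str, const str_t *pat) {
        if (enc_check(str, pat) == NULL) return -1;
        return memsearch(pat->ptr, pat->len, str->ptr, str->len, enc_get(pat));
    }

    /* After: reuse the value returned by the compatibility check, the way
     * rb_strseq_index() reuses the result of rb_enc_check(). */
    static long search_after(const str_t *str, const str_t *pat) {
        const encoding *enc = enc_check(str, pat);
        if (enc == NULL) return -1;
        return memsearch(pat->ptr, pat->len, str->ptr, str->len, enc);
    }
    ```

    Both functions return the same index. `search_after()` simply performs one encoding lookup per search instead of two.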
    
    ## Benchmark
    
    The results show that String as a pattern (`string_var`) is 1.24x faster than Regexp as a pattern (`regexp_var`), and 1.31x faster than `regexp`.
    
    ```
    $ benchmark-driver benchmark/check_until.yaml
    Warming up --------------------------------------
                  regexp     9.225M i/s -      9.328M times in 1.011068s (108.40ns/i)
              regexp_var     9.327M i/s -      9.413M times in 1.009214s (107.21ns/i)
                  string     9.200M i/s -      9.355M times in 1.016840s (108.70ns/i)
              string_var    11.249M i/s -     11.255M times in 1.000578s (88.90ns/i)
    Calculating -------------------------------------
                  regexp     9.565M i/s -     27.676M times in 2.893476s (104.55ns/i)
              regexp_var    10.111M i/s -     27.982M times in 2.767496s (98.90ns/i)
                  string    10.060M i/s -     27.600M times in 2.743465s (99.40ns/i)
              string_var    12.519M i/s -     33.746M times in 2.695615s (79.88ns/i)
    
    Comparison:
              string_var:  12518707.2 i/s
              regexp_var:  10111089.6 i/s - 1.24x  slower
                  string:  10060144.4 i/s - 1.24x  slower
                  regexp:   9565124.4 i/s - 1.31x  slower
    ```
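
    The derived columns can be recomputed from the i/s values above. The following is a minimal C sketch. It assumes that benchmark-driver reports "Nx slower" as the fastest entry's i/s divided by each entry's i/s, and ns/i as 10^9 divided by i/s. Both assumptions reproduce the printed figures. `slowdown`, `ns_per_iter`, and `print_comparison` are hypothetical helpers:

    ```C
    #include <assert.h>
    #include <stdio.h>

    /* Iterations per second copied from the "Comparison:" section. */
    static const char *const names[] = { "string_var", "regexp_var", "string", "regexp" };
    static const double ips[] = { 12518707.2, 10111089.6, 10060144.4, 9565124.4 };

    /* Assumed meaning of "Nx slower": fastest i/s divided by this entry's i/s. */
    static double slowdown(double fastest_ips, double entry_ips) {
        assert(entry_ips > 0.0);
        return fastest_ips / entry_ips;
    }

    /* Nanoseconds per iteration, the figure shown in parentheses per row. */
    static double ns_per_iter(double entry_ips) {
        assert(entry_ips > 0.0);
        return 1e9 / entry_ips;
    }

    /* Reprint the comparison, recomputing both derived columns. */
    void print_comparison(void) {
        for (size_t i = 0; i < sizeof ips / sizeof ips[0]; i++)
            printf("%12s: %12.1f i/s - %.2fx slower (%.2fns/i)\n",
                   names[i], ips[i], slowdown(ips[0], ips[i]), ns_per_iter(ips[i]));
    }
    ```

    For example, 12518707.2 / 10111089.6 is about 1.238. This rounds to the 1.24x reported for `regexp_var`.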
    
    https://github.com/ruby/strscan/commit/ff2d7afa19