    Use String#grapheme_clusters and String#each_grapheme_cluster · 8d0ca1a1
    Ricardo Díaz authored
    Both methods were introduced in Ruby 2.5 and are faster than scanning
    for Unicode grapheme clusters with String#scan(/\X/).
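    
    As a rough illustration of the replacement, the sketch below splits a
    string with both approaches and with the block form. The sample string and
    the variable names are assumptions chosen for the example, not code taken
    from Rails.
    
    ```ruby
    # Hypothetical sample: "e" followed by U+0301 (combining acute accent), then "x".
    text = "e\u0301x"
    
    codepoints = text.chars              # splits on code points, not clusters
    via_scan   = text.scan(/\X/)         # regex-based split on grapheme clusters
    via_method = text.grapheme_clusters  # built-in equivalent since Ruby 2.5
    
    # Block form: yields each cluster in turn.
    lengths = []
    text.each_grapheme_cluster { |cluster| lengths << cluster.length }
    
    puts "code points: #{codepoints.length}, clusters: #{via_method.length}"
    ```
    
    Here the accented letter stays together as one cluster even though it is
    made of two code points, and both cluster-based approaches return the same
    array.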
    
    ```
    Warming up --------------------------------------
               scan(/X/)    43.127k i/100ms
       grapheme_clusters   103.348k i/100ms
    Calculating -------------------------------------
               scan(/X/)    427.853k (± 2.4%) i/s -      2.156M in   5.042967s
       grapheme_clusters      1.045M (± 0.8%) i/s -      5.271M in   5.042360s
    
    Comparison:
       grapheme_clusters:  1045353.5 i/s
               scan(/X/):   427852.8 i/s - 2.44x  (± 0.00) slower
    ```
    
    Benchmark script:
    
    ```ruby
    require "minitest/autorun"
    require "benchmark/ips"
    
    class BugTest < Minitest::Test
      def test_grapheme_clusters
        string = [0x0924, 0x094D, 0x0930].pack("U*") # "त्र"
        # string = [0x000D, 0x000A].pack("U*") # cr lf
        # string = "こにちわ"
    
        assert_equal string.scan(/\X/), string.grapheme_clusters
    
        Benchmark.ips do |x|
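          # "\X" inside a double-quoted string is just "X", so this label
          # prints as scan(/X/) in the benchmark output.
          x.report("scan(/\X/)") do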
            string.scan(/\X/)
          end
    
          x.report("grapheme_clusters") do
            string.grapheme_clusters
          end
    
          x.compare!
        end
      end
    end
    ```
    
    String#grapheme_clusters had a bug with CRLF, which was fixed in Ruby
    2.6: https://bugs.ruby-lang.org/issues/15337
    
    Now that Rails requires Ruby 2.7+, that bug shouldn't be an issue.
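    
    A quick way to check the CRLF case on a given Ruby is sketched below. It
    assumes Ruby 2.6 or later, where a CR LF pair should come back as a single
    cluster; the variable names are illustrative only.
    
    ```ruby
    # CR followed by LF, the same input as the commented-out benchmark string.
    crlf = "\r\n"
    
    clusters     = crlf.grapheme_clusters
    same_as_scan = clusters == crlf.scan(/\X/)
    
    puts "CRLF clusters: #{clusters.inspect}, matches scan: #{same_as_scan}"
    ```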