Question: how to deal with Regexp::Timeout in _decode_uri_component? #233

@rafbgarcia

Hi folks, thanks for maintaining the URI gem!

I ran into the following error while processing a 65MB MIME body payload that contains over 13 million percent-encoded characters:

Regexp::TimeoutError POST /rails/action_mailbox/mailgun/inbound_emails/mime

vendor/bundle/ruby/3.3.0/gems/uri-0.13.3/lib/uri/common.rb:400:in `match?': regexp match timeout (Regexp::TimeoutError)
    from vendor/bundle/ruby/3.3.0/gems/uri-0.13.3/lib/uri/common.rb:400:in `_decode_uri_component'

Ref:

My workaround was to monkey-patch decode_www_form_component so that, when the Regexp code path times out, it falls back to a linear, Regexp-free decoder:

module URIFormComponentLinearDecode
  ORIGINAL_DECODE_WWW_FORM_COMPONENT = URI.method(:decode_www_form_component)

  DECODE_TABLE = URI.const_get(:TBLDECWWWCOMP_)

  def decode_www_form_component(str, enc = Encoding::UTF_8)
    ORIGINAL_DECODE_WWW_FORM_COMPONENT.call(str, enc)
  rescue Regexp::TimeoutError
    raise unless str.is_a?(String)

    Rails.logger.info("[URIFormComponentLinearDecode] bytesize=#{str.bytesize}")

    linear_decode_www_form_component(str, enc)
  end

  private

  def linear_decode_www_form_component(str, enc)
    source = str.b
    output = String.new(capacity: source.bytesize).b
    index = 0

    while index < source.bytesize
      byte = source.getbyte(index)

      case byte
      when 37 # "%"
        raise ArgumentError, "invalid %-encoding (#{str})" unless index + 2 < source.bytesize

        encoded = source.byteslice(index, 3)
        decoded = DECODE_TABLE[encoded]

        raise ArgumentError, "invalid %-encoding (#{str})" unless decoded

        output << decoded
        index += 3
      when 43 # "+"
        output << DECODE_TABLE["+"]
        index += 1
      else
        output << byte
        index += 1
      end
    end

    output.force_encoding(enc)
  end
end

URI.singleton_class.prepend(URIFormComponentLinearDecode)
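
For anyone who wants to check the single-pass idea in isolation, here is a minimal sketch that does not touch URI internals. The names linear_form_decode and HEX_VALUES are hypothetical, made up for this example, and the sketch only assumes that the result should match URI.decode_www_form_component on valid input and raise ArgumentError on malformed escapes.

```ruby
require "uri"

# Hypothetical lookup table (not part of the URI gem): ASCII byte of a hex
# digit => its numeric value, for both upper- and lower-case digits.
HEX_VALUES = {}
"0123456789abcdefABCDEF".each_byte { |b| HEX_VALUES[b] = b.chr.hex }
HEX_VALUES.freeze

# Hypothetical helper: decodes a www-form component in one pass over the
# bytes, without using any Regexp.
def linear_form_decode(str, enc = Encoding::UTF_8)
  source = str.b
  size = source.bytesize
  output = String.new(capacity: size, encoding: Encoding::BINARY)
  index = 0

  while index < size
    byte = source.getbyte(index)

    case byte
    when 37 # "%"
      # getbyte returns nil past the end, so a truncated escape is rejected too.
      hi = HEX_VALUES[source.getbyte(index + 1)]
      lo = HEX_VALUES[source.getbyte(index + 2)]
      raise ArgumentError, "invalid %-encoding" unless hi && lo

      output << ((hi << 4) | lo)
      index += 3
    when 43 # "+"
      output << 32 # space
      index += 1
    else
      output << byte
      index += 1
    end
  end

  output.force_encoding(enc)
end

# Compare against the stock implementation on small inputs.
["a%20b+c", "%E3%81%82", "plain", "%2b%2B", ""].each do |sample|
  expected = URI.decode_www_form_component(sample)
  actual = linear_form_decode(sample)
  puts "#{sample.inspect}: #{actual == expected ? 'match' : 'MISMATCH'}"
end
```

Unlike the patch above, this version does not read TBLDECWWWCOMP_ and keeps the input out of the error message, which seemed safer for very large payloads.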

I was wondering:

  • Have you run into this problem before?
  • Do you have a better approach to it?
  • Do you think a solution to this issue belongs in the URI codebase?
  • Do you think it would make sense to use a native function in this case?

I'm happy to contribute a PR if you would like me to. Please let me know if you have any thoughts.

Thanks.
