REGEX Multiline Anchors

in #ruby, 7 years ago


Ever try out a regular expression on Rubular, stick it in your model:

MEH_REGEX = /^(?=.*[a-zA-Z])(?=.*[0-9]).{8,}$/

...start up your test suite and see this error?

The provided regular expression is using multiline anchors (^ or $), which may present a security risk. Did you mean to use \A and \z, or forgot to add the :multiline => true option?

Quickie

The correct approach to solving this problem is to simply adjust your regex to use \A and \z rather than ^ and $.

VALID_REGEX = /\A(?=.*[a-zA-Z])(?=.*[0-9]).{8,}\z/

A deeper look

It turns out your application may be open to some sketchy security holes, so it is important to grasp what is happening. Ruby (at least as of version 2.5) handles ^ and $ differently than many other languages do: they always anchor to lines, never to the string as a whole.

^ and $ are Start of Line and End of Line anchors. In Ruby, they match at the start and end of every line inside the string, not only at the start and end of the whole string. This means a user could append a newline followed by malicious JavaScript, and the input would still pass the regex as long as one line satisfies it. This may seem a little abstract at first, so let's take a look at a quick example to make it more concrete. Open up irb:

> MEH_REGEX = /^(?=.*[a-zA-Z])(?=.*[0-9]).{8,}$/
> "nose".match MEH_REGEX
=> nil
> "goodnose2".match MEH_REGEX
=> #<MatchData "goodnose2">

One way to use a regex is to apply it to a string and see if there's a match. If there is a match, you'll get a MatchData object back; if there isn't, you'll get nil. As you can see, nose fails and goodnose2 passes as we'd expect. Now let's try this:

> "goodnose2\nalert('exploit')".match MEH_REGEX
=> #<MatchData "goodnose2">

Uh oh, that worked too. You see, to Ruby, our expectations were met. It doesn't care that there is something else after the newline that happens to be malicious code. The $ anchor matched just before \n, so the first line satisfied the whole pattern and everything after it was never checked. You can see how quickly one could get into trouble.
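To see exactly which part of the input the regex looked at, here is a minimal sketch you can run as a script. It uses the same pattern as MEH_REGEX, stored in a local variable; the names meh_regex, matched_text and remainder are just illustrative.

```ruby
# Same pattern as MEH_REGEX above, kept in a local variable for this sketch.
meh_regex = /^(?=.*[a-zA-Z])(?=.*[0-9]).{8,}$/

input = "goodnose2\nalert('exploit')"
match = input.match(meh_regex)

# The text the regex actually matched: only the first line.
matched_text = match[0]

# Everything after the match, which was never validated.
remainder = match.post_match

puts "matched:   #{matched_text.inspect}"
puts "unchecked: #{remainder.inspect}"
```

The match covers only goodnose2, while the newline and the alert(...) payload sit in post_match, untouched by the pattern. A validation that only asks "did it match?" lets the whole string through.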

As mentioned above, the correct solution is to use \A and \z.

> VALID_REGEX = /\A(?=.*[a-zA-Z])(?=.*[0-9]).{8,}\z/
> "goodnose2\nalert('exploit')".match VALID_REGEX
=> nil
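For a side-by-side view, the following sketch runs both patterns against the three inputs used in this post. It assumes Ruby 2.4 or newer for String#match?, and the variable names (meh_regex, valid_regex, results) are made up for the example.

```ruby
# Line anchors (^ and $) versus string anchors (\A and \z), same pattern body.
meh_regex   = /^(?=.*[a-zA-Z])(?=.*[0-9]).{8,}$/
valid_regex = /\A(?=.*[a-zA-Z])(?=.*[0-9]).{8,}\z/

inputs = ["nose", "goodnose2", "goodnose2\nalert('exploit')"]

# match? returns true or false without building a MatchData object.
results = inputs.map do |input|
  {
    input: input,
    line_anchors: input.match?(meh_regex),
    string_anchors: input.match?(valid_regex)
  }
end

results.each { |result| puts result.inspect }
```

Only the last input differs between the two: it passes with ^ and $ but is rejected with \A and \z, because . does not match a newline by default and \z only matches at the very end of the string.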

Although this is a pretty serious security issue, all you really need to keep in mind is that ^ and $ match the line beginning and line end in Ruby, and not the beginning and end of a string. With your Ruby regex patched up, you'll be good to go.

(•ω•)

