Inefficient Regular Expression Complexity

The product uses a regular expression with an inefficient, possibly exponential worst-case computational complexity that consumes excessive CPU cycles.


Description

Some regular expression engines have a feature called "backtracking". If the token cannot match, the engine "backtracks" to a position that may result in a different token that can match.

Backtracking becomes a weakness if all of these conditions are met:

The number of possible backtracking attempts are exponential relative to the length of the input.

The input can fail to match the regular expression.

The input can be long enough.

Attackers can create crafted inputs that intentionally cause the regular expression to use excessive backtracking in a way that causes the CPU consumption to spike.

Demonstrations

The following examples help to illustrate the nature of this weakness and describe methods or techniques which can be used to mitigate the risk.

Note that the examples here are by no means exhaustive and any given weakness may have many subtle varieties, each of which may require different detection methods or runtime controls.

Example One

This example attempts to check if an input string is a "sentence" [REF-1164].

var test_string = "Bad characters: $@#";
var bad_pattern  = /^(\w+\s?)*$/i;
var result = test_string.search(bad_pattern);

The regular expression has a vulnerable backtracking clause inside (\w+\s?)*$ which can be triggered to cause a Denial of Service by processing particular phrases.

To fix the backtracking problem, backtracking is removed with the ?= portion of the expression which changes it to a lookahead and the \2 which prevents the backtracking. The modified example is:

var test_string = "Bad characters: $@#";
var good_pattern  = /^((?=(\w+))\2\s?)*$/i;
var result = test_string.search(good_pattern);

Note that [REF-1164] has a more thorough (and lengthy) explanation of everything going on within the RegEx.

Example Two

This example attempts to check if an input string is a "sentence" and is modified for Perl [REF-1164].

my $test_string = "Bad characters: \$\@\#";
my $bdrslt = $test_string;
$bdrslt =~ /^(\w+\s?)*$/i;

The regular expression has a vulnerable backtracking clause inside (\w+\s?)*$ which can be triggered to cause a Denial of Service by processing particular phrases.

To fix the backtracking problem, backtracking is removed with the ?= portion of the expression which changes it to a lookahead and the \2 which prevents the backtracking. The modified example is:

my $test_string = "Bad characters: \$\@\#";
my $gdrslt = $test_string;
$gdrslt =~ /^((?=(\w+))\2\s?)*$/i;

Note that [REF-1164] has a more thorough (and lengthy) explanation of everything going on within the RegEx.

See Also

Comprehensive Categorization: Resource Lifecycle Management

Weaknesses in this category are related to resource lifecycle management.

Complexity Issues

Weaknesses in this category are associated with things being overly complex.

Comprehensive CWE Dictionary

This view (slice) covers all the elements in CWE.

Weaknesses Introduced During Implementation

This view (slice) lists weaknesses that can be introduced during implementation.

Weakness Base Elements

This view (slice) displays only weakness base elements.


Common Weakness Enumeration content on this website is copyright of The MITRE Corporation unless otherwise specified. Use of the Common Weakness Enumeration and the associated references on this website are subject to the Terms of Use as specified by The MITRE Corporation.