Improper Input Validation
The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.
Input validation is a frequently-used technique for checking potentially dangerous inputs in order to ensure that the inputs are safe for processing within the code, or when communicating with other components. When software does not validate input properly, an attacker is able to craft the input in a form that is not expected by the rest of the application. This will lead to parts of the system receiving unintended input, which may result in altered control flow, arbitrary control of a resource, or arbitrary code execution.
Input validation is not the only technique for processing input, however. Other techniques attempt to transform potentially-dangerous input into something safe, such as filtering (CWE-790) - which attempts to remove dangerous inputs - or encoding/escaping (CWE-116), which attempts to ensure that the input is not misinterpreted when it is included in output to another component. Other techniques exist as well (see CWE-138 for more examples.)
Input validation can be applied to:
raw data - strings, numbers, parameters, file contents, etc.
metadata - information about the raw data, such as headers or size
Data can be simple or structured. Structured data can be composed of many nested layers, composed of combinations of metadata and raw data, with other simple or structured data.
Many properties of raw data or metadata may need to be validated upon entry into the code, such as:
specified quantities such as size, length, frequency, price, rate, number of operations, time, etc.
implied or derived quantities, such as the actual size of a file instead of a specified size
indexes, offsets, or positions into more complex data structures
symbolic keys or other elements into hash tables, associative arrays, etc.
well-formedness, i.e. syntactic correctness - compliance with expected syntax
lexical token correctness - compliance with rules for what is treated as a token
specified or derived type - the actual type of the input (or what the input appears to be)
consistency - between individual data elements, between raw data and metadata, between references, etc.
conformance to domain-specific rules, e.g. business logic
equivalence - ensuring that equivalent inputs are treated the same
authenticity, ownership, or other attestations about the input, e.g. a cryptographic signature to prove the source of the data
Implied or derived properties of data must often be calculated or inferred by the code itself. Errors in deriving properties may be considered a contributing factor to improper input validation.
Note that "input validation" has very different meanings to different people, or within different classification schemes. Caution must be used when referencing this CWE entry or mapping to it. For example, some weaknesses might involve inadvertently giving control to an attacker over an input when they should not be able to provide an input at all, but sometimes this is referred to as input validation.
Finally, it is important to emphasize that the distinctions between input validation and output escaping are often blurred, and developers must be careful to understand the difference, including how input validation is not always sufficient to prevent vulnerabilities, especially when less stringent data types must be supported, such as free-form text. Consider a SQL injection scenario in which a person's last name is inserted into a query. The name "O'Reilly" would likely pass the validation step since it is a common last name in the English language. However, this valid name cannot be directly inserted into the database because it contains the "'" apostrophe character, which would need to be escaped or otherwise transformed. In this case, removing the apostrophe might reduce the risk of SQL injection, but it would produce incorrect behavior because the wrong name would be recorded.
The following examples help to illustrate the nature of this weakness and describe methods or techniques which can be used to mitigate the risk.
Note that the examples here are by no means exhaustive and any given weakness may have many subtle varieties, each of which may require different detection methods or runtime controls.
This example demonstrates a shopping interaction in which the user is free to specify the quantity of items to be purchased and a total is calculated.
The user has no control over the price variable, however the code does not prevent a negative value from being specified for quantity. If an attacker were to provide a negative value, then the user would have their account credited instead of debited.
This example asks the user for a height and width of an m X n game board with a maximum dimension of 100 squares.
While this code checks to make sure the user cannot specify large, positive integers and consume too much memory, it does not check for negative values supplied by the user. As a result, an attacker can perform a resource consumption (CWE-400) attack against this program by specifying two, large negative values that will not overflow, resulting in a very large memory allocation (CWE-789) and possibly a system crash. Alternatively, an attacker can provide very large negative values which will cause an integer overflow (CWE-190) and unexpected behavior will follow depending on how the values are treated in the remainder of the program.
The following example shows a PHP application in which the programmer attempts to display a user's birthday and homepage.
The programmer intended for $birthday to be in a date format and $homepage to be a valid URL. However, since the values are derived from an HTTP request, if an attacker can trick a victim into clicking a crafted URL with <script> tags providing the values for birthday and / or homepage, then the script will run on the client's browser when the web server echoes the content. Notice that even if the programmer were to defend the $birthday variable by restricting input to integers and dashes, it would still be possible for an attacker to provide a string of the form:
If this data were used in a SQL statement, it would treat the remainder of the statement as a comment. The comment could disable other security-related logic in the statement. In this case, encoding combined with input validation would be a more useful protection mechanism.
Furthermore, an XSS (CWE-79) attack or SQL injection (CWE-89) are just a few of the potential consequences when input validation is not used. Depending on the context of the code, CRLF Injection (CWE-93), Argument Injection (CWE-88), or Command Injection (CWE-77) may also be possible.
The following example takes a user-supplied value to allocate an array of objects and then operates on the array.
This example attempts to build a list from a user-specified value, and even checks to ensure a non-negative value is supplied. If, however, a 0 value is provided, the code will build an array of size 0 and then try to store a new Widget in the first location, causing an exception to be thrown.
This Android application has registered to handle a URL when sent an intent:
The application assumes the URL will always be included in the intent. When the URL is not present, the call to getStringExtra() will return null, thus causing a null pointer exception when length() is called.
Weaknesses in this category are related to improper input validation.
Weaknesses in this category are related to the "Emerging Energy Technologies" category from the SEI ETF "Categories of Security Vulnerabilities in ICS" as published in...
Weaknesses in this category are related to the A03 category "Injection" in the OWASP Top Ten 2021.
This view (slice) covers all the elements in CWE.
CWE entries in this view are listed in the 2023 CWE Top 25 Most Dangerous Software Weaknesses.
CWE entries in this view are listed in the 2022 CWE Top 25 Most Dangerous Software Weaknesses.