When HTML is used to create interactive sites, care needs to be taken to avoid introducing vulnerabilities through which attackers can compromise the integrity of the site itself or of the site’s users.
A comprehensive study of this matter is beyond the scope of this document, and authors are strongly encouraged to study the matter in more detail. However, this section attempts to provide a quick introduction to some common pitfalls in HTML application development.
The security model of the Web is based on the concept of “origins”, and correspondingly many of the potential attacks on the Web involve cross-origin actions.
When accepting untrusted input, e.g. user-generated content such as text comments, values in URL parameters, messages from third-party sites, etc, it is imperative that the data be validated before use, and properly escaped when displayed. Failing to do this can allow a hostile user to perform a variety of attacks, ranging from the potentially benign, such as providing bogus user information like a negative age, to the serious, such as running scripts every time a user looks at a page that includes the information, potentially propagating the attack in the process, to the catastrophic, such as deleting all data in the server.
When writing filters to validate user input, it is imperative that filters always be whitelist-based, allowing known-safe constructs and disallowing all other input. Blacklist-based filters that disallow known-bad inputs and allow everything else are not secure, as not everything that is bad is yet known (for example, because it might be invented in the future).
If the attacker then convinced a victim user to visit this page, a script of the attacker’s choosing would run on the page. Such a script could do any number of hostile actions, limited only by what the site offers: if the site is an e-commerce shop, for instance, such a script could cause the user to unknowingly make arbitrarily many unwanted purchases.
This is called a cross-site scripting attack.
There are many constructs that can be used to try to trick a site into executing code. Here are some that authors are encouraged to consider when writing whitelist filters:
- When allowing harmless-seeming elements like img, it is important to whitelist any provided attributes as well. If one allowed all attributes then an attacker could, for instance, use the onload attribute to run arbitrary script.
- Allowing a base element to be inserted means any script elements in the page with relative links can be hijacked, and similarly that any form submissions can get redirected to a hostile site.