The World Wide Web is a network of information resources. The Web relies on three mechanisms intended to make these resources readily available to the widest possible audience:
URLs typically consist of three pieces:
Consider the URL that designates the current HTML specification:
http://www.w3.org/TR/WD-html4/cover.html
This URL may be read as follows: use the HTTP protocol to transfer the data residing on the machine www.w3.org in the file /TR/WD-html4/cover.html.
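As an illustration only, the reading above can be reproduced with a URL parser. The sketch below assumes Python's standard `urllib.parse` module, which is not part of this specification; the variable names are chosen for this example.

```python
from urllib.parse import urlsplit

# The URL discussed above, split into the pieces named in the text.
url = "http://www.w3.org/TR/WD-html4/cover.html"
parts = urlsplit(url)

protocol = parts.scheme   # the protocol used to transfer the data
machine = parts.netloc    # the machine on which the data resides
path = parts.path         # the file on that machine

print(protocol, machine, path)
```

The parser only separates the pieces; it does not contact the machine or check that the file exists.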
URLs in general are case-sensitive (with the exception of machine names). There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy. Users should always assume that URLs are case-sensitive.
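A small sketch of the distinction, assuming Python's standard `urllib.parse` module: its `hostname` attribute lowercases the machine name, while the path is left exactly as written. The two URLs are variants made up for this comparison.

```python
from urllib.parse import urlsplit

# Two hypothetical spellings that differ only in letter case.
a = urlsplit("http://WWW.W3.ORG/TR/WD-html4/cover.html")
b = urlsplit("http://www.w3.org/tr/wd-html4/cover.html")

# Machine names compare equal once case is ignored.
same_machine = a.hostname == b.hostname

# Paths are compared as written, so a change of case yields a different path.
same_path = a.path == b.path

print(same_machine, same_path)
```

Whether a given server actually treats the two paths as different resources cannot be determined from the URLs alone, which is why the text advises treating them as case-sensitive.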
The character set of URLs that appear in HTML is specified in [RFC1738].
The URL specification in effect at the time of writing ([RFC1738]) offers a mechanism to refer to a resource, but not to a location within a resource. The Web community has adopted a convention called "fragment URLs" to refer to anchors within an HTML document. A fragment URL ends with "#" followed by an anchor identifier. For instance, here is a fragment URL pointing to an anchor named section_2:
http://somesite.com/html/top.html#section_2
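To make the convention concrete, the following sketch separates the fragment URL above into the resource and the anchor identifier. It assumes Python's standard `urllib.parse.urldefrag`; the variable names are illustrative.

```python
from urllib.parse import urldefrag

# Split a fragment URL at the "#" into the resource and the anchor name.
resource, anchor = urldefrag("http://somesite.com/html/top.html#section_2")

print(resource)
print(anchor)
```

Here `resource` identifies the document to retrieve and `anchor` names the location within it.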
A relative URL (defined in [RFC1808]) doesn't contain any protocol or machine information, and its path generally refers to an HTML document on the same machine as the current document. Relative URLs may contain relative path components (".." means the parent location) and may be fragment URLs.
Relative URLs may be resolved to full URLs, for example when the user attempts to follow a link from one document to another. [RFC1808] defines the normative algorithm for resolving relative URLs. The following description is for convenience only.
Briefly, a full URL is derived from a relative URL by attaching a "base" part to the relative URL. The base part is a URL that may come from any or all of the following sources:
[RFC1808] specifies the precedence among multiple sources of base information. For the purposes of this explanation, the last piece of base information takes precedence over the others, and HTTP headers are considered to occur earlier than the document HEAD.
If no explicit base information accompanies the document, the base URL is that which designates the location of the current document.
Given a base URL and a relative URL (that does not begin with a slash), a full URL is derived as follows:
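As a non-normative sketch, resolution can be tried out with Python's standard `urllib.parse.urljoin`. Note the assumptions: `urljoin` follows a later revision of the URL resolution rules than [RFC1808], so the two may differ in corner cases, and the base URL and relative URLs below are hypothetical examples.

```python
from urllib.parse import urljoin

# Hypothetical base URL: the location of the current document.
base = "http://somesite.com/html/top.html"

# A relative URL naming a document in the same directory as the base.
sibling = urljoin(base, "intro.html")

# ".." refers to the parent location of the base document's directory.
parent = urljoin(base, "../images/logo.gif")

# A relative URL that is only a fragment refers to the base document itself.
fragment = urljoin(base, "#section_2")

print(sibling)
print(parent)
print(fragment)
```

In each case the protocol and machine are taken from the base URL, and only the path or fragment comes from the relative URL.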
In HTML, URLs play a role in these situations:
In each case, authors may use a full URL, a fragment URL, or a relative URL. Please consult the section on anchors for more information about links and URLs.
In addition to HTTP URLs, authors might want to include MAILTO URLs (see [RFC1738]) in their documents. MAILTO URLs cause email to be sent to some email address. For instance, the author might create a link that, when activated, causes the user agent to open a mail program with the destination address in the "To:" field.
MAILTO URLs have the following syntax:
mailto:email-address
User agents may support MAILTO URL extensions that are not yet Internet standards (e.g., appending subject information to a URL with the syntax "?Subject=my%20subject" where any space characters are replaced by "%20").
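The sketch below builds a MAILTO URL with the optional subject extension described above, replacing spaces with "%20". It assumes Python's standard `urllib.parse.quote`; the helper `mailto_url` and the address `user@example.com` are hypothetical and exist only for this example.

```python
from urllib.parse import quote

def mailto_url(address, subject=None):
    """Build a MAILTO URL, optionally appending the non-standard Subject extension."""
    url = "mailto:" + address
    if subject is not None:
        # Percent-encode the subject so that each space becomes "%20".
        url += "?Subject=" + quote(subject)
    return url

plain = mailto_url("user@example.com")
with_subject = mailto_url("user@example.com", "my subject")

print(plain)
print(with_subject)
```

Because the subject extension is not an Internet standard, a user agent may ignore it; the plain form is the only one the syntax above guarantees.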