Tuesday, May 27, 2014

HTTP The Definitive Guide (Digest Authentication)

Digest Authentication
The Improvements of Digest Authentication
In particular, digest authentication:

  • Never sends secret passwords across the network in the clear
  • Prevents unscrupulous individuals from capturing and replaying authentication handshakes
  • Optionally can guard against tampering with message contents 
  • Guards against several other common forms of attacks
Using Digests to Keep Passwords Secret
The motto of digest authentication is “never send the password across the network.”

One-Way Digests
The 128 bits of MD5 output often are written as 32 hexadecimal characters, each character representing 4 bits.
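As a quick illustration of this, Python's standard hashlib can compute the MD5 digest of any input; the result is always 128 bits, conventionally printed as 32 hexadecimal characters:

```python
import hashlib

def md5_hex(data: str) -> str:
    """Return the MD5 digest of `data` as 32 hexadecimal characters."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()

digest = md5_hex("Ow!")   # "Ow!" is the example password used in the book
print(digest)
print(len(digest))        # always 32 hex characters, regardless of input length
```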
Using Nonces to Prevent Replays
To prevent such replay attacks, the server can pass along to the client a special token called a nonce, which changes frequently (perhaps every millisecond, or for every authentication). The client appends this nonce token to the password before computing the digest.

Digest authentication requires the use of nonces, because a trivial replay weakness would make un-nonced digest authentication effectively as weak as basic authentication. Nonces are passed from server to client in the WWW-Authenticate challenge.

The Digest Authentication Handshake
The HTTP digest authentication protocol is an enhanced version of authentication that uses headers similar to those used in basic authentication. Some new options are added to the traditional headers, and one new optional header, Authentication-Info, is added.


Digest Calculations
Digest Algorithm Input Data
Digests are computed from three components:

  • A pair of functions consisting of a one-way hash function H(d) and digest KD(s,d), where s stands for secret and d stands for data
  • A chunk of data containing security information, including the secret password, called A1
  • A chunk of data containing nonsecret attributes of the request message, called A2
The Algorithms H(d) and KD(s,d)

The two algorithms suggested in RFC 2617 are MD5 and MD5-sess (where “sess” stands for session), and the algorithm defaults to MD5 if no other algorithm is specified.
If either MD5 or MD5-sess is used, the H function computes the MD5 of the data, and the KD digest function computes the MD5 of the colon-joined secret and nonsecret data. In other words:
    H(<data>) = MD5(<data>)
    KD(<secret>,<data>) = H(concatenate(<secret>:<data>))
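These two definitions translate directly into code; a minimal sketch, assuming MD5 as the hash:

```python
import hashlib

def H(data: str) -> str:
    # One-way hash function: the MD5 of the data, as 32 hex characters
    return hashlib.md5(data.encode("utf-8")).hexdigest()

def KD(secret: str, data: str) -> str:
    # Keyed digest: the hash of the colon-joined secret and nonsecret data
    return H(secret + ":" + data)
```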

The Security-Related Data (A1)
The Message-Related Data (A2)
RFC 2617 defines two schemes for A2, depending on the quality of protection (qop) chosen:
  • The first scheme involves only the HTTP request method and URL. This is used when qop=“auth”, which is the default case.
  • The second scheme adds in the message entity body to provide a degree of message integrity checking. This is used when qop=“auth-int”.

Overall Digest Algorithm
  • The first way is intended to be compatible with the older specification RFC 2069, used when the qop option is missing. It computes the digest using the hash of the secret information and the nonced message data.
  • The second way is the modern, preferred approach—it includes support for nonce counting and symmetric authentication. This approach is used whenever qop is “auth” or “auth-int”. It adds nonce count, qop, and cnonce data to the digest.
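Both ways can be sketched in one function. This is a self-contained illustration, assuming MD5 as the hash and qop="auth"; the username, realm, nonce, and other values below are hypothetical:

```python
import hashlib

def H(data: str) -> str:
    return hashlib.md5(data.encode("utf-8")).hexdigest()

def KD(secret: str, data: str) -> str:
    return H(secret + ":" + data)

def request_digest(username, realm, password, method, uri,
                   nonce, nc=None, cnonce=None, qop=None):
    A1 = f"{username}:{realm}:{password}"   # security-related data
    A2 = f"{method}:{uri}"                  # message-related data (qop="auth")
    if qop is None:
        # Older RFC 2069-compatible form: hash of the secret information
        # and the nonced message data
        return KD(H(A1), f"{nonce}:{H(A2)}")
    # Modern form: adds nonce count, client nonce, and qop to the digest
    return KD(H(A1), f"{nonce}:{nc}:{cnonce}:{qop}:{H(A2)}")

# Hypothetical values, for illustration only
resp = request_digest("bob", "www.joes-hardware.com", "Ow!",
                      "GET", "/secret.html",
                      nonce="abc123", nc="00000001",
                      cnonce="0a4f113b", qop="auth")
print(resp)  # 32 hex characters
```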


Digest Authentication Session
Preemptive Authorization

Here are three potential ways a client can obtain the correct nonce without waiting for a new WWW-Authenticate challenge:

  • Server pre-sends the next nonce in the Authentication-Info success header.
  • Server allows the same nonce to be reused for a small window of time.
  • Both the client and server use a synchronized, predictable nonce-generation algorithm.
Next nonce pregeneration

The next nonce value can be provided in advance to the client by the Authentication-Info success header. This header is sent along with the 200 OK response from a previous successful authentication.
    Authentication-Info: nextnonce="<nonce-value>"

Limited nonce reuse
Instead of pregenerating a sequence of nonces, another approach is to allow limited reuse of nonces. For example, a server may allow a nonce to be reused 5 times, or for 10 seconds.
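Server-side bookkeeping for this might look like the following sketch (the 5-use and 10-second limits echo the example above; a real server would also need to protect this state against concurrent access):

```python
import time

class NonceTracker:
    """Track how many times, and for how long, an issued nonce may be reused."""

    def __init__(self, max_uses=5, max_age_seconds=10.0):
        self.max_uses = max_uses
        self.max_age = max_age_seconds
        self._issued = {}  # nonce -> (issue_time, use_count)

    def issue(self, nonce, now=None):
        self._issued[nonce] = ((now if now is not None else time.time()), 0)

    def is_fresh(self, nonce, now=None):
        """True if the nonce may still be used; otherwise it is stale and the
        server should re-challenge with stale=true."""
        if nonce not in self._issued:
            return False
        issued_at, uses = self._issued[nonce]
        now = now if now is not None else time.time()
        if uses >= self.max_uses or (now - issued_at) > self.max_age:
            return False
        self._issued[nonce] = (issued_at, uses + 1)
        return True
```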

When the nonce finally expires, the server is expected to send the client a 401 Unauthorized challenge, with the WWW-Authenticate: stale=true directive set:

    WWW-Authenticate: Digest
                      realm="<realm-value>"
                      nonce="<nonce-value>"
                      stale=true

Synchronized nonce generation
It is possible to employ time-synchronized nonce-generation algorithms, where both the client and the server can generate a sequence of identical nonces based on a shared secret key that a third party cannot easily predict (as with secure ID cards).

Nonce Selection
RFC 2617 suggests this hypothetical nonce formulation:
    BASE64(time-stamp H(time-stamp ":" ETag ":" private-key))
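A sketch of that formulation, assuming MD5 as the hash and a single space between the timestamp and its hash (RFC 2617 does not pin down the separator; the ETag and private key below are hypothetical):

```python
import base64
import hashlib

def make_nonce(timestamp: str, etag: str, private_key: str) -> str:
    """BASE64(time-stamp H(time-stamp ":" ETag ":" private-key)), per RFC 2617's
    suggestion. The server can later verify a returned nonce by recomputing
    the hash from the embedded timestamp."""
    h = hashlib.md5(f"{timestamp}:{etag}:{private_key}".encode("utf-8")).hexdigest()
    return base64.b64encode(f"{timestamp} {h}".encode("ascii")).decode("ascii")
```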

Symmetric Authentication
RFC 2617 extends digest authentication to allow the client to authenticate the server.
The response digest is calculated like the request digest, except that the message body information (A2) is different, because there is no method in a response, and the message entity data is different. The methods of computation of A2 for request and response digests are compared in Tables 13-6 and 13-7.


Quality of Protection Enhancements
The qop field may be present in all three digest headers: WWW-Authenticate, Authorization, and Authentication-Info.

Message Integrity Protection
If integrity protection is applied (qop="auth-int"), H(<entity-body>) is the hash of the entity body, not the message body.

Digest Authentication Headers
Both the basic and digest authentication protocols contain an authorization challenge, carried by the WWW-Authenticate header, and an authorization response, carried by the Authorization header. Digest authentication adds an optional Authentication-Info header, which is sent after successful authentication, to complete a three-phase handshake and pass along the next nonce to use.




Practical Considerations


  • Multiple Challenges - A server can issue multiple challenges for a resource. For example, if a server does not know the capabilities of a client, it may provide both basic and digest authentication challenges. When faced with multiple challenges, the client must choose to answer with the strongest authentication mechanism that it supports.
  • Error Handling - In digest authentication, if a directive or its value is improper, or if a required directive is missing, the proper response is 400 Bad Request.
  • Protection Spaces - The realm value, in combination with the canonical root URL of the server being accessed, defines the protection space. The specific calculation of protection space depends on the authentication mechanism:
    • In basic authentication, clients assume that all paths at or below the request URI are within the same protection space as the current challenge. A client can preemptively authorize for resources in this space without waiting for another challenge from the server.
    • In digest authentication, the challenge’s WWW-Authenticate: domain field more precisely defines the protection space. The domain field is a quoted, space-separated list of URIs. All the URIs in the domain list, and all URIs logically beneath these prefixes, are assumed to be in the same protection space. If the domain field is missing or empty, all URIs on the challenging server are in the protection space.
  • Rewriting URIs - Proxies may rewrite URIs in ways that change the URI syntax but not the actual resource being described. For example:
    • Hostnames may be normalized or replaced with IP addresses.
    • Embedded characters may be replaced with “%” escape forms.
    • Additional attributes of a type that doesn’t affect the resource fetched from the particular origin server may be appended or inserted into the URI.
    • Because URIs can be changed by proxies, and because digest authentication sanity checks the integrity of the URI value, digest authentication will break if any of these changes are made.
  • Caches
Security Considerations
  • Header Tampering
  • Replay Attacks
  • Multiple Authentication Mechanisms
  • Dictionary Attacks
  • Hostile Proxies and Man-in-the-Middle Attacks
  • Chosen Plaintext Attacks
    • Precomputed dictionary attacks
    • Batched brute-force attacks
  • Storing Passwords


Friday, May 23, 2014

HTTP The Definitive Guide (Basic Authentication)

Basic Authentication
Authentication
HTTP’s Challenge/Response Authentication Framework

Authentication Protocols and Headers
HTTP defines two official authentication protocols: basic authentication and digest authentication.


Security Realms
Web servers group protected documents into security realms. Each security realm can have different sets of authorized users.
A realm should have a descriptive string name, like “Corporate Financials,” to help the user understand which username and password to use. It may also be useful to list the server hostname in the realm name, for example, “executive-committee@bigcompany.com”.

Basic Authentication
The HTTP basic authentication WWW-Authenticate and Authorization headers are summarized in Table 12-2.

Base-64 Username/Password Encoding
In a nutshell, base-64 encoding takes a sequence of 8-bit bytes and breaks the sequence of bits into 6-bit chunks. Each 6-bit piece is used to pick a character in a special 64-character alphabet, consisting mostly of letters and numbers.
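A sketch of how a client builds the Authorization header, using Python's standard base64 module and the book's example credentials:

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the Basic Authorization header value: the base-64 encoding of
    the colon-joined "username:password" string."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

header = basic_auth_header("brian-totty", "Ow!")
print(header)  # Basic YnJpYW4tdG90dHk6T3ch
```

Note that base-64 is an encoding, not encryption: anyone who captures the header can run it backward through `base64.b64decode` and read the password.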


Proxy Authentication
Authentication also can be done by intermediary proxy servers.
The Security Flaws of Basic Authentication
Consider the following security flaws:

  • Basic authentication sends the username and password across the network in a form that can trivially be decoded.
  • Even if the secret password were encoded in a scheme that was more complicated to decode, a third party could still capture the garbled username and password and replay the garbled information to origin servers over and over again to gain access. No effort is made to prevent these replay attacks.
  • Even if basic authentication is used for noncritical applications, such as corporate intranet access control or personalized content, social behavior makes this dangerous.
  • Basic authentication offers no protection against proxies or intermediaries that act as middlemen, leaving authentication headers intact but modifying the rest of the message to dramatically change the nature of the transaction.
  • Basic authentication is vulnerable to spoofing by counterfeit servers.


HTTP The Definitive Guide (Client Identification and Cookies)

Client Identification and Cookies
The Personal Touch

  • Personal greetings
  • Targeted recommendations
  • Administrative information on file
  • Session tracking
HTTP Headers
Client IP Address
User Login

Fat URLs
URLs modified to include user state information are called fat URLs.

Fat URLs can be used to identify users as they browse a site. But this technology does have several serious problems. Some of these problems include:

Ugly URLs
    The fat URLs displayed in the browser are confusing for new users.
Can’t share URLs
    The fat URLs contain state information about a particular user and session. If you mail that URL to someone else, you may inadvertently be sharing your accumulated personal information.
Breaks caching
    Generating user-specific versions of each URL means that there are no longer commonly accessed URLs to cache.
Extra server load
    The server needs to rewrite HTML pages to fatten the URLs.
Escape hatches
    It is too easy for a user to accidentally “escape” from the fat URL session by jumping to another site or by requesting a particular URL. Fat URLs work only if the user strictly follows the premodified links. If the user escapes, he may lose his progress (perhaps a filled shopping cart) and will have to start again.
Not persistent across sessions
    All information is lost when the user logs out, unless he bookmarks the particular fat URL.

Cookies
Types of Cookies
    You can classify cookies broadly into two types: session cookies and persistent cookies.
The only difference between session cookies and persistent cookies is when they expire. As we will see later, a cookie is a session cookie if its Discard parameter is set, or if there is no Expires or Max-Age parameter indicating an extended expiration time.

How Cookies Work

Cookie Jar: Client-Side State
Because the browser is responsible for storing the cookie information, this system is called client-side state. The official name for the cookie specification is the HTTP State Management Mechanism.

Netscape Navigator cookies

domain
    The domain of the cookie
allh
    Whether all hosts in a domain get the cookie, or only the specific host named
path
    The path prefix in the domain associated with the cookie
secure
    Whether we should send this cookie only if we have an SSL connection
expiration
    The cookie expiration date in seconds since Jan 1, 1970 00:00:00 GMT
name
    The name of the cookie variable
value
    The value of the cookie variable

Microsoft Internet Explorer cookies

Different Cookies for Different Sites
Cookie Domain attribute
    Set-cookie: user="mary17"; domain="airtravelbargains.com"
Cookie Path attribute
    Set-cookie: pref=compact; domain="airtravelbargains.com"; path=/autos/
 
Cookie Ingredients
    There are two different versions of cookie specifications in use: Version 0 cookies (sometimes called “Netscape cookies”), and Version 1 (“RFC 2965”) cookies.
Version 0 (Netscape) Cookies
    Set-Cookie: name=value [; expires=date] [; path=path] [; domain=domain] [; secure]
    Cookie: name1=value1 [; name2=value2] ...

When a client sends requests, it includes all the unexpired cookies that match the domain, path, and secure filters to the site. All the cookies are combined into a Cookie header:
Cookie: session-id=002-1145265-8016838; session-id-time=1007884800
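The client-side matching and assembly step can be sketched as follows (a simplified Version 0 model; expiration checking is omitted, and the cookie jar contents are hypothetical, echoing the Amazon-style example above):

```python
def build_cookie_header(cookies, domain, path, secure):
    """Combine all stored cookies matching the domain, path, and secure
    filters into a single Cookie header value."""
    matching = [
        f"{c['name']}={c['value']}"
        for c in cookies
        if domain.endswith(c["domain"])
        and path.startswith(c["path"])
        and (secure or not c["secure"])
    ]
    return "; ".join(matching)

jar = [
    {"name": "session-id", "value": "002-1145265-8016838",
     "domain": ".amazon.com", "path": "/", "secure": False},
    {"name": "session-id-time", "value": "1007884800",
     "domain": ".amazon.com", "path": "/", "secure": False},
]
print(build_cookie_header(jar, "www.amazon.com", "/", secure=False))
# session-id=002-1145265-8016838; session-id-time=1007884800
```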

Version 1 (RFC 2965) Cookies

Version 1 Set-Cookie2 header


Version 1 Cookie header
Each matching cookie must include any Domain, Port, or Path attributes from the corresponding Set-Cookie2 headers.
For example, assume the client has received these five Set-Cookie2 responses in the past from the www.joes-hardware.com web site:
    Set-Cookie2: ID="29046"; Domain=".joes-hardware.com"
    Set-Cookie2: color=blue
    Set-Cookie2: support-pref="L2"; Domain="customer-care.joes-hardware.com"
    Set-Cookie2: Coupon="hammer027"; Version="1"; Path="/tools"
    Set-Cookie2: Coupon="handvac103"; Version="1"; Path="/tools/cordless"
If the client makes another request for path /tools/cordless/specials.html, it will pass along a long Cookie header like this:
    Cookie: $Version="1";
                 ID="29046"; $Domain=".joes-hardware.com";
                 color="blue";
                 Coupon="hammer027"; $Path="/tools";
                 Coupon="handvac103"; $Path="/tools/cordless"

Version 1 Cookie2 header and version negotiation
    The Cookie2 request header is used to negotiate interoperability between clients and servers that understand different versions of the cookie specification. The Cookie2 header advises the server that the user agent understands new-style cookies and provides the version of the cookie standard supported (it would have made more sense to call it Cookie-Version):
        Cookie2: $Version="1"
    If the server understands new-style cookies, it recognizes the Cookie2 header and should send Set-Cookie2 (rather than Set-Cookie) response headers. If a client gets both a Set-Cookie and a Set-Cookie2 header for the same cookie, it ignores the old Set-Cookie header.
If a client supports both Version 0 and Version 1 cookies but gets a Version 0 Set-Cookie header from the server, it should send cookies with the Version 0 Cookie header. However, the client also should send Cookie2: $Version="1" to give the server an indication that it can upgrade.

Cookies and Session Tracking

  • Figure 11-5a—Browser requests Amazon.com root page for the first time.
  • Figure 11-5b—Server redirects the client to a URL for the e-commerce software.
  • Figure 11-5c—Client makes a request to the redirected URL.
  • Figure 11-5d—Server slaps two session cookies on the response and redirects the user to another URL, so the client will request again with these cookies attached. This new URL is a fat URL, meaning that some state is embedded into the URL. If the client has cookies disabled, some basic identification can still be done as long as the user follows the Amazon.com-generated fat URL links and doesn’t leave the site.
  • Figure 11-5e—Client requests the new URL, but now passes the two attached cookies.
  • Figure 11-5f—Server redirects to the home.html page and attaches two more cookies.
  • Figure 11-5g—Client fetches the home.html page and passes all four cookies.
  • Figure 11-5h—Server serves back the content.
Cookies and Caching
The rules for cookies and caching are not well established. Here are some guiding principles for dealing with caches:
  • Mark documents uncacheable if they are
  • Be cautious about caching Set-Cookie headers
  • Be cautious about requests with Cookie headers
Cookies, Security, and Privacy

Wednesday, May 14, 2014

HTTP The Definitive Guide (HTTP-NG)

HTTP-NG
WebMUX
Here are some of the significant goals of the WebMUX protocol:
  • Simple design.
  • High performance.
  • Multiplexing—Multiple data streams (of arbitrary higher-level protocols) can be interleaved dynamically and efficiently over a single connection, without stalling data waiting for slow producers.
  • Credit-based flow control—Data is produced and consumed at different rates, and senders and receivers have different amounts of memory and CPU resources available. WebMUX uses a “credit-based” flow-control scheme, where receivers preannounce interest in receiving data to prevent resource-scarcity deadlocks.
  • Alignment preserving—Data alignment is preserved in the multiplexed stream so that binary data can be sent and processed efficiently.
  • Rich functionality—The interface is rich enough to support a sockets API.

Binary Wire Protocol
The HTTP-NG team proposed the Binary Wire Protocol to enhance how the next-generation HTTP protocol supports remote operations.


HTTP The Definitive Guide (Web Robots)

Web Robots
Crawlers and Crawling
Where to Start: The “Root Set”

Extracting Links and Normalizing Relative Links

Cycle Avoidance
Loops and Dups

Trails of Breadcrumbs
  • Trees and hash tables
  • Lossy presence bit maps
  • Checkpoints
  • Partitioning
Aliases and Robot Cycles
Canonicalizing URLs
Filesystem Link Cycles
Dynamic Virtual Web Spaces
Avoiding Loops and Dups

Robotic HTTP
Identifying Request Headers
User-Agent
     Tells the server the name of the robot making the request.
From
     Provides the email address of the robot’s user/administrator.
Accept
     Tells the server what media types are okay to send. This can help ensure that the robot receives only content in which it’s interested (text, images, etc.).
Referer
     Provides the URL of the document that contains the current request-URL.


Virtual Hosting

Conditional Requests
Response Handling
Status codes
Entities
User-Agent Targeting
Misbehaving Robots
  • Runaway robots
  • Stale URLs
  • Long, wrong URLs
  • Nosy robots
  • Dynamic gateway access


Excluding Robots


The Robots Exclusion Standard
Fetching robots.txt
    GET /robots.txt HTTP/1.0
    Host: www.joes-hardware.com
    User-Agent: Slurp/2.0
    Date: Wed Oct 3 20:22:48 EST 2001

Response codes
  • If the server responds with a success status (HTTP status code 2XX), the robot must parse the content and apply the exclusion rules to fetches from that site.
  • If the server response indicates the resource does not exist (HTTP status code 404), the robot can assume that no exclusion rules are active and that access to the site is not restricted by robots.txt.
  • If the server response indicates access restrictions (HTTP status code 401 or 403), the robot should regard access to the site as completely restricted.
  • If the request attempt results in temporary failure (HTTP status code 503), the robot should defer visits to the site until the resource can be retrieved.
  • If the server response indicates redirection (HTTP status code 3XX), the robot should follow the redirects until the resource is found.
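Those rules reduce to a small decision function; a sketch (the action names are illustrative, not from any specification):

```python
def robots_txt_action(status_code):
    """Map the status of a robots.txt fetch to the robot's next step."""
    if 200 <= status_code < 300:
        return "parse rules"        # apply the exclusion rules to this site
    if status_code == 404:
        return "unrestricted"       # no robots.txt: access is not restricted
    if status_code in (401, 403):
        return "fully restricted"   # access restrictions: treat site as off-limits
    if status_code == 503:
        return "retry later"        # temporary failure: defer the visit
    if 300 <= status_code < 400:
        return "follow redirect"    # chase redirects until the resource is found
    return "retry later"            # conservative default for anything else
```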

robots.txt File Format

Caching and Expiration of robots.txt

HTML Robot-Control META Tags
<META NAME="ROBOTS" CONTENT=directive-list>

Robot META directives
NOINDEX
NOFOLLOW

INDEX
Tells a robot that it may index the contents of the page.
FOLLOW
Tells a robot that it may crawl any outgoing links in the page.
NOARCHIVE
Tells a robot that it should not cache a local copy of the page.
ALL
Equivalent to INDEX, FOLLOW.
NONE
Equivalent to NOINDEX, NOFOLLOW.

Search engine META tags
Robot Etiquette


Modern Search Engine Architecture



Full-Text Index
Posting the Query
Sorting and Presenting the Results

HTTP The Definitive Guide (Integration Points: Gateways, Tunnels, and Relays)

Integration Points: Gateways, Tunnels, and Relays
Gateways
An application can ask (through HTTP or some other defined interface) a gateway to handle the request, and the gateway can provide a response. The gateway can speak the query language to the database or generate the dynamic content, acting like a portal: a request goes in, and a response comes out.



Client-Side and Server-Side Gateways

  • Server-side gateways speak HTTP with clients and a foreign protocol with servers (HTTP/*).
  • Client-side gateways speak foreign protocols with clients and HTTP with servers (*/HTTP).

Protocol Gateways


HTTP/*: Server-Side Web Gateways
The gateway does the following:
  • Sends the USER and PASS commands to log in to the server
  • Issues the CWD command to change to the proper directory on the server
  • Sets the download type to ASCII
  • Fetches the document’s last modification time with MDTM
  • Tells the server to expect a passive data retrieval request using PASV
  • Requests the object retrieval using RETR
  • Opens a data connection to the FTP server on a port returned on the control channel; as soon as the data channel is opened, the object content flows back to the gateway
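The control-channel side of those steps can be written out as the command sequence the gateway would issue. This is an illustrative sketch only (it builds the commands rather than opening a live FTP connection; the credentials and paths are hypothetical):

```python
def ftp_gateway_commands(user, password, directory, filename):
    """The FTP control-channel commands an HTTP/FTP gateway might issue
    to serve one HTTP GET; the data connection itself is not shown."""
    return [
        f"USER {user}",        # log in to the FTP server...
        f"PASS {password}",    # ...with the given credentials
        f"CWD {directory}",    # change to the proper directory
        "TYPE A",              # set the download type to ASCII
        f"MDTM {filename}",    # fetch the document's last modification time
        "PASV",                # ask for a passive data connection
        f"RETR {filename}",    # request the object retrieval
    ]

for cmd in ftp_gateway_commands("anonymous", "guest", "/pub", "index.txt"):
    print(cmd)
```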


HTTP/HTTPS: Server-Side Security Gateways

HTTPS/HTTP: Client-Side Security Accelerator Gateways
    Recently, HTTPS/HTTP gateways have become popular as security accelerators.
These gateways often include special decryption hardware to decrypt secure traffic much more efficiently than the origin server, removing load from the origin server. Because these gateways send unencrypted traffic between the gateway and origin server, you need to use caution to make sure the network between the gateway and origin server is secure.

Resource Gateways
    The first popular API for application gateways was the Common Gateway Interface (CGI). CGI is a standardized set of interfaces that web servers use to launch programs in response to HTTP requests for special URLs, collect the program output, and send it back in HTTP responses. Over the past several years, commercial web servers have provided more sophisticated interfaces for connecting web servers to applications.

Common Gateway Interface (CGI)
    Fast CGI

Server Extension APIs

Application Interfaces and Web Services

Tunnels
Establishing HTTP Tunnels with CONNECT


  • In Figure 8-10a, the client sends a CONNECT request to the tunnel gateway. The client’s CONNECT method asks the tunnel gateway to open a TCP connection (here, to the host named orders.joes-hardware.com on port 443, the normal SSL port).
  • The TCP connection is created in Figure 8-10b and Figure 8-10c.
  • Once the TCP connection is established, the gateway notifies the client (Figure 8-10d) by sending an HTTP 200 Connection Established response.
  • At this point, the tunnel is set up. Any data sent by the client over the HTTP tunnel will be relayed directly to the outgoing TCP connection, and any data sent by the server will be relayed to the client over the HTTP tunnel.
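The first step above is just a small piece of text on the wire; a sketch that builds the raw CONNECT request a client sends to the gateway (header lines end in CRLF, and a blank line ends the headers):

```python
def build_connect_request(host, port, user_agent="Mozilla/4.0"):
    """Build the raw CONNECT request for a tunnel gateway. After the gateway
    answers 200 Connection Established, raw bytes flow in both directions."""
    return (
        f"CONNECT {host}:{port} HTTP/1.0\r\n"
        f"User-agent: {user_agent}\r\n"
        "\r\n"
    )

print(build_connect_request("home.netscape.com", 443), end="")
```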
CONNECT requests
    CONNECT home.netscape.com:443 HTTP/1.0
    User-agent: Mozilla/4.0


CONNECT responses
    HTTP/1.0 200 Connection Established
    Proxy-agent: Netscape-Proxy/1.1

Data Tunneling, Timing, and Connection Management

SSL Tunneling
    Web tunnels were first developed to carry encrypted SSL traffic through firewalls.

SSL Tunneling Versus HTTP/HTTPS Gateways

Tunnel Authentication
    In particular, the proxy authentication support can be used with tunnels to authenticate a client’s right to use a tunnel.

Tunnel Security Considerations
    To minimize abuse of tunnels, the gateway should open tunnels only for particular well-known ports, such as 443 for HTTPS.

Relays
    HTTP relays are simple HTTP proxies that do not fully adhere to the HTTP specifications. Relays process enough HTTP to establish connections, then blindly forward bytes.