Wednesday, May 28, 2014

HTTP The Definitive Guide (Internationalization)

Internationalization
This chapter covers two primary internationalization issues for the Web: character set encodings and language tags.
HTTP applications use character set encodings to request and display text in different alphabets, and they use language tags to describe and restrict content to languages the user understands.

HTTP Support for International Content
Servers tell clients about a document’s alphabet and language with the HTTP Content-Type charset parameter and Content-Language headers. These headers describe what’s in the entity body’s “box of bits,” how to convert the contents into the proper characters that can be displayed onscreen, and what spoken language the words represent.

At the same time, the client needs to tell the server which languages the user understands and which alphabetic coding algorithms the browser has installed. The client sends Accept-Charset and Accept-Language headers to tell the server which character set encoding algorithms and languages the client understands, and which of them are preferred.

Accept-Language: fr, en;q=0.8
Accept-Charset: iso-8859-1, utf-8
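Before a server can honor these headers, it has to turn each one into an ordered preference list. The following is a minimal sketch that assumes a simplified grammar: comma-separated values, each with an optional `;q=` weight that defaults to 1.0. The function name `parse_preferences` is hypothetical, and any parameters other than `q` are ignored.

```python
from __future__ import annotations


def parse_preferences(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language or Accept-Charset value into (value, q) pairs,
    ordered from most to least preferred."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        value, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # an entry without an explicit q-value has weight 1.0
        for param in params:
            key, _, raw = param.partition("=")
            if key.strip().lower() == "q":
                q = float(raw)
        prefs.append((value.lower(), q))
    # sorted() is stable (even with reverse=True), so entries with
    # equal weights keep the order in which the client listed them
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


print(parse_preferences("fr, en;q=0.8"))       # [('fr', 1.0), ('en', 0.8)]
print(parse_preferences("iso-8859-1, utf-8"))  # [('iso-8859-1', 1.0), ('utf-8', 1.0)]
```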

Character Sets and HTTP
Charset Is a Character-to-Bits Encoding

How Character Sets and Encodings Work
Bits-to-character conversions happen in two steps:

  • In Figure 16-2a, bits from a document are converted into a character code that identifies a particular numbered character in a particular coded character set. In the example, the decoded character code is numbered 225.
  • In Figure 16-2b, the character code is used to select a particular element of the coded character set. In iso-8859-6, the value 225 corresponds to “ARABIC LETTER FEH.” The algorithms used in Steps a and b are determined from the MIME charset tag.
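The two steps can be reproduced with Python's built-in codecs. One caveat: Python decodes into Unicode, so the character selected by iso-8859-6 code 225 is reported as Unicode code point U+0641 rather than as 225. The second decode shows what happens when the same bits are read with the wrong charset.

```python
import unicodedata

raw = bytes([225])  # a single byte from the document, value 225 (0xE1)

# With the right charset tag, the byte maps to the Arabic letter.
arabic = raw.decode("iso-8859-6")
print(hex(ord(arabic)), unicodedata.name(arabic))  # 0x641 ARABIC LETTER FEH

# The same bits decoded as iso-8859-1 select a completely different character.
latin = raw.decode("iso-8859-1")
print(hex(ord(latin)), unicodedata.name(latin))  # 0xe1 LATIN SMALL LETTER A WITH ACUTE
```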
The Wrong Charset Gives the Wrong Characters
Standardized MIME Charset Values
Content-Type Charset Header and META Tags
Web servers send the client the MIME charset tag in the Content-Type header, using the charset parameter:
Content-Type: text/html; charset=iso-2022-jp

For HTML content, character sets might be found in <META HTTP-EQUIV="Content-Type"> tags that describe the charset.
If a client cannot infer a character encoding, it assumes iso-8859-1.
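A client-side sketch of the header half of this logic is below. It relies on the standard library's `email.message.Message.get_content_charset()`, which returns the lowercased charset parameter or the supplied fallback. The helper name `charset_of` is hypothetical, and this sketch does not look inside HTML META tags.

```python
from email.message import Message


def charset_of(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value, or the default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns failobj
    # when the header has no charset parameter.
    return msg.get_content_charset(failobj=default)


print(charset_of("text/html; charset=iso-2022-jp"))  # iso-2022-jp
print(charset_of("text/html"))                       # iso-8859-1
```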

The Accept-Charset Header
HTTP clients can tell servers precisely which character systems they support, using the Accept-Charset request header.

Multilingual Character Encoding Primer
Character Set Terminology

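  • Character - An alphabetic letter, numeral, punctuation mark, ideogram (as in Chinese), symbol, or other textual “atom” of writing.
  • Glyph - A stroke pattern or unique graphical shape that describes a character. A character may have multiple glyphs if it can be written different ways.
  • Coded character - A unique number assigned to a character so that we can work with it.
  • Coding space - A range of integers that we plan to use as character code values.
  • Code width - The number of bits in each (fixed-size) character code.
  • Character repertoire - A particular working set of characters (a subset of all the characters in the world).
  • Coded character set - A populated coding space: a character repertoire in which each character has been assigned a code value.
  • Character encoding scheme - An algorithm to encode numeric character codes into a sequence of content bits (and to decode them back).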
Charset Is Poorly Named
Technically, the MIME charset tag (used in the Content-Type charset parameter and the Accept-Charset header) doesn’t specify a character set at all. The MIME charset value names a total algorithm for mapping data bits to codes to unique characters. It combines the two separate concepts of character encoding scheme and coded character set.

Characters

Glyphs, Ligatures, and Presentation Forms
Here’s the general rule: if the meaning of the text changes when you replace one glyph with another, the glyphs are different characters. Otherwise, they are the same characters, with a different stylistic presentation.

Coded Character Sets
US-ASCII: The mother of all character sets
ASCII stands for “American Standard Code for Information Interchange.”
HTTP messages (headers, URIs, etc.) use US-ASCII.

iso-8859
The iso-8859 character set standards are 8-bit supersets of US-ASCII that use the high bit to add characters for international writing.

iso-8859-1, also known as Latin1, is the default character set for HTML.

JIS X 0201
JIS X 0201 is an extremely minimal character set that extends ASCII with Japanese half-width katakana characters. JIS is an acronym for “Japanese Industrial Standard.”

JIS X 0208 and JIS X 0212
The JIS X 0208 character set was the first multi-byte Japanese character set; it defined 6,879 coded characters, most of which are Chinese-based kanji. The JIS X 0212 character set adds an additional 6,067 characters.

UCS
The Universal Character Set (UCS) is a worldwide standards effort to combine all of the world’s characters into a single coded character set.

Character Encoding Schemes

  • Fixed width - Fixed-width encodings represent each coded character with a fixed number of bits.
  • Variable width (nonmodal) - Variable-width encodings use different numbers of bits for different character code numbers.
  • Variable width (modal) - Modal encodings use special “escape” patterns to shift between different modes. For example, a modal encoding can be used to switch between multiple, overlapping character sets in the middle of text.
Encoding schemes:
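8-bit - The 8-bit fixed-width identity encoding simply encodes each character code with its corresponding 8-bit value, so it supports only character sets with a code range of 256 characters. The iso-8859 family of character sets uses the 8-bit identity encoding.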

UTF-8 - UTF stands for “UCS Transformation Format.” UTF-8 uses a nonmodal, variable-length encoding for the character code values, where the leading bits of the first byte tell the length of the encoded character in bytes, and each subsequent byte contains six bits of code value. For example, character code 90 (ASCII “Z”) would be encoded as 1 byte (01011010), while code 5073 (13-bit binary value 1001111010001) would be encoded into 3 bytes:
11100001 10001111 10010001
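The bit layout described above can be checked by encoding the two example codes by hand. This is a teaching sketch, not a replacement for `str.encode("utf-8")`: the helper names `utf8_encode` and `as_bits` are hypothetical, and unlike a real encoder it does not reject surrogate code points.

```python
def utf8_encode(code: int) -> bytes:
    """Encode one character code as UTF-8, following the byte patterns."""
    if code < 0x80:     # up to 7 bits:  0xxxxxxx
        return bytes([code])
    if code < 0x800:    # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    if code < 0x10000:  # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code >> 12),
                      0x80 | ((code >> 6) & 0x3F),
                      0x80 | (code & 0x3F)])
    if code < 0x110000:  # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code >> 18),
                      0x80 | ((code >> 12) & 0x3F),
                      0x80 | ((code >> 6) & 0x3F),
                      0x80 | (code & 0x3F)])
    raise ValueError("code point out of range")


def as_bits(data: bytes) -> str:
    return " ".join(f"{byte:08b}" for byte in data)


print(as_bits(utf8_encode(90)))    # 01011010
print(as_bits(utf8_encode(5073)))  # 11100001 10001111 10010001
```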
iso-2022-jp
iso-2022-jp is a variable-length, modal encoding in which every byte value is less than 128, to prevent problems with software that is not 8-bit clean.
The encoding context is always set to one of four predefined character sets, and special “escape sequences” shift from one set to another.

euc-jp
EUC stands for “Extended Unix Code,” first developed to support Asian characters on Unix operating systems.
Like iso-2022-jp, the euc-jp encoding is a variable-length encoding that allows the use of several standard Japanese character sets. But unlike iso-2022-jp, the euc-jp encoding is not modal. There are no escape sequences to shift between modes.
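Python ships codecs for both schemes, which makes the modal/nonmodal difference easy to see. In the iso-2022-jp output you should find the escape sequence ESC $ B switching into JIS X 0208 before the Japanese character, with every byte below 128; the euc-jp output has no escapes and instead marks the Japanese bytes by setting their high bit.

```python
text = "Aあ"  # ASCII "A" followed by HIRAGANA LETTER A

modal = text.encode("iso2022_jp")  # escape sequences switch character sets
nonmodal = text.encode("euc_jp")   # no modes: high-bit bytes are JIS X 0208

print(modal)
print(nonmodal)  # b'A\xa4\xa2'
```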


Language Tags and HTTP
The Content-Language Header
The Accept-Language Header
Clients use Accept-Language and Accept-Charset to request content they can understand.

Types of Language Tags
Language tags can be used to represent:

  • General language classes (as in “es” for Spanish)
  • Country-specific languages (as in “en-GB” for English in Great Britain)
  • Dialects of languages (as in “no-bok” for Norwegian “Book Language”)
  • Regional languages (as in “sgn-US-MA” for Martha’s Vineyard sign language)
  • Standardized nonvariant languages (e.g., “i-navajo”)
  • Nonstandard languages (e.g., “x-snowboarder-slang”)

Subtags
Language tags have one or more parts, separated by hyphens, called subtags:

  • The first subtag is called the primary subtag. Its values are standardized.
  • The second subtag is optional and follows its own naming standard.
  • Any trailing subtags are unregistered.

Capitalization
Language tags are case-insensitive. By convention, however, lowercase is used to represent general languages, while uppercase is used to signify particular countries (as in “en-GB”).

IANA Language Tag Registrations

First Subtag: Namespace
If the first subtag has:

  • Two characters, it is a language code from the ISO 639 and 639-1 standards
  • Three characters, it is a language code listed in the ISO 639-2 standard and extensions
  • The letter “i,” the language tag is explicitly IANA-registered
  • The letter “x,” the language tag is a private, nonstandard, extension subtag

Second Subtag: Namespace
If the second subtag has:

  • Two characters, it’s a country/region defined by ISO 3166
  • Three to eight characters, it may be registered with the IANA
  • One character, it is illegal

Remaining Subtags: Namespace
There are no rules for the third and following subtags, apart from being up to eight characters (letters and digits).
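The namespace rules above can be applied mechanically. This sketch implements only the rules listed in these notes (it does not consult the actual IANA registry), and the function name `describe_language_tag` is hypothetical. It returns early for “i” and “x” tags, treating the whole tag as registered or private respectively.

```python
from __future__ import annotations

import re

SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")  # 1 to 8 letters and digits


def describe_language_tag(tag: str) -> list[str]:
    """Describe each subtag of a language tag using the namespace rules."""
    subtags = tag.split("-")
    if not all(SUBTAG.fullmatch(s) for s in subtags):
        raise ValueError(f"malformed tag: {tag!r}")
    first = subtags[0].lower()  # tags are case-insensitive
    if first == "i":
        return ["IANA-registered tag"]
    if first == "x":
        return ["private, nonstandard tag"]
    if len(first) == 2:
        notes = ["ISO 639 language code"]
    elif len(first) == 3:
        notes = ["ISO 639-2 language code"]
    else:
        raise ValueError(f"unrecognized primary subtag: {first!r}")

    if len(subtags) > 1:
        second = subtags[1]
        if len(second) == 1:
            raise ValueError("a one-character second subtag is illegal")
        notes.append("ISO 3166 country/region" if len(second) == 2
                     else "possibly IANA-registered subtag")
    notes.extend("unregistered subtag" for _ in subtags[2:])
    return notes


print(describe_language_tag("sgn-US-MA"))
```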

Configuring Language Preferences


Internationalized URIs
Global Transcribability Versus Meaningful Characters

URI Character Repertoire

Escaping International Characters
Note that escape values should be in the range of US-ASCII codes (0–127).
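The escaping mechanism itself replaces an unsafe character with “%” followed by two hex digits of its code. Python's `urllib.parse.quote` does this; the sketch below deliberately sticks to US-ASCII input so that every escape value stays in the 0-127 range the note asks for. The path is a made-up example.

```python
from urllib.parse import quote, unquote

path = "/my file 100%.html"
escaped = quote(path)  # "/" is left alone because safe="/" by default
print(escaped)  # /my%20file%20100%25.html

# Unescaping restores the original characters.
print(unquote(escaped) == path)  # True
```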

Modal Switches in URIs

Other Considerations
Headers and Out-of-Spec Data
HTTP headers must consist of characters from the US-ASCII character set.

Dates
Domain Names

HTTP The Definitive Guide (Entities and Encodings)

Entities and Encodings
HTTP ensures that its cargo (the entity carried in each message):
  • Can be identified correctly (using Content-Type media formats and Content-Language headers) so browsers and other clients can process the content properly
  • Can be unpacked properly (using Content-Length and Content-Encoding headers) 
  • Is fresh (using entity validators and cache-expiration controls)
  • Meets the user’s needs (based on content-negotiation Accept headers)
  • Moves quickly and efficiently through the network (using range requests, delta encoding, and other data compression)
  • Arrives complete and untampered with (using transfer encoding headers and Content-MD5 checksums)
Messages Are Crates, Entities Are Cargo
HTTP/1.1 defines 10 primary entity header fields:
Content-Type
        The kind of object carried by the entity.
Content-Length
        The length or size of the message being sent.
Content-Location
        An alternate location for the object at the time of the request.
Content-Range
        If this is a partial entity, this header defines which pieces of the whole are included.
Content-MD5
        A checksum of the contents of the entity body.
Last-Modified
        The date on which this particular content was created or modified at the server.
Expires
        The date and time at which this entity data will become stale.
Allow
        What request methods are legal on this resource; e.g., GET and HEAD.
ETag
        A unique validator for this particular instance of the document. The ETag header is not defined formally as an entity header, but it is an important header for many operations involving entities.
Cache-Control
        Directives on how this document can be cached. The Cache-Control header, like the ETag header, is not defined formally as an entity header.

Entity Bodies

  • In Figure 15-2a, the entity body begins at byte number 65, right after the end-of-headers CRLF. The entity body contains the ASCII characters for “Hi! I’m a message!”
  • In Figure 15-2b, the entity body begins at byte number 67. The entity body contains the binary contents of the GIF image. GIF files begin with a 6-byte version signature, a 16-bit width, and a 16-bit height. You can see all three of these directly in the entity body.
Content-Length: The Entity’s Size
The Content-Length header is mandatory for messages with entity bodies, unless the message is transported using chunked encoding.

Detecting Truncation
Older versions of HTTP used connection close to delimit the end of a message. But, without Content-Length, clients cannot distinguish between successful connection close at the end of a message and connection close due to a server crash in the middle of a message. Clients need Content-Length to detect message truncation.

Message truncation is especially severe for caching proxy servers. Caching proxy servers generally do not cache HTTP bodies that don’t have an explicit Content-Length header, to reduce the risk of caching truncated messages.

Incorrect Content-Length
An incorrect Content-Length can cause even more damage than a missing Content-Length.

Content-Length and Persistent Connections
Content-Length is essential for persistent connections. If the response comes across a persistent connection, another HTTP response can immediately follow the current response. The Content-Length header lets the client know where one message ends and the next begins. Because the connection is persistent, the client cannot use connection close to identify the message’s end. Without a Content-Length header, HTTP applications won’t know where one entity body ends and the next message begins.
As we will see in “Transfer Encoding and Chunked Encoding,” there is one situation where you can use persistent connections without having a Content-Length header: when you use chunked encoding.
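A sketch of how a client might use Content-Length to split back-to-back responses read from one persistent connection, and to detect truncation. It assumes the whole byte stream is already in memory, that every response carries Content-Length (no chunked encoding), and that header names are ASCII; `split_responses` is a hypothetical name.

```python
from __future__ import annotations


def split_responses(stream: bytes) -> list[tuple[bytes, bytes]]:
    """Split consecutive HTTP responses, using Content-Length to find
    where each body ends. Returns (header block, body) pairs."""
    messages = []
    pos = 0
    while pos < len(stream):
        header_end = stream.index(b"\r\n\r\n", pos)
        head = stream[pos:header_end]
        length = None
        for line in head.split(b"\r\n")[1:]:  # skip the status line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("no Content-Length: cannot find the end of the body")
        body_start = header_end + 4
        body = stream[body_start:body_start + length]
        if len(body) < length:  # the connection ended early
            raise ValueError("truncated message: fewer bytes than Content-Length")
        messages.append((head, body))
        pos = body_start + length  # the next response starts right here
    return messages


stream = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nHello"
          b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nworld!")
print([body for _, body in split_responses(stream)])  # [b'Hello', b'world!']
```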

Content Encoding
If the body has been content-encoded, the Content-Length header specifies the length, in bytes, of the encoded body, not the length of the original, unencoded body.

Rules for Determining Entity Body Length
The rules should be applied in order; the first match applies.

  • If a particular HTTP message type is not allowed to have a body, ignore the Content-Length header for body calculations.
  • If a message contains a Transfer-Encoding header (other than the default HTTP “identity” encoding), the entity will be terminated by a special pattern called a “zero-byte chunk,” unless the message is terminated first by closing the connection.
  • If a message has a Content-Length header (and the message type allows entity bodies), the Content-Length value contains the body length, unless there is a non-identity Transfer-Encoding header. If a message is received with both a Content-Length header field and a non-identity Transfer-Encoding header field, you must ignore the Content-Length, because the transfer encoding will change the way entity bodies are represented and transferred (and probably the number of bytes transmitted).
  • If the message uses the “multipart/byteranges” media type and the entity length is not otherwise specified (in the Content-Length header), each part of the multipart message will specify its own size. This multipart type is the only entity body type that self-delimits its own size, so this media type must not be sent unless the sender knows the recipient can parse it.
  • If none of the above rules match, the entity ends when the connection closes.
  • To be compatible with HTTP/1.0 applications, any HTTP/1.1 request that has an entity body also must include a valid Content-Length header field (unless the server is known to be HTTP/1.1-compliant).
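The ordered rules translate naturally into an if/elif chain where the first match wins. This sketch only reports which rule applies; the function name `body_length_rule` and the `body_allowed` flag (standing in for “this message type may not have a body”, e.g. a response to HEAD, or a 204 or 304 response) are assumptions for illustration.

```python
from __future__ import annotations


def body_length_rule(headers: dict[str, str], body_allowed: bool = True) -> str:
    """Report which rule delimits the entity body, checked in order."""
    h = {name.lower(): value for name, value in headers.items()}
    if not body_allowed:
        return "no body"  # Content-Length is ignored for body calculations
    te = h.get("transfer-encoding", "identity").strip().lower()
    if te != "identity":
        return "ends at zero-byte chunk"  # any Content-Length is ignored
    if "content-length" in h:
        return f"{int(h['content-length'])} bytes"
    if h.get("content-type", "").lower().startswith("multipart/byteranges"):
        return "self-delimiting multipart/byteranges"
    return "ends when the connection closes"


print(body_length_rule({"Transfer-Encoding": "chunked", "Content-Length": "42"}))
```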
Entity Digests
The Content-MD5 header is used by servers to send the result of running the MD5 algorithm on the entity body. The Content-MD5 header contains the MD5 of the content after all content encodings have been applied to the entity body and before any transfer encodings have been applied to it.
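The header value is the base64 encoding of the 128-bit MD5 digest (the format defined for Content-MD5 in RFC 1864). The sketch below assumes gzip as the content encoding, to show that the digest is taken over the encoded bytes; `content_md5` is a hypothetical helper name.

```python
import base64
import gzip
import hashlib


def content_md5(body: bytes) -> str:
    """Base64-encoded MD5 digest, as carried in a Content-MD5 header."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


original = b"Hi! I'm a message!"
encoded = gzip.compress(original)    # content encoding is applied first...
header_value = content_md5(encoded)  # ...so the digest covers the gzipped bytes
print("Content-MD5:", header_value)
```

Note that gzip output embeds a timestamp, so the encoded bytes (and therefore the digest) can differ from run to run.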

Media Type and Charset
The Content-Type header field describes the MIME type of the entity body.
If the entity has gone through content encoding, for example, the Content-Type header will still specify the entity body type before the encoding.

Character Encodings for Text Media
        Content-Type: text/html; charset=iso-8859-4

Multipart Media Types
MIME “multipart” email messages contain multiple messages stuck together and sent as a single, complex message. Each component is self-contained, with its own set of headers describing its content; the different components are concatenated together and delimited by a string. HTTP also supports multipart bodies; however, they typically are sent in only one of two situations: in fill-in form submissions and in range responses carrying pieces of a document.

Multipart Form Submissions
When an HTTP fill-in form is submitted, variable-length text fields and uploaded objects are sent as separate parts of a multipart body, allowing forms to be filled out with values of different types and lengths.
Example:

1.
Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="submit-name"

Sally
--AaB03x
Content-Disposition: form-data; name="files"; filename="essayfile.txt"
Content-Type: text/plain

...contents of essayfile.txt...
--AaB03x--

2.
Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="submit-name"

Sally
--AaB03x
Content-Disposition: form-data; name="files"
Content-Type: multipart/mixed; boundary=BbC04y

--BbC04y
Content-Disposition: file; filename="essayfile.txt"
Content-Type: text/plain

...contents of essayfile.txt...
--BbC04y
Content-Disposition: file; filename="imagefile.gif"
Content-Type: image/gif
Content-Transfer-Encoding: binary

...contents of imagefile.gif...
--BbC04y--
--AaB03x--
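A sketch that assembles a body shaped like the first example: each part opens with `--boundary`, its headers are followed by a blank line, and the body ends with `--boundary--`. The function name `build_form_data` and its argument layout are hypothetical. The sketch assumes the boundary never occurs inside the content and does not escape quotes in field names or filenames.

```python
from __future__ import annotations


def build_form_data(fields: dict[str, str],
                    files: dict[str, tuple[str, str, bytes]],
                    boundary: str) -> bytes:
    """Assemble a multipart/form-data body.
    files maps field name -> (filename, content type, content)."""
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    for name, value in fields.items():
        parts.append(delimiter + crlf
                     + f'Content-Disposition: form-data; name="{name}"'.encode() + crlf
                     + crlf  # blank line separates part headers from part body
                     + value.encode("utf-8") + crlf)
    for name, (filename, ctype, content) in files.items():
        parts.append(delimiter + crlf
                     + (f'Content-Disposition: form-data; name="{name}"; '
                        f'filename="{filename}"').encode() + crlf
                     + f"Content-Type: {ctype}".encode() + crlf
                     + crlf
                     + content + crlf)
    return b"".join(parts) + delimiter + b"--" + crlf  # closing delimiter


body = build_form_data({"submit-name": "Sally"},
                       {"files": ("essayfile.txt", "text/plain", b"essay text")},
                       "AaB03x")
```

The request would then carry `Content-Type: multipart/form-data; boundary=AaB03x` and a Content-Length equal to `len(body)`.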

Multipart Range Responses
HTTP responses to range requests also can be multipart. Such responses come with a Content-Type: multipart/byteranges header and a multipart body with the different ranges.

Example:

HTTP/1.0 206 Partial content
Server: Microsoft-IIS/5.0
Date: Sun, 10 Dec 2000 19:11:20 GMT
Content-Location: http://www.joes-hardware.com/gettysburg.txt
Content-Type: multipart/x-byteranges; boundary=--[abcdefghijklmnopqrstuvwxyz]--
Last-Modified: Sat, 09 Dec 2000 00:38:47 GMT

--[abcdefghijklmnopqrstuvwxyz]--
Content-Type: text/plain
Content-Range: bytes 0-174/1441

Fourscore and seven years ago our fathers brought forth on this continent
a new nation, conceived in liberty and dedicated to the proposition that
all men are created equal.
--[abcdefghijklmnopqrstuvwxyz]--
Content-Type: text/plain
Content-Range: bytes 552-761/1441

But in a larger sense, we can not dedicate, we can not consecrate,
we can not hallow this ground. The brave men, living and dead who
struggled here have consecrated it far above our poor power to add
or detract.
--[abcdefghijklmnopqrstuvwxyz]--
Content-Type: text/plain
Content-Range: bytes 1344-1441/1441

and that government of the people, by the people, for the people shall
not perish from the earth.
--[abcdefghijklmnopqrstuvwxyz]--

Content Encoding
The Content-Encoding Process
The content-encoding process is:

  • A web server generates an original response message, with original Content-Type and Content-Length headers.
  • A content-encoding server (perhaps the origin server or a downstream proxy) creates an encoded message. The encoded message has the same Content-Type but (if, for example, the body is compressed) a different Content-Length. The content-encoding server adds a Content-Encoding header to the encoded message, so that a receiving application can decode it.
  • A receiving program gets the encoded message, decodes it, and obtains the original.
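The three steps can be traced with gzip from Python's standard library. The headers are plain dicts and the body is a made-up example. Content-Type survives unchanged, Content-Length is recomputed for the encoded body, and Content-Encoding is added.

```python
import gzip

# Step 1: the original response.
original_body = b"<html>" + b"Hello, world! " * 100 + b"</html>"
original_headers = {"Content-Type": "text/html",
                    "Content-Length": str(len(original_body))}

# Step 2: the content-encoding server compresses the body.
encoded_body = gzip.compress(original_body)
encoded_headers = {**original_headers,
                   "Content-Length": str(len(encoded_body)),  # encoded size
                   "Content-Encoding": "gzip"}

# Step 3: the receiver reverses the encoding to recover the original.
decoded_body = gzip.decompress(encoded_body)
print(original_headers["Content-Length"], "->", encoded_headers["Content-Length"])
```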

Content-Encoding Types
Accept-Encoding Headers
The Accept-Encoding field contains a comma-separated list of supported encodings.
Here are a few examples:
        Accept-Encoding: compress, gzip
        Accept-Encoding:
        Accept-Encoding: *
        Accept-Encoding: compress;q=0.5, gzip;q=1.0
        Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0
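A server-side sketch of choosing an encoding from these headers. Assumptions: a listed coding gets its q-value (1.0 if none is given), `*` covers every coding not listed, and `identity` stays acceptable unless it is ruled out explicitly or by `*;q=0`. The function names `parse_qvalues` and `choose_encoding` are hypothetical, and the server's available codings are assumed to be lowercase and in preference order.

```python
from __future__ import annotations


def parse_qvalues(header: str) -> dict[str, float]:
    """Map each listed coding to its q-value (1.0 when no q is given)."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, *params = [part.strip() for part in item.split(";")]
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value)
        prefs[name.lower()] = q
    return prefs


def choose_encoding(accept_encoding: str, available: list[str]) -> str | None:
    """Pick the most acceptable coding, or None if nothing is acceptable.
    Ties go to the coding listed first in `available`."""
    prefs = parse_qvalues(accept_encoding)

    def q_for(coding: str) -> float:
        if coding in prefs:
            return prefs[coding]
        if "*" in prefs:  # "*" covers every coding not listed explicitly
            return prefs["*"]
        return 1.0 if coding == "identity" else 0.0  # identity is the fallback

    q, _, coding = max((q_for(c), -i, c) for i, c in enumerate(available))
    return coding if q > 0 else None


print(choose_encoding("gzip;q=1.0, identity; q=0.5, *;q=0", ["br", "identity"]))  # identity
```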

Transfer Encoding and Chunked Encoding
Transfer encodings also are reversible transformations performed on the entity body, but they are applied for architectural reasons and are independent of the format of the content. You apply a transfer encoding to a message to change the way message data is transferred across the network.
Safe Transport
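In HTTP, there are only a few reasons why transporting message bodies can cause trouble. Two of these are:
  • Unknown size - Some gateway applications and content encoders cannot determine the final size of a message body without generating the content first, so they cannot send a Content-Length header before the data.
  • Security - A transfer encoding can be used to scramble the message content before sending it through a shared transport network. Because transport-layer security such as SSL is so widely used, transfer-encoding security isn’t very common.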

Transfer-Encoding Headers
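Transfer-Encoding
        Tells the receiver what encoding has been performed on the message in order for it to be transported safely.
TE
        Used in the request header to tell the server which extension transfer encodings are acceptable.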
All transfer-encoding values are case-insensitive. HTTP/1.1 uses transfer-encoding values in the TE header field and in the Transfer-Encoding header field. The latest HTTP specification defines only one transfer encoding, chunked encoding.

Chunked Encoding
Chunked encoding breaks messages into chunks of known size. Each chunk is sent one after another, eliminating the need for the size of the full message to be known before it is sent.

Chunking and persistent connections
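Servers that generate bodies dynamically face a dilemma on persistent connections: they cannot use connection close to mark the end of the body, yet they may not know the body’s length in time to send a Content-Length header. Chunked encoding provides a solution for this dilemma by allowing servers to send the body in chunks, specifying only the size of each chunk. As the body is dynamically generated, a server can buffer up a portion of it, send its size and the chunk, and then repeat the process until the full body has been sent. The server can signal the end of the body with a chunk of size 0 and still keep the connection open and ready for the next response.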

Trailers in chunked messages
Any of the HTTP headers can be sent as trailers, except for the Transfer-Encoding, Trailer, and Content-Length headers.
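A sketch of the chunked wire format: each chunk is its size in hex, CRLF, the data, and CRLF; a zero-size chunk ends the body, optionally followed by trailer headers and then an empty line. The names `encode_chunked` and `decode_chunked` are hypothetical. A real sender would also announce its trailers with a Trailer header, which this sketch leaves out.

```python
from __future__ import annotations


def encode_chunked(pieces: list[bytes], trailers: dict[str, str] | None = None) -> bytes:
    out = bytearray()
    for piece in pieces:
        if piece:  # a zero-size chunk would end the body early
            out += f"{len(piece):X}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n"  # the zero-byte chunk marks the end of the body
    for name, value in (trailers or {}).items():
        out += f"{name}: {value}\r\n".encode("ascii")
    out += b"\r\n"  # empty line ends the trailer section
    return bytes(out)


def decode_chunked(data: bytes) -> tuple[bytes, dict[str, str]]:
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # the size is hex; anything after ";" is a chunk extension
        size = int(data[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size
        if data[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2
    trailers = {}
    while True:  # optional trailer headers, ended by an empty line
        line_end = data.index(b"\r\n", pos)
        if line_end == pos:
            break
        name, _, value = data[pos:line_end].decode("ascii").partition(":")
        trailers[name.strip()] = value.strip()
        pos = line_end + 2
    return bytes(body), trailers


print(encode_chunked([b"Hello, ", b"world!"]))
```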

Combining Content and Transfer Encodings
Transfer-Encoding Rules

  • The set of transfer encodings must include “chunked.” The only exception is if the message is terminated by closing the connection.
  • When the chunked transfer encoding is used, it is required to be the last transfer encoding applied to the message body.
  • The chunked transfer encoding must not be applied to a message body more than once.
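These rules can be checked against a Transfer-Encoding header value, which lists codings in the order they were applied. A minimal sketch; the function name `check_transfer_encodings` and the `closes_connection` flag are hypothetical.

```python
from __future__ import annotations


def check_transfer_encodings(header: str, closes_connection: bool = False) -> list[str]:
    """Validate a Transfer-Encoding value against the three rules."""
    codings = [c.strip().lower() for c in header.split(",") if c.strip()]
    if codings.count("chunked") > 1:
        raise ValueError("chunked must not be applied more than once")
    if "chunked" in codings and codings[-1] != "chunked":
        raise ValueError("chunked must be the last transfer encoding applied")
    if "chunked" not in codings and not closes_connection:
        raise ValueError("without chunked, the message must end by closing the connection")
    return codings


print(check_transfer_encodings("gzip, chunked"))  # ['gzip', 'chunked']
```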
Time-Varying Instances
The HTTP protocol specifies operations for a class of requests and responses, called instance manipulations, that operate on instances of an object. The two main instance-manipulation methods are range requests and delta encoding.
Validators and Freshness

Freshness
Servers can tell clients how long a document stays fresh using one of two headers: Expires and Cache-Control.
Expires: Sun, 18 Mar 2001 23:59:59 GMT

The Cache-Control header actually is very powerful. It can be used by both servers and clients to describe freshness using more directives than just specifying an age or expiration time.
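A simplified freshness check is below. It assumes that a Cache-Control max-age directive overrides Expires and, to keep things short, counts max-age from the response's Date header (real caches also account for the Age header and transit delays). The function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def expiration_time(headers: dict[str, str]) -> datetime | None:
    """When does this response become stale? max-age wins over Expires."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            # Simplification: count max-age from the response's Date header.
            served = parsedate_to_datetime(headers["Date"])
            return served + timedelta(seconds=int(value))
    if "Expires" in headers:
        return parsedate_to_datetime(headers["Expires"])
    return None  # no explicit freshness information


def is_fresh(headers: dict[str, str], now: datetime) -> bool:
    expires = expiration_time(headers)
    return expires is not None and now < expires


headers = {"Expires": "Sun, 18 Mar 2001 23:59:59 GMT"}
print(is_fresh(headers, datetime(2001, 3, 18, 12, 0, tzinfo=timezone.utc)))  # True
```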
Conditionals and Validators
        GET /announce.html HTTP/1.0
        If-Modified-Since: Sat, 29 Jun 2002 14:30:00 GMT
The If-Modified-Since conditional header tests the last-modified date of a document instance, so we say that the last-modified date is the validator. The If-None-Match conditional header tests the ETag value of a document, which is a special keyword or version-identifying tag associated with the entity. Last-Modified and ETag are the two primary validators used by HTTP.

HTTP groups validators into two classes: weak validators and strong validators.
The last-modified time is considered a weak validator because, although it specifies the time at which the resource was last modified, it specifies that time to an accuracy of at most one second.
The ETag header is considered a strong validator, because the server can place a distinct value in the ETag header every time a value changes.
The server might advertise a “weak” entity tag by prefixing the tag with “W/”.
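A server-side sketch of answering a conditional GET with these validators. Assumptions: If-None-Match is checked first and, when present, decides the outcome; ETags are compared with the weak comparison (the “W/” prefix is ignored), which is acceptable for GET; entity tags containing commas are not handled. The function names are hypothetical.

```python
from email.utils import parsedate_to_datetime


def _opaque(tag: str) -> str:
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag  # weak comparison drops W/


def not_modified(request_headers: dict, etag: str, last_modified: str) -> bool:
    """Decide whether a conditional GET can be answered with 304 Not Modified."""
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        tags = [t.strip() for t in inm.split(",")]
        return "*" in tags or any(_opaque(t) == _opaque(etag) for t in tags)
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        # Unchanged since the client's copy: last-modified is not later.
        return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(ims)
    return False  # unconditional request: send the full response


print(not_modified({"If-None-Match": '"v2.6"'}, 'W/"v2.6"', "Sat, 29 Jun 2002 14:30:00 GMT"))  # True
```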

Range Requests
HTTP also allows clients to request just part, or a range, of a document.
Example:

GET /bigfile.html HTTP/1.1
Host: www.joes-hardware.com
Range: bytes=4000-
User-Agent: Mozilla/4.61 [en] (WinNT; I)
In the case where clients request multiple ranges in a single request, responses come back as a single entity, with a multipart body and a Content-Type: multipart/byteranges header.

Servers can advertise to clients that they accept ranges by including the header Accept-Ranges in their responses.

HTTP/1.1 200 OK
Date: Fri, 05 Nov 1999 22:35:15 GMT
Server: Apache/1.2.4
Accept-Ranges: bytes
Range requests are a form of instance manipulation: a client’s range request makes sense only if the client and server have the same version of a document.
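A sketch of how a server might interpret a Range header such as `bytes=4000-`. It handles open-ended ranges, suffix ranges (`-500` means the last 500 bytes), and comma-separated lists, and clips ranges that run past the end. The names `parse_byte_ranges` and `content_range` are hypothetical; only the `bytes` unit is supported.

```python
from __future__ import annotations


def parse_byte_ranges(header: str, length: int) -> list[tuple[int, int]]:
    """Turn a Range header value into inclusive (first, last) byte positions."""
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes":
        raise ValueError("only byte ranges are supported")
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":  # suffix range such as "-500": the last 500 bytes
            start, end = max(length - int(last), 0), length - 1
        else:
            start = int(first)
            # "4000-" runs to the end; an explicit end is clipped to the length
            end = min(int(last), length - 1) if last else length - 1
        if start <= end:
            ranges.append((start, end))
    if not ranges:
        raise ValueError("range not satisfiable")
    return ranges


def content_range(start: int, end: int, length: int) -> str:
    return f"bytes {start}-{end}/{length}"


print(parse_byte_ranges("bytes=4000-", 10000))  # [(4000, 9999)]
```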
Delta Encoding
Delta encoding is an extension to the HTTP protocol that optimizes transfers by communicating changes instead of entire objects. Delta encoding is a type of instance manipulation, because it relies on clients and servers exchanging information about particular instances of an object.

Clients signal that they can accept delta-encoded responses with the A-IM header. A-IM is short for Accept-Instance-Manipulation.

Instance Manipulations, Delta Generators, and Delta Appliers


The Unix diff -e algorithm does a line-by-line comparison of files. This obviously is okay for text files but breaks down for binary files. The vcdiff algorithm is more powerful, working even for non-text files and generally producing smaller deltas than diff -e.

A server supporting delta encoding must keep all the different copies of a page as it changes over time, in order to figure out what’s changed between any requesting client’s copy and the latest copy.
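To make the generator/applier split concrete, here is a line-based delta built with Python's `difflib`. This is neither `diff -e` nor vcdiff; the delta format (“copy these old lines” / “add these new lines”) and the names `make_delta` and `apply_delta` are made up for illustration. The generator needs both versions, which is why the server has to keep old copies around.

```python
from __future__ import annotations

import difflib


def make_delta(old: list[str], new: list[str]) -> list[tuple]:
    """Delta generator: describe `new` in terms of line ranges from `old`."""
    delta = []
    matcher = difflib.SequenceMatcher(None, old, new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))      # client already has old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))   # send only the new lines
        # "delete": those old lines are simply not copied
    return delta


def apply_delta(old: list[str], delta: list[tuple]) -> list[str]:
    """Delta applier: rebuild the new version from the client's old copy."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


old = ["<html>", "<p>Old price: $10</p>", "<p>Hours: 9-5</p>", "</html>"]
new = ["<html>", "<p>New price: $12</p>", "<p>Hours: 9-5</p>", "</html>"]
print(make_delta(old, new))
```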

Tuesday, May 27, 2014

HTTP The Definitive Guide (Secure HTTP)

Secure HTTP
Making HTTP Safe
We need a technology for HTTP security that provides:
  • Server authentication (clients know they’re talking to the real server, not a phony)
  • Client authentication (servers know they’re talking to the real user, not a phony)
  • Integrity (clients and servers are safe from having their data changed)
  • Encryption (clients and servers talk privately without fear of eavesdropping)
  • Efficiency (an algorithm fast enough for inexpensive clients and servers to use)
  • Ubiquity (protocols are supported by virtually all clients and servers)
  • Administrative scalability (instant secure communication for anyone, anywhere)
  • Adaptability (supports the best known security methods of the day)
  • Social viability (meets the cultural and political needs of the society)
HTTPS
Digital Cryptography

The Art and Science of Secret Coding
Ciphers

Cipher Machines
Keyed Ciphers

Digital Ciphers

Symmetric-Key Cryptography
Many digital cipher algorithms are called symmetric-key ciphers, because they use the same key value for encoding as they do for decoding (e = d).
Some popular symmetric-key cipher algorithms are DES, Triple-DES, RC2, and RC4.
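RC4 is short enough to show the symmetric-key property directly: running the same function with the same key both encrypts and decrypts. This sketch is for illustration only. RC4 is now considered broken and is prohibited in TLS (RFC 7465), so do not use it to protect real data.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    # Key-scheduling: permute the state table S using the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Keystream generation: XOR each data byte with the next keystream byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())               # bbf316e8d940af0ad3
print(rc4(b"Key", ciphertext))        # b'Plaintext' (same key decodes: e = d)
```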

Key Length and Enumeration Attacks
Establishing Shared Keys
Public-Key Cryptography

RSA

Hybrid Cryptosystems and Session Keys

Digital Signatures

Signatures Are Cryptographic Checksums
Digital signatures are special cryptographic checksums attached to a message. They have two benefits:

  • Signatures prove the author wrote the message. Because only the author has the author’s top-secret private key, only the author can compute these checksums. The checksum acts as a personal “signature” from the author.
  • Signatures prevent message tampering. If a malicious assailant modified the message in-flight, the checksum would no longer match. And because the checksum involves the author’s secret, private key, the intruder will not be able to fabricate a correct checksum for the tampered-with message.
Digital signatures often are generated using asymmetric, public-key technology.
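A toy RSA signature shows the idea: the signer raises the message digest to the private exponent, and anyone holding the public key can undo it and compare. The primes are textbook-sized (p = 61, q = 53) and the hash is simply reduced mod n with no padding, so this is insecure and only a sketch of the math; `digest`, `sign`, and `verify` are hypothetical names. `pow(e, -1, phi)` needs Python 3.8 or later.

```python
import hashlib

# Toy key pair (far too small for real use).
p, q = 61, 53
n = p * q                # 3233, shared by both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent: the inverse of e mod phi (2753)


def digest(message: bytes) -> int:
    # Real schemes pad the hash; here it is just reduced mod n.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(h: int) -> int:
    return pow(h, d, n)  # requires the private key


def verify(h: int, signature: int) -> bool:
    return pow(signature, e, n) == h  # requires only the public key (e, n)


message = b"Pay Joe $10"
sig = sign(digest(message))
print(verify(digest(message), sig))  # True
```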

Digital Certificates
In this section, we talk about digital certificates, the “ID cards” of the Internet. Digital certificates (often called “certs,” like the breath mints) contain information about a user or firm that has been vouched for by a trusted organization.

The Guts of a Certificate
Basic digital certificates commonly contain the same basic information found on printed IDs, such as:

  • Subject’s name (person, server, organization, etc.)
  • Expiration date
  • Certificate issuer (who is vouching for the certificate)
  • Digital signature from the certificate issuer
X.509 v3 Certificates
The good news is that most certificates in use today store their information in a standard form, called X.509 v3. X.509 v3 certificates provide a standard way of structuring certificate information into parseable fields. Different kinds of certificates have different field values, but most follow the X.509 v3 structure.


Using Certificates to Authenticate Servers
The server certificate contains many fields, including:
  • Name and hostname of the web site
  • Public key of the web site
  • Name of the signing authority
  • Signature from the signing authority

If the signing authority is unknown, the browser isn’t sure whether it should trust the signing authority and usually displays a dialog box so the user can decide whether to trust the signer. The signer might be the local IT department or a software vendor.

HTTPS: The Details
HTTPS is the most popular secure version of HTTP. It is widely implemented and available in all major commercial browsers and servers. HTTPS combines the HTTP protocol with a powerful set of symmetric, asymmetric, and certificate-based cryptographic techniques, making HTTPS very secure but also very flexible and easy to administer across the anarchy of the decentralized, global Internet.

HTTPS Overview

HTTPS Schemes

Secure Transport Setup

SSL Handshake
Before you can send encrypted HTTP messages, the client and server need to do an SSL handshake, where they:

  • Exchange protocol version numbers
  • Select a cipher that each side knows
  • Authenticate the identity of each side
  • Generate temporary session keys to encrypt the channel

Server Certificates

Site Certificate Validation
The steps are:

  • Date check
  • Signer trust check
  • Signature check
  • Site identity check
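In Python's `ssl` module, a context from `ssl.create_default_context()` has these checks switched on: the certificate chain is verified against the trusted CA store (signer trust and signature checks, which also reject expired or not-yet-valid certificates), and `check_hostname` performs the site identity check. The handshake fails with an exception if any check fails. The helper names below are hypothetical.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    # Loads the system's trusted CA certificates and enables
    # certificate verification plus hostname matching.
    return ssl.create_default_context()


def fetch_certificate(host: str, port: int = 443) -> dict:
    """Connect, complete the handshake (which raises if validation fails),
    and return the validated server certificate as a dict."""
    ctx = make_verifying_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()  # includes subject, issuer, notAfter, ...


ctx = make_verifying_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```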

Virtual Hosting and Certificates
It’s sometimes tricky to deal with secure traffic on sites that are virtually hosted (multiple hostnames on a single server). Some popular web server programs support only a single certificate. If a user arrives for a virtual hostname that does not strictly match the certificate name, a warning box is displayed.

A Real HTTPS Client
OpenSSL - OpenSSL is the most popular open source implementation of SSL and TLS.
A Simple HTTPS Client - Refer to http://www.openssl.org for more information about the OpenSSL libraries.

Executing Our Simple OpenSSL Client
% https_client clients1.online.msdw.com
(1) SSL context initialized
(2) 'clients1.online.msdw.com' has IP address '63.151.15.11'
(3) TCP connection open to host 'clients1.online.msdw.com', port 443
(4) SSL endpoint created & handshake completed
(5) SSL connected with cipher: DES-CBC3-MD5
(6) server's certificate was received:
            subject: /C=US/ST=Utah/L=Salt Lake City/O=Morgan Stanley/OU=Online/CN=clients1.online.msdw.com
            issuer: /C=US/O=RSA Data Security, Inc./OU=Secure Server Certification Authority
(7) sent HTTP request over encrypted channel:
            GET / HTTP/1.0
            Host: clients1.online.msdw.com:443
            Connection: close
(8) got back 615 bytes of HTTP response:
            HTTP/1.1 302 Found
            Date: Sat, 09 Mar 2002 09:43:42 GMT
            Server: Stronghold/3.0 Apache/1.3.14 RedHat/3013c (Unix) mod_ssl/2.7.1 OpenSSL/0.9.6
            Location: https://clients.online.msdw.com/cgi-bin/ICenter/home
            Connection: close
            Content-Type: text/html; charset=iso-8859-1
            <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
            <HTML><HEAD>
            <TITLE>302 Found</TITLE>
            </HEAD><BODY>
            <H1>Found</H1>
            The document has moved <A HREF="https://clients.online.msdw.com/cgi-bin/ICenter/home">here</A>.<P>
            <HR>
            <ADDRESS>Stronghold/3.0 Apache/1.3.14 RedHat/3013c Server at clients1.online.msdw.com Port 443</ADDRESS>
            </BODY></HTML>
(9) all done, cleaned up and closed connection
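The same sequence of steps can be sketched with Python's `ssl` and `socket` modules instead of the OpenSSL C API. This is not the book's client; the names `build_request` and `https_get` are hypothetical, and any host you pass (for example `https_get("example.com")`) is a placeholder. Unlike the original trace, `create_default_context()` also validates the server's certificate, so the handshake fails if validation fails.

```python
import socket
import ssl


def build_request(host: str) -> bytes:
    # The same HTTP/1.0 request the OpenSSL client sent in step (7).
    return (f"GET / HTTP/1.0\r\n"
            f"Host: {host}:443\r\n"
            f"Connection: close\r\n"
            f"\r\n").encode("ascii")


def https_get(host: str, port: int = 443) -> bytes:
    ctx = ssl.create_default_context()                               # (1) SSL context
    with socket.create_connection((host, port), timeout=10) as raw:  # (2)-(3) DNS + TCP
        with ctx.wrap_socket(raw, server_hostname=host) as tls:      # (4) handshake
            print("cipher:", tls.cipher())                           # (5) negotiated cipher
            print("subject:", tls.getpeercert()["subject"])          # (6) server certificate
            tls.sendall(build_request(host))                         # (7) encrypted request
            chunks = []
            while True:                                              # (8) read until close
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)                                          # (9) sockets closed
```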

Tunneling Secure Traffic Through Proxies

Because a proxy cannot read encrypted traffic, the client first uses HTTP to send the proxy the plaintext endpoint information, using a new extension method called CONNECT. The CONNECT method tells the proxy to open a connection to the desired host and port number and, when that’s done, to tunnel data directly between the client and server. The CONNECT method is a one-line text command that provides the hostname and port of the secure origin server, separated by a colon. The host:port is followed by a space and an HTTP version string followed by a CRLF. After that there is a series of zero or more HTTP request header lines, followed by an empty line. After the empty line, if the handshake to establish the connection was successful, SSL data transfer can begin. Here is an example:
    CONNECT home.netscape.com:443 HTTP/1.0
    User-agent: Mozilla/1.1N

    <raw SSL-encrypted data would follow here...>
After the empty line in the request, the client will wait for a response from the proxy. The proxy will evaluate the request and make sure that it is valid and that the user is authorized to request such a connection. If everything is in order, the proxy will make a connection to the destination server and, if successful, send a 200 Connection Established response to the client.
    HTTP/1.0 200 Connection established
    Proxy-agent: Netscape-Proxy/1.1
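A client-side sketch of the tunnel: send CONNECT in plaintext, read the proxy's reply up to the empty line, and only then start TLS over the same socket so the encryption runs end to end through the proxy. The helper names and default User-Agent are hypothetical. The reply is read one byte at a time so the sketch never swallows bytes belonging to the TLS handshake.

```python
import socket
import ssl


def build_connect(host: str, port: int, user_agent: str = "example-client/0.1") -> bytes:
    return (f"CONNECT {host}:{port} HTTP/1.0\r\n"
            f"User-Agent: {user_agent}\r\n"
            f"\r\n").encode("ascii")


def connect_status(response_head: bytes) -> int:
    # Status line, e.g. b"HTTP/1.0 200 Connection established"
    return int(response_head.split(b"\r\n", 1)[0].split()[1])


def open_tunnel(proxy: str, proxy_port: int, host: str, port: int = 443) -> ssl.SSLSocket:
    sock = socket.create_connection((proxy, proxy_port), timeout=10)
    sock.sendall(build_connect(host, port))
    head = b""
    while b"\r\n\r\n" not in head:  # read only the proxy's response headers
        data = sock.recv(1)
        if not data:
            sock.close()
            raise ConnectionError("proxy closed the connection")
        head += data
    if connect_status(head) != 200:
        sock.close()
        raise ConnectionError(head.decode("latin-1"))
    # From here on the proxy just relays bytes; TLS runs client to server.
    return ssl.create_default_context().wrap_socket(sock, server_hostname=host)
```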