Tuesday, May 6, 2014

HTTP: The Definitive Guide (Proxies)

Proxies
Web Intermediaries

Private and Shared Proxies

  • Public proxies
  • Private proxies
Proxies Versus Gateways
     Strictly speaking, proxies connect two or more applications that speak the same protocol,
while gateways hook up two or more parties that speak different protocols. A
gateway acts as a “protocol converter,” allowing a client to complete a transaction
with a server, even when the client and server speak different protocols.


    In practice, the difference between proxies and gateways is blurry.

Why Use Proxies?
Child filter

Document access controller


Security firewall

Web cache


Surrogate
Content router

Transcoder

Anonymizer

Proxy Server Deployment

  • Egress proxy
  • Access (ingress) proxy
  • Surrogates
  • Network exchange proxy
Proxy Hierarchies
Proxy hierarchy content routing

Here are a few other examples of dynamic parent selection:
  • Load balancing
  • Geographic proximity routing
  • Protocol/type routing
  • Subscription-based routing
How Proxies Get Traffic

  • Modify the client
  • Modify the network - intercepting proxy
  • Modify the DNS namespace
  • Modify the web server
    • Some web servers can also be configured to redirect client requests to a proxy by
      sending an HTTP redirection command (response code 305) back to the client.
      Upon receiving the redirect, the client transacts with the proxy.

Client Proxy Settings
  • Manual configuration
  • Browser preconfiguration
  • Proxy auto-configuration (PAC) - PAC files typically have a .pac suffix and the MIME type “application/x-ns-proxy-autoconfig.” 
    • Example 6-1. Example proxy auto-configuration file
      function FindProxyForURL(url, host) {
        // Route by URL scheme; anything else goes straight to the origin server.
        if (url.substring(0, 5) == "http:") {
          return "PROXY http-proxy.mydomain.com:8080";
        } else if (url.substring(0, 4) == "ftp:") {
          return "PROXY ftp-proxy.mydomain.com:8080";
        } else {
          return "DIRECT";
        }
      }
    • FindProxyForURL return values: DIRECT, PROXY host:port, and SOCKS host:port
  • WPAD proxy discovery
    • WPAD is an algorithm that uses an escalating strategy of discovery
      mechanisms to find the appropriate PAC file for the browser automatically.
Tricky Things About Proxy Requests
  • Proxy URIs Differ from Server URIs
  • The Same Problem with Virtual Hosting
  • Intercepting Proxies Get Partial URIs
Proxies Can Handle Both Proxy and Server Requests
The rules for using full and partial URIs are:
  • If a full URI is provided, the proxy should use it.
  • If a partial URI is provided, and a Host header is present, the Host header
    should be used to determine the origin server name and port number.
  • If a partial URI is provided, and there is no Host header, the origin server needs
    to be determined in some other way:
    • If the proxy is a surrogate, standing in for an origin server, the proxy can be
      configured with the real server’s address and port number.
    • If the traffic was intercepted, and the interceptor makes the original IP
      address and port available, the proxy can use the IP address and port number
      from the interception technology (see Chapter 20).
    • If all else fails, the proxy doesn’t have enough information to determine the
      origin server and must return an error message (often suggesting that the
      user upgrade to a modern browser that supports Host headers).
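
The rules above can be sketched as a small lookup function. This is only an illustration, not how any particular proxy implements it: the function name, the shape of the `options` argument (`surrogateOrigin`, `interceptedAddress`), and the default port of 80 are all assumptions made for the example.

```javascript
// Hypothetical sketch of the full/partial URI rules. None of these names
// come from a real proxy; they exist only for illustration.
function resolveOrigin(requestTarget, headers, options = {}) {
  // Rule 1: a full (absolute) URI already names the origin server.
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(requestTarget)) {
    const u = new URL(requestTarget);
    const fallbackPort = u.protocol === "https:" ? 443 : 80;
    return {
      host: u.hostname,
      port: u.port ? Number(u.port) : fallbackPort,
      source: "full-uri",
    };
  }

  // Rule 2: partial URI plus a Host header ("name" or "name:port").
  const hostHeader = headers["host"];
  if (hostHeader) {
    const m = /^(.*?)(?::(\d+))?$/.exec(hostHeader.trim());
    return {
      host: m[1],
      port: m[2] ? Number(m[2]) : 80, // assumed default for plain HTTP
      source: "host-header",
    };
  }

  // Rule 3a: a surrogate is configured with the real server's address.
  if (options.surrogateOrigin) {
    return { ...options.surrogateOrigin, source: "surrogate" };
  }

  // Rule 3b: an interceptor may expose the original IP address and port.
  if (options.interceptedAddress) {
    return { ...options.interceptedAddress, source: "interception" };
  }

  // Rule 3c: nothing to go on; the caller should return an error message.
  return null;
}
```

A `null` result corresponds to the last rule: the proxy cannot determine the origin server and has to answer with an error.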


In-Flight URI Modification
     Proxy servers need to be very careful about changing the request URI as they forward
messages.
     In particular, the HTTP specifications forbid general intercepting proxies from
rewriting the absolute path parts of URIs when forwarding them. The only exception
is that they can replace an empty path with “/”.
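
A minimal sketch of that one permitted change, assuming the proxy holds the request URI as a plain string (the function name is made up for this example). It deliberately avoids `new URL(...)`, because that parser normalizes paths, for example by resolving `..` segments, which is exactly the kind of rewriting to avoid here.

```javascript
// Hypothetical helper: leave the path alone, except for replacing an
// empty path with "/".
function forwardableUri(absoluteUri) {
  // Split "scheme://authority" from everything after it, without
  // decoding or normalizing anything.
  const m = /^([a-z][a-z0-9+.-]*:\/\/[^\/?#]*)(.*)$/i.exec(absoluteUri);
  if (!m) return absoluteUri; // not an absolute URI; leave it untouched

  const prefix = m[1];
  const rest = m[2];

  // Empty path (optionally followed by a query string): insert "/".
  if (rest === "" || rest[0] === "?") {
    return prefix + "/" + rest;
  }

  // Any other path is forwarded byte for byte.
  return absoluteUri;
}
```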

URI Client Auto-Expansion and Hostname Resolution

URI Resolution Without a Proxy

URI Resolution with an Explicit Proxy
     When you use an explicit proxy, the browser no longer performs any of these convenience
expansions (such as adding a “www.” prefix or a “.com” suffix to a hostname), because the user’s URI is passed directly to the proxy.

URI Resolution with an Intercepting Proxy
Tracing Messages

The Via Header

Via syntax
The Via header field contains a comma-separated list of waypoints.
The formal syntax for a Via header is shown here:
            Via = "Via" ":" 1#( waypoint )
            waypoint = ( received-protocol received-by [ comment ] )
            received-protocol = [ protocol-name "/" ] protocol-version
            received-by = ( host [ ":" port ] ) | pseudonym

     Note that each Via waypoint contains up to four components: an optional protocol
name (defaults to HTTP), a required protocol version, a required node name, and an
optional descriptive comment.
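
To make the four components concrete, here is a rough parser for a Via header value. It is a sketch only: the function names and the sample hostnames are invented, and it handles just the common shapes of the grammar above (comments containing commas are kept together, but quoted pairs and other edge cases are not handled).

```javascript
// Split on commas that are not inside a parenthesized comment.
function splitWaypoints(value) {
  const parts = [];
  let depth = 0;
  let current = "";
  for (const ch of value) {
    if (ch === "(") depth++;
    else if (ch === ")") depth = Math.max(0, depth - 1);
    if (ch === "," && depth === 0) {
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Hypothetical parser: one object per waypoint, in header order.
function parseVia(value) {
  return splitWaypoints(value).map((entry) => {
    // [protocol-name "/"] protocol-version  received-by  [ "(" comment ")" ]
    const m = /^(?:([^\s\/]+)\/)?(\S+)\s+(\S+)(?:\s+\((.*)\))?$/.exec(entry);
    if (!m) throw new Error("Unparseable Via waypoint: " + entry);
    return {
      protocolName: m[1] || "HTTP", // optional, defaults to HTTP
      protocolVersion: m[2],        // required
      receivedBy: m[3],             // required: host[:port] or pseudonym
      comment: m[4] === undefined ? null : m[4], // optional
    };
  });
}
```

For example, `parseVia("1.1 proxy.example.com, FTP/1.0 gw.example.com (gateway)")` yields two waypoints, the first with the default protocol name and no comment.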

Via request and response paths
     Because requests and responses usually travel over the same TCP connection,
response messages travel backward across the same path as the requests.
Via and gateways

The Server and Via headers
     The Server response header field describes the software used by the origin server.
     Here are a few examples:
     Server: Apache/1.3.14 (Unix) PHP/4.0.4
     Server: Netscape-Enterprise/4.1
     Server: Microsoft-IIS/5.0

Privacy and security implications of Via
     For organizations that have very strong privacy requirements for obscuring the
design and topology of internal network architectures, a proxy may combine an
ordered sequence of Via waypoint entries (with identical received-protocol values)
into a single, joined entry.
     Don’t combine multiple entries unless they all are under the same organizational
control and the hosts already have been replaced by pseudonyms. Also, don’t combine
entries that have different received-protocol values.
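
A sketch of how such combining might look, under stated assumptions: waypoints are already parsed into `{ protocol, receivedBy }` objects, `isInternal` is a caller-supplied test for hosts under the organization's control, and `pseudonym` is the name that stands in for them. All three are hypothetical and not part of any real proxy's API.

```javascript
// Collapse each run of consecutive internal waypoints that share the same
// received-protocol into a single entry named by the pseudonym.
function collapseVia(waypoints, isInternal, pseudonym) {
  const out = [];
  for (const wp of waypoints) {
    const internal = isInternal(wp.receivedBy);
    const prev = out[out.length - 1];

    // Merge only into an adjacent, already-concealed entry with an
    // identical received-protocol value.
    if (internal && prev && prev.concealed && prev.protocol === wp.protocol) {
      continue;
    }

    out.push({
      protocol: wp.protocol,
      receivedBy: internal ? pseudonym : wp.receivedBy,
      concealed: internal,
    });
  }
  return out.map((wp) => wp.protocol + " " + wp.receivedBy).join(", ");
}
```

Entries outside the organization are left as they are, and internal entries with different received-protocol values stay separate even though both are replaced by the pseudonym.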

The TRACE Method
     The TRACE response has Content-Type message/http and a 200
OK status.
     Normally, TRACE messages travel all the way to the destination server, regardless of
the number of intervening proxies. You can use the Max-Forwards header to limit the number of proxy hops for TRACE and OPTIONS requests. This is useful for testing a chain of proxies that may be forwarding messages in an infinite loop, or for checking the effects of particular proxy servers in the middle of a chain.
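
The hop limit can be illustrated with a small simulation. A proxy that receives Max-Forwards: 0 answers the request itself; otherwise it decrements the value and forwards. The function names, the lowercase header keys, and the proxy names are assumptions made for this sketch.

```javascript
// Hypothetical per-proxy decision for a request carrying Max-Forwards.
function handleMaxForwards(method, headers) {
  const applies = method === "TRACE" || method === "OPTIONS";
  const raw = headers["max-forwards"];
  if (!applies || raw === undefined) {
    return { action: "forward", headers };
  }

  const hops = Number.parseInt(raw, 10);
  if (Number.isNaN(hops) || hops < 0) {
    // Assumption for this sketch: ignore an unusable value and forward.
    return { action: "forward", headers };
  }

  if (hops === 0) {
    // This proxy is the final recipient and must respond itself.
    return { action: "respond", headers };
  }

  // Otherwise decrement and pass the request along.
  return {
    action: "forward",
    headers: { ...headers, "max-forwards": String(hops - 1) },
  };
}

// Walk a TRACE request through a chain and report who answers it.
function traceChain(proxyNames, maxForwards) {
  let headers = { "max-forwards": String(maxForwards) };
  for (const name of proxyNames) {
    const step = handleMaxForwards("TRACE", headers);
    if (step.action === "respond") return name;
    headers = step.headers;
  }
  return "origin server";
}
```

With a chain of three proxies, raising Max-Forwards from 0 upward makes each proxy in turn the responder, which is how one would probe a specific hop in the middle of the chain.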

Proxy Authentication

Proxy Interoperation
Handling Unsupported Headers and Methods
     Proxies must forward unrecognized header fields and must maintain the relative order of header fields with the same name.
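
One way to respect both requirements is to carry headers as an ordered list of name/value pairs instead of a keyed object, which could merge or reorder repeated fields. A minimal sketch, with a made-up function name and a hypothetical waypoint string:

```javascript
// Hypothetical forwarding step: copy every field in the order received,
// recognized or not, then record this hop in a trailing Via field.
function forwardHeaders(headerPairs, viaWaypoint) {
  // Copy the pairs so the caller's list is not modified.
  const forwarded = headerPairs.map(([name, value]) => [name, value]);
  // viaWaypoint is an assumed string such as "1.1 proxy.example.com".
  forwarded.push(["Via", viaWaypoint]);
  return forwarded;
}
```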

OPTIONS: Discovering Optional Feature Support

If the URI of the OPTIONS request is an asterisk (*), the request pertains to the entire server’s supported functionality.

If the URI is a real resource, the OPTIONS request inquires about the features available to that particular resource.

     The only header field that HTTP/1.1 specifies in the response is
the Allow header, which describes what methods are supported by the server (or
particular resource on the server).
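
A toy handler showing the two cases, the asterisk and a specific resource. The method tables and resource paths are invented for the example, and the 404 for unknown resources is a choice made in this sketch rather than something the notes above specify.

```javascript
// Hypothetical method tables for an imaginary server.
const SERVER_METHODS = ["GET", "HEAD", "POST", "PUT", "OPTIONS", "TRACE"];
const RESOURCE_METHODS = new Map([
  ["/index.html", ["GET", "HEAD", "OPTIONS"]],
  ["/orders", ["GET", "POST", "OPTIONS"]],
]);

// Answer an OPTIONS request with an Allow header.
function handleOptions(requestUri) {
  // "*" asks about the server as a whole; anything else about one resource.
  const methods =
    requestUri === "*" ? SERVER_METHODS : RESOURCE_METHODS.get(requestUri);

  if (!methods) {
    return { status: 404, headers: {} }; // sketch-level choice
  }

  return {
    status: 200,
    headers: { Allow: methods.join(", "), "Content-Length": "0" },
  };
}
```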

The Allow Header
