Yongji Wang's Blog: 2014

Wednesday, July 23, 2014

CLR via C# - CHAPTER 8 Methods

Methods

Instance Constructors and Classes (Reference Types)

When constructing a reference type object, the memory allocated for the object is always zeroed out before the type’s instance constructor is called. Any fields that the constructor doesn’t explicitly overwrite are guaranteed to have a value of 0 or null.

If you define a class that does not explicitly define any constructors, the C# compiler defines a default (parameterless) constructor for you whose implementation simply calls the base class’s parameterless constructor.
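A minimal sketch of both points, using a hypothetical class SomeRefType that declares no constructor at all: the compiler supplies a public parameterless constructor, and every field that nothing assigns still reads as 0, null, or false because the memory was zeroed first.

```csharp
using System;

// Hypothetical class with no explicitly defined constructor.
// The C# compiler emits a public parameterless constructor whose body
// just calls System.Object's parameterless constructor.
public sealed class SomeRefType {
    public Int32 m_x;      // never assigned: stays 0
    public String m_s;     // never assigned: stays null
    public Boolean m_flag; // never assigned: stays false

    // Reflection shows the single compiler-generated public constructor.
    public static Int32 PublicConstructorCount() {
        return typeof(SomeRefType).GetConstructors().Length;
    }
}
```

Usage: new SomeRefType() compiles even though the class never declares a constructor.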

Instance Constructors and Structures (Value Types)

Value types don’t even need to have a constructor defined within them, and the C# compiler doesn't emit a default parameterless constructor for value types.

The CLR does allow you to define constructors on value types. The only way that these constructors will execute is if you write code to explicitly call one of them.

The tricky part is that C# doesn’t allow a value type to define a parameterless constructor.

As an alternative way to initialize all the fields of a value type, you can do the following:
using System;

internal struct SomeValType {
    private Int32 m_x, m_y;

    // C# allows value types to have constructors that take parameters.
    public SomeValType(Int32 x) {
        // Looks strange but compiles fine and initializes all fields to 0/null.
        this = new SomeValType();
        m_x = x; // Overwrite m_x's 0 with x
        // Notice that m_y was initialized to 0.
    }
}
In a value type’s constructor, this represents an instance of the value type itself, and you can actually assign to it the result of newing up an instance of the value type, which really just zeroes out all the fields.

Type Constructors
If a type has a type constructor, it can have no more than one. In addition, type constructors never have parameters.

Because the CLR guarantees that a type constructor executes only once per AppDomain and is thread-safe, a type constructor is a great place to initialize any singleton objects required by the type.

In fact, because the CLR is responsible for calling type constructors, you should always avoid writing any code that requires type constructors to be called in a specific order.
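A minimal sketch of the singleton pattern described above, assuming a hypothetical AppSettings class; the TypeCtorRunCount counter exists only to make the "runs once" guarantee observable and is not part of the pattern itself.

```csharp
using System;

// Hypothetical type whose type constructor (static constructor) creates
// the singleton instance. The CLR runs it at most once per AppDomain,
// before the type is first used, and does so in a thread-safe manner.
public sealed class AppSettings {
    public static readonly AppSettings Instance;
    public static Int32 TypeCtorRunCount; // for illustration only

    // Type constructor: no access modifier, no parameters, at most one per type.
    static AppSettings() {
        TypeCtorRunCount++;
        Instance = new AppSettings();
    }

    // Private so that no other code can create a second instance.
    private AppSettings() { }
}
```

No lock is needed around the creation code, because the CLR already serializes the call to the type constructor.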

CLR via C# - CHAPTER 7 Constants and Fields

Constants and Fields

Constants

When code refers to a constant symbol, compilers look up the symbol in the metadata of the assembly that defines the constant, extract the constant’s value, and embed the value in the emitted Intermediate Language (IL) code.
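A small sketch contrasting a constant with a static readonly field; LibraryLimits and LimitsClient are hypothetical names. The comments describe what the compiler emits for each access, which is why a constant's value is effectively frozen into every caller at compile time.

```csharp
using System;

// Hypothetical library type.
public static class LibraryLimits {
    // A constant: its value lives in metadata, and the caller's compiler
    // copies the literal 50 into the caller's own IL. Rebuilding only this
    // library with a new value does not change already-compiled callers.
    public const Int32 MaxEntriesConst = 50;

    // A static readonly field: callers load the value from the field at run
    // time, so a rebuilt library's new value is picked up without
    // recompiling the callers.
    public static readonly Int32 MaxEntriesField = 50;
}

public static class LimitsClient {
    // The compiler embeds 50 here; no field access happens at run time.
    public static Int32 ConstValue() => LibraryLimits.MaxEntriesConst;

    // This compiles to a load of the static field.
    public static Int32 FieldValue() => LibraryLimits.MaxEntriesField;
}
```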

Fields

Fields marked readonly can be written to only within a constructor method (which is called only once, when an object is first created) or by a field initializer.

When a field is of a reference type and the field is marked as readonly, it is the reference that is immutable, not the object that the field refers to.
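A short sketch of that distinction, using a hypothetical ReadOnlyDemo type: the array's elements can be changed, but the field can never be pointed at a different array after initialization.

```csharp
using System;

public sealed class ReadOnlyDemo {
    // The reference is readonly; the array object it points to is not.
    public static readonly Char[] InvalidChars = new Char[] { 'A', 'B', 'C' };

    public static void Mutate() {
        // Legal: changes the contents of the array object.
        InvalidChars[0] = 'X';

        // Illegal: would change the reference itself, so this line
        // produces a compile-time error if uncommented.
        // InvalidChars = new Char[] { 'X', 'Y', 'Z' };
    }
}
```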

Tuesday, July 22, 2014

CLR via C# - CHAPTER 6 Type and Member Basics

Type and Member Basics

The Different Kinds of Type Members

Friend Assemblies
When an assembly is built, it can indicate other assemblies it considers “friends” by using the InternalsVisibleTo attribute defined in the System.Runtime.CompilerServices namespace.

using System;
using System.Runtime.CompilerServices; // For InternalsVisibleTo attribute

// This assembly's internal types can be accessed by any code written
// in the following two assemblies (regardless of version or culture):
[assembly:InternalsVisibleTo("Wintellect, PublicKey=12345678...90abcdef")]
[assembly:InternalsVisibleTo("Microsoft, PublicKey=b77a5c56...1934e089")]

internal sealed class SomeInternalType { /* ... */ }
internal sealed class AnotherInternalType { /* ... */ }

Member Accessibility

The CLR requires that all members of an interface type be public. The C# compiler knows this and forbids the programmer from explicitly specifying accessibility on interface members; the compiler just makes all the members public for you.

When a derived type is overriding a member defined in its base type, the C# compiler requires that the original member and the overriding member have the same accessibility. That is, if the member in the base class is protected, the overriding member in the derived class must also be protected. However, this is a C# restriction, not a CLR restriction. When deriving from a base class, the CLR allows a member’s accessibility to become less restrictive but not more restrictive. For example, a class can override a protected method defined in its base class and make the overridden method public (more accessible). However, a class cannot override a protected method defined in its base class and make the overridden method private (less accessible). The reason a class cannot make a base class method more restricted is because a user of the derived class could always cast to the base type and gain access to the base class’s method. If the CLR allowed the derived type’s method to be less accessible, it would be making a claim that was not enforceable.
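A minimal sketch of the C# rule, with hypothetical BaseClass and DerivedClass types: the override must repeat the base member's protected accessibility, and the public Run method exists only so callers can observe the virtual dispatch.

```csharp
using System;

public class BaseClass {
    protected virtual String Describe() => "Base";

    // Public entry point so callers can observe which override runs.
    public String Run() => Describe();
}

public sealed class DerivedClass : BaseClass {
    // C# requires the override to keep the base member's accessibility.
    // Writing "public override" here is a compile-time error in C#,
    // even though the CLR itself would allow a less restrictive override.
    protected override String Describe() => "Derived";
}
```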

Static Classes
The compiler enforces many restrictions on a static class:
■ The class must be derived directly from System.Object. Deriving from any other base class makes no sense, because inheritance applies only to objects and you cannot create an instance of a static class.
■ The class must not implement any interfaces, because interface methods are callable only when using an instance of a class.
■ The class must define only static members (fields, methods, properties, and events). Any instance members cause the compiler to generate an error.
■ The class cannot be used as the type of a field, method parameter, or local variable, because all of these would indicate a variable that refers to an instance, and this is not allowed. If the compiler detects any of these uses, the compiler issues an error.

Defining a class by using the static keyword causes the C# compiler to make the class both abstract and sealed. Furthermore, the compiler will not emit an instance constructor method into the type.
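A small sketch that checks these claims through reflection; StringHelpers and StaticClassInspector are hypothetical names chosen for the example.

```csharp
using System;
using System.Reflection;

// Hypothetical static class: only static members are allowed.
public static class StringHelpers {
    public static Int32 CountVowels(String s) {
        Int32 count = 0;
        foreach (Char c in s.ToLowerInvariant())
            if ("aeiou".IndexOf(c) >= 0) count++;
        return count;
    }
}

public static class StaticClassInspector {
    // Reports what the compiler emitted for a type, as seen through reflection.
    public static (Boolean isAbstract, Boolean isSealed, Int32 instanceCtors) Inspect(Type t) {
        ConstructorInfo[] ctors = t.GetConstructors(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        return (t.IsAbstract, t.IsSealed, ctors.Length);
    }
}
```

For StringHelpers, Inspect should report abstract, sealed, and zero instance constructors.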

Partial Classes, Structures, and Interfaces

Components, Polymorphism, and Versioning

How the CLR Calls Virtual Methods, Properties, and Events

Using Type Visibility and Member Accessibility Intelligently

Dealing with Virtual Methods When Versioning Types

Monday, July 21, 2014

CLR via C# - CHAPTER 5 Primitive, Reference, and Value Types

Primitive, Reference, and Value Types

Programming Language Primitive Types
Any data types the compiler directly supports are called primitive types. Primitive types map directly to types existing in the Framework Class Library (FCL).
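A quick sketch showing that a C# primitive keyword and its FCL type name refer to exactly the same type; PrimitiveAliases is a hypothetical helper.

```csharp
using System;

public static class PrimitiveAliases {
    // Each C# keyword is only an alias for an FCL type, so both spellings
    // name exactly the same type and compile to identical IL.
    public static Boolean AllAliasesMatch() =>
        typeof(int) == typeof(Int32) &&
        typeof(long) == typeof(Int64) &&
        typeof(double) == typeof(Double) &&
        typeof(bool) == typeof(Boolean) &&
        typeof(string) == typeof(String) &&
        typeof(object) == typeof(Object);

    public static String Describe() {
        int a = 5;   // declared with the C# keyword...
        Int32 b = a; // ...assigned to the FCL name without any conversion
        return b.GetType().FullName;
    }
}
```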

Reference Types and Value Types
The CLR supports two kinds of types: reference types and value types. Value type instances are usually allocated on a thread’s stack.
The .NET Framework SDK documentation clearly indicates which types are reference types and which are value types. When looking up a type in the documentation, any type called a class is a reference type. On the other hand, the documentation refers to each value type as a structure or an enumeration.

All of the structures are immediately derived from the System.ValueType abstract type. System.ValueType is itself immediately derived from the System.Object type. By definition, all value types must be derived from System.ValueType. All enumerations are derived from the System.Enum abstract type, which is itself derived from System.ValueType.
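A small sketch that walks this hierarchy with reflection, using Int32 as a sample structure and DayOfWeek as a sample enumeration.

```csharp
using System;

public static class ValueTypeHierarchy {
    public static Boolean Check() =>
        // A structure derives directly from System.ValueType...
        typeof(Int32).BaseType == typeof(ValueType) &&
        // ...which derives directly from System.Object.
        typeof(ValueType).BaseType == typeof(Object) &&
        // An enumeration derives from System.Enum,
        // which derives from System.ValueType.
        typeof(DayOfWeek).BaseType == typeof(Enum) &&
        typeof(Enum).BaseType == typeof(ValueType) &&
        // Both ValueType and Enum are abstract.
        typeof(ValueType).IsAbstract && typeof(Enum).IsAbstract;
}
```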

Object Equality and Identity
When defining your own type, if you decide to override Equals, you must ensure that it adheres to the four properties of equality:
■ Equals must be reflexive; that is, x.Equals(x) must return true.
■ Equals must be symmetric; that is, x.Equals(y) must return the same value as y.Equals(x).
■ Equals must be transitive; that is, if x.Equals(y) returns true and y.Equals(z) returns true, then x.Equals(z) must also return true.
■ Equals must be consistent. Provided that there are no changes in the two values being compared, Equals should consistently return true or false.

When overriding the Equals method, there are a couple more things that you’ll probably want to do:
■ Have the type implement the System.IEquatable<T> interface’s Equals method. This generic interface allows you to define a type-safe Equals method. Usually, you’ll implement the Equals method that takes an Object parameter to internally call the type-safe Equals method.
■ Overload the == and != operator methods. Usually, you’ll implement these operator methods to internally call the type-safe Equals method.
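A minimal sketch of this pattern on a hypothetical immutable Point class: the Object-based Equals and both operators forward to the type-safe Equals. GetHashCode is overridden as well, because overriding Equals alone is what the next section warns against; the 397 multiplier is just one common choice, not a requirement.

```csharp
using System;

// Hypothetical immutable type following the guidance above.
public sealed class Point : IEquatable<Point> {
    private readonly Int32 m_x, m_y;

    public Point(Int32 x, Int32 y) { m_x = x; m_y = y; }

    // Type-safe Equals from IEquatable<Point>.
    public Boolean Equals(Point other) {
        if (other is null) return false;
        if (ReferenceEquals(this, other)) return true;
        return m_x == other.m_x && m_y == other.m_y;
    }

    // The Object-based override just forwards to the type-safe version.
    public override Boolean Equals(Object obj) => Equals(obj as Point);

    // Equal objects must produce equal hash codes.
    public override Int32 GetHashCode() => unchecked((m_x * 397) ^ m_y);

    // The operators also forward to the type-safe Equals.
    public static Boolean operator ==(Point a, Point b) =>
        a is null ? b is null : a.Equals(b);
    public static Boolean operator !=(Point a, Point b) => !(a == b);
}
```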

Object Hash Codes
If you define a type and override the Equalsmethod, you should also override the GetHashCode method.
The reason a type that defines Equals must also define GetHashCode is that the implementation of the System.Collections.Hashtable type, the System.Collections.Generic.Dictionary type, and some other collections require that any two objects that are equal must have the same hash code value.

Basically, when you add a key/value pair to a collection, a hash code for the key object is obtained first. This hash code indicates which “bucket” the key/value pair should be stored in. When the collection needs to look up a key, it gets the hash code for the specified key object. This code identifies the “bucket” that is now searched sequentially, looking for a stored key object that is equal to the specified key object.
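The bucket lookup described above can be sketched with a deliberately simplified, hypothetical ToyHashTable. It only illustrates the idea (hash to a bucket, then search that bucket sequentially with Equals); it is not how Dictionary<TKey, TValue> is actually implemented, and it does not check for duplicate keys.

```csharp
using System;
using System.Collections.Generic;

// Simplified illustration only; not the real Dictionary implementation.
public sealed class ToyHashTable<TKey, TValue> {
    private static readonly IEqualityComparer<TKey> s_comparer = EqualityComparer<TKey>.Default;
    private readonly List<KeyValuePair<TKey, TValue>>[] m_buckets;

    public ToyHashTable(Int32 bucketCount) {
        m_buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
        for (Int32 i = 0; i < bucketCount; i++)
            m_buckets[i] = new List<KeyValuePair<TKey, TValue>>();
    }

    // The key's hash code selects the bucket (the sign bit is masked off
    // so the index is never negative).
    private List<KeyValuePair<TKey, TValue>> BucketFor(TKey key) =>
        m_buckets[(s_comparer.GetHashCode(key) & 0x7FFFFFFF) % m_buckets.Length];

    public void Add(TKey key, TValue value) =>
        BucketFor(key).Add(new KeyValuePair<TKey, TValue>(key, value));

    // Lookup: hash to one bucket, then search only that bucket with Equals.
    public Boolean TryGetValue(TKey key, out TValue value) {
        foreach (KeyValuePair<TKey, TValue> pair in BucketFor(key)) {
            if (s_comparer.Equals(pair.Key, key)) { value = pair.Value; return true; }
        }
        value = default(TValue);
        return false;
    }
}
```

This also shows why equal keys must share a hash code: if two equal keys hashed differently, the lookup would search the wrong bucket.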

When selecting an algorithm for calculating hash codes for instances of your type, try to follow these guidelines:
■ Use an algorithm that gives a good random distribution for the best performance of the hash table.
■ Your algorithm can also call the base type’s GetHashCode method, including its return value. However, you don’t generally want to call Object’s or ValueType’s GetHashCode method, because the implementation in either method doesn’t lend itself to high-performance hashing algorithms.
■ Your algorithm should use at least one instance field.
■ Ideally, the fields you use in your algorithm should be immutable; that is, the fields should be initialized when the object is constructed, and they should never again change during the object’s lifetime.
■ Your algorithm should execute as quickly as possible.
■ Objects with the same value should return the same code. For example, two String objects with the same text should return the same hash code value.
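A small sketch of why the immutability guideline matters, using a hypothetical MutableKey type whose hash code depends on a field that can change. After the key is mutated, the dictionary computes a different hash code and no longer finds the entry it already holds.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that deliberately violates the "immutable fields" guideline.
public sealed class MutableKey : IEquatable<MutableKey> {
    public Int32 Value; // mutable on purpose

    public MutableKey(Int32 value) { Value = value; }

    public Boolean Equals(MutableKey other) => !(other is null) && Value == other.Value;
    public override Boolean Equals(Object obj) => Equals(obj as MutableKey);
    public override Int32 GetHashCode() => Value; // depends on a mutable field
}

public static class MutableKeyDemo {
    // Returns whether the dictionary can still find the key after it changes.
    public static Boolean FoundAfterMutation() {
        var key = new MutableKey(1);
        var map = new Dictionary<MutableKey, String> { [key] = "stored" };
        key.Value = 2;               // the hash code changes from 1 to 2
        return map.ContainsKey(key); // the entry was filed under hash code 1
    }
}
```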

The dynamic Primitive Type
payload code

When the type of a field, method parameter, or method return type is specified as dynamic, the compiler converts this type to the System.Object type and applies an instance of System.Runtime.CompilerServices.DynamicAttribute to the field, parameter, or return type in metadata. If a local variable is specified as dynamic, then the variable’s type will also be of type Object, but the DynamicAttribute is not applied to the local variable because its usage is self-contained within the method. Because dynamic is really the same as Object, you cannot write methods whose signature differs only by dynamic and Object.
It is also possible to use dynamic when specifying generic type arguments to a generic class (reference type), a structure (value type), an interface, a delegate, or a method. When you do this, the compiler converts dynamic to Object and applies DynamicAttribute to the various pieces of metadata where it makes sense. Note that the generic code that you are using has already been compiled and will consider the type to be Object; no dynamic dispatch will be performed because the compiler did not produce any payload code in the generic code.
Any expression can implicitly be cast to dynamic because all expressions result in a type that is derived from Object. Normally, the compiler does not allow you to write code that implicitly casts an expression from Object to another type; you must use explicit cast syntax. However, the compiler does allow you to cast an expression from dynamic to another type by using implicit cast syntax.
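A minimal sketch of these rules, using a hypothetical DynamicDemo class. It assumes the C# runtime binder (Microsoft.CSharp.dll) is available, which most project templates reference by default. The reflection check confirms that the dynamic parameter is really emitted as Object with DynamicAttribute applied.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class DynamicDemo {
    // In metadata this parameter is System.Object marked with [Dynamic].
    public static Int32 Twice(dynamic value) {
        // The + is bound at run time by the payload code, based on the
        // actual type of value; the dynamic result converts to Int32
        // with implicit cast syntax.
        Int32 result = value + value;
        return result;
    }

    public static Int32 FromObject(Object o) {
        // With Object, an explicit cast is required; "Int32 i = o;" would not compile.
        return (Int32)o;
    }

    // Not allowed, because the signature would differ from Twice(dynamic)
    // only by dynamic versus Object:
    // public static Int32 Twice(Object value) { ... }

    public static Boolean ParameterIsObjectWithDynamicAttribute() {
        ParameterInfo p = typeof(DynamicDemo).GetMethod(nameof(Twice)).GetParameters()[0];
        return p.ParameterType == typeof(Object) && p.IsDefined(typeof(DynamicAttribute), false);
    }
}
```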

CLR via C# - CHAPTER 4 Type Fundamentals

Type Fundamentals

How Things Relate at Run Time

Friday, July 18, 2014

CLR via C# - CHAPTER 3 Shared Assemblies and Strongly Named Assemblies

Shared Assemblies and Strongly Named Assemblies

Two Kinds of Assemblies, Two Kinds of Deployment
The real difference between weakly named and strongly named assemblies is that a strongly named assembly is signed with a publisher’s public/private key pair that uniquely identifies the assembly’s publisher. This key pair allows the assembly to be uniquely identified, secured, and versioned, and it allows the assembly to be deployed anywhere on the user’s machine or even on the Internet.

A strongly named assembly consists of four attributes that uniquely identify the assembly: a file name (without an extension), a version number, a culture identity, and a public key. Because public keys are very large numbers, we frequently use a small hash value derived from a public key. This hash value is called a public key token.

SN -k MyCompany.snk
This line tells SN.exe to create a file called MyCompany.snk. This file will contain the public and private key numbers persisted in a binary format.

First, you invoke SN.exe with the -p switch to create a file that contains only the public key (MyCompany.PublicKey).
SN -p MyCompany.snk MyCompany.PublicKey sha256
Then, you invoke SN.exe, passing it the -tp switch and the file that contains just the public key.
SN -tp MyCompany.PublicKey

A public key token is a 64-bit hash of the public key.
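A sketch of how that 64-bit token can be derived from a full public key, assuming (as far as I know this is what the CLR does) that the token is the last 8 bytes of the SHA-1 hash of the public key blob, taken in reverse order. PublicKeyTokens is a hypothetical helper; comparing its result with AssemblyName.GetPublicKeyToken() on a strongly named assembly is a way to check the assumption.

```csharp
using System;
using System.Security.Cryptography;

public static class PublicKeyTokens {
    // Computes an 8-byte (64-bit) public key token from a full public key.
    // Assumption: token = last 8 bytes of SHA-1(publicKey), reversed.
    public static Byte[] FromPublicKey(Byte[] publicKey) {
        Byte[] hash;
        using (SHA1 sha1 = SHA1.Create()) {
            hash = sha1.ComputeHash(publicKey);
        }
        Byte[] token = new Byte[8];
        for (Int32 i = 0; i < token.Length; i++)
            token[i] = hash[hash.Length - 1 - i];
        return token;
    }

    // Formats bytes the way tokens are usually written, e.g. in config files.
    public static String ToHex(Byte[] bytes) =>
        BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
}
```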

When you compile your assembly, you use the /keyfile:<file> compiler switch.
csc /keyfile:MyCompany.snk Program.cs

These reduced public key values—known as public key tokens—are what are actually stored in an AssemblyRef table.
By the way, the AssemblyDef entry always stores the full public key, not the public key token.

The Global Assembly Cache
You can invoke GACUtil.exe, specifying the /i switch to install an assembly into the GAC, and you can use GACUtil.exe’s /u switch to uninstall an assembly from the GAC.
By default, the GAC can be manipulated only by a user belonging to the Windows Administrators group.

Using GACUtil.exe’s /i switch is very convenient for developer testing. However, if you use GACUtil.exe to deploy an assembly in a production environment, it’s recommended that you use GACUtil.exe’s /r switch in addition to specifying the /i or /u switch to install or uninstall the assembly. The /r switch integrates the assembly with the Windows install and uninstall engine. Basically, it tells the system which application requires the assembly and then ties the application and the assembly together.

MSI is the only tool that is guaranteed to be on end-user machines and capable of installing assemblies into the GAC.

Building an Assembly That References a Strongly Named Assembly
You see, when you install the .NET Framework, two copies of Microsoft’s assembly files are actually installed. One set is installed into the compiler/CLR directory, and another set is installed into a GAC subdirectory. The files in the compiler/CLR directory exist so that you can easily build your assembly, whereas the copies in the GAC exist so that they can be loaded at run time.

In addition, the assemblies in the compiler/CLR directory are machine agnostic. That is, these assemblies contain only metadata in them. Because the IL code is not required at build time, this directory does not have to contain x86, x64, and ARM versions of an assembly. The assemblies in the GAC contain metadata and IL code because the code is needed only at run time.

Strongly Named Assemblies Are Tamper-Resistant
When an assembly is installed into the GAC, the system hashes the contents of the file containing the manifest and compares the hash value with the RSA digital signature value embedded within the PE file (after unsigning it with the public key). If the values are identical, the file’s contents haven’t been tampered with. In addition, the system hashes the contents of the assembly’s other files and compares the hash values with the hash values stored in the manifest file’s FileDef table.

Delayed Signing
The .NET Framework supports delayed signing, sometimes referred to as partial signing.
When building a delay-signed assembly, you supply only the file that contains the public key, and you must also tell the tool that you want the assembly to be delay signed, meaning that you’re not supplying a private key. For the C# compiler, you do this by specifying the /delaysign compiler switch. In Visual Studio, you display the properties for your project, click the Signing tab, and then select the Delay Sign Only check box. If you’re using AL.exe, you can specify the /delay[sign] command-line switch.

When creating the resulting assembly, space is left in the resulting PE file for the RSA digital signature. (The utility can determine how much space is necessary from the size of the public key.) Note that the file’s contents won’t be hashed at this time either.

On every machine on which the assembly needs to be installed into the GAC, you must prevent the system from verifying the integrity of the assembly’s files. To do this, you use the SN.exe utility, specifying the -Vr command-line switch.

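When you are ready to package the assembly, sign it with the real private key by using SN.exe’s -R (or -Ra) switch. After this step, you can deploy the fully signed assembly. On the developing and testing machines, don’t forget to turn verification of this assembly back on by using SN.exe’s -Vu or -Vx command-line switch.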

The steps to develop your assembly by using the delayed signing technique:
1. csc /keyfile:MyCompany.PublicKey /delaysign MyAssembly.cs
2. SN.exe -Vr MyAssembly.dll
3. SN.exe -Ra MyAssembly.dll MyCompany.PrivateKey
4. SN.exe -Vu MyAssembly.dll

Delayed signing is also useful whenever you want to perform some other operation on an assembly before you package it. For example, you may want to run an obfuscator over your assembly. Because any change to the file after it has been fully signed would invalidate its signature, you should use delayed signing, perform the post-build operation, and then run SN.exe with the -R or -Rc switch to complete the signing process of the assembly with all of its hashing.

Privately Deploying Strongly Named Assemblies
codeBase

How the Runtime Resolves Type References

However, the .NET Framework assemblies (including MSCorLib.dll) are closely tied to the version of the CLR that’s running. Any assembly that references .NET Framework assemblies always binds to the version that matches the CLR’s version.
The GAC, however, identifies assemblies by using name, version, culture, public key, and CPU architecture.

When searching the GAC for an assembly, the CLR first searches for a CPU architecture–specific version of the assembly. If it does not find a matching assembly, it then searches for a CPU-agnostic version of the assembly.

Advanced Administrative Control (Configuration)
<?xml version="1.0"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <probing privatePath="AuxFiles;bin\subdir" />
      <dependentAssembly>
        <assemblyIdentity name="SomeClassLibrary" publicKeyToken="32ab4ba45e0a69a1" culture="neutral"/>
        <bindingRedirect oldVersion="1.0.0.0" newVersion="2.0.0.0" />
        <codeBase version="2.0.0.0" href="http://www.Wintellect.com/SomeClassLibrary.dll" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="TypeLib" publicKeyToken="1f2e74e897abbcfe" culture="neutral"/>
        <bindingRedirect oldVersion="3.0.0.0-3.5.0.0" newVersion="4.0.0.0" />
        <publisherPolicy apply="no" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>

Publisher Policy Control
AL.exe /out:Policy.1.0.SomeClassLibrary.dll
/version:1.0.0.0
/keyfile:MyCompany.snk
/linkresource:SomeClassLibrary.config

<publisherPolicy apply="no"/>

Wednesday, July 2, 2014

CLR via C# - CHAPTER 2 Building, Packaging, Deploying, and Administering Applications and Types

Building, Packaging, Deploying, and Administering Applications and Types

If, for some reason, you really don’t want the C# compiler to reference the MSCorLib.dll assembly, you can use the /nostdlib switch.
csc.exe /out:Program.exe /t:exe /nostdlib Program.cs

To build a console user interface (CUI) application, specify the /t:exe switch; to build a graphical user interface (GUI) application, specify the /t:winexe switch; and to build a Windows Store app, specify the /t:appcontainerexe switch.

Response Files
A response file is a text file that contains a set of compiler command-line switches.

When you use the /reference compiler switch to reference an assembly, you can specify a complete path to a particular file. However, if you do not specify a path, the compiler will search for the file in the following places (in the order listed):
■ Working directory.
■ The directory that contains the CSC.exe file itself. MSCorLib.dll is always obtained from this directory. The path looks something like this: %SystemRoot%\Microsoft.NET\Framework\v4.0.#####.
■ Any directories specified using the /lib compiler switch.
■ Any directories specified using the LIB environment variable.

Also, you can tell the compiler to ignore both local and global CSC.rsp files by specifying the /noconfig command-line switch.

A Brief Look at Metadata
Now we know what kind of PE file we’ve created. But what exactly is in the Program.exe file? A managed PE file has four main parts: the PE32(+) header, the CLR header, the metadata, and the IL. The PE32(+) header is the standard information that Windows expects. The CLR header is a small block of information that is specific to modules that require the CLR (managed modules). The header includes the major and minor version number of the CLR that the module was built for, some flags, a MethodDef token (described later) indicating the module’s entry point method if this module is a CUI, GUI, or Windows Store executable, and an optional strong-name digital signature (discussed in Chapter 3). Finally, the header contains the size and offsets of certain metadata tables contained within the module. You can see the exact format of the CLR header by examining the IMAGE_COR20_HEADER defined in the CorHdr.h header file.
The metadata is a block of binary data that consists of several tables. There are three categories of tables: definition tables, reference tables, and manifest tables. Table 2-1 describes some of the more common definition tables that exist in a module’s metadata block.
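A small sketch that makes the definition tables visible through reflection's MetadataToken property. A metadata token's high byte identifies the table (0x02 for TypeDef, 0x06 for MethodDef) and the low three bytes are the row number; MetadataTokens is a hypothetical helper name.

```csharp
using System;
using System.Reflection;

public static class MetadataTokens {
    // Splits a token into its table number (high byte) and row number (low 3 bytes).
    public static (Int32 table, Int32 row) Split(Int32 token) =>
        ((token >> 24) & 0xFF, token & 0x00FFFFFF);

    // Prints the TypeDef row of a type and the MethodDef rows of its public static methods.
    public static void Describe(Type t) {
        var (typeTable, typeRow) = Split(t.MetadataToken);
        Console.WriteLine($"{t.Name}: table 0x{typeTable:X2} (TypeDef), row {typeRow}");
        foreach (MethodInfo m in t.GetMethods(BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly)) {
            var (methodTable, methodRow) = Split(m.MetadataToken);
            Console.WriteLine($"  {m.Name}: table 0x{methodTable:X2} (MethodDef), row {methodRow}");
        }
    }
}
```

Running ILDasm on the same assembly should show the same tokens next to each definition.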

The metadata created includes a set of reference tables that keep a record of the referenced items.

ILDasm Program.exe

Combining Modules to Form an Assembly
The manifest is another set of metadata tables that basically contain the names of the files that are part of the assembly. They also describe the assembly’s version, culture, publisher, publicly exported types, and all of the files that comprise the assembly.

The CLR operates on assemblies; that is, the CLR always loads the file that contains the manifest metadata tables first and then uses the manifest to get the names of the other files that are in the assembly. Here are some characteristics of assemblies that you should remember:
■ An assembly defines the reusable types.
■ An assembly is marked with a version number.
■ An assembly can have security information associated with it.

An assembly’s individual files don’t have these attributes—except for the file that contains the manifest metadata tables.

To build an assembly, you must select one of your PE files to be the keeper of the manifest. Or you can create a separate PE file that contains nothing but the manifest.

The C# compiler produces an assembly when you specify any of the following command-line switches: /t[arget]:exe, /t[arget]:winexe, /t[arget]:appcontainerexe, /t[arget]:library, or /t[arget]:winmdobj.

In addition to these switches, the C# compiler supports the /t[arget]:module switch. This switch tells the compiler to produce a PE file that doesn’t contain the manifest metadata tables. The PE file produced is always a DLL PE file, and this file must be added to an assembly before the CLR can access any types within it. When you use the /t:module switch, the C# compiler, by default, names the output file with an extension of .netmodule.

Unfortunately, the Microsoft Visual Studio integrated development environment (IDE) doesn’t natively support the ability for you to create multifile assemblies. If you want to create multifile assemblies, you must resort to using command-line tools.

There are many ways to add a module to an assembly. If you’re using the C# compiler to build a PE file with a manifest, you can use the /addmodule switch.

csc /out:MultiFileLibrary.dll /t:library /addmodule:RUT.netmodule FUT.cs

Using the Assembly Linker
csc /t:module RUT.cs
csc /t:module FUT.cs
al /out:MultiFileLibrary.dll /t:library FUT.netmodule RUT.netmodule

csc /t:module /r:MultiFileLibrary.dll Program.cs
al /out:Program.exe /t:exe /main:Program.Main Program.netmodule

Adding Resource Files to an Assembly

AL.exe: the /embed[resource] switch and the /link[resource] switch
CSC.exe: the /resource switch and the /linkresource switch

One last note about resources: it’s possible to embed standard Win32 resources into an assembly. You can do this easily by specifying the path of a .res file with the /win32res switch when using either AL.exe or CSC.exe.
In addition, you can quickly and easily embed a standard Win32 icon resource into an assembly file by specifying the path of the .ico file with the /win32icon switch when using either AL.exe or CSC.exe.

Assembly Version Resource Information

Version Numbers

AssemblyFileVersion
This version number is stored in the Win32 version resource. This number is for information purposes only; the CLR doesn’t examine this version number in any way.

AssemblyInformationalVersion
This version number is also stored in the Win32 version resource, and again, this number is for information purposes only; the CLR doesn’t examine or care about it in any way. This version number exists to indicate the version of the product that includes this assembly.

AssemblyVersion
This version number is stored in the AssemblyDef manifest metadata table. The CLR uses this version number when binding to strongly named assemblies (discussed in Chapter 3). This number is extremely important and is used to uniquely identify an assembly.
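A sketch of where each of the three version numbers can be read back at run time; VersionInfo is a hypothetical helper. The commented attribute lines show the classic AssemblyInfo.cs form; SDK-style projects usually generate these attributes from project properties instead, so declaring them by hand there can produce duplicate-attribute errors.

```csharp
using System;
using System.Reflection;

// Classic AssemblyInfo.cs declarations (illustrative values):
// [assembly: AssemblyFileVersion("1.0.0.0")]
// [assembly: AssemblyInformationalVersion("2.0.0")]
// [assembly: AssemblyVersion("1.0.0.0")]

public static class VersionInfo {
    public static (Version AssemblyVersion, String FileVersion, String InformationalVersion) Read(Assembly assembly) {
        // AssemblyVersion: stored in the AssemblyDef table and used by the CLR for binding.
        Version assemblyVersion = assembly.GetName().Version;

        // The other two are informational only; here they are read from the
        // custom attributes in metadata (null if the attribute is absent).
        String fileVersion = assembly.GetCustomAttribute<AssemblyFileVersionAttribute>()?.Version;
        String info = assembly.GetCustomAttribute<AssemblyInformationalVersionAttribute>()?.InformationalVersion;

        return (assemblyVersion, fileVersion, info);
    }
}
```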

Culture

Assemblies that are marked with a culture are called satellite assemblies.

You’ll usually use the AL.exe tool to build a satellite assembly. You won’t use a compiler because the satellite assembly should have no code contained within it.
When using AL.exe, you specify the desired culture by using the /c[ulture]:text switch, where text is a string such as “en-US,” representing US English. When you deploy a satellite assembly, you should place it in a subdirectory whose name matches the culture text. For example, if the application’s base directory is C:\MyApp, the US English satellite assembly should be placed in the C:\MyApp\en-US subdirectory. At run time, you access a satellite assembly’s resources by using the System.Resources.ResourceManager class.

Simple Application Deployment (Privately Deployed Assemblies)

Simple Administrative Control (Configuration)

AppDir directory (contains the application’s assembly files)
Program.exe
Program.exe.config (discussed below)

AuxFiles subdirectory (contains MultiFileLibrary’s assembly files)
MultiFileLibrary.dll
FUT.netmodule
RUT.netmodule

<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <probing privatePath="AuxFiles" />
    </assemblyBinding>
  </runtime>
</configuration>

CLR via C# - CHAPTER 1 The CLR’s Execution Model

The CLR’s Execution Model

Compiling Source Code into Managed Modules

As the figure shows, you can create source code files written in any programming language that supports the CLR. Then you use the corresponding compiler to check the syntax and analyze the source code. Regardless of which compiler you use, the result is a managed module. A managed module is a standard 32-bit Windows portable executable (PE32) file or a standard 64-bit Windows portable executable (PE32+) file that requires the CLR to execute. By the way, managed assemblies always take advantage of Data Execution Prevention (DEP) and Address Space Layout Randomization (ASLR) in Windows; these two features improve the security of your whole system.

In addition to emitting IL, every compiler targeting the CLR is required to emit full metadata into every managed module. In brief, metadata is a set of data tables that describe what is defined in the module, such as types and their members. In addition, metadata also has tables indicating what the managed module references, such as imported types and their members. Metadata is a superset of older technologies such as COM’s Type Libraries and Interface Definition Language (IDL) files. The important thing to note is that CLR metadata is far more complete. And, unlike Type Libraries and IDL, metadata is always associated with the file that contains the IL code. In fact, the metadata is always embedded in the same EXE/DLL as the code, making it impossible to separate the two. Because the compiler produces the metadata and the code at the same time and binds them into the resulting managed module, the metadata and the IL code it describes are never out of sync with one another.

Metadata has many uses. Here are some of them:
■ Metadata removes the need for native C/C++ header and library files when compiling because all the information about the referenced types/members is contained in the file that has the IL that implements the type/members. Compilers can read metadata directly from managed modules.
■ Microsoft Visual Studio uses metadata to help you write code. Its IntelliSense feature parses metadata to tell you what methods, properties, events, and fields a type offers, and in the case of a method, what parameters the method expects.
■ The CLR’s code verification process uses metadata to ensure that your code performs only “type-safe” operations. (I’ll discuss verification shortly.)
■ Metadata allows an object’s fields to be serialized into a memory block, sent to another machine, and then deserialized, re-creating the object’s state on the remote machine.
■ Metadata allows the garbage collector to track the lifetime of objects. For any object, the garbage collector can determine the type of the object and, from the metadata, know which fields within that object refer to other objects.
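A small sketch of the last point, reading field metadata through reflection to find which instance fields of a hypothetical Customer type can refer to other objects. The garbage collector does not literally call reflection; it uses the runtime's own structures built from the same metadata, so treat this only as an illustration of what the metadata records.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical type used only to illustrate what metadata records about fields.
public sealed class Customer {
    private Int32 m_id;          // value type: holds no object reference
    private String m_name;       // reference type: can refer to another object
    private List<String> m_tags; // reference type
}

public static class FieldInspector {
    // Lists the instance fields whose types are reference types, i.e. the
    // fields through which an object can keep other objects alive.
    public static List<String> ReferenceFieldNames(Type t) {
        var names = new List<String>();
        foreach (FieldInfo f in t.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)) {
            if (!f.FieldType.IsValueType) names.Add(f.Name);
        }
        return names;
    }
}
```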

Combining Managed Modules into Assemblies

An assembly allows you to decouple the logical and physical notions of a reusable, securable, versionable component. How you partition your code and resources into different files is completely up to you.
Assemblies allow you to break up the deployment of the files while still treating all of the files as a single collection.

Loading the Common Language Runtime

Microsoft ships two SDK command-line utilities, DumpBin.exe and CorFlags.exe, that you can use to examine the header information emitted in a managed module by the compiler.

Executing Your Assembly’s Code

There are two C# compiler switches that impact code optimization: /optimize and /debug.

Furthermore, the compiler produces a Program Database (PDB) file only if you specify the /debug(+/full/pdbonly) switch. The PDB file helps the debugger find local variables and map the IL instructions to source code.

When you create a new C# project in Visual Studio, the Debug configuration of the project has the /optimize- and /debug:full switches specified, and the Release configuration has the /optimize+ and /debug:pdbonly switches specified.

If your experiments show that the CLR’s JIT compiler doesn’t offer your application the kind of performance it requires, you may want to take advantage of the NGen.exe tool that ships with the .NET Framework SDK. This tool compiles all of an assembly’s IL code into native code and saves the resulting native code to a file on disk.

IL and Verification

Unsafe Code
Microsoft supplies a utility called PEVerify.exe, which examines all of an assembly’s methods and notifies you of any methods that contain unsafe code.

The Native Code Generator Tool: NGen.exe

The NGen.exe tool is interesting in two scenarios:

■ Improving an application’s startup time: Running NGen.exe can improve startup time because the code will already be compiled into native code, so compilation doesn’t have to occur at run time.
■ Reducing an application’s working set: If you believe that an assembly will be loaded into multiple processes simultaneously, running NGen.exe on that assembly can reduce the applications’ working set. This is because the NGen.exe tool compiles the IL to native code and saves the output in a separate file. This file can be memory-mapped into multiple process address spaces simultaneously, allowing the code to be shared; not every process needs its own copy of the code.

When a setup program invokes NGen.exe on an application or a single assembly, all of the assemblies for that application, or the one specified assembly, have their IL code compiled into native code. NGen.exe creates a new assembly file containing only this native code instead of IL code. This new file is placed in a folder under a directory with a name like %SystemRoot%\Assembly\NativeImages_v4.0.#####_64. The directory name includes the version of the CLR and information denoting whether the native code is compiled for 32-bit or 64-bit versions of Windows.

There are several potential problems with respect to NGen’d files:
■ No intellectual property protection
■ NGen’d files can get out of sync
• CLR version: This changes with patches or service packs.
• CPU type: This changes if you upgrade your processor hardware.
• Windows operating system version: This changes with a new service pack update.
• Assembly’s identity module version ID (MVID): This changes when recompiling.
• Referenced assembly’s version IDs: These change when you recompile a referenced assembly.
• Security: This changes when you revoke permissions (such as declarative inheritance, declarative link-time, SkipVerification, or UnmanagedCode permissions) that were once granted.
■ Inferior execution-time performance

For large client applications that experience very long startup times, Microsoft provides a Managed Profile Guided Optimization tool (MPGO.exe). This tool analyzes the execution of your application to see what it needs at startup. This information is then fed to the NGen.exe tool in order to better optimize the resulting native image.

The Framework Class Library
The Common Type System
The following list shows the valid options for controlling access to a member:
■ Private: The member is accessible only by other members in the same class type.
■ Family: The member is accessible by derived types, regardless of whether they are within the same assembly. Note that many languages (such as C++ and C#) refer to family as protected.
■ Family and assembly: The member is accessible by derived types, but only if the derived type is defined in the same assembly. Many languages (such as C# and Visual Basic) don’t offer this access control. Of course, IL Assembly language makes it available. (C# 7.2 later added it as private protected.)
■ Assembly: The member is accessible by any code in the same assembly. Many languages refer to assembly as internal.
■ Family or assembly: The member is accessible by derived types in any assembly. The member is also accessible by any types in the same assembly. C# refers to family or assembly as protected internal.
■ Public: The member is accessible by any code in any assembly.
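To see how the C# keywords map onto these CTS access levels, reflection exposes one property per level on MethodInfo. A small sketch; `AccessDemo` and its method names are hypothetical, and `private protected` needs C# 7.2 or later:

```csharp
using System;
using System.Reflection;

// One method per C# accessibility keyword; the comment gives the CTS name.
public class AccessDemo
{
    private void PrivateMember() { }                 // CTS: Private
    protected void FamilyMember() { }                // CTS: Family
    private protected void FamAndAssemMember() { }   // CTS: Family and assembly (C# 7.2+)
    internal void AssemblyMember() { }               // CTS: Assembly
    protected internal void FamOrAssemMember() { }   // CTS: Family or assembly
    public void PublicMember() { }                   // CTS: Public

    // Finds a declared instance method whatever its accessibility.
    public static MethodInfo Find(string name) =>
        typeof(AccessDemo).GetMethod(name,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
}
```

`AccessDemo.Find("FamOrAssemMember").IsFamilyOrAssembly` is true, which confirms that protected internal means "family *or* assembly", not "and".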

The Common Language Specification

If you’re designing a type in one language, and you expect that type to be used by another language, you shouldn’t take advantage of any features that are outside of the CLS in its public and protected members.

// Tell compiler to check for CLS compliance
[assembly: CLSCompliant(true)]
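A slightly fuller sketch of how the attribute is used in practice. `SomeLibraryType` and its members are illustrative. The compiler checks only publicly visible members. A non-CLS type such as UInt32 in a public signature triggers a warning unless that member is explicitly marked `[CLSCompliant(false)]`:

```csharp
using System;

// Ask the compiler to check this assembly's public surface for CLS compliance.
[assembly: CLSCompliant(true)]

public sealed class SomeLibraryType
{
    // Int64 is in the CLS, so any CLS language can call this method.
    public Int64 Count() { return 0; }

    // UInt32 is not in the CLS; opting this member out explicitly
    // suppresses the compiler's compliance warning for it.
    [CLSCompliant(false)]
    public UInt32 RawCount() { return 0; }

    // Private members are not checked: other languages can't see them anyway.
    private UInt32 Cache() { return 0; }
}
```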

Interoperability with Unmanaged Code

■ Managed code can call an unmanaged function in a DLL
■ Managed code can use an existing COM component (server)
■ Unmanaged code can use a managed type (server)

Monday, June 9, 2014

HTTP The Definitive Guide (Logging and Usage Tracking)

Logging and Usage Tracking

What to Log?
For the most part, logging is done for two reasons: to look for problems on the server or proxy (e.g., which requests are failing), and to generate statistics about how web sites are accessed.

A few examples of commonly logged fields are:

HTTP method
HTTP version of client and server
URL of the requested resource
HTTP status code of the response
Size of the request and response messages (including any entity bodies)
Timestamp of when the transaction occurred
Referer and User-Agent header values

Log Formats

Common Log Format

Combined Log Format

The Combined Log Format is very similar to the Common Log Format; in fact, it mirrors it exactly, with the addition of two fields: Referer and User-Agent.
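The relationship between the two formats is easy to see when you build a line yourself. Below is a minimal sketch. `AccessLog` and its parameters are hypothetical, and real servers (for example Apache's logging module) write these lines for you. Missing values are logged as "-", and the Combined line is just the Common line plus the quoted Referer and User-Agent values:

```csharp
using System;
using System.Globalization;

// Hypothetical helper that formats Common and Combined Log Format lines.
public static class AccessLog
{
    // Missing values are written as "-".
    static string Dash(string value) => string.IsNullOrEmpty(value) ? "-" : value;

    // Timestamp such as 10/Oct/2000:13:55:36 -0700 (the offset has no colon).
    static string Timestamp(DateTimeOffset t)
    {
        TimeSpan offset = t.Offset;
        char sign = offset < TimeSpan.Zero ? '-' : '+';
        offset = offset.Duration();
        return t.ToString("dd/MMM/yyyy:HH:mm:ss", CultureInfo.InvariantCulture) +
               " " + sign +
               offset.Hours.ToString("00", CultureInfo.InvariantCulture) +
               offset.Minutes.ToString("00", CultureInfo.InvariantCulture);
    }

    // Common: host ident authuser [date] "request line" status bytes
    public static string Common(string host, string ident, string authuser,
        DateTimeOffset time, string requestLine, int status, long? bytes)
    {
        string size = bytes.HasValue
            ? bytes.Value.ToString(CultureInfo.InvariantCulture) : "-";
        return $"{Dash(host)} {Dash(ident)} {Dash(authuser)} [{Timestamp(time)}] " +
               $"\"{requestLine}\" {status} {size}";
    }

    // Combined: the Common fields followed by quoted Referer and User-Agent.
    public static string Combined(string host, string ident, string authuser,
        DateTimeOffset time, string requestLine, int status, long? bytes,
        string referer, string userAgent)
    {
        return Common(host, ident, authuser, time, requestLine, status, bytes) +
               $" \"{Dash(referer)}\" \"{Dash(userAgent)}\"";
    }
}
```

With host 127.0.0.1, user frank and a request for /apache_pb.gif, `Common` produces `127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326`.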

Netscape Extended Log Format

The first seven fields in the Netscape Extended Log Format are identical to those in the Common Log Format (see Table 21-1). Table 21-3 lists, in order, the new fields that the Netscape Extended Log Format introduces.

Netscape Extended 2 Log Format

The Netscape Extended 2 Log Format derives from the Netscape Extended Log Format, and its initial fields are identical to those listed in Table 21-3.

Table 21-4 lists, in order, the additional fields of the Netscape Extended 2 Log Format.

Squid Proxy Log Format

Hit Metering

The Hit Metering protocol requires caches to periodically report cache access statistics to origin servers.

Overview

The Meter Header

A Word on Privacy

Sunday, June 8, 2014

HTTP The Definitive Guide (Redirection and Load Balancing)

Redirection and Load Balancing

In this chapter, we’ll take a look at the following redirection techniques, how they work, and what their load-balancing capabilities are (if any):

HTTP redirection
DNS redirection
Anycast routing
Policy routing
IP MAC forwarding
IP address forwarding
The Web Cache Coordination Protocol (WCCP)
The Intercache Communication Protocol (ICP)
The Hyper Text Caching Protocol (HTCP)
The Network Element Control Protocol (NECP)
The Cache Array Routing Protocol (CARP)
The Web Proxy Autodiscovery Protocol (WPAD)

Where to Redirect

Servers, proxies, caches, and gateways all appear to clients as servers, in the sense that a client sends them an HTTP request, and they process it. Many redirection techniques work for servers, proxies, caches, and gateways because of their common, server-like traits.

Web servers handle requests on a per-IP basis.

Proxies tend to handle requests on a per-protocol basis.

Overview of Redirection Protocols

The direction that an HTTP message takes on its way through the Internet is affected by the HTTP applications and routing devices it passes from, through, and toward. For example:

The browser application that creates the client’s message could be configured to send it to a proxy server.
DNS resolvers choose the IP address that is used for addressing the message. This IP address can be different for different clients in different geographical locations.
As the message passes through networks, it is divided into addressed packets; switches and routers examine the TCP/IP addressing on the packets and make decisions about routing the packets on that basis.
Web servers can bounce requests back to different web servers with HTTP redirects.

Table 20-1 summarizes the redirection methods used to redirect messages to servers, each of which is discussed later in this chapter.

Table 20-2 summarizes the redirection methods used to redirect messages to proxy servers.

General Redirection Methods

HTTP Redirection

DNS Redirection
DNS allows several IP addresses to be associated with a single domain, and DNS resolvers can be configured or programmed to return varying IP addresses.

DNS round robin
DNS round robin uses a feature of DNS hostname resolution to balance load across a farm of web servers. It is a pure load-balancing strategy, and it does not take into account any factors about the location of the client relative to the server or the current stress on the server.
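A minimal sketch of round-robin address rotation as a DNS server might do it. `RoundRobinZone` is a hypothetical name, and a real server does this inside its resolver logic. The server keeps one list of addresses for the hostname and rotates it on every query, so successive clients see a different address first. As the text says, nothing about load or client location is considered:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record set for one hostname with round-robin rotation.
// Not thread-safe; a real server would synchronize access to 'next'.
public sealed class RoundRobinZone
{
    private readonly string[] addresses;
    private int next; // index of the address to put first in the next answer

    public RoundRobinZone(IEnumerable<string> addrs)
    {
        addresses = addrs.ToArray();
        if (addresses.Length == 0)
            throw new ArgumentException("at least one address is required");
    }

    // Returns every address, starting at a position that advances per query.
    public IReadOnlyList<string> Resolve()
    {
        int start = next;
        next = (next + 1) % addresses.Length;
        return addresses.Skip(start).Concat(addresses.Take(start)).ToList();
    }
}
```

Clients usually try the first address in the answer, which is what spreads the load. Resolvers that cache an answer keep handing out the same ordering until it expires, which weakens the effect.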

Multiple addresses and round-robin address rotation
DNS round robin for load balancing

The impact of DNS caching

Other DNS-based redirection algorithms

Load-balancing algorithms - Some DNS servers keep track of the load on the web servers and place the least-loaded web servers at the front of the list.
Proximity-routing algorithms - DNS servers can attempt to direct users to nearby web servers, when the farm of web servers is geographically dispersed.
Fault-masking algorithms - DNS servers can monitor the health of the network and route requests away from service interruptions or other faults.
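Two of these ideas can be combined in a few lines. The sketch assumes the DNS server already has a health flag and a load figure for each web server. `ServerStatus` and `SmartDns` are hypothetical names, and how that data is gathered is out of scope. Unhealthy servers are dropped (fault masking) and the rest are ordered least-loaded first (load balancing). Proximity routing is not modeled:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-server state as tracked by the DNS server.
public sealed record ServerStatus(string Address, bool Healthy, double Load);

public static class SmartDns
{
    // Fault masking: leave out servers that are down.
    // Load balancing: put the least-loaded servers first in the answer.
    public static IReadOnlyList<string> Answer(IEnumerable<ServerStatus> servers) =>
        servers.Where(s => s.Healthy)
               .OrderBy(s => s.Load)
               .Select(s => s.Address)
               .ToList();
}
```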

Anycast Addressing
In anycast addressing, several geographically dispersed web servers have the exact same IP address and rely on the “shortest-path” routing capabilities of backbone routers to send client requests to the server nearest to the client.

IP MAC Forwarding

Because MAC address forwarding is point-to-point only, the server or proxy has to be located one hop away from the switch.

IP Address Forwarding
In IP address forwarding, a switch or other layer 4–aware device examines TCP/IP addressing on incoming packets and routes packets accordingly by changing the destination IP address, instead of the destination MAC address.

This type of forwarding also is called Network Address Translation (NAT).

Two ways to control the return path of the response are:

Change the source IP address of the packet to the IP address of the switch. - This is called full NAT, where the IP forwarding device translates both destination and source IP addresses.
If the source IP address remains the client’s IP address, make sure (from a hardware perspective) that no routes exist directly from server to client (bypassing the switch). - This sometimes is called half NAT.
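The difference between the two options is which header fields the forwarding device rewrites. A simplified sketch follows. `Packet` and `IpForwarder` are hypothetical, only the two address fields are modeled, and the connection table a real device keeps in order to translate replies back is omitted:

```csharp
using System;

// Hypothetical, simplified packet: only the addresses a NAT device rewrites.
public readonly record struct Packet(string Source, string Destination);

public static class IpForwarder
{
    // Full NAT: destination becomes the chosen server and source becomes the
    // switch itself, so the server's reply necessarily comes back through it.
    public static Packet FullNat(Packet p, string serverIp, string switchIp) =>
        p with { Source = switchIp, Destination = serverIp };

    // Half NAT: only the destination changes. The source is still the client,
    // so the network must offer no route from server to client that bypasses
    // the switch.
    public static Packet HalfNat(Packet p, string serverIp) =>
        p with { Destination = serverIp };
}
```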

Network Element Control Protocol
The Network Element Control Protocol (NECP) allows network elements (NEs)— devices such as routers and switches that forward IP packets—to talk with server elements (SEs)—devices such as web servers and proxy caches that serve application layer requests.

Messages

Proxy Redirection Methods
Explicit Browser Configuration
Proxy Auto-configuration

Web Proxy Autodiscovery Protocol
PAC file autodiscovery
An HTTP client that implements the WPAD protocol:

Uses WPAD to find the PAC file’s configuration URL (CURL)
Fetches the PAC file (a.k.a. configuration file, or CFILE) corresponding to the CURL
Executes the PAC file to determine the proxy server
Sends HTTP requests to the proxy server returned by the PAC file

WPAD algorithm
The current WPAD specification defines the following techniques, in order:

DHCP (Dynamic Host Configuration Protocol)
SLP (Service Location Protocol)
DNS well-known hostnames
DNS SRV records
DNS service URLs in TXT records

Of these five mechanisms, only the DHCP and DNS well-known hostname techniques are required for WPAD clients.

Consider a client with hostname johns-desktop.development.foo.com. This is the sequence of discovery attempts a complete WPAD client would perform:

DHCP
SLP
DNS A lookup on “QNAME=wpad.development.foo.com”
DNS SRV lookup on “QNAME=wpad.development.foo.com”
DNS TXT lookup on “QNAME=wpad.development.foo.com”
DNS A lookup on “QNAME=wpad.foo.com”
DNS SRV lookup on “QNAME=wpad.foo.com”
DNS TXT lookup on “QNAME=wpad.foo.com”
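The DNS part of that sequence can be generated mechanically from the hostname: drop the host's own label, then prefix "wpad." to each parent domain in turn, trying A, SRV, and TXT for each. The sketch stops before the top-level domain (so there is no wpad.com), which matches the example above. `WpadDiscovery` is a hypothetical name, and real clients may apply further rules:

```csharp
using System;
using System.Collections.Generic;

public static class WpadDiscovery
{
    // Ordered discovery attempts for a client hostname: DHCP, SLP, then
    // A/SRV/TXT lookups on "wpad." + each parent domain.
    public static List<string> Attempts(string hostname)
    {
        var attempts = new List<string> { "DHCP", "SLP" };
        string[] labels = hostname.Split('.');
        // Start at 1 to skip the host label; stop while two labels remain,
        // so the top-level domain alone is never queried.
        for (int i = 1; i <= labels.Length - 2; i++)
        {
            string qname = "wpad." + string.Join(".", labels, i, labels.Length - i);
            foreach (string type in new[] { "A", "SRV", "TXT" })
                attempts.Add($"DNS {type} lookup on \"QNAME={qname}\"");
        }
        return attempts;
    }
}
```

For johns-desktop.development.foo.com this yields the eight attempts listed above, in the same order.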

CURL discovery using DHCP
DNS A record lookup
Retrieving the PAC file
Once a candidate CURL is created, the WPAD client usually makes a GET request to the CURL. When making requests, WPAD clients are required to send Accept headers with appropriate CFILE format information that they are capable of handling.
For example:
Accept: application/x-ns-proxy-autoconfig
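In .NET, the request for a candidate CURL could be built like this. The sketch constructs the GET request with the Accept header but does not send it. `PacFetch` is a hypothetical name, and the sample CURL http://wpad.foo.com/wpad.dat (the commonly used wpad.dat path) is an assumption for illustration:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class PacFetch
{
    // Builds (without sending) the GET for a candidate CURL, advertising
    // the PAC format this client knows how to execute.
    public static HttpRequestMessage BuildRequest(string curl)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, new Uri(curl));
        request.Headers.Accept.Add(
            new MediaTypeWithQualityHeaderValue("application/x-ns-proxy-autoconfig"));
        return request;
    }
}
```

Passing the result to `HttpClient.SendAsync` would perform the actual fetch. A failed fetch sends the client on to the next discovery attempt.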

When to execute WPAD
The web proxy autodiscovery process is required to occur at least as frequently as one of the following:

Upon startup of the web client—WPAD is performed only for the start of the first instance. Subsequent instances inherit the settings.
Whenever there is an indication from the networking stack that the IP address of the client host has changed.

WPAD spoofing

Timeouts

Administrator considerations
Administrators should configure at least one of the DHCP or DNS A record lookup methods in their environments, as those are the only two that all compatible clients are required to implement.

Cache Redirection Methods
WCCP Redirection
Cisco Systems developed the Web Cache Coordination Protocol (WCCP) to enable routers to redirect web traffic to proxy caches.

How WCCP redirection works

Start with a network containing WCCP-enabled routers and caches that can communicate with one another.

A set of routers and their target caches form a WCCP service group. The configuration of the service group specifies what traffic is sent where, how traffic is sent, and how load should be balanced among the caches in the service group.
If the service group is configured to redirect HTTP traffic, routers in the service group send HTTP requests to caches in the service group.
When an HTTP request arrives at a router in the service group, the router chooses one of the caches in the service group to serve the request (based on either a hash on the request’s IP address or a mask/value set pairing scheme).
The router sends the request packets to the cache, either by encapsulating the packets with the cache’s IP address or by IP MAC forwarding.
If the cache cannot serve the request, the packets are returned to the router for normal forwarding.
The members of the service group exchange heartbeat messages with one another, continually verifying one another’s availability.
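The hash-based choice in the steps above can be pictured as a bucket table. The router hashes the request's IP address into a bucket, and each bucket is assigned to one cache in the service group. This is a simplified sketch. The FNV-1a hash, the 256-entry table, and the `ServiceGroup` name are assumptions for illustration, not the WCCP wire-level algorithm:

```csharp
using System;
using System.Net;

// Hypothetical service group: a fixed bucket table mapped onto its caches.
public sealed class ServiceGroup
{
    private readonly string[] bucketOwner = new string[256];

    // Spread the buckets evenly across the caches (bucket i -> cache i mod n).
    public ServiceGroup(params string[] caches)
    {
        if (caches.Length == 0) throw new ArgumentException("no caches");
        for (int i = 0; i < bucketOwner.Length; i++)
            bucketOwner[i] = caches[i % caches.Length];
    }

    // FNV-1a over the address bytes, reduced to a bucket index.
    static int Bucket(IPAddress ip)
    {
        uint h = 2166136261;
        foreach (byte b in ip.GetAddressBytes())
        {
            h ^= b;
            h *= 16777619;
        }
        return (int)(h % 256);
    }

    // The router forwards the request to the cache that owns its bucket.
    public string ChooseCache(string requestIp) =>
        bucketOwner[Bucket(IPAddress.Parse(requestIp))];
}
```

Because the mapping is deterministic, requests for the same address keep landing on the same cache, which keeps hit rates up. When heartbeats show a cache has gone, its buckets can be reassigned to the survivors.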

WCCP2 messages
Message components
Each WCCP2 message consists of a header and components. The WCCP header information contains the message type (Here I Am, I See You, Assignment, or Removal Query), WCCP version, and message length (not including the length of the header).

Service groups

A service group consists of a set of WCCP-enabled routers and caches that exchange WCCP messages.

GRE packet encapsulation
Routers that support WCCP redirect HTTP packets to a particular server by encapsulating them with the server’s IP address. The packet encapsulation also contains an IP header proto field that indicates Generic Routing Encapsulation (GRE).

WCCP load balancing

Internet Cache Protocol
The Internet Cache Protocol (ICP) allows caches to look for content hits in sibling caches.

ICP can be thought of as a cache clustering protocol.

Cache Array Routing Protocol
The Cache Array Routing Protocol (CARP) is a standard proposed by Microsoft Corporation and Netscape Communications Corporation to administer a collection of proxy servers such that an array of proxy servers appears to clients as one logical cache.
In contrast to ICP, the collection of servers connected using CARP operates as a single, large server with each component server containing only a fraction of the total cached documents.
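Partitioning the documents across the array comes down to a deterministic function from URL to member. The sketch below uses highest-random-weight hashing in the spirit of CARP. Every (URL, member) pair gets a score, and the URL is served by the highest-scoring member. The hash and mixing steps and the `CarpArray` name are assumptions; the CARP draft defines its own hash functions and per-member load factors, which this omits:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CarpArray
{
    // FNV-1a over the UTF-8 bytes of a string.
    static uint Hash(string s)
    {
        uint h = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            h ^= b;
            h *= 16777619;
        }
        return h;
    }

    // Combined score for a (URL, member) pair, with extra bit mixing.
    static uint Score(string url, string member)
    {
        uint x = Hash(url) ^ Hash(member);
        x ^= x >> 16;
        x *= 0x7feb352d;
        x ^= x >> 15;
        return x;
    }

    // The member with the highest score for this URL owns the document.
    public static string Route(string url, IReadOnlyList<string> members)
    {
        string best = null;
        uint bestScore = 0;
        foreach (string m in members)
        {
            uint s = Score(url, m);
            if (best == null || s > bestScore) { best = m; bestScore = s; }
        }
        return best;
    }
}
```

The useful property is stability: removing a proxy moves only the URLs that proxy owned, because every other URL's highest score is unchanged. Each document is therefore cached once across the array instead of on every proxy.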

Hyper Text Caching Protocol
The difference between an ICP and an HTCP transaction is in the level of detail in the requests and responses.

HTCP Authentication

Setting Caching Policies