Search Results

Search found 140 results on 6 pages for 'chaya cooper'.

Page 3/6 | < Previous Page | 1 2 3 4 5 6  | Next Page >

  • PostSharp, Obfuscation, and IL

    - by Simon Cooper
    Aspect-oriented programming (AOP) is a relatively new programming paradigm. Originating at Xerox PARC in 1994, the paradigm was first made available for general-purpose development as an extension to Java in 2001. From there, it has quickly been adapted for use in all the common languages used today. In the .NET world, one of the primary AOP toolkits is PostSharp. Attributes and AOP Normally, attributes in .NET are entirely a metadata construct. Apart from a few special attributes in the .NET framework, they have no effect whatsoever on how a class or method executes within the CLR. Only by using reflection at runtime can you access any attributes declared on a type or type member. PostSharp changes this. By declaring a custom attribute that derives from PostSharp.Aspects.Aspect, applying it to types and type members, and running the resulting assembly through the PostSharp postprocessor, you can essentially declare 'clever' attributes that change the behaviour of whatever the aspect has been applied to at runtime. A simple example of this is logging. By declaring a TraceAttribute that derives from OnMethodBoundaryAspect, you can automatically log when a method has been executed: public class TraceAttribute : PostSharp.Aspects.OnMethodBoundaryAspect { public override void OnEntry(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Entering {0}.{1}.", method.DeclaringType.FullName, method.Name)); } public override void OnExit(MethodExecutionArgs args) { MethodBase method = args.Method; System.Diagnostics.Trace.WriteLine( String.Format( "Leaving {0}.{1}.", method.DeclaringType.FullName, method.Name)); } } [Trace] public void MethodToLog() { ... } Now, whenever MethodToLog is executed, the aspect will automatically log entry and exit, without having to add the logging code to MethodToLog itself. PostSharp Performance Now this does introduce a performance overhead - as you can see, the aspect allows access to the MethodBase of the method the aspect has been applied to. If you were limited to C#, you would be forced to retrieve each MethodBase instance using Type.GetMethod(), matching on the method name and signature. This is slow. Fortunately, PostSharp is not limited to C#. It can use any instruction available in IL. And in IL, you can do some very neat things. Ldtoken C# allows you to get the Type object corresponding to a specific type name using the typeof operator: Type t = typeof(Random); The C# compiler compiles this operator to the following IL: ldtoken [mscorlib]System.Random call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle( valuetype [mscorlib]System.RuntimeTypeHandle) The ldtoken instruction obtains a special handle to a type called a RuntimeTypeHandle, and from that, the Type object can be obtained using GetTypeFromHandle. These are both relatively fast operations - no string lookup is required, only direct assembly and CLR constructs are used. However, a little-known feature is that ldtoken is not just limited to types; it can also get information on methods and fields, encapsulated in a RuntimeMethodHandle or RuntimeFieldHandle: // get a MethodBase for String.EndsWith(string) ldtoken method instance bool [mscorlib]System.String::EndsWith(string) call class [mscorlib]System.Reflection.MethodBase [mscorlib]System.Reflection.MethodBase::GetMethodFromHandle( valuetype [mscorlib]System.RuntimeMethodHandle) // get a FieldInfo for the String.Empty field ldtoken field string [mscorlib]System.String::Empty call class [mscorlib]System.Reflection.FieldInfo [mscorlib]System.Reflection.FieldInfo::GetFieldFromHandle( valuetype [mscorlib]System.RuntimeFieldHandle) These usages of ldtoken aren't usable from C# or VB, and aren't likely to be added anytime soon (Eric Lippert's done a blog post on the possibility of adding infoof, methodof or fieldof operators to C#). However, PostSharp deals directly with IL, and so can use ldtoken to get MethodBase objects quickly and cheaply, without having to resort to string lookups. The kicker However, there are problems. Because ldtoken for methods or fields isn't accessible from C# or VB, it hasn't been as well-tested as ldtoken for types. This has resulted in various obscure bugs in most versions of the CLR when dealing with ldtoken and methods, and specifically, generic methods and methods of generic types. This means that PostSharp was behaving incorrectly, or just plain crashing, when aspects were applied to methods that were generic in some way. So, PostSharp has to work around this. Without using the metadata tokens directly, the only way to get the MethodBase of generic methods is to use reflection: Type.GetMethod(), passing in the method name as a string along with information on the signature. Now, this works fine. It's slower than using ldtoken directly, but it works, and this only has to be done for generic methods. Unfortunately, this poses problems when the assembly is obfuscated. PostSharp and Obfuscation When using ldtoken, obfuscators don't affect how PostSharp operates. Because the ldtoken instruction directly references the type, method or field within the assembly, it is unaffected if the name of the object is changed by an obfuscator. However, the indirect loading used for generic methods was breaking, because that uses the name of the method when the assembly is put through the PostSharp postprocessor to lookup the MethodBase at runtime. If the name then changes, PostSharp can't find it anymore, and the assembly breaks. So, PostSharp needs to know about any changes an obfuscator does to an assembly. The way PostSharp does this is by adding another layer of indirection. When PostSharp obfuscation support is enabled, it includes an extra 'name table' resource in the assembly, consisting of a series of method & type names. When PostSharp needs to lookup a method using reflection, instead of encoding the method name directly, it looks up the method name at a fixed offset inside that name table: MethodBase genericMethod = typeof(ContainingClass).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: get_Prop1 21: set_Prop1 22: DoFoo 23: GetWibble When the assembly is later processed by an obfuscator, the obfuscator can replace all the method and type names within the name table with their new name. That way, the reflection lookups performed by PostSharp will now use the new names, and everything will work as expected: MethodBase genericMethod = typeof(#kGy).GetMethod(GetNameAtIndex(22)); PostSharp.NameTable resource: ... 20: #kkA 21: #zAb 22: #EF5a 23: #2tg As you can see, this requires direct support by an obfuscator in order to perform these rewrites. Dotfuscator supports it, and now, starting with SmartAssembly 6.6.4, SmartAssembly does too. So, a relatively simple solution to a tricky problem, with some CLR bugs thrown in for good measure. You don't see those every day!

    Read the article

  • .NET Security Part 4

    - by Simon Cooper
    Finally, in this series, I am going to cover some of the security issues that can trip you up when using sandboxed appdomains. DISCLAIMER: I am not a security expert, and this is by no means an exhaustive list. If you actually are writing security-critical code, then get a proper security audit of your code by a professional. The examples below are just illustrations of the sort of things that can go wrong. 1. AppDomainSetup.ApplicationBase The most obvious one is the issue covered in the MSDN documentation on creating a sandbox, in step 3 – the sandboxed appdomain has the same ApplicationBase as the controlling appdomain. So let’s explore what happens when they are the same, and an exception is thrown. In the sandboxed assembly, Sandboxed.dll (IPlugin is an interface in a partially-trusted assembly, with a single MethodToDoThings on it): public class UntrustedPlugin : MarshalByRefObject, IPlugin { // implements IPlugin.MethodToDoThings() public void MethodToDoThings() { throw new EvilException(); } } [Serializable] internal class EvilException : Exception { public override string ToString() { // show we have read access to C:\Windows // read the first 5 directories Console.WriteLine("Pwned! Mwuahahah!"); foreach (var d in Directory.EnumerateDirectories(@"C:\Windows").Take(5)) { Console.WriteLine(d.FullName); } return base.ToString(); } } And in the controlling assembly: // what can possibly go wrong? AppDomainSetup appDomainSetup = new AppDomainSetup { ApplicationBase = AppDomain.CurrentDomain.SetupInformation.ApplicationBase } // only grant permissions to execute // and to read the application base, nothing else PermissionSet restrictedPerms = new PermissionSet(PermissionState.None); restrictedPerms.AddPermission( new SecurityPermission(SecurityPermissionFlag.Execution)); restrictedPerms.AddPermission( new FileIOPermission(FileIOPermissionAccess.Read, appDomainSetup.ApplicationBase); restrictedPerms.AddPermission( new FileIOPermission(FileIOPermissionAccess.pathDiscovery, appDomainSetup.ApplicationBase); // create the sandbox AppDomain sandbox = AppDomain.CreateDomain("Sandbox", null, appDomainSetup, restrictedPerms); // execute UntrustedPlugin in the sandbox // don't crash the application if the sandbox throws an exception IPlugin o = (IPlugin)sandbox.CreateInstanceFromAndUnwrap("Sandboxed.dll", "UntrustedPlugin"); try { o.MethodToDoThings() } catch (Exception e) { Console.WriteLine(e.ToString()); } And the result? Oops. We’ve allowed a class that should be sandboxed to execute code with fully-trusted permissions! How did this happen? Well, the key is the exact meaning of the ApplicationBase property: The application base directory is where the assembly manager begins probing for assemblies. When EvilException is thrown, it propagates from the sandboxed appdomain into the controlling assembly’s appdomain (as it’s marked as Serializable). When the exception is deserialized, the CLR finds and loads the sandboxed dll into the fully-trusted appdomain. Since the controlling appdomain’s ApplicationBase directory contains the sandboxed assembly, the CLR finds and loads the assembly into a full-trust appdomain, and the evil code is executed. So the problem isn’t exactly that the sandboxed appdomain’s ApplicationBase is the same as the controlling appdomain’s, it’s that the sandboxed dll was in such a place that the controlling appdomain could find it as part of the standard assembly resolution mechanism. The sandbox then forced the assembly to load in the controlling appdomain by throwing a serializable exception that propagated outside the sandbox. The easiest fix for this is to keep the sandbox ApplicationBase well away from the ApplicationBase of the controlling appdomain, and don’t allow the sandbox permissions to access the controlling appdomain’s ApplicationBase directory. If you do this, then the sandboxed assembly can’t be accidentally loaded into the fully-trusted appdomain, and the code can’t be executed. If the plugin does try to induce the controlling appdomain to load an assembly it shouldn’t, a SerializationException will be thrown when it tries to load the assembly to deserialize the exception, and no damage will be done. 2. Loading the sandboxed dll into the application appdomain As an extension of the previous point, you shouldn’t directly reference types or methods in the sandboxed dll from your application code. That loads the assembly into the fully-trusted appdomain, and from there code in the assembly could be executed. Instead, pull out methods you want the sandboxed dll to have into an interface or class in a partially-trusted assembly you control, and execute methods via that instead (similar to the example above with the IPlugin interface). If you need to have a look at the assembly before executing it in the sandbox, either examine the assembly using reflection from within the sandbox, or load the assembly into the Reflection-only context in the application’s appdomain. The code in assemblies in the reflection-only context can’t be executed, it can only be reflected upon, thus protecting your appdomain from malicious code. 3. Incorrectly asserting permissions You should only assert permissions when you are absolutely sure they’re safe. For example, this method allows a caller read-access to any file they call this method with, including your documents, any network shares, the C:\Windows directory, etc: [SecuritySafeCritical] public static string GetFileText(string filePath) { new FileIOPermission(FileIOPermissionAccess.Read, filePath).Assert(); return File.ReadAllText(filePath); } Be careful when asserting permissions, and ensure you’re not providing a loophole sandboxed dlls can use to gain access to things they shouldn’t be able to. Conclusion Hopefully, that’s given you an idea of some of the ways it’s possible to get past the .NET security system. As I said before, this post is not exhaustive, and you certainly shouldn’t base any security-critical applications on the contents of this blog post. What this series should help with is understanding the possibilities of the security system, and what all the security attributes and classes mean and what they are used for, if you were to use the security system in the future.

    Read the article

  • Why unhandled exceptions are useful

    - by Simon Cooper
    It’s the bane of most programmers’ lives – an unhandled exception causes your application or webapp to crash, an ugly dialog gets displayed to the user, and they come complaining to you. Then, somehow, you need to figure out what went wrong. Hopefully, you’ve got a log file, or some other way of reporting unhandled exceptions (obligatory employer plug: SmartAssembly reports an application’s unhandled exceptions straight to you, along with the entire state of the stack and variables at that point). If not, you have to try and replicate it yourself, or do some psychic debugging to try and figure out what’s wrong. However, it’s good that the program crashed. Or, more precisely, it is correct behaviour. An unhandled exception in your application means that, somewhere in your code, there is an assumption that you made that is actually invalid. Coding assumptions Let me explain a bit more. Every method, every line of code you write, depends on implicit assumptions that you have made. Take this following simple method, that copies a collection to an array and includes an item if it isn’t in the collection already, using a supplied IEqualityComparer: public static T[] ToArrayWithItem( ICollection<T> coll, T obj, IEqualityComparer<T> comparer) { // check if the object is in collection already // using the supplied comparer foreach (var item in coll) { if (comparer.Equals(item, obj)) { // it's in the collection already // simply copy the collection to an array // and return it T[] array = new T[coll.Count]; coll.CopyTo(array, 0); return array; } } // not in the collection // copy coll to an array, and add obj to it // then return it T[] array = new T[coll.Count+1]; coll.CopyTo(array, 0); array[array.Length-1] = obj; return array; } What’s all the assumptions made by this fairly simple bit of code? coll is never null comparer is never null coll.CopyTo(array, 0) will copy all the items in the collection into the array, in the order defined for the collection, starting at the first item in the array. The enumerator for coll returns all the items in the collection, in the order defined for the collection comparer.Equals returns true if the items are equal (for whatever definition of ‘equal’ the comparer uses), false otherwise comparer.Equals, coll.CopyTo, and the coll enumerator will never throw an exception or hang for any possible input and any possible values of T coll will have less than 4 billion items in it (this is a built-in limit of the CLR) array won’t be more than 2GB, both on 32 and 64-bit systems, for any possible values of T (again, a limit of the CLR) There are no threads that will modify coll while this method is running and, more esoterically: The C# compiler will compile this code to IL according to the C# specification The CLR and JIT compiler will produce machine code to execute the IL on the user’s computer The computer will execute the machine code correctly That’s a lot of assumptions. Now, it could be that all these assumptions are valid for the situations this method is called. But if this does crash out with an exception, or crash later on, then that shows one of the assumptions has been invalidated somehow. An unhandled exception shows that your code is running in a situation which you did not anticipate, and there is something about how your code runs that you do not understand. Debugging the problem is the process of learning more about the new situation and how your code interacts with it. When you understand the problem, the solution is (usually) obvious. The solution may be a one-line fix, the rewrite of a method or class, or a large-scale refactoring of the codebase, but whatever it is, the fix for the crash will incorporate the new information you’ve gained about your own code, along with the modified assumptions. When code is running with an assumption or invariant it depended on broken, then the result is ‘undefined behaviour’. Anything can happen, up to and including formatting the entire disk or making the user’s computer sentient and start doing a good impression of Skynet. You might think that those can’t happen, but at Halting problem levels of generality, as soon as an assumption the code depended on is broken, the program can do anything. That is why it’s important to fail-fast and stop the program as soon as an invariant is broken, to minimise the damage that is done. What does this mean in practice? To start with, document and check your assumptions. As with most things, there is a level of judgement required. How you check and document your assumptions depends on how the code is used (that’s some more assumptions you’ve made), how likely it is a method will be passed invalid arguments or called in an invalid state, how likely it is the assumptions will be broken, how expensive it is to check the assumptions, and how bad things are likely to get if the assumptions are broken. Now, some assumptions you can assume unless proven otherwise. You can safely assume the C# compiler, CLR, and computer all run the method correctly, unless you have evidence of a compiler, CLR or processor bug. You can also assume that interface implementations work the way you expect them to; implementing an interface is more than simply declaring methods with certain signatures in your type. The behaviour of those methods, and how they work, is part of the interface contract as well. For example, for members of a public API, it is very important to document your assumptions and check your state before running the bulk of the method, throwing ArgumentException, ArgumentNullException, InvalidOperationException, or another exception type as appropriate if the input or state is wrong. For internal and private methods, it is less important. If a private method expects collection items in a certain order, then you don’t necessarily need to explicitly check it in code, but you can add comments or documentation specifying what state you expect the collection to be in at a certain point. That way, anyone debugging your code can immediately see what’s wrong if this does ever become an issue. You can also use DEBUG preprocessor blocks and Debug.Assert to document and check your assumptions without incurring a performance hit in release builds. On my coding soapbox… A few pet peeves of mine around assumptions. Firstly, catch-all try blocks: try { ... } catch { } A catch-all hides exceptions generated by broken assumptions, and lets the program carry on in an unknown state. Later, an exception is likely to be generated due to further broken assumptions due to the unknown state, causing difficulties when debugging as the catch-all has hidden the original problem. It’s much better to let the program crash straight away, so you know where the problem is. You should only use a catch-all if you are sure that any exception generated in the try block is safe to ignore. That’s a pretty big ask! Secondly, using as when you should be casting. Doing this: (obj as IFoo).Method(); or this: IFoo foo = obj as IFoo; ... foo.Method(); when you should be doing this: ((IFoo)obj).Method(); or this: IFoo foo = (IFoo)obj; ... foo.Method(); There’s an assumption here that obj will always implement IFoo. If it doesn’t, then by using as instead of a cast you’ve turned an obvious InvalidCastException at the point of the cast that will probably tell you what type obj actually is, into a non-obvious NullReferenceException at some later point that gives you no information at all. If you believe obj is always an IFoo, then say so in code! Let it fail-fast if not, then it’s far easier to figure out what’s wrong. Thirdly, document your assumptions. If an algorithm depends on a non-trivial relationship between several objects or variables, then say so. A single-line comment will do. Don’t leave it up to whoever’s debugging your code after you to figure it out. Conclusion It’s better to crash out and fail-fast when an assumption is broken. If it doesn’t, then there’s likely to be further crashes along the way that hide the original problem. Or, even worse, your program will be running in an undefined state, where anything can happen. Unhandled exceptions aren’t good per-se, but they give you some very useful information about your code that you didn’t know before. And that can only be a good thing.

    Read the article

  • Developing Schema Compare for Oracle (Part 5): Query Snapshots

    - by Simon Cooper
    If you've emailed us about a bug you've encountered with the EAP or beta versions of Schema Compare for Oracle, we probably asked you to send us a query snapshot of your databases. Here, I explain what a query snapshot is, and how it helps us fix your bug. Problem 1: Debugging users' bug reports When we started the Schema Compare project, we knew we were going to get problems with users' databases - configurations we hadn't considered, features that weren't installed, unicode issues, wierd dependencies... With SQL Compare, users are generally happy to send us a database backup that we can restore using a single RESTORE DATABASE command on our test servers and immediately reproduce the problem. Oracle, on the other hand, would be a lot more tricky. As Oracle generally has a 1-to-1 mapping between instances and databases, any databases users sent would have to be restored to their own instance. Furthermore, the number of steps required to get a properly working database, and the size of most oracle databases, made it infeasible to ask every customer who came across a bug during our beta program to send us their databases. We also knew that there would be lots of issues with data security that would make it hard to get backups. So we needed an easier way to be able to debug customers issues and sort out what strange schema data Oracle was returning. Problem 2: Test execution time Another issue we knew we would have to solve was the execution time of the tests we would produce for the Schema Compare engine. Our initial prototype showed that querying the data dictionary for schema information was going to be slow (at least 15 seconds per database), and this is generally proportional to the size of the database. If you're running thousands of tests on the same databases, each one registering separate schemas, not only would the tests would take hours and hours to run, but the test servers would be hammered senseless. The solution To solve these, we needed to be able to populate the schema of a database without actually connecting to it. Well, the IDataReader interface is the primary way we read data from an Oracle server. The data dictionary queries we use return their data in terms of simple strings and numbers, which we then process and reconstruct into an object model, and the results of these queries are identical for identical schemas. So, we can record the raw results of the queries once, and then replay these results to construct the same object model as many times as required without needing to actually connect to the original database. This is what query snapshots do. They are binary files containing the raw unprocessed data we get back from the oracle server for all the queries we run on the data dictionary to get schema information. The core of the query snapshot generation takes the results of the IDataReader we get from running queries on Oracle, and passes the row data to a BinaryWriter that writes it straight to a file. The query snapshot can then be replayed to create the same object model; when the results of a specific query is needed by the population code, we can simply read the binary data stored in the file on disk and present it through an IDataReader wrapper. This is far faster than querying the server over the network, and allows us to run tests in a reasonable time. They also allow us to easily debug a customers problem; using a simple snapshot generation program, users can generate a query snapshot that could be sent along with a bug report that we can immediately replay on our machines to let us debug the issue, rather than having to obtain database backups and restore databases to test systems. There are also far fewer problems with data security; query snapshots only contain schema information, which is generally less sensitive than table data. Query snapshots implementation However, actually implementing such a feature did have a couple of 'gotchas' to it. My second blog post detailed the development of the dependencies algorithm we use to ensure we get all the dependencies in the database, and that algorithm uses data from both databases to find all the needed objects - what database you're comparing to affects what objects get populated from both databases. We get information on these additional objects using an appropriate WHERE clause on all the population queries. So, in order to accurately replay the results of querying the live database, the query snapshot needs to be a snapshot of a comparison of two databases, not just populating a single database. Furthermore, although the code population queries (eg querying all_tab_cols to get column information) can simply be passed straight from the IDataReader to the BinaryWriter, we need to hook into and run the live dependencies algorithm while we're creating the snapshot to ensure we get the same WHERE clauses, and the same query results, as if we were populating straight from a live system. We also need to store the results of the dependencies queries themselves, as the resulting dependency graph is stored within the OracleDatabase object that is produced, and is later used to help order actions in synchronization scripts. This is significantly helped by the dependencies algorithm being a deterministic algorithm - given the same input, it will always return the same output. Therefore, when we're replaying a query snapshot, and processing dependency information, we simply have to return the results of the queries in the order we got them from the live database, rather than trying to calculate the contents of all_dependencies on the fly. Query snapshots are a significant feature in Schema Compare that really helps us to debug problems with the tool, as well as making our testers happier. Although not really user-visible, they are very useful to the development team to help us fix bugs in the product much faster than we otherwise would be able to.

    Read the article

  • Subterranean IL: Exception handling 1

    - by Simon Cooper
    Today, I'll be starting a look at the Structured Exception Handling mechanism within the CLR. Exception handling is quite a complicated business, and, as a result, the rules governing exception handling clauses in IL are quite strict; you need to be careful when writing exception clauses in IL. Exception handlers Exception handlers are specified using a .try clause within a method definition. .try <TryStartLabel> to <TryEndLabel> <HandlerType> handler <HandlerStartLabel> to <HandlerEndLabel> As an example, a basic try/catch block would be specified like so: TryBlockStart: // ... leave.s CatchBlockEndTryBlockEnd:CatchBlockStart: // at the start of a catch block, the exception thrown is on the stack callvirt instance string [mscorlib]System.Object::ToString() call void [mscorlib]System.Console::WriteLine(string) leave.s CatchBlockEnd CatchBlockEnd: // method code continues... .try TryBlockStart to TryBlockEnd catch [mscorlib]System.Exception handler CatchBlockStart to CatchBlockEnd There are four different types of handler that can be specified: catch <TypeToken> This is the standard exception catch clause; you specify the object type that you want to catch (for example, [mscorlib]System.ArgumentException). Any object can be thrown as an exception, although Microsoft recommend that only classes derived from System.Exception are thrown as exceptions. filter <FilterLabel> A filter block allows you to provide custom logic to determine if a handler block should be run. This functionality is exposed in VB, but not in C#. finally A finally block executes when the try block exits, regardless of whether an exception was thrown or not. fault This is similar to a finally block, but a fault block executes only if an exception was thrown. This is not exposed in VB or C#. You can specify multiple catch or filter handling blocks in each .try, but fault and finally handlers must have their own .try clause. We'll look into why this is in later posts. Scoped exception handlers The .try syntax is quite tricky to use; it requires multiple labels, and you've got to be careful to keep separate the different exception handling sections. However, starting from .NET 2, IL allows you to use scope blocks to specify exception handlers instead. Using this syntax, the example above can be written like so: .try { // ... leave.s EndSEH}catch [mscorlib]System.Exception { callvirt instance string [mscorlib]System.Object::ToString() call void [mscorlib]System.Console::WriteLine(string) leave.s EndSEH}EndSEH:// method code continues... As you can see, this is much easier to write (and read!) than a stand-alone .try clause. Next time, I'll be looking at some of the restrictions imposed by SEH on control flow, and how the C# compiler generated exception handling clauses.

    Read the article

  • Subterranean IL: The ThreadLocal type

    - by Simon Cooper
    I came across ThreadLocal<T> while I was researching ConcurrentBag. To look at it, it doesn't really make much sense. What's all those extra Cn classes doing in there? Why is there a GenericHolder<T,U,V,W> class? What's going on? However, digging deeper, it's a rather ingenious solution to a tricky problem. Thread statics Declaring that a variable is thread static, that is, values assigned and read from the field is specific to the thread doing the reading, is quite easy in .NET: [ThreadStatic] private static string s_ThreadStaticField; ThreadStaticAttribute is not a pseudo-custom attribute; it is compiled as a normal attribute, but the CLR has in-built magic, activated by that attribute, to redirect accesses to the field based on the executing thread's identity. TheadStaticAttribute provides a simple solution when you want to use a single field as thread-static. What if you want to create an arbitary number of thread static variables at runtime? Thread-static fields can only be declared, and are fixed, at compile time. Prior to .NET 4, you only had one solution - thread local data slots. This is a lesser-known function of Thread that has existed since .NET 1.1: LocalDataStoreSlot threadSlot = Thread.AllocateNamedDataSlot("slot1"); string value = "foo"; Thread.SetData(threadSlot, value); string gettedValue = (string)Thread.GetData(threadSlot); Each instance of LocalStoreDataSlot mediates access to a single slot, and each slot acts like a separate thread-static field. As you can see, using thread data slots is quite cumbersome. You need to keep track of LocalDataStoreSlot objects, it's not obvious how instances of LocalDataStoreSlot correspond to individual thread-static variables, and it's not type safe. It's also relatively slow and complicated; the internal implementation consists of a whole series of classes hanging off a single thread-static field in Thread itself, using various arrays, lists, and locks for synchronization. ThreadLocal<T> is far simpler and easier to use. ThreadLocal ThreadLocal provides an abstraction around thread-static fields that allows it to be used just like any other class; it can be used as a replacement for a thread-static field, it can be used in a List<ThreadLocal<T>>, you can create as many as you need at runtime. So what does it do? It can't just have an instance-specific thread-static field, because thread-static fields have to be declared as static, and so shared between all instances of the declaring type. There's something else going on here. The values stored in instances of ThreadLocal<T> are stored in instantiations of the GenericHolder<T,U,V,W> class, which contains a single ThreadStatic field (s_value) to store the actual value. This class is then instantiated with various combinations of the Cn types for generic arguments. In .NET, each separate instantiation of a generic type has its own static state. For example, GenericHolder<int,C0,C1,C2> has a completely separate s_value field to GenericHolder<int,C1,C14,C1>. This feature is (ab)used by ThreadLocal to emulate instance thread-static fields. Every time an instance of ThreadLocal is constructed, it is assigned a unique number from the static s_currentTypeId field using Interlocked.Increment, in the FindNextTypeIndex method. The hexadecimal representation of that number then defines the specific Cn types that instantiates the GenericHolder class. That instantiation is therefore 'owned' by that instance of ThreadLocal. This gives each instance of ThreadLocal its own ThreadStatic field through a specific unique instantiation of the GenericHolder class. Although GenericHolder has four type variables, the first one is always instantiated to the type stored in the ThreadLocal<T>. This gives three free type variables, each of which can be instantiated to one of 16 types (C0 to C15). This puts an upper limit of 4096 (163) on the number of ThreadLocal<T> instances that can be created for each value of T. That is, there can be a maximum of 4096 instances of ThreadLocal<string>, and separately a maximum of 4096 instances of ThreadLocal<object>, etc. However, there is an upper limit of 16384 enforced on the total number of ThreadLocal instances in the AppDomain. This is to stop too much memory being used by thousands of instantiations of GenericHolder<T,U,V,W>, as once a type is loaded into an AppDomain it cannot be unloaded, and will continue to sit there taking up memory until the AppDomain is unloaded. The total number of ThreadLocal instances created is tracked by the ThreadLocalGlobalCounter class. So what happens when either limit is reached? Firstly, to try and stop this limit being reached, it recycles GenericHolder type indexes of ThreadLocal instances that get disposed using the s_availableIndices concurrent stack. This allows GenericHolder instantiations of disposed ThreadLocal instances to be re-used. But if there aren't any available instantiations, then ThreadLocal falls back on a standard thread local slot using TLSHolder. This makes it very important to dispose of your ThreadLocal instances if you'll be using lots of them, so the type instantiations can be recycled. The previous way of creating arbitary thread-static variables, thread data slots, was slow, clunky, and hard to use. In comparison, ThreadLocal can be used just like any other type, and each instance appears from the outside to be a non-static thread-static variable. It does this by using the CLR type system to assign each instance of ThreadLocal its own instantiated type containing a thread-static field, and so delegating a lot of the bookkeeping that thread data slots had to do to the CLR type system itself! That's a very clever use of the CLR type system.

    Read the article

  • Inside the DLR – Invoking methods

    - by Simon Cooper
    So, we’ve looked at how a dynamic call is represented in a compiled assembly, and how the dynamic lookup is performed at runtime. The last piece of the puzzle is how the resolved method gets invoked, and that is the subject of this post. Invoking methods As discussed in my previous posts, doing a full lookup and bind at runtime each and every single time the callsite gets invoked would be far too slow to be usable. The results obtained from the callsite binder must to be cached, along with a series of conditions to determine whether the cached result can be reused. So, firstly, how are the conditions represented? These conditions can be anything; they are determined entirely by the semantics of the language the binder is representing. The binder has to be able to return arbitary code that is then executed to determine whether the conditions apply or not. Fortunately, .NET 4 has a neat way of representing arbitary code that can be easily combined with other code – expression trees. All the callsite binder has to return is an expression (called a ‘restriction’) that evaluates to a boolean, returning true when the restriction passes (indicating the corresponding method invocation can be used) and false when it does’t. If the bind result is also represented in an expression tree, these can be combined easily like so: if ([restriction is true]) { [invoke cached method] } Take my example from my previous post: public class ClassA { public static void TestDynamic() { CallDynamic(new ClassA(), 10); CallDynamic(new ClassA(), "foo"); } public static void CallDynamic(dynamic d, object o) { d.Method(o); } public void Method(int i) {} public void Method(string s) {} } When the Method(int) method is first bound, along with an expression representing the result of the bind lookup, the C# binder will return the restrictions under which that bind can be reused. In this case, it can be reused if the types of the parameters are the same: if (thisArg.GetType() == typeof(ClassA) && arg1.GetType() == typeof(int)) { thisClassA.Method(i); } Caching callsite results So, now, it’s up to the callsite to link these expressions returned from the binder together in such a way that it can determine which one from the many it has cached it should use. This caching logic is all located in the System.Dynamic.UpdateDelegates class. It’ll help if you’ve got this type open in a decompiler to have a look yourself. For each callsite, there are 3 layers of caching involved: The last method invoked on the callsite. All methods that have ever been invoked on the callsite. All methods that have ever been invoked on any callsite of the same type. We’ll cover each of these layers in order Level 1 cache: the last method called on the callsite When a CallSite<T> object is first instantiated, the Target delegate field (containing the delegate that is called when the callsite is invoked) is set to one of the UpdateAndExecute generic methods in UpdateDelegates, corresponding to the number of parameters to the callsite, and the existance of any return value. These methods contain most of the caching, invoke, and binding logic for the callsite. The first time this method is invoked, the UpdateAndExecute method finds there aren’t any entries in the caches to reuse, and invokes the binder to resolve a new method. Once the callsite has the result from the binder, along with any restrictions, it stitches some extra expressions in, and replaces the Target field in the callsite with a compiled expression tree similar to this (in this example I’m assuming there’s no return value): if ([restriction is true]) { [invoke cached method] return; } if (callSite._match) { _match = false; return; } else { UpdateAndExecute(callSite, arg0, arg1, ...); } Woah. What’s going on here? Well, this resulting expression tree is actually the first level of caching. The Target field in the callsite, which contains the delegate to call when the callsite is invoked, is set to the above code compiled from the expression tree into IL, and then into native code by the JIT. This code checks whether the restrictions of the last method that was invoked on the callsite (the ‘primary’ method) match, and if so, executes that method straight away. This means that, the next time the callsite is invoked, the first code that executes is the restriction check, executing as native code! This makes this restriction check on the primary cached delegate very fast. But what if the restrictions don’t match? In that case, the second part of the stitched expression tree is executed. What this section should be doing is calling back into the UpdateAndExecute method again to resolve a new method. But it’s slightly more complicated than that. To understand why, we need to understand the second and third level caches. Level 2 cache: all methods that have ever been invoked on the callsite When a binder has returned the result of a lookup, as well as updating the Target field with a compiled expression tree, stitched together as above, the callsite puts the same compiled expression tree in an internal list of delegates, called the rules list. This list acts as the level 2 cache. Why use the same delegate? Stitching together expression trees is an expensive operation. You don’t want to do it every time the callsite is invoked. Ideally, you would create one expression tree from the binder’s result, compile it, and then use the resulting delegate everywhere in the callsite. But, if the same delegate is used to invoke the callsite in the first place, and in the caches, that means each delegate needs two modes of operation. An ‘invoke’ mode, for when the delegate is set as the value of the Target field, and a ‘match’ mode, used when UpdateAndExecute is searching for a method in the callsite’s cache. Only in the invoke mode would the delegate call back into UpdateAndExecute. In match mode, it would simply return without doing anything. This mode is controlled by the _match field in CallSite<T>. The first time the callsite is invoked, _match is false, and so the Target delegate is called in invoke mode. Then, if the initial restriction check fails, the Target delegate calls back into UpdateAndExecute. This method sets _match to true, then calls all the cached delegates in the rules list in match mode to try and find one that passes its restrictions, and invokes it. However, there needs to be some way for each cached delegate to inform UpdateAndExecute whether it passed its restrictions or not. To do this, as you can see above, it simply re-uses _match, and sets it to false if it did not pass the restrictions. This allows the code within each UpdateAndExecute method to check for cache matches like so: foreach (T cachedDelegate in Rules) { callSite._match = true; cachedDelegate(); // sets _match to false if restrictions do not pass if (callSite._match) { // passed restrictions, and the cached method was invoked // set this delegate as the primary target to invoke next time callSite.Target = cachedDelegate; return; } // no luck, try the next one... } Level 3 cache: all methods that have ever been invoked on any callsite with the same signature The reason for this cache should be clear – if a method has been invoked through a callsite in one place, then it is likely to be invoked on other callsites in the codebase with the same signature. Rather than living in the callsite, the ‘global’ cache for callsite delegates lives in the CallSiteBinder class, in the Cache field. This is a dictionary, typed on the callsite delegate signature, providing a RuleCache<T> instance for each delegate signature. This is accessed in the same way as the level 2 callsite cache, by the UpdateAndExecute methods. When a method is matched in the global cache, it is copied into the callsite and Target cache before being executed. Putting it all together So, how does this all fit together? Like so (I’ve omitted some implementation & performance details): That, in essence, is how the DLR performs its dynamic calls nearly as fast as statically compiled IL code. Extensive use of expression trees, compiled to IL and then into native code. Multiple levels of caching, the first of which executes immediately when the dynamic callsite is invoked. And a clever re-use of compiled expression trees that can be used in completely different contexts without being recompiled. All in all, a very fast and very clever reflection caching mechanism.

    Read the article

  • Why enumerator structs are a really bad idea (redux)

    - by Simon Cooper
    My previous blog post went into some detail as to why calling MoveNext on a BCL generic collection enumerator didn't quite do what you thought it would. This post covers the Reset method. To recap, here's the simple wrapper around a linked list enumerator struct from my previous post (minus the readonly on the enumerator variable): sealed class EnumeratorWrapper : IEnumerator<int> { private LinkedList<int>.Enumerator m_Enumerator; public EnumeratorWrapper(LinkedList<int> linkedList) { m_Enumerator = linkedList.GetEnumerator(); } public int Current { get { return m_Enumerator.Current; } } object System.Collections.IEnumerator.Current { get { return Current; } } public bool MoveNext() { return m_Enumerator.MoveNext(); } public void Reset() { ((System.Collections.IEnumerator)m_Enumerator).Reset(); } public void Dispose() { m_Enumerator.Dispose(); } } If you have a look at the Reset method, you'll notice I'm having to cast to IEnumerator to be able to call Reset on m_Enumerator. This is because the implementation of LinkedList<int>.Enumerator.Reset, and indeed of all the other Reset methods on the BCL generic collection enumerators, is an explicit interface implementation. However, IEnumerator is a reference type. LinkedList<int>.Enumerator is a value type. That means, in order to call the reset method at all, the enumerator has to be boxed. And the IL confirms this: .method public hidebysig newslot virtual final instance void Reset() cil managed { .maxstack 8 L_0000: nop L_0001: ldarg.0 L_0002: ldfld valuetype [System]System.Collections.Generic.LinkedList`1/Enumerator<int32> EnumeratorWrapper::m_Enumerator L_0007: box [System]System.Collections.Generic.LinkedList`1/Enumerator<int32> L_000c: callvirt instance void [mscorlib]System.Collections.IEnumerator::Reset() L_0011: nop L_0012: ret } On line 0007, we're doing a box operation, which copies the enumerator to a reference object on the heap, then on line 000c calling Reset on this boxed object. So m_Enumerator in the wrapper class is not modified by the call the Reset. And this is the only way to call the Reset method on this variable (without using reflection). Therefore, the only way that the collection enumerator struct can be used safely is to store them as a boxed IEnumerator<T>, and not use them as value types at all.

    Read the article

  • Oh no! My padding's invalid!

    - by Simon Cooper
    Recently, I've been doing some work involving cryptography, and encountered the standard .NET CryptographicException: 'Padding is invalid and cannot be removed.' Searching on StackOverflow produces 57 questions concerning this exception; it's a very common problem encountered. So I decided to have a closer look. To test this, I created a simple project that decrypts and encrypts a byte array: // create some random data byte[] data = new byte[100]; new Random().NextBytes(data); // use the Rijndael symmetric algorithm RijndaelManaged rij = new RijndaelManaged(); byte[] encrypted; // encrypt the data using a CryptoStream using (var encryptor = rij.CreateEncryptor()) using (MemoryStream encryptedStream = new MemoryStream()) using (CryptoStream crypto = new CryptoStream( encryptedStream, encryptor, CryptoStreamMode.Write)) { crypto.Write(data, 0, data.Length); encrypted = encryptedStream.ToArray(); } byte[] decrypted; // and decrypt it again using (var decryptor = rij.CreateDecryptor()) using (CryptoStream crypto = new CryptoStream( new MemoryStream(encrypted), decryptor, CryptoStreamMode.Read)) { byte[] decrypted = new byte[data.Length]; crypto.Read(decrypted, 0, decrypted.Length); } Sure enough, I got exactly the same CryptographicException when trying to decrypt the data even in this simple example. Well, I'm obviously missing something, if I can't even get this single method right! What does the exception message actually mean? What am I missing? Well, after playing around a bit, I discovered the problem was fixed by changing the encryption step to this: // encrypt the data using a CryptoStream using (var encryptor = rij.CreateEncryptor()) using (MemoryStream encryptedStream = new MemoryStream()) { using (CryptoStream crypto = new CryptoStream( encryptedStream, encryptor, CryptoStreamMode.Write)) { crypto.Write(data, 0, data.Length); } encrypted = encryptedStream.ToArray(); } Aaaah, so that's what the problem was. The CryptoStream wasn't flushing all it's data to the MemoryStream before it was being read, and closing the stream causes it to flush everything to the backing stream. But why does this cause an error in padding? Cryptographic padding All symmetric encryption algorithms (of which Rijndael is one) operates on fixed block sizes. For Rijndael, the default block size is 16 bytes. This means the input needs to be a multiple of 16 bytes long. If it isn't, then the input is padded to 16 bytes using one of the padding modes. This is only done to the final block of data to be encrypted. CryptoStream has a special method to flush this final block of data - FlushFinalBlock. Calling Stream.Flush() does not flush the final block, as you might expect. Only by closing the stream or explicitly calling FlushFinalBlock is the final block, with any padding, encrypted and written to the backing stream. Without this call, the encrypted data is 16 bytes shorter than it should be. If this final block wasn't written, then the decryption gets to the final 16 bytes of the encrypted data and tries to decrypt it as the final block with padding. The end bytes don't match the padding scheme it's been told to use, therefore it throws an exception stating what is wrong - what the decryptor expects to be padding actually isn't, and so can't be removed from the stream. So, as well as closing the stream before reading the result, an alternative fix to my encryption code is the following: // encrypt the data using a CryptoStream using (var encryptor = rij.CreateEncryptor()) using (MemoryStream encryptedStream = new MemoryStream()) using (CryptoStream crypto = new CryptoStream( encryptedStream, encryptor, CryptoStreamMode.Write)) { crypto.Write(data, 0, data.Length); // explicitly flush the final block of data crypto.FlushFinalBlock(); encrypted = encryptedStream.ToArray(); } Conclusion So, if your padding is invalid, make sure that you close or call FlushFinalBlock on any CryptoStream performing encryption before you access the encrypted data. Flush isn't enough. Only then will the final block be present in the encrypted data, allowing it to be decrypted successfully.

    Read the article

  • Headaches using distributed version control for traditional teams?

    - by J Cooper
    Though I use and like DVCS for my personal projects, and can totally see how it makes managing contributions to your project from others easier (e.g. your typical Github scenario), it seems like for a "traditional" team there could be some problems over the centralized approach employed by solutions like TFS, Perforce, etc. (By "traditional" I mean a team of developers in an office working on one project that no one person "owns", with potentially everyone touching the same code.) A couple of these problems I've foreseen on my own, but please chime in with other considerations. In a traditional system, when you try to check your change in to the server, if someone else has previously checked in a conflicting change then you are forced to merge before you can check yours in. In the DVCS model, each developer checks in their changes locally and at some point pushes to some other repo. That repo then has a branch of that file that 2 people changed. It seems that now someone must be put in charge of dealing with that situation. A designated person on the team might not have sufficient knowledge of the entire codebase to be able to handle merging all conflicts. So now an extra step has been added where someone has to approach one of those developers, tell him to pull and do the merge and then push again (or you have to build an infrastructure that automates that task). Furthermore, since DVCS tends to make working locally so convenient, it is probable that developers could accumulate a few changes in their local repos before pushing, making such conflicts more common and more complicated. Obviously if everyone on the team only works on different areas of the code, this isn't an issue. But I'm curious about the case where everyone is working on the same code. It seems like the centralized model forces conflicts to be dealt with quickly and frequently, minimizing the need to do large, painful merges or have anyone "police" the main repo. So for those of you who do use a DVCS with your team in your office, how do you handle such cases? Do you find your daily (or more likely, weekly) workflow affected negatively? Are there any other considerations I should be aware of before recommending a DVCS at my workplace?

    Read the article

  • Developing Schema Compare for Oracle (Part 6): 9i Query Performance

    - by Simon Cooper
    All throughout the EAP and beta versions of Schema Compare for Oracle, our main request was support for Oracle 9i. After releasing version 1.0 with support for 10g and 11g, our next step was then to get version 1.1 of SCfO out with support for 9i. However, there were some significant problems that we had to overcome first. This post will concentrate on query execution time. When we first tested SCfO on a 9i server, after accounting for various changes to the data dictionary, we found that database registration was taking a long time. And I mean a looooooong time. The same database that on 10g or 11g would take a couple of minutes to register would be taking upwards of 30 mins on 9i. Obviously, this is not ideal, so a poke around the query execution plans was required. As an example, let's take the table population query - the one that reads ALL_TABLES and joins it with a few other dictionary views to get us back our list of tables. On 10g, this query takes 5.6 seconds. On 9i, it takes 89.47 seconds. The difference in execution plan is even more dramatic - here's the (edited) execution plan on 10g: -------------------------------------------------------------------------------| Id | Operation | Name | Bytes | Cost |-------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 108K| 939 || 1 | SORT ORDER BY | | 108K| 939 || 2 | NESTED LOOPS OUTER | | 108K| 938 ||* 3 | HASH JOIN RIGHT OUTER | | 103K| 762 || 4 | VIEW | ALL_EXTERNAL_LOCATIONS | 2058 | 3 ||* 20 | HASH JOIN RIGHT OUTER | | 73472 | 759 || 21 | VIEW | ALL_EXTERNAL_TABLES | 2097 | 3 ||* 34 | HASH JOIN RIGHT OUTER | | 39920 | 755 || 35 | VIEW | ALL_MVIEWS | 51 | 7 || 58 | NESTED LOOPS OUTER | | 39104 | 748 || 59 | VIEW | ALL_TABLES | 6704 | 668 || 89 | VIEW PUSHED PREDICATE | ALL_TAB_COMMENTS | 2025 | 5 || 106 | VIEW | ALL_PART_TABLES | 277 | 11 |------------------------------------------------------------------------------- And the same query on 9i: -------------------------------------------------------------------------------| Id | Operation | Name | Bytes | Cost |-------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 16P| 55G|| 1 | SORT ORDER BY | | 16P| 55G|| 2 | NESTED LOOPS OUTER | | 16P| 862M|| 3 | NESTED LOOPS OUTER | | 5251G| 992K|| 4 | NESTED LOOPS OUTER | | 4243M| 2578 || 5 | NESTED LOOPS OUTER | | 2669K| 1440 ||* 6 | HASH JOIN OUTER | | 398K| 302 || 7 | VIEW | ALL_TABLES | 342K| 276 || 29 | VIEW | ALL_MVIEWS | 51 | 20 ||* 50 | VIEW PUSHED PREDICATE | ALL_TAB_COMMENTS | 2043 | ||* 66 | VIEW PUSHED PREDICATE | ALL_EXTERNAL_TABLES | 1777K| ||* 80 | VIEW PUSHED PREDICATE | ALL_EXTERNAL_LOCATIONS | 1744K| ||* 96 | VIEW | ALL_PART_TABLES | 852K| |------------------------------------------------------------------------------- Have a look at the cost column. 10g's overall query cost is 939, and 9i is 55,000,000,000 (or more precisely, 55,496,472,769). It's also having to process far more data. What on earth could be causing this huge difference in query cost? After trawling through the '10g New Features' documentation, we found item 1.9.2.21. Before 10g, Oracle advised that you do not collect statistics on data dictionary objects. From 10g, it advised that you do collect statistics on the data dictionary; for our queries, Oracle therefore knows what sort of data is in the dictionary tables, and so can generate an efficient execution plan. On 9i, no statistics are present on the system tables, so Oracle has to use the Rule Based Optimizer, which turns most LEFT JOINs into nested loops. If we force 9i to use hash joins, like 10g, we get a much better plan: -------------------------------------------------------------------------------| Id | Operation | Name | Bytes | Cost |-------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 7587K| 3704 || 1 | SORT ORDER BY | | 7587K| 3704 ||* 2 | HASH JOIN OUTER | | 7587K| 822 ||* 3 | HASH JOIN OUTER | | 5262K| 616 ||* 4 | HASH JOIN OUTER | | 2980K| 465 ||* 5 | HASH JOIN OUTER | | 710K| 432 ||* 6 | HASH JOIN OUTER | | 398K| 302 || 7 | VIEW | ALL_TABLES | 342K| 276 || 29 | VIEW | ALL_MVIEWS | 51 | 20 || 50 | VIEW | ALL_PART_TABLES | 852K| 104 || 78 | VIEW | ALL_TAB_COMMENTS | 2043 | 14 || 93 | VIEW | ALL_EXTERNAL_LOCATIONS | 1744K| 31 || 106 | VIEW | ALL_EXTERNAL_TABLES | 1777K| 28 |------------------------------------------------------------------------------- That's much more like it. This drops the execution time down to 24 seconds. Not as good as 10g, but still an improvement. There are still several problems with this, however. 10g introduced a new join method - a right outer hash join (used in the first execution plan). The 9i query optimizer doesn't have this option available, so forcing a hash join means it has to hash the ALL_TABLES table, and furthermore re-hash it for every hash join in the execution plan; this could be thousands and thousands of rows. And although forcing hash joins somewhat alleviates this problem on our test systems, there's no guarantee that this will improve the execution time on customers' systems; it may even increase the time it takes (say, if all their tables are partitioned, or they've got a lot of materialized views). Ideally, we would want a solution that provides a speedup whatever the input. To try and get some ideas, we asked some oracle performance specialists to see if they had any ideas or tips. Their recommendation was to add a hidden hook into the product that allowed users to specify their own query hints, or even rewrite the queries entirely. However, we would prefer not to take that approach; as well as a lot of new infrastructure & a rewrite of the population code, it would have meant that any users of 9i would have to spend some time optimizing it to get it working on their system before they could use the product. Another approach was needed. All our population queries have a very specific pattern - a base table provides most of the information we need (ALL_TABLES for tables, or ALL_TAB_COLS for columns) and we do a left join to extra subsidiary tables that fill in gaps (for instance, ALL_PART_TABLES for partition information). All the left joins use the same set of columns to join on (typically the object owner & name), so we could re-use the hash information for each join, rather than re-hashing the same columns for every join. To allow us to do this, along with various other performance improvements that could be done for the specific query pattern we were using, we read all the tables individually and do a hash join on the client. Fortunately, this 'pure' algorithmic problem is the kind that can be very well optimized for expected real-world situations; as well as storing row data we're not using in the hash key on disk, we use very specific memory-efficient data structures to store all the information we need. This allows us to achieve a database population time that is as fast as on 10g, and even (in some situations) slightly faster, and a memory overhead of roughly 150 bytes per row of data in the result set (for schemas with 10,000 tables in that means an extra 1.4MB memory being used during population). Next: fun with the 9i dictionary views.

    Read the article

  • What does RESTful web applications mean? [closed]

    - by John Cooper
    Possible Duplicate: What is REST (in simple English) What does RESTful web applications mean? A web service is a function that can be accessed by other programs over the web (Http). To clarify a bit, when you create a website in PHP that outputs HTML its target is the browser and by extension the human being reading the page in the browser. A web service is not targeted at humans but rather at other programs. SOAP and REST are two ways of creating WebServices. Correct me if i am wrong? What are other ways i can create a WebService? What does it mean fully RESTful web Application?

    Read the article

  • Subterranean IL: Exception handling 2

    - by Simon Cooper
    Control flow in and around exception handlers is tightly controlled, due to the various ways the handler blocks can be executed. To start off with, I'll describe what SEH does when an exception is thrown. Handling exceptions When an exception is thrown, the CLR stops program execution at the throw statement and searches up the call stack looking for an appropriate handler; catch clauses are analyzed, and filter blocks are executed (I'll be looking at filter blocks in a later post). Then, when an appropriate catch or filter handler is found, the stack is unwound to that handler, executing successive finally and fault handlers in their own stack contexts along the way, and program execution continues at the start of the catch handler. Because catch, fault, finally and filter blocks can be executed essentially out of the blue by the SEH mechanism, without any reference to preceding instructions, you can't use arbitary branches in and out of exception handler blocks. Instead, you need to use specific instructions for control flow out of handler blocks: leave, endfinally/endfault, and endfilter. Exception handler control flow try blocks You cannot branch into or out of a try block or its handler using normal control flow instructions. The only way of entering a try block is by either falling through from preceding instructions, or by branching to the first instruction in the block. Once you are inside a try block, you can only leave it by throwing an exception or using the leave <label> instruction to jump to somewhere outside the block and its handler. The leave instructions signals the CLR to execute any finally handlers around the block. Most importantly, you cannot fall out of the block, and you cannot use a ret to return from the containing method (unlike in C#); you have to use leave to branch to a ret elsewhere in the method. As a side effect, leave empties the stack. catch blocks The only way of entering a catch block is if it is run by the SEH. At the start of the block execution, the thrown exception will be the only thing on the stack. The only way of leaving a catch block is to use throw, rethrow, or leave, in a similar way to try blocks. However, one thing you can do is use a leave to branch back to an arbitary place in the handler's try block! In other words, you can do this: .try { // ... newobj instance void [mscorlib]System.Exception::.ctor() throw MidTry: // ... leave.s RestOfMethod } catch [mscorlib]System.Exception { // ... leave.s MidTry } RestOfMethod: // ... As far as I know, this mechanism is not exposed in C# or VB. finally/fault blocks The only way of entering a finally or fault block is via the SEH, either as the result of a leave instruction in the corresponding try block, or as part of handling an exception. The only way to leave a finally or fault block is to use endfinally or endfault (both compile to the same binary representation), which continues execution after the finally/fault block, or, if the block was executed as part of handling an exception, signals that the SEH can continue walking the stack. filter blocks I'll be covering filters in a separate blog posts. They're quite different to the others, and have their own special semantics. Phew! Complicated stuff, but it's important to know if you're writing or outputting exception handlers in IL. Dealing with the C# compiler is probably best saved for the next post.

    Read the article

  • Developing Schema Compare for Oracle (Part 4): Script Configuration

    - by Simon Cooper
    If you've had a chance to play around with the Schema Compare for Oracle beta, you may have come across this screen in the synchronization wizard: This screen is one of the few screens that, along with the project configuration form, doesn't come from SQL Compare. This screen was designed to solve a couple of issues that, although aren't specific to Oracle, are much more of a problem than on SQL Server: Datatype conversions and NOT NULL columns. 1. Datatype conversions SQL Server is generally quite forgiving when it comes to datatype conversions using ALTER TABLE. For example, you can convert from a VARCHAR to INT using ALTER TABLE as long as all the character values are parsable as integers. Oracle, on the other hand, only allows ALTER TABLE conversions that don't change the internal data format. Essentially, every change that requires an actual datatype conversion has to be done using a rebuild with a conversion function. That's OK, as we can simply hard-code the various conversion functions for the valid datatype conversions and insert those into the rebuild SELECT list. However, as there always is with Oracle, there's a catch. Have a look at the NUMTODSINTERVAL function. As well as specifying the value (or column) to convert, you have to specify an interval_unit, which tells oracle how to interpret the input number. We can't hardcode a default for this parameter, as it is entirely dependent on the user's data context! So, in order to convert NUMBER to INTERVAL DAY TO SECOND/INTERVAL YEAR TO MONTH, we need to have feedback from the user as to what to put in this parameter while we're generating the sync script - this requires a new step in the engine action/script generation to insert these values into the script, as well as new UI to allow the user to specify these values in a sensible fashion. In implementing the engine and UI infrastructure to allow this it made much more sense to implement it for any rebuild datatype conversion, not just NUMBER to INTERVALs. For conversions which we can do, we pre-fill the 'value' box with the appropriate function from the documentation. The user can also type in arbitary SQL expressions, which allows the user to specify optional format parameters for the relevant conversion functions, or indeed call their own functions to convert between values that don't have a built-in conversion defined. As the value gets inserted as-is into the rebuild SELECT list, any expression that is valid in that context can be specified as the conversion value. 2. NOT NULL columns Another problem that is solved by the new step in the sync wizard is adding a NOT NULL column to a table. If the table contains data (as most database tables do), you can't just add a NOT NULL column, as Oracle doesn't know what value to put in the new column for existing rows - the DDL statement will fail. There are actually 3 separate scenarios for this problem that have separate solutions within the engine: Adding a NOT NULL column to a table without a rebuild Here, the workaround is to add a column default with an appropriate value to the column you're adding: ALTER TABLE tbl1 ADD newcol NUMBER DEFAULT <value> NOT NULL; Note, however, there is something to bear in mind about this solution; once specified on a column, a default cannot be removed. To 'remove' a default from a column you change it to have a default of NULL, hence there's code in the engine to treat a NULL default the same as no default at all. Adding a NOT NULL column to a table, where a separate change forced a table rebuild Fortunately, in this case, a column default is not required - we can simply insert the default value into the rebuild SELECT clause. Changing an existing NULL to a NOT NULL column To implement this, we run an UPDATE command before the ALTER TABLE to change all the NULLs in the column to the required default value. For all three, we need some way of allowing the user to specify a default value to use instead of NULL; as this is essentially the same problem as datatype conversion (inserting values into the sync script), we can re-use the UI and engine implementation of datatype conversion values. We also provide the option to alter the new column to allow NULLs, or to ignore the problem completely. Note that there is the same (long-running) problem in SQL Compare, but it is much more of an issue in Oracle as you cannot easily roll back executed DDL statements if the script fails at some point during execution. Furthermore, the engine of SQL Compare is far less conducive to inserting user-supplied values into the generated script. As we're writing the Schema Compare engine from scratch, we used what we learnt from the SQL Compare engine and designed it to be far more modular, which makes inserting procedures like this much easier.

    Read the article

  • Handling SMS/email convergence: how does a good business app do it?

    - by Tim Cooper
    I'm writing a school administration software package, but it strikes me that many developers will face this same issue: when communicating with users, should you use email or SMS or both, and should you treat them as fundamentally equivalent channels such that any message can get sent using any media, (with long and short forms of the message template obviously) or should different business functions be specifically tailored to each of the 3? This question got kicked off "StackOverflow" for being overly general, so I'm hoping it's not too general for this site - the answers will no doubt be subjective but "you don't need to write a whole book to answer the question". I'm particularly interested in people who have direct experience of having written comparable business applications. Sub-questions: Do I treat SMS as "moderately secure" and email as less secure? (I'm thinking about booking tokens for parent/teacher nights, permission slips for excursions, absence explanation notes - so high security is not a requirement for us, although medium security is) Is it annoying for users to receive the same message on multiple channels? Should we have a unified framework that reports on delivery or lack thereof of emails and SMS's?

    Read the article

  • .NET vs Windows 8: Rematch!

    - by Simon Cooper
    So, although you will be able to use your existing .NET skills to develop Metro apps, it turns out Microsoft are limiting Visual Studio 2011 Express to Metro-only. From the Express website: Visual Studio 11 Express for Windows 8 provides tools for Metro style app development. To create desktop apps, you need to use Visual Studio 11 Professional, or higher. Oh dear. To develop any sort of non-Metro application, you will need to pay for at least VS Professional. I suspect Microsoft (or at least, certain groups within Microsoft) have a very explicit strategy in mind. By making VS Express Metro-only, developers who don't want to pay for Professional will be forced to make their simple one-shot or open-source application in Metro. This increases the number of applications available for Windows 8 and Windows mobile devices, which in turn make those platforms more attractive for consumers. When you use the free VS 11 Express, instead of paying Microsoft, you provide them a service by making applications for Metro, which in turn makes Microsoft's mobile offering more attractive to consumers, increasing their market share. Of course, it remains to be seen if developers forced to jump onto the Metro bandwagon will simply jump ship to Android or iOS instead. At least, that's what I think is going on. With Microsoft, who really knows?

    Read the article

  • Launch Photography Is a Beautiful Collection of Shuttle Photos

    - by Jason Fitzpatrick
    Photographer Ben Cooper has a soft spot for the Space Shuttles; check out this excellent galleries to see everything from dynamic launch photos to beautiful fish-eye photos of the cockpits. Launch Photography [via Neatorama] How To Create a Customized Windows 7 Installation Disc With Integrated Updates How to Get Pro Features in Windows Home Versions with Third Party Tools HTG Explains: Is ReadyBoost Worth Using?

    Read the article

  • Row Oriented Security Using Triggers

    Handling security in an application can be a bit cumbersome. New author R Glen Cooper brings us a database design technique from the real world that can help you. Free trial of SQL Backup™“SQL Backup was able to cut down my backup time significantly AND achieved a 90% compression at the same time!” Joe Cheng. Download a free trial now.

    Read the article

  • A tale of two (and more) apps

    Robert Cooper gave a great lightning talk at our recent Atlanta GTUG meetup, where he discussed using a single codebase to target multiple mediums (e.g. Android, Facebook, Wave...

    Read the article

  • Get a Totally free Apple iPad Mearly For Trying It!

    If you want to get your hands on the cutting edge Apple iPAD, but you don';t necessarily wish to spend the $500 dollars... There is certainly other alternatives out there. For instance obtaining an Ap... [Author: Tim Cooper - Computers and Internet - April 09, 2010]

    Read the article

  • Problems linking to social networks in Windows 8

    - by Andrew Cooper
    I've upgraded my laptop to Windows 8 (from Windows 7) and I'm having problems with getting information to show in the People and Messaging apps. I've linked my Facebook, Twitter and LinkedIn accounts to my Live Id, and on Windows 7 I was able to see my Friends' facebook activity in Windows Live Messenger. In the Windows 8 People app I can see all my contacts from Facebook, Twitter and LinkedIn, and I can see the on-line status of at least my Facebook contacts. I can also see the profiles details of each contact, but I don't get anything in the "What's New" view. The Messaging app is just blank. I assume I should be able to send messages to my contacts, but I can't see any way to do it. Am I missing something?

    Read the article

  • How to set the preffered network interface in linx

    - by Mike Cooper
    I have my network set up like this. http://docs.google.com/Doc?docid=0AZ1YxuLE4djaZGhqN2s1NmRfMjhjNjc0Ym1meg&hl=en In words: I have a machine (Calcium, running Arch Linux) that has two network interfaces. eth0 is hoooked up to a router, and is gigabit. Eth1 is hooked up directly to the university network over 10Megabit. The router's uplink is hooked up to the university network as well, and it is also 10Megabit. Currently (I believe) all traffic on Calcium is going through eth0, through the router, regardless of whether it is internal or external. (How can I confirm this?) Ideally, traffic that is destined for the internal network (192.168.10.0/24) would travel over eth0 to the router, and wherever it is going. ALL other traffic should go over eth1. I suspect that this behavior could be acheived with IP tables? I don't really know where to start looking to learn that though, so any links would be appreciated.

    Read the article

  • Using uk domain names on us hosting

    - by Steve Cooper
    Hi, all. I'm thinking of transferring my UK websites to a US hosting company, and they assure me they can host UK domains. However, as a bit of a n00b I don't understand the relationship between UK domain registration and US hosting. If anyone can explain this relationship I'd be very grateful. What pitfalls and problems should I be alert to? Many thanks.

    Read the article

  • Adding to App Paths for 'Run' as a non-admin

    - by Cooper
    Is there a way to add a new App Path (for adding commands to Start-Run) without needing Admin? With admin, you can add an App Path to HKLM\Software\Microsoft\Windows\CurrentVersion\App Paths. I tried adding one under HKCU with no effect. I've added a new (user-writable) location to my user's PATH environment variable which lets me launch things from Start-Run, but the nerd in me still wants to know about the App Paths.

    Read the article

  • How to set the preferred network interface in linux

    - by Mike Cooper
    I have my network set up like this. http://docs.google.com/Doc?docid=0AZ1YxuLE4djaZGhqN2s1NmRfMjhjNjc0Ym1meg&hl=en In words: I have a machine (Calcium, running Arch Linux) that has two network interfaces. eth0 is hoooked up to a router, and is gigabit. Eth1 is hooked up directly to the university network over 10Megabit. The router's uplink is hooked up to the university network as well, and it is also 10Megabit. Currently (I believe) all traffic on Calcium is going through eth0, through the router, regardless of whether it is internal or external. (How can I confirm this?) Ideally, traffic that is destined for the internal network (192.168.10.0/24) would travel over eth0 to the router, and wherever it is going. ALL other traffic should go over eth1.

    Read the article

< Previous Page | 1 2 3 4 5 6  | Next Page >