Search Results

Search found 407 results on 17 pages for 'sequences'.


  • Help with Java Program for Prime numbers

    - by Ben
    Hello everyone, I was wondering if you can help me with this program. I have been struggling with it for hours and have just trashed my code because the TA doesn't like how I executed it. I am completely hopeless, and if anyone can help me out step by step, I would greatly appreciate it.

    In this project you will write a Java program that reads a positive integer n from standard input, then prints out the first n prime numbers.

    We say that an integer m is divisible by a non-zero integer d if there exists an integer k such that m = k·d, i.e. if d divides evenly into m. Equivalently, m is divisible by d if the remainder of m upon (integer) division by d is zero. We would also express this by saying that d is a divisor of m. A positive integer p is called prime if its only positive divisors are 1 and p. The one exception to this rule is the number 1 itself, which is considered to be non-prime. A positive integer that is not prime is called composite. Euclid showed that there are infinitely many prime numbers. The prime and composite sequences begin as follows:

    Primes: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, …
    Composites: 1, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28, …

    There are many ways to test a number for primality, but perhaps the simplest is trial division. Begin by dividing m by 2; if it divides evenly, then m is not prime. Otherwise, divide by 3, then 4, then 5, etc. If at any point m is found to be divisible by a number d in the range 2 ≤ d ≤ m-1, then halt and conclude that m is composite. Otherwise, conclude that m is prime.

    A moment's thought shows that one need not do any trial divisions by numbers d which are themselves composite. For instance, if a trial division by 2 fails (i.e. has non-zero remainder, so m is odd), then a trial division by 4, 6, or 8, or any even number, must also fail. Thus to test a number m for primality, one need only do trial divisions by prime numbers less than m. Furthermore, it is not necessary to go all the way up to m-1. One need only do trial divisions of m by primes p in the range 2 ≤ p ≤ √m. To see this, suppose m > 1 is composite. Then there exist positive integers a and b such that 1 < a < m, 1 < b < m, and m = ab. But if both a > √m and b > √m, then ab > m, contradicting that m = ab. Hence one of a or b must be less than or equal to √m.

    To implement this process in Java you will write a function called isPrime() with the following signature:

    static boolean isPrime(int m, int[] P)

    This function will return true or false according to whether m is prime or composite. The array argument P will contain a sufficient number of primes to do the testing. Specifically, at the time isPrime() is called, array P must contain (at least) all primes p in the range 2 ≤ p ≤ √m. For instance, to test m = 53 for primality, one must do successive trial divisions by 2, 3, 5, and 7. We go no further since 11 > √53. Thus a precondition for the function call isPrime(53, P) is that P[0] = 2, P[1] = 3, P[2] = 5, and P[3] = 7. The return value in this case would be true since all these divisions fail. Similarly, to test m = 143, one must do trial divisions by 2, 3, 5, 7, and 11 (since 13 > √143). The precondition for the function call isPrime(143, P) is therefore P[0] = 2, P[1] = 3, P[2] = 5, P[3] = 7, and P[4] = 11. The return value in this case would be false since 11 divides 143. Function isPrime() should contain a loop that steps through array P, doing trial divisions.
    This loop should terminate when either a trial division succeeds, in which case false is returned, or the next prime in P is greater than √m, in which case true is returned.

    Function main() in this project will read the command line argument n, allocate an int array of length n, fill the array with primes, then print the contents of the array to stdout according to the format described below. In the context of function main(), we will refer to this array as Primes[]. Thus array Primes[] plays a dual role in this project. On the one hand, it is used to collect, store, and print the output data. On the other hand, it is passed to function isPrime() to test new integers for primality. Whenever isPrime() returns true, the newly discovered prime will be placed at the appropriate position in array Primes[]. This process works since, as explained above, the primes needed to test an integer m range only up to √m, and all of these primes (and more) will already be stored in array Primes[] when m is tested. Of course it will be necessary to initialize Primes[0] = 2 manually, then proceed to test 3, 4, … for primality using function isPrime().

    The following is an outline of the steps to be performed in function main().

    • Check that the user supplied exactly one command line argument which can be interpreted as a positive integer n. If the command line argument is not a single positive integer, your program will print a usage message as specified in the examples below, then exit.
    • Allocate array Primes[] of length n and initialize Primes[0] = 2.
    • Enter a loop which will discover subsequent primes and store them as Primes[1], Primes[2], Primes[3], …, Primes[n-1]. This loop should contain an inner loop which walks through successive integers and tests them for primality by calling function isPrime() with appropriate arguments.
    • Print the contents of array Primes[] to stdout, 10 to a line separated by single spaces. In other words, Primes[0] through Primes[9] will go on line 1, Primes[10] through Primes[19] will go on line 2, and so on. Note that if n is not a multiple of 10, then the last line of output will contain fewer than 10 primes.

    Your program, which will be called Prime.java, will produce output identical to that of the sample runs below. (As usual % signifies the unix prompt.)

    % java Prime
    Usage: java Prime [PositiveInteger]
    % java Prime xyz
    Usage: java Prime [PositiveInteger]
    % java Prime 10 20
    Usage: java Prime [PositiveInteger]
    % java Prime 75
    2 3 5 7 11 13 17 19 23 29
    31 37 41 43 47 53 59 61 67 71
    73 79 83 89 97 101 103 107 109 113
    127 131 137 139 149 151 157 163 167 173
    179 181 191 193 197 199 211 223 227 229
    233 239 241 251 257 263 269 271 277 281
    283 293 307 311 313 317 331 337 347 349
    353 359 367 373 379
    %

    As you can see, inappropriate command line argument(s) generate a usage message which is similar to that of many unix commands. (Try doing the more command with no arguments to see such a message.) Your program will include a function called Usage() having signature

    static void Usage()

    that prints this message to stderr, then exits. Thus your program will contain three functions in all: main(), isPrime(), and Usage(). Each should be preceded by a comment block giving its name, a short description of its operation, and any necessary preconditions (such as those for isPrime()). See examples on the webpage.
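    Here is a minimal sketch of how such a program might be structured. This is one possible reading of the assignment, not an official solution; the identifier names and exact loop shapes are my own.

    // Prime.java -- sketch: prints the first n primes, 10 per line.
    public class Prime {

        // isPrime()
        // Pre: P contains (at least) all primes p in the range 2 <= p <= sqrt(m).
        // Returns true if m is prime, false if m is composite.
        static boolean isPrime(int m, int[] P) {
            for (int i = 0; (long) P[i] * P[i] <= m; i++) {
                if (m % P[i] == 0) return false;   // a trial division succeeded
            }
            // The loop always exits before reaching unfilled (zero) entries of P,
            // because the next prime is smaller than the square of the largest
            // prime already found.
            return true;
        }

        // Usage()
        // Prints the usage message to stderr, then exits.
        static void Usage() {
            System.err.println("Usage: java Prime [PositiveInteger]");
            System.exit(1);
        }

        public static void main(String[] args) {
            if (args.length != 1) Usage();
            int n = 0;
            try {
                n = Integer.parseInt(args[0]);
            } catch (NumberFormatException e) {
                Usage();
            }
            if (n <= 0) Usage();

            int[] Primes = new int[n];
            Primes[0] = 2;
            for (int count = 1, m = 3; count < n; m++) {
                if (isPrime(m, Primes)) Primes[count++] = m;
            }

            for (int i = 0; i < n; i++) {
                System.out.print(Primes[i]);
                System.out.print((i % 10 == 9 || i == n - 1) ? "\n" : " ");
            }
        }
    }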

    Read the article

  • Will these optimizations to my Ruby implementation of diff improve performance in a Rails app?

    - by grg-n-sox
    <tl;dr> In source version control diff patch generation, would it be worth it to use the optimizations listed at the very bottom of this writing (see <optimizations>) in my Ruby implementation of diff for making diff patches? </tl;dr>

    <introduction> I am programming something I have never done before, and there might already be tools out there to do the exact thing I am programming, but at this point I am having too much fun to care, so I am still going to do it from scratch, even if there is a tool for this.

    So anyways, I am working on a Ruby on Rails app and need a certain feature. Basically I want each entry in a table of mine, let's say for example a table of video games, to have a stored chunk of text that represents a review or something of the sort for that table entry. However, I want this text to be both editable by any registered user and also to keep track of different submissions in a version control system. The simplest solution I could think of is to implement a solution that keeps track of the text body and the diff patch history of different versions of the text body as objects in Ruby, and then serialize it, preferably in human readable form (so I'll most likely use YAML for this) for editing if needed due to corruption by a software bug or a mistake made by an admin doing some version editing.

    So at first I just tried to dive head first into this feature, only to find that the problem of generating a diff patch is more difficult than I thought to do efficiently. So I did some research and came across some ideas. Some I have implemented already and some I have not. However, it all pretty much revolves around the longest common subsequence problem, as you would already know if you have done anything with diff or diff-like features, and optimizing the function that solves it.

    Currently I have it so it truncates the compared versions of the text body from the beginning and end until non-matching lines are found. Then it solves the problem using a comparison matrix, but instead of incrementing the value stored in a cell when it finds a matching line, like in most longest common subsequence algorithms I have seen examples of, I increment when I have a non-matching line, so as to calculate edit distance instead of longest common subsequence. Although, as far as I can tell between the two approaches, they are essentially two sides of the same coin, so either could be used to derive an answer. It then back-traces through the comparison matrix and notes when there was an incrementation and in which adjacent cell (West, Northwest, or North) to determine that line's diff entry, and assumes all other lines to be unchanged.

    Normally I would leave it at that, but since this is going into a Rails environment and not just some stand-alone Ruby script, I started getting worried about needing to optimize at least enough so that a spammer who somehow knew how I implemented the version control system, and knew my worst-case-scenario entry, still wouldn't be able to hit the server that badly. After some searching and reading of research papers and articles on the internet, I've come across several optimizations that seem decent, but all seem to have pros and cons, and I am having a hard time deciding how well the pros and cons balance out in this situation. So are the ones listed here worth it? I have listed them with known pros and cons.
    </introduction> <optimizations>

    1. Chop the compared sequences into multiple chunks of subsequences by splitting where lines are unchanged, and then truncating each section of unchanged lines at the beginning and end of each section. Then solve the edit distance of each subsequence. (A sketch of the baseline truncate-then-solve approach appears below, after this question.)
       Pro: Changes the time increase as the changed area gets bigger from a quadratic increase to something more similar to a linear increase.
       Con: Figuring out where to split already seems like you have to solve edit distance, except now you don't care how it is changed. It would be fine if this were solvable by a process closer to solving Hamming distance, but a single insertion would throw this off.

    2. Use a cryptographic hash function to both convert all sequence elements into integers and ensure uniqueness. Then solve the edit distance comparing the hash integers instead of the sequence elements themselves.
       Pro: The operation of comparing two integers is faster than the operation of comparing two strings, so a slight performance gain is received after every comparison, which can be a lot overall.
       Con: Using a cryptographic hash function takes time to convert all the sequence elements, and may end up costing more time to do the conversion than you gain back from the integer comparisons. You could use the built-in hash function for a string, but that will not guarantee uniqueness.

    3. Use lazy evaluation to only calculate the three center-most diagonals of the comparison matrix, and then only calculate additional diagonals as needed. And then also use this approach to possibly remove the need on some comparisons to compare all three adjacent cells, as described here.
       Pro: Can turn an algorithm that always takes O(n * m) time into one where only the worst case scenario takes that time; the best case becomes practically linear, and the average case is somewhere between the two.
       Con: It is an algorithm I've only seen implemented in functional programming languages, and I am having a difficult time comprehending how to convert it into Ruby based on how it is described at the site linked to above.

    4. Make a C module and do the hard work at the native level in C, and just make a Ruby wrapper for it so Ruby can make all the calls to it that it needs.
       Pro: I have to imagine that evaluating something like this in C could be a LOT faster.
       Con: I have no idea how Rails handles apps with Ruby code that has C extensions, and it hurts the portability of the app.

    5. This is an optimization for after the solving of edit distance, but the idea is to store additional combined diffs with the ones produced by each version, to make a delta-tree data structure with the most recently made diff as the root node of the tree, so that getting to any version takes a worst case time of O(log n) instead of O(n).
       Pro: Would make going back to an old version a lot faster.
       Con: It would mean that on every new commit the delta-tree would get a new root node, which will cost time to reorganize the delta-tree, for an operation that will be carried out a lot more often than going back a version, not to mention the unlikelihood that it will be an old version.

    </optimizations>

    So are these things worth the effort?
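    For reference, here is a compact sketch of the baseline truncate-then-solve edit distance approach the question describes (trim the common prefix and suffix, then fill a cost matrix over the remaining lines). The question is about Ruby, but the sketch is in Java to match the other code on this page; the names and structure are illustrative only, not the poster's actual implementation.

    // Illustrative sketch of line-based edit distance with prefix/suffix truncation.
    public class LineDiff {
        public static int editDistance(String[] a, String[] b) {
            // Truncate matching lines at both ends; only the middle needs the matrix.
            int lo = 0, hiA = a.length, hiB = b.length;
            while (lo < hiA && lo < hiB && a[lo].equals(b[lo])) lo++;
            while (hiA > lo && hiB > lo && a[hiA - 1].equals(b[hiB - 1])) { hiA--; hiB--; }

            int n = hiA - lo, m = hiB - lo;
            int[][] d = new int[n + 1][m + 1];
            for (int i = 0; i <= n; i++) d[i][0] = i;   // deletions only
            for (int j = 0; j <= m; j++) d[0][j] = j;   // insertions only
            for (int i = 1; i <= n; i++) {
                for (int j = 1; j <= m; j++) {
                    if (a[lo + i - 1].equals(b[lo + j - 1])) {
                        d[i][j] = d[i - 1][j - 1];                  // unchanged line
                    } else {
                        d[i][j] = 1 + Math.min(d[i - 1][j - 1],     // change (Northwest)
                                  Math.min(d[i - 1][j],             // delete (North)
                                           d[i][j - 1]));           // insert (West)
                    }
                }
            }
            return d[n][m];  // back-tracing through d yields the actual diff entries
        }

        public static void main(String[] args) {
            String[] v1 = {"intro", "old line", "outro"};
            String[] v2 = {"intro", "new line", "extra", "outro"};
            System.out.println(editDistance(v1, v2));  // prints 2 (one change, one insert)
        }
    }

    A full diff would additionally back-trace from d[n][m] toward d[0][0], emitting a change, delete, or insert entry at each step where the cost was incremented, exactly as the question describes.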

    Read the article

  • New features of C# 4.0

    This article covers the new features of C# 4.0. It is divided into the following sections: Introduction; Dynamic Lookup; Named and Optional Arguments; Features for COM Interop; Variance; Relationship with Visual Basic; Resources.

    Other interesting readings: 22 New Features of Visual Studio 2008 for .NET Professionals; 50 New Features of SQL Server 2008; IIS 7.0 New Features.

    Introduction

    It is now close to a year since Microsoft Visual C# 3.0 shipped as part of Visual Studio 2008. In the VS Managed Languages team we are hard at work on creating the next version of the language (with the unsurprising working title of C# 4.0), and this document is a first public description of the planned language features as we currently see them. Please be advised that all this is in early stages of production and is subject to change. Part of the reason for sharing our plans in public so early is precisely to get the kind of feedback that will cause us to improve the final product before it rolls out.

    Simultaneously with the publication of this whitepaper, a first public CTP (community technology preview) of Visual Studio 2010 is going out as a Virtual PC image for everyone to try. Please use it to play and experiment with the features, and let us know of any thoughts you have. We ask for your understanding and patience working with very early bits, where especially new or newly implemented features do not have the quality or stability of a final product. The aim of the CTP is not to give you a productive work environment but to give you the best possible impression of what we are working on for the next release.

    The CTP contains a number of walkthroughs, some of which highlight the new language features of C# 4.0. Those are excellent for getting a hands-on guided tour through the details of some common scenarios for the features. You may consider this whitepaper a companion document to these walkthroughs, complementing them with a focus on the overall language features and how they work, as opposed to the specifics of the concrete scenarios.

    C# 4.0

    The major theme for C# 4.0 is dynamic programming. Increasingly, objects are “dynamic” in the sense that their structure and behavior is not captured by a static type, or at least not one that the compiler knows about when compiling your program. Some examples include:

    a. objects from dynamic programming languages, such as Python or Ruby
    b. COM objects accessed through IDispatch
    c. ordinary .NET types accessed through reflection
    d. objects with changing structure, such as HTML DOM objects

    While C# remains a statically typed language, we aim to vastly improve the interaction with such objects. A secondary theme is co-evolution with Visual Basic. Going forward we will aim to maintain the individual character of each language, but at the same time important new features should be introduced in both languages at the same time. They should be differentiated more by style and feel than by feature set.

    The new features in C# 4.0 fall into four groups:

    Dynamic lookup. Dynamic lookup allows you to write method, operator and indexer calls, property and field accesses, and even object invocations which bypass the C# static type checking and instead get resolved at runtime.

    Named and optional parameters. Parameters in C# can now be specified as optional by providing a default value for them in a member declaration. When the member is invoked, optional arguments can be omitted. Furthermore, any argument can be passed by parameter name instead of position.
    COM specific interop features. Dynamic lookup as well as named and optional parameters both help make programming against COM less painful than today. On top of that, however, we are adding a number of other small features that further improve the interop experience.

    Variance. It used to be that an IEnumerable<string> wasn’t an IEnumerable<object>. Now it is – C# embraces type safe “co- and contravariance,” and common BCL types are updated to take advantage of that.

    Dynamic Lookup

    Dynamic lookup gives you a unified approach to invoking things dynamically. With dynamic lookup, when you have an object in your hand you do not need to worry about whether it comes from COM, IronPython, the HTML DOM or reflection; you just apply operations to it and leave it to the runtime to figure out what exactly those operations mean for that particular object. This affords you enormous flexibility, and can greatly simplify your code, but it does come with a significant drawback: static typing is not maintained for these operations. A dynamic object is assumed at compile time to support any operation, and only at runtime will you get an error if it wasn’t so. Oftentimes this will be no loss, because the object wouldn’t have a static type anyway; in other cases it is a tradeoff between brevity and safety. In order to facilitate this tradeoff, it is a design goal of C# to allow you to opt in or opt out of dynamic behavior on every single call.

    The dynamic type

    C# 4.0 introduces a new static type called dynamic. When you have an object of type dynamic you can “do things to it” that are resolved only at runtime:

    dynamic d = GetDynamicObject(…);
    d.M(7);

    The C# compiler allows you to call a method with any name and any arguments on d because it is of type dynamic. At runtime the actual object that d refers to will be examined to determine what it means to “call M with an int” on it. The type dynamic can be thought of as a special version of the type object, which signals that the object can be used dynamically. It is easy to opt in or out of dynamic behavior: any object can be implicitly converted to dynamic, “suspending belief” until runtime. Conversely, there is an “assignment conversion” from dynamic to any other type, which allows implicit conversion in assignment-like constructs:

    dynamic d = 7;   // implicit conversion
    int i = d;       // assignment conversion

    Dynamic operations

    Not only method calls, but also field and property accesses, indexer and operator calls and even delegate invocations can be dispatched dynamically:

    dynamic d = GetDynamicObject(…);
    d.M(7);               // calling methods
    d.f = d.P;            // getting and setting fields and properties
    d["one"] = d["two"];  // getting and setting through indexers
    int i = d + 3;        // calling operators
    string s = d(5,7);    // invoking as a delegate

    The role of the C# compiler here is simply to package up the necessary information about “what is being done to d”, so that the runtime can pick it up and determine what the exact meaning of it is given an actual object d. Think of it as deferring part of the compiler’s job to runtime. The result of any dynamic operation is itself of type dynamic.

    Runtime lookup

    At runtime a dynamic operation is dispatched according to the nature of its target object d:

    COM objects. If d is a COM object, the operation is dispatched dynamically through COM IDispatch. This allows calling to COM types that don’t have a Primary Interop Assembly (PIA), and relying on COM features that don’t have a counterpart in C#, such as indexed properties and default properties.
    Dynamic objects. If d implements the interface IDynamicObject, d itself is asked to perform the operation. Thus by implementing IDynamicObject a type can completely redefine the meaning of dynamic operations. This is used intensively by dynamic languages such as IronPython and IronRuby to implement their own dynamic object models. It will also be used by APIs, e.g. by the HTML DOM to allow direct access to the object’s properties using property syntax.

    Plain objects. Otherwise d is a standard .NET object, and the operation will be dispatched using reflection on its type and a C# “runtime binder” which implements C#’s lookup and overload resolution semantics at runtime. This is essentially a part of the C# compiler running as a runtime component to “finish the work” on dynamic operations that was deferred by the static compiler.

    Example

    Assume the following code:

    dynamic d1 = new Foo();
    dynamic d2 = new Bar();
    string s;
    d1.M(s, d2, 3, null);

    Because the receiver of the call to M is dynamic, the C# compiler does not try to resolve the meaning of the call. Instead it stashes away information for the runtime about the call. This information (often referred to as the “payload”) is essentially equivalent to: “Perform an instance method call of M with the following arguments: 1. a string, 2. a dynamic, 3. a literal int 3, 4. a literal object null.”

    At runtime, assume that the actual type Foo of d1 is not a COM type and does not implement IDynamicObject. In this case the C# runtime binder picks up the deferred overload resolution job and finishes it based on runtime type information, proceeding as follows:

    1. Reflection is used to obtain the actual runtime types of the two objects, d1 and d2, that did not have a static type (or rather had the static type dynamic). The result is Foo for d1 and Bar for d2.
    2. Method lookup and overload resolution is performed on the type Foo with the call M(string,Bar,3,null) using ordinary C# semantics.
    3. If the method is found it is invoked; otherwise a runtime exception is thrown.

    Overload resolution with dynamic arguments

    Even if the receiver of a method call is of a static type, overload resolution can still happen at runtime. This can happen if one or more of the arguments have the type dynamic:

    Foo foo = new Foo();
    dynamic d = new Bar();
    var result = foo.M(d);

    The C# runtime binder will choose between the statically known overloads of M on Foo, based on the runtime type of d, namely Bar. The result is again of type dynamic.

    The Dynamic Language Runtime

    An important component in the underlying implementation of dynamic lookup is the Dynamic Language Runtime (DLR), which is a new API in .NET 4.0. The DLR provides most of the infrastructure behind not only C# dynamic lookup but also the implementation of several dynamic programming languages on .NET, such as IronPython and IronRuby. Through this common infrastructure a high degree of interoperability is ensured, but just as importantly the DLR provides excellent caching mechanisms which serve to greatly enhance the efficiency of runtime dispatch. To the user of dynamic lookup in C#, the DLR is invisible except for the improved efficiency. However, if you want to implement your own dynamically dispatched objects, the IDynamicObject interface allows you to interoperate with the DLR and plug in your own behavior. This is a rather advanced task, which requires you to understand a good deal more about the inner workings of the DLR.
    For API writers, however, it can definitely be worth the trouble in order to vastly improve the usability of e.g. a library representing an inherently dynamic domain.

    Open issues

    There are a few limitations and things that might work differently than you would expect.

    • The DLR allows objects to be created from objects that represent classes. However, the current implementation of C# doesn’t have syntax to support this.
    • Dynamic lookup will not be able to find extension methods. Whether extension methods apply or not depends on the static context of the call (i.e. which using clauses occur), and this context information is not currently kept as part of the payload.
    • Anonymous functions (i.e. lambda expressions) cannot appear as arguments to a dynamic method call. The compiler cannot bind (i.e. “understand”) an anonymous function without knowing what type it is converted to.

    One consequence of these limitations is that you cannot easily use LINQ queries over dynamic objects:

    dynamic collection = …;
    var result = collection.Select(e => e + 5);

    If the Select method is an extension method, dynamic lookup will not find it. Even if it is an instance method, the above does not compile, because a lambda expression cannot be passed as an argument to a dynamic operation. There are no plans to address these limitations in C# 4.0.

    Named and Optional Arguments

    Named and optional parameters are really two distinct features, but are often useful together. Optional parameters allow you to omit arguments to member invocations, whereas named arguments are a way to provide an argument using the name of the corresponding parameter instead of relying on its position in the parameter list. Some APIs, most notably COM interfaces such as the Office automation APIs, are written specifically with named and optional parameters in mind. Up until now it has been very painful to call into these APIs from C#, with sometimes as many as thirty arguments having to be explicitly passed, most of which have reasonable default values and could be omitted. Even in APIs for .NET, however, you sometimes find yourself compelled to write many overloads of a method with different combinations of parameters, in order to provide maximum usability to the callers. Optional parameters are a useful alternative for these situations.

    Optional parameters

    A parameter is declared optional simply by providing a default value for it:

    public void M(int x, int y = 5, int z = 7);

    Here y and z are optional parameters and can be omitted in calls:

    M(1, 2, 3);   // ordinary call of M
    M(1, 2);      // omitting z – equivalent to M(1, 2, 7)
    M(1);         // omitting both y and z – equivalent to M(1, 5, 7)

    Named and optional arguments

    C# 4.0 does not permit you to omit arguments between commas as in M(1,,3). This could lead to highly unreadable comma-counting code. Instead any argument can be passed by name. Thus if you want to omit only y from a call of M you can write:

    M(1, z: 3);       // passing z by name

    or

    M(x: 1, z: 3);    // passing both x and z by name

    or even

    M(z: 3, x: 1);    // reversing the order of arguments

    All forms are equivalent, except that arguments are always evaluated in the order they appear, so in the last example the 3 is evaluated before the 1. Optional and named arguments can be used not only with methods but also with indexers and constructors.
    Overload resolution

    Named and optional arguments affect overload resolution, but the changes are relatively simple: A signature is applicable if all its parameters are either optional or have exactly one corresponding argument (by name or position) in the call which is convertible to the parameter type. Betterness rules on conversions are only applied for arguments that are explicitly given – omitted optional arguments are ignored for betterness purposes. If two signatures are equally good, one that does not omit optional parameters is preferred.

    M(string s, int i = 1);
    M(object o);
    M(int i, string s = "Hello");
    M(int i);

    M(5);

    Given these overloads, we can see the working of the rules above. M(string,int) is not applicable because 5 doesn’t convert to string. M(int,string) is applicable because its second parameter is optional, and so, obviously, are M(object) and M(int). M(int,string) and M(int) are both better than M(object) because the conversion from 5 to int is better than the conversion from 5 to object. Finally M(int) is better than M(int,string) because no optional arguments are omitted. Thus the method that gets called is M(int).

    Features for COM interop

    Dynamic lookup as well as named and optional parameters greatly improve the experience of interoperating with COM APIs such as the Office Automation APIs. In order to remove even more of the speed bumps, a couple of small COM-specific features are also added to C# 4.0.

    Dynamic import

    Many COM methods accept and return variant types, which are represented in the PIAs as object. In the vast majority of cases, a programmer calling these methods already knows the static type of a returned object from context, but explicitly has to perform a cast on the returned value to make use of that knowledge. These casts are so common that they constitute a major nuisance. In order to facilitate a smoother experience, you can now choose to import these COM APIs in such a way that variants are instead represented using the type dynamic. In other words, from your point of view, COM signatures now have occurrences of dynamic instead of object in them. This means that you can easily access members directly off a returned object, or you can assign it to a strongly typed local variable without having to cast. To illustrate, you can now say

    excel.Cells[1, 1].Value = "Hello";

    instead of

    ((Excel.Range)excel.Cells[1, 1]).Value2 = "Hello";

    and

    Excel.Range range = excel.Cells[1, 1];

    instead of

    Excel.Range range = (Excel.Range)excel.Cells[1, 1];

    Compiling without PIAs

    Primary Interop Assemblies are large .NET assemblies generated from COM interfaces to facilitate strongly typed interoperability. They provide great support at design time, where your experience of the interop is as good as if the types were really defined in .NET. However, at runtime these large assemblies can easily bloat your program, and also cause versioning issues because they are distributed independently of your application. The no-PIA feature allows you to continue to use PIAs at design time without having them around at runtime. Instead, the C# compiler will bake the small part of the PIA that a program actually uses directly into its assembly. At runtime the PIA does not have to be loaded.

    Omitting ref

    Because of a different programming model, many COM APIs contain a lot of reference parameters. Contrary to refs in C#, these are typically not meant to mutate a passed-in argument for the subsequent benefit of the caller, but are simply another way of passing value parameters.
    It therefore seems unreasonable that a C# programmer should have to create temporary variables for all such ref parameters and pass these by reference. Instead, specifically for COM methods, the C# compiler will allow you to pass arguments by value to such a method, and will automatically generate temporary variables to hold the passed-in values, subsequently discarding these when the call returns. In this way the caller sees value semantics, and will not experience any side effects, but the called method still gets a reference.

    Open issues

    A few COM interface features still are not surfaced in C#. Most notably these include indexed properties and default properties. As mentioned above these will be respected if you access COM dynamically, but statically typed C# code will still not recognize them. There are currently no plans to address these remaining speed bumps in C# 4.0.

    Variance

    An aspect of generics that often comes across as surprising is that the following is illegal:

    IList<string> strings = new List<string>();
    IList<object> objects = strings;

    The second assignment is disallowed because strings does not have the same element type as objects. There is a perfectly good reason for this. If it were allowed you could write:

    objects[0] = 5;
    string s = strings[0];

    allowing an int to be inserted into a list of strings and subsequently extracted as a string. This would be a breach of type safety. However, there are certain interfaces where the above cannot occur, notably where there is no way to insert an object into the collection. Such an interface is IEnumerable<T>. If instead you say:

    IEnumerable<object> objects = strings;

    there is no way we can put the wrong kind of thing into strings through objects, because objects doesn’t have a method that takes an element in. Variance is about allowing assignments such as this in cases where it is safe. The result is that a lot of situations that were previously surprising now just work.

    Covariance

    In .NET 4.0 the IEnumerable<T> interface will be declared in the following way:

    public interface IEnumerable<out T> : IEnumerable
    {
        IEnumerator<T> GetEnumerator();
    }

    public interface IEnumerator<out T> : IEnumerator
    {
        bool MoveNext();
        T Current { get; }
    }

    The “out” in these declarations signifies that the T can only occur in output position in the interface – the compiler will complain otherwise. In return for this restriction, the interface becomes “covariant” in T, which means that an IEnumerable<A> is considered an IEnumerable<B> if A has a reference conversion to B. As a result, any sequence of strings is also e.g. a sequence of objects. This is useful e.g. in many LINQ methods. Using the declarations above:

    var result = strings.Union(objects);   // succeeds with an IEnumerable<object>

    This would previously have been disallowed, and you would have had to do some cumbersome wrapping to get the two sequences to have the same element type.

    Contravariance

    Type parameters can also have an “in” modifier, restricting them to occur only in input positions. An example is IComparer<T>:

    public interface IComparer<in T>
    {
        public int Compare(T left, T right);
    }

    The somewhat baffling result is that an IComparer<object> can in fact be considered an IComparer<string>! It makes sense when you think about it: if a comparer can compare any two objects, it can certainly also compare two strings. This property is referred to as contravariance.
    A generic type can have both in and out modifiers on its type parameters, as is the case with the Func<…> delegate types:

    public delegate TResult Func<in TArg, out TResult>(TArg arg);

    Obviously the argument only ever comes in, and the result only ever comes out. Therefore a Func<object,string> can in fact be used as a Func<string,object>.

    Limitations

    Variant type parameters can only be declared on interfaces and delegate types, due to a restriction in the CLR. Variance only applies when there is a reference conversion between the type arguments. For instance, an IEnumerable<int> is not an IEnumerable<object> because the conversion from int to object is a boxing conversion, not a reference conversion. Also please note that the CTP does not contain the new versions of the .NET types mentioned above. In order to experiment with variance you have to declare your own variant interfaces and delegate types.

    COM Example

    Here is a larger Office automation example that shows many of the new C# features in action.

    using System;
    using System.Diagnostics;
    using System.Linq;
    using Excel = Microsoft.Office.Interop.Excel;
    using Word = Microsoft.Office.Interop.Word;

    class Program
    {
        static void Main(string[] args)
        {
            var excel = new Excel.Application();
            excel.Visible = true;
            excel.Workbooks.Add();                      // optional arguments omitted
            excel.Cells[1, 1].Value = "Process Name";   // no casts; Value dynamically
            excel.Cells[1, 2].Value = "Memory Usage";   // accessed
            var processes = Process.GetProcesses()
                .OrderByDescending(p => p.WorkingSet)
                .Take(10);
            int i = 2;
            foreach (var p in processes)
            {
                excel.Cells[i, 1].Value = p.ProcessName;   // no casts
                excel.Cells[i, 2].Value = p.WorkingSet;    // no casts
                i++;
            }
            Excel.Range range = excel.Cells[1, 1];         // no casts
            Excel.Chart chart = excel.ActiveWorkbook.Charts.
                Add(After: excel.ActiveSheet);             // named and optional arguments
            chart.ChartWizard(
                Source: range.CurrentRegion,
                Title: "Memory Usage in " + Environment.MachineName);   // named + optional
            chart.ChartStyle = 45;
            chart.CopyPicture(Excel.XlPictureAppearance.xlScreen,
                Excel.XlCopyPictureFormat.xlBitmap,
                Excel.XlPictureAppearance.xlScreen);
            var word = new Word.Application();
            word.Visible = true;
            word.Documents.Add();                          // optional arguments
            word.Selection.Paste();
        }
    }

    The code is much more terse and readable than the C# 3.0 counterpart. Note especially how the Value property is accessed dynamically. This is actually an indexed property, i.e. a property that takes an argument; something which C# does not understand. However the argument is optional. Since the access is dynamic, it goes through the runtime COM binder which knows to substitute the default value and call the indexed property. Thus, dynamic COM allows you to avoid accesses to the puzzling Value2 property of Excel ranges.

    Relationship with Visual Basic

    A number of the features introduced to C# 4.0 already exist or will be introduced in some form or other in Visual Basic:

    • Late binding in VB is similar in many ways to dynamic lookup in C#, and can be expected to make more use of the DLR in the future, leading to further parity with C#.
    • Named and optional arguments have been part of Visual Basic for a long time, and the C# version of the feature is explicitly engineered with maximal VB interoperability in mind.
    • No-PIA and variance are both being introduced to VB and C# at the same time.

    VB in turn is adding a number of features that have hitherto been a mainstay of C#.
    As a result future versions of C# and VB will have much better feature parity, for the benefit of everyone.

    Resources

    All available resources concerning C# 4.0 can be accessed through the C# Dev Center. Specifically, this white paper and other resources can be found at the Code Gallery site. Enjoy!

    Read the article

  • value types in the vm

    - by john.rose
    Or, enduring values for a changing world.

    Introduction

    A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types.

    This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. We will look at a few possibilities for these new features.

    An Axiom of Value

    In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation. Here are some corollaries:

    • A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished.
    • Changing the component of a value type requires construction of a new value.
    • The equals and hashCode operations are strictly component-wise.
    • If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality.

    A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types.
    These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as a value-unsafe operation. A value type may have methods, but such methods must not change the components of the value. It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type.

    Representations of Value

    Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x = (1 + 2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables:

    /*Complex x = Complex.valueOf(1.0, 2.0):*/
    double x_re = 1.0, x_im = 2.0;

    These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers:

    double distance(/*Complex x:*/ double x_re, double x_im,
                    /*Complex y:*/ double y_re, double y_im) {
        /*Complex z = x.minus(y):*/
        double z_re = x_re - y_re, z_im = x_im - y_im;
        /*return z.abs():*/
        return Math.sqrt(z_re*z_re + z_im*z_im);
    }

    A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields. (A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.)

    The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ‘array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) By generic APIs I mean both those which use generic types, like Arrays.asList, and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and pose some problems for boxed values. Any ‘real’ JVM type should have a story for returns, arrays, and API interoperability.

    The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs. Even so, wrapper classes are a good starting point for talking about value types.
    If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions.

    Let’s Start Coding

    Now it is time to look at some real code. Here is a definition, written in Java, of a complex number value type.

    @ValueSafe
    public final class Complex implements java.io.Serializable {
        // immutable component structure:
        public final double re, im;
        private Complex(double re, double im) {
            this.re = re; this.im = im;
        }
        // interoperability methods:
        public String toString() { return "Complex("+re+","+im+")"; }
        public List<Double> asList() { return Arrays.asList(re, im); }
        public boolean equals(Complex c) {
            return re == c.re && im == c.im;
        }
        public boolean equals(@ValueSafe Object x) {
            return x instanceof Complex && equals((Complex) x);
        }
        public int hashCode() {
            return 31*Double.valueOf(re).hashCode()
                    + Double.valueOf(im).hashCode();
        }
        // factory methods:
        public static Complex valueOf(double re, double im) {
            return new Complex(re, im);
        }
        public Complex changeRe(double re2) { return valueOf(re2, im); }
        public Complex changeIm(double im2) { return valueOf(re, im2); }
        public static Complex cast(@ValueSafe Object x) {
            return x == null ? ZERO : (Complex) x;
        }
        // utility methods and constants:
        public Complex plus(Complex c)  { return new Complex(re+c.re, im+c.im); }
        public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); }
        public double abs() { return Math.sqrt(re*re + im*im); }
        public static final Complex PI = valueOf(Math.PI, 0.0);
        public static final Complex ZERO = valueOf(0.0, 0.0);
    }

    This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows:

    • The class is marked as a value type with an annotation.
    • The class is final, because it does not make sense to create subclasses of value types.
    • The fields of the class are all non-private and final. (I.e., the type is immutable and structurally transparent.)
    • From the supertype Object, all public non-final methods are overridden.
    • The constructor is private.

    Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types:

    • One or more factory methods are responsible for value creation, including a component-wise valueOf method.
    • There are utility methods for complex arithmetic and instance creation, such as plus and changeIm.
    • There are static utility constants, such as PI.
    • The type is serializable, using the default mechanisms.
    • There are methods for converting to and from dynamically typed references, such as asList and cast.

    The Rules

    In order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist.

    A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type.
    All other value-safe classes must be abstract. The non-static fields of a value class must be non-public and final, and all its constructors must be private.

    Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example:

    @ValueSafe
    public interface ValueType extends java.io.Serializable {
        // All methods listed here must get redefined.
        // Definitions must be value-safe, which means
        // they may depend on component values only.
        List<? extends Object> asList();
        int hashCode();
        boolean equals(@ValueSafe Object c);
        String toString();
    }

    //@ValueSafe inherited from supertype:
    public final class Complex implements ValueType { …

    The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types.

    Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following:

    • The type of E is a value-safe type.
    • E names a field, parameter, or local variable whose declaration is marked @ValueSafe.
    • E is a call to a method whose declaration is marked @ValueSafe.
    • E is an assignment to a value-safe variable, field reference, or array reference.
    • E is a cast to a value-safe type from a value-safe expression.
    • E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe.

    Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions. A value-safe expression may not be the subject of a value-unsafe operation. In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type.

    In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied: no two value-type values will be distinguishable as long as their component values are equal.
    More Code

    To illustrate these rules, here are some usage examples for Complex:

    Complex pi = Complex.valueOf(Math.PI, 0);
    Complex zero = pi.changeRe(0);  //zero = pi; zero.re = 0;
    ValueType vtype = pi;
    @SuppressWarnings("value-unsafe")
    Object obj = pi;
    @ValueSafe Object obj2 = pi;
    obj2 = new Object();  // ok
    List<Complex> clist = new ArrayList<Complex>();
    clist.add(pi);  // (ok assuming List.add param is @ValueSafe)
    List<ValueType> vlist = new ArrayList<ValueType>();
    vlist.add(pi);  // (ok)
    List<Object> olist = new ArrayList<Object>();
    olist.add(pi);  // warning: "value-unsafe"
    boolean z = pi.equals(zero);
    boolean z1 = (pi == zero);  // error: reference comparison on value type
    boolean z2 = (pi == null);  // error: reference comparison on value type
    boolean z3 = (pi == obj2);  // error: reference comparison on value type
    synchronized (pi) { }  // error: synch of value, unpredictable result
    synchronized (obj2) { }  // unpredictable result
    Complex qq = pi;
    qq = null;  // possible NPE; warning: "null-unsafe"
    qq = (Complex) obj;  // warning: "null-unsafe"
    qq = Complex.cast(obj);  // OK
    @SuppressWarnings("null-unsafe")
    Complex empty = null;  // possible NPE
    qq = empty;  // possible NPE (null pollution)

    The Payoffs

    It follows from this that either the JVM or the Java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. Non-static methods on value types can be transformed into static methods which take the components as value parameters.

    Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect such types automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidentally unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are a tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible to exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note: to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization.

    Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables.

    There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above.

    Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe.
    Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches:

    public interface java.util.List<@ValueSafe T> extends Collection<T> { …

    public interface java.util.List<T extends Object|ValueType> extends Collection<T> { …

    (The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.)

    With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”. But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. The lack of multiple return values has a strong distorting effect on many Java APIs.

    Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems sometimes have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know when an object’s identity is important or not. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside.

    Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data. Unboxed arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains.

    It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. Certain connection points, such as Arrays.copyOf and System.arraycopy, might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements. Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays.
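    To make the transformations sketched above concrete, here is a small hand-written illustration of what the compiler or JVM might effectively produce: a non-static method on Complex rewritten as static methods over its components, and a Complex[] array split into parallel component arrays. This is an editorial sketch of the idea, not output from any actual tool; the two separate plusRe/plusIm methods stand in for the multiple-return-value mechanism that “Tuples in the VM” would provide.

    // Hand-written sketch of the unboxing transformations described above.
    public class ComplexUnboxed {
        // Complex.plus(Complex) transformed into static methods over components.
        static double plusRe(double aRe, double aIm, double bRe, double bIm) { return aRe + bRe; }
        static double plusIm(double aRe, double aIm, double bRe, double bIm) { return aIm + bIm; }

        public static void main(String[] args) {
            // An unboxed Complex[] split into parallel component arrays:
            // better cache density, no per-element object headers.
            int n = 4;
            double[] re = new double[n];
            double[] im = new double[n];
            re[0] = 1.0; im[0] = 2.0;   // Complex.valueOf(1.0, 2.0) stored at index 0
            re[1] = 3.0; im[1] = -1.0;  // Complex.valueOf(3.0, -1.0) stored at index 1

            // z = a[0].plus(a[1]), computed component-wise:
            double zRe = plusRe(re[0], im[0], re[1], im[1]);
            double zIm = plusIm(re[0], im[0], re[1], im[1]);
            System.out.println("Complex(" + zRe + "," + zIm + ")");  // Complex(4.0,1.0)
        }
    }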
Implicit Method Definitions

The example of class Complex above may be unattractively complex.  I believe most or all of the elements of the example class are required by the logic of value types.  If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code.  On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated.

Java has a rule for implicitly defining a class’s constructor, if it defines no constructors explicitly.  Likewise, there are rules for providing default access modifiers for interface members.  Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types.  Here’s an example of a “highly implicit” definition of a complex number type:

public class Complex implements ValueType {  // implicitly final
    public double re, im;  // implicitly public final
    //implicit methods are defined elementwise from the fields:
    //  toString, asList, equals(2), hashCode, valueOf, cast
    //optionally, explicit methods (plus, abs, etc.) would go here
}

In other words, with the right defaults, a simple value type definition can be a one-liner.  The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>.

Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly:

public @ValueType class Complex { … // implicitly final, implements ValueType

(But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.)

Implicitly Defined Value Types

So far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer.  A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or a Cartesian point.  The name and the methods convey the intended meaning.

But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage?  Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number.  Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”).  What we need here are structural value types, commonly known as tuples.

For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows:

public class java.lang.tuple.$DD extends java.lang.tuple.Tuple {
     double $1, $2;
}

Here the component names are fixed and all the required methods are defined implicitly.  The supertype is an abstract class which has suitable shared declarations.  The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields.

The odd thing about such a tuple type (and structural types in general) is that it must be instantiated lazily, in response to linkage requests from one or more classes that need it.
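For concreteness, here is a hedged sketch of what the implicitly generated, elementwise methods might expand to for Complex, written out by hand in ordinary Java. The exact set and shape of the implicit methods is this note’s speculation, so treat the bodies as illustrative only:

import java.util.Arrays;
import java.util.List;

public final class Complex /* implicitly final, implements ValueType */ {
    public final double re, im;   // "implicitly public final"

    private Complex(double re, double im) { this.re = re; this.im = im; }

    // valueOf: one parameter per field, in declaration order.
    public static Complex valueOf(double re, double im) { return new Complex(re, im); }

    // asList: the components, in declaration order.
    public List<Double> asList() { return Arrays.asList(re, im); }

    // toString: rendered elementwise from the fields.
    @Override public String toString() { return "Complex[re=" + re + ", im=" + im + "]"; }

    // equals: elementwise comparison of components (bitwise, so NaN == NaN here).
    @Override public boolean equals(Object o) {
        if (!(o instanceof Complex)) return false;
        Complex that = (Complex) o;
        return Double.doubleToLongBits(re) == Double.doubleToLongBits(that.re)
            && Double.doubleToLongBits(im) == Double.doubleToLongBits(that.im);
    }

    // hashCode: combined elementwise from the fields.
    @Override public int hashCode() {
        long bits = Double.doubleToLongBits(re) * 31 + Double.doubleToLongBits(im);
        return (int) (bits ^ (bits >>> 32));
    }
}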
The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types.  (Specifics of naming and name mangling need some tasteful engineering.)

Tuples also seem to demand, even more than nominal types, some support from the language.  (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.)  At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists.  Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection.  That is a task for another day.

Other Use Cases

Besides complex numbers and simple tuples there are many use cases for value types.  Many tuple-like types have natural value-type representations.  These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses.

Other types have a variable-length “tail” of internal values.  The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values.  Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values.  Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values.  The value type may also include “header” information.  Variable-sized values often have a length distribution which favors short lengths.  In that case, the design of the value type can make the first few values in the sequence be direct “header” fields of the value type.  In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference.  Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough.  This is the case with String, where the tail is a mutable (but never mutated) character array.

Field types and their order must be a globally visible part of the API.  The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements.  This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations.  A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile.  Perhaps constant pool references to value types need to declare the field order as assumed by each API user.

This brings up the delicate status of private fields in a value type.  It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack.
If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction?  Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double).

My initial thought was to make the fields always public, which would make the security problem moot.  But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes.  I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value.  Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values.

Similar to String, we could build an efficient multi-precision decimal type along these lines:

public final class DecimalValue implements ValueType {
    protected final long header;
    private final BigInteger digits;
    private DecimalValue(long header, BigInteger digits) {
        this.header = header;  this.digits = digits;
    }
    public static DecimalValue valueOf(int value, int scale) {
        assert(scale >= 0);
        return new DecimalValue(((long)value << 32) + scale, null);
    }
    public static DecimalValue valueOf(long value, int scale) {
        if (value == (int) value)
            return valueOf((int)value, scale);
        return new DecimalValue(-scale, BigInteger.valueOf(value));
    }
}

Values of this type would be passed between methods as two machine words.  Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed.

(Note the tension between encapsulation and unboxing in this case.  It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.)

Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues.  (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.)  Getting the full benefit of unboxing and arrays will require some new JVM magic.

Although the JVM emphasizes portability, system-dependent code will benefit from using machine-level types larger than 64 bits.  For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types.  This is probably only worthwhile if the unboxed arrays can be packed with such values.

More Daydreams

A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant.  An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes.  More radically, an Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception.
@ValueSafe
public interface ValueType extends java.io.Serializable,
        Unsynchronizable, Interchangeable { …

public class Complex implements ValueType {
    // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe
    …

It seems possible that Integer and the other wrapper types could be retrofitted as value-safe types.  This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable.  It is likely that code which violates value-safety for wrapper types exists but is uncommon.  It is less plausible to retrofit String, since the prominent operation String.intern is often used with value-unsafe code.

We should also reconsider the distinction between boxed and unboxed values in code.  The design presented above obscures that distinction.  As another thought experiment, we could imagine making a first-class distinction in the type system between boxed and unboxed representations.  Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to the unboxed one.  For example:

complex pi = complex.valueOf(Math.PI, 0);
Complex boxPi = pi;  // convert to boxed
myList.add(boxPi);
complex z = myList.get(0);  // unbox

Such a convention could perhaps absorb the current difference between int and Integer, double and Double.  It might also allow the programmer to express a helpful distinction among array types.

As said above, array types are crucial to bulk data interfaces, but are limited in the JVM.  Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type.  Something like this which can also accommodate value type components seems worthwhile.  On the other hand, does it make sense for value types to contain short arrays?  And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure?  These considerations must wait for another day and another note.

More Work

It seems to me that a good sequence for introducing such value types would be as follows:

Add the value-safety restrictions to an experimental version of javac.
Code some sample applications with value types, including Complex and DecimalValue.
Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so.  Ensure the feasibility of the performance model for the sample applications.
Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes.

A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing.

A similar investigation should be applied (concurrently) to array types.  In this case, it seems to me that the starting point is in the JVM:

Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids.  No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles.
Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so.
Ensure the feasibility of the performance model for the sample applications.
Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes.

That’s enough musing from me for now.  Back to work!

    Read the article

  • The Incremental Architect’s Napkin - #5 - Design functions for extensibility and readability

    - by Ralf Westphal
Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/08/24/the-incremental-architectrsquos-napkin---5---design-functions-for.aspx

The functionality of programs is entered via Entry Points. So what we're talking about when designing software is a bunch of functions handling the requests represented by and flowing in through those Entry Points. Designing software thus consists of at least three phases:

Analyzing the requirements to find the Entry Points and their signatures
Designing the functionality to be executed when those Entry Points get triggered
Implementing the functionality according to the design, aka coding

I presume you're familiar with phase 1 in some way. And I guess you're proficient in implementing functionality in some programming language. But in my experience developers in general are not experienced in going through an explicit phase 2. "Designing functionality? What's that supposed to mean?" you might already have thought. Here's my definition: To design functionality (or functional design for short) means thinking about… well, functions. You find a solution for what's supposed to happen when an Entry Point gets triggered in terms of functions. A conceptual solution, that is, because those functions only exist in your head (or on paper) during this phase. But you may have guessed that, because it's "design", not "coding".

And here is what functional design is not: It's not about logic. Logic is expressions (e.g. +, -, && etc.) and control statements (e.g. if, switch, for, while etc.). Also I consider calling external APIs as logic. It's equally basic. It's what code needs to do in order to deliver some functionality or quality. Logic is what does what needs to be done by software. Transformations are either done through expressions or API-calls. And then there is alternative control flow depending on the result of some expression. Basically it's just jumps in Assembler, sometimes to go forward (if, switch), sometimes to go backward (for, while, do).

But calling your own function is not logic. It's not necessary to produce any outcome. Functionality is not enhanced by adding functions (subroutine calls) to your code. Nor is quality increased by adding functions. No performance gain, no higher scalability etc. through functions. Functions are not relevant to functionality. Strange, isn't it. What they are important for is security of investment. By introducing functions into our code we can become more productive (re-use) and can increase evolvability (higher understandability, easier to keep code consistent). That's no small feat, however. The value of evolvable code can hardly be overestimated. That's why to me functional design is so important. It's at the core of software development.

To sum this up: Functional design is on a level of abstraction above (!) logical design or algorithmic design. Functional design is only done until you get to a point where each function is so simple you are very confident you can easily code it. Functional design and logical design (which mostly is coding, but can also be done using pseudo code or flow charts) are complementary. Software needs both. If you start coding right away you end up in a tangled mess very quickly. Then you need to back out through refactoring. Functional design on the other hand is bloodless without actual code. It's just a theory with no experiments to prove it. But how to do functional design?

An example of functional design

Let's assume a program to de-duplicate strings.
The user enters a number of strings separated by commas, e.g. a, b, a, c, d, b, e, c, a. And the program is supposed to clear this list of all doubles, e.g. a, b, c, d, e. There is only one Entry Point to this program: the user triggers the de-duplication by starting the program with the string list on the command line

C:\>deduplicate "a, b, a, c, d, b, e, c, a"
a, b, c, d, e

…or by clicking on a GUI button. This leads to the Entry Point function getting called. It's the program's main function in case of the batch version, or a button click event handler in the GUI version. That's the physical Entry Point, so to speak. It's inevitable. What then happens is a three step process:

Transform the input data from the user into a request.
Call the request handler.
Transform the output of the request handler into a tangible result for the user.

Or to phrase it a bit more generally: Accept input. Transform input into output. Present output. This does not mean any of these steps requires a lot of effort. Maybe it's just one line of code to accomplish it. Nevertheless it's a distinct step in doing the processing behind an Entry Point. Call it an aspect or a responsibility - and you will realize it most likely deserves a function of its own to satisfy the Single Responsibility Principle (SRP).

Interestingly the above list of steps is already functional design. There is no logic, but nevertheless the solution is described - albeit on a higher level of abstraction than you might have done yourself. But it's still on a meta-level. The application to the domain at hand is easy, though:

Accept string list from command line
De-duplicate
Present de-duplicated strings on standard output

And this concrete list of processing steps can easily be transformed into code:

static void Main(string[] args) {
    var input = Accept_string_list(args);
    var output = Deduplicate(input);
    Present_deduplicated_string_list(output);
}

Instead of a big problem there are three much smaller problems now. If you think each of those is trivial to implement, then go for it. You can stop the functional design at this point. But maybe, just maybe, you're not so sure how to go about the de-duplication, for example. Then just implement what's easy right now, e.g.

private static string Accept_string_list(string[] args) {
    return args[0];
}

private static void Present_deduplicated_string_list(string[] output) {
    var line = string.Join(", ", output);
    Console.WriteLine(line);
}

Accept_string_list() contains logic in the form of an API-call. Present_deduplicated_string_list() contains logic in the form of an expression and an API-call. And then repeat the functional design for the remaining processing step. What's left is the domain logic: de-duplicating a list of strings. How should that be done? Without any logic at our disposal during functional design you're left with just functions. So which functions could make up the de-duplication? Here's a suggestion:

De-duplicate
  Parse the input string into a true list of strings.
  Register each string in a dictionary/map/set. That way duplicates get cast away.
  Transform the data structure into a list of unique strings.

Processing step 2 obviously was the core of the solution. That's where real creativity was needed. That's the core of the domain.
But now, after this refinement, the implementation of each step is easy again:

private static string[] Parse_string_list(string input) {
    return input.Split(',')
                .Select(s => s.Trim())
                .ToArray();
}

private static Dictionary<string,object> Compile_unique_strings(string[] strings) {
    return strings.Aggregate(
        new Dictionary<string, object>(),
        (agg, s) => { agg[s] = null; return agg; });
}

private static string[] Serialize_unique_strings(Dictionary<string,object> dict) {
    return dict.Keys.ToArray();
}

With these three additional functions Main() now looks like this:

static void Main(string[] args) {
    var input = Accept_string_list(args);
    var strings = Parse_string_list(input);
    var dict = Compile_unique_strings(strings);
    var output = Serialize_unique_strings(dict);
    Present_deduplicated_string_list(output);
}

I think that's very understandable code: just read it from top to bottom and you know how the solution to the problem works. It's a mirror image of the initial design:

Accept string list from command line
Parse the input string into a true list of strings.
Register each string in a dictionary/map/set. That way duplicates get cast away.
Transform the data structure into a list of unique strings.
Present de-duplicated strings on standard output

You can even re-generate the design by just looking at the code. Code and functional design thus are always in sync - if you follow some simple rules. But about that later. And as a bonus: all the functions making up the process are small - which means easy to understand, too. So much for an initial concrete example. Now it's time for some theory. Because there is method to this madness ;-) The above has only scratched the surface.

Introducing Flow Design

Functional design starts with a given function, the Entry Point. Its goal is to describe the behavior of the program when the Entry Point is triggered using a process, not an algorithm. An algorithm consists of logic; a process on the other hand consists just of steps or stages. Each processing step transforms input into output or a side effect. Also it might access resources, e.g. a printer, a database, or just memory. Processing steps thus can rely on state of some sort. This is different from Functional Programming, where functions are supposed to not be stateful and not cause side effects.[1]

In its simplest form a process can be written as a bullet point list of steps, e.g.:

Get data from user
Output result to user
Transform data
Parse data
Map result for output

Such a compilation of steps - possibly on different levels of abstraction - often is the first artifact of functional design. It can be generated by a team in an initial design brainstorming. Next comes ordering the steps. What should happen first, what next etc.?

Get data from user
Parse data
Transform data
Map result for output
Output result to user

That's great for a start into functional design. It's better than starting to code right away on a given function using TDD. Please get me right: TDD is a valuable practice. But it can be unnecessarily hard if the scope of a function is too large. But how do you know beforehand without investing some thinking? And how to do this thinking in a systematic fashion? My recommendation: For any given function you're supposed to implement, first do a functional design. Then, once you're confident you know the processing steps - which are pretty small - refine and code them using TDD. You'll see that's much, much easier - and leads to cleaner code right away.
For more information on this approach, which I call "Informed TDD", read my book of the same title. Thinking before coding is smart. And writing down the solution as a bunch of functions possibly is the simplest thing you can do, I'd say. It's more in keeping with the KISS (Keep It Simple, Stupid) principle than returning constants or other trivial stuff TDD development often is started with.

So far so good. A simple ordered list of processing steps will do to start with functional design. As shown in the above example, such steps can easily be translated into functions. Moving from design to coding thus is simple. However, such a list does not scale. Processing is not always simple enough to be captured in a list. And then the list is just text. Again. Like code. That means the design is lacking visuality. Textual representations need more parsing by your brain than visual representations. Plus they are limited in their "dimensionality": text just has one dimension, it's sequential. Alternatives and parallelism are hard to encode in text. In addition the functional design using numbered lists lacks data. It's not visible what's the input, output, and state of the processing steps. That's why functional design should be done using a lightweight visual notation. No tool is necessary to draw such designs. Use pen and paper; a flipchart, a whiteboard, or even a napkin is sufficient.

Visualizing processes

The building block of the functional design notation is a functional unit. I mostly draw it like this: Something is done, it's clear what goes in, it's clear what comes out, and it's clear what the processing step requires in terms of state or hardware. Whenever input flows into a functional unit it gets processed and output is produced and/or a side effect occurs. Flowing data is the driver of something happening. That's why I call this approach to functional design Flow Design. It's about data flow instead of control flow. Control flow like in algorithms is of no concern to functional design. Thinking about control flow simply is too low level. Once you start with control flow you easily get bogged down by tons of details. That's what you want to avoid during design. Design is supposed to be quick, broad brush, abstract. It should give an overview.

But what about all the details? As Robert C. Martin rightly said: "Programming is about detail". Detail is a matter of code. Once you start coding the processing steps you designed you can worry about all the detail you want. Functional design does not eliminate all the nitty gritty. It just postpones tackling it. To me that's also an example of the SRP. Functional design has the responsibility to come up with a solution to a problem posed by a single function (Entry Point). And later coding has the responsibility to implement the solution down to the last detail (i.e. statement, API-call). TDD unfortunately mixes both responsibilities. It's just coding - and thereby trying to find detailed implementations (green phase) plus getting the design right (refactoring). To me that's one reason why TDD has failed to deliver on its promise for many developers.

Using functional units as building blocks of functional design, processes can be depicted very easily. Here's the initial process for the example problem: For each processing step draw a functional unit and label it. Choose a verb or an "action phrase" as a label, not a noun. Functional design is about activities, not state or structure. Then make the output of an upstream step the input of a downstream step.
Finally think about the data that should flow between the functional units. Write the data above the arrows connecting the functional units in the direction of the data flow. Enclose the data description in brackets. That way you can clearly see if all flows have already been specified. Empty brackets mean "no data is flowing", but nevertheless a signal is sent. A name like "list" or "strings" in brackets describes the data content. Use lower case labels for that purpose. A name starting with an upper case letter like "String" or "Customer" on the other hand signifies a data type. If you like, you also can combine descriptions with data types by separating them with a colon, e.g. (list:string) or (strings:string[]). But these are just suggestions from my practice with Flow Design. You can do it differently, if you like. Just be sure to be consistent.

Flows wired up in this manner I call one-dimensional (1D). Each functional unit just has one input and/or one output. A functional unit without an output is possible. It's like a black hole sucking up input without producing any output. Instead it produces side effects. A functional unit without an input, though, does not make much sense. When should it start to work? What's the trigger? That's why in the above process even the first processing step has an input. If you like, view such 1D flows as pipelines. Data is flowing through them from left to right. But as you can see, it's not always the same data. It gets transformed along its passage: (args) becomes a (list) which is turned into (strings).

The Principle of Mutual Oblivion

A very characteristic trait of flows put together from functional units is: no functional unit knows another one. They are all completely independent of each other. Functional units don't know where their input is coming from (or even when it's gonna arrive). They just specify a range of values they can process. And they promise a certain behavior upon input arriving. Also they don't know where their output is going. They just produce it in their own time, independent of other functional units. That means at least conceptually all functional units work in parallel. Functional units don't know their "deployment context". They know nothing about the overall flow they are placed in. They are just consuming input from some upstream, and producing output for some downstream. That makes functional units very easy to test. At least as long as they don't depend on state or resources.

I call this the Principle of Mutual Oblivion (PoMO). Functional units are oblivious of others as well as of an overall context/purpose. They are just parts of a whole focused on a single responsibility. How the whole is built, how a larger goal is achieved, is of no concern to the single functional units. By building software in such a manner, functional design interestingly follows nature. Nature's building blocks for organisms also follow the PoMO. The cells forming your body do not know each other. Take a nerve cell "controlling" a muscle cell for example:[2] The nerve cell does not know anything about muscle cells, let alone the specific muscle cell it is "attached to". Likewise the muscle cell does not know anything about nerve cells, let alone a specific nerve cell "attached to" it. Saying "the nerve cell is controlling the muscle cell" thus only makes sense when viewing both from the outside. "Control" is a concept of the whole, not of its parts. Control is created by wiring up parts in a certain way. Both cells are mutually oblivious.
Both just follow a contract. One produces Acetylcholine (ACh) as output, the other consumes ACh as input. Where the ACh is going, where it's coming from, neither cell cares about. Millions of years of evolution have led to this kind of division of labor. And millions of years of evolution have produced organism designs (DNA) which lead to the production of these different cell types (and many others) and also to their co-location. The result: the overall behavior of an organism.

How and why this happened in nature is a mystery. For our software, though, it's clear: functional and quality requirements need to be fulfilled. So we as developers have to become "intelligent designers" of "software cells" which we put together to form a "software organism" which responds in satisfying ways to triggers from its environment. My bet is: If nature gets complex organisms working by following the PoMO, who are we to not apply this recipe for success to our much simpler "machines"? So my rule is: Wherever there is functionality to be delivered, because there is a clear Entry Point into software, design the functionality like nature would do it. Build it from mutually oblivious functional units. That's what Flow Design is about. In that way it's even universal, I'd say. Its notation can also be applied to biology: Never mind labeling the functional units with nouns. That's ok in Flow Design. You'll do that occasionally for functional units on a higher level of abstraction or when their purpose is close to hardware. Getting a cockroach to roam your bedroom takes 1,000,000 nerve cells (neurons). Getting the de-duplication program to do its job just takes 5 "software cells" (functional units). Both, though, follow the same basic principle.

Translating functional units into code

Moving from functional design to code is no rocket science. In fact it's straightforward. There are two simple rules:

Translate an input port to a function.
Translate an output port either to a return statement in that function or to a function pointer visible to that function.

The simplest translation of a functional unit is a function. That's what you saw in the above example. Functions are mutually oblivious. That's why Functional Programming likes them so much. It makes them composable. Which is the reason nature works according to the PoMO. Let's be clear about one thing: There is no dependency injection in nature. For all of an organism's complexity no DI container is used. Behavior is the result of smooth cooperation between mutually oblivious building blocks.

Functions will often be the adequate translation for the functional units in your designs. But not always. Take for example the case where a processing step should not always produce an output. Maybe the purpose is to filter input. Here the functional unit consumes words and produces words. But it does not pass along every word flowing in. Some words are swallowed. Think of a spell checker. It probably should not check acronyms for correctness. There are too many of them. Or words with no more than two letters. Such words are called "stop words". In the above picture the optionality of the output is signified by the asterisk outside the brackets. It means: Any number of (word) data items can flow from the functional unit for each input data item. It might be none or one or even more. This I call a stream of data. Such behavior cannot be translated into a function where output is generated with return. Because a function always needs to return a value.
So the output port is translated into a function pointer or continuation which gets passed to the subroutine when called:[3]

void filter_stop_words(string word, Action<string> onNoStopWord) {
    if (...check if not a stop word...)
        onNoStopWord(word);
}

If you want to be nitpicky you might call such a function pointer parameter an injection. And technically you're right. Conceptually, though, it's not an injection. Because the subroutine is not functionally dependent on the continuation. Firstly, continuations are procedures, i.e. subroutines without a return type. Remember: Flow Design is about unidirectional data flow. Secondly, the name of the formal parameter is chosen in a way as to not assume anything about downstream processing steps. onNoStopWord describes a situation (or event) within the functional unit only. Translating output ports into function pointers helps keep functional units mutually oblivious in cases where output is optional or produced asynchronously. Either pass the function pointer to the function upon call. Or make it global by putting it on the encompassing class. Then it's called an event. In C# that's even an explicit feature:

class Filter {
    public void filter_stop_words(string word) {
        if (...check if not a stop word...)
            onNoStopWord(word);
    }
    public event Action<string> onNoStopWord;
}

When to use a continuation and when to use an event depends on how a functional unit is used in flows and how it's packed together with others into classes. You'll see examples further down the Flow Design road.

Another example of 1D functional design

Let's see Flow Design once more in action using the visual notation. How about the famous word wrap kata? Robert C. Martin has posted a much cited solution including an extensive reasoning behind his TDD approach. So maybe you want to compare it to Flow Design. The function signature given is:

string WordWrap(string text, int maxLineLength) {...}

That's not an Entry Point since we don't see an application with an environment and users. Nevertheless it's a function which is supposed to provide a certain functionality. The text passed in has to be reformatted. The input is a single line of arbitrary length consisting of words separated by spaces. The output should consist of one or more lines of a maximum length specified. If a word is longer than the maximum line length it can be split into multiple parts, each fitting in a line.

Flow Design

Let's start by brainstorming the process to accomplish the feat of reformatting the text. What's needed?

Words need to be assembled into lines
Words need to be extracted from the input text
The resulting lines need to be assembled into the output text
Words too long to fit in a line need to be split

Does that sound about right? I guess so. And it shows a kind of priority. Long words are a special case. So maybe there is a hint for an incremental design here. First let's tackle "average words" (words not longer than a line). Here's the Flow Design for this increment: The first three bullet points turned into functional units with explicit data added. As the signature requires, a text is transformed into another text. See the input of the first functional unit and the output of the last functional unit. In between no text flows, but words and lines. That's good to see because thereby the domain is clearly represented in the design. The requirements are talking about words and lines and here they are. But note the asterisk! It's not outside the brackets but inside.
That means it's not a stream of words or lines, but lists or sequences. For each text a sequence of words is output. For each sequence of words a sequence of lines is produced. The asterisk is used to abstract from the concrete implementation. Like with streams. Whether the list of words gets implemented as an array or an IEnumerable is not important during design. It's an implementation detail.

Does any processing step require further refinement? I don't think so. They all look pretty "atomic" to me. And if not… I can always backtrack and refine a process step using functional design later, once I've gained more insight into a sub-problem.

Implementation

The implementation is straightforward, as you can imagine. The processing steps can all be translated into functions. Each can be tested easily and separately. Each has a focused responsibility. And the process flow becomes just a sequence of function calls (a sketch of such a call sequence appears below): Easy to understand. It clearly states how word wrapping works - on a high level of abstraction. And it's easy to evolve as you'll see.

Flow Design - Increment 2

So far only texts consisting of "average words" are wrapped correctly. Words not fitting in a line will result in lines too long. Wrapping long words is a feature of the requested functionality. Whether it's there or not makes a difference to the user. To quickly get feedback I decided to first implement a solution without this feature. But now it's time to add it to deliver the full scope. Fortunately Flow Design automatically leads to code following the Open Closed Principle (OCP). It's easy to extend it - instead of changing well-tested code. How's that possible? Flow Design allows for extension of functionality by inserting functional units into the flow. That way existing functional units need not be changed. The data flow arrow between functional units is a natural extension point. No need to resort to the Strategy Pattern. No need to think ahead where extensions might need to be made in the future. I just "phase in" the remaining processing step: Since neither Extract words nor Reformat know of their environment, neither needs to be touched due to the "detour". The new processing step accepts the output of the existing upstream step and produces data compatible with the existing downstream step.

Implementation - Increment 2

A trivial implementation checking whether this works does not do anything to split long words. The input is just passed on: Note how clean WordWrap() stays. The solution is easy to understand. A developer looking at this code sometime in the future, when a new feature needs to be built in, quickly sees how long words are dealt with. Compare this to Robert C. Martin's solution:[4] How does this solution handle long words? Long words are not even part of the domain language present in the code. At least I need considerable time to understand the approach. Admittedly the Flow Design solution with the full implementation of long word splitting is longer than Robert C. Martin's. At least it seems so. Because his solution does not cover all the "word wrap situations" the Flow Design solution handles. Some lines would need to be added to be on par, I guess. But even then… Is a difference in LOC that important as long as it's in the same ball park? I value understandability and openness for extension higher than saving on the last line of code. Simplicity is not just less code, it's also clarity in design. But don't take my word for it. Try Flow Design on larger problems and compare for yourself.
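The post shows the implementations as pictures, so here is a minimal sketch of how the four functional units might translate into such a sequence of function calls. It is written in Java rather than the post's C# (the shape carries over directly), and all helper names are my own illustrative stand-ins for the functional units, not the author's code:

import java.util.ArrayList;
import java.util.List;

public class WordWrapFlow {
    // Entry Point: wires the functional units together, mirroring the flow design.
    static String wordWrap(String text, int maxLineLength) {
        List<String> words = extractWords(text);
        words = splitLongWords(words, maxLineLength);   // increment 2: phased into the flow
        List<String> lines = assembleLines(words, maxLineLength);
        return assembleText(lines);
    }

    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<String>();
        for (String w : text.split(" ")) if (!w.isEmpty()) words.add(w);
        return words;
    }

    // Words longer than a line are cut into line-sized parts.
    static List<String> splitLongWords(List<String> words, int max) {
        List<String> result = new ArrayList<String>();
        for (String w : words)
            for (int i = 0; i < w.length(); i += max)
                result.add(w.substring(i, Math.min(i + max, w.length())));
        return result;
    }

    static List<String> assembleLines(List<String> words, int max) {
        List<String> lines = new ArrayList<String>();
        StringBuilder line = new StringBuilder();
        for (String w : words) {
            if (line.length() > 0 && line.length() + 1 + w.length() > max) {
                lines.add(line.toString());   // line would overflow: flush it
                line.setLength(0);
            }
            if (line.length() > 0) line.append(' ');
            line.append(w);
        }
        if (line.length() > 0) lines.add(line.toString());
        return lines;
    }

    static String assembleText(List<String> lines) {
        return String.join("\n", lines);
    }
}

For example, wordWrap("The quick brown fox", 10) yields "The quick" and "brown fox" as two lines. Note how the long-word step is inserted between two mutually oblivious neighbors without either of them changing - the OCP point made above.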
What's the easier, more straightforward way to clean code? And keep in mind: You ain't seen all yet ;-) There's more to Flow Design than described in this chapter.

In closing

I hope I was able to give you an impression of functional design that makes you hungry for more. To me it's an inevitable step in software development. Jumping from requirements to code does not scale. And it leads to dirty code all too quickly. Some thought should be invested first. Where there is a clear Entry Point visible, its functionality should be designed using data flows. Because with data flows abstraction is possible. For more background on why that's necessary read my blog article here. For now let me point out to you - if you haven't already noticed - that Flow Design is a general purpose declarative language. It's "programming by intention" (Shalloway et al.). Just write down how you think the solution should work on a high level of abstraction. This breaks down a large problem into smaller problems. And by following the PoMO the solutions to those smaller problems are independent of each other. So they are easy to test. Or you could even think about getting them implemented in parallel by different team members. Flow Design not only increases evolvability, but also helps becoming more productive. All team members can participate in functional design. This goes beyond collective code ownership. We're talking collective design/architecture ownership. Because with Flow Design there is a common visual language to talk about functional design - which is the foundation for all other design activities.

PS: If you like what you read, consider getting my ebook "The Incremental Architect's Napkin". It's where I compile all the articles in this series for easier reading.

[1] I like the strictness of Functional Programming - but I also find it quite hard to live by. And it certainly is not what millions of programmers are used to. Also to me it seems the real world is full of state and side effects. So why give them such a bad image? That's why functional design takes a more pragmatic approach. State and side effects are ok for processing steps - but be sure to follow the SRP. Don't put too much of it into a single processing step.

[2] Image taken from www.physioweb.org

[3] My code samples are written in C#. C# sports typed function pointers called delegates. Action<T> is such a function pointer type matching functions with signature void someName(T t). Other languages provide similar ways to work with functions as first class citizens - even Java now in version 8. I trust you find a way to map this detail of my translation to your favorite programming language. I know it works for Java, C++, Ruby, JavaScript, Python, Go. And if you're using a Functional Programming language it's of course a no brainer.

[4] Taken from his blog post "The Craftsman 62, The Dark Path".

    Read the article

  • Setting up a pc bluetooth server for android

    - by Del
Alright, I've been reading a lot of topics the past two or three days and nothing seems to have asked this. I am writing a PC-side server for my Android device; this is for exchanging some information and general debugging. Eventually I will be connecting to an SPP device to control a microcontroller. I have managed, using the following (Android to PC), to connect to RFCOMM channel 11 and exchange data between my Android device and my PC:

Method m = device.getClass().getMethod("createRfcommSocket", new Class[] { int.class });
tmp = (BluetoothSocket) m.invoke(device, Integer.valueOf(11));

I have attempted the createRfcommSocketToServiceRecord(UUID) method, with absolutely no luck. For the PC side, I have been using the C BlueZ stack for Linux. I have the following code which registers the service and opens a server socket:

int main(int argc, char **argv)
{
    struct sockaddr_rc loc_addr = { 0 }, rem_addr = { 0 };
    char buf[1024] = { 0 };
    char str[1024] = { 0 };
    int s, client, bytes_read;
    sdp_session_t *session;
    socklen_t opt = sizeof(rem_addr);

    session = register_service();
    s = socket(AF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
    loc_addr.rc_family = AF_BLUETOOTH;
    loc_addr.rc_bdaddr = *BDADDR_ANY;
    loc_addr.rc_channel = (uint8_t) 11;
    bind(s, (struct sockaddr *)&loc_addr, sizeof(loc_addr));
    listen(s, 1);
    client = accept(s, (struct sockaddr *)&rem_addr, &opt);
    ba2str( &rem_addr.rc_bdaddr, buf );
    fprintf(stderr, "accepted connection from %s\n", buf);
    memset(buf, 0, sizeof(buf));
    bytes_read = read(client, buf, sizeof(buf));
    if( bytes_read > 0 ) {
        printf("received [%s]\n", buf);
    }
    sprintf(str, "to Android.");
    printf("sent [%s]\n", str);
    write(client, str, sizeof(str));
    close(client);
    close(s);
    sdp_close( session );
    return 0;
}

sdp_session_t *register_service()
{
    uint32_t svc_uuid_int[] = { 0x00000000, 0x00000000, 0x00000000, 0x00000000 };
    uint8_t rfcomm_channel = 11;
    const char *service_name = "Remote Host";
    const char *service_dsc = "What the remote should be connecting to.";
    const char *service_prov = "Your mother";
    uuid_t root_uuid, l2cap_uuid, rfcomm_uuid, svc_uuid;
    sdp_list_t *l2cap_list = 0, *rfcomm_list = 0, *root_list = 0, *proto_list = 0, *access_proto_list = 0;
    sdp_data_t *channel = 0, *psm = 0;
    sdp_record_t *record = sdp_record_alloc();

    // set the general service ID
    sdp_uuid128_create( &svc_uuid, &svc_uuid_int );
    sdp_set_service_id( record, svc_uuid );

    // make the service record publicly browsable
    sdp_uuid16_create(&root_uuid, PUBLIC_BROWSE_GROUP);
    root_list = sdp_list_append(0, &root_uuid);
    sdp_set_browse_groups( record, root_list );

    // set l2cap information
    sdp_uuid16_create(&l2cap_uuid, L2CAP_UUID);
    l2cap_list = sdp_list_append( 0, &l2cap_uuid );
    proto_list = sdp_list_append( 0, l2cap_list );

    // set rfcomm information
    sdp_uuid16_create(&rfcomm_uuid, RFCOMM_UUID);
    channel = sdp_data_alloc(SDP_UINT8, &rfcomm_channel);
    rfcomm_list = sdp_list_append( 0, &rfcomm_uuid );
    sdp_list_append( rfcomm_list, channel );
    sdp_list_append( proto_list, rfcomm_list );

    // attach protocol information to service record
    access_proto_list = sdp_list_append( 0, proto_list );
    sdp_set_access_protos( record, access_proto_list );

    // set the name, provider, and description
    sdp_set_info_attr(record, service_name, service_prov, service_dsc);

    int err = 0;
    sdp_session_t *session = 0;

    // connect to the local SDP server, register the service record, and
    // disconnect
    session = sdp_connect( BDADDR_ANY, BDADDR_LOCAL, SDP_RETRY_IF_BUSY );
    err = sdp_record_register(session, record, 0);

    // cleanup
    //sdp_data_free( channel );
    sdp_list_free( l2cap_list, 0 );
    sdp_list_free( rfcomm_list, 0 );
    sdp_list_free( root_list, 0 );
    sdp_list_free( access_proto_list, 0 );
    return session;
}

And another piece of code, in addition to 'sdptool browse local', which can verify that the service record is running on the PC:

int main(int argc, char **argv)
{
    uuid_t svc_uuid;
    uint32_t svc_uuid_int[] = { 0x00000000, 0x00000000, 0x00000000, 0x00000000 };
    int err;
    bdaddr_t target;
    sdp_list_t *response_list = NULL, *search_list, *attrid_list;
    sdp_session_t *session = 0;

    str2ba( "01:23:45:67:89:AB", &target );

    // connect to the SDP server running on the remote machine
    session = sdp_connect( BDADDR_ANY, BDADDR_LOCAL, SDP_RETRY_IF_BUSY );

    // specify the UUID of the application we're searching for
    sdp_uuid128_create( &svc_uuid, &svc_uuid_int );
    search_list = sdp_list_append( NULL, &svc_uuid );

    // specify that we want a list of all the matching applications' attributes
    uint32_t range = 0x0000ffff;
    attrid_list = sdp_list_append( NULL, &range );

    // get a list of service records that have the UUID above
    err = sdp_service_search_attr_req( session, search_list, \
        SDP_ATTR_REQ_RANGE, attrid_list, &response_list);

    sdp_list_t *r = response_list;

    // go through each of the service records
    for ( ; r; r = r->next ) {
        sdp_record_t *rec = (sdp_record_t*) r->data;
        sdp_list_t *proto_list;

        // get a list of the protocol sequences
        if( sdp_get_access_protos( rec, &proto_list ) == 0 ) {
            sdp_list_t *p = proto_list;

            // go through each protocol sequence
            for( ; p ; p = p->next ) {
                sdp_list_t *pds = (sdp_list_t*)p->data;

                // go through each protocol list of the protocol sequence
                for( ; pds ; pds = pds->next ) {
                    // check the protocol attributes
                    sdp_data_t *d = (sdp_data_t*)pds->data;
                    int proto = 0;
                    for( ; d; d = d->next ) {
                        switch( d->dtd ) {
                            case SDP_UUID16:
                            case SDP_UUID32:
                            case SDP_UUID128:
                                proto = sdp_uuid_to_proto( &d->val.uuid );
                                break;
                            case SDP_UINT8:
                                if( proto == RFCOMM_UUID ) {
                                    printf("rfcomm channel: %d\n", d->val.int8);
                                }
                                break;
                        }
                    }
                }
                sdp_list_free( (sdp_list_t*)p->data, 0 );
            }
            sdp_list_free( proto_list, 0 );
        }
        printf("found service record 0x%x\n", rec->handle);
        sdp_record_free( rec );
    }
    sdp_close(session);
}

Output:

$ ./search
rfcomm channel: 11
found service record 0x10008

sdptool:

Service Name: Remote Host
Service Description: What the remote should be connecting to.
Service Provider: Your mother Service RecHandle: 0x10008 Protocol Descriptor List: "L2CAP" (0x0100) "RFCOMM" (0x0003) Channel: 11 And for logcat I'm getting this: 07-22 15:57:06.087: ERROR/BTLD(215): ****************search UUID = 0000*********** 07-22 15:57:06.087: INFO//system/bin/btld(209): btapp_dm_GetRemoteServiceChannel() 07-22 15:57:06.087: INFO//system/bin/btld(209): ##### USerial_Ioctl: BT_Wake, 0x8003 #### 07-22 15:57:06.097: INFO/ActivityManager(88): Displayed activity com.example.socktest/.socktest: 79 ms (total 79 ms) 07-22 15:57:06.697: INFO//system/bin/btld(209): ##### USerial_Ioctl: BT_Sleep, 0x8004 #### 07-22 15:57:07.517: WARN/BTLD(215): ccb timer ticks: 2147483648 07-22 15:57:07.517: INFO//system/bin/btld(209): ##### USerial_Ioctl: BT_Wake, 0x8003 #### 07-22 15:57:07.547: WARN/BTLD(215): info:x10 07-22 15:57:07.547: INFO/BTL-IFS(215): send_ctrl_msg: [BTL_IFS CTRL] send BTLIF_DTUN_SIGNAL_EVT (CTRL) 10 pbytes (hdl 14) 07-22 15:57:07.547: DEBUG/DTUN_HCID_BZ4(253): dtun_dm_sig_link_up() 07-22 15:57:07.547: INFO/DTUN_HCID_BZ4(253): dtun_dm_sig_link_up: dummy_handle = 342 07-22 15:57:07.547: DEBUG/ADAPTER(253): adapter_get_device(00:02:72:AB:7C:EE) 07-22 15:57:07.547: ERROR/BluetoothEventLoop.cpp(88): pollData[0] is revented, check next one 07-22 15:57:07.547: ERROR/BluetoothEventLoop.cpp(88): event_filter: Received signal org.bluez.Device:PropertyChanged from /org/bluez/253/hci0/dev_00_02_72_AB_7C_EE 07-22 15:57:07.777: WARN/BTLD(215): process_service_search_attr_rsp 07-22 15:57:07.787: INFO/BTL-IFS(215): send_ctrl_msg: [BTL_IFS CTRL] send BTLIF_DTUN_SIGNAL_EVT (CTRL) 13 pbytes (hdl 14) 07-22 15:57:07.787: INFO/DTUN_HCID_BZ4(253): dtun_dm_sig_rmt_service_channel: success=0, service=00000000 07-22 15:57:07.787: ERROR/DTUN_HCID_BZ4(253): discovery unsuccessful! 
07-22 15:57:08.497: INFO//system/bin/btld(209): ##### USerial_Ioctl: BT_Sleep, 0x8004 #### 07-22 15:57:09.507: INFO//system/bin/btld(209): ##### USerial_Ioctl: BT_Wake, 0x8003 #### 07-22 15:57:09.597: INFO/BTL-IFS(215): send_ctrl_msg: [BTL_IFS CTRL] send BTLIF_DTUN_SIGNAL_EVT (CTRL) 11 pbytes (hdl 14) 07-22 15:57:09.597: DEBUG/DTUN_HCID_BZ4(253): dtun_dm_sig_link_down() 07-22 15:57:09.597: INFO/DTUN_HCID_BZ4(253): dtun_dm_sig_link_down device = 0xf7a0 handle = 342 reason = 22 07-22 15:57:09.597: ERROR/BluetoothEventLoop.cpp(88): pollData[0] is revented, check next one 07-22 15:57:09.597: ERROR/BluetoothEventLoop.cpp(88): event_filter: Received signal org.bluez.Device:PropertyChanged from /org/bluez/253/hci0/dev_00_02_72_AB_7C_EE 07-22 15:57:09.597: DEBUG/BluetoothA2dpService(88): Received intent Intent { act=android.bluetooth.device.action.ACL_DISCONNECTED (has extras) } 07-22 15:57:10.107: INFO//system/bin/btld(209): ##### USerial_Ioctl: BT_Sleep, 0x8004 #### 07-22 15:57:12.107: DEBUG/BluetoothService(88): Cleaning up failed UUID channel lookup: 00:02:72:AB:7C:EE 00000000-0000-0000-0000-000000000000 07-22 15:57:12.107: ERROR/Socket Test(5234): connect() failed 07-22 15:57:12.107: DEBUG/ASOCKWRP(5234): asocket_abort [31,32,33] 07-22 15:57:12.107: INFO/BLZ20_WRAPPER(5234): blz20_wrp_shutdown: s 31, how 2 07-22 15:57:12.107: DEBUG/BLZ20_WRAPPER(5234): blz20_wrp_shutdown: fd (-1:31), bta -1, rc 0, wflags 0x0 07-22 15:57:12.107: INFO/BLZ20_WRAPPER(5234): __close_prot_rfcomm: fd 31 07-22 15:57:12.107: INFO/BLZ20_WRAPPER(5234): __close_prot_rfcomm: bind not completed on this socket 07-22 15:57:12.107: DEBUG/BLZ20_WRAPPER(5234): btlif_signal_event: fd (-1:31), bta -1, rc 0, wflags 0x0 07-22 15:57:12.107: DEBUG/BLZ20_WRAPPER(5234): btlif_signal_event: event BTLIF_BTS_EVT_ABORT matched 07-22 15:57:12.107: DEBUG/BTL_IFC_WRP(5234): wrp_close_s_only: wrp_close_s_only [31] (31:-1) [] 07-22 15:57:12.107: DEBUG/BTL_IFC_WRP(5234): wrp_close_s_only: data socket closed 07-22 15:57:12.107: DEBUG/BTL_IFC_WRP(5234): wsactive_del: delete wsock 31 from active list [ad3e1494] 07-22 15:57:12.107: DEBUG/BTL_IFC_WRP(5234): wrp_close_s_only: wsock fully closed, return to pool 07-22 15:57:12.107: DEBUG/BLZ20_WRAPPER(5234): btsk_free: success 07-22 15:57:12.107: DEBUG/BLZ20_WRAPPER(5234): blz20_wrp_write: wrote 1 bytes out of 1 on fd 33 07-22 15:57:12.107: DEBUG/ASOCKWRP(5234): asocket_destroy 07-22 15:57:12.107: DEBUG/ASOCKWRP(5234): asocket_abort [31,32,33] 07-22 15:57:12.107: INFO/BLZ20_WRAPPER(5234): blz20_wrp_shutdown: s 31, how 2 07-22 15:57:12.107: DEBUG/BLZ20_WRAPPER(5234): blz20_wrp_shutdown: btsk not found, normal close (31) 07-22 15:57:12.107: DEBUG/BLZ20_WRAPPER(5234): blz20_wrp_write: wrote 1 bytes out of 1 on fd 33 07-22 15:57:12.107: INFO/BLZ20_WRAPPER(5234): blz20_wrp_close: s 33 07-22 15:57:12.107: DEBUG/BLZ20_WRAPPER(5234): blz20_wrp_close: btsk not found, normal close (33) 07-22 15:57:12.107: INFO/BLZ20_WRAPPER(5234): blz20_wrp_close: s 32 07-22 15:57:12.107: DEBUG/BLZ20_WRAPPER(5234): blz20_wrp_close: btsk not found, normal close (32) 07-22 15:57:12.107: INFO/BLZ20_WRAPPER(5234): blz20_wrp_close: s 31 07-22 15:57:12.107: DEBUG/BLZ20_WRAPPER(5234): blz20_wrp_close: btsk not found, normal close (31) 07-22 15:57:12.157: DEBUG/Sensors(88): close_akm, fd=151 07-22 15:57:12.167: ERROR/CachedBluetoothDevice(477): onUuidChanged: Time since last connect14970690 07-22 15:57:12.237: DEBUG/Socket Test(5234): -On Stop- Sorry for bombarding you guys with what seems like a difficult question and a lot to read, but 
I've been working on this problem for a while and I've tried a lot of different things to get this working. Let me reiterate: I can get it to work, but not using the service discovery protocol. I've tried several different UUIDs and on two different computers, although I only have my HTC Incredible to test with. I've also heard some rumors that the BT stack wasn't working on the HTC Droid, but that isn't the case, at least for PC interaction.
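Not an answer, but for reference, the standard (non-reflection) client path on Android looks roughly like the sketch below. One thing worth double-checking, given the logcat line "search UUID = 0000": the all-zero UUID registered by register_service() above is a reserved base value, so the SDP lookup may be failing on the UUID itself rather than on the record; a freshly generated 128-bit UUID, used identically on both sides, would rule that out. The UUID in this sketch is a made-up example, not a known-good value:

import java.io.IOException;
import java.util.UUID;

import android.bluetooth.BluetoothAdapter;
import android.bluetooth.BluetoothDevice;
import android.bluetooth.BluetoothSocket;

public class RfcommClient {
    // Hypothetical example value; generate your own and put the same 128 bits
    // into svc_uuid_int in the BlueZ service record.
    private static final UUID SERVICE_UUID =
            UUID.fromString("5f14b8e2-95f5-4a09-a5d3-4b7a8d1a2b3c");

    public BluetoothSocket connect(BluetoothDevice device) throws IOException {
        // Discovery is heavyweight and can make connect() fail or stall.
        BluetoothAdapter.getDefaultAdapter().cancelDiscovery();
        // Performs an SDP lookup for SERVICE_UUID on the remote device, then
        // connects to whatever RFCOMM channel the matching record advertises.
        BluetoothSocket socket = device.createRfcommSocketToServiceRecord(SERVICE_UUID);
        socket.connect();
        return socket;
    }
}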

    Read the article

  • Followup: Python 2.6, 3 abstract base class misunderstanding

    - by Aaron
    I asked a question at Python 2.6, 3 abstract base class misunderstanding. My problem was that python abstract base classes didn't work quite the way I expected them to. There was some discussion in the comments about why I would want to use ABCs at all, and Alex Martelli provided an excellent answer on why my use didn't work and how to accomplish what I wanted. Here I'd like to address why one might want to use ABCs, and show my test code implementation based on Alex's answer. tl;dr: Code after the 16th paragraph. In the discussion on the original post, statements were made along the lines that you don't need ABCs in Python, and that ABCs don't do anything and are therefore not real classes; they're merely interface definitions. An abstract base class is just a tool in your tool box. It's a design tool that's been around for many years, and a programming tool that is explicitly available in many programming languages. It can be implemented manually in languages that don't provide it. An ABC is always a real class, even when it doesn't do anything but define an interface, because specifying the interface is what an ABC does. If that was all an ABC could do, that would be enough reason to have it in your toolbox, but in Python and some other languages they can do more. The basic reason to use an ABC is when you have a number of classes that all do the same thing (have the same interface) but do it differently, and you want to guarantee that that complete interface is implemented in all objects. A user of your classes can rely on the interface being completely implemented in all classes. You can maintain this guarantee manually. Over time you may succeed. Or you might forget something. Before Python had ABCs you could guarantee it semi-manually, by throwing NotImplementedError in all the base class's interface methods; you must implement these methods in derived classes. This is only a partial solution, because you can still instantiate such a base class. A more complete solution is to use ABCs as provided in Python 2.6 and above. Template methods and other wrinkles and patterns are ideas whose implementation can be made easier with full-citizen ABCs. Another idea in the comments was that Python doesn't need ABCs (understood as a class that only defines an interface) because it has multiple inheritance. The implied reference there seems to be Java and its single inheritance. In Java you "get around" single inheritance by inheriting from one or more interfaces. Java uses the word "interface" in two ways. A "Java interface" is a class with method signatures but no implementations. The methods are the interface's "interface" in the more general, non-Java sense of the word. Yes, Python has multiple inheritance, so you don't need Java-like "interfaces" (ABCs) merely to provide sets of interface methods to a class. But that's not the only reason in software development to use ABCs. Most generally, you use an ABC to specify an interface (set of methods) that will likely be implemented differently in different derived classes, yet that all derived classes must have. Additionally, there may be no sensible default implementation for the base class to provide. Finally, even an ABC with almost no interface is still useful. We use something like it when we have multiple except clauses for a try. Many exceptions have exactly the same interface, with only two differences: the exception's string value, and the actual class of the exception. 
    In many except clauses we use nothing about the exception except its class to decide what to do: one except clause catches one type of exception and does one thing, and another except clause, catching a different exception, does another. According to the exceptions module's documentation, BaseException is not intended to be directly derived from by user-defined exceptions (those should derive from Exception). If ABCs had been a first-class Python concept from the beginning, it's easy to imagine BaseException being specified as an ABC.

    But enough of that. Here's some 2.6 code that demonstrates how to use ABCs, and how to specify a list-like ABC. The examples are run in ipython, which I like much better than the plain python shell for day-to-day work; I only wish it were available for python3.

    Your basic 2.6 ABC:

        from abc import ABCMeta, abstractmethod

        class Super():
            __metaclass__ = ABCMeta

            @abstractmethod
            def method1(self):
                pass

    Test it (in ipython; the python shell would be similar):

        In [2]: a = Super()
        ---------------------------------------------------------------------------
        TypeError                                 Traceback (most recent call last)
        /home/aaron/projects/test/<ipython console> in <module>()
        TypeError: Can't instantiate abstract class Super with abstract methods method1

    Notice the end of the last line, where the TypeError tells us that method1 has not been implemented ("abstract methods method1"). That is the method marked @abstractmethod in the preceding code. Create a subclass that inherits from Super, implement method1 in the subclass, and you're done.

    My problem, which caused me to ask the original question, was how to specify an ABC that itself defines a list interface. My naive solution was to make an ABC as above and put list in the inheritance parentheses. My assumption was that the class would still be abstract (impossible to instantiate), and would be a list. That was wrong: inheriting from list made the class concrete, despite the abstract bits in the class definition.

    Alex suggested inheriting from collections.MutableSequence, which is abstract (and so doesn't make the class concrete) and list-like. I used collections.Sequence, which is also abstract but has a shorter interface and so was quicker to implement.

    First, Super derived from Sequence, with nothing extra:

        from abc import abstractmethod
        from collections import Sequence

        class Super(Sequence):
            pass

    Test it:

        In [6]: a = Super()
        ---------------------------------------------------------------------------
        TypeError                                 Traceback (most recent call last)
        /home/aaron/projects/test/<ipython console> in <module>()
        TypeError: Can't instantiate abstract class Super with abstract methods __getitem__, __len__

    We can't instantiate it. A list-like, full-citizen ABC; yea! Again, notice in the last line that the TypeError tells us why we can't instantiate it: __getitem__ and __len__ are abstract methods. They come from collections.Sequence.

    But I want a bunch of subclasses that all act like immutable lists (which is essentially what collections.Sequence is), and that have their own implementations of my added interface methods. In particular, I don't want to implement my own list code; Python already did that for me.
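    For reference, here is a minimal sketch of why simply inheriting from list is a dead end (the class name is hypothetical; as I understand the behavior Alex described, in CPython 2.6 the abstract-method check lives in object.__new__, and list overrides __new__, so the check never runs and the class is instantiable despite its abstract method):

        from abc import ABCMeta, abstractmethod

        class NaiveSuper(list):
            __metaclass__ = ABCMeta

            @abstractmethod
            def method1(self):
                pass

        # No TypeError: the class is concrete despite @abstractmethod.
        n = NaiveSuper()
        print n        # []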
    So first, let's implement the missing Sequence methods in terms of Python's list type, so that all subclasses act as lists (Sequences). Let's see the signatures of the missing abstract methods:

        In [12]: help(Sequence.__getitem__)
        Help on method __getitem__ in module _abcoll:

        __getitem__(self, index) unbound _abcoll.Sequence method
        (END)

        In [14]: help(Sequence.__len__)
        Help on method __len__ in module _abcoll:

        __len__(self) unbound _abcoll.Sequence method
        (END)

    __getitem__ takes an index, and __len__ takes nothing. The implementation (so far) is:

        from abc import abstractmethod
        from collections import Sequence

        class Super(Sequence):

            # Gives us a list member for the ABC's methods to use.
            def __init__(self):
                self._list = []

            # Abstract method in Sequence, implemented in terms of list.
            def __getitem__(self, index):
                return self._list.__getitem__(index)

            # Abstract method in Sequence, implemented in terms of list.
            def __len__(self):
                return self._list.__len__()

            # Not required. Makes printing behave like a list.
            def __repr__(self):
                return self._list.__repr__()

    Test it:

        In [34]: a = Super()

        In [35]: a
        Out[35]: []

        In [36]: print a
        []

        In [37]: len(a)
        Out[37]: 0

        In [38]: a[0]
        ---------------------------------------------------------------------------
        IndexError                                Traceback (most recent call last)
        /home/aaron/projects/test/<ipython console> in <module>()
        /home/aaron/projects/test/test.py in __getitem__(self, index)
             10     # Abstract method in Sequence, implemented in terms of list.
             11     def __getitem__(self, index):
        ---> 12         return self._list.__getitem__(index)
             13
             14     # Abstract method in Sequence, implemented in terms of list.
        IndexError: list index out of range

    Just like a list. Super is not abstract at the moment, because we've implemented both of Sequence's abstract methods. Now I want to add my own bit of interface, which will be abstract in Super and therefore required in any subclass. We'll cut to the chase and also add the subclasses that inherit from our ABC Super:

        from abc import abstractmethod
        from collections import Sequence

        class Super(Sequence):

            # Gives us a list member for the ABC's methods to use.
            def __init__(self):
                self._list = []

            # Abstract method in Sequence, implemented in terms of list.
            def __getitem__(self, index):
                return self._list.__getitem__(index)

            # Abstract method in Sequence, implemented in terms of list.
            def __len__(self):
                return self._list.__len__()

            # Not required. Makes printing behave like a list.
            def __repr__(self):
                return self._list.__repr__()

            # New abstract method: every concrete subclass must implement it.
            @abstractmethod
            def method1(self):
                pass

        class Sub0(Super):
            pass

        class Sub1(Super):
            def __init__(self):
                self._list = [1, 2, 3]

            def method1(self):
                return [x**2 for x in self._list]

            def method2(self):
                return [x/2.0 for x in self._list]

        class Sub2(Super):
            def __init__(self):
                self._list = [10, 20, 30, 40]

            def method1(self):
                return [x+2 for x in self._list]

    We've added a new abstract method, method1, to Super; this makes Super abstract again. The new class Sub0 inherits from Super but does not implement method1, so it's an ABC too. The two new classes Sub1 and Sub2 both inherit from Super, and both implement method1, so they're not abstract; their two implementations of method1 are different. Sub1 and Sub2 also initialize themselves differently; in real life they might initialize themselves wildly differently. So we have two subclasses, each of which "is a" Super (each implements Super's required interface), although their implementations differ. Also remember that Super, although an ABC, provides four non-abstract methods.
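    One bonus worth noting before the tests (my aside, not from the original discussion): because Super implements __getitem__ and __len__, collections.Sequence's mixin methods come along for free, so every concrete subclass supports membership tests, iteration, index() and count() without any extra code. A quick sketch, assuming the Sub2 class above:

        b = Sub2()

        print 20 in b            # True   -- __contains__ mixin
        print b.index(30)        # 2      -- index() mixin
        print b.count(40)        # 1      -- count() mixin
        for x in b:              # __iter__ mixin
            print x,             # 10 20 30 40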
    So Super provides two things to subclasses: an implementation of collections.Sequence, and an additional abstract interface (the one abstract method) that subclasses must implement. Also, class Sub1 implements an additional method, method2, which is not part of Super's interface; Sub1 "is a" Super, but it also has additional capabilities.

    Test it:

        In [52]: a = Super()
        ---------------------------------------------------------------------------
        TypeError                                 Traceback (most recent call last)
        /home/aaron/projects/test/<ipython console> in <module>()
        TypeError: Can't instantiate abstract class Super with abstract methods method1

        In [53]: a = Sub0()
        ---------------------------------------------------------------------------
        TypeError                                 Traceback (most recent call last)
        /home/aaron/projects/test/<ipython console> in <module>()
        TypeError: Can't instantiate abstract class Sub0 with abstract methods method1

        In [54]: a = Sub1()

        In [55]: a
        Out[55]: [1, 2, 3]

        In [56]: b = Sub2()

        In [57]: b
        Out[57]: [10, 20, 30, 40]

        In [58]: print a, b
        [1, 2, 3] [10, 20, 30, 40]

        In [59]: a, b
        Out[59]: ([1, 2, 3], [10, 20, 30, 40])

        In [60]: a.method1()
        Out[60]: [1, 4, 9]

        In [61]: b.method1()
        Out[61]: [12, 22, 32, 42]

        In [62]: a.method2()
        Out[62]: [0.5, 1.0, 1.5]

        In [63]: a[:2]
        Out[63]: [1, 2]

        In [64]: a[0] = 5
        ---------------------------------------------------------------------------
        TypeError                                 Traceback (most recent call last)
        /home/aaron/projects/test/<ipython console> in <module>()
        TypeError: 'Sub1' object does not support item assignment

    Super and Sub0 are abstract and can't be instantiated (In [52] and [53]). Sub1 and Sub2 are concrete and present an immutable Sequence interface ([54] through [59]). Sub1 and Sub2 are initialized differently, and their method1 implementations differ ([60], [61]). Sub1 includes an additional method2, beyond what Super requires ([62]). Any concrete Super acts like a list/Sequence ([63]). And a collections.Sequence is immutable ([64]).

    Finally, a wart:

        In [65]: a._list
        Out[65]: [1, 2, 3]

        In [66]: a._list = []

        In [67]: a
        Out[67]: []

    Super._list is spelled with a single leading underscore. A double leading underscore would have protected it from this last bit, but it would also have broken the method implementations in the subclasses. The reason is name mangling: inside a class body, __list is rewritten to _ClassName__list, so Super's methods would read _Super__list while Sub1.__init__ would create a separate attribute, _Sub1__list. So ultimately this whole scheme relies on a gentleman's agreement not to reach in and muck with Super._list directly, as In [65] above does. I'd love to know if there's a safer way to do that.
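    A minimal sketch of that mangling behavior (the classes here are hypothetical, not from the post):

        class Base(object):
            def __init__(self):
                self.__data = []            # mangled to _Base__data

            def size(self):
                return len(self.__data)     # reads _Base__data

        class Child(Base):
            def __init__(self):
                self.__data = [1, 2, 3]     # mangled to _Child__data

        c = Child()
        c.size()   # AttributeError: 'Child' object has no attribute '_Base__data'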
