Search Results

Search found 234 results on 10 pages for 'conventional'.

Page 8/10 | < Previous Page | 4 5 6 7 8 9 10 | Next Page >

value types in the vm

- by john.rose

value types in the vm p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} p.p2 {margin: 0.0px 0.0px 14.0px 0.0px; font: 14.0px Times} p.p3 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times} p.p4 {margin: 0.0px 0.0px 15.0px 0.0px; font: 14.0px Times} p.p5 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier} p.p6 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px} p.p7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p8 {margin: 0.0px 0.0px 0.0px 36.0px; text-indent: -36.0px; font: 14.0px Times; min-height: 18.0px} p.p9 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p10 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; color: #000000} li.li1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} li.li7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} span.s1 {font: 14.0px Courier} span.s2 {color: #000000} span.s3 {font: 14.0px Courier; color: #000000} ol.ol1 {list-style-type: decimal} Or, enduring values for a changing world. Introduction A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types. This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. We will look at a few possibilities for these new features. An Axiom of Value In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation. Here are some corollaries: A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished. Changing the component of a value type requires construction of a new value. The equals and hashCode operations are strictly component-wise. If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality. A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types. These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as an value-unsafe operation. A value type may have methods, but such methods must not change the components of the value. It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type. Representations of Value Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x=(1+2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables: /*Complex x = Complex.valueOf(1.0, 2.0):*/ double x_re = 1.0, x_im = 2.0; These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers: double distance(/*Complex x:*/ double x_re, double x_im, /*Complex y:*/ double y_re, double y_im) { /*Complex z = x.minus(y):*/ double z_re = x_re - y_re, z_im = x_im - y_im; /*return z.abs():*/ return Math.sqrt(z_re*z_re + z_im*z_im); } A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields. (A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.) The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ’array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) By generic APIs I mean both those which use generic types, like Arrays.asList and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and offer some problems to boxed values. Any ’real’ JVM type should have a story for returns, arrays, and API interoperability. The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs. Even so, wrapper classes are a good starting point for talking about value types. If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions. Let’s Start Coding Now it is time to look at some real code. Here is a definition, written in Java, of a complex number value type. @ValueSafe public final class Complex implements java.io.Serializable { // immutable component structure: public final double re, im; private Complex(double re, double im) { this.re = re; this.im = im; } // interoperability methods: public String toString() { return "Complex("+re+","+im+")"; } public List<Double> asList() { return Arrays.asList(re, im); } public boolean equals(Complex c) { return re == c.re && im == c.im; } public boolean equals(@ValueSafe Object x) { return x instanceof Complex && equals((Complex) x); } public int hashCode() { return 31*Double.valueOf(re).hashCode() + Double.valueOf(im).hashCode(); } // factory methods: public static Complex valueOf(double re, double im) { return new Complex(re, im); } public Complex changeRe(double re2) { return valueOf(re2, im); } public Complex changeIm(double im2) { return valueOf(re, im2); } public static Complex cast(@ValueSafe Object x) { return x == null ? ZERO : (Complex) x; } // utility methods and constants: public Complex plus(Complex c) { return new Complex(re+c.re, im+c.im); } public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); } public double abs() { return Math.sqrt(re*re + im*im); } public static final Complex PI = valueOf(Math.PI, 0.0); public static final Complex ZERO = valueOf(0.0, 0.0); } This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows: The class is marked as a value type with an annotation. The class is final, because it does not make sense to create subclasses of value types. The fields of the class are all non-private and final. (I.e., the type is immutable and structurally transparent.) From the supertype Object, all public non-final methods are overridden. The constructor is private. Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types: One or more factory methods are responsible for value creation, including a component-wise valueOf method. There are utility methods for complex arithmetic and instance creation, such as plus and changeIm. There are static utility constants, such as PI. The type is serializable, using the default mechanisms. There are methods for converting to and from dynamically typed references, such as asList and cast. The Rules In order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist. A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type. All other value-safe classes must be abstract. The non-static fields of a value class must be non-public and final, and all its constructors must be private. Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example: @ValueSafe public interface ValueType extends java.io.Serializable { // All methods listed here must get redefined. // Definitions must be value-safe, which means // they may depend on component values only. List<? extends Object> asList(); int hashCode(); boolean equals(@ValueSafe Object c); String toString(); } //@ValueSafe inherited from supertype: public final class Complex implements ValueType { … The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types. Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following: The type of E is a value-safe type. E names a field, parameter, or local variable whose declaration is marked @ValueSafe. E is a call to a method whose declaration is marked @ValueSafe. E is an assignment to a value-safe variable, field reference, or array reference. E is a cast to a value-safe type from a value-safe expression. E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe. Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions. A value-safe expression may not be the subject of a value-unsafe operation. In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type. In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied, that no two value type will be distinguishable as long as their component values are equal. More Code To illustrate these rules, here are some usage examples for Complex: Complex pi = Complex.valueOf(Math.PI, 0); Complex zero = pi.changeRe(0); //zero = pi; zero.re = 0; ValueType vtype = pi; @SuppressWarnings("value-unsafe") Object obj = pi; @ValueSafe Object obj2 = pi; obj2 = new Object(); // ok List<Complex> clist = new ArrayList<Complex>(); clist.add(pi); // (ok assuming List.add param is @ValueSafe) List<ValueType> vlist = new ArrayList<ValueType>(); vlist.add(pi); // (ok) List<Object> olist = new ArrayList<Object>(); olist.add(pi); // warning: "value-unsafe" boolean z = pi.equals(zero); boolean z1 = (pi == zero); // error: reference comparison on value type boolean z2 = (pi == null); // error: reference comparison on value type boolean z3 = (pi == obj2); // error: reference comparison on value type synchronized (pi) { } // error: synch of value, unpredictable result synchronized (obj2) { } // unpredictable result Complex qq = pi; qq = null; // possible NPE; warning: “null-unsafe" qq = (Complex) obj; // warning: “null-unsafe" qq = Complex.cast(obj); // OK @SuppressWarnings("null-unsafe") Complex empty = null; // possible NPE qq = empty; // possible NPE (null pollution) The Payoffs It follows from this that either the JVM or the java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. Non-static methods on value types can be transformed into static methods which take the components as value parameters. Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect programs automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidently unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note, to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization. Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables. There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above. Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe. Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches: public interface java.util.List<@ValueSafe T> extends Collection<T> { … public interface java.util.List<T extends Object|ValueType> extends Collection<T> { … (The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.) With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”. But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. The lack of multiple return values has a strong distorting effect on many Java APIs. Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems something have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know when an object’s identity is important or not. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside. Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data. Unboxing arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains. It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. Certain connection points, such as Arrays.copyOf and System.arraycopy might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements. Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays. Implicit Method Definitions The example of class Complex above may be unattractively complex. I believe most or all of the elements of the example class are required by the logic of value types. If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code. On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated. Java has a rule for implicitly defining a class’s constructor, if no it defines no constructors explicitly. Likewise, there are rules for providing default access modifiers for interface members. Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types. Here’s an example of a “highly implicit” definition of a complex number type: public class Complex implements ValueType { // implicitly final public double re, im; // implicitly public final //implicit methods are defined elementwise from te fields: // toString, asList, equals(2), hashCode, valueOf, cast //optionally, explicit methods (plus, abs, etc.) would go here } In other words, with the right defaults, a simple value type definition can be a one-liner. The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>. Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly: public @ValueType class Complex { … // implicitly final, implements ValueType (But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.) Implicitly Defined Value Types So far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer. A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or Cartesian point. The name and the methods convey the intended meaning. But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage? Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number. Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”). What we need here are structural value types commonly known as tuples. For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows: public class java.lang.tuple.$DD extends java.lang.tuple.Tuple { double $1, $2; } Here the component names are fixed and all the required methods are defined implicitly. The supertype is an abstract class which has suitable shared declarations. The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields. The odd thing about such a tuple type (and structural types in general) is it must be instantiated lazily, in response to linkage requests from one or more classes that need it. The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types. (Specifics of naming and name mangling need some tasteful engineering.) Tuples also seem to demand, even more than nominal types, some support from the language. (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.) At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists. Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection. That is a task for another day. Other Use Cases Besides complex numbers and simple tuples there are many use cases for value types. Many tuple-like types have natural value-type representations. These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses. Other types have a variable-length ‘tail’ of internal values. The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values. Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values. Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values. The value type may also include ’header’ information. Variable-sized values often have a length distribution which favors short lengths. In that case, the design of the value type can make the first few values in the sequence be direct ’header’ fields of the value type. In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference. Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough. This is the case with String, where the tail is a mutable (but never mutated) character array. Field types and their order must be a globally visible part of the API. The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements. This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations. A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile. Perhaps constant pool references to value types need to declare the field order as assumed by each API user. This brings up the delicate status of private fields in a value type. It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack. If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction? Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double). My initial thought was to make the fields always public, which would make the security problem moot. But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes. I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value. Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values. Similar to String, we could build an efficient multi-precision decimal type along these lines: public final class DecimalValue extends ValueType { protected final long header; protected private final BigInteger digits; public DecimalValue valueOf(int value, int scale) { assert(scale >= 0); return new DecimalValue(((long)value << 32) + scale, null); } public DecimalValue valueOf(long value, int scale) { if (value == (int) value) return valueOf((int)value, scale); return new DecimalValue(-scale, new BigInteger(value)); } } Values of this type would be passed between methods as two machine words. Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed. (Note the tension between encapsulation and unboxing in this case. It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.) Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues. (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.) Getting the full benefit of unboxing and arrays will require some new JVM magic. Although the JVM emphasizes portability, system dependent code will benefit from using machine-level types larger than 64 bits. For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types. This is probably only worthwhile if the unboxing arrays can be packed with such values. More Daydreams A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant. An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes. More radically, a Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception. @ValueSafe public interface ValueType extends java.io.Serializable, Unsynchronizable, Interchangeable { … public class Complex implements ValueType { // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe … It seems possible that Integer and the other wrapper types could be retro-fitted as value-safe types. This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable. It is likely that code which violates value-safety for wrapper types exists but is uncommon. It is less plausible to retro-fit String, since the prominent operation String.intern is often used with value-unsafe code. We should also reconsider the distinction between boxed and unboxed values in code. The design presented above obscures that distinction. As another thought experiment, we could imagine making a first class distinction in the type system between boxed and unboxed representations. Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to boxed. For example: complex pi = complex.valueOf(Math.PI, 0); Complex boxPi = pi; // convert to boxed myList.add(boxPi); complex z = myList.get(0); // unbox Such a convention could perhaps absorb the current difference between int and Integer, double and Double. It might also allow the programmer to express a helpful distinction among array types. As said above, array types are crucial to bulk data interfaces, but are limited in the JVM. Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type. Something like this which can also accommodate value type components seems worthwhile. On the other hand, does it make sense for value types to contain short arrays? And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure. These considerations must wait for another day and another note. More Work It seems to me that a good sequence for introducing such value types would be as follows: Add the value-safety restrictions to an experimental version of javac. Code some sample applications with value types, including Complex and DecimalValue. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing. A similar investigation should be applied (concurrently) to array types. In this case, it seems to me that the starting point is in the JVM: Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids. No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. That’s enough musing me for now. Back to work!

Read the article
Inequality joins, Asynchronous transformations and Lookups : SSIS

- by jamiet

It is pretty much accepted by SQL Server Integration Services (SSIS) developers that synchronous transformations are generally quicker than asynchronous transformations (for a description of synchronous and asynchronous transformations go read Asynchronous and synchronous data flow components). Notice I said “generally” and not “always”; there are circumstances where using asynchronous transformations can be beneficial and in this blog post I’ll demonstrate such a scenario, one that is pretty common when building data warehouses. Imagine I have a [Customer] dimension table that manages information about all of my customers as a slowly-changing dimension. If that is a type 2 slowly changing dimension then you will likely have multiple rows per customer in that table. Furthermore you might also have datetime fields that indicate the effective time period of each member record. Here is such a table that contains data for four dimension members {Terry, Max, Henry, Horace}: Notice that we have multiple records per customer and that the [SCDStartDate] of a record is equivalent to the [SCDEndDate] of the record that preceded it (if there was one). (Note that I am on record as saying I am not a fan of this technique of storing an [SCDEndDate] but for the purposes of clarity I have included it here.) Anyway, the idea here is that we will have some incoming data containing [CustomerName] & [EffectiveDate] and we need to use those values to lookup [Customer].[CustomerId]. The logic will be: Lookup a [CustomerId] WHERE [CustomerName]=[CustomerName] AND [SCDStartDate] <= [EffectiveDate] AND [EffectiveDate] <= [SCDEndDate] The conventional approach to this would be to use a full cached lookup but that isn’t an option here because we are using inequality conditions. The obvious next step then is to use a non-cached lookup which enables us to change the SQL statement to use inequality operators: Let’s take a look at the dataflow: Notice these are all synchronous components. This approach works just fine however it does have the limitation that it has to issue a SQL statement against your lookup set for every row thus we can expect the execution time of our dataflow to increase linearly in line with the number of rows in our dataflow; that’s not good. OK, that’s the obvious method. Let’s now look at a different way of achieving this using an asynchronous Merge Join transform coupled with a Conditional Split. I’ve shown it post-execution so that I can include the row counts which help to illustrate what is going on here: Notice that there are more rows output from our Merge Join component than on the input. That is because we are joining on [CustomerName] and, as we know, we have multiple records per [CustomerName] in our lookup set. Notice also that there are two asynchronous components in here (the Sort and the Merge Join). I have embedded a video below that compares the execution times for each of these two methods. The video is just over 8minutes long. View on Vimeo For those that can’t be bothered watching the video I’ll tell you the results here. The dataflow that used the Lookup transform took 36 seconds whereas the dataflow that used the Merge Join took less than two seconds. An illustration in case it is needed: Pretty conclusive proof that in some scenarios it may be quicker to use an asynchronous component than a synchronous one. Your mileage may of course vary. The scenario outlined here is analogous to performance tuning procedural SQL that uses cursors. It is common to eliminate cursors by converting them to set-based operations and that is effectively what we have done here. Our non-cached lookup is performing a discrete operation for every single row of data, exactly like a cursor does. By eliminating this cursor-in-disguise we have dramatically sped up our dataflow. I hope all of that proves useful. You can download the package that I demonstrated in the video from my SkyDrive at http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100514/20100514%20Lookups%20and%20Merge%20Joins.zip Comments are welcome as always. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Inappropriate Updates?

- by Tony Davis

A recent Simple-talk article by Kathi Kellenberger dissected the fastest SQL solution, submitted by Peter Larsson as part of Phil Factor's SQL Speed Phreak challenge, to the classic "running total" problem. In its analysis of the code, the article re-ignited a heated debate regarding the techniques that should, and should not, be deemed acceptable in your search for fast SQL code. Peter's code for running total calculation uses a variation of a somewhat contentious technique, sometimes referred to as a "quirky update": SET @Subscribers = Subscribers = @Subscribers + PeopleJoined - PeopleLeft This form of the UPDATE statement, @variable = column = expression, is documented and it allows you to set a variable to the value returned by the expression. Microsoft does not guarantee the order in which rows are updated in this technique because, in relational theory, a table doesn’t have a natural order to its rows and the UPDATE statement has no means of specifying the order. Traditionally, in cases where a specific order is requires, such as for running aggregate calculations, programmers who used the technique have relied on the fact that the UPDATE statement, without the WHERE clause, is executed in the order imposed by the clustered index, or in heap order, if there isn’t one. Peter wasn’t satisfied with this, and so used the ingenious device of assuring the order of the UPDATE by the use of an "ordered CTE", based on an underlying temporary staging table (a heap). However, in either case, the ordering is still not guaranteed and, in addition, would be broken under conditions of parallelism, or partitioning. Many argue, with validity, that this reliance on a given order where none can ever be guaranteed is an abuse of basic relational principles, and so is a bad practice; perhaps even irresponsible. More importantly, Microsoft doesn't wish to support the technique and offers no guarantee that it will always work. If you put it into production and it breaks in a later version, you can't file a bug. As such, many believe that the technique should never be tolerated in a production system, under any circumstances. Is this attitude justified? After all, both forms of the technique, using a clustered index to guarantee the order or using an ordered CTE, have been tested rigorously and are proven to be robust; although not guaranteed by Microsoft, the ordering is reliable, provided none of the conditions that are known to break it are violated. In Peter's particular case, the technique is being applied to a temporary table, where the developer has full control of the data ordering, and indexing, and knows that the table will never be subject to parallelism or partitioning. It might be argued that, in such circumstances, the technique is not really "quirky" at all and to ban it from your systems would server no real purpose other than to deprive yourself of a reliable technique that has uses that extend well beyond the running total calculations. Of course, it is doubly important that such a technique, including its unsupported status and the assumptions that underpin its success, is fully and clearly documented, preferably even when posting it online in a competition or forum post. Ultimately, however, this technique has been available to programmers throughout the time Sybase and SQL Server has existed, and so cannot be lightly cast aside, even if one sympathises with Microsoft for the awkwardness of maintaining an archaic way of doing updates. After all, a Table hint could easily be devised that, if specified in the WITH (<Table_Hint_Limited>) clause, could be used to request the database engine to do the update in the conventional order. Then perhaps everyone would be satisfied. Cheers, Tony.

Read the article
Does your analytic solution tell you what questions to ask?

- by Manan Goel

Analytic solutions exist to answer business questions. Conventional wisdom holds that if you can answer business questions quickly and accurately, you can take better business decisions and therefore achieve better business results and outperform the competition. Most business questions are well understood (read structured) so they are relatively easy to ask and answer. Questions like what were the revenues, cost of goods sold, margins, which regions and products outperformed/underperformed are relatively well understood and as a result most analytics solutions are well equipped to answer such questions. Things get really interesting when you are looking for answers but you don’t know what questions to ask in the first place? That’s like an explorer looking to make new discoveries by exploration. An example of this scenario is the Center of Disease Control (CDC) in United States trying to find the vaccine for the latest strand of the swine flu virus. The researchers at CDC may try hundreds of options before finally discovering the vaccine. The exploration process is inherently messy and complex. The process is fraught with false starts, one question or a hunch leading to another and the final result may look entirely different from what was envisioned in the beginning. Speed and flexibility is the key; speed so the hundreds of possible options can be explored quickly and flexibility because almost everything about the problem, solutions and the process is unknown. Come to think of it, most organizations operate in an increasingly unknown or uncertain environment. Business Leaders have to take decisions based on a largely unknown view of the future. And since the value proposition of analytic solutions is to help the business leaders take better business decisions, for best results, consider adding information exploration and discovery capabilities to your analytic solution. Such exploratory analysis capabilities will help the business leaders perform even better by empowering them to refine their hunches, ask better questions and take better decisions. That’s your analytic system not only answering the questions but also suggesting what questions to ask in the first place. Today, most leading analytic software vendors offer exploratory analysis products as part of their analytic solutions offerings. So, what characteristics should be top of mind while evaluating the various solutions? The answer is quite simply the same characteristics that are essential for exploration and analysis – speed & flexibility. Speed is required because the system inherently has to be agile to handle hundreds of different scenarios with large volumes of data across large user populations. Exploration happens at the speed of thought so make sure that you system is capable of operating at speed of thought. Flexibility is required because the exploration process from start to finish is full of unknowns; unknown questions, answers and hunches. So, make sure that the system is capable of managing and exploring all relevant data – structured or unstructured like databases, enterprise applications, tweets, social media updates, documents, texts, emails etc. and provides flexible Google like user interface to quickly explore all relevant data. Getting Started You can help business leaders become “Decision Masters” by augmenting your analytic solution with information discovery capabilities. For best results make sure that the solution you choose is enterprise class and allows advanced, yet intuitive, exploration and analysis of complex and varied data including structured, semi-structured and unstructured data. You can learn more about Oracle’s exploratory analysis solutions by clicking here.

Read the article
Sweet and Sour Source Control

- by Tony Davis

Most database developers don't use Source Control. A recent anonymous poll on SQL Server Central asked its readers "Which Version Control system do you currently use to store you database scripts?" The winner, with almost 30% of the vote was...none: "We don't use source control for database scripts". In second place with almost 28% of the vote was Microsoft's VSS. VSS? Given its reputation for being buggy, unstable and lacking most of the basic features required of a proper source control system, answering VSS is really just another way of saying "I don't use Source Control". At first glance, it's a surprising thought. You wonder how database developers can work in a team and find out what changed, when the system worked before but is now broken; to work out what happened to their changes that now seem to have vanished; to roll-back a mistake quickly so that the rest of the team have a functioning build; to find instantly whether a suspect change has been deployed to production. Unfortunately, the survey didn't ask about the scale of the database development, and correlate the two questions. If there is only one database developer within a schema, who has an automated approach to regular generation of build scripts, then the need for a formal source control system is questionable. After all, a database stores far more about its metadata than a traditional compiled application. However, what is meat for a small development is poison for a team-based development. Here, we need a form of Source Control that can reconcile simultaneous changes, store the history of changes, derive versions and builds and that can cope with forks and merges. The problem comes when one borrows a solution that was designed for conventional programming. A database is not thought of as a "file", but a vast, interdependent and intricate matrix of tables, indexes, constraints, triggers, enumerations, static data and so on, all subtly interconnected. It is an awkward fit. Subversion with its support for merges and forks, and the tolerance of different work practices, can be made to work well, if used carefully. It has a standards-based architecture that allows it to be used on all platforms such as Windows Mac, and Linux. In the words of Erland Sommerskog, developers should "just do it". What's in a database is akin to a "binary file", and the developer must work only from the file. You check out the file, edit it, and save it to disk to compile it. Dependencies are validated at this point and if you've broken anything (e.g. you renamed a column and broke all the objects that reference the column), you'll find out about it right away, and you'll be forced to fix it. Nevertheless, for many this is an alien way of working with SQL Server. Subversion is the powerhouse, not the GUI. It doesn't work seamlessly with your existing IDE, and that usually means SSMS. So the question then becomes more subtle. Would developers be less reluctant to use a fully-featured source (revision) control system for a team database development if they had a turn-key, reliable system that fitted in with their existing work-practices? I'd love to hear what you think. Cheers, Tony.

Read the article
The Best Data Integration for Exadata Comes from Oracle

- by maria costanzo

Oracle Data Integrator and Oracle GoldenGate offer unique and optimized data integration solutions for Oracle Exadata. For example, customers that choose to feed their data warehouse or reporting database with near real-time throughout the day, can do so without decreasing performance or availability of source and target systems. And if you ask why real-time, the short answer is: in today’s fast-paced, always-on world, business decisions need to use more relevant, timely data to be able to act fast and seize opportunities. A longer response to "why real-time" question can be found in a related blog post. If we look at the solution architecture, as shown on the diagram below, Oracle Data Integrator and Oracle GoldenGate are both uniquely designed to take full advantage of the power of the database and to eliminate unnecessary middle-tier components. Oracle Data Integrator (ODI) is the best bulk data loading solution for Exadata. ODI is the only ETL platform that can leverage the full power of Exadata, integrate directly on the Exadata machine without any additional hardware, and by far provides the simplest setup and fastest overall performance on an Exadata system. We regularly see customers achieving a 5-10 times boost when they move their ETL to ODI on Exadata. For some companies the performance gain is even much higher. For example a large insurance company did a proof of concept comparing ODI vs a traditional ETL tool (one of the market leaders) on Exadata. The same process that was taking 5hrs and 11 minutes to complete using the competing ETL product took 7 minutes and 20 seconds with ODI. Oracle Data Integrator was 42 times faster than the conventional ETL when running on Exadata.This shows that Oracle's own data integration offering helps you to gain the most out of your Exadata investment with a truly optimized solution. GoldenGate is the best solution for streaming data from heterogeneous sources into Exadata in real time. Oracle GoldenGate can also be used together with Data Integrator for hybrid use cases that also demand non-invasive capture, high-speed real time replication. Oracle GoldenGate enables real-time data feeds from heterogeneous sources non-invasively, and delivers to the staging area on the target Exadata system. ODI runs directly on Exadata to use the database engine power to perform in-database transformations. Enterprise Data Quality is integrated with Oracle Data integrator and enables ODI to load trusted data into the data warehouse tables. Only Oracle can offer all these technical benefits wrapped into a single intelligence data warehouse solution that runs on Exadata. Compared to traditional ETL with add-on CDC this solution offers: § Non-invasive data capture from heterogeneous sources and avoids any performance impact on source § No mid-tier; set based transformations use database power § Mini-batches throughout the day –or- bulk processing nightly which means maximum availability for the DW § Integrated solution with Enterprise Data Quality enables leveraging trusted data in the data warehouse In addition to Starwood Hotels and Resorts, Morrison Supermarkets, United Kingdom’s fourth-largest food retailer, has seen the power of this solution for their new BI platform and shared their story with us. Morrisons needed to analyze data across a large number of manufacturing, warehousing, retail, and financial applications with the goal to achieve single view into operations for improved customer service. The retailer deployed Oracle GoldenGate and Oracle Data Integrator to bring new data into Oracle Exadata in near real-time and replicate the data into reporting structures within the data warehouse—extending visibility into operations. Using Oracle's data integration offering for Exadata, Morrisons produced financial reports in seconds, rather than minutes, and improved staff productivity and agility. You can read more about Morrison’s success story here and hear from Starwood here. From an Irem Radzik article.

Read the article
Documentation and Test Assertions in Databases

- by Phil Factor

When I first worked with Sybase/SQL Server, we thought our databases were impressively large but they were, by today’s standards, pathetically small. We had one script to build the whole database. Every script I ever read was richly annotated; it was more like reading a document. Every table had a comment block, and every line would be commented too. At the end of each routine (e.g. procedure) was a quick integration test, or series of test assertions, to check that nothing in the build was broken. We simply ran the build script, stored in the Version Control System, and it pulled everything together in a logical sequence that not only created the database objects but pulled in the static data. This worked fine at the scale we had. The advantage was that one could, by reading the source code, reach a rapid understanding of how the database worked and how one could interface with it. The problem was that it was a system that meant that only one developer at the time could work on the database. It was very easy for a developer to execute accidentally the entire build script rather than the selected section on which he or she was working, thereby cleansing the database of everyone else’s work-in-progress and data. It soon became the fashion to work at the object level, so that programmers could check out individual views, tables, functions, constraints and rules and work on them independently. It was then that I noticed the trend to generate the source for the VCS retrospectively from the development server. Tables were worst affected. You can, of course, add or delete a table’s columns and constraints retrospectively, which means that the existing source no longer represents the current object. If, after your development work, you generate the source from the live table, then you get no block or line comments, and the source script is sprinkled with silly square-brackets and other confetti, thereby rendering it visually indigestible. Routines, too, were affected. In our system, every routine had a directly attached string of unit-tests. A retro-generated routine has no unit-tests or test assertions. Yes, one can still commit our test code to the VCS but it’s a separate module and teams end up running the whole suite of tests for every individual change, rather than just the tests for that routine, which doesn’t scale for database testing. With Extended properties, one can get the best of both worlds, and even use them to put blame, praise or annotations into your VCS. It requires a lot of work, though, particularly the script to generate the table. The problem is that there are no conventional names beyond ‘MS_Description’ for the special use of extended properties. This makes it difficult to do splendid things such ensuring the integrity of the build by running a suite of tests that are actually stored in extended properties within the database and therefore the VCS. We have lost the readability of database source code over the years, and largely jettisoned the use of test assertions as part of the database build. This is not unexpected in view of the increasing complexity of the structure of databases and number of programmers working on them. There must, surely, be a way of getting them back, but I sometimes wonder if I’m one of very few who miss them.

Read the article
New Sample Demonstrating the Traversing of Tree Bindings

- by Duncan Mills

A technique that I seem to use a fair amount, particularly in the construction of dynamic UIs is the use of a ADF Tree Binding to encode a multi-level master-detail relationship which is then expressed in the UI in some kind of looping form – usually a series of nested af:iterators, rather than the conventional tree or treetable. This technique exploits two features of the treebinding. First the fact that an treebinding can return both a collectionModel as well as a treeModel, this collectionModel can be used directly by an iterator. Secondly that the “rows” returned by the collectionModel themselves contain an attribute called .children. This attribute in turn gives access to a collection of all the children of that node which can also be iterated over. Putting this together you can represent the data encoded into a tree binding in all sorts of ways. As an example I’ve put together a very simple sample based on the HT schema and uploaded it to the ADF Sample project. It produces this UI: The important code is shown here for a Region -> Country -> Location Hierachy: <af:iterator id="i1" value="#{bindings.AllRegions.collectionModel}" var="rgn"> <af:showDetailHeader text="#{rgn.RegionName}" disclosed="true" id="sdh1"> <af:iterator id="i2" value="#{rgn.children}" var="cnty"> <af:showDetailHeader text="#{cnty.CountryName}" disclosed="true" id="sdh2"> <af:iterator id="i3" value="#{cnty.children}" var="loc"> <af:panelList id="pl1"> <af:outputText value="#{loc.City}" id="ot3"/> </af:panelList> </af:iterator> </af:showDetailHeader> </af:iterator> </af:showDetailHeader> </af:iterator> You can download the entire sample from here:

Read the article
Should I manage authentication on my own if the alternative is very low in usability and I am already managing roles?

- by rumtscho

As a small in-house dev department, we only have experience with developing applications for our intranet. We use the existing Active Directory for user account management. It contains the accounts of all company employees and many (but not all) of the business partners we have a cooperation with. Now, the top management wants a technology exchange application, and I am the lead dev on the new project. Basically, it is a database containing our know-how, with a web frontend. Our employees, our cooperating business partners, and people who wish to become our cooperating business partners should have access to it and see what technologies we have, so they can trade for them with the department which owns them. The technologies are not patented, but very valuable to competitors, so the department bosses are paranoid about somebody unauthorized gaining access to their technology description. This constraint necessitates a nightmarishly complicated multi-dimensional RBAC-hybrid model. As the Active Directory doesn't even contain all the information needed to infer the roles I use, I will have to manage roles plus per-technology per-user granted access exceptions within my system. The current plan is to use Active Directory for authentication. This will result in a multi-hour registration process for our business partners where the database owner has to manually create logins in our Active Directory and send them credentials. If I manage the logins in my own system, we could improve the usability a lot, for example by letting people have an active (but unprivileged) account as soon as they register. It seems to me that, after I am having a users table in the DB anyway (and managing ugly details like storing historical user IDs so that recycled user IDs within the Active Directory don't unexpectedly get rights to view someone's technologies), the additional complexity from implementing authentication functionality will be minimal. Therefore, I am starting to lean towards doing my own user login management and forgetting the AD altogether. On the other hand, I see some reasons to stay with Active Directory. First, the conventional wisdom I have heard from experienced programmers is to not do your own user management if you can avoid it. Second, we have code I can reuse for connection to the active directory, while I would have to code the authentication if done in-system (and my boss has clearly stated that getting the project delivered on time has much higher priority than delivering a system with high usability). Third, I am not a very experienced developer (this is my first lead position) and have never done user management before, so I am afraid that I am overlooking some important reasons to use the AD, or that I am underestimating the amount of work left to do my own authentication. I would like to know if there are more reasons to go with the AD authentication mechanism. Specifically, if I want to do my own authentication, what would I have to implement besides a secure connection for the login screen (which I would need anyway even if I am only transporting the pw to the AD), lookup of a password hash and a mechanism for password recovery (which will probably include manual identity verification, so no need for complex mTAN-like solutions)? And, if you have experience with such security-critical systems, which one would you use and why?

Read the article
Determining failing sectors on portable flash memory

- by Faxwell Mingleton

I'm trying to write a program that will detect signs of failure for portable flash memory devices (thumb drives, etc). I have seen tools in the past that are able to detect failing sectors and other kinds of trouble on conventional mechanical hard drives, but I fear that flash memory does not have the same kind of predictable low-level access to the hardware due to the internal workings of the storage. Things like wear-leveling and other block-remapping techniques (to skip over 'dead' sectors?) lead me to believe that determining if a flash drive is failing will be difficult at best, if not impossible (short of having constant read failures and device unmounts). Flash drives at their end-of-life should be easy to detect (constant CRC discrepancies during reads and all-out failure). But what about drives that might be failing early? Are there any tell-tale signs like slower throughput speeds that might indicate a flash drive is going to fail much sooner than normal? Along the lines of detecting potentially bad blocks, I had considered attempting random reads/writes to a file close to or exactly the size of the entire volume, but even then is it possible that the drive might report sizes under its maximum capacity to account for 'dead' blocks? In short, is there any way to circumvent or at least detect (algorithmically or otherwise) the use of block-remapping or other life extension techniques for flash memory? Let me end this question by expressing my uncertainty as to whether or not this belongs on serverfault.com . This is definitely a hardware-related question, but I also desire a software solution - preferably one that I can program myself. If this question is misplaced, I will be happy to migrate it to serverfault - but I do need a programming solution. Please let me know if you need clarification :) Thanks!

Read the article
Using AJAX to get a specific DOM Element (using Javascript, not jQuery)

- by Matt Frost

How do you use AJAX (in plain JavaScript, NOT jQuery) to get a page (same domain) and display just a specific DOM Element? (Such as the DOM element marked with the id of "bodyContent"). I'm working with MediaWiki 1.18, so my methods have to be a little less conventional (I know that a version of jQuery can be enabled for MediaWiki, but I don't have access to do that so I need to look at other options). I apologize if the code is messy, but there's a reason I need to build it this way. I'm mostly interested in seeing what can be done with the Javascript. Here's the HTML code: <div class="mediaWiki-AJAX"><a href="http://www.domain.com/whatever"></a></div> Here's the Javascript I have so far: var AJAXs = document.getElementsByClassName('mediaWiki-AJAX'); if (AJAXs.length > 0) { for (var i = AJAXs.length - 1; i >= 0; i--) { var URL = AJAXs[i].getElementsByTagName('a')[0].href; xmlhttp = new XMLHttpRequest(); xmlhttp.onreadystatechange = function() { if (xmlhttp.readyState == 4 && xmlhttp.status == 200) { AJAXs[i].innerHTML = xmlhttp.responseText; } } xmlhttp.open('GET',URL,true); xmlhttp.send(); } }

Read the article
When is a good time to start thinking about scaling?

- by Slokun

I've been designing a site over the past couple days, and been doing some research into different aspects of scaling a site horizontally. If things go as planned, in a few months (years?) I know I'd need to worry about scaling the site up and out, since the resources it would end up consuming would be huge. So, this got me to thinking, when is the best time to start thinking about, and designing for, scalability? If you start too early on, you could easily over complicate your design, and make it impossible to actually build. You could also get too caught up in the details, the architecture, whatever, and wind up getting nothing done. Also, if you do get it working, but the site never takes off, you may have wasted a good chunk of extra effort. On the other hand, you could be saving yourself a ton of effort down the road. Designing it from the ground up to be big would make it much easier later on to let it grow big, with very little rewriting going on. I know for what I'm working on, I've decided to make at least a few choices now on the side of scaling, but I'm not going to do a complete change of thinking to get it to scale completely. Notably, I've redesigned my database from a conventional relational design to one similar to what was suggested on the Reddit site linked below, and I'm going to give memcache a try. So, the basic question, when is a good time to start thinking or worrying about scaling, and what are some good designs, tips, etc. for when doing so? A couple of things I've been reading, for those who are interested: http://www.codinghorror.com/blog/2009/06/scaling-up-vs-scaling-out-hidden-costs.html http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html http://developer.yahoo.com/performance/rules.html

Read the article
Entity Framework in layered architecture

- by Kamyar

I am using a layered architecture with the Entity Framework. Here's What I came up with till now (All the projects Except UI are class library): Entities: The POCO Entities. Completely persistence ignorant. No Reference to other projects. Generated by Microsoft's ADO.Net POCO Entity Generator. DAL: The EDMX (Entity Model) file with the context class. (t4 generated). References: Entities BLL: Business Logic Layer. Will implement repository pattern on this layer. References: Entities, DAL. This is where the objectcontext gets populated: var ctx=new DAL.MyDBEntities(); UI: The presentation layer: ASP.NET website. References: Entities, BLL + a connection string entry to entities in the config file (question #2). Now my three questions: Is my layer discintion approach correct? In my UI, I access BLL as follows: var customerRep = new BLL.CustomerRepository(); var Customer = customerRep.GetByID(myCustomerID); The problem is that I have to define the entities connection string in my UI's web.config/app.config otherwise I get a runtime exception. IS defining the entities connectionstring in UI spoils the layers' distinction? Or is it accesptible in a muli layered architecture. Should I take any additional steps to perform chage tracking, lazy loading, etc (by etc I mean the features that Entity Framework covers in a conventional, 1 project, non POCO code generation)? Thanks and apologies for the lengthy question.

Read the article
SQL - Updating records based on most recent date

- by Remnant

I am having difficulty updating records within a database based on the most recent date and am looking for some guidance. By the way, I am new to SQL. As background, I have a windows forms application with SQL Express and am using ADO.NET to interact with the database. The application is designed to enable the user to track employee attendance on various courses that must be attended on a periodic basis (e.g. every 6 months, every year etc.). For example, they can pull back data to see the last time employees attended a given course and also update attendance dates if an employee has recently completed a course. I have three data tables: EmployeeDetailsTable - simple list of employees names, email address etc., each with unique ID CourseDetailsTable - simple list of courses, each with unique ID (e.g. 1, 2, 3 etc.) AttendanceRecordsTable - has 3 columns { EmployeeID, CourseID, AttendanceDate, Comments } For any given course, an employee will have an attendance history i.e. if the course needs to be attended each year then they will have one record for as many years as they have been at the company. What I want to be able to do is to update the 'Comments' field for a given employee and given course based on the most recent attendance date. What is the 'correct' SQL syntax for this? I have tried many things (like below) but cannot get it to work: UPDATE AttendanceRecordsTable SET Comments = @Comments WHERE AttendanceRecordsTable.EmployeeID = (SELECT EmployeeDetailsTable.EmployeeID FROM EmployeeDetailsTable WHERE (EmployeeDetailsTable.LastName =@ParameterLastName AND EmployeeDetailsTable.FirstName =@ParameterFirstName) AND AttendanceRecordsTable.CourseID = (SELECT CourseDetailsTable.CourseID FROM CourseDetailsTable WHERE CourseDetailsTable.CourseName =@CourseName)) GROUP BY MAX(AttendanceRecordsTable.LastDate) After much googling, I discovered that MAX is an aggregate function and so I need to use GROUP BY. I have also tried using the HAVING keyword but without success. Can anybody point me in the right direction? What is the 'conventional' syntax to update a database record based on the most recent date?

Read the article
Postgres error with Sinatra/Haml/DataMapper on Heroku

- by sevennineteen

I'm trying to move a simple Sinatra app over to Heroku. Migration of the Ruby app code and existing MySQL database using Taps went smoothly, but I'm getting the following Postgres error: PostgresError - ERROR: operator does not exist: text = integer LINE 1: ...d_at", "post_id" FROM "comments" WHERE ("post_id" IN (4, 17,... ^ HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. It's evident that the problem is related to a type mismatch in the query, but this is being issued from a Haml template by the DataMapper ORM at a very high level of abstraction, so I'm not sure how I'd go about controlling this... Specifically, this seems to be throwing up on a call of p.comments from my Haml template, where p represents a given post. The Datamapper models are related as follows: class Post property :id, Serial ... has n, :comments end class Comment property :id, Serial ... belongs_to :post end This works fine on my local and current hosted environment using MySQL, but Postgres is clearly more strict. There must be hundreds of Datamapper & Haml apps running on Postgres DBs, and this model relationship is super-conventional, so hopefully someone has seen (and determined how to fix) this. Thanks!

Read the article
Populate asp.net MVC Index page with data from the database

- by Sunil Ramu

I have a web application in which I need to fetch data from the database and display in the index page. As you know, asp.net mvc has given options to edit delete etc... I need to populate the page using the conventional DB way and it uses a stored procedure to retrieve results. I dont want to use LINQ. This is my model entity class using System; using System.Collections.Generic; using System.Linq; using System.Web; namespace LogMVCApp.Models { public class Property { public int Id { get; set; } public string LogInId { get; set; } public string Username { get; set; } public string Action { get; set; } public string Information { get; set; } public bool Passed{get; set; } public string LogType { get; set; } } } and I need to retrieve data using something like this... var conString = ConfigurationManager.ConnectionStrings["connection"].ToString(); var conn = new SqlConnection(conString); var command = new SqlCommand("LogInsert", conn){CommandType=CommandType.StoredProcedure};

Read the article
implementing stretchable dialog borders in iphone sdk

- by Joey

Hi, I want to implement dialog borders that scale to the size I require the dialog to be. Perhaps there is a better more conventional name for this sort of thing. If there is, if someone would edit the title, that'd be great. Anyhow, I'd like to do this so I can have dialogs of any size without the visual artifacts that come with scaling border art to small, large, or wacky unproportional dimentions. I have a few ideas on how this is done, but am not sure which is better for iphone. I have a few questions. 1) Should I make a containing view object that basically overloads its drawRect method and draws the images where they should be at their appropriate scale when the method is called, or should I main a containing view object that simply contains 8 UIImageViews? I suspect the latter approach won't work if I need to actively scale the resulting dialog class like in an animation. 1b) If overloading drawRect is the way to go, does someone have some sample code or a link to an example that demonstrates drawing an image directly from drawRect()? 2) Is it generally better to create a) a 3 x 3 image where the segments are in their appropriate 1x1 grid of the image? If so, is it simple to draw from a portion of this image onto my target view in drawRect (if the former assumption is correct that I should use drawRect)? b) The pieces separately in 8 different files?

Read the article
Object Oriented Programming Problem

- by Danny Love

I'm designing a little CMS using PHP whilst putting OOP into practice. I've hit a problem though. I have a page class, whos constructor accepts a UID and a slug. This is then used to set the properties (unless the page don't exist in which case it would fail). Anyway, I wanted to implement a function to create a page, and I thought ... what's the best way to do this without overloading the constructor. What would the correct way, or more conventional method, of doing this be? My code is below: <?php class Page { private $dbc; private $title; private $description; private $image; private $tags; private $owner; private $timestamp; private $views; public function __construct($uid, $slug) { } public function getTitle() { return $this->title; } public function getDescription() { if($this->description != NULL) { return $this->description; } else { return false; } } public function getImage() { if($this->image != NULL) { return $this->image; } else { return false; } } public function getTags() { if($this->tags != NULL) { return $this->tags; } else { return false; } } public function getOwner() { return $this->owner; } public function getTimestamp() { return $this->timestamp; } public function getViews() { return $this->views; } public function createPage() { // Stuck? } }

Read the article
Is there a recommended way to communicate scientific/engineering programming to C developers?

- by ggkmath

Hi, I have a lot of MATLAB code that needs to get ported to C (execution speed is critical for this work) as part of a back-end process for a web application. When I attempt to outsource this code to a C developer, I assume (correct me if I'm wrong) few C developers also understand MATLAB code (things like indexing and memory management are different, etc.). I wonder if there are any C developers out there that can recommend a procedure for me to follow to best communicate what the code does? For example, should I provide the MATLAB code and explain what it's doing line by line? Or, should I just provide the math/algorithm, explain it in plain English, and let the C developer implement it with this understanding in his/her own way (e.g. can I assume the developer understands how to work with complex math (i.e. imaginary numbers), how to generate histograms, perform an FFT, etc.)? Or, is there a better method? I expect I'm not the first to need to do this, so I wonder if any C developers out there ran into this situation and can share any conventional wisdom how they'd like this task to be transferred? Thanks in advance for any comments.

Read the article
How to find a particular string in a paragraph in android?

- by user1448108

In my project the data is stored in html format along with image tag. For example the following data is stored in html format and it contains 2 to 3 images. Mother Teresa as she is commonly known, was born Agnes Gonxha Bojaxhiu. Although born on the 26 August 1910, she considered 27 August, the day she was baptized, to be her "true birthday". “By blood, I am Albanian. By citizenship, an Indian. By faith, I am a Catholic nun. As to my calling, I belong to the world. As to my heart, I belong entirely to the Heart of Jesus." img ----- src="image1.png" ---- Mother Teresa founded the Missionaries of Charity, a Roman Catholic religious congregation, which in 2012 consisted of over 4,500 sisters and is active in 133 countries.img ----- src="image2.png" ----. She was the recipient of numerous honours including the 1979 Nobel Peace Prize. She refused the conventional ceremonial banquet given to laureates, and asked that the $192,000 funds be given to the poor in India. img ----- src="image3.png" ---- Now from the above data I need to find how many images the paragraph has and need to get all the image names along with the extensions and should display them in android. Tried with splitting but did not work. Please help me regarding this. Thanks in advance

Read the article
Some ASP.NET and Access

- by Fazleh

Good Day all, I have a big problem but i think its minor for you guys in Stackoverflow. I am creating a web application that has two main parts. The Payment part and Requisition part. It backbone is using access and the script is in ASP.NET. I managed to sort out most of the application. But I have been having a few problems. I have pasted the link to the project in http://www.mediafire.com/download/p09fefreifidud3/Inyatsi.rar so it will be easy for someone to see what I am blabbing about. Now for my problems: The AddRequisition.aspx/AddPayment.aspx: both have a reference number. I wanted it to be unique number(but not a primary key). I wanted it to be in the following format: DDMMYY(TransactionNo)(UserID) eg: 24061101PK. I have tried and tried but have not been able to sort it out. The AmountINWords gets the value from Amount. It converts the Amount into words. Thats not all. It picks what currncy was picked in the CurrencyPaidIn and pust the respective currency inside. eg. 123.45 USD becomes One Hundred and twenty three dollars and forty five cents. I tried using queries but as you will see that went all wrong. Those are the only two things that I cant seem to get my head around. I do know that there are some things that are not conventional ASP.NET and some text boxes are not the right size. I was thinking of sorting out those after I get those two fixed because they are simple to do. I really need some help with this application please. If someone can just have a look at the code and add a few things here and there. Thanks in advance. Faz

Read the article
How can I measure my (SAMP) server's bandwidth usage?

- by enkrates

I'm running a Solaris server to serve PHP through Apache. What tools can I use to measure the bandwidth my server is currently using? I use Google analytics to measure traffic, but as far as I know, it ignores file size. I have a rough idea of the average size of the pages I serve, and can do a back-of-the-envelope calculation of my bandwidth usage by multiplying page views (from Google) by average page size, but I'm looking for a solution that is more rigorous and exact. Also, I'm not trying to throttle anything, or implement usage caps or anything like that. I'd just like to measure the bandwidth usage, so I know what it is. An example of what I'm after is the usage meter that Slicehost provides in their admin website for their users. They tell me (for another site I run) how much bandwidth I've used each month and also divide the usage for uploading and downloading. So, it seems like this data can be measured, and I'd like to be able to do it myself. To put it simply, what is the conventional method for measuring the bandwidth usage of my server?

Read the article
Generating dynamic data using Javascript

- by methuselah

Given that I have an array of alphabetical characters: var qwerty = [['q', 'w', 'e', 'r', 't', 'y', 'u', 'i', 'o', 'p'], ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k', 'l'], ['z', 'x', 'c', 'v', 'b', 'n', 'm']]; How would I present them in three rows like on a conventional keyboard using JS without resorting to something like this: <input type='button' value='Q' id='bt1'/> <input type='button' value='W' id='bt2'/> <input type='button' value='E' id='bt3'/> <input type='button' value='R' id='bt4'/> <input type='button' value='T' id='bt5'/> <input type='button' value='Y' id='bt5'/> <input type='button' value='U' id='bt7'/> <input type='button' value='I' id='bt8'/> <input type='button' value='O' id='bt9'/> <input type='button' value='P' id='bt10'/> <br> <input type='button' value='A' id='bt11'/> <input type='button' value='S' id='bt12'/> <input type='button' value='D' id='bt13'/> <input type='button' value='F' id='bt14'/> <input type='button' value='G' id='bt15'/> <input type='button' value='H' id='bt16'/> <input type='button' value='J' id='bt17'/> <input type='button' value='K' id='bt18'/> <input type='button' value='L' id='bt19'/> <br /> ... Many thanks in advance!

Read the article
Troubleshooting an unstable internet connection

- by Konrad Rudolph

My MacBook Pro running OS X (10.9, but I had the same problem before) is connected to a Belkin router via WiFi and, using Virgin Media as the ISP, to the internet. The connection is extremely unstable – on some days, I get a ping timeout every few seconds. In addition, some domains seem to suffer general connectivity issues. For instance, I often find that while the youtube.com website loads, none of the videos (which are hosted on a separate domain) do. At other times, videos load but always fail to buffer, even though the actual connection speed is ok, even though I’ve disabled dash playback. Since I’m living in a rented room and the ISP contract isn’t actually mine I’ve got only limited possibilities of addressing the problem. In particular, I have no access to the router configuration and my non tech savvy landlady, while sympathetic, is not in a great hurry to hand the problem over to the ISP’s customer support. What’s more, I seem to be the only person in the house experiencing these problems – but I can imagine that this is simply because I’m the only one who’s using the internet continuously. I’m searching for specific tests that might be able to pinpoint – and ideally solve – the problem. So far all I’ve managed to do is establish that Virgin is routing my traffic in mysterious ways. Here’s an excerpt from traceroute google.co.uk. It’s worth mentioning that the host name doesn’t seem to matter a lot, the trace route is always the same. traceroute: Warning: google.co.uk has multiple addresses; using 62.254.36.148 traceroute to google.co.uk (62.254.36.148), 64 hops max, 52 byte packets 1 (192.168.2.1) 1.112 ms 1.300 ms 2.359 ms 2 10.100.32.1 (10.100.32.1) 11.926 ms 10.217 ms 24.987 ms 3 cmbg-core-1a-ae3-610.network.virginmedia.net (80.1.202.93) 28.809 ms * 66.653 ms 4 popl-bb-1b-ae16-0.network.virginmedia.net (212.43.163.141) 13.759 ms 126.504 ms 20.472 ms 5 nrth-bb-1b-et-010-0.network.virginmedia.net (62.253.175.57) 28.357 ms 16.398 ms 42.387 ms 6 nrth-bb-1c-ae1-0.network.virginmedia.net (62.253.174.110) 27.441 ms 15.622 ms 12.044 ms 7 lutn-icdn-1-ae0-0.network.virginmedia.net (62.253.175.82) 16.678 ms 28.463 ms 28.253 ms 8 * * * 9 * * * 10 * * * ^C If I let it, this goes on until the end of time. It never seems to reach a destination. Is this normal? A friend living in the same town who is also with Virgin Media has a more conventional traceroute output: 7 hops to google.co.uk, all of which send the ICMP TIME_EXCEEDED response. The obvious fix – rebooting the router – doesn’t seem to help. As far as I can tell, the WiFi connection is stable (I can always ping the router) so the problem is further downstream. I’ve tried using an alternative DNS before (OpenDNS) but if anything, this made things worse. In fact, it made all Google services nigh unreachable.

Read the article
Win XP error 0x80041003 using GetObject/winmgmts

- by John Lewis

My computer is called "neil" and I want to set some values using WMI in vbScript. I adapetd the script below from one supplied by Microsoft. When I run it in my browser I get Error Type: (0x80041003) /dressage/30/pdf2.asp, line 8 I suspect it is some registry/security setting. Any advice? John Lewis FULL SCRIPT call Print_HTML_Page("http://neil/dressage/ascii.asp", "ascii") Sub SetPDFFile(strPDFFile) Const HKEY_LOCAL_MACHINE = &H80000002 strKeyPath = "SOFTWARE\Dane Prairie Systems\Win2PDF" strComputer = "." Set objReg=GetObject( _ "winmgmts:{impersonationLevel=impersonate}!\\" & _ strComputer & "\root\default:StdRegProv") strValueName = "PDFFileName" objReg.SetExpandedStringValue HKEY_LOCAL_MACHINE,_ strKeyPath,strValueName,strPDFFile End Sub Sub Print_HTML_Page(strPathToPage, strPDFFile) SetPDFFile( strPDFFile ) Set objIE = CreateObject("InternetExplorer.Application") 'From http://www.tek-tips.com/viewthread.cfm?qid=1092473&page=5 On Error Resume Next strPrintStatus = objIE.QueryStatusWB(6) If Err.Number 0 Then MsgBox "Cannot find a printer. Operation aborted." objIE.Quit Set objIE = Nothing Exit Sub End If With objIE .visible=0 .left=200 .top=200 .height=400 .width=400 .menubar=0 .toolbar=1 .statusBar=0 .navigate strPathToPage End With 'Wait until IE has finished loading Do while objIE.busy WScript.Sleep 100 Loop On Error Goto 0 objIE.ExecWB 6,2 'Wait until IE has finished printing WScript.Sleep 2000 objIE.Quit Set objIE = Nothing End Sub Update: Thanks for your reply. The line breaks seem to have been introduced in the process of paasting into this form. Well spotted - I was using a PDF file name "ascii". I added a .pdf extension but still get the error. I suspect you're right that it's to do with admin rights. Here's more about the setup and what I'm trying to achieve. Win2pdf is a product for writing PDFs by works by simulating a Windows printer. You "print" the page, select win2pdf in the print dialog and it then asks for a file name. I have it installed on my pc (called Neil) and it works fine in this conventional way. My aim is to write an html page to a PDF file using win2pdf - but via ASP/vbscript/javascript rather than with manual intervention. The script for doing this was provided by win2PDF's tech support but when it did not work, that was the limit of their understanding. In the sample script the file ascii.asp just produces a table of ascii codes/characters. The URL given is on my own PC which has IIS set up to run scripts which it does fine. The error I get occurs on about the fourth line executed. I am logged in with full admin rights - I think! But I'm no expert. I hope this helps to give some more specific suggestions about how to check/fix the admin rights.

Read the article

Search Results

Search found 234 results on 10 pages for 'conventional'.

Page 8/10 | < Previous Page | 4 5 6 7 8 9 10 | Next Page >

- by john.rose

- by jamiet

- by Tony Davis

- by Manan Goel

- by Tony Davis

- by maria costanzo

- by Phil Factor

- by Duncan Mills

- by rumtscho

- by Faxwell Mingleton

- by Matt Frost

- by Slokun

- by Kamyar

- by Remnant

- by sevennineteen

- by Sunil Ramu

- by Joey

- by Danny Love

- by ggkmath

- by user1448108

- by Fazleh

- by enkrates

- by methuselah

- by Konrad Rudolph

- by John Lewis

< Previous Page | 4 5 6 7 8 9 10 | Next Page >