writes on - Page 67 - Developer IT

value types in the vm

- by john.rose

value types in the vm p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} p.p2 {margin: 0.0px 0.0px 14.0px 0.0px; font: 14.0px Times} p.p3 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times} p.p4 {margin: 0.0px 0.0px 15.0px 0.0px; font: 14.0px Times} p.p5 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier} p.p6 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px} p.p7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p8 {margin: 0.0px 0.0px 0.0px 36.0px; text-indent: -36.0px; font: 14.0px Times; min-height: 18.0px} p.p9 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p10 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; color: #000000} li.li1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} li.li7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} span.s1 {font: 14.0px Courier} span.s2 {color: #000000} span.s3 {font: 14.0px Courier; color: #000000} ol.ol1 {list-style-type: decimal} Or, enduring values for a changing world. Introduction A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types. This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. We will look at a few possibilities for these new features. An Axiom of Value In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation. Here are some corollaries: A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished. Changing the component of a value type requires construction of a new value. The equals and hashCode operations are strictly component-wise. If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality. A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types. These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as an value-unsafe operation. A value type may have methods, but such methods must not change the components of the value. It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type. Representations of Value Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x=(1+2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables: /*Complex x = Complex.valueOf(1.0, 2.0):*/ double x_re = 1.0, x_im = 2.0; These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers: double distance(/*Complex x:*/ double x_re, double x_im, /*Complex y:*/ double y_re, double y_im) { /*Complex z = x.minus(y):*/ double z_re = x_re - y_re, z_im = x_im - y_im; /*return z.abs():*/ return Math.sqrt(z_re*z_re + z_im*z_im); } A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields. (A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.) The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ’array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) By generic APIs I mean both those which use generic types, like Arrays.asList and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and offer some problems to boxed values. Any ’real’ JVM type should have a story for returns, arrays, and API interoperability. The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs. Even so, wrapper classes are a good starting point for talking about value types. If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions. Let’s Start Coding Now it is time to look at some real code. Here is a definition, written in Java, of a complex number value type. @ValueSafe public final class Complex implements java.io.Serializable { // immutable component structure: public final double re, im; private Complex(double re, double im) { this.re = re; this.im = im; } // interoperability methods: public String toString() { return "Complex("+re+","+im+")"; } public List<Double> asList() { return Arrays.asList(re, im); } public boolean equals(Complex c) { return re == c.re && im == c.im; } public boolean equals(@ValueSafe Object x) { return x instanceof Complex && equals((Complex) x); } public int hashCode() { return 31*Double.valueOf(re).hashCode() + Double.valueOf(im).hashCode(); } // factory methods: public static Complex valueOf(double re, double im) { return new Complex(re, im); } public Complex changeRe(double re2) { return valueOf(re2, im); } public Complex changeIm(double im2) { return valueOf(re, im2); } public static Complex cast(@ValueSafe Object x) { return x == null ? ZERO : (Complex) x; } // utility methods and constants: public Complex plus(Complex c) { return new Complex(re+c.re, im+c.im); } public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); } public double abs() { return Math.sqrt(re*re + im*im); } public static final Complex PI = valueOf(Math.PI, 0.0); public static final Complex ZERO = valueOf(0.0, 0.0); } This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows: The class is marked as a value type with an annotation. The class is final, because it does not make sense to create subclasses of value types. The fields of the class are all non-private and final. (I.e., the type is immutable and structurally transparent.) From the supertype Object, all public non-final methods are overridden. The constructor is private. Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types: One or more factory methods are responsible for value creation, including a component-wise valueOf method. There are utility methods for complex arithmetic and instance creation, such as plus and changeIm. There are static utility constants, such as PI. The type is serializable, using the default mechanisms. There are methods for converting to and from dynamically typed references, such as asList and cast. The Rules In order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist. A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type. All other value-safe classes must be abstract. The non-static fields of a value class must be non-public and final, and all its constructors must be private. Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example: @ValueSafe public interface ValueType extends java.io.Serializable { // All methods listed here must get redefined. // Definitions must be value-safe, which means // they may depend on component values only. List<? extends Object> asList(); int hashCode(); boolean equals(@ValueSafe Object c); String toString(); } //@ValueSafe inherited from supertype: public final class Complex implements ValueType { … The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types. Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following: The type of E is a value-safe type. E names a field, parameter, or local variable whose declaration is marked @ValueSafe. E is a call to a method whose declaration is marked @ValueSafe. E is an assignment to a value-safe variable, field reference, or array reference. E is a cast to a value-safe type from a value-safe expression. E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe. Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions. A value-safe expression may not be the subject of a value-unsafe operation. In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type. In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied, that no two value type will be distinguishable as long as their component values are equal. More Code To illustrate these rules, here are some usage examples for Complex: Complex pi = Complex.valueOf(Math.PI, 0); Complex zero = pi.changeRe(0); //zero = pi; zero.re = 0; ValueType vtype = pi; @SuppressWarnings("value-unsafe") Object obj = pi; @ValueSafe Object obj2 = pi; obj2 = new Object(); // ok List<Complex> clist = new ArrayList<Complex>(); clist.add(pi); // (ok assuming List.add param is @ValueSafe) List<ValueType> vlist = new ArrayList<ValueType>(); vlist.add(pi); // (ok) List<Object> olist = new ArrayList<Object>(); olist.add(pi); // warning: "value-unsafe" boolean z = pi.equals(zero); boolean z1 = (pi == zero); // error: reference comparison on value type boolean z2 = (pi == null); // error: reference comparison on value type boolean z3 = (pi == obj2); // error: reference comparison on value type synchronized (pi) { } // error: synch of value, unpredictable result synchronized (obj2) { } // unpredictable result Complex qq = pi; qq = null; // possible NPE; warning: “null-unsafe" qq = (Complex) obj; // warning: “null-unsafe" qq = Complex.cast(obj); // OK @SuppressWarnings("null-unsafe") Complex empty = null; // possible NPE qq = empty; // possible NPE (null pollution) The Payoffs It follows from this that either the JVM or the java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. Non-static methods on value types can be transformed into static methods which take the components as value parameters. Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect programs automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidently unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note, to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization. Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables. There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above. Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe. Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches: public interface java.util.List<@ValueSafe T> extends Collection<T> { … public interface java.util.List<T extends Object|ValueType> extends Collection<T> { … (The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.) With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”. But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. The lack of multiple return values has a strong distorting effect on many Java APIs. Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems something have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know when an object’s identity is important or not. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside. Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data. Unboxing arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains. It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. Certain connection points, such as Arrays.copyOf and System.arraycopy might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements. Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays. Implicit Method Definitions The example of class Complex above may be unattractively complex. I believe most or all of the elements of the example class are required by the logic of value types. If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code. On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated. Java has a rule for implicitly defining a class’s constructor, if no it defines no constructors explicitly. Likewise, there are rules for providing default access modifiers for interface members. Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types. Here’s an example of a “highly implicit” definition of a complex number type: public class Complex implements ValueType { // implicitly final public double re, im; // implicitly public final //implicit methods are defined elementwise from te fields: // toString, asList, equals(2), hashCode, valueOf, cast //optionally, explicit methods (plus, abs, etc.) would go here } In other words, with the right defaults, a simple value type definition can be a one-liner. The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>. Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly: public @ValueType class Complex { … // implicitly final, implements ValueType (But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.) Implicitly Defined Value Types So far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer. A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or Cartesian point. The name and the methods convey the intended meaning. But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage? Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number. Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”). What we need here are structural value types commonly known as tuples. For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows: public class java.lang.tuple.$DD extends java.lang.tuple.Tuple { double $1, $2; } Here the component names are fixed and all the required methods are defined implicitly. The supertype is an abstract class which has suitable shared declarations. The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields. The odd thing about such a tuple type (and structural types in general) is it must be instantiated lazily, in response to linkage requests from one or more classes that need it. The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types. (Specifics of naming and name mangling need some tasteful engineering.) Tuples also seem to demand, even more than nominal types, some support from the language. (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.) At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists. Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection. That is a task for another day. Other Use Cases Besides complex numbers and simple tuples there are many use cases for value types. Many tuple-like types have natural value-type representations. These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses. Other types have a variable-length ‘tail’ of internal values. The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values. Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values. Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values. The value type may also include ’header’ information. Variable-sized values often have a length distribution which favors short lengths. In that case, the design of the value type can make the first few values in the sequence be direct ’header’ fields of the value type. In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference. Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough. This is the case with String, where the tail is a mutable (but never mutated) character array. Field types and their order must be a globally visible part of the API. The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements. This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations. A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile. Perhaps constant pool references to value types need to declare the field order as assumed by each API user. This brings up the delicate status of private fields in a value type. It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack. If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction? Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double). My initial thought was to make the fields always public, which would make the security problem moot. But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes. I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value. Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values. Similar to String, we could build an efficient multi-precision decimal type along these lines: public final class DecimalValue extends ValueType { protected final long header; protected private final BigInteger digits; public DecimalValue valueOf(int value, int scale) { assert(scale >= 0); return new DecimalValue(((long)value << 32) + scale, null); } public DecimalValue valueOf(long value, int scale) { if (value == (int) value) return valueOf((int)value, scale); return new DecimalValue(-scale, new BigInteger(value)); } } Values of this type would be passed between methods as two machine words. Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed. (Note the tension between encapsulation and unboxing in this case. It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.) Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues. (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.) Getting the full benefit of unboxing and arrays will require some new JVM magic. Although the JVM emphasizes portability, system dependent code will benefit from using machine-level types larger than 64 bits. For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types. This is probably only worthwhile if the unboxing arrays can be packed with such values. More Daydreams A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant. An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes. More radically, a Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception. @ValueSafe public interface ValueType extends java.io.Serializable, Unsynchronizable, Interchangeable { … public class Complex implements ValueType { // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe … It seems possible that Integer and the other wrapper types could be retro-fitted as value-safe types. This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable. It is likely that code which violates value-safety for wrapper types exists but is uncommon. It is less plausible to retro-fit String, since the prominent operation String.intern is often used with value-unsafe code. We should also reconsider the distinction between boxed and unboxed values in code. The design presented above obscures that distinction. As another thought experiment, we could imagine making a first class distinction in the type system between boxed and unboxed representations. Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to boxed. For example: complex pi = complex.valueOf(Math.PI, 0); Complex boxPi = pi; // convert to boxed myList.add(boxPi); complex z = myList.get(0); // unbox Such a convention could perhaps absorb the current difference between int and Integer, double and Double. It might also allow the programmer to express a helpful distinction among array types. As said above, array types are crucial to bulk data interfaces, but are limited in the JVM. Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type. Something like this which can also accommodate value type components seems worthwhile. On the other hand, does it make sense for value types to contain short arrays? And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure. These considerations must wait for another day and another note. More Work It seems to me that a good sequence for introducing such value types would be as follows: Add the value-safety restrictions to an experimental version of javac. Code some sample applications with value types, including Complex and DecimalValue. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing. A similar investigation should be applied (concurrently) to array types. In this case, it seems to me that the starting point is in the JVM: Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids. No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. That’s enough musing me for now. Back to work!

Read the article

Building a better mouse-trap – Improving the creation of XML Message Requests using Reflection, XML & XSLT

- by paulschapman

Introduction The way I previously created messages to send to the GovTalk service I used the XMLDocument to create the request. While this worked it left a number of problems; not least that for every message a special function would need to created. This is OK for the short term but the biggest cost in any software project is maintenance and this would be a headache to maintain. So the following is a somewhat better way of achieving the same thing. For the purposes of this article I am going to be using the CompanyNumberSearch request of the GovTalk service – although this technique would work for any service that accepted XML. The C# functions which send and receive the messages remain the same. The magic sauce in this is the XSLT which defines the structure of the request, and the use of objects in conjunction with reflection to provide the content. It is a bit like Sweet Chilli Sauce added to Chicken on a bed of rice. So on to the Sweet Chilli Sauce The Sweet Chilli Sauce The request to search for a company based on it’s number is as follows; <GovTalkMessage xsi:schemaLocation="http://www.govtalk.gov.uk/CM/envelope http://xmlgw.companieshouse.gov.uk/v1-0/schema/Egov_ch-v2-0.xsd" xmlns="http://www.govtalk.gov.uk/CM/envelope" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:gt="http://www.govtalk.gov.uk/schemas/govtalk/core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > <EnvelopeVersion>1.0</EnvelopeVersion> <Header> <MessageDetails> <Class>NumberSearch</Class> <Qualifier>request</Qualifier> <TransactionID>1</TransactionID> </MessageDetails> <SenderDetails> <IDAuthentication> <SenderID>????????????????????????????????</SenderID> <Authentication> <Method>CHMD5</Method> <Value>????????????????????????????????</Value> </Authentication> </IDAuthentication> </SenderDetails> </Header> <GovTalkDetails> <Keys/> </GovTalkDetails> <Body> <NumberSearchRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://xmlgw.companieshouse.gov.uk/v1-0/schema/NumberSearch.xsd"> <PartialCompanyNumber>99999999</PartialCompanyNumber> <DataSet>LIVE</DataSet> <SearchRows>1</SearchRows> </NumberSearchRequest> </Body> </GovTalkMessage> This is the XML that we send to the GovTalk Service and we get back a list of companies that match the criteria passed A message is structured in two parts; The envelope which identifies the person sending the request, with the name of the request, and the body which gives the detail of the company we are looking for. The Chilli What makes it possible is the use of XSLT to define the message – and serialization to convert each request object into XML. To start we need to create an object which will represent the contents of the message we are sending. However there is a common properties in all the messages that we send to Companies House. These properties are as follows SenderId – the id of the person sending the message SenderPassword – the password associated with Id TransactionId – Unique identifier for the message AuthenticationValue – authenticates the request Because these properties are unique to the Companies House message, and because they are shared with all messages they are perfect candidates for a base class. The class is as follows; using System; using System.Collections.Generic; using System.Linq; using System.Web; using System.Security.Cryptography; using System.Text; using System.Text.RegularExpressions; using Microsoft.WindowsAzure.ServiceRuntime; namespace CompanyHub.Services { public class GovTalkRequest { public GovTalkRequest() { try { SenderID = RoleEnvironment.GetConfigurationSettingValue("SenderId"); SenderPassword = RoleEnvironment.GetConfigurationSettingValue("SenderPassword"); TransactionId = DateTime.Now.Ticks.ToString(); AuthenticationValue = EncodePassword(String.Format("{0}{1}{2}", SenderID, SenderPassword, TransactionId)); } catch (System.Exception ex) { throw ex; } } /// <summary> /// returns the Sender ID to be used when communicating with the GovTalk Service /// </summary> public String SenderID { get; set; } /// <summary> /// return the password to be used when communicating with the GovTalk Service /// </summary> public String SenderPassword { get; set; } // end SenderPassword /// <summary> /// Transaction Id - uses the Time and Date converted to Ticks /// </summary> public String TransactionId { get; set; } // end TransactionId /// <summary> /// calculate the authentication value that will be used when /// communicating with /// </summary> public String AuthenticationValue { get; set; } // end AuthenticationValue property /// <summary> /// encodes password(s) using MD5 /// </summary> /// <param name="clearPassword"></param> /// <returns></returns> public static String EncodePassword(String clearPassword) { MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider(); byte[] hashedBytes; UTF32Encoding encoder = new UTF32Encoding(); hashedBytes = md5Hasher.ComputeHash(ASCIIEncoding.Default.GetBytes(clearPassword)); String result = Regex.Replace(BitConverter.ToString(hashedBytes), "-", "").ToLower(); return result; } } } There is nothing particularly clever here, except for the EncodePassword method which hashes the value made up of the SenderId, Password and Transaction id. Each message inherits from this object. So for the Company Number Search in addition to the properties above we need a partial number, which dataset to search – for the purposes of the project we only need to search the LIVE set so this can be set in the constructor and the SearchRows. Again all are set as properties. With the SearchRows and DataSet initialized in the constructor. public class CompanyNumberSearchRequest : GovTalkRequest, IDisposable { /// <summary> /// /// </summary> public CompanyNumberSearchRequest() : base() { DataSet = "LIVE"; SearchRows = 1; } /// <summary> /// Company Number to search against /// </summary> public String PartialCompanyNumber { get; set; } /// <summary> /// What DataSet should be searched for the company /// </summary> public String DataSet { get; set; } /// <summary> /// How many rows should be returned /// </summary> public int SearchRows { get; set; } public void Dispose() { DataSet = String.Empty; PartialCompanyNumber = String.Empty; DataSet = "LIVE"; SearchRows = 1; } } As well as inheriting from our base class, I have also inherited from IDisposable – not just because it is just plain good practice to dispose of objects when coding, but it gives also gives us more versatility when using the object. There are four stages in making a request and this is reflected in the four methods we execute in making a call to the Companies House service; Create a request Send a request Check the status If OK then get the results of the request I’ve implemented each of these stages within a static class called Toolbox – which also means I don’t need to create an instance of the class to use it. When making a request there are three stages; Get the template for the message Serialize the object representing the message Transform the serialized object using a predefined XSLT file. Each of my templates I have defined as an embedded resource. When retrieving a resource of this kind we have to include the full namespace to the resource. In making the code re-usable as much as possible I defined the full ‘path’ within the GetRequest method. requestFile = String.Format("CompanyHub.Services.Schemas.{0}", RequestFile); So we now have the full path of the file within the assembly. Now all we need do is retrieve the assembly and get the resource. asm = Assembly.GetExecutingAssembly(); sr = asm.GetManifestResourceStream(requestFile); Once retrieved So this can be returned to the calling function and we now have a stream of XSLT to define the message. Time now to serialize the request to create the other side of this message. // Serialize object containing Request, Load into XML Document t = Obj.GetType(); ms = new MemoryStream(); serializer = new XmlSerializer(t); xmlTextWriter = new XmlTextWriter(ms, Encoding.ASCII); serializer.Serialize(xmlTextWriter, Obj); ms = (MemoryStream)xmlTextWriter.BaseStream; GovTalkRequest = Toolbox.ConvertByteArrayToString(ms.ToArray()); First off we need the type of the object so we make a call to the GetType method of the object containing the Message properties. Next we need a MemoryStream, XmlSerializer and an XMLTextWriter so these can be initialized. The object is serialized by making the call to the Serialize method of the serializer object. The result of that is then converted into a MemoryStream. That MemoryStream is then converted into a string. ConvertByteArrayToString This is a fairly simple function which uses an ASCIIEncoding object found within the System.Text namespace to convert an array of bytes into a string. public static String ConvertByteArrayToString(byte[] bytes) { System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding(); return enc.GetString(bytes); } I only put it into a function because I will be using this in various places. The Sauce When adding support for other messages outside of creating a new object to store the properties of the message, the C# components do not need to change. It is in the XSLT file that the versatility of the technique lies. The XSLT file determines the format of the message. For the CompanyNumberSearch the XSLT file is as follows; <?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <GovTalkMessage xsi:schemaLocation="http://www.govtalk.gov.uk/CM/envelope http://xmlgw.companieshouse.gov.uk/v1-0/schema/Egov_ch-v2-0.xsd" xmlns="http://www.govtalk.gov.uk/CM/envelope" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:gt="http://www.govtalk.gov.uk/schemas/govtalk/core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > <EnvelopeVersion>1.0</EnvelopeVersion> <Header> <MessageDetails> <Class>NumberSearch</Class> <Qualifier>request</Qualifier> <TransactionID> <xsl:value-of select="CompanyNumberSearchRequest/TransactionId"/> </TransactionID> </MessageDetails> <SenderDetails> <IDAuthentication> <SenderID><xsl:value-of select="CompanyNumberSearchRequest/SenderID"/></SenderID> <Authentication> <Method>CHMD5</Method> <Value> <xsl:value-of select="CompanyNumberSearchRequest/AuthenticationValue"/> </Value> </Authentication> </IDAuthentication> </SenderDetails> </Header> <GovTalkDetails> <Keys/> </GovTalkDetails> <Body> <NumberSearchRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://xmlgw.companieshouse.gov.uk/v1-0/schema/NumberSearch.xsd"> <PartialCompanyNumber> <xsl:value-of select="CompanyNumberSearchRequest/PartialCompanyNumber"/> </PartialCompanyNumber> <DataSet> <xsl:value-of select="CompanyNumberSearchRequest/DataSet"/> </DataSet> <SearchRows> <xsl:value-of select="CompanyNumberSearchRequest/SearchRows"/> </SearchRows> </NumberSearchRequest> </Body> </GovTalkMessage> </xsl:template> </xsl:stylesheet> The outer two tags define that this is a XSLT stylesheet and the root tag from which the nodes are searched for. The GovTalkMessage is the format of the message that will be sent to Companies House. We first set up the XslCompiledTransform object which will transform the XSLT template and the serialized object into the request to Companies House. xslt = new XslCompiledTransform(); resultStream = new MemoryStream(); writer = new XmlTextWriter(resultStream, Encoding.ASCII); doc = new XmlDocument(); The Serialize method require XmlTextWriter to write the XML (writer) and a stream to place the transferred object into (writer). The XML will be loaded into an XMLDocument object (doc) prior to the transformation. // create XSLT Template xslTemplate = Toolbox.GetRequest(Template); xslTemplate.Seek(0, SeekOrigin.Begin); templateReader = XmlReader.Create(xslTemplate); xslt.Load(templateReader); I have stored all the templates as a series of Embedded Resources and the GetRequestCall takes the name of the template and extracts the relevent XSLT file. /// <summary> /// Gets the framwork XML which makes the request /// </summary> /// <param name="RequestFile"></param> /// <returns></returns> public static Stream GetRequest(String RequestFile) { String requestFile = String.Empty; Stream sr = null; Assembly asm = null; try { requestFile = String.Format("CompanyHub.Services.Schemas.{0}", RequestFile); asm = Assembly.GetExecutingAssembly(); sr = asm.GetManifestResourceStream(requestFile); } catch (Exception) { throw; } finally { asm = null; } return sr; } // end private static stream GetRequest We first take the template name and expand it to include the full namespace to the Embedded Resource I like to keep all my schemas in the same directory and so the namespace reflects this. The rest is the default namespace for the project. Then we get the currently executing assembly (which will contain the resources with the call to GetExecutingAssembly() ) Finally we get a stream which contains the XSLT file. We use this stream and then load an XmlReader with the contents of the template, and that is in turn loaded into the XslCompiledTransform object. We convert the object containing the message properties into Xml by serializing it; calling the Serialize() method of the XmlSerializer object. To set up the object we do the following; t = Obj.GetType(); ms = new MemoryStream(); serializer = new XmlSerializer(t); xmlTextWriter = new XmlTextWriter(ms, Encoding.ASCII); We first determine the type of the object being transferred by calling GetType() We create an XmlSerializer object by passing the type of the object being serialized. The serializer writes to a memory stream and that is linked to an XmlTextWriter. Next job is to serialize the object and load it into an XmlDocument. serializer.Serialize(xmlTextWriter, Obj); ms = (MemoryStream)xmlTextWriter.BaseStream; xmlRequest = new XmlTextReader(ms); GovTalkRequest = Toolbox.ConvertByteArrayToString(ms.ToArray()); doc.LoadXml(GovTalkRequest); Time to transform the XML to construct the full request. xslt.Transform(doc, writer); resultStream.Seek(0, SeekOrigin.Begin); request = Toolbox.ConvertByteArrayToString(resultStream.ToArray()); So that creates the full request to be sent to Companies House. Sending the request So far we have a string with a request for the Companies House service. Now we need to send the request to the Companies House Service. Configuration within an Azure project There are entire blog entries written about configuration within an Azure project – most of this is out of scope for this article but the following is a summary. Configuration is defined in two files within the parent project *.csdef which contains the definition of configuration setting. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="OnlineCompanyHub" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition"> <WebRole name="CompanyHub.Host"> <InputEndpoints> <InputEndpoint name="HttpIn" protocol="http" port="80" /> </InputEndpoints> <ConfigurationSettings> <Setting name="DiagnosticsConnectionString" /> <Setting name="DataConnectionString" /> </ConfigurationSettings> </WebRole> <WebRole name="CompanyHub.Services"> <InputEndpoints> <InputEndpoint name="HttpIn" protocol="http" port="8080" /> </InputEndpoints> <ConfigurationSettings> <Setting name="DiagnosticsConnectionString" /> <Setting name="SenderId"/> <Setting name="SenderPassword" /> <Setting name="GovTalkUrl"/> </ConfigurationSettings> </WebRole> <WorkerRole name="CompanyHub.Worker"> <ConfigurationSettings> <Setting name="DiagnosticsConnectionString" /> </ConfigurationSettings> </WorkerRole> </ServiceDefinition> Above is the configuration definition from the project. What we are interested in however is the ConfigurationSettings tag of the CompanyHub.Services WebRole. There are four configuration settings here, but at the moment we are interested in the second to forth settings; SenderId, SenderPassword and GovTalkUrl The value of these settings are defined in the ServiceDefinition.cscfg file; <?xml version="1.0"?> <ServiceConfiguration serviceName="OnlineCompanyHub" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"> <Role name="CompanyHub.Host"> <Instances count="2" /> <ConfigurationSettings> <Setting name="DiagnosticsConnectionString" value="UseDevelopmentStorage=true" /> <Setting name="DataConnectionString" value="UseDevelopmentStorage=true" /> </ConfigurationSettings> </Role> <Role name="CompanyHub.Services"> <Instances count="2" /> <ConfigurationSettings> <Setting name="DiagnosticsConnectionString" value="UseDevelopmentStorage=true" /> <Setting name="SenderId" value="UserID"/> <Setting name="SenderPassword" value="Password"/> <Setting name="GovTalkUrl" value="http://xmlgw.companieshouse.gov.uk/v1-0/xmlgw/Gateway"/> </ConfigurationSettings> </Role> <Role name="CompanyHub.Worker"> <Instances count="2" /> <ConfigurationSettings> <Setting name="DiagnosticsConnectionString" value="UseDevelopmentStorage=true" /> </ConfigurationSettings> </Role> </ServiceConfiguration> Look for the Role tag that contains our project name (CompanyHub.Services). Having configured the parameters we can now transmit the request. This is done by ‘POST’ing a stream of XML to the Companies House servers. govTalkUrl = RoleEnvironment.GetConfigurationSettingValue("GovTalkUrl"); request = WebRequest.Create(govTalkUrl); request.Method = "POST"; request.ContentType = "text/xml"; writer = new StreamWriter(request.GetRequestStream()); writer.WriteLine(RequestMessage); writer.Close(); We use the WebRequest object to send the object. Set the method of sending to ‘POST’ and the type of data as text/xml. Once set up all we do is write the request to the writer – this sends the request to Companies House. Did the Request Work Part I – Getting the response Having sent a request – we now need the result of that request. response = request.GetResponse(); reader = response.GetResponseStream(); result = Toolbox.ConvertByteArrayToString(Toolbox.ReadFully(reader)); The WebRequest object has a GetResponse() method which allows us to get the response sent back. Like many of these calls the results come in the form of a stream which we convert into a string. Did the Request Work Part II – Translating the Response Much like XSLT and XML were used to create the original request, so it can be used to extract the response and by deserializing the result we create an object that contains the response. Did it work? It would be really great if everything worked all the time. Of course if it did then I don’t suppose people would pay me and others the big bucks so that our programmes do not a) Collapse in a heap (this is an area of memory) b) Blow every fuse in the place in a shower of sparks (this will probably not happen this being real life and not a Hollywood movie, but it was possible to blow the sound system of a BBC Model B with a poorly coded setting) c) Go nuts and trap everyone outside the airlock (this was from a movie, and unless NASA get a manned moon/mars mission set up unlikely to happen) d) Go nuts and take over the world (this was also from a movie, but please note life has a habit of being of exceeding the wildest imaginations of Hollywood writers (note writers – Hollywood executives have no imagination and judging by recent output of that town have turned plagiarism into an art form). e) Freeze in total confusion because the cleaner pulled the plug to the internet router (this has happened) So anyway – we need to check to see if our request actually worked. Within the GovTalk response there is a section that details the status of the message and a description of what went wrong (if anything did). I have defined an XSLT template which will extract these into an XML document. <?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ev="http://www.govtalk.gov.uk/CM/envelope" xmlns:gt="http://www.govtalk.gov.uk/schemas/govtalk/core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <xsl:template match="/"> <GovTalkStatus xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <Status> <xsl:value-of select="ev:GovTalkMessage/ev:Header/ev:MessageDetails/ev:Qualifier"/> </Status> <Text> <xsl:value-of select="ev:GovTalkMessage/ev:GovTalkDetails/ev:GovTalkErrors/ev:Error/ev:Text"/> </Text> <Location> <xsl:value-of select="ev:GovTalkMessage/ev:GovTalkDetails/ev:GovTalkErrors/ev:Error/ev:Location"/> </Location> <Number> <xsl:value-of select="ev:GovTalkMessage/ev:GovTalkDetails/ev:GovTalkErrors/ev:Error/ev:Number"/> </Number> <Type> <xsl:value-of select="ev:GovTalkMessage/ev:GovTalkDetails/ev:GovTalkErrors/ev:Error/ev:Type"/> </Type> </GovTalkStatus> </xsl:template> </xsl:stylesheet> Only thing different about previous XSL files is the references to two namespaces ev & gt. These are defined in the GovTalk response at the top of the response; xsi:schemaLocation="http://www.govtalk.gov.uk/CM/envelope http://xmlgw.companieshouse.gov.uk/v1-0/schema/Egov_ch-v2-0.xsd" xmlns="http://www.govtalk.gov.uk/CM/envelope" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:gt="http://www.govtalk.gov.uk/schemas/govtalk/core" If we do not put these references into the XSLT template then the XslCompiledTransform object will not be able to find the relevant tags. Deserialization is a fairly simple activity. encoder = new ASCIIEncoding(); ms = new MemoryStream(encoder.GetBytes(statusXML)); serializer = new XmlSerializer(typeof(GovTalkStatus)); xmlTextWriter = new XmlTextWriter(ms, Encoding.ASCII); messageStatus = (GovTalkStatus)serializer.Deserialize(ms); We set up a serialization object using the object type containing the error state and pass to it the results of a transformation between the XSLT above and the GovTalk response. Now we have an object containing any error state, and the error message. All we need to do is check the status. If there is an error then we can flag an error. If not then we extract the results and pass that as an object back to the calling function. We go this by guess what – defining an XSLT template for the result and using that to create an Xml Stream which can be deserialized into a .Net object. In this instance the XSLT to create the result of a Company Number Search is; <?xml version="1.0" encoding="us-ascii"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ev="http://www.govtalk.gov.uk/CM/envelope" xmlns:sch="http://xmlgw.companieshouse.gov.uk/v1-0/schema" exclude-result-prefixes="ev"> <xsl:template match="/"> <CompanySearchResult xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <CompanyNumber> <xsl:value-of select="ev:GovTalkMessage/ev:Body/sch:NumberSearch/sch:CoSearchItem/sch:CompanyNumber"/> </CompanyNumber> <CompanyName> <xsl:value-of select="ev:GovTalkMessage/ev:Body/sch:NumberSearch/sch:CoSearchItem/sch:CompanyName"/> </CompanyName> </CompanySearchResult> </xsl:template> </xsl:stylesheet> and the object definition is; using System; using System.Collections.Generic; using System.Linq; using System.Web; namespace CompanyHub.Services { public class CompanySearchResult { public CompanySearchResult() { CompanyNumber = String.Empty; CompanyName = String.Empty; } public String CompanyNumber { get; set; } public String CompanyName { get; set; } } } Our entire code to make calls to send a request, and interpret the results are; String request = String.Empty; String response = String.Empty; GovTalkStatus status = null; fault = null; try { using (CompanyNumberSearchRequest requestObj = new CompanyNumberSearchRequest()) { requestObj.PartialCompanyNumber = CompanyNumber; request = Toolbox.CreateRequest(requestObj, "CompanyNumberSearch.xsl"); response = Toolbox.SendGovTalkRequest(request); status = Toolbox.GetMessageStatus(response); if (status.Status.ToLower() == "error") { fault = new HubFault() { Message = status.Text }; } else { Object obj = Toolbox.GetGovTalkResponse(response, "CompanyNumberSearchResult.xsl", typeof(CompanySearchResult)); } } } catch (FaultException<ArgumentException> ex) { fault = new HubFault() { FaultType = ex.Detail.GetType().FullName, Message = ex.Detail.Message }; } catch (System.Exception ex) { fault = new HubFault() { FaultType = ex.GetType().FullName, Message = ex.Message }; } finally { } Wrap up So there we have it – a reusable set of functions to send and interpret XML results from an internet based service. The code is reusable with a little change with any service which uses XML as a transport mechanism – and as for the Companies House GovTalk service all I need to do is create various objects for the result and message sent and the relevent XSLT files. I might need minor changes for other services but something like 70-90% will be exactly the same.

Read the article

Need help unformatting text

- by Axilus

I am currently programming a Visual C# service to receive emails from various sources then I take certain info and organize it in a database using Regex to retrieve the deferent cell values (such as header, body, problem, cost, etc.etc.). My problem is that I am currently using a Hotmail account to email the service which the service then extracts data and writes it to a csv file; however this is all going fine an dandy except for the fact that the text is formated so when there is a "\n" or something of the sort, the program decides to not input the data that follows that into the cell. For instance, if I emailed this: Cost:$1000.00 Body: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed vulputate mattis dolor, a dapibus turpis lacinia mollis. Fusce in enim nulla, sit amet gravida dolor. Suspendisse at nisi velit, vel ornare odio. Integer metus justo, imperdiet et pellentesque in, facilisis dignissim metus. Suspendisse potenti. Vivamus purus nisl, hendrerit sit amet rutrum eu, euismod in felis. Maecenas blandit, metus ac eleifend vulputate, nibh ligula mollis mi, non malesuada nunc tellus ac risus. In at rutrum elit. Proin metus sem, ullamcorper ut rhoncus sed, semper nec tellus. Maecenas adipiscing nisl nec elit egestas vel bibendum justo vehicula. Aliquam erat volutpat. Nullam fermentum enim in magna consequat a lacinia felis iaculis. Ut odio justo, consectetur nec cursus eu, dignissim non sapien. Duis tincidunt fringilla aliquet. Vivamus elementum lobortis massa vel posuere. Aenean non congue odio. Aenean aliquam elit volutpat tortor tempor pharetra. Mauris non est eu orci ultricies lacinia. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Ut vitae orci lectus, sit amet convallis nunc. Vivamus feugiat ante at justo auctor at pretium ante congue. In hac habitasse platea dictumst. Sed at feugiat odio. The body cell would look as follows: <span class=3D"ecxecxApple-style-s= pan" style=3D"font-family:Arial=2C Helvetica=2C sans=3Bfont-size:11px"><p s= tyle=3D"text-align:justify=3Bfont-size:11px=3Bline-height:14px=3Bmargin-rig= ht:0px=3Bmargin-bottom:14px=3Bmargin-left:0px=3Bpadding-top:0px=3Bpadding-r= ight:0px=3Bpadding-bottom:0px=3Bpadding-left:0px">Lorem ipsum dolor sit ame= t=2C consectetur adipiscing elit. Praesent in augue nec justo tempor varius= eu et tellus. Nunc id massa tortor=2C ut lobortis sem. Class aptent taciti= sociosqu ad litora torquent per conubia nostra=2C per inceptos himenaeos. = Maecenas quis nisl nec quam tristique posuere sed at nibh. Cras fringilla v= estibulum metus vel porttitor. Cras iaculis=2C erat nec gravida accumsan=2C= metus felis vestibulum risus=2C quis venenatis nisl nulla sed diam. Aenean= quis viverra velit. Etiam quis massa lectus=2C faucibus facilisis sem. Cur= abitur non eros tellus. Sed at ligula neque. Donec elementum rhoncus volutp= at. Curabitur eu accumsan erat. Phasellus auctor odio dolor=2C ut ornare au= gue. Suspendisse vel est nibh. Vivamus facilisis placerat augue sit amet al= iquam. Maecenas viverra=2C ipsum a tincidunt elementum=2C arcu tellus rutru= m ipsum=2C et dignissim urna orci ac mi. Vivamus non odio massa. Nulla cong= ue massa eu leo pretium non consequat urna molestie.</p><p style=3D"text-al= ign:justify=3Bfont-size:11px=3Bline-height:14px=3Bmargin-right:0px=3Bmargin= -bottom:14px=3Bmargin-left:0px=3Bpadding-top:0px=3Bpadding-right:0px=3Bpadd= ing-bottom:0px=3Bpadding-left:0px">Integer neque odio=2C scelerisque at mol= estie quis=2C congue sed arcu. Praesent a arcu odio. Donec sollicitudin=2C = quam vel tincidunt lobortis=2C urna augue cursus lorem=2C in eleifend nunc = risus nec neque. Donec euismod mauris non nibh blandit sollicitudin. Vivamu= s sed tincidunt augue. Suspendisse iaculis massa ut tellus rutrum auctor. C= ras venenatis consequat urna in viverra. Ut blandit imperdiet dolor non sce= lerisque. Suspendisse potenti. Sed vitae lacus ac odio euismod tempus. Aene= an ut sem odio. Curabitur auctor purus a diam iaculis facilisis. Integer mo= lestie commodo mauris a imperdiet. Nunc aliquet tempus orci sit amet viverr= a.</p><p style=3D"text-align:justify=3Bfont-size:11px=3Bline-height:14px=3B= margin-right:0px=3Bmargin-bottom:14px=3Bmargin-left:0px=3Bpadding-top:0px= =3Bpadding-right:0px=3Bpadding-bottom:0px=3Bpadding-left:0px">Morbi ultrici= es fermentum magna=2C et ultricies urna convallis non. Aenean nibh felis=2C= faucibus et pellentesque ultrices=2C accumsan a ligula. Aliquam vulputate = nisi vitae mi pretium et pretium nulla aliquet. Nam egestas diam vel elit c= ommodo fermentum. Aenean venenatis bibendum tellus=2C eget scelerisque risu= s consequat ut. In porta interdum eleifend. Cras laoreet venenatis pulvinar= .. Praesent ultricies tristique lorem=2C quis interdum arcu scelerisque nec.= Quisque arcu tellus=2C consectetur vel mattis nec=2C feugiat ac quam. Prae= sent sit amet fermentum nulla. Nulla lobortis=2C odio vitae elementum aucto= r=2C libero turpis condimentum mi=2C sed aliquet felis sapien nec tortor. I= nteger vehicula=2C neque in egestas accumsan=2C felis metus sagittis nulla= =2C eu dapibus ligula ipsum ut sapien. Nulla quis urna tortor=2C sed facili= sis leo. In at metus sed velit venenatis varius. Fusce aliquam mattis enim= =2C vitae tincidunt sem cursus in.</p><p style=3D"text-align:justify=3Bfont= -size:11px=3Bline-height:14px=3Bmargin-right:0px=3Bmargin-bottom:14px=3Bmar= gin-left:0px=3Bpadding-top:0px=3Bpadding-right:0px=3Bpadding-bottom:0px=3Bp= adding-left:0px">Proin tincidunt ligula at ligula bibendum vitae condimentu= m nunc congue. Curabitur ac magna nibh=2C vel accumsan nisl. Duis nec eros = et purus vestibulum tincidunt at sit amet libero. Donec eu nibh eros. Pelle= ntesque habitant morbi tristique senectus et netus et malesuada fames ac tu= rpis egestas. Donec accumsan=2C tellus at luctus faucibus=2C est nibh sempe= r diam=2C vitae adipiscing lorem tellus vel nulla. Donec eget ipsum ut lore= m tristique ultricies. Aliquam sem diam=2C semper sit amet volutpat pretium= =2C lobortis et eros. Sed vel iaculis metus. Phasellus malesuada elementum = porta.</p><p style=3D"text-align:justify=3Bfont-size:11px=3Bline-height:14p= x=3Bmargin-right:0px=3Bmargin-bottom:14px=3Bmargin-left:0px=3Bpadding-top:0= px=3Bpadding-right:0px=3Bpadding-bottom:0px=3Bpadding-left:0px">Fusce tinci= dunt dignissim massa quis dapibus. Sed aliquet consequat orci=2C eu cursus = libero dapibus vitae. Pellentesque at felis felis=2C vitae condimentum libe= ro. Vivamus eros erat=2C elementum et tristique vitae=2C mattis et neque. P= raesent bibendum leo ac tortor congue id mollis libero ornare. Pellentesque= adipiscing accumsan mi=2C a bibendum purus dignissim id. Cum sociis natoqu= e penatibus et magnis dis parturient montes=2C nascetur ridiculus mus. Morb= i mollis nisi in nibh cursus facilisis. Ut eu quam dolor=2C sit amet congue= orci. Aliquam quam dolor=2C viverra vitae varius sed=2C molestie et quam. = Suspendisse purus mauris=2C fermentum condimentum pharetra at=2C molestie a= nunc. Nam rhoncus euismod venenatis. Nam pellentesque quam ac ipsum volutp= at a eleifend odio imperdiet. Class aptent taciti sociosqu ad litora torque= nt per conubia nostra=2C per inceptos himenaeos. Nulla in nunc magna. Lorem= ipsum dolor sit amet=2C consectetur adipiscing elit. Donec pretium tincidu= nt gravida.</p></span> As you can tell I need a way to get rid of all that html junk and make it readable again. Is there anyway to do this with Regex? Or an easier way if possible. Cheers

Read the article

JAVA image transfer problem

- by user579098

Hi, I have a school assignment, to send a jpg image,split it into groups of 100 bytes, corrupt it, use a CRC check to locate the errors and re-transmit until it eventually is built back into its original form. It's practically ready, however when I check out the new images, they appear with errors.. I would really appreciate if someone could look at my code below and maybe locate this logical mistake as I can't understand what the problem is because everything looks ok :S For the file with all the data needed including photos and error patterns one could download it from this link:http://rapidshare.com/#!download|932tl2|443122762|Data.zip|739 Thanks in advance, Stefan p.s dont forget to change the paths in the code for the image and error files package networks; import java.io.*; // for file reader import java.util.zip.CRC32; // CRC32 IEEE (Ethernet) public class Main { /** * Reads a whole file into an array of bytes. * @param file The file in question. * @return Array of bytes containing file data. * @throws IOException Message contains why it failed. */ public static byte[] readFileArray(File file) throws IOException { InputStream is = new FileInputStream(file); byte[] data=new byte[(int)file.length()]; is.read(data); is.close(); return data; } /** * Writes (or overwrites if exists) a file with data from an array of bytes. * @param file The file in question. * @param data Array of bytes containing the new file data. * @throws IOException Message contains why it failed. */ public static void writeFileArray(File file, byte[] data) throws IOException { OutputStream os = new FileOutputStream(file,false); os.write(data); os.close(); } /** * Converts a long value to an array of bytes. * @param data The target variable. * @return Byte array conversion of data. * @see http://www.daniweb.com/code/snippet216874.html */ public static byte[] toByta(long data) { return new byte[] { (byte)((data >> 56) & 0xff), (byte)((data >> 48) & 0xff), (byte)((data >> 40) & 0xff), (byte)((data >> 32) & 0xff), (byte)((data >> 24) & 0xff), (byte)((data >> 16) & 0xff), (byte)((data >> 8) & 0xff), (byte)((data >> 0) & 0xff), }; } /** * Converts a an array of bytes to long value. * @param data The target variable. * @return Long value conversion of data. * @see http://www.daniweb.com/code/snippet216874.html */ public static long toLong(byte[] data) { if (data == null || data.length != 8) return 0x0; return (long)( // (Below) convert to longs before shift because digits // are lost with ints beyond the 32-bit limit (long)(0xff & data[0]) << 56 | (long)(0xff & data[1]) << 48 | (long)(0xff & data[2]) << 40 | (long)(0xff & data[3]) << 32 | (long)(0xff & data[4]) << 24 | (long)(0xff & data[5]) << 16 | (long)(0xff & data[6]) << 8 | (long)(0xff & data[7]) << 0 ); } public static byte[] nextNoise(){ byte[] result=new byte[100]; // copy a frame's worth of data (or remaining data if it is less than frame length) int read=Math.min(err_data.length-err_pstn, 100); System.arraycopy(err_data, err_pstn, result, 0, read); // if read data is less than frame length, reset position and add remaining data if(read<100){ err_pstn=100-read; System.arraycopy(err_data, 0, result, read, err_pstn); }else // otherwise, increase position err_pstn+=100; // return noise segment return result; } /** * Given some original data, it is purposefully corrupted according to a * second data array (which is read from a file). In pseudocode: * corrupt = original xor corruptor * @param data The original data. * @return The new (corrupted) data. */ public static byte[] corruptData(byte[] data){ // get the next noise sequence byte[] noise = nextNoise(); // finally, xor data with noise and return result for(int i=0; i<100; i++)data[i]^=noise[i]; return data; } /** * Given an array of data, a packet is created. In pseudocode: * frame = corrupt(data) + crc(data) * @param data The original frame data. * @return The resulting frame data. */ public static byte[] buildFrame(byte[] data){ // pack = [data]+crc32([data]) byte[] hash = new byte[8]; // calculate crc32 of data and copy it to byte array CRC32 crc = new CRC32(); crc.update(data); hash=toByta(crc.getValue()); // create a byte array holding the final packet byte[] pack = new byte[data.length+hash.length]; // create the corrupted data byte[] crpt = new byte[data.length]; crpt = corruptData(data); // copy corrupted data into pack System.arraycopy(crpt, 0, pack, 0, crpt.length); // copy hash into pack System.arraycopy(hash, 0, pack, data.length, hash.length); // return pack return pack; } /** * Verifies frame contents. * @param frame The frame data (data+crc32). * @return True if frame is valid, false otherwise. */ public static boolean verifyFrame(byte[] frame){ // allocate hash and data variables byte[] hash=new byte[8]; byte[] data=new byte[frame.length-hash.length]; // read frame into hash and data variables System.arraycopy(frame, frame.length-hash.length, hash, 0, hash.length); System.arraycopy(frame, 0, data, 0, frame.length-hash.length); // get crc32 of data CRC32 crc = new CRC32(); crc.update(data); // compare crc32 of data with crc32 of frame return crc.getValue()==toLong(hash); } /** * Transfers a file through a channel in frames and reconstructs it into a new file. * @param jpg_file File name of target file to transfer. * @param err_file The channel noise file used to simulate corruption. * @param out_file The name of the newly-created file. * @throws IOException */ public static void transferFile(String jpg_file, String err_file, String out_file) throws IOException { // read file data into global variables jpg_data = readFileArray(new File(jpg_file)); err_data = readFileArray(new File(err_file)); err_pstn = 0; // variable that will hold the final (transfered) data byte[] out_data = new byte[jpg_data.length]; // holds the current frame data byte[] frame_orig = new byte[100]; byte[] frame_sent = new byte[100]; // send file in chunks (frames) of 100 bytes for(int i=0; i<Math.ceil(jpg_data.length/100); i++){ // copy jpg data into frame and init first-time switch System.arraycopy(jpg_data, i*100, frame_orig, 0, 100); boolean not_first=false; System.out.print("Packet #"+i+": "); // repeat getting same frame until frame crc matches with frame content do { if(not_first)System.out.print("F"); frame_sent=buildFrame(frame_orig); not_first=true; }while(!verifyFrame(frame_sent)); // usually, you'd constrain this by time to prevent infinite loops (in // case the channel is so wacked up it doesn't get a single packet right) // copy frame to image file System.out.println("S"); System.arraycopy(frame_sent, 0, out_data, i*100, 100); } System.out.println("\nDone."); writeFileArray(new File(out_file),out_data); } // global variables for file data and pointer public static byte[] jpg_data; public static byte[] err_data; public static int err_pstn=0; public static void main(String[] args) throws IOException { // list of jpg files String[] jpg_file={ "C:\\Users\\Stefan\\Desktop\\Data\\Images\\photo1.jpg", "C:\\Users\\Stefan\\Desktop\\Data\\Images\\photo2.jpg", "C:\\Users\\Stefan\\Desktop\\Data\\Images\\photo3.jpg", "C:\\Users\\Stefan\\Desktop\\Data\\Images\\photo4.jpg" }; // list of error patterns String[] err_file={ "C:\\Users\\Stefan\\Desktop\\Data\\Error Pattern\\Error Pattern 1.DAT", "C:\\Users\\Stefan\\Desktop\\Data\\Error Pattern\\Error Pattern 2.DAT", "C:\\Users\\Stefan\\Desktop\\Data\\Error Pattern\\Error Pattern 3.DAT", "C:\\Users\\Stefan\\Desktop\\Data\\Error Pattern\\Error Pattern 4.DAT" }; // loop through all jpg/channel combinations and run tests for(int x=0; x<jpg_file.length; x++){ for(int y=0; y<err_file.length; y++){ System.out.println("Transfering photo"+(x+1)+".jpg using Pattern "+(y+1)+"..."); transferFile(jpg_file[x],err_file[y],jpg_file[x].replace("photo","CH#"+y+"_photo")); } } } }

Read the article

undefined reference to function, despite giving reference in c

- by Jamie Edwards

I'm following a tutorial, but when it comes to compiling and linking the code I get the following error: /tmp/cc8gRrVZ.o: In function `main': main.c:(.text+0xa): undefined reference to `monitor_clear' main.c:(.text+0x16): undefined reference to `monitor_write' collect2: ld returned 1 exit status make: *** [obj/main.o] Error 1 What that is telling me is that I haven't defined both 'monitor_clear' and 'monitor_write'. But I have, in both the header and source files. They are as follows: monitor.c: // monitor.c -- Defines functions for writing to the monitor. // heavily based on Bran's kernel development tutorials, // but rewritten for JamesM's kernel tutorials. #include "monitor.h" // The VGA framebuffer starts at 0xB8000. u16int *video_memory = (u16int *)0xB8000; // Stores the cursor position. u8int cursor_x = 0; u8int cursor_y = 0; // Updates the hardware cursor. static void move_cursor() { // The screen is 80 characters wide... u16int cursorLocation = cursor_y * 80 + cursor_x; outb(0x3D4, 14); // Tell the VGA board we are setting the high cursor byte. outb(0x3D5, cursorLocation >> 8); // Send the high cursor byte. outb(0x3D4, 15); // Tell the VGA board we are setting the low cursor byte. outb(0x3D5, cursorLocation); // Send the low cursor byte. } // Scrolls the text on the screen up by one line. static void scroll() { // Get a space character with the default colour attributes. u8int attributeByte = (0 /*black*/ << 4) | (15 /*white*/ & 0x0F); u16int blank = 0x20 /* space */ | (attributeByte << 8); // Row 25 is the end, this means we need to scroll up if(cursor_y >= 25) { // Move the current text chunk that makes up the screen // back in the buffer by a line int i; for (i = 0*80; i < 24*80; i++) { video_memory[i] = video_memory[i+80]; } // The last line should now be blank. Do this by writing // 80 spaces to it. for (i = 24*80; i < 25*80; i++) { video_memory[i] = blank; } // The cursor should now be on the last line. cursor_y = 24; } } // Writes a single character out to the screen. void monitor_put(char c) { // The background colour is black (0), the foreground is white (15). u8int backColour = 0; u8int foreColour = 15; // The attribute byte is made up of two nibbles - the lower being the // foreground colour, and the upper the background colour. u8int attributeByte = (backColour << 4) | (foreColour & 0x0F); // The attribute byte is the top 8 bits of the word we have to send to the // VGA board. u16int attribute = attributeByte << 8; u16int *location; // Handle a backspace, by moving the cursor back one space if (c == 0x08 && cursor_x) { cursor_x--; } // Handle a tab by increasing the cursor's X, but only to a point // where it is divisible by 8. else if (c == 0x09) { cursor_x = (cursor_x+8) & ~(8-1); } // Handle carriage return else if (c == '\r') { cursor_x = 0; } // Handle newline by moving cursor back to left and increasing the row else if (c == '\n') { cursor_x = 0; cursor_y++; } // Handle any other printable character. else if(c >= ' ') { location = video_memory + (cursor_y*80 + cursor_x); *location = c | attribute; cursor_x++; } // Check if we need to insert a new line because we have reached the end // of the screen. if (cursor_x >= 80) { cursor_x = 0; cursor_y ++; } // Scroll the screen if needed. scroll(); // Move the hardware cursor. move_cursor(); } // Clears the screen, by copying lots of spaces to the framebuffer. void monitor_clear() { // Make an attribute byte for the default colours u8int attributeByte = (0 /*black*/ << 4) | (15 /*white*/ & 0x0F); u16int blank = 0x20 /* space */ | (attributeByte << 8); int i; for (i = 0; i < 80*25; i++) { video_memory[i] = blank; } // Move the hardware cursor back to the start. cursor_x = 0; cursor_y = 0; move_cursor(); } // Outputs a null-terminated ASCII string to the monitor. void monitor_write(char *c) { int i = 0; while (c[i]) { monitor_put(c[i++]); } } void monitor_write_hex(u32int n) { s32int tmp; monitor_write("0x"); char noZeroes = 1; int i; for (i = 28; i > 0; i -= 4) { tmp = (n >> i) & 0xF; if (tmp == 0 && noZeroes != 0) { continue; } if (tmp >= 0xA) { noZeroes = 0; monitor_put (tmp-0xA+'a' ); } else { noZeroes = 0; monitor_put( tmp+'0' ); } } tmp = n & 0xF; if (tmp >= 0xA) { monitor_put (tmp-0xA+'a'); } else { monitor_put (tmp+'0'); } } void monitor_write_dec(u32int n) { if (n == 0) { monitor_put('0'); return; } s32int acc = n; char c[32]; int i = 0; while (acc > 0) { c[i] = '0' + acc%10; acc /= 10; i++; } c[i] = 0; char c2[32]; c2[i--] = 0; int j = 0; while(i >= 0) { c2[i--] = c[j++]; } monitor_write(c2); } monitor.h: // monitor.h -- Defines the interface for monitor.h // From JamesM's kernel development tutorials. #ifndef MONITOR_H #define MONITOR_H #include "common.h" // Write a single character out to the screen. void monitor_put(char c); // Clear the screen to all black. void monitor_clear(); // Output a null-terminated ASCII string to the monitor. void monitor_write(char *c); #endif // MONITOR_H common.c: // common.c -- Defines some global functions. // From JamesM's kernel development tutorials. #include "common.h" // Write a byte out to the specified port. void outb ( u16int port, u8int value ) { asm volatile ( "outb %1, %0" : : "dN" ( port ), "a" ( value ) ); } u8int inb ( u16int port ) { u8int ret; asm volatile ( "inb %1, %0" : "=a" ( ret ) : "dN" ( port ) ); return ret; } u16int inw ( u16int port ) { u16int ret; asm volatile ( "inw %1, %0" : "=a" ( ret ) : "dN" ( port ) ); return ret; } // Copy len bytes from src to dest. void memcpy(u8int *dest, const u8int *src, u32int len) { const u8int *sp = ( const u8int * ) src; u8int *dp = ( u8int * ) dest; for ( ; len != 0; len-- ) *dp++ =*sp++; } // Write len copies of val into dest. void memset(u8int *dest, u8int val, u32int len) { u8int *temp = ( u8int * ) dest; for ( ; len != 0; len-- ) *temp++ = val; } // Compare two strings. Should return -1 if // str1 < str2, 0 if they are equal or 1 otherwise. int strcmp(char *str1, char *str2) { int i = 0; int failed = 0; while ( str1[i] != '\0' && str2[i] != '\0' ) { if ( str1[i] != str2[i] ) { failed = 1; break; } i++; } // Why did the loop exit? if ( ( str1[i] == '\0' && str2[i] != '\0' || (str1[i] != '\0' && str2[i] =='\0' ) ) failed =1; return failed; } // Copy the NULL-terminated string src into dest, and // return dest. char *strcpy(char *dest, const char *src) { do { *dest++ = *src++; } while ( *src != 0 ); } // Concatenate the NULL-terminated string src onto // the end of dest, and return dest. char *strcat(char *dest, const char *src) { while ( *dest != 0 ) { *dest = *dest++; } do { *dest++ = *src++; } while ( *src != 0 ); return dest; } common.h: // common.h -- Defines typedefs and some global functions. // From JamesM's kernel development tutorials. #ifndef COMMON_H #define COMMON_H // Some nice typedefs, to standardise sizes across platforms. // These typedefs are written for 32-bit x86. typedef unsigned int u32int; typedef int s32int; typedef unsigned short u16int; typedef short s16int; typedef unsigned char u8int; typedef char s8int; void outb ( u16int port, u8int value ); u8int inb ( u16int port ); u16int inw ( u16int port ); #endif //COMMON_H main.c: // main.c -- Defines the C-code kernel entry point, calls initialisation routines. // Made for JamesM's tutorials <www.jamesmolloy.co.uk> #include "monitor.h" int main(struct multiboot *mboot_ptr) { monitor_clear(); monitor_write ( "hello, world!" ); return 0; } here is my makefile: C_SOURCES= main.c monitor.c common.c S_SOURCES= boot.s C_OBJECTS=$(patsubst %.c, obj/%.o, $(C_SOURCES)) S_OBJECTS=$(patsubst %.s, obj/%.o, $(S_SOURCES)) CFLAGS=-nostdlib -nostdinc -fno-builtin -fno-stack-protector -m32 -Iheaders LDFLAGS=-Tlink.ld -melf_i386 --oformat=elf32-i386 ASFLAGS=-felf all: kern/kernel .PHONY: clean clean: -rm -f kern/kernel kern/kernel: $(S_OBJECTS) $(C_OBJECTS) ld $(LDFLAGS) -o $@ $^ $(C_OBJECTS): obj/%.o : %.c gcc $(CFLAGS) $< -o $@ vpath %.c source $(S_OBJECTS): obj/%.o : %.s nasm $(ASFLAGS) $< -o $@ vpath %.s asem Hopefully this will help you understand what is going wrong and how to fix it :L Thanks in advance. Jamie.

Read the article

help me improve my sse yuv to rgb ssse3 code

- by David McPaul

Hello, I am looking to optimise some sse code I wrote for converting yuv to rgb (both planar and packed yuv functions). i am using SSSE3 at the moment but if there are useful functions from later sse versions thats ok. I am mainly interested in how I would work out processor stalls and the like. Anyone know of any tools that do static analysis of sse code? ; ; Copyright (C) 2009-2010 David McPaul ; ; All rights reserved. Distributed under the terms of the MIT License. ; ; A rather unoptimised set of ssse3 yuv to rgb converters ; does 8 pixels per loop ; inputer: ; reads 128 bits of yuv 8 bit data and puts ; the y values converted to 16 bit in xmm0 ; the u values converted to 16 bit and duplicated into xmm1 ; the v values converted to 16 bit and duplicated into xmm2 ; conversion: ; does the yuv to rgb conversion using 16 bit integer and the ; results are placed into the following registers as 8 bit clamped values ; r values in xmm3 ; g values in xmm4 ; b values in xmm5 ; outputer: ; writes out the rgba pixels as 8 bit values with 0 for alpha ; xmm6 used for scratch ; xmm7 used for scratch %macro cglobal 1 global _%1 %define %1 _%1 align 16 %1: %endmacro ; conversion code %macro yuv2rgbsse2 0 ; u = u - 128 ; v = v - 128 ; r = y + v + v >> 2 + v >> 3 + v >> 5 ; g = y - (u >> 2 + u >> 4 + u >> 5) - (v >> 1 + v >> 3 + v >> 4 + v >> 5) ; b = y + u + u >> 1 + u >> 2 + u >> 6 ; subtract 16 from y movdqa xmm7, [Const16] ; loads a constant using data cache (slower on first fetch but then cached) psubsw xmm0,xmm7 ; y = y - 16 ; subtract 128 from u and v movdqa xmm7, [Const128] ; loads a constant using data cache (slower on first fetch but then cached) psubsw xmm1,xmm7 ; u = u - 128 psubsw xmm2,xmm7 ; v = v - 128 ; load r,b with y movdqa xmm3,xmm0 ; r = y pshufd xmm5,xmm0, 0xE4 ; b = y ; r = y + v + v >> 2 + v >> 3 + v >> 5 paddsw xmm3, xmm2 ; add v to r movdqa xmm7, xmm1 ; move u to scratch pshufd xmm6, xmm2, 0xE4 ; move v to scratch psraw xmm6,2 ; divide v by 4 paddsw xmm3, xmm6 ; and add to r psraw xmm6,1 ; divide v by 2 paddsw xmm3, xmm6 ; and add to r psraw xmm6,2 ; divide v by 4 paddsw xmm3, xmm6 ; and add to r ; b = y + u + u >> 1 + u >> 2 + u >> 6 paddsw xmm5, xmm1 ; add u to b psraw xmm7,1 ; divide u by 2 paddsw xmm5, xmm7 ; and add to b psraw xmm7,1 ; divide u by 2 paddsw xmm5, xmm7 ; and add to b psraw xmm7,4 ; divide u by 32 paddsw xmm5, xmm7 ; and add to b ; g = y - u >> 2 - u >> 4 - u >> 5 - v >> 1 - v >> 3 - v >> 4 - v >> 5 movdqa xmm7,xmm2 ; move v to scratch pshufd xmm6,xmm1, 0xE4 ; move u to scratch movdqa xmm4,xmm0 ; g = y psraw xmm6,2 ; divide u by 4 psubsw xmm4,xmm6 ; subtract from g psraw xmm6,2 ; divide u by 4 psubsw xmm4,xmm6 ; subtract from g psraw xmm6,1 ; divide u by 2 psubsw xmm4,xmm6 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,2 ; divide v by 4 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g %endmacro ; outputer %macro rgba32sse2output 0 ; clamp values pxor xmm7,xmm7 packuswb xmm3,xmm7 ; clamp to 0,255 and pack R to 8 bit per pixel packuswb xmm4,xmm7 ; clamp to 0,255 and pack G to 8 bit per pixel packuswb xmm5,xmm7 ; clamp to 0,255 and pack B to 8 bit per pixel ; convert to bgra32 packed punpcklbw xmm5,xmm4 ; bgbgbgbgbgbgbgbg movdqa xmm0, xmm5 ; save bg values punpcklbw xmm3,xmm7 ; r0r0r0r0r0r0r0r0 punpcklwd xmm5,xmm3 ; lower half bgr0bgr0bgr0bgr0 punpckhwd xmm0,xmm3 ; upper half bgr0bgr0bgr0bgr0 ; write to output ptr movntdq [edi], xmm5 ; output first 4 pixels bypassing cache movntdq [edi+16], xmm0 ; output second 4 pixels bypassing cache %endmacro SECTION .data align=16 Const16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 Const128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 UMask db 0x01 db 0x80 db 0x01 db 0x80 db 0x05 db 0x80 db 0x05 db 0x80 db 0x09 db 0x80 db 0x09 db 0x80 db 0x0d db 0x80 db 0x0d db 0x80 VMask db 0x03 db 0x80 db 0x03 db 0x80 db 0x07 db 0x80 db 0x07 db 0x80 db 0x0b db 0x80 db 0x0b db 0x80 db 0x0f db 0x80 db 0x0f db 0x80 YMask db 0x00 db 0x80 db 0x02 db 0x80 db 0x04 db 0x80 db 0x06 db 0x80 db 0x08 db 0x80 db 0x0a db 0x80 db 0x0c db 0x80 db 0x0e db 0x80 ; void Convert_YUV422_RGBA32_SSSE3(void *fromPtr, void *toPtr, int width) width equ ebp+16 toPtr equ ebp+12 fromPtr equ ebp+8 ; void Convert_YUV420P_RGBA32_SSSE3(void *fromYPtr, void *fromUPtr, void *fromVPtr, void *toPtr, int width) width1 equ ebp+24 toPtr1 equ ebp+20 fromVPtr equ ebp+16 fromUPtr equ ebp+12 fromYPtr equ ebp+8 SECTION .text align=16 cglobal Convert_YUV422_RGBA32_SSSE3 ; reserve variables push ebp mov ebp, esp push edi push esi push ecx mov esi, [fromPtr] mov edi, [toPtr] mov ecx, [width] ; loop width / 8 times shr ecx,3 test ecx,ecx jng ENDLOOP REPEATLOOP: ; loop over width / 8 ; YUV422 packed inputer movdqa xmm0, [esi] ; should have yuyv yuyv yuyv yuyv pshufd xmm1, xmm0, 0xE4 ; copy to xmm1 movdqa xmm2, xmm0 ; copy to xmm2 ; extract both y giving y0y0 pshufb xmm0, [YMask] ; extract u and duplicate so each u in yuyv becomes u0u0 pshufb xmm1, [UMask] ; extract v and duplicate so each v in yuyv becomes v0v0 pshufb xmm2, [VMask] yuv2rgbsse2 rgba32sse2output ; endloop add edi,32 add esi,16 sub ecx, 1 ; apparently sub is better than dec jnz REPEATLOOP ENDLOOP: ; Cleanup pop ecx pop esi pop edi mov esp, ebp pop ebp ret cglobal Convert_YUV420P_RGBA32_SSSE3 ; reserve variables push ebp mov ebp, esp push edi push esi push ecx push eax push ebx mov esi, [fromYPtr] mov eax, [fromUPtr] mov ebx, [fromVPtr] mov edi, [toPtr1] mov ecx, [width1] ; loop width / 8 times shr ecx,3 test ecx,ecx jng ENDLOOP1 REPEATLOOP1: ; loop over width / 8 ; YUV420 Planar inputer movq xmm0, [esi] ; fetch 8 y values (8 bit) yyyyyyyy00000000 movd xmm1, [eax] ; fetch 4 u values (8 bit) uuuu000000000000 movd xmm2, [ebx] ; fetch 4 v values (8 bit) vvvv000000000000 ; extract y pxor xmm7,xmm7 ; 00000000000000000000000000000000 punpcklbw xmm0,xmm7 ; interleave xmm7 into xmm0 y0y0y0y0y0y0y0y0 ; extract u and duplicate so each becomes 0u0u punpcklbw xmm1,xmm7 ; interleave xmm7 into xmm1 u0u0u0u000000000 punpcklwd xmm1,xmm7 ; interleave again u000u000u000u000 pshuflw xmm1,xmm1, 0xA0 ; copy u values pshufhw xmm1,xmm1, 0xA0 ; to get u0u0 ; extract v punpcklbw xmm2,xmm7 ; interleave xmm7 into xmm1 v0v0v0v000000000 punpcklwd xmm2,xmm7 ; interleave again v000v000v000v000 pshuflw xmm2,xmm2, 0xA0 ; copy v values pshufhw xmm2,xmm2, 0xA0 ; to get v0v0 yuv2rgbsse2 rgba32sse2output ; endloop add edi,32 add esi,8 add eax,4 add ebx,4 sub ecx, 1 ; apparently sub is better than dec jnz REPEATLOOP1 ENDLOOP1: ; Cleanup pop ebx pop eax pop ecx pop esi pop edi mov esp, ebp pop ebp ret SECTION .note.GNU-stack noalloc noexec nowrite progbits

Read the article

?Exadata??????DBFS

- by Liu Maclean(???)

?Exadata???DBFS ??????? 1. ??fuse RPM [root@dm01db01 ~]# yum install fuse Loaded plugins: rhnplugin, security This system is not registered with ULN. ULN support will be disabled. Setting up Install Process Resolving Dependencies --> Running transaction check ---> Package fuse.x86_64 0:2.7.4-8.0.1.el5 set to be updated --> Finished Dependency Resolution Dependencies Resolved ======================================================================================================================================================================== Package Arch Version Repository Size ======================================================================================================================================================================== Installing: fuse x86_64 2.7.4-8.0.1.el5 el5_latest 85 k Transaction Summary ======================================================================================================================================================================== Install 1 Package(s) Upgrade 0 Package(s) Total download size: 85 k Is this ok [y/N]: y Downloading Packages: fuse-2.7.4-8.0.1.el5.x86_64.rpm | 85 kB 00:00 Running rpm_check_debug Running Transaction Test Finished Transaction Test Transaction Test Succeeded Running Transaction Installing : fuse 1/1 Installed: fuse.x86_64 0:2.7.4-8.0.1.el5 [root@dm01db01 ~]# yum install fuse-libs Loaded plugins: rhnplugin, security This system is not registered with ULN. ULN support will be disabled. Setting up Install Process Resolving Dependencies --> Running transaction check ---> Package fuse-libs.i386 0:2.7.4-8.0.1.el5 set to be updated ---> Package fuse-libs.x86_64 0:2.7.4-8.0.1.el5 set to be updated --> Finished Dependency Resolution Dependencies Resolved ======================================================================================================================================================================== Package Arch Version Repository Size ======================================================================================================================================================================== Installing: fuse-libs i386 2.7.4-8.0.1.el5 el5_latest 71 k fuse-libs x86_64 2.7.4-8.0.1.el5 el5_latest 70 k Transaction Summary ======================================================================================================================================================================== Install 2 Package(s) Upgrade 0 Package(s) Total download size: 141 k Is this ok [y/N]: y Downloading Packages: (1/2): fuse-libs-2.7.4-8.0.1.el5.x86_64.rpm | 70 kB 00:00 (2/2): fuse-libs-2.7.4-8.0.1.el5.i386.rpm | 71 kB 00:00 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Total 71 kB/s | 141 kB 00:01 Running rpm_check_debug Running Transaction Test Finished Transaction Test Transaction Test Succeeded Running Transaction Installing : fuse-libs 1/2 Installing : fuse-libs 2/2 Installed: fuse-libs.i386 0:2.7.4-8.0.1.el5 fuse-libs.x86_64 0:2.7.4-8.0.1.el5 Complete! [root@dm01db01 ~]# yum install fuse-devel Loaded plugins: rhnplugin, security This system is not registered with ULN. ULN support will be disabled. Setting up Install Process Resolving Dependencies --> Running transaction check ---> Package fuse-devel.i386 0:2.7.4-8.0.1.el5 set to be updated ---> Package fuse-devel.x86_64 0:2.7.4-8.0.1.el5 set to be updated --> Finished Dependency Resolution Dependencies Resolved ======================================================================================================================================================================== Package Arch Version Repository Size ======================================================================================================================================================================== Installing: fuse-devel i386 2.7.4-8.0.1.el5 el5_latest 28 k fuse-devel x86_64 2.7.4-8.0.1.el5 el5_latest 28 k Transaction Summary ======================================================================================================================================================================== Install 2 Package(s) Upgrade 0 Package(s) Total download size: 57 k Is this ok [y/N]: y Downloading Packages: (1/2): fuse-devel-2.7.4-8.0.1.el5.x86_64.rpm | 28 kB 00:00 (2/2): fuse-devel-2.7.4-8.0.1.el5.i386.rpm | 28 kB 00:00 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Total 21 kB/s | 57 kB 00:02 Running rpm_check_debug Running Transaction Test Finished Transaction Test Transaction Test Succeeded Running Transaction Installing : fuse-devel 1/2 Installing : fuse-devel 2/2 Installed: fuse-devel.i386 0:2.7.4-8.0.1.el5 fuse-devel.x86_64 0:2.7.4-8.0.1.el5 Complete! 2. ?? DBFS??? ?????? cd $ORACLE_HOME/rdbms/admin sqlplus / as sysdba Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options SQL> @prvtfspi.plb Package body created. No errors. Package body created. No errors. ?????dbms_dbfs_sfs package SQL> create tablespace dbfstbs datafile size 20g; Tablespace created. SQL> create user maclean_dbfs identified by oracle; User created. SQL> grant dba to maclean_dbfs; Grant succeeded. @@!!! SQL> grant dbfs_role to maclean_dbfs; Grant succeeded. 3. ??DBFS SQL> conn maclean_dbfs/oracle Connected. SQL> @?/rdbms/admin/dbfs_create_filesystem.sql dbfstbs mac_dbfs No errors. -------- CREATE STORE: begin dbms_dbfs_sfs.createFilesystem(store_name => 'FS_MAC_DBFS', tbl_name => 'T_MAC_DBFS', tbl_tbs => 'dbfstbs', lob_tbs => 'dbfstbs', do_partition => false, partition_key => 1, do_compress => false, compression => '', do_dedup => false, do_encrypt => false); end; -------- REGISTER STORE: begin dbms_dbfs_content.registerStore(store_name=> 'FS_MAC_DBFS', provider_name => 'sample1', provider_package => 'dbms_dbfs_sfs'); end; -------- MOUNT STORE: begin dbms_dbfs_content.mountStore(store_name=>'FS_MAC_DBFS', store_mount=>'mac_dbfs'); end; -------- CHMOD STORE: declare m integer; begin m := dbms_fuse.fs_chmod('/mac_dbfs', 16895); end; No errors. 4. ??mount point [root@dm01db01 ~]# mkdir /dbfs [root@dm01db01 ~]# chown oracle:oinstall /dbfs 5. ??library path ?OS # echo "/usr/local/lib" >> /etc/ld.so.conf.d/usr_local_lib.conf 6. ?????? export ORACLE_HOME=/s01/orabase/product/11.2.0/dbhome_1 [root@dm01db01 ~]# ln -s $ORACLE_HOME/lib/libclntsh.so.11.1 /usr/local/lib/libclntsh.so.11.1 [root@dm01db01 ~]# ln -s $ORACLE_HOME/lib/libnnz11.so /usr/local/lib/libnnz11.so [root@dm01db01 ~]# ln -s /lib64/libfuse.so.2 /usr/local/lib/libfuse.so.2 7. ??ldconfig [root@dm01db01 ~]# ldconfig [root@dm01db01 ~]# 8. ??fusermount??????? [root@dm01db01 ~]# chmod +x /usr/bin/fusermount [root@dm01db01 ~]# ls -l /usr/bin/fusermount lrwxrwxrwx 1 root root 15 Sep 7 03:06 /usr/bin/fusermount -> /bin/fusermount [root@dm01db01 ~]# ls -l /bin/fusermount -rwsr-x--x 1 root fuse 27072 Oct 17 2011 /bin/fusermount 9. ???????OS dbfs_client maclean_dbfs@dm01db01:1521/orcl /dbfs 10. ????nohup + &?????mount DBFS,???????????? [oracle@dm01db01 ~]$ echo "oracle" >> dbfs_pw [oracle@dm01db01 ~]$ nohup dbfs_client maclean_dbfs@dm01db01:1521/orcl /dbfs < dbfs_pw & [oracle@dm01db01 ~]$ df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/VGExaDb-LVDbSys1 30G 15G 14G 53% / /dev/sda1 502M 30M 447M 7% /boot /dev/mapper/VGExaDb-LVDbOra1 99G 20G 75G 21% /u01 tmpfs 81G 0 81G 0% /dev/shm dbfs-maclean_dbfs@orcl:/ 20G 120K 20G 1% /dbfs [oracle@dm01db01 ~]$ mount /dev/mapper/VGExaDb-LVDbSys1 on / type ext3 (rw) proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) devpts on /dev/pts type devpts (rw,gid=5,mode=620) /dev/sda1 on /boot type ext3 (rw,nodev) /dev/mapper/VGExaDb-LVDbOra1 on /u01 type ext3 (rw,nodev) tmpfs on /dev/shm type tmpfs (rw,size=82052m) none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) dbfs-maclean_dbfs@orcl:/ on /dbfs type fuse (rw,nosuid,nodev,max_read=1048576,default_permissions,user=oracle) [oracle@dm01db01 ~]$ ls -l /dbfs/ total 0 drwxrwxrwx 3 root root 0 Sep 14 05:11 mac_dbfs [oracle@nas ~]$ dbfs_client --------MOUNT mode: usage: dbfs_client <db_user>@<db_server> [options] <mountpoint> db_user: Name of Database user that owns DBFS content repository filesystem(s) db_server: A valid connect string for Oracle database server (for example, hrdb_host:1521/hrservice) mountpoint: Path to mount Database File System(s) All the file systems owned by the database user will be seen at the mountpoint. DBFS options: -o direct_io Bypass the Linux page cache. Gives much better performance for large files. Programs in the file system cannot be executed with this option. This option is recommended when DBFS is used as an ETL staging area. -o wallet Run dbfs_client in background. Wallet must be configured to get credentials. -o failover dbfs_client fails over to surviving database instance with no data loss. Some performance cost on writes, especially for small files. -o allow_root Allows root access to the filesystem. This option requires setting 'user_allow_other' parameter in '/etc/fuse.conf'. -o allow_other Allows other users access to the file system. This option requires setting 'user_allow_other' parameter in '/etc/fuse.conf'. -o rw Mount the filesystem read-write. [Default] -o ro Mount the filesystem read-only. Files cannot be modified. -o trace_file=STR Tracing <filename> | 'syslog' -o trace_level=N Trace Level: 1->DEBUG, 2->INFO, 3->WARNING, 4->ERROR, 5->CRITICAL [Default: 4] -h help -V version --------COMMAND mode: Usage: dbfs_client <db_user>@<db_server> --command command [switches] [arguments] command: Command to be executed, e.g., ls, cp, mkdir, rm switches: Switches are described below for each command. arguments: File names or directory names NOTE: All database pathnames must be absolute and preceded by dbfs:/ Commands ls dbfs_client <db_user>@<db_server> --command ls [switches] target Switches: -a Show all files including those starting with '.' -l Use a long listing format. In addition to the name of each file print the file type, permissions, size, user and group information -R List subdirectories recursively cp dbfs_client <db_user>@<db_server> --command cp [switches] source destination Switches: -r, -R Copy a directory and its contents recursively into the destination directory rm dbfs_client <db_user>@<db_server> --command rm [switches] target Switches: -r, -R Removes a directory and its contents recursively mkdir dbfs_client <db_user>@<db_server> --command mkdir directory_name Examples dbfs_client ETLUser@DBConnectString --command ls -l -a dbfs:/staging_area/directory1 dbfs_client ETLUser@DBConnectString --command cp -R /tmp/1-Jan-2009-dump dbfs:/staging_area dbfs_client ETLUser@DBConnectString --command rm dbfs:/staging_area/hello.txt dbfs_client ETLUser@DBConnectString --command mkdir dbfs:/staging_area/directory2 [oracle@dm01db01 ~]$ ls -lh /tmp/largefile -rw-r--r-- 1 oracle oinstall 2.0G Sep 14 08:50 /tmp/largefile [oracle@dm01db01 ~]$ time dbfs_client maclean_dbfs@dm01db01:1521/orcl --command cp /tmp/largefile dbfs:/mac_dbfs Password: /tmp/largefile -> dbfs:/mac_dbfs/largefile real 0m11.802s user 0m0.580s sys 0m2.375s ?Exadata?????2G?????? DBFS???11s => 200MB/s

Developer IT