Search Results

Search found 2155 results on 87 pages for 'mathematical expressions'.

Page 68/87 | < Previous Page | 64 65 66 67 68 69 70 71 72 73 74 75 | Next Page >

Verify client certificate CN in Tomcat(APR)

- by Petter

I'm running a tomcat installation with the APR libraries installed (with the OpenSSL HTTPS stack that comes with it). What I'm trying to do is to lock a specific HTTPS connector down to users of a specific certificate. Adding client certificate verification is no issue, but I can't get it to validate against a specific Common name only. I was perhaps a bit naïve and thought the mod_ssl attribute SSLRequire typically used in Apache Httpd would work, but that property is not recognized by the Tomcat implementation. (http://tomcat.apache.org/tomcat-7.0-doc/config/http.html#SSL%20Support points to some mod_ssl docs, but the Tomcat implementation does not seem to cover all aspects of mod_ssl). I can get this to work by using the Java version of the connector instead of APR (losing some performance) and just add a trust store with that one certificate in it. However, using openssl without the SSLRequire expressions, I'm not sure how to do this with Tomcat7 (on Windows if that matters). <Connector protocol="HTTP/1.1" port="443" maxThreads="150" scheme="https" secure="true" SSLEnabled="true" SSLCertificateFile="mycert.pem" SSLCertificateKeyFile="privkey.pem" SSLCACertificateFile="CABundle.pem" SSLVerifyClient="require" SSLProtocol="TLSv1" SSLRequire="(%{SSL_CLIENT_S_DN_CN} eq "host.example.com")"/> Can you suggest a way to make this work using Tomcat/APR/OpenSSL?

Read the article
How do I format this regex so it will work in fail2ban?

- by chapkom

I've just installed fail2ban on my CentOS server in response to an SSH brute force attempt. The default regular expressions in fail2ban's sshd.conf file do not match any entries in audit.log, which is where SSH seems to be logging all connection attempts, so I am trying to add an expression that will match. The string I am trying to match is as follows: type=USER_LOGIN msg=audit(1333630430.185:503332): user pid=30230 uid=0 auid=500 subj=user_u:system_r:unconfined_t:s0-s0:c0.c1023 msg='acct="root": exe="/usr /sbin/sshd" (hostname=?, addr=<HOST IP>, terminal=sshd res=failed)' The regular expression I am attempting to use is: ^.*addr=<HOST>, terminal=sshd res=failed.*$ I've used regextester.com and regexr to try to build the regex. The testers give me a match for this regex:^.*addr=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}, terminal=sshd res=failed.*$ but fail2ban-regex complains if I don't use the <HOST> tag in the regex. However, using ^.*addr=<HOST>, terminal=sshd res=failed.*$ gives me 0 matches. At this point, I am totally stuck and I would greatly appreciate any assistance. What am I doing wrong in the regex I am trying to use?

Read the article
less maximum buffer size?

- by Tyzoid

I was messing around with my system and found a novel way to use up memory, but it seems that the less command only holds a limited amount of data before stopping/killing the command. To test, run (careful! uses lots of system memory very fast!) $ cat /dev/zero | less From my testing, it looks like the command is killed after less reaches 2.5 gigabytes of memory, but I can't find anything in the man page that suggests that it would limit it in such a way. In addition, I couldn't find any documentation via the google on the subject. Any light to this quite surprising discovery would be great! System Information: Quad core intel i7, 8gb ram. $ uname -a Linux Tyler-Work 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux $ less --version less 458 (GNU regular expressions) Copyright (C) 1984-2012 Mark Nudelman less comes with NO WARRANTY, to the extent permitted by law. For information about the terms of redistribution, see the file named README in the less distribution. Homepage: http://www.greenwoodsoftware.com/less $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 14.04 LTS Release: 14.04 Codename: trusty

Read the article
Trouble Letting Users Get to Certain Sites through Squid Proxy

- by armani

We have Squid running on a RHEL server. We want to block users from getting to Facebook, other than a couple specific sites, like our organization's page. Unfortunately, I can't get those specific pages unblocked without allowing ALL of Facebook through. [squid.conf] # Local users: acl local_c src 192.168.0.0/16 # HTTP & HTTPS: acl Safe_ports port 80 443 # File containing blocked sites, including Facebook: acl blocked dst_dom_regex "/etc/squid/blocked_content" # Whitelist: acl whitelist url_regex "/etc/squid/whitelist" # I do know that order matters: http_access allow local_c whitelist http_access allow local_c !blocked http_access deny all [blocked_content] .porn_site.com .porn_site_2.com [...] facebook.com [whitelist] facebook.com/pages/Our-Organization/2828242522 facebook.com/OurOrganization facebook.com/media/set/ facebook.com/photo.php www.facebook.com/OurOrganization My biggest weakness is regular expressions, so I'm not 100% sure about if this is all correct. If I remove the "!blocked" part of the http_access rule, all of Facebook works. If I remove "facebook.com" from the blocked_content file, all of Facebook works. Right now, visiting facebook.com/OurOrganization gives a "The website declined to show this webpage / HTTP 403" error in Internet Explorer, and "Error 111 (net::ERR_TUNNEL_CONNECTION_FAILED): Unknown error" in Chrome. WhereGoes.com tells me the URL redirects for that URL goes like this: facebook.com/OurOrganization -- [301 Redirect] -- http://www.facebook.com/OurOrganization -- [302 Redirect] -- https://www.facebook.com/OurOrganization I tried turning up the debug traffic out of squid using "debug_options ALL,6" but I can't narrow anything down in /var/log/access.log and /var/log/cache.log. I know to issue "squid -k reconfigure" whenever I make changes to any files.

Read the article
Can you see something wrong in my .htaccess?

- by AlexV

OK, after many search, trial and errors I've managed to create an .htaccess that do what I wanted (see explanations and questions after the code block): <IfModule mod_rewrite.c> RewriteEngine On #1 If the requested file is not url-mapper.php (to avoid .htaccess loop) RewriteCond %{REQUEST_FILENAME} (?<!url-mapper\.php)$ #2 If the requested URI does not end with an extension OR if the URI ends with .php* RewriteCond %{REQUEST_URI} !\.(.*) [OR] RewriteCond %{REQUEST_URI} \.php.*$ [NC] #3 If the requested URI is not in an excluded location RewriteCond %{REQUEST_URI} !^/seo-urls\/(excluded1|excluded2)(/.*)?$ #Then serve the URI via the mapper RewriteRule .* /seo-urls/url-mapper.php?uri=%{REQUEST_URI} [L,QSA] </IfModule> This is what the .htaccess should do: #1 is checking that the file requested is not url-mapper.php (to avoid infinite redirect loops). This file will always be at the root of the domain. #2 the .htaccess must only catch URLs that don't end with an extension (www.foo.com -- catch | www.foo.com/catch-me -- catch | www.foo.com/dont-catch.me -- don't catch) and URLs ending with .php* files (.php, .php4, .php5, .php123...). #3 some directories (and childs) can be excluded from the .htaccess (in this case /seo-urls/excluded1 and /seo-urls/excluded2). Finally the .htaccess feed the mapper with an hidden GET parameter named uri containing the requested uri. Even if I tested and everything works, I want to know if what I do is correct (and if it's the "best" way to do it). I've learned a lot with this "project" but I still consider myself a beginner at .htaccess and regular expressions so I want to triple check it there before putting it in production...

Read the article
nginx short urls for mediawiki

- by William

I am trying to do short URLs for a MediaWiki site. The wiki is in a subdirectory mydir (http://www.example.com/mywiki). I've already set up rewrites in /etc/nginx/sites-available so that example.com redirects to example.com/mywiki. Currently the URL is like http://www.example.com/mywiki/index.php?title=Main_Page. I want to clean up the url so that it looks like http://www.example.com/mywiki/Main_Page. I am having quite a bit of trouble doing this. I am not familiar with regular expressions or the syntax that the nginx config files use. This is what I currently have: server_name example.com www.example.com; location / { rewrite ^.+ /mywiki/ permanent; } location /wiki/ { rewrite ^/mywiki/([^?]*)(?:\?(.*))? /mywiki/index.php?title=$1&$2 last; } The second rewrite is obviously the one that's broken. It is based off of Page title -- nginx rewrite--root access in the MediaWiki documentation. When I try to load the site, the browser tells me I get infinite redirects. Does anyone who how I should go about fixing this issue? Or rather, what is the correct way to implement this, and what do all those symbols mean?

Read the article
Advice on resizing 1280*720 for web audiences.

- by jamiethompson90

Forgive my spelling, I'm posting this from my mobile. I've recently decided to record videos to help teach a visual language. My camera likes to boast it can record in 1280; its a cheap camera about £75 so the quality isn't amazing. But its okay. Anyways, it has some other settings for lower res, but I figure might as well record in a larger size in case the need arises for a bigger source file in the future. I've been looking at jw player to play the converted files (mp4 to flv I think). What do you think a good size would be to convert to? I want to to look nice and clear remembering it is a visual language so lip patterns, facial expressions, body movement, fingers etc are all important, sound is not that important but I would like to have a choice to toggle captions. Thanks for any help, any advice apreciated, first time I have done a video project! P.s. If anyones interested its BSL. Jamie

Read the article
screen scraper templates for various websites

- by intuited

I'm looking specifically for a convenient way to locally archive posts from this and other similar sites. I'd like to separate the question itself from the answers, or maybe crop the question and store it, keeping the page title. Obviously I don't need to store the menu or the various other site interface chrome. The best way to do this would seem to be to associate an XSLT template with a match on the URL and use that template to pull the various relevant informations and format them. My two-part question: Is there a tool specifically built for this task? I.E. something that takes a URL and checks it against a map of path-matching expressions to templates, and outputs the result of applying the template to that resource? xmlto seems to be most of the way there, and could probably just be called from a script that does the pattern-matching, but something already integrated would be more convenient. Is such a URL_pattern-to-XSLT_template map publicly available somewhere? Question 2.5: Is it legal to do this with sites like this one that have public licenses on their content?

Read the article
Can you see something wrong in my working .htaccess?

- by AlexV

OK, after many search, trial and errors I've managed to create an .htaccess that do what I wanted (see explanations and questions after the code block): <IfModule mod_rewrite.c> RewriteEngine On #1 If the requested file is not url-mapper.php (to avoid .htaccess loop) RewriteCond %{REQUEST_FILENAME} (?<!url-mapper\.php)$ #2 If the requested URI does not end with an extension OR if the URI ends with .php* RewriteCond %{REQUEST_URI} !\.(.*) [OR] RewriteCond %{REQUEST_URI} \.php.*$ [NC] #3 If the requested URI is not in an excluded location RewriteCond %{REQUEST_URI} !^/seo-urls\/(excluded1|excluded2)(/.*)?$ #Then serve the URI via the mapper RewriteRule .* /seo-urls/url-mapper.php?uri=%{REQUEST_URI} [L,QSA] </IfModule> This is what the .htaccess should do: #1 is checking that the file requested is not url-mapper.php (to avoid infinite redirect loops). This file will always be at the root of the domain. #2 the .htaccess must only catch URLs that don't end with an extension (www.foo.com -- catch | www.foo.com/catch-me -- catch | www.foo.com/dont-catch.me -- don't catch) and URLs ending with .php* files (.php, .php4, .php5, .php123...). #3 some directories (and childs) can be excluded from the .htaccess (in this case /seo-urls/excluded1 and /seo-urls/excluded2). Finally the .htaccess feed the mapper with an hidden GET parameter named uri containing the requested uri. Even if I tested and everything works, I want to know if what I do is correct (and if it's the "best" way to do it). I've learned a lot with this "project" but I still consider myself a beginner at .htaccess and regular expressions so I want to triple check it there before putting it in production...

Read the article
C# 4.0: Dynamic Programming

- by Paulo Morgado

The major feature of C# 4.0 is dynamic programming. Not just dynamic typing, but dynamic in broader sense, which means talking to anything that is not statically typed to be a .NET object. Dynamic Language Runtime The Dynamic Language Runtime (DLR) is piece of technology that unifies dynamic programming on the .NET platform, the same way the Common Language Runtime (CLR) has been a common platform for statically typed languages. The CLR always had dynamic capabilities. You could always use reflection, but its main goal was never to be a dynamic programming environment and there were some features missing. The DLR is built on top of the CLR and adds those missing features to the .NET platform. The Dynamic Language Runtime is the core infrastructure that consists of: Expression Trees The same expression trees used in LINQ, now improved to support statements. Dynamic Dispatch Dispatches invocations to the appropriate binder. Call Site Caching For improved efficiency. Dynamic languages and languages with dynamic capabilities are built on top of the DLR. IronPython and IronRuby were already built on top of the DLR, and now, the support for using the DLR is being added to C# and Visual Basic. Other languages built on top of the CLR are expected to also use the DLR in the future. Underneath the DLR there are binders that talk to a variety of different technologies: .NET Binder Allows to talk to .NET objects. JavaScript Binder Allows to talk to JavaScript in SilverLight. IronPython Binder Allows to talk to IronPython. IronRuby Binder Allows to talk to IronRuby. COM Binder Allows to talk to COM. Whit all these binders it is possible to have a single programming experience to talk to all these environments that are not statically typed .NET objects. The dynamic Static Type Let’s take this traditional statically typed code: Calculator calculator = GetCalculator(); int sum = calculator.Sum(10, 20); Because the variable that receives the return value of the GetCalulator method is statically typed to be of type Calculator and, because the Calculator type has an Add method that receives two integers and returns an integer, it is possible to call that Sum method and assign its return value to a variable statically typed as integer. Now lets suppose the calculator was not a statically typed .NET class, but, instead, a COM object or some .NET code we don’t know he type of. All of the sudden it gets very painful to call the Add method: object calculator = GetCalculator(); Type calculatorType = calculator.GetType(); object res = calculatorType.InvokeMember("Add", BindingFlags.InvokeMethod, null, calculator, new object[] { 10, 20 }); int sum = Convert.ToInt32(res); And what if the calculator was a JavaScript object? ScriptObject calculator = GetCalculator(); object res = calculator.Invoke("Add", 10, 20); int sum = Convert.ToInt32(res); For each dynamic domain we have a different programming experience and that makes it very hard to unify the code. With C# 4.0 it becomes possible to write code this way: dynamic calculator = GetCalculator(); int sum = calculator.Add(10, 20); You simply declare a variable who’s static type is dynamic. dynamic is a pseudo-keyword (like var) that indicates to the compiler that operations on the calculator object will be done dynamically. The way you should look at dynamic is that it’s just like object (System.Object) with dynamic semantics associated. Anything can be assigned to a dynamic. dynamic x = 1; dynamic y = "Hello"; dynamic z = new List<int> { 1, 2, 3 }; At run-time, all object will have a type. In the above example x is of type System.Int32. When one or more operands in an operation are typed dynamic, member selection is deferred to run-time instead of compile-time. Then the run-time type is substituted in all variables and normal overload resolution is done, just like it would happen at compile-time. The result of any dynamic operation is always dynamic and, when a dynamic object is assigned to something else, a dynamic conversion will occur. Code Resolution Method double x = 1.75; double y = Math.Abs(x); compile-time double Abs(double x) dynamic x = 1.75; dynamic y = Math.Abs(x); run-time double Abs(double x) dynamic x = 2; dynamic y = Math.Abs(x); run-time int Abs(int x) The above code will always be strongly typed. The difference is that, in the first case the method resolution is done at compile-time, and the others it’s done ate run-time. IDynamicMetaObjectObject The DLR is pre-wired to know .NET objects, COM objects and so forth but any dynamic language can implement their own objects or you can implement your own objects in C# through the implementation of the IDynamicMetaObjectProvider interface. When an object implements IDynamicMetaObjectProvider, it can participate in the resolution of how method calls and property access is done. The .NET Framework already provides two implementations of IDynamicMetaObjectProvider: DynamicObject : IDynamicMetaObjectProvider The DynamicObject class enables you to define which operations can be performed on dynamic objects and how to perform those operations. For example, you can define what happens when you try to get or set an object property, call a method, or perform standard mathematical operations such as addition and multiplication. ExpandoObject : IDynamicMetaObjectProvider The ExpandoObject class enables you to add and delete members of its instances at run time and also to set and get values of these members. This class supports dynamic binding, which enables you to use standard syntax like sampleObject.sampleMember, instead of more complex syntax like sampleObject.GetAttribute("sampleMember").

Read the article
Agile Testing Days 2012 – Day 1 – The birth of the #unicorn…

- by Chris George

Still riding the high from the tutorial day, I arrived at the conference venue eager to get cracking with the days talks. The opening Keynote was “Disciplined Agile Delivery: The Foundation for Scaling Agile” presented by Scott Ambler. The general ideas behind the methodology such as not re-inventing the wheel, and being goal driven, not prescriptive in how you work certainly struck chords with how we are trying to work in my team. Scott made some interesting observations about how scrum is quite prescriptive and is this really agile? I agreed with quite a few of his points on how what works for one team may not work for another. How a team works should be driven by context and reflection, not process and prescription. However was somewhat dubious about some of the statistics he rolled out towards the end. However, out of this keynote was born something that was to transcend this one presentation. During the talk, Scott mentioned on more than one occasion “In the real world”, and at one point made reference to people living in the land of unicorns and rainbows. The challenge was then laid down on twitter for all speakers to include a unicorn in their presentations… and for the most part this happened! It became an identity for this years conference, and I’m sure something that any attendee will always associate with Agile Testing Days 2012! Following this keynote, I attended “Going agile with Automated GUI Testing – Some personal insights” by Jan Zdunek from codecentric on the vendor track. My speciality is test automation, and in particular GUI testing, so this drew me to this talk more than the others. Thankfully, it was made clear from the very start that this was not peddling any particular product (even though it was on the vendor track), and Jan faithfully stuck to that. Most of the content was not new to me, but it was really comforting to hear someone else with very similar experiences to my own. In particular, things like how GUI testing is hard and is not a silver bullet; how record & replay is NOT a good thing to do (which drew a somewhat inflammatory tweet from an automation company when I tweeted that!). Something that I have started hearing around the place, and has certainly been murmuring at work is to push more of the automation coding onto the developers. After all they are the coding experts. I agree with this to a degree, but I personally enjoy coding and find it very rewarding doing so, therefore I’d be reluctant to give it up. I think there are some better alternatives such as pairing with a developer. Lastly, Jan mentioned, almost in passing, that we should consider virtualisation for gui testing for covering configuration combinations. On my project we’ve been running our win32/.NET GUI tests in cloud virtualisation for a couple of years now… I really should write about that! After lunch the second keynote of the day was by Lisa Crispin and Janet Gregory,”Myths about Agile Testing, De-Bunked”. It started off well… with the two ladies donning Medusa style head bands whilst they disbanding several myths about agile testing! I got the impression that it was perhaps not as slick as they would have liked, but then Janet was suffering with a very sore throat so kept losing her voice. Nevertheless, the presentation was captivating, and they debunked several myths such as : “Testing is dead”, “Testers must write code”, “Agile teams always deliver faster”. I didn’t take many notes for this because it was being recorded, but unfortunately the recordings have not been posted yet so I’ll write more about this when they are. The TestLab was held during a somewhat free for all time during most of the afternoon. It looked intriguing and proved to be one of the surprising experiences of the conference for me. Run by James Lyndsay and Bart Knaack, it consisted of a number of ‘stations’ that offered different testing problems. I opted for testing a mathematical drawing app call Geogebra, the task being to pair up and exploratory test it. After an allotted time, we discussed issues we’d found and decided if we wanted to continue ‘playing’ to which we all agreed! It was fun! The last track talk of the day was “Developers Exploratory Testing – Raising the bar” by Sigge Birgisson. One of the teams at Red Gate have tried Dev or Team exploratory testing a couple of times, and I was really interested to go to the presentation that prompted that. I was not disappointed! Sigge gave a first class presentation, and not only explained what DET was all about, but also how to go about implementing it. Little tips like calling it a ‘workshop’ rather than ‘testing’ I can really see working! Monday evening saw the presentation of the award for the Most Influential Agile Testing Professional Person go to a much deserved Lisa Crispin. The evening was great, with acrobatics, magic and music. My Takeaway Triple from Day 1: Some of the cool stuff that was suggested in the GUI Testing talk, we are already doing. I should write about that! Testing is not dead! Perhaps testing will become more of a skill than a specific role, but it is certainly not dead. Team/Developer exploratory testing… seems like a no-brainer assuming you have a team who is willing. Day 2 – Coming soon…

Read the article
value types in the vm

- by john.rose

value types in the vm p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} p.p2 {margin: 0.0px 0.0px 14.0px 0.0px; font: 14.0px Times} p.p3 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times} p.p4 {margin: 0.0px 0.0px 15.0px 0.0px; font: 14.0px Times} p.p5 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier} p.p6 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px} p.p7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p8 {margin: 0.0px 0.0px 0.0px 36.0px; text-indent: -36.0px; font: 14.0px Times; min-height: 18.0px} p.p9 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p10 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; color: #000000} li.li1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} li.li7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} span.s1 {font: 14.0px Courier} span.s2 {color: #000000} span.s3 {font: 14.0px Courier; color: #000000} ol.ol1 {list-style-type: decimal} Or, enduring values for a changing world. Introduction A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types. This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. We will look at a few possibilities for these new features. An Axiom of Value In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation. Here are some corollaries: A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished. Changing the component of a value type requires construction of a new value. The equals and hashCode operations are strictly component-wise. If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality. A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types. These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as an value-unsafe operation. A value type may have methods, but such methods must not change the components of the value. It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type. Representations of Value Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x=(1+2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables: /*Complex x = Complex.valueOf(1.0, 2.0):*/ double x_re = 1.0, x_im = 2.0; These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers: double distance(/*Complex x:*/ double x_re, double x_im, /*Complex y:*/ double y_re, double y_im) { /*Complex z = x.minus(y):*/ double z_re = x_re - y_re, z_im = x_im - y_im; /*return z.abs():*/ return Math.sqrt(z_re*z_re + z_im*z_im); } A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields. (A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.) The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ’array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) By generic APIs I mean both those which use generic types, like Arrays.asList and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and offer some problems to boxed values. Any ’real’ JVM type should have a story for returns, arrays, and API interoperability. The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs. Even so, wrapper classes are a good starting point for talking about value types. If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions. Let’s Start Coding Now it is time to look at some real code. Here is a definition, written in Java, of a complex number value type. @ValueSafe public final class Complex implements java.io.Serializable { // immutable component structure: public final double re, im; private Complex(double re, double im) { this.re = re; this.im = im; } // interoperability methods: public String toString() { return "Complex("+re+","+im+")"; } public List<Double> asList() { return Arrays.asList(re, im); } public boolean equals(Complex c) { return re == c.re && im == c.im; } public boolean equals(@ValueSafe Object x) { return x instanceof Complex && equals((Complex) x); } public int hashCode() { return 31*Double.valueOf(re).hashCode() + Double.valueOf(im).hashCode(); } // factory methods: public static Complex valueOf(double re, double im) { return new Complex(re, im); } public Complex changeRe(double re2) { return valueOf(re2, im); } public Complex changeIm(double im2) { return valueOf(re, im2); } public static Complex cast(@ValueSafe Object x) { return x == null ? ZERO : (Complex) x; } // utility methods and constants: public Complex plus(Complex c) { return new Complex(re+c.re, im+c.im); } public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); } public double abs() { return Math.sqrt(re*re + im*im); } public static final Complex PI = valueOf(Math.PI, 0.0); public static final Complex ZERO = valueOf(0.0, 0.0); } This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows: The class is marked as a value type with an annotation. The class is final, because it does not make sense to create subclasses of value types. The fields of the class are all non-private and final. (I.e., the type is immutable and structurally transparent.) From the supertype Object, all public non-final methods are overridden. The constructor is private. Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types: One or more factory methods are responsible for value creation, including a component-wise valueOf method. There are utility methods for complex arithmetic and instance creation, such as plus and changeIm. There are static utility constants, such as PI. The type is serializable, using the default mechanisms. There are methods for converting to and from dynamically typed references, such as asList and cast. The Rules In order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist. A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type. All other value-safe classes must be abstract. The non-static fields of a value class must be non-public and final, and all its constructors must be private. Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example: @ValueSafe public interface ValueType extends java.io.Serializable { // All methods listed here must get redefined. // Definitions must be value-safe, which means // they may depend on component values only. List<? extends Object> asList(); int hashCode(); boolean equals(@ValueSafe Object c); String toString(); } //@ValueSafe inherited from supertype: public final class Complex implements ValueType { … The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types. Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following: The type of E is a value-safe type. E names a field, parameter, or local variable whose declaration is marked @ValueSafe. E is a call to a method whose declaration is marked @ValueSafe. E is an assignment to a value-safe variable, field reference, or array reference. E is a cast to a value-safe type from a value-safe expression. E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe. Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions. A value-safe expression may not be the subject of a value-unsafe operation. In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type. In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied, that no two value type will be distinguishable as long as their component values are equal. More Code To illustrate these rules, here are some usage examples for Complex: Complex pi = Complex.valueOf(Math.PI, 0); Complex zero = pi.changeRe(0); //zero = pi; zero.re = 0; ValueType vtype = pi; @SuppressWarnings("value-unsafe") Object obj = pi; @ValueSafe Object obj2 = pi; obj2 = new Object(); // ok List<Complex> clist = new ArrayList<Complex>(); clist.add(pi); // (ok assuming List.add param is @ValueSafe) List<ValueType> vlist = new ArrayList<ValueType>(); vlist.add(pi); // (ok) List<Object> olist = new ArrayList<Object>(); olist.add(pi); // warning: "value-unsafe" boolean z = pi.equals(zero); boolean z1 = (pi == zero); // error: reference comparison on value type boolean z2 = (pi == null); // error: reference comparison on value type boolean z3 = (pi == obj2); // error: reference comparison on value type synchronized (pi) { } // error: synch of value, unpredictable result synchronized (obj2) { } // unpredictable result Complex qq = pi; qq = null; // possible NPE; warning: “null-unsafe" qq = (Complex) obj; // warning: “null-unsafe" qq = Complex.cast(obj); // OK @SuppressWarnings("null-unsafe") Complex empty = null; // possible NPE qq = empty; // possible NPE (null pollution) The Payoffs It follows from this that either the JVM or the java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. Non-static methods on value types can be transformed into static methods which take the components as value parameters. Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect programs automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidently unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note, to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization. Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables. There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above. Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe. Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches: public interface java.util.List<@ValueSafe T> extends Collection<T> { … public interface java.util.List<T extends Object|ValueType> extends Collection<T> { … (The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.) With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”. But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. The lack of multiple return values has a strong distorting effect on many Java APIs. Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems something have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know when an object’s identity is important or not. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside. Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data. Unboxing arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains. It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. Certain connection points, such as Arrays.copyOf and System.arraycopy might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements. Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays. Implicit Method Definitions The example of class Complex above may be unattractively complex. I believe most or all of the elements of the example class are required by the logic of value types. If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code. On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated. Java has a rule for implicitly defining a class’s constructor, if no it defines no constructors explicitly. Likewise, there are rules for providing default access modifiers for interface members. Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types. Here’s an example of a “highly implicit” definition of a complex number type: public class Complex implements ValueType { // implicitly final public double re, im; // implicitly public final //implicit methods are defined elementwise from te fields: // toString, asList, equals(2), hashCode, valueOf, cast //optionally, explicit methods (plus, abs, etc.) would go here } In other words, with the right defaults, a simple value type definition can be a one-liner. The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>. Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly: public @ValueType class Complex { … // implicitly final, implements ValueType (But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.) Implicitly Defined Value Types So far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer. A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or Cartesian point. The name and the methods convey the intended meaning. But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage? Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number. Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”). What we need here are structural value types commonly known as tuples. For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows: public class java.lang.tuple.$DD extends java.lang.tuple.Tuple { double $1, $2; } Here the component names are fixed and all the required methods are defined implicitly. The supertype is an abstract class which has suitable shared declarations. The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields. The odd thing about such a tuple type (and structural types in general) is it must be instantiated lazily, in response to linkage requests from one or more classes that need it. The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types. (Specifics of naming and name mangling need some tasteful engineering.) Tuples also seem to demand, even more than nominal types, some support from the language. (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.) At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists. Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection. That is a task for another day. Other Use Cases Besides complex numbers and simple tuples there are many use cases for value types. Many tuple-like types have natural value-type representations. These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses. Other types have a variable-length ‘tail’ of internal values. The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values. Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values. Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values. The value type may also include ’header’ information. Variable-sized values often have a length distribution which favors short lengths. In that case, the design of the value type can make the first few values in the sequence be direct ’header’ fields of the value type. In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference. Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough. This is the case with String, where the tail is a mutable (but never mutated) character array. Field types and their order must be a globally visible part of the API. The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements. This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations. A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile. Perhaps constant pool references to value types need to declare the field order as assumed by each API user. This brings up the delicate status of private fields in a value type. It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack. If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction? Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double). My initial thought was to make the fields always public, which would make the security problem moot. But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes. I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value. Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values. Similar to String, we could build an efficient multi-precision decimal type along these lines: public final class DecimalValue extends ValueType { protected final long header; protected private final BigInteger digits; public DecimalValue valueOf(int value, int scale) { assert(scale >= 0); return new DecimalValue(((long)value << 32) + scale, null); } public DecimalValue valueOf(long value, int scale) { if (value == (int) value) return valueOf((int)value, scale); return new DecimalValue(-scale, new BigInteger(value)); } } Values of this type would be passed between methods as two machine words. Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed. (Note the tension between encapsulation and unboxing in this case. It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.) Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues. (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.) Getting the full benefit of unboxing and arrays will require some new JVM magic. Although the JVM emphasizes portability, system dependent code will benefit from using machine-level types larger than 64 bits. For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types. This is probably only worthwhile if the unboxing arrays can be packed with such values. More Daydreams A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant. An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes. More radically, a Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception. @ValueSafe public interface ValueType extends java.io.Serializable, Unsynchronizable, Interchangeable { … public class Complex implements ValueType { // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe … It seems possible that Integer and the other wrapper types could be retro-fitted as value-safe types. This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable. It is likely that code which violates value-safety for wrapper types exists but is uncommon. It is less plausible to retro-fit String, since the prominent operation String.intern is often used with value-unsafe code. We should also reconsider the distinction between boxed and unboxed values in code. The design presented above obscures that distinction. As another thought experiment, we could imagine making a first class distinction in the type system between boxed and unboxed representations. Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to boxed. For example: complex pi = complex.valueOf(Math.PI, 0); Complex boxPi = pi; // convert to boxed myList.add(boxPi); complex z = myList.get(0); // unbox Such a convention could perhaps absorb the current difference between int and Integer, double and Double. It might also allow the programmer to express a helpful distinction among array types. As said above, array types are crucial to bulk data interfaces, but are limited in the JVM. Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type. Something like this which can also accommodate value type components seems worthwhile. On the other hand, does it make sense for value types to contain short arrays? And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure. These considerations must wait for another day and another note. More Work It seems to me that a good sequence for introducing such value types would be as follows: Add the value-safety restrictions to an experimental version of javac. Code some sample applications with value types, including Complex and DecimalValue. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing. A similar investigation should be applied (concurrently) to array types. In this case, it seems to me that the starting point is in the JVM: Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids. No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. That’s enough musing me for now. Back to work!

Read the article
Data Mining Resources

- by Dejan Sarka

There are many different types of analyses, each one with its own pros and cons. Relational reports have a predefined structure, and end users cannot change it. They are simple to use for end users. Reports can use real-time data and snapshots of data to show the state of a report at specific points in time. One of the drawbacks is that report authoring is limited to IT pros and advanced users. Any kind of dynamic restructuring is very limited. If real-time data is used for a report, the report has a negative impact on the performance of the source system. Processing of the reports might be slow because the data comes from relational database management systems, which are not optimized for reporting only. If you create a semantic model of your data, your end users can create ad-hoc report structures. However, the development is more complex because a developer is needed to create these semantic models. For OLAP, you typically use specialized database management systems. You get lightning speed of analyses. End users can use rich and thin clients to interactively change the structure of the report. Typically, they do it graphically. However, the development of an OLAP system is many times quite complex. It involves the preparation and maintenance of an enterprise data warehouse and OLAP cubes. In order to exploit the possibility of real-time restructuring of reports, the users must be both active and educated. The data is usually stale, as it is loaded into data warehouses and OLAP cubes with a scheduled process. With data mining, a structure is not selected in advance; it searches for the structure. As a result, data mining can give you the most valuable results because you can discover patterns you did not expect. A data mining model structure is limited only by the attributes that you use to train the model. One of the drawbacks is that a lot of knowledge is needed for a successful data mining project. End users have to understand the results. Subject matter experts and IT professionals need to understand business problem thoroughly. The development might be sometimes even more complex than the development of OLAP cubes. Each type of analysis has its own place in an enterprise system. SQL Server has tools for all kinds of analyses. However, data mining is the most advanced way of analyzing the data; this is the “I” in BI. In order to get the most out of it, you need to learn quite a lot. In this blog post, I am gathering together resources for learning, including forthcoming events. Books Multiple authors: SQL Server MVP Deep Dives – I wrote an introductory data mining chapter there. Erik Veerman, Teo Lachev and Dejan Sarka: MCTS Self-Paced Training Kit (Exam 70-448): Microsoft SQL Server 2008 - Business Intelligence Development and Maintenance – you can find a good overview of a complete BI solution, including data mining, in this book. Jamie MacLennan, ZhaoHui Tang, and Bogdan Crivat: Data Mining with Microsoft SQL Server 2008 – can’t miss this book if you want to mine your data with SQL Server tools. Michael Berry, Gordon Linoff: Mastering Data Mining: The Art and Science of Customer Relationship Management – data mining from both, business and technical perspective. Dorian Pyle: Data Preparation for Data Mining – an in-depth book about data preparation. Thomas and Ronald Wonnacott: Introductory Statistics – if you thought that you could get away without statistics, then you are not serious about data mining. Jiawei Han and Micheline Kamber: Data Mining Concepts and Techniques – in-depth explanation of the most popular data mining algorithms. Michael Berry and Gordon Linoff: Data Mining Techniques – another book that explains data mining algorithms, more fro a business perspective. Paolo Guidici: Applied Data Mining – very mathematical book, only if you enjoy statistics and mathematics in general. Forthcoming presentations I am presenting two data mining related sessions during the PASS Summit in Charlotte, NC: Wednesday, October 16th, 2013 - Fraud Detection: Notes from the Field – I am showing how to use data mining for a specific business problem. The presentation is based on real-life projects. Friday, October 18th: Excel 2013 Advanced Analytics – I am focusing on Excel Data Mining Add-ins, and how to use them together with Power Pivot and other add-ins. This is the most you can get out of Excel. Sinergija 2013, Belgrade, Serbia Tuesday, October 22nd: Excel 2013 Analytics to the Max – another presentation focusing on the most advanced analytics you can get in Excel. SQL Rally Amsterdam, Netherlands Thursday, November 7th: Advanced Analytics in Excel 2013 – and again I am presenting about data mining in Excel. Why three different titles for the same presentation? I don’t know, I guess I forgot the name I proposed every time right after I sent the proposal. Courses Data Mining with SQL Server 2012 – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. OK, now you know: no more excuses, start learning data mining, get the most out of your data

Read the article
SQL SERVER – Number-Crunching with SQL Server – Exceed the Functionality of Excel

- by Pinal Dave

Imagine this. Your users have developed an Excel spreadsheet that extracts data from your SQL Server database, manipulates that data through the use of Excel formulas and, possibly, some VBA code which is then used to calculate P&L, hedging requirements or even risk numbers. Management comes to you and tells you that they need to get rid of the spreadsheet and that the results of the spreadsheet calculations need to be persisted on the database. SQL Server has a very small set of functions for analyzing data. Excel has hundreds of functions for analyzing data, with many of them focused on specific financial and statistical calculations. Is it even remotely possible that you can use SQL Server to replace the complex calculations being done in a spreadsheet? Westclintech has developed a library of functions that match or exceed the functionality of Excel’s functions and contains many functions that are not available in EXCEL. Their XLeratorDB library of functions contains over 700 functions that can be incorporated into T-SQL statements. XLeratorDB takes advantage of the SQL CLR architecture introduced in SQL Server 2005. SQL CLR permits managed code to be compiled into the database and run alongside built-in SQL Server functions like COUNT or SUM. The Westclintech developers have taken advantage of this architecture to bring robust analytical functions to the database. In our hypothetical spreadsheet, let’s assume that our users are using the YIELD function and that the data are extracted from a table in our database called BONDS. Here’s what the spreadsheet might look like. We go to column G and see that it contains the following formula. Obviously, SQL Server does not offer a native YIELD function. However, with XLeratorDB we can replicate this calculation in SQL Server with the following statement: SELECT *, wct.YIELD(CAST(GETDATE() AS date),Maturity,Rate,Price,100,Frequency,Basis) AS YIELD FROM BONDS This produces the following result. This illustrates one of the best features about XLeratorDB; it is so easy to use. Since I knew that the spreadsheet was using the YIELD function I could use the same function with the same calling structure to do the calculation in SQL Server. I didn’t need to know anything at all about the mechanics of calculating the yield on a bond. It was pretty close to cut and paste. In fact, that’s one way to construct the SQL. Just copy the function call from the cell in the spreadsheet and paste it into SMS and change the cell references to column names. I built the SQL for this query by starting with this. SELECT * ,YIELD(TODAY(),B2,C2,D2,100,E2,F2) FROM BONDS I then changed the cell references to column names. SELECT * --,YIELD(TODAY(),B2,C2,D2,100,E2,F2) ,YIELD(TODAY(),Maturity,Rate,Price,100,Frequency,Basis) FROM BONDS Finally, I replicated the TODAY() function using GETDATE() and added the schema name to the function name. SELECT * --,YIELD(TODAY(),B2,C2,D2,100,E2,F2) --,YIELD(TODAY(),Maturity,Rate,Price,100,Frequency,Basis) ,wct.YIELD(GETDATE(),Maturity,Rate,Price,100,Frequency,Basis) FROM BONDS Then I am able to execute the statement returning the results seen above. The XLeratorDB libraries are heavy on financial, statistical, and mathematical functions. Where there is an analog to an Excel function, the XLeratorDB function uses the same naming conventions and calling structure as the Excel function, but there are also hundreds of additional functions for SQL Server that are not found in Excel. You can find the functions by opening Object Explorer in SQL Server Management Studio (SSMS) and expanding the Programmability folder under the database where the functions have been installed. The Functions folder expands to show 3 sub-folders: Table-valued Functions; Scalar-valued functions, Aggregate Functions, and System Functions. You can expand any of the first three folders to see the XLeratorDB functions. Since the wct.YIELD function is a scalar function, we will open the Scalar-valued Functions folder, scroll down to the wct.YIELD function and and click the plus sign (+) to display the input parameters. The functions are also Intellisense-enabled, with the input parameters displayed directly in the query tab. The Westclintech website contains documentation for all the functions including examples that can be copied directly into a query window and executed. There are also more one hundred articles on the site which go into more detail about how some of the functions work and demonstrate some of the extensive business processes that can be done in SQL Server using XLeratorDB functions and some T-SQL. XLeratorDB is organized into libraries: finance, statistics; math; strings; engineering; and financial options. There is also a windowing library for SQL Server 2005, 2008, and 2012 which provides functions for calculating things like running and moving averages (which were introduced in SQL Server 2012), FIFO inventory calculations, financial ratios and more, without having to use triangular joins. To get started you can download the XLeratorDB 15-day free trial from the Westclintech web site. It is a fully-functioning, unrestricted version of the software. If you need more than 15 days to evaluate the software, you can simply download another 15-day free trial. XLeratorDB is an easy and cost-effective way to start adding sophisticated data analysis to your SQL Server database without having to know anything more than T-SQL. Get XLeratorDB Today and Now! Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Excel

Read the article
Towards an F# .NET Reflector add-in

- by CliveT

When I had the opportunity to spent some time during Red Gate's recent "down tools" week on a project of my choice, the obvious project was an F# add-in for Reflector . To be honest, this was a bit of a misnomer as the amount of time in the designated week for coding was really less than three days, so it was always unlikely that very much progress would be made in such a small amount of time (and that certainly proved to be the case), but I did learn some things from the experiment. Like lots of problems, one useful technique is to take examples, get them to work, and then generalise to get something that works across the board. Unfortunately, I didn't have enough time to do the last stage. The obvious first step is to take a few function definitions, starting with the obvious hello world, moving on to a non-recursive function and finishing with the ubiquitous recursive Fibonacci function. let rec printMessage message = printfn message let foo x = (x + 1) let rec fib x = if (x >= 2) then (fib (x - 1) + fib (x - 2)) else 1 The major problem in decompiling these simple functions is that Reflector has an in-memory object model that is designed to support object-oriented languages. In particular it has a return statement that allows function bodies to finish early. I used some of the in-built functionality to take the IL and produce an in-memory object model for the language, but then needed to write a transformer to push the return statements to the top of the tree to make it easy to render the code into a functional language. This tree transform works in some scenarios, but not in others where we simply regenerate code that looks more like CPS style. The next thing to get working was library level bindings of values where these values are calculated at runtime. let x = [1 ; 2 ; 3 ; 4] let y = List.map (fun x -> foo x) x The way that this is translated into a set of classes for the underlying platform means that the code needs to follow references around, from the property exposing the calculated value to the class in which the code for generating the value is embedded. One of the strongest selling points of functional languages is the algebraic datatypes, which allow definitions via standard mathematical-style inductive definitions across the union cases. type Foo = | Something of int | Nothing type 'a Foo2 = | Something2 of 'a | Nothing2 Such a definition is compiled into a number of classes for the cases of the union, which all inherit from a class representing the type itself. It wasn't too hard to get such a de-compilation happening in the cases I tried. What did I learn from this? Firstly, that there are various bits of functionality inside Reflector that it would be useful for us to allow add-in writers to access. In particular, there are various implementations of the Visitor pattern which implement algorithms such as calculating the number of references for particular variables, and which perform various substitutions which could be more generally useful to add-in writers. I hope to do something about this at some point in the future. Secondly, when you transform a functional language into something that runs on top of an object-based platform, you lose some fidelity in the representation. The F# compiler leaves attributes in place so that tools can tell which classes represent classes from the source program and which are there for purposes of the implementation, allowing the decompiler to regenerate these constructs again. However, decompilation technology is a long way from being able to take unannotated IL and transform it into a program in a different language. For a simple function definition, like Fibonacci, I could write a simple static function and have it come out in F# as the same function, but it would be practically impossible to take a mass of class definitions and have a decompiler translate it automatically into an F# algebraic data type. What have we got out of this? Some data on the feasibility of implementing an F# decompiler inside Reflector, though it's hard at the moment to say how long this would take to do. The work we did is included the 6.5 EAP for Reflector that you can get from the EAP forum. All things considered though, it was a useful way to gain more familiarity with the process of writing an add-in and understand difficulties other add-in authors might experience. If you'd like to check out a video of Down Tools Week, click here.

Read the article
MCM Lab exam this week

- by Rob Farley

In two days I’ll’ve finished the MCM Lab exam, 88-971. If you do an internet search for 88-971, it’ll tell you the answer is –883. Obviously. It’ll also give you a link to the actual exam page, which is useful too, once you’ve finished being distracted by the calculator instead of going to the thing you’re actually looking for. (Do people actually search the internet for the results of mathematical questions? Really?) The list of Skills Measured for this exam is quite short, but can essentially be broken down into one word “Anything”. The Preparation Materials section is even better. Classroom Training – none available. Microsoft E-Learning – none available. Microsoft Press Books – none available. Practice Tests – none available. But there are links to Readiness Videos and a page which has no resources listed, but tells you a list of people who have already qualified. Three in Australia who have MCM SQL Server 2008 so far. The list doesn’t include some of the latest batch, such as Jason Strate or Tom LaRock. I’ve used SQL Server for almost 15 years. During that time I’ve been awarded SQL Server MVP seven times, but the MVP award doesn’t actually mean all that much when considering this particular certification. I know lots of MVPs who have tried this particular exam and failed – including Jason and Tom. Right now, I have no idea whether I’ll pass or not. People tell me I’ll pass no problem, but I honestly have no idea. There’s something about that “Anything” aspect that worries me. I keep looking at the list of things in the Readiness Videos, and think to myself “I’m comfortable with Resource Governor (or whatever) – that should be fine.” Except that then I feel like I maybe don’t know all the different things that can go wrong with Resource Governor (or whatever), and I wonder what kind of situations I’ll be faced with. And then I find myself looking through the stuff that’s explained in the videos, and wondering what kinds of things I should know that I don’t, and then I get amazingly bored and frustrated (after all, I tell people that these exams aren’t supposed to be studied for – you’ve been studying for the last 15 years, right?), and I figure “What’s the worst that can happen? A fail?” I’m told that the exam provides a list of scenarios (maybe 14 of them?) and you have 5.5 hours to complete them. When I say “complete”, I mean complete – you don’t get to leave them unfinished, that’ll get you ‘nil points’ for that scenario. Apparently no-one gets to complete all of them. Now, I’m a consultant. I get called on to fix the problems that people have on their SQL boxes. Sometimes this involves fixing corruption. Sometimes it’s figuring out some performance problem. Sometimes it’s as straight forward as getting past a full transaction log; sometimes it’s as tricky as recovering a database that has lost its metadata, without backups. Most situations aren’t a problem, but I also have the confidence of being able to do internet searches to verify my maths (in case I forget it’s –883). In the exam, I’ll have maybe twenty minutes per scenario (but if I need longer, I’ll have to take longer – no point in stopping half way if it takes more than twenty minutes, unless I don’t see an end coming up), so I’ll have time constraints too. And of course, I won’t have any of my usual tools. I can’t take scripts in, I can’t take staff members. Hopefully I can use the coffee machine that will be in the room. I figure it’s going to feel like one of those days when I’ve gone into a client site, and found that the problems are way worse than I expected, and that the site is down, with people standing over me needing me to get things right first time... ...so it should be fine, I’ve done that before. :) If I do fail, it won’t make me any less of a consultant. It won’t make me any less able to help all of my clients (including you if you get in touch – hehe), it’ll just mean that the particular problem might’ve taken me more than the twenty minutes that the exam gave me. @rob_farley PS: Apparently the done thing is to NOT advertise that you’re sitting the exam at a particular time, only that you’re expecting to take it at some point in the future. I think it’s akin to the idea of not telling people you’re pregnant for the first few months – it’s just in case the worst happens. Personally, I’m happy to tell you all that I’m going to take this exam the day after tomorrow (which is the 19th in the US, the 20th here). If I end up failing, you can all commiserate and tell me that I’m not actually as unqualified as I feel.

Read the article
To Bit or Not To Bit

- by Johnm

'Twas a long day of troubleshooting and firefighting and now, with most of the office vacant, you face a blank scripting window to create a new table in his database. Many questions circle your mind like dirty water gurgling down the bathtub drain: "How normalized should this table be?", "Should I use an identity column?", "NVarchar or Varchar?", "Should this column be NULLABLE?", "I wonder what apple blue cheese bacon cheesecake tastes like?" Well, there are times when the mind goes it's own direction. A Bit About Bit At some point during your table creation efforts you will encounter the decision of whether to use the bit data type for a column. The bit data type is an integer data type that recognizes only the values of 1, 0 and NULL as valid. This data type is often utilized to store yes/no or true/false values. An example of its use would be a column called [IsGasoline] which would be intended to contain the value of 1 if the row's subject (a car) had a gasoline engine and a 0 if the subject did not have a gasoline engine. The bit data type can even be found in some of the system tables of SQL Server. For example, the sysssispackages table in the msdb database which contains SQL Server Integration Services Package information for the packages stored in SQL Server. This table contains a column called [IsEncrypted]. A value of 1 indicates that the package has been encrypted while the value of 0 indicates that it is not. I have learned that the most effective way to disperse the crowd that surrounds the office coffee machine is to engage into SQL Server debates. The bit data type has been one of the most reoccurring, as well as the most enjoyable, of these topics. It contains a practical side and a philosophical side. Practical Consideration This data type certainly has its place and is a valuable option for database design; but it is often used in situations where the answer is really not a pure true/false response. In addition, true/false values are not very informative or scalable. Let's use the previously noted [IsGasoline] column for illustration. While on the surface it appears to be a rather simple question when evaluating a car: "Does the car have a gasoline engine?" If the person entering data is entering a row for a Jeep Liberty, the response would be a 1 since it has a gasoline engine. If the person is entering data is entering a row for a Chevrolet Volt, the response would be a 0 since it is an electric engine. What happens when a person is entering a row for the gasoline/electric hybrid Toyota Prius? Would one person's conclusion be consistent with another person's conclusion? The argument could be made that the current intent for the database is to be used only for pure gasoline and pure electric engines; but this is where the scalability issue comes into play. With the use of a bit data type a database modification and data conversion would be required if the business decided to take on hybrid engines. Whereas, alternatively, if the int data type were used as a foreign key to a reference table containing the engine type options, the change to include the hybrid option would only require an entry into the reference table. Philosophical Consideration Since the bit data type is often used for true/false or yes/no data (also called Boolean) it presents a philosophical conundrum of what to do about the allowance of the NULL value. The inclusion of NULL in a true/false or yes/no response simply violates the logical principle of bivalence which states that "every proposition is either true or false". If NULL is not true, then it must be false. The mathematical laws of Boolean logic support this concept by stating that the only valid values of this scenario are 1 and 0. There is another way to look at this conundrum: NULL is also considered to be the absence of a response. In other words, it is the equivalent to "undecided". Anyone who watches the news can tell you that polls always include an "undecided" option. This could be considered a valid option in the world of yes/no/dunno. Through out all of these considerations I have discovered one absolute certainty: When you have found a person, or group of persons, who are willing to entertain a philosophical debate of the bit data type, you have found some true friends.

Read the article
Why do we use Pythagoras in game physics?

- by Starkers

I've recently learned that we use Pythagoras a lot in our physics calculations and I'm afraid I don't really get the point. Here's an example from a book to make sure an object doesn't travel faster than a MAXIMUM_VELOCITY constant in the horizontal plane: MAXIMUM_VELOCITY = <any number>; SQUARED_MAXIMUM_VELOCITY = MAXIMUM_VELOCITY * MAXIMUM_VELOCITY; function animate(){ var squared_horizontal_velocity = (x_velocity * x_velocity) + (z_velocity * z_velocity); if( squared_horizontal_velocity <= SQUARED_MAXIMUM_VELOCITY ){ scalar = squared_horizontal_velocity / SQUARED_MAXIMUM_VELOCITY; x_velocity = x_velocity / scalar; z_velocity = x_velocity / scalar; } } Let's try this with some numbers: An object is attempting to move 5 units in x and 5 units in z. It should only be able to move 5 units horizontally in total! MAXIMUM_VELOCITY = 5; SQUARED_MAXIMUM_VELOCITY = 5 * 5; SQUARED_MAXIMUM_VELOCITY = 25; function animate(){ var x_velocity = 5; var z_velocity = 5; var squared_horizontal_velocity = (x_velocity * x_velocity) + (z_velocity * z_velocity); var squared_horizontal_velocity = 5 * 5 + 5 * 5; var squared_horizontal_velocity = 25 + 25; var squared_horizontal_velocity = 50; // if( squared_horizontal_velocity <= SQUARED_MAXIMUM_VELOCITY ){ if( 50 <= 25 ){ scalar = squared_horizontal_velocity / SQUARED_MAXIMUM_VELOCITY; scalar = 50 / 25; scalar = 2.0; x_velocity = x_velocity / scalar; x_velocity = 5 / 2.0; x_velocity = 2.5; z_velocity = z_velocity / scalar; z_velocity = 5 / 2.0; z_velocity = 2.5; // new_horizontal_velocity = x_velocity + z_velocity // new_horizontal_velocity = 2.5 + 2.5 // new_horizontal_velocity = 5 } } Now this works well, but we can do the same thing without Pythagoras: MAXIMUM_VELOCITY = 5; function animate(){ var x_velocity = 5; var z_velocity = 5; var horizontal_velocity = x_velocity + z_velocity; var horizontal_velocity = 5 + 5; var horizontal_velocity = 10; // if( horizontal_velocity >= MAXIMUM_VELOCITY ){ if( 10 >= 5 ){ scalar = horizontal_velocity / MAXIMUM_VELOCITY; scalar = 10 / 5; scalar = 2.0; x_velocity = x_velocity / scalar; x_velocity = 5 / 2.0; x_velocity = 2.5; z_velocity = z_velocity / scalar; z_velocity = 5 / 2.0; z_velocity = 2.5; // new_horizontal_velocity = x_velocity + z_velocity // new_horizontal_velocity = 2.5 + 2.5 // new_horizontal_velocity = 5 } } Benefits of doing it without Pythagoras: Less lines Within those lines, it's easier to read what's going on ...and it takes less time to compute, as there are less multiplications Seems to me like computers and humans get a better deal without Pythagoras! However, I'm sure I'm wrong as I've seen Pythagoras' theorem in a number of reputable places, so I'd like someone to explain me the benefit of using Pythagoras to a maths newbie. Does this have anything to do with unit vectors? To me a unit vector is when we normalize a vector and turn it into a fraction. We do this by dividing the vector by a larger constant. I'm not sure what constant it is. The total size of the graph? Anyway, because it's a fraction, I take it, a unit vector is basically a graph that can fit inside a 3D grid with the x-axis running from -1 to 1, z-axis running from -1 to 1, and the y-axis running from -1 to 1. That's literally everything I know about unit vectors... not much :P And I fail to see their usefulness. Also, we're not really creating a unit vector in the above examples. Should I be determining the scalar like this: // a mathematical work-around of my own invention. There may be a cleverer way to do this! I've also made up my own terms such as 'divisive_scalar' so don't bother googling var divisive_scalar = (squared_horizontal_velocity / SQUARED_MAXIMUM_VELOCITY); var divisive_scalar = ( 50 / 25 ); var divisive_scalar = 2; var multiplicative_scalar = (divisive_scalar / (2*divisive_scalar)); var multiplicative_scalar = (2 / (2*2)); var multiplicative_scalar = (2 / 4); var multiplicative_scalar = 0.5; x_velocity = x_velocity * multiplicative_scalar x_velocity = 5 * 0.5 x_velocity = 2.5 Again, I can't see why this is better, but it's more "unit-vector-y" because the multiplicative_scalar is a unit_vector? As you can see, I use words such as "unit-vector-y" so I'm really not a maths whiz! Also aware that unit vectors might have nothing to do with Pythagoras so ignore all of this if I'm barking up the wrong tree. I'm a very visual person (3D modeller and concept artist by trade!) and I find diagrams and graphs really, really helpful so as many as humanely possible please!

Read the article
Is Data Science “Science”?

- by BuckWoody

I hold the term “science” in very high esteem. I grew up on the Space Coast in Florida, and eventually worked at the Kennedy Space Center, surrounded by very intelligent people who worked in various scientific fields. Recently a new term has entered the computing dialog – “Data Scientist”. Since it’s not a standard term, it has a lot of definitions, and in fact has been disputed as a correct term. After all, the reasoning goes, if there’s no such thing as “Data Science” then how can there be a Data Scientist? This argument has been made before, albeit with a different term – “Computer Science”. In Peter Denning’s excellent article “Is Computer Science Science” (April 2005/Vol. 48, No. 4 COMMUNICATIONS OF THE ACM) there are many points that separate “science” from “engineering” and even “art”. I won’t repeat the content of that article here (I recommend you read it on your own) but will leverage the points he makes there. Definition of Science To ask the question “is data science ‘science’” then we need to start with a definition of terms. Various references put the definition into the same basic areas: Study of the physical world Systematic and/or disciplined study of a subject area ...and then they include the things studied, the bodies of knowledge and so on. The word itself comes from Latin, and means merely “to know” or “to study to know”. Greek divides knowledge further into “truth” (episteme), and practical use or effects (tekhne). Normally computing falls into the second realm. Definition of Data Science And now a more controversial definition: Data Science. This term is so new and perhaps so niche that the major dictionaries haven’t yet picked it up (my OED reference is older – can’t afford to pop for the online registration at present). Researching the term's general use I created an amalgam of the definitions this way: “Studying and applying mathematical and other techniques to derive information from complex data sets.” Using this definition, data science certainly seems to be science - it's learning about and studying some object or area using systematic methods. But implicit within the definition is the word “application”, which makes the process more akin to engineering or even technology than science. In fact, I find that using these techniques – and data itself – part of science, not science itself. I leave out the concept of studying data patterns or algorithms as part of this discipline. That is actually a domain I see within research, mathematics or computer science. That of course is a type of science, but does not seek for practical applications. As part of the argument against calling it “Data Science”, some point to the scientific method of creating a hypothesis, testing with controls, testing results against the hypothesis, and documenting for repeatability. These are not steps that we often take in working with data. We normally start with a question, and fit patterns and algorithms to predict outcomes and find correlations. In this way Data Science is more akin to statistics (and in fact makes heavy use of them) in the process rather than starting with an assumption and following on with it. So, is Data Science “Science”? I’m uncertain – and I’m uncertain it matters. Even if we are facing rampant “title inflation” these days (does anyone introduce themselves as a secretary or supervisor anymore?) I can tolerate the term at least from the intent that we use data to study problems across a wide spectrum, rather than restricting it to a single domain. And I also understand those who have worked hard to achieve the very honorable title of “scientist” who have issues with those who borrow the term without asking. What do you think? Science, or not? Does it matter?

Read the article
Why do we use the Pythagorean theorem in game physics?

- by Starkers

I've recently learned that we use Pythagorean theorem a lot in our physics calculations and I'm afraid I don't really get the point. Here's an example from a book to make sure an object doesn't travel faster than a MAXIMUM_VELOCITY constant in the horizontal plane: MAXIMUM_VELOCITY = <any number>; SQUARED_MAXIMUM_VELOCITY = MAXIMUM_VELOCITY * MAXIMUM_VELOCITY; function animate(){ var squared_horizontal_velocity = (x_velocity * x_velocity) + (z_velocity * z_velocity); if( squared_horizontal_velocity <= SQUARED_MAXIMUM_VELOCITY ){ scalar = squared_horizontal_velocity / SQUARED_MAXIMUM_VELOCITY; x_velocity = x_velocity / scalar; z_velocity = x_velocity / scalar; } } Let's try this with some numbers: An object is attempting to move 5 units in x and 5 units in z. It should only be able to move 5 units horizontally in total! MAXIMUM_VELOCITY = 5; SQUARED_MAXIMUM_VELOCITY = 5 * 5; SQUARED_MAXIMUM_VELOCITY = 25; function animate(){ var x_velocity = 5; var z_velocity = 5; var squared_horizontal_velocity = (x_velocity * x_velocity) + (z_velocity * z_velocity); var squared_horizontal_velocity = 5 * 5 + 5 * 5; var squared_horizontal_velocity = 25 + 25; var squared_horizontal_velocity = 50; // if( squared_horizontal_velocity <= SQUARED_MAXIMUM_VELOCITY ){ if( 50 <= 25 ){ scalar = squared_horizontal_velocity / SQUARED_MAXIMUM_VELOCITY; scalar = 50 / 25; scalar = 2.0; x_velocity = x_velocity / scalar; x_velocity = 5 / 2.0; x_velocity = 2.5; z_velocity = z_velocity / scalar; z_velocity = 5 / 2.0; z_velocity = 2.5; // new_horizontal_velocity = x_velocity + z_velocity // new_horizontal_velocity = 2.5 + 2.5 // new_horizontal_velocity = 5 } } Now this works well, but we can do the same thing without Pythagoras: MAXIMUM_VELOCITY = 5; function animate(){ var x_velocity = 5; var z_velocity = 5; var horizontal_velocity = x_velocity + z_velocity; var horizontal_velocity = 5 + 5; var horizontal_velocity = 10; // if( horizontal_velocity >= MAXIMUM_VELOCITY ){ if( 10 >= 5 ){ scalar = horizontal_velocity / MAXIMUM_VELOCITY; scalar = 10 / 5; scalar = 2.0; x_velocity = x_velocity / scalar; x_velocity = 5 / 2.0; x_velocity = 2.5; z_velocity = z_velocity / scalar; z_velocity = 5 / 2.0; z_velocity = 2.5; // new_horizontal_velocity = x_velocity + z_velocity // new_horizontal_velocity = 2.5 + 2.5 // new_horizontal_velocity = 5 } } Benefits of doing it without Pythagoras: Less lines Within those lines, it's easier to read what's going on ...and it takes less time to compute, as there are less multiplications Seems to me like computers and humans get a better deal without Pythagorean theorem! However, I'm sure I'm wrong as I've seen Pythagoras' theorem in a number of reputable places, so I'd like someone to explain me the benefit of using Pythagorean theorem to a maths newbie. Does this have anything to do with unit vectors? To me a unit vector is when we normalize a vector and turn it into a fraction. We do this by dividing the vector by a larger constant. I'm not sure what constant it is. The total size of the graph? Anyway, because it's a fraction, I take it, a unit vector is basically a graph that can fit inside a 3D grid with the x-axis running from -1 to 1, z-axis running from -1 to 1, and the y-axis running from -1 to 1. That's literally everything I know about unit vectors... not much :P And I fail to see their usefulness. Also, we're not really creating a unit vector in the above examples. Should I be determining the scalar like this: // a mathematical work-around of my own invention. There may be a cleverer way to do this! I've also made up my own terms such as 'divisive_scalar' so don't bother googling var divisive_scalar = (squared_horizontal_velocity / SQUARED_MAXIMUM_VELOCITY); var divisive_scalar = ( 50 / 25 ); var divisive_scalar = 2; var multiplicative_scalar = (divisive_scalar / (2*divisive_scalar)); var multiplicative_scalar = (2 / (2*2)); var multiplicative_scalar = (2 / 4); var multiplicative_scalar = 0.5; x_velocity = x_velocity * multiplicative_scalar x_velocity = 5 * 0.5 x_velocity = 2.5 Again, I can't see why this is better, but it's more "unit-vector-y" because the multiplicative_scalar is a unit_vector? As you can see, I use words such as "unit-vector-y" so I'm really not a maths whiz! Also aware that unit vectors might have nothing to do with Pythagorean theorem so ignore all of this if I'm barking up the wrong tree. I'm a very visual person (3D modeller and concept artist by trade!) and I find diagrams and graphs really, really helpful so as many as humanely possible please!

Read the article
Determining the angle to fire a shot when target and shooter moves, and bullet moves with shooter velocity added in

- by Azaral

I saw this question: Predicting enemy position in order to have an object lead its target and followed the link in the answer to stack overflow. In the stack overflow page I used the 2nd answer, the one that is a large mathematical derivation. My situation is a little different though. My first question though is will the answer provided in the stack overflow page even work to begin with, assuming the original circumstances of moving target and stationary shooter. My situation is a little different than that situation. My target moves, the shooter moves, and the bullets from the shooter start off with the velocities in x and y added to the bullets' x and y velocities. If you are sliding to the right, the bullets will remain in front of you as you move so as long as your velocity remains constant. What I'm trying to do is to get the enemy to be able to determine where they need to shoot in order to hit the player. Unless the player and enemy is stationary, the velocity from the ship adding to the velocity of the bullets will cause a miss. I'd rather like to prevent that. I used the formula in the stack overflow answer and did what I thought were the appropriate adjustments. I've been banging at this for the last four hours and I just can't make it click. It is probably something really simple and boneheaded that I am missing (that seems to be a lot of my problems lately). Here is the solution presented from the stack overflow answer: It boils down to solving a quadratic equation of the form: a * sqr(x) + b * x + c == 0 Note that by sqr I mean square, as opposed to square root. Use the following values: a := sqr(target.velocityX) + sqr(target.velocityY) - sqr(projectile_speed) b := 2 * (target.velocityX * (target.startX - cannon.X) + target.velocityY * (target.startY - cannon.Y)) c := sqr(target.startX - cannon.X) + sqr(target.startY - cannon.Y) Now we can look at the discriminant to determine if we have a possible solution. disc := sqr(b) - 4 * a * c If the discriminant is less than 0, forget about hitting your target -- your projectile can never get there in time. Otherwise, look at two candidate solutions: t1 := (-b + sqrt(disc)) / (2 * a) t2 := (-b - sqrt(disc)) / (2 * a) Note that if disc == 0 then t1 and t2 are equal. If there are no other considerations such as intervening obstacles, simply choose the smaller positive value. (Negative t values would require firing backward in time to use!) Substitute the chosen t value back into the target's position equations to get the coordinates of the leading point you should be aiming at: aim.X := t * target.velocityX + target.startX aim.Y := t * target.velocityY + target.startY Here is my code, after being corrected by Sam Hocevar (thank you again for your help!). It still doesn't work. For some reason it never enters the section of code inside the if(disc = 0) (obviously because it is always less than zero but...). However, if I plug the numbers from my game log on the enemy and player positions and velocities it outputs a valid firing solution. I have looked at the code side by side a couple of times now and I can't find any differences. There has got to be something simple I'm missing here. If someone else could look at this code and determine what is going on here I'd appreciate it. I know it's not going through that section because if it were, shouldShoot would become true and the enemy would be blasting away at the player. This section calls the function in question, CalculateShootHeading() if(shouldMove) { UseEngines(); } x += xVelocity; y += yVelocity; CalculateShootHeading(); if(shouldShoot) { ShootWeapons(); } UpdateWeapons(); This is CalculateShootHeading(). This is inside the enemy class so x and y are the enemy's x and y and the same with velocity. One output from my game log gives Player X = 2108, Player Y = -180.956, Player X velocity = 10.9949, Player Y Velocity = -6.26017, Enemy X = 1988.31, Enemy Y = -339.051, Enemy X velocity = 1.81666, Enemy Y velocity = -9.67762, 0 enemy projectiles. The output from the console tester is Bullet position = 2210.49, -239.313 and Player Position = 2210.49, -239.313. This doesn't make any sense. The only thing that could be different is the code or the input into my function in the game and I've checked that and I don't think that it is wrong as it's updated before this and never changed. float const bulletSpeed = 30.f; float const dx = playerX - x; float const dy = playerY - y; float const vx = playerXVelocity - xVelocity; float const vy = playerYVelocity - yVelocity; float const a = vx * vx + vy * vy - bulletSpeed * bulletSpeed; float const b = 2.f * (vx * dx + vy * dy); float const c = dx * dx + dy * dy; float const disc = b * b - 4.f * a * c; shouldShoot = false; if (disc >= 0.f) { float t0 = (-b - std::sqrt(disc)) / (2.f * a); float t1 = (-b + std::sqrt(disc)) / (2.f * a); if (t0 < 0.f || (t1 < t0 && t1 >= 0.f)) { t0 = t1; } if (t0 >= 0.f) { float shootx = vx + dx / t0; float shooty = vy + dy / t0; heading = std::atan2(shooty, shootx) * RAD2DEGREE; } shouldShoot = true; }

Read the article
Dynamic DataGrid columns in WPF DataGrid based on the underlying set of data (and their type)

- by StatsMan

Hello everyone, I've got kind of a conceptual question. I am in the process of wrapping some statistics classes I wrote into WPF. For that I have two DataGrid(-Views, currently in WinForms). In one DataGrid each row represents a column in the other. There I can set-up different variables (as in mathematical/statistical variables) with fields like "Header", "DataType", "ValidationBehaviour", "DisplayType". There I can also set-up how it should be displayed. Some Columns can automatically be set to ComboBoxColumns, some TextBoxColumns, and so on and so forth. So, now once I've set-up these Columns I can go to the other grid and enter my data. I may, for instance, have generated (in grid 1) one Column called "Annual Gross Salary" with input of numerical values. Another Column called "Education" with "0=NoEducation", "1=College Level", "3=Universitary" etc. These labels are displayed as text in the combobox and my statistics engine behind then selects the respective value (0-3) for calculations (i.e. ordinal, nominal variables). Sooo. In WinForms I could basically generate all the columns by hand in code and then add my data in the respective cells/rows. Now in WPF I thought that must be easy to realise. However, yesterday I got started with ICustomPropertyDescriptor which (maybe I was too thick) didn't give me the results I was looking for. Basically, I just need to be able to dynamically generate columns (and rows) with different Layout, Controls (ComboBox, simple Input, DateTimes) based on the data that I have. But I don't really know how to go about it? So here in summary: DataGrid 1 Purpose is to display columns that have been specified in DataGrid 2 In rows, the user can add any kind of data in the rows below the columns that is allowed as to the columns specifications DataGrid 2 Each row in this grid represents a column in DataGrid 1 Contains fields like Name/Header, DataType, Validation Behaviour, Default Value, Data Formatting, etc. Also contains a function to be able to set-up how it should be displayed. The user can select from, for instance, ComboBoxColumn (and also add the available options), DateTime, normal TextBox, CheckBox etc. After finishing adding a row it will automatically appear as a new column in DataGrid 1 I'd appreciate any kind of pointer into the right direction. Thanks very, very much in advance! :)

Read the article
List of freely available programming books

- by Karan Bhangui

I'm trying to amass a list of programming books with opensource licenses, like Creative Commons, GPL, etc. The books can be about a particular programming language or about computers in general. Hoping you guys could help: Languages BASH Advanced Bash-Scripting Guide (An in-depth exploration of the art of shell scripting) C The C book C++ Thinking in C++ C++ Annotations How to Think Like a Computer Scientist C# .NET Book Zero: What the C or C++ Programmer Needs to Know About C# and the .NET Framework Illustrated C# 2008 (Dead Link) Data Structures and Algorithms with Object-Oriented Design Patterns in C# Threading in C# Common Lisp Practical Common Lisp On Lisp Java Thinking in Java How to Think Like a Computer Scientist Java Thin-Client Programming JavaScript Eloquent JavaScript Haskell Real world Haskell Learn You a Haskell for Great Good! Objective-C The Objective-C Programming Language Perl Extreme Perl (license not specified - home page is saying "freely available") The Mason Book (Open Publication License) Practical mod_perl (CreativeCommons Attribution Share-Alike License) Higher-Order Perl Learning Perl the Hard Way PHP Practical PHP Programming Zend Framework: Survive the Deep End PowerShell Mastering PowerShell Prolog Building Expert Systems in Prolog Adventure in Prolog Prolog Programming A First Course Logic, Programming and Prolog (2ed) Introduction to Prolog for Mathematicians Learn Prolog Now! Natural Language Processing Techniques in Prolog Python Dive Into Python Dive Into Python 3 How to Think Like a Computer Scientist A Byte of Python Python for Fun Invent Your Own Computer Games With Python Ruby Why's (Poignant) Guide to Ruby Programming Ruby - The Pragmatic Programmer's Guide Mr. Neighborly's Humble Little Ruby Book SQL Practical PostgreSQL x86 assembly Paul Carter's tutorial Lua Programming In Lua (for v5 but still largely relevant) Algorithms and Data Structures Algorithms Data Structures and Algorithms with Object-Oriented Design Patterns in Java Planning Algorithms Frameworks/Projects The Django Book The Pylons Book Introduction to Design Patterns in C++ with Qt 4 (Open Publication License) Version control The SVN Book Mercurial: The Definitive Guide Pro Git UNIX / Linux The Art of Unix Programming Linux Device Drivers, Third Edition Others Structure and Interpretation of Computer Programs The Little Book of Semaphores Mathematical Logic - an Introduction An Introduction to the Theory of Computation Developers Developers Developers Developers Linkers and loaders Beej's Guide to Network Programming Maven: The Definitive Guide I will expand on this list as I get comments or when I think of more :D Related: Programming texts and reference material for my Kindle What are some good free programming books? Can anyone recommend a free software engineering book? Edit: Oh I didn't notice the community wiki feature. Feel free to edit your suggestions right in!

Read the article
Automatically generate table of function pointers in C.

- by jeremytrimble

I'm looking for a way to automatically (as part of the compilation/build process) generate a "table" of function pointers in C. Specifically, I want to generate an array of structures something like: typedef struct { void (*p_func)(void); char * funcName; } funcRecord; /* Automatically generate the lines below: */ extern void func1(void); extern void func2(void); /* ... */ funcRecord funcTable[] = { { .p_func = &func1, .funcName = "func1" }, { .p_func = &func2, .funcName = "func2" } /* ... */ }; /* End automatically-generated code. */ ...where func1 and func2 are defined in other source files. So, given a set of source files, each of which which contain a single function that takes no arguments and returns void, how would one automatically (as part of the build process) generate an array like the one above that contains each of the functions from the files? I'd like to be able to add new files and have them automatically inserted into the table when I re-compile. I realize that this probably isn't achievable using the C language or preprocessor alone, so consider any common *nix-style tools fair game (e.g. make, perl, shell scripts (if you have to)). But Why? You're probably wondering why anyone would want to do this. I'm creating a small test framework for a library of common mathematical routines. Under this framework, there will be many small "test cases," each of which has only a few lines of code that will exercise each math function. I'd like each test case to live in its own source file as a short function. All of the test cases will get built into a single executable, and the test case(s) to be run can be specified on the command line when invoking the executable. The main() function will search through the table and, if it finds a match, jump to the test case function. Automating the process of building up the "catalog" of test cases ensures that test cases don't get left out (for instance, because someone forgets to add it to the table) and makes it very simple for maintainers to add new test cases in the future (just create a new source file in the correct directory, for instance). Hopefully someone out there has done something like this before. Thanks, StackOverflow community!

Read the article
Annoying Captcha >> How to programm a form that can SMELL difference between human and robot?

- by Sam

Hi folks. On the comment of my old form needing a CAPTHA, I felt I share my problem, perhaps you recognize it and find its time we had better solutions: FACTUAL PROBLEM I know most of my clients (typical age= 40~60) hate CAPTCHA things. Now, I myself always feel like a robot, when I have to sueeze my eyes and fill in the strange letters from the Capcha... Sometimes I fail! Go back etc. Turnoff. I mean comon its 2011, shouldnt the forms have better A.I. by now? MY NEW IDEA (please dont laugh) Ive thought about it and this is my idea's to tell difference between human and robot: My idea is to give credibility points. 100 points = human 0% = robot. require real human mouse movements require mousemovements that dont follow any mathematical pattern require non-instantaneous reading delays, between load and first input in form when typing in form, delays are measured between letters and words approve as human when typical human behaviour measured (deleting, rephrasing etc) dont allow instant pasting or all fields give points for real keyboard pressures retract points for credibility when hyperlinks in form Test wether fake email field (invisible by human) is populated (suggested by Tomalak) when more than 75% human cretibility, allow to be sent without captcha when less than 25% human crecibility, force captcha puzzle to be sure Could we write a A.I. PHP that replaces the human-annoying capthas, meanwhile stopping most spamservers filing in the data? Not only for the fun of it, but also actually to provide a 99% better alternative than CAPTHCA's. Imagine the userfriendlyness of your forms! Your site distinguishing itself from others, showing your audience your sites KNOWS the difference between a robot and a human. Imagine the advangage. I am trying to capture the essense of that distinguishing edge. PROGRAMMING QUESTION: 1) Are such things possible to programm? 2) If so how would you start such programm? 3) Are there already very good working solutions available elsewhere? 4) If it isn't so hard, your are welcome to share your answer/solutions below. 5) upon completion of hints and new ideas, could this page be the start of a new AI captcha, OR should I forget about it and just go with the flow, forget about the whole AI dream, and use captcha like everyone else.

Read the article

Search Results

Search found 2155 results on 87 pages for 'mathematical expressions'.

Page 68/87 | < Previous Page | 64 65 66 67 68 69 70 71 72 73 74 75 | Next Page >

- by Petter

- by chapkom

- by Tyzoid

- by armani

- by AlexV

- by William

- by jamiethompson90

- by intuited

- by AlexV

- by Paulo Morgado

- by Chris George

- by john.rose

- by Dejan Sarka

- by Pinal Dave

- by CliveT

- by Rob Farley

- by Johnm

- by Starkers

- by BuckWoody

- by Starkers

- by Azaral

- by StatsMan

- by Karan Bhangui

- by jeremytrimble

- by Sam

< Previous Page | 64 65 66 67 68 69 70 71 72 73 74 75 | Next Page >