Search Results

Search found 1354 results on 55 pages for 'compute scalar'.

Page 18/55 | < Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >

"Marching cubes" voxel terrain - triplanar texturing with depth?

- by Dan the Man

I am currently working on a voxel terrain that uses the marching cubes algorithm for polygonizing the scalar field of voxels. I am using a triplanar texturing shader for texturing. say I have a grass texture set to the Y axis and a dirt texture for both the X and Z axes. Now, when my player digs downwards, it still appears as grass. How would I make it to appear as dirt? I have been thinking about this for a while, and the only thing I can think of to make this effect, would be to mark vertices that have been dug with a certain vertex color. When it has that vertex color, the shader would apply that dirt texture to the vertices marked. Is there a better method?

Read the article
Spherical harmonics lighting - what does it accomplish?

- by TravisG

From my understanding, spherical harmonics are sometimes used to approximate certain aspects of lighting (depending on the application). For example, it seems like you can approximate the diffuse lighting cause by a directional light source on a surface point, or parts of it, by calculating the SH coefficients for all bands you're using (for whatever accuracy you desire) in the direction of the surface normal and scaling it with whatever you need to scale it with (e.g. light colored intensity, dot(n,l),etc.). What I don't understand yet is what this is supposed to accomplish. What are the actual advantages of doing it this way as opposed to evaluating the diffuse BRDF the normal way. Do you save calculations somewhere? Is there some additional information contained in the SH representation that you can't get out of the scalar results of the normal evaluation?

Read the article
Speaking in Omaha: December 7, 2011

- by Bill Graziano

I’m presenting in Omaha on Writing Faster SQL at 6PM on December 7th. You can find meeting details on the Omaha SQL Server User Group page. The meeting location requires an RSVP so building security has a list of attendees. The presentation is a series of suggestions on improving performance. It ranges from simple things like comparing indexed columns to scalar values up to tips for reducing query compiles and asynchronous processing patterns. Nearly all of these come from specific issues I’ve encountered working on poorly performing SQL Servers.

Read the article
Binding to Silverlight ComboBox and Using SelectedValue, SelectedValuePath and DisplayMemberPath

How do you bind a ComboBox to a collection of objects, and then bind a property from the selected objects to some other scalar property? I received this question today from a friend of mine (a variation of this question). I decided to walk through the scenario here in case anyone else runs into it. This is one of those things that can be confusing it is simple, but it is is much easier shown the explained. This post lays out the scenario and you can download the sample code at the end. When we...Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
Detect rotated rectangle collision

- by handyface

I'm trying to implement a script that detects whether two rotated rectangles collide for my game. I used the method explained in the following article for my implementation in Google Dart. 2D Rotated Rectangle Collision I tried to implement this code into my game. Basically from what I understood was that I have two rectangles, these two rectangles can produce four axis (two per rectangle) by subtracting adjacent corner coordinates. Then all the corners from both rectangles need to be projected onto each axis, then multiplying the coordinates of the projection by the axis coordinates (point.x*axis.x+point.y*axis.y) to make a scalar value and checking whether the range of both the rectangle's projections overlap. When all the axis have overlapping projections, there's a collision. First of all, I'm wondering whether my comprehension about this algorithm is correct. If so I'd like to get some pointers in where my implementation (written in Dart, which is very readable for people comfortable with C-syntax) goes wrong. Thanks!

Read the article
SSH Login to an EC2 instance failing with previously working keys...

- by Matthew Savage

We recently had an issues where I had rebooted our EC2 instance (Ubuntu x86_64, version 9.10 server) and due to an EC2 issue the instance needed to be stopped and was down for a few days. Now I have been able to bring the instance back online I cannot connect to SSH using the keypair which previously worked. Unfortunately SSH is the only way to get into this server, and while I have another system running in its place there are a number of things I would like to try and retrieve from the machine. Running SSH in verbose mode yields the following: [Broc-MBP.local]: Broc:~/.ssh ? ssh -i ~/.ssh/EC2Keypair.pem -l ubuntu ec2-xxx.compute-1.amazonaws.com -vvv OpenSSH_5.2p1, OpenSSL 0.9.8l 5 Nov 2009 debug1: Reading configuration data /Users/Broc/.ssh/config debug1: Reading configuration data /etc/ssh_config debug2: ssh_connect: needpriv 0 debug1: Connecting to ec2-xxx.compute-1.amazonaws.com [184.73.109.130] port 22. debug1: Connection established. debug3: Not a RSA1 key file /Users/Broc/.ssh/EC2Keypair.pem. debug2: key_type_from_name: unknown key type '-----BEGIN' debug3: key_read: missing keytype debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug2: key_type_from_name: unknown key type '-----END' debug3: key_read: missing keytype debug1: identity file /Users/Broc/.ssh/EC2Keypair.pem type -1 debug3: Not a RSA1 key file /Users/Broc/.ssh/id_rsa. debug2: key_type_from_name: unknown key type '-----BEGIN' debug3: key_read: missing keytype debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug3: key_read: missing whitespace debug2: key_type_from_name: unknown key type '-----END' debug3: key_read: missing keytype debug1: identity file /Users/Broc/.ssh/id_rsa type 1 debug1: Remote protocol version 2.0, remote software version OpenSSH_5.1p1 Debian-6ubuntu2 debug1: match: OpenSSH_5.1p1 Debian-6ubuntu2 pat OpenSSH* debug1: Enabling compatibility mode for protocol 2.0 debug1: Local version string SSH-2.0-OpenSSH_5.2 debug2: fd 3 setting O_NONBLOCK debug1: SSH2_MSG_KEXINIT sent debug1: SSH2_MSG_KEXINIT received debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 debug2: kex_parse_kexinit: ssh-rsa,ssh-dss debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,[email protected] debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,[email protected] debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: none,[email protected],zlib debug2: kex_parse_kexinit: none,[email protected],zlib debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: first_kex_follows 0 debug2: kex_parse_kexinit: reserved 0 debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 debug2: kex_parse_kexinit: ssh-rsa,ssh-dss debug2: kex_parse_kexinit: aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,arcfour128,arcfour256,arcfour,aes192-cbc,aes256-cbc,[email protected],aes128-ctr,aes192-ctr,aes256-ctr debug2: kex_parse_kexinit: aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,arcfour128,arcfour256,arcfour,aes192-cbc,aes256-cbc,[email protected],aes128-ctr,aes192-ctr,aes256-ctr debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: none,[email protected] debug2: kex_parse_kexinit: none,[email protected] debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: first_kex_follows 0 debug2: kex_parse_kexinit: reserved 0 debug2: mac_setup: found hmac-md5 debug1: kex: server->client aes128-ctr hmac-md5 none debug2: mac_setup: found hmac-md5 debug1: kex: client->server aes128-ctr hmac-md5 none debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP debug2: dh_gen_key: priv key bits set: 123/256 debug2: bits set: 500/1024 debug1: SSH2_MSG_KEX_DH_GEX_INIT sent debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY debug3: check_host_in_hostfile: filename /Users/Broc/.ssh/known_hosts debug3: check_host_in_hostfile: match line 106 debug3: check_host_in_hostfile: filename /Users/Broc/.ssh/known_hosts debug3: check_host_in_hostfile: match line 106 debug1: Host 'ec2-xxx.compute-1.amazonaws.com' is known and matches the RSA host key. debug1: Found key in /Users/Broc/.ssh/known_hosts:106 debug2: bits set: 521/1024 debug1: ssh_rsa_verify: signature correct debug2: kex_derive_keys debug2: set_newkeys: mode 1 debug1: SSH2_MSG_NEWKEYS sent debug1: expecting SSH2_MSG_NEWKEYS debug2: set_newkeys: mode 0 debug1: SSH2_MSG_NEWKEYS received debug1: SSH2_MSG_SERVICE_REQUEST sent debug2: service_accept: ssh-userauth debug1: SSH2_MSG_SERVICE_ACCEPT received debug2: key: /Users/Broc/.ssh/id_rsa (0x100125f70) debug2: key: /Users/Broc/.ssh/EC2Keypair.pem (0x0) debug1: Authentications that can continue: publickey debug3: start over, passed a different list publickey debug3: preferred publickey,keyboard-interactive,password debug3: authmethod_lookup publickey debug3: remaining preferred: keyboard-interactive,password debug3: authmethod_is_enabled publickey debug1: Next authentication method: publickey debug1: Offering public key: /Users/Broc/.ssh/id_rsa debug3: send_pubkey_test debug2: we sent a publickey packet, wait for reply debug1: Authentications that can continue: publickey debug1: Trying private key: /Users/Broc/.ssh/EC2Keypair.pem debug1: read PEM private key done: type RSA debug3: sign_and_send_pubkey debug2: we sent a publickey packet, wait for reply debug1: Authentications that can continue: publickey debug2: we did not send a packet, disable method debug1: No more authentication methods to try. Permission denied (publickey). [Broc-MBP.local]: Broc:~/.ssh ? So, right now I'm really at a loss and not sure what to do. While I've already got another system taking the place of this one I'd really like to have access back :|

Read the article
C# Neural Networks with Encog

- by JoshReuben

Neural Networks · I recently read a book Introduction to Neural Networks for C# , by Jeff Heaton. http://www.amazon.com/Introduction-Neural-Networks-C-2nd/dp/1604390093/ref=sr_1_2?ie=UTF8&s=books&qid=1296821004&sr=8-2-spell. Not the 1st ANN book I've perused, but a nice revision. · Artificial Neural Networks (ANNs) are a mechanism of machine learning – see http://en.wikipedia.org/wiki/Artificial_neural_network , http://en.wikipedia.org/wiki/Category:Machine_learning · Problems Not Suited to a Neural Network Solution- Programs that are easily written out as flowcharts consisting of well-defined steps, program logic that is unlikely to change, problems in which you must know exactly how the solution was derived. · Problems Suited to a Neural Network – pattern recognition, classification, series prediction, and data mining. Pattern recognition - network attempts to determine if the input data matches a pattern that it has been trained to recognize. Classification - take input samples and classify them into fuzzy groups. · As far as machine learning approaches go, I thing SVMs are superior (see http://en.wikipedia.org/wiki/Support_vector_machine ) - a neural network has certain disadvantages in comparison: an ANN can be overtrained, different training sets can produce non-deterministic weights and it is not possible to discern the underlying decision function of an ANN from its weight matrix – they are black box. · In this post, I'm not going to go into internals (believe me I know them). An autoassociative network (e.g. a Hopfield network) will echo back a pattern if it is recognized. · Under the hood, there is very little maths. In a nutshell - Some simple matrix operations occur during training: the input array is processed (normalized into bipolar values of 1, -1) - transposed from input column vector into a row vector, these are subject to matrix multiplication and then subtraction of the identity matrix to get a contribution matrix. The dot product is taken against the weight matrix to yield a boolean match result. For backpropogation training, a derivative function is required. In learning, hill climbing mechanisms such as Genetic Algorithms and Simulated Annealing are used to escape local minima. For unsupervised training, such as found in Self Organizing Maps used for OCR, Hebbs rule is applied. · The purpose of this post is not to mire you in technical and conceptual details, but to show you how to leverage neural networks via an abstraction API - Encog Encog · Encog is a neural network API · Links to Encog: http://www.encog.org , http://www.heatonresearch.com/encog, http://www.heatonresearch.com/forum · Encog requires .Net 3.5 or higher – there is also a Silverlight version. Third-Party Libraries – log4net and nunit. · Encog supports feedforward, recurrent, self-organizing maps, radial basis function and Hopfield neural networks. · Encog neural networks, and related data, can be stored in .EG XML files. · Encog Workbench allows you to edit, train and visualize neural networks. The Encog Workbench can generate code. Synapses and layers · the primary building blocks - Almost every neural network will have, at a minimum, an input and output layer. In some cases, the same layer will function as both input and output layer. · To adapt a problem to a neural network, you must determine how to feed the problem into the input layer of a neural network, and receive the solution through the output layer of a neural network. · The Input Layer - For each input neuron, one double value is stored. An array is passed as input to a layer. Encog uses the interface INeuralData to hold these arrays. The class BasicNeuralData implements the INeuralData interface. Once the neural network processes the input, an INeuralData based class will be returned from the neural network's output layer. · convert a double array into an INeuralData object : INeuralData data = new BasicNeuralData(= new double[10]); · the Output Layer- The neural network outputs an array of doubles, wraped in a class based on the INeuralData interface. · The real power of a neural network comes from its pattern recognition capabilities. The neural network should be able to produce the desired output even if the input has been slightly distorted. · Hidden Layers– optional. between the input and output layers. very much a “black box”. If the structure of the hidden layer is too simple it may not learn the problem. If the structure is too complex, it will learn the problem but will be very slow to train and execute. Some neural networks have no hidden layers. The input layer may be directly connected to the output layer. Further, some neural networks have only a single layer. A single layer neural network has the single layer self-connected. · connections, called synapses, contain individual weight matrixes. These values are changed as the neural network learns. Constructing a Neural Network · the XOR operator is a frequent “first example” -the “Hello World” application for neural networks. · The XOR Operator- only returns true when both inputs differ. 0 XOR 0 = 0 1 XOR 0 = 1 0 XOR 1 = 1 1 XOR 1 = 0 · Structuring a Neural Network for XOR - two inputs to the XOR operator and one output. · input: 0.0,0.0 1.0,0.0 0.0,1.0 1.0,1.0 · Expected output: 0.0 1.0 1.0 0.0 · A Perceptron - a simple feedforward neural network to learn the XOR operator. · Because the XOR operator has two inputs and one output, the neural network will follow suit. Additionally, the neural network will have a single hidden layer, with two neurons to help process the data. The choice for 2 neurons in the hidden layer is arbitrary, and often comes down to trial and error. · Neuron Diagram for the XOR Network · · The Encog workbench displays neural networks on a layer-by-layer basis. · Encog Layer Diagram for the XOR Network: · Create a BasicNetwork - Three layers are added to this network. the FinalizeStructure method must be called to inform the network that no more layers are to be added. The call to Reset randomizes the weights in the connections between these layers. var network = new BasicNetwork(); network.AddLayer(new BasicLayer(2)); network.AddLayer(new BasicLayer(2)); network.AddLayer(new BasicLayer(1)); network.Structure.FinalizeStructure(); network.Reset(); · Neural networks frequently start with a random weight matrix. This provides a starting point for the training methods. These random values will be tested and refined into an acceptable solution. However, sometimes the initial random values are too far off. Sometimes it may be necessary to reset the weights again, if training is ineffective. These weights make up the long-term memory of the neural network. Additionally, some layers have threshold values that also contribute to the long-term memory of the neural network. Some neural networks also contain context layers, which give the neural network a short-term memory as well. The neural network learns by modifying these weight and threshold values. · Now that the neural network has been created, it must be trained. Training a Neural Network · construct a INeuralDataSet object - contains the input array and the expected output array (of corresponding range). Even though there is only one output value, we must still use a two-dimensional array to represent the output. public static double[][] XOR_INPUT ={ new double[2] { 0.0, 0.0 }, new double[2] { 1.0, 0.0 }, new double[2] { 0.0, 1.0 }, new double[2] { 1.0, 1.0 } }; public static double[][] XOR_IDEAL = { new double[1] { 0.0 }, new double[1] { 1.0 }, new double[1] { 1.0 }, new double[1] { 0.0 } }; INeuralDataSet trainingSet = new BasicNeuralDataSet(XOR_INPUT, XOR_IDEAL); · Training is the process where the neural network's weights are adjusted to better produce the expected output. Training will continue for many iterations, until the error rate of the network is below an acceptable level. Encog supports many different types of training. Resilient Propagation (RPROP) - general-purpose training algorithm. All training classes implement the ITrain interface. The RPROP algorithm is implemented by the ResilientPropagation class. Training the neural network involves calling the Iteration method on the ITrain class until the error is below a specific value. The code loops through as many iterations, or epochs, as it takes to get the error rate for the neural network to be below 1%. Once the neural network has been trained, it is ready for use. ITrain train = new ResilientPropagation(network, trainingSet); for (int epoch=0; epoch < 10000; epoch++) { train.Iteration(); Debug.Print("Epoch #" + epoch + " Error:" + train.Error); if (train.Error > 0.01) break; } Executing a Neural Network · Call the Compute method on the BasicNetwork class. Console.WriteLine("Neural Network Results:"); foreach (INeuralDataPair pair in trainingSet) { INeuralData output = network.Compute(pair.Input); Console.WriteLine(pair.Input[0] + "," + pair.Input[1] + ", actual=" + output[0] + ",ideal=" + pair.Ideal[0]); } · The Compute method accepts an INeuralData class and also returns a INeuralData object. Neural Network Results: 0.0,0.0, actual=0.002782538818034049,ideal=0.0 1.0,0.0, actual=0.9903741937121177,ideal=1.0 0.0,1.0, actual=0.9836807956566187,ideal=1.0 1.0,1.0, actual=0.0011646072586172778,ideal=0.0 · the network has not been trained to give the exact results. This is normal. Because the network was trained to 1% error, each of the results will also be within generally 1% of the expected value.

Read the article
Windows Azure Evolution - Web Sites (aka Antares) Part 1

- by Shaun

This is the 3rd post of my Windows Azure Evolution series, focus on the new features and enhancement which was alone with the Windows Azure Platform Upgrade June 2012, announced at the MEET Windows Azure event on 7th June. In the first post I introduced the new preview developer portal and how to works for the existing features such as cloud services, storages and SQL databases. In the second one I talked about the Windows Azure .NET SDK 1.7 on the latest Visual Studio 2012 RC on Windows 8. From this one I will begin to introduce some new features. Now let’s have a look on the first one of them, Windows Azure Web Sites. Overview Windows Azure Web Sites (WAWS), as known as Antares, was a new feature still in preview stage in this upgrade. It allows people to quickly and easily deploy websites to a highly scalable cloud environment, uses the languages and open source apps of the choice then deploy such as FTP, Git and TFS. It also can be integrated with Windows Azure services like SQL Database, Caching, CDN and Storage easily. After read its introduction we may have a question: since we can deploy a website from both cloud service web role and web site, what’s the different between them? So, let’s have a quick compare. CLOUD SERVICE WEB SITE OS Windows Server Windows Server Virtualization Windows Azure Virtual Machine Windows Azure Virtual Machine Host IIS IIS Platform ASP.NET WebForm, ASP.NET MVC, WCF ASP.NET WebForm, ASP.NET MVC, PHP Language C#, VB.NET C#, VB.NET, PHP Database SQL Database SQL Database, MySQL Architecture Multi layered, background worker, message queuing, etc.. Simple website with backend database. VS Project Windows Azure Cloud Service ASP.NET Web Form, ASP.NET MVC, etc.. Out-of-box Gallery (none) Drupal, DotNetNuke, WordPress, etc.. Deployment Package upload, Visual Studio publish FTP, Git, TFS, WebMatrix Compute Mode Dedicate VM Shared Across VMs, Dedicate VM Scale Scale up, scale out Scale up, scale out As you can see, there are many difference between the cloud service and web site, but the main point is that, the cloud service focus on those complex architecture web application. For example, if you want to build a website with frontend layer, middle business layer and data access layer, with some background worker process connected through the message queue, then you should better use cloud service, since it provides full control of your code and application. But if you just want to build a personal blog or a business portal, then you can use the web site. Since the web site have many galleries, you can create them even without any coding and configuration. David Pallmann have an awesome figure explains the benefits between the could service, web site and virtual machine. Create a Personal Blog in Web Site from Gallery As I mentioned above, one of the big feature in WAWS is to build a website from an existing gallery, which means we don’t need to coding and configure. What we need to do is open the windows azure developer portal and click the NEW button, select WEB SITE and FROM GALLERY. In the popping up windows there are many websites we can choose to use. For example, for personal blog there are Orchard CMS, WordPress; for CMS there are DotNetNuke, Drupal 7, mojoPortal. Let’s select WordPress and click the next button. The next step is to configure the web site. We will need to specify the DNS name and select the subscription and region. Since the WordPress uses MySQL as its backend database, we also need to create a MySQL database as well. Windows Azure Web Sites utilize ClearDB to host the MySQL databases. You cannot create a MySQL database directly from SQL Databases section. Finally, since we selected to create a new MySQL database we need to specify the database name and region in the last step. Also we need to accept the ClearDB’s terms as well. Then windows azure platform will download the WordPress codes and deploy the MySQL database and website. Then it will be ready to use. Select the website and click the BROWSE button, the WordPress administration page will be shown. After configured the WordPress here is my personal web blog on the cloud. It took me no more than 10 minutes to establish without any coding. Monitor, Configure, Scale and Linked Resources Let’s click into the website I had just created in the portal and have a look on what we can do. In the website details page where are five sections. - Dashboard The overall information about this website, such as the basic usage status, public URL, compute mode, FTP address, subscription and links that we can specify the deployment credentials, TFS and Git publish setting, etc.. - Monitor Some status information such as the CPU usage, memory usage etc., errors, etc.. We can add more metrics by clicking the ADD METRICS button and the bottom as well. - Configure Here we can set the configurations of our website such as the .NET and PHP runtime version, diagnostics settings, application settings and the IIS default documents. - Scale This is something interesting. In WAWS there are two compute mode or called web site mode. One is “shared”, which means our website will be shared with other web sites in a group of windows azure virtual machines. Each web site have its own process (w3wp.exe) with some sandbox technology to isolate from others. When we need to scaling-out our web site in shared mode, we actually increased the working process count. Hence in shared mode we cannot specify the virtual machine size since they are shared across all web sites. This is a little bit different than the scaling mode of the cloud service (hosted service web role and worker role). The other mode called “dedicate”, which means our web site will use the whole windows azure virtual machine. This is the same hosting behavior as cloud service web role. In web role it will be deployed on the virtual machines we specified and all of them are only used by us. In web sites dedicate mode, it’s the same. In this mode when we scaling-out our web site we will use more virtual machines, and each of them will only host our own website. And we can specify the virtual machine size in this mode. In the developer portal we can select which mode we are using from the scale section. In shared mode we can only specify the instance count, but in dedicate mode we can specify the instance size as well as the instance count. - Linked Resource The MySQL database created alone with the creation of our WordPress web site is a linked resource. We can add more linked resources in this section. Pricing For the web site itself, since this feature is in preview period if you are using shared mode, then you will get free up to 10 web sites. But if you are using dedicate mode, the price would be the virtual machines you are using. For example, if you are using dedicate and configured two middle size virtual machines then you will pay $230.40 per month. If there is SQL Database linked to your web site then they will be charged separately based on the Pay-As-You-Go price. For example a 1GB web edition database costs $9.99 per month. And the bandwidth will be charged as well. For example 10GB outbound data transfer costs $1.20 per month. For more information about the pricing please have a look at the windows azure pricing page. Summary Windows Azure Web Sites gives us easier and quicker way to create, develop and deploy website to window azure platform. Comparing with the cloud service web role, the WAWS have many out-of-box gallery we can use directly. So if you just want to build a blog, CMS or business portal you don’t need to learn ASP.NET, you don’t need to learn how to configure DotNetNuke, you don’t need to learn how to prepare PHP and MySQL. By using WAWS gallery you can establish a website within 10 minutes without any lines of code. But in some cases we do need to code by ourselves. We may need to tweak the layout of our pages, or we may have a traditional ASP.NET or PHP web application which needed to migrated to the cloud. Besides the gallery WAWS also provides many features to download, upload code. It also provides the feature to integrate with some version control services such as TFS and Git. And it also provides the deploy approaches through FTP and Web Deploy. In the next post I will demonstrate how to use WebMatrix to download and modify the website, and how to use TFS and Git to deploy automatically one our code changes committed. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article
Why doesn't my implementation of ElGamal work for long text strings?

- by angstrom91

I'm playing with the El Gamal cryptosystem, and my goal is to be able to encipher and decipher long sequences of text. I have come up with a method that works for short sequences, but does not work for long sequences, and I cannot figure out why. El Gamal requires the plaintext to be an integer. I have turned my string into a byte[] using the .getBytes() method for Strings, and then created a BigInteger out of the byte[]. After encryption/decryption, I turn the BigInteger into a byte[] using the .toByteArray() method for BigIntegers, and then create a new String object from the byte[]. This works perfectly when i call ElGamalEncipher with strings up to 129 characters. With 130 or more characters, the output produced from ElGamalDecipher is garbled. Can someone suggest how to solve this issue? Is this an issue with my method of turning the string into a BigInteger? If so, is there a better way to turn my string of text into a BigInteger and back? Below is my encipher/decipher code with a program to demonstrate the problem. import java.math.BigInteger; public class Main { static BigInteger P = new BigInteger("15893293927989454301918026303382412" + "2586402937727056707057089173871237566896685250125642378268385842" + "6917261652781627945428519810052550093673226849059197769795219973" + "9423619267147615314847625134014485225178547696778149706043781174" + "2873134844164791938367765407368476144402513720666965545242487520" + "288928241768306844169"); static BigInteger G = new BigInteger("33234037774370419907086775226926852" + "1714093595439329931523707339920987838600777935381196897157489391" + "8360683761941170467795379762509619438720072694104701372808513985" + "2267495266642743136795903226571831274837537691982486936010899433" + "1742996138863988537349011363534657200181054004755211807985189183" + "22832092343085067869"); static BigInteger R = new BigInteger("72294619754760174015019300613282868" + "7219874058383991405961870844510501809885568825032608592198728334" + "7842806755320938980653857292210955880919036195738252708294945320" + "3969657021169134916999794791553544054426668823852291733234236693" + "4178738081619274342922698767296233937873073756955509269717272907" + "8566607940937442517"); static BigInteger A = new BigInteger("32189274574111378750865973746687106" + "3695160924347574569923113893643975328118502246784387874381928804" + "6865920942258286938666201264395694101012858796521485171319748255" + "4630425677084511454641229993833255506759834486100188932905136959" + "7287419551379203001848457730376230681693887924162381650252270090" + "28296990388507680954"); public static void main(String[] args) { FewChars(); System.out.println(); ManyChars(); } public static void FewChars() { //ElGamalEncipher(String plaintext, BigInteger p, BigInteger g, BigInteger r) BigInteger[] cipherText = ElGamal.ElGamalEncipher("This is a string " + "of 129 characters which works just fine . This is a string " + "of 129 characters which works just fine . This is a s", P, G, R); System.out.println("This is a string of 129 characters which works " + "just fine . This is a string of 129 characters which works " + "just fine . This is a s"); //ElGamalDecipher(BigInteger c, BigInteger d, BigInteger a, BigInteger p) System.out.println("The decrypted text is: " + ElGamal.ElGamalDecipher(cipherText[0], cipherText[1], A, P)); } public static void ManyChars() { //ElGamalEncipher(String plaintext, BigInteger p, BigInteger g, BigInteger r) BigInteger[] cipherText = ElGamal.ElGamalEncipher("This is a string " + "of 130 characters which doesn’t work! This is a string of " + "130 characters which doesn’t work! This is a string of ", P, G, R); System.out.println("This is a string of 130 characters which doesn’t " + "work! This is a string of 130 characters which doesn’t work!" + " This is a string of "); //ElGamalDecipher(BigInteger c, BigInteger d, BigInteger a, BigInteger p) System.out.println("The decrypted text is: " + ElGamal.ElGamalDecipher(cipherText[0], cipherText[1], A, P)); } } import java.math.BigInteger; import java.security.SecureRandom; public class ElGamal { public static BigInteger[] ElGamalEncipher(String plaintext, BigInteger p, BigInteger g, BigInteger r) { // returns a BigInteger[] cipherText // cipherText[0] is c // cipherText[1] is d SecureRandom sr = new SecureRandom(); BigInteger[] cipherText = new BigInteger[2]; BigInteger pText = new BigInteger(plaintext.getBytes()); // 1: select a random integer k such that 1 <= k <= p-2 BigInteger k = new BigInteger(p.bitLength() - 2, sr); // 2: Compute c = g^k(mod p) BigInteger c = g.modPow(k, p); // 3: Compute d= P*r^k = P(g^a)^k(mod p) BigInteger d = pText.multiply(r.modPow(k, p)).mod(p); // C =(c,d) is the ciphertext cipherText[0] = c; cipherText[1] = d; return cipherText; } public static String ElGamalDecipher(BigInteger c, BigInteger d, BigInteger a, BigInteger p) { //returns the plaintext enciphered as (c,d) // 1: use the private key a to compute the least non-negative residue // of an inverse of (c^a)' (mod p) BigInteger z = c.modPow(a, p).modInverse(p); BigInteger P = z.multiply(d).mod(p); byte[] plainTextArray = P.toByteArray(); return new String(plainTextArray); } }

Read the article
concurrency::accelerator_view

- by Daniel Moth

Overview We saw previously that accelerator represents a target for our C++ AMP computation or memory allocation and that there is a notion of a default accelerator. We ended that post by introducing how one can obtain accelerator_view objects from an accelerator object through the accelerator class's default_view property and the create_view method. The accelerator_view objects can be thought of as handles to an accelerator. You can also construct an accelerator_view given another accelerator_view (through the copy constructor or the assignment operator overload). Speaking of operator overloading, you can also compare (for equality and inequality) two accelerator_view objects between them to determine if they refer to the same underlying accelerator. We'll see later that when we use concurrency::array objects, the allocation of data takes place on an accelerator at array construction time, so there is a constructor overload that accepts an accelerator_view object. We'll also see later that a new concurrency::parallel_for_each function overload can take an accelerator_view object, so it knows on what target to execute the computation (represented by a lambda that the parallel_for_each also accepts). Beyond normal usage, accelerator_view is a quality of service concept that offers isolation to multiple "consumers" of an accelerator. If in your code you are accessing the accelerator from multiple threads (or, in general, from different parts of your app), then you'll want to create separate accelerator_view objects for each thread. flush, wait, and queuing_mode When you create an accelerator_view via the create_view method of the accelerator, you pass in an option of immediate or deferred, which are the two members of the queuing_mode enum. At any point you can access this value from the queuing_mode property of the accelerator_view. When the queuing_mode value is immediate (which is the default), any commands sent to the device such as kernel invocations and data transfers (e.g. parallel_for_each and copy, as we'll see in future posts), will get submitted as soon as the runtime sees fit (that is the definition of immediate). When the value of queuing_mode is deferred, the commands will be batched up. To send all buffered commands to the device for execution, there is a non-blocking flush method that you can call. If you wish to block until all the commands have been sent, there is a wait method you can call. Deferring is a more advanced scenario aimed at performance gains when you are submitting many device commands and you want to avoid the tiny overhead of flushing/submitting each command separately. Querying information Just like accelerator, accelerator_view exposes the is_debug and version properties. In fact, you can always access the accelerator object from the accelerator property on the accelerator_view class to access the accelerator interface we looked at previously. Interop with D3D (aka DX) In a later post I'll show an example of an app that uses C++ AMP to compute data that is used in pixel shaders. In those scenarios, you can benefit by integrating C++ AMP into your graphics pipeline and one of the building blocks for that is being able to use the same device context from both the compute kernel and the other shaders. You can do that by going from accelerator_view to device context (and vice versa), through part of our interop API in amp.h: *get_device, create_accelerator_view. More on those in a later post. Comments about this post by Daniel Moth welcome at the original blog.

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
Triangle Line-Segment Intersection - detecting near misses

- by Will

A ray is a very poor approximation of a player! I think approximating a player with a sphere traveling a straight line each game tick will solve my problems of the player intersecting edges of scenery because their line segment missed it yet their own model is not infinitely thin... I have a 3D triangle and a line segment. I have the normal triangle-line-segment intersection code which I admit I have only a woolly grasp of. To model movement and compute collisions of the player I have to determine if a line passes within sphere-radius of a triangle. But I can find no convenient line near-miss intersection code! Here's the classic triangle intersection ### commented ### code with my starting assumptions: function triangle_ray_intersection(a,b,c,ray_origin,ray_dir,ray_radius) { // http://softsurfer.com/Archive/algorithm_0105/algorithm_0105.htm#intersect_RayTriangle%28%29 // get triangle edge vectors and plane normal var u = vec3_sub(b,a); var v = vec3_sub(c,a); var n = vec3_cross(u,v); if(n[0]==0 && n[1]==0 && n[2]==0) return null; // triangle is degenerate var w0 = vec3_sub(ray_origin,a); var j = vec3_dot(n,ray_dir); if(Math.abs(j) < 0.00000001) { //### if parallel, might still pass within ray_radius of it return null; // parallel, disjoint or on plane } var i = -vec3_dot(n,w0); // get intersect point of ray with triangle plane var k = i / j; if(k < 0.0) return null; // ray goes away from triangle //### as its a line segment, k > 1+ray_radius means no intersect var hit = vec3_add(ray_origin,vec3_scale(ray_dir,k)); // intersect point of ray and plane // is I inside T? //### here I'm a bit lost; this is presumably computing barycentric coordinates? var uu = vec3_dot(u,u); var uv = vec3_dot(u,v); var vv = vec3_dot(v,v); var w = vec3_sub(hit,a); var wu = vec3_dot(w,u); var wv = vec3_dot(w,v); var D = uv * uv - uu * vv; var s = (uv * wv - vv * wu) / D; //### therefore, compute if its within ray_radius scaled to the 0..1 of barycentric coordinates? if(s<0.0 || s>1.0) return null; // I is outside T var t = (uv * wu - uu * wv) / D; if(t<0.0 || (s+t)>1.0) return null; // I is outside T //### finally, if it passses a barycentric test it might still be too far //### to a point; must check that its distance from a corner is within ray_radius too if more than one barycentric coord is >1 //### so we have rounded corners... return [hit,n]; // I is in T } Given the distance between the point of plane intersection and each corner, I ought to be able to determine distance at world scale of how far beyond the edge - beyond 1.0 in barycentric coordinates for each axis - that point is... At this point my head explodes! Is this the right track? What's the actual code? UPDATE: you can earn 100 pts on SO if you answer this question there...! How can you determine if a line segment passes within some distance of a triangle?

Read the article
Why do my pyramids fade black and then back to colour again

- by geminiCoder

I have the following vertecies and norms GLfloat verts[36] = { -0.5, 0, 0.5, 0, 0, -0.5, 0.5, 0, 0.5, 0, 0, -0.5, 0.5, 0, 0.5, 0, 1, 0, -0.5, 0, 0.5, 0, 0, -0.5, 0, 1, 0, 0.5, 0, 0.5, -0.5, 0, 0.5, 0, 1, 0 }; GLfloat norms[36] = { 0, -1, 0, 0, -1, 0, 0, -1, 0, -1, 0.25, 0.5, -1, 0.25, 0.5, -1, 0.25, 0.5, 1, 0.25, -0.5, 1, 0.25, -0.5, 1, 0.25, -0.5, 0, -0.5, -1, 0, -0.5, -1, 0, -0.5, -1 }; I am writing my fists Open GL game, But I need to know for sure if my Normals are correct as the colours aren't rendering correctly. my Pyramids are coloured then fade to black every half rotation then back again. My app so far is based on the boiler plate code provided by apple. heres my modified setUp Method [EAGLContext setCurrentContext:self.context]; [self loadShaders]; self.effect = [[GLKBaseEffect alloc] init]; self.effect.light0.enabled = GL_TRUE; self.effect.light0.diffuseColor = GLKVector4Make(1.0f, 0.4f, 0.4f, 1.0f); glEnable(GL_DEPTH_TEST); glGenVertexArraysOES(1, &_vertexArray); //create vertex array glBindVertexArrayOES(_vertexArray); glGenBuffers(1, &_vertexBuffer); glBindBuffer(GL_ARRAY_BUFFER, _vertexBuffer); glBufferData(GL_ARRAY_BUFFER, sizeof(verts) + sizeof(norms), NULL, GL_STATIC_DRAW); //create vertex buffer big enough for both verts and norms and pass NULL as data.. uint8_t *ptr = (uint8_t *)glMapBufferOES(GL_ARRAY_BUFFER, GL_WRITE_ONLY_OES); //map buffer to pass data to it memcpy(ptr, verts, sizeof(verts)); //copy verts memcpy(ptr+sizeof(verts), norms, sizeof(norms)); //copy norms to position after verts glUnmapBufferOES(GL_ARRAY_BUFFER); glEnableVertexAttribArray(GLKVertexAttribPosition); glVertexAttribPointer(GLKVertexAttribPosition, 3, GL_FLOAT, GL_FALSE, 0, BUFFER_OFFSET(0)); //tell GL where verts are in buffer glEnableVertexAttribArray(GLKVertexAttribNormal); glVertexAttribPointer(GLKVertexAttribNormal, 3, GL_FLOAT, GL_FALSE, 0, BUFFER_OFFSET(sizeof(verts))); //tell GL where norms are in buffer glBindVertexArrayOES(0); And the update method. - (void)update { float aspect = fabsf(self.view.bounds.size.width / self.view.bounds.size.height); GLKMatrix4 projectionMatrix = GLKMatrix4MakePerspective(GLKMathDegreesToRadians(65.0f), aspect, 0.1f, 100.0f); self.effect.transform.projectionMatrix = projectionMatrix; GLKMatrix4 baseModelViewMatrix = GLKMatrix4MakeTranslation(0.0f, 0.0f, -4.0f); baseModelViewMatrix = GLKMatrix4Rotate(baseModelViewMatrix, _rotation, 0.0f, 1.0f, 0.0f); // Compute the model view matrix for the object rendered with GLKit GLKMatrix4 modelViewMatrix = GLKMatrix4MakeTranslation(0.0f, 0.0f, -1.5f); modelViewMatrix = GLKMatrix4Rotate(modelViewMatrix, _rotation, 1.0f, 1.0f, 1.0f); modelViewMatrix = GLKMatrix4Multiply(baseModelViewMatrix, modelViewMatrix); self.effect.transform.modelviewMatrix = modelViewMatrix; // Compute the model view matrix for the object rendered with ES2 modelViewMatrix = GLKMatrix4MakeTranslation(0.0f, 0.0f, 1.5f); modelViewMatrix = GLKMatrix4Rotate(modelViewMatrix, _rotation, 1.0f, 1.0f, 1.0f); modelViewMatrix = GLKMatrix4Multiply(baseModelViewMatrix, modelViewMatrix); _normalMatrix = GLKMatrix3InvertAndTranspose(GLKMatrix4GetMatrix3(modelViewMatrix), NULL); _modelViewProjectionMatrix = GLKMatrix4Multiply(projectionMatrix, modelViewMatrix); _rotation += self.timeSinceLastUpdate * 0.5f; } But providing I understand this correct one pyramid is using the GLKit base effect shaders and the other the shaders which are included in the project. So for both of them to have the same error, I thought it would be the Norms?

Read the article
Memory allocation problem with SVMs in OpenCV

- by worksintheory

Hi, I've been using OpenCV happily for a while, but now I have a problem which has bugged me for quite some time. The following code is reasonably minimal example of my problem: #include <cv.h> #include <ml.h> using namespace cv; int main(int argc, char **argv) { int sampleCountForTesting = 2731; //BROKEN: Breaks svm.train_auto(...) for values of 2731 or greater! Mat trainingData( sampleCountForTesting, 1, CV_32FC1, Scalar::all(0.0) ); Mat trainingResponses( sampleCountForTesting, 1, CV_32FC1, Scalar::all(0.0) ); for(int j = 0; j < 6; j++) { trainingData.at<float>( j, 0 ) = (float) (j%2); trainingResponses.at<float>( j, 0 ) = (float) (j%2); //Setting a few values so I don't get a "single class" error } CvSVMParams svmParams( 100, //100 is CvSVM::C_SVC, 2, //2 is CvSVM::RBF, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, NULL, TermCriteria( TermCriteria::MAX_ITER | TermCriteria::EPS, 2, 1.0 ) ); CvSVM svm = CvSVM(); svm.train_auto( trainingData, trainingResponses, Mat(), Mat(), svmParams ); return 0; } I just create matrices to hold the training data and responses, then set a few entries to some value other than zero, then run the SVM. But it breaks whenever there are 2731 rows or more: OpenCV Error: One of arguments' values is out of range (requested size is negative or too big) in cvMemStorageAlloc, file [omitted]/opencv/OpenCV-2.2.0/modules/core/src/datastructs.cpp, line 332 With fewer rows, it seems to be fine and a classifier trained in a similar manner to the above seems to be giving reasonable output. Am I doing something wrong? I'm pretty sure it's not actually anything to do with lack of memory, as I've got 6GB and also the code works fine when the data has 2730 rows and 10000 columns, which is a much bigger allocation. I'm running OpenCV 2.2 on OSX 10.6 and initially I thought the problem might be related to this bug if for some reason the fix wasn't included in the MacPorts version. Now I've also tried downloading the most recent stable version from the OpenCV site and building with cmake and using that, but I still get the same error, and the fix is definitely included in that version. Any help would be much appreciated! Thanks,

Read the article
Calling stored procedure from another stored procedure and returning result as new columns

- by luppi

How to call stored procedure from another stored procedure and return the result as first one's column? ALTER PROCEDURE [dbo].[GetItems] AS SET NOCOUNT ON SELECT ID, AddedDate, Title,Description, Result of Stored Procedure "CountAll" call with parameter ID as Total FROM dbo.Table1 And also: how to be if CountAll stored procedure returns some columns and not just scalar?

Read the article
Is there a library for Linear algebra matrix handling in Java?

- by Mario Ortegón

Is there any libraries in java that allow using mathematical matrices? I am looking for a library that allows me to perform operations in matrices such as invert, scalar multiplication, linear transformations, etc. etc. etc. In a nutshell, the operations required for lineal algebra.

Read the article
How to handle Oracle Store Procedure call with Oracle Types as input or output using EclipseLink

- by MAK

Hi, I doing a Proof Of Concept to figure out how efficient to call a store procedure using EclipseLink. I was able to call oracle store procedure using EclispeLink with Scalar/primitive data types (link Integer, varchar etc). I wanted to understand how can I handle Oracle Store procedure from EclipseLink with collection(Oracle Types/User defined types) as input or output parameters. I would really appreciate if some one help me understand this with an example. Thanks MAK

Read the article
Modifying documents in memory in yaml-cpp

- by Mike Mueller

I want to read a YML document, filter it by modifying some nodes in memory, and then spit it back out with an emitter. The problem is that YAML::Node appears to be designed to be read-only. Is there a way to replace a node's value (with a scalar in this case) that I'm missing?

Read the article
How can I tell if a set of parens in perl code will act as grouping parens or form a list?

- by Ryan Thompson

In perl, parentheses are used for overriding precedence (as in most programming languages) as well as for creating lists. How can I tell if a particular pair of parens will be treated as a grouping construct or a one-element list? For example, I'm pretty sure this is a scalar and not a one-element list: (1 + 1) But what about more complex expressions? Is there an easy way to tell?

Read the article
Updating with Related Entities - Entity Framework v4

- by Vincent BOUZON

Hi, I use Entity Framework V4 and i want to update Customer who have Visits. My code : EntityKey key; object originalItem; key = this._modelContainer.CreateEntityKey("Customers", customer); if (this._modelContainer.TryGetObjectByKey(key, out originalItem)) { this._modelContainer.ApplyCurrentValues(key.EntitySetName, customer); } this._modelContainer.SaveChanges(); It works for Scalar Property only. The customers.Visits collection is not updated. Best Regards :)

Read the article
What's FirstOrDefault for DateTime in Linq?

- by Keltex

If I have a query that returns a DateTime, what's the value of FirstOrDefault? Is there a generic way to get the default value of a C# scalar? Example: var list = (from item in db.Items where item.ID==1234 select item.StartDate).FirstOrDefault(); Edit: Assume that the column StartDate can't be null.

Read the article
Entity Framework and Modeling Collections with an Interface as a return type

- by William

I am using Entity Framework v4. I have created a POCO class that contains a bunch of scalar properties and a collection that returns an Interface type. How do I create this relationship in the EF model? How do I show a collection that contains different items but they all have a common interface?

Read the article
How can I fix my program from crashing in C++?

- by Rachel

I'm very new to programming and I am trying to write a program that adds and subtracts polynomials. My program sometimes works, but most of the time, it randomly crashes and I have no idea why. It's very buggy and has other problems I'm trying to fix, but I am unable to really get any further coding done since it crashes. I'm completely new here but any help would be greatly appreciated. Here's the code: #include <iostream> #include <cstdlib> using namespace std; int getChoice(void); class Polynomial10 { private: double* coef; int degreePoly; public: Polynomial10(int max); //Constructor for a new Polynomial10 int getDegree(){return degreePoly;}; void print(); //Print the polynomial in standard form void read(); //Read a polynomial from the user void add(const Polynomial10& pol); //Add a polynomial void multc(double factor); //Multiply the poly by scalar void subtract(const Polynomial10& pol); //Subtract polynom }; void Polynomial10::read() { cout << "Enter degree of a polynom between 1 and 10 : "; cin >> degreePoly; cout << "Enter space separated coefficients starting from highest degree" << endl; for (int i = 0; i <= degreePoly; i++) { cin >> coef[i]; } } void Polynomial10::print() { for(int i=0;i<=degreePoly;i++) { if(coef[i] == 0) { cout << ""; } else if(i>=0) { if(coef[i] > 0 && i!=0) { cout<<"+"; } if((coef[i] != 1 && coef[i] != -1) || i == degreePoly) { cout << coef[i]; } if((coef[i] != 1 && coef[i] != -1) && i!=degreePoly ) { cout << "*"; } if (i != degreePoly && coef[i] == -1) { cout << "-"; } if(i != degreePoly) { cout << "x"; } if ((degreePoly - i) != 1 && i != degreePoly) { cout << "^"; cout << degreePoly-i; } } } } void Polynomial10::add(const Polynomial10& pol) { for(int i = 0; i<degreePoly; i++) { int degree = degreePoly; coef[degreePoly-i] += pol.coef[degreePoly-(i+1)]; } } void Polynomial10::subtract(const Polynomial10& pol) { for(int i = 0; i<degreePoly; i++) { coef[degreePoly-i] -= pol.coef[degreePoly-(i+1)]; } } void Polynomial10::multc(double factor) { //int degreePoly=0; //double coef[degreePoly]; cout << "Enter the scalar multiplier : "; cin >> factor; for(int i = 0; i<degreePoly; i++) { coef[i] *= factor; } }; Polynomial10::Polynomial10(int max) { degreePoly=max; coef = new double[degreePoly]; for(int i; i<degreePoly; i++) { coef[i] = 0; } } int main() { int choice; Polynomial10 p1(1),p2(1); cout << endl << "CGS 2421: The Polynomial10 Class" << endl << endl << endl; cout << "0. Quit\n" << "1. Enter polynomial\n" << "2. Print polynomial\n" << "3. Add another polynomial\n" << "4. Subtract another polynomial\n" << "5. Multiply by scalar\n\n"; int choiceFirst = getChoice(); if (choiceFirst != 1) { cout << "Enter a Polynomial first!"; } if (choiceFirst == 1) {choiceFirst = choice;} while(choice != 0) { switch(choice) { case 0: return 0; case 1: p1.read(); break; case 2: p1.print(); break; case 3: p2.read(); p1.add(p2); cout << "Updated Polynomial: "; p1.print(); break; case 4: p2.read(); p1.subtract(p2); cout << "Updated Polynomial: "; p1.print(); break; case 5: p1.multc(10); cout << "Updated Polynomial: "; p1.print(); break; } choice = getChoice(); } return 0; } int getChoice(void) { int c; cout << "\nEnter your choice : "; cin >> c; return c; }

Read the article
delete row from selected gridview and database

- by user175084

i am trying to delete a row from the gridview and database... It should be deleted if a delte linkbutton is clicked in the gridview.. I am gettin the row index as follows: protected void LinkButton1_Click(object sender, EventArgs e) { LinkButton btn = (LinkButton)sender; GridViewRow row = (GridViewRow)btn.NamingContainer; if (row != null) { LinkButton LinkButton1 = (LinkButton)sender; // Get reference to the row that hold the button GridViewRow gvr = (GridViewRow)LinkButton1.NamingContainer; // Get row index from the row int rowIndex = gvr.RowIndex; string str = rowIndex.ToString(); //string str = GridView1.DataKeys[row.RowIndex].Value.ToString(); RemoveData(str); //call the delete method } } now i want to delete it... so i am having problems with this code.. i get an error Must declare the scalar variable "@original_MachineGroupName"... any suggestions private void RemoveData(string item) { SqlConnection conn = new SqlConnection(@"Data Source=JAGMIT-PC\SQLEXPRESS; Initial Catalog=SumooHAgentDB;Integrated Security=True"); string sql = "DELETE FROM [MachineGroups] WHERE [MachineGroupID] = @original_MachineGroupID; SqlCommand cmd = new SqlCommand(sql, conn); cmd.Parameters.AddWithValue("@original_MachineGroupID", item); conn.Open(); cmd.ExecuteNonQuery(); conn.Close(); } Blockquote <asp:SqlDataSource ID="SqlDataSource1" runat="server" ConnectionString="<%$ ConnectionStrings:SumooHAgentDBConnectionString %>" SelectCommand="SELECT MachineGroups.MachineGroupID, MachineGroups.MachineGroupName, MachineGroups.MachineGroupDesc, MachineGroups.TimeAdded, MachineGroups.CanBeDeleted, COUNT(Machines.MachineName) AS Expr1, DATENAME(month, (MachineGroups.TimeAdded - 599266080000000000) / 864000000000) + SPACE(1) + DATENAME(d, (MachineGroups.TimeAdded - 599266080000000000) / 864000000000) + ', ' + DATENAME(year, (MachineGroups.TimeAdded - 599266080000000000) / 864000000000) AS Expr2 FROM MachineGroups FULL OUTER JOIN Machines ON Machines.MachineGroupID = MachineGroups.MachineGroupID GROUP BY MachineGroups.MachineGroupID, MachineGroups.MachineGroupName, MachineGroups.MachineGroupDesc, MachineGroups.TimeAdded, MachineGroups.CanBeDeleted" DeleteCommand="DELETE FROM [MachineGroups] WHERE [MachineGroupID] =@original_MachineGroupID" > <DeleteParameters> <asp:Parameter Name="@original_MachineGroupID" Type="Int16" /> <asp:Parameter Name="@original_MachineGroupName" Type="String" /> <asp:Parameter Name="@original_MachineGroupDesc" Type="String" /> <asp:Parameter Name="@original_CanBeDeleted" Type="Boolean" /> <asp:Parameter Name="@original_TimeAdded" Type="Int64" /> </DeleteParameters> </asp:SqlDataSource> I still get an error : Must declare the scalar variable "@original_MachineGroupID"

Read the article
How to write sum calculation based on ria service?

- by KentZhou

When using ria service for SL app, I can issue following async call to get a group of entity list. LoadOperation<Person> ch = this.AMSContext.Load(this.AMSContext.GetPersonQuery().Where(a => a.PersonID == this.performer.PersonID)); But I want to get some calculation, for example, sum(Commission), sum(Salary), the result is not entity, just a scalar value. How can I do this?

Read the article

< Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >