Search Results

Search found 7436 results on 298 pages for 'simultaneous calls'.

Page 116/298 | < Previous Page | 112 113 114 115 116 117 118 119 120 121 122 123 | Next Page >

Synchronization requirements for FileStream.(Begin/End)(Read/Write)

- by Doug McClean

Is the following pattern of multi-threaded calls acceptable to a .Net FileStream? Several threads calling a method like this: ulong offset = whatever; // different for each thread byte[] buffer = new byte[8192]; object state = someState; // unique for each call, hence also for each thread lock(theFile) { theFile.Seek(whatever, SeekOrigin.Begin); IAsyncResult result = theFile.BeginRead(buffer, 0, 8192, AcceptResults, state); } if(result.CompletedSynchronously) { // is it required for us to call AcceptResults ourselves in this case? // or did BeginRead already call it for us, on this thread or another? } Where AcceptResults is: void AcceptResults(IAsyncResult result) { lock(theFile) { int bytesRead = theFile.EndRead(result); // if we guarantee that the offset of the original call was at least 8192 bytes from // the end of the file, and thus all 8192 bytes exist, can the FileStream read still // actually read fewer bytes than that? // either: if(bytesRead != 8192) { Panic("Page read borked"); } // or: // issue a new call to begin read, moving the offsets into the FileStream and // the buffer, and decreasing the requested size of the read to whatever remains of the buffer } } I'm confused because the documentation seems unclear to me. For example, the FileStream class says: Any public static members of this type are thread safe. Any instance members are not guaranteed to be thread safe. But the documentation for BeginRead seems to contemplate having multiple read requests in flight: Multiple simultaneous asynchronous requests render the request completion order uncertain. Are multiple reads permitted to be in flight or not? Writes? Is this the appropriate way to secure the location of the Position of the stream between the call to Seek and the call to BeginRead? Or does that lock need to be held all the way to EndRead, hence only one read or write in flight at a time? I understand that the callback will occur on a different thread, and my handling of state, buffer handle that in a way that would permit multiple in flight reads. Further, does anyone know where in the documentation to find the answers to these questions? Or an article written by someone in the know? I've been searching and can't find anything. Relevant documentation: FileStream class Seek method BeginRead method EndRead IAsyncResult interface

Read the article
Sockets server design advice

- by Rob

We are writing a socket server in c# and need some advice on the design. Background: Clients (from mobile devices) connect to our server app and we leave their socket open so we can send data back down to them whenever we need to. The amount of data varies but we generally send/receive data from each client every few seconds, so it's quite intensive. The amount of simultaneous connections can range from 50-500 (and more in the future). We have already written a server app using async sockets and it works, however we've come across some stumbling blocks and we need to make sure that what we're doing is correct. We have a collection which holds our client states (we have no socket/connection pool at the moment, should we?). Each time a client connects we create a socket and then wait for them to send us some data and in receiveCallBack we add their clientstate object to our connections dictionary (once we have verified who they are). When a client object then signs off we shutdown their socket and then close it as well as remove them from our collection of clients dictionary. Presumably everything happens in the right order, everything works as expected. However, almost everyday it stops accepting connections, or so we think, either that or it connects but doesn't actually do anything past that and we can't work out why it's just stopping. There are few things that we'r'e unsure about 1) Should we be creating some kind of connection pool as opposed to just a dictionary of client sockets 2) What happens to the sockets that connect but then don't get added to our dictionary, they just linger around in memory doing nothing, should we create ANOTHER dictionary that holds the sockets as soon as they are created? 3) What's the best way of finding if clients are no longer connected? We've read some many methods but we're not sure of the best one to use, send data or read data, if so how? 4) If we loop through the connections dictonary to check for disposed clients, should we be locking the dictionary, if so how does this affect other clients objects trying to use it at the same time, will it throw an error or just wait? 5) We often get disposedSocketException within ReceiveCallBack method at random times, does this mean we are safe to remove that socket from the collection? We can't seem to find any production type examples which show any of this working. Any advice would be greatly received

Read the article
Multi-part question about multi-threading, locks and multi-core processors (multi ^ 3)

- by MusiGenesis

I have a program with two methods. The first method takes two arrays as parameters, and performs an operation in which values from one array are conditionally written into the other, like so: void Blend(int[] dest, int[] src, int offset) { for (int i = 0; i < src.Length; i++) { int rdr = dest[i + offset]; dest[i + offset] = src[i] > rdr? src[i] : rdr; } } The second method creates two separate sets of int arrays and iterates through them such that each array of one set is Blended with each array from the other set, like so: void CrossBlend() { int[][] set1 = new int[150][75000]; // we'll pretend this actually compiles int[][] set2 = new int[25][10000]; // we'll pretend this actually compiles for (int i1 = 0; i1 < set1.Length; i1++) { for (int i2 = 0; i2 < set2.Length; i2++) { Blend(set1[i1], set2[i2], 0); // or any offset, doesn't matter } } } First question: Since this apporoach is an obvious candidate for parallelization, is it intrinsically thread-safe? It seems like no, since I can conceive a scenario (unlikely, I think) where one thread's changes are lost because a different threads ~simultaneous operation. If no, would this: void Blend(int[] dest, int[] src, int offset) { lock (dest) { for (int i = 0; i < src.Length; i++) { int rdr = dest[i + offset]; dest[i + offset] = src[i] > rdr? src[i] : rdr; } } } be an effective fix? Second question: If so, what would be the likely performance cost of using locks like this? I assume that with something like this, if a thread attempts to lock a destination array that is currently locked by another thread, the first thread would block until the lock was released instead of continuing to process something. Also, how much time does it actually take to acquire a lock? Nanosecond scale, or worse than that? Would this be a major issue in something like this? Third question: How would I best approach this problem in a multi-threaded way that would take advantage of multi-core processors (and this is based on the potentially wrong assumption that a multi-threaded solution would not speed up this operation on a single core processor)? I'm guessing that I would want to have one thread running per core, but I don't know if that's true.

Read the article
PulpCore music playback - loop sound and animate volume

- by Peter Perhác

I have been experimenting with PulpCore, trying to create my own tower defence game (not-playable yet), and I am enjoying it very much I ran into a problem that I can't quite figure out. I extended PulpCore with the JOrbis thing to allow OGG files to be played. Works fine. However, pulpCore seems to have a problem with looping the sound WHILE animating the volume level. I tried this with wav file too, to make sure it isn't jOrbis that breaks it. The code is like this: Sound bgMusic = Sound.load("music/music.ogg"); Playback musicPlayback; ... musicVolume = new Fixed(0.75); musicPlayback = bgMusic.loop(musicVolume); //TODO figure out why it's NOT looping when volume is animated // musicVolume.animate(0, musicVolume.get(), FADE_IN_TIME); This code, for as long as the last line is commented out, plays the music.ogg again and again in an endless loop (which I can stop by calling stop on the Playback object returned from loop(). However, I would like the music to fade in smoothly, so following the advice of the PulpCore API docs, I added the last line which will create the fade-in but the music will only play once and then stop. I wonder why is that? Here is a bit of the documentation: Playback pulpcore.sound.Sound.loop(Fixed level) Loops this sound clip with the specified volume level (0.0 to 1.0). The level may have a property animation attached. Parameters: level Returns: a Playback object for this unique sound playback (one Sound can have many simultaneous Playback objects) or null if the sound could not be played. So what could be the problem? I repeat, with the last line, the sound fades in but doesn't loop, without it it loops but starts with the specified 0.75 volume level. Why can't I animate the volume of the looped music playback? What am I doing wrong? Anyone has any experience with pulpCore and has come across this problem? Anyone could please download PulpCore and try to loop music which fades-in (out)? note: I need to keep a reference to the Playback object returned so I can kill music later.

Read the article
Java multithreaded server - each connection returns data. Processing on main thread?

- by oliwr

I am writing a client with an integrated server that should wait indefinitely for new connections - and handle each on a Thread. I want to process the received byte array in a system wide available message handler on the main thread. However, currently the processing is obviously done on the client thread. I've looked at Futures, submit() of ExecutorService, but as I create my Client-Connections within the Server, the data would be returned to the Server thread. How can I return it from there onto the main thread (in a synchronized packet store maybe?) to process it without blocking the server? My current implementation looks like this: public class Server extends Thread { private int port; private ExecutorService threadPool; public Server(int port) { this.port = port; // 50 simultaneous connections threadPool = Executors.newFixedThreadPool(50); } public void run() { try{ ServerSocket listener = new ServerSocket(this.port); System.out.println("Listening on Port " + this.port); Socket connection; while(true){ try { connection = listener.accept(); System.out.println("Accepted client " + connection.getInetAddress()); connection.setSoTimeout(4000); ClientHandler conn_c= new ClientHandler(connection); threadPool.execute(conn_c); } catch (IOException e) { System.out.println("IOException on connection: " + e); } } } catch (IOException e) { System.out.println("IOException on socket listen: " + e); e.printStackTrace(); threadPool.shutdown(); } } } class ClientHandler implements Runnable { private Socket connection; ClientHandler(Socket connection) { this.connection=connection; } @Override public void run() { try { // Read data from the InputStream, buffered int count; byte[] buffer = new byte[8192]; InputStream is = connection.getInputStream(); ByteArrayOutputStream out = new ByteArrayOutputStream(); // While there is data in the stream, read it while ((count = is.read(buffer)) > 0) { out.write(buffer, 0, count); } is.close(); out.close(); System.out.println("Disconnect client " + connection.getInetAddress()); connection.close(); // handle the received data MessageHandler.handle(out.toByteArray()); } catch (IOException e) { System.out.println("IOException on socket read: " + e); e.printStackTrace(); } return; } }

Read the article
Multiprogramming in Django, writing to the Database

- by Marcus Whybrow

Introduction I have the following code which checks to see if a similar model exists in the database, and if it does not it creates the new model: class BookProfile(): # ... def save(self, *args, **kwargs): uniqueConstraint = {'book_instance': self.book_instance, 'collection': self.collection} # Test for other objects with identical values profiles = BookProfile.objects.filter(Q(**uniqueConstraint) & ~Q(pk=self.pk)) # If none are found create the object, else fail. if len(profiles) == 0: super(BookProfile, self).save(*args, **kwargs) else: raise ValidationError('A Book Profile for that book instance in that collection already exists') I first build my constraints, then search for a model with those values which I am enforcing must be unique Q(**uniqueConstraint). In addition I ensure that if the save method is updating and not inserting, that we do not find this object when looking for other similar objects ~Q(pk=self.pk). I should mention that I ham implementing soft delete (with a modified objects manager which only shows non-deleted objects) which is why I must check for myself rather then relying on unique_together errors. Problem Right thats the introduction out of the way. My problem is that when multiple identical objects are saved in quick (or as near as simultaneous) succession, sometimes both get added even though the first being added should prevent the second. I have tested the code in the shell and it succeeds every time I run it. Thus my assumption is if say we have two objects being added Object A and Object B. Object A runs its check upon save() being called. Then the process saving Object B gets some time on the processor. Object B runs that same test, but Object A has not yet been added so Object B is added to the database. Then Object A regains control of the processor, and has allready run its test, even though identical Object B is in the database, it adds it regardless. My Thoughts The reason I fear multiprogramming could be involved is that each Object A and Object is being added through an API save view, so a request to the view is made for each save, thus not a single request with multiple sequential saves on objects. It might be the case that Apache is creating a process for each request, and thus causing the problems I think I am seeing. As you would expect, the problem only occurs sometimes, which is characteristic of multiprogramming or multiprocessing errors. If this is the case, is there a way to make the test and set parts of the save() method a critical section, so that a process switch cannot happen between the test and the set?

Read the article
Trying to run multiple HTTP requests in parallel, but being limited by Windows (registry)

- by Nailuj

I'm developing an application (winforms C# .NET 4.0) where I access a lookup functionality from a 3rd party through a simple HTTP request. I call an url with a parameter, and in return I get a small string with the result of the lookup. Simple enough. The challenge is however, that I have to do lots of these lookups (a couple of thousands), and I would like to limit the time needed. Therefore I would like to run requests in parallel (say 10-20). I use a ThreadPool to do this, and the short version of my code looks like this: public void startAsyncLookup(Action<LookupResult> returnLookupResult) { this.returnLookupResult = returnLookupResult; foreach (string number in numbersToLookup) { ThreadPool.QueueUserWorkItem(lookupNumber, number); } } public void lookupNumber(Object threadContext) { string numberToLookup = (string)threadContext; string url = @"http://some.url.com/?number=" + numberToLookup; WebClient webClient = new WebClient(); Stream responseData = webClient.OpenRead(url); LookupResult lookupResult = parseLookupResult(responseData); returnLookupResult(lookupResult); } I fill up numbersToLookup (a List<String>) from another place, call startAsyncLookup and provide it with a call-back function returnLookupResult to return each result. This works, but I found that I'm not getting the throughput I want. Initially I thought it might be the 3rd party having a poor system on their end, but I excluded this by trying to run the same code from two different machines at the same time. Each of the two took as long as one did alone, so I could rule out that one. A colleague then tipped me that this might be a limitation in Windows. I googled a bit, and found amongst others this post saying that by default Windows limits the number of simultaneous request to the same web server to 4 for HTTP 1.0 and to 2 for HTTP 1.1 (for HTTP 1.1 this is actually according to the specification (RFC2068)). The same post referred to above also provided a way to increase these limits. By adding two registry values to [HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings] (MaxConnectionsPerServer and MaxConnectionsPer1_0Server), I could control this myself. So, I tried this (sat both to 20), restarted my computer, and tried to run my program again. Sadly though, it didn't seem to help any. I also kept an eye on the Resource Monitor (see screen shot) while running my batch lookup, and I noticed that my application (the one with the title blacked out) still only was using two TCP connections. So, the question is, why isn't this working? Is the post I linked to using the wrong registry values? Is this perhaps not possible to "hack" in Windows any longer (I'm on Windows 7)? Any ideas would be highly appreciated :) And just in case anyone should wonder, I have also tried with different settings for MaxThreads on ThreadPool (everyting from 10 to 100), and this didn't seem to affect my throughput at all, so the problem shouldn't be there either.

Read the article
[NHibernate and ASP.NET MVC] How can I implement a robust session-per-request pattern in my project,

- by Guillaume Gervais

I'm currently building an ASP.NET MVC project, with NHibernate as its persistance layer. For now, some functionnalities have been implemented, but only use local NHibernate sessions: each method that accessed the database (read or write) needs to instanciate its own NHibernate session, with the "using()" directive. The problem is that I want to leverage NHibernate's Lazy-Loading capabilities to improve the performance of my project. This implies an open NHibernate session per request until the view is rendered. Furthermore, simultaneous request must be supported (multiple Sessions at the same time). How can I achieve that as cleanly as possible? I searched the Web a little bit and learned about the session-per-request pattern. Most of the implementations I saw used some sort of Http* (HttpContext, etc.) object to store the session. Also, using the Application_BeginRequest/Application_EndRequest functions is complicated, since they get fired for each HTTP request (aspx files, css files, js files, etc.), when I only want to instanciate a session once per request. The concern that I have is that I don't want my views or controllers to have access to NHibernate sessions (or, more generally, NHibernate namespaces and code). That means that I do not want to handle sessions at the controller level nor the view one. I have a few options in mind. Which one seems the best ? Use interceptors (like in GRAILS) that get triggered before and after the controller action. These would open and close sessions/transactions. Is it possible in the ASP.NET MVC world? Use the CurrentSessionContext Singleton provided by NHibernate in a Web context. Using this page as an example, I think this is quite promising, but that still requires filters at the controller level. Use the HttpContext.Current.Items to store the request session. This, coupled with a few lines of code in Global.asax.cs, can easily provide me with a session on the request level. However, it means that dependencies will be injected between NHibernate and my views (HttpContext). Thank you very much!

Read the article
The Incremental Architect’s Napkin - #5 - Design functions for extensibility and readability

- by Ralf Westphal

Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/08/24/the-incremental-architectrsquos-napkin---5---design-functions-for.aspx The functionality of programs is entered via Entry Points. So what we´re talking about when designing software is a bunch of functions handling the requests represented by and flowing in through those Entry Points. Designing software thus consists of at least three phases: Analyzing the requirements to find the Entry Points and their signatures Designing the functionality to be executed when those Entry Points get triggered Implementing the functionality according to the design aka coding I presume, you´re familiar with phase 1 in some way. And I guess you´re proficient in implementing functionality in some programming language. But in my experience developers in general are not experienced in going through an explicit phase 2. “Designing functionality? What´s that supposed to mean?” you might already have thought. Here´s my definition: To design functionality (or functional design for short) means thinking about… well, functions. You find a solution for what´s supposed to happen when an Entry Point gets triggered in terms of functions. A conceptual solution that is, because those functions only exist in your head (or on paper) during this phase. But you may have guess that, because it´s “design” not “coding”. And here is, what functional design is not: It´s not about logic. Logic is expressions (e.g. +, -, && etc.) and control statements (e.g. if, switch, for, while etc.). Also I consider calling external APIs as logic. It´s equally basic. It´s what code needs to do in order to deliver some functionality or quality. Logic is what´s doing that needs to be done by software. Transformations are either done through expressions or API-calls. And then there is alternative control flow depending on the result of some expression. Basically it´s just jumps in Assembler, sometimes to go forward (if, switch), sometimes to go backward (for, while, do). But calling your own function is not logic. It´s not necessary to produce any outcome. Functionality is not enhanced by adding functions (subroutine calls) to your code. Nor is quality increased by adding functions. No performance gain, no higher scalability etc. through functions. Functions are not relevant to functionality. Strange, isn´t it. What they are important for is security of investment. By introducing functions into our code we can become more productive (re-use) and can increase evolvability (higher unterstandability, easier to keep code consistent). That´s no small feat, however. Evolvable code can hardly be overestimated. That´s why to me functional design is so important. It´s at the core of software development. To sum this up: Functional design is on a level of abstraction above (!) logical design or algorithmic design. Functional design is only done until you get to a point where each function is so simple you are very confident you can easily code it. Functional design an logical design (which mostly is coding, but can also be done using pseudo code or flow charts) are complementary. Software needs both. If you start coding right away you end up in a tangled mess very quickly. Then you need back out through refactoring. Functional design on the other hand is bloodless without actual code. It´s just a theory with no experiments to prove it. But how to do functional design? An example of functional design Let´s assume a program to de-duplicate strings. The user enters a number of strings separated by commas, e.g. a, b, a, c, d, b, e, c, a. And the program is supposed to clear this list of all doubles, e.g. a, b, c, d, e. There is only one Entry Point to this program: the user triggers the de-duplication by starting the program with the string list on the command line C:\>deduplicate "a, b, a, c, d, b, e, c, a" a, b, c, d, e …or by clicking on a GUI button. This leads to the Entry Point function to get called. It´s the program´s main function in case of the batch version or a button click event handler in the GUI version. That´s the physical Entry Point so to speak. It´s inevitable. What then happens is a three step process: Transform the input data from the user into a request. Call the request handler. Transform the output of the request handler into a tangible result for the user. Or to phrase it a bit more generally: Accept input. Transform input into output. Present output. This does not mean any of these steps requires a lot of effort. Maybe it´s just one line of code to accomplish it. Nevertheless it´s a distinct step in doing the processing behind an Entry Point. Call it an aspect or a responsibility - and you will realize it most likely deserves a function of its own to satisfy the Single Responsibility Principle (SRP). Interestingly the above list of steps is already functional design. There is no logic, but nevertheless the solution is described - albeit on a higher level of abstraction than you might have done yourself. But it´s still on a meta-level. The application to the domain at hand is easy, though: Accept string list from command line De-duplicate Present de-duplicated strings on standard output And this concrete list of processing steps can easily be transformed into code:static void Main(string[] args) { var input = Accept_string_list(args); var output = Deduplicate(input); Present_deduplicated_string_list(output); } Instead of a big problem there are three much smaller problems now. If you think each of those is trivial to implement, then go for it. You can stop the functional design at this point. But maybe, just maybe, you´re not so sure how to go about with the de-duplication for example. Then just implement what´s easy right now, e.g.private static string Accept_string_list(string[] args) { return args[0]; } private static void Present_deduplicated_string_list( string[] output) { var line = string.Join(", ", output); Console.WriteLine(line); } Accept_string_list() contains logic in the form of an API-call. Present_deduplicated_string_list() contains logic in the form of an expression and an API-call. And then repeat the functional design for the remaining processing step. What´s left is the domain logic: de-duplicating a list of strings. How should that be done? Without any logic at our disposal during functional design you´re left with just functions. So which functions could make up the de-duplication? Here´s a suggestion: De-duplicate Parse the input string into a true list of strings. Register each string in a dictionary/map/set. That way duplicates get cast away. Transform the data structure into a list of unique strings. Processing step 2 obviously was the core of the solution. That´s where real creativity was needed. That´s the core of the domain. But now after this refinement the implementation of each step is easy again:private static string[] Parse_string_list(string input) { return input.Split(',') .Select(s => s.Trim()) .ToArray(); } private static Dictionary<string,object> Compile_unique_strings(string[] strings) { return strings.Aggregate( new Dictionary<string, object>(), (agg, s) => { agg[s] = null; return agg; }); } private static string[] Serialize_unique_strings( Dictionary<string,object> dict) { return dict.Keys.ToArray(); } With these three additional functions Main() now looks like this:static void Main(string[] args) { var input = Accept_string_list(args); var strings = Parse_string_list(input); var dict = Compile_unique_strings(strings); var output = Serialize_unique_strings(dict); Present_deduplicated_string_list(output); } I think that´s very understandable code: just read it from top to bottom and you know how the solution to the problem works. It´s a mirror image of the initial design: Accept string list from command line Parse the input string into a true list of strings. Register each string in a dictionary/map/set. That way duplicates get cast away. Transform the data structure into a list of unique strings. Present de-duplicated strings on standard output You can even re-generate the design by just looking at the code. Code and functional design thus are always in sync - if you follow some simple rules. But about that later. And as a bonus: all the functions making up the process are small - which means easy to understand, too. So much for an initial concrete example. Now it´s time for some theory. Because there is method to this madness ;-) The above has only scratched the surface. Introducing Flow Design Functional design starts with a given function, the Entry Point. Its goal is to describe the behavior of the program when the Entry Point is triggered using a process, not an algorithm. An algorithm consists of logic, a process on the other hand consists just of steps or stages. Each processing step transforms input into output or a side effect. Also it might access resources, e.g. a printer, a database, or just memory. Processing steps thus can rely on state of some sort. This is different from Functional Programming, where functions are supposed to not be stateful and not cause side effects.[1] In its simplest form a process can be written as a bullet point list of steps, e.g. Get data from user Output result to user Transform data Parse data Map result for output Such a compilation of steps - possibly on different levels of abstraction - often is the first artifact of functional design. It can be generated by a team in an initial design brainstorming. Next comes ordering the steps. What should happen first, what next etc.? Get data from user Parse data Transform data Map result for output Output result to user That´s great for a start into functional design. It´s better than starting to code right away on a given function using TDD. Please get me right: TDD is a valuable practice. But it can be unnecessarily hard if the scope of a functionn is too large. But how do you know beforehand without investing some thinking? And how to do this thinking in a systematic fashion? My recommendation: For any given function you´re supposed to implement first do a functional design. Then, once you´re confident you know the processing steps - which are pretty small - refine and code them using TDD. You´ll see that´s much, much easier - and leads to cleaner code right away. For more information on this approach I call “Informed TDD” read my book of the same title. Thinking before coding is smart. And writing down the solution as a bunch of functions possibly is the simplest thing you can do, I´d say. It´s more according to the KISS (Keep It Simple, Stupid) principle than returning constants or other trivial stuff TDD development often is started with. So far so good. A simple ordered list of processing steps will do to start with functional design. As shown in the above example such steps can easily be translated into functions. Moving from design to coding thus is simple. However, such a list does not scale. Processing is not always that simple to be captured in a list. And then the list is just text. Again. Like code. That means the design is lacking visuality. Textual representations need more parsing by your brain than visual representations. Plus they are limited in their “dimensionality”: text just has one dimension, it´s sequential. Alternatives and parallelism are hard to encode in text. In addition the functional design using numbered lists lacks data. It´s not visible what´s the input, output, and state of the processing steps. That´s why functional design should be done using a lightweight visual notation. No tool is necessary to draw such designs. Use pen and paper; a flipchart, a whiteboard, or even a napkin is sufficient. Visualizing processes The building block of the functional design notation is a functional unit. I mostly draw it like this: Something is done, it´s clear what goes in, it´s clear what comes out, and it´s clear what the processing step requires in terms of state or hardware. Whenever input flows into a functional unit it gets processed and output is produced and/or a side effect occurs. Flowing data is the driver of something happening. That´s why I call this approach to functional design Flow Design. It´s about data flow instead of control flow. Control flow like in algorithms is of no concern to functional design. Thinking about control flow simply is too low level. Once you start with control flow you easily get bogged down by tons of details. That´s what you want to avoid during design. Design is supposed to be quick, broad brush, abstract. It should give overview. But what about all the details? As Robert C. Martin rightly said: “Programming is abot detail”. Detail is a matter of code. Once you start coding the processing steps you designed you can worry about all the detail you want. Functional design does not eliminate all the nitty gritty. It just postpones tackling them. To me that´s also an example of the SRP. Function design has the responsibility to come up with a solution to a problem posed by a single function (Entry Point). And later coding has the responsibility to implement the solution down to the last detail (i.e. statement, API-call). TDD unfortunately mixes both responsibilities. It´s just coding - and thereby trying to find detailed implementations (green phase) plus getting the design right (refactoring). To me that´s one reason why TDD has failed to deliver on its promise for many developers. Using functional units as building blocks of functional design processes can be depicted very easily. Here´s the initial process for the example problem: For each processing step draw a functional unit and label it. Choose a verb or an “action phrase” as a label, not a noun. Functional design is about activities, not state or structure. Then make the output of an upstream step the input of a downstream step. Finally think about the data that should flow between the functional units. Write the data above the arrows connecting the functional units in the direction of the data flow. Enclose the data description in brackets. That way you can clearly see if all flows have already been specified. Empty brackets mean “no data is flowing”, but nevertheless a signal is sent. A name like “list” or “strings” in brackets describes the data content. Use lower case labels for that purpose. A name starting with an upper case letter like “String” or “Customer” on the other hand signifies a data type. If you like, you also can combine descriptions with data types by separating them with a colon, e.g. (list:string) or (strings:string[]). But these are just suggestions from my practice with Flow Design. You can do it differently, if you like. Just be sure to be consistent. Flows wired-up in this manner I call one-dimensional (1D). Each functional unit just has one input and/or one output. A functional unit without an output is possible. It´s like a black hole sucking up input without producing any output. Instead it produces side effects. A functional unit without an input, though, does make much sense. When should it start to work? What´s the trigger? That´s why in the above process even the first processing step has an input. If you like, view such 1D-flows as pipelines. Data is flowing through them from left to right. But as you can see, it´s not always the same data. It get´s transformed along its passage: (args) becomes a (list) which is turned into (strings). The Principle of Mutual Oblivion A very characteristic trait of flows put together from function units is: no functional units knows another one. They are all completely independent of each other. Functional units don´t know where their input is coming from (or even when it´s gonna arrive). They just specify a range of values they can process. And they promise a certain behavior upon input arriving. Also they don´t know where their output is going. They just produce it in their own time independent of other functional units. That means at least conceptually all functional units work in parallel. Functional units don´t know their “deployment context”. They now nothing about the overall flow they are place in. They are just consuming input from some upstream, and producing output for some downstream. That makes functional units very easy to test. At least as long as they don´t depend on state or resources. I call this the Principle of Mutual Oblivion (PoMO). Functional units are oblivious of others as well as an overall context/purpose. They are just parts of a whole focused on a single responsibility. How the whole is built, how a larger goal is achieved, is of no concern to the single functional units. By building software in such a manner, functional design interestingly follows nature. Nature´s building blocks for organisms also follow the PoMO. The cells forming your body do not know each other. Take a nerve cell “controlling” a muscle cell for example:[2] The nerve cell does not know anything about muscle cells, let alone the specific muscel cell it is “attached to”. Likewise the muscle cell does not know anything about nerve cells, let a lone a specific nerve cell “attached to” it. Saying “the nerve cell is controlling the muscle cell” thus only makes sense when viewing both from the outside. “Control” is a concept of the whole, not of its parts. Control is created by wiring-up parts in a certain way. Both cells are mutually oblivious. Both just follow a contract. One produces Acetylcholine (ACh) as output, the other consumes ACh as input. Where the ACh is going, where it´s coming from neither cell cares about. Million years of evolution have led to this kind of division of labor. And million years of evolution have produced organism designs (DNA) which lead to the production of these different cell types (and many others) and also to their co-location. The result: the overall behavior of an organism. How and why this happened in nature is a mystery. For our software, though, it´s clear: functional and quality requirements needs to be fulfilled. So we as developers have to become “intelligent designers” of “software cells” which we put together to form a “software organism” which responds in satisfying ways to triggers from it´s environment. My bet is: If nature gets complex organisms working by following the PoMO, who are we to not apply this recipe for success to our much simpler “machines”? So my rule is: Wherever there is functionality to be delivered, because there is a clear Entry Point into software, design the functionality like nature would do it. Build it from mutually oblivious functional units. That´s what Flow Design is about. In that way it´s even universal, I´d say. Its notation can also be applied to biology: Never mind labeling the functional units with nouns. That´s ok in Flow Design. You´ll do that occassionally for functional units on a higher level of abstraction or when their purpose is close to hardware. Getting a cockroach to roam your bedroom takes 1,000,000 nerve cells (neurons). Getting the de-duplication program to do its job just takes 5 “software cells” (functional units). Both, though, follow the same basic principle. Translating functional units into code Moving from functional design to code is no rocket science. In fact it´s straightforward. There are two simple rules: Translate an input port to a function. Translate an output port either to a return statement in that function or to a function pointer visible to that function. The simplest translation of a functional unit is a function. That´s what you saw in the above example. Functions are mutually oblivious. That why Functional Programming likes them so much. It makes them composable. Which is the reason, nature works according to the PoMO. Let´s be clear about one thing: There is no dependency injection in nature. For all of an organism´s complexity no DI container is used. Behavior is the result of smooth cooperation between mutually oblivious building blocks. Functions will often be the adequate translation for the functional units in your designs. But not always. Take for example the case, where a processing step should not always produce an output. Maybe the purpose is to filter input. Here the functional unit consumes words and produces words. But it does not pass along every word flowing in. Some words are swallowed. Think of a spell checker. It probably should not check acronyms for correctness. There are too many of them. Or words with no more than two letters. Such words are called “stop words”. In the above picture the optionality of the output is signified by the astrisk outside the brackets. It means: Any number of (word) data items can flow from the functional unit for each input data item. It might be none or one or even more. This I call a stream of data. Such behavior cannot be translated into a function where output is generated with return. Because a function always needs to return a value. So the output port is translated into a function pointer or continuation which gets passed to the subroutine when called:[3]void filter_stop_words( string word, Action<string> onNoStopWord) { if (...check if not a stop word...) onNoStopWord(word); } If you want to be nitpicky you might call such a function pointer parameter an injection. And technically you´re right. Conceptually, though, it´s not an injection. Because the subroutine is not functionally dependent on the continuation. Firstly continuations are procedures, i.e. subroutines without a return type. Remember: Flow Design is about unidirectional data flow. Secondly the name of the formal parameter is chosen in a way as to not assume anything about downstream processing steps. onNoStopWord describes a situation (or event) within the functional unit only. Translating output ports into function pointers helps keeping functional units mutually oblivious in cases where output is optional or produced asynchronically. Either pass the function pointer to the function upon call. Or make it global by putting it on the encompassing class. Then it´s called an event. In C# that´s even an explicit feature.class Filter { public void filter_stop_words( string word) { if (...check if not a stop word...) onNoStopWord(word); } public event Action<string> onNoStopWord; } When to use a continuation and when to use an event dependens on how a functional unit is used in flows and how it´s packed together with others into classes. You´ll see examples further down the Flow Design road. Another example of 1D functional design Let´s see Flow Design once more in action using the visual notation. How about the famous word wrap kata? Robert C. Martin has posted a much cited solution including an extensive reasoning behind his TDD approach. So maybe you want to compare it to Flow Design. The function signature given is:string WordWrap(string text, int maxLineLength) {...} That´s not an Entry Point since we don´t see an application with an environment and users. Nevertheless it´s a function which is supposed to provide a certain functionality. The text passed in has to be reformatted. The input is a single line of arbitrary length consisting of words separated by spaces. The output should consist of one or more lines of a maximum length specified. If a word is longer than a the maximum line length it can be split in multiple parts each fitting in a line. Flow Design Let´s start by brainstorming the process to accomplish the feat of reformatting the text. What´s needed? Words need to be assembled into lines Words need to be extracted from the input text The resulting lines need to be assembled into the output text Words too long to fit in a line need to be split Does sound about right? I guess so. And it shows a kind of priority. Long words are a special case. So maybe there is a hint for an incremental design here. First let´s tackle “average words” (words not longer than a line). Here´s the Flow Design for this increment: The the first three bullet points turned into functional units with explicit data added. As the signature requires a text is transformed into another text. See the input of the first functional unit and the output of the last functional unit. In between no text flows, but words and lines. That´s good to see because thereby the domain is clearly represented in the design. The requirements are talking about words and lines and here they are. But note the asterisk! It´s not outside the brackets but inside. That means it´s not a stream of words or lines, but lists or sequences. For each text a sequence of words is output. For each sequence of words a sequence of lines is produced. The asterisk is used to abstract from the concrete implementation. Like with streams. Whether the list of words gets implemented as an array or an IEnumerable is not important during design. It´s an implementation detail. Does any processing step require further refinement? I don´t think so. They all look pretty “atomic” to me. And if not… I can always backtrack and refine a process step using functional design later once I´ve gained more insight into a sub-problem. Implementation The implementation is straightforward as you can imagine. The processing steps can all be translated into functions. Each can be tested easily and separately. Each has a focused responsibility. And the process flow becomes just a sequence of function calls: Easy to understand. It clearly states how word wrapping works - on a high level of abstraction. And it´s easy to evolve as you´ll see. Flow Design - Increment 2 So far only texts consisting of “average words” are wrapped correctly. Words not fitting in a line will result in lines too long. Wrapping long words is a feature of the requested functionality. Whether it´s there or not makes a difference to the user. To quickly get feedback I decided to first implement a solution without this feature. But now it´s time to add it to deliver the full scope. Fortunately Flow Design automatically leads to code following the Open Closed Principle (OCP). It´s easy to extend it - instead of changing well tested code. How´s that possible? Flow Design allows for extension of functionality by inserting functional units into the flow. That way existing functional units need not be changed. The data flow arrow between functional units is a natural extension point. No need to resort to the Strategy Pattern. No need to think ahead where extions might need to be made in the future. I just “phase in” the remaining processing step: Since neither Extract words nor Reformat know of their environment neither needs to be touched due to the “detour”. The new processing step accepts the output of the existing upstream step and produces data compatible with the existing downstream step. Implementation - Increment 2 A trivial implementation checking the assumption if this works does not do anything to split long words. The input is just passed on: Note how clean WordWrap() stays. The solution is easy to understand. A developer looking at this code sometime in the future, when a new feature needs to be build in, quickly sees how long words are dealt with. Compare this to Robert C. Martin´s solution:[4] How does this solution handle long words? Long words are not even part of the domain language present in the code. At least I need considerable time to understand the approach. Admittedly the Flow Design solution with the full implementation of long word splitting is longer than Robert C. Martin´s. At least it seems. Because his solution does not cover all the “word wrap situations” the Flow Design solution handles. Some lines would need to be added to be on par, I guess. But even then… Is a difference in LOC that important as long as it´s in the same ball park? I value understandability and openness for extension higher than saving on the last line of code. Simplicity is not just less code, it´s also clarity in design. But don´t take my word for it. Try Flow Design on larger problems and compare for yourself. What´s the easier, more straightforward way to clean code? And keep in mind: You ain´t seen all yet ;-) There´s more to Flow Design than described in this chapter. In closing I hope I was able to give you a impression of functional design that makes you hungry for more. To me it´s an inevitable step in software development. Jumping from requirements to code does not scale. And it leads to dirty code all to quickly. Some thought should be invested first. Where there is a clear Entry Point visible, it´s functionality should be designed using data flows. Because with data flows abstraction is possible. For more background on why that´s necessary read my blog article here. For now let me point out to you - if you haven´t already noticed - that Flow Design is a general purpose declarative language. It´s “programming by intention” (Shalloway et al.). Just write down how you think the solution should work on a high level of abstraction. This breaks down a large problem in smaller problems. And by following the PoMO the solutions to those smaller problems are independent of each other. So they are easy to test. Or you could even think about getting them implemented in parallel by different team members. Flow Design not only increases evolvability, but also helps becoming more productive. All team members can participate in functional design. This goes beyon collective code ownership. We´re talking collective design/architecture ownership. Because with Flow Design there is a common visual language to talk about functional design - which is the foundation for all other design activities. PS: If you like what you read, consider getting my ebook “The Incremental Architekt´s Napkin”. It´s where I compile all the articles in this series for easier reading. I like the strictness of Function Programming - but I also find it quite hard to live by. And it certainly is not what millions of programmers are used to. Also to me it seems, the real world is full of state and side effects. So why give them such a bad image? That´s why functional design takes a more pragmatic approach. State and side effects are ok for processing steps - but be sure to follow the SRP. Don´t put too much of it into a single processing step. ? Image taken from www.physioweb.org ? My code samples are written in C#. C# sports typed function pointers called delegates. Action is such a function pointer type matching functions with signature void someName(T t). Other languages provide similar ways to work with functions as first class citizens - even Java now in version 8. I trust you find a way to map this detail of my translation to your favorite programming language. I know it works for Java, C++, Ruby, JavaScript, Python, Go. And if you´re using a Functional Programming language it´s of course a no brainer. ? Taken from his blog post “The Craftsman 62, The Dark Path”. ?

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
Distributed and/or Parallel SSIS processing

- by Jeff

Background: Our company hosts SaaS DSS applications, where clients provide us data Daily and/or Weekly, which we process & merge into their existing database. During business hours, load in the servers are pretty minimal as it's mostly users running simple pre-defined queries via the website, or running drill-through reports that mostly hit the SSAS OLAP cube. I manage the IT Operations Team, and so far this has presented an interesting "scaling" issue for us. For our daily-refreshed clients, the server is only "busy" for about 4-6 hrs at night. For our weekly-refresh clients, the server is only "busy" for maybe 8-10 hrs per week! We've done our best to use some simple methods of distributing the load by spreading the daily clients evenly among the servers such that we're not trying to process daily clients back-to-back over night. But long-term this scaling strategy creates two notable issues. First, it's going to consume a pretty immense amount of hardware that sits idle for large periods of time. Second, it takes significant Production Support over-head to basically "schedule" the ETL such that they don't over-lap, and move clients/schedules around if they out-grow the resources on a particular server or allocated time-slot. As the title would imply, one option we've tried is running multiple SSIS packages in parallel, but in most cases this has yielded VERY inconsistent results. The most common failures are DTExec, SQL, and SSAS fighting for physical memory and throwing out-of-memory errors, and ETLs running 3,4,5x longer than expected. So from my practical experience thus far, it seems like running multiple ETL packages on the same hardware isn't a good idea, but I can't be the first person that doesn't want to scale multiple ETLs around manual scheduling, and sequential processing. One option we've considered is virtualizing the servers, which obviously doesn't give you any additional resources, but moves the resource contention onto the hypervisor, which (from my experience) seems to manage simultaneous CPU/RAM/Disk I/O a little more gracefully than letting DTExec, SQL, and SSAS battle it out within Windows. Question to the forum: So my question to the forum is, are we missing something obvious here? Are there tools out there that can help manage running multiple SSIS packages on the same hardware? Would it be more "efficient" in terms of parallel execution if instead of running DTExec, SQL, and SSAS same machine (with every machine running that configuration), we run in pairs of three machines with SSIS running on one machine, SQL on another, and SSAS on a third? Obviously that would only make sense if we could process more than the three ETL we were able to process on the machine independently. Another option we've considered is completely re-architecting our SSIS package to have one "master" package for all clients that attempts to intelligently chose a server based off how "busy" it already is in terms of CPU/Memory/Disk utilization, but that would be a herculean effort, and seems like we're trying to reinvent something that you would think someone would sell (although I haven't had any luck finding it). So in summary, are we missing an obvious solution for this, and does anyone know if any tools (for free or for purchase, doesn't matter) that facilitate running multiple SSIS ETL packages in parallel and on multiple servers? (What I would call a "queue & node based" system, but that's not an official term). Ultimately VMWare's Distributed Resource Scheduler addresses this as you simply run a consistent number of clients per VM that you know will never conflict scheduleing-wise, then leave it up to VMWare to move the VMs around to balance out hardware usage. I'm definitely not against using VMWare to do this, but since we're a 100% Microsoft app stack, it seems like -someone- out there would have solved this problem at the application layer instead of the hypervisor layer by checking on resource utilization at the OS, SQL, SSAS levels. I'm open to ANY discussion on this, and remember no suggestion is too crazy or radical! :-) Right now, VMWare is the only option we've found to get away from "manually" balancing our resources, so any suggestions that leave us on a pure Microsoft stack would be great. Thanks guys, Jeff

Read the article
Are there alternatives to Sysinternals ADInsight?

- by mmcglynn

I had been using ADInsight from Sysinternals to trace Active Directory calls from my workstation, but the application has failed. Where previously the Active Directory events were traced and logged, now the window remains blank, whether the application is in capture mode or not. I have run as Administrator, rebooted, downloaded a new version; none of those actions has returned the program to a functional state. The Sysinternals forums don't offer much hope, since this tool is known to fail often. Is there tool that has similar functionality? Questions Does the tool fail when run from another workstation with your account? Yes Does it fail from your (and/or) another workstation using someone else's account? Yes Is there anything in the event log of your workstation? No

Read the article
Reporting Services Returning HTTP 401 Unauthorized

- by Chris Arnold

I have just ported an existing ASP.NET application to a new web server (Windows Server 2008 R2 and SQL Server 2008). It is successfully running on 4 other servers of varying O/S (which I also setup). My ASP.NET app calls into the Reporting Services Web Service (ReportExecution2005.asmx) to generate a report and save it as a pdf to the file system. I consistently receive "System.Net.WebException - The request failed with HTTP status 401: Unauthorized." In UTTER desperation I have performed the following... Granted all Users complete access to SSRS via the Reports web page. Granted all Users 'Full control' to <%ProgramFiles%\Microsoft SQL Server\MSRS10.MSSQLSERVER I am not a network / server specialist but I'm the only one that can deal with this and it's driving me batty. Help!

Read the article
Cisco Anyconnect Issue on HTC HD2

- by Myles

Hello, We've just got a HTC HD2 handset through (UK - T-mobile); and we've installed the Cisco Anyconnect client. It connects ok but then after a few seconds disconnects, then reconnects. It then keeps cycling through in this way, and at no point can we even attempt to sync Exchange! Our ASA 5510 reports; Group User IP <149.254.217.2 SVC Message: 17/ERROR: Reconnecting to recover from error.. And from the phone log; 10:56:03Debug Function: CSocketTransport::getTransportMTU File: ..\IPC\SocketTransport.cpp Line: 1058 Invoked Function: CNetInterface::GetTcpIpMTU Return Code: -32571377 (0xFE0F000F) Description: NETINTERFACE_ERROR_INTERFACE_NOT_AVAILABLE Does anyone have any advice on why it's constantly disconnecting? The phone log does suggest a lack of service; but the phone can browse the net, make calls, etc and appears to have good signal throughout. We did try the Anyconnect software on a Windows 7 PC which worked fine, no drop outs. Any help would be greatly appreciated! Thank you, Myles

Read the article
Asterisk - Trying to use call files to create a conference call between two dynamic numbers

- by Hank

I'm trying to setup an Asterisk system that will allow me to create a conference call between two dynamic numbers. It seems I can use 'call files' to make Asterisk initiate the call without needing an incoming call - http://www.voip-info.org/tiki-index.php?page=Asterisk+auto-dial+out This example seems to be what I'd need: Channel: SIP/mytrunk/12345678 MaxRetries: 2 RetryTime: 60 WaitTime: 30 Context: callme Extension: 800 Priority: 2 I can generate this file with some scripting language and then place it into the Asterisk Call File folder. The problem I'm having is: How do I call out to two numbers and join them in a conference call? The MeetMe plugin/extension seems to be what I need in terms of conference calling, I'm just unsure as to how I'd use the two together and join them. Also, is it possible to have multiple 2-person conference calls at the same time? Is setting this up as simple as setting aside X amount of 'channels' in the meetme.conf?

Read the article
Ubuntu keyboard detection from bash script

- by Ryan Brubaker

Excuse my ignorance of linux OS/hardware issues...I'm just a programmer :) I have an application that calls out to some bash scripts to launch external applications, in this case Firefox. The application runs on a kiosk with touch screen capability. When launching Firefox, I also launch a virtual keyboard application that allows the user to have keyboard input. However, the kiosk also has both PS/2 and USB slots that would allow a user to plug-in a keyboard. If a keyboard were plugged in, it would be nice if I didn't have to launch the virtual keyboard and provide more screen space for the Firefox window. Is there a way for me to detect if a keyboard is plugged in from the bash script? Would it show up in /dev, and if so, would it show up at a consistent location? Would it make a difference if the user used a PS/2 or USB keyboard? Thanks!

Read the article
Windows Server 2008 CMD Task Schedule not running

- by Jonathan Platovsky

I have a BAT/CMD file that when run from the command prompt runs completely. When I run it through the Task Scheduler it partially runs. Here is a copy of the file cd\sqlbackup ren Apps_Backup*.* Apps.Bak ren Apps_Was_Backup*.* Apps_Was.Bak xcopy /Y c:\sqlbackup\*.bak c:\sqlbackup\11\*.bak xcopy /y c:\sqlbackup\*.bak \\igweb01\c$\sqlbackup\*.bak Move /y c:\sqlbackup\*.bak "\\igsrv01\d$\sql backup\" The last two lines do not run when the task scheduler calls it. But again, work when manually run from the command line. All the local sever commands run but when it comes to the last two lines where it goes to another server then it does not work.

Read the article
mysql UDF : fopen = permission denied

- by lindenb

Hi All, this is question I already asked on SO but I wonder if this could be a SysAdmin problem. I'm trying to create a mysql UDF function , this function calls "fopen/fclose" to read a flat file stored in /data. But using errno (yes, I know it is bad in a MT program...) I can see that the function cannot open my file: "Permission denied" I tried to do a chmod -R 755 /data (as well as 777, chown -R mysql:mysql /data etc...) but it didn't change anything. when I copied the flat file to /tmp : OK, my UDF was able to 'fopen' the file. I'm puzzled. currently , I've got: drwxrwxrwx 4 pierre root 4096 2010-05-26 16:51 /data drwxrwxrwx 3 pierre root 4096 2010-05-18 09:41 /data/dir1 drwxrwxrwx 3 pierre root 4096 2010-05-18 09:41 /data/dir1/dir2 drwxrwxrwx 4 pierre root 4096 2010-05-18 10:27 /data/dir1/dir2/dir3 -rw-r--r-- 1 pierre root 50685268 2005-12-10 00:01 /data/dir1/dir2/dir3/myfile.txt Any idea ?

Read the article
go back to original version of firefox in ubuntu from beta version

- by Jack Coroman

In ubuntu 9.04, I tried to upgrade from firefox 3.0 to 3.5, by installing some apt-get packages, and there is a problem! Now firefox calls itself "Namoroka" and the firefox logo is gone and replaced by a black square in the upper bar and it says it is a development beta version. I really don't like this version, how can I go back to the stable version of firefox? I tried apt-get remove firefox-3.5 and apt-get install firefox-3.0 and that did not work. How do I go back to the stable version of firefox?

Read the article
WAN interface responds as LAN when request comes from LAN, is that correct?

- by Eugenio Miró

Hi Everyone! I have a problem with my router/modem. I've published an HTTP service from one of my internal computers and when I access the service from the internal lan using the external IP address the modem responds instead of redirecting the call to the forwarded port. I can access the service from outside however, but from the internal network the modem responds to my calls. I'm using a ZTE ZXDSL 831 Series modem with ZXDSL 831IIV7.5.1e_E09_BR1 firmware. Thanks in advance!

Read the article
Crontab script on Mac OS X Lion does not work anymore

- by Nopster

I have a problem with cron tasks. Previously this script worked fine on Mac OS X 10.6 server, but when I initialize it on Lion (client), this script stopped working. Basically, this .bat file calls a jar file (that invokes a loop of mysqldump commands) to backup several databases on several servers, and runs perfectly if launched by the shell. cd /Users/nameoftheuser/Desktop/backupper /usr/bin/java -cp .:Backupper.jar:lib/mail.jar backupper.Main "/Users/nameoftheuser/Desktop/backupper/listasiti.txt" "/Users/nameofthe/Desktop/backupper/config.properties But if the cron launches the same .bat file, the generated database backups are 0 bytes. The cron entry is: 0 0 sh /Users/path/to/file.bat I believe that the problem is that cron doesn't run as root. Or what else?

Read the article
Mod_rewrite display subdomain.domain.com and call domain.com/subdomain/ for SSL

- by Jeff H.

I have a website secured by a standard SSL certificate, securing a few different shops under different subdirectories. Ex. domain.com/shop1/ The shops are also accessible via a subdomain e.g. shop1.domain.com. What I'm trying to accomplish: display shop1.domain.com to the user, while keeping all of the actual server calls as domain.com/shop1, so that the secure pages will continue to work properly. (Not sure if I'm using the proper language, exactly, I hope my point is clear.) To be clear: my SSL is working fine, and I don't need help with that, and I don't need or want to purchase a UCC cert. It can't be that difficult for anyone with experience with Apache. (I've spent 3 hours trying to learn about mod_rewrite. It's just not clicking.) I'm on a GoDaddy secure shared server, so please keep in mind that I'm not able to reset the server or anything.

Read the article
Asterisk - Trying to use call files to create a conference call between two dynamic numbers

- by Hank

I'm trying to setup an Asterisk system that will allow me to create a conference call between two dynamic numbers. It seems I can use 'call files' to make Asterisk initiate the call without needing an incoming call - http://www.voip-info.org/tiki-index.php?page=Asterisk+auto-dial+out This example seems to be what I'd need: Channel: SIP/mytrunk/12345678 MaxRetries: 2 RetryTime: 60 WaitTime: 30 Context: callme Extension: 800 Priority: 2 I can generate this file with some scripting language and then place it into the Asterisk Call File folder. The problem I'm having is: How do I call out to two numbers and join them in a conference call? The MeetMe plugin/extension seems to be what I need in terms of conference calling, I'm just unsure as to how I'd use the two together and join them. Also, is it possible to have multiple 2-person conference calls at the same time? Is setting this up as simple as setting aside X amount of 'channels' in the meetme.conf?

Read the article
IIS7 500 error if form is submitted more than 120 seconds after it was originally loaded

- by user41170

Summary: When I submit a form (via POST) to php more than 120 seconds after the form was first called I get a 500 server error. If I submit before 120 seconds, the form works fine. I'm obviously encountering some kind of timeout but I don't know where to look. I've enabled failed request tracing and received the following error: ModuleName="FastCgiModule", Notification="EXECUTE_REQUEST_HANDLER", HttpStatus="500", HttpReason="Internal Server Error", HttpSubStatus="0", ErrorCode="The I/O operation has been aborted because of either a thread exit or an application request. (0x800703e3)", ConfigExceptionInfo="" I don't understand why it matters when the form is submitted relative to the time the form is first rendered (note: the actual form processing takes a fraction of a second). Shouldn't these two requests be stateless? Suggestions? Configuration: Windows Vista SP1 IIS7 php 5.2.x under fastCGI This is a development server running on my laptop. All calls to IIS are coming from localhost.

Read the article
MS Office Communicator: Long delays in setting up audio connection when starting a call

- by geofftnz

I am using Microsoft Office Communicator with a USB headset as my work phone. OCS is connected to our PABX so we can take and make calls to regular, non-OCS phones. When making an external call to a cellphone, it can take up to 5-10 seconds for audio to start flowing. eg: Work Phone Cellphone - dial cellphone (ringing) (ringing) answer cellphone (hearing nothing) speak "1" . speak "2" . speak "3" . ... . speak "14" hear "15" speak "15" hear "16" speak "16" Has anyone experienced this kind of thing with an OCS setup? Any pointers?

Read the article

Search Results

Search found 7436 results on 298 pages for 'simultaneous calls'.

Page 116/298 | < Previous Page | 112 113 114 115 116 117 118 119 120 121 122 123 | Next Page >

- by Doug McClean

- by Rob

- by MusiGenesis

- by Peter Perhác

- by oliwr

- by Marcus Whybrow

- by Nailuj

- by Guillaume Gervais

- by Ralf Westphal

- by rchrd

- by Jeff

- by mmcglynn

- by Chris Arnold

- by Myles

- by Hank

- by Ryan Brubaker

- by Jonathan Platovsky

- by lindenb

- by Jack Coroman

- by Eugenio Miró

- by Nopster

- by Jeff H.

- by Hank

- by user41170

- by geofftnz

< Previous Page | 112 113 114 115 116 117 118 119 120 121 122 123 | Next Page >