Search Results

Search found 4841 results on 194 pages for 'poor programmer'.

Page 186/194 | < Previous Page | 182 183 184 185 186 187 188 189 190 191 192 193 | Next Page >

Are there any platforms where using structure copy on an fd_set (for select() or pselect()) causes p

- by Jonathan Leffler

The select() and pselect() system calls modify their arguments (the 'struct fd_set *' arguments), so the input value tells the system which file descriptors to check and the return values tell the programmer which file descriptors are currently usable. If you are going to call them repeatedly for the same set of file descriptors, you need to ensure that you have a fresh copy of the descriptors for each call. The obvious way to do that is to use a structure copy: struct fd_set ref_set_rd; struct fd_set ref_set_wr; struct fd_set ref_set_er; ... ...code to set the reference fd_set_xx values... ... while (!done) { struct fd_set act_set_rd = ref_set_rd; struct fd_set act_set_wr = ref_set_wr; struct fd_set act_set_er = ref_set_er; int bits_set = select(max_fd, &act_set_rd, &act_set_wr, &act_set_er, &timeout); if (bits_set > 0) { ...process the output values of act_set_xx... } } My question: Are there any platforms where it is not safe to do a structure copy of the struct fd_set values as shown? I'm concerned lest there be hidden memory allocation or anything unexpected like that. (There are macros/functions FD_SET(), FD_CLR(), FD_ZERO() and FD_ISSET() to mask the internals from the application.) I can see that MacOS X (Darwin) is safe; other BSD-based systems are likely to be safe, therefore. You can help by documenting other systems that you know are safe in your answers. (I do have minor concerns about how well the struct fd_set would work with more than 8192 open file descriptors - the default maximum number of open files is only 256, but the maximum number is 'unlimited'. Also, since the structures are 1 KB, the copying code is not dreadfully efficient, but then running through a list of file descriptors to recreate the input mask on each cycle is not necessarily efficient either. Maybe you can't do select() when you have that many file descriptors open, though that is when you are most likely to need the functionality.) There's a related SO question - asking about 'poll() vs select()' which addresses a different set of issues from this question.

Read the article
Multiset container appears to stop sorting

- by Sarah

I would appreciate help debugging some strange behavior by a multiset container. Occasionally, the container appears to stop sorting. This is an infrequent error, apparent in only some simulations after a long time, and I'm short on ideas. (I'm an amateur programmer--suggestions of all kinds are welcome.) My container is a std::multiset that holds Event structs: typedef std::multiset< Event, std::less< Event > > EventPQ; with the Event structs sorted by their double time members: struct Event { public: explicit Event(double t) : time(t), eventID(), hostID(), s() {} Event(double t, int eid, int hid, int stype) : time(t), eventID( eid ), hostID( hid ), s(stype) {} bool operator < ( const Event & rhs ) const { return ( time < rhs.time ); } double time; ... }; The program iterates through periods of adding events with unordered times to EventPQ currentEvents and then pulling off events in order. Rarely, after some events have been added (with perfectly 'legal' times), events start getting executed out of order. What could make the events ever not get ordered properly? (Or what could mess up the iterator?) I have checked that all the added event times are legitimate (i.e., all exceed the current simulation time), and I have also confirmed that the error does not occur because two events happen to get scheduled for the same time. I'd love suggestions on how to work through this. The code for executing and adding events is below for the curious: double t = 0.0; double nextTimeStep = t + EPID_DELTA_T; EventPQ::iterator eventIter = currentEvents.begin(); while ( t < EPID_SIM_LENGTH ) { // Add some events to currentEvents while ( ( *eventIter ).time < nextTimeStep ) { Event thisEvent = *eventIter; t = thisEvent.time; executeEvent( thisEvent ); eventCtr++; currentEvents.erase( eventIter ); eventIter = currentEvents.begin(); } t = nextTimeStep; nextTimeStep += EPID_DELTA_T; } void Simulation::addEvent( double et, int eid, int hid, int s ) { assert( currentEvents.find( Event(et) ) == currentEvents.end() ); Event thisEvent( et, eid, hid, s ); currentEvents.insert( thisEvent ); }

Read the article
Converting this code from ASP to PHP

- by jethomas

I'll admit I'm a novice programmer and really the only experience I have is in classic ASP. I'm looking for a way to convert this asp code to PHP. For a customer who only has access to a linux box but also as a learning tool for me. Thanks in advance for the help: Recordset and Function: Function pd(n, totalDigits) if totalDigits > len(n) then pd = String(totalDigits-len(n),"0") & n else pd = n end if End Function 'declare the variables Dim Connection Dim Recordset Dim SQL Dim SQLDate SQLDate = Year(Date)& "-" & pd(Month(Date()),2)& "-" & pd(Day(Date()),2) 'declare the SQL statement that will query the database SQL = "SELECT * FROM tblXYZ WHERE element_8 = 2 AND element_9 > '" & SQLDate &"'" 'create an instance of the ADO connection and recordset objects Set Connection = Server.CreateObject("ADODB.Connection") Set Recordset = Server.CreateObject("ADODB.Recordset") 'open the connection to the database Connection.Open "PROVIDER=MSDASQL;DRIVER={MySQL ODBC 5.1 Driver};SERVER=localhost;UID=xxxxx;PWD=xxxxx;database=xxxxx;Option=3;" 'Open the recordset object executing the SQL statement and return records Recordset.Open SQL,Connection Display page/loop: Dim counter counter = 0 While Not Recordset.EOF counter = counter + 1 response.write("<div><td width='200' valign='top' align='center'><a href='" & Recordset("element_6") & "' style='text-decoration: none;'><div id='ad_header'>" & Recordset("element_3") & "</div><div id='store_name' valign='bottom'>" & Recordset("element_5") & "</div><img id='photo-small-img' src='http://xyz.com/files/" & Recordset("element_7") & "' /><br /><div id='ad_details'>"& Recordset("element_4") & "</div></a></td></div>") Recordset.MoveNext If counter = 3 Then Response.Write("</tr><tr>") counter = 0 End If Wend

Read the article
Is valgrind crazy or is this is a genuine std map iterator memory leak?

- by Alberto Toglia

Well, I'm very new to Valgrind and memory leak profilers in general. And I must say it is a bit scary when you start using them cause you can't stop wondering how many leaks you might have left unsolved before! To the point, as I'm not an experienced in c++ programmer, I would like to check if this is certainly a memory leak or is it that Valgrind is doing a false positive? typedef std::vector<int> Vector; typedef std::vector<Vector> VectorVector; typedef std::map<std::string, Vector*> MapVector; typedef std::pair<std::string, Vector*> PairVector; typedef std::map<std::string, Vector*>::iterator IteratorVector; VectorVector vv; MapVector m1; MapVector m2; vv.push_back(Vector()); m1.insert(PairVector("one", &vv.back())); vv.push_back(Vector()); m2.insert(PairVector("two", &vv.back())); IteratorVector i = m1.find("one"); i->second->push_back(10); m2.insert(PairVector("one", i->second)); m2.clear(); m1.clear(); vv.clear(); Why is that? Shouldn't the clear command call the destructor of every object and every vector? Now after doing some tests I found different solutions to the leak: 1) Deleting the line i-second-push_back(10); 2) adding a delete i-second; after it's been used. 3) Deleting the second vv.push_back(Vector()); and m2.insert(PairVector("two", &vv.back())); statements. Using solution 2) makes Valgring print: 10 allocs, 11 frees Is that OK? As I'm not using new why should I delete? Thanks, for any help!

Read the article
How to Automatically Refresh Data on Page Using Ajax on an Interval?

- by Karnak

I would like to load an XML file every 30 seconds and display its contents inside an HTML page. So far I know how to load the file, but I don't know how to automatically refresh it and display its updated contents. It would also be great if it did some error checking and if it displayed error.png image when it's not able to load data.xml file. Here is my code: <head> <script> window.XMLHttpRequest { xmlhttp = new XMLHttpRequest(); } xmlhttp.open("GET", "data.xml", false); xmlhttp.send(); loadXMLDoc = xmlhttp.responseXML; f = loadXMLDoc.getElementsByTagName("foo") function buildBar(i) { qux = (f[i].getElementsByTagName("qux")[0].childNodes[0].nodeValue); document.getElementById("displayBar").innerHTML = qux; } </script> </head> <body> <script> document.write("<ul>"); for (var i = 0; i < f.length; i++) { document.write("<li onclick='buildBar(" + i + ")'>"); document.write(f[i].getElementsByTagName("bar")[0].childNodes[0].nodeValue); document.write("</li>"); } document.write("</ul>"); </script> <div id="displayBar"> </div> </body> After searching the internet for a few hours I found many examples on how to do this, but I didn't know how to implement it in my particular case. I am not a programmer, so please be kind. I would really appriciate any help. It would mean a lot.

Read the article
Numpy/Python performing terribly vs. Matlab

- by Nissl

Novice programmer here. I'm writing a program that analyzes the relative spatial locations of points (cells). The program gets boundaries and cell type off an array with the x coordinate in column 1, y coordinate in column 2, and cell type in column 3. It then checks each cell for cell type and appropriate distance from the bounds. If it passes, it then calculates its distance from each other cell in the array and if the distance is within a specified analysis range it adds it to an output array at that distance. My cell marking program is in wxpython so I was hoping to develop this program in python as well and eventually stick it into the GUI. Unfortunately right now python takes ~20 seconds to run the core loop on my machine while MATLAB can do ~15 loops/second. Since I'm planning on doing 1000 loops (with a randomized comparison condition) on ~30 cases times several exploratory analysis types this is not a trivial difference. I tried running a profiler and array calls are 1/4 of the time, almost all of the rest is unspecified loop time. Here is the python code for the main loop: for basecell in range (0, cellnumber-1): if firstcelltype == np.array((cellrecord[basecell,2])): xloc=np.array((cellrecord[basecell,0])) yloc=np.array((cellrecord[basecell,1])) xedgedist=(xbound-xloc) yedgedist=(ybound-yloc) if xloc>excludedist and xedgedist>excludedist and yloc>excludedist and yedgedist>excludedist: for comparecell in range (0, cellnumber-1): if secondcelltype==np.array((cellrecord[comparecell,2])): xcomploc=np.array((cellrecord[comparecell,0])) ycomploc=np.array((cellrecord[comparecell,1])) dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2) dist=round(dist) if dist>=1 and dist<=analysisdist: arraytarget=round(dist*analysisdist/intervalnumber) addone=np.array((spatialraw[arraytarget-1])) addone=addone+1 targetcell=arraytarget-1 np.put(spatialraw,[targetcell,targetcell],addone) Here is the matlab code for the main loop: for basecell = 1:cellnumber; if firstcelltype==cellrecord(basecell,3); xloc=cellrecord(basecell,1); yloc=cellrecord(basecell,2); xedgedist=(xbound-xloc); yedgedist=(ybound-yloc); if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist); for comparecell = 1:cellnumber; if secondcelltype==cellrecord(comparecell,3); xcomploc=cellrecord(comparecell,1); ycomploc=cellrecord(comparecell,2); dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2); if (dist>=1) && (dist<=100.4999); arraytarget=round(dist*analysisdist/intervalnumber); spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1; end end end end end end Thanks!

Read the article
Achieving C# "readonly" behavior in C++

- by Tommy Fisk

Hi guys, this is my first question on stack overflow, so be gentle. Let me first explain the exact behavior I would like to see. If you are familiar with C# then you know that declaring a variable as "readonly" allows a programmer to assign some value to that variable exactly once. Further attempts to modify the variable will result in an error. What I am after: I want to make sure that any and all single-ton classes I define can be predictably instantiated exactly once in my program (more details at the bottom). My approach to realizing my goal is to use extern to declare a global reference to the single-ton (which I will later instantiate at a time I choose. What I have sort of looks like this, namespace Global { extern Singleton& mainInstance; // not defined yet, but it will be later! } int main() { // now that the program has started, go ahead and create the singleton object Singleton& Global::mainInstance = Singleton::GetInstance(); // invalid use of qualified name Global::mainInstance = Singleton::GetInstance(); // doesn't work either :( } class Singleton { /* Some details ommited */ public: Singleton& GetInstance() { static Singleton instance; // exists once for the whole program return instance; } } However this does not really work, and I don't know where to go from here. Some details about what I'm up against: I'm concerned about threading as I am working on code that will deal with game logic while communicating with several third-party processes and other processes I will create. Eventually I would have to implement some kind of synchronization so multiple threads could access the information in the Singleton class without worry. Because I don't know what kinds of optimizations I might like to do, or exactly what threading entails (never done a real project using it), I was thinking that being able to predictably control when Singletons were instantiated would be a Good Thing. Imagine if Process A creates Process B, where B contains several Singletons distributed against multiple files and/or libraries. It could be a real nightmare if I can not reliably ensure the order these singleton objects are instantiated (because they could depend on each other, and calling methods on a NULL object is generally a Bad Thing). If I were in C# I would just use the readonly keyword, but is there any way I can implement this (compiler supported) behavior in C++? Is this even a good idea? Thanks for any feedback.

Read the article
When may I ask a question to fellow developers? (Rules before asking questions).

- by Zwei Steinen

I assigned a quite simple task to one junior developer today, and he kept pinging me EVERY 5 minutes for HOURS, asking STEP BY STEP, what to do. Whenever something went wrong, he simply copy&pasted the log and basically wrote, "An exception occurred. What should I do?" So I finally had to tell him, "If you want to be a developer, please start thinking a little bit. Read the error message. That's what they are for!". I also however, tell junior developers to ask questions before spending too much time trying to solve it themselves. This might sound contradictory, but I feel there is some kind of an implicit rule that distinguishes questions that should be asked fairly quickly and that should not (and I try to follow those rules when I ask questions..) So my question is, do you have any rules that you follow, or expect others to follow on asking questions? If so, what are they? Let me start with my own. If you have struggled for more than 90 min, you may ask that question (exceptions exists). If you haven't struggled for more than 15 min, you may not ask that question (if you are sure that the answer can not be found within 15 min, this rule does not have to apply). If it is completely out of your domain and you do not plan to learn that domain, you may ask that question after 15 min (e.g. if I am a java programmer and need to back up the DB, I may ask the DBA what procedure to follow after googling for 15 min). If it is a "local" question, whose answer is difficult to derive or for which resources is difficult to get (e.g. asking an colleague "what method xxx does" etc.), you may ask that question after 15 min. If the answer for it is difficult to derive, and you know that the other person knows the answer, you may ask the question after 15 min (e.g. asking a hibernate expert "What do I need to change else to make this work?". If the process to derive the answer is interesting and is a good learning opportunity, you may ask for hints but you may not ask for answers! What are your rules?

Read the article
c# video equivalent to image.fromstream? Or changing the scope of the following script to allow vide

- by Daniel

The following is a part of an upload class in a c# script. I'm a php programmer, I've never messed with c# much but I'm trying to learn. This upload script will not handle anything except images, I need to adapt this class to handle other types of media also, or rewrite it all together. If I'm correct, I realize that using (Image image = Image.FromStream(file.InputStream)) basically says that the scope of the following is Image, only an image can be used or the object is discarded? And also that the variable image is being created from an Image from the file stream, which I understand to be, like... the $_FILES array in php? I dunno, I don't really care about making thumbnails right now either way, so if this can be taken out and still process the upload I'm totally cool with that, I just haven't had any luck getting this thing to take anything but images, even when commenting out that whole part of the class... protected void Page_Load(object sender, EventArgs e) { string dir = Path.Combine(Request.PhysicalApplicationPath, "files"); if (Request.Files.Count == 0) { // No files were posted Response.StatusCode = 500; } else { try { // Only one file at a time is posted HttpPostedFile file = Request.Files[0]; // Size limit 100MB if (file.ContentLength > 102400000) { // File too large Response.StatusCode = 500; } else { string id = Request.QueryString["userId"]; string[] folders = userDir(id); foreach (string folder in folders) { dir = Path.Combine(dir, folder); if (!Directory.Exists(dir)) Directory.CreateDirectory(dir); } string path = Path.Combine(dir, String.Concat(Request.QueryString["batchId"], "_", file.FileName)); file.SaveAs(path); // Create thumbnail int dot = path.LastIndexOf('.'); string thumbpath = String.Concat(path.Substring(0, dot), "_thumb", path.Substring(dot)); using (Image image = Image.FromStream(file.InputStream)) { // Find the ratio that will create maximum height or width of 100px. double ratio = Math.Max(image.Width / 100.0, image.Height / 100.0); using (Image thumb = new Bitmap(image, new Size((int)Math.Round(image.Width / ratio), (int)Math.Round(image.Height / ratio)))) { using (Graphics graphic = Graphics.FromImage(thumb)) { // Make sure thumbnail is not crappy graphic.SmoothingMode = SmoothingMode.HighQuality; graphic.InterpolationMode = InterpolationMode.High; graphic.CompositingQuality = CompositingQuality.HighQuality; // JPEG ImageCodecInfo codec = ImageCodecInfo.GetImageEncoders()[1]; // 90% quality EncoderParameters encode = new EncoderParameters(1); encode.Param[0] = new EncoderParameter(Encoder.Quality, 90L); // Resize graphic.DrawImage(image, new Rectangle(0, 0, thumb.Width, thumb.Height)); // Save thumb.Save(thumbpath, codec, encode); } } } // Success Response.StatusCode = 200; } } catch { // Something went wrong Response.StatusCode = 500; } } }

Read the article
Inlining an array of non-default constructible objects in a C++ class

- by porgarmingduod

C++ doesn't allow a class containing an array of items that are not default constructible: class Gordian { public: int member; Gordian(int must_have_variable) : member(must_have_variable) {} }; class Knot { Gordian* pointer_array[8]; // Sure, this works. Gordian inlined_array[8]; // Won't compile. Can't be initialized. }; As even beginner C++ users know, the language guarantees that all members are initialized when constructing a class. And it doesn't trust the user to initialize everything in the constructor - one has to provide valid arguments to the constructors of all members before the body of the constructor even starts. Generally, that's a great idea as far as I'm concerned, but I've come across a situation where it would be a lot easier if I could actually have an array of non-default constructible objects. The obvious solution: Have an array of pointers to the objects. This is not optimal in my case, as I am using shared memory. It would force me to do extra allocation from an already contended resource (that is, the shared memory). The entire reason I want to have the array inlined in the object is to reduce the number of allocations. This is a situation where I would be willing to use a hack, even an ugly one, provided it works. One possible hack I am thinking about would be: class Knot { public: struct dummy { char padding[sizeof(Gordian)]; }; dummy inlined_array[8]; Gordian* get(int index) { return reinterpret_cast<Gordian*>(&inlined_array[index]); } Knot() { for (int x = 0; x != 8; x++) { new (get(x)) Gordian(x*x); } } }; Sure, it compiles, but I'm not exactly an experienced C++ programmer. That is, I couldn't possibly trust my hacks less. So, the questions: 1) Does the hack I came up with seem workable? What are the issues? (I'm mainly concerned with C++0x on newer versions of GCC). 2) Is there a better way to inline an array of non-default constructible objects in a class?

Read the article
Boost MultiIndex - objects or pointers (and how to use them?)?

- by Sarah

I'm programming an agent-based simulation and have decided that Boost's MultiIndex is probably the most efficient container for my agents. I'm not a professional programmer, and my background is very spotty. I've two questions: Is it better to have the container contain the agents (of class Host) themselves, or is it more efficient for the container to hold Host *? Hosts will sometimes be deleted from memory (that's my plan, anyway... need to read up on new and delete). Hosts' private variables will get updated occasionally, which I hope to do through the modify function in MultiIndex. There will be no other copies of Hosts in the simulation, i.e., they will not be used in any other containers. If I use pointers to Hosts, how do I set up the key extraction properly? My code below doesn't compile. // main.cpp - ATTEMPTED POINTER VERSION ... #include <boost/multi_index_container.hpp> #include <boost/multi_index/hashed_index.hpp> #include <boost/multi_index/member.hpp> #include <boost/multi_index/ordered_index.hpp> #include <boost/multi_index/mem_fun.hpp> #include <boost/tokenizer.hpp> typedef multi_index_container< Host *, indexed_by< // hash by Host::id hashed_unique< BOOST_MULTI_INDEX_MEM_FUN(Host,int,Host::getID) > // arg errors here > // end indexed_by > HostContainer; ... int main() { ... HostContainer testHosts; Host * newHostPtr; newHostPtr = new Host( t, DOB, idCtr, 0, currentEvents ); testHosts.insert( newHostPtr ); ... } I can't find a precisely analogous example in the Boost documentation, and my knowledge of C++ syntax is still very weak. The code does appear to work when I replace all the pointer references with the class objects themselves. As best I can read it, the Boost documentation (see summary table at bottom) implies I should be able to use member functions with pointer elements.

Read the article
Apache outputs all urls of a second domain as a subfolder of the primary domain name

- by s_rathbone

Hi all, would anyone be able to possibly give me some guidance.. Basically, i have a 'shared hosting' account with a large internet hosting provider, and my account lets me have multiple seperate domains within this folder structure.(note: not aliased domains and not sub domains). so, my goal is to have 2 domains set up. i have already purchased the two domain names i need: The first domain is the 'primary' domain name for the root folder(eg. www.example1.com) and the second domain name is set for one of its sub folders(eg. www.example2.com is set to the folder www.example1.com/sites/music). The problem is that when apache returns a page of the second domain back to the browser, apache writes the hyperlinks as if it's a sub folder of the first domain ( eg. www.example2.com/index.html. comes out as http://www.example1.com/sites/music/index.html). Now, I have done some reading on this, looking though "Apache: the definitive guide"(o'reilly), and although it was useful, couldn't really find the answer. i'm guessing this issue is most likely an apache setup issue in http.conf, rather than an issue with the hosting company itself (which is why im posting it here) and I have also been to the official documentation for apache site, and i am guessing i might need to use something like the rewritebase directive in htaccess files.. but im really not sure, im more of a java programmer guy, and have been struggling with this for a couple of days. Any guidance would be REALLY appreciated. If it helps, my hosting company is godaddy, and my sites are hosted on linux. My problem was originally with wordpress which i reinstalled a number of times in various ways to correct the problem, but ive just done a test with a very simple static html, and it still has the same issue with relative urls like this: <html> <head></head><body><a href="images/dog.html">Pictures of Dogs</a></body> </html> However, it is fine if i hardcode the urls like this: <html> <head></head><body><a href="http://www.example2.com/images/dog.html">Pictures of Dogs</a></body> </html> Thanks heaps, Steve R NOW FIXED Ok, the problem has now been fixed, and i didn't need to modify any .conf or .htaccess files. The problem was, that when I went to install the second application into a second domain from the godaddy site, one of the setup questions is that it asks you which site you want it installed to. after that it asks for the desired folder path. However, the problem was that the second domain name was already pointing to the correct subfolder of the primary domain. So when I started installing wordpress again and came to the menu to select which site it was for, and it listed only the primary domain as an option, i assumed that this was like a label of "which hosting account?", or "which primary domain will your application will be installed under?" because I already knew that in the next step i was specifiying the folder. In order to correct this, you must make sure that your second domain is added to your domain list so that it will be listed as an option during the installation process. For further details please read tystips.com/archives/52/how2-save-money-host-multiple-wordpress-blogs-on-a-single-godaddy-hosting-account/

Read the article
I cut-to-move DCIM folder to ext SD when an auto android OS update popped up b4 I could choose target - Cannot recover 200+ photos

- by ZeroG

I was downloading my Exhibit II's DCIM camera folder (with month's of photos inside) to its external SD card, in order to transfer them into my laptop. In my overconfidence, I hurriedly chose cut-to-move (rather than copy-to-move) when KABOOM! —an automatic Android OS update popped up before I could choose the target!!! I figured everything was in cache & calmly tried to go through with the update. But that was not a typically seamless event. It showed downloading icon but hmm… since I rooted the phone it brought the command line up & recovery sequence. But neither Android nor I had yet downloaded any alternate custom ROM Files to internal SD to update from! So were they trying to make me unroot my phone by giving me some bogus update on the fly or just give me a hard time in trying to hand me down an unrooted ROM that I'd have to figure out how to root again? Yes, I know there was that blurb about overwriting a file of the same name but I was trying to shake the darn stubborn update being forced on my phone during this precarious moment. I thought I had frozen or turned off all those auto-updates previously. Anyway, phones are small & fingers are big (sigh)... I tried to reboot into safe mode but the resultant photo file was partially overwritten (200 files had names but Zero bytes in them). I thought maybe it was still hung in cache or deposited somewhere else but I have searched everywhere with file managers. Since I did not have Titanium backing up camera, photo folder or gallery, I cannot recover 200+ photos. Dumb. You can understand my dilemma as I am involved in the arts & although just a camera phone, most of these photos were historic & aesthetic or at least as to subject matter. Photo-ops don't reoccur. I have tried a couple of recovery apps from the market like Search Duplicates & Recover to no avail. I was only able to salvage stuff I'd sent out in messages. I've got several decades in computers & this is such a miserable beginner's piece of bad luck I can't believe it happened to me. They were precious photos! Yes, I turned on Titanium since & yes I even tried USB to laptop recoveries. Being on a MacBookPro I'm trying androidfiletransfer.dmg, but I'd have to upgrade to Peach Sunrise to get above Android 3.0 for that App to recognize the phone via USB & the programmer says installation zeros your data, so that pretty much toasts any secret hidden places where these photos may have been deposited. Don't want to do that & am still trying to find them. They certainly didn't make it to my external SD Card. If any of you techies out there know anything, please help & thanks. Despite decades of being in computing, unfamiliar & ever-changing hard or software can humble even the most seasoned veterans.

Read the article
How to set up a centralized backup server with lots of offsite workstations, intermittent internet connectivity, and stubborn users?

- by Zac B

This might be an impossible question. Context: We have a bunch of computers across around 1000 users. We have a centralized office where 900 of the users work, most of the time. Most of the computers are laptops. They are very frequently coming on and off the network for hours at a time. Users often take their computers home and do lots of work from home. In addition, there are a handful of users who work elsewhere in the country, who are offline (no internet connection whatsoever) for more than half of the time they use their machines. All of the machines are Windows 7/XP. Problem: People are always losing data. One day someone accidentally deletes a bunch of files. The next day someone else installs a bad driver or tries to mess with something in system32 and needs a personal data backup/reinstall of Windows. Because of how many of our business operations are done without an internet connection, and how frequently computers come on- and offline, it's unfeasible to make users use network storage for all of their data. We tried giving them Dropboxes, and they stored their files elsewhere. We bought and deployed Altiris, and they uninstalled it and blamed us when they couldn't get files back that they accidentally deleted while they were offline and hadn't taken a backup in months. We tried teaching them backup best-practices, and using scheduled sync tools to upload things to the network drives, and they turned them off because they "looked like viruses". It doesn't help that many of these users are pretty high up in the business and are not amicable to any sort of "you need to do something regularly because we say so" solution. Question: Other than finding another job where IT is treated differently and users are willing to follow best practices, how would people recommend I implement a file backup solution that supports the following: Backs up to a centralized server over LAN or WAN whenever a network link becomes available, or on a schedule. Supports interrupted/resumed backups (and hopefully file-delta only backups), since connections to the network (WAN or LAN) are often slow and only open for half an hour or so. Supports relatively rapid, "I accidentally deleted the TPS reports! Oh no!" single-file recovery, ideally administered from the central backup server rather than the client PC. Supports local-to-local file delta backup on a schedule, so that users without a network connection for a few days can still retrieve accidental deletions or whatnot. Ideally, the local stored backups would be pushed up to the server whenever network link is available. Isn't configurable on the clients without certain credentials. Because the CFOs (who won't give up their admin rights on the domain) will disable it if they can. Backs up the entire hard drive. There are people who are self-righteous about storing things in C:\, or in the recycle bin, or in the C:\Windows dir (yes, I know). I'm fine integrating multiple products/solutions, or scripting different programs together myself (I'm a somewhat competent programmer), but I've been drawing a blank on where to start. Dropbox is folder-specific, Altiris doesn't cope with LAN outages or interrupted/resumed backups, Volume Shadow Copy is awesome for a local-to-local solution, but I don't know how to push days of stored shadow copies up to a server in a 2 hour window of network access. The company is fine with spending decent money on this, thousands (USD) on a server, and hundreds on clients, if necessary. I want to emphasize that this isn't a shopping list request. While I wish there was a program out there that did what I want, I've looked pretty hard, and not found anything that fits the bill. Instead, I'm hoping for ideas on where to start hacking things together from scratch/from different technologies to make something stable that works. Cheers!

Read the article
Portable scripting language for a multi-server admin?

- by Aaron

Please Note: Portable as in portableapps.com, not the traditional definition. Originally posted on stackoverflow.com, asking here at another user's suggestion. I'm a DBA and sysadmin, mostly for Windows machines running SQL Server. I'm looking for a programming/scripting language for Windows that doesn't require Admin access or an installer, needing no install process other than expanding it into a folder. My intent is to have a language for automation on which I can standardize. Up to this point, I've been using a combination of batch files and Unix shell, using sh.exe from UnxUtils but it's far from a perfect solution. I've evaluated a handful of options, all of them have at least one serious shortcoming or another. I have a strong preference for something open source or dual license, but I'm more interested in finding the right tool than anything else. Not interested that anything that relies on Cygwin or Java, but at this point I'd be fine with something that needs .NET. Requirements: Manageable footprint (1-100 files, under 30 MB installed) Run on Windows XP and Server (2003+) No installer (exe, msi) Works with external pipes, processes, and files Support for MS SQL Server or ODBC connections Bonus Points: Open Source FFI for calling functions in native DLLs GUI support (native or gtk, wx, fltk, etc) Linux, AIX, and/or OS X support Dynamic, object oriented and/or functional, interpreted or bytecode compiled; interactive development Able to package or compile scripts into executables So far I've tried: Ruby: 148 MB on disk, 23000 files Portable Python: 54 MB on disk, 2800 files Strawberry Perl: 123 MB on disk, 3600 files REBOL: Great, except closed source and no MSSQL or ODBC in free version Squeak Smalltalk: Great, except poor support for scripting ---- cut: points of clarification ---- Why all the limitations? I realize some of my criteria seem arbitrarily confining. It's primarily a product my environment. I work as a SQL Server DBA and backup Unix admin at a division of a large company. In addition to near a hundred boxes running some version or another of SQL Server on Windows, I also support the SQL Server Express Edition installs on over a thousand machines in the field. Because of our security policies, I don't login rights on every machine. Often enough, an issue comes up and I'm given local Admin for some period of time. Often enough, it's some box I've never touched and don't have my own environment setup yet. I may have temporary admin rights on the box, but I'm not the admin for the machine- I'm just the DBA. I've no interest in stepping on the toes of the Windows admins, nor do I want to take over any of their duties. If I bring up "installing" something, suddenly it becomes a matter of interest for Production Control and the Windows admins; if I'm copying up a script, no one minds. The distinction may not mean much to the readers, but if someone gets the wrong idea I've suddenly got a long wait and significant overhead before I can get the tool installed and get the problem solved. That's why I want something that can be copied and run in the manner of a portable app. What about the small footprint? My company has three divisions, each in a different geographical location, and one of them is a new acquisition. We have different production control/security policies in each division. I support our MSSQL databases in all three divisions. The field machines are spread around the US, sometimes connecting to the VPN over very slow links. Installing Ruby \using psexec has taken a long time over these connections. In these instances, the bigger time waster seems to be archives with thousands and thousands of files rather than their sheer size. You could say I'm spoiled by Unix, where the admins usually have at least some modern scripting language installed; I'd use PowerShell, but I don't know it well and more importantly it isn't everywhere I need to work. It's a regular occurrence that I need to write, deploy and execute some script on short notice on some machine I've never on which logged in. Since having Ruby or something similar installed on every machine I'll ever need to touch is effectively impossible because of the approvals, time and and Windows admin labor needed I makes more sense find a solution that allows me to work on my own terms.

Read the article
Capistrano + Nginx + Passenger = 403

- by slimchrisp

I asked this over at stackoverflow as well, but still haven't received any answers that have helped me to solve this problem. I have spent almost a week at this point trying to solve the issue, and I'm just not making any headway. It seems that this issue is pretty common, but none of the solutions I found online work for me. A buddy of mine is actually creating the same setup, and he is having the same issue. After a few days stuck with the 403 error I started over using this tutorial: http://blog.ninjahideout.com/posts/a-guide-to-a-nginx-passenger-and-rvm-server I had hoped starting from scratch using this tutorial would work, but no dice. Either way, if you view the tutorial you can see what steps I have taken. Here is essentially what I have going on. I have a VPS account on linode.com Server OS is Ubuntu 10.04 Local OS (shouldn't matter, but just so you know) used to deploy with Capistrano is Snow Leopard 10.6.6 I use RVM on the server. Version is 1.2.2 I was previously on ruby-1.9.2-p0 [ i386 ], but per the tutorial listed above I switched to ree-1.8.7-2010.02 [ i386 ]. Running 'which ruby' from the command line verifies that I am using 1.8.7 with the following output: /usr/local/rvm/rubies/ree-1.8.7-2010.02/bin/ruby passenger -v prints the following: Phusion Passenger version 3.0.2 Running 'nginx -v' gives me a message that the command nginx could not be found. The server is definitely there and running as I can use nginx to serve static files, but this could have something to do with my problem. I have two users dealing with the install. root which I used to install everything, and deployer which is a user I created specifically to for deploying my applications My web app directory is in the deployer user's home directory as follows: /home/deployer/webapps/mysite.com/public Per Capistrano default deploy, a symbolic link called current is created in the public folder, and points to /home/deployer/webapps/mysite.com/public/releases/most_current_release I have chmodded the deployer directory recursively to 777 /opt/nginx permissions: rwxr-xr-x /usr/local/rvm/gems/ree-1.8.7-2010.02/gems/passenger-3.0.2 permissions: rwxrwsrwx My nginx config file has gone through just short of eternity variations, but currently looks like this: ================================================================================== worker_processes 1; events { worker_connections 1024; } http { passenger_root /usr/local/rvm/gems/ree-1.8.7-2010.02/gems/passenger-3.0.2; passenger_ruby /usr/local/rvm/bin/passenger_ruby; include mime.types; default_type application/octet-stream; sendfile on; keepalive_timeout 65; server { # listen *:80; server_name mysite.com www.mysite.com; root /home/deployer/webapps/mysite.com/public/current; passenger_enabled on; passenger_friendly_error_pages on; access_log logs/mysite.com/server.log; error_log logs/mysite.com/error.log info; error_page 500 502 503 504 /50x.html; location = /50x.html { root html; } } } ================================================================================== I bounce nginx, hit the site, and boom. 403, and logs say directory index of /home/deployer... is forbidden As others with a similar problem have said, you can drop an index.html into the public/releases/current_release and it will render. But rails no worky. That's basically it. At this point I have just about completely exhausted every possible solution attempt I can think of. I am a programmer and definitely not a sysadmin, so I am 99% sure this has something to do with permissions that I have hosed, but for the life of me I just can't figure out where. If anyone can help I would really really appreciate it. If there's any specific permission things you want me to check (ie groups/permissions), can you please include the commands to do so as well. Hopefully this will help others in the future who read this post. Let me know if there is any other information I can provide, and thanks in advance!!!

Read the article
Issues Converting Plain Text Into Microsoft Word Bulleted Lists

- by user787832

I'm a programmer. I hate status reports. I found a way to live with it. While I am working in my IDE ( Visual Slickedit ) I keep a plain text file open in one of the file/buffer tabs. As I finish things I just jot down a quick note into that file. At the end of the week that becomes my weekly status report. Example entries: The Datatables.net plugin runs very slowly in IE 8 with more than 2,000 records. I changed the way I did the server side code to process the data to make less work for the plugin to get decent performance for the IE 8 users. I made a class to wrap data from the new data collection objects into the legacy data holder objects. This will let the new database code be backward compatible with the legacy code until we can replace it. I found the bug reported by Jane. The software is fine. The database we use for the test site has data that is corrupted in a way it wouldn't be for production site At the end of the month I go back to each weekly *.txt file and paste all of the entries into a MS Word file for a monthly report. I give the monthly report to a liason to the contracting company who has to compile everyone's monthly reports into a single MS Word 2007 document. His problem, soon to be my problem, comes when he highlights my paragraphs like the ones above to put bullets in front of my paragraphs. When he highlights my notes to put bullets in front of them with MS Word 2007, Word rearranges the text a bit and the new line chars/carriage returns stagger the text so the text is no longer in neat chunks. This: I found the bug reported by Jane. The software is fine. The database we use for the test site has data that is corrupted in a way it wouldn't be for production site Becomes This: I found the bug reported by Jane. The software is fine. The database we use for the test site has data that is corrupted in a way it wouldn't be for production site I tried turning word wrap on in my IDE for the text files I put my status notes in. It just puts some kind of newline character in anyway. Searching/Replacing those chars in the text files has the result of destroying the paragraphs. Once my notes are pasted into MS Word, Word automatically translates them into paragraph breaks. Searching/Replacing them there has similar results. Blank lines separating the notes disappears. One big mess. What I would like is to be able to keep adding my status notes to a text file as I am now, but do something different when I paste the notes into MS Word such that my liason can select the text, hit the bulleting command and NOT have the staggered text as shown above. Any ideas? Thanks much in advance Steve

Read the article
What *exactly* gets screwed when I kill -9 or pull the power?

- by Mike

Set-Up I've been a programmer for quite some time now but I'm still a bit fuzzy on deep, internal stuff. Now. I am well aware that it's not a good idea to either: kill -9 a process (bad) spontaneously pull the power plug on a running computer or server (worse) However, sometimes you just plain have to. Sometimes a process just won't respond no matter what you do, and sometimes a computer just won't respond, no matter what you do. Let's assume a system running Apache 2, MySQL 5, PHP 5, and Python 2.6.5 through mod_wsgi. Note: I'm most interested about Mac OS X here, but an answer that pertains to any UNIX system would help me out. My Concern Each time I have to do either one of these, especially the second, I'm very worried for a period of time that something has been broken. Some file somewhere could be corrupt -- who knows which file? There are over 1,000,000 files on the computer. I'm often using OS X, so I'll run a "Verify Disk" operation through the Disk Utility. It will report no problems, but I'm still concerned about this. What if some configuration file somewhere got screwed up. Or even worse, what if a binary file somewhere is corrupt. Or a script file somewhere is corrupt now. What if some hardware is damaged? What if I don't find out about it until next month, in a critical scenario, when the corruption or damage causes a catastrophe? Or, what if valuable data is already lost? My Hope My hope is that these concerns and worries are unfounded. After all, after doing this many times before, nothing truly bad has happened yet. The worst is I've had to repair some MySQL tables, but I don't seem to have lost any data. But, if my worries are not unfounded, and real damage could happen in either situation 1 or 2, then my hope is that there is a way to detect it and prevent against it. My Question(s) Could this be because modern operating systems are designed to ensure that nothing is lost in these scenarios? Could this be because modern software is designed to ensure that nothing lost? What about modern hardware design? What measures are in place when you pull the power plug? My question is, for both of these scenarios, what exactly can go wrong, and what steps should be taken to fix it? I'm under the impression that one thing that can go wrong is some programs might not have flushed their data to the disk, so any highly recent data that was supposed to be written to the disk (say, a few seconds before the power pull) might be lost. But what about beyond that? And can this very issue of 5-second data loss screw up a system? What about corruption of random files hiding somewhere in the huge forest of files on my hard drives? What about hardware damage? What Would Help Me Most Detailed descriptions about what goes on internally when you either kill -9 a process or pull the power on the whole system. (it seems instant, but can someone slow it down for me?) Explanations of all things that could go wrong in these scenarios, along with (rough of course) probabilities (i.e., this is very unlikely, but this is likely)... Descriptions of measures in place in modern hardware, operating systems, and software, to prevent damage or corruption when these scenarios occur. (to comfort me) Instructions for what to do after a kill -9 or a power pull, beyond "verifying the disk", in order to truly make sure nothing is corrupt or damaged somewhere on the drive. Measures that can be taken to fortify a computer setup so that if something has to be killed or the power has to be pulled, any potential damage is mitigated. Thanks so much!

Read the article
Week in Geek: US Govt E-card Scam Siphons Confidential Data Edition

- by Asian Angel

This week we learned how to “back up photos to Flickr, automate repetitive tasks, & normalize MP3 volume”, enable “stereo mix” in Windows 7 to record audio, create custom papercraft toys, read up on three alternatives to Apple’s flaky iOS alarm clock, decorated our desktops & app docks with Google icon packs, and more. Photo by alexschlegel. Random Geek Links It has been a busy week on the security & malware fronts and we have a roundup of the latest news to help keep you updated. Photo by TopTechWriter.US. US govt e-card scam hits confidential data A fake U.S. government Christmas e-card has managed to siphon off gigabytes of sensitive data from a number of law enforcement and military staff who work on cybersecurity matters, many of whom are involved in computer crime investigations. Security tool uncovers multiple bugs in every browser Michal Zalewski reports that he discovered the vulnerability in Internet Explorer a while ago using his cross_fuzz fuzzing tool and reported it to Microsoft in July 2010. Zalewski also used cross_fuzz to discover bugs in other browsers, which he also reported to the relevant organisations. Microsoft to fix Windows holes, but not ones in IE Microsoft said that it will release two security bulletins next week fixing three holes in Windows, but it is still investigating or working on fixing holes in Internet Explorer that have been reportedly exploited in attacks. Microsoft warns of Windows flaw affecting image rendering Microsoft has warned of a Windows vulnerability that could allow an attacker to take control of a computer if the user is logged on with administrative rights. Windows 7 Not Affected by Critical 0-Day in the Windows Graphics Rendering Engine While confirming that details on a Critical zero-day vulnerability have made their way into the wild, Microsoft noted that customers running the latest iteration of Windows client and server platforms are not exposed to any risks. Microsoft warns of Office-related malware Microsoft’s Malware Protection Center issued a warning this week that it has spotted malicious code on the Internet that can take advantage of a flaw in Word and infect computers after a user does nothing more than read an e-mail. *Refers to a flaw that was addressed in the November security patch releases. Make sure you have all of the latest security updates installed. Unpatched hole in ImgBurn disk burning application According to security specialist Secunia, a highly critical vulnerability in ImgBurn, a lightweight disk burning application, can be used to remotely compromise a user’s system. Hole in VLC Media Player Virtual Security Research (VSR) has identified a vulnerability in VLC Media Player. In versions up to and including 1.1.5 of the VLC Media Player. Flash Player sandbox can be bypassed Flash applications run locally can read local files and send them to an online server – something which the sandbox is supposed to prevent. Chinese auction site touts hacked iTunes accounts Tens of thousands of reportedly hacked iTunes accounts have been found on Chinese auction site Taobao, but the company claims it is unable to take action unless there are direct complaints. What happened in the recent Hotmail outage Mike Schackwitz explains the cause of the recent Hotmail outage. DOJ sends order to Twitter for Wikileaks-related account info The U.S. Justice Department has obtained a court order directing Twitter to turn over information about the accounts of activists with ties to Wikileaks, including an Icelandic politician, a legendary Dutch hacker, and a U.S. computer programmer. Google gets court to block Microsoft Interior Department e-mail win The U.S. Federal Claims Court has temporarily blocked Microsoft from proceeding with the $49.3 million, five-year DOI contract that it won this past November. Google Apps customers get email lockdown Companies and organisations using Google Apps are now able to restrict the email access of selected users. LibreOffice Is the Default Office Suite for Ubuntu 11.04 Matthias Klose has announced some details regarding the replacement of the old OpenOffice.org 3.2.1 packages with the new LibreOffice 3.3 ones, starting with the upcoming Ubuntu 11.04 (Natty Narwhal) Alpha 2 release. Sysadmin Geek Tips Photo by Filomena Scalise. How to Setup Software RAID for a Simple File Server on Ubuntu Do you need a file server that is cheap and easy to setup, “rock solid” reliable, and has Email Alerting? This tutorial shows you how to use Ubuntu, software RAID, and SaMBa to accomplish just that. How to Control the Order of Startup Programs in Windows While you can specify the applications you want to launch when Windows starts, the ability to control the order in which they start is not available. However, there are a couple of ways you can easily overcome this limitation and control the startup order of applications. Random TinyHacker Links Using Opera Unite to Send Large Files A tutorial on using Opera Unite to easily send huge files from your computer. WorkFlowy is a Useful To-do List Tool A cool to-do list tool that lets you integrate multiple tasks in one single list easily. Playing Flash Videos on iOS Devices Yes, you can play flash videos on jailbroken iPhones. Here’s a tutorial. Clear Safari History and Cookies On iPhone A tutorial on clearing your browser history on iPhone and other iOS devices. Monitor Your Internet Usage Here’s a cool, cross-platform tool to monitor your internet bandwidth. Super User Questions See what the community had to say on these popular questions from Super User this week. Why is my upload speed much less than my download speed? Where should I find drivers for my laptop if it didn’t come with a driver disk? OEM Office 2010 without media – how to reinstall? Is there a point to using theft tracking software like Prey on my laptop, if you have login security? Moving an “all-in-one” PC when turned on/off How-To Geek Weekly Article Recap Get caught up on your HTG reading with our hottest articles from this past week. How to Combine Rescue Disks to Create the Ultimate Windows Repair Disk How To Boot 10 Different Live CDs From 1 USB Flash Drive What is Camera Raw, and Why Would a Professional Prefer it to JPG? Did You Know Facebook Has Built-In Shortcut Keys? The How-To Geek Guide to Audio Editing: The Basics One Year Ago on How-To Geek Enjoy looking through our latest gathering of retro article goodness. Learning Windows 7: Create a Homegroup & Join a New Computer To It How To Disconnect a Machine from a Homegroup Use Remote Desktop To Access Other Computers On a Small Office or Home Network How To Share Files and Printers Between Windows 7 and Vista Allow Users To Run Only Specified Programs in Windows 7 The Geek Note That is all we have for you this week and we hope your first week back at work or school has gone very well now that the holidays are over. Know a great tip? Send it in to us at [email protected]. Photo by Pamela Machado. Latest Features How-To Geek ETC HTG Projects: How to Create Your Own Custom Papercraft Toy How to Combine Rescue Disks to Create the Ultimate Windows Repair Disk What is Camera Raw, and Why Would a Professional Prefer it to JPG? The How-To Geek Guide to Audio Editing: The Basics How To Boot 10 Different Live CDs From 1 USB Flash Drive The 20 Best How-To Geek Linux Articles of 2010 Arctic Theme for Windows 7 Gives Your Desktop an Icy Touch Install LibreOffice via PPA and Receive Auto-Updates in Ubuntu Creative Portraits Peek Inside the Guts of Modern Electronics Scenic Winter Lane Wallpaper to Create a Relaxing Mood Access Your Web Apps Directly Using the Context Menu in Chrome The Deep – Awesome Use of Metal Objects as Deep Sea Creatures [Video]

Read the article
Parallelism in .NET – Part 4, Imperative Data Parallelism: Aggregation

- by Reed

In the article on simple data parallelism, I described how to perform an operation on an entire collection of elements in parallel. Often, this is not adequate, as the parallel operation is going to be performing some form of aggregation. Simple examples of this might include taking the sum of the results of processing a function on each element in the collection, or finding the minimum of the collection given some criteria. This can be done using the techniques described in simple data parallelism, however, special care needs to be taken into account to synchronize the shared data appropriately. The Task Parallel Library has tools to assist in this synchronization. The main issue with aggregation when parallelizing a routine is that you need to handle synchronization of data. Since multiple threads will need to write to a shared portion of data. Suppose, for example, that we wanted to parallelize a simple loop that looked for the minimum value within a dataset: double min = double.MaxValue; foreach(var item in collection) { double value = item.PerformComputation(); min = System.Math.Min(min, value); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This seems like a good candidate for parallelization, but there is a problem here. If we just wrap this into a call to Parallel.ForEach, we’ll introduce a critical race condition, and get the wrong answer. Let’s look at what happens here: // Buggy code! Do not use! double min = double.MaxValue; Parallel.ForEach(collection, item => { double value = item.PerformComputation(); min = System.Math.Min(min, value); }); This code has a fatal flaw: min will be checked, then set, by multiple threads simultaneously. Two threads may perform the check at the same time, and set the wrong value for min. Say we get a value of 1 in thread 1, and a value of 2 in thread 2, and these two elements are the first two to run. If both hit the min check line at the same time, both will determine that min should change, to 1 and 2 respectively. If element 1 happens to set the variable first, then element 2 sets the min variable, we’ll detect a min value of 2 instead of 1. This can lead to wrong answers. Unfortunately, fixing this, with the Parallel.ForEach call we’re using, would require adding locking. We would need to rewrite this like: // Safe, but slow double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach(collection, item => { double value = item.PerformComputation(); lock(syncObject) min = System.Math.Min(min, value); }); This will potentially add a huge amount of overhead to our calculation. Since we can potentially block while waiting on the lock for every single iteration, we will most likely slow this down to where it is actually quite a bit slower than our serial implementation. The problem is the lock statement – any time you use lock(object), you’re almost assuring reduced performance in a parallel situation. This leads to two observations I’ll make: When parallelizing a routine, try to avoid locks. That being said: Always add any and all required synchronization to avoid race conditions. These two observations tend to be opposing forces – we often need to synchronize our algorithms, but we also want to avoid the synchronization when possible. Looking at our routine, there is no way to directly avoid this lock, since each element is potentially being run on a separate thread, and this lock is necessary in order for our routine to function correctly every time. However, this isn’t the only way to design this routine to implement this algorithm. Realize that, although our collection may have thousands or even millions of elements, we have a limited number of Processing Elements (PE). Processing Element is the standard term for a hardware element which can process and execute instructions. This typically is a core in your processor, but many modern systems have multiple hardware execution threads per core. The Task Parallel Library will not execute the work for each item in the collection as a separate work item. Instead, when Parallel.ForEach executes, it will partition the collection into larger “chunks” which get processed on different threads via the ThreadPool. This helps reduce the threading overhead, and help the overall speed. In general, the Parallel class will only use one thread per PE in the system. Given the fact that there are typically fewer threads than work items, we can rethink our algorithm design. We can parallelize our algorithm more effectively by approaching it differently. Because the basic aggregation we are doing here (Min) is communitive, we do not need to perform this in a given order. We knew this to be true already – otherwise, we wouldn’t have been able to parallelize this routine in the first place. With this in mind, we can treat each thread’s work independently, allowing each thread to serially process many elements with no locking, then, after all the threads are complete, “merge” together the results. This can be accomplished via a different set of overloads in the Parallel class: Parallel.ForEach<TSource,TLocal>. The idea behind these overloads is to allow each thread to begin by initializing some local state (TLocal). The thread will then process an entire set of items in the source collection, providing that state to the delegate which processes an individual item. Finally, at the end, a separate delegate is run which allows you to handle merging that local state into your final results. To rewriting our routine using Parallel.ForEach<TSource,TLocal>, we need to provide three delegates instead of one. The most basic version of this function is declared as: public static ParallelLoopResult ForEach<TSource, TLocal>( IEnumerable<TSource> source, Func<TLocal> localInit, Func<TSource, ParallelLoopState, TLocal, TLocal> body, Action<TLocal> localFinally ) The first delegate (the localInit argument) is defined as Func<TLocal>. This delegate initializes our local state. It should return some object we can use to track the results of a single thread’s operations. The second delegate (the body argument) is where our main processing occurs, although now, instead of being an Action<T>, we actually provide a Func<TSource, ParallelLoopState, TLocal, TLocal> delegate. This delegate will receive three arguments: our original element from the collection (TSource), a ParallelLoopState which we can use for early termination, and the instance of our local state we created (TLocal). It should do whatever processing you wish to occur per element, then return the value of the local state after processing is completed. The third delegate (the localFinally argument) is defined as Action<TLocal>. This delegate is passed our local state after it’s been processed by all of the elements this thread will handle. This is where you can merge your final results together. This may require synchronization, but now, instead of synchronizing once per element (potentially millions of times), you’ll only have to synchronize once per thread, which is an ideal situation. Now that I’ve explained how this works, lets look at the code: // Safe, and fast! double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach( collection, // First, we provide a local state initialization delegate. () => double.MaxValue, // Next, we supply the body, which takes the original item, loop state, // and local state, and returns a new local state (item, loopState, localState) => { double value = item.PerformComputation(); return System.Math.Min(localState, value); }, // Finally, we provide an Action<TLocal>, to "merge" results together localState => { // This requires locking, but it's only once per used thread lock(syncObj) min = System.Math.Min(min, localState); } ); Although this is a bit more complicated than the previous version, it is now both thread-safe, and has minimal locking. This same approach can be used by Parallel.For, although now, it’s Parallel.For<TLocal>. When working with Parallel.For<TLocal>, you use the same triplet of delegates, with the same purpose and results. Also, many times, you can completely avoid locking by using a method of the Interlocked class to perform the final aggregation in an atomic operation. The MSDN example demonstrating this same technique using Parallel.For uses the Interlocked class instead of a lock, since they are doing a sum operation on a long variable, which is possible via Interlocked.Add. By taking advantage of local state, we can use the Parallel class methods to parallelize algorithms such as aggregation, which, at first, may seem like poor candidates for parallelization. Doing so requires careful consideration, and often requires a slight redesign of the algorithm, but the performance gains can be significant if handled in a way to avoid excessive synchronization.

Read the article
Parallelism in .NET – Part 11, Divide and Conquer via Parallel.Invoke

- by Reed

Many algorithms are easily written to work via recursion. For example, most data-oriented tasks where a tree of data must be processed are much more easily handled by starting at the root, and recursively “walking” the tree. Some algorithms work this way on flat data structures, such as arrays, as well. This is a form of divide and conquer: an algorithm design which is based around breaking up a set of work recursively, “dividing” the total work in each recursive step, and “conquering” the work when the remaining work is small enough to be solved easily. Recursive algorithms, especially ones based on a form of divide and conquer, are often a very good candidate for parallelization. This is apparent from a common sense standpoint. Since we’re dividing up the total work in the algorithm, we have an obvious, built-in partitioning scheme. Once partitioned, the data can be worked upon independently, so there is good, clean isolation of data. Implementing this type of algorithm is fairly simple. The Parallel class in .NET 4 includes a method suited for this type of operation: Parallel.Invoke. This method works by taking any number of delegates defined as an Action, and operating them all in parallel. The method returns when every delegate has completed: Parallel.Invoke( () => { Console.WriteLine("Action 1 executing in thread {0}", Thread.CurrentThread.ManagedThreadId); }, () => { Console.WriteLine("Action 2 executing in thread {0}", Thread.CurrentThread.ManagedThreadId); }, () => { Console.WriteLine("Action 3 executing in thread {0}", Thread.CurrentThread.ManagedThreadId); } ); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Running this simple example demonstrates the ease of using this method. For example, on my system, I get three separate thread IDs when running the above code. By allowing any number of delegates to be executed directly, concurrently, the Parallel.Invoke method provides us an easy way to parallelize any algorithm based on divide and conquer. We can divide our work in each step, and execute each task in parallel, recursively. For example, suppose we wanted to implement our own quicksort routine. The quicksort algorithm can be designed based on divide and conquer. In each iteration, we pick a pivot point, and use that to partition the total array. We swap the elements around the pivot, then recursively sort the lists on each side of the pivot. For example, let’s look at this simple, sequential implementation of quicksort: public static void QuickSort<T>(T[] array) where T : IComparable<T> { QuickSortInternal(array, 0, array.Length - 1); } private static void QuickSortInternal<T>(T[] array, int left, int right) where T : IComparable<T> { if (left >= right) { return; } SwapElements(array, left, (left + right) / 2); int last = left; for (int current = left + 1; current <= right; ++current) { if (array[current].CompareTo(array[left]) < 0) { ++last; SwapElements(array, last, current); } } SwapElements(array, left, last); QuickSortInternal(array, left, last - 1); QuickSortInternal(array, last + 1, right); } static void SwapElements<T>(T[] array, int i, int j) { T temp = array[i]; array[i] = array[j]; array[j] = temp; } Here, we implement the quicksort algorithm in a very common, divide and conquer approach. Running this against the built-in Array.Sort routine shows that we get the exact same answers (although the framework’s sort routine is slightly faster). On my system, for example, I can use framework’s sort to sort ten million random doubles in about 7.3s, and this implementation takes about 9.3s on average. Looking at this routine, though, there is a clear opportunity to parallelize. At the end of QuickSortInternal, we recursively call into QuickSortInternal with each partition of the array after the pivot is chosen. This can be rewritten to use Parallel.Invoke by simply changing it to: // Code above is unchanged... SwapElements(array, left, last); Parallel.Invoke( () => QuickSortInternal(array, left, last - 1), () => QuickSortInternal(array, last + 1, right) ); } This routine will now run in parallel. When executing, we now see the CPU usage across all cores spike while it executes. However, there is a significant problem here – by parallelizing this routine, we took it from an execution time of 9.3s to an execution time of approximately 14 seconds! We’re using more resources as seen in the CPU usage, but the overall result is a dramatic slowdown in overall processing time. This occurs because parallelization adds overhead. Each time we split this array, we spawn two new tasks to parallelize this algorithm! This is far, far too many tasks for our cores to operate upon at a single time. In effect, we’re “over-parallelizing” this routine. This is a common problem when working with divide and conquer algorithms, and leads to an important observation: When parallelizing a recursive routine, take special care not to add more tasks than necessary to fully utilize your system. This can be done with a few different approaches, in this case. Typically, the way to handle this is to stop parallelizing the routine at a certain point, and revert back to the serial approach. Since the first few recursions will all still be parallelized, our “deeper” recursive tasks will be running in parallel, and can take full advantage of the machine. This also dramatically reduces the overhead added by parallelizing, since we’re only adding overhead for the first few recursive calls. There are two basic approaches we can take here. The first approach would be to look at the total work size, and if it’s smaller than a specific threshold, revert to our serial implementation. In this case, we could just check right-left, and if it’s under a threshold, call the methods directly instead of using Parallel.Invoke. The second approach is to track how “deep” in the “tree” we are currently at, and if we are below some number of levels, stop parallelizing. This approach is a more general-purpose approach, since it works on routines which parse trees as well as routines working off of a single array, but may not work as well if a poor partitioning strategy is chosen or the tree is not balanced evenly. This can be written very easily. If we pass a maxDepth parameter into our internal routine, we can restrict the amount of times we parallelize by changing the recursive call to: // Code above is unchanged... SwapElements(array, left, last); if (maxDepth < 1) { QuickSortInternal(array, left, last - 1, maxDepth); QuickSortInternal(array, last + 1, right, maxDepth); } else { --maxDepth; Parallel.Invoke( () => QuickSortInternal(array, left, last - 1, maxDepth), () => QuickSortInternal(array, last + 1, right, maxDepth)); } We no longer allow this to parallelize indefinitely – only to a specific depth, at which time we revert to a serial implementation. By starting the routine with a maxDepth equal to Environment.ProcessorCount, we can restrict the total amount of parallel operations significantly, but still provide adequate work for each processing core. With this final change, my timings are much better. On average, I get the following timings: Framework via Array.Sort: 7.3 seconds Serial Quicksort Implementation: 9.3 seconds Naive Parallel Implementation: 14 seconds Parallel Implementation Restricting Depth: 4.7 seconds Finally, we are now faster than the framework’s Array.Sort implementation.

Read the article
Twitter traffic might not be what it seems

- by Piet

Are you using bit.ly stats to measure interest in the links you post on twitter? I’ve been hearing for a while about people claiming to get the majority of their traffic originating from twitter these days. Now, I’ve been playing with the twitter ruby gem recently, doing various experiments which I’ll not go into detail here because they could be regarded as spamming… if I’d conduct them on a large scale, that is. It’s scary to see people actually engaging with @replies crafted with some regular expressions and eliza-like trickery on status updates found using the twitter api. I’m wondering how Twitter is going to contain the coming spam-flood. When posting links I used bit.ly as url shortener, since this one seems to be the de-facto standard on twitter. A nice thing about bit.ly is that it shows some basic stats about the redirects it performs for your shortened links. To my surprise, most links posted almost immediately resulted in several visitors. Now, seeing that I was posting the links together with some information concerning what the link is about, I concluded that the people who were actually clicking the links should be very targeted visitors. This felt a bit like free adwords, and I suddenly started to understand why everyone was raving about getting traffic from twitter. How wrong I was! (and I think several 1000 online marketers with me) On the destination site I used a traffic logging solution that works by including a little javascript snippet in your pages. It seemed that somehow all visitors disappeared after the bit.ly redirect and before getting to the site, because I was hardly seeing any visitors there. So I started investigating what was happening: by looking at the logfiles of the destination site, and by making my own ’shortened’ urls by doing redirects using a very short domain name I own. This way, I could check the apache access_log before the redirects. Most user agents turned out to be bots without a doubt. Here’s an excerpt of user-agents awk’ed from apache’s access_log for a time period of about one hour, right after posting some links: AideRSS 2.0 (postrank.com) Java/1.6.0_13 Java/1.6.0_14 libwww-perl/5.816 MLBot (www.metadatalabs.com/mlbot) Mozilla/4.0 (compatible;MSIE 5.01; Windows -NT 5.0 - real-url.org) Mozilla/5.0 (compatible; Twitturls; +http://twitturls.com) Mozilla/5.0 (compatible; Viralheat Bot/1.0; +http://www.viralheat.com/) Mozilla/5.0 (Danger hiptop 4.6; U; rv:1.7.12) Gecko/20050920 Mozilla/5.0 (X11; U; Linux i686; en-us; rv:1.9.0.2) Gecko/2008092313 Ubuntu/9.04 (jaunty) Firefox/3.5 OpenCalaisSemanticProxy PycURL/7.18.2 PycURL/7.19.3 Python-urllib/1.17 Twingly Recon twitmatic Twitturly / v0.6 Wget/1.10.2 (Red Hat modified) Wget/1.11.1 (Red Hat modified) Of the few user-agents that seem ‘real’ at first, half are originating from an ip-address used by Amazon EC2. And I doubt people are setting op proxies on there. Oh yeah, Googlebot (the real deal, from a legit google owned address) is sucking up posted links like fresh oysters. I guess google is trying to make sure in advance to never be beaten by twitter in the ‘realtime search’ department. Actually, I think it’d be almost stupid NOT to post any new pages/posts/websites on Twitter, it must be one of the fastest ways to get a Googlebot visit. Same experiment with a real, established twitter account Now, because I was posting the url’s either as ’status’ messages or directed @people, on a test-account with hardly any (human) followers, I checked again using the twitter accounts from a commercial site I’m involved with. These accounts all have between 500 and 1000 targeted (I think) followers. I checked the destination access_logs and also added ‘my’ redirect after the bit.ly redirect: same results, although seemingly a bit higher real visitor/bot ratio. Btw: one of these account was ‘punished’ with a 1 week lock recently because the same (1 one!) status update was sent that was sent right before using another account. They got an email explaining the lock because the account didn’t act according to their TOS. I can’t find anything in their TOS about it, can you? I don’t think Twitter is on the right track punishing a legit account, knowing the trickery I had been doing with it’s api went totally unpunished. I might be wrong though, I often am. On the other hand: this commercial site reported targeted traffic and actual signups from visitors coming from Twitter. The ones that are really real visitors are also very targeted. I’m just not sure if the amount of work involved could hold up against an adwords campaign. Reposting the same link over and over again helps On thing I noticed: It helps to keep on reposting the same links with regular intervals. I guess most people only look at their first page when checking out recent posts of the ones they’re following, or don’t look too far back when performing a search. Now, this probably isn’t according to the twitter TOS. Actually, it might be spamming but no-one is obligated to follow anyone else of course. This way, I was getting more real visitors and less bots. To my surprise (when my programmer’s hat is on) there were still repeated visits from the same bots coming from the same ip-addresses. Did they expect to find something else when visiting for a 2nd or 3rd time? (actually,this gave me an idea: you can’t change a link once it’s posted, but you can change where it redirects to) Most bots were smart enough not to follow the same link again though. Are you successful in getting real visitors from Twitter? Are you only relying on bit.ly to provide traffic stats?

Read the article
MVP Summit 2011 summary and thoughts: The “I hope I don’t cross a line and lose my MVP status” post

- by George Clingerman

I've been wanting to write this post summarizing my thoughts about the MVP summit but have been dragging my feet since it's a very difficult one to write. However seeing Andy (http://forums.create.msdn.com/forums/t/77625.aspx) and Catalin (http://www.catalinzima.com/2011/03/mvp-summit-2011/) and Chris (http://geekswithblogs.net/cwilliams/archive/2011/03/07/144229.aspx) post about it has encouraged me to finally take the plunge. I'm going to have to write carefully though because I'm going to be dancing around a ton of NDA mine fields as well as having to walk the tight-rope of not sending the wrong message or having people read too much into what I'm saying. I want to note that most of what I'm about to say is just based on my observations, they're not thoughts that Microsoft has asked me to pass along and they're not things I heard Microsoft say. It's just me sharing what I think after going to the MVP summit. Let's start off with a short imaginary question and answer session. Has the App Hub forums and XBLIG management been rather poor by Microsoft? Yes. Do I think we're going to see changes to that overnight? No. Will it continue to look bad from the outside? Somewhat. Confusing right? Well that's kind of how things are right now. Lots of confusion. XNA is doing AWESOME. Like, really, really awesome. As a result of that awesomeness, XNA is on three major platforms: Xbox 360, WP7 and PC. This means that internally Microsoft is really excited and invested in the technology. That's fantastic for XNA and really should show you the future the framework has. It's here to stay. So why are Xbox LIVE Indie Game developers feeling so much pain? The ironic thing is that pain is being caused by the success of XNA. When XNA was just a small thing, there was more freedom and more focus. It was just us and them. We were an only child. Now our family has grown and everyone has and wants some time with XNA. This gets XNA pulled in all directions and as it moves onto new platforms, it plays catch up trying to get those platforms up to speed to where Xbox LIVE Indie Games has grown. Forums, documentation, educational content. They all need to be there because Xbox LIVE Indie Games has all of that and more. Along with the catch up in features/documentation/awesomeness there's the catch up that the people on the team have to play. New platforms and new areas of development mean new players and those new guys don't have the history of being around from the beginning. This leads to a lack of understanding at times just how important some things are because they seem so small and insignificant (Rich Text defaulting for new forum profiles would be one things that jumps to mind). If you're not aware that the forums have become more than just a basic Q&A, if you're not aware that they're a central hub to a very active community, then you don't understand why that small change should be prioritized over something else. New people have to get caught up and figure out how to make a framework and central forum site work for everyone it's now serving. So yeah, a lot of our pain this last year has been simply that XNA is doing well and XBLIG is doing well so the focus was shifted to catch other things up. It hurts when a parent seems to not have any time for you and they're spending some much time with your new baby brother. Growing pains. All families and in our case our product family experience it to some degree. I think as WP7 matures we'll see the team figuring out how to give everyone the right amount of attention. While we're talking about some of our growing pains, it is also important to note (although not really an excuse) that the Xbox LIVE Arcade developers complain about many of the same things that we do. If you paid attention to talks and information coming out of GDC 2011, most of the the XBLA guys were saying things that sounded eerily similar to what the XBLIG developers are saying (Scott Nichols from GayGamer.net noticed http://twitter.com/#!/NaviFairyGG/status/43540379206811650). Does this mean we should just accept the status quo since we're being treated exactly the same? No way. However it DOES show that the way we're being treated is no indication of the stability and future of the platform, it's just Microsoft dropping the communication ball on two playing fields. We're not alone and we're not even being treated worse. Not great, but also in a weird way a very good sign. Now on to a few tidbits I think I CAN share from the summit (I'm really crossing my fingers I'm not stepping over some NDA line I shouldn't be). First, I discovered that the XBLIG user base is bigger than I personally had originally estimated. I won't give the exact numbers (although we did beg Microsoft to release some of these numbers so maybe someday?) but it was much larger than my original guestimates and I was pleasantly surprised. Maybe some of you guys had the right number when you were guessing, but I know that mine was much too low. And even MORE importantly the number of users/shoppers is growing at a steady pace as well. Our market is growing! That was fantastic news and really something that I had to share. On to the community manager discussion. It was mentioned. I was mentioned. I blushed. Nothing more to report there than the blush in my cheeks was a light crimson color. If I ever see a job description posted for that position I have a resume waiting in the wings. I can't deny that I think that would be my dream job... ...so after I finished blushing, the MVPs did make it very, very clear that the communication has to improve. Community manager or not the single biggest pain point with the Xbox LIVE Indie Game community has been a lack of communication. I have seen dramatic improvement in the team responding to MVPs and I'm even seeing more communication from them on the forums so I'm hoping that's a long term change. I really think they understood the issue, the problem remains how to open that communication channel in a way that was sustainable. I think they'll get it figured out and hopefully that's sooner rather than later. During the summit, you may have seen me tweeting about how I was "that guy" (http://twitter.com/#!/clingermangw/status/42740432471470081). You also may have noticed that Andy and Catalin both mentioned me in their summit write ups. I may have come on a bit strong while I was there...went a little out of character for myself. I've been agitated for a while with the way things have been and I've been listening to you guys and hearing you guys be agitated. I'm also watching some really awesome indie game developers looking elsewhere and leaving the platform. Some of them we might not have been able to keep even with changes, but others are only leaving because of perceptions and lack of communication from Microsoft. And that pisses me off. And I let Microsoft know that I was pissed off. You made your list and I took that list and verbalized it. I verbalized the hell out of it. [It was actually mentioned that I'm a lot nicer on the forums and in email than I am in person...I felt bad about that, but I couldn't stay silent]. Hopefully it did something guys, I really did try hard to get the message across. Along with my agitation, I also brought some pride. I mentioned several things in person to the team that I was particularly proud of. From people in the community that are doing an awesome job, to the re-launch of XboxIndies that was going on that week and even gamers like Steven Hurdle (http://writingsofmassdeduction.com/) who have purchased one XBLIG every day for over 100 days now. The community is freaking rocking it and I made sure to highlight that. So in conclusion, I'd just like to say hang in there (you know, like that picture of the cat). If you've been worried about investing in Xbox LIVE Indie Games because you think it's on shaky ground. It's not. Dream Build Play being about the Xbox 360 should have helped a little to point that out. The team is really scrambling around trying to figure things out and make improvements all around. There’s quite a few new gals and guys and it's going to take them time to catch up and there are a lot of constantly shifting priorities. We all have one toy, one team and we're fighting for time with it. It's also time for the community to continue spreading our wings and going out on our own more often. The Indie Game Winter Uprising was a fantastic example of that. We took things into our own hands and it got noticed and Microsoft got behind it. They do every time we stand up and do something (look at how many Microsoft employees tweeted, wrote about the re-launch of XboxIndies.com or the support I've gotten from them for my weekly XNA Notes). XNA is here to stay, it's time for us to stop being scared of that and figure out how to make our own games the successes they should be. There's definitely a list of things that need to be fixed, things that should be improved and I think we should definitely keep vocal about that with Microsoft. Keep it short, focused and prioritized. There's also a lot of things we can do ourselves while we're waiting on them to fix and change things. Lots of ways we can compensate for particular weaknesses in the channel. The kind of stuff that we can step up and do ourselves. Do it on our own, you know, the way Indies always do. And I'm really looking forward to watching us do just that.

Read the article
Heaps of Trouble?

- by Paul White NZ

If you’re not already a regular reader of Brad Schulz’s blog, you’re missing out on some great material. In his latest entry, he is tasked with optimizing a query run against tables that have no indexes at all. The problem is, predictably, that performance is not very good. The catch is that we are not allowed to create any indexes (or even new statistics) as part of our optimization efforts. In this post, I’m going to look at the problem from a slightly different angle, and present an alternative solution to the one Brad found. Inevitably, there’s going to be some overlap between our entries, and while you don’t necessarily need to read Brad’s post before this one, I do strongly recommend that you read it at some stage; he covers some important points that I won’t cover again here. The Example We’ll use data from the AdventureWorks database, copied to temporary unindexed tables. A script to create these structures is shown below: CREATE TABLE #Custs ( CustomerID INTEGER NOT NULL, TerritoryID INTEGER NULL, CustomerType NCHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #Prods ( ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, Name NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #OrdHeader ( SalesOrderID INTEGER NOT NULL, OrderDate DATETIME NOT NULL, SalesOrderNumber NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, CustomerID INTEGER NOT NULL, ); GO CREATE TABLE #OrdDetail ( SalesOrderID INTEGER NOT NULL, OrderQty SMALLINT NOT NULL, LineTotal NUMERIC(38,6) NOT NULL, ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, ); GO INSERT #Custs ( CustomerID, TerritoryID, CustomerType ) SELECT C.CustomerID, C.TerritoryID, C.CustomerType FROM AdventureWorks.Sales.Customer C WITH (TABLOCK); GO INSERT #Prods ( ProductMainID, ProductSubID, ProductSubSubID, Name ) SELECT P.ProductID, P.ProductID, P.ProductID, P.Name FROM AdventureWorks.Production.Product P WITH (TABLOCK); GO INSERT #OrdHeader ( SalesOrderID, OrderDate, SalesOrderNumber, CustomerID ) SELECT H.SalesOrderID, H.OrderDate, H.SalesOrderNumber, H.CustomerID FROM AdventureWorks.Sales.SalesOrderHeader H WITH (TABLOCK); GO INSERT #OrdDetail ( SalesOrderID, OrderQty, LineTotal, ProductMainID, ProductSubID, ProductSubSubID ) SELECT D.SalesOrderID, D.OrderQty, D.LineTotal, D.ProductID, D.ProductID, D.ProductID FROM AdventureWorks.Sales.SalesOrderDetail D WITH (TABLOCK); The query itself is a simple join of the four tables: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #OrdDetail D ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID JOIN #OrdHeader H ON D.SalesOrderID = H.SalesOrderID JOIN #Custs C ON H.CustomerID = C.CustomerID ORDER BY P.ProductMainID ASC OPTION (RECOMPILE, MAXDOP 1); Remember that these tables have no indexes at all, and only the single-column sampled statistics SQL Server automatically creates (assuming default settings). The estimated query plan produced for the test query looks like this (click to enlarge): The Problem The problem here is one of cardinality estimation – the number of rows SQL Server expects to find at each step of the plan. The lack of indexes and useful statistical information means that SQL Server does not have the information it needs to make a good estimate. Every join in the plan shown above estimates that it will produce just a single row as output. Brad covers the factors that lead to the low estimates in his post. In reality, the join between the #Prods and #OrdDetail tables will produce 121,317 rows. It should not surprise you that this has rather dire consequences for the remainder of the query plan. In particular, it makes a nonsense of the optimizer’s decision to use Nested Loops to join to the two remaining tables. Instead of scanning the #OrdHeader and #Custs tables once (as it expected), it has to perform 121,317 full scans of each. The query takes somewhere in the region of twenty minutes to run to completion on my development machine. A Solution At this point, you may be thinking the same thing I was: if we really are stuck with no indexes, the best we can do is to use hash joins everywhere. We can force the exclusive use of hash joins in several ways, the two most common being join and query hints. A join hint means writing the query using the INNER HASH JOIN syntax; using a query hint involves adding OPTION (HASH JOIN) at the bottom of the query. The difference is that using join hints also forces the order of the join, whereas the query hint gives the optimizer freedom to reorder the joins at its discretion. Adding the OPTION (HASH JOIN) hint results in this estimated plan: That produces the correct output in around seven seconds, which is quite an improvement! As a purely practical matter, and given the rigid rules of the environment we find ourselves in, we might leave things there. (We can improve the hashing solution a bit – I’ll come back to that later on). Faster Nested Loops It might surprise you to hear that we can beat the performance of the hash join solution shown above using nested loops joins exclusively, and without breaking the rules we have been set. The key to this part is to realize that a condition like (A = B) can be expressed as (A <= B) AND (A >= B). Armed with this tremendous new insight, we can rewrite the join predicates like so: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #OrdDetail D JOIN #OrdHeader H ON D.SalesOrderID >= H.SalesOrderID AND D.SalesOrderID <= H.SalesOrderID JOIN #Custs C ON H.CustomerID >= C.CustomerID AND H.CustomerID <= C.CustomerID JOIN #Prods P ON P.ProductMainID >= D.ProductMainID AND P.ProductMainID <= D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (RECOMPILE, LOOP JOIN, MAXDOP 1, FORCE ORDER); I’ve also added LOOP JOIN and FORCE ORDER query hints to ensure that only nested loops joins are used, and that the tables are joined in the order they appear. The new estimated execution plan is: This new query runs in under 2 seconds. Why Is It Faster? The main reason for the improvement is the appearance of the eager Index Spools, which are also known as index-on-the-fly spools. If you read my Inside The Optimiser series you might be interested to know that the rule responsible is called JoinToIndexOnTheFly. An eager index spool consumes all rows from the table it sits above, and builds a index suitable for the join to seek on. Taking the index spool above the #Custs table as an example, it reads all the CustomerID and TerritoryID values with a single scan of the table, and builds an index keyed on CustomerID. The term ‘eager’ means that the spool consumes all of its input rows when it starts up. The index is built in a work table in tempdb, has no associated statistics, and only exists until the query finishes executing. The result is that each unindexed table is only scanned once, and just for the columns necessary to build the temporary index. From that point on, every execution of the inner side of the join is answered by a seek on the temporary index – not the base table. A second optimization is that the sort on ProductMainID (required by the ORDER BY clause) is performed early, on just the rows coming from the #OrdDetail table. The optimizer has a good estimate for the number of rows it needs to sort at that stage – it is just the cardinality of the table itself. The accuracy of the estimate there is important because it helps determine the memory grant given to the sort operation. Nested loops join preserves the order of rows on its outer input, so sorting early is safe. (Hash joins do not preserve order in this way, of course). The extra lazy spool on the #Prods branch is a further optimization that avoids executing the seek on the temporary index if the value being joined (the ‘outer reference’) hasn’t changed from the last row received on the outer input. It takes advantage of the fact that rows are still sorted on ProductMainID, so if duplicates exist, they will arrive at the join operator one after the other. The optimizer is quite conservative about introducing index spools into a plan, because creating and dropping a temporary index is a relatively expensive operation. It’s presence in a plan is often an indication that a useful index is missing. I want to stress that I rewrote the query in this way primarily as an educational exercise – I can’t imagine having to do something so horrible to a production system. Improving the Hash Join I promised I would return to the solution that uses hash joins. You might be puzzled that SQL Server can create three new indexes (and perform all those nested loops iterations) faster than it can perform three hash joins. The answer, again, is down to the poor information available to the optimizer. Let’s look at the hash join plan again: Two of the hash joins have single-row estimates on their build inputs. SQL Server fixes the amount of memory available for the hash table based on this cardinality estimate, so at run time the hash join very quickly runs out of memory. This results in the join spilling hash buckets to disk, and any rows from the probe input that hash to the spilled buckets also get written to disk. The join process then continues, and may again run out of memory. This is a recursive process, which may eventually result in SQL Server resorting to a bailout join algorithm, which is guaranteed to complete eventually, but may be very slow. The data sizes in the example tables are not large enough to force a hash bailout, but it does result in multiple levels of hash recursion. You can see this for yourself by tracing the Hash Warning event using the Profiler tool. The final sort in the plan also suffers from a similar problem: it receives very little memory and has to perform multiple sort passes, saving intermediate runs to disk (the Sort Warnings Profiler event can be used to confirm this). Notice also that because hash joins don’t preserve sort order, the sort cannot be pushed down the plan toward the #OrdDetail table, as in the nested loops plan. Ok, so now we understand the problems, what can we do to fix it? We can address the hash spilling by forcing a different order for the joins: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #Custs C JOIN #OrdHeader H ON H.CustomerID = C.CustomerID JOIN #OrdDetail D ON D.SalesOrderID = H.SalesOrderID ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (MAXDOP 1, HASH JOIN, FORCE ORDER); With this plan, each of the inputs to the hash joins has a good estimate, and no hash recursion occurs. The final sort still suffers from the one-row estimate problem, and we get a single-pass sort warning as it writes rows to disk. Even so, the query runs to completion in three or four seconds. That’s around half the time of the previous hashing solution, but still not as fast as the nested loops trickery. Final Thoughts SQL Server’s optimizer makes cost-based decisions, so it is vital to provide it with accurate information. We can’t really blame the performance problems highlighted here on anything other than the decision to use completely unindexed tables, and not to allow the creation of additional statistics. I should probably stress that the nested loops solution shown above is not one I would normally contemplate in the real world. It’s there primarily for its educational and entertainment value. I might perhaps use it to demonstrate to the sceptical that SQL Server itself is crying out for an index. Be sure to read Brad’s original post for more details. My grateful thanks to him for granting permission to reuse some of his material. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

Read the article
Who IS Brian Solis?

- by Michael Snow

Q: Brian, Welcome to the WebCenter Blog. Can you tell our readers your current role and what career path brought you here? A: I’m proudly serving as a principal analyst at Altimeter Group, a research based advisory firm in Silicon Valley. My career path, well, let’s just say it’s a long and winding road. As a kid, I was fascinated with technology. I learned programming at an early age and found myself naturally drawn to all things tech. I started my career as a database programmer at a technology marketing agency in Southern California. When I saw the chance to work with tech companies and help them better market their capabilities to businesses and consumers, I switched focus from programming to marketing and advertising. As technologist, my approach to marketing was different. I didn’t believe in hype, fluff or buzz words. I believed in translating features into benefits and specifications and capabilities into solutions for real world problems and opportunities. In the mid 90’s I experimented with direct to consumer/customer engagement in dedicated technology forums and boards. I quickly realized that the entire approach to do so would need to change. Therefore, I learned and developed new methods for a more social and informed way of engaging people in ways that helped them, marketed the company, and also tied to tangible benefits for the company. This work would lead me to start an agency in 1999 dedicated to interactive marketing. As I continued to experiment with interactive platforms, I developed interesting methods for converting one-to-many forms of media into one-to-one-to-many programs. I ran that company until joining Altimeter Group. Along the way, in the early 2000s, I realized that everything was changing and that there were others like me finding success in what would become a more social form of media. I dedicated a significant amount of my time to sharing everything that I learned in the form of articles, blogs, and eventually books. My mission became to share my experience with anyone who’d listen. It would later become much bigger than marketing, this would lead to a decade of work, that still continues, in business transformation. Then and now, I find myself always assuming the role of a student. Q: As an industry analyst & technology change evangelist, what are you primarily focused on these days? A: As a digital analyst, I study how disruptive technology impacts business. As an aspiring social scientist, I study how technology affects human behavior. I explore both horizons professionally and personally to better understand the future of popular culture and also the opportunities that exist for organizations to improve relationships and experiences with customers and the people that are important to them. Q: People cite that the line between work and life is getting more and more blurred. Do you see your personal life influencing your professional work? A: The line between work and life isn’t blurred it’s been overtly crossed and erased. We live in an always on society. The digital lifestyle keeps us connected to one another it keeps us connected all the time. Whether your sending or checking email, trying to catch up, or simply trying to get ahead, people are spending the equivalent of an extra day at work in the time they spend out of work…working. That’s absurd. It’s a matter of survival. It’s also a matter of unintended, subconscious self-causation. We brought this on ourselves and continue to do so. Think about your day. You’re in meetings for the better part of each day. You probably spend evenings and weekends catching up on email and actually doing the work you couldn’t get to during the day. And, your co-workers and executives are doing the same thing. So if you try to slow down, you find yourself at a disadvantage as you’re willfully pulling yourself out of an unfortunate culture of whenever wherever business dynamics. If you’re unresponsive or unreachable, someone within your organization or on your team is accessible. Over time, this could contribute to unfavorable impressions. I choose to steer my life balance in ways that complement one another. But, I don’t pretend to have this figured out by any means. In fact, I find myself swimming upstream like those around me. It’s essentially a competition for relevance and at some point I’ll learn how to earn attention and relevance while redrawing the line between work and life. Q: How can people keep up with what you’re working on? A: The easy answer is that people can keep up with me at briansolis.com. But, I also try to reach people where their attention is focused. Whether it’s Facebook (facebook.com/briansolis), Twitter (@briansolis), Google+ (+briansolis), Youtube (briansolis.tv) or through books and conferences, people can usually find me in a place of their choosing. Q: Recently, you’ve been working with us here at Oracle on something exciting coming up later this week. What’s on the horizon? A: I spent some time with the Oracle team reviewing the idea of Digital Darwinism and how technology and society are evolving faster than many organizations can adapt. Digital Darwinism: How Brands Can Survive the Rapid Evolution of Society and Technology Thursday, December 13, 2012, 10 a.m. PT / 1 p.m. ET Q: You’ve been very actively pursued for media interviews and conference and company speaking engagements – anything you’d like to share to give us a sneak peak of what to expect on Thursday’s webcast? A: We’re inviting guests to join us online as we dive into the future of business and how the convergence of technology and connected consumerism would ultimately impact how business is done. It’ll be an exciting and revealing conversation that explores just how much everything is changing. We’ll also review the importance of adapting to emergent trends and how to compete for the future. It’s important to recognize that change is not happening to us, it’s happening because of us. We are part of the revolution and therefore we need to help organizations adapt from the inside out. Watch the Entire Oracle Social Business Thought Leaders Webcast Series On-Demand and Stay Tuned for More to Come in 2013!

Read the article

Search Results

Search found 4841 results on 194 pages for 'poor programmer'.

Page 186/194 | < Previous Page | 182 183 184 185 186 187 188 189 190 191 192 193 | Next Page >

- by Jonathan Leffler

- by Sarah

- by jethomas

- by Alberto Toglia

- by Karnak

- by Nissl

- by Tommy Fisk

- by Zwei Steinen

- by Daniel

- by porgarmingduod

- by Sarah

- by s_rathbone

- by ZeroG

- by Zac B

- by Aaron

- by slimchrisp

- by user787832

- by Mike

- by Asian Angel

- by Reed

- by Reed

- by Piet

- by George Clingerman

- by Paul White NZ

- by Michael Snow

< Previous Page | 182 183 184 185 186 187 188 189 190 191 192 193 | Next Page >