Search Results

Search found 5104 results on 205 pages for 'evolutionary algorithm'.

Page 173/205 | < Previous Page | 169 170 171 172 173 174 175 176 177 178 179 180  | Next Page >

  • can't compile min_element in c++

    - by Vincenzo
    This is my code: #include <algorithm> #include <vector> #include <string> using namespace std; class A { struct CompareMe { bool operator() (const string*& s1, const string*& s2) const { return true; } }; void f() { CompareMe comp; vector<string*> v; min_element(v.begin(), v.end(), comp); } }; And this is the error: error: no match for call to ‘(A::CompareMe) (std::string*&, std::string*&)’ test.cpp:7: note: candidates are: bool A::CompareMe::operator()(const std::string*&, const std::string*&) const I feel that there is some syntax defect, but can't find out which one. Please, help!

    Read the article

  • Generics not so generic !!

    - by Aymen
    Hi I tried to implement a generic binary search algorithm in scala. Here it is : type Ord ={ def <(x:Any):Boolean def >(x:Any):Boolean } def binSearch[T <: Ord ](x:T,start:Int,end:Int,t:Array[T]):Boolean = { if (start > end) return false val pos = (start + end ) / 2 if(t(pos)==x) true else if (t(pos) < x) binSearch(x,pos+1,end,t) else binSearch(x,start,pos-1,t) } everything is OK until I tried to actually use it (xD) : binSearch(3,0,4,Array(1,2,5,6)) the compiler is pretending that Int not a member of Ord, well what shall I do to solve this ? Thanks

    Read the article

  • How to take data from textarea and decrypt using javascript?

    - by user1657555
    I need to take data from a textarea on a website and decrypt it using a simple algorithm. The data is in the form of numbers separated by a comma. It also needs to read a space as a space. It looks like 42,54,57, ,57,40,57,44. Heres what I have so far: var my_textarea = $('textarea[name = "words"]').first(); var my_value = $(my_textarea).val(); var my_array = my_value.split(","); for (i=0; i < my_array.length; i++) { var nv = my_array - 124; var acv = nv + 34; var my_result = String.fromCharCode(acv); } prompt("", my_result);

    Read the article

  • Does anyone really understand how HFSC scheduling in Linux/BSD works?

    - by Mecki
    I read the original SIGCOMM '97 PostScript paper about HFSC, it is very technically, but I understand the basic concept. Instead of giving a linear service curve (as with pretty much every other scheduling algorithm), you can specify a convex or concave service curve and thus it is possible to decouple bandwidth and delay. However, even though this paper mentions to kind of scheduling algorithms being used (real-time and link-share), it always only mentions ONE curve per scheduling class (the decoupling is done by specifying this curve, only one curve is needed for that). Now HFSC has been implemented for BSD (OpenBSD, FreeBSD, etc.) using the ALTQ scheduling framework and it has been implemented Linux using the TC scheduling framework (part of iproute2). Both implementations added two additional service curves, that were NOT in the original paper! A real-time service curve and an upper-limit service curve. Again, please note that the original paper mentions two scheduling algorithms (real-time and link-share), but in that paper both work with one single service curve. There never have been two independent service curves for either one as you currently find in BSD and Linux. Even worse, some version of ALTQ seems to add an additional queue priority to HSFC (there is no such thing as priority in the original paper either). I found several BSD HowTo's mentioning this priority setting (even though the man page of the latest ALTQ release knows no such parameter for HSFC, so officially it does not even exist). This all makes the HFSC scheduling even more complex than the algorithm described in the original paper and there are tons of tutorials on the Internet that often contradict each other, one claiming the opposite of the other one. This is probably the main reason why nobody really seems to understand how HFSC scheduling really works. Before I can ask my questions, we need a sample setup of some kind. I'll use a very simple one as seen in the image below: Here are some questions I cannot answer because the tutorials contradict each other: What for do I need a real-time curve at all? Assuming A1, A2, B1, B2 are all 128 kbit/s link-share (no real-time curve for either one), then each of those will get 128 kbit/s if the root has 512 kbit/s to distribute (and A and B are both 256 kbit/s of course), right? Why would I additionally give A1 and B1 a real-time curve with 128 kbit/s? What would this be good for? To give those two a higher priority? According to original paper I can give them a higher priority by using a curve, that's what HFSC is all about after all. By giving both classes a curve of [256kbit/s 20ms 128kbit/s] both have twice the priority than A2 and B2 automatically (still only getting 128 kbit/s on average) Does the real-time bandwidth count towards the link-share bandwidth? E.g. if A1 and B1 both only have 64kbit/s real-time and 64kbit/s link-share bandwidth, does that mean once they are served 64kbit/s via real-time, their link-share requirement is satisfied as well (they might get excess bandwidth, but lets ignore that for a second) or does that mean they get another 64 kbit/s via link-share? So does each class has a bandwidth "requirement" of real-time plus link-share? Or does a class only have a higher requirement than the real-time curve if the link-share curve is higher than the real-time curve (current link-share requirement equals specified link-share requirement minus real-time bandwidth already provided to this class)? Is upper limit curve applied to real-time as well, only to link-share, or maybe to both? Some tutorials say one way, some say the other way. Some even claim upper-limit is the maximum for real-time bandwidth + link-share bandwidth? What is the truth? Assuming A2 and B2 are both 128 kbit/s, does it make any difference if A1 and B1 are 128 kbit/s link-share only, or 64 kbit/s real-time and 128 kbit/s link-share, and if so, what difference? If I use the seperate real-time curve to increase priorities of classes, why would I need "curves" at all? Why is not real-time a flat value and link-share also a flat value? Why are both curves? The need for curves is clear in the original paper, because there is only one attribute of that kind per class. But now, having three attributes (real-time, link-share, and upper-limit) what for do I still need curves on each one? Why would I want the curves shape (not average bandwidth, but their slopes) to be different for real-time and link-share traffic? According to the little documentation available, real-time curve values are totally ignored for inner classes (class A and B), they are only applied to leaf classes (A1, A2, B1, B2). If that is true, why does the ALTQ HFSC sample configuration (search for 3.3 Sample configuration) set real-time curves on inner classes and claims that those set the guaranteed rate of those inner classes? Isn't that completely pointless? (note: pshare sets the link-share curve in ALTQ and grate the real-time curve; you can see this in the paragraph above the sample configuration). Some tutorials say the sum of all real-time curves may not be higher than 80% of the line speed, others say it must not be higher than 70% of the line speed. Which one is right or are they maybe both wrong? One tutorial said you shall forget all the theory. No matter how things really work (schedulers and bandwidth distribution), imagine the three curves according to the following "simplified mind model": real-time is the guaranteed bandwidth that this class will always get. link-share is the bandwidth that this class wants to become fully satisfied, but satisfaction cannot be guaranteed. In case there is excess bandwidth, the class might even get offered more bandwidth than necessary to become satisfied, but it may never use more than upper-limit says. For all this to work, the sum of all real-time bandwidths may not be above xx% of the line speed (see question above, the percentage varies). Question: Is this more or less accurate or a total misunderstanding of HSFC? And if assumption above is really accurate, where is prioritization in that model? E.g. every class might have a real-time bandwidth (guaranteed), a link-share bandwidth (not guaranteed) and an maybe an upper-limit, but still some classes have higher priority needs than other classes. In that case I must still prioritize somehow, even among real-time traffic of those classes. Would I prioritize by the slope of the curves? And if so, which curve? The real-time curve? The link-share curve? The upper-limit curve? All of them? Would I give all of them the same slope or each a different one and how to find out the right slope? I still haven't lost hope that there exists at least a hand full of people in this world that really understood HFSC and are able to answer all these questions accurately. And doing so without contradicting each other in the answers would be really nice ;-)

    Read the article

  • Does anyone really understand how HFSC scheduling in Linux/BSD works?

    - by Mecki
    I read the original SIGCOMM '97 PostScript paper about HFSC, it is very technically, but I understand the basic concept. Instead of giving a linear service curve (as with pretty much every other scheduling algorithm), you can specify a convex or concave service curve and thus it is possible to decouple bandwidth and delay. However, even though this paper mentions to kind of scheduling algorithms being used (real-time and link-share), it always only mentions ONE curve per scheduling class (the decoupling is done by specifying this curve, only one curve is needed for that). Now HFSC has been implemented for BSD (OpenBSD, FreeBSD, etc.) using the ALTQ scheduling framework and it has been implemented Linux using the TC scheduling framework (part of iproute2). Both implementations added two additional service curves, that were NOT in the original paper! A real-time service curve and an upper-limit service curve. Again, please note that the original paper mentions two scheduling algorithms (real-time and link-share), but in that paper both work with one single service curve. There never have been two independent service curves for either one as you currently find in BSD and Linux. Even worse, some version of ALTQ seems to add an additional queue priority to HSFC (there is no such thing as priority in the original paper either). I found several BSD HowTo's mentioning this priority setting (even though the man page of the latest ALTQ release knows no such parameter for HSFC, so officially it does not even exist). This all makes the HFSC scheduling even more complex than the algorithm described in the original paper and there are tons of tutorials on the Internet that often contradict each other, one claiming the opposite of the other one. This is probably the main reason why nobody really seems to understand how HFSC scheduling really works. Before I can ask my questions, we need a sample setup of some kind. I'll use a very simple one as seen in the image below: Here are some questions I cannot answer because the tutorials contradict each other: What for do I need a real-time curve at all? Assuming A1, A2, B1, B2 are all 128 kbit/s link-share (no real-time curve for either one), then each of those will get 128 kbit/s if the root has 512 kbit/s to distribute (and A and B are both 256 kbit/s of course), right? Why would I additionally give A1 and B1 a real-time curve with 128 kbit/s? What would this be good for? To give those two a higher priority? According to original paper I can give them a higher priority by using a curve, that's what HFSC is all about after all. By giving both classes a curve of [256kbit/s 20ms 128kbit/s] both have twice the priority than A2 and B2 automatically (still only getting 128 kbit/s on average) Does the real-time bandwidth count towards the link-share bandwidth? E.g. if A1 and B1 both only have 64kbit/s real-time and 64kbit/s link-share bandwidth, does that mean once they are served 64kbit/s via real-time, their link-share requirement is satisfied as well (they might get excess bandwidth, but lets ignore that for a second) or does that mean they get another 64 kbit/s via link-share? So does each class has a bandwidth "requirement" of real-time plus link-share? Or does a class only have a higher requirement than the real-time curve if the link-share curve is higher than the real-time curve (current link-share requirement equals specified link-share requirement minus real-time bandwidth already provided to this class)? Is upper limit curve applied to real-time as well, only to link-share, or maybe to both? Some tutorials say one way, some say the other way. Some even claim upper-limit is the maximum for real-time bandwidth + link-share bandwidth? What is the truth? Assuming A2 and B2 are both 128 kbit/s, does it make any difference if A1 and B1 are 128 kbit/s link-share only, or 64 kbit/s real-time and 128 kbit/s link-share, and if so, what difference? If I use the seperate real-time curve to increase priorities of classes, why would I need "curves" at all? Why is not real-time a flat value and link-share also a flat value? Why are both curves? The need for curves is clear in the original paper, because there is only one attribute of that kind per class. But now, having three attributes (real-time, link-share, and upper-limit) what for do I still need curves on each one? Why would I want the curves shape (not average bandwidth, but their slopes) to be different for real-time and link-share traffic? According to the little documentation available, real-time curve values are totally ignored for inner classes (class A and B), they are only applied to leaf classes (A1, A2, B1, B2). If that is true, why does the ALTQ HFSC sample configuration (search for 3.3 Sample configuration) set real-time curves on inner classes and claims that those set the guaranteed rate of those inner classes? Isn't that completely pointless? (note: pshare sets the link-share curve in ALTQ and grate the real-time curve; you can see this in the paragraph above the sample configuration). Some tutorials say the sum of all real-time curves may not be higher than 80% of the line speed, others say it must not be higher than 70% of the line speed. Which one is right or are they maybe both wrong? One tutorial said you shall forget all the theory. No matter how things really work (schedulers and bandwidth distribution), imagine the three curves according to the following "simplified mind model": real-time is the guaranteed bandwidth that this class will always get. link-share is the bandwidth that this class wants to become fully satisfied, but satisfaction cannot be guaranteed. In case there is excess bandwidth, the class might even get offered more bandwidth than necessary to become satisfied, but it may never use more than upper-limit says. For all this to work, the sum of all real-time bandwidths may not be above xx% of the line speed (see question above, the percentage varies). Question: Is this more or less accurate or a total misunderstanding of HSFC? And if assumption above is really accurate, where is prioritization in that model? E.g. every class might have a real-time bandwidth (guaranteed), a link-share bandwidth (not guaranteed) and an maybe an upper-limit, but still some classes have higher priority needs than other classes. In that case I must still prioritize somehow, even among real-time traffic of those classes. Would I prioritize by the slope of the curves? And if so, which curve? The real-time curve? The link-share curve? The upper-limit curve? All of them? Would I give all of them the same slope or each a different one and how to find out the right slope? I still haven't lost hope that there exists at least a hand full of people in this world that really understood HFSC and are able to answer all these questions accurately. And doing so without contradicting each other in the answers would be really nice ;-)

    Read the article

  • Strange Recurrent Excessive I/O Wait

    - by Chris
    I know quite well that I/O wait has been discussed multiple times on this site, but all the other topics seem to cover constant I/O latency, while the I/O problem we need to solve on our server occurs at irregular (short) intervals, but is ever-present with massive spikes of up to 20k ms a-wait and service times of 2 seconds. The disk affected is /dev/sdb (Seagate Barracuda, for details see below). A typical iostat -x output would at times look like this, which is an extreme sample but by no means rare: iostat (Oct 6, 2013) tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 16.00 0.00 156.00 9.75 21.89 288.12 36.00 57.60 5.50 0.00 44.00 8.00 48.79 2194.18 181.82 100.00 2.00 0.00 16.00 8.00 46.49 3397.00 500.00 100.00 4.50 0.00 40.00 8.89 43.73 5581.78 222.22 100.00 14.50 0.00 148.00 10.21 13.76 5909.24 68.97 100.00 1.50 0.00 12.00 8.00 8.57 7150.67 666.67 100.00 0.50 0.00 4.00 8.00 6.31 10168.00 2000.00 100.00 2.00 0.00 16.00 8.00 5.27 11001.00 500.00 100.00 0.50 0.00 4.00 8.00 2.96 17080.00 2000.00 100.00 34.00 0.00 1324.00 9.88 1.32 137.84 4.45 59.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 22.00 44.00 204.00 11.27 0.01 0.27 0.27 0.60 Let me provide you with some more information regarding the hardware. It's a Dell 1950 III box with Debian as OS where uname -a reports the following: Linux xx 2.6.32-5-amd64 #1 SMP Fri Feb 15 15:39:52 UTC 2013 x86_64 GNU/Linux The machine is a dedicated server that hosts an online game without any databases or I/O heavy applications running. The core application consumes about 0.8 of the 8 GBytes RAM, and the average CPU load is relatively low. The game itself, however, reacts rather sensitive towards I/O latency and thus our players experience massive ingame lag, which we would like to address as soon as possible. iostat: avg-cpu: %user %nice %system %iowait %steal %idle 1.77 0.01 1.05 1.59 0.00 95.58 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sdb 13.16 25.42 135.12 504701011 2682640656 sda 1.52 0.74 20.63 14644533 409684488 Uptime is: 19:26:26 up 229 days, 17:26, 4 users, load average: 0.36, 0.37, 0.32 Harddisk controller: 01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04) Harddisks: Array 1, RAID-1, 2x Seagate Cheetah 15K.5 73 GB SAS Array 2, RAID-1, 2x Seagate ST3500620SS Barracuda ES.2 500GB 16MB 7200RPM SAS Partition information from df: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb1 480191156 30715200 425083668 7% /home /dev/sda2 7692908 437436 6864692 6% / /dev/sda5 15377820 1398916 13197748 10% /usr /dev/sda6 39159724 19158340 18012140 52% /var Some more data samples generated with iostat -dx sdb 1 (Oct 11, 2013) Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sdb 0.00 15.00 0.00 70.00 0.00 656.00 9.37 4.50 1.83 4.80 33.60 sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 12.00 836.00 500.00 100.00 sdb 0.00 0.00 0.00 3.00 0.00 32.00 10.67 9.96 1990.67 333.33 100.00 sdb 0.00 0.00 0.00 4.00 0.00 40.00 10.00 6.96 3075.00 250.00 100.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 0.00 0.00 100.00 sdb 0.00 0.00 0.00 2.00 0.00 16.00 8.00 2.62 4648.00 500.00 100.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 0.00 100.00 sdb 0.00 0.00 0.00 1.00 0.00 16.00 16.00 1.69 7024.00 1000.00 100.00 sdb 0.00 74.00 0.00 124.00 0.00 1584.00 12.77 1.09 67.94 6.94 86.00 Characteristic charts generated with rrdtool can be found here: iostat plot 1, 24 min interval: http://imageshack.us/photo/my-images/600/yqm3.png/ iostat plot 2, 120 min interval: http://imageshack.us/photo/my-images/407/griw.png/ As we have a rather large cache of 5.5 GBytes, we thought it might be a good idea to test if the I/O wait spikes would perhaps be caused by cache miss events. Therefore, we did a sync and then this to flush the cache and buffers: echo 3 > /proc/sys/vm/drop_caches and directly afterwards the I/O wait and service times virtually went through the roof, and everything on the machine felt like slow motion. During the next few hours the latency recovered and everything was as before - small to medium lags in short, unpredictable intervals. Now my question is: does anybody have any idea what might cause this annoying behaviour? Is it the first indication of the disk array or the raid controller dying, or something that can be easily mended by rebooting? (At the moment we're very reluctant to do this, however, because we're afraid that the disks might not come back up again.) Any help is greatly appreciated. Thanks in advance, Chris. Edited to add: we do see one or two processes go to 'D' state in top, one of which seems to be kjournald rather frequently. If I'm not mistaken, however, this does not indicate the processes causing the latency, but rather those affected by it - correct me if I'm wrong. Does the information about uninterruptibly sleeping processes help us in any way to address the problem? @Andy Shinn requested smartctl data, here it is: smartctl -a -d megaraid,2 /dev/sdb yields: smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net Device: SEAGATE ST3500620SS Version: MS05 Serial number: Device type: disk Transport protocol: SAS Local Time is: Mon Oct 14 20:37:13 2013 CEST Device supports SMART and is Enabled Temperature Warning Disabled or Not Supported SMART Health Status: OK Current Drive Temperature: 20 C Drive Trip Temperature: 68 C Elements in grown defect list: 0 Vendor (Seagate) cache information Blocks sent to initiator = 1236631092 Blocks received from initiator = 1097862364 Blocks read from cache and sent to initiator = 1383620256 Number of read and write commands whose size <= segment size = 531295338 Number of read and write commands whose size > segment size = 51986460 Vendor (Seagate/Hitachi) factory information number of hours powered up = 36556.93 number of minutes until next internal SMART test = 32 Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 509271032 47 0 509271079 509271079 20981.423 0 write: 0 0 0 0 0 5022.039 0 verify: 1870931090 196 0 1870931286 1870931286 100558.708 0 Non-medium error count: 0 SMART Self-test log Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] Description number (hours) # 1 Background short Completed 16 36538 - [- - -] # 2 Background short Completed 16 36514 - [- - -] # 3 Background short Completed 16 36490 - [- - -] # 4 Background short Completed 16 36466 - [- - -] # 5 Background short Completed 16 36442 - [- - -] # 6 Background long Completed 16 36420 - [- - -] # 7 Background short Completed 16 36394 - [- - -] # 8 Background short Completed 16 36370 - [- - -] # 9 Background long Completed 16 36364 - [- - -] #10 Background short Completed 16 36361 - [- - -] #11 Background long Completed 16 2 - [- - -] #12 Background short Completed 16 0 - [- - -] Long (extended) Self Test duration: 6798 seconds [113.3 minutes] smartctl -a -d megaraid,3 /dev/sdb yields: smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net Device: SEAGATE ST3500620SS Version: MS05 Serial number: Device type: disk Transport protocol: SAS Local Time is: Mon Oct 14 20:37:26 2013 CEST Device supports SMART and is Enabled Temperature Warning Disabled or Not Supported SMART Health Status: OK Current Drive Temperature: 19 C Drive Trip Temperature: 68 C Elements in grown defect list: 0 Vendor (Seagate) cache information Blocks sent to initiator = 288745640 Blocks received from initiator = 1097848399 Blocks read from cache and sent to initiator = 1304149705 Number of read and write commands whose size <= segment size = 527414694 Number of read and write commands whose size > segment size = 51986460 Vendor (Seagate/Hitachi) factory information number of hours powered up = 36596.83 number of minutes until next internal SMART test = 28 Error counter log: Errors Corrected by Total Correction Gigabytes Total ECC rereads/ errors algorithm processed uncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 610862490 44 0 610862534 610862534 20470.133 0 write: 0 0 0 0 0 5022.480 0 verify: 2861227413 203 0 2861227616 2861227616 100872.443 0 Non-medium error count: 1 SMART Self-test log Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] Description number (hours) # 1 Background short Completed 16 36580 - [- - -] # 2 Background short Completed 16 36556 - [- - -] # 3 Background short Completed 16 36532 - [- - -] # 4 Background short Completed 16 36508 - [- - -] # 5 Background short Completed 16 36484 - [- - -] # 6 Background long Completed 16 36462 - [- - -] # 7 Background short Completed 16 36436 - [- - -] # 8 Background short Completed 16 36412 - [- - -] # 9 Background long Completed 16 36404 - [- - -] #10 Background short Completed 16 36401 - [- - -] #11 Background long Completed 16 2 - [- - -] #12 Background short Completed 16 0 - [- - -] Long (extended) Self Test duration: 6798 seconds [113.3 minutes]

    Read the article

  • Advanced TSQL Tuning: Why Internals Knowledge Matters

    - by Paul White
    There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes.  Query tuning is not complete as soon as the query returns results quickly in the development or test environments.  In production, your query will compete for memory, CPU, locks, I/O and other resources on the server.  Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data.  In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row.  The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type.  A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL,   CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL,   CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table.  The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results.  This seems slow, considering that the table only has 50,000 rows.  Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time.  Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring.  The script below monitors STATISTICS IO output and the amount of tempdb used by the test query.  We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB.  The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row.  With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first).  This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts.  In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted.  SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed.  Sorts typically require a very large number of comparisons, so this is usually a very effective optimization.  One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself.  SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use.  The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory.  Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant.  The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources.  More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1).  Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB.  The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file.  Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case.  In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms.  If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much.  Why is the data size so much smaller?  The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored.  By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output.  You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead.  SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows.  The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes.  The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page.  There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option.  By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row.  SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage.  You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now.  This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL,   CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query).  SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant.  No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers.  Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’.  Why?  Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed.  SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages.  This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this?  Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need.  245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory.  Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily?  That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column.  That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal.  I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator.  Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted.  We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need.  The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb.  The small remaining inefficiency is in reading the id column values from the clustered primary key index.  As a clustered index, it contains all the in-row data at its leaf.  The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead.  The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure.  This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only.  This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data.  The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings.  It isn’t about the internal differences between TEXT and MAX data types either.  It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load.  No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance.  Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows.  5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is!  If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb.  Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough.  Tuning for logical reads and adding covering indexes is not enough.  If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on.  The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008.  They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator.  Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

    Read the article

  • SSIS Catalog, Windows updates and deployment failures due to System.Core mismatch

    - by jamiet
    This is a heads-up for anyone doing development on SSIS. On my current project where we are implementing a SQL Server Integration Services (SSIS) 2012 solution we recently encountered a situation where we were unable to deploy any of our projects even though we had successfully deployed in the past. Any attempt to use the deployment wizard resulted in this error dialog: The text of the error (for all you search engine crawlers out there) was: A .NET Framework error occurred during execution of user-defined routine or aggregate "create_key_information": System.IO.FileLoadException: Could not load file or assembly 'System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040) ---> System.IO.FileLoadException: The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040) System.IO.FileLoadException: System.IO.FileLoadException:     at Microsoft.SqlServer.IntegrationServices.Server.Security.CryptoGraphy.CreateSymmetricKey(String algorithm)    at Microsoft.SqlServer.IntegrationServices.Server.Security.CryptoGraphy.CreateKeyInformation(SqlString algorithmName, SqlBytes& key, SqlBytes& IV) . (Microsoft SQL Server, Error: 6522) After some investigation and a bit of back and forth with some very helpful members of the SSIS product team (hey Matt, Wee Hyong) it transpired that this was due to a .Net Framework fix that had been delivered via Windows Update. I took a look at the server update history and indeed there have been some recently applied .Net Framework updates: This fix had (in the words of Matt Masson) “somehow caused a mismatch on System.Core for SQLCLR” and, as you may know, SQLCLR is used heavily within the SSIS Catalog. The fix was pretty simple – restart SQL Server. This causes the assemblies to be upgraded automatically. If you are using Data Quality Services (DQS) you may have experienced similar problems which are documented at Upgrade SQLCLR Assemblies After .NET Framework Update. I am hoping the SSIS team will follow-up with a more thorough explanation on their blog soon. You DBAs out there may be questioning why Windows Update is set to automatically apply updates on our production servers. We’re checking that out with our hosting provider right now You have been warned! @Jamiet

    Read the article

  • Using MAC Authentication for simple Web API’s consumption

    - by cibrax
    For simple scenarios of Web API consumption where identity delegation is not required, traditional http authentication schemas such as basic, certificates or digest are the most used nowadays. All these schemas rely on sending the caller credentials or some representation of it in every request message as part of the Authorization header, so they are prone to suffer phishing attacks if they are not correctly secured at transport level with https. In addition, most client applications typically authenticate two different things, the caller application and the user consuming the API on behalf of that application. For most cases, the schema is simplified by using a single set of username and password for authenticating both, making necessary to store those credentials temporally somewhere in memory. The true is that you can use two different identities, one for the user running the application, which you might authenticate just once during the first call when the application is initialized, and another identity for the application itself that you use on every call. Some cloud vendors like Windows Azure or Amazon Web Services have adopted an schema to authenticate the caller application based on a Message Authentication Code (MAC) generated with a symmetric algorithm using a key known by the two parties, the caller and the Web API. The caller must include a MAC as part of the Authorization header created from different pieces of information in the request message such as the address, the host, and some other headers. The Web API can authenticate the caller by using the key associated to it and validating the attached MAC in the request message. In that way, no credentials are sent as part of the request message, so there is no way an attacker to intercept the message and get access to those credentials. Anyways, this schema also suffers from some deficiencies that can generate attacks. For example, brute force can be still used to infer the key used for generating the MAC, and impersonate the original caller. This can be mitigated by renewing keys in a relative short period of time. This schema as any other can be complemented with transport security. Eran Rammer, one of the brains behind OAuth, has recently published an specification of a protocol based on MAC for Http authentication called Hawk. The initial version of the spec is available here. A curious fact is that the specification per se does not exist, and the specification itself is the code that Eran initially wrote using node.js. In that implementation, you can associate a key to an user, so once the MAC has been verified on the Web API, the user can be inferred from that key. Also a timestamp is used to avoid replay attacks. As a pet project, I decided to port that code to .NET using ASP.NET Web API, which is available also in github under https://github.com/pcibraro/hawknet Enjoy!.

    Read the article

  • 2D Barcode Addendum

    - by Tim Dexter
    Having finally got my external drive back(long story) today from Oklahoma (thank you so much Sammy) Im back with a full compliment of Oracle and blogging tools at my disposal. I have missed JDeveloper this past week, which I have found, I immensely prefer over Eclipse (let the flaming commence :0) I use Zoundry Raven for writing articles and its not installed locally but on my external drove, so I have been soldiering on with the blog server's pain in the backside UI for writing. Now I have my favority editor back and things are calming down workwise, I will start to get the Excel template posts out. Today thou, a note about 2D barcode support or more specifically any barcode that needs some data manipulation before the barcode font is applied. I wrote about these fonts a long time back and laid out the java class you would need to write if you had an algorithm from the font manufacturer to use. I missed out a valuable point and James at Luminex fell into the trap. He was wanting to use the datamatrix font from IDAutomation but and had built the java class to be called from the RTF template but it was not encoding or at least did not appear to be. New debugging feature to the rescue. Kan over at the bipconsultng blog documented the feature a while back. Just adding <?xdo-debug-level:'STATEMENT'?> to my test template generated all the debug files in my c:\temp directory. No messing with files, just a simple command ... at last! Kan has documented the feature here. With the log in hand I spotted a java error stack referencing a missing code128a method, huh? Looking at James' class he had the following snippet: ENCODERS.put("code128a",mUtility.getClass().getMethod("code128a",clazz)); ENCODERS.put("code128b",mUtility.getClass().getMethod("code128b", clazz)); ENCODERS.put("code128c",mUtility.getClass().getMethod("code128c", clazz)); ENCODERS.put("pdf417",mUtility.getClass().getMethod("pdf417", clazz)); ENCODERS.put("datamatrix",mUtility.getClass().getMethod("datamatrix", clazz)); His class did not include the other code128 and pdf147 methods and BIP was expecting them. An easy fix, just comment them out, rebuild and deploy and the encoding started working. If you are hitting similar problems, check that class and ensure all of the referenced methods are available, if not, delete or get commenting. James now has purdy labels popping out that his hard ware can read, sweet!

    Read the article

  • Fair Comments

    - by Tony Davis
    To what extent is good code self-documenting? In one of the most entertaining sessions I saw at the recent PASS summit, Jeremiah Peschka (blog | twitter) got a laugh out of a sleepy post-lunch audience with the following remark: "Some developers say good code is self-documenting; I say, get off my team" I silently applauded the sentiment. It's not that all comments are useful, but that I mistrust the basic premise that "my code is so clearly written, it doesn't need any comments". I've read many pieces describing the road to self-documenting code, and my problem with most of them is that they feed the myth that comments in code are a sign of weakness. They aren't; in fact, used correctly I'd say they are essential. Regardless of how far intelligent naming can get you in describing what the code does, or how well any accompanying unit tests can explain to your fellow developers why it works that way, it's no excuse not to document fully the public interfaces to your code. Maybe I just mixed with the wrong crowd while learning my favorite language, but when I open a stored procedure I lose the will even to read it unless I see a big Phil Factor- or Jeff Moden-style header summarizing in plain English what the code does, how it fits in to the broader application, and a usage example. This public interface describes the high-level process and should explain the role of the code, clearly, for fellow developers, language non-experts, and even any non-technical stake holders in the project. When you step into the body of the code, the low-level details, then I agree that the rules are somewhat different; especially when code is subject to frequent refactoring that can quickly render comments redundant or misleading. At their worst, here, inline comments are sticking plaster to cover up the scars caused by poor naming conventions, failure in clarity when mapping a complex domain into code, or just by not entirely understanding the problem (/ this is the clever part). If you design and refactor your code carefully so that it is as simple as possible, your functions do one thing only, you avoid having two completely different algorithms in the same piece of code, and your functions, classes and variables are intelligently named, then, yes, the need for inline comments should be minimal. And yet, even given this, I'd still argue that many languages (T-SQL certainly being one) just don't lend themselves to readability when performing even moderately-complex tasks. If the algorithm is complex, I still like to see the occasional helpful comment. Please, therefore, be as liberal as you see fit in the detail of the comments you apply to this editorial, for like code it is bound to increase its' clarity and usefulness. Cheers, Tony.

    Read the article

  • CodePlex Daily Summary for Wednesday, November 23, 2011

    CodePlex Daily Summary for Wednesday, November 23, 2011Popular ReleasesVisual Leak Detector for Visual C++ 2008/2010: v2.2.1: Enhancements: * strdup and _wcsdup functions support added. * Preliminary support for VS 11 added. Bugs Fixed: * Low performance after upgrading from VLD v2.1. * Memory leaks with static linking fixed (disabled calloc support). * Runtime error R6002 fixed because of wrong memory dump format. * version.h fixed in installer. * Some PVS studio warning fixed.NetSqlAzMan - .NET SQL Authorization Manager: 3.6.0.10: 3.6.0.10 22-Nov-2011 Update: Removed PreEmptive Platform integration (PreEmptive analytics) Removed all PreEmptive attributes Removed PreEmptive.dll assembly references from all projects Added first support to ADAM/AD LDS Thanks to PatBea. Work Item 9775: http://netsqlazman.codeplex.com/workitem/9775Developer Team Article System Management: DTASM v1.3: ?? ??? ???? 3 ????? ???? ???? ????? ??? : - ????? ?????? ????? ???? ?? ??? ???? ????? ?? ??? ? ?? ???? ?????? ???? ?? ???? ????? ?? . - ??? ?? ???? ????? ???? ????? ???? ???? ?? ????? , ?????? ????? ????? ?? ??? . - ??? ??????? ??? ??? ???? ?? ????? ????? ????? .SharePoint 2010 FBA Pack: SharePoint 2010 FBA Pack 1.2.0: Web parts are now fully customizable via html templates (Issue #323) FBA Pack is now completely localizable using resource files. Thank you David Chen for submitting the code as well as Chinese translations of the FBA Pack! The membership request web part now gives the option of having the user enter the password and removing the captcha (Issue # 447) The FBA Pack will now work in a zone that does not have FBA enabled (Another zone must have FBA enabled, and the zone must contain the me...SharePoint 2010 Education Demo Project: Release SharePoint SP1 for Education Solutions: This release includes updates to the Content Packs for SharePoint SP1. All Content Packs have been updated to install successfully under SharePoint SP1SQL Monitor - tracking sql server activities: SQLMon 4.1 alpha 6: 1. improved support for schema 2. added find reference when right click on object list 3. added object rename supportBugNET Issue Tracker: BugNET 0.9.126: First stable release of version 0.9. Upgrades from 0.8 are fully supported and upgrades to future releases will also be supported. This release is now compiled against the .NET 4.0 framework and is a requirement. Because of this the web.config has significantly changed. After upgrading, you will need to configure the authentication settings for user registration and anonymous access again. Please see our installation / upgrade instructions for more details: http://wiki.bugnetproject.c...Anno 2070 Assistant: v0.1.0 (STABLE): Version 0.1.0 Features Production Chains Eco Production Chains (Complete) Tycoon Production Chains (Disabled - Incomplete) Tech Production Chains (Disabled - Incomplete) Supply (Disabled - Incomplete) Calculator (Disabled - Incomplete) Building Layouts Eco Building Layouts (Complete) Tycoon Building Layouts (Disabled - Incomplete) Tech Building Layouts (Disabled - Incomplete) Credits (Complete)Free SharePoint 2010 Sites Templates: SharePoint Server 2010 Sites Templates: here is the list of sites templates to be downloadedVsTortoise - a TortoiseSVN add-in for Microsoft Visual Studio: VsTortoise Build 30 Beta: Note: This release does not work with custom VsTortoise toolbars. These get removed every time when you shutdown Visual Studio. (#7940) Build 30 (beta)New: Support for TortoiseSVN 1.7 added. (the download contains both setups, for TortoiseSVN 1.6 and 1.7) New: OpenModifiedDocumentDialog displays conflicted files now. New: OpenModifiedDocument allows to group items by changelist now. Fix: OpenModifiedDocumentDialog caused Visual Studio 2010 to freeze sometimes. Fix: The installer didn...nopCommerce. Open source shopping cart (ASP.NET MVC): nopcommerce 2.30: Highlight features & improvements: • Performance optimization. • Back in stock notifications. • Product special price support. • Catalog mode (based on customer role) To see the full list of fixes and changes please visit the release notes page (http://www.nopCommerce.com/releasenotes.aspx).WPF Converters: WPF Converters V1.2.0.0: support for enumerations, value types, and reference types in the expression converter's equality operators the expression converter now handles DependencyProperty.UnsetValue as argument values correctly (#4062) StyleCop conformance (more or less)Json.NET: Json.NET 4.0 Release 4: Change - JsonTextReader.Culture is now CultureInfo.InvariantCulture by default Change - KeyValurPairConverter no longer cares about the order of the key and value properties Change - Time zone conversions now use new TimeZoneInfo instead of TimeZone Fix - Fixed boolean values sometimes being capitalized when converting to XML Fix - Fixed error when deserializing ConcurrentDictionary Fix - Fixed serializing some Uris returning the incorrect value Fix - Fixed occasional error when...Media Companion: MC 3.423b Weekly: Ensure .NET 4.0 Full Framework is installed. (Available from http://www.microsoft.com/download/en/details.aspx?id=17718) Ensure the NFO ID fix is applied when transitioning from versions prior to 3.416b. (Details here) Replaced 'Rebuild' with 'Refresh' throughout entire code. Rebuild will now be known as Refresh. mc_com.exe has been fully updated TV Show Resolutions... Resolved issue #206 - having to hit save twice when updating runtime manually Shrunk cache size and lowered loading times f...Delta Engine: Delta Engine Beta Preview v0.9.1: v0.9.1 beta release with lots of refactoring, fixes, new samples and support for iOS, Android and WP7 (you need a Marketplace account however). If you want a binary release for the games (like v0.9.0), just say so in the Forum or here and we will quickly prepare one. It is just not much different from v0.9.0, so I left it out this time. See http://DeltaEngine.net/Wiki.Roadmap for details.ASP.net Awesome Samples (Web-Forms): 1.0 samples: Demos and Tutorials for ASP.net Awesome VS2008 are in .NET 3.5 VS2010 are in .NET 4.0 (demos for the ASP.net Awesome jQuery Ajax Controls)SharpMap - Geospatial Application Framework for the CLR: SharpMap-0.9-AnyCPU-Trunk-2011.11.17: This is a build of SharpMap from the 0.9 development trunk as per 2011-11-17 For most applications the AnyCPU release is the recommended, but in case you need an x86 build that is included to. For some dataproviders (GDAL/OGR, SqLite, PostGis) you need to also referense the SharpMap.Extensions assembly For SqlServer Spatial you need to reference the SharpMap.SqlServerSpatial assemblyAJAX Control Toolkit: November 2011 Release: AJAX Control Toolkit Release Notes - November 2011 Release Version 51116November 2011 release of the AJAX Control Toolkit. AJAX Control Toolkit .NET 4 - Binary – AJAX Control Toolkit for .NET 4 and sample site (Recommended). AJAX Control Toolkit .NET 3.5 - Binary – AJAX Control Toolkit for .NET 3.5 and sample site (Recommended). Notes: - The current version of the AJAX Control Toolkit is not compatible with ASP.NET 2.0. The latest version that is compatible with ASP.NET 2.0 can be found h...Microsoft Ajax Minifier: Microsoft Ajax Minifier 4.36: Fix for issue #16908: string literals containing ASP.NET replacement syntax fail if the ASP.NET code contains the same character as the string literal delimiter. Also, we shouldn't be changing the delimiter for those literals or combining them with other literals; the developer may have specifically chosen the delimiter used because of possible content inserted by ASP.NET code. This logic is normally off; turn it on via the -aspnet command-line flag (or the Code.Settings.AllowEmbeddedAspNetBl...MVC Controls Toolkit: Mvc Controls Toolkit 1.5.5: Added: Now the DateRanteAttribute accepts complex expressions containing "Now" and "Today" as static minimum and maximum. Menu, MenuFor helpers capable of handling a "currently selected element". The developer can choose between using a standard nested menu based on a standard SimpleMenuItem class or specifying an item template based on a custom class. Added also helpers to build the tree structure containing all data items the menu takes infos from. Improved the pager. Now the developer ...New ProjectsActiveWorlds World Server Admin PowerShell SnapIn: The purpose of this PowerShell SnapIn is to provide a set of tools to administer the world server from PowerShell. It leverages the ActiveWorlds SDK .NET Wrapper to provide this functionality.Aigu: Enter special characters like you would on your mobile phone. For instance, if you want to type 'é', you just hold down 'e' and a menu will appear. Selected the desired character using the arrow keys and press 'enter'. Simple but powerful.Are you workaholic?: Are you a workaholic? Did your Doctor advice you not to stare at the computer monitor for a long time? Then this app is perfectly made for you. It runs in the background, and alerts you to take periodic rests for your eyes and body. What's more, It's open source (MS-PL).ATDIS PoC: privateAuto Version Web Assets: The AVWA project is an HTTP Module written in C# that is designed to allow for versioning of various web assets such as .CSS and .JS files. This allows you to publish new versions of these files without having to force the server or the client browsers to expire cache.Bachelor Thesis Algorithm Test Bed: Algorithm Test Bed for my Bachelor ThesisBase64: Simple application helps converting strings and files from or to Base64 string. You can use any encoding to convert while a sidebar previews decoded string for all other encodings.BoracayExpress: BoracayExpressC++ Framework for Test Driven Development: A testing framework for C++ written in C++.Class2Table: Class2Table aka Entity2Table. Easy tool that allows creation of SQL tables from .Net types.Code for Demos & Experiments: This is where I will post code from demos and presentationsCodeMaker: CodeMaker?????????: 1、?????????? 2、???? 3、????? 4、??Python????????? ConsoleCommand: ConsoleCommand provides certain .Net commands for access from javascript console engines. Included are commands to set the text and background colors, as well as list and extract resources compiled in a .Net dll. Converter: Character code conversion tools ???????? CryptoInator - self contained, self-encrypting, self-decrypting image viewer: Original developed to encrypt and store NemID images in Denmark. DAiBears: Something, something, botDelicious Notify Plugin: Lets you push a blog post straight to Delicious from Live WriterDeveloperFile: Compresses Javascripts using the YUI .NET project. Loops through the root folder and subfolders for files matching the debug extension and creates new files using the release extension. (File extensions must match exactly).DotNetNuke SharePoint File Explorer: A DotNetNuke SharePoint File ExplorerDouban FM: WP7 Douban FM appGame Lib: Game Library is a open-source game library to allow focusing on the fun part of a game. It is developed in C#, but will be ported to C++ and VB.net.Google reader notes to Delicious Export tool (WPF): Google reader discontinued note in reader features. Current google reader allows to export users old notes in JSON format, This App will parse the JSON file & upload it to it delicious , delicious is a good alternative for note in readerHtml Source Transmitter Control: This web control allows getting a source of a web page, that will displayed before submit. So, developer can store a view of the html page, that was before server exception. It helps to reproduce bugs and can be used with other logging systems.Ideopuzzle: A puzzle gameImageShack-Uploader: This project demonstrates how to upload files automated to imageshack.us and other image hosters with C#.Insert Acronym Tags: Lets you insert <acronym> and <abbr> tags into your blog entry more easily.Insert Quick Link: Allows you to paste a link into the Writer window and have the a window similar to the one in Writer where you can change what text is to appear, open in new window, etc.Insert Video Plugin: Allows you to insert a video into a blog entry from a multitude of different sitesIoCWrap: Provides a wrapper to the various IoC container implementations so that it is possible to switch to a different provider without changing any application code.kaveepoj: sharepoint projectKinect Quiz Engine: Fun quiz game for the Kinect.Klaverjas: Test application for testing different new technologies in .NET (WCF, DataServices, C# stuff, Entity...etc.)Man In The Middle: A cyberpunk themed action with puzzle and strategy elements. Made with XNA as part of a game development course at the IT University of Copenhagen by Bo Bendtsen, Jonas Flensbak, Daniel Kromand, Jess Rahbek & Darryl Woodford.MediaSelektor: Simple tool to select mediasMicajah Mindtouch Deki Wiki Copier: Small C# application to move data between 2 Deki Wiki installs or, more importantly, from a wik.is account to a locally installed systemMineFlagger: MineFlagger is a mine clearing game modeled after Microsoft’s Minesweeper. In addition to standard play, MineFlagger incorporates an AI for fun and training.myXbyqwrhjadsfasfhgf: myXbyqwrhjadsfasfhgfnatoop: natoopNauplius.KeyStore: Provides secure application key storage backed by SQL 2008 and Active Directory.ObjectDB: An object database written using C# 4 and Mono.Cecil.PaceR: PaceR is an attempt to encapsulate a lot of the common code functionality I use on different projects. Instead of recreating functionality from memory or worse, copying from older projects, I'd like to have a central location to maintain this common code. Parseq: Parseq is a Parser Combinator library written in C# (version 2.0).PowerShell Network Adapter Configuration module: PowerShell Network Adapter Configuration module is a PowerShell module which provides functions for managing network adapters using WMI.public traffic tracker: This is a university project for a .net course. We develop a public traffic tracker applications for Windows Phone 7 devices, that can give information about the actual positions of the nearest vehicle on a given line. The speciality is that we use only the GPS information of the users' WP7 devices, so this is a completely software solution without any hardware investment. The disatvantage is that for the real operation we would need a lot of active WP7 user.puyo: puyoRadioTroll: Projeto web Radio TrollRead Feed Community: Read Feed CommunityReviewer: Reviewer.dk - Dansk spil og anmeldelsessite.Rollout Sharepoint Solutions - ROSS: ROSS performs the following actions: - Delete sitecollection and restart services - 'Get Latest Version' from SourceSafe - Rebuild Solution - Install all wsp solutions - Create SiteCollections - Check for build en provisioning errors - Send email to developers if errors occurredSchool Management: school managementSQL File Executer: This project is a class library written in c# which is used for executing *.sql files in remote server. Simply one dll file. You include it in your web project, add using statement at the top of your page, pass the parameters inside. Rest, it will do.Startup Manager: Startup Manager launches all startup programs at a managed rate therefore meaning that your computer doesn't crash everytime it starts up and you can use it immediately.stetic: ...Test Infrastructure Guidance: The purpose of this project is to provide guidance to testers in using TFS effectively as an ALM solution. TFS is much more than a simple code repository. Used with Visual Studio it can form a powerful testing solution and remove a lot of pain in dealing with test infrastructure overhead.Tête-à-tête: Tete-a-tete is an address book with a built-in function to send electronic mail over the Internet.Tipeysh! - Add-in that helps you creating C/C++ header files on a single click: Are you also feel miserable when you need to create a new header file in your Visual Studio C/C++ project? Repeatedly choosing "new header file", then writing the annoying (but needed) "#ifndef" section, then writing the class name with it's "private", "protected" and "public" access modifiers... too much clicks and typewriting! Well, there is a solution: Tipeysh! is a simple, easy to use, very handy and configurable Visual Studio Add-In, compatible for both the 2005 and 2008 versions. Once ...UMN Dashboard Project: academic projUsersMOSS: UsersMOSS est une petite application permettant de consulter sur un serveur MOSS les sites web (SPWeb) les users (SPUser), et les groupes (SPGroup). Cette application utilise le modèle objet de MOSS pour inspecter le contenu des objets d'un serveur MOSS. Cette application est loin d'être professionnelle, ou même terminée, mais elle me rend très souvent service. Surtout ne l'utilisez pas sur un serveur de production car le gestion du GC n'est pas faite, ce qui peut provoquer des plantages de v...UtilityLibrary.Win32: UtilityLibrary.Win32UW iLearn: The iLearn activity inference platform is a suite of desktop and mobile tools for logging, modeling, and classifying sensor data for mobile devices. It was created at the University of Washington.VsDocGen: Dynamic javascript documentation generation directly from xml comment documented source code.Windows Live Spaces Photo Album plugin: This is going to be a plugin for Windows Live Writer that will allow you to browse a Windows Live Space Photo Album.Windows Live Writer Plugin for Amazon Books using CueCat: This Windows Live Writer Plugin is for users who use WLW and wish to use their CueCat to scan books. ItemLookups are run against Amazon via its AWS and book image, title, author, and publisher is returned. This project was first created by Scott Hanselman on MSDN's Coding4Fun! X7: X7 makes it easier for win7user to clean the system. You'll no longer have to delete useless stuff in your win7. It's developed in bat.xDT - Commander: Using this application, the user can assign shortcuts (short texts) for various links/URLs. These short texts will be typed into a Textbox to then launch/go to the target (similar to the "Run" program in Windows).

    Read the article

  • Can you Trust Search?

    - by David Dorf
    An awful lot of referrals to e-commerce sites come from web searches. Retailers rely on search engine optimization (SEO) to correctly position their website so they can be found. Search on "blue jeans" and the results are determined by a semi-secret algorithm -- in my case Levi.com, Banana Republic, and ShopStyle show up. The NY Times recently uncovered a situation where JCPenney, via third-parties hired to help with SEO, was caught manipulating search results so they were erroneously higher in page rankings. No doubt this helped drive additional sales during this part Christmas. The article, The Dirty Little Secrets of Search, is well worth reading. My friend Ron Kleinman started an interesting discussion at the ARTS Linkedin forum. He posed the question: The ability of a single company to "punish" any retailer (by significantly impacting their on-line sales volume) who does not play by their rules ... is this a good thing or a bad thing? Clearly JCP was in the wrong and needed to be punished, but should that decision lie with Google alone? Don't get me wrong -- I'm certainly not advocating we create a Department of Search where bureaucrats think of ways to spend money, but Google wields an awful lot of power in this situation, and it makes me feel uncomfortable. Now Google is incorporating more social aspects into their search results. For example, when Google knows its me (i.e. I'm logged in when using Google) search results will be influenced by my Twitter network. In an effort to increase relevance, the blogs and re-tweeted articles from my network will be higher in the search results than they otherwise would be. So in the case of product searches, things discussed in my network will rise to the top. Continuing my blue jean example, if someone in my network had been discussing Macy's perhaps they would now be higher in the result set. soapbox: I already have lots of spammers posting bogus comments to this blog in an effort to create additional links to their sites and thus increase their search ranking. Should I expect a similar situation in Twitter and eventually Facebook? Now retailers need to expand their SEO efforts to incorporate social media as well, but do us all a favor and please don't cheat.

    Read the article

  • Architecture strategies for a complex competition scoring system

    - by mikewassmer
    Competition description: There are about 10 teams competing against each other over a 6-week period. Each team's total score (out of a 1000 total available points) is based on the total of its scores in about 25,000 different scoring elements. Most scoring elements are worth a small fraction of a point and there will about 10 X 25,000 = 250,000 total raw input data points. The points for some scoring elements are awarded at frequent regular time intervals during the competition. The points for other scoring elements are awarded at either irregular time intervals or at just one moment in time. There are about 20 different types of scoring elements. Each of the 20 types of scoring elements has a different set of inputs, a different algorithm for calculating the earned score from the raw inputs, and a different number of total available points. The simplest algorithms require one input and one simple calculation. The most complex algorithms consist of hundreds or thousands of raw inputs and a more complicated calculation. Some types of raw inputs are automatically generated. Other types of raw inputs are manually entered. All raw inputs are subject to possible manual retroactive adjustments by competition officials. Primary requirements: The scoring system UI for competitors and other competition followers will show current and historical total team scores, team standings, team scores by scoring element, raw input data (at several levels of aggregation, e.g. daily, weekly, etc.), and other metrics. There will be charts, tables, and other widgets for displaying historical raw data inputs and scores. There will be a quasi-real-time dashboard that will show current scores and raw data inputs. Aggregate scores should be updated/refreshed whenever new raw data inputs arrive or existing raw data inputs are adjusted. There will be a "scorekeeper UI" for manually entering new inputs, manually adjusting existing inputs, and manually adjusting calculated scores. Decisions: Should the scoring calculations be performed on the database layer (T-SQL/SQL Server, in my case) or on the application layer (C#/ASP.NET MVC, in my case)? What are some recommended approaches for calculating updated total team scores whenever new raw inputs arrives? Calculating each of the teams' total scores from scratch every time a new input arrives will probably slow the system to a crawl. I've considered some kind of "diff" approach, but that approach may pose problems for ad-hoc queries and some aggegates. I'm trying draw some sports analogies, but it's tough because most games consist of no more than 20 or 30 scoring elements per game (I'm thinking of a high-scoring baseball game; football and soccer have fewer scoring events per game). Perhaps a financial balance sheet analogy makes more sense because financial "bottom line" calcs may be calculated from 250,000 or more transactions. Should I be making heavy use of caching for this application? Are there any obvious approaches or similar case studies that I may be overlooking?

    Read the article

  • SQL Server 2008 Compression

    - by Peter Larsson
    Hi! Today I am going to talk about compression in SQL Server 2008. The data warehouse I currently design and develop holds historical data back to 1973. The data warehouse will have an other blog post laster due to it's complexity. However, the server has 60GB of memory (of which 48 is dedicated to SQL Server service), so all data didn't fit in memory and the SAN is not the fastest one around. So I decided to give compression a go, since we use Enterprise Edition anyway. This is the code I use to compress all tables with PAGE compression. DECLARE @SQL VARCHAR(MAX)   DECLARE curTables CURSOR FOR             SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(object_id))                     + '.' + QUOTENAME(OBJECT_NAME(object_id))                     + ' REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)'             FROM    sys.tables   OPEN    curTables   FETCH   NEXT FROM    curTables INTO    @SQL   WHILE @@FETCH_STATUS = 0     BEGIN         IF @SQL IS NOT NULL             RAISERROR(@SQL, 10, 1) WITH NOWAIT           FETCH   NEXT         FROM    curTables         INTO    @SQL     END   CLOSE       curTables DEALLOCATE  curTables Copy and paste the result to a new code window and execute the statements. One thing I noticed when doing this, is that the database grows with the same size as the table. If the database cannot grow this size, the operation fails. For me, I first ended up with orphaned connection. Not good. And this is the code I use to create the index compression statements DECLARE @SQL VARCHAR(MAX)   DECLARE curIndexes CURSOR FOR             SELECT      'ALTER INDEX ' + QUOTENAME(name)                         + ' ON '                         + QUOTENAME(OBJECT_SCHEMA_NAME(object_id))                         + '.'                         + QUOTENAME(OBJECT_NAME(object_id))                         + ' REBUILD PARTITION = ALL WITH (FILLFACTOR = 100, DATA_COMPRESSION = PAGE)'             FROM        sys.indexes             WHERE       OBJECTPROPERTY(object_id, 'IsMSShipped') = 0                         AND OBJECTPROPERTY(object_id, 'IsTable') = 1             ORDER BY    CASE type_desc                             WHEN 'CLUSTERED' THEN 1                             ELSE 2                         END   OPEN    curIndexes   FETCH   NEXT FROM    curIndexes INTO    @SQL   WHILE @@FETCH_STATUS = 0     BEGIN         IF @SQL IS NOT NULL             RAISERROR(@SQL, 10, 1) WITH NOWAIT           FETCH   NEXT         FROM    curIndexes         INTO    @SQL     END   CLOSE       curIndexes DEALLOCATE  curIndexes When this was done, I noticed that the 90GB database now only was 17GB. And most important, complete database now could reside in memory! After this I took care of the administrative tasks, backups. Here I copied the code from Management Studio because I didn't want to give too much time for this. The code looks like (notice the compression option). BACKUP DATABASE [Yoda] TO              DISK = N'D:\Fileshare\Backup\Yoda.bak' WITH            NOFORMAT,                 INIT,                 NAME = N'Yoda - Full Database Backup',                 SKIP,                 NOREWIND,                 NOUNLOAD,                 COMPRESSION,                 STATS = 10,                 CHECKSUM GO   DECLARE @BackupSetID INT   SELECT  @BackupSetID = Position FROM    msdb..backupset WHERE   database_name = N'Yoda'         AND backup_set_id =(SELECT MAX(backup_set_id) FROM msdb..backupset WHERE database_name = N'Yoda')   IF @BackupSetID IS NULL     RAISERROR(N'Verify failed. Backup information for database ''Yoda'' not found.', 16, 1)   RESTORE VERIFYONLY FROM    DISK = N'D:\Fileshare\Backup\Yoda.bak' WITH    FILE = @BackupSetID,         NOUNLOAD,         NOREWIND GO After running the backup, the file size was even more reduced due to the zip-like compression algorithm used in SQL Server 2008. The file size? Only 9 GB. //Peso

    Read the article

  • Infinite detail inside Perlin noise procedural mapping

    - by Dave Jellison
    I am very new to game development but I was able to scour the internet to figure out Perlin noise enough to implement a very simple 2D tile infinite procedural world. Here's the question and it's more conceptual than code-based in answer, I think. I understand the concept of "I plug in (x, y) and get back from Perlin noise p" (I'll call it p). P will always be the same value for the same (x, y) (as long as the Perlin algorithm parameters haven't changed, like altering number of octaves, et cetera). What I want to do is be able to zoom into a square and be able to generate smaller squares inside of the already generated overhead tile of terrain. Let's say I have a jungle tile for overhead terrain but I want to zoom in and maybe see a small river tile that would only be a creek and not large enough to be a full "big tile" of water in the overhead. Of course, I want the same net effect as a Perlin equation inside a Perlin equation if that makes sense? (aka. I want two people playing the game with the same settings to get the same terrain and details every time). I can conceptually wrap my head around the large tile being based on an "zoomed out" coordinate leaving enough room to drill into but this approach doesn't make sense in my head (maybe I'm wrong). I'm guessing with this approach my overhead terrain would lose all of the cohesiveness delivered by the Perlin. Imagine I calculate (0, 0) as overhead tile 1 and then to the east of that I plug in (50, 0). OK, great, I now have 49 pixels of detail I could then "drill down" into. The issue I have in my head with this approach (without attempting it) is that there's no guarantee from my Perlin noise that (0,0) would be a good neighbor to (50,0) as they could have wildly different "elevations" or p/resultant values returning from the Perlin equation when I generate the overhead map. I think I can conceive of using the Perlin noise for the overhead tile to then reuse the p value as a seed for the "detail" level of noise once I zoom in. That would ensure my detail Perlin is always the same configuration for (0,0), (1,0), etc. ad nauseam but I'm not sure if there are better approaches out there or if this is a sound approach at all.

    Read the article

  • Towards Database Continuous Delivery – What Next after Continuous Integration? A Checklist

    - by Ben Rees
    .dbd-banner p{ font-size:0.75em; padding:0 0 10px; margin:0 } .dbd-banner p span{ color:#675C6D; } .dbd-banner p:last-child{ padding:0; } @media ALL and (max-width:640px){ .dbd-banner{ background:#f0f0f0; padding:5px; color:#333; margin-top: 5px; } } -- Database delivery patterns & practices STAGE 4 AUTOMATED DEPLOYMENT If you’ve been fortunate enough to get to the stage where you’ve implemented some sort of continuous integration process for your database updates, then hopefully you’re seeing the benefits of that investment – constant feedback on changes your devs are making, advanced warning of data loss (prior to the production release on Saturday night!), a nice suite of automated tests to check business logic, so you know it’s going to work when it goes live, and so on. But what next? What can you do to improve your delivery process further, moving towards a full continuous delivery process for your database? In this article I describe some of the issues you might need to tackle on the next stage of this journey, and how to plan to overcome those obstacles before they appear. Our Database Delivery Learning Program consists of four stages, really three – source controlling a database, running continuous integration processes, then how to set up automated deployment (the middle stage is split in two – basic and advanced continuous integration, making four stages in total). If you’ve managed to work through the first three of these stages – source control, basic, then advanced CI, then you should have a solid change management process set up where, every time one of your team checks in a change to your database (whether schema or static reference data), this change gets fully tested automatically by your CI server. But this is only part of the story. Great, we know that our updates work, that the upgrade process works, that the upgrade isn’t going to wipe our 4Tb of production data with a single DROP TABLE. But – how do you get this (fully tested) release live? Continuous delivery means being always ready to release your software at any point in time. There’s a significant gap between your latest version being tested, and it being easily releasable. Just a quick note on terminology – there’s a nice piece here from Atlassian on the difference between continuous integration, continuous delivery and continuous deployment. This piece also gives a nice description of the benefits of continuous delivery. These benefits have been summed up by Jez Humble at Thoughtworks as: “Continuous delivery is a set of principles and practices to reduce the cost, time, and risk of delivering incremental changes to users” There’s another really useful piece here on Simple-Talk about the need for continuous delivery and how it applies to the database written by Phil Factor – specifically the extra needs and complexities of implementing a full CD solution for the database (compared to just implementing CD for, say, a web app). So, hopefully you’re convinced of moving on the the next stage! The next step after CI is to get some sort of automated deployment (or “release management”) process set up. But what should I do next? What do I need to plan and think about for getting my automated database deployment process set up? Can’t I just install one of the many release management tools available and hey presto, I’m ready! If only it were that simple. Below I list some of the areas that it’s worth spending a little time on, where a little planning and prep could go a long way. It’s also worth pointing out, that this should really be an evolving process. Depending on your starting point of course, it can be a long journey from your current setup to a full continuous delivery pipeline. If you’ve got a CI mechanism in place, you’re certainly a long way down that path. Nevertheless, we’d recommend evolving your process incrementally. Pages 157 and 129-141 of the book on Continuous Delivery (by Jez Humble and Dave Farley) have some great guidance on building up a pipeline incrementally: http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912 For now, in this post, we’ll look at the following areas for your checklist: You and Your Team Environments The Deployment Process Rollback and Recovery Development Practices You and Your Team It’s a cliché in the DevOps community that “It’s not all about processes and tools, really it’s all about a culture”. As stated in this DevOps report from Puppet Labs: “DevOps processes and tooling contribute to high performance, but these practices alone aren’t enough to achieve organizational success. The most common barriers to DevOps adoption are cultural: lack of manager or team buy-in, or the value of DevOps isn’t understood outside of a specific group”. Like most clichés, there’s truth in there – if you want to set up a database continuous delivery process, you need to get your boss, your department, your company (if relevant) onside. Why? Because it’s an investment with the benefits coming way down the line. But the benefits are huge – for HP, in the book A Practical Approach to Large-Scale Agile Development: How HP Transformed LaserJet FutureSmart Firmware, these are summarized as: -2008 to present: overall development costs reduced by 40% -Number of programs under development increased by 140% -Development costs per program down 78% -Firmware resources now driving innovation increased by a factor of 8 (from 5% working on new features to 40% But what does this mean? It means that, when moving to the next stage, to make that extra investment in automating your deployment process, it helps a lot if everyone is convinced that this is a good thing. That they understand the benefits of automated deployment and are willing to make the effort to transform to a new way of working. Incidentally, if you’re ever struggling to convince someone of the value I’d strongly recommend just buying them a copy of this book – a great read, and a very practical guide to how it can really work at a large org. I’ve spoken to many customers who have implemented database CI who describe their deployment process as “The point where automation breaks down. Up to that point, the CI process runs, untouched by human hand, but as soon as that’s finished we revert to manual.” This deployment process can involve, for example, a DBA manually comparing an environment (say, QA) to production, creating the upgrade scripts, reading through them, checking them against an Excel document emailed to him/her the night before, turning to page 29 in his/her notebook to double-check how replication is switched off and on for deployments, and so on and so on. Painful, error-prone and lengthy. But the point is, if this is something like your deployment process, telling your DBA “We’re changing everything you do and your toolset next week, to automate most of your role – that’s okay isn’t it?” isn’t likely to go down well. There’s some work here to bring him/her onside – to explain what you’re doing, why there will still be control of the deployment process and so on. Or of course, if you’re the DBA looking after this process, you have to do a similar job in reverse. You may have researched and worked out how you’d like to change your methodology to start automating your painful release process, but do the dev team know this? What if they have to start producing different artifacts for you? Will they be happy with this? Worth talking to them, to find out. As well as talking to your DBA/dev team, the other group to get involved before implementation is your manager. And possibly your manager’s manager too. As mentioned, unless there’s buy-in “from the top”, you’re going to hit problems when the implementation starts to get rocky (and what tool/process implementations don’t get rocky?!). You need to have support from someone senior in your organisation – someone you can turn to when you need help with a delayed implementation, lack of resources or lack of progress. Actions: Get your DBA involved (or whoever looks after live deployments) and discuss what you’re planning to do or, if you’re the DBA yourself, get the dev team up-to-speed with your plans, Get your boss involved too and make sure he/she is bought in to the investment. Environments Where are you going to deploy to? And really this question is – what environments do you want set up for your deployment pipeline? Assume everyone has “Production”, but do you have a QA environment? Dedicated development environments for each dev? Proper pre-production? I’ve seen every setup under the sun, and there is often a big difference between “What we want, to do continuous delivery properly” and “What we’re currently stuck with”. Some of these differences are: What we want What we’ve got Each developer with their own dedicated database environment A single shared “development” environment, used by everyone at once An Integration box used to test the integration of all check-ins via the CI process, along with a full suite of unit-tests running on that machine In fact if you have a CI process running, you’re likely to have some sort of integration server running (even if you don’t call it that!). Whether you have a full suite of unit tests running is a different question… Separate QA environment used explicitly for manual testing prior to release “We just test on the dev environments, or maybe pre-production” A proper pre-production (or “staging”) box that matches production as closely as possible Hopefully a pre-production box of some sort. But does it match production closely!? A production environment reproducible from source control A production box which has drifted significantly from anything in source control The big question is – how much time and effort are you going to invest in fixing these issues? In reality this just involves figuring out which new databases you’re going to create and where they’ll be hosted – VMs? Cloud-based? What about size/data issues – what data are you going to include on dev environments? Does it need to be masked to protect access to production data? And often the amount of work here really depends on whether you’re working on a new, greenfield project, or trying to update an existing, brownfield application. There’s a world if difference between starting from scratch with 4 or 5 clean environments (reproducible from source control of course!), and trying to re-purpose and tweak a set of existing databases, with all of their surrounding processes and quirks. But for a proper release management process, ideally you have: Dedicated development databases, An Integration server used for testing continuous integration and running unit tests. [NB: This is the point at which deployments are automatic, without human intervention. Each deployment after this point is a one-click (but human) action], QA – QA engineers use a one-click deployment process to automatically* deploy chosen releases to QA for testing, Pre-production. The environment you use to test the production release process, Production. * A note on the use of the word “automatic” – when carrying out automated deployments this does not mean that the deployment is happening without human intervention (i.e. that something is just deploying over and over again). It means that the process of carrying out the deployment is automatic in that it’s not a person manually running through a checklist or set of actions. The deployment still requires a single-click from a user. Actions: Get your environments set up and ready, Set access permissions appropriately, Make sure everyone understands what the environments will be used for (it’s not a “free-for-all” with all environments to be accessed, played with and changed by development). The Deployment Process As described earlier, most existing database deployment processes are pretty manual. The following is a description of a process we hear very often when we ask customers “How do your database changes get live? How does your manual process work?” Check pre-production matches production (use a schema compare tool, like SQL Compare). Sometimes done by taking a backup from production and restoring in to pre-prod, Again, use a schema compare tool to find the differences between the latest version of the database ready to go live (i.e. what the team have been developing). This generates a script, User (generally, the DBA), reviews the script. This often involves manually checking updates against a spreadsheet or similar, Run the script on pre-production, and check there are no errors (i.e. it upgrades pre-production to what you hoped), If all working, run the script on production.* * this assumes there’s no problem with production drifting away from pre-production in the interim time period (i.e. someone has hacked something in to the production box without going through the proper change management process). This difference could undermine the validity of your pre-production deployment test. Red Gate is currently working on a free tool to detect this problem – sign up here at www.sqllighthouse.com, if you’re interested in testing early versions. There are several variations on this process – some better, some much worse! How do you automate this? In particular, step 3 – surely you can’t automate a DBA checking through a script, that everything is in order!? The key point here is to plan what you want in your new deployment process. There are so many options. At one extreme, pure continuous deployment – whenever a dev checks something in to source control, the CI process runs (including extensive and thorough testing!), before the deployment process keys in and automatically deploys that change to the live box. Not for the faint hearted – and really not something we recommend. At the other extreme, you might be more comfortable with a semi-automated process – the pre-production/production matching process is automated (with an error thrown if these environments don’t match), followed by a manual intervention, allowing for script approval by the DBA. One he/she clicks “Okay, I’m happy for that to go live”, the latter stages automatically take the script through to live. And anything in between of course – and other variations. But we’d strongly recommended sitting down with a whiteboard and your team, and spending a couple of hours mapping out “What do we do now?”, “What do we actually want?”, “What will satisfy our needs for continuous delivery, but still maintaining some sort of continuous control over the process?” NB: Most of what we’re discussing here is about production deployments. It’s important to note that you will also need to map out a deployment process for earlier environments (for example QA). However, these are likely to be less onerous, and many customers opt for a much more automated process for these boxes. Actions: Sit down with your team and a whiteboard, and draw out the answers to the questions above for your production deployments – “What do we do now?”, “What do we actually want?”, “What will satisfy our needs for continuous delivery, but still maintaining some sort of continuous control over the process?” Repeat for earlier environments (QA and so on). Rollback and Recovery If only every deployment went according to plan! Unfortunately they don’t – and when things go wrong, you need a rollback or recovery plan for what you’re going to do in that situation. Once you move in to a more automated database deployment process, you’re far more likely to be deploying more frequently than before. No longer once every 6 months, maybe now once per week, or even daily. Hence the need for a quick rollback or recovery process becomes paramount, and should be planned for. NB: These are mainly scenarios for handling rollbacks after the transaction has been committed. If a failure is detected during the transaction, the whole transaction can just be rolled back, no problem. There are various options, which we’ll explore in subsequent articles, things like: Immediately restore from backup, Have a pre-tested rollback script (remembering that really this is a “roll-forward” script – there’s not really such a thing as a rollback script for a database!) Have fallback environments – for example, using a blue-green deployment pattern. Different options have pros and cons – some are easier to set up, some require more investment in infrastructure; and of course some work better than others (the key issue with using backups, is loss of the interim transaction data that has been added between the failed deployment and the restore). The best mechanism will be primarily dependent on how your application works and how much you need a cast-iron failsafe mechanism. Actions: Work out an appropriate rollback strategy based on how your application and business works, your appetite for investment and requirements for a completely failsafe process. Development Practices This is perhaps the more difficult area for people to tackle. The process by which you can deploy database updates is actually intrinsically linked with the patterns and practices used to develop that database and linked application. So you need to decide whether you want to implement some changes to the way your developers actually develop the database (particularly schema changes) to make the deployment process easier. A good example is the pattern “Branch by abstraction”. Explained nicely here, by Martin Fowler, this is a process that can be used to make significant database changes (e.g. splitting a table) in a step-wise manner so that you can always roll back, without data loss – by making incremental updates to the database backward compatible. Slides 103-108 of the following slidedeck, from Niek Bartholomeus explain the process: https://speakerdeck.com/niekbartho/orchestration-in-meatspace As these slides show, by making a significant schema change in multiple steps – where each step can be rolled back without any loss of new data – this affords the release team the opportunity to have zero-downtime deployments with considerably less stress (because if an increment goes wrong, they can roll back easily). There are plenty more great patterns that can be implemented – the book Refactoring Databases, by Scott Ambler and Pramod Sadalage is a great read, if this is a direction you want to go in: http://www.amazon.com/Refactoring-Databases-Evolutionary-paperback-Addison-Wesley/dp/0321774515 But the question is – how much of this investment are you willing to make? How often are you making significant schema changes that would require these best practices? Again, there’s a difference here between migrating old projects and starting afresh – with the latter it’s much easier to instigate best practice from the start. Actions: For your business, work out how far down the path you want to go, amending your database development patterns to “best practice”. It’s a trade-off between implementing quality processes, and the necessity to do so (depending on how often you make complex changes). Socialise these changes with your development group. No-one likes having “best practice” changes imposed on them, so good to introduce these ideas and the rationale behind them early.   Summary The next stages of implementing a continuous delivery pipeline for your database changes (once you have CI up and running) require a little pre-planning, if you want to get the most out of the work, and for the implementation to go smoothly. We’ve covered some of the checklist of areas to consider – mainly in the areas of “Getting the team ready for the changes that are coming” and “Planning our your pipeline, environments, patterns and practices for development”, though there will be more detail, depending on where you’re coming from – and where you want to get to. This article is part of our database delivery patterns & practices series on Simple Talk. Find more articles for version control, automated testing, continuous integration & deployment.

    Read the article

  • Is commented out code really always bad?

    - by nikie
    Practically every text on code quality I've read agrees that commented out code is a bad thing. The usual example is that someone changed a line of code and left the old line there as a comment, apparently to confuse people who read the code later on. Of course, that's a bad thing. But I often find myself leaving commented out code in another situation: I write a computational-geometry or image processing algorithm. To understand this kind of code, and to find potential bugs in it, it's often very helpful to display intermediate results (e.g. draw a set of points to the screen or save a bitmap file). Looking at these values in the debugger usually means looking at a wall of numbers (coordinates, raw pixel values). Not very helpful. Writing a debugger visualizer every time would be overkill. I don't want to leave the visualization code in the final product (it hurts performance, and usually just confuses the end user), but I don't want to loose it, either. In C++, I can use #ifdef to conditionally compile that code, but I don't see much differnce between this: /* // Debug Visualization: draw set of found interest points for (int i=0; i<count; i++) DrawBox(pts[i].X, pts[i].Y, 5,5); */ and this: #ifdef DEBUG_VISUALIZATION_DRAW_INTEREST_POINTS for (int i=0; i<count; i++) DrawBox(pts[i].X, pts[i].Y, 5,5); #endif So, most of the time, I just leave the visualization code commented out, with a comment saying what is being visualized. When I read the code a year later, I'm usually happy I can just uncomment the visualization code and literally "see what's going on". Should I feel bad about that? Why? Is there a superior solution? Update: S. Lott asks in a comment Are you somehow "over-generalizing" all commented code to include debugging as well as senseless, obsolete code? Why are you making that overly-generalized conclusion? I recently read Robert Glass' "Clean Code", which says: Few practices are as odious as commenting-out code. Don't do this!. I've looked at the paragraph in the book again (p. 68), there's no qualification, no distinction made between different reasons for commenting out code. So I wondered if this rule is over-generalizing (or if I misunderstood the book) or if what I do is bad practice, for some reason I didn't know.

    Read the article

  • Why lock statements don't scale

    - by Alex.Davies
    We are going to have to stop using lock statements one day. Just like we had to stop using goto statements. The problem is similar, they're pretty easy to follow in small programs, but code with locks isn't composable. That means that small pieces of program that work in isolation can't necessarily be put together and work together. Of course actors scale fine :) Why lock statements don't scale as software gets bigger Deadlocks. You have a program with lots of threads picking up lots of locks. You already know that if two of your threads both try to pick up a lock that the other already has, they will deadlock. Your program will come to a grinding halt, and there will be fire and brimstone. "Easy!" you say, "Just make sure all the threads pick up the locks in the same order." Yes, that works. But you've broken composability. Now, to add a new lock to your code, you have to consider all the other locks already in your code and check that they are taken in the right order. Algorithm buffs will have noticed this approach means it takes quadratic time to write a program. That's bad. Why lock statements don't scale as hardware gets bigger Memory bus contention There's another headache, one that most programmers don't usually need to think about, but is going to bite us in a big way in a few years. Locking needs exclusive use of the entire system's memory bus while taking out the lock. That's not too bad for a single or dual-core system, but already for quad-core systems it's a pretty large overhead. Have a look at this blog about the .NET 4 ThreadPool for some numbers and a weird analogy (see the author's comment). Not too bad yet, but I'm scared my 1000 core machine of the future is going to go slower than my machine today! I don't know the answer to this problem yet. Maybe some kind of per-core work queue system with hierarchical work stealing. Definitely hardware support. But what I do know is that using locks specifically prevents any solution to this. We should be abstracting our code away from the details of locks as soon as possible, so we can swap in whatever solution arrives when it does. NAct uses locks at the moment. But my advice is that you code using actors (which do scale well as software gets bigger). And when there's a better way of implementing actors that'll scale well as hardware gets bigger, only NAct needs to work out how to use it, and your program will go fast on it's own.

    Read the article

  • How to implement an email unsubscribe system for a site with many kinds of emails?

    - by Mike Liu
    I'm working on a website that features many different types of emails. Users have accounts, and when logged in they have access to a setting page that they can use to customize what types of emails they receive. However, I'd like to also give users an easy way to unsubscribe directly in the emails they receive. I've looked into list unsubscribe headers as well as creating some type of one click link that would unsubscribe a user from that type of email without requiring login or further action. The later would probably require me to break convention and make changes to the database in response to a GET on the link. However, am I incorrect in thinking that either of these would require me to generate and permanently store a unique identifier in my database for every email I ever send, really complicating email delivery? Without that, I'm not sure how I would be able to uniquely identify a user and a type of email in order to change their email preferences, and this identifier would need to be stored forever as a user could have an email sitting in their inbox for a long time before they decide to act on it. Alternatively, I was considering having a no-login page for managing email preferences. In contrast to above where I would need one of these identifiers for each email, this would only need one identifier per user, with no generation or other action required on sending an email. All of these raise security issues, and they could potentially be used by people to tamper with others' email preferences. This could be mitigated somewhat by ensuring that the identifier is really difficult to guess. For the once per user identifier approach, I was considering generating the identifier by passing a user's ID through some type of encryption algorithm, is this a sound approach? For the per-email identifiers, perhaps I could use a user's ID appended to the time. However, even this would not eliminate the problem entirely, as this would really just be security through obscurity, and anyone with the URL could tamper, and in the end the main defense would have to be that most people aren't so bored as to tamper with other people's email preferences. Are there any other alternatives I've missed, or issues or solutions with these that anyone can provide insight on? What are best practices in this area?

    Read the article

  • It's called College.

    - by jeffreyabecker
    Today I saw yet another 'GUID vs int as your primary key' article. Like most of the ones I've read this was filled with technical misrepresentations and out-right fallices. Chef's famous line that "There's a time and a place for everything children" applies here. GUIDs have distinct advantages and disadvantages which should be considered when choosing a data type for the primary key. Fallacy 1: "Its easier" An integer data type(tinyint, smallint, int, bigint) is a better artifical key than a GUID because its easier to remember. I'm a firm believer that your artifical primary keys should be opaque gibberish. PK's are an implementation detail which should never be exposed to the user or relied on for business logic. If you want things to come back in an order, add and ORDER BY clause and SortOrder fields. If you want a human-usable look-up add a business key with a unique constraint. If you want to know what order things were inserted into a table add a timestamp. Fallacy 2: "Size Matters" For many applications, the size of the artifical primary key is going to be irrelevant. The particular article which kicked this post off stated repeatedly that joining against an int has better performance than joining against a GUID. In computer science the performance of your algorithm is always a function of the number of data points. This still holds true for databases. Unless your table is very large, the performance difference between an int and a guid probably isnt going to be mesurable let alone noticeable. My personal experience is that the performance becomes an issue when you start having billions of rows in the table. At this point, you should probably start looking to move from int to bigint so the effective space/performance gain isnt as much as you'd think. GUID Advantages: Insert-ability / Mergeability: You can reliably insert guids into tables without key collisions. Database Independence: Saving entities to the database often requires knowing ids. With identity based ids the id must be selected back after every insert. GUIDs can be generated application-side allowing much faster inserts. GUID Disadvantages: Generatability: You can calculate the next id for an integer pk pretty easily in your head but will need a program to generate GUIDs. Solution: "Select top 100 newid() from sysobjects" Fragmentation: most GUID generation algorithms generate pseudo random GUIDs. This can cause inserts into the middle of your clustered index. Solutions: add a default of newsequentialid() or use GuidComb in NHibernate.

    Read the article

  • Did I lost my RAID again?

    - by BarsMonster
    Hi! A little history: 2 years ago I was really excited to find out that mdadm is so powerful so it even can reshape arrays so you can start with a smaller array and the grow it as you need. I've bought 3x1Tb drives and made RAID-5. It was fine for a year. Then I bought 2x more, and tried to reshape to RAID-6 out of 5 drives, and due to some mess with superblock versions, lost all content. Had to rebuild it from scratch, but 2Tb of data were gone. Yesterday I bought 2 more drives, and this time I had everything: properly built array, UPS. I've disabled write intent map, added 2 new drives as a spare and run a command to grow array to 7-disk. It started working, but speed was ridiculously slow, ~100kb/sec. AFter processing first 37Mb at such an amasing speed, one of old HDDs fails. I properly shutdown PC and disconnected failed drive. After bootup it appeared it recreated intent map as it was still in mdadm config, so I removed it from config and rebooted again. Now all I see is that all mdadm processes deadlocks, and don't do anything. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1937 root 20 0 12992 608 444 D 0 0.1 0:00.00 mdadm 2283 root 20 0 12992 852 704 D 0 0.1 0:00.01 mdadm 2287 root 20 0 0 0 0 D 0 0.0 0:00.01 md0_reshape 2288 root 18 -2 12992 820 676 D 0 0.1 0:00.01 mdadm And all I see in mdstat is: $ cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid6 sdb1[1] sdg1[4] sdf1[7] sde1[6] sdd1[0] sdc1[5] 2929683456 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [7/6] [UU_UUUU] [>....................] reshape = 0.0% (37888/976561152) finish=567604147.2min speed=0K/sec I've already tried mdadm 2.6.7, 3.1.4 and 3.2 - nothing helps. Did I lost my data again? Any suggestions how can I make it work? OS is Ubuntu Server 10.04.2... PS. Needless to say that data is unaccessible - I cannot mount /dev/md0 as save the most valuable data. You can see my disappointment - the very specific thing I was excited about failed twice taking 5Tb of my data with it.

    Read the article

  • How to load stacking chunks on the fly?

    - by Brettetete
    I'm currently working on an infinite world, mostly inspired by minecraft. A Chunk consists of 16x16x16 blocks. A block(cube) is 1x1x1. This runs very smoothly with a ViewRange of 12 Chunks (12x16) on my computer. Fine. When I change the Chunk height to 256 this becomes - obviously - incredible laggy. So what I basically want to do is stacking chunks. That means my world could be [8,16,8] Chunks large. The question is now how to generate chunks on the fly? At the moment I generate not existing chunks circular around my position (near to far). Since I don't stack chunks yet, this is not very complex. As important side note here: I also want to have biomes, with different min/max height. So in Biome Flatlands the highest layer with blocks would be 8 (8x16) - in Biome Mountains the highest layer with blocks would be 14 (14x16). Just as example. What I could do would be loading 1 Chunk above and below me for example. But here the problem would be, that transitions between different bioms could be larger than one chunk on y. My current chunk loading in action For the completeness here my current chunk loading "algorithm" private IEnumerator UpdateChunks(){ for (int i = 1; i < VIEW_RANGE; i += ChunkWidth) { float vr = i; for (float x = transform.position.x - vr; x < transform.position.x + vr; x += ChunkWidth) { for (float z = transform.position.z - vr; z < transform.position.z + vr; z += ChunkWidth) { _pos.Set(x, 0, z); // no y, yet _pos.x = Mathf.Floor(_pos.x/ChunkWidth)*ChunkWidth; _pos.z = Mathf.Floor(_pos.z/ChunkWidth)*ChunkWidth; Chunk chunk = Chunk.FindChunk(_pos); // If Chunk is already created, continue if (chunk != null) continue; // Create a new Chunk.. chunk = (Chunk) Instantiate(ChunkFab, _pos, Quaternion.identity); } } // Skip to next frame yield return 0; } }

    Read the article

  • Do I have to worry about "error: superfluous RAID member"?

    - by 0xC0000022L
    When running update-grub on the newly installed Ubuntu 12.04 with an older software RAID (md), I get: error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). Generating grub.cfg ... error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). Found linux image: /boot/vmlinuz-3.2.0-24-generic Found initrd image: /boot/initrd.img-3.2.0-24-generic error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). Found linux image: /boot/vmlinuz-3.2.0-23-generic Found initrd image: /boot/initrd.img-3.2.0-23-generic error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). Found memtest86+ image: /boot/memtest86+.bin error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). error: superfluous RAID member (5 found). Found Debian GNU/Linux (5.0.9) on /dev/sdb1 Found Debian GNU/Linux (5.0.9) on /dev/sdc1 done I would be less worried if the message would say warning: ..., but since it says error: ... I'm wondering what the problem is. # cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md2 : active raid1 sdc1[1] sdb1[0] 48829440 blocks [2/2] [UU] md3 : active raid1 sdc2[1] sdb2[0] 263739008 blocks [2/2] [UU] md1 : active raid5 sdg1[3] sdf1[2] sde1[1] sdh1[0] sdi1[4] sdd1[5](S) 1250274304 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU] unused devices: <none> Do I have to worry or is this harmless? btw: disregard the mentioning of Debian 5.0.9, that was the previously installed system and is going to be overwritten. It's on /dev/md2 actually.

    Read the article

  • What does the `dmesg` error: "composite sync not supported" mean?

    - by M. Tibbits
    Question: I see [ 20.473125] composite sync not supported and several such entries when I run dmesg. What do they mean? Background: I'm trying to debug a problem where my laptop won't suspend. Since acpi seems happy and I can suspend easily from the command line, I've turned to tracking down all boot-up errors/warnings. So I run dmesg | grep not and, amongst other shtuff, I get: 728:[ 17.267120] composite sync not supported 733:[ 18.009061] composite sync not supported 740:[ 18.159289] registered panic notifier 749:[ 18.162500] vga16fb: not registering due to another framebuffer present 757:[ 18.598251] composite sync not supported 776:[ 20.473125] composite sync not supported 777:[ 20.932266] composite sync not supported 778:[ 28.350231] composite sync not supported 779:[ 28.924913] composite sync not supported 780:[ 35.480658] composite sync not supported And the full log for the few lines right around that first appearance (line 728) is listed at the bottom of my post (I'd happily include anything else). Any ideas what could be causing this? I've read several sites: Ubuntuforums #1 IRC Chat #1 One post talks about ??Adobe flash?? causing this error? Some others also suggest that it might be an nvidia related problem, but I've got a Dell Latitude D630 with an integrated Intel graphics -- so nvidia isn't the problem. [ 17.207142] phy0: Selected rate control algorithm 'minstrel' [ 17.207833] Registered led device: b43-phy0::tx [ 17.207849] Registered led device: b43-phy0::rx [ 17.207865] Registered led device: b43-phy0::radio [ 17.207927] Broadcom 43xx driver loaded [ Features: PL, Firmware-ID: FW13 ] [ 17.267120] composite sync not supported [ 17.415795] EXT4-fs (sda2): mounted filesystem with ordered data mode [ 17.602131] [drm] initialized overlay support [ 17.620201] input: DualPoint Stick as /devices/platform/i8042/serio1/input/input7 [ 17.641192] input: AlpsPS/2 ALPS DualPoint TouchPad as /devices/platform/i8042/serio1/input/input8 [ 18.009061] composite sync not supported [ 18.106042] pcmcia_socket pcmcia_socket0: cs: IO port probe 0x100-0x3af: clean. [ 18.108115] pcmcia_socket pcmcia_socket0: cs: IO port probe 0x3e0-0x4ff: clean. [ 18.108941] pcmcia_socket pcmcia_socket0: cs: IO port probe 0x820-0x8ff: clean. [ 18.109676] pcmcia_socket pcmcia_socket0: cs: IO port probe 0xc00-0xcf7: clean. [ 18.110356] pcmcia_socket pcmcia_socket0: cs: IO port probe 0xa00-0xaff: clean. [ 18.159286] fb0: inteldrmfb frame buffer device [ 18.159289] registered panic notifier [ 18.160218] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/LNXVIDEO:01/input/input9 [ 18.160286] ACPI: Video Device [VID1] (multi-head: yes rom: no post: no) [ 18.160334] ACPI Warning for \_SB_.PCI0.VID2._DOD: Return Package has no elements (empty) (20090903/nspredef-433) [ 18.160432] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/LNXVIDEO:02/input/input10 [ 18.160491] ACPI: Video Device [VID2] (multi-head: yes rom: no post: no) [ 18.160539] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0 [ 18.162494] vga16fb: initializing [ 18.162497] vga16fb: mapped to 0xc00a0000 [ 18.162500] vga16fb: not registering due to another framebuffer present [ 18.176091] HDA Intel 0000:00:1b.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21 [ 18.176123] HDA Intel 0000:00:1b.0: setting latency timer to 64 [ 18.285752] input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:1b.0/input/input11 [ 18.312497] input: HDA Intel Mic at Ext Left Jack as /devices/pci0000:00/0000:00:1b.0/sound/card0/input12 [ 18.312586] input: HDA Intel HP Out at Ext Left Jack as /devices/pci0000:00/0000:00:1b.0/sound/card0/input13 [ 18.328043] usbcore: registered new interface driver ndiswrapper [ 18.460909] Console: switching to colour frame buffer device 180x56 [ 18.598251] composite sync not supported

    Read the article

< Previous Page | 169 170 171 172 173 174 175 176 177 178 179 180  | Next Page >