Search Results

Search found 9014 results on 361 pages for 'space scenery'.

Page 65/361 | < Previous Page | 61 62 63 64 65 66 67 68 69 70 71 72 | Next Page >

iPad and User Assistance

- by ultan o'broin

What possibilities does the iPad over for user assistance in the enterprise space? We will research the possibilities but I can see a number of possibilities already for remote workers who need access to trouble-shooting information on-site, implementers who need reference information and diagrams, business analysts or technical users accessing reports and dashboards for metrics or issues, functional users who need org charts and other data visualizations, and so on. It could also open up more possibilities for collaborative problem solving. User assistance content can take advantage of the device's superb display, graphics capability, connectivity, and long battery life. The possibility of opening up more innovative user assistance solutions (such as comics) is an exciting one for everyone in the UX space. Aligned to this possibility we need to research how users would use the device as they work.

Read the article
Understanding G1 GC Logs

- by poonam

The purpose of this post is to explain the meaning of GC logs generated with some tracing and diagnostic options for G1 GC. We will take a look at the output generated with PrintGCDetails which is a product flag and provides the most detailed level of information. Along with that, we will also look at the output of two diagnostic flags that get enabled with -XX:+UnlockDiagnosticVMOptions option - G1PrintRegionLivenessInfo that prints the occupancy and the amount of space used by live objects in each region at the end of the marking cycle and G1PrintHeapRegions that provides detailed information on the heap regions being allocated and reclaimed. We will be looking at the logs generated with JDK 1.7.0_04 using these options. Option -XX:+PrintGCDetails Here's a sample log of G1 collection generated with PrintGCDetails. 0.522: [GC pause (young), 0.15877971 secs] [Parallel Time: 157.1 ms] [GC Worker Start (ms): 522.1 522.2 522.2 522.2 Avg: 522.2, Min: 522.1, Max: 522.2, Diff: 0.1] [Ext Root Scanning (ms): 1.6 1.5 1.6 1.9 Avg: 1.7, Min: 1.5, Max: 1.9, Diff: 0.4] [Update RS (ms): 38.7 38.8 50.6 37.3 Avg: 41.3, Min: 37.3, Max: 50.6, Diff: 13.3] [Processed Buffers : 2 2 3 2 Sum: 9, Avg: 2, Min: 2, Max: 3, Diff: 1] [Scan RS (ms): 9.9 9.7 0.0 9.7 Avg: 7.3, Min: 0.0, Max: 9.9, Diff: 9.9] [Object Copy (ms): 106.7 106.8 104.6 107.9 Avg: 106.5, Min: 104.6, Max: 107.9, Diff: 3.3] [Termination (ms): 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] [Termination Attempts : 1 4 4 6 Sum: 15, Avg: 3, Min: 1, Max: 6, Diff: 5] [GC Worker End (ms): 679.1 679.1 679.1 679.1 Avg: 679.1, Min: 679.1, Max: 679.1, Diff: 0.1] [GC Worker (ms): 156.9 157.0 156.9 156.9 Avg: 156.9, Min: 156.9, Max: 157.0, Diff: 0.1] [GC Worker Other (ms): 0.3 0.3 0.3 0.3 Avg: 0.3, Min: 0.3, Max: 0.3, Diff: 0.0] [Clear CT: 0.1 ms] [Other: 1.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.3 ms] [Ref Enq: 0.0 ms] [Free CSet: 0.3 ms] [Eden: 12M(12M)->0B(10M) Survivors: 0B->2048K Heap: 13M(64M)->9739K(64M)] [Times: user=0.59 sys=0.02, real=0.16 secs] This is the typical log of an Evacuation Pause (G1 collection) in which live objects are copied from one set of regions (young OR young+old) to another set. It is a stop-the-world activity and all the application threads are stopped at a safepoint during this time. This pause is made up of several sub-tasks indicated by the indentation in the log entries. Here's is the top most line that gets printed for the Evacuation Pause. 0.522: [GC pause (young), 0.15877971 secs] This is the highest level information telling us that it is an Evacuation Pause that started at 0.522 secs from the start of the process, in which all the regions being evacuated are Young i.e. Eden and Survivor regions. This collection took 0.15877971 secs to finish. Evacuation Pauses can be mixed as well. In which case the set of regions selected include all of the young regions as well as some old regions. 1.730: [GC pause (mixed), 0.32714353 secs] Let's take a look at all the sub-tasks performed in this Evacuation Pause. [Parallel Time: 157.1 ms] Parallel Time is the total elapsed time spent by all the parallel GC worker threads. The following lines correspond to the parallel tasks performed by these worker threads in this total parallel time, which in this case is 157.1 ms. [GC Worker Start (ms): 522.1 522.2 522.2 522.2Avg: 522.2, Min: 522.1, Max: 522.2, Diff: 0.1] The first line tells us the start time of each of the worker thread in milliseconds. The start times are ordered with respect to the worker thread ids – thread 0 started at 522.1ms and thread 1 started at 522.2ms from the start of the process. The second line tells the Avg, Min, Max and Diff of the start times of all of the worker threads. [Ext Root Scanning (ms): 1.6 1.5 1.6 1.9 Avg: 1.7, Min: 1.5, Max: 1.9, Diff: 0.4] This gives us the time spent by each worker thread scanning the roots (globals, registers, thread stacks and VM data structures). Here, thread 0 took 1.6ms to perform the root scanning task and thread 1 took 1.5 ms. The second line clearly shows the Avg, Min, Max and Diff of the times spent by all the worker threads. [Update RS (ms): 38.7 38.8 50.6 37.3 Avg: 41.3, Min: 37.3, Max: 50.6, Diff: 13.3] Update RS gives us the time each thread spent in updating the Remembered Sets. Remembered Sets are the data structures that keep track of the references that point into a heap region. Mutator threads keep changing the object graph and thus the references that point into a particular region. We keep track of these changes in buffers called Update Buffers. The Update RS sub-task processes the update buffers that were not able to be processed concurrently, and updates the corresponding remembered sets of all regions. [Processed Buffers : 2 2 3 2Sum: 9, Avg: 2, Min: 2, Max: 3, Diff: 1] This tells us the number of Update Buffers (mentioned above) processed by each worker thread. [Scan RS (ms): 9.9 9.7 0.0 9.7 Avg: 7.3, Min: 0.0, Max: 9.9, Diff: 9.9] These are the times each worker thread had spent in scanning the Remembered Sets. Remembered Set of a region contains cards that correspond to the references pointing into that region. This phase scans those cards looking for the references pointing into all the regions of the collection set. [Object Copy (ms): 106.7 106.8 104.6 107.9 Avg: 106.5, Min: 104.6, Max: 107.9, Diff: 3.3] These are the times spent by each worker thread copying live objects from the regions in the Collection Set to the other regions. [Termination (ms): 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] Termination time is the time spent by the worker thread offering to terminate. But before terminating, it checks the work queues of other threads and if there are still object references in other work queues, it tries to steal object references, and if it succeeds in stealing a reference, it processes that and offers to terminate again. [Termination Attempts : 1 4 4 6 Sum: 15, Avg: 3, Min: 1, Max: 6, Diff: 5] This gives the number of times each thread has offered to terminate. [GC Worker End (ms): 679.1 679.1 679.1 679.1 Avg: 679.1, Min: 679.1, Max: 679.1, Diff: 0.1] These are the times in milliseconds at which each worker thread stopped. [GC Worker (ms): 156.9 157.0 156.9 156.9 Avg: 156.9, Min: 156.9, Max: 157.0, Diff: 0.1] These are the total lifetimes of each worker thread. [GC Worker Other (ms): 0.3 0.3 0.3 0.3Avg: 0.3, Min: 0.3, Max: 0.3, Diff: 0.0] These are the times that each worker thread spent in performing some other tasks that we have not accounted above for the total Parallel Time. [Clear CT: 0.1 ms] This is the time spent in clearing the Card Table. This task is performed in serial mode. [Other: 1.5 ms] Time spent in the some other tasks listed below. The following sub-tasks (which individually may be parallelized) are performed serially. [Choose CSet: 0.0 ms] Time spent in selecting the regions for the Collection Set. [Ref Proc: 0.3 ms] Total time spent in processing Reference objects. [Ref Enq: 0.0 ms] Time spent in enqueuing references to the ReferenceQueues. [Free CSet: 0.3 ms] Time spent in freeing the collection set data structure. [Eden: 12M(12M)->0B(13M) Survivors: 0B->2048K Heap: 14M(64M)->9739K(64M)] This line gives the details on the heap size changes with the Evacuation Pause. This shows that Eden had the occupancy of 12M and its capacity was also 12M before the collection. After the collection, its occupancy got reduced to 0 since everything is evacuated/promoted from Eden during a collection, and its target size grew to 13M. The new Eden capacity of 13M is not reserved at this point. This value is the target size of the Eden. Regions are added to Eden as the demand is made and when the added regions reach to the target size, we start the next collection. Similarly, Survivors had the occupancy of 0 bytes and it grew to 2048K after the collection. The total heap occupancy and capacity was 14M and 64M receptively before the collection and it became 9739K and 64M after the collection. Apart from the evacuation pauses, G1 also performs concurrent-marking to build the live data information of regions. 1.416: [GC pause (young) (initial-mark), 0.62417980 secs] ….... 2.042: [GC concurrent-root-region-scan-start] 2.067: [GC concurrent-root-region-scan-end, 0.0251507] 2.068: [GC concurrent-mark-start] 3.198: [GC concurrent-mark-reset-for-overflow] 4.053: [GC concurrent-mark-end, 1.9849672 sec] 4.055: [GC remark 4.055: [GC ref-proc, 0.0000254 secs], 0.0030184 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 4.088: [GC cleanup 117M->106M(138M), 0.0015198 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 4.090: [GC concurrent-cleanup-start] 4.091: [GC concurrent-cleanup-end, 0.0002721] The first phase of a marking cycle is Initial Marking where all the objects directly reachable from the roots are marked and this phase is piggy-backed on a fully young Evacuation Pause. 2.042: [GC concurrent-root-region-scan-start] This marks the start of a concurrent phase that scans the set of root-regions which are directly reachable from the survivors of the initial marking phase. 2.067: [GC concurrent-root-region-scan-end, 0.0251507] End of the concurrent root region scan phase and it lasted for 0.0251507 seconds. 2.068: [GC concurrent-mark-start] Start of the concurrent marking at 2.068 secs from the start of the process. 3.198: [GC concurrent-mark-reset-for-overflow] This indicates that the global marking stack had became full and there was an overflow of the stack. Concurrent marking detected this overflow and had to reset the data structures to start the marking again. 4.053: [GC concurrent-mark-end, 1.9849672 sec] End of the concurrent marking phase and it lasted for 1.9849672 seconds. 4.055: [GC remark 4.055: [GC ref-proc, 0.0000254 secs], 0.0030184 secs] This corresponds to the remark phase which is a stop-the-world phase. It completes the left over marking work (SATB buffers processing) from the previous phase. In this case, this phase took 0.0030184 secs and out of which 0.0000254 secs were spent on Reference processing. 4.088: [GC cleanup 117M->106M(138M), 0.0015198 secs] Cleanup phase which is again a stop-the-world phase. It goes through the marking information of all the regions, computes the live data information of each region, resets the marking data structures and sorts the regions according to their gc-efficiency. In this example, the total heap size is 138M and after the live data counting it was found that the total live data size dropped down from 117M to 106M. 4.090: [GC concurrent-cleanup-start] This concurrent cleanup phase frees up the regions that were found to be empty (didn't contain any live data) during the previous stop-the-world phase. 4.091: [GC concurrent-cleanup-end, 0.0002721] Concurrent cleanup phase took 0.0002721 secs to free up the empty regions. Option -XX:G1PrintRegionLivenessInfo Now, let's look at the output generated with the flag G1PrintRegionLivenessInfo. This is a diagnostic option and gets enabled with -XX:+UnlockDiagnosticVMOptions. G1PrintRegionLivenessInfo prints the live data information of each region during the Cleanup phase of the concurrent-marking cycle. 26.896: [GC cleanup ### PHASE Post-Marking @ 26.896### HEAP committed: 0x02e00000-0x0fe00000 reserved: 0x02e00000-0x12e00000 region-size: 1048576 Cleanup phase of the concurrent-marking cycle started at 26.896 secs from the start of the process and this live data information is being printed after the marking phase. Committed G1 heap ranges from 0x02e00000 to 0x0fe00000 and the total G1 heap reserved by JVM is from 0x02e00000 to 0x12e00000. Each region in the G1 heap is of size 1048576 bytes. ### type address-range used prev-live next-live gc-eff### (bytes) (bytes) (bytes) (bytes/ms) This is the header of the output that tells us about the type of the region, address-range of the region, used space in the region, live bytes in the region with respect to the previous marking cycle, live bytes in the region with respect to the current marking cycle and the GC efficiency of that region. ### FREE 0x02e00000-0x02f00000 0 0 0 0.0 This is a Free region. ### OLD 0x02f00000-0x03000000 1048576 1038592 1038592 0.0 Old region with address-range from 0x02f00000 to 0x03000000. Total used space in the region is 1048576 bytes, live bytes as per the previous marking cycle are 1038592 and live bytes with respect to the current marking cycle are also 1038592. The GC efficiency has been computed as 0. ### EDEN 0x03400000-0x03500000 20992 20992 20992 0.0 This is an Eden region. ### HUMS 0x0ae00000-0x0af00000 1048576 1048576 1048576 0.0### HUMC 0x0af00000-0x0b000000 1048576 1048576 1048576 0.0### HUMC 0x0b000000-0x0b100000 1048576 1048576 1048576 0.0### HUMC 0x0b100000-0x0b200000 1048576 1048576 1048576 0.0### HUMC 0x0b200000-0x0b300000 1048576 1048576 1048576 0.0### HUMC 0x0b300000-0x0b400000 1048576 1048576 1048576 0.0### HUMC 0x0b400000-0x0b500000 1001480 1001480 1001480 0.0 These are the continuous set of regions called Humongous regions for storing a large object. HUMS (Humongous starts) marks the start of the set of humongous regions and HUMC (Humongous continues) tags the subsequent regions of the humongous regions set. ### SURV 0x09300000-0x09400000 16384 16384 16384 0.0 This is a Survivor region. ### SUMMARY capacity: 208.00 MB used: 150.16 MB / 72.19 % prev-live: 149.78 MB / 72.01 % next-live: 142.82 MB / 68.66 % At the end, a summary is printed listing the capacity, the used space and the change in the liveness after the completion of concurrent marking. In this case, G1 heap capacity is 208MB, total used space is 150.16MB which is 72.19% of the total heap size, live data in the previous marking was 149.78MB which was 72.01% of the total heap size and the live data as per the current marking is 142.82MB which is 68.66% of the total heap size. Option -XX:+G1PrintHeapRegions G1PrintHeapRegions option logs the regions related events when regions are committed, allocated into or are reclaimed. COMMIT/UNCOMMIT events G1HR COMMIT [0x6e900000,0x6ea00000]G1HR COMMIT [0x6ea00000,0x6eb00000] Here, the heap is being initialized or expanded and the region (with bottom: 0x6eb00000 and end: 0x6ec00000) is being freshly committed. COMMIT events are always generated in order i.e. the next COMMIT event will always be for the uncommitted region with the lowest address. G1HR UNCOMMIT [0x72700000,0x72800000]G1HR UNCOMMIT [0x72600000,0x72700000] Opposite to COMMIT. The heap got shrunk at the end of a Full GC and the regions are being uncommitted. Like COMMIT, UNCOMMIT events are also generated in order i.e. the next UNCOMMIT event will always be for the committed region with the highest address. GC Cycle events G1HR #StartGC 7G1HR CSET 0x6e900000G1HR REUSE 0x70500000G1HR ALLOC(Old) 0x6f800000G1HR RETIRE 0x6f800000 0x6f821b20G1HR #EndGC 7 This shows start and end of an Evacuation pause. This event is followed by a GC counter tracking both evacuation pauses and Full GCs. Here, this is the 7th GC since the start of the process. G1HR #StartFullGC 17G1HR UNCOMMIT [0x6ed00000,0x6ee00000]G1HR POST-COMPACTION(Old) 0x6e800000 0x6e854f58G1HR #EndFullGC 17 Shows start and end of a Full GC. This event is also followed by the same GC counter as above. This is the 17th GC since the start of the process. ALLOC events G1HR ALLOC(Eden) 0x6e800000 The region with bottom 0x6e800000 just started being used for allocation. In this case it is an Eden region and allocated into by a mutator thread. G1HR ALLOC(StartsH) 0x6ec00000 0x6ed00000G1HR ALLOC(ContinuesH) 0x6ed00000 0x6e000000 Regions being used for the allocation of Humongous object. The object spans over two regions. G1HR ALLOC(SingleH) 0x6f900000 0x6f9eb010 Single region being used for the allocation of Humongous object. G1HR COMMIT [0x6ee00000,0x6ef00000]G1HR COMMIT [0x6ef00000,0x6f000000]G1HR COMMIT [0x6f000000,0x6f100000]G1HR COMMIT [0x6f100000,0x6f200000]G1HR ALLOC(StartsH) 0x6ee00000 0x6ef00000G1HR ALLOC(ContinuesH) 0x6ef00000 0x6f000000G1HR ALLOC(ContinuesH) 0x6f000000 0x6f100000G1HR ALLOC(ContinuesH) 0x6f100000 0x6f102010 Here, Humongous object allocation request could not be satisfied by the free committed regions that existed in the heap, so the heap needed to be expanded. Thus new regions are committed and then allocated into for the Humongous object. G1HR ALLOC(Old) 0x6f800000 Old region started being used for allocation during GC. G1HR ALLOC(Survivor) 0x6fa00000 Region being used for copying old objects into during a GC. Note that Eden and Humongous ALLOC events are generated outside the GC boundaries and Old and Survivor ALLOC events are generated inside the GC boundaries. Other Events G1HR RETIRE 0x6e800000 0x6e87bd98 Retire and stop using the region having bottom 0x6e800000 and top 0x6e87bd98 for allocation. Note that most regions are full when they are retired and we omit those events to reduce the output volume. A region is retired when another region of the same type is allocated or we reach the start or end of a GC(depending on the region). So for Eden regions: For example: 1. ALLOC(Eden) Foo2. ALLOC(Eden) Bar3. StartGC At point 2, Foo has just been retired and it was full. At point 3, Bar was retired and it was full. If they were not full when they were retired, we will have a RETIRE event: 1. ALLOC(Eden) Foo2. RETIRE Foo top3. ALLOC(Eden) Bar4. StartGC G1HR CSET 0x6e900000 Region (bottom: 0x6e900000) is selected for the Collection Set. The region might have been selected for the collection set earlier (i.e. when it was allocated). However, we generate the CSET events for all regions in the CSet at the start of a GC to make sure there's no confusion about which regions are part of the CSet. G1HR POST-COMPACTION(Old) 0x6e800000 0x6e839858 POST-COMPACTION event is generated for each non-empty region in the heap after a full compaction. A full compaction moves objects around, so we don't know what the resulting shape of the heap is (which regions were written to, which were emptied, etc.). To deal with this, we generate a POST-COMPACTION event for each non-empty region with its type (old/humongous) and the heap boundaries. At this point we should only have Old and Humongous regions, as we have collapsed the young generation, so we should not have eden and survivors. POST-COMPACTION events are generated within the Full GC boundary. G1HR CLEANUP 0x6f400000G1HR CLEANUP 0x6f300000G1HR CLEANUP 0x6f200000 These regions were found empty after remark phase of Concurrent Marking and are reclaimed shortly afterwards. G1HR #StartGC 5G1HR CSET 0x6f400000G1HR CSET 0x6e900000G1HR REUSE 0x6f800000 At the end of a GC we retire the old region we are allocating into. Given that its not full, we will carry on allocating into it during the next GC. This is what REUSE means. In the above case 0x6f800000 should have been the last region with an ALLOC(Old) event during the previous GC and should have been retired before the end of the previous GC. G1HR ALLOC-FORCE(Eden) 0x6f800000 A specialization of ALLOC which indicates that we have reached the max desired number of the particular region type (in this case: Eden), but we decided to allocate one more. Currently it's only used for Eden regions when we extend the young generation because we cannot do a GC as the GC-Locker is active. G1HR EVAC-FAILURE 0x6f800000 During a GC, we have failed to evacuate an object from the given region as the heap is full and there is no space left to copy the object. This event is generated within GC boundaries and exactly once for each region from which we failed to evacuate objects. When Heap Regions are reclaimed ? It is also worth mentioning when the heap regions in the G1 heap are reclaimed. All regions that are in the CSet (the ones that appear in CSET events) are reclaimed at the end of a GC. The exception to that are regions with EVAC-FAILURE events. All regions with CLEANUP events are reclaimed. After a Full GC some regions get reclaimed (the ones from which we moved the objects out). But that is not shown explicitly, instead the non-empty regions that are left in the heap are printed out with the POST-COMPACTION events.

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
Collaboration Nation Call to Action! Calling All SQL Server Bloggers and Twitterers

- by KKline

Help build a brand new shiny centralized reference with a full compendium to everyone blog & twitter hyperlinks who are active in the SQL Server space...(read more)

Read the article
OCS 2007 R2 User Properties Error Message

- by BWCA

When I attempted to configure one of our user’s Meeting settings using the Microsoft Office Communications Server 2007 R2 Administration Tool I received an Validation failed – Validation failed with HRESULT = 0XC3EC7E02 dialog box error message. I received the same error message when I tried to configure the user’s Telephony and Other settings. Using ADSI Edit, I compared the settings of an user that I had no problems configuring and the user that I had problems configuring. For the user I had problems configuring, I noticed a trailing space after the last phone number digit for the user’s msRTCSIP-Line attribute. After I removed the trailing space for the attribute and waited for Active Directory replication to complete, I was able to configure the user’s Meeting settings (and Telephony/Other settings) without any problems. If you get the error message, check your user’s msRTCSIP-xxxxx attributes in Active Directory using ADSI Edit for any trailing spaces, typos, or any other mistakes.

Read the article
Temporarily share/deploy a python (flask) application

- by Jeff

Goal Temporarily (1 month?) deploy/share a python (flask) web app without expensive/complex hosting. More info I've developed a basic mobile web app for the non-profit I work for. It's written in python and uses flask as its framework. I'd like to share this with other employees and beta testers (<25 people). Ideally, I could get some sort of simple hosting space/service and push regular updates to it while we test and iterate on this app. Think something along the lines of dropbox, which of course would not work for this purpose. We do have a website, and hosting services for it, but I'm concerned about using this resource as our website is mission critical and this app is very much pre-alpha at this point. Options I've researched / considered Self host from local machine/network (slow, unreliable) Purchase hosting space (with limited non-profit resources, I'm concerned this is overkill) Using our current web server / hosting (not appropriate for testing) Thanks very much for your time!

Read the article
Why Google skips page title

- by Bob

I have no idea why this is happening. An example http://www.londonofficespace.com/ofdj17062004934429t.htm Title tag is: Unfurnished Office Space Wimbledon – Serviced Office on Lombard Road SW19 But is indexed as: Lombard Road – SW19 - London Office Space If you look in the source code and search for this portion ‘Lombard Road – SW19’ You then find that it's next to an office image alt=’Lombard Road – SW19’. The only thing I could think of is that the spider somehow ‘skips’ our title tag and grabs this bit, and then inserts the name of the site (but WHY?) Is there anything I can do with this? or is this a Google behaviour?

Read the article
Problem with Numlock light and remapping keyboard

- by ansidev

I upgraded to Ubuntu 13.10. And there are two problem: Numlock is on. But after I press Ctrl+Space (hotkey for IBus, (default in 13.10 is Super+Space)), Numlock is off and I must press Numlock button twice to enable Numlock. About remapping keyboard, because my keyboard have some broken key, so I am using xmodmap to remap my keyboard (config file for xmodmap is $HOME/.xmodmap). But when I switch keyboard layout (I mentioned above), everything changed to default, and I must run xmodmap .xmodmap to remap keyboard again. When using Ubuntu 13.04, everything is good. How to solve my problems? UPDATE: 1. I am using two keyboard layout: English (US) and Unikey (from ibus-unikey package). 2. If I change key board layout using menu on Unity panel, no first problem.

Read the article
New Computer

- by Matt Christian

Last night I received my computer that was ordered with my tax return money. Here are the specs of my old computer: - Pentium 4 Processor - 3-4 GB RAM - ~256 GB HDD space (2 drives) - nVidia card (AGP 8x) Sorry I can't be more specific, my memory is gone :p Here are the new computer specs (mostly): - 2.8ghz Pentium i7 quadcore - 6 GB RAM - 1 TB HDD space (1 drive) - 1 GB Radeon card (PCI-X) I also got a new monitor (22" Asus with HDMI) so will be using my 19" widescreen as a secondary monitor. If I remember I'll hop on here and post the specifics later on...

Read the article
How can I change my cursor behavior?

- by Doug Clement

When I am typing, my mouse cursor, if left on the text, will eventually auto-click in whatever space in the the text box I happen to be, causing me to type in the middle of a sentence. Also, the cursor in the text box will frequently stop mid-word and the screen will scroll down all of a sudden when pressing the space bar while typing. My question is, how do I change this behavior because it is driving me absolutely bat crap crazy. I have an Acer Aspire One D257 Netbook. I am not sure if it's a Xubuntu problem because it does this while I am using Windows 7 too. Any help would be nice, thanks!

Read the article
Solaris: What comes next?

- by alanc

As you probably know by now, a few months ago, we released Solaris 11 after years of development. That of course means we now need to figure out what comes next - if Solaris 11 is “The First Cloud OS”, then what do we need to make future releases of Solaris be, to be modern and competitive when they're released? So we've been having planning and brainstorming meetings, and I've captured some notes here from just one of those we held a couple weeks ago with a number of the Silicon Valley based engineers. Now before someone sees an idea here and calls their product rep wanting to know what's up, please be warned what follows are rough ideas, and as I'll discuss later, none of them have any committment, schedule, working code, or even plan for integration in any possible future product at this time. (Please don't make me force you to read the full Oracle future product disclaimer here, you should know it by heart already from the front of every Oracle product slide deck.) To start with, we did some background research, looking at ideas from other Oracle groups, and competitive OS'es. We examined what was hot in the technology arena and where the interesting startups were heading. We then looked at Solaris to see where we could apply those ideas. Making Network Admins into Socially Networking Admins We all know an admin who has grumbled about being the only one stuck late at work to fix a problem on the server, or having to work the weekend alone to do scheduled maintenance. But admins are humans (at least most are), and crave companionship and community with their fellow humans. And even when they're alone in the server room, they're never far from a network connection, allowing access to the wide world of wonders on the Internet. Our solution here is not building a new social network - there's enough of those already, and Oracle even has its own Oracle Mix social network already. What we proposed is integrating Solaris features to help engage our system admins with these social networks, building community and bringing them recognition in the workplace, using achievement recognition systems as found in many popular gaming platforms. For instance, if you had a Facebook account, and a group of admin friends there, you could register it with our Social Network Utility For Facebook, and then your friends might see: Alan earned the achievement Critically Patched (April 2012) for patching all his servers. Matt is only at 50% - encourage him to complete this achievement today! To avoid any undue risk of advertising who has unpatched servers that are easier targets for hackers to break into, this information would be tightly protected via Facebook's world-renowned privacy settings to avoid it falling into the wrong hands. A related form of gamification we considered was replacing simple certfications with role-playing-game-style Experience Levels. Instead of just knowing an admin passed a test establishing a given level of competency, these would provide recruiters with a more detailed level of how much real-world experience an admin has. Achievements such as the one above would feed into it, but larger numbers of experience points would be gained by tougher or more critical tasks - such as recovering a down system, or migrating a service to a new platform. (As long as it was an Oracle platform of course - migrating to an HP or IBM platform would cause the admin to lose points with us.) Unfortunately, we couldn't figure out a good way to prevent (if you will) “gaming” the system. For instance, a disgruntled admin might decide to start ignoring warnings from FMA that a part is beginning to fail or skip preventative maintenance, in the hopes that they'd cause a catastrophic failure to earn more points for bolstering their resume as they look for a job elsewhere, and not worrying about the effect on your business of a mission critical server going down. More Z's for ZFS Our suggested new feature for ZFS was inspired by the worlds most successful Z-startup of all time: Zynga. Using the Social Network Utility For Facebook described above, we'd tie it in with ZFS monitoring to help you out when you find yourself in a jam needing more disk space than you have, and can't wait a month to get a purchase order through channels to buy more. Instead with the click of a button you could post to your group: Alan can't find any space in his server farm! Can you help? Friends could loan you some space on their connected servers for a few weeks, knowing that you'd return the favor when needed. ZFS would create a new filesystem for your use on their system, and securely share it with your system using Kerberized NFS. If none of your friends have space, then you could buy temporary use space in small increments at affordable rates right there in Facebook, using your Facebook credits, and then file an expense report later, after the urgent need has passed. Universal Single Sign On One thing all the engineers agreed on was that we still had far too many "Single" sign ons to deal with in our daily work. On the web, every web site used to have its own password database, forcing us to hope we could remember what login name was still available on each site when we signed up, and which unique password we came up with to avoid having to disclose our other passwords to a new site. In recent years, the web services world has finally been reducing the number of logins we have to manage, with many services allowing you to login using your identity from Google, Twitter or Facebook. So we proposed following their lead, introducing PAM modules for web services - no more would you have to type in whatever login name IT assigned and try to remember the password you chose the last time password aging forced you to change it - you'd simply choose which web service you wanted to authenticate against, and would login to your Solaris account upon reciept of a cookie from their identity service. Pinning notes to the cloud We also all noted that we all have our own pile of notes we keep in our daily work - in text files in our home directory, in notebooks we carry around, on white boards in offices and common areas, on sticky notes on our monitors, or on scraps of paper pinned to our bulletin boards. The contents of the notes vary, some are things just for us, some are useful for our groups, some we would share with the world. For instance, when our group moved to a new building a couple years ago, we had a white board in the hallway listing all the NIS & DNS servers, subnets, and other network configuration information we needed to set up our Solaris machines after the move. Similarly, as Solaris 11 was finishing and we were all learning the new network configuration commands, we shared notes in wikis and e-mails with our fellow engineers. Users may also remember one of the popular features of Sun's old BigAdmin site was a section for sharing scripts and tips such as these. Meanwhile, the online "pin board" at Pinterest is taking the web by storm. So we thought, why not mash those up to solve this problem? We proposed a new BigAddPin site where users could “pin” notes, command snippets, configuration information, and so on. For instance, once they had worked out the ideal Automated Installation manifest for their app server, they could pin it up to share with the rest of their group, or choose to make it public as an example for the world. Localized data, such as our group's notes on the servers for our subnet, could be shared only to users connecting from that subnet. And notes that they didn't want others to see at all could be marked private, such as the list of phone numbers to call for late night pizza delivery to the machine room, the birthdays and anniversaries they can never remember but would be sleeping on the couch if they forgot, or the list of automatically generated completely random, impossible to remember root passwords to all their servers. For greater integration with Solaris, we'd put support right into the command shells — redirect output to a pinned note, set your path to include pinned notes as scripts you can run, or bring up your recent shell history and pin a set of commands to save for the next time you need to remember how to do that operation. Location service for Solaris servers A longer term plan would involve convincing the hardware design groups to put GPS locators with wireless transmitters in future server designs. This would help both admins and service personnel trying to find servers in todays massive data centers, and could feed into location presence apps to help show potential customers that while they may not see many Solaris machines on the desktop any more, they are all around. For instance, while walking down Wall Street it might show “There are over 2000 Solaris computers in this block.” [Note: this proposal was made before the recent media coverage of a location service aggregrator app with less noble intentions, and in hindsight, we failed to consider what happens when such data similarly falls into the wrong hands. We certainly wouldn't want our app to be misinterpreted as “There are over $20 million dollars of SPARC servers in this building, waiting for you to steal them.” so it's probably best it was rejected.] Harnessing the power of the GPU for Security Most modern OS'es make use of the widespread availability of high powered GPU hardware in today's computers, with desktop environments requiring 3-D graphics acceleration, whether in Ubuntu Unity, GNOME Shell on Fedora, or Aero Glass on Windows, but we haven't yet made Solaris fully take advantage of this, beyond our basic offering of Compiz on the desktop. Meanwhile, more businesses are interested in increasing security by using biometric authentication, but must also comply with laws in many countries preventing discrimination against employees with physical limations such as missing eyes or fingers, not to mention the lost productivity when employees can't login due to tinted contacts throwing off a retina scan or a paper cut changing their fingerprint appearance until it heals. Fortunately, the two groups considering these problems put their heads together and found a common solution, using 3D technology to enable authentication using the one body part all users are guaranteed to have - pam_phrenology.so, a new PAM module that uses an array USB attached web cams (or just one if the user is willing to spin their chair during login) to take pictures of the users head from all angles, create a 3D model and compare it to the one in the authentication database. While Mythbusters has shown how easy it can be to fool common fingerprint scanners, we have not yet seen any evidence that people can impersonate the shape of another user's cranium, no matter how long they spend beating their head against the wall to reshape it. This could possibly be extended to group users, using modern versions of some of the older phrenological studies, such as giving all users with long grey beards access to the System Architect role, or automatically placing users with pointy spikes in their hair into an easy use mode. Unfortunately, there are still some unsolved technical challenges we haven't figured out how to overcome. Currently, a visit to the hair salon causes your existing authentication to expire, and some users have found that shaving their heads is the only way to avoid bad hair days becoming bad login days. Reaction to these ideas After gathering all our notes on these ideas from the engineering brainstorming meeting, we took them in to present to our management. Unfortunately, most of their reaction cannot be printed here, and they chose not to accept any of these ideas as they were, but they did have some feedback for us to consider as they sent us back to the drawing board. They strongly suggested our ideas would be better presented if we weren't trying to decipher ink blotches that had been smeared by the condensation when we put our pint glasses on the napkins we were taking notes on, and to that end let us know they would not be approving any more engineering offsites in Irish themed pubs on the Friday of a Saint Patrick's Day weekend. (Hopefully they mean that situation specifically and aren't going to deny the funding for travel to this year's X.Org Developer's Conference just because it happens to be in Bavaria and ending on the Friday of the weekend Oktoberfest starts.) They recommended our research techniques could be improved over just sitting around reading blogs and checking our Facebook, Twitter, and Pinterest accounts, such as considering input from alternate viewpoints on topics such as gamification. They also mentioned that Oracle hadn't fully adopted some of Sun's common practices and we might have to try harder to get those to be accepted now that we are one unified company. So as I said at the beginning, don't pester your sales rep just yet for any of these, since they didn't get approved, but if you have better ideas, pass them on and maybe they'll get into our next batch of planning.

Read the article
scritp to create automatically ext4 and swap in unallocated diskspace

- by user285589

i've to install a number of machines. Some machines have windows 7 installed. Some machines not. The machines have 0 or 2 or 3 partitions. Every machine has enough free diskspace (20 to 250 GB) I installed an "golden client" and build an tar archiv of this client. Now, every client boots up a small linux via pxe, and run a script. This script should create a ext4 and a swap partition using the whole free space. After this, mount the ext4-partition, copy tar, chroot, and so on. The problem still is: I can create partitions using fdisk. But how can i figure out the partion number of the new partition. Do i have to mount /dev/sda3 or /dev/sda1? Someone an idea? Further question: How can i figure out, if the is unallocated space, and how much it is? Thanks

Read the article
How do I list installed software with the installed size?

- by Lewis Goddard

I would like to have a list the installed software on my machine, with the disk space consumed by them alongside. I would prefer to be able to order by largest/smallest, but that is not a necessity. I am the sort of person who will install software to try it, and never clean up after myself. As a result, my 7GB (Windows and my Data are on separate partitions, as well as a swap area) Ubuntu 11.04 partition is suffering, and has started regularly showing warning messages. I have cleaned my browser cache, as well as everything under Package Cleaner in Ubuntu Tweak, and am left with 149.81 MB off free space.

Read the article
Is there any difference between storing textures and baked lighting for environment meshes?

- by Ben Hymers

I assume that when texturing environments, one or several textures will be used, and the UVs of the environment geometry will likely overlap on these textures, so that e.g. a tiling brick texture can be used by many parts of the environment, rather than UV unwrapping the entire thing, and having several areas of the texture be identical. If my assumption is wrong, please let me know! Now, when thinking about baking lighting, clearly this can't be done the same way - lighting in general will be unique to every face so the environment must be UV unwrapped without overlap, and lighting must be baked onto unique areas of one or several textures, to give each surface its own texture space to store its lighting. My questions are: Have I got this wrong? If so, how? Isn't baking lighting going to use a lot of texture space? Will the geometry need two UV sets, one used for the colour/normal texture and one for the lighting texture? Anything else you'd like to add? :)

Read the article
This Week in Geek History: Zelda Turns 25, Birth of the Printing Press, and the Unveiling of ENIAC

- by Jason Fitzpatrick

Every week we bring you interesting highlights from the history of geekdom. This week we take a look at The Legend of Zelda’s 25th anniversary, the Gutenberg press, and the unveiling of primitive super computer ENIAC. Latest Features How-To Geek ETC Should You Delete Windows 7 Service Pack Backup Files to Save Space? What Can Super Mario Teach Us About Graphics Technology? Windows 7 Service Pack 1 is Released: But Should You Install It? How To Make Hundreds of Complex Photo Edits in Seconds With Photoshop Actions How to Enable User-Specific Wireless Networks in Windows 7 How to Use Google Chrome as Your Default PDF Reader (the Easy Way) Reclaim Vertical UI Space by Moving Your Tabs to the Side in Firefox Wind and Water: Puzzle Battles – An Awesome Game for Linux and Windows How Star Wars Changed the World [Infographic] Tabs Visual Manager Adds Thumbnailed Tab Switching to Chrome Daisies and Rye Swaying in the Summer Wind Wallpaper Read On Phone Pushes Data from Your Desktop to the Appropriate Android App

Read the article
Error java.lang.OutOfMemoryError: getNewTla using Oracle EPM products

- by Marc Schumacher

Running into a Java out of memory error, it is very common behaviour in the field that the Java heap size will be increased. While this might help to solve a heap space out of memory error, it might not help to fix an out of memory error for the Thread Local Area (TLA). Increasing the available heap space from 1 GB to 16 GB might not even help in this situation. The Thread Local Area (TLA) is part of the Java heap, but as the name already indicates, this memory area is local to a specific thread so there is no need to synchronize with other threads using this memory area. For optimization purposes the TLA size is configurable using the Java command line option “-XXtlasize”. Depending on the JRockit version and the available Java heap, the default values vary. Using Oracle EPM System (mainly 11.1.2.x) the following setting was tested successfully: -XXtlasize:min=8k,preferred=128k More information about the “-XXtlasize” parameter can be found in the JRockit documentation: http://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/jrdocs/refman/optionXX.html

Read the article
Time/duration handling in strategic game

- by borg

I'm considering developing a space opera game, having already done some game design. Technically, though, I'm coming from a business applications background. Hence I don't really know how I should handle time and duration. Let's state the matter clearly: what if something is bound to happen in 5 hours and on which other events depend. For example the arrival of some space ship in some system where some defense systems are present, hence a fight would start. Should I use some kind of scheduler (like Quartz in my java land) to trigger the corresponding event when due (I plan to use events for communication)? Something else?

Read the article
Live Meeting error: malformed email address... or IS IT???

- by PeterBrunone

During a remote SharePoint training session this morning, Live Meeting presented one of our instructors with the following gem: "An attendee email address is malformed". This was particularly troubling since a wizard took care of adding all the entries, and they looked correct (even after being sifted through my character analysis tool).As it turns out, the addresses were indeed correct. As sometimes happens, though, at the line breaks, it looked like there was no space between the semicolon and the following email address. Since I'm a member in good standing of the "I wonder what this button does" school of thought, I added an extra space after each of these cramped little semicolons -- and the invitation executed flawlessly.Coincidence? Maybe... but you can bet I'm going to keep trying dumb stuff like that when the error message doesn't make sense. Think of it as the tech support equivalent of "Ask a silly question..."

Read the article
Compute directional light frustum from view furstum points and light direction

- by Fabian

I'm working on a friends engine project and my task is to construct a new frustum from the light direction that overlaps the view frustum and possible shadow casters. The project already has a function that creates a frustum for this but its way to big and includes way to many casters (shadows) which can't be seen in the view frustum. Now the only parameter of this function are the normalized light direction vector and a view class which lets me extract the 8 view frustum points in world space. I don't have any additional infos about the scene. I have read some of the related Questions here but non seem to fit very well to my problem as they often just point to cascaded shadow maps. Sadly i can't use DX or openGl functions directly because this engine has a dedicated math library. From what i've read so far the steps are: Transform view frustum points into light space and find min/max x and y values (or sometimes minima and maxima of all three axis) and create a AABB using the min/max vectors. But what comes after this step? How do i transform this new AABB back to world space? What i've done so far: CVector3 Points[8], MinLight = CVector3(FLT_MAX), MaxLight = CVector3(FLT_MAX); for(int i = 0; i<8;++i){ Points[i] = Points[i] * WorldToShadowMapMatrix; MinLight = Math::Min(Points[i],MinLight); MaxLight = Math::Max(Points[i],MaxLight); } AABox box(MinLight,MaxLight); I don't think this is the right way to do it. The near plain probably has to extend into the direction of the light source to include potentional shadow casters. I've read the Microsoft article about cascaded shadow maps http://msdn.microsoft.com/en-us/library/windows/desktop/ee416307%28v=vs.85%29.aspx which also includes some sample code. But they seem to use the scenes AABB to determine the near and far plane which I can't since i cant access this information from the funtion I'm working in. Could you guys please link some example code which shows the calculation of such frustum? Thanks in advance! Additional questio: is there a way to construct a WorldToFrustum matrix that represents the above transformation?

Read the article
SQL SERVER – Weekly Series – Memory Lane – #037

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Convert Text to Numbers (Integer) – CAST and CONVERT If table column is VARCHAR and has all the numeric values in it, it can be retrieved as Integer using CAST or CONVERT function. List All Stored Procedure Modified in Last N Days If SQL Server suddenly start behaving in un-expectable behavior and if stored procedure were changed recently, following script can be used to check recently modified stored procedure. If a stored procedure was created but never modified afterwards modified date and create a date for that stored procedure are same. Count Duplicate Records – Rows Validate Field For DATE datatype using function ISDATE() We always checked DATETIME field for incorrect data type. One of the user input date as 30/2/2007. The date was sucessfully inserted in the temp table but while inserting from temp table to final table it crashed with error. We had now task to validate incorrect date value before we insert in final table. Jr. Developer asked me how can he do that? We check for incorrect data type (varchar, int, NULL) but this is incorrect date value. Regular expression works fine with them because of mm/dd/yyyy format. 2008 Find Space Used For Any Particular Table It is very simple to find out the space used by any table in the database. Two Convenient Features Inline Assignment – Inline Operations Here is the script which does both – Inline Assignment and Inline Operation DECLARE @idx INT = 0 SET @idx+=1 SELECT @idx Introduction to SPARSE Columns SPARSE column are better at managing NULL and ZERO values in SQL Server. It does not take any space in database at all. If column is created with SPARSE clause with it and it contains ZERO or NULL it will be take lesser space then regular column (without SPARSE clause). SP_CONFIGURE – Displays or Changes Global Configuration Settings If advanced settings are not enabled at configuration level SQL Server will not let user change the advanced features on server. Authorized user can turn on or turn off advance settings. 2009 Standby Servers and Types of Standby Servers Standby Server is a type of server that can be brought online in a situation when Primary Server goes offline and application needs continuous (high) availability of the server. There is always a need to set up a mechanism where data and objects from primary server are moved to secondary (standby) server. BLOB – Pointer to Image, Image in Database, FILESTREAM Storage When it comes to storing images in database there are two common methods. I had previously blogged about the same subject on my visit to Toronto. With SQL Server 2008, we have a new method of FILESTREAM storage. However, the answer on when to use FILESTREAM and when to use other methods is still vague in community. 2010 Upper Case Shortcut SQL Server Management Studio I select the word and hit CTRL+SHIFT+U and it SSMS immediately changes the case of the selected word. Similar way if one want to convert cases to lower case, another short cut CTRL+SHIFT+L is also available. The Self Join – Inner Join and Outer Join Self Join has always been a noteworthy case. It is interesting to ask questions about self join in a room full of developers. I often ask – if there are three kinds of joins, i.e.- Inner Join, Outer Join and Cross Join; what type of join is Self Join? The usual answer is that it is an Inner Join. However, the reality is very different. Parallelism – Row per Processor – Row per Thread – Thread 0 If you look carefully in the Properties window or XML Plan, there is “Thread 0?. What does this “Thread 0” indicate? Well find out from the blog post. How do I Learn and How do I Teach The blog post has raised three very interesting questions. How do you learn? How do you teach? What are you learning or teaching? Let me try to answer the same. 2011 SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 7 of 31 What are Different Types of Locks? What are Pessimistic Lock and Optimistic Lock? When is the use of UPDATE_STATISTICS command? What is the Difference between a HAVING clause and a WHERE clause? What is Connection Pooling and why it is Used? What are the Properties and Different Types of Sub-Queries? What are the Authentication Modes in SQL Server? How can it be Changed? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 8 of 31 Which Command using Query Analyzer will give you the Version of SQL Server and Operating System? What is an SQL Server Agent? Can a Stored Procedure call itself or a Recursive Stored Procedure? How many levels of SP nesting is possible? What is Log Shipping? Name 3 ways to get an Accurate Count of the Number of Records in a Table? What does it mean to have QUOTED_IDENTIFIER ON? What are the Implications of having it OFF? What is the Difference between a Local and a Global Temporary Table? What is the STUFF Function and How Does it Differ from the REPLACE Function? What is PRIMARY KEY? What is UNIQUE KEY Constraint? What is FOREIGN KEY? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 9 of 31 What is CHECK Constraint? What is NOT NULL Constraint? What is the difference between UNION and UNION ALL? What is B-Tree? How to get @@ERROR and @@ROWCOUNT at the Same Time? What is a Scheduled Job or What is a Scheduled Task? What are the Advantages of Using Stored Procedures? What is a Table Called, if it has neither Cluster nor Non-cluster Index? What is it Used for? Can SQL Servers Linked to other Servers like Oracle? What is BCP? When is it Used? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 10 of 31 What Command do we Use to Rename a db, a Table and a Column? What are sp_configure Commands and SET Commands? How to Implement One-to-One, One-to-Many and Many-to-Many Relationships while Designing Tables? What is Difference between Commit and Rollback when Used in Transactions? What is an Execution Plan? When would you Use it? How would you View the Execution Plan? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 11 of 31 What is Difference between Table Aliases and Column Aliases? Do they Affect Performance? What is the difference between CHAR and VARCHAR Datatypes? What is the Difference between VARCHAR and VARCHAR(MAX) Datatypes? What is the Difference between VARCHAR and NVARCHAR datatypes? Which are the Important Points to Note when Multilanguage Data is Stored in a Table? How to Optimize Stored Procedure Optimization? What is SQL Injection? How to Protect Against SQL Injection Attack? How to Find Out the List Schema Name and Table Name for the Database? What is CHECKPOINT Process in the SQL Server? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Day 12 of 31 How does Using a Separate Hard Drive for Several Database Objects Improves Performance Right Away? How to Find the List of Fixed Hard Drive and Free Space on Server? Why can there be only one Clustered Index and not more than one? What is Difference between Line Feed (\n) and Carriage Return (\r)? Is It Possible to have Clustered Index on Separate Drive From Original Table Location? What is a Hint? How to Delete Duplicate Rows? Why the Trigger Fires Multiple Times in Single Login? 2012 CTRL+SHIFT+] Shortcut to Select Code Between Two Parenthesis Shortcut key is CTRL+SHIFT+]. This key can be very useful when dealing with multiple subqueries, CTE or query with multiple parentheses. When exercised this shortcut key it selects T-SQL code between two parentheses. Monday Morning Puzzle – Query Returns Results Sometimes but Not Always I am beginner with SQL Server. I have one query, it sometime returns a result and sometime it does not return me the result. Where should I start looking for a solution and what kind of information I should send to you so you can help me with solving. I have no clue, please guide me. Remove Debug Button in SSMS – SQL in Sixty Seconds #020 – Video Effect of Case Sensitive Collation on Resultset Collation is a very interesting concept but I quite often see it is heavily neglected. I have seen developer and DBA looking for a workaround to fix collation error rather than understanding if the side effect of the workaround. Switch Between Two Parenthesis using Shortcut CTRL+] Earlier this week I wrote a blog post about CTRL+SHIFT+] Shortcut to Select Code Between Two Parenthesis, I received quite a lot of positive feedback from readers. If you are a regular reader of the blog post, you must be aware that I appreciate the learning shared by readers. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
How can I force X to start in a computer without a monitor?

- by Javier Rivera

I have I computer that have no monitor attached. When I boot it X fails to start because there is no monitor detected. If I boot it with a monitor attached and after X is started I remove the monitor everything works fine. Details and Background: This computer is a kind of hardware consolidation server. It's only purpose is to run two Virtual Box VM's that run Windows XP and some important but seldom used (once or twice a month) programs. For a couple of time it has been lying in a corner with an old monitor attached to it and working great. But space in the office was getting scarce and I moved the computer to the server room. There is no monitor attached to it there (no space), and sometimes the computer is rebooted. When it boots without monitor X is not started, the vms don't start and I get called to solve the problem.

Read the article
Week in Geek: Rogue Antivirus Caught Using AVG Logo Edition

- by Asian Angel

This week we learned how to quickly cut a clip from a video file with Avidemux, “tile windows, remote control a desktop using an iOS device, taking advantage of Windows 7 libraries”, turn a home Ubuntu PC into a LAMP web server, enable desktop notifications for Gmail in Chrome, “what image channels are and what they mean”, and more Latest Features How-To Geek ETC How to Integrate Dropbox with Pages, Keynote, and Numbers on iPad RGB? CMYK? Alpha? What Are Image Channels and What Do They Mean? How to Recover that Photo, Picture or File You Deleted Accidentally How To Colorize Black and White Vintage Photographs in Photoshop How To Get SSH Command-Line Access to Windows 7 Using Cygwin The How-To Geek Video Guide to Using Windows 7 Speech Recognition Android Notifier Pushes Android Notices to Your Desktop Dead Space 2 Theme for Chrome and Iron Carl Sagan and Halo Reach Mashup – We Humans are Capable of Greatness [Video] Battle the Necromorphs Once Again on Your Desktop with the Dead Space 2 Theme for Windows 7 HTC Home Brings HTC’s Weather Widget to Your Windows Desktop Apps Uninstall Batch Removes Android Applications

Read the article
HTG Explains: How Do Noise Reducing Headphones Work?

- by YatriTrivedi

Passive noise reduction, active noise cancellation, sound isolation… The world of headphones has become quite advanced in giving you your own private sound bubble. Here’s how these different technologies work. Latest Features How-To Geek ETC Should You Delete Windows 7 Service Pack Backup Files to Save Space? What Can Super Mario Teach Us About Graphics Technology? Windows 7 Service Pack 1 is Released: But Should You Install It? How To Make Hundreds of Complex Photo Edits in Seconds With Photoshop Actions How to Enable User-Specific Wireless Networks in Windows 7 How to Use Google Chrome as Your Default PDF Reader (the Easy Way) WizMouse Enables Mouse Over Scrolling on Any Window Enhance GIMP’s Image Editing Power with Gimp Paint Studio Reclaim Vertical UI Space by Moving Your Tabs to the Side in Firefox Wind and Water: Puzzle Battles – An Awesome Game for Linux and Windows How Star Wars Changed the World [Infographic] Tabs Visual Manager Adds Thumbnailed Tab Switching to Chrome

Read the article
Determine percentage of screen covered by an object without using frustum culling

- by Meltac

On the CPU-side of an 3D first-person / ego perspective game I need to check whether what the players currently sees on screen is the inside of a box object defined by world space coordinates (the player might be outside of that box but on screen sees only/mostly the inside of the box, or vice-versa, looks from within the box to the outside). The "casual" way of performing such a check would incorporate frustum culling but such an approach would be hard to achieve with my given set of engine parameters which I'd like to avoid if there is a simpler way. What I actually have at the point where I would like to do the check (high-level script on CPU, not GPU side): Camera world position Camera direction Camera FOV Two Box corner world coordinates (left-bottom-front, right-top-back) What I do not have right away: View frustrum definition (near/far plane or say 6 planes defining frustum) Any specific pixel information (uv, view space position, depth or the like) What I would like to calculate: Percentage of screen "covered" by box. Any hints on how to perform such calculation?

Read the article
Folder missing in external hard drive

- by Hans

I have been backing up my folders, I am using Seagate Expansion portable Drive. I had created a folder called "|o|o" in the root folder of the portable drive... I copied my latest folders into "|o|o" folder to re-install ubuntu. When I open the portable drive the folder |o|o is not visible, when I ctrl+a and check properties the space used is 122GB, however when I click on the drive to view properties of the drive used space is 260GB. It looks as if the folder is there in the portable drive but I cannot access it... I have tried to view all the hidden files and "|o|o" is still not there. I am using 12.04.Can you please help me to retrieve this folder.

Read the article

< Previous Page | 61 62 63 64 65 66 67 68 69 70 71 72 | Next Page >