optimizing - Developer IT

How to Sabotage Your Search Engine Optimizing Efforts

Search engine optimization is a key, ongoing strategy that anyone marketing on the internet needs to adopt as part of their daily routine. Properly optimizing your sites and content will increase the amount of search engine traffic you receive. Read on to discover 3 search engine optimization tips to help you get the absolute most traffic out of your optimization efforts.

C++0x optimizing compiler quality

- by aaa

Hello. I do some heavy number crunching, and floating-point performance is very important to me. I like the performance of the Intel compiler very much and am quite content with the quality of the assembly it produces. At some point I am thinking of trying C++0x, mainly for the syntactic sugar such as auto and initializer lists, but also for lambdas. At the moment I use those features in regular C++ by means of Boost. How good is the assembly code that compilers generate for C++0x, specifically the Intel and GCC compilers? Do they produce SSE code? Is performance comparable to regular C++? Are there any benchmarks? My Google search did not reveal much. Thank you.

Webcast: ODI and Successful Strategies for Optimizing Your Data Warehouse

- by antonio romero

A new public webcast for ODI, “Successful Strategies for Optimizing Your Data Warehouse”, is scheduled for March 3rd at 10am PT/1pm ET. In this webcast, Mala Narasimharajan from the product marketing team and Denis Gray from the product management team will present ODI’s strong value proposition for data warehousing solutions. You can find the registration link below.

Live webcast: Successful Strategies for Optimizing Your Data Warehouse, March 3, 2011, 1pm ET/10am PT
Registration link: http://www.oracle.com/us/dm/66153-wwmk10035379mpp011-se-309154.html

Google I/O 2010 - Optimizing apps with the GWT Compiler

Google I/O 2010 - Faster apps faster - Optimizing apps with the GWT Compiler (GWT 201), Ray Cromwell. The GWT compiler isn't just a Java-to-JavaScript transliterator; it performs many optimizations along the way. In this session, we'll show you not only the optimizations performed, but also how you can get more out of the compiler itself. Learn how to speed up compiles, use -draftCompile, compile for only one locale/browser permutation, and more. For all I/O 2010 sessions, please go to code.google.com. From: GoogleDevelopers. Length: 56:17.

Google I/O 2011: Optimizing Android Apps with Google Analytics

Google I/O 2011: Optimizing Android Apps with Google Analytics, Nick Mihailovski, Philip Mui, Jim Cotugno. Thousands of apps have taken advantage of Google Analytics' native Android tracking capabilities to improve the adoption and usability of Android apps. This session covers best practices for tracking apps on mobile, TV and other devices. We'll also show you how to gain actionable insights from new tracking and reporting capabilities. From: GoogleDevelopers. Length: 47:40.

Book Review: Optimizing Windows 7 Pocket Consultant

It is essential to optimize Windows 7 in order to use its features to their full potential. However, it is difficult to locate the various elements that require optimization. In this review, Anand examines the contents of the book Optimizing Windows 7 Pocket Consultant by William Stanek. After reading the review, you will be in a position to judge whether the book is suitable for you.

Google I/O 2012 - Optimizing Your Code Using Features of Google APIs

Google I/O 2012 - Optimizing Your Code Using Features of Google APIs, Sven Mawson. Google APIs support a variety of features designed to enable state-of-the-art development. In this session, you will learn how to create applications that use performance-enhancing features to make your code run faster and use fewer resources. Some features we'll describe include batching, requests for partial response, and efficient ways to handle media. For all I/O 2012 sessions, go to developers.google.com. From: GoogleDevelopers. Length: 44:50.

Optimizing collision engine bottleneck

- by Vittorio Romeo

Foreword: I'm aware that optimizing this bottleneck is not a necessity; the engine is already very fast. However, for fun and educational purposes, I would love to find a way to make the engine even faster.

I'm creating a general-purpose C++ 2D collision detection/response engine, with an emphasis on flexibility and speed. The main class is World, which owns (manages the memory of) a ResolverBase*, a SpatialBase* and a vector<Body*>. SpatialBase is a pure virtual class which deals with broad-phase collision detection. ResolverBase is a pure virtual class which deals with collision resolution. The bodies communicate with World's SpatialBase* through SpatialInfo objects, owned by the bodies themselves.

There is currently one spatial class: Grid : SpatialBase, which is a basic fixed 2D grid. It has its own info class, GridInfo : SpatialInfo. The Grid class owns a 2D array of Cell*. The Cell class contains two collections of (not owned) Body*: a vector<Body*> which contains all the bodies that are in the cell, and a map<int, vector<Body*>> which contains the same bodies divided into groups. Bodies, in fact, have a groupId int that is used for collision groups. GridInfo objects also contain non-owning pointers to the cells the body is in.

As I said, the engine is based on groups. Body::getGroups() returns a vector<int> of all the groups the body is part of. Body::getGroupsToCheck() returns a vector<int> of all the groups the body has to check collision against.

Bodies can occupy more than a single cell. GridInfo always stores non-owning pointers to the occupied cells. After the bodies move, collision detection happens. We assume that all bodies are axis-aligned bounding boxes.

How broad-phase collision detection works:

Part 1: spatial info update

For each Body body: the top-leftmost and bottom-rightmost occupied cells are calculated. If they differ from the previous cells, body.gridInfo.cells is cleared and filled with all the cells the body occupies (a 2D for loop from the top-leftmost cell to the bottom-rightmost cell). body is now guaranteed to know which cells it occupies.

For a performance boost, it stores a pointer to every vector<Body*> in each occupied cell's map<int, vector<Body*>> whose key is one of body->getGroupsToCheck(). These pointers are stored in gridInfo->queries, which is simply a vector<vector<Body*>*>. body is now guaranteed to have a pointer to every vector<Body*> of bodies in the groups it needs to check collision against.

Part 2: actual collision checks

For each Body body: body clears and fills a vector<Body*> bodiesToCheck, which contains all the bodies it needs to check against. Duplicates are avoided (bodies can belong to more than one group) by checking whether bodiesToCheck already contains the body we're trying to add:

    const vector<Body*>& GridInfo::getBodiesToCheck()
    {
        bodiesToCheck.clear();
        for(const auto& q : queries)
            for(const auto& b : *q)
                if(!contains(bodiesToCheck, b)) bodiesToCheck.push_back(b);
        return bodiesToCheck;
    }

The GridInfo::getBodiesToCheck() method IS THE BOTTLENECK. The bodiesToCheck vector must be filled on every body update because bodies could have moved in the meantime. It also needs to prevent duplicate collision checks. The contains function simply checks whether the vector already contains a body, using std::find.

Collision is checked and resolved for every body in bodiesToCheck. That's it.

So, I've been trying to optimize this broad-phase collision detection for quite a while now. Every time I try something other than the current architecture/setup, something doesn't go as planned, or I make assumptions about the simulation that are later proven false. My question is: how can I optimize the broad phase of my collision engine while keeping the grouped-bodies approach? Is there some kind of magic C++ optimization that can be applied here? Can the architecture be redesigned to allow for more performance?

Actual implementation: SSVSCollision (Body.h, Body.cpp, World.h, World.cpp, Grid.h, Grid.cpp, Cell.h, Cell.cpp, GridInfo.h, GridInfo.cpp)
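One common way to remove the linear std::find from getBodiesToCheck is to stamp each body with the ID of the most recent query that collected it, so the duplicate test becomes a single integer comparison and the whole gather is linear in the number of query entries. The sketch below is written in Go rather than C++; the lastQuery field and the global queryCounter are hypothetical additions, not part of SSVSCollision, and the approach assumes collision detection runs on a single thread, since neither the counter nor the stamps are synchronized.

```go
package main

import "fmt"

// Body is a stand-in for the engine's Body. lastQuery is a hypothetical
// field used only for de-duplication.
type Body struct {
	ID        int
	lastQuery uint64
}

// queryCounter is incremented once per getBodiesToCheck call; a body whose
// lastQuery equals the current value was already collected by this call.
var queryCounter uint64

// GridInfo mirrors the post's GridInfo: queries holds one slice of bodies
// per (occupied cell, group to check) pair.
type GridInfo struct {
	queries       [][]*Body
	bodiesToCheck []*Body
}

// getBodiesToCheck gathers each body at most once, in O(total entries),
// instead of running a linear search before every insertion.
func (g *GridInfo) getBodiesToCheck() []*Body {
	queryCounter++
	g.bodiesToCheck = g.bodiesToCheck[:0] // keep capacity, like vector::clear
	for _, q := range g.queries {
		for _, b := range q {
			if b.lastQuery != queryCounter {
				b.lastQuery = queryCounter
				g.bodiesToCheck = append(g.bodiesToCheck, b)
			}
		}
	}
	return g.bodiesToCheck
}

func main() {
	a, b, c := &Body{ID: 1}, &Body{ID: 2}, &Body{ID: 3}
	// b appears in two groups and a in two cells, as with multi-group bodies.
	g := &GridInfo{queries: [][]*Body{{a, b}, {b, c}, {a}}}
	for _, body := range g.getBodiesToCheck() {
		fmt.Print(body.ID, " ")
	}
	fmt.Println()
}
```

Because the stamp is compared against a counter that only ever grows, nothing has to be reset between queries; the cost is one extra integer per body.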

Optimizing AES modes on Solaris for Intel Westmere

- by danx

Review

AES is a strong method of symmetric (secret-key) encryption. It is a U.S. FIPS-approved cryptographic algorithm (FIPS 197) that operates on 16-byte blocks. AES has been available since 2001 and is widely used. However, AES by itself has a weakness: it isn't usually used alone, because identical blocks of plaintext are always encrypted into identical blocks of ciphertext. This encryption can easily be attacked with "dictionaries" of common blocks of text, which makes it easier to discern the content of the unknown ciphertext. This mode of encryption is called "Electronic Code Book" (ECB), because in theory one can keep a "code book" of all known ciphertext and plaintext pairs to encipher and decipher AES. In practice, a complete "code book" is not practical, even in electronic form, but large dictionaries of common plaintext blocks are still possible. Here's a diagram of encrypting input data using AES ECB mode:

[Diagram: AES ECB mode. Plaintext blocks 1 and 2 are each encrypted independently with the AES key, producing ciphertext blocks 1 and 2.]

What's the solution to the same cleartext input producing the same ciphertext output? The solution is to further process the encrypted or decrypted text in such a way that the same text produces different output. This usually involves an Initialization Vector (IV) and XORing the decrypted or encrypted text. As an example, I'll illustrate CBC mode encryption:

[Diagram: AES CBC mode encryption. The IV is XORed with plaintext block 1 before AES encryption; each ciphertext block is then XORed with the next plaintext block before that block is encrypted.]

The steps for CBC encryption are:

1. Start with a 16-byte Initialization Vector (IV), chosen randomly.
2. XOR the IV with the first block of input plaintext.
3. Encrypt the result with AES using a user-provided key. The result is the first 16 bytes of output ciphertext.
4. For each following block, use the ciphertext of the previous block (instead of the IV) to XOR with the next block of input plaintext.

Another mode besides CBC is Counter Mode (CTR). As with CBC mode, it starts with a 16-byte IV. However, for subsequent blocks, the IV is simply incremented by one. Each counter value (IV, IV + 1, IV + 2, ...) is encrypted with AES, and the AES result is XORed with the plaintext block to produce the ciphertext block. Here's an illustration:

[Diagram: AES CTR mode, blocks 1 and 2, using counter values IV, IV + 1, IV + 2, ...]

Optimization

Which of these modes can be parallelized? ECB encryption/decryption can be parallelized because it does nothing more than plain AES encryption and decryption, as mentioned above. CBC encryption can't be parallelized because each block depends on the output of the previous block. However, CBC decryption can be parallelized because all the encrypted blocks are known at the beginning. CTR encryption and decryption can be parallelized because the input to each block is known: it's just the IV incremented by one for each subsequent block. So, in summary, for ECB, CBC, and CTR modes, encryption and decryption can be parallelized, with the exception of CBC encryption.

How do we parallelize encryption? By interleaving. Usually when reading and writing data there are pipeline "stalls" (idle processor cycles) that result from waiting for data to be loaded into or stored from CPU registers. Because the software is written to encrypt/decrypt the next data block where pipeline stalls would usually occur, we can avoid stalls and encrypt with fewer cycles. This software processes 4 blocks at a time, which ensures virtually no waiting ("stalling") for reading or writing data in memory.
Other Optimizations

Besides interleaving, the other optimizations performed are:

Loading the entire key schedule into the 128-bit %xmm registers. This is done once per group of 4 blocks (since 4 blocks of data are processed at a time, when present). The entire "key schedule" (the user input key preprocessed for encryption and decryption) is loaded; this takes 11, 13, or 15 registers for AES-128, AES-192, and AES-256, respectively. The input data is loaded into another %xmm register, and the same register contains the output result after encrypting/decrypting.

Using AES-NI and other SSE instructions. Besides the aesenc, aesenclast, aesdec, aesdeclast, aeskeygenassist, and aesimc AES-NI instructions, Intel has several other instructions that operate on the 128-bit %xmm registers. Some common instructions for encryption are: pxor (exclusive or, very useful), movdqu (load/store a %xmm register from/to memory), pshufb (shuffle bytes, for byte swapping), and pclmulqdq (carry-less multiply, for GCM mode).

Combining AES encryption/decryption with CBC or CTR mode processing. Instead of loading input data twice (once for AES encryption/decryption, and again for mode processing, CTR or CBC for example), the input data is loaded once, as both the AES and mode operations occur in the same function.

Performance

Everyone likes pretty color charts, so here they are. I ran these on Solaris 11 on a Piketon Platform system with a 4-core Intel Clarkdale processor @3.20GHz. Clarkdale is part of the Westmere processor architecture family. The "before" case is Solaris 11, unmodified. Keep in mind that the "before" case has already been optimized with hand-coded Intel AES-NI assembly. The "after" case has combined AES-NI and mode instructions, interleaved 4 blocks at a time.

For the first table, lower is better (milliseconds). The first table shows the performance improvement using the Solaris encrypt(1) and decrypt(1) CLI commands. I encrypted and decrypted a 1/2 GByte file on /tmp (swap tmpfs). Encryption improved by about 40% and decryption improved by about 80%. AES-128 is slightly faster than AES-256, as expected.

The second table shows more detailed timings for CBC, CTR, and ECB modes for the 3 AES key sizes and different data lengths. The results shown are the percentage improvement as measured by an internal PKCS#11 microbenchmark. And keep in mind the previous baseline code already had optimized AES-NI assembly! The key size (AES-128, 192, or 256) makes little difference in relative percentage improvement (although, of course, AES-128 is faster than AES-256). Larger data sizes show better improvement than 128-byte data.

Availability

This software is in Solaris 11 FCS. It is available in the 64-bit libcrypto library and the "aes" Solaris kernel module. You must be running hardware that supports AES-NI (for example, the Intel Westmere and Sandy Bridge microprocessor architectures). The easiest way to determine whether AES-NI is available is with the isainfo(1) command. For example:

    $ isainfo -v
    64-bit amd64 applications
            pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2
            sse fxsr mmx cmov amd_sysc cx8 tsc fpu
    32-bit i386 applications
            pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2
            sse fxsr mmx cmov sep cx8 tsc fpu

No special configuration or setup is needed to take advantage of this software. Solaris libraries and the kernel automatically determine whether they are running on AES-NI-capable machines and execute the correctly tuned software for the current microprocessor.

Summary

Maximum throughput of AES cipher modes can be achieved by combining AES encryption with mode processing, interleaving encryption of 4 blocks at a time, and using Intel's wide 128-bit %xmm registers and instructions.

References

"Block cipher modes of operation", Wikipedia. Good overview of AES modes (ECB, CBC, CTR, etc.).
"Advanced Encryption Standard", Wikipedia.
"Current Modes", describes NIST-approved block cipher modes (ECB, CBC, CFB, OFB, CCM, GCM).

Google PageSpeed, optimizing Google's own elements

- by mowgli

I'm trying Google's PageSpeed online service. Ironically, it's primarily highlighting Google's own services as things that need improvement on my site:

1) jQuery from Google: blocking. So I moved all JavaScript from <head> to the end of the document, before </body>. That helped.
2) Linking to external Google Font CSS (in <head>): blocking. But the font is critical to the design of the page and should load before much else.
3) Google Analytics: caching is not good (Google has set it internally to a 2-hour expiration). I don't know how to change this (this is also placed at the bottom of the page).

The Google Font is highlighted as a big priority to change. How can I fix this? Where/how should I call the font?

Optimizing the MySQL Query Cache

MySQL's query cache is an impressive piece of engineering if sometimes misunderstood. Keeping it optimized and used efficiently can make a big difference in the overall throughput of your application, so it's worth taking a look under the hood, understanding it, and then keeping it tuned optimally.

Optimizing Solaris 11 SHA-1 on Intel Processors

- by danx

SHA-1 is a "hash" or "digest" operation that produces a 160-bit (20-byte) checksum value on arbitrary data, such as a file. It is intended to uniquely identify text and to verify that it hasn't been modified.

Max Locktyukhin and others at Intel have improved the performance of the SHA-1 digest algorithm using multiple techniques. This code has been incorporated into Solaris 11 and is available in the Solaris Crypto Framework via libmd(3LIB), the industry-standard libpkcs11(3LIB) library, and the Solaris kernel module sha1. The optimized code is used automatically on systems with an x86 CPU supporting SSSE3 (Intel Supplemental Streaming SIMD Extensions 3). Intel microprocessor architectures that support SSSE3 include the Nehalem, Westmere, and Sandy Bridge microprocessor families. Further optimizations are available for microprocessors that support AVX (such as Sandy Bridge).

Although SHA-1 is considered obsolete because of weaknesses found in the SHA-1 algorithm (NIST recommends using at least SHA-256), SHA-1 is still widely used and will be with us for a while yet. Collisions (the same SHA-1 result for two different inputs) can be found with moderate effort. SHA-1 is used heavily in SSL/TLS, for example. And SHA-1 is stronger than the older MD5 digest algorithm, another digest option defined in SSL/TLS.

Optimizations Review

SHA-1 operates by reading an arbitrary amount of data. The data is read in 512-bit (64-byte) blocks (the last block is padded in a specific way to make it a full 64 bytes). For each 64-byte block, 80 "rounds" of calculations (a mixture of "ROTATE-LEFT", "AND", "XOR", and addition) are applied. Each round i consumes a 32-bit word, called W[i], from the block's "message schedule". Here's how the schedule words are produced:

1. The first 16 words, W[0] to W[15], are read directly from the 512-bit block, 32 bits at a time.
2. The remaining words, W[16] to W[79], are computed from earlier words. Specifically, for round i, the words W[i-3], W[i-8], W[i-14], and W[i-16] are XORed and the result is rotated left by 1 bit.

The remaining calculations for each round are a series of AND, XOR, ROTATE-LEFT, and addition operations on W[i], the five 32-bit state words, and some constants. After the final round, the state words are added to the previous state; the five 32-bit state words together form the 160-bit SHA-1 checksum.

Optimization: Vectorization

The first 16 words can be vectorized (computed in parallel) because they don't depend on the output of a previous round. For the remaining words, because of step 2 above, computing W[i] depends on W[i-3], so one can vectorize only 3 words at a time.

Max Locktyukhin found through simple factoring, explained in detail in his article referenced below, that for i >= 32 the dependency of W[i] on W[i-3], W[i-8], W[i-14], and W[i-16] can be replaced with a dependency on W[i-6], W[i-16], W[i-28], and W[i-32]. That is, instead of computing the intermediate result W[i] with:

    W[i] = (W[i-3] XOR W[i-8] XOR W[i-14] XOR W[i-16]) ROTATE-LEFT 1

compute W[i] as follows:

    W[i] = (W[i-6] XOR W[i-16] XOR W[i-28] XOR W[i-32]) ROTATE-LEFT 2

That means that 6 words could be vectorized at once, with no additional calculations, instead of just 3! This optimization is independent of Intel or any other microprocessor architecture (although the microprocessor has to support vectorization to use it), and it exploits one of the weaknesses of SHA-1.

Optimization: SSSE3

Intel SSSE3 makes use of 16 %xmm registers, each 128 bits wide. Four consecutive 32-bit words, such as W[i-32] through W[i-29], fit in one %xmm register, so four new schedule words, W[i] through W[i+3], can be computed at once.
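The rewritten recurrence can be checked numerically. The Go sketch below (both function names are hypothetical) expands the same 16 words into the 80-word message schedule twice: once with the standard i-3 recurrence, and once computing W[32] to W[79] four words at a time with the i-6 form, using a 4-element array as a stand-in for one %xmm register. It covers only the message schedule, not the SHA-1 compression rounds.

```go
package main

import (
	"fmt"
	"math/bits"
)

// standardSchedule expands 16 message words into the 80-word SHA-1
// schedule: W[i] = (W[i-3] ^ W[i-8] ^ W[i-14] ^ W[i-16]) rotl 1.
func standardSchedule(m [16]uint32) [80]uint32 {
	var w [80]uint32
	copy(w[:16], m[:])
	for i := 16; i < 80; i++ {
		w[i] = bits.RotateLeft32(w[i-3]^w[i-8]^w[i-14]^w[i-16], 1)
	}
	return w
}

// vectorSchedule computes W[16..31] with the standard recurrence, then
// W[32..79] in groups of four with W[i] = (W[i-6] ^ W[i-16] ^ W[i-28] ^
// W[i-32]) rotl 2. The nearest input is i-6, so no word in a group reads
// another word of the same group; that is what allows SIMD lanes to
// compute all four simultaneously.
func vectorSchedule(m [16]uint32) [80]uint32 {
	var w [80]uint32
	copy(w[:16], m[:])
	for i := 16; i < 32; i++ {
		w[i] = bits.RotateLeft32(w[i-3]^w[i-8]^w[i-14]^w[i-16], 1)
	}
	for i := 32; i < 80; i += 4 {
		var lanes [4]uint32 // one 128-bit register holding 4 dwords
		for k := 0; k < 4; k++ {
			j := i + k
			lanes[k] = bits.RotateLeft32(w[j-6]^w[j-16]^w[j-28]^w[j-32], 2)
		}
		copy(w[i:i+4], lanes[:])
	}
	return w
}

func main() {
	var m [16]uint32
	for i := range m {
		m[i] = uint32(i)*0x9E3779B9 + 1 // arbitrary test words
	}
	fmt.Println("schedules match:", standardSchedule(m) == vectorSchedule(m))
}
```

The identity follows from substituting the original recurrence for W[i-3], W[i-8], W[i-14], and W[i-16]; the paired terms cancel under XOR, which is why it only holds once i - 32 >= 0.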
The following code snippet, from Max Locktyukhin's article, converted to AT&T assembly syntax, computes 4 message schedule words in parallel with just a dozen or so SSSE3 instructions:

    movdqa  W_minus_04, W_TMP
    pxor    W_minus_28, W          // W equals W[i-32:i-29] before XOR
                                   // W = W[i-32:i-29] ^ W[i-28:i-25]
    palignr $8, W_minus_08, W_TMP  // W_TMP = W[i-6:i-3], combined from
                                   // W[i-4:i-1] and W[i-8:i-5] vectors
    pxor    W_minus_16, W          // W = (W[i-32:i-29] ^ W[i-28:i-25]) ^ W[i-16:i-13]
    pxor    W_TMP, W               // W = (W[i-32:i-29] ^ W[i-28:i-25] ^ W[i-16:i-13]) ^ W[i-6:i-3]
    movdqa  W, W_TMP               // 4 dwords in W are rotated left by 2
    psrld   $30, W                 // rotate left by 2: W = (W >> 30) | (W << 2)
    pslld   $2, W_TMP
    por     W, W_TMP
    movdqa  W_TMP, W               // four new W values W[i:i+3] are now calculated
    paddd   (K_XMM), W_TMP         // adding 4 current round's values of K
    movdqa  W_TMP, (WK(i))         // storing for downstream GPR instructions to read

A window of the 32 previous results, W[i-1] to W[i-32], is saved in memory on the stack. This is best illustrated with a chart. Without vectorization, computing the rounds looks like this (each "R" represents 1 round of SHA-1 computation):

    RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR

With vectorization, 4 rounds can be computed in parallel:

    RRRRRRRRRRRRRRRRRRRR
    RRRRRRRRRRRRRRRRRRRR
    RRRRRRRRRRRRRRRRRRRR
    RRRRRRRRRRRRRRRRRRRR

Optimization: AVX

The new "Sandy Bridge" microprocessor architecture, which supports AVX, allows another interesting optimization. SSSE3 instructions have two operands: an input and an input/output that is overwritten with the result. AVX allows three operands: two inputs and a separate output. In many cases, two SSSE3 instructions (a register copy followed by an operation) can be combined into one AVX instruction. The difference is best illustrated with an example. Consider these two instructions from the snippet above:

    movdqa  W_minus_04, W_TMP
    palignr $8, W_minus_08, W_TMP             // W_TMP = W[i-6:i-3]

With AVX they can be combined into one instruction:

    vpalignr $8, W_minus_08, W_minus_04, W_TMP // W_TMP = W[i-6:i-3]

This optimization is also in Solaris, although Sandy Bridge-based systems aren't widely available yet. As an exercise for the reader: AVX also has 256-bit media registers, %ymm0 - %ymm15 (a superset of the 128-bit %xmm0 - %xmm15). Can %ymm registers be used to parallelize the code even more?

Optimization: Solaris-specific

In addition to using the Intel code described above, I performed other minor optimizations to the Solaris SHA-1 code:

Increased the digest(1) and mac(1) commands' buffer size from 4K to 64K, as previously done for decrypt(1) and encrypt(1). This size is well suited for ZFS file systems, but helps for other file systems as well.
Optimized the encode functions, which byte-swap the input and output data, to copy/byte-swap 4 or 8 bytes at a time instead of 1 byte at a time.
Enhanced the Solaris mdb(1) and kmdb(1) debuggers to display all 16 %xmm and %ymm registers (mdb "$x" command). Previously they only displayed the first 8, which are the ones available in 32-bit mode. Can't optimize if you can't debug :-).
Changed the SHA-1 code to allow processing in "chunks" greater than 2 Gigabytes (64-bit).

Performance

I measured performance on a Sun Ultra 27 (which has a Nehalem-class Xeon 5500 Intel W3570 microprocessor @3.2GHz). Turbo mode is disabled for consistent performance measurement. Graphs are better than words and numbers, so here they are:

The first graph shows the Solaris digest(1) command before and after the optimizations discussed here, contained in libmd(3LIB). I ran the digest command on a half GByte file in swapfs (/tmp), and execution time decreased from 1.35 seconds to 0.98 seconds.
The second graph shows the results of an internal microbenchmark that uses the Solaris libpkcs11(3LIB) library. The operations are on a 128-byte buffer with 10,000 iterations. The results show operations increased from 320,000 to 416,000 operations per second.

Finally, the third graph shows the results of an internal kernel microbenchmark that uses the Solaris /kernel/crypto/amd64/sha1 module. The operations are on a 64-Kbyte buffer with 100 iterations. The results show that for 1 kernel thread, throughput increased from 410 to 600 MBytes/second. For 8 kernel threads, throughput increased from 1540 to 1940 MBytes/second.

Availability

This code is in Solaris 11 FCS. It is available in the 64-bit libmd(3LIB) library for 64-bit programs and is in the Solaris kernel. You must be running hardware that supports Intel's SSSE3 instructions (for example, the Intel Nehalem, Westmere, or Sandy Bridge microprocessor architectures). The easiest way to determine whether SSSE3 is available is with the isainfo(1) command. For example:

    nehalem $ isainfo -v
    64-bit amd64 applications
            sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx
            cmov amd_sysc cx8 tsc fpu
    32-bit i386 applications
            sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx
            cmov sep cx8 tsc fpu

If the output also shows "avx", Solaris executes the even more optimized 3-operand AVX instructions for SHA-1 mentioned above:

    sandybridge $ isainfo -v
    64-bit amd64 applications
            avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16
            sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu
    32-bit i386 applications
            avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16
            sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu

No special configuration or setup is needed to take advantage of this code. Solaris libraries and the kernel automatically determine whether they are running on SSSE3- or AVX-capable machines and execute the correctly tuned code for that microprocessor.

Summary

The Solaris 11 Crypto Framework, via the sha1 kernel module and the libmd(3LIB) and libpkcs11(3LIB) libraries, incorporated a useful SHA-1 optimization from Intel for SSSE3-capable microprocessors. As with other Solaris optimizations, it comes automatically "under the hood" with the current Solaris release.

References

"Improving the Performance of the Secure Hash Algorithm (SHA-1)" by Max Locktyukhin (Intel, March 2010). The source for these SHA-1 optimizations used in Solaris.
"SHA-1", Wikipedia. Good overview of SHA-1.
FIPS 180-1, SHA-1 standard (FIPS, 1995).
NIST Comments on Cryptanalytic Attacks on SHA-1 (2005, revised 2006).

Optimizing MySQL Database Operations for Better Performance

- by Antoinette O'Sullivan

If you are responsible for a MySQL Database, you make choices based on your priorities: cost, security and performance. To learn more about improving performance, take the MySQL Performance Tuning course. In this 4-day instructor-led course you will learn practical, safe and highly efficient ways to optimize performance for the MySQL Server. It will help you develop the skills needed to use tools for monitoring, evaluating and tuning MySQL.

You can take this course via the following delivery methods:

Training-on-Demand: Take this course at your own pace, starting training within 24 hours of registration.
Live-Virtual Event: Follow a live event from your own desk; no travel required. You can choose from a selection of events to suit your timezone.
In-Class Event: Travel to an education center to take this course. Below is a selection of events already on the schedule.

Location | Date | Delivery Language
London, England | 26 November 2013 | English
Toulouse, France | 18 November 2013 | French
Rome, Italy | 2 December 2013 | Italian
Riga, Latvia | 3 March 2014 | Latvian
Jakarta Barat, Indonesia | 10 December 2013 | English
Tokyo, Japan | 17 April 2014 | Japanese
Pasig City, Philippines | 9 December 2013 | English
Bangkok, Thailand | 4 November 2013 | English

To register for this course or to learn more about the authentic MySQL curriculum, go to http://education.oracle.com/mysql. To see what an expert has to say about MySQL Performance, read Dimitri's blog.

optimizing graphics for iOS flash game

- by 1GR3

A friend of mine and I are working on a Flash-developed iOS (and later Android) puzzle board game. He's a programmer and I'm a designer/developer, so (no surprise) we have different points of view. Anyway, his method: make small tiles (100x100px) in Photoshop, join them into the board, and then apply effects to the board in Flash to avoid repetition (80's, not in the good way). My method: precompose the whole board (960x640px + bleed) in Photoshop and then mask active and inactive areas in Flash. What do you think? Thank you in advance!

Internet Business Coaching and Marketing For Travel Professionals - Optimizing the Website

Are you looking for an option that can work for you as a mean booking machine? Then an optimized website is the right solution.

Optimizing Memory Usage in a .NET Application with ANTS Memory Profiler

Most people have encountered an OutOfMemory problem at some point or other, and these people know that tracking down the source of the problem is often a time-consuming and frustrating task. Florian Standhartinger gives us a walkthrough of how he used the ANTS Memory Profiler to help make an otherwise painful task that little bit less troublesome.

Optimizing hash lookup & memory performance in Go

- by Moishe

As an exercise, I'm implementing HashLife in Go. In brief, HashLife works by memoizing nodes in a quadtree so that once a given node's future value has been calculated, it can just be looked up instead of being recalculated. So, e.g., if you have a node at the 8x8 level, you remember it by its four children (each at the 4x4 level). The next time you see an 8x8 node, when you calculate the next generation, you first check whether you've already seen a node with those same four children. This is extended up through all levels of the quadtree, which gives you some pretty amazing optimizations if, e.g., you're 10 levels above the leaves.

Unsurprisingly, it looks like the performance crux of this is the lookup of nodes by child-node values. Currently I have a hashmap of

    {&upper_left_node, &upper_right_node, &lower_left_node, &lower_right_node} -> node

So my lookup function is this:

    func FindNode(ul, ur, ll, lr *Node) *Node {
        var node *Node
        var ok bool
        nc := NodeChildren{ul, ur, ll, lr}
        node, ok = NodeMap[nc]
        if ok {
            return node
        }
        node = &Node{ul, ur, ll, lr, 0, ul.Level + 1, nil}
        NodeMap[nc] = node
        return node
    }

What I'm trying to figure out is whether the "nc := NodeChildren..." line causes a memory allocation each time the function is called. If it does, can I/should I move the declaration to the global scope and just modify the values each time this function is called? Or is there a more efficient way to do this?

Any advice/feedback would be welcome (even coding-style nits; this is literally the first thing I've written in Go, so I'd love any feedback).
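A possible answer to the allocation question, sketched in Go: NodeChildren is a comparable struct of four pointers, so building one is a plain value copy, and Go's escape analysis normally keeps such a local on the stack even when it is used as a map key (you can check with `go build -gcflags=-m` or measure with testing.AllocsPerRun). Under that assumption, moving the key to a global would likely save nothing and would make FindNode unsafe to call from several goroutines. The post does not show the Node definition, so every field name below except Level is hypothetical; the only real change to FindNode is the idiomatic comma-ok lookup.

```go
package main

import "fmt"

// Node is a hypothetical quadtree node matching the post's seven-field
// literal: four children, a field set to 0, Level, and a field set to nil.
type Node struct {
	UL, UR, LL, LR *Node
	Population     int   // hypothetical name for the field initialized to 0
	Level          int
	Next           *Node // hypothetical name for the field initialized to nil
}

// NodeChildren is comparable (four pointers), so it works as a map key
// directly; the map stores its own copy of the key on insertion.
type NodeChildren struct {
	UL, UR, LL, LR *Node
}

// NodeMap memoizes nodes by their children, as in the post.
var NodeMap = make(map[NodeChildren]*Node)

// FindNode returns the canonical node for the given children, creating it
// the first time those four children are seen.
func FindNode(ul, ur, ll, lr *Node) *Node {
	key := NodeChildren{ul, ur, ll, lr}
	if node, ok := NodeMap[key]; ok { // comma-ok: lookup and existence check
		return node
	}
	node := &Node{UL: ul, UR: ur, LL: ll, LR: lr, Level: ul.Level + 1}
	NodeMap[key] = node
	return node
}

func main() {
	alive := &Node{Level: 0} // hypothetical leaf nodes
	dead := &Node{Level: 0}
	a := FindNode(alive, dead, dead, alive)
	b := FindNode(alive, dead, dead, alive)
	fmt.Println("same node:", a == b, "level:", a.Level, "entries:", len(NodeMap))
}
```

The only heap allocation on a cache miss is the new Node itself; a cache hit returns the stored pointer.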

How to Make a Website Load Faster - Optimizing Website Loading Time

Making a website load faster is not difficult. It is worth considering from the start when building a site, because a very slow website can drive your visitor count toward zero: people have no time to wait for your website to load.

Optimizing MySQL -

- by Josh

I've been researching how to optimize MySQL, but I still have a few questions. MySQL Primer results: http://pastie.org/private/lzjukl8wacxfjbjhge6vw

Based on this, the first problem seems to be that the max_connections limit is too low. I had a similar problem with Apache initially: the max connection limit was set to 100, and the web server would frequently lock up and take an excruciatingly long time to deliver pages. Raising the connection limit to 512 fixed this issue, and I read that raising MySQL's connection limit to match was considered good practice. Since MySQL has actually been "locking up" recently as well (connections have been refused entirely for a few minutes at a time at random intervals), I'm assuming this is the main cause of the issue.

However, I'm not sure what I should set the table cache to. I've read that setting it too high can hinder performance further, so should I raise it to around 551, 560, 600, or do something else?

Lastly, as for raising join_buffer_size: this setting doesn't even seem to be included in Debian's my.cnf file by default. Assuming there's not much I can do about adding indexes, should I look into raising it? Any suggested values?

Any general suggestions would be appreciated as well.

Edit: Here's the number of opened tables the MySQL server is reporting, which I believe is related to my question: Opened_tables: 22574
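
The MySQL reference manual gives a rule of thumb that ties the two settings together: the table cache should hold at least max_connections x N tables, where N is the largest number of tables joined in any single query, plus spare file descriptors for temporary tables and files. A rough sizing with the 512-connection figure from the question and a hypothetical N = 3 (N is an assumption; check your own widest join):

$$
\text{table\_open\_cache} \ge \text{max\_connections} \times N = 512 \times 3 = 1536
$$

The variable is called table_open_cache from MySQL 5.1.3 onward and table_cache before that. Per the same manual, an Opened_tables value that is large or keeps rising quickly suggests the cache is too small; whether 22574 qualifies depends on how long the server has been up.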

Optimizing lifestyle and training

- by Gabe

I am a college freshman who has recently discovered a passion for computer science. Having had my first taste of formal Python training last semester, I have cast aside my previously hedonistic way of life and set my sights on becoming the most well-rounded and proficient programmer I can be. I know that I'm taking strides in the right direction (I've stopped smoking, I've been exercising every day, I've taught myself C++ and OpenGL, and I've begun training in kung fu and meditation), yet I still find myself struggling to achieve satisfactory results. I would like to be able to spend a good 3-4 hours every day burning through textbooks. I have the time cleared and the resources allocated. The problem lies in the logistics: I have never taken anything seriously before. Recently I've realized that I am clueless when it comes to taking care of myself and gaining control of my mind, and it drastically hinders my productivity. My question is this: how can I learn to manage my time and take care of myself so that I can spend the maximum amount of time every day studying with steady concentration? Personal tricks would be key here: techniques you use to get yourself to sleep, a diet that yields focus, even stretching routines for computer breaks or active reading techniques. Anything you can think of would be great. I was a low-life in high school and I have the drive to turn my life around; I'm just quite a bit behind when it comes to good habits :)

Optimizing MySQL, Improving Performance of Database Servers

- by Antoinette O'Sullivan

Optimization involves improving the performance of a database server and the queries that run against it. Optimization reduces query execution time, and optimized queries benefit everyone who uses the server. When the server runs more smoothly and processes more queries with fewer resources, it performs better as a whole. To learn more about how a MySQL developer can make a difference with optimization, take the MySQL Developers training course. This 5-day instructor-led course is available as:
- Live-Virtual Event: attend a live class from your own desk, with no travel required. Choose from a selection of events on the schedule to suit different time zones.
- In-Class Event: travel to an education center to attend an event.

Below is a selection of the events on the schedule (location: date, delivery language):
- Vienna, Austria: 17 November 2014, German
- Brussels, Belgium: 8 December 2014, English
- Sao Paulo, Brazil: 14 July 2014, Brazilian Portuguese
- London, England: 29 September 2014, English
- Belfast, Ireland: 6 October 2014, English
- Dublin, Ireland: 27 October 2014, English
- Milan, Italy: 10 November 2014, Italian
- Rome, Italy: 21 July 2014, Italian
- Nairobi, Kenya: 14 July 2014, English
- Petaling Jaya, Malaysia: 25 August 2014, English
- Utrecht, Netherlands: 21 July 2014, English
- Makati City, Philippines: 29 September 2014, English
- Warsaw, Poland: 25 August 2014, Polish
- Lisbon, Portugal: 13 October 2014, European Portuguese
- Porto, Portugal: 13 October 2014, European Portuguese
- Barcelona, Spain: 7 July 2014, Spanish
- Madrid, Spain: 3 November 2014, Spanish
- Valencia, Spain: 24 November 2014, Spanish
- Basel, Switzerland: 4 August 2014, German
- Bern, Switzerland: 4 August 2014, German
- Zurich, Switzerland: 4 August 2014, German

The MySQL for Developers course helps prepare you for the MySQL 5.6 Developers OCP certification exam. To register for an event, request an additional event, or learn more about the authentic MySQL curriculum, go to http://education.oracle.com/mysql.

A Real-Time HPC Approach for Optimizing Multicore Architectures

Complex math is at the heart of many of the biggest technical challenges. With multicore processors, the type of calculations that would have required a supercomputer can now be performed in real-time, embedded environments.

Need Help in optimizing a loop in C [migrated]

- by WedaPashi

I am trying to draw a checkerboard pattern on an LCD using a GUI library called emWin, and I have actually managed to draw it using the code below. But having this many loops in the program body for a single task, and in the microcontroller's internal flash at that, is not a good idea. For those who have not worked with emWin, I will explain a few things before we get to the actual logic. GUI_RECT is a structure defined in emWin's source files, which I cannot see. Rect, Rect2, Rect3, and so on up to Rect10 are objects of that type. The elements of each one are {x0, y0, x1, y1}, where (x0, y0) is the starting corner of the rectangle in the X-Y plane and (x1, y1) is its ending corner. So Rect = {0, 0, 79, 79} is a rectangle that starts at the top left of the LCD and extends to (79, 79); it is basically a square. The function GUI_SetBkColor(int color); sets the background color. The function GUI_SetColor(int color); sets the foreground color. GUI_WHITE and DM_CHECKERBOARD_COLOR are two #defined color values. GUI_FillRectEx(&Rect); draws the rectangle. The code below works fine, but I want to make it smarter.
GUI_RECT Rect = {0, 0, 79, 79}; GUI_RECT Rect2 = {80, 0, 159, 79}; GUI_RECT Rect3 = {160, 0, 239, 79}; GUI_RECT Rect4 = {240, 0, 319, 79}; GUI_RECT Rect5 = {320, 0, 399, 79}; GUI_RECT Rect6 = {400, 0, 479, 79}; GUI_RECT Rect7 = {480, 0, 559, 79}; GUI_RECT Rect8 = {560, 0, 639, 79}; GUI_RECT Rect9 = {640, 0, 719, 79}; GUI_RECT Rect10 = {720, 0, 799, 79}; WM_SelectWindow(Win_DM_Main); GUI_SetBkColor(GUI_BLACK); GUI_Clear(); for(i = 0; i < 6; i++) { if(i%2 == 0) GUI_SetColor(GUI_WHITE); else GUI_SetColor(DM_CHECKERBOARD_COLOR); GUI_FillRectEx(&Rect); Rect.y0 += 80; Rect.y1 += 80; } /* for(j=0,j<11;j++) { for(i = 0; i < 6; i++) { if(i%2 == 0) GUI_SetColor(GUI_WHITE); else GUI_SetColor(DM_CHECKERBOARD_COLOR); GUI_FillRectEx(&Rect); Rect.y0 += 80; Rect.y1 += 80; } Rect.x0 += 80; Rect.x1 += 80; } */ for(i = 0; i < 6; i++) { if(i%2 == 0) GUI_SetColor(DM_CHECKERBOARD_COLOR); else GUI_SetColor(GUI_WHITE); GUI_FillRectEx(&Rect2); Rect2.y0 += 80; Rect2.y1 += 80; } for(i = 0; i < 6; i++) { if(i%2 == 0) GUI_SetColor(GUI_WHITE); else GUI_SetColor(DM_CHECKERBOARD_COLOR); GUI_FillRectEx(&Rect3); Rect3.y0 += 80; Rect3.y1 += 80; } for(i = 0; i < 6; i++) { if(i%2 == 0) GUI_SetColor(DM_CHECKERBOARD_COLOR); else GUI_SetColor(GUI_WHITE); GUI_FillRectEx(&Rect4); Rect4.y0 += 80; Rect4.y1 += 80; } for(i = 0; i < 6; i++) { if(i%2 == 0) GUI_SetColor(GUI_WHITE); else GUI_SetColor(DM_CHECKERBOARD_COLOR); GUI_FillRectEx(&Rect5); Rect5.y0 += 80; Rect5.y1 += 80; } for(i = 0; i < 6; i++) { if(i%2 == 0) GUI_SetColor(DM_CHECKERBOARD_COLOR); else GUI_SetColor(GUI_WHITE); GUI_FillRectEx(&Rect6); Rect6.y0 += 80; Rect6.y1 += 80; } for(i = 0; i < 6; i++) { if(i%2 == 0) GUI_SetColor(GUI_WHITE); else GUI_SetColor(DM_CHECKERBOARD_COLOR); GUI_FillRectEx(&Rect7); Rect7.y0 += 80; Rect7.y1 += 80; } for(i = 0; i < 6; i++) { if(i%2 == 0) GUI_SetColor(DM_CHECKERBOARD_COLOR); else GUI_SetColor(GUI_WHITE); GUI_FillRectEx(&Rect8); Rect8.y0 += 80; Rect8.y1 += 80; } for(i = 0; i < 6; i++) { if(i%2 == 0) 
GUI_SetColor(GUI_WHITE); else GUI_SetColor(DM_CHECKERBOARD_COLOR); GUI_FillRectEx(&Rect9); Rect9.y0 += 80; Rect9.y1 += 80; } for(i = 0; i < 6; i++) { if(i%2 == 0) GUI_SetColor(DM_CHECKERBOARD_COLOR); else GUI_SetColor(GUI_WHITE); GUI_FillRectEx(&Rect10); Rect10.y0 += 80; Rect10.y1 += 80; }
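
The ten loops above collapse into one nested loop once each square's rectangle and color are computed from its row and column: the color is white when (row + col) is even and the checkerboard color otherwise, which matches every loop in the question. The sketch below is written in Go; the loop structure carries over line for line to C with GUI_RECT. Rect, Color, White, CheckerColor and the fill callback are hypothetical stand-ins for GUI_RECT, the two #defined colors, and a GUI_SetColor plus GUI_FillRectEx pair. It also covers what the commented-out attempt misses: y has to restart at 0 for each column, the color phase has to flip with the column, and there are 10 columns rather than 11 (that loop header also uses a comma where a semicolon is needed).

```go
package main

// Rect mirrors the {x0, y0, x1, y1} layout of emWin's GUI_RECT
// (inclusive corner coordinates).
type Rect struct {
	X0, Y0, X1, Y1 int
}

// Color stands in for emWin color values; both names are placeholders.
type Color int

const (
	White        Color = iota // stands in for GUI_WHITE
	CheckerColor              // stands in for DM_CHECKERBOARD_COLOR
)

const (
	cellSize = 80 // pixels per square, as in the question
	cols     = 10 // 10 x 80 = 800 px wide
	rows     = 6  // 6 x 80 = 480 px tall
)

// drawCheckerboard computes every square with one nested loop and hands it
// to fill, which on the target would call GUI_SetColor and GUI_FillRectEx.
func drawCheckerboard(fill func(r Rect, c Color)) {
	for col := 0; col < cols; col++ {
		for row := 0; row < rows; row++ {
			r := Rect{
				X0: col * cellSize,
				Y0: row * cellSize,
				X1: col*cellSize + cellSize - 1,
				Y1: row*cellSize + cellSize - 1,
			}
			c := White
			if (row+col)%2 != 0 {
				c = CheckerColor
			}
			fill(r, c)
		}
	}
}
```

Because each rectangle is derived from the two loop counters, nothing has to be reset between columns, and changing the board size only means changing cols, rows or cellSize.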

Search Results

Search found 671 results on 27 pages for 'optimizing'.
