optimization level - Page 91

SQL Server indexed view matching of views with joins not working

- by usr

Does anyone have experience of when SQL Servr 2008 R2 is able to automatically match indexed view (also known as materialized views) that contain joins to a query? for example the view select dbo.Orders.Date, dbo.OrderDetails.ProductID from dbo.OrderDetails join dbo.Orders on dbo.OrderDetails.OrderID = dbo.Orders.ID cannot be automatically matched to the same exact query. When I select directly from this view ith (noexpand) I actually get a much faster query plan that does a scan on the clustered index of the indexed view. Can I get SQL Server to do this matching automatically? I have quite a few queries and views... I am on enterprise edition of SQL Server 2008 R2.

Read the article

Coalesce function for PHP?

- by mikl

Many programming languages has a coalesce function (example). PHP, sadly, does not. What would be the most efficient way to implement one in PHP?

Read the article

Fast read of certain bytes of multiple files in C/C++

- by Alejandro Cámara

I've been searching in the web about this question and although there are many similar questions about read/write in C/C++, I haven't found about this specific task. I want to be able to read from multiple files (256x256 files) only sizeof(double) bytes located in a certain position of each file. Right now my solution is, for each file: Open the file (read, binary mode): fstream fTest("current_file", ios_base::out | ios_base::binary); Seek the position I want to read: fTest.seekg(position*sizeof(test_value), ios_base::beg); Read the bytes: fTest.read((char *) &(output[i][j]), sizeof(test_value)); And close the file: fTest.close(); This takes about 350 ms to run inside a for{ for {} } structure with 256x256 iterations (one for each file). Q: Do you think there is a better way to implement this operation? How would you do it?

Read the article

Optimize MySQL query (ngrams, COUNT(), GROUP BY, ORDER BY)

- by Gerardo

I have a database with thousands of companies and their locations. I have implemented n-grams to optimize search. I am making one query to retrieve all the companies that match with the search query and another one to get a list with their locations and the number of companies in each location. The query I am trying to optimize is the latter. Maybe the problem is this: Every company ('anunciante') has a field ('estado') to make logical deletes. So, if 'estado' equals 1, the company should be retrieved. When I run the EXPLAIN command, it shows that it goes through almost 40k rows, when the actual result (the reality matching companies) are 80. How can I optimize this? This is my query (XXX represent the n-grams for the search query): SELECT provincias.provincia AS provincia, provincias.id, COUNT(*) AS cantidad FROM anunciantes JOIN anunciante_invertido AS a_i0 ON anunciantes.id = a_i0.id_anunciante JOIN indice_invertido AS indice0 ON a_i0.id_invertido = indice0.id LEFT OUTER JOIN domicilios ON anunciantes.id = domicilios.id_anunciante LEFT OUTER JOIN localidades ON domicilios.id_localidad = localidades.id LEFT OUTER JOIN provincias ON provincias.id = localidades.id_provincia WHERE anunciantes.estado = 1 AND indice0.id IN (SELECT invertido_ngrama.id_palabra FROM invertido_ngrama JOIN ngrama ON ngrama.id = invertido_ngrama.id_ngrama WHERE ngrama.ngrama = 'XXX') AND indice0.id IN (SELECT invertido_ngrama.id_palabra FROM invertido_ngrama JOIN ngrama ON ngrama.id = invertido_ngrama.id_ngrama WHERE ngrama.ngrama = 'XXX') AND indice0.id IN (SELECT invertido_ngrama.id_palabra FROM invertido_ngrama JOIN ngrama ON ngrama.id = invertido_ngrama.id_ngrama WHERE ngrama.ngrama = 'XXX') AND indice0.id IN (SELECT invertido_ngrama.id_palabra FROM invertido_ngrama JOIN ngrama ON ngrama.id = invertido_ngrama.id_ngrama WHERE ngrama.ngrama = 'XXX') AND indice0.id IN (SELECT invertido_ngrama.id_palabra FROM invertido_ngrama JOIN ngrama ON ngrama.id = invertido_ngrama.id_ngrama WHERE ngrama.ngrama = 'XXX') GROUP BY provincias.id ORDER BY cantidad DESC And this is the query explained (hope it can be read in this format): id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY anunciantes ref PRIMARY,estado estado 1 const 36669 Using index; Using temporary; Using filesort 1 PRIMARY domicilios ref id_anunciante id_anunciante 4 db84771_viaempresas.anunciantes.id 1 1 PRIMARY localidades eq_ref PRIMARY PRIMARY 4 db84771_viaempresas.domicilios.id_localidad 1 1 PRIMARY provincias eq_ref PRIMARY PRIMARY 4 db84771_viaempresas.localidades.id_provincia 1 1 PRIMARY a_i0 ref PRIMARY,id_anunciante,id_invertido PRIMARY 4 db84771_viaempresas.anunciantes.id 1 Using where; Using index 1 PRIMARY indice0 eq_ref PRIMARY PRIMARY 4 db84771_viaempresas.a_i0.id_invertido 1 Using index 6 DEPENDENT SUBQUERY ngrama const PRIMARY,ngrama ngrama 5 const 1 Using index 6 DEPENDENT SUBQUERY invertido_ngrama eq_ref PRIMARY,id_palabra,id_ngrama PRIMARY 8 func,const 1 Using index 5 DEPENDENT SUBQUERY ngrama const PRIMARY,ngrama ngrama 5 const 1 Using index 5 DEPENDENT SUBQUERY invertido_ngrama eq_ref PRIMARY,id_palabra,id_ngrama PRIMARY 8 func,const 1 Using index 4 DEPENDENT SUBQUERY ngrama const PRIMARY,ngrama ngrama 5 const 1 Using index 4 DEPENDENT SUBQUERY invertido_ngrama eq_ref PRIMARY,id_palabra,id_ngrama PRIMARY 8 func,const 1 Using index 3 DEPENDENT SUBQUERY ngrama const PRIMARY,ngrama ngrama 5 const 1 Using index 3 DEPENDENT SUBQUERY invertido_ngrama eq_ref PRIMARY,id_palabra,id_ngrama PRIMARY 8 func,const 1 Using index 2 DEPENDENT SUBQUERY ngrama const PRIMARY,ngrama ngrama 5 const 1 Using index 2 DEPENDENT SUBQUERY invertido_ngrama eq_ref PRIMARY,id_palabra,id_ngrama PRIMARY 8 func,const 1 Using index

Read the article

Wrappers of primitive types in arraylist vs arrays

- by ismail marmoush

Hi, In "Core java 1" I've read CAUTION: An ArrayList is far less efficient than an int[] array because each value is separately wrapped inside an object. You would only want to use this construct for small collections when programmer convenience is more important than efficiency. But in my software I've already used Arraylist instead of normal arrays due to some requirements, though "The software is supposed to have high performance and after I've read the quoted text I started to panic!" one thing I can change is changing double variables to Double so as to prevent auto boxing and I don't know if that is worth it or not, in next sample algorithm public void multiply(final double val) { final int rows = getSize1(); final int cols = getSize2(); for (int i = 0; i < rows; i++) { for (int j = 0; j < cols; j++) { this.get(i).set(j, this.get(i).get(j) * val); } } } My question is does changing double to Double makes a difference ? or that's a micro optimizing that won't affect anything ? keep in mind I might be using large matrices.2nd Should I consider redesigning the whole program again ?

Read the article

Unicorn: Which number of worker processes to use?

- by blackbird07

I am running a Ruby on Rails app on a virtual Linux server that is capped at 1GB RAM. Currently, I am constantly hitting the limit and would like to optimize memory utilization. One option I am looking at is reducing the number of unicorn workers. So what is the best way to determine the number of unicorn workers to use? The current setting is 10 workers, but the maximum number of requests per second I have seen on Google Analytics Real-Time is 3 (only scored once at a peak time; in 99% of the time not going above 1 request per second). So is it a save assumption that I can - for now - go with 4 workers, leaving room for unexpected amounts of requests? What are the metrics I should have a look at for determining the number of workers and what are the tools I can use for that on my Ubuntu machine?

Read the article

Quickest way to write to file in java

- by user1097772

I'm writing an application which compares directory structure. First I wrote an application which writes gets info about files - one line about each file or directory. My soulution is: calling method toFile Static PrintWriter pw = new PrintWriter(new BufferedWriter( new FileWriter("DirStructure.dlis")), true); String line; // info about file or directory public void toFile(String line) { pw.println(line); } and of course pw.close(), at the end. My question is, can I do it quicker? What is the quickest way? Edit: quickest way = quickest writing in the file

Read the article

Postgre database ignoring created index ?!

- by drasto

I have an Postgre database and a table called my_table. There are 4 columns in that table (id, column1, column2, column3). The id column is primary key, there are no other constrains or indexes on columns. The table has about 200000 rows. I want to print out all rows which has value of column column2 equal(case insensitive) to 'value12'. I use this: SELECT * FROM my_table WHERE column2 = lower('value12') here is the execution plan for this statement(result of set enable_seqscan=on; EXPLAIN SELECT * FROM my_table WHERE column2 = lower('value12')): Seq Scan on my_table (cost=0.00..4676.00 rows=10000 width=55) Filter: ((column2)::text = 'value12'::text) I consider this to be to slow so I create an index on column column2 for better prerformance of searches: CREATE INDEX my_index ON my_table (lower(column2)) Now I ran the same select: SELECT * FROM my_table WHERE column2 = lower('value12') and I expect it to be much faster because it can use index. However it is not faster, it is as slow as before. So I check the execution plan and it is the same as before(see above). So it still uses sequential scen and it ignores the index! Where is the problem ?

Read the article

Is opening too many datacontexts bad?

- by ryudice

I've been checking my application with linq 2 sql profiler, and I noticed that it opens a lot of datacontexts, most of them are opened by the linq datasource I used, since my repositories use only the instance stored in Request.Items, is it bad to open too many datacontext? and how can I make my linqdatasource to use the datacontext that I store in Request.Items for the duration of the request? thanks for any help!

Read the article

some pointer to understanding GCC source code

- by user299570

hi, I'm student working on optimizing GCC for multi-core processor. I tried going through the source code, it is difficult to follow through it since I need to add some code to the back end. Can anyone suggest some good resource which explains the code flow through the different phases. Also suggest some development environment for debugging GCC mainly to step through the code. Is it possible on windows?

Read the article

How to increase my Web Application's Performance??

- by ramadan2050

I have a ASP.NET web application (.NET 2008) using MS SQL server 2005, I want to increase the performance of the web site, If anyone have an article contains steps to do that, step by step , In SQL(Indexes, ..... etc.) and in the code. Thanks in advance Best Regards

Read the article

Optimize a MySQL count each duplicate Query

- by Onema

I have the following query That gets the city name, city id, the region name, and a count of duplicate names for that record: SELECT Country_CA.City AS currentCity, Country_CA.CityID, globe_region.region_name, ( SELECT count(Country_CA.City) FROM Country_CA WHERE City LIKE currentCity ) as counter FROM Country_CA LEFT JOIN globe_region ON globe_region.region_id = Country_CA.RegionID AND globe_region.country_code = Country_CA.CountryCode ORDER BY City This example is for Canada, and the cities will be displayed on a dropdown list. There are a few towns in Canada, and in other countries, that have the same names. Therefore I want to know if there is more than one town with the same name region name will be appended to the town name. Region names are found in the globe_region table. Country_CA and globe_region look similar to this (I have changed a few things for visualization purposes) CREATE TABLE IF NOT EXISTS `Country_CA` ( `City` varchar(75) NOT NULL DEFAULT '', `RegionID` varchar(10) NOT NULL DEFAULT '', `CountryCode` varchar(10) NOT NULL DEFAULT '', `CityID` int(11) NOT NULL DEFAULT '0', PRIMARY KEY (`City`,`RegionID`), KEY `CityID` (`CityID`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; AND CREATE TABLE IF NOT EXISTS `globe_region` ( `country_code` char(2) COLLATE utf8_unicode_ci NOT NULL, `region_code` char(2) COLLATE utf8_unicode_ci NOT NULL, `region_name` varchar(50) COLLATE utf8_unicode_ci NOT NULL, PRIMARY KEY (`country_code`,`region_code`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; The query on the top does exactly what I want it to do, but It takes way too long to generate a list for 5000 records. I would like to know if there is a way to optimize the sub-query in order to obtain the same results faster. the results should look like this City CityID region_name counter sheraton 2349269 British Columbia 1 sherbrooke 2349270 Quebec 2 sherbrooke 2349271 Nova Scotia 2 shere 2349273 British Columbia 1 sherridon 2349274 Manitoba 1

Read the article

Is there table with timing(cost) of C functions?

- by DSblizzard

Preferable for x86-32 gcc implementation

Read the article

In PHP is faster to get a value from an if statement or from an array?

- by Vittorio Vittori

Maybe this is a stupid question but what is faster? <?php function getCss1 ($id = 0) { if ($id == 1) { return 'red'; } else if ($id == 2) { return 'yellow'; } else if ($id == 3) { return 'green'; } else if ($id == 4) { return 'blue'; } else if ($id == 5) { return 'orange'; } else { return 'grey'; } } function getCss2 ($id = 0) { $css[] = 'grey'; $css[] = 'red'; $css[] = 'yellow'; $css[] = 'green'; $css[] = 'blue'; $css[] = 'orange'; return $css[$id]; } echo getCss1(3); echo getCss2(3); ?> I suspect is faster the if statement but I prefere to ask!

Read the article

Find point which sum of distances to set of other points is minimal

- by Pawel Markowski

I have one set (X) of points (not very big let's say 1-20 points) and the second (Y), much larger set of points. I need to choose some point from Y which sum of distances to all points from X is minimal. I came up with an idea that I would treat X as a vertices of a polygon and find centroid of this polygon, and then I will choose a point from Y nearest to the centroid. But I'm not sure whether centroid minimizes sum of its distances to the vertices of polygon, so I'm not sure whether this is a good way? Is there any algorithm for solving this problem? Points are defined by geographical coordinates.

Read the article

Does replacing statements by expressions using the C++ comma operator could allow more compiler opti

- by Gabriel Cuvillier

The C++ comma operator is used to chain individual expressions, yielding the value of the last executed expression as the result. For example the skeleton code (6 statements, 6 expressions): step1; step2; if (condition) step3; return step4; else return step5; May be rewritten to: (1 statement, 6 expressions) return step1, step2, condition? step3, step4 : step5; I noticed that it is not possible to perform step-by-step debugging of such code, as the expression chain seems to be executed as a whole. Does it means that the compiler is able to perform special optimizations which are not possible with the traditional statement approach (specially if the steps are const or inline)? Note: I'm not talking about the coding style merit of that way of expressing sequence of expressions! Just about the possible optimisations allowed by replacing statements by expressions.

Read the article

Difference between Logarithmic and Uniform cost criteria

- by Marthin

I'v got some problem to understand the difference between Logarithmic(Lcc) and Uniform(Ucc) cost criteria and also how to use it in calculations. Could someone please explain the difference between the two and perhaps show how to calculate the complexity for a problem like A+B*C (Yes this is part of an assignment =) ) Thx for any help! /Marthin

Read the article

Javascriptlibrary more efficient than Rickshaw for realtime visualizations

- by dan kutz

I want to visualize data as time-series graphs on mobile devices(tablets) and therefore stumbled upon rickshaw, which is based on D3. First I must say I was a little bit confused when I realized that realtime in web design is defined totally different to realtime in engineering which has fixed(and often very short) timeframes. Anyway my aim is to visualize the data as fast as possible, and on older tablets visualization with rickshaw is quite slow. Can anybody recommend another library, which may be more efficient in rendering? Or is there no way out and I have to go native? regards Dan.

Read the article

Meassure website

- by s0mmer

Hi, I was wondering if it is possible to install or use any online service to measure your website's performance? I've seen many just checking the download speed of images, external files etc. But is it possible to meassure how long asp/php code takes to execute? I have a site running a bit slowly, and it would be very nice with some app/service guiding where to optimize.

Read the article

fast way to check if an array of chars is zero

- by Claudiu

I have an array of bytes, in memory. What's the fastest way to see if all the bytes in the array are zero?

Read the article

help me improve my sse yuv to rgb ssse3 code

- by David McPaul

Hello, I am looking to optimise some sse code I wrote for converting yuv to rgb (both planar and packed yuv functions). i am using SSSE3 at the moment but if there are useful functions from later sse versions thats ok. I am mainly interested in how I would work out processor stalls and the like. Anyone know of any tools that do static analysis of sse code? ; ; Copyright (C) 2009-2010 David McPaul ; ; All rights reserved. Distributed under the terms of the MIT License. ; ; A rather unoptimised set of ssse3 yuv to rgb converters ; does 8 pixels per loop ; inputer: ; reads 128 bits of yuv 8 bit data and puts ; the y values converted to 16 bit in xmm0 ; the u values converted to 16 bit and duplicated into xmm1 ; the v values converted to 16 bit and duplicated into xmm2 ; conversion: ; does the yuv to rgb conversion using 16 bit integer and the ; results are placed into the following registers as 8 bit clamped values ; r values in xmm3 ; g values in xmm4 ; b values in xmm5 ; outputer: ; writes out the rgba pixels as 8 bit values with 0 for alpha ; xmm6 used for scratch ; xmm7 used for scratch %macro cglobal 1 global _%1 %define %1 _%1 align 16 %1: %endmacro ; conversion code %macro yuv2rgbsse2 0 ; u = u - 128 ; v = v - 128 ; r = y + v + v >> 2 + v >> 3 + v >> 5 ; g = y - (u >> 2 + u >> 4 + u >> 5) - (v >> 1 + v >> 3 + v >> 4 + v >> 5) ; b = y + u + u >> 1 + u >> 2 + u >> 6 ; subtract 16 from y movdqa xmm7, [Const16] ; loads a constant using data cache (slower on first fetch but then cached) psubsw xmm0,xmm7 ; y = y - 16 ; subtract 128 from u and v movdqa xmm7, [Const128] ; loads a constant using data cache (slower on first fetch but then cached) psubsw xmm1,xmm7 ; u = u - 128 psubsw xmm2,xmm7 ; v = v - 128 ; load r,b with y movdqa xmm3,xmm0 ; r = y pshufd xmm5,xmm0, 0xE4 ; b = y ; r = y + v + v >> 2 + v >> 3 + v >> 5 paddsw xmm3, xmm2 ; add v to r movdqa xmm7, xmm1 ; move u to scratch pshufd xmm6, xmm2, 0xE4 ; move v to scratch psraw xmm6,2 ; divide v by 4 paddsw xmm3, xmm6 ; and add to r psraw xmm6,1 ; divide v by 2 paddsw xmm3, xmm6 ; and add to r psraw xmm6,2 ; divide v by 4 paddsw xmm3, xmm6 ; and add to r ; b = y + u + u >> 1 + u >> 2 + u >> 6 paddsw xmm5, xmm1 ; add u to b psraw xmm7,1 ; divide u by 2 paddsw xmm5, xmm7 ; and add to b psraw xmm7,1 ; divide u by 2 paddsw xmm5, xmm7 ; and add to b psraw xmm7,4 ; divide u by 32 paddsw xmm5, xmm7 ; and add to b ; g = y - u >> 2 - u >> 4 - u >> 5 - v >> 1 - v >> 3 - v >> 4 - v >> 5 movdqa xmm7,xmm2 ; move v to scratch pshufd xmm6,xmm1, 0xE4 ; move u to scratch movdqa xmm4,xmm0 ; g = y psraw xmm6,2 ; divide u by 4 psubsw xmm4,xmm6 ; subtract from g psraw xmm6,2 ; divide u by 4 psubsw xmm4,xmm6 ; subtract from g psraw xmm6,1 ; divide u by 2 psubsw xmm4,xmm6 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,2 ; divide v by 4 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g %endmacro ; outputer %macro rgba32sse2output 0 ; clamp values pxor xmm7,xmm7 packuswb xmm3,xmm7 ; clamp to 0,255 and pack R to 8 bit per pixel packuswb xmm4,xmm7 ; clamp to 0,255 and pack G to 8 bit per pixel packuswb xmm5,xmm7 ; clamp to 0,255 and pack B to 8 bit per pixel ; convert to bgra32 packed punpcklbw xmm5,xmm4 ; bgbgbgbgbgbgbgbg movdqa xmm0, xmm5 ; save bg values punpcklbw xmm3,xmm7 ; r0r0r0r0r0r0r0r0 punpcklwd xmm5,xmm3 ; lower half bgr0bgr0bgr0bgr0 punpckhwd xmm0,xmm3 ; upper half bgr0bgr0bgr0bgr0 ; write to output ptr movntdq [edi], xmm5 ; output first 4 pixels bypassing cache movntdq [edi+16], xmm0 ; output second 4 pixels bypassing cache %endmacro SECTION .data align=16 Const16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 Const128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 UMask db 0x01 db 0x80 db 0x01 db 0x80 db 0x05 db 0x80 db 0x05 db 0x80 db 0x09 db 0x80 db 0x09 db 0x80 db 0x0d db 0x80 db 0x0d db 0x80 VMask db 0x03 db 0x80 db 0x03 db 0x80 db 0x07 db 0x80 db 0x07 db 0x80 db 0x0b db 0x80 db 0x0b db 0x80 db 0x0f db 0x80 db 0x0f db 0x80 YMask db 0x00 db 0x80 db 0x02 db 0x80 db 0x04 db 0x80 db 0x06 db 0x80 db 0x08 db 0x80 db 0x0a db 0x80 db 0x0c db 0x80 db 0x0e db 0x80 ; void Convert_YUV422_RGBA32_SSSE3(void *fromPtr, void *toPtr, int width) width equ ebp+16 toPtr equ ebp+12 fromPtr equ ebp+8 ; void Convert_YUV420P_RGBA32_SSSE3(void *fromYPtr, void *fromUPtr, void *fromVPtr, void *toPtr, int width) width1 equ ebp+24 toPtr1 equ ebp+20 fromVPtr equ ebp+16 fromUPtr equ ebp+12 fromYPtr equ ebp+8 SECTION .text align=16 cglobal Convert_YUV422_RGBA32_SSSE3 ; reserve variables push ebp mov ebp, esp push edi push esi push ecx mov esi, [fromPtr] mov edi, [toPtr] mov ecx, [width] ; loop width / 8 times shr ecx,3 test ecx,ecx jng ENDLOOP REPEATLOOP: ; loop over width / 8 ; YUV422 packed inputer movdqa xmm0, [esi] ; should have yuyv yuyv yuyv yuyv pshufd xmm1, xmm0, 0xE4 ; copy to xmm1 movdqa xmm2, xmm0 ; copy to xmm2 ; extract both y giving y0y0 pshufb xmm0, [YMask] ; extract u and duplicate so each u in yuyv becomes u0u0 pshufb xmm1, [UMask] ; extract v and duplicate so each v in yuyv becomes v0v0 pshufb xmm2, [VMask] yuv2rgbsse2 rgba32sse2output ; endloop add edi,32 add esi,16 sub ecx, 1 ; apparently sub is better than dec jnz REPEATLOOP ENDLOOP: ; Cleanup pop ecx pop esi pop edi mov esp, ebp pop ebp ret cglobal Convert_YUV420P_RGBA32_SSSE3 ; reserve variables push ebp mov ebp, esp push edi push esi push ecx push eax push ebx mov esi, [fromYPtr] mov eax, [fromUPtr] mov ebx, [fromVPtr] mov edi, [toPtr1] mov ecx, [width1] ; loop width / 8 times shr ecx,3 test ecx,ecx jng ENDLOOP1 REPEATLOOP1: ; loop over width / 8 ; YUV420 Planar inputer movq xmm0, [esi] ; fetch 8 y values (8 bit) yyyyyyyy00000000 movd xmm1, [eax] ; fetch 4 u values (8 bit) uuuu000000000000 movd xmm2, [ebx] ; fetch 4 v values (8 bit) vvvv000000000000 ; extract y pxor xmm7,xmm7 ; 00000000000000000000000000000000 punpcklbw xmm0,xmm7 ; interleave xmm7 into xmm0 y0y0y0y0y0y0y0y0 ; extract u and duplicate so each becomes 0u0u punpcklbw xmm1,xmm7 ; interleave xmm7 into xmm1 u0u0u0u000000000 punpcklwd xmm1,xmm7 ; interleave again u000u000u000u000 pshuflw xmm1,xmm1, 0xA0 ; copy u values pshufhw xmm1,xmm1, 0xA0 ; to get u0u0 ; extract v punpcklbw xmm2,xmm7 ; interleave xmm7 into xmm1 v0v0v0v000000000 punpcklwd xmm2,xmm7 ; interleave again v000v000v000v000 pshuflw xmm2,xmm2, 0xA0 ; copy v values pshufhw xmm2,xmm2, 0xA0 ; to get v0v0 yuv2rgbsse2 rgba32sse2output ; endloop add edi,32 add esi,8 add eax,4 add ebx,4 sub ecx, 1 ; apparently sub is better than dec jnz REPEATLOOP1 ENDLOOP1: ; Cleanup pop ebx pop eax pop ecx pop esi pop edi mov esp, ebp pop ebp ret SECTION .note.GNU-stack noalloc noexec nowrite progbits

Read the article

How can I optimize this subqueried and Joined MySQL Query?

- by kevzettler

Read the article

Trying to reduce the speed overhead of an almost-but-not-quite-int number class

- by Fumiyo Eda

I have implemented a C++ class which behaves very similarly to the standard int type. The difference is that it has an additional concept of "epsilon" which represents some tiny value that is much less than 1, but greater than 0. One way to think of it is as a very wide fixed point number with 32 MSBs (the integer parts), 32 LSBs (the epsilon parts) and a huge sea of zeros in between. The following class works, but introduces a ~2x speed penalty in the overall program. (The program includes code that has nothing to do with this class, so the actual speed penalty of this class is probably much greater than 2x.) I can't paste the code that is using this class, but I can say the following: +, -, +=, <, > and >= are the only heavily used operators. Use of setEpsilon() and getInt() is extremely rare. * is also rare, and does not even need to consider the epsilon values at all. Here is the class: #include <limits> struct int32Uepsilon { typedef int32Uepsilon Self; int32Uepsilon () { _value = 0; _eps = 0; } int32Uepsilon (const int &i) { _value = i; _eps = 0; } void setEpsilon() { _eps = 1; } Self operator+(const Self &rhs) const { Self result = *this; result._value += rhs._value; result._eps += rhs._eps; return result; } Self operator-(const Self &rhs) const { Self result = *this; result._value -= rhs._value; result._eps -= rhs._eps; return result; } Self operator-( ) const { Self result = *this; result._value = -result._value; result._eps = -result._eps; return result; } Self operator*(const Self &rhs) const { return this->getInt() * rhs.getInt(); } // XXX: discards epsilon bool operator<(const Self &rhs) const { return (_value < rhs._value) || (_value == rhs._value && _eps < rhs._eps); } bool operator>(const Self &rhs) const { return (_value > rhs._value) || (_value == rhs._value && _eps > rhs._eps); } bool operator>=(const Self &rhs) const { return (_value >= rhs._value) || (_value == rhs._value && _eps >= rhs._eps); } Self &operator+=(const Self &rhs) { this->_value += rhs._value; this->_eps += rhs._eps; return *this; } Self &operator-=(const Self &rhs) { this->_value -= rhs._value; this->_eps -= rhs._eps; return *this; } int getInt() const { return(_value); } private: int _value; int _eps; }; namespace std { template<> struct numeric_limits<int32Uepsilon> { static const bool is_signed = true; static int max() { return 2147483647; } } }; The code above works, but it is quite slow. Does anyone have any ideas on how to improve performance? There are a few hints/details I can give that might be helpful: 32 bits are definitely insufficient to hold both _value and _eps. In practice, up to 24 ~ 28 bits of _value are used and up to 20 bits of _eps are used. I could not measure a significant performance difference between using int32_t and int64_t, so memory overhead itself is probably not the problem here. Saturating addition/subtraction on _eps would be cool, but isn't really necessary. Note that the signs of _value and _eps are not necessarily the same! This broke my first attempt at speeding this class up. Inline assembly is no problem, so long as it works with GCC on a Core i7 system running Linux!

Read the article

How to optimize an database suggestion engine

- by Dimitar Vouldjeff

Hi, I`m making an online engine for item-to-item recommending movies. I have made some researches and I think that the best way to implement that is using pearson correlation and make a table with item1, item2 and correlation fields, but the problem is that after each rate of item I have to regenerate the correlation for in the worst case N records (where N is the number of items). Another think that I read is the following article, but I haven`t thought a way to implement it. So what is your suggestion to optimize this process? Or any other suggestions? Thanks.

Read the article

Rewriting a for loop in pure NumPy to decrease execution time

- by Statto

I recently asked about trying to optimise a Python loop for a scientific application, and received an excellent, smart way of recoding it within NumPy which reduced execution time by a factor of around 100 for me! However, calculation of the B value is actually nested within a few other loops, because it is evaluated at a regular grid of positions. Is there a similarly smart NumPy rewrite to shave time off this procedure? I suspect the performance gain for this part would be less marked, and the disadvantages would presumably be that it would not be possible to report back to the user on the progress of the calculation, that the results could not be written to the output file until the end of the calculation, and possibly that doing this in one enormous step would have memory implications? Is it possible to circumvent any of these? import numpy as np import time def reshape_vector(v): b = np.empty((3,1)) for i in range(3): b[i][0] = v[i] return b def unit_vectors(r): return r / np.sqrt((r*r).sum(0)) def calculate_dipole(mu, r_i, mom_i): relative = mu - r_i r_unit = unit_vectors(relative) A = 1e-7 num = A*(3*np.sum(mom_i*r_unit, 0)*r_unit - mom_i) den = np.sqrt(np.sum(relative*relative, 0))**3 B = np.sum(num/den, 1) return B N = 20000 # number of dipoles r_i = np.random.random((3,N)) # positions of dipoles mom_i = np.random.random((3,N)) # moments of dipoles a = np.random.random((3,3)) # three basis vectors for this crystal n = [10,10,10] # points at which to evaluate sum gamma_mu = 135.5 # a constant t_start = time.clock() for i in range(n[0]): r_frac_x = np.float(i)/np.float(n[0]) r_test_x = r_frac_x * a[0] for j in range(n[1]): r_frac_y = np.float(j)/np.float(n[1]) r_test_y = r_frac_y * a[1] for k in range(n[2]): r_frac_z = np.float(k)/np.float(n[2]) r_test = r_test_x +r_test_y + r_frac_z * a[2] r_test_fast = reshape_vector(r_test) B = calculate_dipole(r_test_fast, r_i, mom_i) omega = gamma_mu*np.sqrt(np.dot(B,B)) # write r_test, B and omega to a file frac_done = np.float(i+1)/(n[0]+1) t_elapsed = (time.clock()-t_start) t_remain = (1-frac_done)*t_elapsed/frac_done print frac_done*100,'% done in',t_elapsed/60.,'minutes...approximately',t_remain/60.,'minutes remaining'

Search Results

Search found 14719 results on 589 pages for 'optimization level'.

Page 91/589 | < Previous Page | 87 88 89 90 91 92 93 94 95 96 97 98 | Next Page >

- by usr

- by mikl

- by Alejandro Cámara

- by Gerardo

- by ismail marmoush

- by blackbird07

- by user1097772

- by drasto

- by ryudice

- by user299570

- by ramadan2050

- by Onema

- by DSblizzard

- by Vittorio Vittori

- by Pawel Markowski

- by Gabriel Cuvillier

- by Marthin

- by dan kutz

- by s0mmer

- by Claudiu

- by David McPaul

- by kevzettler

- by Fumiyo Eda

- by Dimitar Vouldjeff

- by Statto

< Previous Page | 87 88 89 90 91 92 93 94 95 96 97 98 | Next Page >