Search Results

Search found 13461 results on 539 pages for 'optimizing performance'.

Page 62/539 | < Previous Page | 58 59 60 61 62 63 64 65 66 67 68 69  | Next Page >

  • Optimizing BeautifulSoup (Python) code

    - by user283405
    I have code that uses the BeautifulSoup library for parsing, but it is very slow. The code is written in such a way that threads cannot be used. Can anyone help me with this? I am using BeautifulSoup for parsing and than save into a DB. If I comment out the save statement, it still takes a long time, so there is no problem with the database. def parse(self,text): soup = BeautifulSoup(text) arr = soup.findAll('tbody') for i in range(0,len(arr)-1): data=Data() soup2 = BeautifulSoup(str(arr[i])) arr2 = soup2.findAll('td') c=0 for j in arr2: if str(j).find("<a href=") > 0: data.sourceURL = self.getAttributeValue(str(j),'<a href="') else: if c == 2: data.Hits=j.renderContents() #and few others... c = c+1 data.save() Any suggestions? Note: I already ask this question here but that was closed due to incomplete information.

    Read the article

  • Optimizing a fluid grid layout

    - by user1815176
    I recently just added a grid layout, but I can't figure out how to make my links work. The grid that I used is the 1140 one at http://cssgrid.net/. I studied the source code of that website, and tried to make my page like theirs, but when I put everything in it made mine worse, and the grid didn't even work. This is how my website is supposed to look http://spencedesign.netau.net/singaporehome.html and this is how it does http://spencedesign.netau.net/home.html And when you reduce the size, it doesn't look like it's supposed to. When you minimize it I want the pictures(links) to be two per row, then one per row depending on how small the page is. I also want the quote to turn into different rows when it is too small for it. But I can't figure out how to make the page look normal regularly let alone make it look good with a smaller browser. Thanks!

    Read the article

  • Optimizing this "Boundarize" method for Numerics in Ruby

    - by mstksg
    I'm extending Numerics with a method I call "Boundarize" for lack of better name; I'm sure there are actually real names for this. But its basic purpose is to reset a given point to be within a boundary. That is, "wrapping" a point around the boundary; if the area is betweeon 0 and 100, if the point goes to -1, -1.boundarize(0,100) = 99 (going one too far to the negative "wraps" the point around to one from the max). 102.boundarize(0,100) = 2 It's a very simple function to implement; when the number is below the minimum, simply add (max-min) until it's in the boundary. If the number is above the maximum, simply subtract (max-min) until it's in the boundary. One thing I also need to account for is that, there are cases where I don't want to include the minimum in the range, and cases where I don't want to include the maximum in the range. This is specified as an argument. However, I fear that my current implementation is horribly, terribly, grossly inefficient. And because every time something moves on the screen, it has to re-run this, this is one of the bottlenecks of my application. Anyone have any ideas? module Boundarizer def boundarize min=0,max=1,allow_min=true,allow_max=false raise "Improper boundaries #{min}/#{max}" if min >= max new_num = self if allow_min while new_num < min new_num += (max-min) end else while new_num <= min new_num += (max-min) end end if allow_max while new_num > max new_num -= (max-min) end else while new_num >= max new_num -= (max-min) end end return new_num end end class Numeric include Boundarizer end

    Read the article

  • optimizing the jquery in django

    - by sankar
    I have done the following code which actually dynamically generate the values in the third list box using ajax and jquery concepts. Thoug it works, i need to optimize it. Below is the code i am working on now. <html> <head> <title>Comparison based on ID Number</title> <script src="http://code.jquery.com/jquery-latest.js"></script> </head> <body> {% if form.errors %} <p style="color: red;"> Please correct the error{{ form.errors|pluralize }} below. </p> {% endif %} <form action="/idnumber/" method="post" align= "center">{% csrf_token %} <p>Select different id numbers and the name of the <b> Measurement Group </b>to start the comparison</p> <table align = "center"> <tr> <th><label for="id_criteria_1">Id Number 1:</label></th> <th> {{ form.idnumber_1 }} </th> </tr> <tr> <th><label for="id_criteria_2">Id Number 2:</label></th> <th> {{ form.idnumber_2 }} </th> </tr> <tr> <th><label for="group_name_list">Group Name:</label></th> <th><select name="group_name_list" id="id_group_name_list"> <option>Select</option> </select> </th> </tr> <script> $('#id_idnumber_2').change( function get_group_names() { var value1 = $('#id_idnumber_1').attr('value'); var value2 = $(this).attr('value'); alert(value1); alert(value2); var request = $.ajax({ url: "/getGroupnamesforIDnumber/", type: "GET", data: {idnumber_1 : value1,idnumber_2 : value2}, dataType: "json", success: function(data) { alert(data); var myselect = document.getElementById("group_name_list"); document.getElementById("group_name_list").options.length = 1; var length_of_data = data.length; alert(length_of_data); try{ for(var i = 0;i < length_of_data; i++) { myselect.add(new Option(data[i].group_name, "i"), myselect.options[i]) //add new option to beginning of "sample" } } catch(e){ //in IE, try the below version instead of add() for(var i = 0;i < length_of_data; i++) { myselect.add(new Option(data[i].group_name, data[i].group_name)) //add new option to end of "sample" } } } }); }); </script> <tr align = "center"><th><input type="submit" value="Submit"></th></tr> </table> </form> </body> </html> everything works fine but there is a little problem in my code. (ie) my ajax function calls only when there is a change in the select list 2 (ie) 'id_number_2'. I want to make it call in such a way that which ever select box, the third list box should be updated automatically. Can anyone please help me on this with a possible logical solution

    Read the article

  • Optimizing simple search script in PowerShell

    - by cc0
    I need to create a script to search through just below a million files of text, code, etc. to find matches and then output all hits on a particular string pattern to a CSV file. So far I made this; $location = 'C:\Work*' $arr = "foo", "bar" #Where "foo" and "bar" are string patterns I want to search for (separately) for($i=0;$i -lt $arr.length; $i++) { Get-ChildItem $location -recurse | select-string -pattern $($arr[$i]) | select-object Path | Export-Csv "C:\Work\Results\$($arr[$i]).txt" } This returns to me a CSV file named "foo.txt" with a list of all files with the word "foo" in it, and a file named "bar.txt" with a list of all files containing the word "bar". Is there any way anyone can think of to optimize this script to make it work faster? Or ideas on how to make an entirely different, but equivalent script that just works faster? All input appreciated!

    Read the article

  • Need help optimizing MYSQL query with join

    - by makeee
    I'm doing a join between the "favorites" table (3 million rows) the "items" table (600k rows). The query is taking anywhere from .3 seconds to 2 seconds, and I'm hoping I can optimize it some. Favorites.faver_profile_id and Items.id are indexed. Instead of using the faver_profile_id index I created a new index on (faver_profile_id,id), which eliminated the filesort needed when sorting by id. Unfortunately this index doesn't help at all and I'll probably remove it (yay, 3 more hours of downtime to drop the index..) Any ideas on how I can optimize this query? In case it helps: Favorite.removed and Item.removed are "0" 98% of the time. Favorite.collection_id is NULL about 80% of the time. SELECT `Item`.`id`, `Item`.`source_image`, `Item`.`cached_image`, `Item`.`source_title`, `Item`.`source_url`, `Item`.`width`, `Item`.`height`, `Item`.`fave_count`, `Item`.`created` FROM `favorites` AS `Favorite` LEFT JOIN `items` AS `Item` ON (`Item`.`removed` = 0 AND `Favorite`.`notice_id` = `Item`.`id`) WHERE ((`faver_profile_id` = 1) AND (`collection_id` IS NULL) AND (`Favorite`.`removed` = 0) AND (`Item`.`removed` = '0')) ORDER BY `Favorite`.`id` desc LIMIT 50;

    Read the article

  • turn off disable the performance cache

    - by jessie
    OK I run a streaming website and my CMS is giving me an error when uploading videos "Failed To Find Flength File" ok so I did some research. The answer I got from the coder was below. I did do all that, but the only thing I could not do is turn off what he refers to as performance cache, talked about in the last sentence... I am on a Cent OS Assuming the script is set up properly, you are probably dealing with some kind of write-caching. Some servers perform write-caching which prevents writing out the flength file or the entire CGITemp file during the upload. The flength file or the CGITemp file do not actually hit the disk until the upload is complete, making it worthless for reporting on progress during the upload. This may be fixed using a .htaccess file assuming your host supports them. Here is a link to an excellent tutorial on using .htaccess files. I strongly recommend giving it a quick read before attempting to install your own .htaccess file. 1. A mod_security module for Apache. To fix it just create a file called .htaccess (that's a period followed by "htaccess") and put the following lines in that file. Upload the file into the directory where the Uber-Uploader CGI ".pl" scripts resides, or in some directory above it (like your server's DOCUMENT_ROOT, i.e. the top-level of your webspace). htaccess files must be uploaded as ASCII mode, not BINARY. You may need to CHMOD the htaccess file to 644 or (RW-R--R--). # Turn off mod_security filtering. SecFilterEngine Off # The below probably isn't needed, # but better safe than sorry. SecFilterScanPOST Off If the above method does not work, try putting the following lines into the file SetEnvIfNoCase Content-Type \ "^multipart/form-data;" "MODSEC_NOPOSTBUFFERING=Do not buffer file uploads" mod_gzip_on No 2. "Performance Cache" enabled on OS X SERVER. If you're running OS X Server and the progress bar isn't working, it could be because of "performance caching." Apparently if ANY of your hosted sites are using performance caching, then by default, all sites (domains) will attempt to. The fix then is to disable the performance cache on all hosted sites.

    Read the article

  • Help with optimizing C# function via C and/or Assembly

    - by MusiGenesis
    I have this C# method which I'm trying to optimize: // assume arrays are same dimensions private void DoSomething(int[] bigArray1, int[] bigArray2) { int data1; byte A1; byte B1; byte C1; byte D1; int data2; byte A2; byte B2; byte C2; byte D2; for (int i = 0; i < bigArray1.Length; i++) { data1 = bigArray1[i]; data2 = bigArray2[i]; A1 = (byte)(data1 >> 0); B1 = (byte)(data1 >> 8); C1 = (byte)(data1 >> 16); D1 = (byte)(data1 >> 24); A2 = (byte)(data2 >> 0); B2 = (byte)(data2 >> 8); C2 = (byte)(data2 >> 16); D2 = (byte)(data2 >> 24); A1 = A1 > A2 ? A1 : A2; B1 = B1 > B2 ? B1 : B2; C1 = C1 > C2 ? C1 : C2; D1 = D1 > D2 ? D1 : D2; bigArray1[i] = (A1 << 0) | (B1 << 8) | (C1 << 16) | (D1 << 24); } } The function basically compares two int arrays. For each pair of matching elements, the method compares each individual byte value and takes the larger of the two. The element in the first array is then assigned a new int value constructed from the 4 largest byte values (irrespective of source). I think I have optimized this method as much as possible in C# (probably I haven't, of course - suggestions on that score are welcome as well). My question is, is it worth it for me to move this method to an unmanaged C DLL? Would the resulting method execute faster (and how much faster), taking into account the overhead of marshalling my managed int arrays so they can be passed to the method? If doing this would get me, say, a 10% speed improvement, then it would not be worth my time for sure. If it was 2 or 3 times faster, then I would probably have to do it. Note: please, no "premature optimization" comments, thanks in advance. This is simply "optimization".

    Read the article

  • Optimizing list comprehension to find pairs of co-prime numbers

    - by user3685422
    Given A,B print the number of pairs (a,b) such that GCD(a,b)=1 and 1<=a<=A and 1<=b<=B. Here is my answer: return len([(x,y) for x in range(1,A+1) for y in range(1,B+1) if gcd(x,y) == 1]) My answer works fine for small ranges but takes enough time if the range is increased. such as 1 <= A <= 10^5 1 <= B <= 10^5 is there a better way to write this or can this be optimized?

    Read the article

  • Optimizing sparse dot-product in C#

    - by Haggai
    Hello. I'm trying to calculate the dot-product of two very sparse associative arrays. The arrays contain an ID and a value, so the calculation should be done only on those IDs that are common to both arrays, e.g. <(1, 0.5), (3, 0.7), (12, 1.3) * <(2, 0.4), (3, 2.3), (12, 4.7) = 0.7*2.3 + 1.3*4.7 . My implementation (call it dict) currently uses Dictionaries, but it is too slow to my taste. double dot_product(IDictionary<int, double> arr1, IDictionary<int, double> arr2) { double res = 0; double val2; foreach (KeyValuePair<int, double> p in arr1) if (arr2.TryGetValue(p.Key, out val2)) res += p.Value * val2; return res; } The full arrays have about 500,000 entries each, while the sparse ones are only tens to hundreds entries each. I did some experiments with toy versions of dot products. First I tried to multiply just two double arrays to see the ultimate speed I can get (let's call this "flat"). Then I tried to change the implementation of the associative array multiplication using an int[] ID array and a double[] values array, walking together on both ID arrays and multiplying when they are equal (let's call this "double"). I then tried to run all three versions with debug or release, with F5 or Ctrl-F5. The results are as follows: debug F5: dict: 5.29s double: 4.18s (79% of dict) flat: 0.99s (19% of dict, 24% of double) debug ^F5: dict: 5.23s double: 4.19s (80% of dict) flat: 0.98s (19% of dict, 23% of double) release F5: dict: 5.29s double: 3.08s (58% of dict) flat: 0.81s (15% of dict, 26% of double) release ^F5: dict: 4.62s double: 1.22s (26% of dict) flat: 0.29s ( 6% of dict, 24% of double) I don't understand these results. Why isn't the dictionary version optimized in release F5 as do the double and flat versions? Why is it only slightly optimized in the release ^F5 version while the other two are heavily optimized? Also, since converting my code into the "double" scheme would mean lots of work - do you have any suggestions how to optimize the dictionary one? Thanks! Haggai

    Read the article

  • C++ performance, optimizing compiler, empty function in .cpp

    - by Dodo
    I've a very basic class, name it Basic, used in nearly all other files in a bigger project. In some cases, there needs to be debug output, but in release mode, this should not be enabled and be a NOOP. Currently there is a define in the header, which switches a makro on or off, depending on the setting. So this is definetely a NOOP, when switched off. I'm wondering, if I have the following code, if a compiler (MSVS / gcc) is able to optimize out the function call, so that it is again a NOOP. (By doing that, the switch could be in the .cpp and switching will be much faster, compile/link time wise). --Header-- void printDebug(const Basic* p); class Basic { Basic() { simpleSetupCode; // this should be a NOOP in release, // but constructor could be inlined printDebug(this); } }; --Source-- // PRINT_DEBUG defined somewhere else or here #if PRINT_DEBUG void printDebug(const Basic* p) { // Lengthy debug print } #else void printDebug(const Basic* p) {} #endif

    Read the article

  • Optimizing a "set in a string list" to a "set as a matrix" operation

    - by Eric Fournier
    I have a set of strings which contain space-separated elements. I want to build a matrix which will tell me which elements were part of which strings. For example: "" "A B C" "D" "B D" Should give something like: A B C D 1 2 1 1 1 3 1 4 1 1 Now I've got a solution, but it runs slow as molasse, and I've run out of ideas on how to make it faster: reverseIn <- function(vector, value) { return(value %in% vector) } buildCategoryMatrix <- function(valueVector) { allClasses <- c() for(classVec in unique(valueVector)) { allClasses <- unique(c(allClasses, strsplit(classVec, " ", fixed=TRUE)[[1]])) } resMatrix <- matrix(ncol=0, nrow=length(valueVector)) splitValues <- strsplit(valueVector, " ", fixed=TRUE) for(cat in allClasses) { if(cat=="") { catIsPart <- (valueVector == "") } else { catIsPart <- sapply(splitValues, reverseIn, cat) } resMatrix <- cbind(resMatrix, catIsPart) } colnames(resMatrix) <- allClasses return(resMatrix) } Profiling the function gives me this: $by.self self.time self.pct total.time total.pct "match" 31.20 34.74 31.24 34.79 "FUN" 30.26 33.70 74.30 82.74 "lapply" 13.56 15.10 87.86 97.84 "%in%" 12.92 14.39 44.10 49.11 So my actual questions would be: - Where are the 33% spent in "FUN" coming from? - Would there be any way to speed up the %in% call? I tried turning the strings into factors prior to going into the loop so that I'd be matching numbers instead of strings, but that actually makes R crash. I've also tried going for partial matrix assignment (IE, resMatrix[i,x] <- 1) where i is the number of the string and x is the vector of factors. No dice there either, as it seems to keep on running infinitely.

    Read the article

  • Optimizing C++ Tree Generation

    - by cam
    Hi, I'm generating a Tic-Tac-Toe game tree (9 seconds after the first move), and I'm told it should take only a few milliseconds. So I'm trying to optimize it, I ran it through CodeAnalyst and these are the top 5 calls being made (I used bitsets to represent the Tic-Tac-Toe board): std::_Iterator_base::_Orphan_me std::bitset<9::test std::_Iterator_base::_Adopt std::bitset<9::reference::operator bool std::_Iterator_base::~_Iterator_base void BuildTreeToDepth(Node &nNode, const int& nextPlayer, int depth) { if (depth > 0) { //Calculate gameboard states int evalBoard = nNode.m_board.CalculateBoardState(); bool isFinished = nNode.m_board.isFinished(); if (isFinished || (nNode.m_board.isWinner() > 0)) { nNode.m_winCount = evalBoard; } else { Ticboard tBoard = nNode.m_board; do { int validMove = tBoard.FirstValidMove(); if (validMove != -1) { Node f; Ticboard tempBoard = nNode.m_board; tempBoard.Move(validMove, nextPlayer); tBoard.Move(validMove, nextPlayer); f.m_board = tempBoard; f.m_winCount = 0; f.m_Move = validMove; int currPlay = (nextPlayer == 1 ? 2 : 1); BuildTreeToDepth(f,currPlay, depth - 1); nNode.m_winCount += f.m_board.CalculateBoardState(); nNode.m_branches.push_back(f); } else { break; } }while(true); } } } Where should I be looking to optimize it? How should I optimize these 5 calls (I don't recognize them=.

    Read the article

  • Optimizing GDI+ drawing?

    - by user146780
    I'm using C++ and GDI+ I'm going to be making a vector drawing application and want to use GDI+ for the drawing. I'v created a simple test to get familiar with it: case WM_PAINT: GetCursorPos(&mouse); GetClientRect(hWnd,&rct); hdc = BeginPaint(hWnd, &ps); MemDC = CreateCompatibleDC(hdc); bmp = CreateCompatibleBitmap(hdc, 600, 600); SelectObject(MemDC,bmp); g = new Graphics(MemDC); for(int i = 0; i < 1; ++i) { SolidBrush sb(Color(255,255,255)); g->FillRectangle(&sb,rct.top,rct.left,rct.right,rct.bottom); } for(int i = 0; i < 250; ++i) { pts[0].X = 0; pts[0].Y = 0; pts[1].X = 10 + mouse.x * i; pts[1].Y = 0 + mouse.y * i; pts[2].X = 10 * i + mouse.x; pts[2].Y = 10 + mouse.y * i; pts[3].X = 0 + mouse.x; pts[3].Y = (rand() % 600) + mouse.y; Point p1, p2; p1.X = 0; p1.Y = 0; p2.X = 300; p2.Y = 300; g->FillPolygon(&b,pts,4); } BitBlt(hdc,0,0,900,900,MemDC,0,0,SRCCOPY); EndPaint(hWnd, &ps); DeleteObject(bmp); g->ReleaseHDC(MemDC); DeleteDC(MemDC); delete g; break; I'm wondering if I'm doing it right, or if I have areas killing the cpu. Because right now it takes ~ 1sec to render this and I want to be able to have it redraw itself very quickly. Thanks In a real situation would it be better just to figure out the portion of the screen to redraw and only redraw the elements withing bounds of this?

    Read the article

  • Need help optimizing this Django aggregate query

    - by Chris Lawlor
    I have the following model class Plugin(models.Model): name = models.CharField(max_length=50) # more fields which represents a plugin that can be downloaded from my site. To track downloads, I have class Download(models.Model): plugin = models.ForiegnKey(Plugin) timestamp = models.DateTimeField(auto_now=True) So to build a view showing plugins sorted by downloads, I have the following query: # pbd is plugins by download - commented here to prevent scrolling pbd = Plugin.objects.annotate(dl_total=Count('download')).order_by('-dl_total') Which works, but is very slow. With only 1,000 plugins, the avg. response is 3.6 - 3.9 seconds (devserver with local PostgreSQL db), where a similar view with a much simpler query (sorting by plugin release date) takes 160 ms or so. I'm looking for suggestions on how to optimize this query. I'd really prefer that the query return Plugin objects (as opposed to using values) since I'm sharing the same template for the other views (Plugins by rating, Plugins by release date, etc.), so the template is expecting Plugin objects - plus I'm not sure how I would get things like the absolute_url without a reference to the plugin object. Or, is my whole approach doomed to failure? Is there a better way to track downloads? I ultimately want to provide users some nice download statistics for the plugins they've uploaded - like downloads per day/week/month. Will I have to calculate and cache Downloads at some point? EDIT: In my test dataset, there are somewhere between 10-20 Download instances per Plugin - in production I expect this number would be much higher for many of the plugins.

    Read the article

  • Optimizing / simplifying a path

    - by user146780
    Say I have a path with 150 nodes / verticies. How could I simplify if so that for example a straight line with 3 verticies, would remove the middle one since it does nothing to add to the path. Also how could I avoid destroying sharp corners? And how could I remove tiny variations and have smooth curves remaining. Thanks

    Read the article

  • Optimizing JS Array Search

    - by The.Anti.9
    I am working on a Browser-based media player which is written almost entirely in HTML 5 and JavaScript. The backend is written in PHP but it has one function which is to fill the playlist on the initial load. And the rest is all JS. There is a search bar that refines the playlist. I want it to refine as the person is typing, like most media players do. The only problem with this is that it is very slow and laggy as there are about 1000 songs in the whole program and there is likely to be more as time goes on. The original playlist load is an ajax call to a PHP page that returns the results as JSON. Each item has 4 attirbutes: artist album file url I then loop through each object and add it to an array called playlist. At the end of the looping a copy of playlist is created, backup. This is so that I can refine the playlist variable when people refine their search, but still repopulated it from backup without making another server request. The method refine() is called when the user types a key into the searchbox. It flushes playlist and searches through each property (not including url) of each object in the backup array for a match in the string. If there is a match in any of the properties, it appends the information to a table that displays the playlist, and adds it to the object to playlist for access by the actual player. Code for the refine() method: function refine() { $('#loadinggif').show(); $('#library').html("<table id='libtable'><tr><th>Artist</th><th>Album</th><th>File</th><th>&nbsp;</th></tr></table>"); playlist = []; for (var j = 0; j < backup.length; j++) { var sfile = new String(backup[j].file); var salbum = new String(backup[j].album); var sartist = new String(backup[j].artist); if (sfile.toLowerCase().search($('#search').val().toLowerCase()) !== -1 || salbum.toLowerCase().search($('#search').val().toLowerCase()) !== -1 || sartist.toLowerCase().search($('#search').val().toLowerCase()) !== -1) { playlist.push(backup[j]); num = playlist.length-1; $("<tr></tr>").html("<td>" + num + "</td><td>" + sartist + "</td><td>" + salbum + "</td><td>" + sfile + "</td><td><a href='#' onclick='setplay(" + num +");'>Play</a></td>").appendTo('#libtable'); } } $('#loadinggif').hide(); } As I said before, for the first couple of letters typed, this is very slow and laggy. I am looking for ways to refine this to make it much faster and more smooth.

    Read the article

  • Optimizing PHP code (trying to determine min/max/between case)

    - by Swizzh
    I know this code-bit does not conform very much to best coding practices, and was looking to improve it, any ideas? if ($query['date_min'] != _get_date_today()) $mode_min = true; if ($query['date_max'] != _get_date_today()) $mode_max = true; if ($mode_max && $mode_min) $mode = "between"; elseif ($mode_max && !$mode_min) $mode = "max"; elseif (!$mode_max && $mode_min) $mode = "min"; else return; if ($mode == "min" || $mode == "between") { $command_min = "A"; } if ($mode == "max" || $mode == "between") { $command_max = "B"; } if ($mode == "between") { $command = $command_min . " AND " . $command_max; } else { if ($mode == "min") $command = $command_min; if ($mode == "max") $command = $command_max; } echo $command;

    Read the article

  • Optimizing memory usage and changing file contents with PHP

    - by errata
    In a function like this function download($file_source, $file_target) { $rh = fopen($file_source, 'rb'); $wh = fopen($file_target, 'wb'); if (!$rh || !$wh) { return false; } while (!feof($rh)) { if (fwrite($wh, fread($rh, 1024)) === FALSE) { return false; } } fclose($rh); fclose($wh); return true; } what is the best way to rewrite last few bytes of a file with my custom string? Thanks!

    Read the article

  • Optimizing near-duplicate value search

    - by GApple
    I'm trying to find near duplicate values in a set of fields in order to allow an administrator to clean them up. There are two criteria that I am matching on One string is wholly contained within the other, and is at least 1/4 of its length The strings have an edit distance less than 5% of the total length of the two strings The Pseudo-PHP code: foreach($values as $value){ foreach($values as $match){ if( ( $value['length'] < $match['length'] && $value['length'] * 4 > $match['length'] && stripos($match['value'], $value['value']) !== false ) || ( $match['length'] < $value['length'] && $match['length'] * 4 > $value['length'] && stripos($value['value'], $match['value']) !== false ) || ( abs($value['length'] - $match['length']) * 20 < ($value['length'] + $match['length']) && 0 < ($match['changes'] = levenshtein($value['value'], $match['value'])) && $match['changes'] * 20 <= ($value['length'] + $match['length']) ) ){ $matches[] = &$match; } } } I've tried to reduce calls to the comparatively expensive stripos and levenshtein functions where possible, which has reduced the execution time quite a bit. However, as an O(n^2) operation this just doesn't scale to the larger sets of values and it seems that a significant amount of the processing time is spent simply iterating through the arrays. Some properties of a few sets of values being operated on Total | Strings | # of matches per string | | Strings | With Matches | Average | Median | Max | Time (s) | --------+--------------+---------+--------+------+----------+ 844 | 413 | 1.8 | 1 | 58 | 140 | 593 | 156 | 1.2 | 1 | 5 | 62 | 272 | 168 | 3.2 | 2 | 26 | 10 | 157 | 47 | 1.5 | 1 | 4 | 3.2 | 106 | 48 | 1.8 | 1 | 8 | 1.3 | 62 | 47 | 2.9 | 2 | 16 | 0.4 | Are there any other things I can do to reduce the time to check criteria, and more importantly are there any ways for me to reduce the number of criteria checks required (for example, by pre-processing the input values), since there is such low selectivity?

    Read the article

  • Help Optimizing MySQL Table (~ 500,000 records).

    - by Pyrite
    I have a MySQL table that collects player data from various game servers (Urban Terror). The bot that collects the data runs 24/7, and currently the table is up to about 475,000+ records. Because of this, querying this table from PHP has become quite slow. I wonder what I can do on the database side of things to make it as optomized as possible, then I can focus on the application to query the database. The table is as follows: CREATE TABLE IF NOT EXISTS `people` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `name` varchar(40) NOT NULL, `ip` int(4) unsigned NOT NULL, `guid` varchar(32) NOT NULL, `server` int(4) unsigned NOT NULL, `date` int(11) NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `Person` (`name`,`ip`,`guid`), KEY `server` (`server`), KEY `date` (`date`), KEY `PlayerName` (`name`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COMMENT='People that Play on Servers' AUTO_INCREMENT=475843 ; I'm storying the IPv4 (ip and server) as 4 byte integers, and using the MySQL functions NTOA(), etc to encode and decode, I heard that this way is faster, rather than varchar(15). The guid is a md5sum, 32 char hex. Date is stored as unix timestamp. I have a unique key on name, ip and guid, as to avoid duplicates of the same player. Do I have my keys setup right? Is the way I'm storing data efficient? Here is the code to query this table. You search for a name, ip, or guid, and it grabs the results of the query and cross references other records that match the name, ip, or guid from the results of the first query, and does it for each field. This is kind of hard to explain. But basically, if I search for one player by name, I'll see every other name he has used, every IP he has used and every GUID he has used. <form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post"> Search: <input type="text" name="query" id="query" /><input type="submit" name="btnSubmit" value="Submit" /> </form> <?php if (!empty($_POST['query'])) { ?> <table cellspacing="1" id="1up_people" class="tablesorter" width="300"> <thead> <tr> <th>ID</th> <th>Player Name</th> <th>Player IP</th> <th>Player GUID</th> <th>Server</th> <th>Date</th> </tr> </thead> <tbody> <?php function super_unique($array) { $result = array_map("unserialize", array_unique(array_map("serialize", $array))); foreach ($result as $key => $value) { if ( is_array($value) ) { $result[$key] = super_unique($value); } } return $result; } if (!empty($_POST['query'])) { $query = trim($_POST['query']); $count = 0; $people = array(); $link = mysql_connect('localhost', 'mysqluser', 'yea right!'); if (!$link) { die('Could not connect: ' . mysql_error()); } mysql_select_db("1up"); $sql = "SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date FROM 1up_people WHERE (name LIKE \"%$query%\" OR INET_NTOA(ip) LIKE \"%$query%\" OR guid LIKE \"%$query%\")"; $result = mysql_query($sql, $link); if (!$result) { die(mysql_error()); } // Now take the initial results and parse each column into its own array while ($row = mysql_fetch_array($result, MYSQL_NUM)) { $name = htmlspecialchars($row[1]); $people[] = array( 'id' => $row[0], 'name' => $name, 'ip' => $row[2], 'guid' => $row[3], 'server' => $row[4], 'date' => $row[5] ); } // now for each name, ip, guid in results, find additonal records $people2 = array(); foreach ($people AS $person) { $ip = $person['ip']; $sql = "SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date FROM 1up_people WHERE (ip = \"$ip\")"; $result = mysql_query($sql, $link); while ($row = mysql_fetch_array($result, MYSQL_NUM)) { $name = htmlspecialchars($row[1]); $people2[] = array( 'id' => $row[0], 'name' => $name, 'ip' => $row[2], 'guid' => $row[3], 'server' => $row[4], 'date' => $row[5] ); } } $people3 = array(); foreach ($people AS $person) { $guid = $person['guid']; $sql = "SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date FROM 1up_people WHERE (guid = \"$guid\")"; $result = mysql_query($sql, $link); while ($row = mysql_fetch_array($result, MYSQL_NUM)) { $name = htmlspecialchars($row[1]); $people3[] = array( 'id' => $row[0], 'name' => $name, 'ip' => $row[2], 'guid' => $row[3], 'server' => $row[4], 'date' => $row[5] ); } } $people4 = array(); foreach ($people AS $person) { $name = $person['name']; $sql = "SELECT id, name, INET_NTOA(ip) AS ip, guid, INET_NTOA(server) AS server, date FROM 1up_people WHERE (name = \"$name\")"; $result = mysql_query($sql, $link); while ($row = mysql_fetch_array($result, MYSQL_NUM)) { $name = htmlspecialchars($row[1]); $people4[] = array( 'id' => $row[0], 'name' => $name, 'ip' => $row[2], 'guid' => $row[3], 'server' => $row[4], 'date' => $row[5] ); } } // Combine people and people2 into just people $people = array_merge($people, $people2); $people = array_merge($people, $people3); $people = array_merge($people, $people4); $people = super_unique($people); foreach ($people AS $person) { $date = ($person['date']) ? date("M d, Y", $person['date']) : 'Before 8/1/10'; echo "<tr>\n"; echo "<td>".$person['id']."</td>"; echo "<td>".$person['name']."</td>"; echo "<td>".$person['ip']."</td>"; echo "<td>".$person['guid']."</td>"; echo "<td>".$person['server']."</td>"; echo "<td>".$date."</td>"; echo "</tr>\n"; $count++; } // Find Total Records //$result = mysql_query("SELECT id FROM 1up_people", $link); //$total = mysql_num_rows($result); mysql_close($link); } ?> </tbody> </table> <p> <?php echo $count." Records Found for \"".$_POST['query']."\" out of $total"; ?> </p> <?php } $time_stop = microtime(true); print("Done (ran for ".round($time_stop-$time_start)." seconds)."); ?> Any help at all is appreciated! Thank you.

    Read the article

  • optimizing oracle query

    - by deming
    I'm having a hard time wrapping my head around this query. it is taking almost 200+ seconds to execute. I've pasted the execution plan as well. SELECT user_id , ROLE_ID , effective_from_date , effective_to_date , participant_code , ACTIVE FROM CMP_USER_ROLE E WHERE ACTIVE = 0 AND (SYSDATE BETWEEN effective_from_date AND effective_to_date OR TO_CHAR(effective_to_date,'YYYY-Q') = '2010-2') AND participant_code = 'NY005' AND NOT EXISTS ( SELECT 1 FROM CMP_USER_ROLE r WHERE r.USER_ID= E.USER_ID AND r.role_id = E.role_id AND r.ACTIVE = 4 AND E.effective_to_date <= (SELECT MAX(last_update_date) FROM CMP_USER_ROLE S WHERE S.role_id = r.role_id AND S.role_id = r.role_id AND S.ACTIVE = 4 )) Explain plan ----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 37 | 154 (2)| 00:00:02 | |* 1 | FILTER | | | | | | |* 2 | TABLE ACCESS BY INDEX ROWID | USER_ROLE | 1 | 37 | 30 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | N_USER_ROLE_IDX6 | 27 | | 3 (0)| 00:00:01 | |* 4 | FILTER | | | | | | | 5 | HASH GROUP BY | | 1 | 47 | 124 (2)| 00:00:02 | |* 6 | TABLE ACCESS BY INDEX ROWID | USER_ROLE | 159 | 3339 | 119 (1)| 00:00:02 | | 7 | NESTED LOOPS | | 11 | 517 | 123 (1)| 00:00:02 | |* 8 | TABLE ACCESS BY INDEX ROWID| USER_ROLE | 1 | 26 | 4 (0)| 00:00:01 | |* 9 | INDEX RANGE SCAN | N_USER_ROLE_IDX5 | 1 | | 3 (0)| 00:00:01 | |* 10 | INDEX RANGE SCAN | N_USER_ROLE_IDX2 | 957 | | 74 (2)| 00:00:01 | -----------------------------------------------------------------------------------------------------

    Read the article

  • Help in optimizing a for loop in matlab

    - by HH
    I have a 1 by N double array consisting of 1 and 0. I would like to map all the 1 to symbol '-3' and '3' and all the 0 to symbol '-1' and '1' equally. Below is my code. As my array is approx 1 by 8 million, it is taking a very long time. How to speed things up? [row,ll] = size(Data); sym_zero = -1; sym_one = -3; for loop = 1 : row if Data(loop,1) == 0 Data2(loop,1) = sym_zero; if sym_zero == -1 sym_zero = 1; else sym_zero = -1; end else Data2(loop,1) = sym_one; if sym_one == -3 sym_zero = 3; else sym_zero = -3; end end end

    Read the article

  • Django: optimizing queries

    - by Josh
    I want to list the number of items for each list. How can I find this number in a single query, rather than a query for each list? Here is a simplified version of my current template code: {% for list in lists %} <li> {{ listname }}: {% with list.num_items as item_count %} {{ item_count }} item{{ item_count|pluralize }} {% endwith %} </li> {% endfor %} lists is passed as: List.objects.filter(user=user) and num_items is a property of the List model: def _get_num_items(self): return self.item_set.filter(archived=False).count() num_items = property(_get_num_items) This queries SELECT COUNT(*) FROM "my_app_item" WHERE... n times, where n is the number of lists. Is it possible to make a single query here?

    Read the article

< Previous Page | 58 59 60 61 62 63 64 65 66 67 68 69  | Next Page >