Search Results

Search found 3512 results on 141 pages for 'premature optimization'.


  • How Do You Profile & Optimize CUDA Kernels?

    - by John Dibling
    I am somewhat familiar with the CUDA visual profiler and the occupancy spreadsheet, although I am probably not leveraging them as well as I could. Profiling and optimizing CUDA code is not like profiling and optimizing code that runs on a CPU, so I am hoping to learn from your experiences about how to get the most out of my code. There was a post recently looking for the fastest possible code to identify self numbers, and I provided a CUDA implementation. I'm not satisfied that this code is as fast as it can be, but I'm at a loss to figure out both what the right questions are and which tool can give me the answers. How do you identify ways to make your CUDA kernels perform faster?


  • Try to fill the GAE datastore but the code consumes too much CPU time. How to optimize this?

    - by Neverland
    I am trying to store the list of Amazon EC2 images in the Google datastore. I want to do this with a cron job inside GAE.

        class AmazonEC2uswest(db.Model):
            ami = db.StringProperty(required=True)
            mani = db.StringProperty()
            typ = db.StringProperty()
            arch = db.StringProperty()
            state = db.StringProperty()
            owner = db.StringProperty()

        class CronAMIsAmazonUS_WEST(webapp.RequestHandler):
            def get(self):
                aws_access_key_id_admin = "<secret>"
                aws_secret_access_key_admin = "<secret>"
                conn_us_west = boto.ec2.connect_to_region('us-west-1',
                                                          aws_access_key_id=aws_access_key_id_admin,
                                                          aws_secret_access_key=aws_secret_access_key_admin,
                                                          is_secure = False)
                liste_images_us_west = conn_us_west.get_all_images()
                laenge_liste_images_us_west = len(liste_images_us_west)
                for i in range(laenge_liste_images_us_west):
                    datastore_uswest_AMIs = AmazonEC2uswest(ami=liste_images_us_west[i].id,
                                                            mani=str(liste_images_us_west[i].location),
                                                            typ=liste_images_us_west[i].type,
                                                            arch=liste_images_us_west[i].architecture,
                                                            state=liste_images_us_west[i].state,
                                                            owner=liste_images_us_west[i].ownerId)
                    datastore_uswest_AMIs.put()

    The problem: getting the list with get_all_images() takes only a few seconds, but writing the data to the Google datastore needs way too much CPU time. My IBM T42p (P4M with 2GHz) needs approx. 1 minute for that piece of code. Is it possible to optimize my code so that it needs less CPU time?
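
    One common way to cut per-entity overhead in a loop like the one above is to write entities in batches instead of calling put() once per entity. The sketch below only illustrates the batching pattern: chunked, store_in_batches and the put_batch callable are hypothetical names, not part of the question's code. On App Engine the callable could plausibly be db.put, which accepts a list of model instances, but that wiring is an assumption and is not tested here.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists holding at most `size` items each."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def store_in_batches(entities: Iterable[T],
                     put_batch: Callable[[List[T]], object],
                     batch_size: int = 100) -> int:
    """Call put_batch once per chunk and return the number of calls made.

    put_batch is a stand-in (assumption) for whatever bulk-write call the
    storage layer offers.
    """
    calls = 0
    for batch in chunked(entities, batch_size):
        put_batch(batch)
        calls += 1
    return calls
```

    With 250 entities and a batch size of 100, the storage layer is called three times instead of 250 times. Whether that is enough to solve the CPU-time problem would need to be measured.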


  • Random Complete System Unresponsiveness Running Mathematical Functions

    - by Computer Guru
    I have a program that loads a file (anywhere from 10MB to 5GB) a chunk at a time (ReadFile), and for each chunk performs a set of mathematical operations (basically calculates the hash). After calculating the hash, it stores info about the chunk in an STL map (basically <chunkID, hash>) and then writes the chunk itself to another file (WriteFile). That's all it does. This program will cause certain PCs to choke and die. The mouse begins to stutter, the task manager takes 2 min to show, ctrl+alt+del is unresponsive, running programs are slow.... the works. I've done literally everything I can think of to optimize the program, and have triple-checked all objects.

    What I've done:
      - Tried different (less intensive) hashing algorithms.
      - Switched all allocations to nedmalloc instead of the default new operator.
      - Switched from stl::map to unordered_set, found the performance to still be abysmal, so I switched again to Google's dense_hash_map.
      - Converted all objects to store pointers to objects instead of the objects themselves.
      - Cached all Read and Write operations. Instead of reading a 16k chunk of the file and performing the math on it, I read 4MB into a buffer and read 16k chunks from there instead. Same for all write operations: they are coalesced into 4MB blocks before being written to disk.
      - Ran extensive profiling with Visual Studio 2010, AMD Code Analyst, and perfmon.
      - Set the thread priority to THREAD_MODE_BACKGROUND_BEGIN.
      - Set the thread priority to THREAD_PRIORITY_IDLE.
      - Added a Sleep(100) call after every loop.

    Even after all this, the application still results in a system-wide hang on certain machines under certain circumstances. Perfmon and Process Explorer show minimal CPU usage (with the sleep), no constant reads/writes from disk, few hard pagefaults (and only ~30k pagefaults in the lifetime of the application on a 5GB input file), little virtual memory (never more than 150MB), no leaked handles, no memory leaks.

    The machines I've tested it on run Windows XP through Windows 7, x86 and x64 versions included. None have less than 2GB RAM, though the problem is always exacerbated under lower memory conditions. I'm at a loss as to what to do next. I don't know what's causing it, and I'm torn between CPU and memory as the culprit: CPU because without the sleep and under different thread priorities the system performance changes noticeably; memory because there's a huge difference in how often the issue occurs when using unordered_set vs Google's dense_hash_map.

    What's really weird? Obviously, the NT kernel design is supposed to prevent this sort of behavior from ever occurring (a user-mode application driving the system to this sort of extreme poor performance!?)..... but when I compile the code and run it on OS X or Linux (it's fairly standard C++ throughout) it performs excellently even on poor machines with little RAM and weaker CPUs. What am I supposed to do next? How do I know what the hell it is that Windows is doing behind the scenes that's killing system performance, when all the indicators are that the application itself isn't doing anything extreme? Any advice would be most welcome.


  • help optimize sql query

    - by msony
    I have a tracking table tbl_track with the fields id, session_id and created_date. I need to count the unique session_id values for one day. Here is what I have:

        select count(0) from (
            select distinct session_id
            from tbl_track
            where created_date between getdate()-1 and getdate()
            group by session_id
        ) tbl

    I feel there could be a better solution for this.
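
    The nested select, distinct and group by above all express the same thing, so one simplification is a single COUNT(DISTINCT ...). The sketch below demonstrates that form against an in-memory SQLite database purely so it can be run anywhere. The original query is T-SQL (getdate()), so the date handling here is an assumption, and the sample rows and the count_sessions helper are invented for illustration.

```python
import sqlite3

# In-memory database with the table layout described in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_track ("
    "id INTEGER PRIMARY KEY, session_id TEXT, created_date TEXT)"
)

# Invented sample data: sessions 'a' and 'b' fall inside the day, 'c' does not.
rows = [
    ("a", "2024-01-02 10:00:00"),
    ("a", "2024-01-02 11:00:00"),
    ("b", "2024-01-02 12:00:00"),
    ("c", "2023-12-30 09:00:00"),
]
conn.executemany(
    "INSERT INTO tbl_track (session_id, created_date) VALUES (?, ?)", rows
)


def count_sessions(connection, start, end):
    """Count distinct session_id values with created_date in [start, end]."""
    (count,) = connection.execute(
        "SELECT COUNT(DISTINCT session_id) FROM tbl_track "
        "WHERE created_date BETWEEN ? AND ?",
        (start, end),
    ).fetchone()
    return count


unique_sessions = count_sessions(conn, "2024-01-02 00:00:00", "2024-01-03 00:00:00")
```

    An index on created_date is the usual companion to a range filter like this, though whether it helps depends on the real table and data.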


  • How does loop address alignment affect the speed on Intel x86_64?

    - by Alexander Gololobov
    I'm seeing a 15% performance degradation of the same C++ code compiled to exactly the same machine instructions but located at differently aligned addresses. When my tiny main loop starts at 0x415220 it's faster than when it is at 0x415250. I'm running this on an Intel Core2 Duo. I use gcc 4.4.5 on x86_64 Ubuntu. Can anybody explain the cause of the slowdown and how I can force gcc to optimally align the loop? Here is the disassembly for both cases with profiler annotation:

        415220 576 12.56% |XXXXXXXXXXXXXX 48 c1 eb 08 shr $0x8,%rbx
        415224 110 2.40% |XX 0f b6 c3 movzbl %bl,%eax
        415227 0.00% | 41 0f b6 04 00 movzbl (%r8,%rax,1),%eax
        41522c 40 0.87% | 48 8b 04 c1 mov (%rcx,%rax,8),%rax
        415230 806 17.58% |XXXXXXXXXXXXXXXXXXX 4c 63 f8 movslq %eax,%r15
        415233 186 4.06% |XXXX 48 c1 e8 20 shr $0x20,%rax
        415237 102 2.22% |XX 4c 01 f9 add %r15,%rcx
        41523a 414 9.03% |XXXXXXXXXX a8 0f test $0xf,%al
        41523c 680 14.83% |XXXXXXXXXXXXXXXX 74 45 je 415283 ::Run(char const*, char const*)+0x4b3
        41523e 0.00% | 41 89 c7 mov %eax,%r15d
        415241 0.00% | 41 83 e7 01 and $0x1,%r15d
        415245 0.00% | 41 83 ff 01 cmp $0x1,%r15d
        415249 0.00% | 41 89 c7 mov %eax,%r15d

        415250 679 13.05% |XXXXXXXXXXXXXXXX 48 c1 eb 08 shr $0x8,%rbx
        415254 124 2.38% |XX 0f b6 c3 movzbl %bl,%eax
        415257 0.00% | 41 0f b6 04 00 movzbl (%r8,%rax,1),%eax
        41525c 43 0.83% |X 48 8b 04 c1 mov (%rcx,%rax,8),%rax
        415260 828 15.91% |XXXXXXXXXXXXXXXXXXX 4c 63 f8 movslq %eax,%r15
        415263 388 7.46% |XXXXXXXXX 48 c1 e8 20 shr $0x20,%rax
        415267 141 2.71% |XXX 4c 01 f9 add %r15,%rcx
        41526a 634 12.18% |XXXXXXXXXXXXXXX a8 0f test $0xf,%al
        41526c 749 14.39% |XXXXXXXXXXXXXXXXXX 74 45 je 4152b3 ::Run(char const*, char const*)+0x4c3
        41526e 0.00% | 41 89 c7 mov %eax,%r15d
        415271 0.00% | 41 83 e7 01 and $0x1,%r15d
        415275 0.00% | 41 83 ff 01 cmp $0x1,%r15d
        415279 0.00% | 41 89 c7 mov %eax,%r15d


  • Can anyone recommend a decent tool for optimizing images other than Photoshop

    - by toomanyairmiles
    Can anyone recommend a decent tool for optimising images other than adobe photoshop, the gimp etc? I'm looking to optimise images for the web preferably online and free. Basically I have a client who can't install additional software on their work PC but needs to optimise photographs and other images for their website and is presently uploading 1 or 2 Mb files. On a personal level I'm interested to see what other people are using...


  • Why better isolation level means better performance in SQL Server

    - by Oleg Zhylin
    When measuring the performance of my query I found a dependency between isolation level and elapsed time that surprised me:

        READUNCOMMITTED - 409024
        READCOMMITTED - 368021
        REPEATABLEREAD - 358019
        SERIALIZABLE - 348019

    The left column is the table hint, and the right column is the elapsed time in microseconds (sys.dm_exec_query_stats.total_elapsed_time). Why does a better isolation level give better performance? This is a development machine and no concurrency whatsoever happens. I would expect READUNCOMMITTED to be the fastest due to less locking overhead.

    Update: I did measure this with DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE issued, and Profiler confirms there are no cache hits happening.

    Update2: The query in question is an OLAP one and we need to run it as fast as possible. Closing the production server off from the outside world to get the computation done is not out of the question if this gives performance benefits.


  • SEO for Ultraseek 5.7

    - by Adam N
    We've got Ultraseek 5.7 indexing the content on our corporate intranet site, and we'd like to make sure our web pages are being optimized for it. Which SEO techniques are useful for Ultraseek, and where can I find documentation about these features? Features I've considered implementing:
      - Make the title and first H1 contain the most valuable information about the page
      - Implement a sitemap.xml file
      - Ping the Ultraseek xpa interface when new content is added
      - Use "SEO-Friendly" URL strings
      - Add Meta keywords to the HTML pages


  • PHP Increasing write to page speed.

    - by Frederico
    I'm currently writing out XML and have done the following:

        header ("content-type: text/xml");
        header ("content-length: ".strlen($xml));

    where $xml is the XML to be written out. The output is about 1.8 megs of text (which I found via Firebug), and it seems that writing it out takes more time than the script takes to run. Is there a way to increase this write speed? Thank you in advance.


  • What is the best algorithm for this array-comparison problem?

    - by mark
    What is the fastest algorithm to solve the following problem? Given 6 arrays, D1, D2, D3, D4, D5 and D6, each containing 6 numbers like:

        D1[0] = number   D2[0] = number   ...... D6[0] = number
        D1[1] = another number   D2[1] = another number   ....
        ..... .... ...... ....
        D1[5] = yet another number   .... ...... ....

    Given a second array ST1, containing 1 number:

        ST1[0] = 6

    Given a third array ans, containing 6 numbers:

        ans[0] = 3, ans[1] = 4, ans[2] = 5, ......ans[5] = 8

    Using as index for the arrays D1, D2, D3, D4, D5 and D6 the number that goes from 0 to the number stored in ST1[0] minus one (in this example 6, so from 0 to 6-1), compare the ans array against each D array.

    My algorithm so far is below. I tried to keep everything unlooped as much as possible.

        EML := ST1[0]   //number contained in ST1[0]
        EML1 := 0       //start index for the arrays D
        While EML1 < EML
            if D1[EML1] = ans[0] goto two
            if D2[EML1] = ans[0] goto two
            if D3[EML1] = ans[0] goto two
            if D4[EML1] = ans[0] goto two
            if D5[EML1] = ans[0] goto two
            if D6[EML1] = ans[0] goto two
            EML1 = EML1 + 1
        return 0   //If the ans[0] number is not found in either D1[0-6], D2[0-6].... D6[0-6] return 0 which will then exclude ans[0-6] numbers

        two:
        EML1 := 0       //start index for arrays Ds
        While EML1 < EML
            if D1[EML1] = ans[1] goto three
            if D2[EML1] = ans[1] goto three
            if D3[EML1] = ans[1] goto three
            if D4[EML1] = ans[1] goto three
            if D5[EML1] = ans[1] goto three
            if D6[EML1] = ans[1] goto three
            EML1 = EML1 + 1
        return 0   //If the ans[1] number is not found in either D1[0-6], D2[0-6].... D6[0-6] return 0 which will then exclude ans[0-6] numbers

        three:
        EML1 := 0       //start index for arrays Ds
        While EML1 < EML
            if D1[EML1] = ans[2] goto four
            if D2[EML1] = ans[2] goto four
            if D3[EML1] = ans[2] goto four
            if D4[EML1] = ans[2] goto four
            if D5[EML1] = ans[2] goto four
            if D6[EML1] = ans[2] goto four
            EML1 = EML1 + 1
        return 0   //If the ans[2] number is not found in either D1[0-6], D2[0-6].... D6[0-6] return 0 which will then exclude ans[0-6] numbers

        four:
        EML1 := 0       //start index for arrays Ds
        While EML1 < EML
            if D1[EML1] = ans[3] goto five
            if D2[EML1] = ans[3] goto five
            if D3[EML1] = ans[3] goto five
            if D4[EML1] = ans[3] goto five
            if D5[EML1] = ans[3] goto five
            if D6[EML1] = ans[3] goto five
            EML1 = EML1 + 1
        return 0   //If the ans[3] number is not found in either D1[0-6], D2[0-6].... D6[0-6] return 0 which will then exclude ans[0-6] numbers

        five:
        EML1 := 0       //start index for arrays Ds
        While EML1 < EML
            if D1[EML1] = ans[4] goto six
            if D2[EML1] = ans[4] goto six
            if D3[EML1] = ans[4] goto six
            if D4[EML1] = ans[4] goto six
            if D5[EML1] = ans[4] goto six
            if D6[EML1] = ans[4] goto six
            EML1 = EML1 + 1
        return 0   //If the ans[4] number is not found in either D1[0-6], D2[0-6].... D6[0-6] return 0 which will then exclude ans[0-6] numbers

        six:
        EML1 := 0       //start index for arrays Ds
        While EML1 < EML
            //If the ans[1] number is not found in either D1[0-6]..... which will then include ans[0-6] numbers return 1
            if D1[EML1] = ans[5] return 1
            if D2[EML1] = ans[5] return 1
            if D3[EML1] = ans[5] return 1
            if D4[EML1] = ans[5] return 1
            if D5[EML1] = ans[5] return 1
            if D6[EML1] = ans[5] return 1
            EML1 = EML1 + 1
        return 0

    As language of choice, it would be pure C.
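
    Read as a whole, the six unrolled blocks above ask one question: does every ans value occur somewhere in the first ST1[0] entries of any D array? A sketch of that reading follows, assuming this interpretation is right. It is written in Python for brevity even though the poster wants C, and the function name and sample data are invented. The set stands in for whatever lookup structure a C version would use (for small value ranges, a flag array indexed by value might play the same role).

```python
def all_answers_present(d_arrays, limit, ans):
    """Return 1 if every value in ans occurs in the first `limit` entries
    of at least one array in d_arrays, otherwise 0."""
    seen = set()
    for d in d_arrays:
        seen.update(d[:limit])  # collect the candidates once
    # each ans lookup is now an O(1) set membership test
    return 1 if all(value in seen for value in ans) else 0


# Invented sample data shaped like the question: six arrays of six numbers.
d_arrays = [
    [3, 10, 11, 12, 13, 14],
    [4, 15, 16, 17, 18, 19],
    [5, 20, 21, 22, 23, 24],
    [6, 25, 26, 27, 28, 29],
    [7, 30, 31, 32, 33, 34],
    [8, 35, 36, 37, 38, 39],
]
found = all_answers_present(d_arrays, 6, [3, 4, 5, 6, 7, 8])
missing = all_answers_present(d_arrays, 6, [3, 4, 5, 6, 7, 99])
```

    The unrolled version scans up to 36 entries for each of the 6 answers. This version scans the 36 entries once and then does 6 lookups. With inputs this small the difference may not be measurable.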


  • Find the closest vector

    - by Alexey Lebedev
    Hello! Recently I wrote an algorithm to quantize an RGB image. Every pixel is represented by an (R,G,B) vector, and the quantization codebook is a couple of 3-dimensional vectors. Every pixel of the image needs to be mapped to (say, "replaced by") the codebook pixel closest in terms of Euclidean distance (more exactly, squared Euclidean). I did it as follows:

        class EuclideanMetric(DistanceMetric):
            def __call__(self, x, y):
                d = x - y
                return sqrt(sum(d * d, -1))

        class Quantizer(object):
            def __init__(self, codebook, distanceMetric = EuclideanMetric()):
                self._codebook = codebook
                self._distMetric = distanceMetric

            def quantize(self, imageArray):
                quantizedRaster = zeros(imageArray.shape)
                X = quantizedRaster.shape[0]
                Y = quantizedRaster.shape[1]
                for i in xrange(0, X):
                    print i
                    for j in xrange(0, Y):
                        dist = self._distMetric(imageArray[i,j], self._codebook)
                        code = argmin(dist)
                        quantizedRaster[i,j] = self._codebook[code]
                return quantizedRaster

    ...and it performs awfully: almost 800 seconds on my Pentium Core Duo 2.2 GHz with 4 Gigs of memory, for an image of 2600*2700 pixels :( Is there a way to optimize this somewhat? Maybe another algorithm, or some Python-specific optimizations.
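
    Two inexpensive ideas can be sketched for the quantizer above, under stated assumptions. First, the square root is not needed to find the nearest entry, because the squared distance has the same minimum. Second, if the image contains far fewer distinct colours than pixels (an assumption about the input), the nearest codebook entry can be cached per colour. The sketch uses plain tuples and no NumPy so that it is self-contained. Both function names are invented, and in a NumPy setting a vectorized distance computation would be the more usual route.

```python
def nearest_code(pixel, codebook):
    """Index of the codebook entry with the smallest squared Euclidean
    distance to pixel (no sqrt needed for an argmin)."""
    best_index, best_dist = 0, None
    for index, code in enumerate(codebook):
        dist = sum((p - c) ** 2 for p, c in zip(pixel, code))
        if best_dist is None or dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize(image, codebook):
    """Replace each pixel (an (R, G, B) tuple) by its nearest codebook
    entry, searching the codebook only once per distinct colour."""
    cache = {}
    result = []
    for row in image:
        new_row = []
        for pixel in row:
            if pixel not in cache:
                cache[pixel] = codebook[nearest_code(pixel, codebook)]
            new_row.append(cache[pixel])
        result.append(new_row)
    return result


codebook = [(0, 0, 0), (255, 255, 255)]
image = [
    [(10, 10, 10), (250, 240, 245)],
    [(10, 10, 10), (128, 128, 128)],
]
quantized = quantize(image, codebook)
```

    The cache only pays off when colours repeat. For photographic images with millions of distinct colours it would mostly add memory overhead.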


  • What's faster in JavaScript: a bunch of small setInterval loops, or one big one?

    - by RobertWHurst
    Just wondering if it's worth it to make a monolithic loop function or just add loops where they're needed. The big loop option would just be a loop of callbacks that are added dynamically with an add function. Adding a function would look like this:

        setLoop(function(){
            alert('hahaha! I\'m a really annoying loop that bugs you every tenth of a second');
        });

    setLoop would add the function to the monolithic loop. So is this worth anything in performance, or should I just stick to lots of little loops using setInterval?


  • Benefits of 'Optimize code' option in Visual Studio build

    - by gt
    Much of our C# release code is built with the 'Optimize code' option turned off. I believe this is to allow code built in Release mode to be debugged more easily. Given that we are creating fairly simple desktop software which connects to backend Web Services, (ie. not a particularly processor-intensive application) then what if any sort of performance hit might be expected? And is any particular platform likely to be worse affected? Eg. multi-processor / 64 bit.


  • speed up wamp server + drupal on windows vista

    - by Andrew Welch
    Hi, my localhost performance with Drupal 6 is pretty slow. I found a suggested fix, which is to add a # before the :: localhost line of the system32/etc/hosts file, but this was something I had already done and it didn't help much. Does anyone know of any other optimisations that might work? Thanks, Andy


  • Searching with Linq

    - by Phil
    I have a collection of objects, each with an int Frame property. Given an int, I want to find the object in the collection that has the closest Frame. Here is what I'm doing so far:

        public static void Search(int frameNumber)
        {
            var differences = (from rec in _records
                               select new { FrameDiff = Math.Abs(rec.Frame - frameNumber), Record = rec })
                              .OrderBy(x => x.FrameDiff);
            var closestRecord = differences.FirstOrDefault().Record;
            //continue work...
        }

    This is great and everything, except there are 200,000 items in my collection and I call this method very frequently. Is there a relatively easy, more efficient way to do this?
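
    The query above sorts all 200,000 differences on every call. If the collection can be kept sorted by Frame (an assumption, since the question does not say whether it changes between calls), a binary search finds the closest entry in O(log n). The sketch below shows the idea in Python on a plain sorted list of frame numbers. The function name is invented. In C#, List<T>.BinarySearch could likely fill the same role.

```python
import bisect


def closest_frame_index(sorted_frames, frame_number):
    """Return the index of the entry in sorted_frames (ascending order)
    that is closest to frame_number; on a tie the earlier entry wins."""
    if not sorted_frames:
        raise ValueError("sorted_frames must not be empty")
    pos = bisect.bisect_left(sorted_frames, frame_number)
    if pos == 0:
        return 0  # target is at or below the smallest frame
    if pos == len(sorted_frames):
        return len(sorted_frames) - 1  # target is above the largest frame
    before, after = sorted_frames[pos - 1], sorted_frames[pos]
    # only the two neighbours of the insertion point can be closest
    return pos if after - frame_number < frame_number - before else pos - 1


frames = [10, 20, 30, 45]
index_for_24 = closest_frame_index(frames, 24)
```

    The sort happens once up front. After that, each lookup inspects at most two candidates around the insertion point instead of ordering the whole collection.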


  • Grand Central Strategy for Opening Multiple Files

    - by user276632
    I have a working implementation using Grand Central Dispatch queues that (1) opens a file and computes an OpenSSL DSA hash on "queue1", and (2) writes out the hash to a new "side car" file for later verification on "queue2". I would like to open multiple files at the same time, but based on some logic that doesn't "choke" the OS by having 100s of files open and exceeding the hard drive's sustainable output. Photo browsing applications such as iPhoto or Aperture seem to open multiple files and display them, so I'm assuming this can be done. I'm assuming the biggest limitation will be disk I/O, as the application can (in theory) read and write multiple files simultaneously. Any suggestions? TIA
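
    The general pattern being asked about is bounded concurrency: allow only a fixed number of files to be open and in flight at once. The sketch below shows that pattern in Python rather than Grand Central Dispatch, so it is only an analogy. The worker pool size plays the role that a counting semaphore might play in a GCD design. SHA-1 via hashlib is a stand-in for the poster's OpenSSL DSA hash, and the function names and the default limit of 4 are invented.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, chunk_size=1 << 20):
    """Hash one file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha1()  # stand-in for the question's DSA hash
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files(paths, max_open=4):
    """Hash many files while keeping at most max_open of them open at once.

    The pool size is the throttle: each worker opens one file at a time.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_open) as pool:
        return dict(zip(paths, pool.map(hash_file, paths)))
```

    A sensible value for the limit depends on the storage device. On a single spinning disk, a small number is likely to work better than a large one, but that would need measuring.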


  • Optimizing processing and management of large Java data arrays

    - by mikera
    I'm writing some pretty CPU-intensive, concurrent numerical code that will process large amounts of data stored in Java arrays (e.g. lots of double[100000]s). Some of the algorithms might run millions of times over several days, so getting maximum steady-state performance is a high priority. In essence, each algorithm is a Java object that has a method API something like:

        public double[] runMyAlgorithm(double[] inputData);

    or alternatively a reference could be passed to the array to store the output data:

        public void runMyAlgorithm(double[] inputData, double[] outputData);

    Given this requirement, I'm trying to determine the optimal strategy for allocating / managing array space. Frequently the algorithms will need large amounts of temporary storage space. They will also take large arrays as input and create large arrays as output. Among the options I am considering are:
      - Always allocate new arrays as local variables whenever they are needed (e.g. new double[100000]). Probably the simplest approach, but will produce a lot of garbage.
      - Pre-allocate temporary arrays and store them as final fields in the algorithm object. The big downside is that only one thread could run the algorithm at any one time.
      - Keep pre-allocated temporary arrays in ThreadLocal storage, so that a thread can use a fixed amount of temporary array space whenever it needs it. ThreadLocal would be required since multiple threads will be running the same algorithm simultaneously.
      - Pass around lots of arrays as parameters (including the temporary arrays for the algorithm to use). Not good, since it will make the algorithm API extremely ugly if the caller has to be responsible for providing temporary array space....
      - Allocate extremely large arrays (e.g. double[10000000]) but also provide the algorithm with offsets into the array so that different threads will use a different area of the array independently. This will obviously require some code to manage the offsets and the allocation of the array ranges.

    Any thoughts on which approach would be best (and why)?


  • Algorithm for optimally choosing actions to perform a task

    - by Jules
    There are two data types: tasks and actions. An action takes a certain time to complete and has a set of tasks that it depends on. A task has a set of actions, and our job is to choose one of them. So:

        class Task { Set<Action> choices; }
        class Action { float time; Set<Task> dependencies; }

    For example the primary task could be "Get a house". The possible actions for this task: "Buy a house" or "Build a house". The action "Build a house" costs 10 hours and has the dependencies "Get bricks" and "Get cement", etcetera. The total time is the sum of the times of all the actions that have to be performed. We want to choose actions such that the total time is minimal.

    Note that the dependencies can be diamond shaped. For example "Get bricks" could require "Get a car" (to transport the bricks) and "Get cement" would also require a car. Even if you do "Get bricks" and "Get cement" you only have to count the time it takes to get a car once.

    Note also that the dependencies can be circular. For example "Money" - "Job" - "Car" - "Money". This is no problem for us, we simply select all of "Money", "Job" and "Car". The total time is simply the sum of the time of these 3 things.

    Mathematical description: Let actions be the chosen actions.

        valid(task) = ∃action ∈ task.choices. (action ∈ actions ∧ ∀dep ∈ action.dependencies. valid(dep))
        time = sum {action.time | action ∈ actions}
        minimize time subject to valid(primaryTask)
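
    To make the definition above concrete, here is a brute-force sketch that tries every way of picking one action per task and keeps the cheapest valid plan. It is exponential in the number of tasks, so it is a reference for small instances only and not a proposed answer to the question. The dictionary layout, the function name and every time value except the 10 hours for building are invented for illustration. Action times are assumed to be non-negative, and every dependency is assumed to name a known task.

```python
from itertools import product


def cheapest_plan(tasks, actions, primary):
    """tasks: task name -> list of action names (the choices).
    actions: action name -> (time, list of dependency task names).
    Returns (total_time, set_of_chosen_actions) for the cheapest plan,
    or (inf, None) if the primary task cannot be made valid."""
    names = list(tasks)
    best_time, best_actions = float("inf"), None
    # A task without choices gets None, which marks any plan reaching it invalid.
    for combo in product(*[tasks[name] or [None] for name in names]):
        choice = dict(zip(names, combo))
        chosen, seen, stack, ok = set(), set(), [primary], True
        while stack:
            task = stack.pop()
            if task in seen:  # diamonds and cycles: visit each task once
                continue
            seen.add(task)
            action = choice[task]
            if action is None:
                ok = False
                break
            chosen.add(action)  # a set, so shared actions are counted once
            stack.extend(actions[action][1])
        if not ok:
            continue
        total = sum(actions[name][0] for name in chosen)
        if total < best_time:
            best_time, best_actions = total, chosen
    return best_time, best_actions


tasks = {
    "house": ["buy", "build"],
    "bricks": ["fetch_bricks"],
    "cement": ["fetch_cement"],
    "car": ["rent_car"],
}
actions = {
    "buy": (100.0, []),
    "build": (10.0, ["bricks", "cement"]),
    "fetch_bricks": (2.0, ["car"]),
    "fetch_cement": (3.0, ["car"]),
    "rent_car": (1.0, []),
}
plan_time, plan_actions = cheapest_plan(tasks, actions, "house")
```

    In this invented instance the car is needed by both bricks and cement but is paid for once, which is the diamond case the question describes.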


  • Does the .NET CLR Really Optimize for the Current Processor

    - by dewald
    When I read about the performance of JITted languages like C# or Java, authors usually say that they should/could theoretically outperform many native-compiled applications. The theory being that native applications are usually just compiled for a processor family (like x86), so the compiler cannot make certain optimizations as they may not truly be optimizations on all processors. On the other hand, the CLR can make processor-specific optimizations during the JIT process. Does anyone know if Microsoft's (or Mono's) CLR actually performs processor-specific optimizations during the JIT process? If so, what kind of optimizations?


  • JRuby RSpec to be run in parallel

    - by Priyank
    Hi. Is there something like Spork for JRuby too? We want to parallelize our specs so they run faster, and pre-load the classes while running the rake task; however, we have not been able to do so. Since our project is considerable in size, the specs take about 15 minutes to complete, and this poses a serious challenge to quick turnaround. Any ideas are more than welcome. Cheers


  • mysql subselect alternative

    - by Arnold
    Hi, let's say I am analyzing how high school sports records affect school attendance. So I have a table in which each row corresponds to a high school basketball game. Each game has an away team id and a home team id (FK to another "team table"), a home score, an away score and a date. I am writing a query that matches attendance with this season's basketball games. My sample output will be (#_students_missed_class, day_of_game, home_team, away_team, home_team_wins_this_season, away_team_wins_this_season).

    I now want to add how each team did the previous season to my analysis. I have their previous season stored in the game table, so I should be able to accomplish that with a subselect. So in my main select statement I add the subselect:

        SELECT COUNT(*)
        FROM game_table
        WHERE game_table.date BETWEEN 'start of previous season' AND 'end of previous season'
          AND ( (game_table.home_team = team_table.id AND game_table.home_score > game_table.away_score)
             OR (game_table.away_team = team_table.id AND game_table.away_score > game_table.home_score))

    In this case team_table.id refers to the id of the home_team, so I now have all their wins calculated from the previous year. This method of calculation is neither time nor resource intensive. The Explain SQL shows that I have ALL in the Type field and I am not using a Key, and the query times out. I'm not sure how I can accomplish a more efficient query with a subselect. It seems preposterously inefficient to have to write 4 of these queries (for home wins, home losses, away wins, away losses). I am sure this could be more lucid. I'll absolutely add color tomorrow if anyone has questions.


  • Good Starting Points for Optimizing Database Calls in Ruby on Rails?

    - by viatropos
    I have a menu in Rails which grabs a nested tree of Post models, each of which has a Slug model associated via a polymorphic association (using the friendly_id gem for slugs and awesome_nested_set for the tree). The database output in development looks like this (here's the full gist):

        SQL (0.4ms) SELECT COUNT(*) AS count_id FROM "posts" WHERE ("posts".parent_id = 39)
        CACHE (0.0ms) SELECT "posts".* FROM "posts" WHERE ("posts"."id" = 13) LIMIT 1
        CACHE (0.0ms) SELECT "slugs".* FROM "slugs" WHERE ("slugs".sluggable_id = 13 AND "slugs".sluggable_type = 'Post') ORDER BY id DESC LIMIT 1
        Slug Load (0.4ms) SELECT "slugs".* FROM "slugs" WHERE ("slugs".sluggable_id = 40 AND "slugs".sluggable_type = 'Post') ORDER BY id DESC LIMIT 1
        SQL (0.3ms) SELECT COUNT(*) AS count_id FROM "posts" WHERE ("posts".parent_id = 40)
        CACHE (0.0ms) SELECT "posts".* FROM "posts" WHERE ("posts"."id" = 13) LIMIT 1
        CACHE (0.0ms) SELECT "slugs".* FROM "slugs" WHERE ("slugs".sluggable_id = 13 AND "slugs".sluggable_type = 'Post') ORDER BY id DESC LIMIT 1
        Slug Load (0.4ms) SELECT "slugs".* FROM "slugs" WHERE ("slugs".sluggable_id = 41 AND "slugs".sluggable_type = 'Post') ORDER BY id DESC LIMIT 1
        ...
        Rendered shared/_menu.html.haml (907.6ms)

    What are some quick things I should always do to optimize this from the start (easy things)? Some things I'm thinking now are:
      - Can Rails 3 eager load the whole Post tree + associated Slugs in one DB call?
      - Can I do that easily with named scopes or custom SQL?
      - What is best practice in this situation?

    Not really thinking about memcached in this situation as that can be applied to much more than just this.


  • Fastest way to put contents of Set<String> to a single String with words separated by a whitespace?

    - by Lars Andren
    I have a few Set<String>s and want to transform each of these into a single String where each element of the original Set is separated by a whitespace " ". A naive first approach is doing it like this:

        Set<String> set_1;
        Set<String> set_2;

        StringBuilder builder = new StringBuilder();
        for (String str : set_1) {
            builder.append(str).append(" ");
        }
        this.string_1 = builder.toString();

        builder = new StringBuilder();
        for (String str : set_2) {
            builder.append(str).append(" ");
        }
        this.string_2 = builder.toString();

    Can anyone think of a faster, prettier or more efficient way to do this?


  • Memory efficient int-int dict in Python

    - by Bolo
    Hi, I need a memory efficient int-int dict in Python that would support the following operations in O(log n) time:

        d[k] = v  # replace if present
        v = d[k]  # None or a negative number if not present

    I need to hold ~250M pairs, so it really has to be tight. Do you happen to know a suitable implementation (Python 2.7)?

    EDIT Removed impossible requirement and other nonsense. Thanks, Craig and Kylotan!

    To rephrase. Here's a trivial int-int dictionary with 1M pairs:

        >>> import random, sys
        >>> from guppy import hpy
        >>> h = hpy()
        >>> h.setrelheap()
        >>> d = {}
        >>> for _ in xrange(1000000):
        ...     d[random.randint(0, sys.maxint)] = random.randint(0, sys.maxint)
        ...
        >>> h.heap()
        Partition of a set of 1999530 objects. Total size = 49161112 bytes.
         Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
             0      1   0 25165960  51  25165960  51 dict (no owner)
             1 1999521 100 23994252  49  49160212 100 int

    On average, a pair of integers uses 49 bytes. Here's an array of 2M integers:

        >>> import array, random, sys
        >>> from guppy import hpy
        >>> h = hpy()
        >>> h.setrelheap()
        >>> a = array.array('i')
        >>> for _ in xrange(2000000):
        ...     a.append(random.randint(0, sys.maxint))
        ...
        >>> h.heap()
        Partition of a set of 14 objects. Total size = 8001108 bytes.
         Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
             0      1   7  8000028 100   8000028 100 array.array

    On average, a pair of integers uses 8 bytes. I accept that 8 bytes/pair in a dictionary is rather hard to achieve in general. Rephrased question: is there a memory-efficient implementation of int-int dictionary that uses considerably less than 49 bytes/pair?
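
    One structure that sits between the two measurements above is a pair of parallel typed arrays kept sorted by key and searched with bisect. The sketch below assumes 64-bit keys and values (typecode 'q', so roughly 16 bytes per pair before array over-allocation) and is written for Python 3, not the 2.7 the question targets. The class name is invented. Lookups and replacements of existing keys are O(log n). Inserting a new key is O(n) because the arrays must shift, so this only suits data that is loaded mostly in key order or rarely gains new keys.

```python
from array import array
from bisect import bisect_left


class SortedIntMap:
    """int -> int mapping stored in two parallel arrays sorted by key."""

    def __init__(self):
        self._keys = array("q")    # signed 64-bit keys, ascending
        self._values = array("q")  # value at the same index as its key

    def __setitem__(self, key, value):
        pos = bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            self._values[pos] = value          # replace: O(log n)
        else:
            self._keys.insert(pos, key)        # new key: O(n) shift
            self._values.insert(pos, value)

    def get(self, key, default=None):
        pos = bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._values[pos]
        return default

    def __len__(self):
        return len(self._keys)


m = SortedIntMap()
m[5] = 50
m[1] = 10
m[5] = 55  # replaces the earlier value for key 5
```

    Whether 16 bytes per pair is tight enough for ~250M pairs (about 4 GB) depends on the machine. Narrower typecodes would shrink it further if the value range allows.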


  • Is it possible to do A/B testing by page rather than by individual?

    - by mojones
    Let's say I have a simple ecommerce site that sells 100 different t-shirt designs. I want to do some A/B testing to optimise my sales. Let's say I want to test two different "buy" buttons. Normally, I would use A/B testing to randomly assign each visitor to see button A or button B (and try to ensure that the user experience is consistent by storing that assignment in session, cookies etc). Would it be possible to take a different approach and instead randomly assign each of my 100 designs to use button A or B, and measure the conversion rate as (number of sales of design n) / (pageviews of design n)?

    This approach would seem to have some advantages. I would not have to worry about keeping the user experience consistent: a given page (e.g. www.example.com/viewdesign?id=6) would always return the same html. If I were to test different prices, it would be far less distressing to the user to see different prices for different designs than different prices for the same design on different computers. I also wonder whether it might be better for SEO; my suspicion is that Google would "prefer" that it always sees the same html when crawling a page.

    Obviously this approach would only be suitable for a limited number of sites; I was just wondering if anyone has tried it?
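
    A per-page assignment like the one described needs to be stable, so that a given design always renders the same button. One way to get that without storing anything is to derive the variant from a hash of the design id. The sketch below illustrates this. The function names, the salt string and the sample statistics are all invented. It pools sales and pageviews per variant, which is one possible reading of the conversion-rate formula in the question.

```python
import hashlib


def variant_for_design(design_id, variants=("A", "B"), salt="buy-button-test"):
    """Deterministically map a design id to a variant; the same id and
    salt always give the same answer, so the page html never changes."""
    digest = hashlib.sha256(f"{salt}:{design_id}".encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


def conversion_rates(stats, assign=variant_for_design):
    """stats: design_id -> (sales, pageviews).
    Returns variant -> pooled sales / pooled pageviews."""
    totals = {}
    for design_id, (sales, views) in stats.items():
        variant = assign(design_id)
        pooled_sales, pooled_views = totals.get(variant, (0, 0))
        totals[variant] = (pooled_sales + sales, pooled_views + views)
    return {
        variant: (sales / views if views else 0.0)
        for variant, (sales, views) in totals.items()
    }
```

    A hash split of only 100 designs will not be exactly 50/50, and designs differ in popularity, so the two groups may not be comparable. With so few units, differences between variants could easily reflect which designs landed in which group.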

