Search Results

Search found 13608 results on 545 pages for 'performance dashboard'.

Page 488/545 | < Previous Page | 484 485 486 487 488 489 490 491 492 493 494 495  | Next Page >

  • What hash algorithms are paralellizable? Optimizing the hashing of large files utilizing on mult-co

    - by DanO
    I'm interested in optimizing the hashing of some large files (optimizing wall clock time). The I/O has been optimized well enough already and the I/O device (local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed-out. I have more cores available, and in the future will likely have even more cores. So far I've only been able to tap into more cores if I happen to need multiple hashes of the same file, say an MD5 AND a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done for free (as far as wall clock time). As I understand most hash algorithms, each new bit changes the entire result, and it is inherently challenging/impossible to do in parallel. Are any of the mainstream hash algorithms parallelizable? Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)? As future CPUs will trend toward more cores and a leveling off in clock speed, is there any way to improve the performance of file hashing? (other than liquid nitrogen cooled overclocking?) or is it inherently non-parallelizable?

    Read the article

  • Flex profiling - what is [enterFrameEvent] doing?

    - by Herms
    I've been tasked with finding (and potentially fixing) some serious performance problems with a Flex application that was delivered to us. The application will consistently take up 50 to 100% of the CPU at times when it is simply idling and shouldn't be doing anything. My first step was to run the profiler that comes with FlexBuilder. I expected to find some method that was taking up most of the time, showing me where the bottleneck was. However, I got something unexpected. The top 4 methods were: [enterFrameEvent] - 84% cumulative, 32% self time [reap] - 20% cumulative and self time [tincan] - 8% cumulative and self time global.isNaN - 4% cumulative and self time All other methods had less than 1% for both cumulative and self time. From what I've found online, the [bracketed methods] are what the profiler lists when it doesn't have an actual Flex method to show. I saw someone claim that [tincan] is the processing of RTMP requests, and I assume [reap] is the garbage collector. Does anyone know what [enterFrameEvent] is actually doing? I assume it's essentially the "main" function for the event loop, so the high cumulative time is expected. But why is the self time so high? What's actually going on? I didn't expect the player internals to be taking up so much time, especially since nothing is actually happening in the app (and there are no UI updates going on). Is there any good way to find dig into what's happening? I know something is going on that shouldn't be (it looks like there must be some kind of busy wait or other runaway loop), but the profiler isn't giving me any results that I was expecting. My next step is going to be to start adding debug trace statements in various places to try and track down what's actually happening, but I feel like there has to be a better way.

    Read the article

  • OptimisticLockException in inner transaction ruins outer transaction

    - by Pace
    I have the following code (OLE = OptimisticLockException)... public void outer() { try { middle() } catch (OLE) { updateEntities(); outer(); } } @Transactional public void middle() { try { inner() } catch (OLE) { updateEntities(); middle(); } @Transactional public void inner() { //Do DB operation } inner() is called by other non-transactional methods which is why both middle() and inner() are transactional. As you can see, I deal with OLEs by updating the entities and retrying the operation. The problem I'm having is that when I designed things this way I was assuming that the only time one could get an OLE was when a transaction closed. This is apparently not the case as the call to inner() is throwing an OLE even when the stack is outer()->middle()->inner(). Now, middle() is properly handling the OLE and the retry succeeds but when it comes time to close the transaction it has been marked rollbackOnly by Spring. When the middle() method call finally returns the closing aspect throws an exception because it can't commit a transaction marked rollbackOnly. I'm uncertain what to do here. I can't clear the rollbackOnly state. I don't want to force create a transaction on every call to inner because that kills my performance. Am I missing something or can anyone see a way I can structure this differently? EDIT: To clarify what I'm asking, let me explain my main question. Is it possible to catch and handle OLE if you are inside of an @Transactional method? FYI: The transaction manager is a JpaTransactionManager and the JPA provider is Hibernate.

    Read the article

  • Dangers when deploying Flash/Flex UI test automation hooks to production?

    - by Merlyn Morgan-Graham
    I am interested in doing automated testing against a Flex based UI. I have found out that my best options for UI automation (due to being C# controllable, good licensing conditions, etc) all seem to require that I compile test hooks into my application. Because of this, I am thinking of recommending that these hooks be compiled into our build. I have found a few places on the net that recommend not deploying bits with this instrumentation enabled, and I'd like to know why. Is it a performance drain, or a security risk? If it is a security risk, can you explain how the attack surface is increased? I am not a Flash or Flex developer, though I have some experience with threat modeling. For reference, here's the tools I'm specifically considering: QTP Selenium-Flex API I am having problems finding all the warnings/suggestions I found last night, but here's an example that I can find: http://www.riatest.com/products/getting-started.html Warning! Automation enabled applications expose all properties of all GUI components. This makes them vulnerable to malicious use. Never make automation enabled application publicly available. Always restrict access to such applications and to RIATest Loader to trusted users only. Related question (how to do conditional compilation to insert/remove those hooks): Conditionally including Flex libraries (SWCs) in mxmlc/compc ant tasks

    Read the article

  • Detaching all entities of T to get fresh data

    - by Goran
    Lets take an example where there are two type of entites loaded: Product and Category, Product.CategoryId - Category.Id. We have available CRUD operations on products (not Categories). If on another screen Categories are updated (or from another user in the network), we would like to be able to reload the Categories, while preserving the context we currently use, since we could be in the middle of editing data, and we do not want changes to be lost (and we cannot depend on saving, since we have incomplete data). Since there is no easy way to tell EF to get fresh data (added, removed and modified), we thought of twp possible ways: 1) Getting products attached to context, and categories detached from context. This would mean that we loose the ability to access Product.Category.Name, which we do sometimes require, so we would need to manually resolve it (example when printing data). 2) detaching / attaching all Categories from current context. Context.ChangeTracker.Entries().Where(x => x.Entity.GetType() == typeof(T)).ForEach(x => x.State = EntityState.Detached); And then reload the categories, which will get fresh data. Do you find any problem with this second approach? We understand that this will require all constraints to be put on foreign keys, and not navigation properties, since when detaching all Categories, Product.Category navigation properties would be reset to null also. Also, there could be a potential performance problem, which we did not test, since there could be couple of thousand products loaded, and all would need to resolve navigation property when reloading. Which of the two do you prefer, and is there a better way (EF6 + .NET 4.0)?

    Read the article

  • Thread-safety of read-only memory access

    - by Edmund
    I've implemented the Barnes-Hut gravity algorithm in C as follows: Build a tree of clustered stars. For each star, traverse the tree and apply the gravitational forces from each applicable node. Update the star velocities and positions. Stage 2 is the most expensive stage, and so is implemented in parallel by dividing the set of stars. E.g. with 1000 stars and 2 threads, I have one thread processing the first 500 stars and the second thread processing the second 500. In practice this works: it speeds the computation by about 30% with two threads on a two-core machine, compared to the non-threaded version. Additionally, it yields the same numerical results as the original non-threaded version. My concern is that the two threads are accessing the same resource (namely, the tree) simultaneously. I have not added any synchronisation to the thread workers, so it's likely they will attempt to read from the same location at some point. Although access to the tree is strictly read-only I am not 100% sure it's safe. It has worked when I've tested it but I know this is no guarantee of correctness! Questions Do I need to make a private copy of the tree for each thread? Even if it is safe, are there performance problems of accessing the same memory from multiple threads?

    Read the article

  • MySQL Database Design with Internationalization

    - by Some name
    Hello, I'm going to start work on a medium sized application, and i'm planning it's db design. One thing that I'm not sure about is this. I will have many tables which will need internationalization, such as: "membership_options, gender_options, language_options etc" Each of these tables will share common i18n fields, like: "title, alternative_title, short_description, description" In your opinion which is the best way to do it? Have an i18n table with the same fields for each of the tables that will need them? or do something like: Membership table Gender table ---------------- -------------- id | created_at id | created_at 1 - 22.03.2001 1 - 14.08.2002 2 - 22.03.2001 2 - 14.08.2002 General translation table ------------------------- record_id | table_name | string_name | alternative_title| .... |id_language 1 - membership regular null 1 (english) 1 - membership normale null 2 (italian) 1 - gender man null 1(english) 1 -gender uomo null 2(italian) This would avoid me repeating something like: membership_translation table ----------------------------- membership_id | name | alternative_title | id_lang 1 regular null 1 1 normale null 2 gender_translation table ----------------------------- gender_id | name | alternative_title | id_lang 1 man null 1 1 uomo null 2 and so on, so i would probably reduce the number of db tables, but i'm not sure about performance.I'm not much of a DB designer, so please let me know.

    Read the article

  • Why doesn't java.lang.Number implement Comparable?

    - by Julien Chastang
    Does anyone know why java.lang.Number does not implement Comparable? This means that you cannot sort Numbers with Collections.sort which seems to me a little strange. Post discussion update: Thanks for all the helpful responses. I ended up doing some more research about this topic. The simplest explanation for why java.lang.Number does not implement Comparable is rooted in mutability concerns. For a bit of review, java.lang.Number is the abstract super-type of AtomicInteger, AtomicLong, BigDecimal, BigInteger, Byte, Double, Float, Integer, Long and Short. On that list, AtomicInteger and AtomicLong to do not implement Comparable. Digging around, I discovered that it is not a good practice to implement Comparable on mutable types because the objects can change during or after comparison rendering the result of the comparison useless. Both AtomicLong and AtomicInteger are mutable. The API designers had the forethought to not have Number implement Comparable because it would have constrained implementation of future subtypes. Indeed, AtomicLong and AtomicInteger were added in Java 1.5 long after java.lang.Number was initially implemented. Apart from mutability, there are probably other considerations here too. A compareTo implementation in Number would have to promote all numeric values to BigDecimal because it is capable of accommodating all the Number sub-types. The implication of that promotion in terms of mathematics and performance is a bit unclear to me, but my intuition finds that solution kludgy.

    Read the article

  • Seeking suggestions on redesigning the interface

    - by ratkok
    As a part of maintaining large piece of legacy code, we need to change part of the design mainly to make it more testable (unit testing). One of the issues we need to resolve is the existing interface between components. The interface between two components is a class that contains static methods only. Simplified example: class ABInterface { static methodA(); static methodB(); ... static methodZ(); }; The interface is used by component A so that different methods can use ABInterface::methodA() in order to prepare some input data and then invoke appropriate functions within component B. Now we are trying to redesign this interface for various reasons: Extending our unit test coverage - we need to resolve this dependency between the components and stubs/mocks are to be introduced The interface between these components diverged from the original design (ie. a lots of newer functions, used for the inter-component i/f are created outside this interface class). The code is old, changed a lot over the time and needs to be refactored. The change should not be disruptive for the rest of the system. We try to limit leaving many test-required artifacts in the production code. Performance is very important and should be no (or very minimal) degradation after the redesign. Code is OO in C++. I am looking for some ideas what approach to take. Any suggestions on how to do this efficiently?

    Read the article

  • Google Adwords API response parse

    - by Yun Ling
    I am trying to figure out how to parse the Adword API query response without exceptions and one issue that i came across is that sometimes, the data itself contains comma besides the comma between each column. Say i do a query on Adroup, campaign and impression by using <reportDefinition xmlns="https://adwords.google.com/api/adwords/cm/v201209"> <selector> <fields>CampaignName</fields> <fields>AdgroupName</fields> <fields>Impressions</fields> <predicates> <field>Status</field> <operator>IN</operator> <values>ENABLED</values> <values>PAUSED</values> </predicates> </selector> <reportName>Custom Adgroup Performance Report</reportName> <reportType>ADGROUP_PERFORMANCE_REPORT</reportType> <dateRangeType>LAST_7_DAYS</dateRangeType> <downloadFormat>CSV</downloadFormat> </reportDefinition> Since my campaign has comma within the string like below: "Adroup,Campaign,Impressions, Premiun Beer, Beer, Chicago, 1000" where the adgroup is "premium beer" and campaign is "Beer,Chicago". that will cause an issue if we parse this information by using comma. Does anyone know how to solve this problem?

    Read the article

  • Groovy as a substitute for Java when using BigDecimal?

    - by geejay
    I have just completed an evaluation of Java, Groovy and Scala. The factors I considered were: readability, precision The factors I would like to know: performance, ease of integration I needed a BigDecimal level of precision. Here are my results: Java void someOp() { BigDecimal del_theta_1 = toDec(6); BigDecimal del_theta_2 = toDec(2); BigDecimal del_theta_m = toDec(0); del_theta_m = abs(del_theta_1.subtract(del_theta_2)) .divide(log(del_theta_1.divide(del_theta_2))); } Groovy void someOp() { def del_theta_1 = 6.0 def del_theta_2 = 2.0 def del_theta_m = 0.0 del_theta_m = Math.abs(del_theta_1 - del_theta_2) / Math.log(del_theta_1 / del_theta_2); } Scala def other(){ var del_theta_1 = toDec(6); var del_theta_2 = toDec(2); var del_theta_m = toDec(0); del_theta_m = ( abs(del_theta_1 - del_theta_2) / log(del_theta_1 / del_theta_2) ) } Note that in Java and Scala I used static imports. Java: Pros: it is Java Cons: no operator overloading (lots o methods), barely readable/codeable Groovy: Pros: default BigDecimal means no visible typing, least surprising BigDecimal support for all operations (division included) Cons: another language to learn Scala: Pros: has operator overloading for BigDecimal Cons: some surprising behaviour with division (fixed with Decimal128), another language to learn

    Read the article

  • Doctrine - get the offset of an object in a collection (implementing an infinite scroll)

    - by dan
    I am using Doctrine and trying to implement an infinite scroll on a collection of notes displayed on the user's browser. The application is very dynamic, therefore when the user submits a new note, the note is added to the top of the collection straightaway, besides being sent (and stored) to the server. Which is why I can't use a traditional pagination method, where you just send the page number to the server and the server will figure out the offset and the number of results from that. To give you an example of what I mean, imagine there are 20 notes displayed, then the user adds 2 more notes, therefore there are 22 notes displayed. If I simply requests "page 2", the first 2 items of that page will be the last two items of the page currently displayed to the user. Which is why I am after a more sophisticated method, which is the one I am about to explain. Please consider the following code, which is part of the server code serving an AJAX request for more notes: // $lastNoteDisplayedId is coming from the AJAX request $lastNoteDisplayed = $repository->findBy($lastNoteDisplayedId); $allNotes = $repository->findBy($filter, array('createdAt' => 'desc')); $offset = getLastNoteDisplayedOffset($allNotes, $lastNoteDisplayedId); // retrieve the page to send back so that it can be appended to the listing $notesPerPage = 30 $notes = $repository->findBy( array(), array('createdAt' => 'desc'), $notesPerPage, $offset ); $response = json_encode($notes); return $response; Basically I would need to write the method getLastNoteDisplayedOffset, that given the whole set of notes and one particoular note, it can give me its offset, so that I can use it for the pagination of the previous Doctrine statement. I know probably a possible implementation would be: getLastNoteDisplayedOffset($allNotes, $lastNoteDisplayedId) { $i = 0; foreach ($allNotes as $note) { if ($note->getId() === $lastNoteDisplayedId->getId()) { break; } $i++; } return $i; } I would prefer not to loop through all notes because performance is an important factor. I was wondering if Doctrine has got a method itself or if you can suggest a different approach.

    Read the article

  • customer.name joining transactions.name vs. customer.id [serial] joining transactions.id [integer]

    - by Frank Computer
    INFORMIX-SQL 7.32 Pawnshop Application: one-to-many relationship where each customer (master) can have many transactions (detail). customer( id serial, pk_name char(30), {PATERNAL-NAME MATERNAL-NAME, FIRST-NAME MIDDLE-NAME} [...] ); unique index on id; unique cluster index on name; transaction( fk_name char(30), ticket_number serial, [...] ); dups cluster index on fk_name; unique index on ticket_number; Several people have told me this is not the correct way to join master to detail. They said I should always join customer.id[serial] to transactions.id[integer]. When a customer pawns merchandise, clerk queries the master using wildcards on name. The query usually returns several customers, clerk scrolls until locating the right name, enters a 'D' to change to detail transactions table, all transactions are automatically queried, then clerk enters an 'A' to add a new transaction. The problem with using customer.id joining transaction.id is that although the customer table is maintained in sorted name order, clustering the transaction table by fk_id groups the transactions by fk_id, but they are not in the same order as the customer name, so when clerk is scrolling through customer names in the master, the system has to jump allover the place to locate the clustered transactions belonging to each customer. As each new customer is added, the next id is assigned to that customer, but new customers dont show up in alphabetical order. I experimented using id joins and confirmed the decrease in performance. How can I use id joins instead of name joins and still preserve the clustered transaction order by name if transactions has no name column?

    Read the article

  • What database strategy to choose for a large web application

    - by Snoopy
    I have to rewrite a large database application, running on 32 servers. The hardware is up to date, each machine has two quad core Xeon and 32 GByte RAM. The database is multi-tenant, each customer has his own file, around 5 to 10 GByte each. I run around 50 databases on this hardware. The app is open to the web, so I have no control on the load. There are no really complex queries, so SQL is not required if there is a better solution. The databases get updated via FTP every day at midnight. The database is read-only. C# is my favourite language and I want to use ASP.NET MVC. I thought about the following options: Use two big SQL servers running SQL Server 2012 to serve the 32 servers with data. On the 32 servers running IIS hosting providing REST services. Denormalize the database and use Redis on each webserver. Use booksleeve as a Redis client. Use a combination of SQL Server and Redis Use SQL Server 2012 together with Hadoop Use Hadoop without SQL Server What is the best way for a read-only database, to get the best performance without loosing maintainability? Does Map-Reduce make sense at all in such a scenario? The reason for the rewrite is, the old app written in C++ with ISAM technology is too slow, the interfaces are old fashioned and not nice to use from an website, especially when using ajax. The app uses a relational datamodel with many tables, but it is possible to write one accerlerator table where all queries can be performed on, and all other information from the other tables are possible by a simple key lookup.

    Read the article

  • Start with remoting or with WCF

    - by Sheldon
    Hi. I'm just starting with distributed application development. I need to create (all by myself) an enterprise application for document management. That application will run on an intranet (within the firewall, no internet access is required now, BUT is probably that will be later). The application needs to manage images that will be stored within MySQL Server (as blobs) and those images will be then recovered by the app and eventually one or more of them will be converted to PDF. Performance is the most important non-functional requirement. I have a couple of doubts. What do you suggest to use, .NET Remoting or WCF over TCP-IP (I think second one is the best for the moment I need to expose the business logic over internet, changing the protocol). Where do you suggest to make the transformation of the images to pdf files, I'm using iText. (I have thought to have the business logic stored within the IIS and exposed via WCF, and that business logic to be responsible of getting the images and transforming them to PDF, that because the IIS and the MySQL Server are the same physical machine). I ask about where to do the transformation because the app must be accessible from multiple devices, and for example, for mobile devices, the pdf maybe is not necessary. Thank you very much in advance.

    Read the article

  • Entity Framework autoincrement key

    - by Tommy Ong
    I'm facing an issue of duplicated incremental field on a concurrency scenario. I'm using EF as the ORM tool, attempting to insert an entity with a field that acts as a incremental INT field. Basically this field is called "SequenceNumber", where each new record before insert, will read the database using MAX to get the last SequenceNumber, append +1 to it, and saves the changes. Between the getting of last SequenceNumber and Saving, that's where the concurrency is happening. I'm not using ID for SequenceNumber as it is not a unique constraint, and may reset on certain conditions such as monthly, yearly, etc. InvoiceNumber | SequenceNumber | DateCreated INV00001_08_14 | 1 | 25/08/2014 INV00001_08_14 | 1 | 25/08/2014 <= (concurrency is creating two SeqNo 1) INV00002_08_14 | 2 | 25/08/2014 INV00003_08_14 | 3 | 26/08/2014 INV00004_08_14 | 4 | 27/08/2014 INV00005_08_14 | 5 | 29/08/2014 INV00001_09_14 | 1 | 01/09/2014 <= (sequence number reset) Invoice number is formatted based on the SequenceNumber. After some research I've ended up with these possible solutions, but wanna know the best practice 1) Optimistic Concurrency, locking the table from any reads until the current transaction is completed (not fancy of this idea as I guess performance will be of a great impact?) 2) Create a Stored Procedure solely for this purpose, does select and insert on a single statement as such concurrency is at minimum (would prefer a EF based approach if possible)

    Read the article

  • What noncluster index would be better to create on SQL Server?

    - by Junior Mayhé
    Here I am studying nonclustered indexes on SQL Server Management Studio. I've created a table with more than 1 million records. This table has a primary key. SELECT CustomerName FROM Customers Which leads the execution plan to show me: I/O cost = 3.45646 Operator cost = 4.57715 For the first attempt to improve performance, I've created a nonclustered index for this table: CREATE NONCLUSTERED INDEX [IX_CustomerID_CustomerName] ON [dbo].[Customers] ( [CustomerId] ASC, [CustomerName] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO With this first try, I've executed the select statement and the execution plan shows me: I/O cost = 2.79942 Operator cost = 3.92001 Now the second try, I've deleted this nonclustered index in order to create a new one. CREATE NONCLUSTERED INDEX [IX_CategoryName] ON [dbo].[Categories] ( [CategoryId] ASC ) INCLUDE ( [CategoryName]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO With this second try, I've executed the select statement and the execution plan shows me the same result: I/O cost = 2.79942 Operator cost = 3.92001 Am I doing something wrong or this is expected? Shall I use the first nonclustered index with two fields, or the second nonclustered with one field (CategoryID) including the second field (CategoryName)?

    Read the article

  • MS SQL - Multi-Column substring matching

    - by hamlin11
    One of my clients is hooked on multi-column substring matching. I understand that Contains and FreeText search for words (and at least in the case of Contains, word prefixes). However, based upon my understanding of this MSDN book, neither of these nor their variants are capable of searching substrings. I have used LIKE rather extensively (Select * from A where A.B Like '%substr%') Sample table A: ID | Col1 | Col2 | Col3 | ------------------------------------- 1 | oklahoma | colorado | Utah | 2 | arkansas | colorado | oklahoma | 3 | florida | michigan | florida | ------------------------------------- The following code will give us row 1 and row 2: select * from A where Col1 like '%klah%' or Col2 like '%klah%' or Col3 like '%klah%' This is rather ugly, probably slow, and I just don't like it very much. Probably because the implementations that I'm dealing with have 10+ columns that need searched. The following may be a slight improvement as code readability goes, but as far as performance, we're still in the same ball park. select * from A where (Col1 + ' ' + Col2 + ' ' + Col3) like '%klah%' I have thought about simply adding insert, update, and delete triggers that simply add the concatenated version of the above columns into a separate table that shadows this table. Sample Shadow_Table: ID | searchtext | --------------------------------- 1 | oklahoma colorado Utah | 2 | arkansas colorado oklahoma | 3 | florida michigan florida | --------------------------------- This would allow us to perform the following query to search for '%klah%' select * from Shadow_Table where searchtext like '%klah%' I really don't like having to remember that this shadow table exists and that I'm supposed to use it when I am performing multi-column substring matching, but it probably yields pretty quick reads at the expense of write and storage space. My gut feeling tells me there there is an existing solution built into SQL Server 2008. However, I don't seem to be able to find anything other than research papers on the subject. Any help would be appreciated.

    Read the article

  • Non standard interaction among two tables to avoid very large merge

    - by riko
    Suppose I have two tables A and B. Table A has a multi-level index (a, b) and one column (ts). b determines univocally ts. A = pd.DataFrame( [('a', 'x', 4), ('a', 'y', 6), ('a', 'z', 5), ('b', 'x', 4), ('b', 'z', 5), ('c', 'y', 6)], columns=['a', 'b', 'ts']).set_index(['a', 'b']) AA = A.reset_index() Table B is another one-column (ts) table with non-unique index (a). The ts's are sorted "inside" each group, i.e., B.ix[x] is sorted for each x. Moreover, there is always a value in B.ix[x] that is greater than or equal to the values in A. B = pd.DataFrame( dict(a=list('aaaaabbcccccc'), ts=[1, 2, 4, 5, 7, 7, 8, 1, 2, 4, 5, 8, 9])).set_index('a') The semantics in this is that B contains observations of occurrences of an event of type indicated by the index. I would like to find from B the timestamp of the first occurrence of each event type after the timestamp specified in A for each value of b. In other words, I would like to get a table with the same shape of A, that instead of ts contains the "minimum value occurring after ts" as specified by table B. So, my goal would be: C: ('a', 'x') 4 ('a', 'y') 7 ('a', 'z') 5 ('b', 'x') 7 ('b', 'z') 7 ('c', 'y') 8 I have some working code, but is terribly slow. C = AA.apply(lambda row: ( row[0], row[1], B.ix[row[0]].irow(np.searchsorted(B.ts[row[0]], row[2]))), axis=1).set_index(['a', 'b']) Profiling shows the culprit is obviously B.ix[row[0]].irow(np.searchsorted(B.ts[row[0]], row[2]))). However, standard solutions using merge/join would take too much RAM in the long run. Consider that now I have 1000 a's, assume constant the average number of b's per a (probably 100-200), and consider that the number of observations per a is probably in the order of 300. In production I will have 1000 more a's. 1,000,000 x 200 x 300 = 60,000,000,000 rows may be a bit too much to keep in RAM, especially considering that the data I need is perfectly described by a C like the one I discussed above. How would I improve the performance?

    Read the article

  • Could I do this blind relative to absolute path conversion (for perforce depot paths) better?

    - by wonderfulthunk
    I need to "blindly" (i.e. without access to the filesystem, in this case the source control server) convert some relative paths to absolute paths. So I'm playing with dotdots and indices. For those that are curious I have a log file produced by someone else's tool that sometimes outputs relative paths, and for performance reasons I don't want to access the source control server where the paths are located to check if they're valid and more easily convert them to their absolute path equivalents. I've gone through a number of (probably foolish) iterations trying to get it to work - mostly a few variations of iterating over the array of folders and trying delete_at(index) and delete_at(index-1) but my index kept incrementing while I was deleting elements of the array out from under myself, which didn't work for cases with multiple dotdots. Any tips on improving it in general or specifically the lack of non-consecutive dotdot support would be welcome. Currently this is working with my limited examples, but I think it could be improved. It can't handle non-consecutive '..' directories, and I am probably doing a lot of wasteful (and error-prone) things that I probably don't need to do because I'm a bit of a hack. I've found a lot of examples of converting other types of relative paths using other languages, but none of them seemed to fit my situation. These are my example paths that I need to convert, from: //depot/foo/../bar/single.c //depot/foo/docs/../../other/double.c //depot/foo/usr/bin/../../../else/more/triple.c to: //depot/bar/single.c //depot/other/double.c //depot/else/more/triple.c And my script: begin paths = File.open(ARGV[0]).readlines puts(paths) new_paths = Array.new paths.each { |path| folders = path.split('/') if ( folders.include?('..') ) num_dotdots = 0 first_dotdot = folders.index('..') last_dotdot = folders.rindex('..') folders.each { |item| if ( item == '..' ) num_dotdots += 1 end } if ( first_dotdot and ( num_dotdots > 0 ) ) # this might be redundant? folders.slice!(first_dotdot - num_dotdots..last_dotdot) # dependent on consecutive dotdots only end end folders.map! { |elem| if ( elem !~ /\n/ ) elem = elem + '/' else elem = elem end } new_paths << folders.to_s } puts(new_paths) end

    Read the article

  • Mysql random rows

    - by n00b
    please read the whole question... 90% of you dont seem to do that and some of you only read the title obviously... and if you dont know the solution, dont answer - i wont have to downvote you -.-'' im entertaining the idea of getting random rows directly from mysql. what i found was SELECT * FROM tablename WHERE somefield='something' ORDER BY RAND() LIMIT 5 but even i see how slow that would be.. is the only way to do this doing something like SELECT * FROM tablename WHERE somefield='something' LIMIT RAND(aincrementvalue-5), 1 5 times? or is there a way that i with my little knowlege of databases cant come up with ? (no i dont want random indexes. i hate the idea of them...) @commenters - please first look, then think, then look again, think again and then post. i wont point fingers but i dislike stupid comments and why i think random indexes are a nasty hack ? it doesnt give you random results. it gives you x results from a random index in a predefined order its like a gapless id only in the wrong order if you fetch by 1 row and get true randomness you fall back to my method but with an additional junk field finally the reason the field exists is only to serve as a helper to something that can be done without it with almost same performance (but the quality (randomness) is better), so it is a nasty hack ;) i solved it, look @ my answer... if you think its incorrect please tell me :)

    Read the article

  • To implement a remote desktop sharing solution

    - by Cameigons
    Hi, I'm on planning/modeling phase to develop a remote desktop sharing solution, which must be web browser based. In other words: an user will be able to see and interact with someone's remote desktop using his web-browser. Everything the user who wants to share his desktop will need, besides his browser, is installing an add-in, which he's going to be prompted about when necessary. The add-in is required since (afaik) no browser technology allows desktop control from an app running within the browser alone. The add-in installation process must be as simple and transparent as possible to the user (similar to AdobeConnectNow, in case anyone's acquainted with it). The user can share his desktop with lots of people at the same time, but concede desktop control to only one of them at a time(makes no sense being otherwise). Project requirements: All technology employed must be open-source license compatible Both front ends are going to be in flash (browser) Must work on Linux, Windows XP(and later) and MacOSX. Must work at least with IE7(and later) and Firefox3.0(and later). At the very least, once the sharer's stream hits the server from where it'll be broadcast, hereon it must be broadcasted in flv (so I'm thinking whether to do the encoding at the client's machine (the one sharing the desktop) or send it in some other format to the server and encode it there). Performance and scalability are important: It must be able to handle hundreds of dozens of users(one desktop sharer, the rest viewers) We'll definitely be using red5. My doubts concern mostly implementing the desktop publisher side (add-in and streamer): 1) Are you aware of other projects that I could look into for ideas? (I'm aware of bigbluebutton.org and code.google.com/p/openmeetings) 2) Should I base myself on VNC ? 3) Bearing in mind the need to have it working cross-platform, what language should I go with? (My team is very used with java and I have some knowledge of C/C++, but anything goes really). 4) Any other advices are appreciated.

    Read the article

  • java: libraries for immutable functional-style data structures

    - by Jason S
    This is very similar to another question (Functional Data Structures in Java) but the answers there are not particularly useful. I need to use immutable versions of the standard Java collections (e.g. HashMap / TreeMap / ArrayList / LinkedList / HashSet / TreeSet). By "immutable" I mean immutable in the functional sense (e.g. purely functional data structures), where updating operations on the data structure do not change the original data, but instead return a new instance of the same kind of data structure. Also typically new and old instances of the data structure will share immutable data to be efficient in time and space. From what I can tell my options include: Functional Java Scala Clojure but I'm not sure whether any of these are particularly appealing to me. I have a few requirements/desirements: the collections in question should be usable directly in Java (with the appropriate libraries in the classpath). FJ would work for me; I'm not sure if I can use Scala's or Clojure's data structures in Java w/o having to use the compilers/interpreters from those languages and w/o having to write Scala or Clojure code. Core operations on lists/maps/sets should be possible w/o having to create function objects with confusing syntaxes (FJ looks slightly iffy) They should be efficient in time and space. I'm looking for a library which ideally has done some performance testing. FJ's TreeMap is based on a red-black tree, not sure how that rates. Documentation / tutorials should be good enough so someone can get started quickly using the data structures. FJ fails on that front. Any suggestions?

    Read the article

  • Custom UIProgressView drawing weirdness

    - by Werner
    I am trying to create my own custom UIProgressView by subclassing it and then overwrite the drawRect function. Everything works as expected except the progress filling bar. I can't get the height and image right. The images are both in Retina resolution and the Simulator is in Retina mode. The images are called: "[email protected]" (28px high) and "[email protected]" (32px high). CustomProgressView.h #import <UIKit/UIKit.h> @interface CustomProgressView : UIProgressView @end CustomProgressView.m #import "CustomProgressView.h" @implementation CustomProgressView - (id)initWithFrame:(CGRect)frame { self = [super initWithFrame:frame]; if (self) { // Initialization code } return self; } // Only override drawRect: if you perform custom drawing. // An empty implementation adversely affects performance during animation. - (void)drawRect:(CGRect)rect { // Drawing code self.frame = CGRectMake(self.frame.origin.x, self.frame.origin.y, self.frame.size.width, 16); UIImage *progressBarTrack = [[UIImage imageNamed:@"progressBarTrack"] resizableImageWithCapInsets:UIEdgeInsetsZero]; UIImage *progressBar = [[UIImage imageNamed:@"progressBar"] resizableImageWithCapInsets:UIEdgeInsetsMake(4, 4, 5, 4)]; [progressBarTrack drawInRect:rect]; NSInteger maximumWidth = rect.size.width - 2; NSInteger currentWidth = floor([self progress] * maximumWidth); CGRect fillRect = CGRectMake(rect.origin.x + 1, rect.origin.y + 1, currentWidth, 14); [progressBar drawInRect:fillRect]; } @end The resulting ProgressView has the right height and width. It also fills at the right percentage (currently set at 80%). But the progress fill image isn't drawn correctly. Does anyone see where I go wrong?

    Read the article

  • Running a Model::find in for loop in cakephp v1.3

    - by Gaurav Sharma
    Hi all, How can I achieve the following result in cakephp: In my application a Topic is related to category, category is related to city and city is finally related to state in other words: topic belongs to category, category belongs to city , city belongs to state.. Now in the Topic controller's index action I want to find out all the topics and it's city and state. How can I do this. I can easily do this using a custom query ($this-Model-query() function ) but then I will be facing pagination difficulties. I tried doing like this function index() { $this->Topic->recursive = 0; $topics = $this->paginate(); for($i=0; $i<count($topics);$i++) { $topics[$i]['City'] = $this->Topic->Category->City->find('all', array('conditions' => array('City.id' => $topics[$i]['Category']['city_id']))); } $this->set(compact('messages')); } The method that I have adopted is not a good one (running query in a loop) Using the recursive property and setting it to highest value (2) will degrade performance and is not going to yield me state information. How shall I solve this ? Please help Thanks

    Read the article

< Previous Page | 484 485 486 487 488 489 490 491 492 493 494 495  | Next Page >