Search Results

Search found 3758 results on 151 pages for 'efficient'.

Page 122/151 | < Previous Page | 118 119 120 121 122 123 124 125 126 127 128 129  | Next Page >

  • Design suggestion for expression tree evaluation with time-series data

    - by Lirik
    I have a (C#) genetic program that uses financial time-series data and it's currently working but I want to re-design the architecture to be more robust. My main goals are: sequentially present the time-series data to the expression trees. allow expression trees to access previous data rows when needed. to optimize performance of the data access while evaluating the expression trees. keep a common interface so various types of data can be used. Here are the possible approaches I've thought about: I can evaluate the expression tree by passing in a data row into the root node and let each child node use the same data row. I can evaluate the expression tree by passing in the data row index and letting each node get the data row from a shared DataSet (currently I'm passing the row index and going to multiple synchronized arrays to get the data). Hybrid: an immutable data set is accessible by all of the expression trees and each expression tree is evaluated by passing in a data row. The benefit of the first approach is that the data row is being passed into the expression tree and there is no further query done on the data set (which should increase performance in a multithreaded environment). The drawback is that the expression tree does not have access to the rest of the data (in case some of the functions need to do calculations using previous data rows). The benefit of the second approach is that the expression trees can access any data up to the latest data row, but unless I specify what that row is, I'll have to iterate through the rows and figure out which one is the last one. The benefit of the hybrid is that it should generally perform better and still provide access to the earlier data. It supports two basic "views" of data: the latest row and the previous rows. Do you guys know of any design patterns or do you have any tips that can help me build this type of system? Should I use a DataSet to hold and present the data, or are there more efficient ways to present rows of data while maintaining a simple interface? FYI: All of my code is written in C#.

    Read the article

  • MS Access: Why is ADODB.Recordset.BatchUpdate so much slower than Application.ImportXML?

    - by apenwarr
    I'm trying to run the code below to insert a whole lot of records (from a file with a weird file format) into my Access 2003 database from VBA. After many, many experiments, this code is the fastest I've been able to come up with: it does 10000 records in about 15 seconds on my machine. At least 14.5 of those seconds (ie. almost all the time) is in the single call to UpdateBatch. I've read elsewhere that the JET engine doesn't support UpdateBatch. So maybe there's a better way to do it. Now, I would just think the JET engine is plain slow, but that can't be it. After generating the 'testy' table with the code below, I right clicked it, picked Export, and saved it as XML. Then I right clicked, picked Import, and reloaded the XML. Total time to import the XML file? Less than one second, ie. at least 15x faster. Surely there's an efficient way to insert data into Access that doesn't require writing a temp file? Sub TestBatchUpdate() CurrentDb.Execute "create table testy (x int, y int)" Dim rs As New ADODB.Recordset rs.CursorLocation = adUseServer rs.Open "testy", CurrentProject.AccessConnection, _ adOpenStatic, adLockBatchOptimistic, adCmdTableDirect Dim n, v n = Array(0, 1) v = Array(50, 55) Debug.Print "starting loop", Time For i = 1 To 10000 rs.AddNew n, v Next i Debug.Print "done loop", Time rs.UpdateBatch Debug.Print "done update", Time CurrentDb.Execute "drop table testy" End Sub I would be willing to resort to C/C++ if there's some API that would let me do fast inserts that way. But I can't seem to find it. It can't be that Application.ImportXML is using undocumented APIs, can it?

    Read the article

  • Which is better Java programming practice for looping up to an int value: a converted for-each loop

    - by Arvanem
    Hi folks, Given the need to loop up to an arbitrary int value, is it better programming practice to convert the value into an array and for-each the array, or just use a traditional for loop? FYI, I am calculating the number of 5 and 6 results ("hits") in multiple throws of 6-sided dice. My arbitrary int value is the dicePool which represents the number of multiple throws. As I understand it, there are two options: Convert the dicePool into an array and for-each the array: public int calcHits(int dicePool) { int[] dp = new int[dicePool]; for (Integer a : dp) { // call throwDice method } } Use a traditional for loop. public int calcHits(int dicePool) { for (int i = 0; i < dicePool; i++) { // call throwDice method } } I apologise for the poor presentation of the code above (for some reason the code button on the Ask Question page is not doing what it should). My view is that option 1 is clumsy code and involves unnecessary creation of an array, even though the for-each loop is more efficient than the traditional for loop in Option 2. Thanks in advance for any suggestions you might have.

    Read the article

  • Django stupid mark_safe?

    - by Mark
    I wrote this little function for writing out HTML tags: def html_tag(tag, content=None, close=True, attrs={}): lst = ['<',tag] for key, val in attrs.iteritems(): lst.append(' %s="%s"' % (key, escape_html(val))) if close: if content is None: lst.append(' />') else: lst.extend(['>', content, '</', tag, '>']) else: lst.append('>') return mark_safe(''.join(lst)) Which worked great, but then I read this article on efficient string concatenation (I know it doesn't really matter for this, but I wanted consistency) and decided to update my script: def html_tag(tag, body=None, close=True, attrs={}): s = StringIO() s.write('<%s'%tag) for key, val in attrs.iteritems(): s.write(' %s="%s"' % (key, escape_html(val))) if close: if body is None: s.write(' />') else: s.write('>%s</%s>' % (body, tag)) else: s.write('>') return mark_safe(s.getvalue()) But now my HTML get escaped when I try to render it from my template. Everything else is exactly the same. It works properly if I replace the last line with return mark_safe(unicode(s.getvalue())). I checked the return type of s.getvalue(). It should be a str, just like the first function, so why is this failing?? Also fails with SafeString(s.getvalue()) but succeeds with SafeUnicode(s.getvalue()). I'd also like to point out that I used return mark_safe(s.getvalue()) in a different function with no odd behavior.

    Read the article

  • SQL Server Index cost

    - by yellowstar
    I have read that one of the tradeoffs for adding table indexes in SQL Server is the increased cost of insert/update/delete queries to benefit the performance of select queries. I can conceptually understand what happens in the case of an insert because SQL Server has to write entries into each index matching the new rows, but update and delete are a little more murky to me because I can't quite wrap my head around what the database engine has to do. Let's take DELETE as an example and assume I have the following schema (pardon the pseudo-SQL) TABLE Foo col1 int ,col2 int ,col3 int ,col4 int PRIMARY KEY (col1,col2) INDEX IX_1 col3 INCLUDE col4 Now, if I issue the statement DELETE FROM Foo WHERE col1=12 AND col2 > 34 I understand what the engine must do to update the table (or clustered index if you prefer). The index is set up to make it easy to find the range of rows to be removed and do so. However, at this point it also needs to update IX_1 and the query that I gave it gives no obvious efficient way for the database engine to find the rows to update. Is it forced to do a full index scan at this point? Does the engine read the rows from the clustered index first and generate a smarter internal delete against the index? It might help me to wrap my head around this if I understood better what is going on under the hood, but I guess my real question is this. I have a database that is spending a significant amount of time in delete and I'm trying to figure out what I can do about it. When I display the execution plan for the deletion, it just shows an entry for "Clustered Index Delete" on table Foo which lists in the details section the other indices that need to be updated but I don't get any indication of the relative cost of these other indices. Are they all equal in this case? Is there some way that I can estimate the impact of removing one or more of these indices without having to actually try it?

    Read the article

  • Storing Arbitrary Contact Information in Ruby on Rails

    - by Anthony Chivetta
    Hi, I am currently working on a Ruby on Rails app which will function in some ways like a site-specific social networking site. As part of this, each user on the site will have a profile where they can fill in their contact information (phone numbers, addresses, email addresses, employer, etc.). A simple solution to modeling this would be to have a database column per piece of information I allow users to enter. However, this seems arbitrary and limited. Further, to support allowing users to enter as many phone numbers as they would like requires the addition of another database table and joins. It seems to me that a better solution would be to serialize all the contact information entered by a user into a single field in their row. Since I will never be conditioning a SQL query on this information, such a solution wouldn't be any less efficient. Ideally, I would like to use a vCard as my serialization format. vCards are the standard solution to storing contact information across the web, and reusing tested solutions is a Good Thing. Alternative serialization formats would include simply marshaling a ruby hash, or YAML. Regardless of serialization format, supporting the reading and updating of this information in a rails-like way seems to be a major implementation challenge. So, here's the question: Has anyone seen this approach used in a rails application? Are there any rails plugins or gems that make such a system easy to implement? Ideally what I would like is an acts_as_vcard to add to my model object that would handle editing the vcard for me and saving it back to the database.

    Read the article

  • Python template engine

    - by jturo
    Hello there, Could it be possible if somebody could help me get started in writing a python template engine? I'm new to python and as I learn the language I've managed to write a little MVC framework running in its own light-weight-WSGI-like server. I've managed to write a script that finds and replaces keys for values: (Obviously this is not how my script is structured or implemented. this is just an example) from string import Template html = '<html>\n' html += ' <head>\n' html += ' <title>This is so Cool : In Controller HTML</title>\n' html += ' </head>\n' html += ' <body>\n' html += ' Home | <a href="/hi">Hi ${name}</a>\n' html += ' </body>\n' html += '<html>' Template(html).safe_substitute(dict(name = 'Arturo')) My next goal is to implement custom statements, modifiers, functions, etc (like 'for' loop) but i don't really know if i should use another module that i don't know about. I thought of regular expressions but i kind feel like that wouldn't be an efficient way of doing it Any help is appreciated, and i'm sure this will be of use to other people too. Thank you

    Read the article

  • A map and set which uses contiguous memory and has a reserve function

    - by edA-qa mort-ora-y
    I use several maps and sets. The lack of contiguous memory, and high number of (de)allocations, is a performance bottleneck. I need a mainly STL-compatbile map and set class which can use a contiguous block of memory for internal objects (or multiple blocks). It also needs to have a reserve function so that I can preallocate for expected sizes. Before I write my own I'd like to check what is available first. Is there something in Boost which does this? Does somebody know of an available implementation elsewhere? Intrusive collection types are not usable here as the same objects need to exist in several collections. As far as I know STL memory pools are per-type, not per instance. These global pools are not efficient with respect to memory locality in mutli-cpu/core processing. Object pools don't work as the types will be shared between instance but their pool should not. In many cases a hash map may be an option in some cases.

    Read the article

  • "Work stealing" vs. "Work shrugging"?

    - by John
    Why is it that I can find lots of information on "work stealing" and nothing on "work shrugging" as a dynamic load-balancing strategy? By "work-shrugging" I mean busy processors pushing excessive work towards less loaded neighbours rather than idle processors pulling work from busy neighbours ("work-stealing"). I think the general scalability should be the same for both strategies. However I believe that it is much more efficient for busy processors to wake idle processors if and when there is definitely work for them to do than having idle processors spinning or waking periodically to speculatively poll all neighbours for possible work. Anyway a quick google didn't show up anything under the heading of "Work Shrugging" or similar so any pointers to prior-art and the jargon for this strategy would be welcome. Clarification/Confession In more detail:- By "Work Shrugging" I actually envisage the work submitting processor (which may or may not be the target processor) being responsible for looking around the immediate locality of the preferred target processor (based on data/code locality) to decide if a near neighbour should be given the new work instead because they don't have as much work to do. I am talking about an atomic read of the immediate (typically 2 to 4) neighbours' estimated q length here. I do not think this is any more coupling than implied by the thieves polling & stealing from their neighbours - just much less often - or rather - only when it makes economic sense to do so. (I am assuming "lock-free, almost wait-free" queue structures in both strategies). Thanks.

    Read the article

  • efficiently determining if a polynomial has a root in the interval [0,T]

    - by user168715
    I have polynomials of nontrivial degree (4+) and need to robustly and efficiently determine whether or not they have a root in the interval [0,T]. The precise location or number of roots don't concern me, I just need to know if there is at least one. Right now I'm using interval arithmetic as a quick check to see if I can prove that no roots can exist. If I can't, I'm using Jenkins-Traub to solve for all of the polynomial roots. This is obviously inefficient since it's checking for all real roots and finding their exact positions, information I don't end up needing. Is there a standard algorithm I should be using? If not, are there any other efficient checks I could do before doing a full Jenkins-Traub solve for all roots? For example, one optimization I could do is to check if my polynomial f(t) has the same sign at 0 and T. If not, there is obviously a root in the interval. If so, I can solve for the roots of f'(t) and evaluate f at all roots of f' in the interval [0,T]. f(t) has no root in that interval if and only if all of these evaluations have the same sign as f(0) and f(T). This reduces the degree of the polynomial I have to root-find by one. Not a huge optimization, but perhaps better than nothing.

    Read the article

  • how to handle multiple profiles per user?

    - by Scott Willman
    I'm doing something that doesn't feel very efficient. From my code below, you can probably see that I'm trying to allow for multiple profiles of different types attached to my custom user object (Person). One of those profiles will be considered a default and should have an accessor from the Person class. Can this be done better? from django.db import models from django.contrib.auth.models import User, UserManager class Person(User): public_name = models.CharField(max_length=24, default="Mr. T") objects = UserManager() def save(self): self.set_password(self.password) super(Person, self).save() def _getDefaultProfile(self): def_teacher = self.teacher_set.filter(default=True) if def_teacher: return def_teacher[0] def_student = self.student_set.filter(default=True) if def_student: return def_student[0] def_parent = self.parent_set.filter(default=True) if def_parent: return def_parent[0] return False profile = property(_getDefaultProfile) def _getProfiles(self): # Inefficient use of QuerySet here. Tolerated because the QuerySets should be very small. profiles = [] if self.teacher_set.count(): profiles.append(list(self.teacher_set.all())) if self.student_set.count(): profiles.append(list(self.student_set.all())) if self.parent_set.count(): profiles.append(list(self.parent_set.all())) return profiles profiles = property(_getProfiles) class BaseProfile(models.Model): person = models.ForeignKey(Person) is_default = models.BooleanField(default=False) class Meta: abstract = True class Teacher(BaseProfile): user_type = models.CharField(max_length=7, default="teacher") class Student(BaseProfile): user_type = models.CharField(max_length=7, default="student") class Parent(BaseProfile): user_type = models.CharField(max_length=7, default="parent")

    Read the article

  • Mysqli results memory usage

    - by Poe
    Why is the memory consumption in this query continuing to rise as the internal pointer progresses through loop? How to make this more efficient and lean? $link = mysqli_connect(...); $result = mysqli_query($link,$query); // 403,268 rows in result set while ($row = mysqli_fetch_row($result)) { // print time, (get memory usage), -- row number } mysqli_free_result(); mysqli_close($link); /* 06:55:25 (1240336) -- Run query 06:55:26 (39958736) -- Query finished 06:55:26 (39958784) -- Begin loop 06:55:26 (39960688) -- Row 0 06:55:26 (45240712) -- Row 10000 06:55:26 (50520712) -- Row 20000 06:55:26 (55800712) -- Row 30000 06:55:26 (61080712) -- Row 40000 06:55:26 (66360712) -- Row 50000 06:55:26 (71640712) -- Row 60000 06:55:26 (76920712) -- Row 70000 06:55:26 (82200712) -- Row 80000 06:55:26 (87480712) -- Row 90000 06:55:26 (92760712) -- Row 100000 06:55:26 (98040712) -- Row 110000 06:55:26 (103320712) -- Row 120000 06:55:26 (108600712) -- Row 130000 06:55:26 (113880712) -- Row 140000 06:55:26 (119160712) -- Row 150000 06:55:26 (124440712) -- Row 160000 06:55:26 (129720712) -- Row 170000 06:55:27 (135000712) -- Row 180000 06:55:27 (140280712) -- Row 190000 06:55:27 (145560712) -- Row 200000 06:55:27 (150840712) -- Row 210000 06:55:27 (156120712) -- Row 220000 06:55:27 (161400712) -- Row 230000 06:55:27 (166680712) -- Row 240000 06:55:27 (171960712) -- Row 250000 06:55:27 (177240712) -- Row 260000 06:55:27 (182520712) -- Row 270000 06:55:27 (187800712) -- Row 280000 06:55:27 (193080712) -- Row 290000 06:55:27 (198360712) -- Row 300000 06:55:27 (203640712) -- Row 310000 06:55:27 (208920712) -- Row 320000 06:55:27 (214200712) -- Row 330000 06:55:27 (219480712) -- Row 340000 06:55:27 (224760712) -- Row 350000 06:55:27 (230040712) -- Row 360000 06:55:27 (235320712) -- Row 370000 06:55:27 (240600712) -- Row 380000 06:55:27 (245880712) -- Row 390000 06:55:27 (251160712) -- Row 400000 06:55:27 (252884360) -- End loop 06:55:27 (1241264) -- Free */

    Read the article

  • How do I simulate "Is In" using Linq2Sql

    - by flipdoubt
    I often find myself with a list of disconnected Linq2Sql objects or keys that I need to re-select from a Linq2Sql data-context to update or delete in the database. If this were SQL, I would use IS IN in the SQL WHERE clause, but I am stuck with what to do in Linq2Sql. Here is a sample of what I would like to write: public void MarkValidated(IList<int> idsToValidate) { using(_Db.NewSession()) // Instatiates new DataContext { // ThatAreIn <- this is where I am stuck var items = _Db.Items.ThatAreIn(idsToValidate).ToList(); foreach(var item in items) item.Validated = DateTime.Now; _Db.SubmitChanges(); } // Disposes of DataContext } Or: public void DeleteItems(IList<int> idsToDelete) { using(_Db.NewSession()) // Instatiates new DataContext { // ThatAreIn <- this is where I am stuck var items = _Db.Items.ThatAreIn(idsToValidate); _Db.Items.DeleteAllOnSubmit(items); _Db.SubmitChanges(); } // Disposes of DataContext } Can I get this done in one trip to the database? If so, how? Is it possible to send all those ints to the database as a list of parameters and is that more efficient than doing a foreach over the list to select each item one at a time?

    Read the article

  • algorithm || method to write prog

    - by fatai
    I am one of the computer science student. My wonder is everyone solve problem with different or same method, but actually I dont know whether they use method or I dont know whether there are such common method to approach problem. All teacher give us problem which in simple form sometimes, but they dont introduce any approach or method(s) so that we can first choose method then apply that one to problem , afterward find solution then write code. I have found one method when I failed the course, More accurately, When I counter problem in language , I will get more paper and then ; first, input/ output step ; my prog will take this / these there argument(s) and return namely X , ex : in c, input length is not known and at same type , so I must use pointer desired output is in form of package , so use structure second, execution part ; in that step , I am writing all step which are goes to final output ex : in python ; 1.) [ + , [- , 4 , [ * , 1 , 2 ]], 5] 2.) [ + , [- , 4 , 2 ],5 ] 3.) [ + , 2 , 5] 4.) 7 ==> return 7 third, I will write test code ex : in c++ input : append 3 4 5 6 vector_x remove 0 1 desired output vector_x holds : 5 6 But now, I wonder other method ; used to construct class :::: for c++ , python, java used to communicate classes / computers used for solving embedded system problem ::::: for c Why I wonder , because I learn if you dont costruct algorithm on paper, you may achieve your goal. Like no money no lunch , I can say no algorithm no prog therefore , feel free when you write your own method , a way which is introduced by someone else but you are using and you find it is so efficient

    Read the article

  • Hybrid EAV/CR model via WCF (and statically-typed language)?

    - by Pat
    Background I'm working on the architecture for a cloud-based LOB application, using Silverlight for the client, WCF, ASP.NET/C# for server and SQL Server for storage. The data model requires some flexibility per user (ability to add custom properties and define validation rules for them, for example), and a hybrid EAV/CR persistence model on the server side will suit nicely. Problem I need an efficient and maintainable technology and approach to handle the transformation from the persisted EAV model to/from WCF (and similarly allow the client to bind to the resulting data - DataGrid is a key UI element)? Admission: I don't yet know enough about WCF to understand if it supports ExpandoObject directly, but I suspect it will. Options I started off looking at WCF RIA services, but quickly discovered they're heavily dependent upon both static type data and compile-time code generation. Neither of these appeal. The options I'm considering include: Using WCF RIA services and pass the data over the network directly in EAV form (i.e. Dictionary), and handle the binding issue purely on the client side (like this) Using a dynamic language (probably IronPython) to handle both ends of the communication, with plumbing to generate the necessary CLR type data on the client to allow binding, and transform to/from EAV form on the server (spam preventer stopped me from posting a URL here, I'll try it in a comment). Dynamic LINQ (CreateClass() and friends), although I'm way out of my depth there and don't know what the limitations on that approach might be yet. I'm interested in comments on these approaches as well as alternative approaches that might solve the problem. Other Notes The Silverlight client will not be the only consumer of the service, making me slightly uncomfortable with option #1 above. While the data model is flexible, it's not expected to be modified heavily. For argument's sake, we could assume that we might have 25 distinct data models active at a given time, with something like 10-20 unique data fields/rules each. Modifications to the data model will happen infrequently (typically when a new user is initially configured).

    Read the article

  • Random Pairings that don't Repeat

    - by Andrew Robinson
    This little project / problem came out of left field for me. Hoping someone can help me here. I have some rough ideas but I am sure (or at least I hope) a simple, fairly efficient solution exists. Thanks in advance.... pseudo code is fine. I generally work in .NET / C# if that sheds any light on your solution. Given: A pool of n individuals that will be meeting on a regular basis. I need to form pairs that have not previously meet. The pool of individuals will slowly change over time. For the purposes of pairing, (A & B) and (B & A) constitute the same pair. The history of previous pairings is maintained. For the purpose of the problem, assume an even number of individuals. For each meeting (collection of pairs) and individual will only pair up once. Is there an algorithm that will allow us to form these pairs? Ideally something better than just ordering the pairs in a random order, generating pairings and then checking against the history of previous pairings. In general, randomness within the pairing is ok.

    Read the article

  • Google App Engine - Caching generated HTML

    - by Alexander
    I have written a Google App Engine application that programatically generates a bunch of HTML code that is really the same output for each user who logs into my system, and I know that this is going to be in-efficient when the code goes into production. So, I am trying to figure out the best way to cache the generated pages. The most probable option is to generate the pages and write them into the database, and then check the time of the database put operation for a given page against the time that the code was last updated. Then, if the code is newer than the last put to the database (for a particular HTML request), new HTML will be generated and served, and cached to the database. If the code is older than the last put to the database, then I will just get the HTML direct from the database and serve it (therefore avoiding all the CPU wastage of generating the HTML). I am not only looking to minimize load times, but to minimize CPU usage. However, one issue that I am having is that I can't figure out how to programatically check when the version of code uploaded to the app engine was updated. I am open to any suggestions on this approach, or other approaches for caching generated html. Note that while memcache could help in this situation, I believe that it is not the final solution since I really only need to re-generate html when the code is updated (as opposed to every time the memcache expires). Kind Regards, and thank you in advance for any suggestions you may be able to offer. -Alex

    Read the article

  • Calculating bounding box a certain distance away from a lat/long coordinate in Java

    - by Bryce Thomas
    Given a coordinate (lat, long), I am trying to calculate a square bounding box that is a given distance (e.g. 50km) away from the coordinate. So as input I have lat, long and distance and as output I would like two coordinates; one being the south-west (bottom-left) corner and one being the north-east (top-right) corner. I have seen a couple of answers on here that try to address this question in Python, but I am looking for a Java implementation in particular. Just to be clear, I intend on using the algorithm on Earth only and so I don't need to accommodate a variable radius. It doesn't have to be hugely accurate (+/-20% is fine) and it'll only be used to calculate bounding boxes over small distances (no more than 150km). So I'm happy to sacrifice some accuracy for an efficient algorithm. Any help is much appreciated. Edit: I should have been clearer, I really am after a square, not a circle. I understand that the distance between the center of a square and various points along the square's perimeter is not a constant value like it is with a circle. I guess what I mean is a square where if you draw a line from the center to any one of the four points on the perimeter that results in a line perpendicular to a side of the perimeter, then those 4 lines have the same length.

    Read the article

  • Design Practice Retrieve Multiple Users - PHP

    - by pws5068
    Greetings all, I'm looking for a more efficient way to grab multiple users without too much code redundancy. I have a Users class, (here's a highly condensed representation) Class Users { function __construct() { ... ... } private static function getNew($id) { // This is all only example code $sql = "SELECT * FROM Users WHERE User_ID=?"; $user = new User(); $user->setID($id); .... return $user; } public static function getNewestUsers($count) { $sql = "SELECT * FROM Users ORDER BY Join_Date LIMIT ?"; $users = array(); // Prepared Statement Data Here while($stmt->fetch()) { $users[] = Users::getNew($id); ... } return $users; } // These have more sanity/security checks, demonstration only. function setID($id) { $this->id = $id; } function setName($name) { $this->name = $name; } ... } So clearly calling getNew() in a loop is inefficient, because it makes $count number of calls to the database each time. I would much rather select all users in a single SQL statement. The problem is that I have probably a dozen or so methods for getting arrays of users, so all of these would now need to know the names of the database columns which is repeating a lot of structure that could always change later. What are some other ways that I could approach this problem? My goals include efficiency, flexibility, scalability, and good design practice.

    Read the article

  • Excel CSV into Nested Dictionary; List Comprehensions

    - by victorhooi
    heya, I have a Excel CSV files with employee records in them. Something like this: mail,first_name,surname,employee_id,manager_id,telephone_number [email protected],john,smith,503422,503423,+65(2)3423-2433 [email protected],george,brown,503097,503098,+65(2)3423-9782 .... I'm using DictReader to put this into a nested dictionary: import csv gd_extract = csv.DictReader(open('filename 20100331 original.csv'), dialect='excel') employees = dict([(row['employee_id'], row) for row in gp_extract]) Is the above the proper way to do it - it does work, but is it the Right Way? Something more efficient? Also, the funny thing is, in IDLE, if I try to print out "employees" at the shell, it seems to cause IDLE to crash (there's approximately 1051 rows). 2. Remove employee_id from inner dict The second issue issue, I'm putting it into a dictionary indexed by employee_id, with the value as a nested dictionary of all the values - however, employee_id is also a key:value inside the nested dictionary, which is a bit redundant? Is there any way to exclude it from the inner dictionary? 3. Manipulate data in comprehension Thirdly, we need do some manipulations to the imported data - for example, all the phone numbers are in the wrong format, so we need to do some regex there. Also, we need to convert manager_id to an actual manager's name, and their email address. Most managers are in the same file, while others are in an external_contractors CSV, which is similar but not quite the same format - I can import that to a separate dict though. Are these two items things that can be done within the single list comprehension, or should I use a for loop? Or does multiple comprehensions work? (sample code would be really awesome here). Or is there a smarter way in Python do it? Cheers, Victor

    Read the article

  • Starting with versioning mysql schemata without overkill. Good solutions?

    - by tharkun
    I've arrived at the point where I realise that I must start versioning my database schemata and changes. I consequently read the existing posts on SO about that topic but I'm not sure how to proceed. I'm basically a one man company and not long ago I didn't even use version control for my code. I'm on a windows environment, using Aptana (IDE) and SVN (with Tortoise). I work on PHP/mysql projects. What's a efficient and sufficient (no overkill) way to version my database schemata? I do have a freelancer or two in some projects but I don't expect a lot of branching and merging going on. So basically I would like to keep track of concurrent schemata to my code revisions. [edit] Momentary solution: for the moment I decided I will just make a schema dump plus one with the necessary initial data whenever I'm going to commit a tag (stable version). That seems to be just enough for me at the current stage.[/edit] [edit2]plus I'm now also using a third file called increments.sql where I put all the changes with dates, etc. to make it easy to trace the change history in one file. from time to time I integrate the changes into the two other files and empty the increments.sql[/edit]

    Read the article

  • Printing out this week's dates in perl

    - by ABach
    Hi folks, I have the following loop to calculate the dates of the current week and print them out. It works, but I am swimming in the amount of date/time possibilities in perl and want to get your opinion on whether there is a better way. Here's the code I've written: #!/usr/bin/env perl use warnings; use strict; use DateTime; # Calculate numeric value of today and the # target day (Monday = 1, Sunday = 7); the # target, in this case, is Monday, since that's # when I want the week to start my $today_dt = DateTime->now; my $today = $today_dt->day_of_week; my $target = 1; # Create DateTime copies to act as the "bookends" # for the date range my ($start, $end) = ($today_dt->clone(), $today_dt->clone()); if ($today == $target) { # If today is the target, "start" is already set; # we simply need to set the end date $end->add( days => 6 ); } else { # Otherwise, we calculate the Monday preceeding today # and the Sunday following today my $delta = ($target - $today + 7) % 7; $start->add( days => $delta - 7 ); $end->add( days => $delta - 1 ); } # I clone the DateTime object again because, for some reason, # I'm wary of using $start directly... my $cur_date = $start->clone(); while ($cur_date <= $end) { my $date_ymd = $cur_date->ymd; print "$date_ymd\n"; $cur_date->add( days => 1 ); } As mentioned, this works - but is it the quickest/most efficient/etc.? I'm guessing that quickness and efficiency may not necessarily go together, but your feedback is very appreciated.

    Read the article

  • Using ARIMA to model and forecast stock prices using user-friendly stats program

    - by Brian
    Hi people, Can anyone please offer some insight into this for me? I'm coming from a functional magnetic resonance imaging research background where I analyzed a lot of time series data, and I'd like to analyze the time series of stock prices (or returns) by: 1) modeling a successful stock in a particular market sector and then cross-correlating the time series of this historically successful stock with that of other newer stocks to look for significant relationships; 2) model a stock's price time series and use forecasting (e.g., exponential smoothing) to predict future values of it. I'd like to use non-linear modeling methods (ARIMA and ARCH) to do this. Several questions: How often do ARIMA and ARCH modeling methods (given that the individual who implements them does so accurately) actually fit the stock time series data they target, and what is the optimal fit I can expect? Is the extent to which this model fits the data commensurate with the extent to which it predicts this stock time series' future values? Rather than randomly selecting stocks to compare or model, if profit is my goal, what is an efficient approach, if any, to selecting the stocks I'm going to analyze? Which stats program is the most user-friendly for this? Any thoughts on this would be great and would go a long way for me. Thanks, Brian

    Read the article

  • Reasons for & against a Database

    - by dbemerlin
    Hi, i had a discussion with a coworker about the architecture of a program i'm writing and i'd like some more opinions. The Situation: The Program should update at near-realtime (+/- 1 Minute). It involves the movement of objects on a coordinate system. There are some events that occur at regular intervals (i.e. creation of the objects). Movements can change at any time through user input. My solution was: Build a server that runs continously and stores the data internally. The server dumps a state-of-the-program at regular intervals to protect against powerfailures and/or crashes. He argued that the program requires a Database and i should use cronjobs to update the data. I can store movement information by storing startpoint, endpoint and speed and update the position in the cronjob (and calculate collisions with other objects there) by calculating direction and speed. His reasons: Requires more CPU & Memory because it runs constantly. Powerfailures/Crashes might destroy data. Databases are faster. My reasons against this are mostly: Not very precise as events can only occur at full minutes (wouldn't be that bad though). Requires (possibly costly) transformation of data on every run from relational data to objects. RDBMS are a general solution for a specialized problem so a specialized solution should be more efficient. Powerfailures (or other crashes) can leave the Data in an undefined state with only partially updated data unless (possibly costly) precautions (like transactions) are taken. What are your opinions about that? Which arguments can you add for any side?

    Read the article

  • How do I process the configure file when cross-compiling with mingw?

    - by vy32
    I have a small open source program that builds with an autoconf configure script. I ran configure I tried to compile with: make CC="/opt/local/bin/i386-mingw32-g++" That didn't work because the configure script found include files that were not available to the mingw system. So then I tried: ./configure CC="/opt/local/bin/i386-mingw32-g++" But that didn't work; the configure script gives me this error: ./configure: line 5209: syntax error near unexpected token `newline' ./configure: line 5209: ` *_cv_*' Because of this code: # The following way of writing the cache mishandles newlines in values, # but we know of no workaround that is simple, portable, and efficient. # So, we kill variables containing newlines. # Ultrix sh set writes to stderr and can't be redirected directly, # and sets the high bit in the cache file unless we assign to the vars. ( for ac_var in `(set) 2>&1 | sed -n 's/^\(a-zA-Z_a-zA-Z0-9_*\)=.*/\1/p'`; do eval ac_val=\$$ac_var case $ac_val in #( *${as_nl}*) case $ac_var in #( *_cv_* fi Which is generated then the AC_OUTPUT is called. Any thoughts? Is there a correct way to do this?

    Read the article

< Previous Page | 118 119 120 121 122 123 124 125 126 127 128 129  | Next Page >