Search Results

Search found 2628 results on 106 pages for 'lexical analysis'.

Page 88/106 | < Previous Page | 84 85 86 87 88 89 90 91 92 93 94 95  | Next Page >

  • Starting out NLP - Python + large data set

    - by pencilNero
    Hi, I've been wanting to learn python and do some NLP, so have finally gotten round to starting. Downloaded the english wikipedia mirror for a nice chunky dataset to start on, and have been playing around a bit, at this stage just getting some of it into a sqlite db (havent worked with dbs in the past unfort). But I'm guessing sqlite is not the way to go for a full blown nlp project(/experiment :) - what would be the sort of things I should look at ? HBase (.. and hadoop) seem interesting, i guess i could run then im java, prototype in python and maybe migrate the really slow bits to java... alternatively just run Mysql.. but the dataset is 12gb, i wonder if that will be a problem? Also looked at lucene, but not sure how (other than breaking the wiki articles into chunks) i'd get that to work.. What comes to mind for a really flexible NLP platform (i dont really know at this stage WHAT i want to do.. just want to learn large scale lang analysis tbh) ? Many thanks.

    Read the article

  • Are PyArg_ParseTuple() "s" format specifiers useful in Python 3.x C API?

    - by Craig McQueen
    I'm trying to write a Python C extension that processes byte strings, and I have something basically working for Python 2.x and Python 3.x. For the Python 2.x code, near the start of my function, I currently have a line: if (!PyArg_ParseTuple(args, "s#:in_bytes", &src_ptr, &src_len)) ... I notice that the s# format specifier accepts both Unicode strings and byte strings. I really just want it to accept byte strings and reject Unicode. For Python 2.x, this might be "good enough"--the standard hashlib seems to do the same, accepting Unicode as well as byte strings. However, Python 3.x is meant to clean up the Unicode/byte string mess and not let the two be interchangeable. So, I'm surprised to find that in Python 3.x, the s format specifiers for PyArg_ParseTuple() still seem to accept Unicode and provide a "default encoded string version" of the Unicode. This seems to go against the principles of Python 3.x, making the s format specifiers unusable in practice. Is my analysis correct, or am I missing something? Looking at the implementation for hashlib for Python 3.x (e.g. see md5module.c, function MD5_update() and its use of GET_BUFFER_VIEW_OR_ERROUT() macro) I see that it avoids the s format specifiers, and just takes a generic object (O specifier) and then does various explicit type checks using the GET_BUFFER_VIEW_OR_ERROUT() macro. Is this what we have to do?

    Read the article

  • Boost Mersenne Twister: how to seed with more than one value?

    - by Eamon Nerbonne
    I'm using the boost mt19937 implementation for a simulation. The simulation needs to be reproducible, and that means storing and potentially reusing the RNG seeds later. I'm using the windows crypto api to generate the seed values because I need an external source for the seeds and not because of any particular guarantees of randomness. The output of any simulation run will have a note including the RNG seed - so the seed needs to be reasonably short. On the other hand, as part of the analysis of the simulation, I'll be comparing several runs - but to be sure that these runs are actually different, I'll need to use different seeds - so the seed needs to be long enough to avoid accidental collisions. I've determined that 64-bits of seeding should suffice; the chance of a collision will reach 50% after about 2^32 runs - that probability is low enough that the average error caused by it is negligible to me. Using just 32-bits of seed is tricky; the chance of a collision reaches 50% already after 2^16 runs; and that's a little too likely for my tastes. Unfortunately, the boost implementation either seeds with a full state vector - which is far, far too long - or a single 32-bit unsigned long - which isn't ideal. How can I seed the generator with more than 32-bits but less than a full state vector? I tried just padding the vector or repeating the seeds to fill the state vector, but even a cursory glance at the results shows that that generates poor results.

    Read the article

  • Oracle T4CPreparedStatement memory leaks?

    - by Jay
    A little background on the application that I am gonna talk about in the next few lines: XYZ is a data masking workbench eclipse RCP application: You give it a source table column, and a target table column, it would apply a trasformation (encryption/shuffling/etc) and copy the row data from source table to target table. Now, when I mask n tables at a time, n threads are launched by this app. Here is the issue: I have run into a production issue on first roll out of the above said app. Unfortunately, I don't have any logs to get to the root. However, I tried to run this app in test region and do a stress test. When I collected .hprof files and ran 'em through an analyzer (yourKit), I noticed that objects of oracle.jdbc.driver.T4CPreparedStatement was retaining heap. The analysis also tells me that one of my classes is holding a reference to this preparedstatement object and thereby, n threads have n such objects. T4CPreparedStatement seemed to have character arrays: lastBoundChars and bindChars each of size char[300000]. So, I researched a bit (google!), obtained ojdbc6.jar and tried decompiling T4CPreparedStatement. I see that T4CPreparedStatement extends OraclePreparedStatement, which dynamically manages array size of lastBoundChars and bindChars. So, my questions here are: Have you ever run into an issue like this? Do you know the significance of lastBoundChars / bindChars? I am new to profiling, so do you think I am not doing it correct? (I also ran the hprofs through MAT - and this was the main identified issue - so, I don't really think I could be wrong?) I have found something similar on the web here: http://forums.oracle.com/forums/thread.jspa?messageID=2860681 Appreciate your suggestions / advice.

    Read the article

  • Database warehoue design: fact tables and dimension tables

    - by morpheous
    I am building a poor man's data warehouse using a RDBMS. I have identified the key 'attributes' to be recorded as: sex (true/false) demographic classification (A, B, C etc) place of birth date of birth weight (recorded daily): The fact that is being recorded My requirements are to be able to run 'OLAP' queries that allow me to: 'slice and dice' 'drill up/down' the data and generally, be able to view the data from different perspectives After reading up on this topic area, the general consensus seems to be that this is best implemented using dimension tables rather than normalized tables. Assuming that this assertion is true (i.e. the solution is best implemented using fact and dimension tables), I would like to see some help in the design of these tables. 'Natural' (or obvious) dimensions are: Date dimension Geographical location Which have hierarchical attributes. However, I am struggling with how to model the following fields: sex (true/false) demographic classification (A, B, C etc) The reason I am struggling with these fields is that: They have no obvious hierarchical attributes which will aid aggregation (AFAIA) - which suggest they should be in a fact table They are mostly static or very rarely change - which suggests they should be in a dimension table. Maybe the heuristic I am using above is too crude? I will give some examples on the type of analysis I would like to carryout on the data warehouse - hopefully that will clarify things further. I would like to aggregate and analyze the data by sex and demographic classification - e.g. answer questions like: How does male and female weights compare across different demographic classifications? Which demographic classification (male AND female), show the most increase in weight this quarter. etc. Can anyone clarify whether sex and demographic classification are part of the fact table, or whether they are (as I suspect) dimension tables.? Also assuming they are dimension tables, could someone elaborate on the table structures (i.e. the fields)? The 'obvious' schema: CREATE TABLE sex_type (is_male int); CREATE TABLE demographic_category (id int, name varchar(4)); may not be the correct one.

    Read the article

  • Keyboard hook return different symbols from card reader depends whther my app in focus or not

    - by user363868
    I code WinForm application where one of the input is magnetic stripe card reader (CR). I am using code George Mamaladze's article Processing Global Mouse and Keyboard Hooks in C# on codeproject.com to listen keyboard (USB card reader acts same way as keyboard) and I have weird situation. One card reader CR1 (Unitech MS240-2UG) produces keystroke which I intercept on KeyPress event analyze that I intercept certain patter like %ABCD-6EFJHI? and trigger some logic. Analysis required because user can type something else into application or in another application meanwhile my app is open When I use another card reader CR2 (IdTech IDBM-334133) keystroke intercepted by hook started from number 5 instead of % (It is actually same key on keyboard). Since it is starting sentinel it is very important for me to have ability recognize input from card reader. Moreover if my app running in background and I have focus on Notepad when I swipe card string %ABCD-6EFJHI? appears in Notepad and same way, with proper starting character) intercepted by keyboard hook. If swiped when focus on Form it is 5ABCD-6EFJHI? User who tried app with another card reader has same result as me with CR2. Only CR1 works for me as expected I was looking into Device manager of Windows and both devices use same HID driver supplied by MS. I checked devices though respective software from CR makers and starting and ending sentinels set to % and ? respective on both. I would appreciate and ideas and thoughts as I hit the wall myself Thank you

    Read the article

  • Access SSAS cube from across domains without direct database connection

    - by SuperKing
    Hello, I'm working with SQL Server Analysis Services for the first time and have the dilemma of working on a project in which users must be able to access SSAS Cubes (via a custom web dashboard) that live across different servers and domains, but without having access to the other server's SSAS database connection strings. So Organization A and Organization B will have their own cubes on their own servers, but Organization A users must be able to view Organization B's cubes, and Organization B users must be able to view Organization A's cubes, but neither organization should have access to the connection string. I've read about allowing HTTP access to the SSAS server and cube from the link below, but that requires setting up users for authentication or allowing anonymous access to one organization's server for users of another organization, and I'm not sure this would be acceptable for this situation, or if this is the preferred way to do this. Is performance acceptable here? http://technet.microsoft.com/en-us/library/cc917711.aspx I also wonder if perhaps it makes sense to run a nightly/weekly process that accesses the other organization's SSAS database via a web service or something, and pull that data into a database on the organization's server, and then rebuild the cube. Then that cube would be queried without having to go and connect to the other organization server when viewing the cube. Has anyone else attempted to accomplish something similar? Is HTTP access the standard way to go for this? Or any other possible options? Thanks, and please let me know if you need more info, still unclear on how some of this works.

    Read the article

  • Secure Webservice (WCF) without storing credentials on consumer application

    - by Pai Gaudêncio
    Howdy folks, I have a customer that sells a lottery analysis application. In this application, he consumes a webservice (my service, I mean, belongs to the company I work for now) to get statistical data about lottery results, bets made, amounts, etc., from all across the globe. The access to this webservice is paid, and each consult costs X credits. Some people have disassembled this lottery application and found the api key/auth key used to access the paid webservice, and started to use it. I would like to prevent this from happening again, but I can't find a way to authenticate on the webservice without storing the auth. keys on the application. Does anyone have any ideas on how to accomplish such task? ps1.Can't ask for the users to input any kind of credentials. Has to be transparent for them (they shouldn't know what is happening). ps2. Can't use digital certificates for the same reason above, not to mention it's easy to retrieve them and we would fall into the original problem. Thanks in advance.

    Read the article

  • Figuring out the Nyquist performance limitation of an ADC on an example PIC microcontroller

    - by AKE
    I'm spec-ing the suitability of a dsPIC microcontroller for an analog-to-digital application. This would be preferable to using dedicated A/D chips and a separate dedicated DSP chip. To do that, I've had to run through some computations, pulling the relevant parameters from the datasheets. I'm not sure I've got it right -- would appreciate a check! (EDITED NOTE: The PIC10F220 in the example below was selected ONLY to walk through a simple example to check that I'm interpreting Tacq, Fosc, TAD, and divisor correctly in working through this sort of Nyquist analysis. The actual chips I'm considering for the design are the dsPIC33FJ128MC804 (with 16b A/D) or dsPIC30F3014 (with 12b A/D).) A simple example: PIC10F220 is the simplest possible PIC with an ADC Runs at clock speed of 8MHz. Has an instruction cycle of 0.5us (4 clock steps per instruction) So: Taking Tacq = 6.06 us (acquisition time for ADC, assume chip temp. = 50*C) [datasheet p34] Taking Fosc = 8MHz (? clock speed) Taking divisor = 4 (4 clock steps per CPU instruction) This gives TAD = 0.5us (TAD = 1/(Fosc/divisor) ) Conversion time is 13*TAD [datasheet p31] This gives conversion time 6.5us ADC duration is then 12.56 us [? Tacq + 13*TAD] Assuming at least 2 instructions for load/store: This is another 1 us [0.5 us per instruction] Which would give max sampling rate of 73.7 ksps (1/13.56) Supposing 8 more instructions for real-time processing: This is another 4 us Thus, total ADC/handling time = 17.56us (12.56us + 1us + 4us) So expected upper sampling rate is 56.9 ksps. Nyquist frequency for this sampling rate is therefore 28 kHz. If this is right, it suggests the (theoretical) performance suitability of this chip's A/D is for signals that are bandlimited to 28 kHz. Is this a correct interpretation of the information given in the data sheet in obtaining the Nyquist performance limit? Any opinions on the noise susceptibility of ADCs in PIC / dsPIC chips would be much appreciated! AKE

    Read the article

  • How to sort the file names in bash in this circumstance?

    - by Nicolas
    I have run a program to generate some results with the different parameters(i.e. the R, C and RP). These results are saved in files named results.txt. Then, I should parse these experimental results to make an analysis. In the params_R_7_C_16_RP_0, the 7 is the value of the parameter R, the 16 is the value of the parameter C and the 0 is the value of the parameter RP. Now, I want to get these results.txt files in the current directory to parse, and sort the path with the parameter values of R,C and RP. I first use the following command to get the results.txt files that I want to parse: find ./ -name "results.txt" and the output is: ./params_R_11_C_9_RP_0/results.txt ./params_R_7_C_9_RP_0/results.txt ./params_R_7_C_4_RP_0/results.txt ./params_R_11_C_16_RP_0/results.txt ./params_R_9_C_4_RP_0/results.txt ./params_R_5_C_9_RP_0/results.txt ./params_R_9_C_25_RP_0/results.txt ./params_R_7_C_16_RP_0/results.txt ./params_R_5_C_25_RP_0/results.txt ./params_R_5_C_16_RP_0/results.txt ./params_R_11_C_4_RP_0/results.txt ./params_R_9_C_16_RP_0/results.txt ./params_R_7_C_25_RP_0/results.txt ./params_R_15_C_4_RP_0/results.txt ./params_R_5_C_4_RP_0/results.txt ./params_R_9_C_9_RP_0/results.txt and I change the command as follows: find ./ -name "results.txt" | sort and the output is: ./params_R_11_C_16_RP_0/results.txt ./params_R_11_C_25_RP_0/results.txt ./params_R_11_C_4_RP_0/results.txt ./params_R_11_C_9_RP_0/results.txt ./params_R_5_C_16_RP_0/results.txt ./params_R_5_C_25_RP_0/results.txt ./params_R_5_C_4_RP_0/results.txt ./params_R_5_C_9_RP_0/results.txt ./params_R_7_C_16_RP_0/results.txt ./params_R_7_C_25_RP_0/results.txt ./params_R_7_C_4_RP_0/results.txt ./params_R_7_C_9_RP_0/results.txt ./params_R_9_C_16_RP_0/results.txt ./params_R_9_C_25_RP_0/results.txt ./params_R_9_C_4_RP_0/results.txt ./params_R_9_C_9_RP_0/results.txt But I want it output as following: ./params_R_5_C_4_RP_0/results.txt ./params_R_5_C_9_RP_0/results.txt ./params_R_5_C_16_RP_0/results.txt ./params_R_5_C_25_RP_0/results.txt ./params_R_7_C_4_RP_0/results.txt ./params_R_7_C_9_RP_0/results.txt ./params_R_7_C_16_RP_0/results.txt ./params_R_7_C_25_RP_0/results.txt ./params_R_9_C_4_RP_0/results.txt ./params_R_9_C_9_RP_0/results.txt ./params_R_9_C_16_RP_0/results.txt ./params_R_9_C_25_RP_0/results.txt ... I should let it params_R_005_C_004_RP_0 when generating the results. But it would take much time to rerun the program to get the results. So I wonder if there is any way to use the bash command to achieve this objective.

    Read the article

  • R + Bioconductor : combining probesets in an ExpressionSet

    - by Mike Dewar
    Hi, First off, this may be the wrong Forum for this question, as it's pretty darn R+Bioconductor specific. Here's what I have: library('GEOquery') GDS = getGEO('GDS785') cd4T = GDS2eSet(GDS) cd4T <- cd4T[!fData(cd4T)$symbol == "",] Now cd4T is an ExpressionSet object which wraps a big matrix with 19794 rows (probesets) and 15 columns (samples). The final line gets rid of all probesets that do not have corresponding gene symbols. Now the trouble is that most genes in this set are assigned to more than one probeset. You can see this by doing gene_symbols = factor(fData(cd4T)$Gene.symbol) length(gene_symbols)-length(levels(gene_symbols)) [1] 6897 So only 6897 of my 19794 probesets have unique probeset - gene mappings. I'd like to somehow combine the expression levels of each probeset associated with each gene. I don't care much about the actual probe id for each probe. I'd like very much to end up with an ExpressionSet containing the merged information as all of my downstream analysis is designed to work with this class. I think I can write some code that will do this by hand, and make a new expression set from scratch. However, I'm assuming this can't be a new problem and that code exists to do it, using a statistically sound method to combine the gene expression levels. I'm guessing there's a proper name for this also but my googles aren't showing up much of use. Can anyone help?

    Read the article

  • Core Data + Core Animation/CALayer together??

    - by ivanTheTerrible
    I am making an Cocoa app with custom interfaces. So far I have implemented one version of the app using CALayer doing the rendering, which has been great given the hierarchical structure of CALayers, and its [hitTest:] function for handling mouse events. In this early version, the model of the app are my custom classes. However, as the program grows I feel the urge of using Core Data for the model, not just for the ease of binding/undo management, but also want to try out the new technology. My method so far: In Core Data: creating a Block entity, with attributes xPos, yPos, width, height...etc. Then, creating a BlockView : CALayer class for drawing, which uses methods such as self.position.x = [self valueForKey:@"xPos"] to fetch the values from the model. In this case, every BlockView object has to also keep a local copy of xPos, which is NOT good. Do any of you guys have better suggestions? Edit: This app is a information visualization tool. So the positions, dimensions of the blocks are important, and should be persisted for later analysis.

    Read the article

  • How best to use XPath with very large XML files in .NET?

    - by glenatron
    I need to do some processing on fairly large XML files ( large here being potentially upwards of a gigabyte ) in C# including performing some complex xpath queries. The problem I have is that the standard way I would normally do this through the System.XML libraries likes to load the whole file into memory before it does anything with it, which can cause memory problems with files of this size. I don't need to be updating the files at all just reading them and querying the data contained in them. Some of the XPath queries are quite involved and go across several levels of parent-child type relationship - I'm not sure whether this will affect the ability to use a stream reader rather than loading the data into memory as a block. One way I can see of making it work is to perform the simple analysis using a stream-based approach and perhaps wrapping the XPath statements into XSLT transformations that I could run across the files afterward, although it seems a little convoluted. Alternately I know that there are some elements that the XPath queries will not run across, so I guess I could break the document up into a series of smaller fragments based on it's original tree structure, which could perhaps be small enough to process in memory without causing too much havoc. I've tried to explain my objective here so if I'm barking up totally the wrong tree in terms of general approach I'm sure you folks can set me right...

    Read the article

  • Is it possible that a single-threaded program is executed simultaneously on more than one CPU core?

    - by Wolfgang Plaschg
    When I run a single-threaded program that i have written on my quad core Intel i can see in the Windows Task Manager that actually all four cores of my CPU are more or less active. One core is more active than the other three, but there is also activity on those. There's no other program (besided the OS kernel of course) running that would be plausible for that activitiy. And when I close my program all activity an all cores drops down to nearly zero. All is left is a little "noise" on the cores, so I'm pretty sure all the visible activity comes directly or indirectly (like invoking system routines) from my program. Is it possible that the OS or the cores themselves try to balance some code or execution on all four cores, even it's not a multithreaded program? Do you have any links that documents this technique? Some infos to the program: It's a console app written in Qt, the Task Manager states that only one thread is running. Maybe Qt uses threads, but I don't use signals or slots, nor any GUI. Link to Task Manager screenshot: http://img97.imageshack.us/img97/6403/taskmanager.png This question is language agnostic and not tied to Qt/C++, i just want to know if Windows or Intel do to balance also single-threaded code on all cores. If they do, how does this technique work? All I can think of is, that kernel routines like reading from disk etc. is scheduled on all cores, but this won't improve performance significantly since the code still has to run synchronous to the kernel api calls. EDIT Do you know any tools to do a better analysis of single and/or multi-threaded programs than the poor Windows Task Manager?

    Read the article

  • Sweave/R - Automatically generating an appendix that contains all the model summaries/plots/data pro

    - by John Horton
    I like the idea of making research available at multiple levels of detail i.e., abstract for the casually curious, full text for the more interested, and finally the data and code for those working in the same area/trying to reproduce your results. In between the actual text and the data/code level, I'd like to insert another layer. Namely, I'd like to create a kind of automatically generated appendix that contains the full regression output, diagnostic plots, exploratory graphs data profiles etc. from the analysis, regardless of whether those plots/regressions etc. made it into the final paper. One idea I had was to write a script that would examine the .Rnw file and automatically: Profile all data sets that are loaded (sort of like the Hmisc(?) package) Summarize all regressions - i.e., run summary(model) for all models Present all plots (regardless of whether they made it in the final version) The idea is to make this kind of a low-effort, push-button sort of thing as opposed to a formal appendix written like the rest of a paper. What I'm looking for is some ideas on how to do this in R in a relatively simple way. My hunch is that there is some way of going through the namespace, figuring out what something is and then dumping into a PDF. Thoughts? Does something like this already exist?

    Read the article

  • How can I improve this design?

    - by klausbyskov
    Let's assume that our system can perform actions, and that an action requires some parameters to do its work. I have defined the following base class for all actions (simplified for your reading pleasure): public abstract class BaseBusinessAction<TActionParameters> : where TActionParameters : IActionParameters { protected BaseBusinessAction(TActionParameters actionParameters) { if (actionParameters == null) throw new ArgumentNullException("actionParameters"); this.Parameters = actionParameters; if (!ParametersAreValid()) throw new ArgumentException("Valid parameters must be supplied", "actionParameters"); } protected TActionParameters Parameters { get; private set; } protected abstract bool ParametersAreValid(); public void CommonMethod() { ... } } Only a concrete implementation of BaseBusinessAction knows how to validate that the parameters passed to it are valid, and therefore the ParametersAreValid is an abstract function. However, I want the base class constructor to enforce that the parameters passed are always valid, so I've added a call to ParametersAreValid to the constructor and I throw an exception when the function returns false. So far so good, right? Well, no. Code analysis is telling me to "not call overridable methods in constructors" which actually makes a lot of sense because when the base class's constructor is called the child class's constructor has not yet been called, and therefore the ParametersAreValid method may not have access to some critical member variable that the child class's constructor would set. So the question is this: How do I improve this design? Do I add a Func<bool, TActionParameters> parameter to the base class constructor? If I did: public class MyAction<MyParameters> { public MyAction(MyParameters actionParameters, bool something) : base(actionParameters, ValidateIt) { this.something = something; } private bool something; public static bool ValidateIt() { return something; } } This would work because ValidateIt is static, but I don't know... Is there a better way? Comments are very welcome.

    Read the article

  • Getting content from PHP: Trouble with POST and query.

    - by vgm64
    Apologies for my longest question on SO ever. I'm trying to interface with a php frontend for a mysql database in ROOT (a CERN framework in C++ for high energy physics analysis). To start off with, I tried to get this php interface to play nice with wget and curl first because I'm more familiar with them. The following command works: wget --post-data "hostname=localhost:3306&un=joeuser&pw=psswd&myquery=show_spazio_databases;" http://some.host.edu/log/log_query_matlab.php The results are: database1 database2 That's good. If I leave out the --post-data then I get the result: Warning: mysql_connect() [function.mysql-connect]: Access denied for user 'admin'@'localhost' (using password: NO) in /log/log_query_matlab.php on line 6 i'm dead! Access denied for user 'admin'@'localhost' (using password: NO) Warning: mysql_query() [function.mysql-query]: Access denied for user 'admin'@'localhost' (using password: NO) in /log/log_query_matlab.php on line 29 Warning: mysql_query() [function.mysql-query]: A link to the server could not be established in /log/log_query_matlab.php on line 29 I have access to the php script (read only), but the error itself isn't too important. What matters it that using ROOT, I use a function called as socket.SendRaw(message, message.Length()) (socket is a TSocket) and this gives me the same "error" as wget without the post data switch if my "message" is "POST http://some.host.edu/log/log_query_matlab.php?hostname=localhost:3306&un=joeuser&pw=psswd&myquery=show_spazio_databases" This may be in vain, but does someone knows a way I should format the "message" that includes something that is equivalent to the --post-data switch. Or, is there a standard way to format POST requests in a single line (I've seen multi-line stuff. Is that right?) Sorry I'm clueless! PS. The mysql query is show databases but the space has been replaced with _spazio_, Italian for space. The author of the db and php interface requires it (and various replacements for symbols), but has anyone seen this before? Trying to troubleshoot that was terrible!

    Read the article

  • MySQL table data transformation -- how can I dis-aggregate MySQL time data?

    - by lighthouse65
    We are coding for a MySQL data warehousing application that stores descriptive data (User ID, Work ID, Machine ID, Start and End Time columns in the first table below) associated with time and production quantity data (Output and Time columns in the first table below) upon which aggregate (SUM, COUNT, AVG) functions are applied. We now wish to dis-aggregate time data for another type of analysis. Our current data table design: +---------+---------+------------+---------------------+---------------------+--------+------+ | User ID | Work ID | Machine ID | Event Start Time | Event End Time | Output | Time | +---------+---------+------------+---------------------+---------------------+--------+------+ | 080025 | ABC123 | M01 | 2008-01-24 16:19:15 | 2008-01-24 16:34:45 | 2120 | 930 | +---------+---------+------------+---------------------+---------------------+--------+------+ Reprocessing dis-aggregation that we would like to do would be to transform table content based on a granularity of minutes, rather than the current production event ("Event Start Time" and "Event End Time") granularity. The resulting reprocessing of existing table rows would look like: +---------+---------+------------+---------------------+--------+ | User ID | Work ID | Machine ID | Production Minute | Output | +---------+---------+------------+---------------------+--------+ | 080025 | ABC123 | M01 | 2010-01-24 16:19 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:20 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:21 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:22 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:23 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:24 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:25 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:26 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:27 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:28 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:29 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:30 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:31 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:22 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:33 | 133 | | 080025 | ABC123 | M01 | 2010-01-24 16:34 | 133 | +---------+---------+------------+---------------------+--------+ So the reprocessing would take an existing row of data created at the granularity of production event and modify the granularity to minutes, eliminating redundant (Event End Time, Time) columns while doing so. It assumes a constant rate of production and divides output by the difference in minutes plus one to populate the new table's Output column. I know this can be done in code...but can it be done entirely in a MySQL insert statement (or otherwise entirely in MySQL)? I am thinking of a INSERT ... INTO construction but keep getting stuck. An additional complexity is that there are hundreds of machines to include in the operation so there will be multiple rows (one for each machine) for each minute of the day. Any ideas would be much appreciated. Thanks.

    Read the article

  • Why use Django on Google App Engine?

    - by Travis Bradshaw
    When researching Google App Engine (GAE), it's clear that using Django is wildly popular for developing in Python on GAE. I've been scouring the web to find information on the costs and benefits of using Django, to find out why it's so popular. While I've been able to find a wide variety of sources on how to run Django on GAE and the various methods of doing so, I haven't found any comparative analysis on why Django is preferable to using the webapp framework provided by Google. To be clear, it's immediately apparent why using Django on GAE is useful for developers with an existing skillset in Django (a majority of Python web developers, no doubt) or existing code in Django (where using GAE is more of a porting exercise). My team, however, is evaluating GAE for use on an all-new project and our existing experience is with TurboGears, not Django. It's been quite difficult to determine why Django is beneficial to a development team when the BigTable libraries have replaced Django's ORM, sessions and authentication are necessarily changed, and Django's templating (if desirable) is available without using the entire Django stack. Finally, it's clear that using Django does have the advantage of providing an "exit strategy" if we later wanted to move away from GAE and need a platform to target for the exodus. I'd be extremely appreciative for help in pointing out why using Django is better than using webapp on GAE. I'm also completely inexperienced with Django, so elaboration on smaller features and/or conveniences that work on GAE are also valuable to me. Thanks in advance for your time!

    Read the article

  • Pass enum value to method which is called by dynamic object

    - by user329588
    hello. I'm working on program which dynamically(in runtime) loads dlls. For an example: Microsoft.AnalysisServices.dll. In this dll we have this enum: namespace Microsoft.AnalysisServices { [Flags] public enum UpdateOptions { Default = 0, ExpandFull = 1, AlterDependents = 2, } } and we also have this class Cube: namespace Microsoft.AnalysisServices { public sealed class Cube : ... { public Cube(string name); public Cube(string name, string id); .. .. .. } } I dynamically load this dll and create object Cube. Than i call a method Cube.Update(). This method deploy Cube to SQL Analysis server. But if i want to call this method with parameters Cube.Update(UpdateOptions.ExpandFull) i get error, because method doesn't get appropriate parameter. I have already tried this, but doesn't work: dynamic updateOptions = AssemblyLoader.LoadStaticAssembly("Microsoft.AnalysisServices", "Microsoft.AnalysisServices.UpdateOptions");//my class for loading assembly Array s = Enum.GetNames(updateOptions); dynamic myEnumValue = s.GetValue(1);//1 = ExpandFull dynamicCube.Update(myEnumValue);// == Cube.Update(UpdateOptions.ExpandFull) I know that error is in parameter myEnumValue but i don't know how to get dynamically enum type from assembly and pass it to the method. Does anybody know the solution? Thank you very much for answers and help!

    Read the article

  • Reading email address from contacts fails with weird memory issue

    - by CapsicumDreams
    Hi all, I'm stumped. I'm trying to get a list of all the email address a person has. I'm using the ABPeoplePickerNavigationController to select the person, which all seems fine. I'm setting my ABRecordRef personDealingWith; from the person argument to - (BOOL)peoplePickerNavigationController:(ABPeoplePickerNavigationController *)peoplePicker shouldContinueAfterSelectingPerson:(ABRecordRef)person property:(ABPropertyID)property identifier:(ABMultiValueIdentifier)identifier { and everything seems fine up till this point. The first time the following code executes, all is well. When subsequently run, I can get issues. First, the code: // following line seems to make the difference (issue 1) // NSLog(@"%d", ABMultiValueGetCount(ABRecordCopyValue(personDealingWith, kABPersonEmailProperty))); // construct array of emails ABMultiValueRef multi = ABRecordCopyValue(personDealingWith, kABPersonEmailProperty); CFIndex emailCount = ABMultiValueGetCount(multi); if (emailCount 0) { // collect all emails in array for (CFIndex i = 0; i < emailCount; i++) { CFStringRef emailRef = ABMultiValueCopyValueAtIndex(multi, i); [emailArray addObject:(NSString *)emailRef]; CFRelease(emailRef); } } // following line also matters (issue 2) CFRelease(multi); If compiled as written, the are no errors or static analysis problems. This crashes with a *** -[Not A Type retain]: message sent to deallocated instance 0x4e9dc60 error. But wait, there's more! I can fix it in either of two ways. Firstly, I can uncomment the NSLog at the top of the function. I get a leak from the NSLog's ABRecordCopyValue every time through, but the code seems to run fine. Also, I can comment out the CFRelease(multi); at the end, which does exactly the same thing. Static compilation errors, but running code. So without a leak, this function crashes. To prevent a crash, I need to haemorrhage memory. Neither is a great solution. Can anyone point out what's going on?

    Read the article

  • Which book should I choose?

    - by sebastianlarsson
    Hi guys, I'm looking for a good read on object oriented design. The two books I'm currently looking Head First Design Patterns and Head First Object object-oriented analysis & design. They seem very similar when looking at the contents and browsing through available sample text. Which one would be the best choice? About myself: I have a bachelor in computer science and I am currently studying Msc. Software Quality Engineering (read Software Engineering with focus on Quality). I am already confident in object-oriented design and have a lot of programming courses in my backpack. I have done games in c++, courses in advanced java programming (I am SCJP certified), but my preferred language is C#. I have also worked with Java for the last 7 months while studying. I am currently also studying for certificates in C# (apart from my usual studies). So I believe I have the prerequisites of actually understanding the contents of both books. Reason: I just want to be better and keep evolving as a programmer. I think it is fun. I believe Bert Bates and Kathy Sierra are involved in both these books and I have previously read their SCJP preparation book in java. I really do enjoy their style of writing. Other books which I am considering are: Clean Code: A Handbook Of Agile Software Craftsmanship Thx in advance Sebastian

    Read the article

  • Hardware acceleration issue with WPF application - Analyzing crash dumps

    - by Appu
    I have a WPF application crashing on the client machine. My initial analysis shows that this is because of H/W acceleration and disabling H/W acceleration at a registry level solves the issue. Now I have to make sure that this is caused because of H/W acceleration. I have the crash dumps available which has a stack trace, 9c52b020 80636ac7 9c52b0bc e1174818 9c52b0c0 nt!_SEH_prolog+0x1a 9c52b038 806379d6 e10378a4 9c52b0bc e1174818 nt!CmpQuerySecurityDescriptorInfo+0x23 9c52b084 805bfe5b e714b160 00000001 9c52b0bc nt!CmpSecurityMethod+0xce 9c52b0c4 805c01c8 e714b160 9c52b0f0 e714b15c nt!ObpGetObjectSecurity+0x99 9c52b0f4 8062f28f e714b160 8617f008 00000001 nt!ObCheckObjectAccess+0x2c 9c52b140 8062ff30 e1038008 0066a710 cde2b714 nt!CmpDoOpen+0x2d5 9c52b340 805bf488 0066a710 0066a710 8617f008 nt!CmpParseKey+0x5a6 9c52b3b8 805bba14 00000000 9c52b3f8 00000240 nt!ObpLookupObjectName+0x53c 9c52b40c 80625696 00000000 8acad448 00000000 nt!ObOpenObjectByName+0xea 9c52b508 8054167c 9c52b828 82000000 9c52b5ac nt!NtOpenKey+0x1c8 9c52b508 80500699 9c52b828 82000000 9c52b5ac nt!KiFastCallEntry+0xfc 9c52b58c 805e701e 9c52b828 82000000 9c52b5ac nt!ZwOpenKey+0x11 9c52b7fc 805e712a 00000002 805e70a0 00000000 nt!RtlpGetRegistryHandleAndPath+0x27a 9c52b844 805e73e3 9c52b864 00000014 9c52bbb8 nt!RtlpQueryRegistryGetBlockPolicy+0x2e 9c52b86c 805e79eb 00000003 e8af79dc 00000014 nt!RtlpQueryRegistryDirect+0x4b 9c52b8bc 805e7f10 e8af79dc 00000003 9c52b948 nt!RtlpCallQueryRegistryRoutine+0x369 9c52bb58 b8f5bca4 00000005 e6024b30 9c52bbb8 nt!RtlQueryRegistryValues+0x482 WARNING: Stack unwind information not available. Following frames may be wrong. 9c52bc00 b8f20a5b 00000005 85f4204c 85f4214c igxpmp32+0x44ca4 9c52c280 b8f1cc7b 890bd358 9c52c2b0 00000000 igxpmp32+0x9a5b 9c52c294 b8f11729 890bd358 9c52c2b0 00000a0c igxpmp32+0x5c7b 9c52c358 804ef19f 890bd040 86d2dad0 0000080c VIDEOPRT!pVideoPortDispatch+0xabf 9c52c368 bf85e8c2 9c52c610 bef6ce84 00000014 nt!IopfCallDriver+0x31 9c52c398 bf85e93c 890bd040 00232150 9c52c3f8 win32k!GreDeviceIoControl+0x93 9c52c3bc bebafc7b 890bd040 00232150 9c52c3f8 win32k!EngDeviceIoControl+0x1f 9c52d624 bebf3fa9 890bd040 bef2a28c bef2a284 igxpdx32+0x8c7b 9c52d6a0 8054167c 9c52da28 b915d000 9c52d744 igxpdx32+0x4cfa9 9c52d6a0 00000000 9c52da28 b915d000 9c52d744 nt!KiFastCallEntry+0xfc How do I ensure that crash is caused by H/W acceleration issue by looking at the above data? I am guessing VIDEOPRT!pVideoPortDispatch+0xabf indicates some error with the rendering. Is that correct? I am using WinDebug to view the crash dump.

    Read the article

  • When optimizing database queries, what exactly is the relationship between number of queries and siz

    - by williamjones
    To optimize application speed, everyone always advises to minimize the number of queries an application makes to the database, consolidating them into fewer queries that retrieve more wherever possible. However, this also always comes with the caution that data transferred is still data transferred, and just because you are making fewer queries doesn't make the data transferred free. I'm in a situation where I can over-include on the query in order to cut down the number of queries, and simply remove the unwanted data in the application code. Is there any type of a rule of thumb on how much of a cost there is to each query, to know when to optimize number of queries versus size of queries? I've tried to Google for objective performance analysis data, but surprisingly haven't been able to find anything like that. Clearly this relationship will change for factors such as when the database grows in size, making this somewhat individualized, but surely this is not so individualized that a broad sense of the landscape can't be drawn out? I'm looking for general answers, but for what it's worth, I'm running an application on Heroku.com, which means Ruby on Rails with a Postgres database.

    Read the article

  • Where is my python script spending time? Is there "missing time" in my cprofile / pstats trace?

    - by fmark
    I am attempting to profile a long running python script. The script does some spatial analysis on raster GIS data set using the gdal module. The script currently uses three files, the main script which loops over the raster pixels called find_pixel_pairs.py, a simple cache in lrucache.py and some misc classes in utils.py. I have profiled the code on a moderate sized dataset. pstats returns: p.sort_stats('cumulative').print_stats(20) Thu May 6 19:16:50 2010 phes.profile 355483738 function calls in 11644.421 CPU seconds Ordered by: cumulative time List reduced from 86 to 20 due to restriction <20> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.008 0.008 11644.421 11644.421 <string>:1(<module>) 1 11064.926 11064.926 11644.413 11644.413 find_pixel_pairs.py:49(phes) 340135349 544.143 0.000 572.481 0.000 utils.py:173(extent_iterator) 8831020 18.492 0.000 18.492 0.000 {range} 231922 3.414 0.000 8.128 0.000 utils.py:152(get_block_in_bands) 142739 1.303 0.000 4.173 0.000 utils.py:97(search_extent_rect) 745181 1.936 0.000 2.500 0.000 find_pixel_pairs.py:40(is_no_data) 285478 1.801 0.000 2.271 0.000 utils.py:98(intify) 231922 1.198 0.000 2.013 0.000 utils.py:116(block_to_pixel_extent) 695766 1.990 0.000 1.990 0.000 lrucache.py:42(get) 1213166 1.265 0.000 1.265 0.000 {min} 1031737 1.034 0.000 1.034 0.000 {isinstance} 142740 0.563 0.000 0.909 0.000 utils.py:122(find_block_extent) 463844 0.611 0.000 0.611 0.000 utils.py:112(block_to_pixel_coord) 745274 0.565 0.000 0.565 0.000 {method 'append' of 'list' objects} 285478 0.346 0.000 0.346 0.000 {max} 285480 0.346 0.000 0.346 0.000 utils.py:109(pixel_coord_to_block_coord) 324 0.002 0.000 0.188 0.001 utils.py:27(__init__) 324 0.016 0.000 0.186 0.001 gdal.py:848(ReadAsArray) 1 0.000 0.000 0.160 0.160 utils.py:50(__init__) The top two calls contain the main loop - the entire analyis. The remaining calls sum to less than 625 of the 11644 seconds. Where are the remaining 11,000 seconds spent? Is it all within the main loop of find_pixel_pairs.py? If so, can I find out which lines of code are taking most of the time?

    Read the article

< Previous Page | 84 85 86 87 88 89 90 91 92 93 94 95  | Next Page >