Search Results

Search found 11524 results on 461 pages for 'pc wiki'.

Page 420/461 | < Previous Page | 416 417 418 419 420 421 422 423 424 425 426 427  | Next Page >

  • Is it possible to reliably auto-decode user files to Unicode? [C#]

    - by NVRAM
    I have a web application that allows users to upload their content for processing. The processing engine expects UTF8 (and I'm composing XML from multiple users' files), so I need to ensure that I can properly decode the uploaded files. Since I'd be surprised if any of my users knew their files even were encoded, I have very little hope they'd be able to correctly specify the encoding (decoder) to use. And so, my application is left with task of detecting before decoding. This seems like such a universal problem, I'm surprised not to find either a framework capability or general recipe for the solution. Can it be I'm not searching with meaningful search terms? I've implemented BOM-aware detection (http://en.wikipedia.org/wiki/Byte_order_mark) but I'm not sure how often files will be uploaded w/o a BOM to indicate encoding, and this isn't useful for most non-UTF files. My questions boil down to: Is BOM-aware detection sufficient for the vast majority of files? In the case where BOM-detection fails, is it possible to try different decoders and determine if they are "valid"? (My attempts indicate the answer is "no.") Under what circumstances will a "valid" file fail with the C# encoder/decoder framework? Is there a repository anywhere that has a multitude of files with various encodings to use for testing? While I'm specifically asking about C#/.NET, I'd like to know the answer for Java, Python and other languages for the next time I have to do this. So far I've found: A "valid" UTF-16 file with Ctrl-S characters has caused encoding to UTF-8 to throw an exception (Illegal character?) (That was an XML encoding exception.) Decoding a valid UTF-16 file with UTF-8 succeeds but gives text with null characters. Huh? Currently, I only expect UTF-8, UTF-16 and probably ISO-8859-1 files, but I want the solution to be extensible if possible. My existing set of input files isn't nearly broad enough to uncover all the problems that will occur with live files. Although the files I'm trying to decode are "text" I think they are often created w/methods that leave garbage characters in the files. Hence "valid" files may not be "pure". Oh joy. Thanks.

    Read the article

  • Keeping track of business rules within IT department?

    - by evaldas-alexander
    I am looking for the best way to keep track of the business rules for both developers and everybody else (support staff / management) in a startup enviroment. The challenge is that our business model requires quite a lot of different business rules, which are created pretty much on the fly and evolving organically after that. After running this project for 3+ years, we have so many of such rules that often the only way to be sure about what the application is supposed to do in a certain situation is to go find the module responsible for that process and analyze its code and comments. That is all fine as long as you have one single developer who created the entire application from the scratch, but every new developer needs to go over pretty much entire codebase in order to understand how the application works. Even bigger problem is that non technical employees don't even have that option and therefore are forced to ask me pretty much every day how some certain case would be handled by the application. Quick example - we only start charging for our customer campaigns once they have been active for at least 72 hours, but at the same time we stop creating invoices for campaigns that belong to insolvent accounts and close such accounts within a month of the first failed charge. That does not apply to accounts that are set to "non-chargeable" which most commonly belongs to us since we are using the service ourselves. The invoices are created on the 1st of each month and include charges from the previous month + any current balance that the account might have. However, some customers are charged only 4 days after their invoice has been generated due to issues with their billing department. In addition to that, invoices are also created when customer deactivates his campaign, but that can only be done once the campaign is not longer under mandatory 6 month contract, unless account manager approves early deactivation. I know, that's quite a lot of rules that need to be taken into account when answering a question "when do we bill our customers", but actually I could still append an asterisk at the end of each sentence in order to disclose some rare exceptions. Of course, it would be easiest just to keep the business rules to the minimum, but we need to adapt to changing marketplace - i.e. less than a year ago we had no contracts whatsoever. One idea that I had so far was a simplistic wiki with categories corresponding to areas such as "Account activation", "Invoicing", "Collection procedures" and so on. Another idea would be to have giant interactive flowchart showing the entire customer "life cycle" from prospecting to account deactivation. What are your experiences / suggestions?

    Read the article

  • DCVS + hosting for a startup commercial multiplatform phone app

    - by AG
    I'm in lean startup mode, working on a simple phone app that will be published initially as a iThingy app and an Android app with, possibly, Blackberry and Symbian versions to follow. I'm about to go from no repository to needing a central repository that up to 4 very part-time resources will be sharing. Two of us have no version control background, one has used Subversion, and I've used most of the major centralized VCS systems. I'm not going to be pushing the technical limitations of any VCS for a long time; I'm sure that any of the major systems would work fine. And the hosting accounts I've looked at seem reasonable. So I'm really focussed on minimizing the downside risks. That is, I'd like to find a stable setup that is easy to learn in general, easy to use from Windows/Eclipse, and won't paint me into any obvious corners for the next 12 months or so. A quick search of the web had led me to consider the following pairs of DVCS and hosting service, with what I think I'm hearing as their strengths and weaknesses (for my purposes): Bazaar/Launchpad -- My initial choice since I need to get more familiar with this pair for the Google Summer of Code mentoring I'm doing. But, whatever the technical merits, a non-starter for me because they are purely open source, no private repositories plans to purchase that I can see. Git/GitHub -- Git: Fast, light, ultimately flexible, but relatively less Windows friendly, Eclipse plugin (eGit) available but relatively young, GitHub: widely used, pricing is fine Mercurial/BitBucket -- Mercurial: a little less flexible, a little more Windows friendly, Eclipse plugin seems a bit more mature, BitBucket: widely used, pricing is fine, includes a wiki and an issue tracker that we might be able to use instead of something like BaseCamp, at least for a while. Mercurial/BitBucket seem like the winning pair so far for my particular situation; at least two of us are definitely going to be working mostly from Eclipse on Windows and reducing my own learning curve is a priority. ;-) But I have two specific questions: 1) Am I wrong about Bazaar/Launchpad and is there a viable, secure way to use them for proprietary code? 2) Any reason to think that the Mecurial/Bitbucket pair will end up being a headache for my Mac developer, soon, or for Blackberry or Symbian developers a little later? ag

    Read the article

  • Increase efficiency for an R simulator of the Monty Hall Puzzle

    - by jahan_m
    The Monty Hall Problem is a simple puzzle involving probability that even stumps professionals in careers dealing with some heavy-duty math. Here's the basic problem: Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice? You can find numerous explanations of the solution here: http://en.wikipedia.org/wiki/Monty_Hall_problem Goal of my simulation: Prove that a switching strategy will win you the car 2/3 of the time. I got curious and wanted to write a little function that simulates the problem many times and returns the proportion of wins if you switched and the proportion of wins if you stayed with your first choice. The function then plots the cumulative wins. First and foremost, I'm interested in hearing if my simulation is indeed replicating the Monty Problem, or if some aspect of the code got it wrong. Secondly, this function takes a long time to run once I get to about 10,000 simulations. I know I don't need this many simulations to prove this but I'd love to hear some ideas on how to make it more efficient. Thanks for your feedback! Monty_Hall=function(repetitions){ doors=c('A','B','C') stay_wins=0 switch_wins=0 series=data.frame(sim_num=seq(repetitions),cum_sum_stay=replicate(repetitions,0),cum_sum_switch=replicate(repetitions,0)) for(i in seq(repetitions)){ winning_door=sample(doors,1) contestant_chooses=sample(doors,1) if(contestant_chooses==winning_door) stay_wins=stay_wins+1 else switch_wins=switch_wins+1 series[i,'cum_sum_stay']=stay_wins series[i,'cum_sum_switch']=switch_wins } plot(series$sim_num,series$cum_sum_switch,col=2,ylab='Cumulative # of wins', xlab='Simulation #',main=sprintf('%d Simulations of the Monty Hall Paradox',repetitions),type='l') lines(series$sim_num,series$cum_sum_stay,col=4) legend('topleft',legend=c('Cumulative wins from switching', 'Cumulative wins from staying'),col=c(2,4),lty=1) result=list(series=series,stay_wins=stay_wins,switch_wins=switch_wins, proportion_stay_wins=stay_wins/repetitions, proportion_switch_wins=switch_wins/repetitions) return(result) } #Theory predicts that it is to the contestant's advantage if he #switches his choice to the other door. This function simulates the game #many times, and shows you the proportion of games in which staying or #switching would win the car. It also plots the cumulative wins for each strategy. Monty_Hall(100)

    Read the article

  • 8bpp Bitmap format on the Compact Framework

    - by Kieran
    Hi friends. I am messing around with Conway's Game of Life - http://en.wikipedia.org/wiki/Conway's_Game_of_Life I started out coding algorithmns for winforms and now want to port my work onto windows mobile 6.1 (compact framework). I came across an article by Jon Skeet where he compared several different algorithmns for calculating next generations in the game. He used an array of bytes to store a cells state (alive or dead) and then he would copy this array to an 8bpp bitmap. For each new generation, he works out the state of each byte, then copies the array to a bitmap, then draws that bitmap to a picturebox. void CreateInitialImage() { bitmap = new Bitmap(Width, Height, PixelFormat.Format8bppIndexed); ColorPalette palette = bitmap.Palette; palette.Entries[0] = Color.Black; palette.Entries[1] = Color.White; bitmap.Palette = palette; } public Image Render() { Rectangle rect = new Rectangle(0, 0, Width, Height); BitmapData bmpData = bitmap.LockBits(rect, ImageLockMode.ReadWrite, bitmap.PixelFormat); Marshal.Copy(Data, 0, bmpData.Scan0, Data.Length); bitmap.UnlockBits(bmpData); return bitmap; } His code above is beautifully simple and very fast to render. Jon is using Windows Forms but now I want to port my own version of this onto Windows Mobile 6.1 (Compact Framework) but . . . .there is no way to format a bitmap to 8bpp in the cf. Can anyone suggest a way of rendering an array of bytes to a drawable image in the CF. This array is created in code on the fly (it is NOT loaded from an image file on disk). I basically need to store an array of cells represented by bytes, they are either alive or dead and I then need to draw that array as an image. The game is particularly slow on the CF so I need to implement clever optimised algoritmns but also need to render as fast as possible and the above solution would be pretty dam perfect if only it was available on the compact framework. Many thanks for any help Any suggestions?

    Read the article

  • HLSL tex2d sampler seemingly using inconsistent rounding; why?

    - by RJFalconer
    Hello all, I have code that needs to render regions of my object differently depending on their location. I am trying to use a colour map to define these regions. The problem is when I sample from my colour map, I get collisions. Ie, two regions with different colours in the colourmap get the same value returned from the sampler. I've tried various formats of my colour map. I set the colours for each region to be "5" apart in each case; Indexed colour RGB, RGBA: region 1 will have RGB 5%,5%,5%. region 2 will have RGB 10%,10%,10% and so on. HSV Greyscale: region 1 will have HSV 0,0,5%. region 2 will have HSV 0,0,10% and so on. (Values selected in The Gimp) The tex2D sampler returns a value [0..1]. [ I then intend to derive an int array index from region. Code to do with that is unrelated, so has been removed from the question ] float region = tex2D(gColourmapSampler,In.UV).x; Sampling the "5%" colour gave a "region" of 0.05098 in hlsl. From this I assume the 5% represents 5/100*255, or 12.75, which is rounded to 13 when stored in the texture OR when sampled by the sampler; can't tell which. (Reasoning: 0.05098 * 255 ~= 13) By this logic, the 50% should be stored as 127.5. Sampled, I get 0.50196 which implies it was stored as 128. the 70% should be stored as 178.5. Sampled, I get 0.698039, which implies it was stored as 178. What rounding is going on here? (127.5 becomes 128, 178.5 becomes 178 ?!) Edit: OK, http://en.wikipedia.org/wiki/Bankers_rounding#Round_half_to_even Apparently this is "banker's rounding". Is this really what HLSL samplers use? I am using Shader Model 2 and FX Composer. This is my sampler declaration; //Colour map texture gColourmapTexture < string ResourceName = "Globe_Colourmap_Regions_Greyscale.png"; string ResourceType = "2D"; >; sampler2D gColourmapSampler : register(s1) = sampler_state { Texture = <gColourmapTexture>; #if DIRECT3D_VERSION >= 0xa00 Filter = MIN_MAG_MIP_LINEAR; #else /* DIRECT3D_VERSION < 0xa00 */ MinFilter = Linear; MipFilter = Linear; MagFilter = Linear; #endif /* DIRECT3D_VERSION */ AddressU = Clamp; AddressV = Clamp; };

    Read the article

  • vim plugin to show current Perl subroutine

    - by Andrew
    I'm trying to make a vim plugin that will split the window on load and simulate a info bar at the top of my terminal. I've got it sorta working but I think I've either reached limits of my knowledge of vim syntax or there's a logic problem in my code. The desired effect would be to do a reverse search for any declaration of a Perl subroutine form my current location in the active buffer and display the line in the top buffer. I'm also trying to make it skip that buffer when I switch buffers with <C-R>. My attempt at that so far can be seen in the mess of nested if statements. Anyway, here's the code. I would greatly appreciate feedback from anyone. (pastebin pastebin.com/8cuMPn1Q) let s:current_function_bufname = 'Current\ Function\/Subroutine' function! s:get_current_function_name(no_echo) let lnum = line(".") let col = col(".") if a:no_echo let s:current_function_name = getline(search("^[^s]sub .$", 'bW')) else echohl ModeMsg echo getline(search("^[^s]sub .$", 'bW')) "echo getline(search("^[^ \t#/]\{2}.[^:]\s$", 'bW')) echohl None endif endfunction let s:previous_winbufnr = 1 let s:current_function_name = '' let s:current_function_buffer_created = 0 let s:current_function_bufnr = 2 function! s:show_current_function() let total_buffers = winnr('$') let current_winbufnr = winnr() if s:previous_winbufnr != current_winbufnr if bufname(current_winbufnr) == s:current_function_bufname if s:previous_winbufnr < current_winbufnr let i = current_winbufnr + 1 if i total_buffers let i = 1 endif if i == s:current_function_bufnr let i = i + 1 endif if i total buffers let i = 1 endif exec i.'wincmd w' else let i = current_winbufnr - 1 if i < 1 let i = total_buffers endif if i == s:current_function_bufnr let i = i - 1 endif if i < 1 let i = total_buffers endif try exec i.'wincmd w' finally exec total_buffers.'wincmd w' endtry endif endif let s:previous_winbufnr = current_winbufnr return 1 endif if s:current_function_buffer_created == 0 exec 'top 1 split '.s:current_function_bufname call s:set_as_scratch_buffer() let s:current_function_buffer_created = 1 let s:current_function_bufnr = winnr() endif call s:activate_buffer_by_name(s:current_function_bufname) setlocal modifiable call s:get_current_function_name(1) call setline(1, s:current_function_name) setlocal nomodifiable call s:activate_buffer_by_name(bufname(current_winbufnr)) endfunction function! s:set_as_scratch_buffer() setlocal noswapfile setlocal nomodifiable setlocal bufhidden=delete setlocal buftype=nofile setlocal nobuflisted setlocal nonumber setlocal nowrap setlocal cursorline endfunction function! s:activate_buffer_by_name(name) for i in range(1, winnr('$')) let name = bufname(winbufnr(i)) let full_name = fnamemodify(bufname(winbufnr(i)), ':p') if name == a:name || full_name == a:name exec i.'wincmd w' return 1 endif endfor return 0 endfunction set laststatus=2 autocmd! CursorMoved,CursorMovedI,BufWinEnter * call s:show_current_function() (pastebin pastebin.com/8cuMPn1Q) similar to VIM: display custom reference bar on top of window and http://vim.wikia.com/wiki/Show_current_function_name_in_C_programs

    Read the article

  • File Output using Gforth

    - by sheepez
    As a first project I have been writing a short program to render the Mandelbrot fractal. I have got to the point of trying to output my results to a file ( e.g. .bmp or .ppm ) and got stuck. I have not really found any examples of exactly what I am trying to do, but I have found two examples of code to copy from one file to another. The examples in the Gforth documentation ( Section 3.27 ) did not work for me ( winXP ) in fact they seemed to open and create files but not write to files properly. This is the Gforth documentation example that copies the contents of one file to another: 0 Value fd-in 0 Value fd-out : open-input ( addr u -- ) r/o open-file throw to fd-in ; : open-output ( addr u -- ) w/o create-file throw to fd-out ; s" foo.in" open-input s" foo.out" open-output : copy-file ( -- ) begin line-buffer max-line fd-in read-line throw while line-buffer swap fd-out write-line throw repeat ; I found this example ( http://rosettacode.org/wiki/File_IO#Forth ) which does work. The main problem is that I can't isolate the part that writes to a file and have it still work. The main confusion is that r doesn't seem to consume TOS as I might expect. : copy-file2 ( a1 n1 a2 n2 -- ) r/o open-file throw >r w/o create-file throw r> begin pad maxstring 2 pick read-file throw ?dup while pad swap 3 pick write-file throw repeat close-file throw close-file throw ; \ Invoke it like this: s" output.txt" s" input.txt" copy-file I would be very grateful if someone could explain exactly how the open, create read and write -file words actually work, as my investigation keeps resulting in somewhat bizarre stacks. Any clues as to why the Gforth examples do not work might help too. In summary, I want to output from Gforth to a file and so far have been thwarted. Can anyone offer any help? Thank you Vijay, I think that I understand the example that you gave. However when I try to use something like this ( which I think is similar ): 0 value test-file : write-test s" testfile.out" w/o create-file throw to test-file s" test text" test-file write-line ; I get ok but nothing is put into the file, have I made a mistake? It seems that the problem was due to not flushing the relevant buffers or explicitly closing the file. Adding something like test-file flush-file throw or test-file close-file throw between write-line and ; makes it work. Thanks again Vijay for helping.

    Read the article

  • Infinite loop when adding a row to a list in a class in python3

    - by Margaret
    I have a script which contains two classes. (I'm obviously deleting a lot of stuff that I don't believe is relevant to the error I'm dealing with.) The eventual task is to create a decision tree, as I mentioned in this question. Unfortunately, I'm getting an infinite loop, and I'm having difficulty identifying why. I've identified the line of code that's going haywire, but I would have thought the iterator and the list I'm adding to would be different objects. Is there some side effect of list's .append functionality that I'm not aware of? Or am I making some other blindingly obvious mistake? class Dataset: individuals = [] #Becomes a list of dictionaries, in which each dictionary is a row from the CSV with the headers as keys def field_set(self): #Returns a list of the fields in individuals[] that can be used to split the data (i.e. have more than one value amongst the individuals def classified(self, predicted_value): #Returns True if all the individuals have the same value for predicted_value def fields_exhausted(self, predicted_value): #Returns True if all the individuals are identical except for predicted_value def lowest_entropy_value(self, predicted_value): #Returns the field that will reduce <a href="http://en.wikipedia.org/wiki/Entropy_%28information_theory%29">entropy</a> the most def __init__(self, individuals=[]): and class Node: ds = Dataset() #The data that is associated with this Node links = [] #List of Nodes, the offspring Nodes of this node level = 0 #Tree depth of this Node split_value = '' #Field used to split out this Node from the parent node node_value = '' #Value used to split out this Node from the parent Node def split_dataset(self, split_value): fields = [] #List of options for split_value amongst the individuals datasets = {} #Dictionary of Datasets, each one with a value from fields[] as its key for field in self.ds.field_set()[split_value]: #Populates the keys of fields[] fields.append(field) datasets[field] = Dataset() for i in self.ds.individuals: #Adds individuals to the datasets.dataset that matches their result for split_value datasets[i[split_value]].individuals.append(i) #<---Causes an infinite loop on the second hit for field in fields: #Creates subnodes from each of the datasets.Dataset options self.add_subnode(datasets[field],split_value,field) def add_subnode(self, dataset, split_value='', node_value=''): def __init__(self, level, dataset=Dataset()): My initialisation code is currently: if __name__ == '__main__': filename = (sys.argv[1]) #Takes in a CSV file predicted_value = "# class" #Identifies the field from the CSV file that should be predicted base_dataset = parse_csv(filename) #Turns the CSV file into a list of lists parsed_dataset = individual_list(base_dataset) #Turns the list of lists into a list of dictionaries root = Node(0, Dataset(parsed_dataset)) #Creates a root node, passing it the full dataset root.split_dataset(root.ds.lowest_entropy_value(predicted_value)) #Performs the first split, creating multiple subnodes n = root.links[0] n.split_dataset(n.ds.lowest_entropy_value(predicted_value)) #Attempts to split the first subnode.

    Read the article

  • Why my json_encode get corrupted

    - by Cullen SUN
    $model = new XUploadForm; $model->file = CUploadedFile::getInstance( $model, 'file' ); //We check that the file was successfully uploaded if( $model->file !== null ) { //Grab some data $model->mime_type = $model->file->getType( ); $model->size = $model->file->getSize( ); $model->name = $model->file->getName( ); $file_extention = $model->file->getExtensionName( ); //(optional) Generate a random name for our file $file_tem_name = md5(Yii::app( )->user->id.microtime( ).$model->name); $file_thumb_name = $file_tem_name.'_thumb.'.$file_extention; $file_image_name = $file_tem_name.".".$file_extention; if( $model->validate( ) ) { //Move our file to our temporary dir $model->file->saveAs( $path.$file_image_name ); if(chmod($path.$file_image_name, 0777 )){ // Yii::import("ext.EPhpThumb.EPhpThumb"); // $thumb_=new EPhpThumb(); // $thumb_->init(); // $thumb_->create($path.$file_image_name) // ->resize(110,80) // ->save($path.$file_thumb_name); } //here you can also generate the image versions you need //using something like PHPThumb //Now we need to save this path to the user's session if( Yii::app( )->user->hasState( 'images' ) ) { $userImages = Yii::app( )->user->getState( 'images' ); } else { $userImages = array(); } $userImages[] = array( "filename" => $file_image_name, 'size' => $model->size, 'mime' => $model->mime_type, "path" => $path.$file_image_name, // "thumb" => $path.$file_thumb_name, ); Yii::app( )->user->setState('images', $userImages); //Now we need to tell our widget that the upload was succesfull //We do so, using the json structure defined in // https://github.com/blueimp/jQuery-File-Upload/wiki/Setup echo json_encode( array( array( "type" => $model->mime_type, "size" => $model->size, "url" => $publicPath.$file_image_name, //"thumbnail_url" => $publicPath.$file_thumb_name, //"thumbnail_url" => $publicPath."thumbs/$filename", "delete_url" => $this->createUrl( "upload", array( "_method" => "delete", "file" => $file_image_name ) ), "delete_type" => "POST" ) ) ); Above code give me correct response, [{"type":"image/jpeg","size":2266,"url":"/uploads/tmp/0b00cbaee07c6410241428c74aae1dca.jpeg","delete_url":"/api/imageUpload/upload?_method=delete&file=0b00cbaee07c6410241428c74aae1dca.jpeg","delete_type":"POST"}] but if I uncomment the following // Yii::import("ext.EPhpThumb.EPhpThumb"); // $thumb_=new EPhpThumb(); // $thumb_->init(); // $thumb_->create($path.$file_image_name) // ->resize(110,80) // ->save($path.$file_thumb_name); it gave me corrupted response: Mac OS X 2??ATTR?dA??Y?Ycom.apple.quarantine0001;50655994;Google\x20Chrome.app;2599ECF9-69C5-4386-B3D9-9F5CC7E0EE1D|com.google.ChromeThis resource fork intentionally left blank ??[{"type":"image/jpeg","size":1941,"url":"/uploads/tmp/409c5921c6d20944e1a81f32b12fc380.jpeg","delete_url":"/api/imageUpload/upload?_method=delete&file=409c5921c6d20944e1a81f32b12fc380.jpeg","delete_type":"POST"}]

    Read the article

  • Persistent (purely functional) Red-Black trees on disk performance

    - by Waneck
    I'm studying the best data structures to implement a simple open-source object temporal database, and currently I'm very fond of using Persistent Red-Black trees to do it. My main reasons for using persistent data structures is first of all to minimize the use of locks, so the database can be as parallel as possible. Also it will be easier to implement ACID transactions and even being able to abstract the database to work in parallel on a cluster of some kind. The great thing of this approach is that it makes possible implementing temporal databases almost for free. And this is something quite nice to have, specially for web and for data analysis (e.g. trends). All of this is very cool, but I'm a little suspicious about the overall performance of using a persistent data structure on disk. Even though there are some very fast disks available today, and all writes can be done asynchronously, so a response is always immediate, I don't want to build all application under a false premise, only to realize it isn't really a good way to do it. Here's my line of thought: - Since all writes are done asynchronously, and using a persistent data structure will enable not to invalidate the previous - and currently valid - structure, the write time isn't really a bottleneck. - There are some literature on structures like this that are exactly for disk usage. But it seems to me that these techniques will add more read overhead to achieve faster writes. But I think that exactly the opposite is preferable. Also many of these techniques really do end up with a multi-versioned trees, but they aren't strictly immutable, which is something very crucial to justify the persistent overhead. - I know there still will have to be some kind of locking when appending values to the database, and I also know there should be a good garbage collecting logic if not all versions are to be maintained (otherwise the file size will surely rise dramatically). Also a delta compression system could be thought about. - Of all search trees structures, I really think Red-Blacks are the most close to what I need, since they offer the least number of rotations. But there are some possible pitfalls along the way: - Asynchronous writes -could- affect applications that need the data in real time. But I don't think that is the case with web applications, most of the time. Also when real-time data is needed, another solutions could be devised, like a check-in/check-out system of specific data that will need to be worked on a more real-time manner. - Also they could lead to some commit conflicts, though I fail to think of a good example of when it could happen. Also commit conflicts can occur in normal RDBMS, if two threads are working with the same data, right? - The overhead of having an immutable interface like this will grow exponentially and everything is doomed to fail soon, so this all is a bad idea. Any thoughts? Thanks! edit: There seems to be a misunderstanding of what a persistent data structure is: http://en.wikipedia.org/wiki/Persistent_data_structure

    Read the article

  • How do you combine "Revision Control" with "WorkFlow" for R?

    - by Tal Galili
    Hello all, I remember coming across R users writing that they use "Revision control" (e.g: "Source control"), and I am curious to know: How do you combine "Revision control" with your statistical analysis WorkFlow? Two (very) interesting discussions talk about how to deal with the WorkFlow. But neither of them refer to the revision control element: http://stackoverflow.com/questions/1266279/how-to-organize-large-r-programs http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing A Long Update To The Question: Following some of the people's answers, and Dirk's question in the comment, I would like to direct my question a bit more. After reading the Wiki article about "revision control" (which I was previously not familiar with), it was clear to me that when using revision control, what one does is to build a development structure of his code. This structure either leads to a "final product" or to several branches. When building something like, let's say, a website. There is usually one end product you work towards (the website), with some prototypes along the way. But when doing a statistical analysis, the work (to my view) is different. Sometimes you know where you want to get to. But more often, you explore. Explore cleaning the dataset. Explore different methods for statistical analysis, and ask various questions of your data (and I am writing this, knowing how Frank Harrell, and other experience statisticians feels about Data dredging). That is way the WorkFlow question with statistical programming is (in my view) a serious and deep question, raising many issues, The simpler ones are technical: Which revision control software do you use (and why) ? Which IDE do you use(and why) ? The more interesting question are about work process: How do you structure your files? What do you keep as a separate file and what as a revision? or asking in a different way - What should be a "branch" and what should be a "sub project" in your code? For example: When starting to explore your data, should a plot be creating and then erased because it didn't lead any where (but kept as a revision) or should there be a backup file of that path? How you solve this tension was my initial curiosity. The second question is "what might I be missing?". What rules (of thumb) should one follow so to avoid common pitfalls doing statistical programming with version control? In my intuition, I feel that statistical programming is inherently different then software development (I am writing this without being a real expert in statistical programming, and even less so in software development). That's way I am unsure which of the lessons I have read here about version control would be applicable. Thanks a lot, Tal

    Read the article

  • Google Docs iphone library error reporting

    - by phil harris
    I'm in the process of adding a Google Docs interface to my iPhone app, and I'm largely following the example in the GoogleDocs.m file from Tom Saxton's example app. The objective-c library I'm using is from http://code.google.com/p/gdata-objectivec-client/wiki/GDataObjCIntroduction The library file used is from gdata-objectivec-client-1.10.0.zip. This service:username:password method is a slight variant of the one found in the Saxton file GoogleDocs.m starting at line 351: - (void)service:(NSString *)username password:(NSString *)password { if(service == nil) { service = [[[GDataServiceGoogleDocs alloc] init] autorelease]; [service setUserAgent:s_strUserAgent]; [service setShouldCacheDatedData:NO]; [service setServiceShouldFollowNextLinks:NO]; (void)[service authenticateWithDelegate:self didAuthenticateSelector:@selector(ticket:authenticatedWithError:)]; } // update the username/password each time the service is requested if (username != nil && [username length] && password != nil && [password length]) [service setUserCredentialsWithUsername:username password:password]; else [service setUserCredentialsWithUsername:nil password:nil]; } // associated callback for service:username:password: method - (void)ticket:(GDataServiceTicket *)ticket authenticatedWithError:(NSError *)error { NSLog(@"%@",@"authenticatedWithError called"); if(error == nil) [self selectBackupRestore]; else { NSLog(@"error code(%d)", [error code]); NSLog(@"error domain(%d)", [error domain]); NSLog(@"localizedDescription(%@)", error.localizedDescription); NSLog(@"localizedFailureReason(%@)", error.localizedFailureReason); NSLog(@"localizedRecoveryOptions(%@)", error.localizedRecoveryOptions); NSLog(@"localizedRecoverySuggestion(%@)", error.localizedRecoverySuggestion); } } Please note the service:username:password method and the callback compile and run fine. The problem is that the callback is passing a non-nil NSError object. I added an NSLog() for every error reporting attribute of NSError and the (Xcode) log output of a test run is below. [Session started at 2010-05-27 12:27:16 -0700.] 2010-05-27 12:27:38.778 iFilebox[74596:207] authenticatedWithError called 2010-05-27 12:27:38.779 iFilebox[74596:207] error code(-1) 2010-05-27 12:27:38.780 iFilebox[74596:207] error domain(499324) 2010-05-27 12:27:38.781 iFilebox[74596:207] localizedDescription(Operation could not be completed. (com.google.GDataServiceDomain error -1.)) 2010-05-27 12:27:38.782 iFilebox[74596:207] localizedFailureReason((null)) 2010-05-27 12:27:38.782 iFilebox[74596:207] localizedRecoveryOptions((null)) 2010-05-27 12:27:38.783 iFilebox[74596:207] localizedRecoverySuggestion((null)) My essential question is in the error reporting. I was hoping the localizedDescription would be more specific of the error. All I get for the error code value is -1, and the only description of the error is "Operation could not be completed. (com.google.GDataServiceDomain error -1.". Not very helpful. Does anyone know what a GDataServiceDomain error -1 is? Where can I find a full list of all error codes returned, and a description of what they mean?

    Read the article

  • Finding the most frequent subtrees in a collection of (parse) trees

    - by peter.murray.rust
    I have a collection of trees whose nodes are labelled (but not uniquely). Specifically the trees are from a collection of parsed sentences (see http://en.wikipedia.org/wiki/Treebank). I wish to extract the most common subtrees from the collection - performance is not (yet) an issue. I'd be grateful for algorithms (ideally Java) or pointers to tools which do this for treebanks. Note that order of child nodes is important. EDIT @mjv. We are working in a limited domain (chemistry) which has a stylised language so the varirty of the trees is not huge - probably similar to children's readers. Simple tree for "the cat sat on the mat". <sentence> <nounPhrase> <article/> <noun/> </nounPhrase> <verbPhrase> <verb/> <prepositionPhrase> <preposition/> <nounPhrase> <article/> <noun/> </nounPhrase> </prepositionPhrase> </verbPhrase> </sentence> Here the sentence contains two identical part-of-speech subtrees (the actual tokens "cat". "mat" are not important in matching). So the algorithm would need to detect this. Note that not all nounPhrases are identical - "the big black cat" could be: <nounPhrase> <article/> <adjective/> <adjective/> <noun/> </nounPhrase> The length of sentences will be longer - between 15 to 30 nodes. I would expect to get useful results from 1000 trees. If this does not take more than a day or so that's acceptable. Obviously the shorter the tree the more frequent, so nounPhrase will be very common. EDIT If this is to be solved by flattening the tree then I think it would be related to Longest Common Substring, not Longest Common Sequence. But note that I don't necessarily just want the longest - I want a list of all those long enough to be "interesting" (criterion yet to be decided).

    Read the article

  • MBR status confusion

    - by Ahmed Ghoneim
    EB 58 90 6D 6B 64 6F 73 66 73 00 00 02 08 20 00 02 00 00 00 00 F8 00 00 3E 00 83 00 00 00 00 00 94 88 7E 00 98 1F 00 00 00 00 00 00 02 00 00 00 01 00 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 29 A9 38 B1 34 57 61 76 65 20 20 20 20 20 20 20 46 41 54 33 32 20 20 20 0E 1F BE 77 7C AC 22 C0 74 0B 56 B4 0E BB 07 00 CD 10 5E EB F0 32 E4 CD 16 CD 19 EB FE 54 68 69 73 20 69 73 20 6E 6F 74 20 61 20 62 6F 6F 74 61 62 6C 65 20 64 69 73 6B 2E 20 20 50 6C 65 61 73 65 20 69 6E 73 65 72 74 20 61 20 62 6F 6F 74 61 62 6C 65 20 66 6C 6F 70 70 79 20 61 6E 64 0D 0A 70 72 65 73 73 20 61 6E 79 20 6B 65 79 20 74 6F 20 74 72 79 20 61 67 61 69 6E 20 2E 2E 2E 20 0D 0A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 AA Learning disk records, this is my USB MBR record viewed by bless on ubuntu formatted with disk utility as MBR table and FAT partition, referring to this Wiki of first record status (0x80 = bootable (active), 0x00 = non-bootable, other = invalid ) but my MBR shows first offset as EB. What's this record stands for ? also, can you provide me with good tables/images tutorials for MBR and other disks' records :)

    Read the article

  • What is an overloaded operator in c++?

    - by Jeff
    I realize this is a basic question but I have searched online, been to cplusplus.com, read through my book, and I can't seem to grasp the concept of overloaded operators. A specific example from cplusplus.com is: // vectors: overloading operators example #include <iostream> using namespace std; class CVector { public: int x,y; CVector () {}; CVector (int,int); CVector operator + (CVector); }; CVector::CVector (int a, int b) { x = a; y = b; } CVector CVector::operator+ (CVector param) { CVector temp; temp.x = x + param.x; temp.y = y + param.y; return (temp); } int main () { CVector a (3,1); CVector b (1,2); CVector c; c = a + b; cout << c.x << "," << c.y; return 0; } from http://www.cplusplus.com/doc/tutorial/classes2/ but reading through it I'm still not understanding them at all. I just need a basic example of the point of the overloaded operator (which I assume is the "CVector CVector::operator+ (CVector param)"). There's also this example from wikipedia: Time operator+(const Time& lhs, const Time& rhs) { Time temp = lhs; temp.seconds += rhs.seconds; if (temp.seconds >= 60) { temp.seconds -= 60; temp.minutes++; } temp.minutes += rhs.minutes; if (temp.minutes >= 60) { temp.minutes -= 60; temp.hours++; } temp.hours += rhs.hours; return temp; } from "http://en.wikipedia.org/wiki/Operator_overloading" The current assignment I'm working on I need to overload a ++ and a -- operator. Thanks in advance for the information and sorry about the somewhat vague question, unfortunately I'm just not sure on it at all.

    Read the article

  • Recommended ASP.NET Shared Hosting

    - by coffeeaddict
    Ok, I have to admit I'm getting fed up with www.discountasp.net's pricing model and this annoyance has built up over the past 8 years or so. I've been with them for years and absolutely love them on the technical side, however it's getting ridiculously expensive for so little that you get. I mean here's my scenario: 1) I am running 2 SQL Server databases which costs me $10/ea per month so that's $20/month for 2 and I only get 500 mb disk space which is horrible 2) I am paying $10/mo just for the hosting itself which I only get 1 gig of disk space! I mean common! 3) I am simply running 2 small apps (Screwturn Wiki & Subtext Blog)...so I don't really care if it's up 99% or not, it's not worth paying a total of $300 just to keep these 2 apps running over discountasp.net Anyone else feel the same? Yes, I know they have great support, probably have great servers running behind this but in the end I really don't care as long as my site is up 95% or better. Yes, the hosting toolset rocks. But you know I bet you I can find a similar set somewhere else. I like how I can totally control IIS 7 at discountasp and I can control my own app pool etc. That's very powerful and essential. But anyone have any good alternatives to discountasp that gives me close to the same at a much more reasonable cost point? I mean http://www.m6.net/prices.aspx gives you 10 SQL Databases for $7 and 200 gigs disk space! I don't know about their tools or support but just looking at those numbers and some other hosts I've seen, I feel that discountasp.net is way out of line. They don't even offer any purchasing discounts such as it would be nice if my 2nd SQL Server is only $5/month not $10...stuff like this, to make it much more realistic and fair. Opinions (people who do have discountasp.net, people who have left them, or people who have another host they like)??? But geez $300 just to host a couple DBs and lightweight open source apps? Not worth the price they are charging. I'm almost at a price point that enables me to get a decent dedicated server! I really don't care about beta support. Not a big deal to me.

    Read the article

  • Feedback on Optimizing C# NET Code Block

    - by Brett Powell
    I just spent quite a few hours reading up on TCP servers and my desired protocol I was trying to implement, and finally got everything working great. I noticed the code looks like absolute bollocks (is the the correct usage? Im not a brit) and would like some feedback on optimizing it, mostly for reuse and readability. The packet formats are always int, int, int, string, string. try { BinaryReader reader = new BinaryReader(clientStream); int packetsize = reader.ReadInt32(); int requestid = reader.ReadInt32(); int serverdata = reader.ReadInt32(); Console.WriteLine("Packet Size: {0} RequestID: {1} ServerData: {2}", packetsize, requestid, serverdata); List<byte> str = new List<byte>(); byte nextByte = reader.ReadByte(); while (nextByte != 0) { str.Add(nextByte); nextByte = reader.ReadByte(); } // Password Sent to be Authenticated string string1 = Encoding.UTF8.GetString(str.ToArray()); str.Clear(); nextByte = reader.ReadByte(); while (nextByte != 0) { str.Add(nextByte); nextByte = reader.ReadByte(); } // NULL string string string2 = Encoding.UTF8.GetString(str.ToArray()); Console.WriteLine("String1: {0} String2: {1}", string1, string2); // Reply to Authentication Request MemoryStream stream = new MemoryStream(); BinaryWriter writer = new BinaryWriter(stream); writer.Write((int)(1)); // Packet Size writer.Write((int)(requestid)); // Mirror RequestID if Authenticated, -1 if Failed byte[] buffer = stream.ToArray(); clientStream.Write(buffer, 0, buffer.Length); clientStream.Flush(); } I am going to be dealing with other packet types as well that are formatted the same (int/int/int/str/str), but different values. I could probably create a packet class, but this is a bit outside my scope of knowledge for how to apply it to this scenario. If it makes any difference, this is the Protocol I am implementing. http://developer.valvesoftware.com/wiki/Source_RCON_Protocol

    Read the article

  • sed - trying to replace first occurrence after a match

    - by wakkaluba
    I am facing a situation that drives me nuts. I am setting up an update server which uses a json file. Don't ask why or how, it sucks and is my only possibility to achieve it. I have been trying and researching for HOURS (many) because I went ballistic and wanted to crack this on my own. But I have to realize I got stuck and need help. So sorry for this chunk but I think it is somewhat important to see... The file is a one liner and repeating the following sequence with changing values (of course). "plugin_name_foo_bar": {"buildDate": "bla", "dependencies": [{"name": "bla", "optional": true, "version": "1.00"}], "developers": [{"developerId": "bla", "email": "[email protected]", "name": "Bla bla2nd"}], "excerpt": "some text {excerpt} !bla.png|thumbnail,border=1! ", "gav": "bla", "labels": ["report", "scm-related"], "name": "plugin_name_foo_bar", "previousTimestamp": "bla", "previousVersion": "1.0", "releaseTimestamp": "bla", "requiredCore": "1", "scm": "github.com", "sha1": "ynnBM2jWo25ZLDdP3ybBOnV/Pio=", "title": "bla", "url": "http://bla.org", "version": "1.0", "wiki": "https://bla.org"}, "Exclusion": {"buildDate": "bla", "dependencies": [], and the next plugin block is glued straight afterwards. What I now want to do is to search for "plugin_foo_bar": {" as this is the unique identifier for a new plugin description block. I want to replace the first sha1 value occuring afterwards. That's where I keep failing. I always grab the first,last or any occurrence in the entire file and not the block :( "title" is the unique identifier after the sha1 value. So I tried to make the .* less greedy but it ain't working out. last attempt was heading towards: sed -i 's/("name": "plugin_name_foo_bar.*sha1": ")([a-zA-Z0-9!@#\$%^&*()\[\]]*)(", "title"\)/\1blablabla\2/1' default.json to find the sha1 value of that plugin but still no joy. I hope someone knows - preferably a simpler approach - before I now continue with trial and error until I have to puke and freakout. I am working with SED on Windows, so Unix approach might help me to figure out how to achieve this in batch but please make it as one-liner if possible. Scripts are a real pain to convert. And I just need SED and no other solution with other tools like AWK. That is absolutely out of discussion. Any help is appreciated :) Cheers Jan

    Read the article

  • Which is the "best" data access framework/approach for C# and .NET?

    - by Frans
    (EDIT: I made it a community wiki as it is more suited to a collaborative format.) There are a plethora of ways to access SQL Server and other databases from .NET. All have their pros and cons and it will never be a simple question of which is "best" - the answer will always be "it depends". However, I am looking for a comparison at a high level of the different approaches and frameworks in the context of different levels of systems. For example, I would imagine that for a quick-and-dirty Web 2.0 application the answer would be very different from an in-house Enterprise-level CRUD application. I am aware that there are numerous questions on Stack Overflow dealing with subsets of this question, but I think it would be useful to try to build a summary comparison. I will endeavour to update the question with corrections and clarifications as we go. So far, this is my understanding at a high level - but I am sure it is wrong... I am primarily focusing on the Microsoft approaches to keep this focused. ADO.NET Entity Framework Database agnostic Good because it allows swapping backends in and out Bad because it can hit performance and database vendors are not too happy about it Seems to be MS's preferred route for the future Complicated to learn (though, see 267357) It is accessed through LINQ to Entities so provides ORM, thus allowing abstraction in your code LINQ to SQL Uncertain future (see Is LINQ to SQL truly dead?) Easy to learn (?) Only works with MS SQL Server See also Pros and cons of LINQ "Standard" ADO.NET No ORM No abstraction so you are back to "roll your own" and play with dynamically generated SQL Direct access, allows potentially better performance This ties in to the age-old debate of whether to focus on objects or relational data, to which the answer of course is "it depends on where the bulk of the work is" and since that is an unanswerable question hopefully we don't have to go in to that too much. IMHO, if your application is primarily manipulating large amounts of data, it does not make sense to abstract it too much into objects in the front-end code, you are better off using stored procedures and dynamic SQL to do as much of the work as possible on the back-end. Whereas, if you primarily have user interaction which causes database interaction at the level of tens or hundreds of rows then ORM makes complete sense. So, I guess my argument for good old-fashioned ADO.NET would be in the case where you manipulate and modify large datasets, in which case you will benefit from the direct access to the backend. Another case, of course, is where you have to access a legacy database that is already guarded by stored procedures. ASP.NET Data Source Controls Are these something altogether different or just a layer over standard ADO.NET? - Would you really use these if you had a DAL or if you implemented LINQ or Entities? NHibernate Seems to be a very powerful and powerful ORM? Open source Some other relevant links; NHibernate or LINQ to SQL Entity Framework vs LINQ to SQL

    Read the article

  • Preoblem with Precision floating point operation in C

    - by Microkernel
    Hi Guys, For one of my course project I started implementing "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially Spam) using huge training data. Now I have problem implementing the algorithm because of the limitations in the C's datatype. ( Algorithm I am using is given here, http://en.wikipedia.org/wiki/Bayesian_spam_filtering ) PROBLEM STATEMENT: The algorithm involves taking each word in a document and calculating probability of it being spam word. If p1, p2 p3 .... pn are probabilities of word-1, 2, 3 ... n. The probability of doc being spam or not is calculated using Here, probability value can be very easily around 0.01. So even if I use datatype "double" my calculation will go for a toss. To confirm this I wrote a sample code given below. #define PROBABILITY_OF_UNLIKELY_SPAM_WORD (0.01) #define PROBABILITY_OF_MOSTLY_SPAM_WORD (0.99) int main() { int index; long double numerator = 1.0; long double denom1 = 1.0, denom2 = 1.0; long double doc_spam_prob; /* Simulating FEW unlikely spam words */ for(index = 0; index < 162; index++) { numerator = numerator*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom1 = denom1*(long double)(1 - PROBABILITY_OF_UNLIKELY_SPAM_WORD); } /* Simulating lot of mostly definite spam words */ for (index = 0; index < 1000; index++) { numerator = numerator*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom1 = denom1*(long double)(1- PROBABILITY_OF_MOSTLY_SPAM_WORD); } doc_spam_prob= (numerator/(denom1+denom2)); return 0; } I tried Float, double and even long double datatypes but still same problem. Hence, say in a 100K words document I am analyzing, if just 162 words are having 1% spam probability and remaining 99838 are conspicuously spam words, then still my app will say it as Not Spam doc because of Precision error (as numerator easily goes to ZERO)!!!. This is the first time I am hitting such issue. So how exactly should this problem be tackled?

    Read the article

  • Problem with Precision floating point operation in C

    - by Microkernel
    Hi Guys, For one of my course project I started implementing "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially Spam) using huge training data. Now I have problem implementing the algorithm because of the limitations in the C's datatype. ( Algorithm I am using is given here, http://en.wikipedia.org/wiki/Bayesian_spam_filtering ) PROBLEM STATEMENT: The algorithm involves taking each word in a document and calculating probability of it being spam word. If p1, p2 p3 .... pn are probabilities of word-1, 2, 3 ... n. The probability of doc being spam or not is calculated using Here, probability value can be very easily around 0.01. So even if I use datatype "double" my calculation will go for a toss. To confirm this I wrote a sample code given below. #define PROBABILITY_OF_UNLIKELY_SPAM_WORD (0.01) #define PROBABILITY_OF_MOSTLY_SPAM_WORD (0.99) int main() { int index; long double numerator = 1.0; long double denom1 = 1.0, denom2 = 1.0; long double doc_spam_prob; /* Simulating FEW unlikely spam words */ for(index = 0; index < 162; index++) { numerator = numerator*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom1 = denom1*(long double)(1 - PROBABILITY_OF_UNLIKELY_SPAM_WORD); } /* Simulating lot of mostly definite spam words */ for (index = 0; index < 1000; index++) { numerator = numerator*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom1 = denom1*(long double)(1- PROBABILITY_OF_MOSTLY_SPAM_WORD); } doc_spam_prob= (numerator/(denom1+denom2)); return 0; } I tried Float, double and even long double datatypes but still same problem. Hence, say in a 100K words document I am analyzing, if just 162 words are having 1% spam probability and remaining 99838 are conspicuously spam words, then still my app will say it as Not Spam doc because of Precision error (as numerator easily goes to ZERO)!!!. This is the first time I am hitting such issue. So how exactly should this problem be tackled?

    Read the article

  • Sharing the same file between different projects

    - by selsine
    Hi Everyone, For version control we currently use Visual Source Safe and are thinking of migrating to another version control system (SVN, Mercurial, Git). Currently we use Visual Source Safe's "Shared" file feature quite heavily. This allows us to share code between design and runtimes of a single product, and between multiple products as well. For example: **Product One** - Design Login.cpp Login.h Helper.cpp Helper.h - Runtime Login.cpp Login.h Helper.cpp Helper.h **Product Two** - Design Login.cpp Login.h - Launcher Login.cpp Login.h - Runtime Login.cpp Login.h In this example Login.cpp and Login.h contain common code that all of our projects need, Helper.cpp and Helper.h is only used in Product One. In Visual Source Safe they are shared between the specific projects, which means that whenever the files are updated in one project they are updated in any project they are shared with. This is a simple example but hopefully it explains why we use the shared feature: to reduce the amount of duplicated code and ensure that when a bug is fixed all projects automatically have access to the new fixed code. After researching alternatives to Visual Source Safe it seems that most version control systems do not have the idea of shared files, instead they seem to use the idea of sub repositories. (http://mercurial.selenic.com/wiki/subrepos http://svnbook.red-bean.com/en/1.0/ch07s03.html) My question (after all of that) is about what the best practices for achieving this are using other version control systems? Should we restructure our projects so that two copies of the files do not exist and an include directory is used instead? e.g. Product One Design Login.cpp Login.h Runtime Login.cpp Login.h Common Helper.cpp Helper.h This still leaves what to do with Login.cpp and Logon.h Should the shared files be moved to their own repository and then compiled into a lib or dll? This would make bug fixing more time consuming as the lib projects would have to be edited and then rebuilt. Should we use externals or sub repositories? Should we combine our projects (i.e. runtime, design, and launcher) into one large project? Any help would be appreciated. We have the feeling that our project design has evolved based on the tools that we used and now that we are thinking of switching tools it's difficult for us to see how we can best modify our practices. Or maybe we are the only people are there doing this...? Also, we use Visual Studio for all of our stuff. Thanks.

    Read the article

  • Using PHP5s SOAP Client to send data to an ASP/.NET based SOAP server.

    - by user325143
    I am trying to write a snippet of PHP to connect to a third party's API via SOAP to enter some data into their database. The API requires me to pass several mandatory fields for every call (username, password, companyid, entitytype) in addition to the mandatory data fields. It also requires me to call the "ValidateEntity" funciton before calling the "CreateEntity" function. Documentation can be found here: http://wiki.agemni.com/Getting_Started/APIs/Database_API I have never worked with SOAP before, so I am very new to this. Here is what I have so far: error_reporting(E_ALL); ini_set('display_errors', '1'); $client = new SoapClient("http://agemni.com/AgemniWebservices/service1.asmx?WSDL", array('trace'=> true)); $options = array( 'username' => "myuser", 'password' => "mypassword", 'companyid' => myID, 'entitytype' => 2 ); $params = array( 'fname' => "John", 'lname' => "Doe", 'phone' => "859-333-3333", 'zip' => "40332", 'area id' => "12345", 'lead id' => "28222", 'contactdate' => "4/10/2010" ); $validate = $client->__soapCall("ValidateEntity", array($params), array($options)); $client->__soapCall("CreateEntity", array($params), array($options)); echo "<pre>"; var_dump($client-> __getLastRequestHeaders()); var_dump($client-> __getLastRequest()); var_dump($client-> __getLastResponseHeaders()); var_dump($client-> __getLastResponse()); var_dump($result); echo "</pre>"; Upon executing this code, I get the following error: Fatal error: Uncaught SoapFault exception: [Client] SOAP-ERROR: Encoding: object hasn't 'objecttype' property in /www/tmp/index-soap.php:24 Stack trace: #0 /www/tmp/index-soap.php(24): SoapClient->__soapCall('ValidateEntity', Array) #1 {main} thrown in /www/stealth/tmp/index-soap.php on line 24 I guess my question is.. am I even going about doing this the right way? I know this is a very broad question, but I appreciate any advice you can give me about making this work. Please let me know if you require more detail. Thanks!

    Read the article

  • SQL version control methodology

    - by Tom H.
    There are several questions on SO about version control for SQL and lots of resources on the web, but I can't find something that quite covers what I'm trying to do. First off, I'm talking about a methodology here. I'm familiar with the various source control applications out there and I'm familiar with tools like Red Gate's SQL Compare, etc. and I know how to write an application to check things in and out of my source control system automatically. If there is a tool which would be particularly helpful in providing a whole new methodology or which have a useful and uncommon functionality then great, but for the tasks mentioned above I'm already set. The requirements that I'm trying to meet are: The database schema and look-up table data are versioned DML scripts for data fixes to larger tables are versioned A server can be promoted from version N to version N + X where X may not always be 1 Code isn't duplicated within the version control system - for example, if I add a column to a table I don't want to have to make sure that the change is in both a create script and an alter script The system needs to support multiple clients who are at various versions for the application (trying to get them all up to within 1 or 2 releases, but not there yet) Some organizations keep incremental change scripts in their version control and to get from version N to N + 3 you would have to run scripts for N-N+1 then N+1-N+2 then N+2-N+3. Some of these scripts can be repetitive (for example, a column is added but then later it is altered to change the data type). We're trying to avoid that repetitiveness since some of the client DBs can be very large, so these changes might take longer than necessary. Some organizations will simply keep a full database build script at each version level then use a tool like SQL Compare to bring a database up to one of those versions. The problem here is that intermixing DML scripts can be a problem. Imagine a scenario where I add a column, use a DML script to fill said column, then in a later version that column name is changed. Perhaps there is some hybrid solution? Maybe I'm just asking for too much? Any ideas or suggestions would be greatly appreciated though. If the moderators think that this would be more appropriate as a community wiki, please let me know. Thanks!

    Read the article

< Previous Page | 416 417 418 419 420 421 422 423 424 425 426 427  | Next Page >