Search Results

Search found 16554 results on 663 pages for 'programmers identity'.


  • Complex algorithm for complex problem

    - by Locaaaaa
    I got this question in an interview and was not able to solve it. You have a circular road with N gas stations. You know the amount of gas that each station has, and the amount of gas you need to go from one station to the next. Your car starts with 0 gas. The question is: create an algorithm that determines from which gas station you must start driving. As an exercise for myself, I would then translate the algorithm to C#.
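One common way to approach this is a single greedy pass. The sketch below is only an illustration, and it assumes the goal is to complete one full lap (the excerpt does not say so explicitly). The names `find_start_station`, `gas` and `cost` are made up for the example.

```python
def find_start_station(gas, cost):
    """Return an index from which a full lap is possible, or None.

    gas[i]  -- fuel available at station i
    cost[i] -- fuel needed to drive from station i to station (i + 1) % N
    """
    if sum(gas) < sum(cost):
        return None  # not enough fuel on the whole road, no start can work

    start = 0
    tank = 0
    for i in range(len(gas)):
        tank += gas[i] - cost[i]
        if tank < 0:
            # We ran dry between `start` and i + 1, so neither `start` nor any
            # station in between can be the answer; try the next station.
            start = i + 1
            tank = 0
    return start


print(find_start_station([1, 2, 3, 4, 5], [3, 4, 5, 1, 2]))
```

The pass is O(N) in time and O(1) in extra space, so it should translate to C# almost line by line.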


  • Use the latest technology or use a mature technology as a developer?

    - by Ted Wong
    I would like to develop an application for a group of people to use. I have decided to develop it in Python, but I am undecided between Python 2.x and Python 3.x. If I use Python 2.x, I will need to upgrade it in the future, but it is more mature and has many tools and libraries. If I develop using 3.x, I don't need to think about a future migration, but currently it doesn't have as many libraries; even a Python-to-executable converter is not ready for all platforms. Another consideration is that this is a brand new application, so I don't have the burden of maintaining old libraries. Any recommendation on this dilemma? More information about this application: it is a native application; time for maintenance: 5+ years; libraries/tools needed: no idea yet; must-have feature available in 2.x: conversion to an executable for both Windows and Mac OS X.


  • Best exception handling practices or recommendations?

    - by user828584
    I think the two main problems with my programs are my code structure/organization and my error handling. I'm reading Code Complete 2, but I need something to read for working with potential problems. For example, on a website, if something can only happen if the user tampers with data via javascript, do you write for that? Also, when do you not catch errors? When you write a class that expects a string and an int as input, and they aren't a string and int, do you check for that, or do you let it bubble up to the calling method that passed incorrect parameters? I know this is a broad topic that can't be answered in a single answer here, so what I'm looking for is a book or resource that's commonly accepted as teaching proper exception handling practice.


  • C#.NET (AForge) against Java (JavaCV, JMF) for video processing

    - by Leron
    I'm getting really confused as I look deeper and deeper into the world of video processing and search for the best choices. That is why I'm posting this and some other questions: to find my way as well as I can. I really like Java, working with Java, and coding in Java; at the same time, C# is not that different from Java, and Visual Studio is maybe the best IDE I've worked with. So even though I really want to do my projects in Java so that I can become a better Java programmer, I'm also really attracted to video processing, and even though I'm still at the beginning of this journey, I want to take the right path. I really doubt whether Java can be used in a production environment for serious video processing software. As the title says, I have already been looking at what are maybe the two most used technologies for video processing in Java, JMF and JavaCV, and I'm starting to think that even though they are used and provide some functionality, when it comes to real work on a real project they are not the first thing that comes to mind for someone with a professional opinion on this. On the other hand, I haven't had the time to investigate the .NET (specifically C#) options, but even AForge looks like a much more serious library than those available for Java. In general, either way I'm going to spend a lot of time learning some technology and trying to do something that makes sense with it, and my plan is that what I eventually come up with will be my headline project, representing my skills and eventually helping me find a job in the field. So I really don't want to spend time learning something that gives me the programming result I want but is not needed in real-world development. What is your opinion: which language and technology is better for this specific issue? Which one is worth more in the terms I specified above?


  • Simplified knapsack in PHP

    - by Mikhail
    I have two instances where I'd like to display information in a "justified" alignment, but I don't care if the values are switched in order. One example is displaying the usernames of people online: Anton Brother68 Commissar Dougheater Elflord Foobar Goop Hoo Iee Joo. Rearranging them, we could get lines that are each exactly 22 characters long:

        Anton Brother68 Foobar
        Commissar Elflord Goop
        Dougheater Hoo Iee Joo

    This is kind of a knapsack problem, except it seems like there ought to be a polynomial-time (P) solution, since I don't care about perfection and I have multiple lines. The second instance is identical, except that instead of names and character counts I would be displaying random images and using their widths.
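Since perfection is not required, a bin-packing heuristic may be enough. The sketch below uses first-fit decreasing; it is not the asker's method, `pack_lines` is a made-up name, and the heuristic only guarantees that no line exceeds the width (assuming no single word is longer than the width), not that every line is exactly full.

```python
def pack_lines(words, width):
    """First-fit decreasing: place each word, longest first, on the
    first line where it still fits; open a new line otherwise."""
    lines = []  # each entry is the list of words on one line
    for word in sorted(words, key=len, reverse=True):
        for line in lines:
            # Current line length plus one separating space plus the new word
            if len(" ".join(line)) + 1 + len(word) <= width:
                line.append(word)
                break
        else:
            lines.append([word])  # no existing line had room
    return [" ".join(line) for line in lines]


names = ["Anton", "Brother68", "Commissar", "Dougheater", "Elflord",
         "Foobar", "Goop", "Hoo", "Iee", "Joo"]
for line in pack_lines(names, 22):
    print(line.ljust(22) + "|")
```

For images, the same idea should work with pixel widths in place of `len(word)` and the gutter size in place of the single space.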


  • How to send credentials to linkedIn website and get oauth_verifier without signing in again [closed]

    - by akash kumar
    I am facing a problem sending credentials to another website so that I can log the user in automatically (without the user clicking "sign in" there) and get an oauth_verifier value. I want to send the email address and the password through a form (submit button) from my website (e.g. a Liferay portal) to another website (e.g. LinkedIn), so that it automatically returns an oauth_verifier to my website. That means I don't want the user of my website to submit his email and password to LinkedIn again. My goal is to take the user's email and password on my website and show the user his LinkedIn connections, messages and job postings (again, on my website, not on LinkedIn). I don't want the user to be redirected to the LinkedIn website to sign in there and then come back to my website. I have obtained a consumer key and a secret key from LinkedIn for my web application. I am using the LinkedIn API and getting an oauth_verifier for the access token, but in order to log in I have to take the user to LinkedIn to sign in, while I want it to happen in the backend.


  • Thoughts on my new template language/HTML generator?

    - by Ralph
    I guess I should have pre-faced this with: Yes, I know there is no need for a new templating language, but I want to make a new one anyway, because I'm a fool. That aside, how can I improve my language: Let's start with an example: using "html5" using "extratags" html { head { title "Ordering Notice" jsinclude "jquery.js" } body { h1 "Ordering Notice" p "Dear @name," p "Thanks for placing your order with @company. It's scheduled to ship on {@ship_date|dateformat}." p "Here are the items you've ordered:" table { tr { th "name" th "price" } for(@item in @item_list) { tr { td @item.name td @item.price } } } if(@ordered_warranty) p "Your warranty information will be included in the packaging." p(class="footer") { "Sincerely," br @company } } } The "using" keyword indicates which tags to use. "html5" might include all the html5 standard tags, but your tags names wouldn't have to be based on their HTML counter-parts at all if you didn't want to. The "extratags" library for example might add an extra tag, called "jsinclude" which gets replaced with something like <script type="text/javascript" src="@content"></script> Tags can be optionally be followed by an opening brace. They will automatically be closed at the closing brace. If no brace is used, they will be closed after taking one element. Variables are prefixed with the @ symbol. They may be used inside double-quoted strings. I think I'll use single-quotes to indicate "no variable substitution" like PHP does. Filter functions can be applied to variables like @variable|filter. Arguments can be passed to the filter @variable|filter:@arg1,arg2="y" Attributes can be passed to tags by including them in (), like p(class="classname"). You will also be able to include partial templates like: for(@item in @item_list) include("item_partial", item=@item) Something like that I'm thinking. 
The first argument will be the name of the template file, and subsequent ones will be named arguments where @item gets the variable name "item" inside that template. I also want to have a collection version like RoR has, so you don't even have to write the loop. Thoughts on this and exact syntax would be helpful :) Some questions: Which symbol should I use to prefix variables? @ (like Razor), $ (like PHP), or something else? Should the @ symbol be necessary in "for" and "if" statements? It's kind of implied that those are variables. Tags and controls (like if,for) presently have the exact same syntax. Should I do something to differentiate the two? If so, what? This would make it more clear that the "tag" isn't behaving like just a normal tag that will get replaced with content, but controls the flow. Also, it would allow name-reuse. Do you like the attribute syntax? (round brackets) How should I do template inheritance/layouts? In Django, the first line of the file has to include the layout file, and then you delimit blocks of code which get stuffed into that layout. In CakePHP, it's kind of backwards, you specify the layout in the controller.view function, the layout gets a special $content_for_layout variable, and then the entire template gets stuffed into that, and you don't need to delimit any blocks of code. I guess Django's is a little more powerful because you can have multiple code blocks, but it makes your templates more verbose... trying to decide what approach to take Filtered variables inside quotes: "xxx {@var|filter} yyy" "xxx @{var|filter} yyy" "xxx @var|filter yyy" i.e, @ inside, @ outside, or no braces at all. I think no-braces might cause problems, especially when you try adding arguments, like @var|filter:arg="x", then the quotes would get confused. But perhaps a braceless version could work for when there are no quotes...? Still, which option for braces, first or second? I think the first one might be better because then we're consistent... 
the @ is always nudged up against the variable. I'll add more questions in a few minutes, once I get some feedback.


  • Understanding branching strategy/workflow correctly

    - by burnersk
    I have been using svn without branches (trunk-only) for a very long time at my workplace, and I have discovered most or all of the issues that affect projects without any branching strategy. Unfortunately this is not going to change at my workplace, but it can for my private projects. For my private projects, most of which involve coworkers working on different features at the same time, I would like to have a robust, git-based branching strategy that supports long-term releases. I found that the Atlassian toolchain (JIRA, Stash and Bamboo) helped me most, and it also recommends a branching strategy, which I would like to verify against the team's needs. The branching strategy was taken directly from the Atlassian Stash recommendation, with a small modification to the hotfix branch tree: all hotfixes should also be merged into mainline. The branching strategy in words: mainline (also known as master with git or trunk with svn) contains the "state of the art" development release. Everything here has been checked successfully with various automated tests (through Bamboo) and looks like it is working. It is not proven to work, because tests may be missing. It is ready to use but not recommended for production. feature covers all new features that are not completely finished. Once a feature is finished, it is merged into mainline. Sample branch: feature/ISSUE-2-A-nice-Feature. bugfix fixes non-critical bugs that can wait for the next normal release. Sample branch: bugfix/ISSUE-1-Some-typos. production holds the latest release. hotfix fixes critical bugs that have to be released urgently to mainline, production and all affected long-term release branches. Sample branch: hotfix/ISSUE-3-Check-your-math. release is for long-term maintenance. Sample branches: release/1.0, release/1.1, release/1.0-rc1. I am not an expert, so please give me feedback. Which problems might appear? Which parts are missing or slow down productivity?


  • Message Queue: Which one is the best scenario?

    - by pandaforme
    I am writing a web crawler. The crawler has 2 steps: get an HTML page, then parse the page. I want to use a message queue to improve performance and throughput. I can think of 2 scenarios.

    Scenario 1:
        structure: urlProducer -> queue1 -> urlConsumer -> queue2 -> parserConsumer
        urlProducer: gets a target url and adds it to queue1
        urlConsumer: according to the job info, gets the HTML page and adds it to queue2
        parserConsumer: according to the job info, parses the page

    Scenario 2:
        structure: urlProducer -> queue1 -> urlConsumer; parserProducer -> queue2 -> parserConsumer
        urlProducer: gets a target url and adds it to queue1
        urlConsumer: according to the job info, gets the HTML page and writes it to the db
        parserProducer: gets the HTML page from the db and adds it to queue2
        parserConsumer: according to the job info, parses the page

    There are multiple producers or consumers in each structure. Scenario 1 is like a chained call: when errors occur, it is difficult to find where the problem is. Scenario 2 decouples queue1 and queue2: when errors occur, it is easy to find where the problem is. I'm not sure this reasoning is correct. Which one is the best scenario? Or are there other scenarios? Thanks~
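To make scenario 1 concrete, here is a minimal in-process sketch using Python's standard `queue` and `threading` modules in place of a real message broker. The `fetch` and `parse` functions are hypothetical stand-ins (no network access, no real HTML parser), and there is only one worker per stage; a real crawler would run several.

```python
import queue
import threading

STOP = object()  # sentinel that tells a consumer to shut down


def fetch(url):
    # Hypothetical stand-in for an HTTP download
    return f"<html><title>{url}</title></html>"


def parse(html):
    # Hypothetical stand-in for real parsing: extract the title text
    return html.split("<title>")[1].split("</title>")[0]


def url_consumer(q1, q2):
    while (url := q1.get()) is not STOP:
        q2.put(fetch(url))
    q2.put(STOP)  # propagate shutdown to the next stage


def parser_consumer(q2, results):
    while (html := q2.get()) is not STOP:
        results.append(parse(html))


def run_pipeline(urls):
    q1, q2, results = queue.Queue(), queue.Queue(), []
    workers = [
        threading.Thread(target=url_consumer, args=(q1, q2)),
        threading.Thread(target=parser_consumer, args=(q2, results)),
    ]
    for worker in workers:
        worker.start()
    for url in urls:  # this loop plays the role of urlProducer
        q1.put(url)
    q1.put(STOP)
    for worker in workers:
        worker.join()
    return results


print(run_pipeline(["example.org/a", "example.org/b"]))
```

In this sketch the stages are coupled only through the two queues. Scenario 2 would replace the `q2.put(fetch(url))` call with a database write plus a separate producer that reads from the database, which adds a durable record of each fetched page at the cost of an extra component.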


  • Where and how to mention Stackoverflow participation in the résumé?

    - by Sandeepan Nath
    I think I have a good enough reputation on SO now. Here is my profile: http://stackoverflow.com/users/351903/sandeepan-nath. This may not be much compared to many other users out there, but I am happy with mine. So I was thinking of adding my profile link to my résumé (just the profile link, not "I have this much reputation on SO"). Those who haven't seen it can look at the question "Would you put your stackoverflow profile link on your CV / Resume?". How would this look? Forums/Blogs/Miscellaneous others: No blogging as yet, but an active participant on Stackoverflow. My profile link: http://stackoverflow.com/users/351903/sandeepan-nath. I am thinking of putting this section after the Project Details and Technical Expertise sections. Any tips/advice? Thanks. Update: MKO has made a very good point: "do you really want a potential employer to be able to evaluate in detail everything you've ever written on SO". I thought of commenting, but it would be too long. In my questions/answers I use a lot of statements like "AFAIK ...", "following are my assumptions so far ...", "am I correct to conclude that... ?", "I doubt if it is possible to ..." etc. when I am not sure about something, and I rarely get into fights with other users. However, I do sometimes argue on topics if I feel it is necessary and I have a valid point. I accept my mistakes and apologize for them. As we all know, nobody is perfect. I must have written many things that a potential employer may judge as wrong. But what if the same employer notices that the quality of my content has improved by comparing old content with new? Isn't that great? I also try to go back to older questions/answers and add corrective comments etc. when I feel I was wrong or can improve my post. Of course there are many employers who want potential employees to be correct each and every time. They immediately remove you from consideration when you say a single incorrect thing. I personally met such an interviewer a few months back. He didn't even care to listen to any good thing I had done after he found a single wrong thing. Now the question is: do you really care to work with such people? Or do you prefer people who value the fact that you are striving to improve every day? I personally prefer the latter.


  • How to implement Cache in web apps?

    - by Jhonnytunes
    This is really two questions. I'm doing a university project that stores baseball players' statistics, but from the baseball data I have to calculate the score by year for the player who is being displayed. The background is: if, let's say, 10,000 users hit the player "Alex Rodriguez", the application has to calculate A-Rod's stats by year 10,000 times instead of just reading them from somewhere they are temporarily saved. Here I go: What is the best method for caching this type of data? Do I have to use the same database, with some temporary values in it, or create a web service for that? What reading about web caching do you recommend?
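As one possible starting point, the computation can be memoized in the application process so that repeated requests for the same player reuse the first result. The sketch below uses Python's standard `functools.lru_cache`; the data layout (`RAW_GAMES`) and the function name are invented for illustration, and a real application would also need to invalidate the cache when new game data arrives.

```python
import functools

# Hypothetical raw data: player id -> list of (year, at_bats, hits)
RAW_GAMES = {
    "arod": [(2007, 4, 2), (2007, 4, 1), (2008, 5, 2)],
}


@functools.lru_cache(maxsize=1024)
def batting_average_by_year(player_id):
    """Aggregate per-game rows into (year, average) pairs.

    The result is a tuple so the cached value cannot be mutated by callers.
    """
    totals = {}
    for year, at_bats, hits in RAW_GAMES[player_id]:
        ab, h = totals.get(year, (0, 0))
        totals[year] = (ab + at_bats, h + hits)
    return tuple((year, h / ab) for year, (ab, h) in sorted(totals.items()))


first = batting_average_by_year("arod")   # computed
second = batting_average_by_year("arod")  # served from the cache
print(first, batting_average_by_year.cache_info())

# After the underlying data changes, drop the stale entries:
batting_average_by_year.cache_clear()
```

An in-process cache like this is lost on restart and is not shared between servers; storing the precomputed yearly values in a table of the same database, or in a dedicated cache service, are the usual next steps when that matters.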


  • How would you explain that software engineering is more specialized than other engineering fields?

    - by Spencer K
    I work with someone who insists that any good software engineer can develop in any software technology, and experience in a particular technology doesn't matter to building good software. His analogy was that you don't have to have knowledge of the product being built to know how to build an assembly line that manufactures said product. In a way it's a compliment to be viewed with an eye such that "if you're good, you're good at everything", but in a way it also trivializes the profession, as in "Codemonkey, go sling code". Without experience in certain software frameworks, you can get in trouble fast, and that's important. I tried explaining this, but he didn't buy it. Any different views or thoughts on this to help explain that my experience in one thing, doesn't translate to all things?


  • Advantages and disadvantages of building a single page web application

    - by ryanzec
    I'm nearing the end of a prototyping/proof of concept phase for a side project I'm working on, and trying to decide on some larger scale application design decisions. The app is a project management system tailored more towards the agile development process. One of the decisions I need to make is whether or not to go with a traditional multi-page application or a single page application. Currently my prototype is a traditional multi-page setup, however I have been looking at backbone.js to clean up and apply some structure to my Javascript (jQuery) code. It seems like while backbone.js can be used in multi-page applications, it shines more with single page applications. I am trying to come up with a list of advantages and disadvantages of using a single page application design approach. So far I have: Advantages All data has to be available via some sort of API - this is a big advantage for my use case as I want to have an API to my application anyway. Right now about 60-70% of my calls to get/update data are done through a REST API. Doing a single page application will allow me to better test my REST API since the application itself will use it. It also means that as the application grows, the API itself will grow since that is what the application uses; no need to maintain the API as an add-on to the application. More responsive application - since all data loaded after the initial page is kept to a minimum and transmitted in a compact format (like JSON), data requests should generally be faster, and the server will do slightly less processing. Disadvantages Duplication of code - for example, model code. I am going to have to create models both on the server side (PHP in this case) and the client side in Javascript. Business logic in Javascript - I can't give any concrete examples on why this would be bad but it just doesn't feel right to me having business logic in Javascript that anyone can read. 
Javascript memory leaks - since the page never reloads, Javascript memory leaks can happen, and I would not even know where to begin to debug them. There are also other things that are kind of double edged swords. For example, with single page applications, the data processed for each request can be a lot less since the application will be asking for the minimum data it needs for the particular request, however it also means that there could be a lot more small request to the server. I'm not sure if that is a good or bad thing. What are some of the advantages and disadvantages of single page web applications that I should keep in mind when deciding which way I should go for my project?


  • How best to deal with the frustration that you encounter at the beginning of learning to code [closed]

    - by coderboy
    I am a newbie on the job right now, learning to code in Cocoa. In the beginning I decided that I would try to understand everything I was doing. But right now I just feel like a clueless wizard chanting spells. It's all just a matter of googling the right incantation. Frequently getting stuck and having to google for answers is proving to be a major demotivator for me. I know that this will get better over time, but I still feel that somewhere, somehow, I'm approaching things the wrong way. I sit there stumped, and then finally I just look at sample code from Apple and go: Wow! This is so logical and well structured! But just reading it is not going to get me to that level. So I would like to know how you guys approach learning something new. Do you read the whole documentation first, do you read sample code, or is it just about making lots of small programs first?


  • How to handle lookup data in a C# ASP.Net MVC4 application?

    - by Jim
    I am writing an MVC4 application to track documents we have on file for our clients. I'm using code first, and have created models for my objects (Company, Document, etc...). I am now faced with the topic of document expiration. Business logic dictates certain documents will expire a set number of days past the document date. For example, Document A might expire in 180 days, Document 2 in 365 days, etc... I have a class for my documents as shown below (simplified for this example). What is the best way for me to create a lookup for expiration values? I want to specify documents of type DocumentA expire in 30 days, type DocumentB expire in 75 days, etc... I can think of a few ways to do this:

      - Lookup table in the database I can query
      - New property in my class (DaysValidFor) which has a custom getter that returns different values based on the DocumentType
      - A method that takes in the document type and returns the number of days

    and I'm sure there are other ways I'm not even thinking of. My main concern is a) not violating any best practices and b) maintainability. Are there any pros/cons I need to be aware of for the above options, or is this a case of "just pick one and run with it"? One last thought, right now the number of days is a value that does not need to be stored anywhere on a per-document basis -- however, it is possible that business logic will change this (i.e., DocumentA's are 30 days expiration by default, but this DocumentA associated with Company XYZ will be 60 days because we like them). In that case, is a property in the Document class the best way to go, seeing as I need to add that field to the DB?

        namespace Models
        {
            // Types of documents to track
            public enum DocumentType
            {
                DocumentA,
                DocumentB,
                DocumentC
                // etc...
            }

            // Document model
            public class Document
            {
                public int DocumentID { get; set; }

                // Foreign key to companies
                public int CompanyID { get; set; }

                public DocumentType DocumentType { get; set; }

                // Helper to translate enum's value to an integer for DB storage
                [Column("DocumentType")]
                public int DocumentTypeInt
                {
                    get { return (int)this.DocumentType; }
                    set { this.DocumentType = (DocumentType)value; }
                }

                [DataType(DataType.Date)]
                [DisplayFormat(DataFormatString = "{0:MM-dd-yyyy}", ApplyFormatInEditMode = true)]
                public DateTime DocumentDate { get; set; }

                // Navigation properties
                public virtual Company Company { get; set; }
            }
        }
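One way to combine the options above is a default lookup keyed by document type plus an optional per-document override. The sketch below is in Python rather than C#, purely to illustrate the shape of the idea; `DEFAULT_DAYS_VALID`, `days_valid_override` and `expires_on` are hypothetical names, and the 30/75/60-day values are the ones used as examples in the question.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class DocumentType(Enum):
    DOCUMENT_A = 1
    DOCUMENT_B = 2


# Default validity per type; in a real application this mapping could
# just as well be loaded from a lookup table in the database.
DEFAULT_DAYS_VALID = {
    DocumentType.DOCUMENT_A: 30,
    DocumentType.DOCUMENT_B: 75,
}


@dataclass
class Document:
    document_type: DocumentType
    document_date: date
    # Hypothetical nullable column: set only for per-document exceptions
    days_valid_override: Optional[int] = None

    @property
    def expires_on(self) -> date:
        days = self.days_valid_override
        if days is None:
            days = DEFAULT_DAYS_VALID[self.document_type]
        return self.document_date + timedelta(days=days)


standard = Document(DocumentType.DOCUMENT_A, date(2024, 1, 1))
favoured = Document(DocumentType.DOCUMENT_A, date(2024, 1, 1), days_valid_override=60)
print(standard.expires_on, favoured.expires_on)
```

With this layout the defaults stay in one place, and the "Company XYZ gets 60 days" case needs only one nullable field on the document rather than a stored value for every row.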


  • What simple game is good to learn OO principles?

    - by Bogdan Gavril
    I have to come up with a project proposal for my students. Here are some details: the design should go over OO concepts (encapsulation, interfaces, inheritance, abstract classes); ideally it is a game, to keep interest high; no GUI, just the console; effective time to finish: ~6 days (1 person per project). I have found one nice example of a game with carnivore and herbivore cells in a drop of water (an array); it's a Game of Life twist. It is a bit too simple. Any ideas? Additional info: the language is C#.


  • Why don't computers store decimal numbers as a second whole number?

    - by SomeKittens
    Computers have trouble storing fractional numbers where the denominator is something other than a power of 2. This is because the first binary digit after the point is worth 1/2, the second 1/4 (that is, 1/(2^1) and 1/(2^2)), etc. Why deal with all sorts of rounding errors when the computer could have just stored the decimal part of the number as another whole number (which is therefore accurate)? The only thing I can think of is dealing with repeating decimals (in base 10), but there could have been an edge solution to that (like we currently have with infinity).
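A short demonstration may help here. The sketch below uses only Python's standard library: it shows the binary rounding problem described above, then the two closest existing relatives of the "second whole number" idea (scaled integers and the `decimal` module), and finally the repeating-decimal limitation that the question anticipates.

```python
from decimal import Decimal
from fractions import Fraction

# Binary floating point: 0.1 and 0.2 have no exact base-2 representation
print(0.1 + 0.2 == 0.3)

# Storing the value as whole numbers (here: an integer count of hundredths)
# is exact for decimal fractions; this is fixed-point arithmetic.
cents = 10 + 20
print(cents == 30)

# decimal.Decimal keeps base-10 digits plus an exponent, so 0.1 + 0.2 is exact
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# But any base-10 scheme still has to round repeating decimals such as 1/3
print(Decimal(1) / Decimal(3) * 3 == Decimal(1))

# Only a true rational type (numerator and denominator) stays exact there
print(Fraction(1, 3) * 3 == 1)
```

So the idea is workable and is what fixed-point and decimal types already do; they trade away hardware speed and dynamic range, and they move the rounding problem from denominators that are not powers of 2 to denominators with prime factors other than 2 and 5.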


  • Should my program "be lenient" in what it accepts and "discard faulty input silently"?

    - by romkyns
    I was under the impression that by now everyone agrees this maxim was a mistake. But I recently saw this answer which has a "be lenient" comment upvoted 137 times (as of today). In my opinion, the leniency in what browsers accept was the direct cause of the utter mess that HTML and some other web standards were a few years ago, and have only recently begun to properly crystallize out of that mess. The way I see it, being lenient in what you accept will lead to this. The second part of the maxim is "discard faulty input silently, without returning an error message unless this is required by the specification", and this feels borderline offensive. Any programmer who has banged their head on the wall when something fails silently will know what I mean. So, am I completely wrong about this? Should my program be lenient in what it accepts and swallow errors silently? Or am I mis-interpreting what this is supposed to mean? Taken to the extreme, if Excel followed this maxim and I gave it an exe file to open, it would just show a blank spreadsheet without even mentioning that anything went wrong. Is this really a good principle to follow?


  • Why are most websites optimized for viewing in portrait mode?

    - by NVM
    I simply cannot figure this out. Almost all monitors have an aspect ratio where the width is much bigger than the height, and yet almost all websites are designed exactly the other way round. I am not really a web developer and am just experimenting with stuff at the moment, but this madness baffles me! Edit: The point is not that I would like to limit the height of a website. The point is that I'd want it to somehow fill all available space when I have my 1920x1080 monitor in landscape mode. Edit 2: See this to understand what I am saying


  • Is sexist humor more common in the Ruby community than other language communities? [closed]

    - by Andrew Grimm
    I've heard of more cases of sexist humor in the Ruby community, such as sqoot's "women as perk" and toplessness in advertising, than in all the other programming language communities combined. Is this merely because I'm in the Ruby community, and am therefore more likely to hear about incidents in the Ruby community, or is it because there's a higher rate of sexist humor in the Ruby community compared to, say, the C community?


  • Javascript slider Image and text from php, scrollable in groups by indexes

    - by Roberto de Nobrega
    I am looking for a JavaScript solution that slides images with text, pulled from PHP. The slider should slide in groups, selected by index points. I have been googling, but found nothing that does what I need. Here is an example. Imagine 10 products. I need to show the main picture of each, with a text below the image. The slider shows 6 products at a time, and when I click one of the points (indexes), it slides to the next group. Do you know of a script for this? I know the PHP code, but I am a newbie with JavaScript. Thanks! P.S. I was not sure where to put this question, so if this is the wrong place, let me know, and accept my apologies. ;)


  • Video documentary on the open source culture ?

    - by explorest
    Hello, I'm looking for some videos on these subjects: a movie/documentary detailing the origin, history, and current state of open source culture; and a movie/documentary on how open source software actually gets developed: what the technical workflows are, and how people create projects, recruit contributors, build a community, assign roles, track issues, assimilate newcomers, etc. Could someone suggest a title?


  • how to update child records when updating the Master table using Linq [closed]

    - by user20358
    I currently use a general repository class that can update only a single table, like so:

        public abstract class MyRepository<T> : IRepository<T> where T : class
        {
            protected IObjectSet<T> _objectSet;
            protected ObjectContext _context;

            public MyRepository(ObjectContext Context)
            {
                _objectSet = Context.CreateObjectSet<T>();
                _context = Context;
            }

            public IQueryable<T> GetAll()
            {
                return _objectSet.AsQueryable();
            }

            public IQueryable<T> Find(Expression<Func<T, bool>> filter)
            {
                return _objectSet.Where(filter);
            }

            public void Add(T entity)
            {
                _objectSet.AddObject(entity);
                _context.ObjectStateManager.ChangeObjectState(entity, System.Data.EntityState.Added);
                _context.SaveChanges();
            }

            public void Update(T entity)
            {
                _context.ObjectStateManager.ChangeObjectState(entity, System.Data.EntityState.Modified);
                _context.SaveChanges();
            }

            public void Delete(T entity)
            {
                _objectSet.Attach(entity);
                _context.ObjectStateManager.ChangeObjectState(entity, System.Data.EntityState.Deleted);
                _objectSet.DeleteObject(entity);
                _context.SaveChanges();
            }
        }

    For every table class generated by my EDMX designer I create another class like this:

        public class CustomerRepo : MyRepository<Customer>
        {
            public CustomerRepo (ObjectContext context) : base(context) { }
        }

    For any updates that I need to make to a particular table I do this:

        Customer CustomerObj = new Customer();
        CustomerObj.Prop1 = ...
        CustomerObj.Prop2 = ...
        CustomerObj.Prop3 = ...
        CustomerRepo.Update(CustomerObj);

    This works perfectly well when I am updating just the specific table called Customer. Now, if I also need to update each row of another table called Orders, which is a child of Customer, what changes do I need to make to the class MyRepository? The Orders table will have multiple records for each Customer record, and multiple fields too, say for example Field1, Field2, Field3. So my questions are: 1.) If I only need to update Field1 of the Orders table for some rows based on one condition, and Field2 for some other rows based on a different condition, what changes do I need to make? 2.) If there is no such condition and all child rows need to be updated with the same value, what changes do I need to make? Thanks for taking the time. Look forward to your inputs...


  • How to simulate inner join on very large files in java (without running out of memory)

    - by Constantin
    I am trying to simulate SQL joins (INNER, RIGHT OUTER and LEFT OUTER) using Java and very large text files. The files have already been sorted using an external sort routine. The issue I have is finding the most efficient way to deal with the INNER join part of the algorithm. Right now I am using two Lists to store the lines that have the same key, and I iterate through the set of lines in the right file once for every line in the left file (provided the keys still match). In other words, the join key is not unique in each file, so I need to account for the Cartesian product situations ... left_01, 1 left_02, 1 right_01, 1 right_02, 1 right_03, 1 left_01 joins to right_01 using key 1 left_01 joins to right_02 using key 1 left_01 joins to right_03 using key 1 left_02 joins to right_01 using key 1 left_02 joins to right_02 using key 1 left_02 joins to right_03 using key 1 My concern is one of memory. I will run out of memory if I use the approach below, but I still want the inner join part to work fairly quickly.
What is the best approach to deal with the INNER join part keeping in mind that these files may potentially be huge public class Joiner { private void join(BufferedReader left, BufferedReader right, BufferedWriter output) throws Throwable { BufferedReader _left = left; BufferedReader _right = right; BufferedWriter _output = output; Record _leftRecord; Record _rightRecord; _leftRecord = read(_left); _rightRecord = read(_right); while( _leftRecord != null && _rightRecord != null ) { if( _leftRecord.getKey() < _rightRecord.getKey() ) { write(_output, _leftRecord, null); _leftRecord = read(_left); } else if( _leftRecord.getKey() > _rightRecord.getKey() ) { write(_output, null, _rightRecord); _rightRecord = read(_right); } else { List<Record> leftList = new ArrayList<Record>(); List<Record> rightList = new ArrayList<Record>(); _leftRecord = readRecords(leftList, _leftRecord, _left); _rightRecord = readRecords(rightList, _rightRecord, _right); for( Record equalKeyLeftRecord : leftList ){ for( Record equalKeyRightRecord : rightList ){ write(_output, equalKeyLeftRecord, equalKeyRightRecord); } } } } if( _leftRecord != null ) { write(_output, _leftRecord, null); _leftRecord = read(_left); while(_leftRecord != null) { write(_output, _leftRecord, null); _leftRecord = read(_left); } } else { if( _rightRecord != null ) { write(_output, null, _rightRecord); _rightRecord = read(_right); while(_rightRecord != null) { write(_output, null, _rightRecord); _rightRecord = read(_right); } } } _left.close(); _right.close(); _output.flush(); _output.close(); } private Record read(BufferedReader reader) throws Throwable { Record record = null; String data = reader.readLine(); if( data != null ) { record = new Record(data.split("\t")); } return record; } private Record readRecords(List<Record> list, Record record, BufferedReader reader) throws Throwable { int key = record.getKey(); list.add(record); record = read(reader); while( record != null && record.getKey() == key) { list.add(record); 
record = read(reader); } return record; } private void write(BufferedWriter writer, Record left, Record right) throws Throwable { String leftKey = (left == null ? "null" : Integer.toString(left.getKey())); String leftData = (left == null ? "null" : left.getData()); String rightKey = (right == null ? "null" : Integer.toString(right.getKey())); String rightData = (right == null ? "null" : right.getData()); writer.write("[" + leftKey + "][" + leftData + "][" + rightKey + "][" + rightData + "]\n"); } public static void main(String[] args) { try { BufferedReader leftReader = new BufferedReader(new FileReader("LEFT.DAT")); BufferedReader rightReader = new BufferedReader(new FileReader("RIGHT.DAT")); BufferedWriter output = new BufferedWriter(new FileWriter("OUTPUT.DAT")); Joiner joiner = new Joiner(); joiner.join(leftReader, rightReader, output); } catch (Throwable e) { e.printStackTrace(); } } } After applying the ideas from the proposed answer, I changed the loop to this private void join(RandomAccessFile left, RandomAccessFile right, BufferedWriter output) throws Throwable { long _pointer = 0; RandomAccessFile _left = left; RandomAccessFile _right = right; BufferedWriter _output = output; Record _leftRecord; Record _rightRecord; _leftRecord = read(_left); _rightRecord = read(_right); while( _leftRecord != null && _rightRecord != null ) { if( _leftRecord.getKey() < _rightRecord.getKey() ) { write(_output, _leftRecord, null); _leftRecord = read(_left); } else if( _leftRecord.getKey() > _rightRecord.getKey() ) { write(_output, null, _rightRecord); _pointer = _right.getFilePointer(); _rightRecord = read(_right); } else { long _tempPointer = 0; int key = _leftRecord.getKey(); while( _leftRecord != null && _leftRecord.getKey() == key ) { _right.seek(_pointer); _rightRecord = read(_right); while( _rightRecord != null && _rightRecord.getKey() == key ) { write(_output, _leftRecord, _rightRecord ); _tempPointer = _right.getFilePointer(); _rightRecord = read(_right); } 
_leftRecord = read(_left); } _pointer = _tempPointer; } } if( _leftRecord != null ) { write(_output, _leftRecord, null); _leftRecord = read(_left); while(_leftRecord != null) { write(_output, _leftRecord, null); _leftRecord = read(_left); } } else { if( _rightRecord != null ) { write(_output, null, _rightRecord); _rightRecord = read(_right); while(_rightRecord != null) { write(_output, null, _rightRecord); _rightRecord = read(_right); } } } _left.close(); _right.close(); _output.flush(); _output.close(); } UPDATE While this approach worked, it was terribly slow and so I have modified this to create files as buffers and this works very well. Here is the update ... private long getMaxBufferedLines(File file) throws Throwable { long freeBytes = Runtime.getRuntime().freeMemory() / 2; return (freeBytes / (file.length() / getLineCount(file))); } private void join(File left, File right, File output, JoinType joinType) throws Throwable { BufferedReader leftFile = new BufferedReader(new FileReader(left)); BufferedReader rightFile = new BufferedReader(new FileReader(right)); BufferedWriter outputFile = new BufferedWriter(new FileWriter(output)); long maxBufferedLines = getMaxBufferedLines(right); Record leftRecord; Record rightRecord; leftRecord = read(leftFile); rightRecord = read(rightFile); while( leftRecord != null && rightRecord != null ) { if( leftRecord.getKey().compareTo(rightRecord.getKey()) < 0) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.LeftExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, leftRecord, null); } leftRecord = read(leftFile); } else if( leftRecord.getKey().compareTo(rightRecord.getKey()) > 0 ) { if( joinType == JoinType.RightOuterJoin || joinType == JoinType.RightExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, null, rightRecord); } rightRecord = read(rightFile); } else if( 
leftRecord.getKey().compareTo(rightRecord.getKey()) == 0 ) { String key = leftRecord.getKey(); List<File> rightRecordFileList = new ArrayList<File>(); List<Record> rightRecordList = new ArrayList<Record>(); rightRecordList.add(rightRecord); rightRecord = consume(key, rightFile, rightRecordList, rightRecordFileList, maxBufferedLines); while( leftRecord != null && leftRecord.getKey().compareTo(key) == 0 ) { processRightRecords(outputFile, leftRecord, rightRecordFileList, rightRecordList, joinType); leftRecord = read(leftFile); } // need a dispose for deleting files in list } else { throw new Exception("DATA IS NOT SORTED"); } } if( leftRecord != null ) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.LeftExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, leftRecord, null); } leftRecord = read(leftFile); while(leftRecord != null) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.LeftExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, leftRecord, null); } leftRecord = read(leftFile); } } else { if( rightRecord != null ) { if( joinType == JoinType.RightOuterJoin || joinType == JoinType.RightExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, null, rightRecord); } rightRecord = read(rightFile); while(rightRecord != null) { if( joinType == JoinType.RightOuterJoin || joinType == JoinType.RightExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, null, rightRecord); } rightRecord = read(rightFile); } } } leftFile.close(); rightFile.close(); outputFile.flush(); outputFile.close(); } public void processRightRecords(BufferedWriter outputFile, Record leftRecord, List<File> rightFiles, List<Record> rightRecords, JoinType joinType) throws Throwable { for(File rightFile : rightFiles) { BufferedReader 
rightReader = new BufferedReader(new FileReader(rightFile)); Record rightRecord = read(rightReader); while(rightRecord != null){ if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.RightOuterJoin || joinType == JoinType.FullOuterJoin || joinType == JoinType.InnerJoin ) { write(outputFile, leftRecord, rightRecord); } rightRecord = read(rightReader); } rightReader.close(); } for(Record rightRecord : rightRecords) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.RightOuterJoin || joinType == JoinType.FullOuterJoin || joinType == JoinType.InnerJoin ) { write(outputFile, leftRecord, rightRecord); } } } /** * consume all records having key (either to a single list or multiple files) each file will * store a buffer full of data. The right record returned represents the outside flow (key is * already positioned to next one or null) so we can't use this record in below while loop or * within this block in general when comparing current key. The trick is to keep consuming * from a List. When it becomes empty, re-fill it from the next file until all files have * been consumed (and the last node in the list is read). 
The next outside iteration will be * ready to be processed (either it will be null or it points to the next biggest key) * @throws Throwable * */ private Record consume(String key, BufferedReader reader, List<Record> records, List<File> files, long bufferMaxRecordLines ) throws Throwable { boolean processComplete = false; Record record = records.get(records.size() - 1); while(!processComplete){ long recordCount = records.size(); if( record.getKey().compareTo(key) == 0 ){ record = read(reader); while( record != null && record.getKey().compareTo(key) == 0 && recordCount < bufferMaxRecordLines ) { records.add(record); recordCount++; record = read(reader); } } processComplete = true; // if record is null, we are done if( record != null ) { // if the key has changed, we are done if( record.getKey().compareTo(key) == 0 ) { // Same key means we have exhausted the buffer. // Dump entire buffer into a file. The list of file // pointers will keep track of the files ... processComplete = false; dumpBufferToFile(records, files); records.clear(); records.add(record); } } } return record; } /** * Dump all records in List of Record objects to a file. Then, add that * file to List of File objects * * NEED TO PLACE A LIMIT ON NUMBER OF FILE POINTERS (check size of file list) * * @param records * @param files * @throws Throwable */ private void dumpBufferToFile(List<Record> records, List<File> files) throws Throwable { String prefix = "joiner_" + (files.size() + 1); String suffix = ".dat"; File file = File.createTempFile(prefix, suffix, new File("cache")); BufferedWriter writer = new BufferedWriter(new FileWriter(file)); for( Record record : records ) { writer.write( record.dump() ); } files.add(file); writer.flush(); writer.close(); }
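
    The core idea discussed in this question (a merge join over two pre-sorted inputs that keeps only one key group in memory at a time) can be sketched in a few lines. This is a minimal illustration and not the poster's code: the class name SortedMergeJoinSketch, the helper keyOf, the "key, tab, payload" line layout and the " | " output separator are all assumptions made for the example. It also assumes both inputs are sorted by key in String.compareTo order, and that the rows for a single key on the right side fit in memory. If one key group can be too large for that, the group has to be spilled to temporary files, as the update in the question does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

class SortedMergeJoinSketch {

    // Assumed line layout (hypothetical): key, a tab character, then the payload.
    static String keyOf(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }

    // Inner join of two inputs that are already sorted by key (String order).
    // Only the right-hand rows of the current key are buffered; the left side is streamed.
    static void innerJoin(BufferedReader left, BufferedReader right, Writer out)
            throws IOException {
        String l = left.readLine();
        String r = right.readLine();
        List<String> rightGroup = new ArrayList<>();

        while (l != null && r != null) {
            int cmp = keyOf(l).compareTo(keyOf(r));
            if (cmp < 0) {
                l = left.readLine();          // left key has no partner: skip for an inner join
            } else if (cmp > 0) {
                r = right.readLine();         // right key has no partner: skip for an inner join
            } else {
                String key = keyOf(l);

                // Buffer every right row that carries the current key.
                rightGroup.clear();
                while (r != null && keyOf(r).equals(key)) {
                    rightGroup.add(r);
                    r = right.readLine();
                }

                // Stream the left rows with the same key and emit the Cartesian product.
                while (l != null && keyOf(l).equals(key)) {
                    for (String match : rightGroup) {
                        out.write(l + " | " + match + "\n");
                    }
                    l = left.readLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory stand-ins for the sorted LEFT and RIGHT files.
        BufferedReader left = new BufferedReader(
                new StringReader("1\tleft_01\n1\tleft_02\n"));
        BufferedReader right = new BufferedReader(
                new StringReader("1\tright_01\n1\tright_02\n1\tright_03\n"));
        StringWriter out = new StringWriter();
        innerJoin(left, right, out);
        System.out.print(out);
    }
}
```

    Buffering only the right-hand group keeps memory use proportional to the largest run of equal keys on one side rather than to the file size, which is usually the deciding factor when the files themselves do not fit in memory.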

    Read the article

  • Data migration - dangerous or essential?

    - by MRalwasser
    The software development department of my company is facing the problem that data migrations are considered potentially dangerous, especially by my managers. The background is that our customers work with a large amount of poor-quality data. The reasons for this are only partially related to our software quality; they lie mostly in the history of the data: most of it has been migrated from predecessor systems, some bugs caused (mostly business) inconsistencies in the data records, and some records were mis-entered by accident on the customer's side (which our software allowed in error). The most important counter-arguments from my managers are that faulty data may turn into even worse data, that the data troubles may alert some managers at the customer, and that some processes on the customer's side may stop working because they have adapted to our system. Personally, I consider data migrations an integral part of software development; data migration is to data what refactoring is to code. I think that data migration is essential for creating software that evolves. Without it, we would have to write painful software that works around a bad data structure. I am asking you: What are your thoughts on data migration, especially for real-life cases and not only from a developer's perspective? Do you have any arguments against my managers' opinions? How does your company deal with data migrations and the difficulties caused by them? Any other interesting thoughts that belong to this topic?

    Read the article
