Search Results

Search found 40870 results on 1635 pages for 'database design'.

  • How to implement isValid correctly?

    - by Songo
    I'm trying to provide a mechanism for validating my object like this: class SomeObject { private $_inputString; private $_errors=array(); public function __construct($inputString) { $this->_inputString = $inputString; } public function getErrors() { return $this->_errors; } public function isValid() { $isValid = preg_match("/Some regular expression here/", $this->_inputString); if($isValid==0){ $this->_errors[]= 'Error was found in the input'; } return $isValid==1; } } Then when I'm testing my code I'm doing it like this: $obj = new SomeObject('an INVALID input string'); $isValid = $obj->isValid(); $errors=$obj->getErrors(); $this->assertFalse($isValid); $this->assertNotEmpty($errors); Now the test passes correctly, but I noticed a design problem here. What if the user called $obj->getErrors() before calling $obj->isValid()? The test will fail because the user has to validate the object first before checking the error resulting from validation. I think this way the user depends on a sequence of action to work properly which I think is a bad thing because it exposes the internal behaviour of the class. How do I solve this problem? Should I tell the user explicitly to validate first? Where do I mention that? Should I change the way I validate? Is there a better solution for this? UPDATE: I'm still developing the class so changes are easy and renaming functions and refactoring them is possible.
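One way to remove the ordering dependency described above is to have validation return its outcome as a value, so there is no separate error list that is only filled in after `isValid()` has run. The following is a minimal sketch in Python rather than the question's PHP; `ValidationResult`, `validate` and the placeholder pattern are assumptions made for illustration, not part of the original class.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ValidationResult:
    """Immutable outcome of one validation run (hypothetical helper type)."""
    errors: Tuple[str, ...] = ()

    @property
    def is_valid(self) -> bool:
        # Valid exactly when no errors were recorded.
        return not self.errors


class SomeObject:
    # Placeholder pattern standing in for "Some regular expression here".
    _PATTERN = re.compile(r"^[a-z]+$")

    def __init__(self, input_string: str) -> None:
        self._input_string = input_string

    def validate(self) -> ValidationResult:
        """Return validity and errors together, so callers cannot ask for
        errors before validation has happened."""
        if self._PATTERN.search(self._input_string) is None:
            return ValidationResult(("Error was found in the input",))
        return ValidationResult()
```

A caller would write `result = SomeObject(text).validate()` and then read `result.is_valid` and `result.errors` in any order. Keeping `getErrors()` and running the validation lazily on first access would be another option; the sketch only shows the return-a-result variant.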

  • Need ideas on how to give my levels structure

    - by akuritsu
    I am making an iOS game for a project at school. It is going to be a tiny bit like Fruit Ninja, as in it will have different things on the screen, and when you hit them, they die, and you get points. The trouble is that unlike Fruit Ninja, my game will have different types of sprites, all doing different things (moving different places, doing different things, etc). The one thing that is bad about having all of these sprites that do different things is that it is hard for them to look neat on the screen all together. I was planning on having a couple of different gamemodes: Time Trial You have 120 seconds to kill as many sprites as possible. Survival You have three lives, every time you try to hit a sprite and miss, you lose a life. ???? Whatever I think of. I am a rookie to game design in general, and I don't know the best way to make my game look good, and play well. I could have all of these sprites on the screen at the same time, or I could have them come in waves, for example 10 of sprite_a come on, and once they are killed, 10 of sprite_b come on, etc... Please give me your opinion about which one I should code. If you have any other suggestions for either a third gamemode, or a completely different way to make the levels, feel free to tell me.

  • Override methods should call base method?

    - by Trevor Pilley
    I'm just running NDepend against some code that I have written and one of the warnings is Overrides of Method() should call base.Method(). The places this occurs are where I have a base class which has virtual properties and methods with default behaviour but which can be overridden by a class which inherits from the base class and doesn't call the overridden method. For example, in the base class I have a property defined like this: protected virtual char CloseQuote { get { return '"'; } } And then in an inheriting class which uses a different close quote: protected override char CloseQuote { get { return ']'; } } Not all classes which inherit from the base class use different quote characters, hence my initial design. The alternatives I thought of were to have get/set properties in the base class with the defaults set in the constructor: protected BaseClass() { this.CloseQuote = '"'; } protected char CloseQuote { get; set; } public InheritingClass() { this.CloseQuote = ']'; } Or make the base class require the values as constructor args: protected BaseClass(char closeQuote, ...) { this.CloseQuote = closeQuote; } protected char CloseQuote { get; private set; } public InheritingClass() : base(closeQuote: ']', ...) { } Should I use virtual in a scenario where the base implementation may be replaced instead of extended, or should I opt for one of the alternatives I thought of? If so, which would be preferable and why?
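The constructor-argument alternative from the question can be sketched as follows. This is a Python illustration rather than C#, and it is only one possible reading of the design: the class names, the `open_quote` parameter and the `quote` method are hypothetical additions, since the question itself only shows `CloseQuote`.

```python
class BaseQuoter:
    """Base class that receives its quote characters instead of exposing
    overridable properties (names are illustrative)."""

    def __init__(self, open_quote: str = '"', close_quote: str = '"') -> None:
        # The values are fixed at construction time; subclasses pass
        # different ones rather than overriding a property.
        self._open_quote = open_quote
        self._close_quote = close_quote

    def quote(self, identifier: str) -> str:
        return f"{self._open_quote}{identifier}{self._close_quote}"


class BracketQuoter(BaseQuoter):
    """Inheriting class that uses square brackets."""

    def __init__(self) -> None:
        super().__init__(open_quote="[", close_quote="]")
```

With this shape there is no override at all, so a "should call base" warning has nothing to flag; the trade-off is that the quote characters are data rather than behaviour that a subclass can compute.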

  • Did I Inadvertently Create a Mediator in my MVC?

    - by SoulBeaver
    I'm currently working on my first biggish project. It's a frontend facebook application that has, since last Tuesday, spanned some 6000-8000 LOC. I say this because I'm using the MVC, an architecture I have never rigidly enforced in any of my hobby projects. I read part of the PureMVC book, but I didn't quite grasp the concept of the Mediator. Since I didn't understand and didn't see the need for it, my project has yet to use a single mediator. Yesterday I went back to the design board because of some requirement changes and noticed that I could move all UI elements out of the View and into its own class. The View essentially only managed the lifetime of the UI and all events from the UI or Model. Technically, the View has now become a 'Mediator' between the Model and UI. Therefore, I realized today, I could just move all my UI stuff back into the View and create a mediator class that handles all events from the view and model. Is my understanding correct in thinking that I have devolved my View as it currently is (handling events from the Model and UI) into a Mediator and that the UI class is what should be the View?

  • How to pass dynamic information between a form and a service? [closed]

    - by qminator
    I have a design problem and hopefully the braintrust which is Stack Exchange can help. I have a generic form, which loads a dataset and displays it. It never has direct knowledge of what it contains but can pass it to a service for manipulation (via an OnClick event, for example). However, the form might need to alter its behaviour based on the manipulation by the service. Example: The service realises this dataset requires sending of an email by the user and needs to send an instruction to the form to open up a mail form. My idea is thus: I'm thinking about passing back some type of key/name dictionary, filled with commands which the service requires. They could then be interpreted by the form without it needing to reference something specific. Example: If the service decides that the dataset needs to refresh, it would send back a key/name pair. I might even be able to chain commands, for instance refreshing the dataset and then sending a mail: Refresh / "Foo" Mail / "<email address>" The form would reference an action explicitly (Refresh or Mail) but not the instructions themselves. Is this a valid idea or am I wasting time?
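The command idea above can be sketched as a list of (action, argument) pairs that the service returns and the form dispatches through a handler table. This is a minimal Python sketch under assumptions: `Form`, `service_process`, the handler names and the `user@example.com` address are all hypothetical and only illustrate the shape of the idea.

```python
from typing import Callable, Dict, List, Tuple

# One command is an (action, argument) pair, e.g. ("Refresh", "Foo").
Command = Tuple[str, str]


class Form:
    """Generic form that knows a fixed set of actions but not their arguments."""

    def __init__(self) -> None:
        self.log: List[str] = []  # records what the form did, for illustration
        self._handlers: Dict[str, Callable[[str], None]] = {
            "Refresh": self._refresh,
            "Mail": self._open_mail_form,
        }

    def _refresh(self, dataset_name: str) -> None:
        self.log.append(f"refresh {dataset_name}")

    def _open_mail_form(self, address: str) -> None:
        self.log.append(f"mail {address}")

    def execute(self, commands: List[Command]) -> None:
        # Run the commands in the order the service produced them.
        for action, argument in commands:
            handler = self._handlers.get(action)
            if handler is None:
                raise ValueError(f"unknown action: {action}")
            handler(argument)


def service_process(dataset_name: str) -> List[Command]:
    """Hypothetical service call that decides which follow-up actions are needed."""
    return [("Refresh", dataset_name), ("Mail", "user@example.com")]
```

A list of pairs is used instead of a dictionary so that the order of chained commands is explicit and the same action can appear more than once.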

  • Motivation for a service layer (instead of just copying dlls)?

    - by BornToCode
    I'm creating an application which has 2 different UIs so I'm making it with a service layer which I understood is appropriate for such scenario. However I found myself just creating web methods for every single method I have in the BL layer, so the services basically built from methods that looks like this: return customers_bl.Get_Customer_Prices(customer_id); I understood that a main point of the service layer is to prevent duplication of code so I asked myself - why not just import the BL.DLL (and the dal.dll) to the other UI, and whenever making a change re-copy the dlls, it might not be so 'neat', but still less hassle than one more layer? {I know something is wrong in my approach, I'm probably missing the importance of service layer, I'd like to get more motivation to create another layer, especially because as it is I found that many of my BL functions ALREADY looks like: return customers_dal.Get_Customer_Prices(cust_id) which led me to ask: was it really necessary to create the BL just because on several functions I actually have LOGIC inside the BL?} so I'm looking for more motivation to creating ONE MORE layer, I'm sure it's not just to make it more convenient that I won't have to re-copy the dlls on changes? Am I grasping it wrong? Any simple guidelines on how to design service layer (corresponding to all the BL layer functions or not? any simple example?) any enlightenment on the subject?

  • How to avoid the GameManager god object?

    - by lorancou
    I just read an answer to a question about structuring game code. It made me wonder about the ubiquitous GameManager class, and how it often becomes an issue in a production environment. Let me describe this. First, there's prototyping. Nobody cares about writing great code, we just try to get something running to see if the gameplay adds up. Then there's a greenlight, and in an effort to clean things up, somebody writes a GameManager. Probably to hold a bunch of GameStates, maybe to store a few GameObjects, nothing big, really. A cute, little, manager. In the peaceful realm of pre-production, the game is shaping up nicely. Coders have proper nights of sleep and plenty of ideas to architecture the thing with Great Design Patterns. Then production starts and soon, of course, there is crunch time. Balanced diet is long gone, the bug tracker is cracking with issues, people are stressed and the game has to be released yesterday. At that point, usually, the GameManager is a real big mess (to stay polite). The reason for that is simple. After all, when writing a game, well... all the source code is actually here to manage the game. It's easy to just add this little extra feature or bugfix in the GameManager, where everything else is already stored anyway. When time becomes an issue, no way to write a separate class, or to split this giant manager into sub-managers. Of course this is a classical anti-pattern: the god object. It's a bad thing, a pain to merge, a pain to maintain, a pain to understand, a pain to transform. What would you suggest to prevent this from happening?

  • Which are the best ways to organize view hierarchies in GUI interfaces?

    - by none
    I'm currently trying to figure out the best techniques for organizing GUI view hierarchies, that is dividing a window into several panels which are in turn divided into other components. I've given a look to the Composite Design Pattern, but I don't know if I can find better alternatives, so I'd appreciate to know if using the Composite is a good idea, or it would be better looking for some other techniques. I'm currently developing in Java Swing, but I don't think that the framework or the language can have a great impact on this. Any help will be appreciated. ---------EDIT------------ I was currently developing a frame containing three labels, one button and a text field. At the button pressed, the content inside the text field would be searched, and the results written inside the three labels. One of my typical structure would be the following: MainWindow | Main panel | Panel with text field and labels. | Panel with search button Now, as the title explains, I was looking for a suitable way of organizing both the MainPanel and the other two panels. But here came problems, since I'm not sure whether organizing them like attributes or storing inside some data structure (i.e. LinkedList or something like this). Anyway, I don't really think that both my solution are really good, so I'm wondering if there are really better approaches for facing this kind of problems. Hope it helps

  • Should sanity be a property of a programmer or a program?

    - by toplel32
    I design and implement languages, that can range from object notations to markup languages. In many cases I have considered restrictions in favor of sanity (common knowledge), like in the case of control characters in identifiers. There are two consequences to consider before doing this: It takes extra computation It narrows liberty I'm interested to learn how developers think of decisions like this. As you may know Microsoft C# is very open on the contrary. If you really want to prefix your integer as Long with 'l' instead of 'L' and so risk other developers of confusing '1' and 'l', no problem. If you want to name your variables in non-latin script so they will contrast with C#'s latin keywords, no problem. Or if you want to distribute a string over multiple lines and so break a series of indentation, no problem. It is cheap to ensure consistency with restrictions and this makes it tempting to implement. But in the case of disallowing non-latin characters (concerning the second example), it means a discredit to Unicode, because one would not take full advantage of its capacity.

  • On Developing Web Services with Global State

    - by user74418
    I'm new to web programming. I'm more experienced and comfortable with client-side code. Recently, I've been dabbling in web programming through Python's Google App Engine. I ran into some difficulty while trying to write some simple apps for the purposes of learning, mainly involving how to maintain some kind of consistent universally-accessible state for the application. I tried to write a simple queueing management system, the kind you would expect to be used in a small clinic, or at a cafeteria. Typically, this is done with hardware. You take a number from a ticketing machine, and when your number is displayed or called you approach the counter for service. Alternatively, you could be given a small pager, which will beep or vibrate when it is your turn to receive service. The former is somewhat better in that you have an idea of how many people are still ahead of you in the queue. In this situation, the global state is the last number in queue, which needs to be updated whenever a request is made to the server. I'm not sure how to best to store and maintain this value in a GAE context. The solution I thought of was to keep the value in the Datastore, attempt to query it during a ticket request, update the value, and then re-store it with put. My problem is that I haven't figured out how to lock the resource so that other requests do not check the value while it is in the middle of being updated. I am concerned that I may end up ticket requests that have the same queue number. Also, the whole solution feels awkward to me. I was wondering if there was a more natural way to accomplish this without having to go through the Datastore. Can anyone with more experience in this domain provide some advice on how to approach the design of the above application?
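The concern above is a classic read-modify-write race: two requests read the same "last number" before either writes the incremented value back. The sketch below shows the general remedy, making the read, increment and write one atomic step, using an in-process lock in plain Python. It is an illustration of the principle only and an assumption about the setup: on App Engine the requests may run in different instances, so the atomic step would have to be provided by the storage layer (Datastore transactions are the usual tool for that), not by `threading.Lock`. The names `TicketCounter` and `issue_concurrently` are hypothetical.

```python
import threading
from typing import List


class TicketCounter:
    """Hands out strictly increasing queue numbers."""

    def __init__(self) -> None:
        self._last = 0
        self._lock = threading.Lock()

    def take_ticket(self) -> int:
        # Read, increment and return under one lock, so no two callers
        # can observe the same value of _last.
        with self._lock:
            self._last += 1
            return self._last


def issue_concurrently(counter: TicketCounter, workers: int, per_worker: int) -> List[int]:
    """Simulate many simultaneous ticket requests and collect the numbers issued."""
    issued: List[int] = []
    issued_lock = threading.Lock()

    def worker() -> None:
        for _ in range(per_worker):
            number = counter.take_ticket()
            with issued_lock:
                issued.append(number)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return issued
```

If the increment were done outside the lock, duplicates could appear in the issued list; with the lock, every number from 1 to the total is issued exactly once.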

  • Getting Requirements Right

    - by Tim Murphy
    Originally posted on: http://geekswithblogs.net/tmurphy/archive/2013/10/28/getting-requirements-right.aspx I had a meeting with a stakeholder who stated “I bet you wish I wasn’t in these meetings”.  She said this because she kept changing what we thought the end product should look like.  My reply was that it would be much worse if she came in at the end of the project and told us we had just built the wrong solution. You have to take the time to get the requirements right.  Be honest with all involved parties as to the amount of time it is taking to refine the requirements.  The only thing worse than wrong requirements is a surprise budget overrun.  If you give open visibility to your progress then management has the ability to shift priorities if needed. In order to capture the best requirements, use different approaches to help your stakeholders articulate their needs.  Use mock-ups and matrix spreadsheets to allow them to visualize and confirm that everyone has the same understanding.  The goal isn’t to record every last detail, but to have the major landmarks identified so there are fewer surprises along the way. Help the team members to understand that you all have the same goal.  You want to create the best possible solution for the given business problem.  If you do this, everyone involved will do their best to outline a picture of what is to be built and you will be able to design an appropriate solution to fill those needs more easily.

  • Redesigning my website has destroyed my SEO

    - by user20721
    Unfortunately I read an article on how to avoid destroying your website's SEO during a redesign AFTER it was too late! Here is the article (http://www.searchenginejournal.com/how-to-avoid-seo-disaster-during-a-website-redesign/42824/). On 20 November 12 we completely redesigned our site, www.retromodern.com.au. We get ALL our customers from our website as we do not have a shop. Since that dreaded day a month ago the phone has pretty much stopped, there are basically no emails, our Google rankings are down and our Google Analytics numbers have dropped by 50%. Yesterday I did some research into it, as I had no idea that a redesign of a website could have such a damaging effect. Yes, I am a novice and use a WYSIWYG-type web builder. There is lots of info on how to AVOID this happening, BUT what do I do now that I have already made the mistake? Yesterday I reloaded my OLD site with my new pages in the background, hoping this would be a start. I really have no idea how to get out of this mess. Please, please help. Thanks in advance. Monique

  • Hide or Show singleton?

    - by Sinker
    Singleton is a common pattern implemented in both native libraries of .NET and Java. You will see it as such: C#: MyClass.Instance Java: MyClass.getInstance() The question is: when writing APIs, is it better to expose the singleton through a property or getter, or should I hide it as much as possible? Here are the alternatives for illustrative purposes: Exposed(C#): private static MyClass instance; public static MyClass Instance { get { if (instance == null) instance = new MyClass(); return instance; } } public void PerformOperation() { ... } Hidden (C#): private static MyClass instance; public static void PerformOperation() { if (instance == null) { instance = new MyClass(); } ... } EDIT: There seems to be a number of detractors of the Singleton design. Great! Please tell me why and what is the better alternative. Here is my scenario: My whole application utilises one logger (log4net/log4j). Whenever, the program has something to log, it utilises the Logger class (e.g. Logger.Instance.Warn(...) or Logger.Instance.Error(...) etc. Should I use Logger.Warn(...) or Logger.Warn(...) instead? If you have an alternative to singletons that addresses my concern, then please write an answer for it. Thank you :)

  • Why using Fragments?

    - by ahmed_khan_89
    I have read the documentation and some other questions' threads about this topic and I don't really feel convinced; I don't see clearly the limits of use of this technique. Fragments are now seen as a Best Practice; every Activity should be basically a support for one or more Fragments and not call a layout directly. Fragments are created in order to: allow the Activity to use many fragments, to change between them, to reuse these units... == the Fragment is totally dependent to the Context of an activity , so if I need something generic that I can reuse and handle in many Activities, I can create my own custom layouts or Views ... I will not care about this additional Complexity Developing Layer that fragments would add. a better handling to different resolution == OK for tablets/phones in case of long process that we can show two (or more) fragments in the same Activity in Tablets, and one by one in phones. But why would I use fragments always ? handling callbacks to navigate between Fragments (i.e: if the user is Logged-in I show a fragment else I show another fragment). === Just try to see how many bugs facebook SDK Log-in have because of this, to understand that it is really (?) ... considering that an Android Application is based on Activities... Adding another life cycles in the Activity would be better to design an Application... I mean the modules, the scenarios, the data management and the connectivity would be better designed, in that way. === This is an answer of someone who's used to see the Android SDK and Android Framework with a Fragments vision. I don't think it's wrong, but I am not sure it will give good results... And it is really abstract... ==== Why would I complicate my life, coding more, in using them always? else, why is it a best practice if it's just a tool for some cases? what are these cases?

  • SQL Developer Database Diff – Compare Objects From Multiple Schemas

    - by thatjeffsmith
    Ever wonder why Database Diff isn’t called Schema Diff? One reason is that SQL Developer allows you to select objects from more than one schema in the ‘Source’ connection for the compare. Simply use the ‘More’ dialog view and select as many tables from as many different schemas as you require. Now, before you get around to testing this (you should never believe what I say: trust but verify), there are two things you need to know: (1) I’m using SQL Developer version 3.2, and (2) on the initial screen you need to use the ‘Maintain’ option. Maintain tells SQL Developer to use the schema designation in the source connection to find the same corresponding object in the destination schema. Choose ‘Maintain’ if you want to compare objects in the same schema in the destination but don’t have the user login for that schema. So after you’ve selected your databases, your diff preferences, and your objects, you’re ready to perform the compare and review your results. In the diff report, notice the highlighted text: SQL Developer is ‘maintaining’ the schema context from the two databases. Short and sweet. That’s pretty much all there is to doing a compare with SQL Developer with multiple schemas involved. You may have noticed in some posts lately that my editor screenshots had a ‘green screen’ look and feel to them. What’s with the black background in your editors? In the SQL Developer preferences, you can set your editor color schemes. I started with the ‘Twilight’ scheme (team Jacob in case you’re wondering) and then customized it further by going with a default green font color. You could go pretty crazy in here, and I’m assuming 90% of you couldn’t care less and will just stick with the original. But for those of you who are particular about your IDE styling, go crazy!

  • SQL SERVER – Automated Type Conversion using Expressor Studio

    - by pinaldave
    Recently I had an interesting situation during my consultation project. Let me share to you how I solved the problem using Expressor Studio. Consider a situation in which you need to read a field, such as customer_identifier, from a text file and pass that field into a database table. In the source file’s metadata structure, customer_identifier is described as a string; however, in the target database table, customer_identifier is described as an integer. Legitimately, all the source values for customer_identifier are valid numbers, such as “109380”. To implement this in an ETL application, you probably would have hard-coded a type conversion function call, such as: output.customer_identifier=stringToInteger(input.customer_identifier) That wasn’t so bad, was it? For this instance, programming this hard-coded type conversion function call was relatively easy. However, hard-coding, whether type conversion code or other business rule code, almost always means that the application containing hard-coded fields, function calls, and values is: a) specific to an instance of use; b) is difficult to adapt to new situations; and c) doesn’t contain many reusable sub-parts. Therefore, in the long run, applications with hard-coded type conversion function calls don’t scale well. In addition, they increase the overall level of effort and degree of difficulty to write and maintain the ETL applications. To get around the trappings of hard-coding type conversion function calls, developers need an access to smarter typing systems. Expressor Studio product offers this feature exactly, by providing developers with a type conversion automation engine based on type abstraction. The theory behind the engine is quite simple. A user specifies abstract data fields in the engine, and then writes applications against the abstractions (whereas in most ETL software, developers develop applications against the physical model). 
When a Studio-built application is run, Studio’s engine automatically converts the source type to the abstracted data field’s type and converts the abstracted data field’s type to the target type. The engine can do this because it has a couple of built-in rules for type conversions. So, using the example above, a developer could specify customer_identifier as an abstract data field with a type of integer when using Expressor Studio. Upon reading the string value from the text file, Studio’s type conversion engine automatically converts the source field from the type specified in the source’s metadata structure to the abstract field’s type. At the time of writing the data value to the target database, the engine doesn’t have any work to do because the abstract data type and the target data type are the same. Had they been different, the engine would have automatically provided the conversion. Reference: Pinal Dave (http://blog.SQLAuthority.com)
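The idea of writing against an abstract field type, with conversions applied automatically at the source and target boundaries, can be sketched in a few lines of Python. This is not Expressor Studio's API or engine; the converter table, `ABSTRACT_FIELDS`, `read_record` and `write_record` are assumptions made up for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Built-in conversion rules, keyed by (source type, target type).
_CONVERTERS: Dict[Tuple[type, type], Callable[[Any], Any]] = {
    (str, int): int,
    (int, str): str,
    (str, float): float,
    (float, str): str,
}


def convert(value: Any, target_type: type) -> Any:
    """Convert value to target_type using the rule table; no-op if already that type."""
    source_type = type(value)
    if source_type is target_type:
        return value
    try:
        converter = _CONVERTERS[(source_type, target_type)]
    except KeyError:
        raise TypeError(
            f"no conversion rule from {source_type.__name__} to {target_type.__name__}"
        )
    return converter(value)


# Abstract model: the application is written against these types only.
ABSTRACT_FIELDS: Dict[str, type] = {"customer_identifier": int}


def read_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Source boundary: physical source types -> abstract types."""
    return {name: convert(raw[name], kind) for name, kind in ABSTRACT_FIELDS.items()}


def write_record(record: Dict[str, Any], target_types: Dict[str, type]) -> Dict[str, Any]:
    """Target boundary: abstract types -> physical target types."""
    return {name: convert(value, target_types[name]) for name, value in record.items()}
```

With the text file supplying `"109380"` as a string, `read_record` yields the integer 109380, and writing to an integer column requires no further conversion, which mirrors the customer_identifier example above.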

  • How to find and fix performance problems in ORM powered applications

    - by FransBouma
    Once in a while we get requests about how to fix performance problems with our framework. As it comes down to following the same steps and looking into the same things every single time, I decided to write a blogpost about it instead, so more people can learn from this and solve performance problems in their O/R mapper powered applications. In some parts it's focused on LLBLGen Pro but it's also usable for other O/R mapping frameworks, as the vast majority of performance problems in O/R mapper powered applications are not specific for a certain O/R mapper framework. Too often, the developer looks at the wrong part of the application, trying to fix what isn't a problem in that part, and getting frustrated that 'things are so slow with <insert your favorite framework X here>'. I'm in the O/R mapper business for a long time now (almost 10 years, full time) and as it's a small world, we O/R mapper developers know almost all tricks to pull off by now: we all know what to do to make task ABC faster and what compromises (because there are almost always compromises) to deal with if we decide to make ABC faster that way. Some O/R mapper frameworks are faster in X, others in Y, but you can be sure the difference is mainly a result of a compromise some developers are willing to deal with and others aren't. That's why the O/R mapper frameworks on the market today are different in many ways, even though they all fetch and save entities from and to a database. I'm not suggesting there's no room for improvement in today's O/R mapper frameworks, there always is, but it's not a matter of 'the slowness of the application is caused by the O/R mapper' anymore. Perhaps query generation can be optimized a bit here, row materialization can be optimized a bit there, but it's mainly coming down to milliseconds. 
Still worth it if you're a framework developer, but it's not much compared to the time spend inside databases and in user code: if a complete fetch takes 40ms or 50ms (from call to entity object collection), it won't make a difference for your application as that 10ms difference won't be noticed. That's why it's very important to find the real locations of the problems so developers can fix them properly and don't get frustrated because their quest to get a fast, performing application failed. Performance tuning basics and rules Finding and fixing performance problems in any application is a strict procedure with four prescribed steps: isolate, analyze, interpret and fix, in that order. It's key that you don't skip a step nor make assumptions: these steps help you find the reason of a problem which seems to be there, and how to fix it or leave it as-is. Skipping a step, or when you assume things will be bad/slow without doing analysis will lead to the path of premature optimization and won't actually solve your problems, only create new ones. The most important rule of finding and fixing performance problems in software is that you have to understand what 'performance problem' actually means. Most developers will say "when a piece of software / code is slow, you have a performance problem". But is that actually the case? If I write a Linq query which will aggregate, group and sort 5 million rows from several tables to produce a resultset of 10 rows, it might take more than a couple of milliseconds before that resultset is ready to be consumed by other logic. If I solely look at the Linq query, the code consuming the resultset of the 10 rows and then look at the time it takes to complete the whole procedure, it will appear to me to be slow: all that time taken to produce and consume 10 rows? But if you look closer, if you analyze and interpret the situation, you'll see it does a tremendous amount of work, and in that light it might even be extremely fast. 
With every performance problem you encounter, always do realize that what you're trying to solve is perhaps not a technical problem at all, but a perception problem. The second most important rule you have to understand is based on the old saying "Penny wise, Pound Foolish": the part which takes e.g. 5% of the total time T for a given task isn't worth optimizing if you have another part which takes a much larger part of the total time T for that same given task. Optimizing parts which are relatively insignificant for the total time taken is not going to bring you better results overall, even if you totally optimize that part away. This is the core reason why analysis of the complete set of application parts which participate in a given task is key to being successful in solving performance problems: No analysis -> no problem -> no solution. One warning up front: hunting for performance will always include making compromises. Fast software can be made maintainable, but if you want to squeeze as much performance out of your software, you will inevitably be faced with the dilemma of compromising one or more from the group {readability, maintainability, features} for the extra performance you think you'll gain. It's then up to you to decide whether it's worth it. In almost all cases it's not. The reason for this is simple: the vast majority of performance problems can be solved by implementing the proper algorithms, the ones with proven Big O-characteristics so you know the performance you'll get plus you know the algorithm will work. The time taken by the algorithm implementing code is inevitable: you already implemented the best algorithm. You might find some optimizations on the technical level but in general these are minor. Let's look at the four steps to see how they guide us through the quest to find and fix performance problems. Isolate The first thing you need to do is to isolate the areas in your application which are assumed to be slow. 
For example, if your application is a web application and a given page is taking several seconds or even minutes to load, it's a good candidate to check out. It's important to start with the isolate step because it allows you to focus on a single code path per area with a clear begin and end and ignore the rest. The rest of the steps are taken per identified problematic area. Keep in mind that isolation focuses on tasks in an application, not code snippets. A task is something that's started in your application by either another task or the user, or another program, and has a beginning and an end. You can see a task as a piece of functionality offered by your application.  Analyze Once you've determined the problem areas, you have to perform analysis on the code paths of each area, to see where the performance problems occur and which areas are not the problem. This is a multi-layered effort: an application which uses an O/R mapper typically consists of multiple parts: there's likely some kind of interface (web, webservice, windows etc.), a part which controls the interface and business logic, the O/R mapper part and the RDBMS, all connected with either a network or inter-process connections provided by the OS or other means. Each of these parts, including the connectivity plumbing, eat up a part of the total time it takes to complete a task, e.g. load a webpage with all orders of a given customer X. To understand which parts participate in the task / area we're investigating and how much they contribute to the total time taken to complete the task, analysis of each participating task is essential. Start with the code you wrote which starts the task, analyze the code and track the path it follows through your application. What does the code do along the way, verify whether it's correct or not. Analyze whether you have implemented the right algorithms in your code for this particular area. 
Remember we're looking at one area at a time, which means we're ignoring all other code paths and following just the code path of the current problematic area, from begin to end and back. Don't dig in and start optimizing at the code level just yet. We're just analyzing. If your analysis reveals big architectural stupidity, it's perhaps a good idea to rethink the architecture at this point. For the rest, we're analyzing, which means we collect data about what could be wrong, for each participating part of the complete application. Reviewing the code you wrote is a good tool to get a deeper understanding of what is going on for a given task, but ultimately it lacks precision and an overview of what really happens: humans aren't good code interpreters, computers are. We therefore need to utilize tools to get a deeper understanding of which parts contribute how much time to the total task, which other parts trigger them and, for example, how many times they are called. There are two different kinds of tools which are necessary: .NET profilers and O/R mapper / RDBMS profilers.

.NET profiling

.NET profilers (e.g. dotTrace by JetBrains or Ants by Red Gate software) show exactly which pieces of code are called, how many times they're called, and the time it took to run that piece of code, at the method level and sometimes even at the line level. The .NET profilers are essential tools for understanding whether the time taken to complete a given task / area in your application is consumed by .NET code, where exactly in your code, the path to that code and how many times that code was called by other code. They thus reveal where hotspots are located: the areas where a solution can be found. Importantly, they also reveal which areas can be left alone: remember our penny wise pound foolish saying: if a profiler reveals that a group of methods are fast, or don't contribute much to the total time taken for a given task, ignore them. 
Even if the code in them is perhaps complex and looks like a candidate for optimization: you can work all day on that, it won't matter.

As we're focusing on a single area of the application, it's best to start profiling right before you actually activate the task/area. Most .NET profilers support this by starting the application without starting the profiling procedure just yet. You navigate to the particular part which is slow, start profiling in the profiler, in your application you perform the actions which are considered slow, and afterwards you get a snapshot in the profiler. The snapshot contains the data collected by the profiler during the slow action, so most data is produced by code in the area to investigate. This is important, because it allows you to stay focused on a single area.

O/R mapper and RDBMS profiling

.NET profilers give you good insight into the .NET side of things, but not into the RDBMS side of the application. As this article is about O/R mapper powered applications, we're also looking at databases, and the software making it possible to consume the database in your application: the O/R mapper. To understand how much the parts of the O/R mapper and the database contribute to the total time taken for task T, we need different tools. There are two kinds of tools focusing on O/R mappers and database performance profiling: O/R mapper profilers and RDBMS profilers. For O/R mapper profilers, you can look at LLBLGen Prof by Hibernating Rhinos or the Linq to Sql/LLBLGen Pro profiler by Huagati. Hibernating Rhinos also have profilers for other O/R mappers like NHibernate (NHProf) and Entity Framework (EFProf), which work the same as LLBLGen Prof. For RDBMS profilers, you have to check whether the RDBMS vendor has a profiler. For example for SQL Server, the profiler is shipped with SQL Server, for Oracle it's built into the RDBMS, however there are also 3rd party tools. 
Which tool you're using isn't really important, what's important is that you get insight into which queries are executed during the task / area we're currently focused on and how long they took. Here, the O/R mapper profilers have an advantage as they collect the time it took to execute the query from the application's perspective, so they also collect the time it took to transport data across the network. This is important because a query which returns a massive resultset or a resultset with large blob/clob/ntext/image fields takes more time to get transported across the network than a small resultset, and a database profiler doesn't take this into account most of the time. Another tool to use in this case, which is more low level, is tracing: most O/R mappers (LLBLGen Pro and NHibernate among them, though not all) offer some form of tracing or logging system which you can use to collect the SQL generated and executed and often also other activity behind the scenes. While tracing can produce a tremendous amount of data in some cases, it also gives insight into what's going on.

Interpret

After we've completed the analysis step it's time to look at the data we've collected. We've done code reviews to see whether we've done anything stupid, which parts actually take part in the task and whether the proper algorithms have been implemented. We've done .NET profiling to see which parts are choke points and how much time they contribute to the total time taken to complete the task we're investigating. We've performed O/R mapper profiling and RDBMS profiling to see which queries were executed during the task, how many queries were generated and executed and how long they took to complete, including network transportation. All this data reveals two things: which parts are big contributors to the total time taken and which parts are irrelevant. Both aspects are very important. The parts which are irrelevant (i.e. 
don't contribute significantly to the total time taken) can be ignored from now on, we won't look at them. The parts which contribute a lot to the total time taken are important to look at. We now have to first look at the .NET profiler results, to see whether the time taken is consumed in our own code, in .NET framework code, in the O/R mapper itself or somewhere else. For example if most of the time is consumed by DbCommand.ExecuteReader, the time it took to complete the task depends on the time it takes to fetch the data from the database. If there was just 1 query executed, according to tracing or O/R mapper profilers / RDBMS profilers, check whether that query is optimal, uses indexes or has to deal with a lot of data. Interpret means that you follow the path from begin to end through the data collected and determine where, along the path, the most time is contributed. It also means that you have to check whether this was expected or is totally unexpected. My previous example of the 10 row resultset of a query which groups millions of rows will likely reveal that a long time is spent inside the database and almost no time is spent in the .NET code, meaning the RDBMS part contributes the most to the total time taken; the rest is, compared to that time, irrelevant. Considering the vastness of the source data set, it's expected this will take some time. However, does it need tweaking? Perhaps all possible tweaks are already in place. In the interpret step you then have to decide whether further action in this area is necessary or not, based on what the analysis results show: if the analysis results were unexpected and there is room for improvement in the area which contributes the most to the total time taken, action should be taken. If not, you can only accept the situation and move on. In all cases, document your decision together with the analysis you've done. 
If you decide that the perceived performance problem is actually expected due to the nature of the task performed, it's essential that in the future, when someone else looks at the application and starts asking questions, you can answer them properly, and that new analysis is only necessary if the situation has changed.

Fix

After interpreting the analysis results you've concluded that some areas need adjustment. This is the fix step: you're actively correcting the performance problem with proper action targeted at the real cause. In many cases related to O/R mapper powered applications it means you'll use different features of the O/R mapper to achieve the same goal, or apply optimizations at the RDBMS level. It could also mean you apply caching inside your application (trading memory consumption for performance) to avoid unnecessary re-querying of data and re-consuming of the results. After applying a change, it's key you re-do the analysis and interpretation steps: compare the results and expectations with what you had before, to see whether your actions had any effect or whether they moved the problem to a different part of the application. Don't fall into the trap of doing a partial analysis: do the full analysis again: .NET profiling and O/R mapper / RDBMS profiling. It might very well be that the changes you've made make one part faster but another part significantly slower, in such a way that the overall problem hasn't changed at all. Performance tuning is dealing with compromises and making choices: to use one feature over the other, to accept a higher memory footprint, to go away from the strict-OO path and execute queries directly onto the RDBMS. These are choices and compromises which will cross your path if you want to fix performance problems with respect to O/R mappers or data-access and databases in general. In most cases it's not a big issue: alternatives are often good choices too and the compromises aren't that hard to deal with. 
What is important is that you document why you made a choice, a compromise: which analysis data, which interpretation led you to the choice made. This is key for good maintainability in the years to come.

Most common performance problems with O/R mappers

Below is an incomplete list of common performance problems related to data-access / O/R mappers / RDBMS code. It will help you with fixing the hotspots you found in the interpretation step.

SELECT N+1: (Lazy-loading specific). Lazy loading triggered performance bottlenecks. Consider a list of Orders bound to a grid. You have a Field mapped onto a related field in Order, Customer.CompanyName. Showing this column in the grid will make the grid fetch (indirectly) the Customer row for each row. This means that for the single list you'll get not 1 query (for the orders) but 1+(the number of orders shown) queries. To solve this: use eager loading using a prefetch path to fetch the customers with the orders. SELECT N+1 is easy to spot with an O/R mapper profiler or RDBMS profiler: if you see a lot of identical queries executed at once, you have this problem.

Prefetch paths using many path nodes or sorting, or limiting. Eager loading problem. Prefetch paths can help with performance, but as 1 query is executed per node, it can be that the amount of data fetched in a child node is bigger than you think. Also consider that data in every node is merged on the client within the parent. This is fast, but it also can take some time if you fetch massive amounts of entities. If you keep fetches small, you can use tuning parameters like the ParameterizedPrefetchPathThreshold setting to get more optimal queries.

Deep inheritance hierarchies of type Target Per Entity/Type. If you use inheritance of type Target per Entity / Type (each type in the inheritance hierarchy is mapped onto its own table/view), fetches will join subtype- and supertype tables in many cases, which can lead to a lot of performance problems if the hierarchy has many types. 
With this problem, keep inheritance to a minimum if possible, or switch to a hierarchy of type Target Per Hierarchy, which means all entities in the inheritance hierarchy are mapped onto the same table/view. Of course this has its own set of drawbacks, but it's a compromise you might want to take.

Fetching massive amounts of data by fetching large lists of entities. LLBLGen Pro supports paging (and limiting the # of rows returned), which is often key to processing large sets of data. Use paging on the RDBMS if possible (so a query is executed which returns only the rows in the page requested). When using paging in a web application, be sure that you enable server-side paging on the datasourcecontrol used. In this case, paging on the grid alone is not enough: this can lead to fetching a lot of data which is then loaded into the grid and paged there. Keep in mind that analyzing queries for paging could lead to the false assumption that paging doesn't occur, e.g. when the query contains a field of type ntext/image/clob/blob and DISTINCT can't be applied while it should have been (e.g. due to a join): the datareader will do DISTINCT filtering on the client. This is a little slower, but it does perform the paging on the data-reader so it won't fetch all rows even if the query suggests it does.

Fetching massive amounts of data because blob/clob/ntext/image fields aren't excluded. LLBLGen Pro supports field exclusion for queries. You can exclude fields (also in prefetch paths) per query to avoid fetching all fields of an entity, e.g. when you don't need them for the logic consuming the resultset. Excluding fields can greatly reduce the amount of time spent on data-transport across the network. Use this optimization if you see that there's a big difference between query execution time on the RDBMS and the time reported by the .NET profiler for the ExecuteReader method call.

Doing client-side aggregates/scalar calculations by consuming a lot of data. 
If possible, try to formulate a scalar query or group by query using the projection system or GetScalar functionality of LLBLGen Pro to do data consumption on the RDBMS server. It's far more efficient to process data on the RDBMS server than to first load it all in memory, then traverse the data in-memory to calculate a value.

Using .ToList() constructs inside linq queries. It might be that you use .ToList() somewhere in a Linq query, which makes the query run partially in-memory. Example: var q = from c in metaData.Customers.ToList() where c.Country=="Norway" select c; This will actually fetch all customers into memory and do the filtering in-memory, as the linq query is defined on an IEnumerable<T>, and not on the IQueryable<T>. Linq is nice, but it can often be a bit unclear where some parts of a Linq query might run.

Fetching all entities to delete into memory first. To delete a set of entities it's rather inefficient to first fetch them all into memory and then delete them one by one. It's more efficient to execute a DELETE FROM ... WHERE query on the database directly to delete the entities in one go. LLBLGen Pro supports this feature, and so do some other O/R mappers. It's not always possible to do this operation in the context of an O/R mapper however: if an O/R mapper relies on a cache, these kinds of operations are likely not supported because they make it impossible to track whether an entity is actually removed from the DB and thus can be removed from the cache.

Fetching all entities to update with an expression into memory first. Similar to the previous point: it is more efficient to update a set of entities directly with a single UPDATE query using an expression instead of fetching the entities into memory first and then updating the entities in a loop, and afterwards saving them. 
It might however be a compromise you don't want to take, as it works around the idea of having an object graph in memory which is manipulated and instead makes the code fully aware there's an RDBMS somewhere.

Conclusion

Performance tuning is almost always about compromises and making choices. It's also about knowing where to look and how the systems in play behave and should behave. The four steps I provided should help you stay focused on the real problem and lead you towards the solution. Knowing how to optimally use the systems participating in your own code (.NET framework, O/R mapper, RDBMS, network/services) is key for success, as is knowing what's going on inside the application you built. I hope you'll find this guide useful in tracking down performance problems and dealing with them in a useful way.
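
To make the SELECT N+1 item from the list of common problems more concrete, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for an O/R mapper and RDBMS. It is not LLBLGen Pro code: the schema, the CountingConnection helper and the function names are assumptions made up for this illustration. It only shows how lazy loading of a related row per order multiplies the number of queries, and how a prefetch-style fetch (one query per node, merged on the client) keeps it at two.

```python
import sqlite3


def build_db():
    """Create a small in-memory database: 5 customers and 20 orders (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, company_name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(i, f"Company {i}") for i in range(1, 6)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, (i % 5) + 1) for i in range(1, 21)],
    )
    conn.commit()
    return conn


class CountingConnection:
    """Tiny wrapper that counts how many queries are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def rows_lazy(db):
    """SELECT N+1: one query for the orders, then one extra query per order."""
    orders = db.query("SELECT id, customer_id FROM orders ORDER BY id")
    result = []
    for order_id, customer_id in orders:
        (name,) = db.query(
            "SELECT company_name FROM customer WHERE id = ?", (customer_id,)
        )[0]
        result.append((order_id, name))
    return result


def rows_prefetch(db):
    """Prefetch-style: one query per node (orders, customers), merged in memory."""
    orders = db.query("SELECT id, customer_id FROM orders ORDER BY id")
    ids = sorted({customer_id for _, customer_id in orders})
    placeholders = ",".join("?" for _ in ids)
    customers = dict(
        db.query(
            f"SELECT id, company_name FROM customer WHERE id IN ({placeholders})",
            ids,
        )
    )
    return [(order_id, customers[customer_id]) for order_id, customer_id in orders]


if __name__ == "__main__":
    conn = build_db()
    lazy, prefetch = CountingConnection(conn), CountingConnection(conn)
    assert rows_lazy(lazy) == rows_prefetch(prefetch)
    print("lazy queries:", lazy.count)
    print("prefetch queries:", prefetch.count)
```

With 20 orders, the lazy variant issues 21 queries and the prefetch-style variant issues 2, while both return the same rows. A profiler would show the lazy variant as a burst of near-identical customer queries, which is the signature described in the list.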

    Read the article

  • Azure, don't give me multiple VMs, give me one elastic VM

    - by FransBouma
    Yesterday, Microsoft revealed new major features for Windows Azure (see ScottGu's post). It all looks shiny and great, but after reading most of the material describing the new features, I still find the overall idea behind all of it flawed: why should I care about how many VMs my web app runs on? Isn't that a problem to solve for the Windows Azure engineers / software? And what if I need the file system, why can't I simply get a virtual filesystem? To illustrate my point, let's use a real example: a product website with a customer system/database and next to it a support site with accompanying database. Both are written in .NET, using ASP.NET, and use a SQL Server database each. The product website offers files to download by customers, very simple. You have a couple of options to host these websites:

    Buy a server, place it in a rack at an ISP and run the sites on that server.
    Use 'shared hosting' with an ISP, which means your sites' appdomains are running on the same machine, as well as the files stored, and the databases are hosted in the same server as the other shared databases.
    Hire a VM, install your OS of choice at an ISP, and host the sites on that VM, basically the same as the first option, except you don't have a physical server.
    At some cloud-vendor, either host the sites 'shared' or in a VM. See above.

    With all of those options, scalability is a problem, even the cloud-based ones, though not due to the same reasons:

    The physical server solution has the obvious problem that if you need more power, you need to buy a bigger server or more servers, which requires you to add replication and other overhead.
    Shared hosting solutions are almost always capped on memory usage / traffic and database size: if your sites get too big, you have to move out of the shared hosting environment and start over with one of the other solutions.
    The VM solution, be it a VM at an ISP or 'in the cloud' at e.g. 
Windows Azure or Amazon, in theory allows scaling out by simply instantiating more VMs, however that too introduces the same overhead problems as with the physical servers: suddenly more than 1 instance runs your sites.

    If a cloud vendor offers its services in the form of VMs, you won't gain much over having a VM at some ISP: the main problems you have to work around are still there: when you spin up more than one VM, your application must be completely stateless at any moment, including the DB sub system, because what's in memory in instance 1 might not be in memory in instance 2. This might sound trivial but it's not. A lot of the websites out there started rather small: they were perfectly runnable on a single machine with normal memory and CPU power. After all, you don't need a big machine to run a website with even thousands of users a day. Moving these sites to a multi-VM environment will cause a problem: all the in-memory state they use, all the multi-page transitions they use while keeping state across the transition, they can't do that anymore like they did on a single machine: state is something of the past, you have to store every byte of state in either a DB or in a viewstate or in a cookie somewhere so that with the next request, all state information is available through the request, as nothing is kept in-memory. Our example uses a bunch of files in a file system. Using multiple VMs will require that these files move to a cloud storage system which is mounted in each VM so we don't have to store the files on each VM. This might require different file paths, but this change should be minor. What's perhaps less minor is the maintenance procedure in place on the new type of cloud storage used: instead of ftp-ing into a VM, you might have to update the files using different ways / tools. 
All in all this makes moving an existing website which was written for an environment that's based around a VM (namely .NET with its CLR) overly cumbersome and problematic: it forces you to refactor your website system to be able to be used 'in the cloud', which is caused by the limited way in which e.g. Windows Azure offers its cloud services: in blocks of VMs.

    Offer a scalable, flexible VM which extends with my needs

    Instead, cloud vendors should offer simply one VM to me. On that VM I run the websites, store my DB and my files. As it's a virtual machine, how this machine is actually run on physical hardware (e.g. partitioned), I don't care, as that's the problem for the cloud vendor to solve. If I need more resources, e.g. I have more traffic to my server, way more visitors per day, the VM stretches, like I bought a bigger box. This frees me from the problem which comes with multiple VMs: I don't have any refactoring to do at all: I can simply build my website as if it runs on my local hardware server, upload it to the VM offered by the cloud vendor, install it on the VM and I'm done. "But that might require changes to windows!" Yes, but Microsoft is Windows. Windows Azure is their service, they can make whatever change to what they offer to make it look like it's windows. Yet, they're stuck, like Amazon, in thinking in VMs, which forces developers to 'think ahead' and gamble whether they would need to migrate to a cloud with multiple VMs in the future or not. Which comes down to: gamble whether they should invest time in code / architecture which they might never need. (YAGNI anyone?) So the VM we're talking about, is that a low-level VM which runs a guest OS, or is that VM a different kind of VM?

    The flexible VM: .NET's CLR?

    My example websites are ASP.NET based, which means they run inside a .NET appdomain, on the .NET CLR, which is a VM. The only physical OS resource the sites need is the file system, however this too is accessed through .NET. 
In short: all the websites see is what .NET allows the websites to see, the world as the websites know it is what .NET shows them and lets them access. How the .NET appdomain is run physically, that's the concern of .NET, not mine. This begs the question why Windows Azure doesn't offer virtual appdomains? Or better: .NET environments which look like one machine but could be physically multiple machines. In such an environment, no change has to be made to the websites to migrate them from a local machine or own server to the cloud to get proper scaling: the .NET VM will simply scale with the need: more memory needed, more CPU power needed, it stretches. What it offers to the application running inside the appdomain is simply increasing, but not fragmented: all resources are available to the application: this means that the problem of how to scale is back to where it should be: with the cloud vendor. "Yeah, great, but what about the databases?" The .NET application communicates with the database server through a .NET ADO.NET provider. Where the database is located is not a problem of the appdomain: the ADO.NET provider has to solve that. I.o.w.: we can host the databases in an environment which offers itself as a single resource and is accessible through one connection string without replication overhead on the outside, and use that environment inside the .NET VM as if it was a single DB. But what about memory replication and other problems? This environment isn't simple, at least not for the cloud vendor. But it is simple for the customer who wants to run his sites in that cloud: no work needed. No refactoring needed of existing code. Upload it, run it. Perhaps I'm dreaming and what I described above isn't possible. 
Yet, I think if cloud vendors don't move in that direction, what they're offering isn't interesting: it doesn't solve a problem at all, it simply offers a way to instantiate more VMs with the guest OS of choice at the cost of me needing to refactor my website code so it can run in the straitjacket form factor dictated by the cloud vendor. Let's not kid ourselves here: most of us developers will never build a website which needs a truckload of VMs to run it: almost all websites created by developers can run on just a few VMs at most. Yet, the most expensive change is right at the start: moving from one to two VMs. As soon as you have refactored your website code to run across multiple VMs, adding another one is just as easy as clicking a mouse button. But that first step, that's the problem here, and as it's right there at the beginning of scaling the website, it's particularly strange that cloud vendors refuse to solve that problem and leave it to the developers to solve. Which makes migrating 'to the cloud' particularly expensive.

    Read the article

  • Oracle Security Inside Out Newsletter – June Edition

    - by Troy Kitch
    This month’s Information In Depth Newsletter, Security Inside Out Edition is now available. In this edition we look at the Gartner Security and Risk Management Summit 2011, discuss safeguarding data from threats with Oracle Database Vault, and reveal the latest database security webcasts, videos, training, events and more. If you don’t have a subscription to this bi-monthly security information update, you can sign up here at the bottom of the page.

    Read the article

  • SQL Server v.Next (Denali) : More on contained databases and "contained users"

    - by AaronBertrand
    One of the reasons for contained databases (see my previous post ) is to allow for a more seamless transition when moving a database from one server to another. One of the biggest complications in doing so is making sure that all of the logins are in place on the new server. Contained databases help solve this issue by creating a new type of user: a database-level user with a password. I want to stress that this is not the same concept as a user without a login , which serves a completely different...(read more)

    Read the article

  • Today’s Performance Tip: Views are for Convenience, Not Performance!

    - by Jonathan Kehayias
    I tweeted this last week on twitter and got a lot of retweets so I thought that I’d blog the story behind the tweet. Most vendor databases have views in them, and when people want to retrieve data from a database, it seems like the most common first stop they make are the vendor supplied Views.  This post is in no way a bash against the usage or creation of Views in a SQL Server Database, I have created them before to simplify code and compartmentalize commonly required queries so that there...(read more)

    Read the article

  • Is it wise to store a big lump of json on a database row

    - by Ieyasu Sawada
    I have this project which stores product details from amazon into the database. Just to give you an idea on how big it is: [{"title":"Genetic Engineering (Opposing Viewpoints)","short_title":"Genetic Engineering ...","brand":"","condition":"","sales_rank":"7171426","binding":"Book","item_detail_url":"http://localhost/wordpress/product/?asin=0737705124","node_list":"Books > Science & Math > Biological Sciences > Biotechnology","node_category":"Books","subcat":"","model_number":"","item_url":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/ecom_redirector.php?id=128","details_url":"http://localhost/wordpress/product/?asin=0737705124","large_image":"http://localhost/wordpress/wp-content/plugins/ecom/img/large-notfound.png","medium_image":"http://localhost/wordpress/wp-content/plugins/ecom/img/medium-notfound.png","small_image":"http://localhost/wordpress/wp-content/plugins/ecom/img/small-notfound.png","thumbnail_image":"http://localhost/wordpress/wp-content/plugins/ecom/img/thumbnail-notfound.png","tiny_img":"http://localhost/wordpress/wp-content/plugins/ecom/img/tiny-notfound.png","swatch_img":"http://localhost/wordpress/wp-content/plugins/ecom/img/swatch-notfound.png","total_images":"6","amount":"33.70","currency":"$","long_currency":"USD","price":"$33.70","price_type":"List Price","show_price_type":"0","stars_url":"","product_review":"","rating":"","yellow_star_class":"","white_star_class":"","rating_text":" of 5","reviews_url":"","review_label":"","reviews_label":"Read all ","review_count":"","create_review_url":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/ecom_redirector.php?id=132","create_review_label":"Write a review","buy_url":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/ecom_redirector.php?id=19186","add_to_cart_action":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/add_to_cart.php","asin":"0737705124","status":"Only 7 left in 
stock.","snippet_condition":"in_stock","status_class":"ninstck","customer_images":["http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/51M2vvFvs2BL.jpg","http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/31FIM-YIUrL.jpg","http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/51M2vvFvs2BL.jpg","http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/51M2vvFvs2BL.jpg"],"disclaimer":"","item_attributes":[{"attr":"Author","value":"Greenhaven Press"},{"attr":"Binding","value":"Hardcover"},{"attr":"EAN","value":"9780737705126"},{"attr":"Edition","value":"1"},{"attr":"ISBN","value":"0737705124"},{"attr":"Label","value":"Greenhaven Press"},{"attr":"Manufacturer","value":"Greenhaven Press"},{"attr":"NumberOfItems","value":"1"},{"attr":"NumberOfPages","value":"224"},{"attr":"ProductGroup","value":"Book"},{"attr":"ProductTypeName","value":"ABIS_BOOK"},{"attr":"PublicationDate","value":"2000-06"},{"attr":"Publisher","value":"Greenhaven Press"},{"attr":"SKU","value":"G0737705124I2N00"},{"attr":"Studio","value":"Greenhaven Press"},{"attr":"Title","value":"Genetic Engineering (Opposing Viewpoints)"}],"customer_review_url":"http://localhost/wordpress/wp-content/ecom-customer-reviews/0737705124.html","flickr_results":["http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/5105560852_06c7d06f14_m.jpg"],"freebase_text":"No around the web data available yet","freebase_image":"http://localhost/wordpress/wp-content/plugins/ecom/img/freebase-notfound.jpg","ebay_related_items":[{"title":"Genetic Engineering (Introducing Issues With Opposing Viewpoints), , Good Book","image":"http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/140.jpg","url":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/ecom_redirector.php?id=12165","currency_id":"$","current_price":"26.2"},{"title":"Genetic Engineering Opposing Viewpoints by DAVID BENDER - 1964 
Hardcover","image":"http://localhost/wordpress/wp-content/uploads/2013/10/ecom_images/140.jpg","url":"http://localhost/wordpress/wp-content/ecom-plugin-redirects/ecom_redirector.php?id=130","currency_id":"AUD","current_price":"11.99"}],"no_follow":"rel=\"nofollow\"","new_tab":"target=\"_blank\"","related_products":[],"super_saver_shipping":"","shipping_availability":"","total_offers":"7","added_to_cart":""}]

    So the structure for the table is:

    asin
    title
    details (the product details in json)

    Will the performance suffer if I have to store like 10,000 products? Is there any other way of doing this? I'm thinking of the following, but the current setup is really the most convenient one since I also have to use the data on the client side:

    store the product details in a file. So something like ASIN123.json
    store the product details in one big file. (I'm guessing it will be a drag to extract data from this file)
    store each of the fields in the details in its own table field

    Thanks in advance!
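
    For reference, the current setup described above (one row per product with asin, title and the details as a JSON string) could be sketched roughly as follows. This is only an illustration under assumptions: the question doesn't say which database is used, so SQLite via Python's standard sqlite3 module stands in for it, and the table and function names are made up.

```python
import json
import sqlite3


def make_store():
    """Create the three-column table from the question (SQLite is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (asin TEXT PRIMARY KEY, title TEXT, details TEXT)"
    )
    return conn


def save_product(conn, product):
    """Store the whole product dict as one JSON string next to its asin and title."""
    conn.execute(
        "INSERT OR REPLACE INTO products (asin, title, details) VALUES (?, ?, ?)",
        (product["asin"], product["title"], json.dumps(product)),
    )


def load_product(conn, asin):
    """Look the row up by its primary key and decode the JSON, or return None."""
    row = conn.execute(
        "SELECT details FROM products WHERE asin = ?", (asin,)
    ).fetchone()
    return json.loads(row[0]) if row else None


if __name__ == "__main__":
    conn = make_store()
    save_product(
        conn,
        {
            "asin": "0737705124",
            "title": "Genetic Engineering (Opposing Viewpoints)",
            "price": "$33.70",
        },
    )
    print(load_product(conn, "0737705124")["price"])
```

    Lookups here go through the asin primary key, so the JSON itself is only decoded for the rows that are actually requested; anything that has to be filtered or sorted on would still need its own column.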

    Read the article

  • Is inline SQL still classed as bad practice now that we have Micro ORMs?

    - by Grofit
    This is a bit of an open-ended question, but I wanted some opinions. I grew up in a world where inline SQL scripts were the norm; then we were all made very aware of SQL injection issues and of how fragile SQL was when string manipulation was done all over the place. Then came the dawn of the ORM, where you explained the query to the ORM and let it generate its own SQL, which in a lot of cases was not optimal but was safe and easy. Another good thing about ORMs or database abstraction layers was that the SQL was generated with the database engine in mind, so I could use Hibernate/NHibernate with MSSQL or MySQL and my code never changed; it was just a configuration detail. Now fast forward to the current day, where Micro ORMs seem to be winning over more developers, and I was wondering why we have seemingly taken a U-turn on the whole inline SQL subject. I must admit I do like the idea of no ORM config files and being able to write my query in a more optimal manner, but it feels like I am opening myself back up to the old vulnerabilities such as SQL injection. I am also tying myself to one database engine, so if I want my software to support multiple database engines I would need to do some more string hackery, which then starts to make the code unreadable and more fragile. (Before someone mentions it: I know you can use parameter-based arguments with most Micro ORMs, which offers protection from SQL injection in most cases.) So what are people's opinions on this sort of thing? I am using Dapper as my Micro ORM and NHibernate as my regular ORM in this scenario; however, most in each field are quite similar. What I term inline SQL is SQL strings within source code. 
There used to be design debates over SQL strings in source code detracting from the fundamental intent of the logic, which is why statically typed LINQ-style queries became so popular: it's still just one language, whereas with, let's say, C# and SQL on one page you now have two languages intermingled in your raw source code. Just to clarify, SQL injection is just one of the known issues with using SQL strings, and I already mentioned that you can stop it with parameter-based queries. However, I want to highlight other issues with having SQL queries ingrained in your source code, such as the lack of DB vendor abstraction and the loss of any compile-time error checking on string-based queries. These are all issues that we managed to sidestep with the dawn of ORMs and their higher-level querying functionality, such as HQL or LINQ (not all of the issues, but most of them). So I am less focused on the individual issues and more on the bigger picture: is it now becoming more acceptable to have SQL strings directly in your source code again, as most Micro ORMs use this mechanism? Here is a similar question which has a few different viewpoints, although it is more about inline SQL without the Micro ORM context: http://stackoverflow.com/questions/5303746/is-inline-sql-hard-coding
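The difference between string-built SQL and the parameter-based queries mentioned above can be illustrated with a small sketch. It is written in Python with the standard-library `sqlite3` module as a stand-in, not in C# with Dapper; the `users` table and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_users_concatenated(name):
    # Fragile: the input becomes part of the SQL text itself.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_users_parameterized(name):
    # The SQL text is fixed; the driver passes the value separately.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


malicious = "x' OR '1'='1"
leaked = find_users_concatenated(malicious)   # the WHERE clause is rewritten
safe = find_users_parameterized(malicious)    # treated as a literal name
```

With the concatenated version the crafted input matches every row, while the parameterized version matches none. The SQL is still an inline string in both cases, so the concerns about vendor abstraction and compile-time checking raised in the question are not addressed by parameters alone.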

    Read the article

  • What is the right way to process inconsistent data files?

    - by Tahabi
    I'm working at a company that uses Excel files to store product data, specifically test results from products before they are shipped out. There are a few thousand spreadsheets with anywhere from 50 to 100 relevant data points per file. Over the years, the schema for the spreadsheets has changed significantly, but not unidirectionally: changes often get reverted and then re-added in the space of a few dozen to a few hundred files. My project is to convert about 8000 of these spreadsheets into a database that can be queried. I'm using MongoDB, to deal with the inconsistency in the data, and Python. My question is, what is the "right" or canonical way to deal with the huge variance in my source files? I've written a data structure which stores the data I want for the latest template, which will be the final template used going forward, but that only helps for a few hundred files historically. Brute-forcing a solution would mean writing similar data structures for each version/template, which means potentially writing hundreds of schemas with dozens of fields each. This seems very inefficient, especially when sometimes a change in the template is as little as moving a single line of data one row down or splitting what used to be one data field into two data fields. A slightly more elegant solution I have in mind would be writing schemas for all the variants I can find for pre-defined groups in the source files, and then writing a function to match a particular series of files with the series of variants that matches that set of files. This is because, more often than not, most of the file will remain consistent over a long period, only marred by one or two errant sections; but within that period, which section is inconsistent is itself inconsistent. For example, say a file has four sections with three data fields each, which is represented by four Python dictionaries with three keys each. 
For files 7000-7250, sections 1-3 will be consistent, but section 4 will be shifted one row down. For files 7251-7500, 1-3 are consistent, section 4 is one row down, but a section five appears. For files 7501-7635, sections 1 and 3 will be consistent, but section 2 will have five data fields instead of three, section five disappears, and section 4 is still shifted down one row. For files 7636-7800, section 1 is consistent, section 4 gets shifted back up, section 2 returns to three cells, but section 3 is removed entirely. Files 7800-8000 have everything in order. The proposed function would take the file number and match it to a dictionary representing the data mappings for different variants of each section. For example, a section_four_variants dictionary might have two members, one for the shifted-down version, and one for the normal version, a section_two_variants might have three and five field members, etc. The script would then read the matchings, load the correct mapping, extract the data, and insert it into the database. Is this an accepted/right way to go about solving this problem? Should I structure things differently? I don't know what to search Google for either to see what other solutions might be, though I believe the problem lies in the domain of ETL processing. I also have no formal CS training aside from what I've taught myself over the years. If this is not the right forum for this question, please tell me where to move it, if at all. Any help is most appreciated. Thank you.
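The proposed matching function could look roughly like the sketch below. Everything in it is an assumption made for illustration: the field names, the (row, column) coordinates, the `VARIANTS` and `FILE_RANGES` tables, and the plain dictionary used in place of a real spreadsheet reader. Only sections 2 and 4 are modelled, and file 7800, which appears in two of the ranges above, is simply placed in the last range.

```python
# Per-section variants: field name -> (row, column) of the cell to read.
VARIANTS = {
    "section_2": {
        "three_fields": {"lot": (4, 1), "operator": (5, 1), "date": (6, 1)},
        "five_fields": {
            "lot": (4, 1), "operator": (5, 1), "date": (6, 1),
            "shift": (7, 1), "station": (8, 1),
        },
    },
    "section_4": {
        "normal": {"voltage": (20, 1), "current": (21, 1), "result": (22, 1)},
        "shifted_down": {"voltage": (21, 1), "current": (22, 1), "result": (23, 1)},
    },
}

# Inclusive file-number ranges and the variant each section uses there.
FILE_RANGES = [
    (7000, 7500, {"section_2": "three_fields", "section_4": "shifted_down"}),
    (7501, 7635, {"section_2": "five_fields", "section_4": "shifted_down"}),
    (7636, 8000, {"section_2": "three_fields", "section_4": "normal"}),
]


def layout_for(file_number):
    """Return {section: cell mapping} for the range containing file_number."""
    for first, last, choice in FILE_RANGES:
        if first <= file_number <= last:
            return {sec: VARIANTS[sec][name] for sec, name in choice.items()}
    raise LookupError(f"no layout registered for file {file_number}")


def extract(cells, layout):
    """Read every mapped cell; cells is a {(row, column): value} dictionary."""
    return {
        section: {field: cells.get(pos) for field, pos in mapping.items()}
        for section, mapping in layout.items()
    }


# A made-up sheet from the 7000-7500 range, where section 4 is one row down.
cells = {
    (4, 1): "L-42", (5, 1): "op7", (6, 1): "2013-10-01",
    (21, 1): 5.0, (22, 1): 0.2, (23, 1): "PASS",
}
record = extract(cells, layout_for(7100))
```

The resulting `record` is a nested dictionary that could be handed to a document store as is. Keeping the range table as data rather than code means that a newly discovered template change becomes one more row in `FILE_RANGES`, or one more entry in `VARIANTS`, instead of a new schema.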

    Read the article

  • SQL SERVER – Storing 64-bit Unsigned Integer Value in Database

    - by Pinal Dave
    Here is a very interesting question I received in an email just the other day. Some questions are so good that they make me wonder how come I have not faced them first hand. Anyway, here is the question: “Pinal, I am migrating my database from MySQL to SQL Server and I have faced a unique situation. I have been using an unsigned 64-bit integer in MySQL, but when I try to migrate that column to SQL Server, I am facing an issue, as there is no datatype which I find appropriate for my column. It is now too late to change the datatype and I need an immediate solution. One chain of thought was to change the data type of the column from unsigned 64-bit (BIGINT) to VARCHAR(n), but that will just change the data type such that I will face quite a lot of performance-related issues in the future. In SQL Server we also have the BIGINT data type, but that is a signed 64-bit datatype. The BIGINT datatype in SQL Server has a range of -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807). However, my number is much larger than this. Is there any way I can store my big 64-bit unsigned integer without losing much performance by converting it to VARCHAR?” A very interesting question. For the sake of argument, we could tell the user that there should be no need for such a big number, or, if we are talking about an identity column, I really doubt that the table will grow beyond this range. The real question which I found interesting here was how to store a 64-bit unsigned integer value in SQL Server without converting it to a string data type. After thinking a bit, I found a fairly simple answer: I can use the NUMERIC data type. I can use NUMERIC(20) for a 64-bit unsigned integer value, NUMERIC(10) for a 32-bit unsigned integer value and NUMERIC(5) for a 16-bit unsigned integer value. The NUMERIC datatype supports a maximum precision of 38. Now here is another thing to keep in mind. 
Using the NUMERIC datatype will indeed accept the 64-bit unsigned integer, but if in the future you try to enter a negative value, it will allow that as well. Hence, you will need to put an additional constraint on the column so that it only accepts non-negative integers. Here is another big concern: SQL Server will store the number as NUMERIC and will treat it as a positive integer for all practical purposes. You will have to write application logic to interpret it as a 64-bit unsigned integer. On the other hand, if you are using unsigned integers in your application, there is a good chance that you already have logic taking care of this. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: SQL Datatype
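The precisions quoted above follow from the number of decimal digits in the largest unsigned value of each width. The short Python sketch below checks that arithmetic; `fits_unsigned_64` is a hypothetical helper showing the kind of range check the application or a column constraint would need, not part of any SQL Server API.

```python
# Largest unsigned value for each integer width, and its decimal digit count.
UNSIGNED_MAX = {bits: 2**bits - 1 for bits in (16, 32, 64)}
DIGITS_NEEDED = {bits: len(str(value)) for bits, value in UNSIGNED_MAX.items()}

# Upper bound of a signed 64-bit BIGINT, for comparison.
SIGNED_BIGINT_MAX = 2**63 - 1


def fits_unsigned_64(value):
    """True if value lies in the unsigned 64-bit range, bounds included."""
    return 0 <= value <= UNSIGNED_MAX[64]


# A 20-digit NUMERIC column can hold values far beyond 2**64 - 1.
LARGEST_20_DIGIT = 10**20 - 1
```

Here `DIGITS_NEEDED` comes out as 5, 10 and 20 digits for 16, 32 and 64 bits, matching NUMERIC(5), NUMERIC(10) and NUMERIC(20). Since a 20-digit column also accepts values above 2^64-1, a constraint that is meant to mirror the unsigned type exactly would need an upper bound as well as a lower one.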

    Read the article
