Search Results

Search found 1042 results on 42 pages for 'relational calculus'.

Page 20/42 | < Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27  | Next Page >

  • Is OOP based on any branch of mathematics?

    - by ektrules
    I know relational databases are based on set-theory, functional programming is based on lambda calculus, logic programming is based on logic (of course :)), and now that I think of it; I'm not sure if imperative and generic programming is based on any particular branch of mathematics either.

    Read the article

  • Remote debug a linux app from XCode

    - by erick2red
    hey guys: it's simple i had this linux pc which i connect remote and i have this application that i run there and i want to debug it, but i just don't know how. It's the simpler c++ app on the world, load some libraries, do some calculus, print some output and return, It's just that, i just haven't any clue. So any help would be appreciated. Thxs anyway

    Read the article

  • NoSQL with MongoDB, NoRM and ASP.NET MVC

    - by shiju
     In this post, I will give an introduction to how to work on NoSQL and document database with MongoDB , NoRM and ASP.Net MVC 2. NoSQL and Document Database The NoSQL movement is getting big attention in this year and people are widely talking about document databases and NoSQL along with web application scalability. According to Wikipedia, "NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage". Document databases are schema free so that you can focus on the problem domain and don't have to worry about updating the schema when your domain is evolving. This enables truly a domain driven development. One key pain point of relational database is the synchronization of database schema with your domain entities when your domain is evolving.There are lots of NoSQL implementations are available and both CouchDB and MongoDB got my attention. While evaluating both CouchDB and MongoDB, I found that CouchDB can’t perform dynamic queries and later I picked MongoDB over CouchDB. There are many .Net drivers available for MongoDB document database. MongoDB MongoDB is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language. It has been developed since October 2007 by 10gen. MongoDB stores your data as binary JSON (BSON) format . MongoDB has been getting a lot of attention and you can see the some of the list of production deployments from here - http://www.mongodb.org/display/DOCS/Production+Deployments NoRM – C# driver for MongoDB NoRM is a C# driver for MongoDB with LINQ support. NoRM project is available on Github at http://github.com/atheken/NoRM. Demo with ASP.NET MVC I will show a simple demo with MongoDB, NoRM and ASP.NET MVC. To work with MongoDB and  NoRM, do the following steps Download the MongoDB databse For Windows 32 bit, download from http://downloads.mongodb.org/win32/mongodb-win32-i386-1.4.1.zip  and for Windows 64 bit, download  from http://downloads.mongodb.org/win32/mongodb-win32-x86_64-1.4.1.zip . The zip contains the mongod.exe for run the server and mongo.exe for the client Download the NorM driver for MongoDB at http://github.com/atheken/NoRM Create a directory call C:\data\db. This is the default location of MongoDB database. You can override the behavior. Run C:\Mongo\bin\mongod.exe. This will start the MongoDb server Now I am going to demonstrate how to program with MongoDb and NoRM in an ASP.NET MVC application.Let’s write a domain class public class Category {            [MongoIdentifier]public ObjectId Id { get; set; } [Required(ErrorMessage = "Name Required")][StringLength(25, ErrorMessage = "Must be less than 25 characters")]public string Name { get; set;}public string Description { get; set; }}  ObjectId is a NoRM type that represents a MongoDB ObjectId. NoRM will automatically update the Id becasue it is decorated by the MongoIdentifier attribute. The next step is to create a mongosession class. This will do the all interactions to the MongoDB. internal class MongoSession<TEntity> : IDisposable{    private readonly MongoQueryProvider provider;     public MongoSession()    {        this.provider = new MongoQueryProvider("Expense");    }     public IQueryable<TEntity> Queryable    {        get { return new MongoQuery<TEntity>(this.provider); }    }     public MongoQueryProvider Provider    {        get { return this.provider; }    }     public void Add<T>(T item) where T : class, new()    {        this.provider.DB.GetCollection<T>().Insert(item);    }     public void Dispose()    {        this.provider.Server.Dispose();     }    public void Delete<T>(T item) where T : class, new()    {        this.provider.DB.GetCollection<T>().Delete(item);    }     public void Drop<T>()    {        this.provider.DB.DropCollection(typeof(T).Name);    }     public void Save<T>(T item) where T : class,new()    {        this.provider.DB.GetCollection<T>().Save(item);                }  }    The MongoSession constrcutor will create an instance of MongoQueryProvider that supports the LINQ expression and also create a database with name "Expense". If database is exists, it will use existing database, otherwise it will create a new databse with name  "Expense". The Save method can be used for both Insert and Update operations. If the object is new one, it will create a new record and otherwise it will update the document with given ObjectId.  Let’s create ASP.NET MVC controller actions for CRUD operations for the domain class Category public class CategoryController : Controller{ //Index - Get the category listpublic ActionResult Index(){    using (var session = new MongoSession<Category>())    {        var categories = session.Queryable.AsEnumerable<Category>();        return View(categories);    }} //edit a single category[HttpGet]public ActionResult Edit(ObjectId id) {     using (var session = new MongoSession<Category>())    {        var category = session.Queryable              .Where(c => c.Id == id)              .FirstOrDefault();         return View("Save",category);    } }// GET: /Category/Create[HttpGet]public ActionResult Create(){    var category = new Category();    return View("Save", category);}//insert or update a category[HttpPost]public ActionResult Save(Category category){    if (!ModelState.IsValid)    {        return View("Save", category);    }    using (var session = new MongoSession<Category>())    {        session.Save(category);        return RedirectToAction("Index");    } }//Delete category[HttpPost]public ActionResult Delete(ObjectId Id){    using (var session = new MongoSession<Category>())    {        var category = session.Queryable              .Where(c => c.Id == Id)              .FirstOrDefault();        session.Delete(category);        var categories = session.Queryable.AsEnumerable<Category>();        return PartialView("CategoryList", categories);    } }        }  You can easily work on MongoDB with NoRM and can use with ASP.NET MVC applications. I have created a repository on CodePlex at http://mongomvc.codeplex.com and you can download the source code of the ASP.NET MVC application from here

    Read the article

  • Of transactions and Mongo

    - by Nuri Halperin
    Originally posted on: http://geekswithblogs.net/nuri/archive/2014/05/20/of-transactions-and-mongo-again.aspxWhat's the first thing you hear about NoSQL databases? That they lose your data? That there's no transactions? No joins? No hope for "real" applications? Well, you *should* be wondering whether a certain of database is the right one for your job. But if you do so, you should be wondering that about "traditional" databases as well! In the spirit of exploration let's take a look at a common challenge: You are a bank. You have customers with accounts. Customer A wants to pay B. You want to allow that only if A can cover the amount being transferred. Let's looks at the problem without any context of any database engine in mind. What would you do? How would you ensure that the amount transfer is done "properly"? Would you prevent a "transaction" from taking place unless A can cover the amount? There are several options: Prevent any change to A's account while the transfer is taking place. That boils down to locking. Apply the change, and allow A's balance to go below zero. Charge person A some interest on the negative balance. Not friendly, but certainly a choice. Don't do either. Options 1 and 2 are difficult to attain in the NoSQL world. Mongo won't save you headaches here either. Option 3 looks a bit harsh. But here's where this can go: ledger. See, and account doesn't need to be represented by a single row in a table of all accounts with only the current balance on it. More often than not, accounting systems use ledgers. And entries in ledgers - as it turns out – don't actually get updated. Once a ledger entry is written, it is not removed or altered. A transaction is represented by an entry in the ledger stating and amount withdrawn from A's account and an entry in the ledger stating an addition of said amount to B's account. For sake of space-saving, that entry in the ledger can happen using one entry. Think {Timestamp, FromAccountId, ToAccountId, Amount}. The implication of the original question – "how do you enforce non-negative balance rule" then boils down to: Insert entry in ledger Run validation of recent entries Insert reverse entry to roll back transaction if validation failed. What is validation? Sum up the transactions that A's account has (all deposits and debits), and ensure the balance is positive. For sake of efficiency, one can roll up transactions and "close the book" on transactions with a pseudo entry stating balance as of midnight or something. This lets you avoid doing math on the fly on too many transactions. You simply run from the latest "approved balance" marker to date. But that's an optimization, and premature optimizations are the root of (some? most?) evil.. Back to some nagging questions though: "But mongo is only eventually consistent!" Well, yes, kind of. It's not actually true that Mongo has not transactions. It would be more descriptive to say that Mongo's transaction scope is a single document in a single collection. A write to a Mongo document happens completely or not at all. So although it is true that you can't update more than one documents "at the same time" under a "transaction" umbrella as an atomic update, it is NOT true that there' is no isolation. So a competition between two concurrent updates is completely coherent and the writes will be serialized. They will not scribble on the same document at the same time. In our case - in choosing a ledger approach - we're not even trying to "update" a document, we're simply adding a document to a collection. So there goes the "no transaction" issue. Now let's turn our attention to consistency. What you should know about mongo is that at any given moment, only on member of a replica set is writable. This means that the writable instance in a set of replicated instances always has "the truth". There could be a replication lag such that a reader going to one of the replicas still sees "old" state of a collection or document. But in our ledger case, things fall nicely into place: Run your validation against the writable instance. It is guaranteed to have a ledger either with (after) or without (before) the ledger entry got written. No funky states. Again, the ledger writing *adds* a document, so there's no inconsistent document state to be had either way. Next, we might worry about data loss. Here, mongo offers several write-concerns. Write-concern in Mongo is a mode that marshals how uptight you want the db engine to be about actually persisting a document write to disk before it reports to the application that it is "done". The most volatile, is to say you don't care. In that case, mongo would just accept your write command and say back "thanks" with no guarantee of persistence. If the server loses power at the wrong moment, it may have said "ok" but actually no written the data to disk. That's kind of bad. Don't do that with data you care about. It may be good for votes on a pole regarding how cute a furry animal is, but not so good for business. There are several other write-concerns varying from flushing the write to the disk of the writable instance, flushing to disk on several members of the replica set, a majority of the replica set or all of the members of a replica set. The former choice is the quickest, as no network coordination is required besides the main writable instance. The others impose extra network and time cost. Depending on your tolerance for latency and read-lag, you will face a choice of what works for you. It's really important to understand that no data loss occurs once a document is flushed to an instance. The record is on disk at that point. From that point on, backup strategies and disaster recovery are your worry, not loss of power to the writable machine. This scenario is not different from a relational database at that point. Where does this leave us? Oh, yes. Eventual consistency. By now, we ensured that the "source of truth" instance has the correct data, persisted and coherent. But because of lag, the app may have gone to the writable instance, performed the update and then gone to a replica and looked at the ledger there before the transaction replicated. Here are 2 options to deal with this. Similar to write concerns, mongo support read preferences. An app may choose to read only from the writable instance. This is not an awesome choice to make for every ready, because it just burdens the one instance, and doesn't make use of the other read-only servers. But this choice can be made on a query by query basis. So for the app that our person A is using, we can have person A issue the transfer command to B, and then if that same app is going to immediately as "are we there yet?" we'll query that same writable instance. But B and anyone else in the world can just chill and read from the read-only instance. They have no basis to expect that the ledger has just been written to. So as far as they know, the transaction hasn't happened until they see it appear later. We can further relax the demand by creating application UI that reacts to a write command with "thank you, we will post it shortly" instead of "thank you, we just did everything and here's the new balance". This is a very powerful thing. UI design for highly scalable systems can't insist that the all databases be locked just to paint an "all done" on screen. People understand. They were trained by many online businesses already that your placing of an order does not mean that your product is already outside your door waiting (yes, I know, large retailers are working on it... but were' not there yet). The second thing we can do, is add some artificial delay to a transaction's visibility on the ledger. The way that works is simply adding some logic such that the query against the ledger never nets a transaction for customers newer than say 15 minutes and who's validation flag is not set. This buys us time 2 ways: Replication can catch up to all instances by then, and validation rules can run and determine if this transaction should be "negated" with a compensating transaction. In case we do need to "roll back" the transaction, the backend system can place the timestamp of the compensating transaction at the exact same time or 1ms after the original one. Effectively, once A or B visits their ledger, both transactions would be visible and the overall balance "as of now" would reflect no change.  The 2 transactions (attempted/ reverted) would be visible , since we do actually account for the attempt. Hold on a second. There's a hole in the story: what if several transfers from A to some accounts are registered, and 2 independent validators attempt to compute the balance concurrently? Is there a chance that both would conclude non-sufficient-funds even though rolling back transaction 100 would free up enough for transaction 117 (some random later transaction)? Yes. there is that chance. But the integrity of the business rule is not compromised, since the prime rule is don't dispense money you don't have. To minimize or eliminate this scenario, we can also assign a single validation process per origin account. This may seem non-scalable, but it can easily be done as a "sharded" distribution. Say we have 11 validation threads (or processing nodes etc.). We divide the account number space such that each validator is exclusively responsible for a certain range of account numbers. Sounds cunningly similar to Mongo's sharding strategy, doesn't it? Each validator then works in isolation. More capacity needed? Chop the account space into more chunks. So where  are we now with the nagging questions? "No joins": Huh? What are those for? "No transactions": You mean no cross-collection and no cross-document transactions? Granted - but don't always need them either. "No hope for real applications": well... There are more issues and edge cases to slog through, I'm sure. But hopefully this gives you some ideas of how to solve common problems without distributed locking and relational databases. But then again, you can choose relational databases if they suit your problem.

    Read the article

  • OWB 11gR2 &ndash; OLAP and Simba

    - by David Allan
    Oracle Warehouse Builder was the first ETL product to provide a single integrated and complete environment for managing enterprise data warehouse solutions that also incorporate multi-dimensional schemas. The OWB 11gR2 release provides Oracle OLAP 11g deployment for multi-dimensional models (in addition to support for prior releases of OLAP). This means users can easily utilize Simba's MDX Provider for Oracle OLAP (see here for details and cost) which allows you to use the powerful and popular ad hoc query and analysis capabilities of Microsoft Excel PivotTables® and PivotCharts® with your Oracle OLAP business intelligence data. The extensions to the dimensional modeling capabilities have been built on established relational concepts, with the option to seamlessly move from a relational deployment model to a multi-dimensional model at the click of a button. This now means that ETL designers can logically model a complete data warehouse solution using one single tool and control the physical implementation of a logical model at deployment time. As a result data warehouse projects that need to provide a multi-dimensional model as part of the overall solution can be designed and implemented faster and more efficiently. Wizards for dimensions and cubes let you quickly build dimensional models and realize either relationally or as an Oracle database OLAP implementation, both 10g and 11g formats are supported based on a configuration option. The wizard provides a good first cut definition and the objects can be further refined in the editor. Both wizards let you choose the implementation, to deploy to OLAP in the database select MOLAP: multidimensional storage. You will then be asked what levels and attributes are to be defined, by default the wizard creates a level bases hierarchy, parent child hierarchies can be defined in the editor. Once the dimension or cube has been designed there are special mapping operators that make it easy to load data into the objects, below we load a constant value for the total level and the other levels from a source table.   Again when the cube is defined using the wizard we can edit the cube and define a number of analytic calculations by using the 'generate calculated measures' option on the measures panel. This lets you very easily add a lot of rich analytic measures to your cube. For example one of the measures is the percentage difference from a year ago which we can see in detail below. You can also add your own custom calculations to leverage the capabilities of the Oracle OLAP option, either by selecting existing template types such as moving averages to defining true custom expressions. The 11g OLAP option now supports percentage based summarization (the amount of data to precompute and store), this is available from the option 'cost based aggregation' in the cube's configuration. Ensure all measure-dimensions level based aggregation is switched off (on the cube-dimension panel) - previously level based aggregation was the only option. The 11g generated code now uses the new unified API as you see below, to generate the code, OWB needs a valid connection to a real schema, this was not needed before 11gR2 and is a new requirement since the OLAP API which OWB uses is not an offline one. Once all of the objects are deployed and the maps executed then we get to the fun stuff! How can we analyze the data? One option which is powerful and at many users' fingertips is using Microsoft Excel PivotTables® and PivotCharts®, which can be used with your Oracle OLAP business intelligence data by utilizing Simba's MDX Provider for Oracle OLAP (see Simba site for details of cost). I'll leave the exotic reporting illustrations to the experts (see Bud's demonstration here), but with Simba's MDX Provider for Oracle OLAP its very simple to easily access the analytics stored in the database (all built and loaded via the OWB 11gR2 release) and get the regular features of Excel at your fingertips such as using the conditional formatting features for example. That's a very quick run through of the OWB 11gR2 with respect to Oracle 11g OLAP integration and the reporting using Simba's MDX Provider for Oracle OLAP. Not a deep-dive in any way but a quick overview to illustrate the design capabilities and integrations possible.

    Read the article

  • SQL SERVER – Weekend Project – Experimenting with ACID Transactions, SQL Compliant, Elastically Scalable Database

    - by pinaldave
    Database technology is huge and big world. I like to explore always beyond what I know and share the learning. Weekend is the best time when I sit around download random software on my machine which I like to call as a lab machine (it is a pretty old laptop, hardly a quality as lab machine) and experiment it. There are so many free betas available for download that it’s hard to keep track and even harder to find the time to play with very many of them.  This blog is about one you shouldn’t miss if you are interested in the learning various relational databases. NuoDB just released their Beta 7.  I had already downloaded their Beta 6 and yesterday did the same for 7.   My impression is that they are onto something very very interesting.  In fact, it might be something really promising in terms of database elasticity, scale and operational cost reduction. The folks at NuoDB say they are working on the world’s first “emergent” database which they tout as a brand new transitional database that is intended to dramatically change what’s possible with OLTP.  It is SQL compliant, guarantees ACID transactions, yet scales elastically on heterogeneous and decentralized cloud-based resources. Interesting note for sure, making me explore more. Based on what I’ve seen so far, they are solving the architectural challenge that exists between elastic, cloud-based compute infrastructures designed to scale out in response to workload requirements versus the traditional relational database management system’s architecture of central control. Here’s my experience with the NuoDB Beta 6 so far: First they pretty much threw away all the features you’d associate with existing RDBMS architectures except the SQL and ACID transactions which they were smart to keep.  It looks like they have incorporated a number of the big ideas from various algorithms, systems and techniques to achieve maximum DB scalability. From a user’s perspective, the NuoDB Beta software behaves like any other traditional SQL database and seems to offer all the benefits users have come to expect from standards-based SQL solutions. One of the interesting feature is that one can run a transactional node and a storage node on my Windows laptop as well on other platforms – indeed interesting for sure. It’s quite amazing to see a database elastically scale across machine boundaries. So, one of the basic NuoDB concepts is that as you need to scale out, you can easily use more inexpensive hardware when/where you need it.  This is unlike what we have traditionally done to scale a database for an application – we replace the hardware with something more powerful (faster CPU and Disks). This is where I started to feel like NuoDB is on to something that has the potential to elastically scale on commodity hardware while reducing operational expense for a big OLTP database to a degree we’ve never seen before. NuoDB is able to fully leverage the cloud in an asynchronous and highly decentralized manner – while providing both SQL compliance and ACID transactions. Basically what NuoDB is doing is so new that it is all hard to believe until you’ve experienced it in action.  I will keep you up to date as I test the NuoDB Beta 7 but if you are developing a web-scale application or have an on-premise app you are thinking of moving to the cloud, testing this beta is worth your time. If you do try it, let me know what you think.  Before I say anything more, I am going to do more experiments and more test on this product and compare it with other existing similar products. For me it was a weekend worth spent on learning something new. I encourage you to download Beta 7 version and share your opinions here. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Documentation, SQL Download, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Silverlight Cream for March 09, 2011 -- #1057

    - by Dave Campbell
    In this Issue: Dennis Doomen, Peter Kuhn, Michael Crump, Joe McBride, Martin Krüger, Jeremy Likness, Manas Patnaik, Jesse Liberty(-2-), WindowsPhoneGeek(-2-). Above the Fold: Silverlight: "A highlighting AutoCompleteBox in Silverlight" Peter Kuhn WP7: "WP7 WatermarkedTextBox custom control" WindowsPhoneGeek Training: "" Shoutouts: Karl Shifflett announced that he and Josh Smith have heard the developers and released a demo: Mole 2010 Demo Released This is a somewhat older post, but the material is good and I was reminded of it while talking to Josh Smith at the MVP summit last week: Advanced MVVM ... money well-spent From SilverlightCream.com: Introducing the Silverlight Cookbook Dennis Doomen unveils a Codeplex site "containing a Silverlight 4 app that includes most of the complexities you might run into" ... I'm tagging this in my WynApse outlookbar... great stuff, Dennis! A highlighting AutoCompleteBox in Silverlight Peter Kuhn took on a task in response to a forum query and created a highlighting AutoCompleteBox, and is giving it to us... this really looks cool, Peter, and great explanation. Taking a look at the Mindscape Phone Elements for WP7. Michael Crump takes a good look at the Mindscape Phone Elements for WP7... and if you read closely you might still be able to get a free license! Windows Phone – “Can’t connect to your phone. Disconnect it, Restart it, then try connecting again.” Joe McBride explains a way out of an issue that many should be seeing as we repave or replace machines... how to get our device recognized on the updated machine... without giving cryptic messages. How to: only with the full visibility of an application in the browser window start an action Martin Krüger continues his journey in starting storyboards and tackles the condition that the application is completely in the browser window prior to the storyboard starting. A Numeric Input Control for Windows Phone 7 Jeremy Likness came up with a great idea for numeric input for WP7 ... you'll smile when you see it, but what a great idea... and a NumericTextBox to go along with it. Performing CRUD on Relational Data (Multiple table) using RIA in SL4 Manas Patnaik has a post up that breaks the normal blog post or demo mold by having two tables with a relational constraint and doing CRUD operations on them. Plenty of diagrams and good information. Select Many: Reactive Extensions’ Mother Of All Operators [Chaining] Jesse Liberty has part 9 in his Rx series up, and is looking at SelectMany this time, and chaining calls. He's using WPF for the sample, but the goodness is all there for us Silverlight guys too. The Full Stack 8–Adding Search to the Phone Client Jesse Liberty and Jon Galloway have part 8 of their Full Stack series up ... this is the MVC3, ASP.NET, Silverlight, and WP7 app development series... this time out they're putting Search in the Phone client. All about ResourceDictionary in WP7 WindowsPhoneGeek is discussing ResourceDictionaries in this post... beginning with What is a ResourceDictionary and continuing out through creating and using one, plus a good comment on merging. WP7 WatermarkedTextBox custom control In his next post, WindowsPhoneGeek walks us through the creation of a WatermarkedTextBox for WP7 right from the derivation from TextBox... very nice tutorial and lots of code/examples. Stay in the 'Light! Twitter SilverlightNews | Twitter WynApse | WynApse.com | Tagged Posts | SilverlightCream Join me @ SilverlightCream | Phoenix Silverlight User Group Technorati Tags: Silverlight    Silverlight 3    Silverlight 4    Windows Phone MIX10

    Read the article

  • My Take on Hadoop World 2011

    - by Jean-Pierre Dijcks
    I’m sure some of you have read pieces about Hadoop World and I did see some headlines which were somewhat, shall we say, interesting? I thought the keynote by Larry Feinsmith of JP Morgan Chase & Co was one of the highlights of the conference for me. The reason was very simple, he addressed some real use cases outside of internet and ad platforms. The following are my notes, since the keynote was recorded I presume you can go and look at Hadoopworld.com at some point… On the use cases that were mentioned: ETL – how can I do complex data transformation at scale Doing Basel III liquidity analysis Private banking – transaction filtering to feed [relational] data marts Common Data Platform – a place to keep data that is (or will be) valuable some day, to someone, somewhere 360 Degree view of customers – become pro-active and look at events across lines of business. For example make sure the mortgage folks know about direct deposits being stopped into an account and ensure the bank is pro-active to service the customer Treasury and Security – Global Payment Hub [I think this is really consolidation of data to cross reference activity across business and geographies] Data Mining Bypass data engineering [I interpret this as running a lot of a large data set rather than on samples] Fraud prevention – work on event triggers, say a number of failed log-ins to the website. When they occur grab web logs, firewall logs and rules and start to figure out who is trying to log in. Is this me, who forget his password, or is it someone in some other country trying to guess passwords Trade quality analysis – do a batch analysis or all trades done and run them through an analysis or comparison pipeline One of the key requests – if you can say it like that – was for vendors and entrepreneurs to make sure that new tools work with existing tools. JPMC has a large footprint of BI Tools and Big Data reporting and tools should work with those tools, rather than be separate. Security and Entitlement – how to protect data within a large cluster from unwanted snooping was another topic that came up. I thought his Elephant ears graph was interesting (couldn’t actually read the points on it, but the concept certainly made some sense) and it was interesting – when asked to show hands – how the audience did not (!) think that RDBMS and Hadoop technology would overlap completely within a few years. Another interesting session was the session from Disney discussing how Disney is building a DaaS (Data as a Service) platform and how Hadoop processing capabilities are mixed with Database technologies. I thought this one of the best sessions I have seen in a long time. It discussed real use case, where problems existed, how they were solved and how Disney planned some of it. The planning focused on three things/phases: Determine the Strategy – Design a platform and evangelize this within the organization Focus on the people – Hire key people, grow and train the staff (and do not overload what you have with new things on top of their day-to-day job), leverage a partner with experience Work on Execution of the strategy – Implement the platform Hadoop next to the other technologies and work toward the DaaS platform This kind of fitted with some of the Linked-In comments, best summarized in “Think Platform – Think Hadoop”. In other words [my interpretation], step back and engineer a platform (like DaaS in the Disney example), then layer the rest of the solutions on top of this platform. One general observation, I got the impression that we have knowledge gaps left and right. On the one hand are people looking for more information and details on the Hadoop tools and languages. On the other I got the impression that the capabilities of today’s relational databases are underestimated. Mostly in terms of data volumes and parallel processing capabilities or things like commodity hardware scale-out models. All in all I liked this conference, it was great to chat with a wide range of people on Oracle big data, on big data, on use cases and all sorts of other stuff. Just hope they get a set of bigger rooms next time… and yes, I hope I’m going to be back next year!

    Read the article

  • Inappropriate Updates?

    - by Tony Davis
    A recent Simple-talk article by Kathi Kellenberger dissected the fastest SQL solution, submitted by Peter Larsson as part of Phil Factor's SQL Speed Phreak challenge, to the classic "running total" problem. In its analysis of the code, the article re-ignited a heated debate regarding the techniques that should, and should not, be deemed acceptable in your search for fast SQL code. Peter's code for running total calculation uses a variation of a somewhat contentious technique, sometimes referred to as a "quirky update": SET @Subscribers = Subscribers = @Subscribers + PeopleJoined - PeopleLeft This form of the UPDATE statement, @variable = column = expression, is documented and it allows you to set a variable to the value returned by the expression. Microsoft does not guarantee the order in which rows are updated in this technique because, in relational theory, a table doesn’t have a natural order to its rows and the UPDATE statement has no means of specifying the order. Traditionally, in cases where a specific order is requires, such as for running aggregate calculations, programmers who used the technique have relied on the fact that the UPDATE statement, without the WHERE clause, is executed in the order imposed by the clustered index, or in heap order, if there isn’t one. Peter wasn’t satisfied with this, and so used the ingenious device of assuring the order of the UPDATE by the use of an "ordered CTE", based on an underlying temporary staging table (a heap). However, in either case, the ordering is still not guaranteed and, in addition, would be broken under conditions of parallelism, or partitioning. Many argue, with validity, that this reliance on a given order where none can ever be guaranteed is an abuse of basic relational principles, and so is a bad practice; perhaps even irresponsible. More importantly, Microsoft doesn't wish to support the technique and offers no guarantee that it will always work. If you put it into production and it breaks in a later version, you can't file a bug. As such, many believe that the technique should never be tolerated in a production system, under any circumstances. Is this attitude justified? After all, both forms of the technique, using a clustered index to guarantee the order or using an ordered CTE, have been tested rigorously and are proven to be robust; although not guaranteed by Microsoft, the ordering is reliable, provided none of the conditions that are known to break it are violated. In Peter's particular case, the technique is being applied to a temporary table, where the developer has full control of the data ordering, and indexing, and knows that the table will never be subject to parallelism or partitioning. It might be argued that, in such circumstances, the technique is not really "quirky" at all and to ban it from your systems would server no real purpose other than to deprive yourself of a reliable technique that has uses that extend well beyond the running total calculations. Of course, it is doubly important that such a technique, including its unsupported status and the assumptions that underpin its success, is fully and clearly documented, preferably even when posting it online in a competition or forum post. Ultimately, however, this technique has been available to programmers throughout the time Sybase and SQL Server has existed, and so cannot be lightly cast aside, even if one sympathises with Microsoft for the awkwardness of maintaining an archaic way of doing updates. After all, a Table hint could easily be devised that, if specified in the WITH (<Table_Hint_Limited>) clause, could be used to request the database engine to do the update in the conventional order. Then perhaps everyone would be satisfied. Cheers, Tony.

    Read the article

  • Big Data – Data Mining with Hive – What is Hive? – What is HiveQL (HQL)? – Day 15 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the operational database in Big Data Story. In this article we will understand what is Hive and HQL in Big Data Story. Yahoo started working on PIG (we will understand that in the next blog post) for their application deployment on Hadoop. The goal of Yahoo to manage their unstructured data. Similarly Facebook started deploying their warehouse solutions on Hadoop which has resulted in HIVE. The reason for going with HIVE is because the traditional warehousing solutions are getting very expensive. What is HIVE? Hive is a datawarehouseing infrastructure for Hadoop. The primary responsibility is to provide data summarization, query and analysis. It  supports analysis of large datasets stored in Hadoop’s HDFS as well as on the Amazon S3 filesystem. The best part of HIVE is that it supports SQL-Like access to structured data which is known as HiveQL (or HQL) as well as big data analysis with the help of MapReduce. Hive is not built to get a quick response to queries but it it is built for data mining applications. Data mining applications can take from several minutes to several hours to analysis the data and HIVE is primarily used there. HIVE Organization The data are organized in three different formats in HIVE. Tables: They are very similar to RDBMS tables and contains rows and tables. Hive is just layered over the Hadoop File System (HDFS), hence tables are directly mapped to directories of the filesystems. It also supports tables stored in other native file systems. Partitions: Hive tables can have more than one partition. They are mapped to subdirectories and file systems as well. Buckets: In Hive data may be divided into buckets. Buckets are stored as files in partition in the underlying file system. Hive also has metastore which stores all the metadata. It is a relational database containing various information related to Hive Schema (column types, owners, key-value data, statistics etc.). We can use MySQL database over here. What is HiveSQL (HQL)? Hive query language provides the basic SQL like operations. Here are few of the tasks which HQL can do easily. Create and manage tables and partitions Support various Relational, Arithmetic and Logical Operators Evaluate functions Download the contents of a table to a local directory or result of queries to HDFS directory Here is the example of the HQL Query: SELECT upper(name), salesprice FROM sales; SELECT category, count(1) FROM products GROUP BY category; When you look at the above query, you can see they are very similar to SQL like queries. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Pig. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Big Data – Operational Databases Supporting Big Data – Key-Value Pair Databases and Document Databases – Day 13 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Relational Database and NoSQL database in the Big Data Story. In this article we will understand the role of Key-Value Pair Databases and Document Databases Supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (Yesterday’s post) NoSQL Databases (Yesterday’s post) Key-Value Pair Databases (This post) Document Databases (This post) Columnar Databases (Tomorrow’s post) Graph Databases (Tomorrow’s post) Spatial Databases (Tomorrow’s post) Key Value Pair Databases Key Value Pair Databases are also known as KVP databases. A key is a field name and attribute, an identifier. The content of that field is its value, the data that is being identified and stored. They have a very simple implementation of NoSQL database concepts. They do not have schema hence they are very flexible as well as scalable. The disadvantages of Key Value Pair (KVP) database are that they do not follow ACID (Atomicity, Consistency, Isolation, Durability) properties. Additionally, it will require data architects to plan for data placement, replication as well as high availability. In KVP databases the data is stored as strings. Here is a simple example of how Key Value Database will look like: Key Value Name Pinal Dave Color Blue Twitter @pinaldave Name Nupur Dave Movie The Hero As the number of users grow in Key Value Pair databases it starts getting difficult to manage the entire database. As there is no specific schema or rules associated with the database, there are chances that database grows exponentially as well. It is very crucial to select the right Key Value Pair Database which offers an additional set of tools to manage the data and provides finer control over various business aspects of the same. Riak Rick is one of the most popular Key Value Database. It is known for its scalability and performance in high volume and velocity database. Additionally, it implements a mechanism for collection key and values which further helps to build manageable system. We will further discuss Riak in future blog posts. Key Value Databases are a good choice for social media, communities, caching layers for connecting other databases. In simpler words, whenever we required flexibility of the data storage keeping scalability in mind – KVP databases are good options to consider. Document Database There are two different kinds of document databases. 1) Full document Content (web pages, word docs etc) and 2) Storing Document Components for storage. The second types of the document database we are talking about over here. They use Javascript Object Notation (JSON) and Binary JSON for the structure of the documents. JSON is very easy to understand language and it is very easy to write for applications. There are two major structures of JSON used for Document Database – 1) Name Value Pairs and 2) Ordered List. MongoDB and CouchDB are two of the most popular Open Source NonRelational Document Database. MongoDB MongoDB databases are called collections. Each collection is build of documents and each document is composed of fields. MongoDB collections can be indexed for optimal performance. MongoDB ecosystem is highly available, supports query services as well as MapReduce. It is often used in high volume content management system. CouchDB CouchDB databases are composed of documents which consists fields and attachments (known as description). It supports ACID properties. The main attraction points of CouchDB are that it will continue to operate even though network connectivity is sketchy. Due to this nature CouchDB prefers local data storage. Document Database is a good choice of the database when users have to generate dynamic reports from elements which are changing very frequently. A good example of document usages is in real time analytics in social networking or content management system. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Top 10 Linked Blogs of 2010

    - by Bill Graziano
    Each week I send out a SQL Server newsletter and include links to interesting blog posts.  I’ve linked to over 500 blog posts so far in 2010.  Late last year I started storing those links in a database so I could do a little reporting.  I tend to link to posts related to the OLTP engine.  I also try to link to the individual blogger in the group blogs.  Unfortunately that wasn’t possible for the SQLCAT and CSS blogs.  I also have a real weakness for posts related to PASS. These are the top 10 blogs that I linked to during the year ordered by the number of posts I linked to. Paul Randal – Paul writes extensively on the internals of the relational engine.  Lots of great posts around transactions, transaction log, disaster recovery, corruption, indexes and DBCC.  I also linked to many of his SQL Server myths posts. Glenn Berry – Glenn writes very interesting posts on how hardware affects SQL Server.  I especially like his posts on the various CPU platforms.  These aren’t necessarily topics that I’m searching for but I really enjoy reading them. The SQLCAT Team – This Microsoft team focuses on the largest and most interesting SQL Server installations.  The regularly publish white papers and best practices. SQL Server CSS Team – These are the top engineers from the Microsoft Customer Service and Support group.  These are the folks you finally talk to after your case has been escalated about 20 times.  They write about the interesting problems they find. Brent Ozar – The posts I linked to mostly focused on the relational engine: CPU, NUMA, SSD drives, performance monitoring, etc.  But Brent writes about a real variety of topics including blogging, social networking, speaking, the MCM, SQL Azure and anything else that seems to strike his fancy.  His posts are always well written and though provoking. Jeremiah Peschka – A number of Jeremiah’s posts weren’t about SQL Server.  He’s very active in the “NoSQL” area and I linked to a number of those posts.  I think it’s important for people to know what other technologies are out there. Brad McGehee – Brad writes about being a DBA including maintenance plans, DBA checklists, compression and audit. Thomas LaRock – I linked to a variety of posts from PBM to networking to 24 Hours of PASS to TDE.  Just a real variety of topics.  Tom always writes with an interesting style usually mixing in a movie theme and/or bacon. Aaron Bertrand – Many of my links this year were Denali features.  He also had a great series on bad habits to kick. Michael J. Swart – This last one surprised me.  There are some well known SQL Server bloggers below Michael on this list.  I linked to posts on indexes, hierarchies, transactions and I/O performance and a variety of other engine related posts.  All are interesting and well thought out.  Many of his non-SQL posts are also very good.  He seems to have an interest in puzzles and other brain teasers.  Michael, I won’t be surprised again!

    Read the article

  • SQL SERVER – Introduction to Big Data – Guest Post

    - by pinaldave
    BIG Data – such a big word – everybody talks about this now a days. It is the word in the database world. In one of the conversation I asked my friend Jasjeet Sigh the same question – what is Big Data? He instantly came up with a very effective write-up.  Jasjeet is working as a Technical Manager with Koenig Solutions. He leads the SQL domain, and holds rich IT industry experience. Talking about Koenig, it is a 19 year old IT training company that offers several certification choices. Some of its courses include SharePoint Training, Project Management certifications, Microsoft Trainings, Business Intelligence programs, Web Design and Development courses etc. Big Data, as the name suggests, is about data that is BIG in nature. The data is BIG in terms of size, and it is difficult to manage such enormous data with relational database management systems that are quite popular these days. Big Data is not just about being large in size, it is also about the variety of the data that differs in form or type. Some examples of Big Data are given below : Scientific data related to weather and atmosphere, Genetics etc Data collected by various medical procedures, such as Radiology, CT scan, MRI etc Data related to Global Positioning System Pictures and Videos Radio Frequency Data Data that may vary very rapidly like stock exchange information Apart from difficulties in managing and storing such data, it is difficult to query, analyze and visualize it. The characteristics of Big Data can be defined by four Vs: Volume: It simply means a large volume of data that may span Petabyte, Exabyte and so on. However it also depends organization to organization that what volume of data they consider as Big Data. Variety: As discussed above, Big Data is not limited to relational information or structured Data. It can also include unstructured data like pictures, videos, text, audio etc. Velocity:  Velocity means the speed by which data changes. The higher is the velocity, the more efficient should be the system to capture and analyze the data. Missing any important point may lead to wrong analysis or may even result in loss. Veracity: It has been recently added as the fourth V, and generally means truthfulness or adherence to the truth. In terms of Big Data, it is more of a challenge than a characteristic. It is difficult to ascertain the truth out of the enormous amount of data and the one that has high velocity. There are always chances of having un-precise and uncertain data. It is a challenging task to clean such data before it is analyzed. Big Data can be considered as the next big thing in the IT sector in terms of innovation and development. If appropriate technologies are developed to analyze and use the information, it can be the driving force for almost all industrial segments. These include Retail, Manufacturing, Service, Finance, Healthcare etc. This will help them to automate business decisions, increase productivity, and innovate and develop new products. Thanks Jasjeet Singh for an excellent write up.  Jasjeet Sign is working as a Technical Manager with Koenig Solutions. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Database, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Big Data

    Read the article

  • Big Data – Operational Databases Supporting Big Data – Columnar, Graph and Spatial Database – Day 14 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Key-Value Pair Databases and Document Databases in the Big Data Story. In this article we will understand the role of Columnar, Graph and Spatial Database supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (The day before yesterday’s post) NoSQL Databases (The day before yesterday’s post) Key-Value Pair Databases (Yesterday’s post) Document Databases (Yesterday’s post) Columnar Databases (Tomorrow’s post) Graph Databases (Today’s post) Spatial Databases (Today’s post) Columnar Databases  Relational Database is a row store database or a row oriented database. Columnar databases are column oriented or column store databases. As we discussed earlier in Big Data we have different kinds of data and we need to store different kinds of data in the database. When we have columnar database it is very easy to do so as we can just add a new column to the columnar database. HBase is one of the most popular columnar databases. It uses Hadoop file system and MapReduce for its core data storage. However, remember this is not a good solution for every application. This is particularly good for the database where there is high volume incremental data is gathered and processed. Graph Databases For a highly interconnected data it is suitable to use Graph Database. This database has node relationship structure. Nodes and relationships contain a Key Value Pair where data is stored. The major advantage of this database is that it supports faster navigation among various relationships. For example, Facebook uses a graph database to list and demonstrate various relationships between users. Neo4J is one of the most popular open source graph database. One of the major dis-advantage of the Graph Database is that it is not possible to self-reference (self joins in the RDBMS terms) and there might be real world scenarios where this might be required and graph database does not support it. Spatial Databases  We all use Foursquare, Google+ as well Facebook Check-ins for location aware check-ins. All the location aware applications figure out the position of the phone with the help of Global Positioning System (GPS). Think about it, so many different users at different location in the world and checking-in all together. Additionally, the applications now feature reach and users are demanding more and more information from them, for example like movies, coffee shop or places see. They are all running with the help of Spatial Databases. Spatial data are standardize by the Open Geospatial Consortium known as OGC. Spatial data helps answering many interesting questions like “Distance between two locations, area of interesting places etc.” When we think of it, it is very clear that handing spatial data and returning meaningful result is one big task when there are millions of users moving dynamically from one place to another place & requesting various spatial information. PostGIS/OpenGIS suite is very popular spatial database. It runs as a layer implementation on the RDBMS PostgreSQL. This makes it totally unique as it offers best from both the worlds. Courtesy: mushroom network Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Hive. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SQL SERVER – Last Two Days to Get FREE Book – Joes 2 Pros Certification 70-433

    - by pinaldave
    Earlier this week we announced that we will be giving away FREE SQL Wait Stats book to everybody who will get SQL Server Joes 2 Pros Combo Kit. We had a fantastic response to the contest. We got an overwhelming response to the offer. We knew there would be a great response but we want to honestly say thank you to all of you for making it happen. Rick and I want to make sure that we express our special thanks to all of you who are reading our books. The offer is still on and there are two more days to avail this offer. We want to make sure that everybody who buys our most selling combo kits, we will send our other most popular SQL Wait Stats book. Please read all the details of the offer here. The books are great resources for anyone who wants to learn SQL Server from fundamentals and eventually go on the certification path of 70-433. Exam 70-433 contains following important subject and the book covers the subject of fundamental. If you are taking the exam or not taking the exam – this book is for every SQL Developer to learn the subject from fundamentals.  Create and alter tables. Create and alter views. Create and alter indexes. Create and modify constraints. Implement data types. Implement partitioning solutions. Create and alter stored procedures. Create and alter user-defined functions (UDFs). Create and alter DML triggers. Create and alter DDL triggers. Create and deploy CLR-based objects. Implement error handling. Manage transactions. Query data by using SELECT statements. Modify data by using INSERT, UPDATE, and DELETE statements. Return data by using the OUTPUT clause. Modify data by using MERGE statements. Implement aggregate queries. Combine datasets. INTERSECT, EXCEPT Implement subqueries. Implement CTE (common table expression) queries. Apply ranking functions. Control execution plans. Manage international considerations. Integrate Database Mail. Implement full-text search. Implement scripts by using Windows PowerShell and SQL Server Management Objects (SMOs). Implement Service Broker solutions. Track data changes. Data capture Retrieve relational data as XML. Transform XML data into relational data. Manage XML data. Capture execution plans. Collect output from the Database Engine Tuning Advisor. Collect information from system metadata. Availability of Book USA - Amazon | India - Flipkart | Indiaplaza Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Joes 2 Pros, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • We have our standards, and we need them

    - by Tony Davis
    The presenter suddenly broke off. He was midway through his section on how to apply to the relational database the Continuous Delivery techniques that allowed for rapid-fire rounds of development and refactoring, while always retaining a “production-ready” state. He sighed deeply and then launched into an astonishing diatribe against Database Administrators, much of his frustration directed toward Oracle DBAs, in particular. In broad strokes, he painted the picture of a brave new deployment philosophy being frustratingly shackled by the relational database, and by especially by the attitudes of the guardians of these databases. DBAs, he said, shunned change and “still favored tools I’d have been embarrassed to use in the ’80′s“. DBAs, Oracle DBAs especially, were more attached to their vendor than to their employer, since the former was the primary source of their career longevity and spectacular remuneration. He contended that someone could produce the best IDE or tool in the world for Oracle DBAs and yet none of them would give a stuff, unless it happened to come from the “mother ship”. I sat blinking in astonishment at the speaker’s vehemence, and glanced around nervously. Nobody in the audience disagreed, and a few nodded in assent. Although the primary target of the outburst was the Oracle DBA, it made me wonder. Are we who work with SQL Server, database professionals or merely SQL Server fanbois? Do DBAs, in general, have an image problem? Is it a good career-move to be seen to be holding onto a particular product by the whites of our knuckles, to the exclusion of all else? If we seek a broad, open-minded, knowledge of our chosen technology, the database, and are blessed with merely mortal powers of learning, then we like standards. Vendors of RDBMSs generally don’t conform to standards by instinct, but by customer demand. Microsoft has made great strides to adopt the international SQL Standards, where possible, thanks to considerable lobbying by the community. The implementation of Window functions is a great example. There is still work to do, though. SQL Server, for example, has an unusable version of the Information Schema. One cast-iron rule of any RDBMS is that we must be able to query the metadata using the same language that we use to query the data, i.e. SQL, and we do this by running queries against the INFORMATION_SCHEMA views. Developers who’ve attempted to apply a standard query that works on MySQL, or some other database, but doesn’t produce the expected results on SQL Server are advised to shun the Standards-based approach in favor of the vendor-specific one, using the catalog views. The argument behind this is sound and well-documented, and of course we all use those catalog views, out of necessity. And yet, as database professionals, committed to supporting the best databases for the business, whatever they are now and in the future, surely our heart should sink somewhat when we advocate a vendor specific approach, to a developer struggling with something as simple as writing a guard clause. And when we read messages on the Microsoft documentation informing us that we shouldn’t rely on INFORMATION_SCHEMA to identify reliably the schema of an object, in SQL Server!

    Read the article

  • "cloud architecture" concepts in a system architecture diagrams

    - by markus
    If you design a distributed application for easy scale-out, or you just want to make use of any of the new “cloud computing” offerings by Amazon, Google or Microsoft, there are some typical concepts or components you usually end up using: distributed blob storage (aka S3) asynchronous, durable message queues (aka SQS) non-Relational-/non-transactional databases (like SimpleDB, Google BigTable, Azure SQL Services) distributed background worker pool load-balanced, edge-service processes handling user requests (often virtualized) distributed caches (like memcached) CDN (content delivery network like Akamai) Now when it comes to design and sketch an architecture that makes use of such patterns, are there any commonly used symbols I could use? Or even a download with some cool Visio stencils? :) It doesn’t have to be a formal system like UML but I think it would be great if there were symbols that everyone knows and understands, like we have commonly used shapes for databases or a documents, for example. I think it would be important to not mix it up with traditional concepts like a normal file system (local or network server/SAN), or a relational database. Simply speaking, I want to be able to draw some conclusions about an application’s scalability or data consistency issues by just looking at the system architecture overview diagram. Update: Thank you very much for your answers. I like the idea of putting a small "cloud symbol" on the traditional symbols. However I leave this thread open just in case someone will find specific symbols (maybe in a book or so) - or uploaded some pimped up Visio stencils ;)

    Read the article

  • What scalability problems have you solved using a NoSQL data store?

    - by knorv
    NoSQL refers to non-relational data stores that break with the history of relational databases and ACID guarantees. Popular open source NoSQL data stores include: Cassandra (tabular, written in Java, used by Facebook, Twitter, Digg, Rackspace, Mahalo and Reddit) CouchDB (document, written in Erlang, used by Engine Yard and BBC) Dynomite (key-value, written in C++, used by Powerset) HBase (key-value, written in Java, used by Bing) Hypertable (tabular, written in C++, used by Baidu) Kai (key-value, written in Erlang) MemcacheDB (key-value, written in C, used by Reddit) MongoDB (document, written in C++, used by Sourceforge, Github, Electronic Arts and NY Times) Neo4j (graph, written in Java, used by Swedish Universities) Project Voldemort (key-value, written in Java, used by LinkedIn) Redis (key-value, written in C, used by Engine Yard, Github and Craigslist) Riak (key-value, written in Erlang, used by Comcast and Mochi Media) Ringo (key-value, written in Erlang, used by Nokia) Scalaris (key-value, written in Erlang, used by OnScale) ThruDB (document, written in C++, used by JunkDepot.com) Tokyo Cabinet/Tokyo Tyrant (key-value, written in C, used by Mixi.jp (Japanese social networking site)) I'd like to know about specific problems you - the SO reader - have solved using data stores and what NoSQL data store you used. Questions: What scalability problems have you used NoSQL data stores to solve? What NoSQL data store did you use? What database did you use before switching to a NoSQL data store? I'm looking for first-hand experiences, so please do not answer unless you have that.

    Read the article

  • Pros & Cons of Google App Engine

    - by Rishi
    Pros & Cons of Google App Engine [An Updated List 21st Aug 09] Help me Compile a List of all the Advantages & Disadvantages of Building an Application on the Google App Engine Pros: 1) No Need to buy Servers or Server Space (no maintenance). 2) Makes solving the problem of scaling much easier. Cons: 1) Locked into Google App Engine ?? 2)Developers have read-only access to the filesystem on App Engine. 3)App Engine can only execute code called from an HTTP request (except for scheduled background tasks). 4)Users may upload arbitrary Python modules, but only if they are pure-Python; C and Pyrex modules are not supported. 5)App Engine limits the maximum rows returned from an entity get to 1000 rows per Datastore call. 6)Java applications may only use a subset (The JRE Class White List) of the classes from the JRE standard edition. 7)Java applications cannot create new threads. Known Issues!! http://code.google.com/p/googleappengine/issues/list Hard limits Apps per developer - 10 Time per request - 30 sec Files per app - 3,000 HTTP response size - 10 MB Datastore item size - 1 MB Application code size - 150 MB Pro or Con? App Engine's infrastructure removes many of the system administration and development challenges of building applications to scale to millions of hits. Google handles deploying code to a cluster, monitoring, failover, and launching application instances as necessary. While other services let users install and configure nearly any *NIX compatible software, App Engine requires developers to use Python or Java as the programming language and a limited set of APIs. Current APIs allow storing and retrieving data from a BigTable non-relational database; making HTTP requests; sending e-mail; manipulating images; and caching. Most existing Web applications can't run on App Engine without modification, because they require a relational database.

    Read the article

  • Why put a DAO layer over a persistence layer (like JDO or Hibernate)

    - by Todd Owen
    Data Access Objects (DAOs) are a common design pattern, and recommended by Sun. But the earliest examples of Java DAOs interacted directly with relational databases -- they were, in essence, doing object-relational mapping (ORM). Nowadays, I see DAOs on top of mature ORM frameworks like JDO and Hibernate, and I wonder if that is really a good idea. I am developing a web service using JDO as the persistence layer, and am considering whether or not to introduce DAOs. I foresee a problem when dealing with a particular class which contains a map of other objects: public class Book { // Book description in various languages, indexed by ISO language codes private Map<String,BookDescription> descriptions; } JDO is clever enough to map this to a foreign key constraint between the "BOOKS" and "BOOKDESCRIPTIONS" tables. It transparently loads the BookDescription objects (using lazy loading, I believe), and persists them when the Book object is persisted. If I was to introduce a "data access layer" and write a class like BookDao, and encapsulate all the JDO code within this, then wouldn't this JDO's transparent loading of the child objects be circumventing the data access layer? For consistency, shouldn't all the BookDescription objects be loaded and persisted via some BookDescriptionDao object (or BookDao.loadDescription method)? Yet refactoring in that way would make manipulating the model needlessly complicated. So my question is, what's wrong with calling JDO (or Hibernate, or whatever ORM you fancy) directly in the business layer? Its syntax is already quite concise, and it is datastore-agnostic. What is the advantage, if any, of encapsulating it in Data Access Objects?

    Read the article

  • Is there an XML XQuery interface to existing XML files?

    - by xsaero00
    My company is in education industry and we use XML to store course content. We also store some course related information (mostly metainfo) in relational database. Right now we are in the process of switching from our proprietary XML Schema to DocBook 5. Along with the switch we want to move course related information from database to XML files. The reason for this is to have all course data in one place and to put it under Subversion. However, we would like to keep flexibility of the relational database and be able to easily extract specific information about a course from an XML document. XQuery seems to be up to the task so I was researching databases that supports it but so far could not find what I needed. What I basically want, is to have my XML files in a certain directory structure and then on top of this I would like to have a system that would index my files and let me select anything out of any file using XQuery. This way I can have "my cake and eat it too": I will have XQuery interface and still keep my files in plain text and versioned. Is there anything out there at least remotely resembling to what I want? If you think that what I an asking for is nonsense please make an alternative suggestion. On the related note: What XML Databases (preferably native and open source) do you have experience with and what would you recommend?

    Read the article

  • Simple Database normalization question...

    - by user365531
    Hi all, I have a quick question regarding a database that I am designing and making sure it is normalized... I have a customer table, with a primary key of customerId. It has a StatusCode column that has a code which reflects the customers account status ie. 1 = Open, 2 = Closed, 3 = Suspended etc... Now I would like to have another field in the customer table that flags whether the account is allowed to be suspended or not... certain customers will be automatically suspended if they break there trading terms... others not... so the relevant table fields will be as so: Customers (CustomerId(PK):StatusCode:IsSuspensionAllowed) Now both fields are dependent on the primary key as you can not determine the status or whether suspensions are allowed on a particular customer unless you know the specific customer, except of course when the IsSuspensionAllowed field is set to YES, the the customer should never have a StatusCode of 3 (Suspended). It seems from the above table design it is possible for this to happen unless a check contraint is added to my table. I can't see how another table could be added to the relational design to enforce this though as it's only in the case where IsSuspensionAllowed is set to YES and StatusCode is set to 3 when the two have a dependence on each other. So after my long winded explanation my question is this: Is this a normalization problem and I'm not seeing a relational design that will enforce this... or is it actually just a business rule that should be enforced with a check contraint and the table is in fact still normalized. Cheers, Steve

    Read the article

  • What database systems should an startup company consider?

    - by Am
    Right now I'm developing the prototype of a web application that aggregates large number of text entries from a large number of users. This data must be frequently displayed back and often updated. At the moment I store the content inside a MySQL database and use NHibernate ORM layer to interact with the DB. I've got a table defined for users, roles, submissions, tags, notifications and etc. I like this solution because it works well and my code looks nice and sane, but I'm also worried about how MySQL will perform once the size of our database reaches a significant number. I feel that it may struggle performing join operations fast enough. This has made me think about non-relational database system such as MongoDB, CouchDB, Cassandra or Hadoop. Unfortunately I have no experience with either. I've read some good reviews on MongoDB and it looks interesting. I'm happy to spend the time and learn if one turns out to be the way to go. I'd much appreciate any one offering points or issues to consider when going with none relational dbms?

    Read the article

< Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27  | Next Page >