joins - Page 33 - Developer IT

Repeating fields in similar database tables

- by user1738833

I have been tasked with working on a database that I have never seen before, and I'm looking at the DB structure. Some of the central and most heavily queried and joined tables look like virtual duplicates of each other. Here's a massively simplified representation of the situation, with business-sensitive information changed, listing hypothetical table names and fields:

TopLevelGroup: PK_TLGroupId, DisplaysXOnBill, DisplaysYOnBill, IsInvoicedForJ, IsInvoicedForK
SubGroup: PK_SubGroupId, FK_ParentTopLevelGroupId, DisplaysXOnBill, DisplaysYOnBill, IsInvoicedForJ, IsInvoicedForK
SubSubGroup: PK_SubSubGroupId, FK_ParentSubGroupId, DisplaysXOnBill, DisplaysYOnBill, IsInvoicedForJ, IsInvoicedForK

I haven't listed the types of the fields, as I don't think they are particularly important to the situation. It's also worth saying that rather than four repeated fields as in the example above, I'm looking at 86 repeated fields. For the most part, those fields genuinely do represent "facts" about the primary table entity, so the design is not automatically wrong for that reason.

In addition, the "groups" represented here have a property inheritance relationship. If DisplaysXOnBill is NULL in the SubSubGroup, it takes the value of DisplaysXOnBill from its parent, the SubGroup, and so on up to the TopLevelGroup. Further, the requirements will never extend the model beyond three levels, so there is no need for flexibility in that area.

Is there a design smell in several tables that describe very similar entities having almost identical fields? If so, what might be a better design for the example above? I'm using the phrase "design smell" to indicate a possible problem. Of course, in any given situation, a particular design might well be the best solution. I'm looking for a more general answer: what might be wrong with this design, and what might be the better design were that the case?

Possibly related, but not primary questions:

- Is this database schema in a reasonably normal form (e.g. 3NF), insofar as can be told from the information I've provided? I can't see a problem with the requirements of 2NF and 3NF, except in their inheriting the requirements of 1NF. Is 1NF satisfied, though? Are repeating groups allowed in different tables?
- Is there a best-practice method for implementing the inheritance relationship in a database as I require? The method above feels clunky to me because any query on the SubSubGroup necessarily needs to join onto the SubGroup and TopLevelGroup tables to collect inherited facts, which can make even trivial joins requiring facts from the SubSubGroup table rather long-winded.

There are, of course, political considerations to making a relatively large change like this. For the purpose of this question, I'm happy to ignore that fact in the interests of keeping the answers ring-fenced to the technical problem.
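To make the inheritance rule concrete, here is a minimal sketch of the NULL fallback described in the question, written with Python's built-in sqlite3 module. It is only an illustration under stated assumptions: the tables are cut down to one repeated field, the sample rows are invented, and the view name EffectiveSubSubGroup is hypothetical. It shows the join cost the question complains about, wrapped once in a view, and is not a recommendation for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down versions of the three tables, with a single repeated field.
CREATE TABLE TopLevelGroup (
    PK_TLGroupId INTEGER PRIMARY KEY,
    DisplaysXOnBill INTEGER
);
CREATE TABLE SubGroup (
    PK_SubGroupId INTEGER PRIMARY KEY,
    FK_ParentTopLevelGroupId INTEGER REFERENCES TopLevelGroup(PK_TLGroupId),
    DisplaysXOnBill INTEGER
);
CREATE TABLE SubSubGroup (
    PK_SubSubGroupId INTEGER PRIMARY KEY,
    FK_ParentSubGroupId INTEGER REFERENCES SubGroup(PK_SubGroupId),
    DisplaysXOnBill INTEGER
);

-- Invented sample data: NULL means "inherit from the parent".
INSERT INTO TopLevelGroup VALUES (1, 1);
INSERT INTO SubGroup VALUES (10, 1, NULL), (11, 1, 0);
INSERT INTO SubSubGroup VALUES (100, 10, NULL), (101, 11, NULL), (102, 10, 0);

-- Hypothetical view: the first non-NULL value wins, nearest level first.
CREATE VIEW EffectiveSubSubGroup AS
SELECT ssg.PK_SubSubGroupId,
       COALESCE(ssg.DisplaysXOnBill,
                sg.DisplaysXOnBill,
                tlg.DisplaysXOnBill) AS DisplaysXOnBill
FROM SubSubGroup ssg
JOIN SubGroup sg ON sg.PK_SubGroupId = ssg.FK_ParentSubGroupId
JOIN TopLevelGroup tlg ON tlg.PK_TLGroupId = sg.FK_ParentTopLevelGroupId;
""")

# Map each SubSubGroup id to its effective (own or inherited) value.
effective = dict(conn.execute(
    "SELECT PK_SubSubGroupId, DisplaysXOnBill FROM EffectiveSubSubGroup"
))
print(effective)
```

With 86 repeated fields the view would need 86 COALESCE expressions, which is one way to see why the design feels heavy even though the model is fixed at three levels.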

service broker message process order

- by Blootac

Everywhere I read says that messages handled by Service Broker are processed in the order in which they arrive. Yet if you create a table, message type, contract, service, etc., have an activation stored procedure that waits for 2 seconds and inserts the message into a table, set the max queue readers to 5 or 10, and send 20-odd messages, I can see in the table that they are inserted out of order, even though when I insert them into the queue and look at its contents, the messages are all in the right order.

Is it due to the WAITFOR delay waiting for the nearest second, each thread having different subsecond times, and the threads then fighting for a lock or something? The reason I've got a delay in there is to simulate delays with joins, etc.

Thanks.

Demo code:

    --create the table and service broker
    CREATE TABLE test ( id int identity(1,1), contents varchar(100) )
    CREATE MESSAGE TYPE test
    CREATE CONTRACT mycontract ( test sent by initiator )
    GO
    CREATE PROCEDURE dostuff AS
    BEGIN
        DECLARE @msg varchar(100);
        RECEIVE TOP (1) @msg = message_body FROM myQueue
        IF @msg IS NOT NULL
        BEGIN
            WAITFOR DELAY '00:00:02'
            INSERT INTO test(contents)values(@msg)
        END
    END
    GO
    ALTER QUEUE myQueue WITH STATUS = ON, ACTIVATION ( STATUS = ON, PROCEDURE_NAME = dostuff, MAX_QUEUE_READERS = 10, EXECUTE AS SELF )
    create service senderService on queue myQueue ( mycontract )
    create service receiverService on queue myQueue ( mycontract )
    GO
    --**********************************************************
    --now insert lots of messages to the queue
    DECLARE @dialog_handle uniqueidentifier
    BEGIN DIALOG @dialog_handle FROM SERVICE senderService TO SERVICE 'receiverService' ON CONTRACT mycontract;
    SEND ON CONVERSATION @dialog_handle MESSAGE TYPE test ('<test>1</test>');
    BEGIN DIALOG @dialog_handle FROM SERVICE senderService TO SERVICE 'receiverService' ON CONTRACT mycontract;
    SEND ON CONVERSATION @dialog_handle MESSAGE TYPE test ('<test>2</test>')
    BEGIN DIALOG @dialog_handle FROM SERVICE senderService TO SERVICE 'receiverService' ON CONTRACT mycontract;
    SEND ON CONVERSATION @dialog_handle MESSAGE TYPE test ('<test>3</test>')
    BEGIN DIALOG @dialog_handle FROM SERVICE senderService TO SERVICE 'receiverService' ON CONTRACT mycontract;
    SEND ON CONVERSATION @dialog_handle MESSAGE TYPE test ('<test>4</test>')
    BEGIN DIALOG @dialog_handle FROM SERVICE senderService TO SERVICE 'receiverService' ON CONTRACT mycontract;
    SEND ON CONVERSATION @dialog_handle MESSAGE TYPE test ('<test>5</test>')
    BEGIN DIALOG @dialog_handle FROM SERVICE senderService TO SERVICE 'receiverService' ON CONTRACT mycontract;
    SEND ON CONVERSATION @dialog_handle MESSAGE TYPE test ('<test>6</test>')
    BEGIN DIALOG @dialog_handle FROM SERVICE senderService TO SERVICE 'receiverService' ON CONTRACT mycontract;
    SEND ON CONVERSATION @dialog_handle MESSAGE TYPE test ('<test>7</test>')
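The effect described above can be pictured with a small analogy in plain Python threads. This is not Service Broker code: the function name process, the queue, and the random sleep are stand-ins chosen for illustration. It only models the demo's shape, where several readers each take one message, pause, and then insert. Note also that the demo opens a new dialog for every message, and Service Broker's ordering guarantee is stated per conversation rather than per queue.

```python
import queue
import random
import threading
import time


def process(messages, readers, max_delay=0.02):
    """Drain a FIFO queue with `readers` threads.

    Returns (order in which messages were received,
             order in which they were "inserted").
    """
    pending = queue.Queue()
    for msg in messages:
        pending.put(msg)

    lock = threading.Lock()
    received, inserted = [], []

    def reader():
        while True:
            with lock:  # dequeue and log together, so `received` mirrors queue order
                try:
                    msg = pending.get_nowait()
                except queue.Empty:
                    return
                received.append(msg)
            # Stands in for WAITFOR DELAY plus whatever work the procedure does.
            time.sleep(random.uniform(0, max_delay))
            with lock:
                inserted.append(msg)  # stands in for the INSERT into the test table

    threads = [threading.Thread(target=reader) for _ in range(readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, inserted


messages = list(range(1, 21))
received_many, inserted_many = process(messages, readers=10)
received_one, inserted_one = process(messages, readers=1)

print("10 readers, insert order:", inserted_many)
print(" 1 reader,  insert order:", inserted_one)
```

In this model the messages always leave the queue in order; only the completion order can vary once more than one reader is active, and a single reader keeps both orders identical.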

Is there limit of "join" or the "where" or length of SQL query ?

- by Chetan sharma

Actually, I was trying to get data from an Elgg database based on multiple joins. It generated a very big query with lots of JOIN clauses, and the query never returns.

    SELECT distinct e.* from test_entities e
    JOIN test_metadata m1 on e.guid = m1.entity_guid
    JOIN test_metastrings ms1 on ms1.id = m1.name_id
    JOIN test_metastrings mv1 on mv1.id = m1.value_id
    JOIN test_objects_entity obj on e.guid = obj.guid
    JOIN test_metadata m2 on e.guid = m2.entity_guid
    JOIN test_metastrings ms2 on ms2.id = m2.name_id
    JOIN test_metastrings mv2 on mv2.id = m2.value_id
    JOIN test_metadata m3 on e.guid = m3.entity_guid
    JOIN test_metastrings ms3 on ms3.id = m3.name_id
    JOIN test_metastrings mv3 on mv3.id = m3.value_id
    JOIN test_metadata m4 on e.guid = m4.entity_guid
    JOIN test_metastrings ms4 on ms4.id = m4.name_id
    JOIN test_metastrings mv4 on mv4.id = m4.value_id
    JOIN test_metadata m5 on e.guid = m5.entity_guid
    JOIN test_metastrings ms5 on ms5.id = m5.name_id
    JOIN test_metastrings mv5 on mv5.id = m5.value_id
    JOIN test_metadata m6 on e.guid = m6.entity_guid
    JOIN test_metastrings ms6 on ms6.id = m6.name_id
    JOIN test_metastrings mv6 on mv6.id = m6.value_id
    where ms1.string='expire_date' and mv1.string <= 1272565800
    and ms2.string='homecity' and mv2.string LIKE "%dasf%"
    and ms3.string='schoolname' and mv3.string LIKE "%asdf%"
    and ms4.string='award_amount' and mv4.string <= 123
    and ms5.string='no_of_awards' and mv5.string <= 7
    and ms6.string='avg_rating' and mv6.string <= 2
    and e.type = 'object' and e.subtype = 5 and e.site_guid = 1
    and (obj.title like '%asdf%') OR (obj.description like '%asdf%')
    and ( (e.access_id = -2 AND e.owner_guid IN ( SELECT guid_one FROM test_entity_relationships WHERE relationship='friend' AND guid_two=5 )) OR (e.access_id IN (2,1) OR (e.owner_guid = 5) OR ( e.access_id = 0 AND e.owner_guid = 5 ) ) and e.enabled='yes')
    and ( (m1.access_id = -2 AND m1.owner_guid IN ( SELECT guid_one FROM test_entity_relationships WHERE relationship='friend' AND guid_two=5 )) OR (m1.access_id IN (2,1) OR (m1.owner_guid = 5) OR ( m1.access_id = 0 AND m1.owner_guid = 5 ) ) and m1.enabled='yes')
    and ( (m2.access_id = -2 AND m2.owner_guid IN ( SELECT guid_one FROM test_entity_relationships WHERE relationship='friend' AND guid_two=5 )) OR (m2.access_id IN (2,1) OR (m2.owner_guid = 5) OR ( m2.access_id = 0 AND m2.owner_guid = 5 ) ) and m2.enabled='yes')
    and ( (m3.access_id = -2 AND m3.owner_guid IN ( SELECT guid_one FROM test_entity_relationships WHERE relationship='friend' AND guid_two=5 )) OR (m3.access_id IN (2,1) OR (m3.owner_guid = 5) OR ( m3.access_id = 0 AND m3.owner_guid = 5 ) ) and m3.enabled='yes')
    and ( (m4.access_id = -2 AND m4.owner_guid IN ( SELECT guid_one FROM test_entity_relationships WHERE relationship='friend' AND guid_two=5 )) OR (m4.access_id IN (2,1) OR (m4.owner_guid = 5) OR ( m4.access_id = 0 AND m4.owner_guid = 5 ) ) and m4.enabled='yes')
    and ( (m5.access_id = -2 AND m5.owner_guid IN ( SELECT guid_one FROM test_entity_relationships WHERE relationship='friend' AND guid_two=5 )) OR (m5.access_id IN (2,1) OR (m5.owner_guid = 5) OR ( m5.access_id = 0 AND m5.owner_guid = 5 ) ) and m5.enabled='yes')
    and ( (m6.access_id = -2 AND m6.owner_guid IN ( SELECT guid_one FROM test_entity_relationships WHERE relationship='friend' AND guid_two=5 )) OR (m6.access_id IN (2,1) OR (m6.owner_guid = 5) OR ( m6.access_id = 0 AND m6.owner_guid = 5 ) ) and m6.enabled='yes')
    order by obj.title limit 0, 10

This is the query that I am running.

Displaying pic for user through a question's answer

- by bgadoci

Ok, I am trying to display the profile pic of a user. The application I have set up allows users to create questions and answers (I am calling answers 'sites' in the code). The view in which I am trying to do so is /views/questions/show.html.erb. It might also be of note that I am using the Paperclip gem. Here is the setup:

Associations

Users

    class User < ActiveRecord::Base
      has_many :questions, :dependent => :destroy
      has_many :sites, :dependent => :destroy
      has_many :notes, :dependent => :destroy
      has_many :likes, :through => :sites , :dependent => :destroy
      has_many :pics, :dependent => :destroy
      has_many :likes, :dependent => :destroy
    end

Questions

    class Question < ActiveRecord::Base
      has_many :sites, :dependent => :destroy
      has_many :notes, :dependent => :destroy
      has_many :likes, :dependent => :destroy
      belongs_to :user
    end

Answers (sites)

    class Site < ActiveRecord::Base
      belongs_to :question
      belongs_to :user
      has_many :notes, :dependent => :destroy
      has_many :likes, :dependent => :destroy
      has_attached_file :photo, :styles => { :small => "250x250>" }
    end

Pics

    class Pic < ActiveRecord::Base
      has_attached_file :profile_pic, :styles => { :small => "100x100" }
      belongs_to :user
    end

The /views/questions/show.html.erb view renders the partial /views/sites/_site.html.erb, which displays the answer (site) with:

    <% div_for site do %>
      <%=h site.description %>
    <% end %>

I have been trying to do things like:

    <%=image_tag site.user.pic.profile_pic.url(:small) %>
    <%=image_tag site.user.profile_pic.url(:small) %>

etc. But that is obviously wrong. My error directs me to the Questions#show action, so I imagine that I need to define something in there, but I'm not sure what. Is it possible to call the pic given the current associations and the placement of the call? If so, what controller additions do I need to make, and what line of code will call the pic?

UPDATE: Here is the QuestionsController#show code:

    def show
      @question = Question.find(params[:id])
      @sites = @question.sites.all(:select => "sites.*, SUM(likes.like) as like_total",
                                   :joins => "LEFT JOIN likes AS likes ON likes.site_id = sites.id",
                                   :group => "sites.id",
                                   :order => "like_total DESC")
      respond_to do |format|
        format.html # show.html.erb
        format.xml { render :xml => @question }
      end
    end

[N]Hibernate: view-like fetching properties of associated class

- by chiccodoro

(Felt quite helpless in formulating an appropriate title...)

In my C# app I display a list of "A" objects, along with some properties of their associated "B" objects and properties of B's associated "C" objects:

    A.Name    B.Name    B.SomeValue    C.Name
    Foo       Bar       123            HelloWorld
    Bar       Hello     432            World
    ...

To clarify: A has an FK to B, and B has an FK to C (such as, e.g., BankAccount - Person - Company).

I have tried two approaches to load these properties from the database (using NHibernate): a fast approach and a clean approach. My eventual question is how to do a fast and clean approach.

Fast approach:

- Define a view in the database which joins A, B, C and provides all these fields.
- In the A class, define properties "BName", "BSomeValue", "CName".
- Define a Hibernate mapping between A and the view, in which the needed B and C properties are mapped with update="false" insert="false". They actually stem from the B and C tables, but Hibernate is not aware of that since it uses the view.

This way, the listing only loads one object per "A" record, which is quite fast. If the code tries to access the actual associated property, "A.B", I issue another HQL query to get B, set the property, and update the faked BName and BSomeValue properties as well.

Clean approach:

- There is no view. Class A is mapped to table A, B to B, C to C.
- When loading the list of A, I do a double left-join-fetch to get B and C as well:

    from A a left join fetch a.B left join fetch a.B.C

- B.Name, B.SomeValue and C.Name are accessed through the eagerly loaded associations.

The disadvantage of this approach is that it gets slower and takes more memory, since it needs to create and map 3 objects per "A" record: an A, a B, and a C object.

Fast and clean approach:

I feel somewhat uncomfortable using a database view that hides a join and treating it in NHibernate as if it were a table. So I would like to do something like this:

- Have no views in the database.
- Declare properties "BName", "BSomeValue", "CName" in class "A".
- Define the mapping for A such that NHibernate fetches A and these properties together using a join SQL query, as a database view would do.
- The mapping should still allow for defining lazy many-to-one associations for getting A.B.C.

My questions: Is this possible? Is it [un]artful? Is there a better way?

GHC.Generics and Type Families

- by jberryman

This is a question related to my module here, and is simplified a bit. It's also related to this previous question, in which I oversimplified my problem and didn't get the answer I was looking for. I hope this isn't too specific, and please change the title if you can think of a better one.

Background

My module uses a concurrent chan, split into a read side and a write side. I use a special class with an associated type synonym to support polymorphic channel "joins":

    {-# LANGUAGE TypeFamilies #-}

    class Sources s where
        type Joined s
        newJoinedChan :: IO (s, Messages (Joined s)) -- NOT EXPORTED

    --output and input sides of channel:
    data Messages a -- NOT EXPORTED
    data Mailbox a

    instance Sources (Mailbox a) where
        type Joined (Mailbox a) = a
        newJoinedChan = undefined

    instance (Sources a, Sources b)=> Sources (a,b) where
        type Joined (a,b) = (Joined a, Joined b)
        newJoinedChan = undefined

    -- and so on for tuples of 3,4,5...

The code above allows us to do this kind of thing:

    example = do
        (mb ,        msgsA) <- newJoinedChan
        ((mb1, mb2), msgsB) <- newJoinedChan
        --say that: msgsA, msgsB :: Messages (Int,Int)
        --and:      mb :: Mailbox (Int,Int)
        --          mb1,mb2 :: Mailbox Int

We have a recursive action called a Behavior that we can run on the messages we pull out of the "read" end of the channel:

    newtype Behavior a = Behavior (a -> IO (Behavior a))

    runBehaviorOn :: Behavior a -> Messages a -> IO () -- NOT EXPORTED

This would allow us to run a Behavior (Int,Int) on either of msgsA or msgsB, where in the second case both Ints in the tuple it receives actually came through separate Mailboxes.

This is all tied together for the user in the exposed spawn function

    spawn :: (Sources s) => Behavior (Joined s) -> IO s

...which calls newJoinedChan and runBehaviorOn, and returns the input Sources.

What I'd like to do

I'd like users to be able to create a Behavior of arbitrary product type (not just tuples), so for instance we could run a Behavior (Pair Int Int) on the example Messages above. I'd like to do this with GHC.Generics while still having a polymorphic Sources, but can't manage to make it work.

    spawn :: (Sources s, Generic (Joined s), Rep (Joined s) ~ ??) => Behavior (Joined s) -> IO s

The parts of the above example that are actually exposed in the API are the fst of the newJoinedChan action, and Behaviors, so an acceptable solution can modify one or all of runBehaviorOn or the snd of newJoinedChan.

I'll also be extending the API above to support sums (not implemented yet) like Behavior (Either a b), so I hoped GHC.Generics would work for me.

Questions

- Is there a way I can extend the API above to support arbitrary Generic a=> Behavior a?
- If not using GHC's Generics, are there other ways I can get the API I want with minimal end-user pain (i.e. they just have to add a deriving clause to their type)?

SQL Native Client 10 Performance miserable (due to server-side cursors)

- by namezero

We have an application that uses ODBC via CDatabase/CRecordset in MFC (VS2010). We have two backends implemented: MSSQL and MySQL.

Now, when we use MSSQL (with the Native Client 10.0), retrieving records with SELECT is dramatically slow over slow links (VPN, for example). The MySQL ODBC driver does not exhibit this nasty behavior. For example:

    CRecordset r(&m_db);
    r.Open(CRecordset::snapshot, L"SELECT a.something, b.sthelse FROM TableA AS a LEFT JOIN TableB AS b ON a.ID=b.Ref");
    r.MoveFirst();
    while(!r.IsEOF())
    {
        // Retrieve
        CString strData;
        r.GetFieldValue(L"a.something", strData);
        r.MoveNext();
    }

Now, with the MySQL driver, everything runs as it should. The query is returned, and everything is lightning fast. However, with the MSSQL Native Client, things slow down, because on every MoveNext() the driver communicates with the server. I think it is due to server-side cursors, but I didn't find a way to disable them. I have tried using:

    ::SQLSetConnectAttr(m_db.m_hdbc, SQL_ATTR_ODBC_CURSORS, SQL_CUR_USE_ODBC, SQL_IS_INTEGER);

But this didn't help either. There are still long-running execs of sp_cursorfetch() et al. in SQL Profiler. I have also tried a small reference project with SQLAPI and bulk fetch, but that hangs in FetchNext() for a long time, too (even if there is only one record in the result set). This, however, only happens on queries with LEFT JOINs, table-valued functions, etc. Note that the query itself doesn't take that long: executing the same SQL via SQL Studio over the same connection returns in a reasonable time.

Question 1: Is it possible to somehow get the Native Client to "cache" all results locally and use local cursors, in a similar fashion to what the MySQL driver seems to do? Maybe this is the wrong approach altogether, but I'm not sure how else to do this. All we want is to retrieve all data at once from a SELECT, then never talk to the server again until the next query. We don't care about recordset updates, deletes, etc., or any of that nonsense. We only want to retrieve data. We take that recordset, get all the data, and delete it.

Question 2: Is there a more efficient way to just retrieve data in MFC with ODBC?

SQL Design Question regarding schema and if Name value pair is the best solution

- by Aur

I am having a small problem trying to decide on a database schema for a current project. I am by no means a DBA. The application parses a file based on user input and enters that data into the database. The number of fields that can be parsed is between 1 and 42 at the moment.

The current design of the database is entirely flat, with 42 columns; some are repeated columns such as address1, address2, address3, etc. This says that I should normalize the data. However, data integrity is not needed at this moment, and given the way the data is shaped I'm looking at several joins. That is not a bad thing, but the data is still in a 1-to-1 relationship and I still see a lot of empty fields per row.

So my concern is that this does not allow the database or the application to be very extendable. If they want to add more fields to be parsed (which they do), then I'd need to create another table and add another foreign key to the linking table.

The third option is to have a table where the fields are defined and a table for each record. So what I was thinking is to make a table that stores the value and links to those two tables. The problem is that I can picture that table growing large depending on the input size. If someone gives me a file with 300,000 records, then 300,000 x 40 = 12 million rows, so I have some reservations. However, I think if I get to that point then I should be happy it is being used. This option also allows for more custom displaying of information, albeit with a bit more work, but with little rework even if you add more fields.

So the problem boils down to:

1. The current design is a flat file, which makes extending it hard, and it is not normalized.
2. Normalize the tables, although there are no real benefits for the moment; but requirements change.
3. Normalize it down into name-value pairs and hope size doesn't hurt.

There are a large number of inserts, updates, and selects against that table, so performance is a worry, but I believe the saying is design now, performance testing later? I'm probably just missing something practical, so any comments would be appreciated, even if it's a quick sanity check. Thank you for your time.
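As a rough picture of option 3, the sketch below builds a name-value layout with Python's built-in sqlite3 module and pivots it back into one row per record. Every table and column name here (field_def, record, field_value, source_file) and all the sample data are assumptions made up for the illustration; the question does not give the real names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per field that can be parsed (hypothetical names).
CREATE TABLE field_def (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- One row per parsed record.
CREATE TABLE record (
    id          INTEGER PRIMARY KEY,
    source_file TEXT
);
-- One row per (record, field) pair that actually has a value.
CREATE TABLE field_value (
    record_id INTEGER NOT NULL REFERENCES record(id),
    field_id  INTEGER NOT NULL REFERENCES field_def(id),
    value     TEXT,
    PRIMARY KEY (record_id, field_id)
);

INSERT INTO field_def VALUES (1, 'name'), (2, 'address1'), (3, 'address2');
INSERT INTO record VALUES (1, 'upload.csv'), (2, 'upload.csv');
INSERT INTO field_value VALUES
    (1, 1, 'Ann'), (1, 2, '1 Main St'),
    (2, 1, 'Bob');
""")

# Pivot the name-value rows back into one row per record.
rows = conn.execute("""
    SELECT r.id,
           MAX(CASE WHEN f.name = 'name'     THEN v.value END) AS name,
           MAX(CASE WHEN f.name = 'address1' THEN v.value END) AS address1,
           MAX(CASE WHEN f.name = 'address2' THEN v.value END) AS address2
    FROM record r
    LEFT JOIN field_value v ON v.record_id = r.id
    LEFT JOIN field_def   f ON f.id = v.field_id
    GROUP BY r.id
    ORDER BY r.id
""").fetchall()

value_count = conn.execute("SELECT COUNT(*) FROM field_value").fetchone()[0]
print(rows, value_count)
```

Empty fields simply have no row, which addresses the "lot of empty fields" concern, while adding a field becomes an INSERT into field_def. The trade-off is visible too: every column in the pivot needs its own CASE expression, and values lose their types.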

Cakephp query doesn't render correct data

- by user2915012

I'm totally new to CakePHP and am facing a problem with a query to fetch data. I tried this to get the categories/warehouses table info, but failed:

    $cart_products = $this->Order->OrdersProduct->find('all', array(
        'fields' => array('*'),
        'contain' => array('Category'),
        'joins' => array(
            array(
                'table' => 'products',
                'alias' => 'Product',
                'type' => 'LEFT',
                'conditions' => array('Product.id = OrdersProduct.product_id')
            ),
            array(
                'table' => 'orders',
                'alias' => 'Order',
                'type' => 'LEFT',
                'conditions' => array('Order.id = OrdersProduct.order_id')
            )
        ),
        'conditions' => array(
            'Order.store_id' => $store_id,
            'Order.order_status' => 'in_cart'
        )
    ));

I need the result to look something like this:

    Array
    (
        [0] => Array
            (
                [OrdersProduct] => Array
                    (
                        [id] => 1
                        [order_id] => 1
                        [product_id] => 16
                        [qty] => 10
                        [created] => 2013-10-24 08:04:33
                        [modified] => 2013-10-24 08:04:33
                    )
                [Product] => Array
                    (
                        [id] => 16
                        [part] => 56-987xyz
                        [title] => iPhone 5 battery
                        [description] => iPhone 5c description
                        [wholesale_price] => 4
                        [retail_price] => 8
                        [purchase_cost] => 2
                        [sort_order] =>
                        [Category] => array( [id] => 1, [name] => Iphone 5 )
                        [Warehouse] => array( [id] => 1, [name] => Warehouse1 )
                        [weight] =>
                        [created] => 2013-10-22 12:14:57
                        [modified] => 2013-10-22 12:14:57
                    )
            )
    )

How can I get this? Can anybody help me? Thanks.

SQL Cartesian product joining table to itself and inserting into existing table

- by Emma

I am working in phpMyAdmin using SQL. I want to take the primary key (EntryID) from TableA and create a cartesian product (if I am using the term correctly) in TableB (an empty table, already created) for all entries which share the same value for FieldB in TableA, except where an entry would be paired with itself (TableA.EntryID equals TableA.EntryID).

So, for example, if the values in TableA were:

    TableA.EntryID    TableA.FieldB
    1                 23
    2                 23
    3                 23
    4                 25
    5                 25
    6                 25

The result in TableB would be:

    Primary key    EntryID1    EntryID2    FieldD (Default or manually entered)
    1              1           2           Default value
    2              1           3           Default value
    3              2           1           Default value
    4              2           3           Default value
    5              3           1           Default value
    6              3           2           Default value
    7              4           5           Default value
    8              4           6           Default value
    9              5           4           Default value
    10             5           6           Default value
    11             6           4           Default value
    12             6           5           Default value

I am used to working in Access, and this is the first query I have attempted in SQL. I started trying to work out the query and got this far. I know it's not right yet, as I'm still trying to get used to the syntax and pieced this together from various articles I found online. In particular, I wasn't sure where the INSERT INTO text went (to create what would be an Append Query in Access).

    SELECT EntryID
    FROM TableA.EntryID TableA.EntryID
    WHERE TableA.FieldB=TableA.FieldB TableA.EntryID<>TableA.EntryID
    INSERT INTO TableB.EntryID1 TableB.EntryID2

After I've got that query right, I need to do a TRIGGER query (I think), so that if an entry changes its value in TableA.FieldB (changing its membership from one grouping to another), the cartesian product will be re-run on THAT entry, unless TableB.FieldD = valueA or valueB (manually entered values).

I have been using the Designer tab. Does there have to be a relationship link between TableA and TableB? If so, would it be two links from the EntryID primary key in TableA, one to each EntryID in TableB? I assume this would not work because they are numbered EntryID1 and EntryID2, and the name needs to be the same to set up a relationship?

If you can offer any suggestions, I would be very grateful.

Research: http://www.fluffycat.com/SQL/Cartesian-Joins/

Cartesian Join example two

Q: You said you can have a Cartesian join by joining a table to itself. Show that!

    Select * From Film_Table T1, Film_Table T2;
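For comparison with the attempt above, here is a minimal sketch of one common way to express this kind of append query: an INSERT ... SELECT over a self-join, where the same table is given two aliases. It runs against SQLite through Python's built-in sqlite3 module rather than MySQL, so treat it as an illustration only. The primary key column name id in TableB and the aliases a1 and a2 are assumptions; the trigger part of the question is not covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (
    EntryID INTEGER PRIMARY KEY,
    FieldB  INTEGER
);
-- `id` is a hypothetical name for TableB's primary key column.
CREATE TABLE TableB (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    EntryID1 INTEGER,
    EntryID2 INTEGER,
    FieldD   TEXT DEFAULT 'Default value'
);

-- Sample rows copied from the question.
INSERT INTO TableA VALUES (1, 23), (2, 23), (3, 23), (4, 25), (5, 25), (6, 25);

-- The INSERT INTO comes first; the SELECT that feeds it follows.
-- TableA is joined to itself under two aliases so its rows can be paired.
INSERT INTO TableB (EntryID1, EntryID2)
SELECT a1.EntryID, a2.EntryID
FROM TableA AS a1
JOIN TableA AS a2
  ON a1.FieldB = a2.FieldB          -- same grouping
 AND a1.EntryID <> a2.EntryID       -- but never paired with itself
ORDER BY a1.EntryID, a2.EntryID;
""")

rows = conn.execute(
    "SELECT id, EntryID1, EntryID2, FieldD FROM TableB ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

The statement itself is plain SQL and should carry over to MySQL with little change, though that has not been checked here. FieldD is left out of the column list so that the table's default applies.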

Efficiently fetching and storing tweets from a few hundred twitter profiles?

- by MSpreij

The site I'm working on needs to fetch the tweets from 150-300 people, store them locally, and then list them on the front page. The profiles sit in groups. The pages will be showing:

- the last 20 tweets (or 21-40, etc.) by date, group of profiles, single profile, search, or "subject" (which is sort of a different group... I think...)
- a live, context-aware tag cloud (based on the last 300 tweets of the current search, group of profiles, or single profile shown)
- various statistics (group stuff, most active, etc.) which depend on the type of page shown.

We're expecting a fair bit of traffic. The last, similar site peaked at nearly 40K visits per day, and ran into trouble before I started caching pages as static files and disabling some features (some accidentally...). This was caused mostly by the fact that a page load would also fetch the last x tweets from the 3-6 profiles which had not been updated the longest.

With this new site I can fortunately use cron to fetch tweets, so that helps. I'll also be denormalizing the DB a little so it needs fewer joins, optimizing it for faster selects instead of size.

Now, the main question: how do I figure out, in an efficient manner, which profiles to check for new tweets? Some people will be tweeting more often than others, and some will tweet in bursts (this happens a lot). I want to keep the front page of the site as "current" as possible. If it comes to, say, 300 profiles, and I check 5 every minute, some tweets will only appear an hour after the fact. I can check more often (up to 20K) but want to optimize this as much as possible, both to not hit the rate limit and to not run out of resources on the local server (it hit MySQL's connection limit with that other site).

Question 2: since cron only "runs" once a minute, I figure I have to check multiple profiles each minute: as stated, at least 5, possibly more. To try to spread it out over that minute, I could have it sleep a few seconds between batches or even single profiles. But then if it takes longer than 60 seconds altogether, the script will run into itself. Is this a problem? If so, how can I avoid that?

Question 3: any other tips? Readmes? URLs?
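On the "script will run into itself" worry in question 2, one widely used guard is a lock file: each run tries to take an exclusive, non-blocking lock and exits straight away if a previous run still holds it. The sketch below is a minimal Python illustration using the standard fcntl module, so it assumes a Unix-like server. The function name try_lock and the lock file name are made up for the example, and it says nothing about which profiles to fetch.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock on `path`,
    or None if another run already holds the lock."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


# Hypothetical lock file location, kept in a fresh temp directory here.
lock_path = os.path.join(tempfile.mkdtemp(), "fetch_tweets.lock")

first_run = try_lock(lock_path)    # the run that cron started a minute ago
second_run = try_lock(lock_path)   # the next run, started while the first is busy

if second_run is None:
    print("previous run still active, skipping this minute")

first_run.close()                  # closing the file releases the lock
later_run = try_lock(lock_path)    # a later run can now proceed
```

The lock disappears automatically if the process dies, so a crashed run cannot block later ones the way a plain "marker file" could.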

Need help with 2 MySql Queries. Join vs Subqueries.

- by BugBusterX

I have 2 tables:

    user: id, name
    message: sender_id, receiver_id, message, read_at, created_at

There are 2 results I need to retrieve, and I'm trying to find the best solution. I have included the queries that I'm using at the very end.

1. I need to retrieve a list of users, and with each user have information available on whether there are any unread messages from that user (them as sender, me as receiver) and whether or not there are any read messages between us (they send and I'm the receiver, or I send and they are the receiver).
2. I need the same as above, but only those members where there has been any messaging between us, sorted by unread first, then by last message received.

Can you advise, please? Should this be done with joins or subqueries? In the first case I do not need the count; I just need to know whether or not there is at least one unread message.

I'm posting code and my current queries; please have a look when you get a chance. BTW, everything is the way I want in the first query. My concern is with the second query: I would like to order by message.created_at, but I don't think I can do that with grouping? I also don't know whether this approach is the most optimized and fastest.

    CREATE TABLE `user` (
      `id` bigint(20) NOT NULL AUTO_INCREMENT,
      `name` varchar(255) NOT NULL,
      PRIMARY KEY (`id`)
    );

    INSERT INTO `user` VALUES (1,'User 1'),(2,'User 2'),(3,'User 3'),(4,'User 4'),(5,'User 5');

    CREATE TABLE `message` (
      `id` bigint(20) NOT NULL AUTO_INCREMENT,
      `sender_id` bigint(20) DEFAULT NULL,
      `receiver_id` bigint(20) DEFAULT NULL,
      `message` text,
      `read_at` datetime DEFAULT NULL,
      `created_at` datetime NOT NULL,
      PRIMARY KEY (`id`)
    );

    INSERT INTO `message` VALUES
      (1,3,1,'Messge',NULL,'2010-10-10 10:10:10'),
      (2,1,4,'Hey','2010-10-10 10:10:12','2010-10-10 10:10:11'),
      (3,4,1,'Hello','2010-10-10 10:10:19','2010-10-10 10:10:15'),
      (4,1,4,'Again','2010-10-10 10:10:25','2010-10-10 10:10:21'),
      (5,3,1,'Hiii',NULL,'2010-10-10 10:10:21');

    SELECT u.*, m_new.id as have_new, m.id as have_any
    FROM user u
    LEFT JOIN message m_new ON (u.id = m_new.sender_id AND m_new.receiver_id = 1 AND m_new.read_at IS NULL)
    LEFT JOIN message m ON ((u.id = m.sender_id AND m.receiver_id = 1) OR (u.id = m.receiver_id AND m.sender_id = 1))
    GROUP BY u.id

    SELECT u.*, m_new.id as have_new, m.id as have_any
    FROM user u
    LEFT JOIN message m_new ON (u.id = m_new.sender_id AND m_new.receiver_id = 1 AND m_new.read_at IS NULL)
    LEFT JOIN message m ON ((u.id = m.sender_id AND m.receiver_id = 1) OR (u.id = m.receiver_id AND m.sender_id = 1))
    where m.id IS NOT NULL
    GROUP BY u.id
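Since the question asks how a subquery version might look, here is a minimal sketch using the sample data above, run against SQLite through Python's built-in sqlite3 module rather than MySQL. It is one possible reading of the two requirements, not a benchmarked answer: have_any is taken to mean "any message in either direction", as in the posted queries, and "last message received" is approximated by the latest message in either direction. The variable names flags and inbox are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE message (
    id INTEGER PRIMARY KEY,
    sender_id INTEGER, receiver_id INTEGER,
    message TEXT, read_at TEXT, created_at TEXT NOT NULL
);
INSERT INTO user VALUES (1,'User 1'),(2,'User 2'),(3,'User 3'),(4,'User 4'),(5,'User 5');
INSERT INTO message VALUES
  (1,3,1,'Messge',NULL,'2010-10-10 10:10:10'),
  (2,1,4,'Hey','2010-10-10 10:10:12','2010-10-10 10:10:11'),
  (3,4,1,'Hello','2010-10-10 10:10:19','2010-10-10 10:10:15'),
  (4,1,4,'Again','2010-10-10 10:10:25','2010-10-10 10:10:21'),
  (5,3,1,'Hiii',NULL,'2010-10-10 10:10:21');
""")

me = {"me": 1}

# Result 1: every user, with two yes/no flags computed by EXISTS subqueries.
flags = conn.execute("""
    SELECT u.id,
           EXISTS (SELECT 1 FROM message m
                   WHERE m.sender_id = u.id AND m.receiver_id = :me
                     AND m.read_at IS NULL) AS have_new,
           EXISTS (SELECT 1 FROM message m
                   WHERE (m.sender_id = u.id AND m.receiver_id = :me)
                      OR (m.sender_id = :me AND m.receiver_id = u.id)) AS have_any
    FROM user u
    ORDER BY u.id
""", me).fetchall()

# Result 2: only users we have exchanged messages with, unread first,
# then by the most recent message in either direction.
inbox = conn.execute("""
    SELECT u.id,
           MAX(m.read_at IS NULL AND m.receiver_id = :me) AS have_new,
           MAX(m.created_at) AS last_at
    FROM user u
    JOIN message m
      ON (m.sender_id = u.id AND m.receiver_id = :me)
      OR (m.sender_id = :me AND m.receiver_id = u.id)
    GROUP BY u.id
    ORDER BY have_new DESC, last_at DESC
""", me).fetchall()

print(flags)
print(inbox)
```

Aggregating with MAX(m.created_at) is what makes ordering by date compatible with GROUP BY in the second query: the sort uses one well-defined value per user instead of an arbitrary row from the group.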

How to custom query using ORM in Fuelphp?

- by viyancs

I have a problem when I want to query a table using the ORM. For example, I have an article table with the fields id, author, text. My code looks like this:

    // Single where
    $article = Model_Article::find()->where('id', 4);
    print_r($article);

That code fetches all fields of the article table; it's like

    select * from article where id = 4

Try Possibility

    $article = Model_Article::find(null, array('id','title'))->where('id', 3);

The response is:

    object(Orm\Query)#89 (14) {
      ["model":protected]=> string(10) "Model_Article"
      ["connection":protected]=> NULL
      ["view":protected]=> NULL
      ["alias":protected]=> string(2) "t0"
      ["relations":protected]=> array(0) { }
      ["joins":protected]=> array(0) { }
      ["select":protected]=> array(1) { ["t0_c0"]=> string(5) "t0.id" }
      ["limit":protected]=> NULL
      ["offset":protected]=> NULL
      ["rows_limit":protected]=> NULL
      ["rows_offset":protected]=> NULL
      ["where":protected]=> array(1) { [0]=> array(2) { [0]=> string(9) "and_where" [1]=> array(3) { [0]=> string(5) "t0.id" [1]=> string(1) "=" [2]=> int(3) } } }
      ["order_by":protected]=> array(0) { }
      ["values":protected]=> array(0) { }
    }

That does not return the id or title field. But when I try adding the get_one() method:

    $article = Model_Article::find(null, array('id','title'))->where('id', 3)->get_one();

id is returned, but title is not, and neither is any other field. I don't know why.

Reference

The ORM discussion for FuelPHP says the ORM currently selects all columns, with no plans to change that at the moment.

My Goal

I want to run a query through the ORM like this:

    select id,owner from article where id = 4

It should return only id and owner. How can I get that using the ORM?

Magento - How to select mysql rows by max value?

- by Damodar Bashyal

mysql> SELECT * FROM `log_customer` WHERE `customer_id` = 224 LIMIT 0, 30;
+--------+------------+-------------+---------------------+-----------+----------+
| log_id | visitor_id | customer_id | login_at            | logout_at | store_id |
+--------+------------+-------------+---------------------+-----------+----------+
|    817 |      50139 |         224 | 2011-03-21 23:56:56 | NULL      |        1 |
|    830 |      52317 |         224 | 2011-03-27 23:43:54 | NULL      |        1 |
|   1371 |     136549 |         224 | 2011-11-16 04:33:51 | NULL      |        1 |
|   1495 |     164024 |         224 | 2012-02-08 01:05:48 | NULL      |        1 |
|   2130 |     281854 |         224 | 2012-11-13 23:44:13 | NULL      |        1 |
+--------+------------+-------------+---------------------+-----------+----------+
5 rows in set (0.00 sec)

mysql> SELECT * FROM `customer_entity` WHERE `entity_id` = 224;
+-----------+----------------+---------------------------+----------+---------------------+---------------------+
| entity_id | entity_type_id | email                     | group_id | created_at          | updated_at          |
+-----------+----------------+---------------------------+----------+---------------------+---------------------+
|       224 |              1 | [email protected]         |        3 | 2011-03-21 04:59:17 | 2012-11-13 23:46:23 |
+-----------+----------------+---------------------------+----------+---------------------+---------------------+
1 row in set (0.00 sec)

How can I search for customers who haven't logged in for the last 10 months and whose accounts have not been updated for the last 10 months? I tried the code below, but it failed. $collection = Mage::getModel('customer/customer')->getCollection(); $collection->getSelect()->joinRight(array('l'=>'log_customer'), "customer_id=entity_id AND MAX(l.login_at) <= '" . date('Y-m-d H:i:s', strtotime('10 months ago')) . "'")->group('e.entity_id'); $collection->addAttributeToSelect('*'); $collection->addFieldToFilter('updated_at', array( 'lt' => date('Y-m-d H:i:s', strtotime('10 months ago')), 'datetime'=>true, )); $collection->addAttributeToFilter('group_id', array( 'neq' => 5, )); The tables above show one customer for reference.
I have no idea how to use MAX() on joins. Thanks. UPDATE: This seems to return the correct data, but I would like to do it the Magento way using a resource collection, so that I don't need to load each customer again in a for loop. $read = Mage::getSingleton('core/resource')->getConnection('core_read'); $sql = "select * from ( select e.*,l.login_at from customer_entity as e left join log_customer as l on l.customer_id=e.entity_id group by e.entity_id order by l.login_at desc ) as l where ( l.login_at <= '".date('Y-m-d H:i:s', strtotime('10 months ago'))."' or ( l.created_at <= '".date('Y-m-d H:i:s', strtotime('10 months ago'))."' and l.login_at is NULL ) ) and group_id != 5"; $result = $read->fetchAll($sql); I have uploaded the full shell script to GitHub: https://github.com/dbashyal/Magento-ecommerce-Shell-Scripts/blob/master/shell/suspendCustomers.php
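One common way to use MAX() with a join is to aggregate in a derived table first and join the result, since an aggregate cannot appear directly in an ON condition. The sketch below is only an illustration of that pattern: it uses Python's sqlite3 as a stand-in for MySQL, a cut-down version of the two tables from the question, and hypothetical helper names (build_demo_db, inactive_customers). It does not show how to express the same thing through a Magento collection.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with simplified, made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer_entity (
            entity_id INTEGER PRIMARY KEY, group_id INTEGER,
            created_at TEXT, updated_at TEXT);
        CREATE TABLE log_customer (
            log_id INTEGER PRIMARY KEY, customer_id INTEGER, login_at TEXT);
        INSERT INTO customer_entity VALUES
            (224, 3, '2011-03-21 04:59:17', '2012-11-13 23:46:23'),
            (300, 3, '2011-01-01 00:00:00', '2012-01-01 00:00:00'),
            (301, 3, '2011-01-01 00:00:00', '2011-01-01 00:00:00'),
            (302, 5, '2011-01-01 00:00:00', '2011-01-01 00:00:00');
        INSERT INTO log_customer VALUES
            (817, 224, '2011-03-21 23:56:56'),
            (2130, 224, '2012-11-13 23:44:13'),
            (3000, 300, '2011-05-01 00:00:00'),
            (3001, 300, '2013-11-01 00:00:00');
        """
    )
    return conn


def inactive_customers(conn, cutoff):
    """Customers not updated since cutoff whose latest login is also older
    (or who never logged in and were created before cutoff)."""
    sql = """
        SELECT e.entity_id, l.last_login
        FROM customer_entity AS e
        LEFT JOIN (
            -- One row per customer: the most recent login only.
            SELECT customer_id, MAX(login_at) AS last_login
            FROM log_customer
            GROUP BY customer_id
        ) AS l ON l.customer_id = e.entity_id
        WHERE e.updated_at <= :cutoff
          AND e.group_id != 5
          AND (l.last_login <= :cutoff
               OR (l.last_login IS NULL AND e.created_at <= :cutoff))
        ORDER BY e.entity_id
    """
    return conn.execute(sql, {"cutoff": cutoff}).fetchall()
```

Because the MAX() is computed per customer before the join, a customer with one old login and one recent login (300 in the sample data) is correctly excluded, which a plain join plus GROUP BY does not guarantee.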

Read the article

4 table query / join. getting duplicate rows

- by Horse

Read the article

Users and roles in context

- by Eric W.

I'm trying to get a sense of how to implement the user/role relationships for an application I'm writing. The persistence layer is Google App Engine's datastore, which places some interesting (but generally beneficial) constraints on what can be done. Any thoughts are appreciated. It might be helpful to keep things very concrete. I would like there to be organizations, users, test content and test administrations (records of tests that have been taken). A user can have the role of participant (test-taker), contributor of test material or both. A user can also be a member of zero or more organizations. In the role of participant, the user can see the previous administrations of tests he or she has taken. The user can also see a test administration of another participant if that participant has given the user authorization. The user can see test material that has been made public, and he or she can see restricted content as a participant during a specific administration of a test for which that user has been authorized by an organization. As a member of an organization, the user can see restricted content in the role of contributor, and he or she might or might not also be able to edit the content. Each organization should have one or more administrators that can determine whether a member can see and edit content and determine who has admin privileges. There should also be one or more application-wide superusers that can troubleshoot and solve problems. Members of organizations can see the administrations of tests that the participants concerned have authorized them to see, and they can see anonymous data if no authorization has been given. A user cannot see the test results of another user in any other circumstances. 
Since there are no joins in the App Engine datastore, it might be necessary to have things less normalized than usual for the typical SQL database in order to ensure that queries that check permissions are fast (e.g., ones that determine whether a link is to be displayed). My questions are: How do I move forward on this? Should I spend a lot of time up front in order to get the model right, or can I iterate several times and gradually roll in additional complexity? Does anyone have some general ideas about how to break things up in this instance? Are there any GAE libraries that handle roles in a way that is compatible with this arrangement?
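As a rough illustration of the denormalization mentioned above, one option is to store the ids of authorized viewers directly on each test administration, so that a permission check needs only that one record and no join. This is a minimal sketch in plain Python, not App Engine code; the class and function names are hypothetical, and it covers only the "participant or explicitly authorized user" rule.

```python
from dataclasses import dataclass, field


@dataclass
class Administration:
    """One record of a test that has been taken (hypothetical shape)."""
    participant_id: str
    # Denormalized copy of who may view this record, kept on the record
    # itself so the check below needs no second lookup.
    authorized_viewer_ids: set = field(default_factory=set)


def can_view_results(user_id, administration):
    """True if the user took the test or was authorized by the participant."""
    if user_id == administration.participant_id:
        return True
    return user_id in administration.authorized_viewer_ids
```

The trade-off is the usual one for denormalized data: reads are cheap, but every change to an authorization has to update each affected record.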

Read the article

Apache SSO through Kerberos using Machine Account

- by watkipet

I'm attempting to get Apache on Ubuntu 12.04 to authenticate users via Kerberos SSO to a Windows 2008 Active Directory server. Here are a few things that make my situation different:
1. I don't have administrative access to the Windows Server (nor will I ever have access). I also cannot have any changes to the server made on my behalf.
2. I've joined the Ubuntu server to Active Directory using PBIS Open. Users can log into the Ubuntu server using their AD credentials. kinit also works fine for each user.
3. Since I can't change AD (except for adding new machines and SPNs), I cannot add a service account for Apache on Ubuntu.
4. Since I can't add a service account, I have to use the machine keytab (/etc/krb5.keytab), or at least use the machine password in another keytab. Right now I'm using the machine keytab and giving Apache read-only access (bad idea, I know).
5. I've already added the SPN using net ads keytab add HTTP -U
6. Since I'm using Ubuntu 12.04, the only encryption types that get added during "net ads keytab add" are arcfour-hmac, des-cbc-crc, and des-cbc-md5. PBIS adds the AES encryption types to the host and cifs principals when it joins the domain, but I have yet to get "net ads keytab add" to do this.
7. ktpass and setspn are out of the question because of #1 above.
8. I've configured (for Kerberos SSO) and tested both IE 8 and Firefox.
I'm using the following configuration in my Apache site config: <Location /secured> AuthType Kerberos AuthName "Kerberos Login" KrbMethodNegotiate On KrbMethodK5Passwd On KrbAuthRealms DOMAIN.COM Krb5KeyTab /etc/krb5.keytab KrbLocalUserMapping On require valid-user </Location> When Firefox tries to connect get the following in Apache's error.log (LogLevel debug): [Wed Oct 23 13:48:31 2013] [debug] src/mod_auth_kerb.c(1628): [client 192.168.0.2] kerb_authenticate_user entered with user (NULL) and auth_type Kerberos [Wed Oct 23 13:48:31 2013] [debug] mod_deflate.c(615): [client 192.168.0.2] Zlib: Compressed 477 to 322 : URL /secured [Wed Oct 23 13:48:37 2013] [debug] src/mod_auth_kerb.c(1628): [client 192.168.0.2] kerb_authenticate_user entered with user (NULL) and auth_type Kerberos [Wed Oct 23 13:48:37 2013] [debug] src/mod_auth_kerb.c(994): [client 192.168.0.2] Using HTTP/[email protected] as server principal for password verification [Wed Oct 23 13:48:37 2013] [debug] src/mod_auth_kerb.c(698): [client 192.168.0.2] Trying to get TGT for user [email protected] [Wed Oct 23 13:48:37 2013] [debug] src/mod_auth_kerb.c(609): [client 192.168.0.2] Trying to verify authenticity of KDC using principal HTTP/[email protected] [Wed Oct 23 13:48:37 2013] [debug] src/mod_auth_kerb.c(652): [client 192.168.0.2] krb5_rd_req() failed when verifying KDC [Wed Oct 23 13:48:37 2013] [error] [client 192.168.0.2] failed to verify krb5 credentials: Decrypt integrity check failed [Wed Oct 23 13:48:37 2013] [debug] src/mod_auth_kerb.c(1073): [client 192.168.0.2] kerb_authenticate_user_krb5pwd ret=401 user=(NULL) authtype=(NULL) [Wed Oct 23 13:48:37 2013] [debug] mod_deflate.c(615): [client 192.168.0.2] Zlib: Compressed 477 to 322 : URL /secured When IE 8 tries to connect I get: [Wed Oct 23 14:03:30 2013] [debug] src/mod_auth_kerb.c(1628): [client 192.168.0.2] kerb_authenticate_user entered with user (NULL) and auth_type Kerberos [Wed Oct 23 14:03:30 2013] [debug] mod_deflate.c(615): [client 
192.168.0.2] Zlib: Compressed 477 to 322 : URL /secured [Wed Oct 23 14:03:30 2013] [debug] src/mod_auth_kerb.c(1628): [client 192.168.0.2] kerb_authenticate_user entered with user (NULL) and auth_type Kerberos [Wed Oct 23 14:03:30 2013] [debug] src/mod_auth_kerb.c(1240): [client 192.168.0.2] Acquiring creds for HTTP@apache_server [Wed Oct 23 14:03:30 2013] [debug] src/mod_auth_kerb.c(1385): [client 192.168.0.2] Verifying client data using KRB5 GSS-API [Wed Oct 23 14:03:30 2013] [debug] src/mod_auth_kerb.c(1401): [client 192.168.0.2] Client didn't delegate us their credential [Wed Oct 23 14:03:30 2013] [debug] src/mod_auth_kerb.c(1420): [client 192.168.0.2] GSS-API token of length 9 bytes will be sent back [Wed Oct 23 14:03:30 2013] [debug] src/mod_auth_kerb.c(1101): [client 192.168.0.2] GSS-API major_status:000d0000, minor_status:000186a5 [Wed Oct 23 14:03:30 2013] [error] [client 192.168.0.2] gss_accept_sec_context() failed: Unspecified GSS failure. Minor code may provide more information (, ) [Wed Oct 23 14:03:30 2013] [debug] mod_deflate.c(615): [client 192.168.0.2] Zlib: Compressed 477 to 322 : URL /secured Let me know if you'd like additional log and config files--the initial question is getting long enough.

Read the article

Basics of Join Predicate Pushdown in Oracle

- by Maria Colgan

Happy New Year to all of our readers! We hope you all had a great holiday season. We start the new year by continuing our series on Optimizer transformations. This time it is the turn of Predicate Pushdown. I would like to thank Rafi Ahmed for the content of this blog. Normally, a view cannot be joined with an index-based nested loop (i.e., index access) join, since a view, in contrast with a base table, does not have an index defined on it. A view can only be joined with other tables using three methods: hash, nested loop, and sort-merge joins. Introduction The join predicate pushdown (JPPD) transformation allows a view to be joined with index-based nested-loop join method, which may provide a more optimal alternative. In the join predicate pushdown transformation, the view remains a separate query block, but it contains the join predicate, which is pushed down from its containing query block into the view. The view thus becomes correlated and must be evaluated for each row of the outer query block. These pushed-down join predicates, once inside the view, open up new index access paths on the base tables inside the view; this allows the view to be joined with index-based nested-loop join method, thereby enabling the optimizer to select an efficient execution plan. The join predicate pushdown transformation is not always optimal. The join predicate pushed-down view becomes correlated and it must be evaluated for each outer row; if there is a large number of outer rows, the cost of evaluating the view multiple times may make the nested-loop join suboptimal, and therefore joining the view with hash or sort-merge join method may be more efficient. The decision whether to push down join predicates into a view is determined by evaluating the costs of the outer query with and without the join predicate pushdown transformation under Oracle's cost-based query transformation framework.
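The trade-off described above can be illustrated with a toy model. The sketch below is not Oracle's optimizer or its cost formula; it is a simplified Python simulation with made-up tables and hypothetical function names. It joins outer rows to a two-table "view" in two ways, once by evaluating the whole view and hash-joining it, and once by pushing the join key into the view and using an index lookup per outer row, and it counts how many view rows each strategy examines. An inner join is used for simplicity.

```python
from collections import defaultdict


def build_index(rows, key):
    """Group rows by the value of one column (a stand-in for an index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def join_without_pushdown(outer_rows, t5_by_k, t10_rows):
    """Evaluate the view once with a full scan, then hash-join on column j."""
    examined = 0
    view_by_j = defaultdict(list)
    for row in t10_rows:                      # full scan of the base table
        examined += 1
        if row["k"] in t5_by_k:               # join inside the view
            view_by_j[row["j"]].append(row)
    result = [(o["id"], v["k"])
              for o in outer_rows
              for v in view_by_j.get(o["j"], [])]
    return sorted(result), examined


def join_with_pushdown(outer_rows, t5_by_k, t10_by_j):
    """Evaluate the view once per outer row, using the pushed-down join key."""
    examined = 0
    result = []
    for o in outer_rows:
        # The pushed predicate (j = outer j) allows index access.
        for row in t10_by_j.get(o["j"], []):
            examined += 1
            if row["k"] in t5_by_k:
                result.append((o["id"], row["k"]))
    return sorted(result), examined
```

With 1,000 rows in the inner table and 10 rows per value of j, a single outer row examines 10 rows with pushdown against 1,000 without it, while 500 outer rows examine 5,000 rows with pushdown against the same 1,000 without it. That is the reason the choice has to be made by comparing costs rather than applied unconditionally.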
The join predicate pushdown transformation applies to both non-mergeable views and mergeable views and to pre-defined and inline views as well as to views generated internally by the optimizer during various transformations. The following shows the types of views on which join predicate pushdown is currently supported. UNION ALL/UNION view Outer-joined view Anti-joined view Semi-joined view DISTINCT view GROUP-BY view Examples Consider query A, which has an outer-joined view V. The view cannot be merged, as it contains two tables, and the join between these two tables must be performed before the join between the view and the outer table T4. A: SELECT T4.unique1, V.unique3 FROM T_4K T4, (SELECT T10.unique3, T10.hundred, T10.ten FROM T_5K T5, T_10K T10 WHERE T5.unique3 = T10.unique3) V WHERE T4.unique3 = V.hundred(+) AND T4.ten = V.ten(+) AND T4.thousand = 5; The following shows the non-default plan for query A generated by disabling join predicate pushdown. When query A undergoes join predicate pushdown, it yields query B. Note that query B is expressed in non-standard SQL and shows an internal representation of the query. B: SELECT T4.unique1, V.unique3 FROM T_4K T4, (SELECT T10.unique3, T10.hundred, T10.ten FROM T_5K T5, T_10K T10 WHERE T5.unique3 = T10.unique3 AND T4.unique3 = V.hundred(+) AND T4.ten = V.ten(+)) V WHERE T4.thousand = 5; The execution plan for query B is shown below. In the execution plan BX, note the keyword 'VIEW PUSHED PREDICATE' indicates that the view has undergone the join predicate pushdown transformation. The join predicates (shown here in red) have been moved into the view V; these join predicates open up index access paths thereby enabling index-based nested-loop join of the view. With join predicate pushdown, the cost of query A has come down from 62 to 32. As mentioned earlier, the join predicate pushdown transformation is cost-based, and a join predicate pushed-down plan is selected only when it reduces the overall cost.
Consider another example of a query C, which contains a view with the UNION ALL set operator. C: SELECT R.unique1, V.unique3 FROM T_5K R, (SELECT T1.unique3, T2.unique1+T1.unique1 FROM T_5K T1, T_10K T2 WHERE T1.unique1 = T2.unique1 UNION ALL SELECT T1.unique3, T2.unique2 FROM G_4K T1, T_10K T2 WHERE T1.unique1 = T2.unique1) V WHERE R.unique3 = V.unique3 and R.thousand < 1; The execution plan of query C is shown below. In the above, 'VIEW UNION ALL PUSHED PREDICATE' indicates that the UNION ALL view has undergone the join predicate pushdown transformation. As can be seen, here the join predicate has been replicated and pushed inside every branch of the UNION ALL view. The join predicates (shown here in red) open up index access paths thereby enabling index-based nested loop join of the view. Consider query D as an example of join predicate pushdown into a distinct view. We have the following cardinalities of the tables involved in query D: Sales (1,016,271), Customers (50,000), and Costs (787,766). D: SELECT C.cust_last_name, C.cust_city FROM customers C, (SELECT DISTINCT S.cust_id FROM sales S, costs CT WHERE S.prod_id = CT.prod_id and CT.unit_price > 70) V WHERE C.cust_state_province = 'CA' and C.cust_id = V.cust_id; The execution plan of query D is shown below. As shown in XD, when query D undergoes join predicate pushdown transformation, the expensive DISTINCT operator is removed and the join is converted into a semi-join; this is possible, since all the SELECT list items of the view participate in an equi-join with the outer tables. Under similar conditions, when a group-by view undergoes join predicate pushdown transformation, the expensive group-by operator can also be removed. With the join predicate pushdown transformation, the elapsed time of query D came down from 63 seconds to 5 seconds.
Since distinct and group-by views are mergeable views, the cost-based transformation framework also compares the cost of merging the view with that of join predicate pushdown in selecting the most optimal execution plan. Summary We have tried to illustrate the basic ideas behind join predicate pushdown on different types of views by showing example queries that are quite simple. Oracle can handle far more complex queries and other types of views not shown here in the examples. Again many thanks to Rafi Ahmed for the content of this blog post.

Read the article

ASP.NET and WIF: Showing custom profile username as User.Identity.Name

- by DigiMortal

I am building ASP.NET MVC application that uses external services to authenticate users. For ASP.NET users are fully authenticated when they are redirected back from external service. In system they are logically authenticated when they have created user profiles. In this posting I will show you how to force ASP.NET MVC controller actions to demand existence of custom user profiles. Using external authentication sources with AppFabric Suppose you want to be user-friendly and you don’t force users to keep in mind another username/password when they visit your site. You can accept logins from different popular sites like Windows Live, Facebook, Yahoo, Google and many more. If user has account in some of these services then he or she can use his or her account to log in to your site. If you have community site then you usually have support for user profiles too. Some of these providers give you some information about users and other don’t. So only thing in common you get from all those providers is some unique ID that identifies user in service uniquely. Image above shows you how new user joins your site. Existing users who already have profile are directed to users homepage after they are authenticated. You can read more about how to solve semi-authorized users problem from my blog posting ASP.NET MVC: Using ProfileRequiredAttribute to restrict access to pages. The other problem is related to usernames that we don’t get from all identity providers. Why is IIdentity.Name sometimes empty? The problem is described more specifically in my blog posting Identifying AppFabric Access Control Service users uniquely. Shortly the problem is that not all providers have claim called http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name. The following diagram illustrates what happens when user got token from AppFabric ACS and was redirected to your site. 
Now, when user was authenticated using Windows Live ID then we don’t have name claim in token and that’s why User.Identity.Name is empty. Okay, we can force nameidentifier to be used as name (we can do it in web.config file) but we have user profiles and we want username from profile to be shown when username is asked. Modifying name claim Now let’s force IClaimsIdentity to use username from our user profiles. You can read more about my profiles topic from my blog posting ASP.NET MVC: Using ProfileRequiredAttribute to restrict access to pages and you can find some useful extension methods for claims identity from my blog posting Identifying AppFabric Access Control Service users uniquely. Here is what we do to set User.Identity.Name: we will check if user has profile, if user has profile we will check if User.Identity.Name matches the name given by profile, if names does not match then probably identity provider returned some name for user, we will remove name claim and recreate it with correct username, we will add new name claim to claims collection. All this stuff happens in Application_AuthorizeRequest event of our web application. The code is here. protected void Application_AuthorizeRequest() { if (string.IsNullOrEmpty(User.Identity.Name)) { var identity = User.Identity; var profile = identity.GetProfile(); if (profile != null) { if (profile.UserName != identity.Name) { identity.RemoveName(); var claim = new Claim("http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name", profile.UserName); var claimsIdentity = (IClaimsIdentity)identity; claimsIdentity.Claims.Add(claim); } } } } RemoveName extension method is simple – it looks for name claims of IClaimsIdentity claims collection and removes them. 
public static void RemoveName(this IIdentity identity) { if (identity == null) return; var claimsIndentity = identity as ClaimsIdentity; if (claimsIndentity == null) return; for (var i = claimsIndentity.Claims.Count - 1; i >= 0; i--) { var claim = claimsIndentity.Claims[i]; if (claim.ClaimType == "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name") claimsIndentity.Claims.RemoveAt(i); } } And we are done. Now User.Identity.Name returns the username from user profile and you can use it to show username of current user everywhere in your site. Conclusion Mixing AppFabric Access Control Service and Windows Identity Foundation with custom authorization logic is not impossible but a little bit tricky. This posting finishes my little series about AppFabric ACS and WIF for this time and hopefully you found some useful tricks, tips, hacks and code pieces you can use in your own applications.

Read the article

Migrating SQL Server Databases – The DBA’s Checklist (Part 1)

- by Sadequl Hussain

It is a fact of life: SQL Server databases change homes. They move from one instance to another, from one version to the next, from old servers to new ones. They move around as an organisation’s data grows, applications are enhanced or new versions of the database software are released. If not anything else, servers become old and unreliable and databases eventually need to find a new home. Consider the following scenarios: 1. A new database application is rolled out in a production server from the development or test environment 2. A copy of the production database needs to be installed in a test server for troubleshooting purposes 3. A copy of the development database is regularly refreshed in a test server during the system development life cycle 4. A SQL Server is upgraded to a newer version. This can be an in-place upgrade or a side-by-side migration 5. One or more databases need to be moved between different instances as part of a consolidation strategy. The instances can be running the same or different version of SQL Server 6. A database has to be restored from a backup file provided by a third party application vendor 7. A backup of the database is restored in the same or different instance for disaster recovery 8. A database needs to be migrated within the same instance: a. Files are moved from direct attached storage to storage area network b. The same database is copied under a different name for another application Migrating SQL Server database applications is a complex topic in itself. There are a number of components that can be involved: jobs, DTS or SSIS packages, logins or linked servers are only few pieces of the puzzle. However, in this article we will focus only on the central part of migration: the installation of the database itself. Unless it is an in-place upgrade, typically the database is taken from a source server and installed in a destination instance. Most of the time, a full backup file is used for the rollout. 
The backup file is either provided to the DBA or the DBA takes the backup and restores it in the target server. Sometimes the database is detached from the source and the files are copied to and attached in the destination. Regardless of the method of copying, moving, refreshing, restoring or upgrading the physical database, there are a number of steps the DBA should follow before and after it has been installed in the destination. It is these post database installation steps we are going to discuss below. Some of these steps apply in almost every scenario described above while some will depend on the type of objects contained within the database. Also, the principles hold regardless of the number of databases involved. Step 1: Make a copy of data and log files when attaching and detaching When detaching and attaching databases, ensure you have made copies of the data and log files if the destination is running a newer version of SQL Server. This is because once attached to a newer version, the database cannot be detached and attached back to an older version. Trying to do so will give you a message like the following: Server: Msg 602, Level 21, State 50, Line 1 Could not find row in sysindexes for database ID 6, object ID 1, index ID 1. Run DBCC CHECKTABLE on sysindexes. Connection Broken If you try to backup the attached database and restore it in the source, it will still fail. Similarly, if you are restoring the database in a newer version, it cannot be backed up or detached and put back in an older version of SQL. Unlike detach and attach method though, you do not lose the backup file or the original database here. When detaching and attaching a database, it is important you keep all the log files available along with the data files. It is possible to attach a database without a log file and SQL Server can be instructed to create a new log file, however this does not work if the database was detached when the primary file group was read-only. 
You will need all the log files in such cases. Step 2: Change database compatibility level Once the database has been restored or attached to a newer version of SQL Server, change the database compatibility level to reflect the newer version unless there is a compelling reason not to do so. When attaching or restoring from a previous version of SQL, the database retains the older version’s compatibility level. The only time you would want to keep a database with an older compatibility level is when the code within your database is no longer supported by SQL Server. For example, outer joins with *= or the =* operators were still possible in SQL 2000 (with a warning message), but not in SQL 2005 anymore. If your stored procedures or triggers are using this form of join, you would want to keep the database with an older compatibility level. For a list of compatibility issues between older and newer versions of SQL Server databases, refer to the Books Online under the sp_dbcmptlevel topic. Application developers and architects can help you in deciding whether you should change the compatibility level or not. You can always change the compatibility mode from the newest to an older version if necessary. To change the compatibility level, you can either use the database’s property from the SQL Server Management Studio or use the sp_dbcmptlevel stored procedure. Bear in mind that you cannot run the built-in reports for databases from SQL Server Management Studio if you keep the database with an older compatibility level. The following figure shows the error message I received when trying to run the “Disk Usage by Top Tables” report against a database. This database was hosted in a SQL Server 2005 system and still had a compatibility mode 80 (SQL 2000). Continues…

Read the article

Is Berkeley DB a NoSQL solution?

- by Gregory Burd

Berkeley DB is a library. To use it to store data you must link the library into your application. You can use most programming languages to access the API; the calls across these APIs generally mimic the Berkeley DB C-API, which makes perfect sense because Berkeley DB is written in C. The inspiration for Berkeley DB was the DBM library, a part of the earliest versions of UNIX written by AT&T's Ken Thompson in 1979. DBM was a simple key/value hashtable-based storage library. In the early 1990s as BSD UNIX was transitioning from version 4.3 to 4.4 and retrofitting commercial code owned by AT&T with unencumbered code, it was the future founders of Sleepycat Software who wrote libdb (aka Berkeley DB) as the replacement for DBM. The problem it addressed was fast, reliable local key/value storage. At that time databases almost always lived on a single node; even the most sophisticated databases only had simple fail-over two node solutions. If you had a lot of data to store you would choose between the few commercial RDBMS solutions or to write your own custom solution. Berkeley DB took the headache out of the custom approach. These basic market forces inspired other DBM implementations. There was the "New DBM" (ndbm) and the "GNU DBM" (GDBM) and a few others, but the theme was the same. Even today TokyoCabinet calls itself "a modern implementation of DBM" mimicking, and improving on, something first created over thirty years ago. In the mid-1990s, DBM was the name for what you needed if you were looking for fast, reliable local storage. Fast forward to today. What's changed? Systems are connected over fast, very reliable networks. Disks are cheap, fast, and capable of storing huge amounts of data. CPUs continued to follow Moore's Law; processing power that filled a room in 1990 now fits in your pocket. PCs, servers, and other computers proliferated both in business and the personal markets.
In addition to the new hardware entire markets, social systems, and new modes of interpersonal communication moved onto the web and started evolving rapidly. These changes caused a massive explosion of data and a need to analyze and understand that data. Taken together this resulted in an entirely different landscape for database storage; new solutions were needed. A number of novel solutions stepped up and eventually a category called NoSQL emerged. The new market forces inspired the CAP theorem and the heated debate of BASE vs. ACID. But in essence this was simply the market looking at what to trade off to meet these new demands. These new database systems shared many qualities in common. They were designed to address massive amounts of data, millions of requests per second, and scale out across multiple systems. The first large-scale and successful solution was Dynamo, Amazon's distributed key/value database. Dynamo essentially took the next logical step and added a twist. Dynamo was to be the database of record, it would be distributed, data would be partitioned across many nodes, and it would tolerate failure by avoiding single points of failure. Amazon did this because they recognized that the majority of the dynamic content they provided to customers visiting their web store front didn't require the services of an RDBMS. The queries were simple, key/value look-ups or simple range queries with only a few queries that required more complex joins. They set about to use relational technology only in places where it was the best solution for the task, places like accounting and order fulfillment, but not in the myriad of other situations. The success of Dynamo, and its design, inspired the next generation of Non-SQL, distributed database solutions including Cassandra, Riak and Voldemort. The problem their designers set out to solve was, "reliability at massive scale" so the first focal point was distributed database algorithms.
Underneath Dynamo there is a local transactional database; either Berkeley DB, Berkeley DB Java Edition, MySQL or an in-memory key/value data structure. Dynamo was an evolution of local key/value storage onto networks. Cassandra, Riak, and Voldemort all faced similar design decisions and one, Voldemort, chose Berkeley DB Java Edition for its node-local storage. Riak at first was entirely in-memory, but has recently added write-once, append-only log-based on-disk storage, a similar type of storage to Berkeley DB's except that it is based on a hash table, which must reside entirely in memory, rather than a btree, which can live in memory or on disk. Berkeley DB evolved too, we added high availability (HA) and a replication manager that makes it easy to set up replica groups. Berkeley DB's replication doesn't partition the data, every node keeps an entire copy of the database. For consistency, there is a single node where writes are committed first - a master - then those changes are delivered to the replica nodes as log records. Applications can choose to wait until all nodes are consistent, or fire and forget allowing Berkeley DB to eventually become consistent. Berkeley DB's HA scales-out quite well for read-intensive applications and also effectively eliminates the central point of failure by allowing replica nodes to be elected (using a PAXOS algorithm) to mastership if the master should fail. This implementation covers a wide variety of use cases. MemcacheDB is a server that implements the Memcache network protocol but uses Berkeley DB for storage and HA to replicate the cache state across all the nodes in the cache group. Google Accounts, the user authentication layer for all Google properties, was until recently running Berkeley DB HA. That scaled to a globally distributed system. That said, most NoSQL solutions try to partition (shard) data across nodes in the replication group and some allow writes as well as reads at any node, Berkeley DB HA does not.
So, is Berkeley DB a "NoSQL" solution? Not really, but it certainly is a component of many of the existing NoSQL solutions out there. Forgetting all the noise about how NoSQL solutions are complex distributed databases, when you boil them down to a single node you still have to store the data to some form of stable local storage. DBMs solved that problem a long time ago. NoSQL has more to do with the layers on top of the DBM: the distributed, sometimes-consistent, partitioned, scale-out storage that manages key/value or document sets and generally has some form of simple HTTP/REST-style network API. Does Berkeley DB do that? Not really. Is Berkeley DB a "NoSQL" solution today? Nope, but it's the most robust solution on which to build such a system. Re-inventing the node-local data storage isn't easy. A lot of people are starting to appreciate the sophisticated features found in Berkeley DB, and even mimic them in some cases. Could Berkeley DB grow into a NoSQL solution? Absolutely. Our key/value API could be extended over the net using any of a number of existing network protocols such as memcache or HTTP/REST. We could adapt our node-local data partitioning out over replicated nodes. We even have a nice query language and cost-based query optimizer in our BDB XML product that we could reuse were we to build out a document-based NoSQL-style product. XML and JSON are not so different that we couldn't adapt one to work with the other interchangeably. Without too much effort we could add what's missing; we could jump into this NoSQL market within a single product development cycle. Why isn't Berkeley DB already a NoSQL solution? Why aren't we working on it? Why indeed...


SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

- by pinaldave

Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years, Niraj enjoys working on IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received the Microsoft MVP award for ASP.NET, Connected Systems and most recently Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements, though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt.

As the data in your applications grows, it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB, and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred to as the transactional database and the reporting database. Though there are tools / techniques which allow you to create a snapshot of your transactional database for reporting purposes, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, an effective schema (for an information worker to self-service herself), historical data, better performance (flat data, no joins), etc. This is where the need for a data warehouse or an OLAP system arises. A key point to remember is that a data warehouse is mostly a relational database. It’s built on top of the same concepts: tables, rows, columns, primary keys, foreign keys, etc.
Before we talk about how data warehouses are typically structured, let’s understand the key components that create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it:

a) The OLTP system should be capable of tracking its changes, as all these changes should go back to the data warehouse for historical recording. For example, if an OLTP transaction moves a customer from the silver to the gold category, the OLTP system needs to ensure that this change is tracked and sent to the data warehouse for reporting purposes. A report in this context could be how many customers, divided by geography, moved from the silver to the gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out-of-the-box features provided by some databases, e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements.

b) After we make the OLTP system capable of tracking its changes, we need to provision a batch process that runs periodically, takes these changes from the OLTP system and dumps them into the data warehouse. There are many tools out there that can help you fill this gap; SQL Server Integration Services happens to be one of them.

c) So we have an OLTP system that knows how to track its changes, and we have jobs that run periodically to move these changes to the warehouse. The question that remains is how the warehouse will record these changes. This structural change in the data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This creates multiple records for a given entity in the relational database, so data warehouses prefer having their own primary key, often known as a surrogate key.
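To make the surrogate key idea in point c) concrete, here is a minimal sketch of the row-versioning style of SCD (commonly called Type 2), using the silver-to-gold example from point a). Everything named here is an assumption made for illustration: the `dim_customer` table, its columns, the `apply_change` helper and the dates are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for the warehouse only so the sketch can run anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT, -- surrogate key
        customer_id INTEGER NOT NULL,                  -- key from the OLTP system
        category    TEXT NOT NULL,
        valid_from  TEXT NOT NULL,
        valid_to    TEXT                               -- NULL marks the current row
    )
""")


def apply_change(conn, customer_id, category, changed_on):
    """Close the customer's current row (if any) and insert a new version."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = ? "
        "WHERE customer_id = ? AND valid_to IS NULL",
        (changed_on, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, category, valid_from) "
        "VALUES (?, ?, ?)",
        (customer_id, category, changed_on),
    )


# Changes as they might arrive from the periodic batch job.
apply_change(conn, 42, "silver", "2011-01-01")
apply_change(conn, 42, "gold", "2011-06-15")

# One OLTP customer now has two warehouse rows, each with its own surrogate key.
history = conn.execute(
    "SELECT customer_sk, category, valid_from, valid_to "
    "FROM dim_customer WHERE customer_id = 42 ORDER BY customer_sk"
).fetchall()
```

Because `customer_id` repeats across versions, it can no longer serve as the primary key, which is why the warehouse assigns `customer_sk` itself.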
As I mentioned, a data warehouse is just a relational database, but the industry often attributes a specific schema style to data warehouses. These styles are the Star Schema and the Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to a normalized one), which is easy to understand / use, easy to query and easy to slice / dice. A star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables hold these numbers and have references (foreign keys) to a set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales, that number would go in a sales fact table and could have foreign keys attached to it that refer to the sales agent responsible for the sale and to a time table which contains the dates between which that sale was made. These agent and time tables are called dimensions, and they provide context to the numbers stored in fact tables. This schema structure, with the fact at the center surrounded by dimensions, is called a Star schema. A similar structure in which the dimension tables are normalized is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called a Cube. Though physically a Cube is a special structure supported by commercial products like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of the cube and facts define the content. Facts are often called Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn into individual items. Category and Items are often referred to as Levels and their constituents as Members, with their overall structure called a Hierarchy. Measures are rolled up as per the dimensional hierarchy. These rolled up measures are called Aggregates.
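A small sketch may help tie the vocabulary together. It builds a sales fact table with foreign keys to an agent dimension and a time dimension, then rolls the measure up along each dimension. The table names, column names and sample rows are all hypothetical and chosen only for illustration, and SQLite (via Python's standard `sqlite3` module) is used purely so the example is self-contained; it says nothing about how SQL Server Analysis Services builds a Cube.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: the context around the numbers.
    CREATE TABLE dim_agent (agent_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE dim_time  (time_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

    -- Fact: the numbers, plus foreign keys to each dimension.
    CREATE TABLE fact_sales (
        agent_key INTEGER REFERENCES dim_agent(agent_key),
        time_key  INTEGER REFERENCES dim_time(time_key),
        sales_usd REAL
    );

    INSERT INTO dim_agent VALUES (1, 'Asha', 'APAC'), (2, 'Ben', 'EMEA');
    INSERT INTO dim_time  VALUES (1, 2011, 1), (2, 2011, 2);
    INSERT INTO fact_sales VALUES (1, 1, 10000), (1, 2, 2500), (2, 1, 4000);
""")

# Slice the sales measure by the agent dimension (rolled up to region).
by_region = conn.execute("""
    SELECT a.region, SUM(f.sales_usd)
    FROM fact_sales AS f
    JOIN dim_agent AS a ON a.agent_key = f.agent_key
    GROUP BY a.region
    ORDER BY a.region
""").fetchall()

# Slice the same measure by the time dimension (rolled up to quarter).
by_quarter = conn.execute("""
    SELECT t.year, t.quarter, SUM(f.sales_usd)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_key = f.time_key
    GROUP BY t.year, t.quarter
    ORDER BY t.year, t.quarter
""").fetchall()
```

Each query joins the central fact table to exactly one dimension, which is the "star" shape; the summed values are the rolled up measures the text calls Aggregates.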
Now this may seem like an overwhelming vocabulary to deal with, but don’t worry, it will sink in as you start working with Cubes and the rest. Let’s look at a few other terms that we would run into while talking about data warehouses. ODS, or Operational Data Store, is a frequently misused term. There will be a few users in your organization who want to report on the most current data and can’t afford to miss a single transaction in their report. Then there is another set of users that typically don’t care how current the data is. These are mostly senior level executives who are interested in trending, mining, forecasting, strategizing, etc. and don’t care about that one specific transaction. This is where an ODS can come in handy. An ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS is short lived, i.e. kept for a few months, and the ODS syncs with the OLTP system every few minutes. The data warehouse can periodically sync with the ODS, either daily or weekly depending on business drivers. Data marts are another frequently discussed topic in data warehousing. They are subject-specific data warehouses. Data warehouses that try to span an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called a data mart that supports a specific segment of the business like sales, marketing, or support. Data marts, too, are often designed using the star schema model discussed earlier. The industry is divided when it comes to the use of data marts. Some experts prefer having data marts along with a central data warehouse. The data warehouse here acts as an information staging and distribution hub, with the spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse, citing that most users want to report on detailed data.
Reference: Pinal Dave (http://blog.SQLAuthority.com)


Fun with Aggregates

- by Paul White

There are interesting things to be learned from even the simplest queries. For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is:

    SELECT p.Name
    FROM Production.Product AS p
    JOIN Production.TransactionHistory AS th
        ON p.ProductID = th.ProductID
    GROUP BY p.ProductID, p.Name
    HAVING COUNT_BIG(*) < 10;

That query correctly returns 23 rows.

[Figure: execution plan and data sample]

The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join. The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units. The reason for the quote marks there is that this plan is not quite as optimal as it could be: surely it would make sense to push the Filter down past the join too? To answer that, let’s look at some other ways to formulate this query. This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones.

The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above. It joins the result of pre-aggregating the history table to the Product table before filtering:

    SELECT p.Name
    FROM
    (
        SELECT th.ProductID, cnt = COUNT_BIG(*)
        FROM Production.TransactionHistory AS th
        GROUP BY th.ProductID
    ) AS q1
    JOIN Production.Product AS p
        ON p.ProductID = q1.ProductID
    WHERE q1.cnt < 10;

Perhaps a little surprisingly, we get a slightly different execution plan.

[Figure: execution plan]

The results are the same (23 rows) but this time the Filter is pushed below the join! The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join. In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation, is simply an optimizer limitation: it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive.

Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”:

    SELECT p.Name
    FROM Production.Product AS p
    WHERE EXISTS
    (
        SELECT *
        FROM Production.TransactionHistory AS th
        WHERE th.ProductID = p.ProductID
        HAVING COUNT_BIG(*) < 10
    );

Unfortunately, this query produces an incorrect result (86 rows).

[Figure: query results]

The problem is that it lists products with no history rows, though the reasons are interesting. The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set. In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL). To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist:

    -- Scalar aggregate
    SELECT COUNT_BIG(*)
    FROM Production.TransactionHistory AS th
    WHERE th.ProductID = 709;

    -- Vector aggregate
    SELECT COUNT_BIG(*)
    FROM Production.TransactionHistory AS th
    WHERE th.ProductID = 709
    GROUP BY th.ProductID;

The estimated execution plans for these two statements are almost identical.

[Figure: estimated execution plans]

You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case. The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate. This is something I would like to see exposed in show plan, so I suggested it on Connect. Anyway, the results of running the two queries show the difference at runtime.

[Figure: query results]

The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all. Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero:

    SELECT p.Name
    FROM Production.Product AS p
    WHERE EXISTS
    (
        SELECT *
        FROM Production.TransactionHistory AS th
        WHERE th.ProductID = p.ProductID
        HAVING COUNT_BIG(*) BETWEEN 1 AND 9
    );

The query now returns the correct 23 rows.

[Figure: execution plan]

Unfortunately, the execution plan is less efficient now: it has an estimated cost of 0.78 compared to 0.33 for the earlier plans. Let’s try adding a redundant GROUP BY instead of changing the HAVING clause:

    SELECT p.Name
    FROM Production.Product AS p
    WHERE EXISTS
    (
        SELECT *
        FROM Production.TransactionHistory AS th
        WHERE th.ProductID = p.ProductID
        GROUP BY th.ProductID
        HAVING COUNT_BIG(*) < 10
    );

Not only do we now get correct results (23 rows), this is the execution plan:

[Figure: execution plan]

I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :) The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier. What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set:

    SELECT p.Name
    FROM Production.Product AS p
    WHERE EXISTS
    (
        SELECT *
        FROM Production.TransactionHistory AS th
        WHERE th.ProductID = p.ProductID
        GROUP BY ()
        HAVING COUNT_BIG(*) < 10
    );

I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too). Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated).

The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms):

WHERE Clause

    SELECT p.Name
    FROM Production.Product AS p
    WHERE
    (
        SELECT COUNT_BIG(*)
        FROM Production.TransactionHistory AS th
        WHERE th.ProductID = p.ProductID
        GROUP BY ()
    ) < 10;

APPLY

    SELECT p.Name
    FROM Production.Product AS p
    CROSS APPLY
    (
        SELECT NULL
        FROM Production.TransactionHistory AS th
        WHERE th.ProductID = p.ProductID
        GROUP BY ()
        HAVING COUNT_BIG(*) < 10
    ) AS ca (dummy);

FROM Clause

    SELECT q1.Name
    FROM
    (
        SELECT p.Name, cnt =
        (
            SELECT COUNT_BIG(*)
            FROM Production.TransactionHistory AS th
            WHERE th.ProductID = p.ProductID
            GROUP BY ()
        )
        FROM Production.Product AS p
    ) AS q1
    WHERE q1.cnt < 10;

This last example uses SUM(1) instead of COUNT and does not require a vector aggregate…you should be able to work out why :)

    SELECT q.Name
    FROM
    (
        SELECT p.Name, cnt =
        (
            SELECT SUM(1)
            FROM Production.TransactionHistory AS th
            WHERE th.ProductID = p.ProductID
        )
        FROM Production.Product AS p
    ) AS q
    WHERE q.cnt < 10;

The semantics of SQL aggregates are rather odd in places. It definitely pays to get to know the rules, and to be careful to check whether your queries are using scalar or vector aggregates. As we have seen, query plans do not show in which ‘mode’ an aggregate is running and getting it wrong can cause poor performance, wrong results, or both.

© 2012 Paul White. Twitter: @SQL_Kiwi
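The empty-input behaviour discussed above is easy to try outside SQL Server. The following is a minimal sketch using SQLite through Python's standard `sqlite3` module, with a tiny made-up `product` / `history` pair standing in for the AdventureWorks tables (the table names, the two sample products and the `names` helper are assumptions for illustration). It shows only the result semantics of scalar versus vector aggregates; SQLite's planner is unrelated to SQL Server's, so nothing here says anything about execution plans or costs. The `GROUP BY ()` syntax is not used because it is not assumed to be available in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE history (product_id INTEGER);
    INSERT INTO product VALUES (1, 'has history'), (709, 'no history');
    INSERT INTO history VALUES (1), (1), (1);
""")

# Scalar aggregate over an empty input: COUNT still produces one row.
scalar_count = conn.execute(
    "SELECT COUNT(*) FROM history WHERE product_id = 709"
).fetchall()

# Vector aggregate over an empty input: no groups, so no rows at all.
vector_count = conn.execute(
    "SELECT COUNT(*) FROM history WHERE product_id = 709 GROUP BY product_id"
).fetchall()

# Scalar SUM over an empty input: one row, but the value is NULL.
scalar_sum = conn.execute(
    "SELECT SUM(1) FROM history WHERE product_id = 709"
).fetchall()


def names(subquery):
    """Return product names for which (correlated subquery) < 10 is true."""
    sql = (
        "SELECT p.name FROM product AS p "
        f"WHERE ({subquery}) < 10 ORDER BY p.product_id"
    )
    return [row[0] for row in conn.execute(sql)]


correlated = "FROM history AS h WHERE h.product_id = p.product_id"
with_scalar_count = names(f"SELECT COUNT(*) {correlated}")
with_vector_count = names(f"SELECT COUNT(*) {correlated} GROUP BY h.product_id")
with_scalar_sum = names(f"SELECT SUM(1) {correlated}")
```

With the scalar COUNT, the product with no history is wrongly included, because 0 < 10 is true. With the grouped COUNT the subquery yields no row, which compares as NULL, and with SUM(1) the scalar aggregate itself yields NULL; in both cases NULL < 10 is not true, so the product is filtered out. That is one way to read the SUM(1) puzzle left to the reader above.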


Lessons from rewriting POP Forums for MVC, open source-like

- by Jeff

It has been a ton of work, interrupted over the last two years by unemployment, moving, a baby, failing to sell houses and other life events, but it's really exciting to see POP Forums v9 coming together. I'm not even sure when I decided to really commit to it as an open source project, but working on the same team as the CodePlex folks probably had something to do with it. Moving along the roadmap I set for myself, the app is now running on a quasi-production site... we launched MouseZoom last weekend. (That's a post-beta 1 build of the forum. There's also some nifty Silverlight DeepZoom goodness on that site.)

I have to make a point to illustrate just how important starting over was for me. I started this forum thing for my sites in old ASP more than ten years ago. What a mess that stuff was, including SQL injection vulnerabilities and all kinds of crap. It went to ASP.NET in 2002, but even then, it felt a little too much like script. More than a year later, in 2003, I did an honest to goodness rewrite. If you've been in this business of writing code for any amount of time, you know how much you hate what you wrote a month ago, so just imagine that with seven years in between. The subsequent versions still carried a fair amount of crap, and that's why I had to start over, to make a clean break. Mind you, much of that crap is still running on some of my production sites in a stable manner, but it's a pain in the ass to maintain.

So with that clean break, there is much that I have learned. These are a few of those lessons, in no particular order...

Avoid shiny object syndrome

Over the years, I've embraced new things without bothering to ask myself why. I remember spending the better part of a year trying to adapt this app to use the membership and profile APIs in ASP.NET, just because they were there. They didn't solve any known problem. Early on in this version, I dabbled in exotic ORMs, even though I already had the fundamental SQL that I knew worked. I bloated up the client side code with all kinds of jQuery UI and plugins just because, and it got in the way. All the new shiny can be distracting, and I've come to realize that I've allowed it to be a distraction most of my professional life.

Just query what you need

I've spent a lot of time over-thinking how to query data. In the SQL world, this means exotic joins, special caches, the read-update-commit loop of ORMs, etc. There are times when you have to remind yourself that you aren't Facebook, you'll never be Facebook, and that databases are in fact intended to serve data. In a lot of projects, back in the day, I used to have these big, rich data objects and pass them all over the place, through various application tiers, when in reality, all I needed was some ID from the entity. I try to be mindful of how many queries hit the database on a given request, but I don't obsess over it. I just get what I need.

Don't spend too much time worrying about your unit tests

If you've looked at any of the tests for POP Forums, you might offer an audible WTF. That's OK. There's a whole lot of mocking going on. In some cases, it points out where you're doing too much, and that's good for improving your design. In other cases it shows where your design sucks. But the biggest trap of unit testing is that you worry it should be prettier. That's a waste of time. When you write a test, in many cases before the production code, the important part is that you're testing the right thing. If you have to mock up a bunch of stuff to test the outcome, so be it, but it's not wasted time. You're still doing the typical arrange-act-assert deal, and you'll be able to read that later if you need to.

Get back to your HTTP roots

ASP.NET Webforms did a reasonably decent job at abstracting us away from the stateless nature of the Web. A lot of people criticize it, but I think it all worked pretty well. These days, with MVC, jQuery, REST services, and what not, we've gone back to thinking about the wire. The nuts and bolts passing between our Web browser and server matter. This doesn't make things harder, in my opinion, it makes them easier. There is something incredibly freeing about how we approach development of Web apps now. HTTP is a really simple protocol, and the stuff we push through it, in particular HTML and JSON, is pretty simple too. The debugging points are really easy to trap and trace.

Premature optimization is premature

I'll go back to the data thing for a moment. I've been known to look at a particular action or use case and stress about the number of calls that are made to the database. I'm not suggesting that it's a bad thing to keep these in mind, but if you worry about it outside of the context of the actual impact, you're wasting time. For example, I query the database for last read times in a forum separately from the user and the list of forums. The impact on performance barely exists. If I put it under load, exceeding the kind of load I expect, it still barely has an impact. Then consider it only counts for logged in users. The context of this "inefficient" action is that it doesn't matter. Did I mention I won't be Facebook?

Solve your own problems first

This is another trap I've fallen into. I've often thought about what other people might need for some feature or aspect of the app. In other words, I was willing to make design decisions based on non-existent data. How stupid is that? When I decided to truly open source this thing, building for myself first was a stated design goal. This app has to serve the audiences of CoasterBuzz, MouseZoom and other sites first. In this development scenario, you don't have access to mountains of usability studies or user focus groups. You have to start with what you know.

I'm sure there are other points I could make too. It has been a lot of fun to work on, and I look forward to evolving the UI as time goes on. That's where I hope to see more magic in the future.

