Would this method work to scale out SQL queries?

Posted by David on Stack Overflow
Published on 2010-03-15T14:43:26Z Indexed on 2010/03/15 22:29 UTC

Filed under: sql | parallel-processing | parallel | dbms | database

I have a database containing a single huge table. At the moment a query can take anywhere from 10 to 20 minutes, and I need to get that down to 10 seconds. I have spent months trying different products such as GridSQL. GridSQL works fine, but it uses its own parser, which lacks some of the features I need. I have also optimized my database in various ways without getting the speedup I need.

I have a theory on how one could scale out queries, meaning that several nodes are used to run a single query in parallel. The idea is to take an incoming SQL query and simply run it exactly as it is on all the nodes. When the results are returned to a coordinator node, the same query is run on the union of the result sets. I realize that an aggregate function like average needs to be rewritten into a count and a sum for the nodes, and that the coordinator then divides the sum of the sums by the sum of the counts to get the average.
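To make the idea concrete, here is a minimal sketch that uses in-memory SQLite databases as stand-ins for the nodes and the coordinator. The `sales` table, its columns, and all function names are made up for illustration; a real setup would talk to remote servers instead. It shows both halves of the method: re-running the unchanged query on the union of the result sets (which goes wrong for an average), and the rewrite into sum and count.

```python
import sqlite3

def make_node(rows):
    """Create an in-memory SQLite database standing in for one node (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

# Each node holds a horizontal slice of the single huge table.
nodes = [
    make_node([("north", 10.0), ("north", 20.0), ("south", 5.0)]),
    make_node([("north", 60.0), ("south", 15.0)]),
]

def rerun_on_union(nodes, query):
    """Run `query` unchanged on every node, then again on the union of the results."""
    coordinator = sqlite3.connect(":memory:")
    coordinator.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    for node in nodes:
        coordinator.executemany(
            "INSERT INTO sales VALUES (?, ?)", node.execute(query).fetchall()
        )
    return dict(coordinator.execute(query).fetchall())

def distributed_avg(nodes):
    """AVG rewritten: nodes return SUM and COUNT, the coordinator divides the totals."""
    coordinator = sqlite3.connect(":memory:")
    coordinator.execute("CREATE TABLE partials (region TEXT, s REAL, c INTEGER)")
    node_query = "SELECT region, SUM(amount), COUNT(amount) FROM sales GROUP BY region"
    for node in nodes:
        coordinator.executemany(
            "INSERT INTO partials VALUES (?, ?, ?)", node.execute(node_query).fetchall()
        )
    merged = coordinator.execute(
        "SELECT region, SUM(s) / SUM(c) FROM partials GROUP BY region"
    )
    return dict(merged.fetchall())

# SUM survives the "same query on the union" approach unchanged...
sum_result = rerun_on_union(
    nodes, "SELECT region, SUM(amount) AS amount FROM sales GROUP BY region"
)
# ...but AVG does not: it becomes an average of per-node averages.
naive_avg = rerun_on_union(
    nodes, "SELECT region, AVG(amount) AS amount FROM sales GROUP BY region"
)
correct_avg = distributed_avg(nodes)
print(sum_result, naive_avg, correct_avg)
```

With this sample data the naive average for `north` comes out as 37.5 (the mean of the per-node averages 15 and 60), while the rewritten version gives the true value 30. The two only agree when every node happens to hold the same number of rows per group.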

What kinds of problems could not easily be solved using this model? I believe one issue would be the count distinct function.
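A small sketch of why count distinct is awkward, again with in-memory SQLite databases playing the nodes and with a made-up `orders` table. Summing the per-node distinct counts overcounts any value that lives on more than one node, so the nodes have to ship the distinct values themselves and the coordinator counts their union.

```python
import sqlite3

def make_node(customer_ids):
    """Create an in-memory SQLite database standing in for one node (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(c,) for c in customer_ids])
    return conn

# Customer 3 appears on both nodes.
nodes = [make_node([1, 2, 2, 3]), make_node([3, 4, 4])]

def sum_of_distinct_counts(nodes):
    """Naive merge: add up each node's COUNT(DISTINCT ...)."""
    query = "SELECT COUNT(DISTINCT customer_id) FROM orders"
    return sum(node.execute(query).fetchone()[0] for node in nodes)

def distinct_count_via_values(nodes):
    """Correct merge: nodes return their distinct values, the coordinator dedupes."""
    seen = set()
    for node in nodes:
        rows = node.execute("SELECT DISTINCT customer_id FROM orders")
        seen.update(row[0] for row in rows)
    return len(seen)

naive = sum_of_distinct_counts(nodes)
correct = distinct_count_via_values(nodes)
print(naive, correct)
```

The cost is that each node may return a large number of values instead of a single number. If the table happened to be partitioned on the very column being counted, no value could sit on two nodes and the simple sum would be correct, but that cannot be assumed in general.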

Edit: I am getting so many nice suggestions, but none have addressed the method.

© Stack Overflow or respective owner
