Search Results

Search found 11543 results on 462 pages for 'partition wise join'.

Page 1/462 | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Improving Partitioned Table Join Performance

    - by Paul White
    The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) );   CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY);   WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50);   -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE);   -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory:   Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   CREATE TABLE #P ( partition_number integer PRIMARY KEY);   INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1;   SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals;   DROP TABLE #P;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated  – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

    Read the article

  • In SQL, if we rename INNER JOIN as INTERSECT JOIN, LEFT OUTER JOIN as LEFT UNION JOIN, and FULL OUTE

    - by Jian Lin
    In SQL, the name Join gives an idea of "merging" or a sense of "union", making something bigger. But in fact, as in the other post http://stackoverflow.com/questions/2706051/in-sql-a-join-is-actually-an-intersection-and-it-is-also-a-linkage-or-a-sidew it turns out that a Join (Inner Join) is actually an Intersection. So if we think of Join = Inner Join = Intersect Join Left Outer Join = Left Union Join Full Outer Join = Full Union Join = Union Join then we always get a feel of what's happening, and maybe never forget what they are easily. In a way, we can think of Intersect as "making it less", therefore it is excluding something. That's why the name "Join" won't go with the idea of "Intersect". But in fact, both Intersect and Union can be thought of as: Union: bringing something together and merge them unconditionally. Intersect: bringing something together and merge them based on some condition. so the "bringing something together" is probably what "Join" is all about. It is like, Intersection is a "half glass of water" -- we can thinking of it as "excluding something" or as "bringing something together and accepting the common ones". So if the word "Intersect Join" is used, maybe a clear picture is there, and "Union Join" can be a clear picture too. Maybe the word "Inner Join" and "Outer Join" is very clear when we use SQL a lot. Somehow, the word "Outer" tends to give a feeling that it is "outside" and excluding something rather than a "Union".

    Read the article

  • Partition Wise Joins II

    - by jean-pierre.dijcks
    One of the things that I did not talk about in the initial partition wise join post was the effect it has on resource allocation on the database server. When Oracle applies a different join method - e.g. not PWJ - what you will see in SQL Monitor (in Enterprise Manager) or in an Explain Plan is a set of producers and a set of consumers. The producers scan the tables in the the join. If there are two tables the producers first scan one table, then the other. The producers thus provide data to the consumers, and when the consumers have the data from both scans they do the join and give the data to the query coordinator. Now that behavior means that if you choose a degree of parallelism of 4 to run such query with, Oracle will allocate 8 parallel processes. Of these 8 processes 4 are producers and 4 are consumers. The consumers only actually do work once the producers are fully done with scanning both sides of the join. In the plan above you can see that the producers access table SALES [line 11] and then do a PX SEND [line 9]. That is the producer set of processes working. The consumers receive that data [line 8] and twiddle their thumbs while the producers go on and scan CUSTOMERS. The producers send that data to the consumer indicated by PX SEND [line 5]. After receiving that data [line 4] the consumers do the actual join [line 3] and give the data to the QC [line 2]. BTW, the myth that you see twice the number of processes due to the setting PARALLEL_THREADS_PER_CPU=2 is obviously not true. The above is why you will see 2 times the processes of the DOP. In a PWJ plan the consumers are not present. Instead of producing rows and giving those to different processes, a PWJ only uses a single set of processes. Each process reads its piece of the join across the two tables and performs the join. The plan here is notably different from the initial plan. First of all the hash join is done right on top of both table scans [line 8]. This query is a little more complex than the previous so there is a bit of noise above that bit of info, but for this post, lets ignore that (sort stuff). The important piece here is that the PWJ plan typically will be faster and from a PX process number / resources typically cheaper. You may want to look out for those plans and try to get those to appear a lot... CREDITS: credits for the plans and some of the info on the plans go to Maria, as she actually produced these plans and is the expert on plans in general... You can see her talk about explaining the explain plan and other optimizer stuff over here: ODTUG in Washington DC, June 27 - July 1 On the Optimizer blog At OpenWorld in San Francisco, September 19 - 23 Happy joining and hope to see you all at ODTUG and OOW...

    Read the article

  • If Inner Join can be thought of as Cross Join but filtering out the records satisfying the condition

    - by Jian Lin
    If an Inner Join can be thought of as a cross join and then getting the records that satisfy the condition, then a LEFT OUTER JOIN can be thought of as that, plus ONE record on the left table that doesn't satisfy the condition. In other words, it is not a cross join that "goes easy" on the left records (even when the condition is not satisfied), because then the left record can appear many times (as many as how many records there are in the right table). So the LEFT OUTER JOIN is the Cross JOIN with the records satisfying the condition, plus ONE record from the LEFT TABLE that doesn't satisfy the condition.

    Read the article

  • Thinking of an Inner Join as a Cross Join and then satisfying some condition(s)?

    - by Jian Lin
    It seems like the safest way to think of an Inner Join is to think of it as a Cross Join and then satisfying some condition(s)? Because the equi-join can be obvious, but the non-equi-join can be a bit confusing. But if we always use the Cross Join, and then filter out the ones satisfying the condition, then we get the resulting table. In other words, we can always analyze it by using the first record on the left table, and then go through every single records on the right, and then repeat that for 2nd record on the left, and for the 3rd, 4th, ... etc. So in our mind, we can analyze it using this way, and it is like O(n^2), although what happens in the DBMS maybe that it is a lot faster (when an index is present). Is there another good way to think of it besides this method?

    Read the article

  • It seems like the safest way to think of an Inner Join is to think of it as a Cross Join and then sa

    - by Jian Lin
    It seems like the safest way to think of an Inner Join is to think of it as a Cross Join and then satisfying some condition(s)? Because the equi-join can be obvious, but the non-equi-join can be a bit confusing. But if we always use the Cross Join, and then filter out the ones satisfying the condition, then we get the resulting table. In other words, we can always analyze it by using the first record on the left table, and then go through every single records on the right, and then repeat that for 2nd record on the left, and for the 3rd, 4th, ... etc. So in our mind, we can analyze it using this way, and it is like O(n^2), although what happens in the DBMS maybe that it is a lot faster (when an index is present). Is there another good way to think of it besides this method?

    Read the article

  • SQL – Difference Between INNER JOIN and JOIN

    - by Pinal Dave
    Here is the follow up question to my earlier question SQL – Difference between != and Operator <> used for NOT EQUAL TO Operation. There was a pretty good discussion about this subject earlier and lots of people participated with their opinion. Though the answer was very simple but the conversation was indeed delightful and was indeed very informative. In this blog post I have another following up question to all of you. What is the difference between INNER JOIN and JOIN? If you are working with database you will find developers use above both the kinds of the joins in their SQL Queries. Here is the quick example of the same. Query using INNER JOIN SELECT * FROM Table1 INNER JOIN  Table2 ON Table1.Col1 = Table2.Col1 Query using JOIN SELECT * FROM Table1 JOIN  Table2 ON Table1.Col1 = Table2.Col1 The question is what is the difference between above two syntax. Here is the answer – They are equal to each other. There is absolutely no difference between them. They are equal in performance as well as implementation. JOIN is actually shorter version of INNER JOIN. Personally I prefer to write INNER JOIN because it is much cleaner to read and it avoids any confusion if there is related to JOIN. For example if users had written INNER JOIN instead of JOIN there would have been no confusion in mind and hence there was no need to have original question. Here is the question back to you - Which one of the following syntax do you use when you are inner joining two tables – INNER JOIN or JOIN? and Why? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Joins, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Using CTAS & Exchange Partition Replace IAS for Copying Partition on Exadata

    - by Bandari Huang
    Usage Scenario: Copy data&index from one partition to another partition in a partitioned table. Solution: Create a partition definition Copy data from one partition to another partiton by 'Insert as select (IAS)' Create a nonpartitioned table by 'Create table as select (CTAS)' Convert a nonpartitioned table into a partition of partitoned table by exchangng their data segments. Rebuild unusable index Exchange Partition Convertion Mutual convertion between a partition (or subpartition) and a nonpartitioned table Mutual convertion between a hash-partitioned table and a partition of a composite *-hash partitioned table Mutual convertiton a [range | list]-partitioned table into a partition of a composite *-[range | list] partitioned table. Exchange Partition Usage Scenario High-speed data loading of new, incremental data into an existing partitioned table in DW environment Exchanging old data partitions out of a partitioned table, the data is purged from the partitioned table without actually being deleted and can be archived separately Exchange Partition Syntax ALTER TABLE schema.table EXCHANGE [PARTITION|SUBPARTITION] [partition|subprtition] WITH TABLE schema.table [INCLUDE|EXCLUDING] INDEX [WITH|WITHOUT] VALIDATION UPDATE [INDEXES|GLOBAL INDEXES] INCLUDING | EXCLUDING INDEXES Specify INCLUDING INDEXES if you want local index partitions or subpartitions to be exchanged with the corresponding table index (for a nonpartitioned table) or local indexes (for a hash-partitioned table). Specify EXCLUDING INDEXES if you want all index partitions or subpartitions corresponding to the partition and all the regular indexes and index partitions on the exchanged table to be marked UNUSABLE. If you omit this clause, then the default is EXCLUDING INDEXES. WITH | WITHOUT VALIDATION Specify WITH VALIDATION if you want Oracle Database to return an error if any rows in the exchanged table do not map into partitions or subpartitions being exchanged. Specify WITHOUT VALIDATION if you do not want Oracle Database to check the proper mapping of rows in the exchanged table. If you omit this clause, then the default is WITH VALIDATION.  UPADATE INDEX|GLOBAL INDEX Unless you specify UPDATE INDEXES, the database marks UNUSABLE the global indexes or all global index partitions on the table whose partition is being exchanged. Global indexes or global index partitions on the table being exchanged remain invalidated. (You cannot use UPDATE INDEXES for index-organized tables. Use UPDATE GLOBAL INDEXES instead.) Exchanging Partitions&Subpartitions Notes Both tables involved in the exchange must have the same primary key, and no validated foreign keys can be referencing either of the tables unless the referenced table is empty.  When exchanging partitioned index-organized tables: – The source and target table or partition must have their primary key set on the same columns, in the same order. – If key compression is enabled, then it must be enabled for both the source and the target, and with the same prefix length. – Both the source and target must be index organized. – Both the source and target must have overflow segments, or neither can have overflow segments. Also, both the source and target must have mapping tables, or neither can have a mapping table. – Both the source and target must have identical storage attributes for any LOB columns. 

    Read the article

  • SQL Server 2005 RIGHT OUTER JOIN not working

    - by CheeseConQueso
    I'm looking up access logs for specific courses. I need to show all the courses even if they don't exist in the logs table. Hence the outer join.... but after trying (presumably) all of the variations of LEFT OUTER, RIGHT OUTER, INNER and placement of the tables within the SQL code, I couldn't get my result. Here's what I am running: SELECT (a.first_name+' '+a.last_name) instructor, c.course_id, COUNT(l.access_date) course_logins, a.logins system_logins, MAX(l.access_date) last_course_login, a.last_login last_system_login FROM lsn_logs l RIGHT OUTER JOIN courses c ON l.course_id = c.course_id, accounts a WHERE l.object_id = 'LOGIN' AND c.course_type = 'COURSE' AND c.course_id NOT LIKE '%TEST%' AND a.account_rights > 2 AND l.user_id = a.username AND ((a.first_name+' '+a.last_name) = c.instructor) GROUP BY c.course_id, a.first_name, a.last_name, a.last_login, a.logins, c.instructor ORDER BY a.last_name, a.first_name, c.course_id, course_logins DESC Is it something in the WHERE clause that's preventing me from getting course_id's that don't exist in lsn_logs? Is it the way I'm joining the tables? Again, in short, I want all course_id's regardless of their existence in lsn_logs.

    Read the article

  • copy boot-able partition

    - by Dima
    I have an disk image with 3 partitions: first partition (hd0,0) is boot-able with GRUB1 with the following configuration GRUB file: default=0 timeout=5 title Bank A root (hd0,1) chainloader +1 title Bank B root (hd0,2) chainloader +1 The partitions (hd0,1) and (hd0,2) are also boot-able. I'm trying to clone partition (hd0,1) to (hd0,2) by creating device map using kpartx and copying whole partition using dd command. The problem is: after partition cloning, the cloned partition did not boot (but all files are OK). What the wrong? I need both partitions to bee identical (I'm using them for fail-over purposes into embedded device)

    Read the article

  • Software installed on root partition or on home partition

    - by Tim
    I am planning to install some big softwares such as Matlab (4GB), Mathematica (4GB) on my Ubuntu partitions. I was wondering if I installed them on my home partition, when I reinstall Ubuntu without touching the home partition, will the softwares still be runnable after reinstallation? what are the advantage and disadvantages of installing softwares on root partition and of on home partition? with your answer to the previous questions, what are some reasonable plans for the sizes of root partition and of home partition? Note that I would like to learn programming in C, C++, Java, Python, Lisp, databases under both Ubuntu and Windows, and no games. My laptop has around 230 GB, where I plan to install both Windows and Ubuntu, and reserve 40 GB for Ubuntu (three partitions: swap, root and home), 110 GB for NTFS partition shared between the two OSes, 70 GB for Windows OS partition, and 10 GB that can be added to any of the above partitions. I will change my plan according to your suggestions. Thanks and regards!

    Read the article

  • How to join data frames in R (inner, outer, left, right)?

    - by Dan Goldstein
    Given two data frames df1 = data.frame(CustomerId=c(1:6),Product=c(rep("Toaster",3),rep("Radio",3))) df2 = data.frame(CustomerId=c(2,4,6),State=c(rep("Alabama",2),rep("Ohio",1))) > df1 CustomerId Product 1 Toaster 2 Toaster 3 Toaster 4 Radio 5 Radio 6 Radio > df2 CustomerId State 2 Alabama 4 Alabama 6 Ohio How can I do database style, i.e., sql style, joins? That is, how do I get: An inner join of df1 and df1 An outer join of df1 and df2 A left outer join of df1 and df2 A right outer join of df1 and df2 P.S. IKT-JARQ (I Know This - Just Adding R Questions) Extra credit: How can I do a sql style select statement?

    Read the article

  • Merge Join component sorted outputs [SSIS]

    - by jamiet
    One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this: Why isn’t the Merge Join output sorted (i.e.IsSorted=True)? This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it. Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component: Let’s take a look inside the editor of the Merge Join: We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look: Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning: Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed. The warning proves that the output from the Merge Join is sorted! It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output. So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there! @Jamiet  P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!

    Read the article

  • Merge Join component sorted outputs [SSIS]

    - by jamiet
    One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this: Why isn’t the Merge Join output sorted (i.e.IsSorted=True)? This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it. Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component: Let’s take a look inside the editor of the Merge Join: We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look: Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning: Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed. The warning proves that the output from the Merge Join is sorted! It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output. So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there! @Jamiet  P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!

    Read the article

  • Partition table correpted in windows 7 machine as simple dynamic partition

    - by raki
    I have Windows 7 installed in my system. When I originally partitioned it I made 3 partitions and 16 gb space as unallocated. Later I tried to create a partition using this free space using diskmanagement tool. It shows free space as unusabe space and the only one option available is to make it as a simple partition. Unfortunately I made it as simple partition and all my partitions converted to simple dynamic partition. Now after reboot the OS is not loading. I tried to reinstall the OS by formatting the C drive, but it doesn't work. Now I can't load the OS properly. How can I install Windows 7 on my system without losing my data on the other two drives?

    Read the article

  • Partition table correpted in windows 7 machine as simple dynamic partition

    - by raki
    I have Windows 7 installed in my system. When I originally partitioned it I made 3 partitions and 16 gb space as unallocated. Later I tried to create a partition using this free space using diskmanagement tool. It shows free space as unusabe space and the only one option available is to make it as a simple partition. Unfortunately I made it as simple partition and all my partitions converted to simple dynamic partition. Now after reboot the OS is not loading. I tried to reinstall the OS by formatting the C drive, but it doesn't work. Now I can't load the OS properly. How can I install Windows 7 on my system without losing my data on the other two drives?

    Read the article

  • convert a logical partition to a primary partition

    - by ant2009
    Hello, Fedora 14 xfce I have the following partition setup. I would like to know how can I convert the logical partition sda6 to a primary partition. Disk /dev/sda: 320.1 GB, 320072933376 bytes 255 heads, 63 sectors/track, 38913 cylinders, total 625142448 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x1707a8a5 Device Boot Start End Blocks Id System /dev/sda1 2048 1026047 512000 83 Linux /dev/sda2 1026048 205844479 102409216 83 Linux /dev/sda3 205844480 214228991 4192256 82 Linux swap / Solaris /dev/sda4 214228992 625141759 205456384 5 Extended /dev/sda5 214231040 573562879 179665920 83 Linux /dev/sda6 573564928 625141759 25788416 7 HPFS/NTFS Filesystem Size Used Avail Use% Mounted on /dev/sda2 97G 5.0G 91G 6% / tmpfs 494M 176K 494M 1% /dev/shm /dev/sda1 485M 68M 392M 15% /boot /dev/sda5 169G 26G 135G 16% /home # partition table of /dev/sda unit: sectors /dev/sda1 : start= 2048, size= 1024000, Id=83 /dev/sda2 : start= 1026048, size=204818432, Id=83 /dev/sda3 : start=205844480, size= 8384512, Id=82 /dev/sda4 : start=214228992, size=410912768, Id= 5 /dev/sda5 : start=214231040, size=359331840, Id=83 /dev/sda6 : start=573564928, size= 51576832, Id= 7 I would like to convert sda6 to a primary partition, the reason for this it to install windows 7 starter. Many thanks for any suggestions,

    Read the article

  • Basics of Join Predicate Pushdown in Oracle

    - by Maria Colgan
    Happy New Year to all of our readers! We hope you all had a great holiday season. We start the new year by continuing our series on Optimizer transformations. This time it is the turn of Predicate Pushdown. I would like to thank Rafi Ahmed for the content of this blog.Normally, a view cannot be joined with an index-based nested loop (i.e., index access) join, since a view, in contrast with a base table, does not have an index defined on it. A view can only be joined with other tables using three methods: hash, nested loop, and sort-merge joins. Introduction The join predicate pushdown (JPPD) transformation allows a view to be joined with index-based nested-loop join method, which may provide a more optimal alternative. In the join predicate pushdown transformation, the view remains a separate query block, but it contains the join predicate, which is pushed down from its containing query block into the view. The view thus becomes correlated and must be evaluated for each row of the outer query block. These pushed-down join predicates, once inside the view, open up new index access paths on the base tables inside the view; this allows the view to be joined with index-based nested-loop join method, thereby enabling the optimizer to select an efficient execution plan. The join predicate pushdown transformation is not always optimal. The join predicate pushed-down view becomes correlated and it must be evaluated for each outer row; if there is a large number of outer rows, the cost of evaluating the view multiple times may make the nested-loop join suboptimal, and therefore joining the view with hash or sort-merge join method may be more efficient. The decision whether to push down join predicates into a view is determined by evaluating the costs of the outer query with and without the join predicate pushdown transformation under Oracle's cost-based query transformation framework. The join predicate pushdown transformation applies to both non-mergeable views and mergeable views and to pre-defined and inline views as well as to views generated internally by the optimizer during various transformations. The following shows the types of views on which join predicate pushdown is currently supported. UNION ALL/UNION view Outer-joined view Anti-joined view Semi-joined view DISTINCT view GROUP-BY view Examples Consider query A, which has an outer-joined view V. The view cannot be merged, as it contains two tables, and the join between these two tables must be performed before the join between the view and the outer table T4. A: SELECT T4.unique1, V.unique3 FROM T_4K T4,            (SELECT T10.unique3, T10.hundred, T10.ten             FROM T_5K T5, T_10K T10             WHERE T5.unique3 = T10.unique3) VWHERE T4.unique3 = V.hundred(+) AND       T4.ten = V.ten(+) AND       T4.thousand = 5; The following shows the non-default plan for query A generated by disabling join predicate pushdown. When query A undergoes join predicate pushdown, it yields query B. Note that query B is expressed in a non-standard SQL and shows an internal representation of the query. B: SELECT T4.unique1, V.unique3 FROM T_4K T4,           (SELECT T10.unique3, T10.hundred, T10.ten             FROM T_5K T5, T_10K T10             WHERE T5.unique3 = T10.unique3             AND T4.unique3 = V.hundred(+)             AND T4.ten = V.ten(+)) V WHERE T4.thousand = 5; The execution plan for query B is shown below. In the execution plan BX, note the keyword 'VIEW PUSHED PREDICATE' indicates that the view has undergone the join predicate pushdown transformation. The join predicates (shown here in red) have been moved into the view V; these join predicates open up index access paths thereby enabling index-based nested-loop join of the view. With join predicate pushdown, the cost of query A has come down from 62 to 32.  As mentioned earlier, the join predicate pushdown transformation is cost-based, and a join predicate pushed-down plan is selected only when it reduces the overall cost. Consider another example of a query C, which contains a view with the UNION ALL set operator.C: SELECT R.unique1, V.unique3 FROM T_5K R,            (SELECT T1.unique3, T2.unique1+T1.unique1             FROM T_5K T1, T_10K T2             WHERE T1.unique1 = T2.unique1             UNION ALL             SELECT T1.unique3, T2.unique2             FROM G_4K T1, T_10K T2             WHERE T1.unique1 = T2.unique1) V WHERE R.unique3 = V.unique3 and R.thousand < 1; The execution plan of query C is shown below. In the above, 'VIEW UNION ALL PUSHED PREDICATE' indicates that the UNION ALL view has undergone the join predicate pushdown transformation. As can be seen, here the join predicate has been replicated and pushed inside every branch of the UNION ALL view. The join predicates (shown here in red) open up index access paths thereby enabling index-based nested loop join of the view. Consider query D as an example of join predicate pushdown into a distinct view. We have the following cardinalities of the tables involved in query D: Sales (1,016,271), Customers (50,000), and Costs (787,766).  D: SELECT C.cust_last_name, C.cust_city FROM customers C,            (SELECT DISTINCT S.cust_id             FROM sales S, costs CT             WHERE S.prod_id = CT.prod_id and CT.unit_price > 70) V WHERE C.cust_state_province = 'CA' and C.cust_id = V.cust_id; The execution plan of query D is shown below. As shown in XD, when query D undergoes join predicate pushdown transformation, the expensive DISTINCT operator is removed and the join is converted into a semi-join; this is possible, since all the SELECT list items of the view participate in an equi-join with the outer tables. Under similar conditions, when a group-by view undergoes join predicate pushdown transformation, the expensive group-by operator can also be removed. With the join predicate pushdown transformation, the elapsed time of query D came down from 63 seconds to 5 seconds. Since distinct and group-by views are mergeable views, the cost-based transformation framework also compares the cost of merging the view with that of join predicate pushdown in selecting the most optimal execution plan. Summary We have tried to illustrate the basic ideas behind join predicate pushdown on different types of views by showing example queries that are quite simple. Oracle can handle far more complex queries and other types of views not shown here in the examples. Again many thanks to Rafi Ahmed for the content of this blog post.

    Read the article

  • Quantifying the effects of partition mis-alignment

    - by Matt
    I'm experiencing some significant performance issues on an NFS server. I've been reading up a bit on partition alignment, and I think I have my partitions mis-aligned. I can't find anything that tells me how to actually quantify the effects of mis-aligned partitions. Some of the general information I found suggests the performance penalty can be quite high (upwards of 60%) and others say it's negligible. What I want to do is determine if partition alignment is a factor in this server's performance problems or not; and if so, to what degree? So I'll put my info out here, and hopefully the community can confirm if my partitions are indeed mis-aligned, and if so, help me put a number to what the performance cost is. Server is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 2.5” 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. RAID controller is a Dell PERC H700 w/512MB cache (Linux sees this as a LSI MegaSAS 9260). OS is CentOS 5.6, home directory partition is ext3, with options “rw,data=journal,usrquota”. I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for a big NFS share: [root@lnxutil1 ~]# parted -s /dev/sda unit s print Model: DELL PERC H700 (scsi) Disk /dev/sda: 134217599s Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 63s 465884s 465822s primary ext2 boot 2 465885s 134207009s 133741125s primary lvm [root@lnxutil1 ~]# parted -s /dev/sdb unit s print Model: DELL PERC H700 (scsi) Disk /dev/sdb: 5720768639s Sector size (logical/physical): 512B/512B Partition Table: gpt Number Start End Size File system Name Flags 1 34s 5720768606s 5720768573s lvm Edit 1 Using the cfq IO scheduler (default for CentOS 5.6): # cat /sys/block/sd{a,b}/queue/scheduler noop anticipatory deadline [cfq] noop anticipatory deadline [cfq] Chunk size is the same as strip size, right? If so, then 64kB: # /opt/MegaCli -LDInfo -Lall -aALL -NoLog Adapter #0 Number of Virtual Disks: 2 Virtual Disk: 0 (target id: 0) Name:os RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3 Size:65535MB State: Optimal Stripe Size: 64kB Number Of Drives:7 Span Depth:1 Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU Access Policy: Read/Write Disk Cache Policy: Disk's Default Number of Spans: 1 Span: 0 - Number of PDs: 7 ... physical disk info removed for brevity ... Virtual Disk: 1 (target id: 1) Name:share RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3 Size:2793344MB State: Optimal Stripe Size: 64kB Number Of Drives:7 Span Depth:1 Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU Access Policy: Read/Write Disk Cache Policy: Disk's Default Number of Spans: 1 Span: 0 - Number of PDs: 7 If it's not obvious, virtual disk 0 corresponds to /dev/sda, for the OS; virtual disk 1 is /dev/sdb (the exported home directory tree).

    Read the article

  • Setting up a Pagefile and Partition in Server 2008

    - by Brett Powell
    I am setting up 18 new machines for our company, and I have instructions from my new boss on setting up a Pagefile and Partition. I have looked at their existing machines to base the new setups off of, but there is no consistency between any 2 machines, which has left me extremely frustrated to say the least. My instructions are... 1) Set a static pagefile (use recommended value as max/min), set it on SSD if SSD available. 2) Make 3 partitions: C: is used for OS and install files D: is used for backups on machines with a SSD. On machines without SSD create a D: partition for pagefile (2*installed RAM for partition size) E: must be the partition hosting user files I have never messed with Pagefiles before, and looking at their existing machines is offering no help. My questions are... 1) As the machines I am setting up have no SSD (just 2 SATA drives) does it sound like the Pagefile should be setup on the C: (primary) drive or the D:? The instructions are vague so I have no idea. 2) As C: and D: are both Physical drives, does it sound like C: should be partitioned out to create the E: drive or D:? Thanks for any help I can get. I am extremely stressed out under a massive workload right now, and these vague instructions are quite infuriating.

    Read the article

  • Lock a partition to a few folders

    - by Oxwivi
    I want to have a few folders on a partition whose contents I can freely edit, but outside nothing should be saved. Specifically, I want the folders in my Home (Documents, Music, etc) on a different partition, but rest of the normally hidden folders remain in the main partition with Ubuntu. I can make the files within the Home folders save in another partition using fstab binding, but I still can't think of how to lock the partition from edits outside those folders. I'm open to suggestions of alternatives to binding - but please, no symbolic links.

    Read the article

  • “BAD” partition showing up in Partition Manager

    - by Quintin Par
    I tried to partition my primary hard disk (NTFS partitions) with qtparted and got stuck in the process. Consequently I had to kill the process and exit my knoppix live CD boot up. Even though I was expecting XP to get corrupted, it booted fine and showed up all the drives accessible. But when I opened this with partition manager 8, it shows up as “BAD”. I ran chkdsk /f without any success. My objective with qparted and partition magic was to resize my existing partitions and add some space to c: How do I fix this problem and resize my partitions? Edit: Here's how my primary drive as per windows is

    Read the article

  • win 2008 core create a partition with an offset to allow other partition expand

    - by Rqomey
    We are running a win 2008 core host in a HyperV role. We have expanded the logical drive on a RAID 1+0 array belonging to the server, as we needed more space. We have two data partitions D: and E: I want to expand them both so they use all space, and are equally sized. There is data on all partitions, although E is not in live use (so files can be moved and copied from it. Current: What I want- temporary Partition (F:) at end of drive: I am going to create a temporary partition F: so I can move the files from E: onto it, then delete E:, expand D: to the desired size, then rename F: to E: To do this I need to create F: from the end of the drive, ie. have unused space between E: and F: tl;dr How do I create a partition with a large offset in Windows server?

    Read the article

1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >