Search Results

Search found 1150 results on 46 pages for 'partitioning'.

Page 32/46 | < Previous Page | 28 29 30 31 32 33 34 35 36 37 38 39  | Next Page >

  • How to create multiple tables with the same schema using SQLite jdbc

    - by Space_C0wb0y
    I want to split a large table horizontally, and I would like to make sure that all three of them have the same schema. Currently I am using this piece of code to create the tables: statement .executeUpdate("CREATE TABLE AnnotationsMolecularFunction (Id INTEGER PRIMARY KEY ASC AUTOINCREMENT, " + "ProteinId NOT NULL, " + "GOId NOT NULL, " + "UNIQUE (ProteinId, GOId)" + "FOREIGN KEY(ProteinId) REFERENCES Protein(Id))"); There is one such statement for each table. This is bad, because if I decide to change the schema later (which will most certainly happen), I will have to change it three times, which begs for errors, so I would like a way to make sure that the other tables have the same schema without explicitly writing it again. I can use: statement .executeUpdate("CREATE TABLE AnnotationsBiologicalProcess AS SELECT * FROM AnnotationsMolecularFunction"); to create the other tables with the same columns, but the constraints are not aplied. I could of course just generate the same query-string three times with different table-names in Java, but I would like to know if there is an SQL-way of achieving this.

    Read the article

  • Future proof Primary Key design in postgresql

    - by John P
    I've always used either auto_generated or Sequences in the past for my primary keys. With the current system I'm working on there is the possibility of having to eventually partition the data which has never been a requirement in the past. Knowing that I may need to partition the data in the future, is there any advantage of using UUIDs for PKs instead of the database's built-in sequences? If so, is there a design pattern that can safely generate relatively short keys (say 6 characters instead of the usual long one e6709870-5cbc-11df-a08a-0800200c9a66)? 36^6 keys per-table is more than sufficient for any table I could imagine. I will be using the keys in URLs so conciseness is important.

    Read the article

  • Range partition skip check

    - by user289429
    We have large amount of data partitioned on year value using range partition in oracle. We have used range partition but each partition contains data only for one year. When we write a query targeting a specific year, oracle fetches the information from that partition but still checks if the year is what we have specified. Since this year column is not part of the index it fetches the year from table and compares it. We have seen that any time the query goes to fetch table data it is getting too slow. Can we somehow avoid oracle comparing the year values since we for sure know that the partition contains information for only one year.

    Read the article

  • Is it OK to re-create many SQL connections (SQL 2008)

    - by Mr. Flibble
    When performing many inserts into a database I would usually have code like this: using (var connection = new SqlConnection(connStr)) { connection.Open(); foreach (var item in items) { var cmd = new SqlCommand("INSERT ...") cmd.ExecuteNonQuery(); } } I now want to shard the database and therefore need to choose the connection string based on the item being inserted. This would make my code run more like this foreach (var item in items) { connStr = GetConnectionString(item); using (var connection = new SqlConnection(connStr)) { connection.Open(); var cmd = new SqlCommand("INSERT ...") cmd.ExecuteNonQuery(); } } Which basically means it's creating a new connection to the database for each item. Will this work or will recreating connections for each insert cause terrible overhead?

    Read the article

  • Does my TPL partitioner cause a deadlock?

    - by Scott Chamberlain
    I am starting to write my first parallel applications. This partitioner will enumerate over a IDataReader pulling chunkSize records at a time from the data-source. protected class DataSourcePartitioner<object[]> : System.Collections.Concurrent.Partitioner<object[]> { private readonly System.Data.IDataReader _Input; private readonly int _ChunkSize; public DataSourcePartitioner(System.Data.IDataReader input, int chunkSize = 10000) : base() { if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize"); _Input = input; _ChunkSize = chunkSize; } public override bool SupportsDynamicPartitions { get { return true; } } public override IList<IEnumerator<object[]>> GetPartitions(int partitionCount) { var dynamicPartitions = GetDynamicPartitions(); var partitions = new IEnumerator<object[]>[partitionCount]; for (int i = 0; i < partitionCount; i++) { partitions[i] = dynamicPartitions.GetEnumerator(); } return partitions; } public override IEnumerable<object[]> GetDynamicPartitions() { return new ListDynamicPartitions(_Input, _ChunkSize); } private class ListDynamicPartitions : IEnumerable<object[]> { private System.Data.IDataReader _Input; int _ChunkSize; private object _ChunkLock = new object(); public ListDynamicPartitions(System.Data.IDataReader input, int chunkSize) { _Input = input; _ChunkSize = chunkSize; } public IEnumerator<object[]> GetEnumerator() { while (true) { List<object[]> chunk = new List<object[]>(_ChunkSize); lock(_Input) { for (int i = 0; i < _ChunkSize; ++i) { if (!_Input.Read()) break; var values = new object[_Input.FieldCount]; _Input.GetValues(values); chunk.Add(values); } if (chunk.Count == 0) yield break; } var chunkEnumerator = chunk.GetEnumerator(); lock(_ChunkLock) //Will this cause a deadlock? { while (chunkEnumerator.MoveNext()) { yield return chunkEnumerator.Current; } } } } IEnumerator IEnumerable.GetEnumerator() { return ((IEnumerable<object[]>)this).GetEnumerator(); } } } I wanted IEnumerable object it passed back to be thread safe (the .Net example was so I am assuming PLINQ and TPL could need it) will the lock on _ChunkLock near the bottom help provide thread safety or will it cause a deadlock? From the documentation I could not tell if the lock would be released on the yeld return. Also if there is built in functionality to .net that will do what I am trying to do I would much rather use that. And if you find any other problems with the code I would appreciate it.

    Read the article

  • Does my GetEnumerator cause a deadlock?

    - by Scott Chamberlain
    I am starting to write my first parallel applications. This partitioner will enumerate over a IDataReader pulling chunkSize records at a time from the data-source. TLDR; version private object _Lock = new object(); public IEnumerator GetEnumerator() { var infoSource = myInforSource.GetEnumerator(); //Will this cause a deadlock if two threads lock (_Lock) //use the enumator at the same time? { while (infoSource.MoveNext()) { yield return infoSource.Current; } } } full code protected class DataSourcePartitioner<object[]> : System.Collections.Concurrent.Partitioner<object[]> { private readonly System.Data.IDataReader _Input; private readonly int _ChunkSize; public DataSourcePartitioner(System.Data.IDataReader input, int chunkSize = 10000) : base() { if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize"); _Input = input; _ChunkSize = chunkSize; } public override bool SupportsDynamicPartitions { get { return true; } } public override IList<IEnumerator<object[]>> GetPartitions(int partitionCount) { var dynamicPartitions = GetDynamicPartitions(); var partitions = new IEnumerator<object[]>[partitionCount]; for (int i = 0; i < partitionCount; i++) { partitions[i] = dynamicPartitions.GetEnumerator(); } return partitions; } public override IEnumerable<object[]> GetDynamicPartitions() { return new ListDynamicPartitions(_Input, _ChunkSize); } private class ListDynamicPartitions : IEnumerable<object[]> { private System.Data.IDataReader _Input; int _ChunkSize; private object _ChunkLock = new object(); public ListDynamicPartitions(System.Data.IDataReader input, int chunkSize) { _Input = input; _ChunkSize = chunkSize; } public IEnumerator<object[]> GetEnumerator() { while (true) { List<object[]> chunk = new List<object[]>(_ChunkSize); lock(_Input) { for (int i = 0; i < _ChunkSize; ++i) { if (!_Input.Read()) break; var values = new object[_Input.FieldCount]; _Input.GetValues(values); chunk.Add(values); } if (chunk.Count == 0) yield break; } var chunkEnumerator = chunk.GetEnumerator(); lock(_ChunkLock) //Will this cause a deadlock? { while (chunkEnumerator.MoveNext()) { yield return chunkEnumerator.Current; } } } } IEnumerator IEnumerable.GetEnumerator() { return ((IEnumerable<object[]>)this).GetEnumerator(); } } } I wanted IEnumerable object it passed back to be thread safe (the MSDN example was so I am assuming PLINQ and TPL could need it) will the lock on _ChunkLock near the bottom help provide thread safety or will it cause a deadlock? From the documentation I could not tell if the lock would be released on the yeld return. Also if there is built in functionality to .net that will do what I am trying to do I would much rather use that. And if you find any other problems with the code I would appreciate it.

    Read the article

  • How to design data storage for partitioned tagging system?

    - by Morgan Cheng
    How to design data storage for huge tagging system (like digg or delicious)? There is already discussion about it, but it is about centralized database. Since the data is supposed to grow, we'll need to partition the data into multiple shards soon or later. So, the question turns to be: How to design data storage for partitioned tagging system? The tagging system basically has 3 tables: Item (item_id, item_content) Tag (tag_id, tag_title) TagMapping(map_id, tag_id, item_id) That works fine for finding all items for given tag and finding all tags for given item, if the table is stored in one database instance. If we need to partition the data into multiple database instances, it is not that easy. For table Item, we can partition its content with its key item_id. For table Tag, we can partition its content with its key tag_id. For example, we want to partition table Tag into K databases. We can simply choose number (tag_id % K) database to store given tag. But, how to partition table TagMapping? The TagMapping table represents the many-to-many relationship. I can only image to have duplication. That is, same content of TagMappping has two copies. One is partitioned with tag_id and the other is partitioned with item_id. In scenario to find tags for given item, we use partition with tag_id. If scenario to find items for given tag, we use partition with item_id. As a result, there is data redundancy. And, the application level should keep the consistency of all tables. It looks hard. Is there any better solution to solve this many-to-many partition problem?

    Read the article

  • Database(Partitions) related query.... Please suggest..........

    - by AGeek
    The Query is as follows:- We have an oracle schema, with tables partitioned. -- Need to make one of the partition offline. (How to achieve this,, or have thought of assigning each partition, a different tablespace, and then making that tablespace offline) any other way out possible so as to avoid making so many tablespaces???? -- Then, copy the data files/ logs related to that partition to the temporary storage..... how to achieve this thing... any docs related to the same.... -- Then We make this partition / related tablespace online... -- Then we create a second schema with the same tables as above..... -- Copy the partition data from temporary storage to this table. (How this can be achieved,,, any specific doc??? -- Then verifying to the data is accessible in all respects..... If possible, let's discuss to find out the solution........ Thanks....

    Read the article

  • How to make ActiveRecord work with legacy partitioned/sharded databases/tables?

    - by Utensil
    thanks for your time first...after all the searching on google, github and here, and got more confused about the big words(partition/shard/fedorate),I figure that I have to describe the specific problem I met and ask around. My company's databases deals with massive users and orders, so we split databases and tables in various ways, some are described below: way database and table name shard by (maybe it's should be called partitioned by?) YZ.X db_YZ.tb_X order serial number last three digits YYYYMMDD. db_YYYYMMDD.tb date YYYYMM.DD db_YYYYMM.tb_ DD date too The basic concept is that databases and tables are seperated acording to a field(not nessissarily the primary key), and there are too many databases and too many tables, so that writing or magically generate one database.yml config for each database and one model for each table isn't possible or at least not the best solution. I looked into drnic's magic solutions, and datafabric, and even the source code of active record, maybe I could use ERB to generate database.yml and do database connection in around filter, and maybe I could use named_scope to dynamically decide the table name for find, but update/create opertions are bounded to "self.class.quoted_table_name" so that I couldn't easily get my problem solved. And even I could generate one model for each table, because its amount is up to 30 most. But this is just not DRY! What I need is a clean solution like the following DSL: class Order < ActiveRecord::Base shard_by :order_serialno do |key| [get_db_config_by(key), #because some or all of the databaes might share the same machine in a regular way or can be configed by a hash of regex, and it can also be a const get_db_name_by(key), get_tb_name_by(key), ] end end Can anybody enlight me? Any help would be greatly appreciated~~~~

    Read the article

  • Is there a way to split the results of a select query into two equal halfs?

    - by Matthias
    I'd like to have a query returning two ResultSets each of which holding exactly half of all records matching a certain criteria. I tried using TOP 50 PERCENT in conjunction with an Order By but if the number of records in the table is odd, one record will show up in both resultsets. Example: I've got a simple table with TheID (PK) and TheValue fields (varchar(10)) and 5 records. Skip the where clause for now. SELECT TOP 50 PERCENT * FROM TheTable ORDER BY TheID asc results in the selected id's 1,2,3 SELECT TOP 50 PERCENT * FROM TheTable ORDER BY TheID desc results in the selected id's 3,4,5 3 is a dup. In real life of course the queries are fairly complicated with a ton of where clauses and subqueries.

    Read the article

  • I tried installing Ubuntu 10.04 and I got this message - any ideas on what to do?

    - by vette982
    No root file system defined. Please correct this from the partition menu. This message shows up when I first boot into Ubuntu after the installation. I installed it by mounting the ISO with Daemon Tools, and I just did the default Wubi installation. I keep reading everywhere that I need to choose my installation directory, but I don't get any option to do that. These are all the options I get for installation directory. I have a C and D partition on my drive, and I tried installing it on both and no luck either way. Any ideas?

    Read the article

  • What is the best way to partition large tables in SQL Server?

    - by RyanFetz
    In a recent project the "lead" developer designed a database schema where "larger" tables would be split across two seperate databases with a view on the main database which unioned the two seperate database-tables together. The main database is what the application was driven off of so these tables looked and felt like ordinary tables (except some quirkly things around updating). This seemed like a HUGE performance problem. We do see problems with performance around these tables but nothing to make him change his mind about his design. Just wondering what is the best way to do this, or if it is even worth doing?

    Read the article

  • What is a good practice for 2D scene graph partitioning for culling?

    - by DevilWithin
    I need to know an efficient way to cull the scene graph objects, to render exclusively the ones in the view, and as fast as possible. I am thinking of doing it the following way, having in each object a local boundingbox which holds the object bounds, and a global boundingbox which holds the bounds of the object and all children. When a camera is moved, the render list is updated by traversing the global boundingboxes. When only the object is being moved, it tries to enlarge or shrink the ancestors global boundingboxes, and in the end updating or not the renderlist. What do you think of this approach? Do you think it will provide a fast and efficient culling? Also, because the render list is a contiguous list, it could accelerate the rendering, right? Any further tips for a 2D scene graphs are highly appreciated!

    Read the article

  • Where should the partitioning column go in the primary key on SQL Server?

    - by Bialecki
    Using SQL Server 2005 and 2008. I've got a potentially very large table (potentially hundreds of millions of rows) consisting of the following columns: CREATE TABLE ( date SMALLDATETIME, id BIGINT, value FLOAT ) which is being partitioned on column date in daily partitions. The question then is should the primary key be on date, id or value, id? I can imagine that SQL Server is smart enough to know that it's already partitioning on date and therefore, if I'm always querying for whole chunks of days, then I can have it second in the primary key. Or I can imagine that SQL Server will need that column to be first in the primary key to get the benefit of partitioning. Can anyone lend some insight into which way the table should be keyed?

    Read the article

  • Oracle Database 12 c New Partition Maintenance Features by Gwen Lazenby

    - by hamsun
    One of my favourite new features in Oracle Database 12c is the ability to perform partition maintenance operations on multiple partitions. This means we can now add, drop, truncate and merge multiple partitions in one operation, and can split a single partition into more than two partitions also in just one command. This would certainly have made my life slightly easier had it been available when I administered a data warehouse at Oracle 9i. To demonstrate this new functionality and syntax, I am going to create two tables, ORDERS and ORDERS_ITEMS which have a parent-child relationship. ORDERS is to be partitioned using range partitioning on the ORDER_DATE column, and ORDER_ITEMS is going to partitioned using reference partitioning and its foreign key relationship with the ORDERS table. This form of partitioning was a new feature in 11g and means that any partition maintenance operations performed on the ORDERS table will also take place on the ORDER_ITEMS table as well. First create the ORDERS table - SQL CREATE TABLE orders ( order_id NUMBER(12), order_date TIMESTAMP, order_mode VARCHAR2(8), customer_id NUMBER(6), order_status NUMBER(2), order_total NUMBER(8,2), sales_rep_id NUMBER(6), promotion_id NUMBER(6), CONSTRAINT orders_pk PRIMARY KEY(order_id) ) PARTITION BY RANGE(order_date) (PARTITION Q1_2007 VALUES LESS THAN (TO_DATE('01-APR-2007','DD-MON-YYYY')), PARTITION Q2_2007 VALUES LESS THAN (TO_DATE('01-JUL-2007','DD-MON-YYYY')), PARTITION Q3_2007 VALUES LESS THAN (TO_DATE('01-OCT-2007','DD-MON-YYYY')), PARTITION Q4_2007 VALUES LESS THAN (TO_DATE('01-JAN-2008','DD-MON-YYYY')) ); Table created. Now the ORDER_ITEMS table SQL CREATE TABLE order_items ( order_id NUMBER(12) NOT NULL, line_item_id NUMBER(3) NOT NULL, product_id NUMBER(6) NOT NULL, unit_price NUMBER(8,2), quantity NUMBER(8), CONSTRAINT order_items_fk FOREIGN KEY(order_id) REFERENCES orders(order_id) on delete cascade) PARTITION BY REFERENCE(order_items_fk) tablespace example; Table created. Now look at DBA_TAB_PARTITIONS to get details of what partitions we have in the two tables – SQL select table_name,partition_name, partition_position position, high_value from dba_tab_partitions where table_owner='SH' and table_name like 'ORDER_%' order by partition_position, table_name; TABLE_NAME PARTITION_NAME POSITION HIGH_VALUE -------------- --------------- -------- ------------------------- ORDERS Q1_2007 1 TIMESTAMP' 2007-04-01 00:00:00' ORDER_ITEMS Q1_2007 1 ORDERS Q2_2007 2 TIMESTAMP' 2007-07-01 00:00:00' ORDER_ITEMS Q2_2007 2 ORDERS Q3_2007 3 TIMESTAMP' 2007-10-01 00:00:00' ORDER_ITEMS Q3_2007 3 ORDERS Q4_2007 4 TIMESTAMP' 2008-01-01 00:00:00' ORDER_ITEMS Q4_2007 4 Just as an aside it is also now possible in 12c to use interval partitioning on reference partitioned tables. In 11g it was not possible to combine these two new partitioning features. For our first example of the new 12cfunctionality, let us add all the partitions necessary for 2008 to the tables using one command. Notice that the partition specification part of the add command is identical in format to the partition specification part of the create command as shown above - SQL alter table orders add PARTITION Q1_2008 VALUES LESS THAN (TO_DATE('01-APR-2008','DD-MON-YYYY')), PARTITION Q2_2008 VALUES LESS THAN (TO_DATE('01-JUL-2008','DD-MON-YYYY')), PARTITION Q3_2008 VALUES LESS THAN (TO_DATE('01-OCT-2008','DD-MON-YYYY')), PARTITION Q4_2008 VALUES LESS THAN (TO_DATE('01-JAN-2009','DD-MON-YYYY')); Table altered. Now look at DBA_TAB_PARTITIONS and we can see that the 4 new partitions have been added to both tables – SQL select table_name,partition_name, partition_position position, high_value from dba_tab_partitions where table_owner='SH' and table_name like 'ORDER_%' order by partition_position, table_name; TABLE_NAME PARTITION_NAME POSITION HIGH_VALUE -------------- --------------- -------- ------------------------- ORDERS Q1_2007 1 TIMESTAMP' 2007-04-01 00:00:00' ORDER_ITEMS Q1_2007 1 ORDERS Q2_2007 2 TIMESTAMP' 2007-07-01 00:00:00' ORDER_ITEMS Q2_2007 2 ORDERS Q3_2007 3 TIMESTAMP' 2007-10-01 00:00:00' ORDER_ITEMS Q3_2007 3 ORDERS Q4_2007 4 TIMESTAMP' 2008-01-01 00:00:00' ORDER_ITEMS Q4_2007 4 ORDERS Q1_2008 5 TIMESTAMP' 2008-04-01 00:00:00' ORDER_ITEMS Q1_2008 5 ORDERS Q2_2008 6 TIMESTAMP' 2008-07-01 00:00:00' ORDER_ITEM Q2_2008 6 ORDERS Q3_2008 7 TIMESTAMP' 2008-10-01 00:00:00' ORDER_ITEMS Q3_2008 7 ORDERS Q4_2008 8 TIMESTAMP' 2009-01-01 00:00:00' ORDER_ITEMS Q4_2008 8 Next, we can drop or truncate multiple partitions by giving a comma separated list in the alter table command. Note the use of the plural ‘partitions’ in the command as opposed to the singular ‘partition’ prior to 12c– SQL alter table orders drop partitions Q3_2008,Q2_2008,Q1_2008; Table altered. Now look at DBA_TAB_PARTITIONS and we can see that the 3 partitions have been dropped in both the two tables – TABLE_NAME PARTITION_NAME POSITION HIGH_VALUE -------------- --------------- -------- ------------------------- ORDERS Q1_2007 1 TIMESTAMP' 2007-04-01 00:00:00' ORDER_ITEMS Q1_2007 1 ORDERS Q2_2007 2 TIMESTAMP' 2007-07-01 00:00:00' ORDER_ITEMS Q2_2007 2 ORDERS Q3_2007 3 TIMESTAMP' 2007-10-01 00:00:00' ORDER_ITEMS Q3_2007 3 ORDERS Q4_2007 4 TIMESTAMP' 2008-01-01 00:00:00' ORDER_ITEMS Q4_2007 4 ORDERS Q4_2008 5 TIMESTAMP' 2009-01-01 00:00:00' ORDER_ITEMS Q4_2008 5 Now let us merge all the 2007 partitions together to form one single partition – SQL alter table orders merge partitions Q1_2005, Q2_2005, Q3_2005, Q4_2005 into partition Y_2007; Table altered. TABLE_NAME PARTITION_NAME POSITION HIGH_VALUE -------------- --------------- -------- ------------------------- ORDERS Y_2007 1 TIMESTAMP' 2008-01-01 00:00:00' ORDER_ITEMS Y_2007 1 ORDERS Q4_2008 2 TIMESTAMP' 2009-01-01 00:00:00' ORDER_ITEMS Q4_2008 2 Splitting partitions is a slightly more involved. In the case of range partitioning one of the new partitions must have no high value defined, and in list partitioning one of the new partitions must have no list of values defined. I call these partitions the ‘everything else’ partitions, and will contain any rows contained in the original partition that are not contained in the any of the other new partitions. For example, let us split the Y_2007 partition back into 4 quarterly partitions – SQL alter table orders split partition Y_2007 into (PARTITION Q1_2007 VALUES LESS THAN (TO_DATE('01-APR-2007','DD-MON-YYYY')), PARTITION Q2_2007 VALUES LESS THAN (TO_DATE('01-JUL-2007','DD-MON-YYYY')), PARTITION Q3_2007 VALUES LESS THAN (TO_DATE('01-OCT-2007','DD-MON-YYYY')), PARTITION Q4_2007); Now look at DBA_TAB_PARTITIONS to get details of the new partitions – TABLE_NAME PARTITION_NAME POSITION HIGH_VALUE -------------- --------------- -------- ------------------------- ORDERS Q1_2007 1 TIMESTAMP' 2007-04-01 00:00:00' ORDER_ITEMS Q1_2007 1 ORDERS Q2_2007 2 TIMESTAMP' 2007-07-01 00:00:00' ORDER_ITEMS Q2_2007 2 ORDERS Q3_2007 3 TIMESTAMP' 2007-10-01 00:00:00' ORDER_ITEMS Q3_2007 3 ORDERS Q4_2007 4 TIMESTAMP' 2008-01-01 00:00:00' ORDER_ITEMS Q4_2007 4 ORDERS Q4_2008 5 TIMESTAMP' 2009-01-01 00:00:00' ORDER_ITEMS Q4_2008 5 Partition Q4_2007 has a high value equal to the high value of the original Y_2007 partition, and so has inherited its upper boundary from the partition that was split. As for a list partitioning example let look at the following another table, SALES_PAR_LIST, which has 2 partitions, Americas and Europe and a partitioning key of country_name. SQL select table_name,partition_name, high_value from dba_tab_partitions where table_owner='SH' and table_name = 'SALES_PAR_LIST'; TABLE_NAME PARTITION_NAME HIGH_VALUE -------------- --------------- ----------------------------- SALES_PAR_LIST AMERICAS 'Argentina', 'Canada', 'Peru', 'USA', 'Honduras', 'Brazil', 'Nicaragua' SALES_PAR_LIST EUROPE 'France', 'Spain', 'Ireland', 'Germany', 'Belgium', 'Portugal', 'Denmark' Now split the Americas partition into 3 partitions – SQL alter table sales_par_list split partition americas into (partition south_america values ('Argentina','Peru','Brazil'), partition north_america values('Canada','USA'), partition central_america); Table altered. Note that no list of values was given for the ‘Central America’ partition. However it should have inherited any values in the original ‘Americas’ partition that were not assigned to either the ‘North America’ or ‘South America’ partitions. We can confirm this by looking at the DBA_TAB_PARTITIONS view. SQL select table_name,partition_name, high_value from dba_tab_partitions where table_owner='SH' and table_name = 'SALES_PAR_LIST'; TABLE_NAME PARTITION_NAME HIGH_VALUE --------------- --------------- -------------------------------- SALES_PAR_LIST SOUTH_AMERICA 'Argentina', 'Peru', 'Brazil' SALES_PAR_LIST NORTH_AMERICA 'Canada', 'USA' SALES_PAR_LIST CENTRAL_AMERICA 'Honduras', 'Nicaragua' SALES_PAR_LIST EUROPE 'France', 'Spain', 'Ireland', 'Germany', 'Belgium', 'Portugal', 'Denmark' In conclusion, I hope that DBA’s whose work involves maintaining partitions will find the operations a bit more straight forward to carry out once they have upgraded to Oracle Database 12c. Gwen Lazenby is a Principal Training Consultant at Oracle. She is part of Oracle University's Core Technology delivery team based in the UK, teaching Database Administration and Linux courses. Her specialist topics include using Oracle Partitioning and Parallelism in Data Warehouse environments, as well as Oracle Spatial and RMAN.

    Read the article

  • Installing Solaris with Window. Disk partitioning problem.

    - by RishiPatel
    I am running windows and want to install the solaris but having hard time while using installer. I have 1 primary partition of 40GB and one extended partition. Extended partition have 4 logical drive. Solaris disk management window show only two partition one is of 40GB(Primary) and second is Extended partition. Can i convert a logical drive into primary partition( I have one free of 25 GB). Please look at the screenshot of disk management utitlity of window. http://img195.imageshack.us/img195/2005/20726948.jpg Is there any way to install solaris without reformatting the and repartitioning whole drive? In case it is not possible how should i partition and using which utility ?

    Read the article

  • Decrease in disk performance after partitioning and encryption, is this much of a drop normal?

    - by Biohazard
    I have a server that I only have remote access to. Earlier in the week I repartitioned the 2 disk raid as follows: Filesystem Size Used Avail Use% Mounted on /dev/mapper/sda1_crypt 363G 1.8G 343G 1% / tmpfs 2.0G 0 2.0G 0% /lib/init/rw udev 2.0G 140K 2.0G 1% /dev tmpfs 2.0G 0 2.0G 0% /dev/shm /dev/sda5 461M 26M 412M 6% /boot /dev/sda7 179G 8.6G 162G 6% /data The raid consists of 2 x 300gb SAS 15k disks. Prior to the changes I made, it was being used as a single unencrypted root parition and hdparm -t /dev/sda was giving readings around 240mb/s, which I still get if I do it now: /dev/sda: Timing buffered disk reads: 730 MB in 3.00 seconds = 243.06 MB/sec Since the repartition and encryption, I get the following on the separate partitions: Unencrypted /dev/sda7: /dev/sda7: Timing buffered disk reads: 540 MB in 3.00 seconds = 179.78 MB/sec Unencrypted /dev/sda5: /dev/sda5: Timing buffered disk reads: 476 MB in 2.55 seconds = 186.86 MB/sec Encrypted /dev/mapper/sda1_crypt: /dev/mapper/sda1_crypt: Timing buffered disk reads: 150 MB in 3.03 seconds = 49.54 MB/sec I expected a drop in performance on the encrypted partition, but not that much, but I didn't expect I would get a drop in performance on the other partitions at all. The other hardware in the server is: 2 x Quad Core Intel(R) Xeon(R) CPU E5405 @ 2.00GHz and 4gb RAM $ cat /proc/scsi/scsi Attached devices: Host: scsi0 Channel: 00 Id: 32 Lun: 00 Vendor: DP Model: BACKPLANE Rev: 1.05 Type: Enclosure ANSI SCSI revision: 05 Host: scsi0 Channel: 02 Id: 00 Lun: 00 Vendor: DELL Model: PERC 6/i Rev: 1.11 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 00 Vendor: HL-DT-ST Model: CD-ROM GCR-8240N Rev: 1.10 Type: CD-ROM ANSI SCSI revision: 05 I'm guessing this means the server has a PERC 6/i RAID controller? The encryption was done with default settings during debian 6 installation. I can't recall the exact specifics and am not sure how I go about finding them? Thanks

    Read the article

  • How does the process of disk partitioning actually work on most HDD's?

    - by Dark Templar
    From what I know of most laptops, you are able to "partition" your disk into as many other drives as you please. The more you cut it up, the smaller your partitions are, but from an organizational point of view, this may be desirable... I was wondering how the filesystem itself becomes partitioned underneath the partitions visible to the user. For instance, a laptop disk is usually divided into platters, each with two surfaces. The surfaces are further divided into "tracks". I guess what I am asking is, is it possible to identify how the disk itself keeps track of partitions? (whether each partition has its own platter? each partition has its own set of adjacent tracks? or some other configuration, or whether the data from different partitions are just randomly interleaved and scattered throughout the disk?)

    Read the article

  • Oracle Open World 2012: SQL Developer Recap

    - by thatjeffsmith
    Last week was the ‘big show’ in San Francisco. I was very happy to meet many of you in person. And many of you had questions – lots of questions! We had full or overflowing rooms for our sessions and hands-on-labs. The SQL Developer ‘booths’ were also slammed several times. So exciting to see so many of YOU excited about SQL Developer. It’s very cool to hear the stories of our tools saving you and your organizations so much time (and money!) Instead of doing a Day 0 – Day 9 recap, I thought I’d share with you the questions that I heard more than once. And just for giggles, I’ll throw in some answers as well So in no particular order… What’s the difference between Oracle SQL Developer & Oracle SQL Developer Data Modeler? Mathematically speaking – two words. But as far as the actual modeling features go, there’s no difference between the two applications. The same ‘code’ or features as it pertains to data modeling and design are in both tools. However, in SQL Developer you have all of the OTHER features fighting for real estate in the UI. So I have a general rule of thumb – if you spend MOST of your time in the database, use SQL Developer. And if you spend most of your time in the data model, run the separate and dedicated program, Oracle SQL Developer Data Modeler. Here’s a couple of screenshots to drive home the UI point: Oracle SQL Developer Oracle SQL Developer Data Modeler running INSIDE of SQL Developer. Notice how the Modeler menu items fold under the file menu? Oracle SQL Developer Data Modeler Easier to navigate and manipulate your models with the stand alone modeler. Just no worksheet to run your ad-hoc queries, etc. Don’t forget you can disable the Data Modeler inside of SQL Developer via the Extensions preference page. How can I model my table partitions? Partitioning is defined via the Physical model. So after you have finished your relational model, you need to generate a physical model. Oracle SQL Developer Data Modeler Physical Model and Partitioning Open the properties for your physical model table. Enable the ‘partitioned’ property. Once you do so, the ‘Partitioning’ page will activate. Lots and lots of partitioning support and options here But what about Interval Partitioning? An extension of range partitioning in 11gR2, we don’t currently support this partitioning scheme in SQL Developer. But we’re working on it! Can SQL Developer ignore column order when comparing models? Yes! After you start a model compare, one of your options is to disregard the order of an attribute or column definition. Tell SQL Developer you don’t care when your column shows up, just as long as it DOES show up. Wow, you got a lot of questions around modeling! Is that normal? Yes! While we appreciate that many folks inherit their applications and associated designs, new applications are being ‘born’ every day. Since both of our tools are free for anyone to design their new Oracle applications with, we attract a fair amount of attention I want to do a Hands On Lab. How do I get your software and instructional guides? Go here. Download VirtualBox. Then download the VB image. Import the appliance. Start it. Connect oracle/oracle on the OEL VM. Click on ‘Start Here’ in the desktop. Follow the instructions. If you need help, ask away! You went too fast in your Tips & Tricks session. Do you have cliff notes? Yes! And you’re SO close to finding them! Just go to my SQL Developer resources page. All of my tips are documented on this blog somewhere. I’ve indexed the most popular ones on the resource page. You can use the Search dialog on the right to find the rest. Or just send me a comment or question, and I’ll do my best to answer them as they come in.

    Read the article

  • Master Note for Generic Data Warehousing

    - by lajos.varady(at)oracle.com
    ++++++++++++++++++++++++++++++++++++++++++++++++++++ The complete and the most recent version of this article can be viewed from My Oracle Support Knowledge Section. Master Note for Generic Data Warehousing [ID 1269175.1] ++++++++++++++++++++++++++++++++++++++++++++++++++++In this Document   Purpose   Master Note for Generic Data Warehousing      Components covered      Oracle Database Data Warehousing specific documents for recent versions      Technology Network Product Homes      Master Notes available in My Oracle Support      White Papers      Technical Presentations Platforms: 1-914CU; This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review. Applies to: Oracle Server - Enterprise Edition - Version: 9.2.0.1 to 11.2.0.2 - Release: 9.2 to 11.2Information in this document applies to any platform. Purpose Provide navigation path Master Note for Generic Data Warehousing Components covered Read Only Materialized ViewsQuery RewriteDatabase Object PartitioningParallel Execution and Parallel QueryDatabase CompressionTransportable TablespacesOracle Online Analytical Processing (OLAP)Oracle Data MiningOracle Database Data Warehousing specific documents for recent versions 11g Release 2 (11.2)11g Release 1 (11.1)10g Release 2 (10.2)10g Release 1 (10.1)9i Release 2 (9.2)9i Release 1 (9.0)Technology Network Product HomesOracle Partitioning Advanced CompressionOracle Data MiningOracle OLAPMaster Notes available in My Oracle SupportThese technical articles have been written by Oracle Support Engineers to provide proactive and top level information and knowledge about the components of thedatabase we handle under the "Database Datawarehousing".Note 1166564.1 Master Note: Transportable Tablespaces (TTS) -- Common Questions and IssuesNote 1087507.1 Master Note for MVIEW 'ORA-' error diagnosis. For Materialized View CREATE or REFRESHNote 1102801.1 Master Note: How to Get a 10046 trace for a Parallel QueryNote 1097154.1 Master Note Parallel Execution Wait Events Note 1107593.1 Master Note for the Oracle OLAP OptionNote 1087643.1 Master Note for Oracle Data MiningNote 1215173.1 Master Note for Query RewriteNote 1223705.1 Master Note for OLTP Compression Note 1269175.1 Master Note for Generic Data WarehousingWhite Papers Transportable Tablespaces white papers Database Upgrade Using Transportable Tablespaces:Oracle Database 11g Release 1 (February 2009) Platform Migration Using Transportable Database Oracle Database 11g and 10g Release 2 (August 2008) Database Upgrade using Transportable Tablespaces: Oracle Database 10g Release 2 (April 2007) Platform Migration using Transportable Tablespaces: Oracle Database 10g Release 2 (April 2007)Parallel Execution and Parallel Query white papers Best Practices for Workload Management of a Data Warehouse on the Sun Oracle Database Machine (June 2010) Effective resource utilization by In-Memory Parallel Execution in Oracle Real Application Clusters 11g Release 2 (Feb 2010) Parallel Execution Fundamentals in Oracle Database 11g Release 2 (November 2009) Parallel Execution with Oracle Database 10g Release 2 (June 2005)Oracle Data Mining white paper Oracle Data Mining 11g Release 2 (March 2010)Partitioning white papers Partitioning with Oracle Database 11g Release 2 (September 2009) Partitioning in Oracle Database 11g (June 2007)Materialized Views and Query Rewrite white papers Oracle Materialized Views  and Query Rewrite (May 2005) Improving Performance using Query Rewrite in Oracle Database 10g (December 2003)Database Compression white papers Advanced Compression with Oracle Database 11g Release 2 (September 2009) Table Compression in Oracle Database 10g Release 2 (May 2005)Oracle OLAP white papers On-line Analytic Processing with Oracle Database 11g Release 2 (September 2009) Using Oracle Business Intelligence Enterprise Edition with the OLAP Option to Oracle Database 11g (July 2008)Generic white papers Enabling Pervasive BI through a Practical Data Warehouse Reference Architecture (February 2010) Optimizing and Protecting Storage with Oracle Database 11g Release 2 (November 2009) Oracle Database 11g for Data Warehousing and Business Intelligence (August 2009) Best practices for a Data Warehouse on Oracle Database 11g (September 2008)Technical PresentationsA selection of ObE - Oracle by Examples documents: Generic Using Basic Database Functionality for Data Warehousing (10g) Partitioning Manipulating Partitions in Oracle Database (11g Release 1) Using High-Speed Data Loading and Rolling Window Operations with Partitioning (11g Release 1) Using Partitioned Outer Join to Fill Gaps in Sparse Data (10g) Materialized View and Query Rewrite Using Materialized Views and Query Rewrite Capabilities (10g) Using the SQLAccess Advisor to Recommend Materialized Views and Indexes (10g) Oracle OLAP Using Microsoft Excel With Oracle 11g Cubes (how to analyze data in Oracle OLAP Cubes using Excel's native capabilities) Using Oracle OLAP 11g With Oracle BI Enterprise Edition (Creating OBIEE Metadata for OLAP 11g Cubes and querying those in BI Answers) Building OLAP 11g Cubes Querying OLAP 11g Cubes Creating Interactive APEX Reports Over OLAP 11g CubesSelection of presentations from the BIWA website:Extreme Data Warehousing With Exadata  by Hermann Baer (July 2010) (slides 2.5MB, recording 54MB)Data Mining Made Easy! Introducing Oracle Data Miner 11g Release 2 New "Work flow" GUI   by Charlie Berger (May 2010) (slides 4.8MB, recording 85MB )Best Practices for Deploying a Data Warehouse on Oracle Database 11g  by Maria Colgan (December 2009)  (slides 3MB, recording 18MB, white paper 3MB )

    Read the article

  • MS Sync framework - Identity crisis resolution by partitioning the primary key.

    - by user326136
    Hello, We implementing offline feature to an existing application. We have implemented the syn with SQL Server internal change tracking and over WCF using MS Sync Framework (http://msdn.microsoft.com/en-us/sync/default.aspx) All of our tables have primary key as integer, we cannot move to GUID. So as you are thinking we will have identity crises between applications. So we decided to go with the way Merge replication does(http://msdn.microsoft.com/en-us/library/aa179416(SQL.80).aspx) partition the primary key range. Below is the example scenario - Server Table A - ID Range - 0 to 100 Client 1 Table A - ID Range - 101 to 200 Client 2 Table A - ID Range - 201 to 300 how to implement this ? i know we can use BCC CHECKIDENT (yourtable, reseed, value) CHECK (([ID]<=(100))) but this does not solve the issue.... Merge replication provides an option of "Not for replication"(http://msdn.microsoft.com/en-us/library/aa237102(SQL.80).aspx) to achieve insert form clients and still maintain the set range.. can i use that somehow here? please help...

    Read the article

  • Improving Partitioned Table Join Performance

    - by Paul White
    The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) );   CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x,   CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY);   WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50);   -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE);   -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory:   Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME();   CREATE TABLE #P ( partition_number integer PRIMARY KEY);   INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1;   SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals;   DROP TABLE #P;   SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated  – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

    Read the article

  • 'Multi' partition recipe from Ubuntu alternate install CD without preseed

    - by Nick Meyer
    The Ubuntu/Debian installer includes a built-in guided partitioning recipe, called 'multi', which creates separate /home, /usr, /var, and /tmp partitions. It can be selected by starting the installer with a preseed file. You can see it described in the Karmic install guide: # You can choose one of the three predefined partitioning recipes: # - atomic: all files in one partition # - home: separate /home partition # - multi: separate /home, /usr, /var, and /tmp partitions d-i partman-auto/choose_recipe select atomic Is there any way to use guided partitioning with this recipe from the Ubuntu alternate install CD without the need to create a preseed file?

    Read the article

< Previous Page | 28 29 30 31 32 33 34 35 36 37 38 39  | Next Page >