Search Results

Search found 30858 results on 1235 pages for 'database tuning'.

  • DB2 on SPARC T3 Tuning Tips

    - by cherry.shu(at)oracle.com
    With the new self-tuning feature in DB2 V9.x, many database parameters are set to AUTOMATIC by default in DB2 V9.7 so that DB2 can adjust their values as needed. Most work fine without manual tweaks, but for transaction workloads on SPARC T3 systems, two parameters need to be adjusted manually to achieve optimal performance. DATABASE_MEMORY: When this parameter is set to AUTOMATIC and SELF_TUNING_MEM is set to ON, DB2 allocates all memory with a small page size (64KB) and expands and shrinks the memory as needed. To take advantage of the large page size (up to 256MB) supported by the SPARC T3, we need to set DATABASE_MEMORY manually so that DB2 can use the 256MB page size for its buffer pools, which are implemented as ISM segments. I know this sounds strange, as it seems that you turn one switch and it ends up controlling another function. pmap(1M) output can verify the page sizes used by the DB2 db2sysc process. NUM_IOCLEANERS: This parameter defines the number of page cleaners. Its default value is AUTOMATIC, which is calculated from the number of available CPUs and the number of logical partitions. On a SPARC T3 system with over a hundred virtual CPUs and a single DB2 partition, DB2 sets it to #CPUs - 1. That leads to too many page cleaners competing to flush pages to disk and causes aio mutex lock contention, so we need to decrease the value. A good practice is to set it to the number of physical devices used by the database's table space containers.
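
    A minimal sketch of how these two changes might be applied, assuming the SQL route through the ADMIN_CMD procedure while connected to the target database (the values shown are placeholders, not recommendations; the equivalent db2 CLP "update db cfg" commands work as well):

        -- DATABASE_MEMORY is specified in 4 KB pages; fixing it (instead of AUTOMATIC)
        -- lets the buffer pools be backed by large 256 MB ISM pages on SPARC T3.
        CALL SYSPROC.ADMIN_CMD('UPDATE DB CFG USING DATABASE_MEMORY 10000000');

        -- Drop the page cleaners from the AUTOMATIC (#CPUs - 1) value to the number
        -- of physical devices behind the table space containers, e.g. 8.
        CALL SYSPROC.ADMIN_CMD('UPDATE DB CFG USING NUM_IOCLEANERS 8');

        -- Check the resulting values:
        SELECT NAME, VALUE
        FROM SYSIBMADM.DBCFG
        WHERE NAME IN ('database_memory', 'num_iocleaners');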

    Read the article

  • Geek City: Clearing Plans for a Single Database

    - by Kalen Delaney
    I know Friday afternoon isn't the best time for blogging, as everyone is going home now, and by Monday morning this post will be old news. But I'm not shutting down just yet, and something came up this week that I just realized not everybody knew about, so I decided to blog it. Many (or most?) of you are aware that you can clear all cached plans using DBCC FREEPROCCACHE. In addition, there are certain configuration options whose values, when changed, will cause all plans in cache to be removed....(read more)
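
    For reference, a short T-SQL sketch of the server-wide command mentioned above, plus the documented way to remove one plan by its handle (the per-database technique is what the full article covers):

        -- Removes every plan from the plan cache (server-wide).
        DBCC FREEPROCCACHE;

        -- A single plan can be located via the plan-cache DMVs ...
        SELECT cp.plan_handle, st.text
        FROM sys.dm_exec_cached_plans AS cp
        CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
        WHERE st.text LIKE '%some query text%';

        -- ... and removed on its own by passing the handle:
        -- DBCC FREEPROCCACHE (plan_handle);  -- use the handle from the query above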

    Read the article

  • Here Comes the FY11 Earmarks Database

    - by Mike C
    I'm really interested in politics (don't worry, I'm not going to start bashing politicians and hammering you with political rage). The point is that when the U.S. FY11 Omnibus Spending Bill (the bill to fund the U.S. Government for another year) was announced, it piqued my interest. I'm fascinated by "earmarks" (also affectionately known as "pork"). For those who aren't familiar with U.S. politics, "earmark" is a slang term for "Congressionally Directed Spending". It's basically the set of provisions...(read more)

    Read the article

  • Storing Attendance Data in database

    - by Ali Abbas
    So I have to store the daily attendance of my organisation's employees from my application. The part where I need some help is the efficient way to store attendance data. After some research and brainstorming I came up with a few approaches. Could you point out which one is best, and any non-obvious ill effects of the approaches mentioned? The approaches are as follows: 1) Create a single table for the whole organisation and store empid, date, presentstatus as a row for every employee every day. 2) Create a single table for the whole organisation and store a single row for each day with a comma-delimited string of the empids that are absent; I would generate the string in my application. 3) Create different tables for each department and follow the first method. Please share your views and do mention any other good methods.
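
    To make the first approach concrete, here is a hedged sketch with hypothetical names and types; one row per employee per day keeps the data normalised and easy to index and query:

        -- Option 1: one row per employee per day (illustrative schema).
        CREATE TABLE attendance (
            emp_id          INT     NOT NULL,
            attendance_date DATE    NOT NULL,
            present         CHAR(1) NOT NULL,   -- 'P' = present, 'A' = absent
            PRIMARY KEY (emp_id, attendance_date)
        );

        -- Typical report: absences for one employee in a given month.
        SELECT COUNT(*)
        FROM attendance
        WHERE emp_id = 42
          AND present = 'A'
          AND attendance_date BETWEEN '2012-06-01' AND '2012-06-30';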

    Read the article

  • Database Administration as a Service

    A DBA should provide two things, a service and leadership. For Grant Fritchey, it was whilst serving a role in the Scouts of America that he had his epiphany. Creative chaos and energy, if tactfully harnessed and directed, led to effective ways to perform team-based tasks. Then he wondered why these skills couldn't be applied to the workplace. Are we DBAs doing it wrong in the way we interact with our co-workers?

    Read the article

  • Would a model like this translate well to a document or graph database?

    - by Eric
    I'm trying to understand what types of models, which I have traditionally persisted relationally, would translate well to some kind of NoSQL database. Suppose I have a model with the following relationships: Product 1-----0..N Order Customer 1-----0..N Order And suppose I need to frequently query things like All Orders, All Products, All Customers, All Orders for Given Customer, All Orders for Given Product. My feeling is that this kind of model would not denormalize cleanly - if I had Product and Customer documents with embedded Orders, both documents would have duplicate orders. So I think I'd need separate documents for all three entities. Does a characteristic like this typically indicate that a document database is not well suited to a given model? Generally speaking, would a document database perform as well as a relational database in this kind of situation? I know very little about graph databases, but I understand that a graph database handles relationships more efficiently than a document database - would a graph database be suited to this kind of model?
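
    For reference, a hedged relational sketch of the model in question (names are illustrative); the NoSQL decision is essentially whether Order stays a separate collection holding references like these foreign keys, or is embedded (and therefore duplicated) inside Product and Customer documents:

        -- Relational shape of the model described above.
        CREATE TABLE customer (customer_id INT PRIMARY KEY, name VARCHAR(100));
        CREATE TABLE product  (product_id  INT PRIMARY KEY, name VARCHAR(100));

        CREATE TABLE orders (
            order_id    INT PRIMARY KEY,
            customer_id INT NOT NULL REFERENCES customer (customer_id),
            product_id  INT NOT NULL REFERENCES product (product_id)
        );

        -- The frequent queries are simple filters here:
        SELECT * FROM orders WHERE customer_id = 1;  -- all orders for a given customer
        SELECT * FROM orders WHERE product_id  = 7;  -- all orders for a given product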

    Read the article

  • Optimal Database design regarding functionality of letting user share posts by other users

    - by codecool
    I want to implement functionality that lets users share posts by other users, similar to the Facebook and Google+ share buttons and the Twitter retweet. There are two choices: 1) I create a duplicate copy of the post and have a column that keeps track of the original post id and makes clear this is a shared post. 2) I have a separate shared-post table where I save the post id, which is a foreign key to the post id in the post table. In programming terms, I basically keep a pointer to the original post in a separate table, and when I need to get the posts a user posted as well as the ones they shared, I do a left join on the post and shared post tables: Post(post_id(PK), post_content, posted_by) SharedPost(post_id(FK to Post.post_id), sharing_user, sharedfrom(in case someone shares from a non-owner's profile)) I am in favour of the second choice but wanted the advice of the experts out there. One more thing: posts on my web app will be closer to Facebook size than tweet size.
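
    A hedged sketch of the second option, using the column names from the post (data types are assumptions) and the kind of left join described:

        CREATE TABLE post (
            post_id      BIGINT PRIMARY KEY,
            post_content TEXT   NOT NULL,
            posted_by    BIGINT NOT NULL
        );

        CREATE TABLE shared_post (
            post_id      BIGINT NOT NULL REFERENCES post (post_id),
            sharing_user BIGINT NOT NULL,
            shared_from  BIGINT,   -- profile the post was shared from, if not the owner's
            PRIMARY KEY (post_id, sharing_user)
        );

        -- Posts authored by user 42 plus posts user 42 has shared:
        SELECT p.post_id, p.post_content, p.posted_by, sp.sharing_user
        FROM post AS p
        LEFT JOIN shared_post AS sp
               ON sp.post_id = p.post_id
              AND sp.sharing_user = 42
        WHERE p.posted_by = 42
           OR sp.sharing_user = 42;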

    Read the article

  • Android application Database Framework

    - by Marek Sebera
    When creating mobile (especially Android) applications, I usually run into a similar pattern of working with data. Usually I need to fetch some remote data (behind an authorization process) into a local cache. And on the next request: check networking; check for the presence of the cache file; check the version of the cache file (if networking is available); get the new version and save the cache (if networking is available and the file is not in the cache, or is outdated). The data store is a NoSQL, JSON document-based one (and yes, I know about the Android version of CouchDB, but it doesn't fit my needs yet). The process of authorizing against the data source and the code that checks the version of the local cache are specific to each application, but the other code (handling the network, saving the cache, handling exceptions, ...) is always the same. Is there any data-store helper I can use that provides the functions I described above?

    Read the article

  • How to deal with transactions when creating a database connection for each query

    - by webnoob
    In line with this post here, I am going to change my website to create a connection per query to take advantage of .NET's connection pooling. With this in mind, I don't know how I should deal with transactions. At the moment I do something like this (pseudo-code): GlobalTransaction = GlobalDBConnection.BeginTransaction(); try { ExecSQL("insert into table ..") ExecSQL("update some_table ..") .... GlobalTransaction.Commit(); }catch{ GlobalTransaction.Rollback(); throw; } ExecSQL would be like this: using (SqlCommand Command = GlobalDBConnection.CreateCommand()) { Command.Connection = GlobalDBConnection; Command.Transaction = GlobalTransaction; Command.CommandText = SQLStr; Command.ExecuteNonQuery(); } I'm not quite sure how to change this concept to deal with transactions if the connection is created within ExecSQL, because I would want the transaction to be shared between both the insert and update routines.

    Read the article

  • Looking for free, specific Ip2Location Database

    - by Andresch Serj
    I am searching for a free db (like an updated XML or CSV file) that relates IP addresses to specific locations. I want more information than just the country. I want some sort of region or city reference, even if that ends up to be a number that makes no sense to me. Doesn't have to be super correct or always up to date either. It is just to distinguish between user groups and not to monitor or spy on them.

    Read the article

  • Modular Database Structures

    - by John D
    I have been examining the code base we use at work and I am worried about the size the packages have grown to. The actual code is modular; procedures have been broken down into small functional (and testable) parts. The issue I see is that we have 100 procedures in a single package - almost an entire domain model. I had thought of breaking these packages down - creating sub-domains centered around each procedure's relationships to other objects: group a bunch of procedures that have 80% of their relationships to three tables, etc. The end result would be a lot more packages, but the packages would be smaller and I feel the entire code base would be more readable - when procedures cross between two domain models it is less of a struggle to figure out which package each belongs to. The problem I now have is figuring out what the actual benefit of all this would really be. I looked at the general advantages of modularity: 1. Re-usability 2. Asynchronous Development 3. Maintainability Yet when I consider our latest development, the procedures within the packages are already reusable. At this advanced stage we rarely require asynchronous development - and when it is required we simply ladder the stories across iterations. So I guess my question is whether people know of reasons why you would break down classes rather than just the methods inside of classes? Right now I do believe there is an issue with these mega packages forming, but the only benefit I can really pin down to breaking them down is readability - something that experience gained from working with them would solve.

    Read the article

  • Continuous Delivery and the Database

    Continuous Delivery is fairly generally understood to be an effective way of tackling the problems of software delivery and deployment by making build, integration and delivery into a routine. The way that databases fit into the Continuous Delivery story has been less well defined. Phil Factor explains why he's an enthusiast for databases being full participants, and suggests practical ways of doing so.

    Read the article

  • Set modified date = created date or null on record creation?

    - by User
    I've been following the convention of adding created and modified columns to most of my database tables. I have also been leaving the modified column as null on record creation and only setting a value on an actual modification. The other alternative is to set the modified date equal to the created date on record creation. I've been doing it the former way, but I recently ran into one con that is seriously making me think of switching. I needed to set a database cache dependency to find out whether any existing data had been changed or new data added. Instead of being able to do the following: SELECT MAX(modified) FROM customer I have to do this: SELECT GREATEST(MAX(created), MAX(modified)) FROM customer The negative is that it's a more complicated and slower query. Another consideration is that file systems, I believe, usually use the second convention of setting modified date = created date on creation. What are the pros and cons of the different methods? That is, what are the issues to consider?
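
    A hedged sketch of the second convention on a hypothetical customer table; with modified always populated, the cache-dependency check stays a single-column query:

        CREATE TABLE customer (
            customer_id INT PRIMARY KEY,
            created     TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
            modified    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
            -- some engines restrict multiple CURRENT_TIMESTAMP defaults;
            -- in that case set modified explicitly in the INSERT instead
        );

        SELECT MAX(modified) FROM customer;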

    Read the article

  • Database in the cloud?

    - by Jlouro
    Some of my recent clients are asking for remote connections to the office server, for standalone work, etc., in WinForms applications. Since the concept of the web is a remote connection to a server for both data and resources, it should be possible to place both of these in the cloud and have the WinForms apps connect to it as if they were web apps. Has anyone tested this? Does it work like this? Is it fast enough? Is it secure? What is the best cloud host for this type of work?

    Read the article

  • SQL and Database: Where to start! [closed]

    - by Nizar
    First of all, I know just HTML and CSS (this is my background in web development and design), and I have found that before I move to a server-side language I need to learn about databases and SQL. My first question: do you think this order of learning is good (I mean learning SQL after HTML and CSS)? My second, related question: do I have to learn a lot about SQL and databases, or just the basics? And if you know any good beginners' books, please write their titles.

    Read the article

  • How should I implement Transaction database EJB 3.0

    - by JamesBoyZ
    In the CustomerTransactions entity, I have the following field to record what the customer bought: @ManyToMany private List<Item> listOfItemsBought; When I think more about this field, there's a chance it may not work, because merchants are allowed to change an item's information (e.g. price, discount, etc...). Hence, this field will not be able to record what the customer actually bought when the transaction occurred. At the moment, I can only think of two ways to make it work. 1) I record the transaction details in a String field. I feel that this way would be messy if I need to extract some information about the transaction later on. 2) Whenever the merchant changes an item's information, I do not update that item's fields directly. Instead, I create another new item with all the new information and keep the old item untouched. I feel that this way is better because I can easily extract information about the transaction later on. However, the downside is that my Item table may contain a lot of rows. I'd be very grateful if someone could give me advice on how I should tackle this problem. UPDATE: I'd like to add more information about the current design. public class Customer implements Serializable { @OneToMany private List<CustomerTransactions> listOfTransactions; } public class CustomerTransactions implements Serializable { @ManyToMany private List<Item> listOfItemsBought; } public class Merchant implements Serializable { @OneToMany private List<Item> listOfSellingItems; }

    Read the article

  • [Japanese] Oracle Enterprise Manager (title garbled in extraction)

    - by Yusuke.Yamamoto
    [Japanese text garbled in extraction. Recoverable fragments: dated 2010/10/19; a four-part introduction to Oracle Enterprise Manager (EM) for Oracle Database, mentioning Enterprise Edition and Standard Edition; further reading at http://oracletech.jp/products/pickup/000028.html]

    Read the article

  • SQL Server Database In Single User Mode after Failover

    - by jlichauc
    Here is a weird situation we experienced with a SQL Server 2008 Database Mirroring Failover. We have a pair of mirrored databases running in high-availability mode and both the principal and mirror showed as synchronized. As part of some maintenance I triggered a manual failover of the principal to the mirror. However after the failover the principal was now in single-user mode instead of the expected "Principal/Synchronized" state we usually get. The database had been in multi-user mode on the previous principal before this had happened. We ended up stopping all applications, restarting the SQL Server instances, and executing "ALTER DATABASE ... SET MULTI_USER" to bring the database back to the expected "Principal/Synchronized" state in a multi-user mode. Question. Does anyone know where SQL Server stores information about whether a database should be in single-user mode or not? I'm wondering if there is some system database or table that has this setting recorded somewhere. In particular we had an incident once with the database on the original principal (the one I was failing over to) where when trying to detach the database it was put into single-user mode. I'm wondering if that setting is cached somewhere and is the reason that SQL Server put it back into single-user mode after a failover.
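
    For reference, a short T-SQL sketch of the recovery step described above, plus one place the per-database user-access setting is visible (the database name is a placeholder):

        -- Bring the new principal back to multi-user after the failover.
        ALTER DATABASE [MirroredDb] SET MULTI_USER;

        -- The current setting is exposed per database in sys.databases:
        SELECT name, user_access_desc, state_desc
        FROM sys.databases;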

    Read the article

  • Optimizing MySQL for ALTER TABLE of InnoDB

    - by schuilr
    Sometime soon we will need to make schema changes to our production database. We need to minimize downtime for this effort; however, the ALTER TABLE statements are going to run for quite a while. Our largest tables have 150 million records, and the largest table file is 50G. All tables are InnoDB, and it was set up as one big data file (instead of a file per table). We're running MySQL 5.0.46 on an 8-core machine with 16G of memory and a RAID10 config. I have some experience with MySQL tuning, but it usually focuses on reads or writes from multiple clients. There is lots of information to be found on the Internet on this subject; however, there seems to be very little available on best practices for (temporarily) tuning your MySQL server to speed up ALTER TABLE on InnoDB tables, or for INSERT INTO .. SELECT FROM (we will probably use this instead of ALTER TABLE to have some more opportunities to speed things up a bit). The schema change we are planning is to add an integer column to all tables and make it the primary key, instead of the current primary key. We need to keep the 'old' column as well, so overwriting the existing values is not an option. What would be the ideal settings to get this task done as quickly as possible?
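
    A hedged sketch of the INSERT INTO .. SELECT FROM variant mentioned above, with purely illustrative table and column names: build the new table with the new integer primary key, keep the old key as an ordinary indexed column, copy the rows across, then swap the tables with an atomic rename:

        CREATE TABLE big_table_new (
            id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
            old_key BIGINT       NOT NULL,         -- former primary key, kept as a column
            payload VARCHAR(255) NOT NULL,
            PRIMARY KEY (id),
            KEY idx_old_key (old_key)
        ) ENGINE=InnoDB;

        INSERT INTO big_table_new (old_key, payload)
        SELECT old_key, payload
        FROM big_table;

        RENAME TABLE big_table TO big_table_old, big_table_new TO big_table;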

    Read the article

  • 4GB limitation on these embedded/express DBs good enough? what's next if limitation is reached?

    - by edwin.nathaniel
    I'm wondering how long it would take a (theoretical) desktop app to consume the full 4GB limit of these express/embedded database products (SQL Server Express, Oracle Express, SQLite3, etc.), provided that big blobs are stored in the filesystem. Also, what would your strategy be when it hits the 4GB? Archive the old DB. Copy 1-3 months of data to the new DB (consider this a cache strategy?). Start using the new DB from this point onward (how do you access the old data?). I understand that the answer might vary depending on how much data you store in the table/column. But please describe it based on your experience (what kind of desktop app, whether it is read- or write-heavy, and how long you guess it would take to reach the limit).

    Read the article

  • SSIS Lookup component tuning tips

    - by jamiet
    Yesterday evening I attended a London meeting of the UK SQL Server User Group at Microsoft's offices in London Victoria. As usual it was both a fun and informative evening, and in particular there seemed to be a few questions arising about tuning the SSIS Lookup component; I rattled off some comments and figured it would be prudent to drop some of them into a dedicated blog post, hence the one you are reading right now.

    Scene setting. A popular pattern in SSIS is to use a Lookup component to determine whether a record in the pipeline already exists in the intended destination table or not, and I cover this pattern in my 2006 blog post Checking if a row exists and if it does, has it changed? (note to self: must rewrite that blog post for SSIS2008). Fundamentally the SSIS lookup component (when using the FullCache option) sucks some data out of a database and holds it in memory so that it can be compared to data in the pipeline. One of the big benefits of using SSIS dataflows is that they process data one buffer at a time; that means that not all of the data from your source exists in the dataflow at the same time, and it is why a SSIS dataflow can process data volumes that far exceed the available memory. However, that only applies to data in the pipeline; for reasons that are hopefully obvious, ALL of the data in the lookup set must exist in the memory cache for the duration of the dataflow's execution, which means that any memory used by the lookup cache will not be available to be used as a pipeline buffer. Moreover, there's an obvious correlation between the amount of data in the lookup cache and the time it takes to charge that cache; the more data you have, the longer it will take to charge and the longer you have to wait until the dataflow actually starts to do anything. For these reasons your goal is simple: ensure that the lookup cache contains as little data as possible.

    General tips. Here is a simple tick list you can follow in order to tune your lookups:
    - Use a SQL statement to charge your cache, don't just pick a table from the dropdown list made available to you. (Read why in SELECT *... or select from a dropdown in an OLE DB Source component?)
    - Only pick the columns that you need; ignore everything else.
    - Make the database columns that your cache is populated from as narrow as possible. If a column is defined as VARCHAR(20) then SSIS will allocate 20 bytes for every value in that column – that is a big waste if the actual values are significantly less than 20 characters in length.
    - Do you need DT_WSTR typed columns or will DT_STR suffice? DT_WSTR uses twice the amount of space to hold values that can be stored using a DT_STR, so if you can use DT_STR, consider doing so. The same principle goes for the numerical datatypes DT_I2/DT_I4/DT_I8.
    - Only populate the cache with data that you KNOW you will need. In other words, think about your WHERE clause!

    Thinking outside the box. It is tempting to build a large monolithic dataflow that does many things, one of which is a Lookup. Often, though, you can make better use of your available resources by, well, mixing things up a little, and here are a few ideas to get your creative juices flowing:
    - There is no rule that says everything has to happen in a single dataflow. If you have some particularly resource-intensive lookups then consider putting that lookup into a dataflow all of its own and using raw files to pass the pipeline data in and out of that dataflow.
    - Know your data. If you think, for example, that the majority of your incoming rows will match with only a small subset of your lookup data, then consider chaining multiple lookup components together; the first would use a FullCache containing that data subset, and the remaining data that doesn't find a match could be passed to a second lookup that perhaps uses a NoCache lookup, thus negating the need to pull all of that least-used lookup data into memory.
    - Do you need to process all of your incoming data all at once? If you can process different partitions of your data separately then you can partition your lookup cache as well. For example, if you are using a lookup to convert a location into a [LocationId], then why not process your data one region at a time? This will mean your lookup cache only has to contain data for the location that you are currently processing, and with the ability of the Lookup in SSIS2008 and beyond to charge the cache using a dynamically built SQL statement, you'll be able to achieve it using the same dataflow and simply loop over it using a ForEach loop.
    - Taking the previous data partitioning idea further … a dataflow can contain more than one data path, so why not split your data using a conditional split component and, again, charge your lookup caches with only the data that they need for that partition.
    - Lookups have two uses: to (1) find a matching row from the lookup set and (2) put attributes from that matching row into the pipeline. Ask yourself, do you need to do these two things at the same time? After all, once you have the key column(s) from your lookup set then you can use that key to get the rest of the attributes further downstream, perhaps even in another dataflow.
    - Are you using the same lookup data set multiple times? If so, consider the file caching option in SSIS 2008 and beyond.
    - Above all, experiment and be creative with different combinations. You may be surprised at what works.

    Final thoughts. If you want to know more about how the Lookup component differs in SSIS2008 from SSIS2005, then I have a dedicated blog post about that at Lookup component gets a makeover. I am on a mini-crusade at the moment to get a BULK MERGE feature into the database engine, the thinking being that if the database engine can quickly merge massive amounts of data in a similar manner to how it can insert massive amounts using BULK INSERT, then that's a lot of work that wouldn't have to be done in the SSIS pipeline. If you think that is a good idea then go and vote for BULK MERGE on Connect. If you have any other tips to share then please stick them in the comments. Hope this helps! @Jamiet
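
    Putting the general tips above together, the cache-charging query might look something like this hedged sketch (table and column names are placeholders):

        -- Narrow, filtered query for a FullCache lookup: only the join key and the
        -- single attribute the dataflow actually needs, restricted by a WHERE clause.
        SELECT  CustomerKey,
                CustomerAlternateKey
        FROM    dbo.DimCustomer
        WHERE   IsCurrent = 1;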

    Read the article

  • The SSIS tuning tip that everyone misses

    - by Rob Farley
    I know that everyone misses this, because I’m yet to find someone who doesn’t have a bit of an epiphany when I describe this. When tuning Data Flows in SQL Server Integration Services, people see the Data Flow as moving from the Source to the Destination, passing through a number of transformations. What people don’t consider is the Source, getting the data out of a database. Remember, the source of data for your Data Flow is not your Source Component. It’s wherever the data is, within your database, probably on a disk somewhere. You need to tune your query to optimise it for SSIS, and this is what most people fail to do. I’m not suggesting that people don’t tune their queries – there’s plenty of information out there about making sure that your queries run as fast as possible. But for SSIS, it’s not about how fast your query runs. Let me say that again, but in bolder text: The speed of an SSIS Source is not about how fast your query runs. If your query is used in a Source component for SSIS, the thing that matters is how fast it starts returning data. In particular, those first 10,000 rows to populate that first buffer, ready to pass down the rest of the transformations on its way to the Destination. Let’s look at a very simple query as an example, using the AdventureWorks database: We’re picking the different Weight values out of the Product table, and it’s doing this by scanning the table and doing a Sort. It’s a Distinct Sort, which means that the duplicates are discarded. It'll be no surprise to see that the data produced is sorted. Obvious, I know, but I'm making a comparison to what I'll do later. Before I explain the problem here, let me jump back into the SSIS world... If you’ve investigated how to tune an SSIS flow, then you’ll know that some SSIS Data Flow Transformations are known to be Blocking, some are Partially Blocking, and some are simply Row transformations. Take the SSIS Sort transformation, for example. I’m using a larger data set for this, because my small list of Weights won’t demonstrate it well enough. Seven buffers of data came out of the source, but none of them could be pushed past the Sort operator, just in case the last buffer contained the data that would be sorted into the first buffer. This is a blocking operation. Back in the land of T-SQL, we consider our Distinct Sort operator. It’s also blocking. It won’t let data through until it’s seen all of it. If you weren’t okay with blocking operations in SSIS, why would you be happy with them in an execution plan? The source of your data is not your OLE DB Source. Remember this. The source of your data is the NCIX/CIX/Heap from which it’s being pulled. Picture it like this... the data flowing from the Clustered Index, through the Distinct Sort operator, into the SELECT operator, where a series of SSIS Buffers are populated, flowing (as they get full) down through the SSIS transformations. Alright, I know that I’m taking some liberties here, because the two queries aren’t the same, but consider the visual. The data is flowing from your disk and through your execution plan before it reaches SSIS, so you could easily find that a blocking operation in your plan is just as painful as a blocking operation in your SSIS Data Flow. Luckily, T-SQL gives us a brilliant query hint to help avoid this. OPTION (FAST 10000) This hint means that it will choose a query which will optimise for the first 10,000 rows – the default SSIS buffer size. And the effect can be quite significant. 
First let’s consider a simple example, then we’ll look at a larger one. Consider our weights. We don’t have 10,000, so I’m going to use OPTION (FAST 1) instead. You’ll notice that the query is more expensive, using a Flow Distinct operator instead of the Distinct Sort. This operator is consuming 84% of the query, instead of the 59% we saw from the Distinct Sort. But the first row could be returned quicker – a Flow Distinct operator is non-blocking. The data here isn’t sorted, of course. It’s in the same order that it came out of the index, just with duplicates removed. As soon as a Flow Distinct sees a value that it hasn’t come across before, it pushes it out to the operator on its left. It still has to maintain the list of what it’s seen so far, but by handling it one row at a time, it can push rows through quicker. Overall, it’s a lot more work than the Distinct Sort, but if the priority is the first few rows, then perhaps that’s exactly what we want. The Query Optimizer seems to do this by optimising the query as if there were only one row coming through: This 1 row estimation is caused by the Query Optimizer imagining the SELECT operation saying “Give me one row” first, and this message being passed all the way along. The request might not make it all the way back to the source, but in my simple example, it does. I hope this simple example has helped you understand the significance of the blocking operator. Now I’m going to show you an example on a much larger data set. This data was fetching about 780,000 rows, and these are the Estimated Plans. The data needed to be Sorted, to support further SSIS operations that needed that. First, without the hint. ...and now with OPTION (FAST 10000): A very different plan, I’m sure you’ll agree. In case you’re curious, those arrows in the top one are 780,000 rows in size. In the second, they’re estimated to be 10,000, although the Actual figures end up being 780,000. The top one definitely runs faster. It finished several times faster than the second one. With the amount of data being considered, these numbers were in minutes. Look at the second one – it’s doing Nested Loops, across 780,000 rows! That’s not generally recommended at all. That’s “Go and make yourself a coffee” time. In this case, it was about six or seven minutes. The faster one finished in about a minute. But in SSIS-land, things are different. The particular data flow that was consuming this data was significant. It was being pumped into a Script Component to process each row based on previous rows, creating about a dozen different flows. The data flow would take roughly ten minutes to run – ten minutes from when the data first appeared. The query that completes faster – chosen by the Query Optimizer with no hints, based on accurate statistics (rather than pretending the numbers are smaller) – would take a minute to start getting the data into SSIS, at which point the ten-minute flow would start, taking eleven minutes to complete. The query that took longer – chosen by the Query Optimizer pretending it only wanted the first 10,000 rows – would take only ten seconds to fill the first buffer. Despite the fact that it might have taken the database another six or seven minutes to get the data out, SSIS didn’t care. Every time it wanted the next buffer of data, it was already available, and the whole process finished in about ten minutes and ten seconds. When debugging SSIS, you run the package, and sit there waiting to see the Debug information start appearing. 
    You look for the numbers on the data flow, and watch operators go Yellow and Green. Without the hint, I'd sit there for a minute. With the hint, just ten seconds. You can imagine which one I preferred. By adding this hint, it felt like a magic wand had been waved across the query to make it run several times faster. It wasn't the case at all – but it felt like it to SSIS.
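
    For reference, the hint in context; a hedged sketch of the AdventureWorks example described above:

        -- Optimise for the first 10,000 rows (the default SSIS buffer size) so the
        -- source starts returning data sooner, even if the whole query takes longer.
        SELECT DISTINCT Weight
        FROM Production.Product
        OPTION (FAST 10000);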

    Read the article
