database query - Page 60

Can I use an "IN" filter on an Advantage Database Table (TAdsTable)?

- by Anthony

I want to apply a filter to an advantage table using multiple values for an Integer field. The equivalent SQL would be: SELECT * FROM TableName WHERE FieldName IN (1, 2, 3) Is it possible to do the same on an AdsTable within having to repeat the field using an "OR"? I want to something like: Filter := 'FieldName IN (1, 2, 3)' Instead of: Filter := 'FieldName = 1 OR FieldName = 2 OR FieldName = 3'

Read the article

SQL Server 05, which is optimal, LIKE %<term>% or CONTAINS() for searching large column

- by Spud1

I've got a function written by another developer which I am trying to modify for a slightly different use. It is used by a SP to check if a certain phrase exists in a text document stored in the DB, and returns 1 if the value is found or 0 if its not. This is the query: SELECT @mres=1 from documents where id=@DocumentID and contains(text, @search_term) The document contains mostly XML, and the search_term is a GUID formatted as an nvarchar(40). This seems to run quite slowly to me (taking 5-6 seconds to execute this part of the process), but in the same script file there is also this version of the above, commented out. SELECT @mres=1 from documents where id=@DocumentID and textlike '%' + @search_term + '%' This version runs MUCH quicker, taking 4ms compared to 15ms for the first example. So, my question is why use the first over the second? I assume this developer (who is no longer working with me) had a good reason, but at the moment I am struggling to find it.. Is it possibly something to do with the full text indexing? (this is a dev DB I am working with, so the production version may have better indexing..) I am not that clued up on FTI really so not quite sure at the moment. Thoughts/ideas?

Read the article

Guide to reduce TFS database growth using the Test Attachment Cleaner

- by terje

Recently there has been several reports on TFS databases growing too fast and growing too big. Notable this has been observed when one has started to use more features of the Testing system. Also, the TFS 2010 handles test results differently from TFS 2008, and this leads to more data stored in the TFS databases. As a consequence of this there has been released some tools to remove unneeded data in the database, and also some fixes to correct for bugs which has been found and corrected during this process. Further some preventive practices and maintenance rules should be adopted. A lot of people have blogged about this, among these are: Anu’s very important blog post here describes both the problem and solutions to handle it. She describes both the Test Attachment Cleaner tool, and also some QFE/CU releases to fix some underlying bugs which prevented the tool from being fully effective. Brian Harry’s blog post here describes the problem too This forum thread describes the problem with some solution hints. Ravi Shanker’s blog post here describes best practices on solving this (TBP) Grant Holidays blogpost here describes strategies to use the Test Attachment Cleaner both to detect space problems and how to rectify them. The problem can be divided into the following areas: Publishing of test results from builds Publishing of manual test results and their attachments in particular Publishing of deployment binaries for use during a test run Bugs in SQL server preventing total cleanup of data (All the published data above is published into the TFS database as attachments.) The test results will include all data being collected during the run. Some of this data can grow rather large, like IntelliTrace logs and video recordings. Also the pushing of binaries which happen for automated test runs, including tests run during a build using code coverage which will include all the files in the deployment folder, contributes a lot to the size of the attached data. In order to handle this systematically, I have set up a 3-stage process: Find out if you have a database space issue Set up your TFS server to minimize potential database issues If you have the “problem”, clean up the database and otherwise keep it clean Analyze the data Are your database( s) growing ? Are unused test results growing out of proportion ? To find out about this you need to query your TFS database for some of the information, and use the Test Attachment Cleaner (TAC) to obtain some more detailed information. If you don’t have too many databases you can use the SQL Server reports from within the Management Studio to analyze the database and table sizes. Or, you can use a set of queries . I find queries often faster to use because I can tweak them the way I want them. But be aware that these queries are non-documented and non-supported and may change when the product team wants to change them. If you have multiple Project Collections, find out which might have problems: (Disclaimer: The queries below work on TFS 2010. They will not work on Dev-11, since the table structure have been changed. I will try to update them for Dev-11 when it is released.) Open a SQL Management Studio session onto the SQL Server where you have your TFS Databases. Use the query below to find the Project Collection databases and their sizes, in descending size order. use master select DB_NAME(database_id) AS DBName, (size/128) SizeInMB FROM sys.master_files where type=0 and substring(db_name(database_id),1,4)='Tfs_' and DB_NAME(database_id)<>'Tfs_Configuration' order by size desc Doing this on one of our SQL servers gives the following results: It is pretty easy to see on which collection to start the work Find out which tables are possibly too large Keep a special watch out for the Tfs_Attachment table. Use the script at the bottom of Grant’s blog to find the table sizes in descending size order. In our case we got this result: From Grant’s blog we learnt that the tbl_Content is in the Version Control category, so the major only big issue we have here is the tbl_AttachmentContent. Find out which team projects have possibly too large attachments In order to use the TAC to find and eventually delete attachment data we need to find out which team projects have these attachments. The team project is a required parameter to the TAC. Use the following query to find this, replace the collection database name with whatever applies in your case: use Tfs_DefaultCollection select p.projectname, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by p.projectname order by sum(a.compressedlength) desc In our case we got this result (had to remove some names), out of more than 100 team projects accumulated over quite some years: As can be seen here it is pretty obvious the “Byggtjeneste – Projects” are the main team project to take care of, with the ones on lines 2-4 as the next ones. Check which attachment types takes up the most space It can be nice to know which attachment types takes up the space, so run the following query: use Tfs_DefaultCollection select a.attachmenttype, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a inner join tbl_testrun as tr on a.testrunid=tr.testrunid inner join tbl_project as p on p.projectid=tr.projectid group by a.attachmenttype order by sum(a.compressedlength) desc We then got this result: From this it is pretty obvious that the problem here is the binary files, as also mentioned in Anu’s blog. Check which file types, by their extension, takes up the most space Run the following query use Tfs_DefaultCollection select SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999)as Extension, sum(compressedlength)/1024 as SizeInKB from tbl_Attachment group by SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999) order by sum(compressedlength) desc This gives a result like this: Now you should have collected enough information to tell you what to do – if you got to do something, and some of the information you need in order to set up your TAC settings file, both for a cleanup and for scheduled maintenance later. Get your TFS server and environment properly set up Even if you have got the problem or if have yet not got the problem, you should ensure the TFS server is set up so that the risk of getting into this problem is minimized. To ensure this you should install the following set of updates and components. The assumption is that your TFS Server is at SP1 level. Install the QFE for KB2608743 – which also contains detailed instructions on its use, download from here. The QFE changes the default settings to not upload deployed binaries, which are used in automated test runs. Binaries will still be uploaded if: Code coverage is enabled in the test settings. You change the UploadDeploymentItem to true in the testsettings file. Be aware that this might be reset back to false by another user which haven't installed this QFE. The hotfix should be installed to The build servers (the build agents) The machine hosting the Test Controller Local development computers (Visual Studio) Local test computers (MTM) It is not required to install it to the TFS Server, test agents or the build controller – it has no effect on these programs. If you use the SQL Server 2008 R2 you should also install the CU 10 (or later). This CU fixes a potential problem of hanging “ghost” files. This seems to happen only in certain trigger situations, but to ensure it doesn’t bite you, it is better to make sure this CU is installed. There is no such CU for SQL Server 2008 pre-R2 Work around: If you suspect hanging ghost files, they can be – with some mental effort, deduced from the ghost counters using the following SQL query: use master SELECT DB_NAME(database_id) as 'database',OBJECT_NAME(object_id) as 'objectname', index_type_desc,ghost_record_count,version_ghost_record_count,record_count,avg_record_size_in_bytes FROM sys.dm_db_index_physical_stats (DB_ID(N'<DatabaseName>'), OBJECT_ID(N'<TableName>'), NULL, NULL , 'DETAILED') The problem is a stalled ghost cleanup process. Restarting the SQL server after having stopped all components that depends on it, like the TFS Server and SPS services – that is all applications that connect to the SQL server. Then restart the SQL server, and finally start up all dependent processes again. (I would guess a complete server reboot would do the trick too.) After this the ghost cleanup process will run properly again. The fix will come in the next CU cycle for SQL Server R2 SP1. The R2 pre-SP1 and R2 SP1 have separate maintenance cycles, and are maintained individually. Each have its own set of CU’s. When it comes I will add the link here to that CU. The "hanging ghost file” issue came up after one have run the TAC, and deleted enourmes amount of data. The SQL Server can get into this hanging state (without the QFE) in certain cases due to this. And of course, install and set up the Test Attachment Cleaner command line power tool. This should be done following some guidelines from Ravi Shanker: “When you run TAC, ensure that you are deleting small chunks of data at regular intervals (say run TAC every night at 3AM to delete data that is between age 730 to 731 days) – this will ensure that small amounts of data are being deleted and SQL ghosted record cleanup can catch up with the number of deletes performed. “ This rule minimizes the risk of the ghosted hang problem to occur, and further makes it easier for the SQL server ghosting process to work smoothly. “Run DBCC SHRINKDB post the ghosted records are cleaned up to physically reclaim the space on the file system” This is the last step in a 3 step process of removing SQL server data. First they are logically deleted. Then they are cleaned out by the ghosting process, and finally removed using the shrinkdb command. Cleaning out the attachments The TAC is run from the command line using a set of parameters and controlled by a settingsfile. The parameters point out a server uri including the team project collection and also point at a specific team project. So in order to run this for multiple team projects regularly one has to set up a script to run the TAC multiple times, once for each team project. When you install the TAC there is a very useful readme file in the same directory. When the deployment binaries are published to the TFS server, ALL items are published up from the deployment folder. That often means much more files than you would assume are necessary. This is a brute force technique. It works, but you need to take care when cleaning up. Grant has shown how their settings file looks in his blog post, removing all attachments older than 180 days , as long as there are no active workitems connected to them. This setting can be useful to clean out all items, both in a clean-up once operation, and in a general There are two scenarios we need to consider: Cleaning up an existing overgrown database Maintaining a server to avoid an overgrown database using scheduled TAC 1. Cleaning up a database which has grown too big due to these attachments. This job is a “Once” job. We do this once and then move on to make sure it won’t happen again, by taking the actions in 2) below. In this scenario you should only consider the large files. Your goal should be to simply reduce the size, and don’t bother about the smaller stuff. That can be left a scheduled TAC cleanup ( 2 below). Here you can use a very general settings file, and just remove the large attachments, or you can choose to remove any old items. Grant’s settings file is an example of the last one. A settings file to remove only large attachments could look like this:  <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> </Attachment> </DeletionCriteria> Or like this: If you want only to remove dll’s and pdb’s about that size, add an Extensions-section. Without that section, all extensions will be deleted.  <DeletionCriteria> <TestRun /> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="dll" /> <Include value="pdb" /> </Extensions> </Attachment> </DeletionCriteria> Before you start up your scheduled maintenance, you should clear out all older items. 2. Scheduled maintenance using the TAC If you run a schedule every night, and remove old items, and also remove them in small batches. It is important to run this often, like every night, in order to keep the number of deleted items low. That way the SQL ghost process works better. One approach could be to delete all items older than some number of days, let’s say 180 days. This could be combined with restricting it to keep attachments with active or resolved bugs. Doing this every night ensures that only small amounts of data is deleted.  <DeletionCriteria> <TestRun> <AgeInDays OlderThan="180" /> </TestRun> <Attachment /> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved"/> </LinkedBugs> </DeletionCriteria> In my experience there are projects which are left with active or resolved workitems, akthough no further work is done. It can be wise to have a cleanup process with no restrictions on linked bugs at all. Note that you then have to remove the whole LinkedBugs section. A approach which could work better here is to do a two step approach, use the schedule above to with no LinkedBugs as a sweeper cleaning task taking away all data older than you could care about. Then have another scheduled TAC task to take out more specifically attachments that you are not likely to use. This task could be much more specific, and based on your analysis clean out what you know is troublesome data.  <DeletionCriteria> <TestRun > <AgeInDays OlderThan="30" /> </TestRun> <Attachment> <SizeInMB GreaterThan="10" /> <Extensions> <Include value="iTrace"/> <Include value="dll"/> <Include value="pdb"/> <Include value="wmv"/> </Extensions> </Attachment> <LinkedBugs> <Exclude state="Active" /> <Exclude state="Resolved" /> </LinkedBugs> </DeletionCriteria> The readme document for the TAC says that it recognizes “internal” extensions, but it does recognize any extension. To run the tool do the following command: tcmpt attachmentcleanup /collection:your_tfs_collection_url /teamproject:your_team_project /settingsfile:path_to_settingsfile /outputfile:%temp%/teamproject.tcmpt.log /mode:delete Shrinking the database You could run a shrink database command after the TAC has run in cases where there are a lot of data being deleted. In this case you SHOULD do it, to free up all that space. But, after the shrink operation you should do a rebuild indexes, since the shrink operation will leave the database in a very fragmented state, which will reduce performance. Note that you need to rebuild indexes, reorganizing is not enough. For smaller amounts of data you should NOT shrink the database, since the data will be reused by the SQL server when it need to add more records. In fact, it is regarded as a bad practice to shrink the database regularly. So on a daily maintenance schedule you should NOT shrink the database. To shrink the database you do a DBCC SHRINKDATABASE command, and then follow up with a DBCC INDEXDEFRAG afterwards. I find the easiest way to do this is to create a SQL Maintenance plan including the Shrink Database Task and the Rebuild Index Task and just execute it when you need to do this.

Read the article

SQL SERVER – DMV – sys.dm_exec_query_optimizer_info – Statistics of Optimizer

- by pinaldave

Incredibly, SQL Server has so much information to share with us. Every single day, I am amazed with this SQL Server technology. Sometimes I find several interesting information by just querying few of the DMV. And when I present this info in front of my client during performance tuning consultancy, they are surprised with my findings. Today, I am going to share one of the hidden gems of DMV with you, the one which I frequently use to understand what’s going on under the hood of SQL Server. SQL Server keeps the record of most of the operations of the Query Optimizer. We can learn many interesting details about the optimizer which can be utilized to improve the performance of server. SELECT * FROM sys.dm_exec_query_optimizer_info WHERE counter IN ('optimizations', 'elapsed time','final cost', 'insert stmt','delete stmt','update stmt', 'merge stmt','contains subquery','tables', 'hints','order hint','join hint', 'view reference','remote query','maximum DOP', 'maximum recursion level','indexed views loaded', 'indexed views matched','indexed views used', 'indexed views updated','dynamic cursor request', 'fast forward cursor request') All occurrence values are cumulative and are set to 0 at system restart. All values for value fields are set to NULL at system restart. I have removed a few of the internal counters from the script above, and kept only documented details. Let us check the result of the above query. As you can see, there is so much vital information that is revealed in above query. I can easily say so many things about how many times Optimizer was triggered and what the average time taken by it to optimize my queries was. Additionally, I can also determine how many times update, insert or delete statements were optimized. I was able to quickly figure out that my client was overusing the Query Hints using this dynamic management view. If you have been reading my blog, I am sure you are aware of my series related to SQL Server Views SQL SERVER – The Limitations of the Views – Eleven and more…. With this, I can take a quick look and figure out how many times Views were used in various solutions within the query. Moreover, you can easily know what fraction of the optimizations has been involved in tuning server. For example, the following query would tell me, in total optimizations, what the fraction of time View was “reference“. As this View also includes system Views and DMVs, the number is a bit higher on my machine. SELECT (SELECT CAST (occurrence AS FLOAT) FROM sys.dm_exec_query_optimizer_info WHERE counter = 'view reference') / (SELECT CAST (occurrence AS FLOAT) FROM sys.dm_exec_query_optimizer_info WHERE counter = 'optimizations') AS ViewReferencedFraction Reference : Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL DMV, SQL Optimization, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article

Oracle Database 11gR2 11.2.0.3 Certified with E-Business Suite on HP-UX PA-RISC

- by John Abraham

As a follow up to our original announcement, Oracle Database 11g Release 2 (11.2.0.3) is now certified with Oracle E-Business Suite Release 11i and Release 12 on the following HP-UX platforms: Release 11i (11.5.10.2 + ATG PF.H RUP 6 and higher) : HP-UX PA-RISC (64-bit) (11.31) Release 12 (12.0.4 and higher, 12.1.1 and higher): HP-UX PA-RISC (64-bit) (11.31) This announcement for Oracle E-Business Suite 11i and R12 includes: Real Application Clusters (RAC) Oracle Database Vault Transparent Data Encryption (Column Encryption) TDE Tablespace Encryption Advanced Security Option (ASO)/Advanced Networking Option (ANO) Export/Import Process for Oracle E-Business Suite Release 11i and Release 12 Database Instances Transportable Database and Transportable Tablespaces Data Migration Processes for Oracle E-Business Suite Release 11i and Release 12 References MOS Document 881505.1 - Interoperability Notes - Oracle E-Business Suite Release 11i with Oracle Database 11g Release 2 (11.2.0) MOS Document 1058763.1 - Interoperability Notes - Oracle E-Business Suite Release 12 with Oracle Database 11g Release 2 (11.2.0) MOS Document 1091086.1 - Integrating Oracle E-Business Suite Release 11i with Oracle Database Vault 11gR2 MOS Document 1091083.1 - Integrating Oracle E-Business Suite Release 12 with Oracle Database Vault 11gR2 MOS Document 216205.1 - Database Initialization Parameters for Oracle E-Business Suite 11i MOS Document 396009.1 - Database Initialization Parameters for Oracle Applications Release 12 MOS Document 761570.1 - Database Preparation Guidelines for an Oracle E-Business Suite Release 12.1.1 Upgrade MOS Document 823586.1 - Using Oracle 11g Release 2 Real Application Clusters with Oracle E-Business Suite Release 11i MOS Document 823587.1 - Using Oracle 11g Release 2 Real Application Clusters with Oracle E-Business Suite Release 12 MOS Document 403294.1 - Using Transparent Data Encryption (TDE) Column Encryption with Oracle E-Business Suite Release 11i MOS Document 732764.1 - Using Transparent Data Encryption (TDE) Column Encryption with Oracle E-Business Suite Release 12 MOS Document 828223.1 - Using TDE Tablespace Encryption with Oracle E-Business Suite Release 11i MOS Document 828229.1 - Using TDE Tablespace Encryption with Oracle E-Business Suite Release 12 MOS Document 391248.1 - Encrypting Oracle E-Business Suite Release 11i Network Traffic using Advanced Security Option and Advanced Networking Option MOS Document 732764.1 - Using Transparent Data Encryption (TDE) Column Encryption with Oracle E-Business Suite Release 12 MOS Document 557738.1 - Export/Import Process for Oracle E-Business Suite Release 11i Database Instances Using Oracle Database 11g Release 1 or 11g Release 2 MOS Document 741818.1 - Export/Import Process for Oracle E-Business Suite Release 12 Database Instances Using Oracle Database 11g Release 1 or 11g Release 2 MOS Document 1366265.1 - Using Transportable Tablespaces to Migrate Oracle Applications 11i Using Oracle Database 11g Release 2 MOS Document 1311487.1 - Using Transportable Tablespaces to Migrate Oracle E-Business Suite Release 12 Using Oracle Database 11g Release 2 MOS Document 729309.1 - Using Transportable Database to Migrate Oracle E-Business Suite Release 11i Using Oracle Database 10g Release 2 or 11g MOS Document 734763.1 - Using Transportable Database to Migrate Oracle E-Business Suite Release 12 Using Oracle Database 10g Release 2 or 11g Please also review the platform-specific Oracle Database Installation Guides for operating system and other prerequisites.

Read the article

SQL SERVER – Introduction to FIRST _VALUE and LAST_VALUE – Analytic Functions Introduced in SQL Server 2012

- by pinaldave

SQL Server 2012 introduces new analytical functions FIRST_VALUE() and LAST_VALUE(). This function returns first and last value from the list. It will be very difficult to explain this in words so I’d like to attempt to explain its function through a brief example. Instead of creating a new table, I will be using the AdventureWorks sample database as most developers use that for experiment purposes. Now let’s have fun following query: USE AdventureWorks GO SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty, FIRST_VALUE(SalesOrderDetailID) OVER (ORDER BY SalesOrderDetailID) FstValue, LAST_VALUE(SalesOrderDetailID) OVER (ORDER BY SalesOrderDetailID) LstValue FROM Sales.SalesOrderDetail s WHERE SalesOrderID IN (43670, 43669, 43667, 43663) ORDER BY s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty GO The above query will give us the following result: What’s the most interesting thing here is that as we go from row 1 to row 10, the value of the FIRST_VALUE() remains the same but the value of the LAST_VALUE is increasing. The reason behind this is that as we progress in every line – considering that line and all the other lines before it, the last value will be of the row where we are currently looking at. To fully understand this statement, see the following figure: This may be useful in some cases; but not always. However, when we use the same thing with PARTITION BY, the same query starts showing the result which can be easily used in analytical algorithms and needs. Let us have fun through the following query: Let us fun following query. USE AdventureWorks GO SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty, FIRST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY SalesOrderDetailID) FstValue, LAST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY SalesOrderDetailID) LstValue FROM Sales.SalesOrderDetail s WHERE SalesOrderID IN (43670, 43669, 43667, 43663) ORDER BY s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty GO The above query will give us the following result: Let us understand how PARTITION BY windows the resultset. I have used PARTITION BY SalesOrderID in my query. This will create small windows of the resultset from the original resultset and will follow the logic or FIRST_VALUE and LAST_VALUE in this resultset. Well, this is just an introduction to these functions. In the future blog posts we will go deeper to discuss the usage of these two functions. By the way, these functions can be applied over VARCHAR fields as well and are not limited to the numeric field only. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Function, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article

Best algorithm/practice when creating a search mechanism for your database?

- by Alex Hope O'Connor

I have been designing a database where it is very important to provide users with a good search mechanism. So I was wondering what some of the best practices are for using keywords to search over multiple database tables and return the relevent records? Some other things I am curious about: The users location, if they provide an address The speed of the algorithm Additional Information: I am using C# and LINQ-To-SQL.

Read the article

How to fix “Cannot connect to the configuration database.”

- by ybbest

The problem: When I browse to a SharePoint site, I got the Server Error in ‘/’ Application, Cannot connect to the configuration database. The Analysis: The reason you get the message is that SharePoint WFE cannot connect to the SQL database, you need to check the weather SQL server service is started as shown below. Solution: When checking the SQL Server service, I see it is not started. After starting the service, it works like a charm.

Read the article

What are the advantages of storing xml in a relational database?

- by Chris

I was poking around the AdventureWorks database today and I noticed that a number of tables (HumanResources.JobCandidate and Sales.Individual for example) have a column which is storing xml data. What I would to know is, what is the advantage of storing basically a database table row's worth of data in another table's column? Doesn't this make it difficult to query off of this information? Or is the assumption that the data won't need to be queried and just needs to be stored?

Read the article

Creating database connections - Do it once or for each query?

- by webnoob

At the moment I create a database connection when my web page is first loaded. I then process the page and run any queries against that conection. Is this the best way to do it or should I be creating a database connection each time I run a query? p.s It makes more sense to me to create 1 connection and use it but I don't know if this can cause any other issues. I am using C# (ASP.NET) with MSSQL.

Read the article

What are the advantages of storing xml in a relational database?

- by Chris

I was poking around the AdventureWorks database today and I noticed that a number of tables (HumanResources.JobCandidate and Sales.Individual for example) have a column which is storing xml data. What I would to know is, what is the advantage of storing basically a database table row's worth of data in another table's column? Doesn't this make it difficult to query off of this information? Or is the assumption that the data won't need to be queried and just needs to be stored?

Read the article

Authoritative sources about Database vs. Flatfile decision

- by FastAl

<tldr>looking for a reference to a book or other undeniably authoritative source that gives reasons when you should choose a database vs. when you should choose other storage methods. I have provided an un-authoritative list of reasons about 2/3 of the way down this post.</tldr> I have a situation at my company where a database is being used where it would be better to use another solution (in this case, an auto-generated piece of source code that contains a static lookup table, searched by binary sort). Normally, a database would be an OK solution even though the problem does not require a database, e.g, none of the elements of ACID are needed, as it is read-only data, updated about every 3-5 years (also requiring other sourcecode changes), and fits in memory, and can be keyed into via binary search (a tad faster than db, but speed is not an issue). The problem is that this code runs on our enterprise server, but is shared with several PC platforms (some disconnected, some use a central DB, etc.), and parts of it are managed by multiple programming units, parts by the DBAs, parts even by mathematicians in another department, etc. These hit their own platform’s version of their databases (containing their own copy of the static data). What happens is that every implementation, every little change, something different goes wrong. There are many other issues as well. I can’t even use a flatfile, because one mode of running on our enterprise server does not have permission to read files (only databases, and of course, its own literal storage, e.g., in-source table). Of course, other parts of the system use databases in proper, less obscure manners; there is no problem with those parts. So why don’t we just change it? I don’t have administrative ability to force a change. But I’m affected because sometimes I have to help fix the problems, but mostly because it causes outages and tons of extra IT time by other programmers and d*mmit that makes me mad! The reason neither management, nor the designers of the system, can see the problem is that they propose a solution that won’t work: increase communication; implement more safeguards and standards; etc. But every time, in a different part of the already-pared-down but still multi-step processes, a few different diligent, hard-working, top performing IT personnel make a unique subtle error that causes it to fail, sometimes after the last round of testing! And in general these are not single-person failures, but understandable miscommunications. And communication at our company is actually better than most. People just don't think that's the case because they haven't dug into the matter. However, I have it on very good word from somebody with extensive formal study of sociology and psychology that the relatively small amount of less-than-proper database usage in this gigantic cross-platform multi-source, multi-language project is bureaucratically un-maintainable. Impossible. No chance. At least with Human Beings in the loop, and it can’t be automated. In addition, the management and developers who could change this, though intelligent and capable, don’t understand the rigidity of this ‘how humans are’ issue, and are not convincible on the matter. The reason putting the static data in sourcecode will solve the problem is, although the solution is less sexy than a database, it would function with no technical drawbacks; and since the sharing of sourcecode already works very well, you basically erase any database-related effort from this section of the project, along with all the drawbacks of it that are causing problems. OK, that’s the background, for the curious. I won’t be able to convince management that this is an unfixable sociological problem, and that the real solution is coding around these limits of human nature, just as you would code around a bug in a 3rd party component that you can’t change. So what I have to do is exploit the unsuitableness of the database solution, and not do it using logic, but rather authority. I am aware of many reasons, and posts on this site giving reasons for one over the other; I’m not looking for lists of reasons like these (although you can add a comment if I've miss a doozy): WHY USE A DATABASE? instead of flatfile/other DB vs. file: if you need... Random Read / Transparent search optimization Advanced / varied / customizable Searching and sorting capabilities Transaction/rollback Locks, semaphores Concurrency control / Shared users Security 1-many/m-m is easier Easy modification Scalability Load Balancing Random updates / inserts / deletes Advanced query Administrative control of design, etc. SQL / learning curve Debugging / Logging Centralized / Live Backup capabilities Cached queries / dvlp & cache execution plans Interleaved update/read Referential integrity, avoid redundant/missing/corrupt/out-of-sync data Reporting (from on olap or oltp db) / turnkey generation tools [Disadvantages:] Important to get right the first time - professional design - but only b/c it's meant to last s/w & h/w cost Usu. over a network, speed issue (best vs. best design vs. local=even then a separate process req's marshalling/netwk layers/inter-p comm) indicies and query processing can stand in the way of simple processing (vs. flatfile) WHY USE FLATFILE: If you only need... Sequential Row processing only Limited usage append only (no reading, no master key/update) Only Update the record you're reading (fixed length recs only) Too big to fit into memory If Local disk / read-ahead network connection Portability / small system Email / cut & Paste / store as document by novice - simple format Low design learning curve but high cost later WHY USE IN-MEMORY/TABLE (tables, arrays, etc.): if you need... Processing a single db/ff record that was imported Known size of data Static data if hardcoding the table Narrow, unchanging use (e.g., one program or proc) -includes a class that will be shared, but encapsulates its data manipulation Extreme speed needed / high transaction frequency Random access - but search is dependent on implementation Following are some other posts about the topic: http://stackoverflow.com/questions/1499239/database-vs-flat-text-file-what-are-some-technical-reasons-for-choosing-one-over http://stackoverflow.com/questions/332825/are-flat-file-databases-any-good http://stackoverflow.com/questions/2356851/database-vs-flat-files http://stackoverflow.com/questions/514455/databases-vs-plain-text/514530 What I’d like to know is if anybody could recommend a hard, authoritative source containing these reasons. I’m looking for a paper book I can buy, or a reputable website with whitepapers about the issue (e.g., Microsoft, IBM), not counting the user-generated content on those sites. This will have a greater change to elicit a change that I’m looking for: less wasted programmer time, and more reliable programs. Thanks very much for your help. You win a prize for reading such a large post!

Read the article

restore content database in sharepoint server 2007

- by Boris

I have a site collection set up at web app running at port 80. I have made the backup of the site collection content db using stsadm.exe tool. Now, I want to restore that backup as a new content db of a different site collection - the one set up at web app running at port 500. I have done the following: Created a backup Created new web app at port 500 (I did not create a site collection for this web app) I have removed the content db of that new web app using Central Administration I have run the stsadm.exe -o addcontentdb -url webapp-at-port-500 -databasename Command is successfully completed, however when I check the Content Database page for that web app, it says that the Number of Sites is 0! Also, when I try to open http://webapp-at-port-500, I get the error saying that the webpage cannot be found. Could anyone please help me, it's driving me crazy. Thanks.

Read the article

Downloading Database dump from server

- by Ctroy

I have a mysql database on a server that is around 4 gb in size and I couldn't get it downloaded to my local machine. I tried getting a dump on the server, but the dump is not getting created probably because of the big size. Is there any way, I could get the dump downloaded on my local machine? I can syncing using Sqlyog but I know it will take ages. Is there a way, I can get a dump created on the server? By the way, my server is a linux server and its running php/mysql.

Read the article

SQL Server 2000 msdb database loading/suspect

- by Blake Parcell

My SQL Server recently suffered a raid controller/hard drive crash. After getting my hard drive problem corrected I soon found that some of my databases were (suspect) namely msdb. I am not a DBA by any means however am somewhat familiar with the daily SQL activities that happen on my server. So I restored from backup, and tried to bring my msdb database online. It is now forever stuck in (Loading\Suspect) and I am unable to script backups for my important databases. I can recreate all of the backup plans etc if i can somehow get a working msdb. Any help would be greatly appreciated. I am currently using: Microsoft SQL Server 2000 Version: 8.00.194

Read the article

How does fail2ban 0.9 database storage actually works?

- by Arantir

Fail2ban 0.9 introduce database storage to save bans on restart. But I can't find out the actual mechanism of it work. There is dbpurgeage parameter which controls lifetime of old bans, defaults to 24 hours. As I see from code research, fail2ban saves a ban to the db with timeofban equals to the moment of ban being saved. Then every dbpurgeage period it removes all bans with timeofban < MyTime.time() - self._purgeAge, in other words removes all bans have been stored more than 24 hours ago. But what if an IP was banned for the month? Does all this mean that with dbpurgeage = 86400 after restart in 24 hours I will lost all bans longer than 24 hours? I just want that all my permanent bans will be preserved in any case.

Read the article

Several Server Errors (No database connect, can't create TCP/IP socket etc

- by Tobias Baumeister

My server stops taking requests on my website today. It works for some time, but then the server just stops working and throws several errors: 500 Internal Server Error Warning: mysql_connect(): Can't create TCP/IP socket (105) in [...] on line 7 Couldn't connect to database. Please try again. mysql_connect(): Host [...] is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts' mod_fcgid: can't apply process slot for /var/www/cgi-bin/cgi_wrapper/cgi_wrapper (this is from Error Log) Any ideas what might cause it? operating system is Ubuntu

Read the article

Setting up MySQL database replication [without restarting mysql]

- by FunkyChicken

I'm trying to setup MySQL db replication, it seems pretty straight forward. I was using this tutorial: http://www.howtoforge.com/mysql_database_replication Now I run a rather large MySQL database for a very large website, and in this tutorial it asks me to restart MySQL to apply the new settings in the /etc/my.cnf file. I'm try to avoid that step at all costs, as I know that restarting MySQL can take a few minutes on my machine (due to large logs/dbs), and I don't want any downtime. Is there a way to apply the necessary settings WITHOUT fully restarting Mysql?

Read the article

MySQL query, 2 similar servers, 2 minute difference in execution times

- by mr12086

I had a similar question on stack overflow, but it seems to be more server/mysql setup related than coding. The queries below all execute instantly on our development server where as they can take upto 2 minutes 20 seconds. The query execution time seems to be affected by home ambiguous the LIKE string's are. If they closely match a country that has few matches it will take less time, and if you use something like 'ge' for germany - it will take longer to execute. But this doesn't always work out like that, at times its quite erratic. Sending data appears to be the culprit but why and what does that mean. Also memory on production looks to be quite low (free memory)? Production: Intel Quad Xeon E3-1220 3.1GHz 4GB DDR3 2x 1TB SATA in RAID1 Network speed 100Mb Ubuntu Development Intel Core i3-2100, 2C/4T, 3.10GHz 500 GB SATA - No RAID 4GB DDR3 UPDATE 2 : mysqltuner output: [prod] -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.1.61-0ubuntu0.10.04.1 [OK] Operating on 64-bit architecture -------- Storage Engine Statistics ------------------------------------------- [--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 103M (Tables: 180) [--] Data in InnoDB tables: 491M (Tables: 19) [!!] Total fragmented tables: 38 -------- Security Recommendations ------------------------------------------- [OK] All database users have passwords assigned -------- Performance Metrics ------------------------------------------------- [--] Up for: 77d 4h 6m 1s (53M q [7.968 qps], 14M conn, TX: 87B, RX: 12B) [--] Reads / Writes: 98% / 2% [--] Total buffers: 58.0M global + 2.7M per thread (151 max threads) [OK] Maximum possible memory usage: 463.8M (11% of installed RAM) [OK] Slow queries: 0% (12K/53M) [OK] Highest usage of available connections: 22% (34/151) [OK] Key buffer size / total MyISAM indexes: 16.0M/10.6M [OK] Key buffer hit rate: 98.7% (162M cached / 2M reads) [OK] Query cache efficiency: 20.7% (7M cached / 36M selects) [!!] Query cache prunes per day: 3934 [OK] Sorts requiring temporary tables: 1% (3K temp sorts / 230K sorts) [!!] Joins performed without indexes: 71068 [OK] Temporary tables created on disk: 24% (3M on disk / 13M total) [OK] Thread cache hit rate: 99% (690 created / 14M connections) [!!] Table cache hit rate: 0% (64 open / 85M opened) [OK] Open file limit used: 12% (128/1K) [OK] Table locks acquired immediately: 99% (16M immediate / 16M locks) [!!] InnoDB data size / buffer pool: 491.9M/8.0M -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance Enable the slow query log to troubleshoot bad queries Adjust your join queries to always utilize indexes Increase table_cache gradually to avoid file descriptor limits Variables to adjust: query_cache_size (> 16M) join_buffer_size (> 128.0K, or always use indexes with joins) table_cache (> 64) innodb_buffer_pool_size (>= 491M) [dev] -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.1.62-0ubuntu0.11.10.1 [!!] Switch to 64-bit OS - MySQL cannot currently use all of your RAM -------- Storage Engine Statistics ------------------------------------------- [--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 185M (Tables: 632) [--] Data in InnoDB tables: 967M (Tables: 38) [!!] Total fragmented tables: 73 -------- Security Recommendations ------------------------------------------- [OK] All database users have passwords assigned -------- Performance Metrics ------------------------------------------------- [--] Up for: 1d 2h 26m 9s (5K q [0.058 qps], 1K conn, TX: 4M, RX: 1M) [--] Reads / Writes: 99% / 1% [--] Total buffers: 58.0M global + 2.7M per thread (151 max threads) [OK] Maximum possible memory usage: 463.8M (11% of installed RAM) [OK] Slow queries: 0% (0/5K) [OK] Highest usage of available connections: 1% (2/151) [OK] Key buffer size / total MyISAM indexes: 16.0M/18.6M [OK] Key buffer hit rate: 99.9% (60K cached / 36 reads) [OK] Query cache efficiency: 44.5% (1K cached / 2K selects) [OK] Query cache prunes per day: 0 [OK] Sorts requiring temporary tables: 0% (0 temp sorts / 44 sorts) [OK] Temporary tables created on disk: 24% (162 on disk / 666 total) [OK] Thread cache hit rate: 99% (2 created / 1K connections) [!!] Table cache hit rate: 1% (64 open / 4K opened) [OK] Open file limit used: 8% (88/1K) [OK] Table locks acquired immediately: 100% (1K immediate / 1K locks) [!!] InnoDB data size / buffer pool: 967.7M/8.0M -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance Enable the slow query log to troubleshoot bad queries Increase table_cache gradually to avoid file descriptor limits Variables to adjust: table_cache (> 64) innodb_buffer_pool_size (>= 967M) UPDATE 1: When testing the queries listed here there is usually no more than one other query taking place, and usually none. Because production is actually handling apache requests that development gets very few of as it's only myself and 1 other who accesses it - could the 4GB of RAM be getting exhausted by using the single machine for both apache and mysql server? Production: sudo hdparm -tT /dev/sda /dev/sda: Timing cached reads: 24872 MB in 2.00 seconds = 12450.72 MB/sec Timing buffered disk reads: 368 MB in 3.00 seconds = 122.49 MB/sec sudo hdparm -tT /dev/sdb /dev/sdb: Timing cached reads: 24786 MB in 2.00 seconds = 12407.22 MB/sec Timing buffered disk reads: 350 MB in 3.00 seconds = 116.53 MB/sec Server version(mysql + ubuntu versions): 5.1.61-0ubuntu0.10.04.1 Development: sudo hdparm -tT /dev/sda /dev/sda: Timing cached reads: 10632 MB in 2.00 seconds = 5319.40 MB/sec Timing buffered disk reads: 400 MB in 3.01 seconds = 132.85 MB/sec Server version(mysql + ubuntu versions): 5.1.62-0ubuntu0.11.10.1 ORIGINAL DATA : This query is NOT the query in question but is related so ill post it. SELECT f.form_question_has_answer_id FROM form_question_has_answer f INNER JOIN project_company_has_user p ON f.form_question_has_answer_user_id = p.project_company_has_user_user_id INNER JOIN company c ON p.project_company_has_user_company_id = c.company_id INNER JOIN project p2 ON p.project_company_has_user_project_id = p2.project_id INNER JOIN user u ON p.project_company_has_user_user_id = u.user_id INNER JOIN form f2 ON p.project_company_has_user_project_id = f2.form_project_id WHERE (f2.form_template_name = 'custom' AND p.project_company_has_user_garbage_collection = 0 AND p.project_company_has_user_project_id = '29') AND (LCASE(c.company_country) LIKE '%ge%' OR LCASE(c.company_country) LIKE '%abcde%') AND f.form_question_has_answer_form_id = '174' And the explain plan for the above query is, run on both dev and production produce the same plan. +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+-------------+ | 1 | SIMPLE | p2 | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index | | 1 | SIMPLE | f | ref | form_question_has_answer_form_id,form_question_has_answer_user_id | form_question_has_answer_form_id | 4 | const | 796 | Using where | | 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | new_klarents.f.form_question_has_answer_user_id | 1 | Using index | | 1 | SIMPLE | p | ref | project_company_has_user_unique_key,project_company_has_user_user_id,project_company_has_user_company_id,project_company_has_user_project_id | project_company_has_user_user_id | 4 | new_klarents.f.form_question_has_answer_user_id | 1 | Using where | | 1 | SIMPLE | f2 | ref | form_project_id | form_project_id | 4 | const | 15 | Using where | | 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 4 | new_klarents.p.project_company_has_user_company_id | 1 | Using where | +----+-------------+-------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+-------------+ This query takes 2 minutes ~20 seconds to execute. The query that is ACTUALLY being run on the server is this one: SELECT COUNT(*) AS num_results FROM (SELECT f.form_question_has_answer_id FROM form_question_has_answer f INNER JOIN project_company_has_user p ON f.form_question_has_answer_user_id = p.project_company_has_user_user_id INNER JOIN company c ON p.project_company_has_user_company_id = c.company_id INNER JOIN project p2 ON p.project_company_has_user_project_id = p2.project_id INNER JOIN user u ON p.project_company_has_user_user_id = u.user_id INNER JOIN form f2 ON p.project_company_has_user_project_id = f2.form_project_id WHERE (f2.form_template_name = 'custom' AND p.project_company_has_user_garbage_collection = 0 AND p.project_company_has_user_project_id = '29') AND (LCASE(c.company_country) LIKE '%ge%' OR LCASE(c.company_country) LIKE '%abcde%') AND f.form_question_has_answer_form_id = '174' GROUP BY f.form_question_has_answer_id;) dctrn_count_query; With explain plans (again same on dev and production): +----+-------------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+------------------------------+ | 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | | 2 | DERIVED | p2 | const | PRIMARY | PRIMARY | 4 | | 1 | Using index | | 2 | DERIVED | f | ref | form_question_has_answer_form_id,form_question_has_answer_user_id | form_question_has_answer_form_id | 4 | | 797 | Using where | | 2 | DERIVED | p | ref | project_company_has_user_unique_key,project_company_has_user_user_id,project_company_has_user_company_id,project_company_has_user_project_id,project_company_has_user_garbage_collection | project_company_has_user_user_id | 4 | new_klarents.f.form_question_has_answer_user_id | 1 | Using where | | 2 | DERIVED | f2 | ref | form_project_id | form_project_id | 4 | | 15 | Using where | | 2 | DERIVED | c | eq_ref | PRIMARY | PRIMARY | 4 | new_klarents.p.project_company_has_user_company_id | 1 | Using where | | 2 | DERIVED | u | eq_ref | PRIMARY | PRIMARY | 4 | new_klarents.p.project_company_has_user_user_id | 1 | Using where; Using index | +----+-------------+-------+--------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+---------+----------------------------------------------------+------+------------------------------+ On the production server the information I have is as follows. Upon execution: +-------------+ | num_results | +-------------+ | 3 | +-------------+ 1 row in set (2 min 14.28 sec) Show profile: +--------------------------------+------------+ | Status | Duration | +--------------------------------+------------+ | starting | 0.000016 | | checking query cache for query | 0.000057 | | Opening tables | 0.004388 | | System lock | 0.000003 | | Table lock | 0.000036 | | init | 0.000030 | | optimizing | 0.000016 | | statistics | 0.000111 | | preparing | 0.000022 | | executing | 0.000004 | | Sorting result | 0.000002 | | Sending data | 136.213836 | | end | 0.000007 | | query end | 0.000002 | | freeing items | 0.004273 | | storing result in query cache | 0.000010 | | logging slow query | 0.000001 | | logging slow query | 0.000002 | | cleaning up | 0.000002 | +--------------------------------+------------+ On development the results are as follows. +-------------+ | num_results | +-------------+ | 3 | +-------------+ 1 row in set (0.08 sec) Again the profile for this query: +--------------------------------+----------+ | Status | Duration | +--------------------------------+----------+ | starting | 0.000022 | | checking query cache for query | 0.000148 | | Opening tables | 0.000025 | | System lock | 0.000008 | | Table lock | 0.000101 | | optimizing | 0.000035 | | statistics | 0.001019 | | preparing | 0.000047 | | executing | 0.000008 | | Sorting result | 0.000005 | | Sending data | 0.086565 | | init | 0.000015 | | optimizing | 0.000006 | | executing | 0.000020 | | end | 0.000004 | | query end | 0.000004 | | freeing items | 0.000028 | | storing result in query cache | 0.000005 | | removing tmp table | 0.000008 | | closing tables | 0.000008 | | logging slow query | 0.000002 | | cleaning up | 0.000005 | +--------------------------------+----------+ If i remove user and/or project innerjoins the query is reduced to 30s. Last bit of information I have: Mysqlserver and Apache are on the same box, there is only one box for production. Production output from top: before & after. top - 15:43:25 up 78 days, 12:11, 4 users, load average: 1.42, 0.99, 0.78 Tasks: 162 total, 2 running, 160 sleeping, 0 stopped, 0 zombie Cpu(s): 0.1%us, 50.4%sy, 0.0%ni, 49.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4037868k total, 3772580k used, 265288k free, 243704k buffers Swap: 3905528k total, 265384k used, 3640144k free, 1207944k cached top - 15:44:31 up 78 days, 12:13, 4 users, load average: 1.94, 1.23, 0.87 Tasks: 160 total, 2 running, 157 sleeping, 0 stopped, 1 zombie Cpu(s): 0.2%us, 50.6%sy, 0.0%ni, 49.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4037868k total, 3834300k used, 203568k free, 243736k buffers Swap: 3905528k total, 265384k used, 3640144k free, 1207804k cached But this isn't a good representation of production's normal status so here is a grab of it from today outside of executing the queries. top - 11:04:58 up 79 days, 7:33, 4 users, load average: 0.39, 0.58, 0.76 Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie Cpu(s): 3.3%us, 2.8%sy, 0.0%ni, 93.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4037868k total, 3676136k used, 361732k free, 271480k buffers Swap: 3905528k total, 268736k used, 3636792k free, 1063432k cached Development: This one doesn't change during or after. top - 15:47:07 up 110 days, 22:11, 7 users, load average: 0.17, 0.07, 0.06 Tasks: 210 total, 2 running, 208 sleeping, 0 stopped, 0 zombie Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4111972k total, 1821100k used, 2290872k free, 238860k buffers Swap: 4183036k total, 66472k used, 4116564k free, 921072k cached

Read the article

Take all fields in a database table and put them straight into a text file

- by DalexL

I have an database file (mdb) file that contains a dictionary of words. A couple thousand of them. I just need the words (in the order they are already in) put into a text file. Currently they have ID's associated with them (e.g. 1, 2, 3) but I don't need it. I just need the words. What is the best way to do this? Actually, if somebody is able to find a dictionary of English words (something along the lines of a scrabble dictionary) that is free online, I'll accept that too. I just can't seem to find any good ones online.

Read the article

Maximum execution time of 300 seconds exceeded error while importing large MySQL database

- by Spacedust

I'm trying to import 641 MB MySQL database with a command: mysql -u root -p ddamiane_fakty < domenyin_damian_fakty.sql but I got an error: ERROR 1064 (42000) at line 2351406: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '<br /> <b>Fatal error</b>: Maximum execution time of 300 seconds exceeded in <b' at line 253 However limits are set much higher: mysql> show global variables like "interactive_timeout"; +---------------------+-------+ | Variable_name | Value | +---------------------+-------+ | interactive_timeout | 28800 | +---------------------+-------+ 1 row in set (0.00 sec) and mysql> show global variables like "wait_timeout"; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | wait_timeout | 28800 | +---------------+-------+ 1 row in set (0.00 sec)

Read the article

Using mongodump with an auth enabled mongodb server

- by bb-generation

I'm trying to do a daily backup of my mongodb server (auth enabled) using the mongodump tool. mongodump provides two parameters to set the credentials: -u [ --username ] arg username -p [ --password ] arg password Unfortunately they don't provide any parameter to read the password from stdin. Therefore everytime I run this command, everyone on the server can read the password (e.g. by using ps aux). The only workaround I have found is stopping the database and directly accessing the database files using the --dbpath parameter. Is there any other solution which allows me to backup the mongodb database without stopping the server and without "publishing" my password? I am using Debian squeeze 6.0.5 amd64 with mongodb 1.4.4-3.

Read the article

Looking for a central image database and tagging system for a group of users

- by jstarek

I'm doing IT support for a small volunteer organisation who needs to centrally store and organize around 2500 photos. Can anyone here recommend a database or similar system which matches the following criteria: Intuitive to use for users with little computer experience Multi-user support, ideally with integration in our existing LDAP user directory Should have a web-interface Not a hosted solution like Picasa (because we have a rather slow internet connection with very slow upload) Should allow tagging of images, sorting by various criteria and storing copyright information If there are native GNOME and/or Windows clients for the tool, that would be a great benefit. Many thanks in advance!

Read the article

Installing Oracle11gr2 on redhat linux

- by KItis

I have basic question about installing applications on linux operating system. i am going to express my issue considering oracle db installation as a example. when installing oracle database , i created a user group called dba and and user in this group called ora112. so this users is allowed to install database. so my question is if ora112 uses umaks is set to 077, then no other uses will be able to configure oracle database. why do we need to follow this practice. is it a accepted procedure in application installation on Linux. please share your experience with me. thanks in advance for looking into this issue say i install java application on this way. then no other application which belongs to different user account won't be able use java running on this computer because of this access restriction.

Read the article

How to find and fix performance problems in ORM powered applications

- by FransBouma

Once in a while we get requests about how to fix performance problems with our framework. As it comes down to following the same steps and looking into the same things every single time, I decided to write a blogpost about it instead, so more people can learn from this and solve performance problems in their O/R mapper powered applications. In some parts it's focused on LLBLGen Pro but it's also usable for other O/R mapping frameworks, as the vast majority of performance problems in O/R mapper powered applications are not specific for a certain O/R mapper framework. Too often, the developer looks at the wrong part of the application, trying to fix what isn't a problem in that part, and getting frustrated that 'things are so slow with <insert your favorite framework X here>'. I'm in the O/R mapper business for a long time now (almost 10 years, full time) and as it's a small world, we O/R mapper developers know almost all tricks to pull off by now: we all know what to do to make task ABC faster and what compromises (because there are almost always compromises) to deal with if we decide to make ABC faster that way. Some O/R mapper frameworks are faster in X, others in Y, but you can be sure the difference is mainly a result of a compromise some developers are willing to deal with and others aren't. That's why the O/R mapper frameworks on the market today are different in many ways, even though they all fetch and save entities from and to a database. I'm not suggesting there's no room for improvement in today's O/R mapper frameworks, there always is, but it's not a matter of 'the slowness of the application is caused by the O/R mapper' anymore. Perhaps query generation can be optimized a bit here, row materialization can be optimized a bit there, but it's mainly coming down to milliseconds. Still worth it if you're a framework developer, but it's not much compared to the time spend inside databases and in user code: if a complete fetch takes 40ms or 50ms (from call to entity object collection), it won't make a difference for your application as that 10ms difference won't be noticed. That's why it's very important to find the real locations of the problems so developers can fix them properly and don't get frustrated because their quest to get a fast, performing application failed. Performance tuning basics and rules Finding and fixing performance problems in any application is a strict procedure with four prescribed steps: isolate, analyze, interpret and fix, in that order. It's key that you don't skip a step nor make assumptions: these steps help you find the reason of a problem which seems to be there, and how to fix it or leave it as-is. Skipping a step, or when you assume things will be bad/slow without doing analysis will lead to the path of premature optimization and won't actually solve your problems, only create new ones. The most important rule of finding and fixing performance problems in software is that you have to understand what 'performance problem' actually means. Most developers will say "when a piece of software / code is slow, you have a performance problem". But is that actually the case? If I write a Linq query which will aggregate, group and sort 5 million rows from several tables to produce a resultset of 10 rows, it might take more than a couple of milliseconds before that resultset is ready to be consumed by other logic. If I solely look at the Linq query, the code consuming the resultset of the 10 rows and then look at the time it takes to complete the whole procedure, it will appear to me to be slow: all that time taken to produce and consume 10 rows? But if you look closer, if you analyze and interpret the situation, you'll see it does a tremendous amount of work, and in that light it might even be extremely fast. With every performance problem you encounter, always do realize that what you're trying to solve is perhaps not a technical problem at all, but a perception problem. The second most important rule you have to understand is based on the old saying "Penny wise, Pound Foolish": the part which takes e.g. 5% of the total time T for a given task isn't worth optimizing if you have another part which takes a much larger part of the total time T for that same given task. Optimizing parts which are relatively insignificant for the total time taken is not going to bring you better results overall, even if you totally optimize that part away. This is the core reason why analysis of the complete set of application parts which participate in a given task is key to being successful in solving performance problems: No analysis -> no problem -> no solution. One warning up front: hunting for performance will always include making compromises. Fast software can be made maintainable, but if you want to squeeze as much performance out of your software, you will inevitably be faced with the dilemma of compromising one or more from the group {readability, maintainability, features} for the extra performance you think you'll gain. It's then up to you to decide whether it's worth it. In almost all cases it's not. The reason for this is simple: the vast majority of performance problems can be solved by implementing the proper algorithms, the ones with proven Big O-characteristics so you know the performance you'll get plus you know the algorithm will work. The time taken by the algorithm implementing code is inevitable: you already implemented the best algorithm. You might find some optimizations on the technical level but in general these are minor. Let's look at the four steps to see how they guide us through the quest to find and fix performance problems. Isolate The first thing you need to do is to isolate the areas in your application which are assumed to be slow. For example, if your application is a web application and a given page is taking several seconds or even minutes to load, it's a good candidate to check out. It's important to start with the isolate step because it allows you to focus on a single code path per area with a clear begin and end and ignore the rest. The rest of the steps are taken per identified problematic area. Keep in mind that isolation focuses on tasks in an application, not code snippets. A task is something that's started in your application by either another task or the user, or another program, and has a beginning and an end. You can see a task as a piece of functionality offered by your application. Analyze Once you've determined the problem areas, you have to perform analysis on the code paths of each area, to see where the performance problems occur and which areas are not the problem. This is a multi-layered effort: an application which uses an O/R mapper typically consists of multiple parts: there's likely some kind of interface (web, webservice, windows etc.), a part which controls the interface and business logic, the O/R mapper part and the RDBMS, all connected with either a network or inter-process connections provided by the OS or other means. Each of these parts, including the connectivity plumbing, eat up a part of the total time it takes to complete a task, e.g. load a webpage with all orders of a given customer X. To understand which parts participate in the task / area we're investigating and how much they contribute to the total time taken to complete the task, analysis of each participating task is essential. Start with the code you wrote which starts the task, analyze the code and track the path it follows through your application. What does the code do along the way, verify whether it's correct or not. Analyze whether you have implemented the right algorithms in your code for this particular area. Remember we're looking at one area at a time, which means we're ignoring all other code paths, just the code path of the current problematic area, from begin to end and back. Don't dig in and start optimizing at the code level just yet. We're just analyzing. If your analysis reveals big architectural stupidity, it's perhaps a good idea to rethink the architecture at this point. For the rest, we're analyzing which means we collect data about what could be wrong, for each participating part of the complete application. Reviewing the code you wrote is a good tool to get deeper understanding of what is going on for a given task but ultimately it lacks precision and overview what really happens: humans aren't good code interpreters, computers are. We therefore need to utilize tools to get deeper understanding about which parts contribute how much time to the total task, triggered by which other parts and for example how many times are they called. There are two different kind of tools which are necessary: .NET profilers and O/R mapper / RDBMS profilers. .NET profiling .NET profilers (e.g. dotTrace by JetBrains or Ants by Red Gate software) show exactly which pieces of code are called, how many times they're called, and the time it took to run that piece of code, at the method level and sometimes even at the line level. The .NET profilers are essential tools for understanding whether the time taken to complete a given task / area in your application is consumed by .NET code, where exactly in your code, the path to that code, how many times that code was called by other code and thus reveals where hotspots are located: the areas where a solution can be found. Importantly, they also reveal which areas can be left alone: remember our penny wise pound foolish saying: if a profiler reveals that a group of methods are fast, or don't contribute much to the total time taken for a given task, ignore them. Even if the code in them is perhaps complex and looks like a candidate for optimization: you can work all day on that, it won't matter. As we're focusing on a single area of the application, it's best to start profiling right before you actually activate the task/area. Most .NET profilers support this by starting the application without starting the profiling procedure just yet. You navigate to the particular part which is slow, start profiling in the profiler, in your application you perform the actions which are considered slow, and afterwards you get a snapshot in the profiler. The snapshot contains the data collected by the profiler during the slow action, so most data is produced by code in the area to investigate. This is important, because it allows you to stay focused on a single area. O/R mapper and RDBMS profiling .NET profilers give you a good insight in the .NET side of things, but not in the RDBMS side of the application. As this article is about O/R mapper powered applications, we're also looking at databases, and the software making it possible to consume the database in your application: the O/R mapper. To understand which parts of the O/R mapper and database participate how much to the total time taken for task T, we need different tools. There are two kind of tools focusing on O/R mappers and database performance profiling: O/R mapper profilers and RDBMS profilers. For O/R mapper profilers, you can look at LLBLGen Prof by hibernating rhinos or the Linq to Sql/LLBLGen Pro profiler by Huagati. Hibernating rhinos also have profilers for other O/R mappers like NHibernate (NHProf) and Entity Framework (EFProf) and work the same as LLBLGen Prof. For RDBMS profilers, you have to look whether the RDBMS vendor has a profiler. For example for SQL Server, the profiler is shipped with SQL Server, for Oracle it's build into the RDBMS, however there are also 3rd party tools. Which tool you're using isn't really important, what's important is that you get insight in which queries are executed during the task / area we're currently focused on and how long they took. Here, the O/R mapper profilers have an advantage as they collect the time it took to execute the query from the application's perspective so they also collect the time it took to transport data across the network. This is important because a query which returns a massive resultset or a resultset with large blob/clob/ntext/image fields takes more time to get transported across the network than a small resultset and a database profiler doesn't take this into account most of the time. Another tool to use in this case, which is more low level and not all O/R mappers support it (though LLBLGen Pro and NHibernate as well do) is tracing: most O/R mappers offer some form of tracing or logging system which you can use to collect the SQL generated and executed and often also other activity behind the scenes. While tracing can produce a tremendous amount of data in some cases, it also gives insight in what's going on. Interpret After we've completed the analysis step it's time to look at the data we've collected. We've done code reviews to see whether we've done anything stupid and which parts actually take place and if the proper algorithms have been implemented. We've done .NET profiling to see which parts are choke points and how much time they contribute to the total time taken to complete the task we're investigating. We've performed O/R mapper profiling and RDBMS profiling to see which queries were executed during the task, how many queries were generated and executed and how long they took to complete, including network transportation. All this data reveals two things: which parts are big contributors to the total time taken and which parts are irrelevant. Both aspects are very important. The parts which are irrelevant (i.e. don't contribute significantly to the total time taken) can be ignored from now on, we won't look at them. The parts which contribute a lot to the total time taken are important to look at. We now have to first look at the .NET profiler results, to see whether the time taken is consumed in our own code, in .NET framework code, in the O/R mapper itself or somewhere else. For example if most of the time is consumed by DbCommand.ExecuteReader, the time it took to complete the task is depending on the time the data is fetched from the database. If there was just 1 query executed, according to tracing or O/R mapper profilers / RDBMS profilers, check whether that query is optimal, uses indexes or has to deal with a lot of data. Interpret means that you follow the path from begin to end through the data collected and determine where, along the path, the most time is contributed. It also means that you have to check whether this was expected or is totally unexpected. My previous example of the 10 row resultset of a query which groups millions of rows will likely reveal that a long time is spend inside the database and almost no time is spend in the .NET code, meaning the RDBMS part contributes the most to the total time taken, the rest is compared to that time, irrelevant. Considering the vastness of the source data set, it's expected this will take some time. However, does it need tweaking? Perhaps all possible tweaks are already in place. In the interpret step you then have to decide that further action in this area is necessary or not, based on what the analysis results show: if the analysis results were unexpected and in the area where the most time is contributed to the total time taken is room for improvement, action should be taken. If not, you can only accept the situation and move on. In all cases, document your decision together with the analysis you've done. If you decide that the perceived performance problem is actually expected due to the nature of the task performed, it's essential that in the future when someone else looks at the application and starts asking questions you can answer them properly and new analysis is only necessary if situations changed. Fix After interpreting the analysis results you've concluded that some areas need adjustment. This is the fix step: you're actively correcting the performance problem with proper action targeted at the real cause. In many cases related to O/R mapper powered applications it means you'll use different features of the O/R mapper to achieve the same goal, or apply optimizations at the RDBMS level. It could also mean you apply caching inside your application (compromise memory consumption over performance) to avoid unnecessary re-querying data and re-consuming the results. After applying a change, it's key you re-do the analysis and interpretation steps: compare the results and expectations with what you had before, to see whether your actions had any effect or whether it moved the problem to a different part of the application. Don't fall into the trap to do partly analysis: do the full analysis again: .NET profiling and O/R mapper / RDBMS profiling. It might very well be that the changes you've made make one part faster but another part significantly slower, in such a way that the overall problem hasn't changed at all. Performance tuning is dealing with compromises and making choices: to use one feature over the other, to accept a higher memory footprint, to go away from the strict-OO path and execute queries directly onto the RDBMS, these are choices and compromises which will cross your path if you want to fix performance problems with respect to O/R mappers or data-access and databases in general. In most cases it's not a big issue: alternatives are often good choices too and the compromises aren't that hard to deal with. What is important is that you document why you made a choice, a compromise: which analysis data, which interpretation led you to the choice made. This is key for good maintainability in the years to come. Most common performance problems with O/R mappers Below is an incomplete list of common performance problems related to data-access / O/R mappers / RDBMS code. It will help you with fixing the hotspots you found in the interpretation step. SELECT N+1: (Lazy-loading specific). Lazy loading triggered performance bottlenecks. Consider a list of Orders bound to a grid. You have a Field mapped onto a related field in Order, Customer.CompanyName. Showing this column in the grid will make the grid fetch (indirectly) for each row the Customer row. This means you'll get for the single list not 1 query (for the orders) but 1+(the number of orders shown) queries. To solve this: use eager loading using a prefetch path to fetch the customers with the orders. SELECT N+1 is easy to spot with an O/R mapper profiler or RDBMS profiler: if you see a lot of identical queries executed at once, you have this problem. Prefetch paths using many path nodes or sorting, or limiting. Eager loading problem. Prefetch paths can help with performance, but as 1 query is fetched per node, it can be the number of data fetched in a child node is bigger than you think. Also consider that data in every node is merged on the client within the parent. This is fast, but it also can take some time if you fetch massive amounts of entities. If you keep fetches small, you can use tuning parameters like the ParameterizedPrefetchPathThreshold setting to get more optimal queries. Deep inheritance hierarchies of type Target Per Entity/Type. If you use inheritance of type Target per Entity / Type (each type in the inheritance hierarchy is mapped onto its own table/view), fetches will join subtype- and supertype tables in many cases, which can lead to a lot of performance problems if the hierarchy has many types. With this problem, keep inheritance to a minimum if possible, or switch to a hierarchy of type Target Per Hierarchy, which means all entities in the inheritance hierarchy are mapped onto the same table/view. Of course this has its own set of drawbacks, but it's a compromise you might want to take. Fetching massive amounts of data by fetching large lists of entities. LLBLGen Pro supports paging (and limiting the # of rows returned), which is often key to process through large sets of data. Use paging on the RDBMS if possible (so a query is executed which returns only the rows in the page requested). When using paging in a web application, be sure that you switch server-side paging on on the datasourcecontrol used. In this case, paging on the grid alone is not enough: this can lead to fetching a lot of data which is then loaded into the grid and paged there. Keep note that analyzing queries for paging could lead to the false assumption that paging doesn't occur, e.g. when the query contains a field of type ntext/image/clob/blob and DISTINCT can't be applied while it should have (e.g. due to a join): the datareader will do DISTINCT filtering on the client. this is a little slower but it does perform paging functionality on the data-reader so it won't fetch all rows even if the query suggests it does. Fetch massive amounts of data because blob/clob/ntext/image fields aren't excluded. LLBLGen Pro supports field exclusion for queries. You can exclude fields (also in prefetch paths) per query to avoid fetching all fields of an entity, e.g. when you don't need them for the logic consuming the resultset. Excluding fields can greatly reduce the amount of time spend on data-transport across the network. Use this optimization if you see that there's a big difference between query execution time on the RDBMS and the time reported by the .NET profiler for the ExecuteReader method call. Doing client-side aggregates/scalar calculations by consuming a lot of data. If possible, try to formulate a scalar query or group by query using the projection system or GetScalar functionality of LLBLGen Pro to do data consumption on the RDBMS server. It's far more efficient to process data on the RDBMS server than to first load it all in memory, then traverse the data in-memory to calculate a value. Using .ToList() constructs inside linq queries. It might be you use .ToList() somewhere in a Linq query which makes the query be run partially in-memory. Example: var q = from c in metaData.Customers.ToList() where c.Country=="Norway" select c; This will actually fetch all customers in-memory and do an in-memory filtering, as the linq query is defined on an IEnumerable<T>, and not on the IQueryable<T>. Linq is nice, but it can often be a bit unclear where some parts of a Linq query might run. Fetching all entities to delete into memory first. To delete a set of entities it's rather inefficient to first fetch them all into memory and then delete them one by one. It's more efficient to execute a DELETE FROM ... WHERE query on the database directly to delete the entities in one go. LLBLGen Pro supports this feature, and so do some other O/R mappers. It's not always possible to do this operation in the context of an O/R mapper however: if an O/R mapper relies on a cache, these kind of operations are likely not supported because they make it impossible to track whether an entity is actually removed from the DB and thus can be removed from the cache. Fetching all entities to update with an expression into memory first. Similar to the previous point: it is more efficient to update a set of entities directly with a single UPDATE query using an expression instead of fetching the entities into memory first and then updating the entities in a loop, and afterwards saving them. It might however be a compromise you don't want to take as it is working around the idea of having an object graph in memory which is manipulated and instead makes the code fully aware there's a RDBMS somewhere. Conclusion Performance tuning is almost always about compromises and making choices. It's also about knowing where to look and how the systems in play behave and should behave. The four steps I provided should help you stay focused on the real problem and lead you towards the solution. Knowing how to optimally use the systems participating in your own code (.NET framework, O/R mapper, RDBMS, network/services) is key for success as well as knowing what's going on inside the application you built. I hope you'll find this guide useful in tracking down performance problems and dealing with them in a useful way.

Search Results

Search found 42428 results on 1698 pages for 'database query'.

Page 60/1698 | < Previous Page | 56 57 58 59 60 61 62 63 64 65 66 67 | Next Page >

- by Anthony

- by Spud1

- by terje

- by pinaldave

- by John Abraham

- by pinaldave

- by Alex Hope O'Connor

- by ybbest

- by Chris

- by webnoob

- by Chris

- by FastAl

- by Boris

- by Ctroy

- by Blake Parcell

- by Arantir

- by Tobias Baumeister

- by FunkyChicken

- by mr12086

- by DalexL

- by Spacedust

- by bb-generation

- by jstarek

- by KItis

- by FransBouma

< Previous Page | 56 57 58 59 60 61 62 63 64 65 66 67 | Next Page >