Search Results

Search found 12803 results on 513 pages for 'lucene index'.

Page 11/513 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >

Two Applications using the same index file with Hibernate Search

- by Dominik Obermaier

Hi, I want to know if it is possible to use the same index file for an entity in two applications. Let me be more specific: We have an online Application with a frondend for the users and an application for the backend tasks (= administrator interface). Both are running on the same JBOSS AS. Both Applications are using the same database, so they are using the same entities. Of course the package names are not the same in both applications for the entities. So this is our usecase: A user should be able to search via the frondend. The user is only allowed to see results which are tagged with "visible". This tagging happens in our admin interface, so the index for the frontend should be updated every time an entity is tagged as "visible" in the backend. Of course both applications do have the same index root folder. In my index folder there are 2 index files: de.x.x.admin.model.Product de.x.x.frondend.model.Product How to "merge" this via Hibernate Search Configuration? I just did not get it via the documentation... Thanks for any help!

Read the article
Lucene DuplicateFilter question

- by chardex

Hi, Why DuplicateFilter doesn't work together with other filters? For example, if a little remake of the test DuplicateFilterTest, then the impression that the filter is not applied to other filters and first trims results: public void testKeepsLastFilter() throws Throwable { DuplicateFilter df = new DuplicateFilter(KEY_FIELD); df.setKeepMode(DuplicateFilter.KM_USE_LAST_OCCURRENCE); Query q = new ConstantScoreQuery(new ChainedFilter(new Filter[]{ new QueryWrapperFilter(tq), // new QueryWrapperFilter(new TermQuery(new Term("text", "out"))), // works right, it is the last document. new QueryWrapperFilter(new TermQuery(new Term("text", "now"))) // why it doesn't work? It is the third document. }, ChainedFilter.AND)); ScoreDoc[] hits = searcher.search(q, df, 1000).scoreDocs; assertTrue("Filtered searching should have found some matches", hits.length > 0); for (int i = 0; i < hits.length; i++) { Document d = searcher.doc(hits[i].doc); String url = d.get(KEY_FIELD); TermDocs td = reader.termDocs(new Term(KEY_FIELD, url)); int lastDoc = 0; while (td.next()) { lastDoc = td.doc(); } assertEquals("Duplicate urls should return last doc", lastDoc, hits[i].doc); } }

Read the article
A little off topic, but can anyone recommend examples where lucene is used on live websites

- by Phelan

I know wikipedia uses it but I am looking for more product based websites. Thanks.

Read the article
Writing Lucene StandardAnalyzer results to text file with OutputStreamWriter

- by user3693192

I'm getting ONLY the last result written to "outputStreamFile.txt". Can't figure out how to revise code so I can get ALL results written to text file. Sample input text: "1st line of text\n" "2nd line of text \n" Results in only 2nd line begin written (and not 1st line) as: "2nd line text\n" private static void analyze(String text) throws IOException { analyzer = new StandardAnalyzer(Version.LUCENE_30); Reader r = new StringReader(text); TokenStream ts = (TokenStream) analyzer.tokenStream("", r); TermAttribute term = ts.addAttribute(TermAttribute.class); File outfile = new File("C:\\Users\\Desktop\\outputStreamFile.txt"); FileOutputStream fileOutputStream = new FileOutputStream(outfile); OutputStreamWriter outputStreamWriter = new OutputStreamWriter(fileOutputStream, "UTF8"); while(ts.incrementToken()) { //System.out.print(term.term() + " "); outputStreamWriter.write(term.term().toString() + "\r\n"); } outputStreamWriter.close(); }

Read the article
Index files for Subdomains

- by user358994

I was finally able to setup subdomains but now I have a problem when I try and access the subdomain by itself. For instance, when I visit sub.domain.com, I get a page not found error. However, when I visit sub.domain.com/index.php, I see my page. My theory is that when I visit sub.domain.com, the index file it searches for is not in the sub/ folder but instead in the root folder. I have directoryindex to look for index.html before index.php. There is a index.html in the root directory that is needed. So when I go to sub.domain.com, it thinks that sub.domain.com/index.html exists but then finds out it doesnt and sends me a 404. That is my theory. How would I fix this? Any ideas? Thanks.

Read the article
Thinking Sphinx index rebuild error on windows xp: searchd is already running

- by Voldy

I have Sphinx installed on windows xp system. A I use Thinking Sphinx plug-in within my rails application. I can't rebuild index with Thinking Sphinx rake task after application server starting up even if I stop it: Stopped search daemon (pid 4492). ... bla bla bla ... total 3 reads, 0.000 sec, 1.3 kb/call avg, 0.0 msec/call avg total 9 writes, 0.000 sec, 1.2 kb/call avg, 0.0 msec/call avg WARNING: could not open pipe (GetLastError()=2) rake aborted! searchd is already running. If I reload system, I can rebuild index. What do you think about? p.s: Does this question suited for serverfault.com?

Read the article
Oracle tuning optimizer index cost adj and optimizer index caching

- by Darryl Braaten

What is the correct way to set the optimizer index cost adj parameter for Oracle. As a developer I have observed huge performance improvements as this parameter is lowered. Common queries are reduced from 2 seconds to 200ms. There are lots of warnings on the net that lowering this value will cause dire issues with the database, but no detail is given on what will start going wrong. I am currently only seeing only an upside, much improved application performance and no downside. I need to better understand the possible negative repercussions of adjusting these parameters.

Read the article
Multiple columns in a single index versus multiple indexes

- by Tim Coker

The short version of my question is what's the difference between three indexes each indexing a single column and one index indexing three columns. Background follows. I'm primarily a programmer but have to do DBA work because we don't have a DBA. I'm evaluating our indexes versus the queries run against a particular table. The table as 3 columns that I'm often filtering against or getting the max value of. Most of the time the queries look like select max(col_a) from table where col_b = 'avalue' or select col_c from table where col_b = 'avalue' and col_a = 'anothervalue' All columns are independently indexed. My question is would I see any difference if I had an index that indexed col_b and col_a together since they can appear in a where clause together?

Read the article
FastVectorHighlighter.Net returning null on GetBestFragment

- by Midhat

Hi I have a large index, on which Highlighter.Net works fine, but FastVectorHighlighter returns null as a Best Fragment on Some documents. the searcher works fine. It is just the highlighter. The field has been indexed in the same manner for all documents, so I fail to understand Why it highlights some documents but not all. Using Lucene.Net 2.9.2, built from trunk rev942061

Read the article
SQL SERVER – Shrinking Database is Bad – Increases Fragmentation – Reduces Performance

- by pinaldave

Earlier, I had written two articles related to Shrinking Database. I wrote about why Shrinking Database is not good. SQL SERVER – SHRINKDATABASE For Every Database in the SQL Server SQL SERVER – What the Business Says Is Not What the Business Wants I received many comments on Why Database Shrinking is bad. Today we will go over a very interesting example that I have created for the same. Here are the quick steps of the example. Create a test database Create two tables and populate with data Check the size of both the tables Size of database is very low Check the Fragmentation of one table Fragmentation will be very low Truncate another table Check the size of the table Check the fragmentation of the one table Fragmentation will be very low SHRINK Database Check the size of the table Check the fragmentation of the one table Fragmentation will be very HIGH REBUILD index on one table Check the size of the table Size of database is very HIGH Check the fragmentation of the one table Fragmentation will be very low Here is the script for the same. USE MASTER GO CREATE DATABASE ShrinkIsBed GO USE ShrinkIsBed GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Create FirstTable CREATE TABLE FirstTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_FirstTable_ID] ON FirstTable ( [ID] ASC ) ON [PRIMARY] GO -- Create SecondTable CREATE TABLE SecondTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_SecondTable_ID] ON SecondTable ( [ID] ASC ) ON [PRIMARY] GO -- Insert One Hundred Thousand Records INSERT INTO FirstTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Insert One Hundred Thousand Records INSERT INTO SecondTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO Let us check the table size and fragmentation. Now let us TRUNCATE the table and check the size and Fragmentation. USE MASTER GO CREATE DATABASE ShrinkIsBed GO USE ShrinkIsBed GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Create FirstTable CREATE TABLE FirstTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_FirstTable_ID] ON FirstTable ( [ID] ASC ) ON [PRIMARY] GO -- Create SecondTable CREATE TABLE SecondTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_SecondTable_ID] ON SecondTable ( [ID] ASC ) ON [PRIMARY] GO -- Insert One Hundred Thousand Records INSERT INTO FirstTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Insert One Hundred Thousand Records INSERT INTO SecondTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can clearly see that after TRUNCATE, the size of the database is not reduced and it is still the same as before TRUNCATE operation. After the Shrinking database operation, we were able to reduce the size of the database. If you notice the fragmentation, it is considerably high. The major problem with the Shrink operation is that it increases fragmentation of the database to very high value. Higher fragmentation reduces the performance of the database as reading from that particular table becomes very expensive. One of the ways to reduce the fragmentation is to rebuild index on the database. Let us rebuild the index and observe fragmentation and database size. -- Rebuild Index on FirstTable ALTER INDEX IX_SecondTable_ID ON SecondTable REBUILD GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can notice that after rebuilding, Fragmentation reduces to a very low value (almost same to original value); however the database size increases way higher than the original. Before rebuilding, the size of the database was 5 MB, and after rebuilding, it is around 20 MB. Regular rebuilding the index is rebuild in the same user database where the index is placed. This usually increases the size of the database. Look at irony of the Shrinking database. One person shrinks the database to gain space (thinking it will help performance), which leads to increase in fragmentation (reducing performance). To reduce the fragmentation, one rebuilds index, which leads to size of the database to increase way more than the original size of the database (before shrinking). Well, by Shrinking, one did not gain what he was looking for usually. Rebuild indexing is not the best suggestion as that will create database grow again. I have always remembered the excellent post from Paul Randal regarding Shrinking the database is bad. I suggest every one to read that for accuracy and interesting conversation. Let us run following script where we Shrink the database and REORGANIZE. -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO -- Shrink the Database DBCC SHRINKDATABASE (ShrinkIsBed); GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO -- Rebuild Index on FirstTable ALTER INDEX IX_SecondTable_ID ON SecondTable REORGANIZE GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can see that REORGANIZE does not increase the size of the database or remove the fragmentation. Again, I no way suggest that REORGANIZE is the solution over here. This is purely observation using demo. Read the blog post of Paul Randal. Following script will clean up the database -- Clean up USE MASTER GO ALTER DATABASE ShrinkIsBed SET SINGLE_USER WITH ROLLBACK IMMEDIATE GO DROP DATABASE ShrinkIsBed GO There are few valid cases of the Shrinking database as well, but that is not covered in this blog post. We will cover that area some other time in future. Additionally, one can rebuild index in the tempdb as well, and we will also talk about the same in future. Brent has written a good summary blog post as well. Are you Shrinking your database? Well, when are you going to stop Shrinking it? Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Index, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
How to get a Token from a Lucene TokenStream?

- by FarmBoy

I'm trying to use Apache Lucene for tokenizing, and I am baffled at the process to obtain Tokens from a TokenStream. The worst part is that I'm looking at the comments in the JavaDocs that address my question. http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/analysis/TokenStream.html#incrementToken%28%29 Somehow, an AttributeSource is supposed to be used, rather than Tokens. I'm totally at a loss. Can anyone explain how to get token-like information from a TokenStream?

Read the article
How to setup Lucene search for a B2B web app?

- by Bill Paetzke

Given: 5000 databases (spread out over 5 servers) 1 database per client (so you can infer there are 1000 clients) 2 to 2000 users per client (let's say avg is 100 users per client) Clients (databases) come and go every day (let's assume most remain for at least one year) Let's stay agnostic of language or sql brand, since Lucene (and Solr) have a breadth of support The Question: How would you setup Lucene search so that each client can only search within its database? How would you setup the index(es)? Would you need to add a filter to all search queries? If a client cancelled, how would you delete their (part of the) index? (this may be trivial--not sure yet) Possible Solutions: Make an index for each client (database) Pro: Search is faster (than one-index-for-all method). Indices are relative to the size of the client's data. Con: I'm not sure what this entails, nor do I know if this is beyond Lucene's scope. Have a single, gigantic index with a database_name field. Always include database_name as a filter. Pro: Not sure. Maybe good for tech support or billing dept to search all databases for info. Con: Search is slower (than index-per-client method). Flawed security if query filter removed. For Example: Joel Spolsky said in Podcast #11 that his hosted web app product, FogBugz On-Demand, uses Lucene. He has thousands of on-demand clients. And each client gets their own database. His situation is quite similar to mine. Although, he didn't elaborate on the setup (particularly indices); hence, the need for this question. One last thing: I would also accept an answer that uses Solr (the extension of Lucene). Perhaps it's better suited for this problem. Not sure.

Read the article
What is the advantage of Lucene searching and indexing ?

- by Mehdi Amrollahi

I want to know , What is the advantage of Lucene searching and indexing ? Is searching with Lucene as fast as other searching algorithm like Quick Search? What about indexing ? I want to know more about advantage of Lucene rather that others . thanks .

Read the article
Index of elements, jQuery or Javascript

- by ozsenegal

I've a table that contains 3 columns. I need to bind an event that fires off whenever one of those columns is clicked using jQuery. However, I need to know the index of the column clicked. i.e: First column (index 0), Second column (index 1), Third column (index 2), and so on... How can I do that?

Read the article
How to setup Lucene/Solr for a B2B web app?

- by Bill Paetzke

Given: 1 database per client (business customer) 5000 clients Clients have between 2 to 2000 users (avg is ~100 users/client) 100k to 10 million records per database Users need to search those records often (it's the best way to navigate their data) Possibly relevant info: Several new clients each week (any time during business hours) Multiple web servers and database servers (users can login via any web server) Let's stay agnostic of language or sql brand, since Lucene (and Solr) have a breadth of support For Example: Joel Spolsky said in Podcast #11 that his hosted web app product, FogBugz On-Demand, uses Lucene. He has thousands of on-demand clients. And each client gets their own database. They use an index per client and store it in the client's database. I'm not sure on the details. And I'm not sure if this is a serious mod to Lucene. The Question: How would you setup Lucene search so that each client can only search within its database? How would you setup the index(es)? Where do you store the index(es)? Would you need to add a filter to all search queries? If a client cancelled, how would you delete their (part of the) index? (this may be trivial--not sure yet) Possible Solutions: Make an index for each client (database) Pro: Search is faster (than one-index-for-all method). Indices are relative to the size of the client's data. Con: I'm not sure what this entails, nor do I know if this is beyond Lucene's scope. Have a single, gigantic index with a database_name field. Always include database_name as a filter. Pro: Not sure. Maybe good for tech support or billing dept to search all databases for info. Con: Search is slower (than index-per-client method). Flawed security if query filter removed. One last thing: I would also accept an answer that uses Solr (the extension of Lucene). Perhaps it's better suited for this problem. Not sure.

Read the article
Beware Sneaky Reads with Unique Indexes

- by Paul White NZ

A few days ago, Sandra Mueller (twitter | blog) asked a question using twitter’s #sqlhelp hash tag: “Might SQL Server retrieve (out-of-row) LOB data from a table, even if the column isn’t referenced in the query?” Leaving aside trivial cases (like selecting a computed column that does reference the LOB data), one might be tempted to say that no, SQL Server does not read data you haven’t asked for. In general, that’s quite correct; however there are cases where SQL Server might sneakily retrieve a LOB column… Example Table Here’s a T-SQL script to create that table and populate it with 1,000 rows: CREATE TABLE dbo.LOBtest ( pk INTEGER IDENTITY NOT NULL, some_value INTEGER NULL, lob_data VARCHAR(MAX) NULL, another_column CHAR(5) NULL, CONSTRAINT [PK dbo.LOBtest pk] PRIMARY KEY CLUSTERED (pk ASC) ); GO DECLARE @Data VARCHAR(MAX); SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540); WITH Numbers (n) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM master.sys.columns C1, master.sys.columns C2 ) INSERT LOBtest WITH (TABLOCKX) ( some_value, lob_data ) SELECT TOP (1000) N.n, @Data FROM Numbers N WHERE N.n <= 1000; Test 1: A Simple Update Let’s run a query to subtract one from every value in the some_value column: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; As you might expect, modifying this integer column in 1,000 rows doesn’t take very long, or use many resources. The STATITICS IO and TIME output shows a total of 9 logical reads, and 25ms elapsed time. The query plan is also very simple: Looking at the Clustered Index Scan, we can see that SQL Server only retrieves the pk and some_value columns during the scan: The pk column is needed by the Clustered Index Update operator to uniquely identify the row that is being changed. The some_value column is used by the Compute Scalar to calculate the new value. (In case you are wondering what the Top operator is for, it is used to enforce SET ROWCOUNT). Test 2: Simple Update with an Index Now let’s create a nonclustered index keyed on the some_value column, with lob_data as an included column: CREATE NONCLUSTERED INDEX [IX dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest (some_value) INCLUDE ( lob_data ) WITH ( FILLFACTOR = 100, MAXDOP = 1, SORT_IN_TEMPDB = ON ); This is not a useful index for our simple update query; imagine that someone else created it for a different purpose. Let’s run our update query again: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; We find that it now requires 4,014 logical reads and the elapsed query time has increased to around 100ms. The extra logical reads (4 per row) are an expected consequence of maintaining the nonclustered index. The query plan is very similar to before (click to enlarge): The Clustered Index Update operator picks up the extra work of maintaining the nonclustered index. The new Compute Scalar operators detect whether the value in the some_value column has actually been changed by the update. SQL Server may be able to skip maintaining the nonclustered index if the value hasn’t changed (see my previous post on non-updating updates for details). Our simple query does change the value of some_data in every row, so this optimization doesn’t add any value in this specific case. The output list of columns from the Clustered Index Scan hasn’t changed from the one shown previously: SQL Server still just reads the pk and some_data columns. Cool. Overall then, adding the nonclustered index hasn’t had any startling effects, and the LOB column data still isn’t being read from the table. Let’s see what happens if we make the nonclustered index unique. Test 3: Simple Update with a Unique Index Here’s the script to create a new unique index, and drop the old one: CREATE UNIQUE NONCLUSTERED INDEX [UQ dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest (some_value) INCLUDE ( lob_data ) WITH ( FILLFACTOR = 100, MAXDOP = 1, SORT_IN_TEMPDB = ON ); GO DROP INDEX [IX dbo.LOBtest some_value (lob_data)] ON dbo.LOBtest; Remember that SQL Server only enforces uniqueness on index keys (the some_data column). The lob_data column is simply stored at the leaf-level of the non-clustered index. With that in mind, we might expect this change to make very little difference. Let’s see: UPDATE dbo.LOBtest WITH (TABLOCKX) SET some_value = some_value - 1; Whoa! Now look at the elapsed time and logical reads: Scan count 1, logical reads 2016, physical reads 0, read-ahead reads 0, lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992. CPU time = 172 ms, elapsed time = 16172 ms. Even with all the data and index pages in memory, the query took over 16 seconds to update just 1,000 rows, performing over 52,000 LOB logical reads (nearly 16,000 of those using read-ahead). Why on earth is SQL Server reading LOB data in a query that only updates a single integer column? The Query Plan The query plan for test 3 looks a bit more complex than before: In fact, the bottom level is exactly the same as we saw with the non-unique index. The top level has heaps of new stuff though, which I’ll come to in a moment. You might be expecting to find that the Clustered Index Scan is now reading the lob_data column (for some reason). After all, we need to explain where all the LOB logical reads are coming from. Sadly, when we look at the properties of the Clustered Index Scan, we see exactly the same as before: SQL Server is still only reading the pk and some_value columns – so what’s doing the LOB reads? Updates that Sneakily Read Data We have to go as far as the Clustered Index Update operator before we see LOB data in the output list: [Expr1020] is a bit flag added by an earlier Compute Scalar. It is set true if the some_value column has not been changed (part of the non-updating updates optimization I mentioned earlier). The Clustered Index Update operator adds two new columns: the lob_data column, and some_value_OLD. The some_value_OLD column, as the name suggests, is the pre-update value of the some_value column. At this point, the clustered index has already been updated with the new value, but we haven’t touched the nonclustered index yet. An interesting observation here is that the Clustered Index Update operator can read a column into the data flow as part of its update operation. SQL Server could have read the LOB data as part of the initial Clustered Index Scan, but that would mean carrying the data through all the operations that occur prior to the Clustered Index Update. The server knows it will have to go back to the clustered index row to update it, so it delays reading the LOB data until then. Sneaky! Why the LOB Data Is Needed This is all very interesting (I hope), but why is SQL Server reading the LOB data? For that matter, why does it need to pass the pre-update value of the some_value column out of the Clustered Index Update? The answer relates to the top row of the query plan for test 3. I’ll reproduce it here for convenience: Notice that this is a wide (per-index) update plan. SQL Server used a narrow (per-row) update plan in test 2, where the Clustered Index Update took care of maintaining the nonclustered index too. I’ll talk more about this difference shortly. The Split/Sort/Collapse combination is an optimization, which aims to make per-index update plans more efficient. It does this by breaking each update into a delete/insert pair, reordering the operations, removing any redundant operations, and finally applying the net effect of all the changes to the nonclustered index. Imagine we had a unique index which currently holds three rows with the values 1, 2, and 3. If we run a query that adds 1 to each row value, we would end up with values 2, 3, and 4. The net effect of all the changes is the same as if we simply deleted the value 1, and added a new value 4. By applying net changes, SQL Server can also avoid false unique-key violations. If we tried to immediately update the value 1 to a 2, it would conflict with the existing value 2 (which would soon be updated to 3 of course) and the query would fail. You might argue that SQL Server could avoid the uniqueness violation by starting with the highest value (3) and working down. That’s fine, but it’s not possible to generalize this logic to work with every possible update query. SQL Server has to use a wide update plan if it sees any risk of false uniqueness violations. It’s worth noting that the logic SQL Server uses to detect whether these violations are possible has definite limits. As a result, you will often receive a wide update plan, even when you can see that no violations are possible. Another benefit of this optimization is that it includes a sort on the index key as part of its work. Processing the index changes in index key order promotes sequential I/O against the nonclustered index. A side-effect of all this is that the net changes might include one or more inserts. In order to insert a new row in the index, SQL Server obviously needs all the columns – the key column and the included LOB column. This is the reason SQL Server reads the LOB data as part of the Clustered Index Update. In addition, the some_value_OLD column is required by the Split operator (it turns updates into delete/insert pairs). In order to generate the correct index key delete operation, it needs the old key value. The irony is that in this case the Split/Sort/Collapse optimization is anything but. Reading all that LOB data is extremely expensive, so it is sad that the current version of SQL Server has no way to avoid it. Finally, for completeness, I should mention that the Filter operator is there to filter out the non-updating updates. Beating the Set-Based Update with a Cursor One situation where SQL Server can see that false unique-key violations aren’t possible is where it can guarantee that only one row is being updated. Armed with this knowledge, we can write a cursor (or the WHILE-loop equivalent) that updates one row at a time, and so avoids reading the LOB data: SET NOCOUNT ON; SET STATISTICS XML, IO, TIME OFF; DECLARE @PK INTEGER, @StartTime DATETIME; SET @StartTime = GETUTCDATE(); DECLARE curUpdate CURSOR LOCAL FORWARD_ONLY KEYSET SCROLL_LOCKS FOR SELECT L.pk FROM LOBtest L ORDER BY L.pk ASC; OPEN curUpdate; WHILE (1 = 1) BEGIN FETCH NEXT FROM curUpdate INTO @PK; IF @@FETCH_STATUS = -1 BREAK; IF @@FETCH_STATUS = -2 CONTINUE; UPDATE dbo.LOBtest SET some_value = some_value - 1 WHERE CURRENT OF curUpdate; END; CLOSE curUpdate; DEALLOCATE curUpdate; SELECT DATEDIFF(MILLISECOND, @StartTime, GETUTCDATE()); That completes the update in 1280 milliseconds (remember test 3 took over 16 seconds!) I used the WHERE CURRENT OF syntax there and a KEYSET cursor, just for the fun of it. One could just as well use a WHERE clause that specified the primary key value instead. Clustered Indexes A clustered index is the ultimate index with included columns: all non-key columns are included columns in a clustered index. Let’s re-create the test table and data with an updatable primary key, and without any non-clustered indexes: IF OBJECT_ID(N'dbo.LOBtest', N'U') IS NOT NULL DROP TABLE dbo.LOBtest; GO CREATE TABLE dbo.LOBtest ( pk INTEGER NOT NULL, some_value INTEGER NULL, lob_data VARCHAR(MAX) NULL, another_column CHAR(5) NULL, CONSTRAINT [PK dbo.LOBtest pk] PRIMARY KEY CLUSTERED (pk ASC) ); GO DECLARE @Data VARCHAR(MAX); SET @Data = REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 65540); WITH Numbers (n) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM master.sys.columns C1, master.sys.columns C2 ) INSERT LOBtest WITH (TABLOCKX) ( pk, some_value, lob_data ) SELECT TOP (1000) N.n, N.n, @Data FROM Numbers N WHERE N.n <= 1000; Now here’s a query to modify the cluster keys: UPDATE dbo.LOBtest SET pk = pk + 1; The query plan is: As you can see, the Split/Sort/Collapse optimization is present, and we also gain an Eager Table Spool, for Halloween protection. In addition, SQL Server now has no choice but to read the LOB data in the Clustered Index Scan: The performance is not great, as you might expect (even though there is no non-clustered index to maintain): Table 'LOBtest'. Scan count 1, logical reads 2011, physical reads 0, read-ahead reads 0, lob logical reads 36015, lob physical reads 0, lob read-ahead reads 15992. Table 'Worktable'. Scan count 1, logical reads 2040, physical reads 0, read-ahead reads 0, lob logical reads 34000, lob physical reads 0, lob read-ahead reads 8000. SQL Server Execution Times: CPU time = 483 ms, elapsed time = 17884 ms. Notice how the LOB data is read twice: once from the Clustered Index Scan, and again from the work table in tempdb used by the Eager Spool. If you try the same test with a non-unique clustered index (rather than a primary key), you’ll get a much more efficient plan that just passes the cluster key (including uniqueifier) around (no LOB data or other non-key columns): A unique non-clustered index (on a heap) works well too: Both those queries complete in a few tens of milliseconds, with no LOB reads, and just a few thousand logical reads. (In fact the heap is rather more efficient). There are lots more fun combinations to try that I don’t have space for here. Final Thoughts The behaviour shown in this post is not limited to LOB data by any means. If the conditions are met, any unique index that has included columns can produce similar behaviour – something to bear in mind when adding large INCLUDE columns to achieve covering queries, perhaps. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

Read the article
how to effectively modify index

- by daedlus

Hej everyone, problem : I am looking for right way to convert an index from clustered to non-clustered Description : I have a table as below in sybase db: dbo.UserLog Id | UserId |time | .... This is hash partitioned using UserId. Currently it has 2 indexes UserId : non-clustered time: clustered This table has about 20 million records. I now want to make UserId as clustered index and time as non-clustered index. is it correct to user alter index to change from clustered to non-clustered or do i drop index and recreate. does the fact that userId is used in hash partitioning have any implications to this? To me alter seems way to go but I have not yet tried this.

Read the article
After adding index Execution Plan SqlServer keeps suggesting the index

- by Lieven Cardoen

If I run the execution plan of an sp, Management Studio suggests an index. I add the index, run the execution plan again, and it keeps suggesting the same index... What am I doing wrong here? I already restarted SqlServer and Management Studio, but that's doesn't help.

Read the article
Use a vector to index a matrix without linear index

- by David_G

G'day, I'm trying to find a way to use a vector of [x,y] points to index from a large matrix in MATLAB. Usually, I would convert the subscript points to the linear index of the matrix.(for eg. Use a vector as an index to a matrix in MATLab) However, the matrix is 4-dimensional, and I want to take all of the elements of the 3rd and 4th dimensions that have the same 1st and 2nd dimension. Let me hopefully demonstrate with an example: Matrix = nan(4,4,2,2); % where the dimensions are (x,y,depth,time) Matrix(1,2,:,:) = 999; % note that this value could change in depth (3rd dim) and time (4th time) Matrix(3,4,:,:) = 888; % note that this value could change in depth (3rd dim) and time (4th time) Matrix(4,4,:,:) = 124; Now, I want to be able to index with the subscripts (1,2) and (3,4), etc and return not only the 999 and 888 which exist in Matrix(:,:,1,1) but the contents which exist at Matrix(:,:,1,2),Matrix(:,:,2,1) and Matrix(:,:,2,2), and so on (IRL, the dimensions of Matrix might be more like size(Matrix) = (300 250 30 200) I don't want to use linear indices because I would like the results to be in a similar vector fashion. For example, I would like a result which is something like: ans(time=1) 999 888 124 999 888 124 ans(time=2) etc etc etc etc etc etc I'd also like to add that due to the size of the matrix I'm dealing with, speed is an issue here - thus why I'd like to use subscript indices to index to the data. I should also mention that (unlike this question: Accessing values using subscripts without using sub2ind) since I want all the information stored in the extra dimensions, 3 and 4, of the i and jth indices, I don't think that a slightly faster version of sub2ind still would not cut it..

Read the article
jquery get the index of a row?

- by KnockKnockWhosThere

I'm trying to write a function that will do something if the the row index is 0, and then something else if the row index is greater than 0. The zero part is working, but I can't figure out the syntax for rows that have an index greater than 0. For the tr[0] row, I'm doing this: if($("#mytable > tbody > tr ").index(0)) { ... I tried: if($("#mytable > tbody > tr ").index() > 0 ) { But, that didn't work?

Read the article
Assigning Array with literal string index to an array with numeric index

- by anita

How to Assign an Array with literal string index to an array with numeric index. [IN PHP] Both have the same length...for example Literal array(HELL0=somevalue1,BYE=somevalue2) Numeric index array(1=somevalue1,2=somevalue2) [Expected result in second array] Both have same length / count of values 2. Pls help ASAP Anita

Read the article
JMS message received at only one server

- by BJH

I'm having a problem with a JEE6 application running in a clustered environment using WebSphere ApplicationServer 8. A search index is used for quick search in the UI (using Lucene), which must be re-indexed after new data arrived in the corresponding DB layer. To achieve this we're sending a JMS message to the application, then the search index will be refreshed. The problem is, that the messages only arrives at one of the cluster members. So only there the search index is up to date. At the other servers it remains outdated. How can I achieve that the search index gets updated at all cluster members? Can I receive the message somehow on all servers? Or is there a better way to do this?

Read the article
How to optimize indexing of large number of DB records using Zend_Lucene and Zend_Paginator

- by jdichev

So I have this cron script that is deployed and ran using Cron on a host and indexes all the records in a database table - the index is later used both for the front end of the site and the backed operations as well. After the operation, the index is about 3-4 MB. The problem is it takes a lot of resources (CPU: 30+ and a good chunk of memory) and slows the machine down. My question is about how to optimize the operation described below: First there is a select query built using the Zend Framework API, this query is then passed to a Paginator factory that returns a paginator which I am using to balance the current number of items being indexed and not iterate over too much items. The script is iterating over the current items in the paginator object using a foreach loop until reaching the end and then it starts from the beginning after getting items for the next page. I am suspecting this overhead is caused by the Zend_Lucene but no idea how this could be improved.

Read the article
Index View Index Creation Failing

- by aBetterGamer

I'm trying to create an index on a view and it keeps failing, I'm pretty sure its b/c I'm using an alias for the column. Not sure how or if I can do it this way. Below is a simplified scenario. CREATE VIEW v_contracts WITH SCHEMABINDING AS SELECT t1.contractid as 'Contract.ContractID' t2.name as 'Customer.Name' FROM contract t1 JOIN customer t2 ON t1.contractid = t2.contractid GO CREATE UNIQUE CLUSTERED INDEX v_contracts_idx ON v_contracts(t1.contractid) GO --------------------------- Incorrect syntax near '.'. CREATE UNIQUE CLUSTERED INDEX v_contracts_idx ON v_contracts(contractid) GO --------------------------- Column name 'contractid' does not exist in the target table or view. CREATE UNIQUE CLUSTERED INDEX v_contracts_idx ON v_contracts(Contract.ContractID) GO --------------------------- Incorrect syntax near '.'. Anyone know how to create an indexed view using aliased columns please let me know.

Read the article
Thunderbird 3, must manually re-build folder index

- by david

I have a couple of installations of thunbderbird three that aren't updating the IMAP folders correctly. I have to manually re-build index of the foldres for it to refresh. any ideas how to fix this problem? Im not using synchornishing.

Read the article

< Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18 | Next Page >