Search Results

Search found 1124 results on 45 pages for 'indexing'.


On Disk Substring index

- by emeryc

I have a file (a FASTA file, to be specific) that I would like to index so that I can quickly locate any substring within it and then find that substring's location in the original file. In many cases this would be easy with a trie or a substring array. Unfortunately, the strings I need to index are 800+ MB, which means that building the index in memory is unacceptable, so I'm looking for a reasonable way to create this index on disk with minimal memory usage.

(Edit for clarification) I am only interested in the protein headers, so for the largest database I care about this is about 800 MB of text. I would like to be able to find an exact substring in O(N) time, where N is the length of the input string. This must be usable on 32-bit machines, since it will be shipped to random people who are not expected to have 64-bit machines. I want to be able to index from any word break within a line to the end of that line (though lines can be several MB long). Hopefully this clarifies what is needed and why the solutions given so far are not illuminating.

I should also add that this needs to be done from within Java and must run on client computers with various operating systems, so I can't use any OS-specific solution; it must be a programmatic solution.
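One way to picture the kind of index described above is a suffix array restricted to word starts. The sketch below is in Python rather than Java and keeps everything in memory purely for illustration; all function names are hypothetical. An on-disk variant would presumably store the sorted offsets in a file and read the text through random access, which is an assumption and not something the question confirms.

```python
def word_starts(text):
    """Offsets at which a word begins (after whitespace or at a line start)."""
    starts = []
    previous_was_space = True
    for offset, ch in enumerate(text):
        is_space = ch.isspace()
        if previous_was_space and not is_space:
            starts.append(offset)
        previous_was_space = is_space
    return starts


def suffix_at(text, offset):
    """The text from `offset` to the end of its line."""
    end = text.find("\n", offset)
    return text[offset:] if end == -1 else text[offset:end]


def build_index(text):
    """Word-start offsets, sorted by the line suffix beginning at each one.

    Sorting with full suffix copies is only acceptable for a small demo.
    """
    return sorted(word_starts(text), key=lambda offset: suffix_at(text, offset))


def find(text, index, query):
    """Offsets of all word-aligned occurrences of `query`, via binary search."""
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if suffix_at(text, index[mid])[:len(query)] < query:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(index) and suffix_at(text, index[lo]).startswith(query):
        hits.append(index[lo])
        lo += 1
    return sorted(hits)


headers = ">sp|P1 kinase alpha\n>sp|P2 alpha kinase beta\n"
index = build_index(headers)
print(find(headers, index, "kinase"))
print(find(headers, index, "alpha kinase"))
```

Each lookup costs a binary search over the word starts plus a comparison of at most the query length per step, and only the offsets need to be kept sorted, not copies of the text.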

Slow query with unexpected scan

- by zerkms

Hello, I have this query:

    SELECT *
    FROM SAMPLE SAMPLE
    INNER JOIN TEST TEST ON SAMPLE.SAMPLE_NUMBER = TEST.SAMPLE_NUMBER
    INNER JOIN RESULT RESULT ON TEST.TEST_NUMBER = RESULT.TEST_NUMBER
    WHERE SAMPLED_DATE BETWEEN '2010-03-17 09:00' AND '2010-03-17 12:00'

The biggest table here is RESULT, which contains 11.1M records. The other two tables have about 1M each.

This query runs slowly (more than 10 minutes) and returns about 800 records. The execution plan shows a clustered index scan over all 11M records. RESULT.TEST_NUMBER is the clustered primary key.

If I change 2010-03-17 09:00 to 2010-03-17 10:00, I get about 40 records, the query executes in 300 ms, and the plan shows a clustered index seek.

If I replace * in the SELECT clause with RESULT.TEST_NUMBER (covered by the index), the first case becomes fast too.

This points to HDD I/O issues, but it doesn't explain why the plan changes. So, any ideas?

Is the time cost constant when bulk inserting data into an indexed table?

- by SiLent SoNG

I have created an archive table that will store data for selecting only. Each day a program will transfer a batch of records into the archive table. Several columns are indexed; the others are not. I am concerned with the time cost per batch insertion:

- 1st batch insertion: N1
- 2nd batch insertion: N2
- 3rd batch insertion: N3

The question is: will N1, N2, and N3 be roughly the same, or will N3 > N2 > N1? That is, with several indexes in place, will the time cost be constant or will it grow with each batch? All indexes are non-clustered. The archive table structure is this:

    create table document (
        doc_id int unsigned primary key,
        owner_id int, -- indexed
        title smalltext,
        country char(2),
        year year(4),
        time datetime,
        key ix_owner(owner_id)
    )

How can I find out how many rows of a matrix satisfy a rather complicated criterion (in R)?

- by Brani

As an example, here is a way to get a matrix of all possible outcomes of rolling 4 (fair) dice:

    z <- as.matrix(expand.grid(c(1:6),c(1:6),c(1:6),c(1:6)))

As you may already have understood, I'm trying to work out a question that was closed though, in my opinion, it's a challenging one. I used counting techniques to solve it (by hand, I mean) and finally arrived at 1083 out of 1296 outcomes in which some subset of the dice sums to 5. That result is consistent with the answers provided to that question before it was closed. I was wondering how that subset of outcomes (say z1, where dim(z1) = [1083,4]) could be generated using R. Do you have any ideas? Thank you.
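The counting criterion itself can be checked by brute force. The sketch below uses Python rather than R, and the helper name has_subset_summing_to is hypothetical; it reads the criterion as "some non-empty subset of the four dice sums to 5", which is an interpretation of the question.

```python
from itertools import combinations, product


def has_subset_summing_to(roll, target):
    """True if any non-empty subset of the dice in `roll` adds up to `target`."""
    return any(
        sum(subset) == target
        for size in range(1, len(roll) + 1)
        for subset in combinations(roll, size)
    )


# All 6**4 ordered outcomes of rolling four dice.
outcomes = list(product(range(1, 7), repeat=4))

# Keep the rows that satisfy the criterion (the analogue of z1).
z1 = [roll for roll in outcomes if has_subset_summing_to(roll, 5)]

print(len(outcomes), len(z1))  # 1296 1083
```

The same filter-by-predicate shape should carry over to R by applying a row-wise test to z and subsetting with the resulting logical vector.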

Improve SQL Server 2005 Query Performance

- by user366810

I have a course search engine, and when I run a search it takes too long to show the results. You can try a search here: http://76.12.87.164/cpd/testperformance.cfm On that page you can also see the database tables and indexes, if any. I'm not using stored procedures; the queries are inline, using ColdFusion. I think I need to create some indexes, but I'm not sure what kind (clustered, non-clustered) or on which columns. Thanks

MySQL Database is Indexed at Apache Solr, How to access it via URL

- by Wasim

data-config.xml

    <dataConfig>
      <dataSource encoding="UTF-8" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/somevisits" user="root" password=""/>
      <document name="somevisits">
        <entity name="login" query="select * from login">
          <field column="sv_id" name="sv_id" />
          <field column="sv_username" name="sv_username" />
        </entity>
      </document>
    </dataConfig>

schema.xml

    <?xml version="1.0" encoding="UTF-8" ?>
    <schema name="example" version="1.5">
      <fields>
        <field name="sv_id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
        <field name="username" type="string" indexed="true" stored="true" required="true"/>
        <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
        <field name="text" type="string" indexed="true" stored="false" multiValued="true"/>
      </fields>
      <uniqueKey>sv_id</uniqueKey>
      <types>
        <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
        <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
      </types>
    </schema>

Solr successfully imported the MySQL database using a full import:

    http://[localSolr]:8983/solr/#/collection1/dataimport?command=full-import

My question is: how do I access the imported MySQL data now?
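Solr exposes indexed documents over HTTP through its select request handler. The sketch below only builds such a URL and does not contact a server; the host localhost, the core name collection1 (taken from the import URL above), and the helper name solr_select_url are assumptions made for illustration.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's /select handler; `query` uses Solr query syntax."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{core}/select?{params}"


# "*:*" is the match-all query, handy for checking that the import produced documents.
url = solr_select_url("http://localhost:8983", "collection1", "*:*")
print(url)
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, should return the imported rows as JSON documents if the core name matches the actual installation.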

Tracking/Counting Word Frequency

- by Joel Martinez

I'd like to get some community consensus on a good design for storing and querying word frequency counts. I'm building an application in which I have to parse text inputs and store how many times each word has appeared (over time). So given the following inputs:

    "To Kill a Mocking Bird"
    "Mocking a piano player"

the application would store the following values:

    Word      Count
    ---------------
    To        1
    Kill      1
    A         2
    Mocking   2
    Bird      1
    Piano     1
    Player    1

and later be able to quickly query for the count of any given word. My current plan is to simply store the words and counts in a database and rely on caching word count values. But I suspect that I won't get enough cache hits to make this a viable solution long term. Can anyone suggest algorithms, data structures, or any other idea that might make this a well-performing solution?
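The counting behaviour described above can be sketched with an in-memory hash map. This is only an illustration of the data structure, not a persistence design; the class name WordFrequencyStore and the tokenization rule (lowercase runs of letters and apostrophes) are assumptions.

```python
import re
from collections import Counter


class WordFrequencyStore:
    """Case-insensitive running word counts backed by a hash map."""

    def __init__(self):
        self._counts = Counter()

    def add_text(self, text):
        # Split into lowercase words; punctuation and digits are ignored.
        self._counts.update(re.findall(r"[a-z']+", text.lower()))

    def count(self, word):
        # Counter returns 0 for words that were never seen.
        return self._counts[word.lower()]


store = WordFrequencyStore()
store.add_text("To Kill a Mocking Bird")
store.add_text("Mocking a piano player")

print(store.count("Mocking"), store.count("a"), store.count("Bird"))  # 2 2 1
```

Lookups and increments are constant time on average, so the open question is mainly how and when to flush the counts to durable storage.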

PostgreSQL: How to index all foreign keys?

- by biggusjimmus

I am working with a large PostgreSQL database, and I am trying to tune it to get more performance. Our queries and updates seem to be doing a lot of lookups using foreign keys. What I would like is a relatively simple way to add indexes to all of our foreign keys without having to go through every table (~140) and do it manually. In researching this, I've found that there is no way to have Postgres do this for you automatically (like MySQL does), but I would be happy to hear otherwise there, too.
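Once the list of foreign key columns is known, producing the index statements is mechanical. The sketch below assumes the (table, column) pairs have already been read from the database catalog, a step not shown here; the table names, column names, and the idx_ naming scheme are hypothetical.

```python
def index_statements(foreign_keys):
    """Return one CREATE INDEX statement per distinct (table, column) pair."""
    statements = []
    for table, column in sorted(set(foreign_keys)):
        index_name = f"idx_{table}_{column}"
        statements.append(
            f'CREATE INDEX {index_name} ON "{table}" ("{column}");'
        )
    return statements


# Hypothetical catalog output; duplicates are collapsed.
foreign_keys = [
    ("orders", "customer_id"),
    ("orders", "customer_id"),
    ("items", "order_id"),
]

for statement in index_statements(foreign_keys):
    print(statement)
```

Generating a script like this and reviewing it before running it keeps the change auditable, and columns that already have a suitable index can be dropped from the list by hand.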

Selecting a Function Value Shows nvarchar(4000) on an Index

- by Jason N. Gaylord

I have a view that I'm trying to set up an index for. One of the select columns in the view calls a user-defined function that returns varchar(250). However, when I try to set up an index on that column, I see a size of nvarchar(4000). Why is that, and will it cause a problem if I continue setting up my index?

Matlab, index from starting location to last index

- by ccook

Say you have an array, data, of unknown length. Is there a shorter way to get the elements from a starting index to the end than the following?

    subdata = data(2:length(data))

How to setup Lucene/Solr for a B2B web app?

- by Bill Paetzke

Given:

- 1 database per client (business customer)
- 5000 clients
- Clients have between 2 and 2000 users (avg is ~100 users/client)
- 100k to 10 million records per database
- Users need to search those records often (it's the best way to navigate their data)

Possibly relevant info:

- Several new clients each week (any time during business hours)
- Multiple web servers and database servers (users can log in via any web server)
- Let's stay agnostic of language or SQL brand, since Lucene (and Solr) have a breadth of support

For example: Joel Spolsky said in Podcast #11 that his hosted web app product, FogBugz On-Demand, uses Lucene. He has thousands of on-demand clients, and each client gets their own database. They use an index per client and store it in the client's database. I'm not sure of the details, and I'm not sure if this is a serious mod to Lucene.

The question: How would you set up Lucene search so that each client can only search within its own database?

- How would you set up the index(es)?
- Where do you store the index(es)?
- Would you need to add a filter to all search queries?
- If a client cancelled, how would you delete their (part of the) index? (This may be trivial; not sure yet.)

Possible solutions:

1. Make an index for each client (database).
   Pro: Search is faster (than the one-index-for-all method). Indices are relative to the size of the client's data.
   Con: I'm not sure what this entails, nor do I know if this is beyond Lucene's scope.
2. Have a single, gigantic index with a database_name field. Always include database_name as a filter.
   Pro: Not sure. Maybe good for the tech support or billing department to search all databases for info.
   Con: Search is slower (than the index-per-client method). Flawed security if the query filter is removed.

One last thing: I would also accept an answer that uses Solr (the extension of Lucene). Perhaps it's better suited to this problem. Not sure.

database row/ record pointers

- by David

Hi, I don't know the correct words for what I'm trying to find out about, and as such I'm having a hard time googling. I want to know whether it's possible with databases (technology independent, but I would be interested to hear whether it's possible with Oracle, MySQL and Postgres) to point to specific rows instead of executing my query again. So I might initially execute a query, find some rows of interest, and then wish to avoid searching for them again by keeping a list of pointers or some other metadata that indicates their location in the database, which I can go straight to the next time I want those results.

I realise there is caching in databases, but I want to keep these "pointers" elsewhere, so caching doesn't ultimately solve this problem. Is this just an index, and I store the index and look up by it? Most of my current tables don't have indexes, and I don't want the speed decrease that sometimes comes with indexes. So what's the magic term I've been trying to put into Google? Cheers

Can I store and join based on external attributes in Lucene/Solr

- by Kibbee

Is there a way to store information about documents that are stored in Lucene such that I don't have to update the entire document to update certain attributes of it? For instance, let's say I had a bunch of documents and wanted to update a permissions list of who is allowed to see each document on a daily, or more frequent, basis. Would it be possible to update all the permissions each day without updating all the documents? I could do it by keeping track of exactly which permissions were added and removed, but I would rather just take the final list of permissions and use that than have to track all the permission changes and post those entire documents to Lucene.

Can someone recommend a good tutorial on MySQL indexes, specifically when used in an order by clause

- by Philip Brocoum

I could try to post and explain the exact query I'm trying to run, but I'm going by the old adage of, "give a man a fish and he'll eat for a day, teach a man to fish and he'll eat for the rest of his life." SQL optimization seems to be very query-specific, and even if you could solve this one particular query for me, I'm going to have to write many more queries in the future, and I'd like to be educated on how indexes work in general. Still, here's a quick description of my current problem. I have a query that joins three tables and runs in 0.2 seconds flat. Awesome. I add an "order by" clause and it runs in 4 minutes and 30 seconds. Sucky. I denormalize one table so there is one fewer join, add indexes everywhere, and now the query runs in... 20 minutes. What the hell? Finally, I don't use a join at all, but rather a subquery with "where id in (...) order by" and now it runs in 1.5 seconds. Pretty decent. What in God's name is going on? I feel like if I actually understood what indexes were doing I could write some really good SQL. Anybody know some good tutorials? Thanks!

Sphinx Mysql Tutorial Help

- by Frederico

I have just implemented the Sphinx storage engine and created my first Sphinx table. I'm trying to move away from my full-text MyISAM table, so I figured I would just dump all the data into a Sphinx table. Obviously I was wrong. Are there any great tutorials that really take you through converting a full-text search on MySQL to a Sphinx table? Thanks in advance.

How Indices Cope with MVCC?

- by geeko

Greetings Overflowers, To my understanding (and I hope I'm not right), changes to indices cannot be MVCCed. I'm wondering if this is also true of big records, since copies can be costly. Since records are (usually) accessed via indices, how can MVCC be effective? Do indices, for example, keep track of the different versions of MVCCed records? Is there any good recent reading on this subject? Really appreciated! Regards

What's the best way to index many-to-one relation with hibernate search?

- by tabdulin

I have an entity with a many-to-one mapping (Product 1-* Regions, unidirectional association). What is the best way to store the index of such a relation so that it can easily be used to filter a search query?

Adding more OR searches with CONTAINS Brings Query to Crawl

- by scolja

I have a simple query that relies on two full-text indexed tables, but it runs extremely slowly when I combine CONTAINS with any additional OR search. As seen in the execution plan, the two full-text searches crush the performance. If I query with just one of the CONTAINS predicates, or neither, the query is sub-second, but the moment I add OR into the mix the query becomes ill-fated.

The two tables are nothing special: they're not overly wide (42 cols in one, 21 in the other; maybe 10 cols are FT indexed in each), nor do they contain very many records (36k recs in the bigger of the two).

I was able to solve the performance problem by splitting the two CONTAINS searches into their own SELECT queries and then UNIONing the three together. Is this UNION workaround my only hope? Thanks.

    SELECT a.CollectionID
    FROM collections a
    INNER JOIN determinations b ON a.CollectionID = b.CollectionID
    WHERE a.CollrTeam_Text LIKE '%fa%'
       OR CONTAINS(a.*, '"*fa*"')
       OR CONTAINS(b.*, '"*fa*"')

Execution plan (guess I need more reputation before I can post the image):

Is it bad to have a non-clustered index that contains the primary key from the clustered index?

- by Don

If you have a table with a clustered index on the primary key (int), is it redundant and bad to have one (or more) non-clustered indexes that include that primary key column as one of their columns?

Index question: Select * with WHERE clause. Where and how to create index

- by Mestika

Hi, I'm working on optimizing some of my queries, and I have a query that is built like this:

    "select * from SC where c_id =" + c_id

The schema of SC looks like this:

    SC (
        c_id int not null,
        date_start date not null,
        date_stop date not null,
        r_t_id int not null,
        nt int,
        t_p decimal,
        PRIMARY KEY (c_id, r_t_id, date_start, date_stop)
    );

My first idea for the index is a covering index in this order:

    INDEX(c_id, date_start, date_stop, nt, r_t_id, t_p)

I base this order on the following reasoning:

- The WHERE clause selects on c_id, making it the first sort key.
- Next come date_start and date_stop, to define a sort of "range".
- Next nt, because the query will select nt.
- Next r_t_id, because it is an ID for a specific type in my r_t table.
- And last t_p, because it is just information.

I don't know whether the column order matters at all when it is a SELECT * statement. I should say that SC is not the biggest table. I can't say exactly how many rows it contains, but an estimate would be somewhere between fewer than 10 and 1000.

The next thing to add is that other queries insert data into SC, and I know that indexes on tables with frequent insertions can be costly, but can I somehow find a golden mean that keeps this query efficient? I don't know if it makes a difference, but I'm using IBM DB2 version 9.7.

Sincerely, Mestika
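Because c_id is the leading column of the primary key, an engine may already be able to serve the WHERE clause from the primary key's index. The sketch below checks this with Python's built-in sqlite3 module; SQLite stands in for DB2 purely as an assumption for illustration, the sample rows are made up, and DB2's optimizer may behave differently. It also passes c_id as a bound parameter instead of concatenating it into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SC (
        c_id INTEGER NOT NULL,
        date_start TEXT NOT NULL,
        date_stop TEXT NOT NULL,
        r_t_id INTEGER NOT NULL,
        nt INTEGER,
        t_p REAL,
        PRIMARY KEY (c_id, r_t_id, date_start, date_stop)
    )
""")
# Made-up sample rows, only so the query has something to return.
conn.executemany("INSERT INTO SC VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "2010-01-01", "2010-01-31", 10, 3, 9.5),
    (1, "2010-02-01", "2010-02-28", 11, 4, 7.0),
    (2, "2010-01-01", "2010-01-31", 10, 5, 1.25),
])

c_id = 1
rows = conn.execute("SELECT * FROM SC WHERE c_id = ?", (c_id,)).fetchall()

# Ask SQLite how it would run the query; the last column holds the description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM SC WHERE c_id = ?", (c_id,)
).fetchall()
plan_text = " ".join(str(step[-1]) for step in plan)

print(len(rows))
print(plan_text)
```

If the equivalent explain output on DB2 also shows the primary key index being used, an additional covering index on such a small table may bring little benefit relative to its insert cost.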

Index for wildcard match of end of string

- by Anders Abel

I have a table of phone numbers that stores each number as varchar(20). I have a requirement to support searching both on entire numbers and on only the last part of a number, so a typical query will be:

    SELECT * FROM PhoneNumbers WHERE Number LIKE '%1234'

How can I put an index on the Number column to make those searches efficient? Is there a way to create an index that sorts the records on the reversed string? Another option might be to reverse the numbers before storing them, which would give queries like:

    SELECT * FROM PhoneNumbers WHERE ReverseNumber LIKE '4321%'

However, that would require all users of the database to always reverse the string. It might be solved by storing both the normal and the reversed number and having the reversed number updated by a trigger on insert/update. But that kind of solution is not very elegant. Any other suggestions?
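The reversed-column idea can be tried out end to end with Python's built-in sqlite3 module. SQLite is an assumption here, since the question does not name the engine, and the helper functions add_number and find_by_suffix are hypothetical; they hide the reversal so that callers never handle reversed strings themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PhoneNumbers (Number TEXT NOT NULL, ReverseNumber TEXT NOT NULL)"
)
# The index is on the reversed value, so a suffix search becomes a prefix search.
conn.execute("CREATE INDEX ix_reverse_number ON PhoneNumbers (ReverseNumber)")


def add_number(number):
    """Store the number together with its reversed form."""
    conn.execute(
        "INSERT INTO PhoneNumbers VALUES (?, ?)", (number, number[::-1])
    )


def find_by_suffix(suffix):
    """Return numbers ending in `suffix` (assumes it has no % or _ characters)."""
    pattern = suffix[::-1] + "%"
    cursor = conn.execute(
        "SELECT Number FROM PhoneNumbers WHERE ReverseNumber LIKE ? ORDER BY Number",
        (pattern,),
    )
    return [row[0] for row in cursor]


for number in ["5551234", "0701234", "5559999"]:
    add_number(number)

print(find_by_suffix("1234"))  # ['0701234', '5551234']
```

Whether the engine actually uses the index for a prefix LIKE depends on the database and its collation settings, so the execution plan is worth checking on the real system.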

Does Oracle 11g automatically index fields frequently used for full table scans?

- by gustafc

I have an app using an Oracle 11g database. I have a fairly large table (~50k rows) which I query thus:

    SELECT omg, ponies FROM table WHERE x = 4

Field x was not indexed, I discovered. This query happens a lot, but the thing is that the performance wasn't too bad. Adding an index on x did make the queries approximately twice as fast, which is far less than I expected. On, say, MySQL, it would've made the query ten times faster, at the very least. I suspect Oracle adds some kind of automatic index when it detects that I often query a non-indexed field. Am I correct? I can find nothing even implying this in the docs.

Are upper bounds of indexed ranges always assumed to be exclusive?

- by polygenelubricants

So in Java, whenever an indexed range is given, the upper bound is almost always exclusive.

From java.lang.String:

    substring(int beginIndex, int endIndex)
    Returns a new string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1.

From java.util.Arrays:

    copyOfRange(T[] original, int from, int to)
    from - the initial index of the range to be copied, inclusive
    to - the final index of the range to be copied, exclusive

From java.util.BitSet:

    set(int fromIndex, int toIndex)
    fromIndex - index of the first bit to be set
    toIndex - index after the last bit to be set

As you can see, it does look like Java tries to make it a consistent convention that upper bounds are exclusive. My questions are:

- Is this the official authoritative recommendation?
- Are there notable violations that we should be wary of?
- Is there a name for this system? (à la "0-based" vs "1-based")
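An inclusive lower bound paired with an exclusive upper bound is commonly called a half-open interval, written [begin, end). The sketch below uses Python slices, which follow the same convention as the Java methods quoted above, to illustrate three properties that make the convention convenient; the variable names are arbitrary.

```python
text = "indexing"

# 1. The length of a range is simply end - begin.
begin, end = 2, 5
piece = text[begin:end]            # characters at indices 2, 3 and 4
assert len(piece) == end - begin

# 2. Adjacent ranges share a boundary value and never overlap or leave a gap.
split = 4
assert text[:split] + text[split:] == text

# 3. An empty range needs no special case: begin == end.
assert text[3:3] == ""

print(piece)  # dex
```

With an inclusive upper bound, each of these cases would need a +1 or -1 adjustment somewhere.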

Adding a clustered index to a SQL table: what dangers exist for a live production system?

- by MoSlo

Right, keep in mind I need to describe this while abstracting away all possibly confidential info.

I've been put in charge of a 10-year-old transactional system in which the majority of the business logic is implemented at the database level (triggers, stored procedures, etc.). Win2000 server, MSSQL 2000 Enterprise. No immediate plans for replacing or updating the system are being considered :(

The core process is a program that executes transactions. Specifically, it executes a stored procedure with various parameters; let's call it sp_ProcessTrans. The program executes the stored procedure at asynchronous intervals. By itself, things work fine. But there are 30 instances of this program on remotely located workstations, all of them asynchronously executing sp_ProcessTrans and then retrieving data from the SQL server (execution is pretty regular, ranging from 0 to 60 times a minute, depending on what items the program instance is responsible for).

Performance of the system has dropped considerably with 10 years of data growth: the reason is deadlocks, and specifically deadlock wait times. The deadlock is on the Employee table. I have discovered:

- In sp_ProcessTrans' execution, it selects from the Employee table 7 times (don't ask).
- The select is done on a field that is NOT the primary key.
- No index exists on this field. Thus a table scan is performed. 7 times. Per transaction.

So the reason for the deadlocks is clear. I created a non-unique ordered clustered index on the field (the field looks good: almost unique, NUM(7), very rarely changes). Immediate improvement in the test environment.

The problem is that I cannot simulate the deadlocks in a test environment (I'd need 30 workstations, and I'd need to simulate 'realistic' activity on those stations, so virtualization is out). I need to know if I must schedule downtime.

Creating an index shouldn't be a risky operation for MSSQL, but is there any danger (data corruption in transactions, select statements, extra wait time, etc.) in creating this field index on the production database while transactions are still taking place? (I can, however, select a time when transactions are fairly quiet across the 30 stations.) Are there any hidden dangers I'm not seeing? (I'm not looking forward to restoring the DB if something goes wrong; restoring would take a lot of time with 10 years of data.)

Clustered index on frequently changing reference table of one or more foreign keys

- by Ian

My specific concern is the performance of a clustered index on a reference table that has many rapid inserts and deletes.

Table 1 "Collection": collection_pk int (among other fields)
Table 2 "Item": item_pk int (among other fields)
Reference table "Collection_Items": collection_pk int, item_pk int (combined primary key)

Because the primary key is composed of both pks, a clustered index is created and the data is physically ordered in the table according to the combined keys. Many users create and delete collections and add and remove items from those collections very frequently, which affects the "Collection_Items" table and its clustered index.

Question: since the "Collection_Items" table is so dynamic, wouldn't there be a big performance hit from constantly re-sorting the table rows because of the clustered index? If yes, what should I do to minimize it?

