Search Results

Search found 1124 results on 45 pages for 'indexing'.

Page 40/45 | < Previous Page | 36 37 38 39 40 41 42 43 44 45  | Next Page >

  • MS Source Server - source stream is apparently not there when viewing with srctool

    - by Tim Peel
    Hi, I have been playing around with the MS Source Server stuff in the MS Debugging Tools install. At present, I am running my code/pdbs through the Subversion indexing command, which is now running as expected. It creates the stream for a given pdb file and writes it to the pdb file. However when I use that DLL and associated pdb in visual studio 2008, it says the source code cannot be retrieved. If I check the pdb against srctool is says none of the source files contained are indexed, which is very strange as the process prior ran fine. If I check the stream that was generated from the svnindex.cmd run for the pdb, srctool says all source files are indexed. Why would there be a difference? I have opened the pdb file in a text editor and I can see the original references to the source files on my machine (also under the srcsrv header name) and the new "injected" source server links to my subversion repository). Should both references still exist in the pdb? I would have expected one to be removed? Either way, visual studio 2008 will not pick up my source references so I am a bit lost as to what to try next. As far as I can tell, I have done everything I should have. Anyone have similar experiences? Many thanks.

    Read the article

  • What are the reasons to store documents into DBMS when using Alfresco CMS

    - by Julia
    Hello guys! I have interview for an internship with company that wants to implement document management system and they are considering on the first place open source solutions, their top choice being Alfresco, but decision is still not final, part of my work there would be to investigate is Alfresco the best solution. What I have seen from project description, is that they would implement Alfresco with MySQL database, and not to use DBMS just for document metadata and indexing, but they actually want to store documents inside. By company profile, type of documents would be mostly PDF and .doc, not images. I have researched a bit, and I have read all the topics here related to storing files into the database, not to duplicate a question. So from what I understand, storing BLOBS is generally not recomendable, and by the profile of the company and their legal obligations with archiving, I see they will have to store larger amount of docs. I would like to be ready as much as I can for the interview and that is why I would like your opinion on these questions: What will be your reasons for deciding to store documents into the DBMS, (especially having in mind that you are installing Alfresco, which stores files in the FS)??? Do you have any experiences with storing documents into the MySQL database specifically??? All the help is very much appreciated, I am really excited about interview and really want this internship, so this is one of things i really want to understand before!! Thank you!!!!

    Read the article

  • How can I test this SQL Server performance Utility?

    - by Martin Smith
    As part of my MSc I need to do a three month project later this year. I have decided to do something which will likely be useful for me in the workplace and spend the time getting to understand SQL Server internals. The deliverable for this project will be a performance advisor looking at a variety of different rules. Some static such as finding redundant indexes, some more dynamic such as using XEvents to find outlying invocations of stored procedure execution times when certain parameters are passed. I am struggling to come up with a good way of testing this though. I can obviously design a "bad" database and a synthetic workload that my tool will pick up issues on but I also need to demonstrate that it has real world utility. Looking at the self tuning database literature it is common to use TPC benchmarks but I've had a look at the TPCC site and it looks very time consuming to implement and not that good a fit to my project's testing needs in any event (I would still be able to "rig" it by the decisions I made on indexing or physical architecture). Plan A would be to find willing beta tester(s) but in the event that isn't possible I will need a fallback plan. The best idea I have come up with so far is to use the various MS sample applications as examples of real world applications. e.g. http://msftdpprodsamples.codeplex.com/ http://www.asp.net/community/projects/ Does anyone have any better suggestions?

    Read the article

  • using indexer to retrieve Linq to SQL object from datastore

    - by fearofawhackplanet
    class UserDatastore : IUserDatastore { ... public IUser this[Guid userId] { get { User user = (from u in _dataContext.Users where u.Id == userId select u).FirstOrDefault(); return user; } } ... } One of the developers in our team is arguing that an indexer in the above situation is not appropriate and that a GetUser(Guid id) method should be prefered. The arguments being that: 1) We aren't indexing into an in-memory collection, the indexer is basically performing a hidden SQL query 2) Using a Guid in an indexer is bad (FxCop flagged this also) 3) Returning null from an indexer isn't normal behaviour 4) An API user generally wouldn't expect any of this behaviour I agree to an extent with (most of) these points. But I'm also inclined to argue that one of the characteristics of Linq is to abstract the database access to make it appear that you're simply working with a bunch of collections, even though the lazy evaluation paradigm means those collections aren't evaluated until you run a query over them. It doesn't seem inconsistent to me to access the datastore in the same manner as if it was a concrete in-memory collection here. Also bearing in mind this is an inherited codebase which uses this pattern extensively and consistently, is it worth the refactoring? I accept that it might have been better to use a Get method from the start, but I'm not yet convinced that it's completely incorrect to be using an indexer. I'd be interested to hear all opinions, thanks.

    Read the article

  • What are the reasons to store documents into DBMS when using Alfresco DMS

    - by Julia
    Hello guys! I have interview for an internship with company that wants to implement document management system and they are considering on the first place open source solutions, their top choice being Alfresco, but decision is still not final, part of my work there would be to investigate is Alfresco the best solution. What I have seen from project description, is that they would implement Alfresco with MySQL database, and not to use DBMS just for document metadata and indexing, but they actually want to store documents inside. By company profile, type of documents would be mostly PDF and .doc, not images. I have researched a bit, and I have read all the topics here related to storing files into the database, not to duplicate a question. So from what I understand, storing BLOBS is generally not recomendable, and by the profile of the company and their legal obligations with archiving, I see they will have to store larger amount of docs. I would like to be ready as much as I can for the interview and that is why I would like your opinion on these questions: 1) What will be your reasons for deciding to store documents into the DBMS, (especially having in mind that you are installing Alfresco, which stores files in the FS)??? 2) Do you have any experiences with storing documents into the MySQL database specifically??? All the help is very much appreciated, I am really excited about interview and really want this internship, so this is one of things i really want to understand before!! Thank you!!!!

    Read the article

  • How to define index by several columns in hibernate entity?

    - by foobar
    Morning. I need to add indexing in hibernate entity. As I know it is possible to do using @Index annotation to specify index for separate column but I need an index for several fields of entity. I've googled and found jboss annotation @Table, that allows to do this (by specification). But (I don't know why) this functionality doesn't work. May be jboss version is lower than necessary, or maybe I don't understant how to use this annotation, but... complex index is not created. Why index may not be created? jboss version 4.2.3.GA Entity example: package somepackage; import org.hibernate.annotations.Index; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.GeneratedValue; import javax.persistence.Id; @Entity @org.hibernate.annotations.Table(appliesTo = House.TABLE_NAME, indexes = { @Index(name = "IDX_XDN_DFN", columnNames = {House.XDN, House.DFN} ) } ) public class House { public final static String TABLE_NAME = "house"; public final static String XDN = "xdn"; public final static String DFN = "dfn"; @Id @GeneratedValue private long Id; @Column(name = XDN) private long xdn; @Column(name = DFN) private long dfn; @Column private String address; public long getId() { return Id; } public void setId(long id) { this.Id = id; } public long getXdn() { return xdn; } public void setXdn(long xdn) { this.xdn = xdn; } public long getDfn() { return dfn; } public void setDfn(long dfn) { this.dfn = dfn; } public String getAddress() { return address; } public void setAddress(String address) { this.address = address; } } When jboss/hibernate tries to create table "house" it throws following exception: Reason: org.hibernate.AnnotationException: @org.hibernate.annotations.Table references an unknown table: house

    Read the article

  • How to identify HTML elements (in javascript) in a browser agnostic fashion

    - by gatapia
    Hi All, I need to identify all html elements on a page in a browser agnostic fashion. What I am basically doing is using mouse events to record clicks on the page. I need to record which element was clicked. So I add a mouse down listener to the document.body element. And on mouse down I get the element under the mouse. Lets say its a div. I then use the index of that div inside the document.getElementsByTagName('*') nodelist and the nodeName ('div') to identify that div. A sample element id would be div45 which means its a div and its the 45th element in the '*' nodelist. This is all fine and good until I use IE which gives me different indexes. So div45 in FireFox may be div47 in IE. Anyone have any ideas? I just need the id of all elements on the page to be the same in any browser, perhaps indexing is not good enough but I really don't have any more ideas. Thanks Guido

    Read the article

  • Radix Sort in Python [on hold]

    - by Steven Ramsey
    I could use some help. How would you write a program in python that implements a radix sort? Here is some info: A radix sort for base 10 integers is a based on sorting punch cards, but it turns out the sort is very ecient. The sort utilizes a main bin and 10 digit bins. Each bin acts like a queue and maintains its values in the order they arrive. The algorithm begins by placing each number in the main bin. Then it considers the ones digit for each value. The rst value is removed and placed in the digit bin corresponding to the ones digit. For example, 534 is placed in digit bin 4 and 662 is placed in the digit bin 2. Once all the values in the main bin are placed in the corresponding digit bin for ones, the values are collected from bin 0 to bin 9 (in that order) and placed back in the main bin. The process continues with the tens digit, the hundreds, and so on. After the last digit is processed, the main bin contains the values in order. Use randint, found in random, to create random integers from 1 to 100000. Use a list comphrension to create a list of varying sizes (10, 100, 1000, 10000, etc.). To use indexing to access the digits rst convert the integer to a string. For this sort to work, all numbers must have the same number of digits. To zero pad integers with leading zeros, use the string method str.zfill(). Once main bin is sorted, convert the strings back to integers. I'm not sure how to start this, Any help is appreciated. Thank you.

    Read the article

  • How to access C arrays from assembler for Windows x64?

    - by 0xdword32
    I've written an assembler function to speed up a few things for image processing (images are created with CreateDIBSection). For Win32 the assembler code works without problems, but for Win64 I get a crash as soon as I try to access my array data. I put the relevant info in a struct and my assembler function gets a pointer to this struct. The struct pointer is put into ebx/rbx and with indexing I read the data from the struct. Any idea what I am doing wrong? I use nasm together with Visual Studio 2008 and for Win64 I set "default rel". C++ code: struct myData { tUInt32 ulParam1; void* pData; }; CallMyAssemblerFunction(&myData); Assembler Code: Win32: ... push ebp; mov ebp,esp mov ebx, [ebp + 8]; pointer to our struct mov eax, [ebx]; ulParam1 mov esi, [ebx + 4]; pData, 4 byte pointer movd xmm0, [esi]; ... Win64: ... mov rbx, rcx; pointer to our struct mov eax, [rbx]; ulParam1 mov rsi, [rbx + 4]; pData, 8 byte pointer movd xmm0, [rsi]; CRASH! ...

    Read the article

  • SQL Server 2005, wide indexes, computed columns, and sargable queries

    - by luksan
    In my database, assume we have a table defined as follows: CREATE TABLE [Chemical]( [ChemicalId] int NOT NULL IDENTITY(1,1) PRIMARY KEY, [Name] nvarchar(max) NOT NULL, [Description] nvarchar(max) NULL ) The value for Name can be very large, so we must use nvarchar(max). Unfortunately, we want to create an index on this column, but nvarchar(max) is not supported inside an index. So we create the following computed column and associated index based upon it: ALTER TABLE [Chemical] ADD [Name_Indexable] AS LEFT([Name], 20) CREATE INDEX [IX_Name] ON [Chemical]([Name_Indexable]) INCLUDE([Name]) The index will not be unique but we can enforce uniqueness via a trigger. If we perform the following query, the execution plan results in a index scan, which is not what we want: SELECT [ChemicalId], [Name], [Description] FROM [Chemical] WHERE [Name]='[1,1''-Bicyclohexyl]-2-carboxylic acid, 4'',5-dihydroxy-2'',3-dimethyl-5'',6-bis[(1-oxo-2-propen-1-yl)oxy]-, methyl ester' However, if we modify the query to make it "sargable," then the execution plan results in an index seek, which is what we want: SELECT [ChemicalId], [Name], [Description] FROM [Chemical] WHERE [Indexable_Name]='[1,1''-Bicyclohexyl]-' AND [Name]='[1,1''-Bicyclohexyl]-2-carboxylic acid, 4'',5-dihydroxy-2'',3-dimethyl-5'',6-bis[(1-oxo-2-propen-1-yl)oxy]-, methyl ester' Is this a good solution if we control the format of all queries executed against the database via our middle tier? Is there a better way? Is this a major kludge? Should we be using full-text indexing?

    Read the article

  • database design suggestion needed

    - by JMSA
    I need to design a table for daily sales of pharmaceutical products. There are hundreds of types of products available {Name, code}. Thousands of sales-persons are employed to sell those products{name, code}. They collect products from different depots{name, code}. They work in different Areas - Zones - Markets - Outlets, etc. {All have names and codes} Each product has various types of prices {Production Price, Trade Price, Business Price, Discount Price, etc.}. And, sales-persons are free to choose from those combination to estimate the sales price. The problem is, daily sales requires huge amount of data-entry. Within couple of years there may be gigabytes of data (if not terabytes). If I need to show daily, weekly, monthly, quarterly and yearly sales reports there will be various types of sql queries I shall need. This is my initial design: Product {ID, Code, Name, IsActive} ProductXYZPriceHistory {ID, ProductID, Date, EffectDate, Price, IsCurrent} SalesPerson {ID, Code, Name, JoinDate, and so on..., IsActive} SalesPersonSalesAraeaHistory {ID, SalesPersonID, SalesAreaID, IsCurrent} Depot {ID, Code, Name, IsActive} Outlet {ID, Code, Name, AreaID, IsActive} AreaHierarchy {ID, Code, Name, PrentID, AreaLevel, IsActive} DailySales {ID, ProductID, SalesPersonID, OutletID, Date, PriceID, SalesPrice, Discount, etc...} Now, apart from indexing, how can I normalize my DailySales table to have a fine grained design that I shall not need to change for years to come? Please show me a sample design of only the DailySales data-entry table (from which all types of reports would be queried) on the basis of above information. I don't need a detailed design advice. I just need an advice regarding only the DailySales table. Is there any way to break this particular table to achieve granularity?

    Read the article

  • Mysql Server Optimization

    - by Ish Kumar
    Hi Geeks, We are having serious MySQL(InnoDB) performance issues at a moment when we do: (10-20) insertions on TABLE1 (10-20) updates on TABLE2 Note: Both above operations happens within fraction of a second. And this occurs every few (10-15) minutes. And all online users (approx 400-600) doing read operation on join of TABLE1 & TABLE2 every 1 second. Here is our mysql configuration info: http://docs.google.com/View?id=dfrswh7c_117fmgcmb44 Issues: Lot queries wait and expire later (saw it from phpmyadmin / processes). My poor MySQL server crashes sometimes Questions Q1: Any suggestions to optimize at MySQL level? Q2: I thinking to use persistent connections at application level, is it right? Info Added Later: Database Engine: InnoDB TABLE1 : 400,000 rows (inserting 8,000 daily) & TABLE2: 8,000 rows 1 second query: SELECT b.id, b.user_id, b.description, b.debit, b.created, b.price, u.username, u.email, u.mobile FROM TABLE1 b, TABLE2 u WHERE b.credit = 0 AND b.user_id = u.id AND b.auction_id = "12345" ORDER BY b.id DESC LIMIT 10; // there are few more but they are not so critical. Indexing is good, we are using them wisely. In above query all id's are indexed And TABLE1 has frequent insertions and TABLE2 has frequent updates.

    Read the article

  • "Exclusive" DirectDraw palette isn't actually exclusive

    - by CyberShadow
    We're maintaining an old video game that uses a full-screen 256-color graphics mode with DirectDraw. The problem is, some applications running in the background sometimes try to change the system palette while the game is running, which results in corrupted graphics. We can (sometimes) detect when this happens by processing the WM_PALETTECHANGED message. A few update versions ago we added logging (just log the window title/class/process name), which helped users identify offending applications and close them. MSN Live Messenger was a common culprit. The problem got worse when we found out that Windows Vista (and 7) does it "by itself". The WM_PALETTECHANGED parameters point towards CSRSS and the desktop window. In Vista, a workaround that often worked was to open any folder (Computer, Documents, etc.) and leave it open while running the game. Sounds ridiculous, but it worked - in most cases. In Windows 7, not even this workaround worked any more. Users found that stopping some services (Windows Update and the indexing service) also resolved the problem on some configurations. Some time ago I just started trying random things in hope of finding a solution. I found that setting the GDI palette (using Create/SelectPalette) before setting the DirectDraw palette (using IDirectDrawPalette::SetEntries) would restore the palette after it became corrupted (WM_PALETTECHANGED handler). SetSystemPaletteUse and calling SetPalette on the primary surface helped some more. However, there is still perceivable flickering when an application tries to steal the palette, which is especially prominent during fades. Question: is there a way to get a "real" exclusive palette, which completely disallows other applications changing the Windows palette as long as our game retains focus?

    Read the article

  • Rails - handling global site settings

    - by egarcia
    I'm developing a new rails application which is supposed to be installed several times in order to implement several sites. There are some things, like the "Site Title" or the "Default Number of Items per Page" that clearly belong to a "global settings" table / config file. I've made a list of the things I think I'll need: ActiveRecord model that is capable of: Storing different kinds of data. I suppose this would be accomplished encoding the values on a string on the db, probably with a "type" field. Indexing settings by name Validations based on a "type" attribute (i.e. don't accept invalid dates on "date" settings) Validations based on a allows_nil property. A controller that allows me to change settings via views. I'm pretty sure I could implement this myself, but I'm not willing to reinvent the wheel. I've done some searching, but I could only find rails-settings, which doesn't really serve me: I need a proper model & controller so I can use declarative-authorization, and it does not provide any controller or view facilities. Is there a gem or plugin out there that implements what I want, or any library I should look at? Thanks a lot.

    Read the article

  • Pronunciation of programming structures (particularly in c#)

    - by Andrzej Nosal
    As a non-English speaking person I often have problems pronouncing certain programming structures and abbreviations. I've been watching some video tutorials and listening to podcasts as well, though I couldn't catch them all. My question is what is the common or correct pronunciation of the following code snippets? Generics, like IEnumerable<int> or in a method void Swap<T>(T lhs, T rhs) Collections indexing and indexer access e.g. garage[i], rectangular arrays myArray[2,1] or jagged[1][2][3] Lambda operator =>, e.g. in a where extension method .Where(animal => animal.Color == Color.Brown) or in an anonymous method () => { return false;} Inheritance class Derived : Base (extends?) class SomeClass : IDisposable (implements?) Arithemtic operators += -= *= /= %= ! Are += and -= pronounced the same for events? Collections initializers new int[] { 4, 5, 8, 9, 12, 13, 16, 17 }; Casting MyEnum foo = (MyEnum)(int)yourFloat; (as?) Nullables DateTime? dt = new DateTime?(); I tagged the question with C# as some of them are specific to C# only.

    Read the article

  • Selecting a good SQL Server 2008 spatial index with large polygons

    - by andynormancx
    I'm having some fun trying to pick a decent SQL Server 2008 spatial index setup for a data set I am dealing with. The dataset is polygons, representing contours over the whole globe. There are 106,000 rows in the table, the polygons are stored in a geometry field. The issue I have is that many of the polygons cover a large portion of the globe. This seems to make it very hard to get a spatial index that will eliminate many rows in the primary filter. For example, look at the following query: SELECT "ID","CODE","geom".STAsBinary() as "geom" FROM "dbo"."ContA" WHERE "geom".Filter( geometry::STGeomFromText('POLYGON ((-142.03193662573682 59.53396984952896, -142.03193662573682 59.88928136451884, -141.32743833481925 59.88928136451884, -141.32743833481925 59.53396984952896, -142.03193662573682 59.53396984952896))', 4326) ) = 1 This is querying an area which intersects with only two of the polygons in the table. No matter what combination of spatial index settings I chose, that Filter() always returns around 60,000 rows. Replacing Filter() with STIntersects() of course returns just the two polygons I want, but of course takes much longer (Filter() is 6 seconds, STIntersects() is 12 seconds). Can anyone give me any hints on whether there is a spatial index setup that is likely to improve on 60,000 rows or is my dataset just not a good match for SQL Server's spatial indexing ?

    Read the article

  • Lucene (.NET) Document stucture and performance suggestions.

    - by Josh Handel
    Hello, I am indexing about 100M documents that consist of a few string identifiers and a hundred or so numaric terms.. I won't be doing range queries, so I haven't dugg too deep into Numaric Field but I'm not thinking its the right choose here. My problem is that the query performance degrades quickly when I start adding OR criteria to my query.. All my queries are on specific numaric terms.. So a document looks like StringField:[someString] and N DataField:[someNumber].. I then query it with something like DataField:((+1 +(2 3)) (+75 +(3 5 52)) (+99 +88 +(102 155 199))). Currently these queries take about 7 to 16 seconds to run on my laptop.. I would like to make sure thats really the best they can do.. I am open to suggestions on field structure and query structure :-). Thanks Josh PS: I have already read over all the other lucene performance discussions on here, and on the Lucene wiki and at lucid imiagination... I'm a bit further down the rabbit hole then that...

    Read the article

  • Using the Search API with Sharepoint Foundation 2010 - 0 results

    - by MB
    I am a sharepoint newbee and am having trouble getting any search results to return using the search API in Sharepoint 2010 Foundation. Here are the steps I have taken so far. The Service Sharepoint Foundation Search v4 is running and logged in as Local Service Under Team Site - Site Settings - Search and Offline Availability, Indexing Site Content is enabled. Running the PowerShell script Get-SPSearchServiceInstance returns TypeName : SharePoint Foundation Search Description : Search index file on the search server Id : 91e01ce1-016e-44e0-a938-035d37613b70 Server : SPServer Name=V-SP2010 Service : SPSearchService Name=SPSearch4 IndexLocation : C:\Program Files\Common Files\Microsoft Shared\Web Server Exten sions\14\Data\Applications ProxyType : Default Status : Online When I do a search using the search textbox on the team site I get a results as I would expect. Now, when I try to duplicate the search results using the Search API I either receive an error or 0 results. Here is some sample code: using Microsoft.SharePoint.Search.Query; using (var site = new SPSite(_sharepointUrl, token)) { // FullTextSqlQuery fullTextSqlQuery = new FullTextSqlQuery(site) { QueryText = String.Format("SELECT Title, SiteName, Path FROM Scope() WHERE \"scope\"='All Sites' AND CONTAINS('\"{0}\"')", searchPhrase), //QueryText = String.Format("SELECT Title, SiteName, Path FROM Scope()", searchPhrase), TrimDuplicates = true, StartRow = 0, RowLimit = 200, ResultTypes = ResultType.RelevantResults //IgnoreAllNoiseQuery = false }; ResultTableCollection resultTableCollection = fullTextSqlQuery.Execute(); ResultTable result = resultTableCollection[ResultType.RelevantResults]; DataTable tbl = new DataTable(); tbl.Load(result, LoadOption.OverwriteChanges); } When the scope is set to All Sites I retrieve an error about the search scope not being available. Other search just return 0 results. Any ideas about what I am doing wrong?

    Read the article

  • Mysql Database Question about Large Columns

    - by murat
    Hi, I have a table that has 100.000 rows, and soon it will be doubled. The size of the database is currently 5 gb and most of them goes to one particular column, which is a text column for PDF files. We expect to have 20-30 GB or maybe 50 gb database after couple of month and this system will be used frequently. I have couple of questions regarding with this setup 1-) We are using innodb on every table, including users table etc. Is it better to use myisam on this table, where we store text version of the PDF files? (from memory usage /performance perspective) 2-) We use Sphinx for searching, however the data must be retrieved for highlighting. Highlighting is done via sphinx API but still we need to retrieve 10 rows in order to send it to Sphinx again. This 10 rows may allocate 50 mb memory, which is quite large. So I am planning to split these PDF files into chunks of 5 pages in the database, so these 100.000 rows will be around 3-4 million rows and couple of month later, instead of having 300.000-350.000 rows, we'll have 10 million rows to store text version of these PDF files. However, we will retrieve less pages, so again instead of retrieving 400 pages to send Sphinx for highlighting, we can retrieve 5 pages and it will have a big impact on the performance. Currently, when we search a term and retrieve PDF files that have more than 100 pages, the execution time is 0.3-0.35 seconds, however if we retrieve PDF files that have less than 5 pages, the execution time reduces to 0.06 seconds, and it also uses less memory. Do you think, this is a good trade-off? We will have million of rows instead of having 100k-200k rows but it will save memory and improve the performance. Is it a good approach to solve this problem and do you have any ideas how to overcome this problem? The text version of the data is used only for indexing and highlighting. So, we are very flexible. Thanks,

    Read the article

  • How to structure an index for type ahead for extremely large dataset using Lucene or similar?

    - by Pete
    I have a dataset of 200million+ records and am looking to build a dedicated backend to power a type ahead solution. Lucene is of interest given its popularity and license type, but I'm open to other open source suggestions as well. I am looking for advice, tales from the trenches, or even better direct instruction on what I will need as far as amount of hardware and structure of software. Requirements: Must have: The ability to do starts with substring matching (I type in 'st' and it should match 'Stephen') The ability to return results very quickly, I'd say 500ms is an upper bound. Nice to have: The ability to feed relevance information into the indexing process, so that, for example, more popular terms would be returned ahead of others and not just alphabetical, aka Google style. In-word substring matching, so for example ('st' would match 'bestseller') Note: This index will purely be used for type ahead, and does not need to serve standard search queries. I am not worried about getting advice on how to set up the front end or AJAX, as long as the index can be queried as a service or directly via Java code. Up votes for any useful information that allows me to get closer to an enterprise level type ahead solution

    Read the article

  • File Storage for Web Applications: Filesystem vs DB vs NoSQL engines

    - by El Yobo
    I have a web application that stores a lot of user generated files. Currently these are all stored on the server filesystem, which has several downsides for me. When we move "folders" (as defined by our application) we also have to move the files on disk (although this is more due to strange design decisions on the part of the original developers than a requirement of storing things on the filesystem). It's hard to write tests for file system actions; I have a mock filesystem class that logs actions like move, delete etc, without performing them, which more or less does the job, but I don't have 100% confidence in the tests. I will be adding some other jobs which need to access the files from other service to perform additional tasks (e.g. indexing in Solr, generating thumbnails, movie format conversion), so I need to get at the files remotely. Doing this over network shares seems dodgy... Dealing with permissions on the filesystem as sometimes given us problems in the past, although now that we've moved to a pure Linux environment this should be less of an issue. What are the downsides of storing files as BLOBs in MySQL? I guess that it would massively increase the database size and reduce the effectiveness of caches, but are there other problems? Do the same problems exist with NoSQL systems like Cassandra? Does anyone have any other suggestions that might be appropriate?

    Read the article

  • Need a tool to search large structure text documents for words, phrases and related phrases

    - by pitosalas
    I have to keep up with structured documents containing things such as requests for proposals, government program reports, threat models and all kinds of things like that. They are in techno-legalese as I would call them: highly structured, with section numbering and 3, 4 and 5 levels of nesting. All in English I need a more efficient way to locate those paragraphs of nuggets that matter to me. So what I’d like is kind of a local document index/repository, that would allow me to have some standing queries and easily locate sections in documents that talk about my queries. Here’s an example: I’d like to load in 10 large PDF files, each of say 100 pages. Each PDF contains English text, formatted very nicely into paragraphs and sections. I’d like to specify that I am interested in “blogging platforms”, “weaknesses in Ruby”, “localization and internationalization” Ideally then look at a list that showed the section of text, the name of the document, and other information that seemed to be related to and/or include the words and phrases I specified. I am sure something like this exists. I would call it something like document indexing, document comprehension or structured searching.

    Read the article

  • Django sitemap intermittent www

    - by Jen Z
    The automatic sitemap for my Django site fluctuates between including the www on urls and leaving it out (I'm aiming to have it in all the time). This has ramifications in google not indexing my pages properly so I'm trying to narrow down what would be causing this issue. I have set PREPEND_WWW = True and my site record in the sites framework is set to include the www e.g. it's set to www.example.com as opposed to example.com. I'm using memcached but pages should expire from the cache after 48 hours so I wouldn't have thought that would be causing the issue? You can see the problem in effect at http://www.livingspaceltd.co.uk/sitemap.xml (refresh the page a few times). My sitemaps setup is fairly prosaic so I'm doubtful that that is the issue, but in case it's something obvious I'm missing here's the code: ***urls.py*** sitemaps = { 'subpages': Subpages_Sitemap, 'standalone_pages': Standalone_Sitemap, 'categories': Categories_Sitemap, } urlpatterns = patterns('', (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}), ... ***sitemaps.py*** # -*- coding: utf-8 -*- from django_ls.livingspace.models import Page, Category, Standalone_Page, Subpage from django.contrib.sitemaps import Sitemap class Subpages_Sitemap(Sitemap): changefreq = "monthly" priority = 0.4 def items(self): return Subpage.objects.filter(restricted_to__isnull=True) class Standalone_Sitemap(Sitemap): changefreq = "weekly" priority = 1 def items(self): return Standalone_Page.objects.all() class Categories_Sitemap(Sitemap): changefreq = "weekly" priority = 0.7 def items(self): return Category.objects.all()

    Read the article

  • Improve speed of a JOIN in MySQL

    - by ran2
    Dear all, I know there a similar threads around, but this is really the first time I realize that query speed might affect me - so it´s not that easy for me to really make the transfer from other folks problems. That being said I have using the following query successfully with smaller data, but if I use it on what are mildly large tables (about 120,000 records). I am waiting for hours. INSERT INTO anothertable (id,someint1,someint1,somevarchar1,somevarchar1) SELECT DISTINCT md.id,md.someint1,md.someint1,md.somevarchar1,pd.somevarchar1 from table1 AS md JOIN table2 AS pd ON (md.id = pd.id); Tables 1 and 2 contain about 120,000 records. The query has been running for almost 2 hours right now. Is this normal? Do I just have to wait. I really have no idea, but I am pretty sure that one could do it better since it´s my very first try. I read about indexing, but dont know yet what to index in my case? Thanks for any suggestions - feel free to point my to the very beginners guides ! best matt

    Read the article

  • Is there a limit for the number of files in a directory on an SD card?

    - by jamesh
    I have a project written for Android devices. It generates a large number of files, each day. These are all text files and images. The app uses a database to reference these files. The app is supposed to clear up these files after a little use (perhaps after a few days), but this process may or may not be working. This is not the subject of this question. Due to a historic accident, the organization of the files are somewhat naive: everything is in the same directory; a .hidden directory which contains a zero byte .nomedia file to prevent the MediaScanner indexing it. Today, I am seeing an error reported: java.io.IOException: Cannot create: /sdcard/.hidden/file-4200.html at java.io.File.createNewFile(File.java:1263) Regarding the sdcard, I see it has plenty of storage left, but counting $ cd /Volumes/NO_NAME/.hidden $ ls | wc -w 9058 Deleting a number of files seems to have allowed the file creation for today to proceed. Regrettably, I did not try touching a new file to try and reproduce the error on a commandline; I also deleted several hundred files rather than a handful. However, my question is: are there hard limits on filesize or number of files in a directory? am I even on the right track here? Nota Bene: The SD card is as-is - i.e. I haven't formatted it, so I would guess it would be a FAT-* format. The FAT-32 format has hard limits of filesize of 2GB (well above the filesizes I am dealing with) and a limit of number of files in the root directory. I am definitely not writing files in the root directory.

    Read the article

< Previous Page | 36 37 38 39 40 41 42 43 44 45  | Next Page >