Search Results

Search found 1631 results on 66 pages for 'optimize'.

  • Which technology is best suited to store and query a huge readonly graph?

    - by asmaier
    I have a huge directed graph: 1.6 million nodes and 30 million edges. I want users to be able to find all the shortest connections (following both incoming and outgoing edges) between two nodes of the graph, via a web interface. At the moment the graph is stored in a PostgreSQL database, but that solution is neither efficient nor elegant; I basically need to store every edge twice (see my question "PostgreSQL: How to optimize my database for storing and querying a huge graph"). It was suggested that I use a graph database like Neo4j or AllegroGraph. However, the free version of AllegroGraph is limited to 50 million nodes and exposes a very high-level API (RDF) that seems too powerful and complex for my problem, while Neo4j only offers a very low-level API (and its Python interface is not mature yet). Both seem better suited to problems where nodes and edges are frequently added to or removed from the graph; for a simple search over a static graph they look like overkill. One idea I had was to "misuse" a search engine like Lucene for the job, since I'm basically only searching for connections in a graph. Another idea is to have a server process that holds the whole graph (500 MB to 1 GB) in memory; clients could then query that process and traverse the graph very quickly. Is there an easy way to write such a server (preferably in Python) using some existing framework? Which technology would you use to store and query such a huge read-only graph?
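
    A minimal sketch of the in-memory-server idea, assuming the graph fits in RAM as a plain adjacency dict (the edge-list file format and all names here are made up for illustration): load the edges once, then answer each query with a breadth-first search. Indexing each edge in both directions covers the "incoming and outgoing" requirement.

        from collections import deque, defaultdict

        def load_graph(path):
            """Build an adjacency dict from an edge list with one 'src dst' pair per line."""
            adj = defaultdict(list)
            with open(path) as f:
                for line in f:
                    src, dst = line.split()
                    adj[src].append(dst)   # outgoing edge
                    adj[dst].append(src)   # make the incoming edge followable too
            return adj

        def shortest_path(adj, start, goal):
            """Plain BFS; returns one shortest node sequence, or None if unreachable."""
            if start == goal:
                return [start]
            prev = {start: None}
            queue = deque([start])
            while queue:
                node = queue.popleft()
                for nxt in adj[node]:
                    if nxt not in prev:
                        prev[nxt] = node
                        if nxt == goal:
                            path = [nxt]
                            while prev[path[-1]] is not None:
                                path.append(prev[path[-1]])
                            return path[::-1]
                        queue.append(nxt)
            return None

    Wrapping this in a small HTTP service (with the standard library's wsgiref, or any Python micro-framework) would give the web interface; returning all shortest paths rather than one needs a slightly extended BFS that records every predecessor found at the minimal distance.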

  • Query optimization (OR based)

    - by john194
    I have googled but I can't find answers to these questions; your advice is appreciated. Environment: CentOS on a VPS with 512 MB RAM, nginx, PHP5 (FastCGI), MySQL 5 (MyISAM, not InnoDB). I need to optimize an app created by an ex-employee. The app works, but it's slow.

    The table is t1(id [bigint(20)], c1 [mediumtext], c2 [mediumtext], c3 [mediumtext], c4 [mediumtext]), where id is some random big number and is the primary key. The mediumtext columns look like this:

        c1 = "|box-002877|"
        c2 = "|ct-2348|rd-11124854|hw-3949|wd-8872|hw-119037736|...etc.. "
        c3 = "|fg-2448|wd-11172|hw-1656|...etc.. "
        c4 = "|hg-2448|qd-16667|...etc."

    (some columns contain a lot of data, around 900 KiB; the database is around 300 MiB). Yes, mediumtext "is bad" and (20) is too big... but I didn't create this. A given code can appear in any of those 4 mediumtext columns. He needs all the columns of the row containing $code, so he wrote this:

        function f1($code) {
            SELECT * FROM t1
            WHERE c1 LIKE '%$code%'
               OR c2 LIKE '%$code%'
               OR c3 LIKE '%$code%'
               OR c4 LIKE '%$code%';

    Questions:
    Q1. If $code is found in c1, does MySQL stop checking and return the row (id+c1+c2+c3+c4), or does it continue (wasting time) checking c2, c3 and c4?
    Q2. MySQL is working with this table on disk (not in RAM) because of the mediumtext columns, right? Is this the primary cause of the slowness?
    Q3. Can that query be cached by MySQL (with a big query_cache_size=128M value in my.cnf), or is it not cacheable because of the mediumtexts, or because of the "OR LIKE"?
    Q4. Do you recommend rewriting this with MySQL's INSTR() / LOCATE() / MATCH..AGAINST [FULLTEXT]?

  • How to implement Excel Solver functionality in C#?

    - by Vic
    Hi, I have a C# application in which I need to do some optimization calculations, like the Excel Solver add-in does. One option is certainly to write my own solver implementation, but I'm short of time, so I'm looking for existing libraries that can help. I've been trying the Microsoft Solver Foundation, which seems pretty neat, but it doesn't seem to handle the kind of calculations I need (details below). So my question is whether you know of another library I can use for this purpose, any tutorial that would help me write my own solver, or any idea that gives me a lead. Thanks.
    Additional info: I have 7 variables, call them var1, var2, ..., var7. The constraints are: each variable must satisfy 0 <= varN <= 0.5, and the sum of all the variables must equal 1. The objective is to maximize the target formula, which in Excel looks like this: (MMULT(TRANSPOSE(L26:L32),M14:M20)) / (SQRT(MMULT(MMULT(TRANSPOSE(L26:L32),M4:S10),L26:L32))). The range L26:L32 holds the variables var1...var7; M14:M20 and M4:S10 are ranges of data I get from other sources, most likely decimal values. With Microsoft Solver Foundation I modeled pretty much everything, writing functions that implement the target formula, but solving the model always fails; I think it is because of the complexity of the operations. In any case, I just wanted to show this data so you have an idea of the kind of calculations I need to implement.
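
    For what it's worth, the target is the ratio of a weighted sum to the square root of a quadratic form (w·m / sqrt(w'·S·w)), so the shape of the problem is easy to prototype outside C#. A rough sketch in Python/SciPy, purely to illustrate the formulation (the vector m and matrix S below are random placeholders, not the asker's spreadsheet ranges):

        import numpy as np
        from scipy.optimize import minimize

        m = np.random.rand(7)     # stands in for M14:M20
        S = np.random.rand(7, 7)
        S = S @ S.T               # stands in for M4:S10, made symmetric positive semidefinite

        def neg_objective(w):
            # maximize (w . m) / sqrt(w' S w)  ==  minimize its negative
            return -(w @ m) / np.sqrt(w @ S @ w)

        result = minimize(
            neg_objective,
            x0=np.full(7, 1.0 / 7.0),   # equal weights: satisfies both constraints
            method="SLSQP",
            bounds=[(0.0, 0.5)] * 7,    # 0 <= var_n <= 0.5
            constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum of variables = 1
        )
        print(result.x, -result.fun)

    Any solver that supports simple bounds plus one linear equality constraint should cope with this; the objective is smooth as long as w'Sw stays positive.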

  • php: showing my country based on my IP, mysql optimized

    - by andufo
    I downloaded WIPmania's worldip table from http://www.wipmania.com/en/base/ -- the table has 3 fields and around 79k rows:

        startip   // example: 3363110912
        endip     // example: 3363112063
        country   // example: AR (Argentina)

    So, let's suppose I'm in Argentina and my IP address is 200.117.248.17.

    1) I use this function to convert my IP to a long:

        function ip_address_to_number($ip) {
            if (!$ip) {
                return false;
            } else {
                $ip = split('\.', $ip);
                return ($ip[0]*16777216 + $ip[1]*65536 + $ip[2]*256 + $ip[3]);
            }
        }

    2) I look up the country code by matching the converted IP:

        $sql = 'SELECT * FROM worldip WHERE '.ip_address_to_number($_SERVER['REMOTE_ADDR']).' BETWEEN startip AND endip';

    which is equivalent to:

        SELECT country FROM worldip WHERE 3363174417 BETWEEN startip AND endip

    (benchmark: Showing rows 0 - 0 (1 total, Query took 0.2109 sec))

    Now comes the real question. What if a bunch of other Argentinian visitors open the website with these IP addresses: 200.117.248.17, 200.117.233.10, 200.117.241.88, 200.117.159.24? Since I'm caching all the SQL queries, instead of running a separate query for each IP, would it be better (and correct) to match only the first two octets of the IP by modifying the function like this?

        function ip_address_to_number($ip) {
            if (!$ip) {
                return false;
            } else {
                $ip = split('\.', $ip);
                return ($ip[0]*16777216 + $ip[1]*65536);
            }
        }

    (notice that the 3rd and 4th octets of the IP have been dropped). That way, instead of querying these 4 values: 3363174417, 3363170570, 3363172696, 3363151640, all I have to query is 3363110912 (which is 200.117.0.0 converted to long). Is this right? Any other ideas to optimize this process? Thanks!
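
    As a quick sanity check of the conversion arithmetic (not part of the asker's code), the same calculation in Python with the standard ipaddress module reproduces the numbers quoted above:

        import ipaddress

        print(int(ipaddress.IPv4Address("200.117.248.17")))   # 3363174417
        print(int(ipaddress.IPv4Address("200.117.0.0")))      # 3363110912, i.e. 200*2**24 + 117*2**16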

  • Javascript scope chain

    - by Geromey
    Hi, I am trying to optimize my program. I think I understand the basics of closures, but I am confused about the scope chain. I know that, in general, you want variables to sit low in the scope chain so they can be resolved quickly. Say I have the following object:

        var my_object = (function () {
            // private variables
            var a_private = 0;
            return {
                // public variables
                a_public: 1,
                // public methods
                some_public: function () {
                    debugger;
                    alert(this.a_public);
                    alert(a_private);
                }
            };
        })();

    My understanding is that inside the some_public method I can access the private variables faster than the public ones. Is this correct? My confusion comes from the scope level of this: when the code is stopped at the debugger statement, Firebug shows the public variable inside the this keyword, but this itself is not listed at any scope level. How fast is accessing this? Right now I am storing any this.* properties in local variables to avoid accessing them multiple times. Thanks very much!

  • Codechef practice question help needed - find trailing zeros in a factorial

    - by manugupt1
    I have been working on this for 24 hours now, trying to optimize it. The task is to find the number of trailing zeros in the factorial of a number, where the numbers can be up to 10,000,000 and there are up to 10 million test cases, all within about 8 seconds. The code is as follows:

        #include <iostream>
        using namespace std;

        int count5(int a) {
            int b = 0;
            for (int i = a; i > 0; i = i / 5) {
                if (i % 15625 == 0) { b = b + 6; i = i / 15625; }
                if (i % 3125 == 0)  { b = b + 5; i = i / 3125; }
                if (i % 625 == 0)   { b = b + 4; i = i / 625; }
                if (i % 125 == 0)   { b = b + 3; i = i / 125; }
                if (i % 25 == 0)    { b = b + 2; i = i / 25; }
                if (i % 5 == 0)     { b++; }
                else break;
            }
            return b;
        }

        int main() {
            int l;
            int n = 0;
            cin >> l;                      // number of test cases taken as input
            int *T = new int[l];
            for (int i = 0; i < l; i++)
                cin >> T[i];               // one number per test case
            for (int i = 0; i < l; i++) {
                n = 0;
                for (int j = 5; j <= T[i]; j = j + 5) {
                    n += count5(j);        // number of trailing zeros calculated
                }
                cout << n << endl;         // print the count for each test case
            }
            delete[] T;
        }

    Please help me by suggesting a new approach, or some modifications to this one.
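
    A sketch of the standard shortcut (in Python rather than C++, purely to show the idea): the number of trailing zeros of n! is the number of factors of 5 in 1..n, which Legendre's formula gives directly as n/5 + n/25 + n/125 + ..., so each test case becomes O(log n) with no inner loop over every multiple of 5.

        def trailing_zeros(n):
            """Count trailing zeros of n! by summing n // 5**k (Legendre's formula)."""
            count = 0
            power = 5
            while power <= n:
                count += n // power
                power *= 5
            return count

        assert trailing_zeros(25) == 6     # 25! ends in six zeros
        assert trailing_zeros(100) == 24

    Combined with fast input routines, 10 million queries of this form should fit comfortably in the time limit.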

  • What computer science topic am I trying to describe?

    - by ItzWarty
    I've been programming for around... 6-8 years, and I've begun to realize that I don't really know what really happens at the low-ish level when I do something like int i = j%348 The thing is, I know what j%348 does, it divides j by 348 and finds the remainder. What I don't know is HOW the computer does this. Similarly, I know that try { blah(); }catch(Exception e){ blah2(); } will invoke blah and if blah throws, it will invoke blah2... however, I have no idea how the computer does this instead of err... crashing or ending execution. And I figure that in order for me to get "better" at programming, I should probably know what my code is really doing. [This would probably also help me optimize and... err... not do stupid things] I figure that what I'm asking for is probably something huge taught in universities or something, but to be honest, if I could learn a little, I would be happy. The point of the question is: What topic/computer-science-course am I asking about? Because in all honesty, I don't know. Since I don't know what the topic is called, I'm unable to actually find a book or online resource to learn about the topic, so I'm sort of stuck. I'd be eternally thankful if someone helped me =/

  • Fast JSON serialization (and comparison with Pickle) for cluster computing in Python?

    - by user248237
    I have a set of data points, each described by a dictionary. The processing of each data point is independent, and I submit each one as a separate job to a cluster. Each data point has a unique name, and my cluster submission wrapper simply calls a script that takes a data point's name and a file describing all the data points. That script then retrieves the data point from the file and performs the computation. Since each job has to load the set of all points only to retrieve the single point to be run, I wanted to optimize this step by serializing the file describing the set of points into an easily retrievable format. I tried using jsonpickle, via the following method, to serialize a dictionary describing all the data points to a file:

        def json_serialize(obj, filename, use_jsonpickle=True):
            f = open(filename, 'w')
            if use_jsonpickle:
                import jsonpickle
                json_obj = jsonpickle.encode(obj)
                f.write(json_obj)
            else:
                simplejson.dump(obj, f, indent=1)
            f.close()

    The dictionary contains very simple objects (lists, strings, floats, etc.) and has a total of 54,000 keys. The JSON file is ~20 megabytes in size and takes ~20 seconds to load into memory, which seems very slow to me. I switched to pickle with the exact same object and found that it generates a file about 7.8 megabytes in size that can be loaded in ~1-2 seconds. That is a significant improvement, but it still seems like loading such a small object (fewer than 100,000 entries) should be faster. Aside from that, pickle is not human readable, which was the big advantage of JSON for me. Is there a way to use JSON to get similar or better speedups? If not, do you have other ideas for structuring this? (Is the right solution simply to "slice" the file describing each event into a separate file and pass that on to the script that runs a data point in a cluster job? It seems like that could lead to a proliferation of files.) Thanks.
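
    One knob worth checking before abandoning either format (a generic sketch, not specific to the asker's data): the pickle module defaults to a slow, text-based protocol unless you ask for a binary one, and on Python 2 the C implementation (cPickle) is much faster than the pure-Python module.

        try:
            import cPickle as pickle   # Python 2: C implementation
        except ImportError:
            import pickle              # Python 3: the C accelerator is used automatically

        def save(obj, filename):
            with open(filename, 'wb') as f:
                # binary protocols are both smaller and faster to read back than protocol 0
                pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

        def load(filename):
            with open(filename, 'rb') as f:
                return pickle.load(f)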

  • Any suggestions for good automated web load testing tool?

    - by fmunkert
    What are some good automated tools for load testing (stress testing) web applications, that do not use record and replay of HTTP network packets? I am aware that there are numerous load testing tools on the market that record and replay HTTP network packets. But these are unsuitable for my purpose, because of this: The HTTP packet format changes very often in our application (e.g. when we optimize an AJAX call). We do not want to adapt all test scripts just because there is a slight change in HTTP packet format. Our test team shall not need to know any internals about our application to write their test scripts. A tool that replays HTTP packets, however, requires the team to know the format of HTTP requests and responses, such that they can adapt details of the replayed HTTP packets (e.g. user name). The automated load testing tool I am looking for should be able to let the test team write "black box" test scripts such as: Invoke web page at URL http://... . First, enter XXX into text field XXX. Then, press button XXX. Wait until response has been received from web server. Verify that text field XXX now contains the text XXX. The tool should be able to simulate up to several 1000 users, and it should be compatible with web applications using ASP.NET and AJAX.

  • Should I move big data blobs in JSON or in separate binary connection?

    - by Amagrammer
    QUESTION: Is it better to send large data blobs in JSON for simplicity, or to send them as binary data over a separate connection? If the former, can you offer tips on how to optimize the JSON to minimize size? If the latter, is it worth it to logically connect the JSON data to the binary data using an identifier that appears in both, e.g., "data" : "<unique identifier>" in the JSON, with the first bytes of the data blob being <unique identifier>?
    CONTEXT: My iPhone application needs to receive JSON data over the 3G network, so I need to think seriously about the efficiency of data transfer as well as the load on the CPU. Most of the transfers will be relatively small packets of text data for which JSON is a natural format and for which there is no point in worrying much about efficiency. However, some of the most critical transfers will be big blobs of binary data -- definitely at least 100 kilobytes, and possibly closer to 1 megabyte as customers accumulate a longer history with the product. (Note: I will be caching what I can on the iPhone itself, but the data still has to be transferred at least once.) It is NOT streaming data. I will probably use a third-party JSON SDK -- the one I am using during development is here. Thanks

  • compact Number formatting behavior in Java (automatically switch between decimal and scientific notation)

    - by kostmo
    I am looking for a way to format a floating point number dynamically in either standard decimal format or scientific notation, depending on the value of the number. For moderate magnitudes, the number should be formatted as a decimal with trailing zeros suppressed; if the floating point number is equal to an integral value, the decimal point should also be suppressed. For extreme magnitudes (very small or very large), the number should be expressed in scientific notation. Alternately stated: if the number of characters in the standard decimal representation exceeds a certain threshold, switch to scientific notation. I should have control over the maximum number of digits of precision, but I don't want trailing zeros appended to express a minimum precision; all trailing zeros should be suppressed. Basically, it should optimize for compactness and readability.

        2.80000            -> 2.8
        765.000000         -> 765
        0.0073943162953    -> 0.00739432   (limit digits of precision -- to 6 in this case)
        0.0000073943162953 -> 7.39432E-6   (switch to scientific notation if the magnitude is small enough -- less than 1E-5 in this case)
        7394316295300000   -> 7.39432E+15  (switch to scientific notation if the magnitude is large enough -- for example, greater than 1E+10)
        0.0000073900000000 -> 7.39E-6      (strip trailing zeros from the significand in scientific notation)
        0.000007299998344  -> 7.3E-6       (rounding at the 6-digit precision limit produces trailing zeros, which are stripped)

    Here's what I've found so far: the .toString() method of the Number class does most of what I want, except that it doesn't convert to an integer representation when possible and will not express large integral magnitudes in scientific notation; I'm also not sure how to adjust its precision. The "%G" format string to String.format(...) lets me express numbers in scientific notation with adjustable precision, but does not strip trailing zeros. I'm wondering if there's already a library function out there that meets these criteria. I guess the only stumbling block to writing this myself is having to strip the trailing zeros from the significand in the scientific notation produced by %G.
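
    Not Java, but the desired behavior is easy to prototype with Python's general-purpose 'g' format, which already limits significant digits, strips trailing zeros, and switches to scientific notation for extreme magnitudes (its switch-over thresholds differ from the ones above, so a real Java version would still need its own threshold logic):

        def compact(x, sig=6):
            """At most `sig` significant digits; 'g' drops trailing zeros and redundant decimal points."""
            return "{:.{p}g}".format(x, p=sig)

        for x in (2.80000, 765.000000, 0.0073943162953, 0.0000073943162953, 7394316295300000.0):
            print(compact(x))
        # 2.8   765   0.00739432   7.39432e-06   7.39432e+15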

  • Datastore performance, my code or the datastore latency

    - by fredrik
    For the last month I've had a problem with a quite basic datastore query. It involves two db.Models, one referring to the other with a db.ReferenceProperty. The problem is that, according to the admin logs, the request takes about 2-4 seconds to complete. I stripped it down to a bare form and a list to display the results; the put works fine, but the get accumulates (in my opinion) way too much CPU time.

        # The get looks like this:
        outputData['items'] = {}
        labelsData = Label.all()
        for label in labelsData:
            labelItem = label.item.name
            if labelItem not in outputData['items']:
                outputData['items'][labelItem] = { 'item' : labelItem, 'labels' : [] }
            outputData['items'][labelItem]['labels'].append(label.text)
        path = os.path.join(os.path.dirname(__file__), 'index.html')
        self.response.out.write(template.render(path, outputData))

        # And the models:
        class Item(db.Model):
            name = db.StringProperty()

        class Label(db.Model):
            text = db.StringProperty()
            lang = db.StringProperty()
            item = db.ReferenceProperty(Item)

    I've tried a number of different approaches, e.g. storing all Label keys in the Item model as a db.ListProperty instead of using a ReferenceProperty. My test data is just 10 rows in Item and 40 in Label. So my question: is it a fool's errand to try to optimize this because the high CPU usage is inherent to the datastore, or have I just screwed up somewhere in the code? ..fredrik
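
    One thing worth checking (a sketch against the old google.appengine.ext.db API, assuming that is what is in use): label.item.name dereferences the ReferenceProperty separately for every Label, so the loop above issues one extra datastore get per row. Prefetching the referenced Items in a single batch get avoids that:

        from google.appengine.ext import db

        labels = list(Label.all())
        # Read the raw reference keys without triggering a fetch, then get all Items in one call.
        item_keys = [Label.item.get_value_for_datastore(label) for label in labels]
        items_by_key = dict(zip(item_keys, db.get(item_keys)))

        outputData['items'] = {}
        for label, key in zip(labels, item_keys):
            name = items_by_key[key].name
            entry = outputData['items'].setdefault(name, {'item': name, 'labels': []})
            entry['labels'].append(label.text)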

  • Shouldn't prepared statements be much faster?

    - by silversky
        $s = explode(" ", microtime());
        $s = $s[0] + $s[1];
        $con = mysqli_connect('localhost', 'test', 'pass', 'db') or die('Err');

        for ($i = 0; $i < 1000; $i++) {
            $stmt = $con->prepare(" SELECT MAX(id) AS max_id , MIN(id) AS min_id FROM tb ");
            $stmt->execute();
            $stmt->bind_result($M, $m);
            $stmt->free_result();

            $rand = mt_rand($m, $M).'<br/>';

            $res = $con->prepare(" SELECT * FROM tb WHERE id >= ? LIMIT 0,1 ");
            $res->bind_param("s", $rand);
            $res->execute();
            $res->free_result();
        }

        $e = explode(" ", microtime());
        $e = $e[0] + $e[1];
        echo number_format($e - $s, 4, '.', '');

    and:

        $link = mysql_connect("localhost", "test", "pass") or die();
        mysql_select_db("db") or die("Unable to select database".mysql_error());

        for ($i = 0; $i < 1000; $i++) {
            $range_result = mysql_query(" SELECT MAX(`id`) AS max_id , MIN(`id`) AS min_id FROM tb ");
            $range_row = mysql_fetch_object($range_result);
            $random = mt_rand($range_row->min_id, $range_row->max_id);
            $result = mysql_query(" SELECT * FROM tb WHERE id >= $random LIMIT 0,1 ");
        }

    Definitely prepared statements are much safer, and everywhere it says they are also much faster, BUT in my test of the above code I have:

        - 2.45 sec for prepared statements
        - 5.05 sec for the second example

    What do you think I'm doing wrong? Should I use the second solution, or should I try to optimize the prepared statements?

  • SQL: Speed Improvement - Cluttered union query

    - by vol7ron
        SELECT *
        FROM (
            SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
            FROM current_tbl a
            INNER JOIN import_tbl b ON ( a.user_id = b.user_id )
            UNION
            SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
            FROM current_tbl a
            INNER JOIN import_tbl b ON ( lower(a.f_name) = lower(b.f_name)
                                     AND lower(a.l_name) = lower(b.l_name) )
        ) foo

        UNION

        SELECT a.user_id, a.f_name, a.l_name, '', '', ''
        FROM current_tbl a
        WHERE a.user_id NOT IN (
            SELECT user_id FROM (
                SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
                FROM current_tbl a
                INNER JOIN import_tbl b ON ( a.user_id = b.user_id )
                UNION
                SELECT a.user_id, a.f_name, a.l_name, b.user_id, b.f_name, b.l_name
                FROM current_tbl a
                INNER JOIN import_tbl b ON ( lower(a.f_name) = lower(b.f_name)
                                         AND lower(a.l_name) = lower(b.l_name) )
            ) bar
        )
        ORDER BY user_id

    Example of table population:

        current_tbl:
         user_id | f_name | l_name
        ---------+--------+---------
         A1      | Adam   | Acorn
         A2      | Beth   | Berry
         A3      | Calv   | Chard

        import_tbl:
         user_id | f_name | l_name
        ---------+--------+---------
         A1      | Adam   | Acorn
         A2      | Beth   | Butcher   <- last name different

    Expected output:

         user_id1 | f_name1 | l_name1 | user_id2 | f_name2 | l_name2
        ----------+---------+---------+----------+---------+----------
         A1       | Adam    | Acorn   | A1       | Adam    | Acorn
         A2       | Beth    | Berry   | A2       | Beth    | Butcher
         A3       | Calv    | Chard   |          |         |

    Doing it this way gets rid of cases where the row would be:

        A2 | Beth | Berry | A2 | Beth | Butcher

    but it keeps the A3 row. I hope this makes sense and I haven't overly simplified it. This is a continuation question from my other question. The succession of these improvements has dropped the query from ~32000 ms down to ~1200 ms now -- quite an improvement. I suspect I can optimize by using UNION ALL in the subquery, and of course the usual index optimizations, but I'm looking for the best SQL optimization. FYI, this particular case is for PostgreSQL.

  • Design suggestion for expression tree evaluation with time-series data

    - by Lirik
    I have a (C#) genetic program that uses financial time-series data. It currently works, but I want to redesign the architecture to be more robust. My main goals are:

    - sequentially present the time-series data to the expression trees;
    - allow expression trees to access previous data rows when needed;
    - optimize the performance of data access while evaluating the expression trees;
    - keep a common interface so various types of data can be used.

    Here are the possible approaches I've thought about:

    1. Evaluate the expression tree by passing a data row into the root node and letting each child node use the same data row.
    2. Evaluate the expression tree by passing in the data row index and letting each node get the data row from a shared DataSet (currently I'm passing the row index and going to multiple synchronized arrays to get the data).
    3. Hybrid: an immutable data set is accessible by all of the expression trees, and each expression tree is evaluated by passing in a data row.

    The benefit of the first approach is that the data row is passed into the expression tree and no further query is made against the data set (which should improve performance in a multithreaded environment). The drawback is that the expression tree has no access to the rest of the data (in case some functions need to do calculations using previous data rows). The benefit of the second approach is that the expression trees can access any data up to the latest data row, but unless I specify which row that is, I'll have to iterate through the rows and figure out which one is the last one. The benefit of the hybrid is that it should generally perform better and still provide access to the earlier data; it supports two basic "views" of the data: the latest row and the previous rows. Do you know of any design patterns, or do you have any tips, that can help me build this type of system? Should I use a DataSet to hold and present the data, or are there more efficient ways to present rows of data while maintaining a simple interface? FYI: All of my code is written in C#.
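
    A tiny sketch of the hybrid idea (in Python for brevity rather than C#, with made-up names): the full history is held once in an immutable sequence, and each evaluation receives a lightweight view that exposes both "the current row" and "N rows back", so tree nodes never have to search for the latest index.

        class SeriesView:
            """Read-only window over an immutable row sequence, ending at `index`."""
            def __init__(self, rows, index):
                self._rows = rows          # shared, never mutated
                self._index = index

            @property
            def current(self):
                return self._rows[self._index]

            def back(self, n):
                """Row n steps before the current one (raises if before the start)."""
                if n > self._index:
                    raise IndexError("not enough history")
                return self._rows[self._index - n]

        rows = tuple({"close": p} for p in (10.0, 10.5, 10.2, 10.8))
        for i in range(len(rows)):
            view = SeriesView(rows, i)
            # an expression-tree leaf might compute a 1-step momentum, for example
            momentum = view.current["close"] - view.back(1)["close"] if i else 0.0

    The same shape translates directly to C#: an immutable array plus a small struct holding the index, passed into the tree at evaluation time.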

  • Most efficient way to check for DBNull and then assign to a variable?

    - by ilitirit
    This question comes up occasionally, but I haven't seen a satisfactory answer. A typical pattern is (row is a DataRow):

        if (row["value"] != DBNull.Value)
        {
            someObject.Member = row["value"];
        }

    My first question is which is more efficient (I've flipped the condition):

        row["value"] == DBNull.Value;              // Or
        row["value"] is DBNull;                    // Or
        row["value"].GetType() == typeof(DBNull)   // Or... any suggestions?

    This indicates that .GetType() should be faster, but maybe the compiler knows a few tricks I don't? Second question: is it worth caching the value of row["value"], or does the compiler optimize the indexer away anyway? E.g.:

        object valueHolder;
        if (DBNull.Value == (valueHolder = row["value"])) {}

    Disclaimers: row["value"] exists; I don't know the column index of the column (hence the lookup by column name); and I'm asking specifically about checking for DBNull and then assigning (not about premature optimization, etc.).

    Edit: I benchmarked a few scenarios (time in seconds, 10,000,000 trials):

        row["value"] == DBNull.Value:             00:00:01.5478995
        row["value"] is DBNull:                   00:00:01.6306578
        row["value"].GetType() == typeof(DBNull): 00:00:02.0138757

    Object.ReferenceEquals has the same performance as "==". The most interesting result? If you mismatch the case of the column name (e.g. "Value" instead of "value"), it takes roughly ten times longer (for a string):

        row["Value"] == DBNull.Value:             00:00:12.2792374

    The moral of the story seems to be that if you can't look up a column by its index, then make sure the column name you feed to the indexer matches the DataColumn's name exactly. Caching the value also appears to be nearly twice as fast:

        No Caching:   00:00:03.0996622
        With Caching: 00:00:01.5659920

    So the most efficient method seems to be:

        object temp;
        string variable;
        if (DBNull.Value != (temp = row["value"]))
        {
            variable = temp.ToString();
        }

    This was a good learning experience.

  • Optimizing MySql query to avoid using "Using filesort"

    - by usef_ksa
    I need your help to optimize a query so that it avoids "Using filesort". The job of the query is to select all the articles that belong to a specific tag. The query is:

        SELECT title FROM tag, article
        WHERE tag = 'Riyad' AND tag.article_id = article.id
        ORDER BY tag.article_id

    The table structures are the following:

        -- Tag table
        CREATE TABLE `tag` (
            `tag` VARCHAR( 30 ) NOT NULL ,
            `article_id` INT NOT NULL ,
            INDEX ( `tag` )
        ) ENGINE = MYISAM ;

        -- Article table
        CREATE TABLE `article` (
            `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
            `title` VARCHAR( 60 ) NOT NULL
        ) ENGINE = MYISAM ;

    Sample data:

        INSERT INTO `article` VALUES (1, 'About Riyad');
        INSERT INTO `article` VALUES (2, 'About Newyork');
        INSERT INTO `article` VALUES (3, 'About Paris');
        INSERT INTO `article` VALUES (4, 'About London');

        INSERT INTO `tag` VALUES ('Riyad', 1);
        INSERT INTO `tag` VALUES ('Saudia', 1);
        INSERT INTO `tag` VALUES ('Newyork', 2);
        INSERT INTO `tag` VALUES ('USA', 2);
        INSERT INTO `tag` VALUES ('Paris', 3);
        INSERT INTO `tag` VALUES ('France', 3);

  • Remove undesired indexed keywords from Sql Server FTS Index

    - by Scott
    Could anyone tell me whether SQL Server 2008 has a way to prevent keywords from being indexed that aren't really relevant to the types of searches that will be performed? For example, we have the IFilters for PDF and Word hooked in, and our documents are being indexed properly as far as I can tell. These documents, however, contain lots of numeric values that people won't really search for and that wouldn't bring back meaningful results, yet they are still being indexed and creating lots of entries in the full-text catalog. Basically, we are trying to optimize our search engine in any way we can, and we assume all these unnecessary entries can't be helping performance. I want my catalog to consist of alphabetic keywords only. The current IFilters work better than anything I could write in the time I have, but they index more than I need. Here is an example of some of the terms from sys.dm_fts_index_keywords_by_document that I want out: $1,000, $100, $250, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 129, 13.1, 14, 14.12, 145, 15, 16.2, 16.4, 18, 18.1, 18.2, 18.3, 18.4, 18.5. These are some examples from the same management view that I think are worth keeping and searching on: above, accordingly, accounts, add, addition, additional, additive. Any help would be greatly appreciated!

  • Lucene and Special Characters

    - by Brandon
    I am using Lucene.Net 2.0 to index some fields from a database table. One of the fields is a 'Name' field which allows special characters. When I perform a search, it does not find my document that contains a term with special characters. I index my field as such: Directory DALDirectory = FSDirectory.GetDirectory(@"C:\Indexes\Name", false); Analyzer analyzer = new StandardAnalyzer(); IndexWriter indexWriter = new IndexWriter(DALDirectory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED); Document doc = new Document(); doc.Add(new Field("Name", "Test (Test)", Field.Store.YES, Field.Index.TOKENIZED)); indexWriter.AddDocument(doc); indexWriter.Optimize(); indexWriter.Close(); And I search doing the following: value = value.Trim().ToLower(); value = QueryParser.Escape(value); Query searchQuery = new TermQuery(new Term(field, value)); Searcher searcher = new IndexSearcher(DALDirectory); TopDocCollector collector = new TopDocCollector(searcher.MaxDoc()); searcher.Search(searchQuery, collector); ScoreDoc[] hits = collector.TopDocs().scoreDocs; If I perform a search for field as 'Name' and value as 'Test', it finds the document. If I perform the same search as 'Name' and value as 'Test (Test)', then it does not find the document. Even more strange, if I remove the QueryParser.Escape line do a search for a GUID (which, of course, contains hyphens) it finds documents where the GUID value matches, but performing the same search with the value as 'Test (Test)' still yields no results. I am unsure what I am doing wrong. I am using the QueryParser.Escape method to escape the special characters and am storing the field and searching by the Lucene.Net's examples. Any thoughts?

  • UIView: how to do non-destructive drawing?

    - by Caffeine Coma
    My original question: I'm creating a simple drawing application and need to be able to draw over existing, previously drawn content in my drawRect. What is the proper way to draw on top of existing content without entirely replacing it? Based on answers received here and elsewhere, here is the deal. You should be prepared to redraw the entire rectangle whenever drawRect is called. You cannot prevent the contents from being erased by doing the following: [self setClearsContextBeforeDrawing: NO]; This is merely a hint to the graphics engine that there is no point in having it pre-clear the view for you, since you will likely need to re-draw the whole area anyway. It may prevent your view from being automatically erased, but you cannot depend on it. To draw on top of your view without erasing, do your drawing to an off-screen bitmap context (which is never cleared by the system.) Then in your drawRect, copy from this off-screen buffer to the view. Example: - (id) initWithCoder: (NSCoder*) coder { if (self = [super initWithCoder: coder]) { self.backgroundColor = [UIColor clearColor]; CGSize size = self.frame.size; drawingContext = [self createDrawingBufferContext: size]; } return self; } - (CGContextRef) createOffscreenContext: (CGSize) size { CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB(); CGContextRef context = CGBitmapContextCreate(NULL, size.width, size.height, 8, size.width*4, colorSpace, kCGImageAlphaPremultipliedLast); CGColorSpaceRelease(colorSpace); CGContextTranslateCTM(context, 0, size.height); CGContextScaleCTM(context, 1.0, -1.0); return context; } - (void)drawRect:(CGRect) rect { UIGraphicsPushContext(drawingContext); CGImageRef cgImage = CGBitmapContextCreateImage(drawingContext); UIImage *uiImage = [[UIImage alloc] initWithCGImage:cgImage]; UIGraphicsPopContext(); CGImageRelease(cgImage); [uiImage drawInRect: rect]; [uiImage release]; } TODO: can anyone optimize the drawRect so that only the (usually tiny) modified rectangle region is used for the copy?

  • Best Practice - Removing item from generic collection in C#

    - by Matt Davis
    I'm using C# in Visual Studio 2008 with .NET 3.5. I have a generic dictionary that maps types of events to a generic list of subscribers. A subscriber can be subscribed to more than one event. private static Dictionary<EventType, List<ISubscriber>> _subscriptions; To remove a subscriber from the subscription list, I can use either of these two options. Option 1: ISubscriber subscriber; // defined elsewhere foreach (EventType event in _subscriptions.Keys) { if (_subscriptions[event].Contains(subscriber)) { _subscriptions[event].Remove(subscriber); } } Option 2: ISubscriber subscriber; // defined elsewhere foreach (EventType event in _subscriptions.Keys) { _subscriptions[event].Remove(subscriber); } I have two questions. First, notice that Option 1 checks for existence before removing the item, while Option 2 uses a brute force removal since Remove() does not throw an exception. Of these two, which is the preferred, "best-practice" way to do this? Second, is there another, "cleaner," more elegant way to do this, perhaps with a lambda expression or using a LINQ extension? I'm still getting acclimated to these two features. Thanks. EDIT Just to clarify, I realize that the choice between Options 1 and 2 is a choice of speed (Option 2) versus maintainability (Option 1). In this particular case, I'm not necessarily trying to optimize the code, although that is certainly a worthy consideration. What I'm trying to understand is if there is a generally well-established practice for doing this. If not, which option would you use in your own code?

  • In languages which create a new scope each time in a loop block, is a new local copy of the loop variable created?

    - by Jian Lin
    It seems that in languages like C, Java, and Ruby (as opposed to JavaScript), a new scope is created for each iteration of a loop block, and the variable defined for the loop becomes a fresh local variable recorded in that new scope each time. For example, in Ruby:

        p RUBY_VERSION
        $foo = []
        (1..5).each do |i|
          $foo[i] = lambda { p i }
        end
        (1..5).each do |j|
          $foo[j].call()
        end

    the printout is:

        [MacBook01:~] $ ruby scope.rb
        "1.8.6"
        1
        2
        3
        4
        5
        [MacBook01:~] $

    So it looks like when a new scope is created, a new local copy of i is also created and recorded in that scope, so that when the lambda is executed later, the i found in those scope chains is 1, 2, 3, 4, 5 respectively. Is this true? (It sounds like a heavy operation.) Contrast that with:

        p RUBY_VERSION
        $foo = []
        i = 0
        (1..5).each do |i|
          $foo[i] = lambda { p i }
        end
        (1..5).each do |j|
          $foo[j].call()
        end

    This time i is defined before entering the loop, so Ruby 1.8.6 does not put it in the new scope created for the loop block; when i is looked up in the scope chain it always refers to the i in the outer scope and prints 5 every time:

        [MacBook01:~] $ ruby scope2.rb
        "1.8.6"
        5
        5
        5
        5
        5
        [MacBook01:~] $

    I heard that in Ruby 1.9, i is treated as local to the loop even when there is an i defined earlier. Creating a new scope and a new local copy of i on every pass through the loop seems heavy, and it wouldn't matter at all if the lambdas were never invoked later. So when the closures don't need to be invoked at a later time, can the interpreter (or the compiler, for C / Java) optimize away the per-iteration copy?
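
    For comparison (a side note, not part of the original question): Python behaves like the second Ruby example. The loop variable is a single binding in the enclosing scope, so every closure sees its final value unless the value is captured explicitly, e.g. via a default argument:

        funcs = [lambda: i for i in range(1, 6)]
        print([f() for f in funcs])           # [5, 5, 5, 5, 5] -- one shared binding of i

        funcs = [lambda i=i: i for i in range(1, 6)]
        print([f() for f in funcs])           # [1, 2, 3, 4, 5] -- value captured per iteration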

  • HTML5 optimization for IE6.0

    - by marharépa
    Hello! Do you know any method to make this HTML code work in IE6 or 7 (or 8) without adding any HTML elements, or does IE simply skip all the HTML5 elements?

        <!DOCTYPE HTML>
        <head>
          <meta charset="UTF-8">
          <title>title</title>
          <link type="text/css" rel="stylesheet" href="reset.css">
          <link type="text/css" rel="stylesheet" href="style.css">
        </head>
        <body>
          <header>code of header</header>
          <nav> code of nav </nav>
          <section> code of gallery </section>
          <article> code of article </article>
          <footer>code of footer</footer>
        </body>
        </html>

    Thank you.

  • Oracle: Insertion on an indexed table, avoiding duplicates. Looking for tips and advice.

    - by Tom
    Hi everyone, I'm looking for the best solution (performance-wise) to achieve this: I have to insert records into a table while avoiding duplicates. For example, take table A:

        INSERT INTO A (
            SELECT DISTINCT [FIELDS] FROM B, C, D ..
            WHERE (JOIN CONDITIONS ON B, C, D ..)
            AND NOT EXISTS (
                SELECT * FROM A ATMP WHERE ATMP.SOMEKEY = A.SOMEKEY
            )
        );

    I have an index over A.SOMEKEY just to speed up the NOT EXISTS query, but I realize that inserting into an indexed table is a performance hit. So I was thinking of duplicating table A in a global temporary table, where I would keep the index, then removing the index from table A and executing the modified query:

        INSERT INTO A (
            SELECT DISTINCT [FIELDS] FROM B, C, D ..
            WHERE (JOIN CONDITIONS ON B, C, D ..)
            AND NOT EXISTS (
                SELECT * FROM GLOBAL_TEMPORARY_TABLE_A ATMP WHERE ATMP.SOMEKEY = A.SOMEKEY
            )
        );

    This would avoid inserting into an indexed table, but I would have to update the global temporary copy of A with each insertion I make. I'm kind of lost here; is there a better way to achieve this? Thanks in advance,

  • How to perform spatial partitioning in n-dimensions?

    - by kevin42
    I'm trying to design an implementation of Vector Quantization as a c++ template class that can handle different types and dimensions of vectors (e.g. 16 dimension vectors of bytes, or 4d vectors of doubles, etc). I've been reading up on the algorithms, and I understand most of it: here and here I want to implement the Linde-Buzo-Gray (LBG) Algorithm, but I'm having difficulty figuring out the general algorithm for partitioning the clusters. I think I need to define a plane (hyperplane?) that splits the vectors in a cluster so there is an equal number on each side of the plane. [edit to add more info] This is an iterative process, but I think I start by finding the centroid of all the vectors, then use that centroid to define the splitting plane, get the centroid of each of the sides of the plane, continuing until I have the number of clusters needed for the VQ algorithm (iterating to optimize for less distortion along the way). The animation in the first link above shows it nicely. My questions are: What is an algorithm to find the plane once I have the centroid? How can I test a vector to see if it is on either side of that plane?
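
    A small illustration of the side test (a generic NumPy sketch under the usual convention, not tied to any particular LBG implementation): a hyperplane through a point c with normal n splits space by the sign of (v - c)·n, so classifying a vector is a single dot product in any number of dimensions.

        import numpy as np

        def side_of_plane(v, c, n):
            """+1, -1 or 0 depending on which side of the hyperplane through c (normal n) the vector v lies."""
            return int(np.sign(np.dot(v - c, n)))

        # toy 4-d example: split a cluster at its centroid, along its direction of largest variance
        cluster = np.random.rand(100, 4)
        centroid = cluster.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(cluster, rowvar=False))
        normal = eigvecs[:, -1]                  # principal direction of the cluster
        left  = cluster[[side_of_plane(v, centroid, normal) < 0 for v in cluster]]
        right = cluster[[side_of_plane(v, centroid, normal) >= 0 for v in cluster]]

    Splitting along the largest-variance direction is only one choice; classic LBG instead perturbs the centroid into two nearby seeds and lets the next k-means-style assignment pass form the two halves.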
