Search Results

Search found 4242 results on 170 pages for 'mark hornick'.

Page 9/170 | < Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16  | Next Page >

  • Slow wired internet connection on Realtek RTL8168-8111 (Rev 6)

    - by Mark
    I keep on seeing issues people are having with their wireless. I am having an issue with being wired into my router. I installed Ubuntu 11.10 the other day, on my custom PC. The motherboard I have installed on this PC is an ASUS P8H61-M. The issue I am having is with my speed. I have a dual boot, windows 7 and the new Ubuntu. On my Windows install I am getting test speeds from Speakeasy at 17Mbps and actual downloads around 2-3MB/s. With Ubuntu, I am getting test speeds from Speakeasy at 1.14Mbps and actual downloads around 60KB/s. I have disabled IPv6 and am no using GoogleDNS for my DNS, but it hasn't resolved the issue. I have scanned my router (WRT54GS Linksys) to disable IPv6 connections, and I am not seeing any option for that. I cannot figure out why I am getting such sluggish internet connection. Any help to resolve would be great! I performed an iconfig -a with these results: mark@Mark-ASUS:~$ ifconfig -a eth0 Link encap:Ethernet HWaddr f4:6d:04:d1:2c:4e inet addr:192.168.1.103 Bcast:192.168.1.255 Mask:255.255.255.0 inet6 addr: fe80::f66d:4ff:fed1:2c4e/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:21888 errors:0 dropped:21888 overruns:0 frame:21888 TX packets:21068 errors:0 dropped:90 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:26348337 (26.3 MB) TX bytes:2217140 (2.2 MB) Interrupt:46 Base address:0xc000 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:7 errors:0 dropped:0 overruns:0 frame:0 TX packets:7 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:952 (952.0 B) TX bytes:952 (952.0 B) My specs are: mark@Mark-ASUS:~$ sudo lspci -nn 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 06) 06:00.0 PCI bridge [0604]: ASMedia Technology Inc. Device [1b21:1080] (rev 01) udev information: KERNEL[11.351405] add /devices/pci0000:00/0000:00:1c.2/0000:04:00.0/net/eth0 (net) UDEV_LOG=3 ACTION=add DEVPATH=/devices/pci0000:00/0000:00:1c.2/0000:04:00.0/net/eth0 SUBSYSTEM=net INTERFACE=eth0 IFINDEX=2 SEQNUM=1542 UDEV [11.363905] add /devices/pci0000:00/0000:00:1c.2/0000:04:00.0/net/eth0 (net) UDEV_LOG=3 ACTION=add DEVPATH=/devices/pci0000:00/0000:00:1c.2/0000:04:00.0/net/eth0 SUBSYSTEM=net INTERFACE=eth0 IFINDEX=2 SEQNUM=1542 ID_VENDOR_FROM_DATABASE=Realtek Semiconductor Co., Ltd. ID_MODEL_FROM_DATABASE=RTL8111/8168B PCI Express Gigabit Ethernet controller ID_BUS=pci ID_VENDOR_ID=0x10ec ID_MODEL_ID=0x8168 ID_MM_CANDIDATE=1 dmesg information: [ 2.855982] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded [ 2.856366] r8169 0000:04:00.0: eth0: RTL8168b/8111b at 0xffffc9000064c000, f4:6d:04:d1:2c:4e, XID 0c900800 IRQ 46 [ 12.540956] r8169 0000:04:00.0: eth0: link down [ 12.540961] r8169 0000:04:00.0: eth0: link down [ 12.541173] ADDRCONF(NETDEV_UP): eth0: link is not ready I took out a lot of information that did not pertain to eth0, because the previous edits wouldn't save. If I need any more information, let me know. I would love to get this resolved. The other issue I am noticing is sometimes my connection disconnects all together, for about a minute, then it reconnects.

    Read the article

  • How to change shortcut keys in Thunderbird 3 for Linux?

    - by Josh
    I use Thunderbird on Ubuntu Linux and have just upgraded to Ubuntu 10 / Thunderbird 3. One of my gripes however is that Thunderbird uses a number of shortcut keys that have no secondary key requirements, for example, "Mark as Read" is M. Not ControlM. Just M. Worse, "Mark as Junk" is J. Which means I sometimes inadvertently mark messages as Junk. How can I customize Thunderbird's shortcuts so, for example, "Mark as Junk" is ControlJ?

    Read the article

  • Iphone SDK - adding UITableView to UIView

    - by Shashi
    Hi, I am trying to learn how to use different views, for this sample test app, i have a login page, upon successful logon, the user is redirected to a table view and then upon selection of an item in the table view, the user is directed to a third page showing details of the item. the first page works just fine, but the problem occurs when i go to the second page, the table shown doesn't have title and i cannot add title or toolbar or anything other than the content of the tables themselves. and when i click on the item, needless to say nothing happens. no errors as well. i am fairly new to programming and have always worked on Java but never on C(although i have some basic knowledge of C) and Objective C is new to me. Here is the code. import @interface NavigationTestAppDelegate : NSObject { UIWindow *window; UIViewController *viewController; IBOutlet UITextField *username; IBOutlet UITextField *password; IBOutlet UILabel *loginError; //UINavigationController *navigationController; } @property (nonatomic, retain) IBOutlet UIViewController *viewController; @property (nonatomic, retain) IBOutlet UIWindow *window; @property (nonatomic, retain) IBOutlet UITextField *username; @property (nonatomic, retain) IBOutlet UITextField *password; @property (nonatomic, retain) IBOutlet UILabel *loginError; -(IBAction) login; -(IBAction) hideKeyboard: (id) sender; @end import "NavigationTestAppDelegate.h" import "RootViewController.h" @implementation NavigationTestAppDelegate @synthesize window; @synthesize viewController; @synthesize username; @synthesize password; @synthesize loginError; pragma mark - pragma mark Application lifecycle -(IBAction) hideKeyboard: (id) sender{ [sender resignFirstResponder]; } (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions { // Override point for customization after app launch //RootViewController *rootViewController = [[RootViewController alloc] init]; //[window addSubview:[navigationController view]]; [window addSubview:[viewController view]]; [window makeKeyAndVisible]; return YES; } -(IBAction) login { RootViewController *rootViewController = [[RootViewController alloc] init]; //NSString *user = [[NSString alloc] username. if([username.text isEqualToString:@"test"]&&[password.text isEqualToString:@"test"]){ [window addSubview:[rootViewController view]]; //[window addSubview:[navigationController view]]; [window makeKeyAndVisible]; //rootViewController.awakeFromNib; } else { loginError.text = @"LOGIN ERROR"; [window addSubview:[viewController view]]; [window makeKeyAndVisible]; } } (void)applicationWillTerminate:(UIApplication *)application { // Save data if appropriate } pragma mark - pragma mark Memory management (void)dealloc { //[navigationController release]; [viewController release]; [window release]; [super dealloc]; } @end import @interface RootViewController : UITableViewController { IBOutlet NSMutableArray *views; } @property (nonatomic, retain) IBOutlet NSMutableArray * views; @end // // RootViewController.m // NavigationTest // // Created by guest on 4/23/10. // Copyright MyCompanyName 2010. All rights reserved. // import "RootViewController.h" import "OpportunityOne.h" @implementation RootViewController @synthesize views; //@synthesize navigationViewController; pragma mark - pragma mark View lifecycle (void)viewDidLoad { views = [ [NSMutableArray alloc] init]; OpportunityOne *opportunityOneController; for (int i=1; i<=20; i++) { opportunityOneController = [[OpportunityOne alloc] init]; opportunityOneController.title = [[NSString alloc] initWithFormat:@"Opportunity %i",i]; [views addObject:[NSDictionary dictionaryWithObjectsAndKeys: [[NSString alloc] initWithFormat:@"Opportunity %i",i], @ "title", opportunityOneController, @"controller", nil]]; self.title=@"GPS"; } /*UIBarButtonItem *temporaryBarButtonItem = [[UIBarButtonItem alloc] init]; temporaryBarButtonItem.title = @"Back"; self.navigationItem.backBarButtonItem = temporaryBarButtonItem; [temporaryBarButtonItem release]; */ //self.title =@"Global Platform for Sales"; [super viewDidLoad]; //[temporaryBarButtonItem release]; //[opportunityOneController release]; // Uncomment the following line to display an Edit button in the navigation bar for this view controller. // self.navigationItem.rightBarButtonItem = self.editButtonItem; } pragma mark - pragma mark Table view data source // Customize the number of sections in the table view. - (NSInteger)numberOfSectionsInTableView:(UITableView *)tableView { return 1; } // Customize the number of rows in the table view. - (NSInteger)tableView:(UITableView *)tableView numberOfRowsInSection:(NSInteger)section { return [views count]; } // Customize the appearance of table view cells. - (UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath { static NSString *CellIdentifier = @"Cell"; UITableViewCell *cell = [tableView dequeueReusableCellWithIdentifier:CellIdentifier]; if (cell == nil) { cell = [[[UITableViewCell alloc] initWithStyle:UITableViewCellStyleDefault reuseIdentifier:CellIdentifier] autorelease]; } // Configure the cell. cell.textLabel.text = [[views objectAtIndex:indexPath.row] objectForKey:@"title"]; return cell; } pragma mark - pragma mark Table view delegate (void)tableView:(UITableView *)tableView didSelectRowAtIndexPath:(NSIndexPath *)indexPath { //UIViewController *targetViewController = [[views objectAtIndex:indexPath.row] objectForKey:@"controller"]; UIViewController *targetViewController = [[views objectAtIndex:indexPath.row] objectForKey:@"controller"]; [[self navigationController] pushViewController:targetViewController animated:YES]; } pragma mark - pragma mark Memory management (void)didReceiveMemoryWarning { // Releases the view if it doesn't have a superview. [super didReceiveMemoryWarning]; // Relinquish ownership any cached data, images, etc that aren't in use. } (void)viewDidUnload { // Relinquish ownership of anything that can be recreated in viewDidLoad or on demand. // For example: self.myOutlet = nil; } (void)dealloc { [views release]; [super dealloc]; } @end Wow, i was finding it real hard to post the code. i apologize for the bad formatting, but i just couldn't get past the formatting rules for this text editor. Thanks, Shashi

    Read the article

  • How do you mark class with TypeConverter that is not in referenced solution?

    - by Greg McGuffey
    I have a class that I've written a TypeConverter for. I want to keep the TypeConverter separate from the main solution, as it is only needed at design time and have an extensibility project now that contains the TypeConverter. Thus, when I deploy, I don't need to deploy the extensibility assembly at all. However, I can't figure out the appropriate string to use in the attribute to actually connect the class to the converter. Note I can't use this: [TypeConverter(typeof(MyConverter)] because MyConverter is in a project that isn't referenced. I need to use the string overload, but can't figure out what to use: [TypeConverter("what the heck goes in here!")] I think I need maybe a path to the assembly, maybe a GUID, the class name...just not sure...

    Read the article

  • ASP.NET MVC: How to allow some HTML mark-up in Html Encoded content?

    - by Dr. Zim
    Is there some magic existing code in MVC 2 to Html.Encode() strings and allow certain html markup, like paragraph marks and breaks? (coming from a Linq to SQL database field) A horrible code example to achieve the effect: Html.Encode(Model.fieldName).Replace("&lt;br /&gt;", "<br />") What would be really nice is to overload something and pass to it an array (or object) full of allowed html tags.

    Read the article

  • What does the question mark at then end of a css include url do?

    - by Bob Dylan
    I've noticed that on some websites (including SO) the link to the CSS will look like: <link rel="stylesheet" href="http://sstatic.net/so/all.css?v=6638"> I would say its safe to assume that ?v=6638 tells the browser to load version 6638 of the css file. But can I do this on my websites and can I include different versions of my CSS file just by changing the numbers?

    Read the article

  • What does the question mark at then end of a css file mean/do?

    - by Bob Dylan
    I've noticed that on some websites (including SO) the link to the CSS will look like: <link rel="stylesheet" href="http://sstatic.net/so/all.css?v=6638"> I would say its safe to assume that ?v=6638 tells the browser to load version 6638 of the css file. But can I do this on my websites and can I include different versions of my CSS file just by changing the numbers?

    Read the article

  • Is there a way to get Apache to serve files with the question mark in their name?

    - by ldrg
    I scraped a bunch of pages using wget -m -k -E. The resulting files have names in the form foo.php?bar.html. Apache guesses everything after the ? is a query string, is there a way to tell it to ignore the ? as the query string delimiter (and see foo.php?bar.html as the requested file and not foo.php)? To save you a trip to wget manpage: -m : mirror recursively -E : foo.php?bar becomes foo.php?bar.html -k : convert links in pages (foo.php?bar now links to foo.php?bar.html inside of all the pages so they display properly)

    Read the article

  • Regex to identify rows that do not contain exact number of occurences of quotemark character using Notepad++

    - by SamAspin
    I would like to be able to jump to rows that dont contain 6 quotemarks in a quoted-CSV file as it feels like a good way to identify broken rows. I think using a regular expression with Notepad++'s find features would be a sensible approach but I'm not sure how to pick the rows up. 6 quotemarks (") would suggest a complete row so I want to skip to any row that does not contain 6. Here is some sample data to play with, in this example its the 4th line I'd like to jump to "sam","mark","dave" "sam","mark","dave" "sam","mark","dave" "sam","mark"," dave" "sam","mark","dave" "sam","mark","dave"

    Read the article

  • When I mark a property as transient, does the type matter?

    - by mystify
    For example, I make a fullName property and set it to transient. Does it matter what data type that property is, in this case? For example, does it matter if it's set to int or string? As far as I get it, a transient property is almost "ignored" by Core Data. I make my accessors for that and when someone accesses fullName, I simply construct a string and return that. Did I miss something?

    Read the article

  • How can I accept a hash mark in a URL via $_GET?

    - by bccarlso
    From what I have been able to understand, hash marks (#) aren't sent to the server, so it doesn't seem likely that I will be able to use raw PHP to parse data like in the URL below: index.php?name=Ben&address=101 S 10th St Suite #301 I'm looking to pre-populate form fields with this $_GET data. How would I do this with Javascript (or jQuery), and is there a fallback that wouldn't break my form for people not using Javascript? Currently if there is a hash (usually in the address field), everything after that is not parsed in or stored in $_GET.

    Read the article

  • How to mark array value types in PHP (Java)Doc?

    - by Christian Sciberras
    It might be a bit difficult to explain, so I'll give some example code. Note that I'm using NetBeans IDE (latest). class Dummy { public function say(){ } } /** * Builds dummy class and returns it. * @return Dummy The dummy class. */ function say_something(){ return new Dummy(); } $s=say_something(); While developing in netbeans I can invoke auto-complete by hitting ctrl+space after typing "$s-". In the the hint window that follows, there is the item "say()". This is because the javadoc says say_something returns a Dummy and NetBeans parsed Dummy class to know that it has a method called "say()". So far so good. My problem is with arrays. Example code follows: /** * Builds array of 2 dummy classes and returns it. * @return Array The dummy class. (*) */ function say_something2(){ return array(new Dummy(),new Dummy()); } $s=say_something2(); If I try the auto-complete thing again but with "$s[0]-" instead, I don't get the methods fro Dummy class. This is because in the JavaDoc I only said that it is an array, but not the values' type. So the question would be, is there any JavaDoc syntax, cheat, whatever which allows me to tell JavaDoc what type of variables to expect in an array?

    Read the article

  • Does correcting a value in an validation method mark the object as dirty?

    - by dontWatchMyProfile
    From the docs: If you change the input value in a validate:error: method, you must ensure that you only change the value if it is invalid or uncoerced. The reason is that, since the object and context are now dirtied, Core Data may validate that key again later. If you keep performing a coercion in a validation method, this can therefore produce an infinite loop. So when I modify a value in a validation method, the context gets dirtied? And the next time I save, the validation happens again - and when I change the value even if the validation is OK, then the context is again dirtied, and revalidated again - and I change the value, and Core Data validates, again, because the context is dirtied. And so on...for ever... is that right? Or did they try to say something different?

    Read the article

  • MS Access Mark Duplicates in order of appearance - using the function RankOfDup: (SELECT Count(*) ...)

    - by veska stoyanova
    I'm trying to create Ranking that shows the sequence of agreements for the two fields Customers and Agreements. The number for agreements must be unique, whereas customers can repeat. The formula RankOfDup: (SELECT Count(*) FROM Data a WHERE a.customer=Data.customer And a.agreement >= Data.agreement) Works beautifully but after this query with columns Agreement, Customer and RankofDup, I need to create crosstab that transposes the RankofDub. It works when I make the table first and then create query but my data is too large so I'm trying to put the select query with the ranking in a crosstab query. However, when I try to do this Access gives error message that microsoft jet ... doesn't recognise Data.customer? Any ideas how I can fix this?

    Read the article

  • How can I mark a file as descended from 2 other files in Mercurial?

    - by Matt Joiner
    I had 2 Python similar scripts, that I've since merged into one (and now takes some parameters to differ the behaviour appropriately). Both of the previous files are in the tip of my Mercurial repository. How can I indicate that the new file, is a combination of the 2 older files that I intend to remove? Also note, that 1 file has been chosen in favor of the other, and some code moved across, so if it's not possible to create a version controlled file with a new name, then assimilating one file's history into the other will suffice.

    Read the article

  • Twitter Bootstrap: How to use containers and rows

    - by StackOverflowNewbie
    Assume the following layout: I'm trying to learn how to use Twitter's Bootstrap. What should the general structure of the framework's markup be? Is it this: <div class="container-fluid"> <div class="row-fluid"> <div class="span12"> // Outside link mark up here </div> </div> </div> <div class="container-fluid"> <div class="row-fluid"> <div class="span1"> // Logo mark up here </div> <div class="span4"> // Main nav mark up here </div> <div class="span3 offset4"> // User nav mark up here </div> </div> </div> <div class="container-fluid"> <div class="row-fluid"> <div class="span3"> // Left sidebar mark up here </div> <div class="span6"> <div class="span6"> // Breadcrumb mark up here </div> <div class="span6"> // Main content mark up here </div> </div> <div class="span3"> // Right sidebar mark up here </div> </div> </div> <div class="container-fluid"> <div class="row-fluid"> <div class="span3 offset9"> // Footer link mark up here </div> </div> </div>

    Read the article

  • How to mark multiple coordinates in KML using Java?

    - by Akshay
    I'm working on a project that involves KML creation using Java. Currently, I'm fooling with the sample Java code from the KML example at Micromata Labs JAK Example. I tried to "extend" the code by adding multiple coordinates and getting two markers, but I could not get it to work. Can you please tell me how I can add multiple coordinates and put markers on them, and also, draw a line between the markers. Thank you for your help! PS: I need to do this via the program. I saw sample code of them using DOM and XML, but not pure Java/JAK as such. Please guide me. I got as far as this: kml .createAndSetDocument().withName("My Markers"). createAndAddPlacemark().withName("London, UK").withOpen(Boolean.TRUE) .createAndSetPoint().addToCoordinates(-0.126236, 51.500152); kml.createAndSetPlacemark() .withName("Somewhere near London, UK").withOpen(Boolean.TRUE) .createAndSetPoint().addToCoordinates(-0.129800,52.700152); But I know I'm going wrong somewhere. Please point me in the right direction.

    Read the article

  • Using a "white list" for extracting terms for Text Mining, Part 2

    - by [email protected]
    In my last post, we set the groundwork for extracting specific tokens from a white list using a CTXRULE index. In this post, we will populate a table with the extracted tokens and produce a case table suitable for clustering with Oracle Data Mining. Our corpus of documents will be stored in a database table that is defined as create table documents(id NUMBER, text VARCHAR2(4000)); However, any suitable Oracle Text-accepted data type can be used for the text. We then create a table to contain the extracted tokens. The id column contains the unique identifier (or case id) of the document. The token column contains the extracted token. Note that a given document many have many tokens, so there will be one row per token for a given document. create table extracted_tokens (id NUMBER, token VARCHAR2(4000)); The next step is to iterate over the documents and extract the matching tokens using the index and insert them into our token table. We use the MATCHES function for matching the query_string from my_thesaurus_rules with the text. DECLARE     cursor c2 is       select id, text       from documents; BEGIN     for r_c2 in c2 loop        insert into extracted_tokens          select r_c2.id id, main_term token          from my_thesaurus_rules          where matches(query_string,                        r_c2.text)>0;     end loop; END; Now that we have the tokens, we can compute the term frequency - inverse document frequency (TF-IDF) for each token of each document. create table extracted_tokens_tfidf as   with num_docs as (select count(distinct id) doc_cnt                     from extracted_tokens),        tf       as (select a.id, a.token,                            a.token_cnt/b.num_tokens token_freq                     from                        (select id, token, count(*) token_cnt                        from extracted_tokens                        group by id, token) a,                       (select id, count(*) num_tokens                        from extracted_tokens                        group by id) b                     where a.id=b.id),        doc_freq as (select token, count(*) overall_token_cnt                     from extracted_tokens                     group by token)   select tf.id, tf.token,          token_freq * ln(doc_cnt/df.overall_token_cnt) tf_idf   from num_docs,        tf,        doc_freq df   where df.token=tf.token; From the WITH clause, the num_docs query simply counts the number of documents in the corpus. The tf query computes the term (token) frequency by computing the number of times each token appears in a document and divides that by the number of tokens found in the document. The doc_req query counts the number of times the token appears overall in the corpus. In the SELECT clause, we compute the tf_idf. Next, we create the nested table required to produce one record per case, where a case corresponds to an individual document. Here, we COLLECT all the tokens for a given document into the nested column extracted_tokens_tfidf_1. CREATE TABLE extracted_tokens_tfidf_nt              NESTED TABLE extracted_tokens_tfidf_1                  STORE AS extracted_tokens_tfidf_tab AS              select id,                     cast(collect(DM_NESTED_NUMERICAL(token,tf_idf)) as DM_NESTED_NUMERICALS) extracted_tokens_tfidf_1              from extracted_tokens_tfidf              group by id;   To build the clustering model, we create a settings table and then insert the various settings. Most notable are the number of clusters (20), using cosine distance which is better for text, turning off auto data preparation since the values are ready for mining, the number of iterations (20) to get a better model, and the split criterion of size for clusters that are roughly balanced in number of cases assigned. CREATE TABLE km_settings (setting_name  VARCHAR2(30), setting_value VARCHAR2(30)); BEGIN  INSERT INTO km_settings (setting_name, setting_value) VALUES     VALUES (dbms_data_mining.clus_num_clusters, 20);  INSERT INTO km_settings (setting_name, setting_value)     VALUES (dbms_data_mining.kmns_distance, dbms_data_mining.kmns_cosine);   INSERT INTO km_settings (setting_name, setting_value) VALUES     VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_off);   INSERT INTO km_settings (setting_name, setting_value) VALUES     VALUES (dbms_data_mining.kmns_iterations,20);   INSERT INTO km_settings (setting_name, setting_value) VALUES     VALUES (dbms_data_mining.kmns_split_criterion,dbms_data_mining.kmns_size);   COMMIT; END; With this in place, we can now build the clustering model. BEGIN     DBMS_DATA_MINING.CREATE_MODEL(     model_name          => 'TEXT_CLUSTERING_MODEL',     mining_function     => dbms_data_mining.clustering,     data_table_name     => 'extracted_tokens_tfidf_nt',     case_id_column_name => 'id',     settings_table_name => 'km_settings'); END;To generate cluster names from this model, check out my earlier post on that topic.

    Read the article

  • Using a "white list" for extracting terms for Text Mining

    - by [email protected]
    In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms.  However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM                                TOKEN DATA MINING                         DATAMINING ORACLE DATA MINING                  ORACLEDATAMINING 11G                                 ORACLE11G JAVA                                JAVA CRM                                 CRM CUSTOMER RELATIONSHIP MANAGEMENT    CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term     VARCHAR2(100),                                  query_string  VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over  the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE   cursor c2 is     select token, term     from my_term_token_map; BEGIN   for r_c2 in c2 loop     CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term);     EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values                        (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')'     using r_c2.token;   end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example:     'ORACLEDATAMINING'        SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on        my_thesaurus_rules(query_string)        indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

    Read the article

< Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16  | Next Page >