Search Results

Search found 5543 results on 222 pages for 'legacy terms'.

Page 2/222 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Using a "white list" for extracting terms for Text Mining

    - by [email protected]
    In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms.  However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM                                TOKEN DATA MINING                         DATAMINING ORACLE DATA MINING                  ORACLEDATAMINING 11G                                 ORACLE11G JAVA                                JAVA CRM                                 CRM CUSTOMER RELATIONSHIP MANAGEMENT    CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term     VARCHAR2(100),                                  query_string  VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over  the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE   cursor c2 is     select token, term     from my_term_token_map; BEGIN   for r_c2 in c2 loop     CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term);     EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values                        (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')'     using r_c2.token;   end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example:     'ORACLEDATAMINING'        SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on        my_thesaurus_rules(query_string)        indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

    Read the article

  • Have unit test generators helped you when working with legacy code?

    - by Duncan Bayne
    I am looking at a small (~70kLOC including generated) C# (.NET 4.0, some Silverlight) code-base that has very low test coverage. The code itself works in that it has passed user acceptance testing, but it is brittle and in some areas not very well factored. I would like to add solid unit test coverage around the legacy code using the usual suspects (NMock, NUnit, StatLight for the Silverlight bits). My normal approach is to start working through the project, unit testing & refactoring, until I am satisfied with the state of the code. I've done this many times in the past, and it's worked well. However, this time I'm thinking of using a test generator (in particular Pex) to create the test framework, then manually fleshing it out. My question is: have you used unit test generators in the past when commencing work on a legacy codebase, and if so, would you recommend them? My fear is that the generated tests will miss the semantic nuances of the code-base, leading to the dreaded situation of having tests for the sake of the coverage metric, rather than tests which clearly express the intended behaviour in code.

    Read the article

  • Working with Legacy code #3 : Build a safety net.

    - by andrewstopford
    The first port of call in changing legacy code is a safety net, without one your fingers will get burnt. Make your safety net a high level functional test over the major areas of the application. Automate the test, plug it into your CI builds and run it every night. The test should act as a final fail safe as you work.

    Read the article

  • Are there any formal approaches for familiarising oneself with a new or legacy codebase? [closed]

    - by codecowboy
    Possible Duplicate: How do you dive into large code bases? As a contractor, I often encounter legacy codebases which might have little or no supporting documentation. Are there any techniques or best practices? I work with PHP and web applications, though also face situations in which I have to edit code in an unfamiliar language. How can I leave a codebase in better shape, learn something along the way and impress the team I'm working with?

    Read the article

  • Google Maps Terms of Service - saving some data to a database

    - by R.M.
    I've read the terms of service, and, from what I understand, I'm not allowed to store any information I retrieve from the Google Maps API. Are there any exceptions to this? More to the point, I'm planning on building an application that shows the user several points of interest (like restaurants, libraries etc) at a certain distance around a location he chooses (it can be in one city or more, depending on the distance he chooses). There are two problems: The first problem is that (at least for my country) the geocoder doesn't locate exact addresses, at best it only locates street names (but completely ignores street numbers) in larger cities. It is even worse for smaller rural areas. So the only way to accurately show the places on the map is by storing their coordinates in the database. Another problem seems to be with calculating distances. To show the points located below a certain distance from the user, it would mean I would have to use GDirections to get all distances between the user's location and the other points, to see which ones to show. That would be really slow for the user (since I also have to set a small delay between requests), and it would also send a pretty large amount of requests to google. Would I be allowed to store those distances in a database? The users would not be able to access a list of all the stored information, they would only see the names of the places, and a map with some markers on it. Thank you.

    Read the article

  • Map null column as 0 in a legacy database (JPA)

    - by Indrek
    Using Play! framework and it's JPASupport class I have run into a problem with a legacy database. I have the following class: @Entity @Table(name="product_catalog") public class ProductCatalog extends JPASupport { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) public Integer product_catalog; @OneToOne @JoinColumn(name="upper_catalog") public ProductCatalog upper_catalog; public String name; } Some product catalogs don't have an upper catalog, and this is referenced as 0 in a legacy database. If I supply the upper_catalog as NULL, then expectedly JPA inserts a NULL value to that database column. How could I force the null values to be 0 when writing to the database and the other way around when reading from the database?

    Read the article

  • Chainload boot of Ubuntu installed on 32GB SD card from legacy Grub boot on USB

    - by Gary Darsey
    I have Ubuntu installed on a 32 GB SD card (in the Storage Expansion slot on an Acer Aspire One) with Grub2 installed in the same partition. I boot into legacy Grub on a USB drive and would like to boot by chainloading Grub2 from Grub (kernel/initrd or symlink booting would also be fine), but I haven't figured out how to do this from legacy Grub CLI. Output from blkid for this partition is /dev/mmcblk0p1: LABEL="Ubuntu" UUID="7ceb9fa7-238c-4c5d-bb8e-2c655652ddec" TYPE='ext4" / fdisk -lu information Boot indicator ID 83. Related entries in grub.cfg: search --no-floppy --fs-uuid --set-root 7ceb9fa7-238c-4c5d-bb8e-2c655652ddec linux /boot/vmlinuz-3.5.0-17-generic root=UUID=7ceb9fa7-238c-4c5d-bb8e-2c655652ddec... initrd /boot/initrd.img-3.5.0-17-generic I can't seem to replicate this in legacy Grub. Is there any way get Grub2 to chainload? How do I set root with UUID in legacy Grub? I prefer to boot from USB. Would Grub2 on USB (copying the grub.cfg generated during installation) be an option?

    Read the article

  • Dual booting Ubuntu 12.04: UEFI and Legacy

    - by cmhughes
    I'm trying to dual boot Ubuntu 12.04 (or 12.10) with Windows 8 on a new Sony Vaio, but have run into some problems :) Specifically, my problems seem to come from choosing UEFI or Legacy as the Bootmode in the BIOS. Here is what I have found so far: Windows 8 needs to boot using UEFI, and doesn't work in Legacy mode Ubuntu (both 12.04 and 12.10) needs to boot using Legacy, and won't boot (at least from the live disk) in UEFI mode I have been able to boot Ubuntu using a live USB disc, provided that I change the Bootmode to Legacy. I haven't committed to installing it yet, because I don't really understand the consequences. My main concerns are that instead of simply selecting Windows or Ubuntu in Grub, I would also have to change my Bootmode every single time, which seems like a lot more trouble than it should be. So, the question: how can I install Ubuntu 12.04 or 12.10 in UEFI boot mode?

    Read the article

  • NHibernate Legacy Database Mappings Impossible?

    - by Corey Coogan
    I'm hoping someone can help me with mapping a legacy database. The problem I'm describing here has plagued others, yet I was unable to find a real good solution around the web. DISCLAIMER: this is a legacy DB. I have no control over the composite keys. They suck and can't be changed no matter much you tell me they suck. I have 2 tables, both with composite keys. One of the keys from one table is used as part of the key to get a collection from the other table. In short, the keys don't fully match between the table. ClassB is used everywhere I would like to avoid adding properties for the sake of this mapping if possible. public class ClassA { //[PK] public string SsoUid; //[PK] public string PolicyNumber; public IList<ClassB> Others; //more properties.... } public class ClassB { //[PK] public string PolicyNumber; //[PK] public string PolicyDateTime; //more properties } I want to get an instance of ClassA and get all ClassB rows that match PolicyNumber. I am trying to get something going with a one-to-many, but I realize that this may technically be a many-to-many that I am just treating as one-to-many. I've tried using an association class but didn't get far enough to see if it works. I'm new to these more complex mappings and am looking for advice. I'm open to pretty much any ideas. Thanks, Corey

    Read the article

  • Project Architecture For Enhancing Legacy project.

    - by vijay.shad
    I am working on legacy project. Now the situation demands project to be divided into parts. What strategy I should follow to do this task. Description: The legacy project (A) is fully functional web application with almost well defined layers. But now i need to extend the project to a further enhancement. This project usage maven as build tool. But it is used only for dependency managements only. (project exported to war form inside eclipse). The new enhancement needs me to add new data table, new UI(jsp, css, js and images). What should be my strategy to enhance to application. My proposed design. I am planing to create two new projects Project B : Main Enhancement works will done in this project. Will have all layers like service layer, dao layer and UI layer in itself. And will be a web application in itself. Project C : Extract some common model and service code form project-A and create this project. This project will be added as dependency to both the projects. If my this approach is okay! Then i presume there will be problem be problem in deployment. These two projects will demand to deploy separately(currently tomcat is used). But I must deploy these two projects as one war. So, i need to have a plan to change the web.xml entries to have configurations for both projects. This will comes with some more complexities with project. What should be my design for the project? Does my plan sounds good.

    Read the article

  • Unit testing a 'legacy' WPF Application

    - by sc_ray
    The product I have been working on has been in development for the past six years. It started as a generic data entry portal into an insanely complex part WPF/part legacy application. The system has been developed for all these years without a single Unit test in its fold. Now, the point has been raised for a comprehensive unit testing framework. I have been recruited recently to work on this product and have been tasked to get the 'Testing' in order. Since the team that worked on the product for the last six years adopted 'Agile', the project lacks any documentation of the business rules or any design documents. I have been trying to write unit tests for some of the modules. But I am not sure what to Mock, how to setup my Test fixture and eventually what to Test for, since a casual glance of the methods does not reveal its intentions. Also, it has come to my attention that the code was not developed with a particular methodology in mind. Given the situation, I was wondering if the good people of Stackoverflow could provide me with some advise on how to salvage this situation. I have heard about the book 'Working with Legacy Code' that has something to say about this general situation but I was thinking about getting some pointers from individuals who have encountered similar situations within the technology stack(C#,VB,C++,.NET 3.5,WCF,SQL Server 2005).

    Read the article

  • Does it make sense to write tests for legacy code when there is no time for a complete refactoring?

    - by is4
    I usually try to follow the advice of the book Working Effectively with Legacy Code. I break dependencies, move parts of the code to @VisibleForTesting public static methods and to new classes to make the code (or at least some part of it) testable. And I write tests to make sure that I don't break anything when I'm modifying or adding new functions. A colleague says that I shouldn't do this. His reasoning: The original code might not work properly in the first place. And writing tests for it makes future fixes and modifications harder since devs have to understand and modify the tests too. If it's GUI code with some logic (~12 lines, 2-3 if/else block, for example), a test isn't worth the trouble since the code is too trivial to begin with. Similar bad patterns could exist in other parts of the codebase, too (which I haven't seen yet, I'm rather new); it will be easier to clean them all up in one big refactoring. Extracting out logic could undermine this future possibility. Should I avoid extracting out testable parts and writing tests if we don't have time for complete refactoring? Is there any disadvantage to this that I should consider?

    Read the article

  • How should a non-IT manager secure the long-term maintenance and development of essential legacy software?

    - by user105977
    I've been hunting for a place to ask this question for quite a while; maybe this is the place, although I'm afraid it's not the kind of "question with an answer" this site would prefer. We are a small, very specialized, benefits administration firm with an extremely useful, robust collection of software, some written in COBOL but most in BASIC. Two full-time consultants have ably maintained and improved this system over more than 30 years. Needless to say they will soon retire. (One of them has been desperate to retire for several years but is loyal to a fault and so hangs on despite her husband's insistence that golf should take priority.) We started down the path of converting to a system developed by one of only three firms in the country that offer the type of software we use. We now feel that although this this firm is theoretically capable of completing the conversion process, they don't have the resources to do so timely, and we have come to believe that they will be unable to offer the kind of service we need to run our business. (There's nothing like being able to set one's own priorities and having the authority to allocate one's resources as one sees fit.) Hardware is not a problem--we are able to emulate very effectively on modern servers. If COBOL and BASIC were modern languages, we'd be willing to take the risk that we could find replacements for our current consultants going forward. It seems like there ought to be a business model for an IT support firm that concentrates on legacy platforms like this and provides the programming and software development talent to support a system like ours, removing from our backs the risks of finding the right programming talent and the job of convincing younger programmers that they can have a productive, rewarding career, in part in an old, non-sexy language like BASIC. Where do I find such firms?

    Read the article

  • Legacy code - when to move on

    - by Mmarquee
    My team and support a large number of legacy applications all of which are currently functional but problematic to support and maintain. They all depend on code that the compiler manufacture has officially no support for. So the question is should we leave the code as is, and risk a new compiler breaking our code, or should we bite the bullet and update all the code?

    Read the article

  • Diagnosing the .NET Legacy

    - by Xencor
    Assume you are taking over a legacy .NET app. [pre - 3.0] What are the top 5 diagnostic measures, profiling or otherwise that you would employ to assess the health of the application?

    Read the article

  • The .NET Legacy

    - by Xencor
    Assume you are taking over a legacy .NET app. pre - 3.0 What are the top 5 diagnostic measures, profiling or otherwise that you would employ to assess the health of the application?

    Read the article

  • wpf legacy server call

    - by Shah Al
    Hi, We have a legacy application running tomcat that publishes data in a simple html table. I have no control on the remote server publishing the data. I am looking to extract the data into a WPF desktop application and display it as a table. Is there any way a WPF application can make a url call, get the result and parse the data. This would be similar to AJAX from JSP. Any thoughts/ideas? Please advice. Regards,

    Read the article

  • Extand legacy site with another server-side programming platform best practice

    - by Andrew Florko
    Company I work for have a site developed 6-8 years ago by a team that was enthusiastic enough to use their own private PHP-based CMS. I have to put dynamic data from one intranet company database on this site in one week: 2-3 pages. I contacted company site administrator and she showed me administrative part - CMS allows only to insert html blocks & manage site map (site is deployed on machine that is inside company & fully accessible & upgradable). I'm not a PHP-guy & I don't want to dive into legacy hardly-who-ever-heard-about CMS engine I also don't want to contact developers team, 'cos I'm not sure they are still present and capable enough to extend this old days site and it'll take too much time anyway. I am about to deploy helper asp.net site on iis with 2-3 pages required & refer helper site via iframe from present site. New pages will allow to download some dynamic content from present site also. Is it ok and what are the pitfalls with iframe approach?

    Read the article

  • Extend legacy site with another server-side programming platform best practice

    - by Andrew Florko
    Company I work for have a site developed 6-8 years ago by a team that was enthusiastic enough to use their own private PHP-based CMS. I have to put dynamic data from one intranet company database on this site in one week: 2-3 pages. I contacted company site administrator and she showed me administrative part - CMS allows only to insert html blocks & manage site map (site is deployed on machine that is inside company & fully accessible & upgradeable). I'm not a PHP-guy & I don't want to dive into legacy hardly-who-ever-heard-about CMS engine I also don't want to contact developers team, 'cos I'm not sure they are still present and capable enough to extend this old days site and it'll take too much time anyway. I am about to deploy helper asp.net site on IIS with 2-3 pages required & refer helper site via iframe from present site. New pages will allow to download some dynamic content from present site also. Is it ok and what are the pitfalls with iframe approach?

    Read the article

  • Working with Legacy code #1 : Draw up a plan.

    - by andrewstopford
    Blackfield applications are a minefield, reaking of smells and awash with technical debt. The codebase is a living hell. Your first plan of attack is a plan. Your boss (be that you, your manager, your client or whoever) needs to understand what you are trying to achieve and in what time. Your team needs to know what the plan of attack will be and where. Start with the greatest pain points, what are the biggest areas of technical debt, what takes the most time to work with\change and where are the areas with the higest number of defects. Work out what classes\functions are mud balls and where all the hard dependencies are. In working out the pain points you will begin to understand structure (or lack of) and where the fundmentals are. If know one in the team knows an area then profile it, understand what lengths the code is going to.  When your done drawing up the list then work out what the common problems are, is the code hard tied to the database, file system or some other hard dependency. Is the code repeating it's self in structure\form over and over etc. From the list work out what are the areas with the biggest number of problems and make those your starting point. Now you have a plan of what needs to change and where then you can work out how it fits into your development plan. Manage your plan, put it into a defect tracker, work item tracker or use notepad or excel etc. Mark off the items on your plan as and when you have attacked them, if you find more items then get them on your plan, keep the movement going and slowly the codebase will become better and better.

    Read the article

  • How to modernize an enormous legacy database?

    - by smayers81
    I have a question, just looking for suggestions here. So, my application is 'modernizing' a desktop application by converting it to the web, with an ICEFaces UI and server side written in Java. However, they are keeping around the same Oracle database, which at current count has about 700-900 tables and probably a billion total records in the tables. Some individual tables have 250 million rows, many have over 25 million. Needless to say, the database is not scaling well. As a result, the performance of the application is looking to be abysmal. The architects / decision makers-that-be have all either refused or are unwilling to restructure the persistence. So, basically we are putting a fresh coat of paint on a functional desktop application that currently serves most user needs and does so with relative ease and quick performance. I am having trouble sleeping at night thinking of how poorly this application is going to perform and how difficult it is going to be for everyday users to do their job. So, my question is, what options do I have to mitigate this impending disaster? Is there some type of intermediate layer I can put in between the database and the Java code to speed up performance while at the same time keeping the database structure intact? Caching is obviously an option, but I don't see that as being a cure-all. Is it possible to layer a NoSQL DB in between or something?

    Read the article

  • Legacy Java code use of com.sun.net.ssl.internal.ssl.Provider()

    - by Dan
    I am working with some code from back in 2003. There is a reference to the following class: new com.sun.net.ssl.internal.ssl.Provider() It is causing an error: Access restriction: The type Provider is not accessible due to restriction on required library /Library/Java/JavaVirtualMachines/1.7.0.jdk/Contents/Home/jre/lib/jsse.jar Does anyone have any suggestions for a suitable alternative to using this class?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >