Search Results

Search found 227 results on 10 pages for 'classification'.

Page 1/10 | 1 2 3 4 5 6 7 8 9 10  | Next Page >

  • Quick guide to Oracle IRM 11g: Classification design

    - by Simon Thorpe
    Quick guide to Oracle IRM 11g indexThis is the final article in the quick guide to Oracle IRM. If you've followed everything prior you will now have a fully functional and tested Information Rights Management service. It doesn't matter if you've been following the 10g or 11g guide as this next article is common to both. ContentsWhy this is the most important part... Understanding the classification and standard rights model Identifying business use cases Creating an effective IRM classification modelOne single classification across the entire businessA context for each and every possible granular use caseWhat makes a good context? Deciding on the use of roles in the context Reviewing the features and security for context roles Summary Why this is the most important part...Now the real work begins, installing and getting an IRM system running is as simple as following instructions. However to actually have an IRM technology easily protecting your most sensitive information without interfering with your users existing daily work flows and be able to scale IRM across the entire business, requires thought into how confidential documents are created, used and distributed. This article is going to give you the information you need to ask the business the right questions so that you can deploy your IRM service successfully. The IRM team here at Oracle have over 10 years of experience in helping customers and it is important you understand the following to be successful in securing access to your most confidential information. Whatever you are trying to secure, be it mergers and acquisitions information, engineering intellectual property, health care documentation or financial reports. No matter what type of user is going to access the information, be they employees, contractors or customers, there are common goals you are always trying to achieve.Securing the content at the earliest point possible and do it automatically. Removing the dependency on the user to decide to secure the content reduces the risk of mistakes significantly and therefore results a more secure deployment. K.I.S.S. (Keep It Simple Stupid) Reduce complexity in the rights/classification model. Oracle IRM lets you make changes to access to documents even after they are secured which allows you to start with a simple model and then introduce complexity once you've understood how the technology is going to be used in the business. After an initial learning period you can review your implementation and start to make informed decisions based on user feedback and administration experience. Clearly communicate to the user, when appropriate, any changes to their existing work practice. You must make every effort to make the transition to sealed content as simple as possible. For external users you must help them understand why you are securing the documents and inform them the value of the technology to both your business and them. Before getting into the detail, I must pay homage to Martin White, Vice President of client services in SealedMedia, the company Oracle acquired and who created Oracle IRM. In the SealedMedia years Martin was involved with every single customer and was key to the design of certain aspects of the IRM technology, specifically the context model we will be discussing here. Listening carefully to customers and understanding the flexibility of the IRM technology, Martin taught me all the skills of helping customers build scalable, effective and simple to use IRM deployments. No matter how well the engineering department designed the software, badly designed and poorly executed projects can result in difficult to use and manage, and ultimately insecure solutions. The advice and information that follows was born with Martin and he's still delivering IRM consulting with customers and can be found at www.thinkers.co.uk. It is from Martin and others that Oracle not only has the most advanced, scalable and usable document security solution on the market, but Oracle and their partners have the most experience in delivering successful document security solutions. Understanding the classification and standard rights model The goal of any successful IRM deployment is to balance the increase in security the technology brings without over complicating the way people use secured content and avoid a significant increase in administration and maintenance. With Oracle it is possible to automate the protection of content, deploy the desktop software transparently and use authentication methods such that users can open newly secured content initially unaware the document is any different to an insecure one. That is until of course they attempt to do something for which they don't have any rights, such as copy and paste to an insecure application or try and print. Central to achieving this objective is creating a classification model that is simple to understand and use but also provides the right level of complexity to meet the business needs. In Oracle IRM the term used for each classification is a "context". A context defines the relationship between.A group of related documents The people that use the documents The roles that these people perform The rights that these people need to perform their role The context is the key to the success of Oracle IRM. It provides the separation of the role and rights of a user from the content itself. Documents are sealed to contexts but none of the rights, user or group information is stored within the content itself. Sealing only places information about the location of the IRM server that sealed it, the context applied to the document and a few other pieces of metadata that pertain only to the document. This important separation of rights from content means that millions of documents can be secured against a single classification and a user needs only one right assigned to be able to access all documents. If you have followed all the previous articles in this guide, you will be ready to start defining contexts to which your sensitive information will be protected. But before you even start with IRM, you need to understand how your own business uses and creates sensitive documents and emails. Identifying business use cases Oracle is able to support multiple classification systems, but usually there is one single initial need for the technology which drives a deployment. This need might be to protect sensitive mergers and acquisitions information, engineering intellectual property, financial documents. For this and every subsequent use case you must understand how users create and work with documents, to who they are distributed and how the recipients should interact with them. A successful IRM deployment should start with one well identified use case (we go through some examples towards the end of this article) and then after letting this use case play out in the business, you learn how your users work with content, how well your communication to the business worked and if the classification system you deployed delivered the right balance. It is at this point you can start rolling the technology out further. Creating an effective IRM classification model Once you have selected the initial use case you will address with IRM, you need to design a classification model that defines the access to secured documents within the use case. In Oracle IRM there is an inbuilt classification system called the "context" model. In Oracle IRM 11g it is possible to extend the server to support any rights classification model, but the majority of users who are not using an application integration (such as Oracle IRM within Oracle Beehive) are likely to be starting out with the built in context model. Before looking at creating a classification system with IRM, it is worth reviewing some recognized standards and methods for creating and implementing security policy. A very useful set of documents are the ISO 17799 guidelines and the SANS security policy templates. First task is to create a context against which documents are to be secured. A context consists of a group of related documents (all top secret engineering research), a list of roles (contributors and readers) which define how users can access documents and a list of users (research engineers) who have been given a role allowing them to interact with sealed content. Before even creating the first context it is wise to decide on a philosophy which will dictate the level of granularity, the question is, where do you start? At a department level? By project? By technology? First consider the two ends of the spectrum... One single classification across the entire business Imagine that instead of having separate contexts, one for engineering intellectual property, one for your financial data, one for human resources personally identifiable information, you create one context for all documents across the entire business. Whilst you may have immediate objections, there are some significant benefits in thinking about considering this. Document security classification decisions are simple. You only have one context to chose from! User provisioning is simple, just make sure everyone has a role in the only context in the business. Administration is very low, if you assign rights to groups from the business user repository you probably never have to touch IRM administration again. There are however some obvious downsides to this model.All users in have access to all IRM secured content. So potentially a sales person could access sensitive mergers and acquisition documents, if they can get their hands on a copy that is. You cannot delegate control of different documents to different parts of the business, this may not satisfy your regulatory requirements for the separation and delegation of duties. Changing a users role affects every single document ever secured. Even though it is very unlikely a business would ever use one single context to secure all their sensitive information, thinking about this scenario raises one very important point. Just having one single context and securing all confidential documents to it, whilst incurring some of the problems detailed above, has one huge value. Once secured, IRM protected content can ONLY be accessed by authorized users. Just think of all the sensitive documents in your business today, imagine if you could ensure that only everyone you trust could open them. Even if an employee lost a laptop or someone accidentally sent an email to the wrong recipient, only the right people could open that file. A context for each and every possible granular use case Now let's think about the total opposite of a single context design. What if you created a context for each and every single defined business need and created multiple contexts within this for each level of granularity? Let's take a use case where we need to protect engineering intellectual property. Imagine we have 6 different engineering groups, and in each we have a research department, a design department and manufacturing. The company information security policy defines 3 levels of information sensitivity... restricted, confidential and top secret. Then let's say that each group and department needs to define access to information from both internal and external users. Finally add into the mix that they want to review the rights model for each context every financial quarter. This would result in a huge amount of contexts. For example, lets just look at the resulting contexts for one engineering group. Q1FY2010 Restricted Internal - Engineering Group 1 - Research Q1FY2010 Restricted Internal - Engineering Group 1 - Design Q1FY2010 Restricted Internal - Engineering Group 1 - Manufacturing Q1FY2010 Restricted External- Engineering Group 1 - Research Q1FY2010 Restricted External - Engineering Group 1 - Design Q1FY2010 Restricted External - Engineering Group 1 - Manufacturing Q1FY2010 Confidential Internal - Engineering Group 1 - Research Q1FY2010 Confidential Internal - Engineering Group 1 - Design Q1FY2010 Confidential Internal - Engineering Group 1 - Manufacturing Q1FY2010 Confidential External - Engineering Group 1 - Research Q1FY2010 Confidential External - Engineering Group 1 - Design Q1FY2010 Confidential External - Engineering Group 1 - Manufacturing Q1FY2010 Top Secret Internal - Engineering Group 1 - Research Q1FY2010 Top Secret Internal - Engineering Group 1 - Design Q1FY2010 Top Secret Internal - Engineering Group 1 - Manufacturing Q1FY2010 Top Secret External - Engineering Group 1 - Research Q1FY2010 Top Secret External - Engineering Group 1 - Design Q1FY2010 Top Secret External - Engineering Group 1 - Manufacturing Now multiply the above by 6 for each engineering group, 18 contexts. You are then creating/reviewing another 18 every 3 months. After a year you've got 72 contexts. What would be the advantages of such a complex classification model? You can satisfy very granular rights requirements, for example only an authorized engineering group 1 researcher can create a top secret report for access internally, and his role will be reviewed on a very frequent basis. Your business may have very complex rights requirements and mapping this directly to IRM may be an obvious exercise. The disadvantages of such a classification model are significant...Huge administrative overhead. Someone in the business must manage, review and administrate each of these contexts. If the engineering group had a single administrator, they would have 72 classifications to reside over each year. From an end users perspective life will be very confusing. Imagine if a user has rights in just 6 of these contexts. They may be able to print content from one but not another, be able to edit content in 2 contexts but not the other 4. Such confusion at the end user level causes frustration and resistance to the use of the technology. Increased synchronization complexity. Imagine a user who after 3 years in the company ends up with over 300 rights in many different contexts across the business. This would result in long synchronization times as the client software updates all your offline rights. Hard to understand who can do what with what. Imagine being the VP of engineering and as part of an internal security audit you are asked the question, "What rights to researchers have to our top secret information?". In this complex model the answer is not simple, it would depend on many roles in many contexts. Of course this example is extreme, but it highlights that trying to build many barriers in your business can result in a nightmare of administration and confusion amongst users. In the real world what we need is a balance of the two. We need to seek an optimum number of contexts. Too many contexts are unmanageable and too few contexts does not give fine enough granularity. What makes a good context? Good context design derives mainly from how well you understand your business requirements to secure access to confidential information. Some customers I have worked with can tell me exactly the documents they wish to secure and know exactly who should be opening them. However there are some customers who know only of the government regulation that requires them to control access to certain types of information, they don't actually know where the documents are, how they are created or understand exactly who should have access. Therefore you need to know how to ask the business the right questions that lead to information which help you define a context. First ask these questions about a set of documentsWhat is the topic? Who are legitimate contributors on this topic? Who are the authorized readership? If the answer to any one of these is significantly different, then it probably merits a separate context. Remember that sealed documents are inherently secure and as such they cannot leak to your competitors, therefore it is better sealed to a broad context than not sealed at all. Simplicity is key here. Always revert to the first extreme example of a single classification, then work towards essential complexity. If there is any doubt, always prefer fewer contexts. Remember, Oracle IRM allows you to change your mind later on. You can implement a design now and continue to change and refine as you learn how the technology is used. It is easy to go from a simple model to a more complex one, it is much harder to take a complex model that is already embedded in the work practice of users and try to simplify it. It is also wise to take a single use case and address this first with the business. Don't try and tackle many different problems from the outset. Do one, learn from the process, refine it and then take what you have learned into the next use case, refine and continue. Once you have a good grasp of the technology and understand how your business will use it, you can then start rolling out the technology wider across the business. Deciding on the use of roles in the context Once you have decided on that first initial use case and a context to create let's look at the details you need to decide upon. For each context, identify; Administrative rolesBusiness owner, the person who makes decisions about who may or may not see content in this context. This is often the person who wanted to use IRM and drove the business purchase. They are the usually the person with the most at risk when sensitive information is lost. Point of contact, the person who will handle requests for access to content. Sometimes the same as the business owner, sometimes a trusted secretary or administrator. Context administrator, the person who will enact the decisions of the Business Owner. Sometimes the point of contact, sometimes a trusted IT person. Document related rolesContributors, the people who create and edit documents in this context. Reviewers, the people who are involved in reviewing documents but are not trusted to secure information to this classification. This role is not always necessary. (See later discussion on Published-work and Work-in-Progress) Readers, the people who read documents from this context. Some people may have several of the roles above, which is fine. What you are trying to do is understand and define how the business interacts with your sensitive information. These roles obviously map directly to roles available in Oracle IRM. Reviewing the features and security for context roles At this point we have decided on a classification of information, understand what roles people in the business will play when administrating this classification and how they will interact with content. The final piece of the puzzle in getting the information for our first context is to look at the permissions people will have to sealed documents. First think why are you protecting the documents in the first place? It is to prevent the loss of leaking of information to the wrong people. To control the information, making sure that people only access the latest versions of documents. You are not using Oracle IRM to prevent unauthorized people from doing legitimate work. This is an important point, with IRM you can erect many barriers to prevent access to content yet too many restrictions and authorized users will often find ways to circumvent using the technology and end up distributing unprotected originals. Because IRM is a security technology, it is easy to get carried away restricting different groups. However I would highly recommend starting with a simple solution with few restrictions. Ensure that everyone who reasonably needs to read documents can do so from the outset. Remember that with Oracle IRM you can change rights to content whenever you wish and tighten security. Always return to the fact that the greatest value IRM brings is that ONLY authorized users can access secured content, remember that simple "one context for the entire business" model. At the start of the deployment you really need to aim for user acceptance and therefore a simple model is more likely to succeed. As time passes and users understand how IRM works you can start to introduce more restrictions and complexity. Another key aspect to focus on is handling exceptions. If you decide on a context model where engineering can only access engineering information, and sales can only access sales data. Act quickly when a sales manager needs legitimate access to a set of engineering documents. Having a quick and effective process for permitting other people with legitimate needs to obtain appropriate access will be rewarded with acceptance from the user community. These use cases can often be satisfied by integrating IRM with a good Identity & Access Management technology which simplifies the process of assigning users the correct business roles. The big print issue... Printing is often an issue of contention, users love to print but the business wants to ensure sensitive information remains in the controlled digital world. There are many cases of physical document loss causing a business pain, it is often overlooked that IRM can help with this issue by limiting the ability to generate physical copies of digital content. However it can be hard to maintain a balance between security and usability when it comes to printing. Consider the following points when deciding about whether to give print rights. Oracle IRM sealed documents can contain watermarks that expose information about the user, time and location of access and the classification of the document. This information would reside in the printed copy making it easier to trace who printed it. Printed documents are slower to distribute in comparison to their digital counterparts, so time sensitive information in printed format may present a lower risk. Print activity is audited, therefore you can monitor and react to users abusing print rights. Summary In summary it is important to think carefully about the way you create your context model. As you ask the business these questions you may get a variety of different requirements. There may be special projects that require a context just for sensitive information created during the lifetime of the project. There may be a department that requires all information in the group is secured and you might have a few senior executives who wish to use IRM to exchange a small number of highly sensitive documents with a very small number of people. Oracle IRM, with its very flexible context classification system, can support all of these use cases. The trick is to introducing the complexity to deliver them at the right level. In another article i'm working on I will go through some examples of how Oracle IRM might map to existing business use cases. But for now, this article covers all the important questions you need to get your IRM service deployed and successfully protecting your most sensitive information.

    Read the article

  • Beginner's resources/introductions to classification algorithms.

    - by Dirk
    Hi, everybody. I am entirely new to the topic of classification algorithms, and need a few good pointers about where to start some "serious reading". I am right now in the process of finding out, whether machine learning and automated classification algorithms could be a worthwhile thing to add to some application of mine. I already scanned through "How to Solve It: Modern heuristics" by Z. Michalewicz and D. Fogel (in particular, the chapters about linear classifiers using neuronal networks), and on the practical side, I am currently looking through the WEKA toolkit source code. My next (planned) step would be to dive into the realm of Bayesian classification algorithms. Unfortunately, I am lacking a serious theoretical foundation in this area (let alone, having used it in any way as of yet), so any hints at where to look next would be appreciated; in particular, a good introduction of available classification algorithms would be helpful. Being more a craftsman and less a theoretician, the more practical, the better... Hints, anyone?

    Read the article

  • Bag of words Classification

    - by AlgoMan
    I need find words training words and their classification. Simple classification such as . Sports Entertainment and Politics things like that. Where Can i find the words and their classifications. I know many universities have done Bag of words classifications. Is there any repository of training examples ?

    Read the article

  • Classification: Dealing with Abstain/Rejected Class

    - by abner.ayala
    I am asking for your input and/help on a classification problem. If anyone have any references that I can read to help me solve my problem even better. I have a classification problem of four discrete and very well separated classes. However my input is continuous and has a high frequency (50Hz), since its a real-time problem. The circles represent the clusters of the classes, the blue line the decision boundary and Class 5 equals the (neutral/resting do nothing class). This class is the rejected class. However the problem is that when I move from one class to the other I activate a lot of false positives in the transition movements, since the movement is clearly non-linear. For example, every time I move from class 5 (neutral class) to 1 I first see a lot of 3's before getting to the 1 class. Ideally, I will want my decision boundary to look like the one in the picture below where the rejected class is Class =5. Has a higher decision boundary than the others classes to avoid misclassification during transition. I am currently implementing my algorithm in Matlab using naive bayes, kNN, and SVMs optimized algorithms using Matlab. Question: What is the best/common way to handle abstain/rejected classes classes? Should I use (fuzzy logic, loss function, should I include resting cluster in the training)?

    Read the article

  • Java text classification problem

    - by yox
    Hello, I have a set of Books objects, classs Book is defined as following : Class Book{ String title; ArrayList<tags> taglist; } Where title is the title of the book, example : Javascript for dummies. and taglist is a list of tags for our example : Javascript, jquery, "web dev", .. As I said a have a set of books talking about different things : IT, BIOLOGY, HISTORY, ... Each book has a title and a set of tags describing it.. I have to classify automaticaly those books into separated sets by topic, example : IT BOOKS : Java for dummies Javascript for dummies Learn flash in 30 days C++ programming HISTORY BOOKS : World wars America in 1960 Martin luther king's life BIOLOGY BOOKS : .... Do you guys know a classification algorithm/method to apply for that kind of problems ? A solution is to use an external API to define the category of the text, but the problem here is that books are in different languages : french, spanish, english ..

    Read the article

  • Measuring the performance of classification algorithm

    - by Silver Dragon
    I've got a classification problem in my hand, which I'd like to address with a machine learning algorithm ( Bayes, or Markovian probably, the question is independent on the classifier to be used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classificator, with taking data overfitting problem into account. That is: given N[1..100] training samples, if I run the training algorithm on every one of the samples, and use this very same samples to measure fitness, it might stuck into a data overfitting problem -the classifier will know the exact answers for the training instances, without having much predictive power, rendering the fitness results useless. An obvious solution would be seperating the hand-tagged samples into training, and test samples; and I'd like to learn about methods selecting the statistically significant samples for training. White papers, book pointers, and PDFs much appreciated!

    Read the article

  • SVM Classification - minimum number of input sets for each class

    - by Amol Joshi
    Im trying to build an app to detect images which are advertisements from the webpages. Once I detect those Ill not be allowing those to be displayed on the client side. From the help that I got here in stackoverflow, I thought SVM is the best approach to my aim. So, I have coded SVM and an SMO myself. The dataset which I have got from UCI data repository has 3280 instances ( Link to Dataset- http://archive.ics.uci.edu/ml/datasets/Internet+Advertisements )where around 400 of them are from class representing Advertisement images and rest of them representing non-advertisement images. Right now Im taking the first 2800 input sets and training the SVM. But after looking at the accuracy rate I realised that most of those 2800 input sets are from non-advertisement image class. So Im getting very good accuracy for that class. So what can I do here? About how many input set shall I give to SVM to train and how many of them for each class? Thanks. Cheers. ( Basically made a new question because the context was different from my previous question. http://stackoverflow.com/questions/1991113/optimization-of-neural-network-input-data )

    Read the article

  • A simple explanation of Naive Bayes Classification

    - by Jaggerjack
    I am finding it hard to understand the process of Naive Bayes, and I was wondering if someone could explained it with a simple step by step process in English. I understand it takes comparisons by times occurred as a probability, but I have no idea how the training data is related to the actual dataset. Please give me an explanation of what role the training set plays. I am giving a very simple example for fruits here, like banana for example training set--- round-red round-orange oblong-yellow round-red dataset---- round-red round-orange round-red round-orange oblong-yellow round-red round-orange oblong-yellow oblong-yellow round-red

    Read the article

  • Naive Bayes matlab, row classification

    - by Jungle Boogie
    How do you classify a row of seperate cells in matlab? Atm I can classify single coloums like so: training = [1;0;-1;-2;4;0;1]; % this is the sample data. target_class = ['posi';'zero';'negi';'negi';'posi';'zero';'posi']; % target_class are the different target classes for the training data; here 'positive' and 'negetive' are the two classes for the given training data % Training and Testing the classifier (between positive and negative) test = 10*randn(25, 1); % this is for testing. I am generating random numbers. class = classify(test,training, target_class, 'diaglinear') % This command classifies the test data depening on the given training data using a Naive Bayes classifier Unlike the above im looking at wanting to classify: A B C Row A | 1 | 1 | 1 = a house Row B | 1 | 2 | 1 = a garden Can anyone help? Here is a code example from matlabs site: nb = NaiveBayes.fit(training, class) nb = NaiveBayes.fit(..., 'param1',val1, 'param2',val2, ...) I dont understand what param1 is or what val1 etc should be?

    Read the article

  • Machine leaning algorithm for data classification.

    - by twk
    Hi all, I'm looking for some guidance about which techniques/algorithms I should research to solve the following problem. I've currently got an algorithm that clusters similar-sounding mp3s using acoustic fingerprinting. In each cluster, I have all the different metadata (song/artist/album) for each file. For that cluster, I'd like to pick the "best" song/artist/album metadata that matches an existing row in my database, or if there is no best match, decide to insert a new row. For a cluster, there is generally some correct metadata, but individual files have many types of problems: Artist/songs are completely misnamed, or just slightly mispelled the artist/song/album is missing, but the rest of the information is there the song is actually a live recording, but only some of the files in the cluster are labeled as such. there may be very little metadata, in some cases just the file name, which might be artist - song.mp3, or artist - album - song.mp3, or another variation A simple voting algorithm works fairly well, but I'd like to have something I can train on a large set of data that might pick up more nuances than what I've got right now. Any links to papers or similar projects would be greatly appreciated. Thanks!

    Read the article

  • Text Classification Implementation

    - by Hearty
    I've been trying to implement a text classification system. It needs to read a text file, and extract the words and the word frequency. So far, I've been planning to parse the words, put them in a dictionary and save it to an XML file. I am using C++/CLI. Is this a good implementation or is there a simpler or better implementation? May be related question (some code implementation): http://stackoverflow.com/questions/10631309/save-dictionary-to-xml-file

    Read the article

  • Bug severity classification issues

    - by KyleMinn
    In a book I have, there is a following classification of defect: Critical : A defect receives a “critical” severity level if one or more critical system functionalities are impaired by a defect with is impaired and there is no workaround. High: A defect receives a “high” severity level if some fundamental system functionalities are impaired but a workaround exists. Medium: A defect receives a “medium” severity level if no critical functionality is impaired and a workaround exists for the defect. Low: A defect receives a “low” severity level if the problem involves a cosmetic feature of the system. To be honest, I do not get it.. For example point 2. What if fundamental but not critical feature is impaired and there is NOT a workaround. The same for point 3: what if no critical functionality is affected but there is no workaround? E.g. optional field in the registration form does not work. No workaround but barely an issue.

    Read the article

  • What's the correct terminology for something that isn't quite classification nor regression?

    - by TC
    Let's say that I have a problem that is basicly classification. That is, given some input and a number of possible output classes, find the correct class for the given input. Neural networks and decision trees are some of the algorithms that may be used to solve such problems. These algorithms typically only emit a single result however: the resulting classification. Now what if I weren't only interested in one classification, but in the posterior probabilities that the input belongs to each of the classes. I.E., instead of the answer "This input belongs in class A", I want the answer "This input belongs to class A with 80%, class B with 15% and class C with 5%". My question is not on how to obtain these posterior probabilities, but rather on the correct terminology to describe the process of finding them. You could call it regression, since we are now trying to estimate a number of real valued numbers, but I am not quite sure if that's right. I feel it's not exactly classification either, it's something in between the two. Is there a word that describes the process of finding the class conditional posterior probabilities that some input belongs in each of the possible output classes? P.S. I'm not exactly sure if this question is enough of a programming question, but since it's about machine learning and machine learning generally involves a decent amount of programming, let's give it a shot.

    Read the article

  • User Acceptance Testing Defect Classification when developing for an outside client

    - by DannyC
    I am involved in a large development project in which we (a very small start up) are developing for an outside client (a very large company). We recently received their first output from UAT testing of a fairly small iteration, which listed 12 'defects', triaged into three categories : Low, Medium and High. The issue we have is around whether everything in this list should be recorded as a 'defect' - some of the issues they found would be better described as refinements, or even 'nice-to-haves', and some we think are not defects at all. They client's QA lead says that it is standard for them to label every issues they identify as a defect, however, we are a bit uncomfortable about this. Whilst the relationship is good, we don't see a huge problem with this, but we are concerned that, if the relationship suffers in the future, these lists of 'defects' could prove costly for us. We don't want to come across as being difficult, or taking things too personally here, and we are happy to make all of the changes identified, however we are a bit concerned especially as there is a uneven power balance at play in our relationship. Are we being paranoid here? Or could we be setting ourselves up for problems down the line by agreeing to this classification?

    Read the article

  • Software bug/defect classification

    - by Dustin K
    We're trying to come up with terms that better describe our bugs/defects. To us, the term 'bug' or 'defect' is too generic and doesn't accurately reflect what is happening. For example, instead of saying that there is a bug (in the general sense), we'd rather say what type of bug (an error, or enhancement, or improvement, etc.). What names do you use for describing 'bugs'? I found http://www.softwaredevelopment.ca/bugs.shtml which has some pretty good classifications. How do you classify them?

    Read the article

  • Transaction classification. Artificial intelligence

    - by Alex
    For a project, I have to classify a list of banking transactions based on their description. Supose I have 2 categories: health and entertainment. Initially, the transactions will have basic information: date and time, ammount and a description given by the user. For example: Transaction 1: 09/17/2012 12:23:02 pm - 45.32$ - "medicine payments" Transaction 2: 09/18/2012 1:56:54 pm - 8.99$ - "movie ticket" Transaction 3: 09/18/2012 7:46:37 pm - 299.45$ - "dentist appointment" Transaction 4: 09/19/2012 6:50:17 am - 45.32$ - "videogame shopping" The idea is to use that description to classify the transaction. 1 and 3 would go to "health" category while 2 and 4 would go to "entertainment". I want to use the google prediction API to do this. In reality, I have 7 different categories, and for each one, a lot of key words related to that category. I would use some for training and some for testing. Is this even possible? I mean, to determine the category given a few words? Plus, the number of words is not necesarally the same on every transaction. Thanks for any help or guidance! Very appreciated Possible solution: https://developers.google.com/prediction/docs/hello_world?hl=es#theproblem

    Read the article

  • Software bug/defect classification

    - by Dustin K
    We're trying to come up with terms that better describe our bugs/defects. To us, the term 'bug' or 'defect' is too generic and doesn't accurately reflect what is happening. For example, instead of saying that there is a bug (in the general sense), we'd rather say what type of bug (an error, or enhancement, or improvement, etc.). What names do you use for describing 'bugs'? We found http://www.softwaredevelopment.ca/bugs.shtml which has some pretty good classifications. How do you classify them?

    Read the article

  • How to identify a PDF classification problem?

    - by burtonic
    We are crawling and downloading lots of companies' PDFs and trying to pick out the ones that are Annual Reports. Such reports can be downloaded from most companies' investor-relations pages. The PDFs are scanned and the database is populated with, among other things, the: Title Contents (full text) Page count Word count Orientation First line Using this data we are checking for the obvious phrases such as: Annual report Financial statement Quarterly report Interim report Then recording the frequency of these phrases and others. So far we have around 350,000 PDFs to scan and a training set of 4,000 documents that have been manually classified as either a report or not. We are experimenting with a number of different approaches including Bayesian classifiers and weighting the different factors available. We are building the classifier in Ruby. My question is: if you were thinking about this problem, where would you start?

    Read the article

  • Image Classification - Detecting an image is cartoon-like

    - by kingb
    I have a large amount of jpeg thumbnail images ranging in size from 120x90 to 320x240 and I would like to classify them as either Real Life-like or Cartoon-like. Are there any applications that will have cartoon classification capabilities? This application should work on Linux, and should take an image path on the command-line and return either 0 or 1 (echo $?).

    Read the article

  • Calculate posterior distribution of unknown mis-classification with PRTools in MATLAB

    - by Samuel Lampa
    I'm using the PRTools MATLAB library to train some classifiers, generating test data and testing the classifiers. I have the following details: N: Total # of test examples k: # of mis-classification for each classifier and class I want to do: Calculate and plot Bayesian posterior distributions of the unknown probabilities of mis-classification (denoted q), that is, as probability density functions over q itself (so, P(q) will be plotted over q, from 0 to 1). I have that (math formulae, not matlab code!): P(q|k,N) = Posterior * Prior / Normalization constant = P(k|q,N) * P(q|N) / P(k|N) The prior is set to 1, so I only need to calculate the posterior and normalization constant. I know that the posterior can be expressed as (where B(N,k) is the binomial coefficient): P(k|q,N) = B(N,k) * q^k * (1-q)^(N-k) ... so the Normalization constant is simply an integral of the posterior above, from 0 to 1: P(k|N) = B(N,k) * integralFromZeroToOne( q^k * (1-q)^(N-k) ) (The Binomial coefficient ( B(N,k) ) can be omitted thoughappears in both the posterior and normalization constant, so it can be omitted.) Now, I've heard that the integral for the normalization constant should be able to be calculated as a series ... something like: k!(N-k)! / (N+1)! Is that correct? (I have some lecture notes from with this series, but can't figure out if it is for the normalization constant integral, or for the posterior distribution of mis-classification (q)) Also, hints are welcome as how to practically calculate this? (factorials are easily creating truncation errors right?) ... AND, how to practically calculate the final plot (the posterior distribution over q, from 0 to 1).

    Read the article

  • Needed list of special characters classification with respective characters

    - by pravin
    I am working on one web application , It's related to machine translation support i.e. which takes source text for translation and translated in to user specified language Currently it's in unit testing phase. Here, i want to check that, whether my machine translation feature is fully working for all the special characters. Because of different test cases I stuck at one point where i need all the special characters with classification. I needed all the special characters listing with classification. e.g. 1st : class name : Punctuation Characters : !?,"| etc test cases : segment1? segment2! segment3. 2nd : Class name : HTML entities characters : all the characters which belong under this class test cases : respective test cases 3rd : Class name : Extended ASCII characters :all the characters which belong under this class test cases : respective test cases Please folks provide this, if anyone has any idea or links so that i can make product perfect Thanks a lot

    Read the article

  • Issue in understanding how to compare performance of classifier using ROC

    - by user1214586
    I am trying to demystify pattern recognition techniques and understood few of them. I am trying to design a classifier M. A gesture is classified based on the hamming distance between the sample time series y and the training time series x. The result of the classifier are probabilistic values. There are 3 classes/categories with labels A,B,C which classifies hand gestures where there are 100 samples for each class which are to be classified (single feature and data length=100). The data are different time series (x coordinate vs time). The training set is used to assign probabilities indicating which gesture has occured how many times. So,out of 10 training samples if gesture A appeared 6 times then probability that a gesture falls under category A is P(A)=0.6 similarly P(B)=0.3 and P(C)=0.1 Now, I am trying to compare the performance of this classifier with Bayes classifier, K-NN, Principal component analysis (PCA) and Neural Network. On what basis,parameter and method should I do it if I consider ROC or cross validate since the features for my classifier are the probabilistic values for the ROC plot hence what shall be the features for k-nn,bayes classification and PCA? Is there a code for it which will be useful. What should be the value of k is there are 3 classes of gestures? Please help. I am in a fix.

    Read the article

  • ITIL Incident Classification - Fault vs SR vs Technical Incident

    - by ExceptionLimeCat
    I am new to ITIL and Incident classifcations and I am trying learn more about them and understand how they could integrate in our organization. I have found it difficult to find a clear definition of Fault vs. Service Request vs. Technical incidents. I am basing my definitions on this article: http://www.itsmsolutions.com/newsletters/DITYvol6iss27.htm As I understand it: Service Request - Service provided by IT as part of regular administration of a system. Fault - An unexpected error in a system. Technical Incident - An interruption or potential interruption in IT service due to an expected incident caused by some IT policy.

    Read the article

  • Image classification using openCV and weka

    - by simk
    Hi i want to do image classification, so i am planning to use openCV for the preprocessing of image and weka to check which ML algorithm gives best result, so the problem i am facing is converting the image data in to weka ARFF file format, when i apply some image transformation to image and write the image data it become so large and not sure how to define @ATTRIBUTE for that. If you have done similar thing before please suggest how can i solve this and it particularly doesn't have to be openCV i can use other tools also, please suggest.

    Read the article

1 2 3 4 5 6 7 8 9 10  | Next Page >