Search Results

Search found 85 results on 4 pages for 'classifier'.

Page 2/4 | < Previous Page | 1 2 3 4  | Next Page >

  • Bayes Misclassification error and plot : pattern recognition [closed]

    - by user1214586
    Below is a Matlab code for Bayes classifier which classifies arbitrary numbers into their classes. training = [3;5;17;19;24;27;31;38;45;48;52;56;66;69;73;78;84;88]; target_class = [0;0;10;10;20;20;30;30;40;40;50;50;60;60;70;70;80;80]; test = [1:2:90]'; class = classify(test,training, target_class, 'diaglinear'); % Naive Bayes classifier [test class] If someone could provide code snippets for calculating the Bayes error for misclassification and accuracy.Also, is it possible to plot a scatter plot and histogram indicating the number of data points belonging to different classes? Thank you.

    Read the article

  • Store variables during loop and use in it another template

    - by krisvandenbergh
    Is there a way to store a variable/param during a for-each loop in a sort of array, and use it in another template, namely <xsl:template match="Foundation.Core.Classifier.feature">. All the classname values that appear during the for-each should be stored. How would you implement that in XSLT? Here's my current code. <xsl:for-each select="Foundation.Core.Class"> <xsl:for-each select="Foundation.Core.ModelElement.name"> <xsl:param name="classname"> <xsl:value-of select="Foundation.Core.ModelElement.name"/> </xsl:param> </xsl:for-each> <xsl:apply-templates select="Foundation.Core.Classifier.feature" /> </xsl:for-each> Here's the template in which the classname parameters should be used. <xsl:template match="Foundation.Core.Classifier.feature"> <xsl:for-each select="Foundation.Core.Attribute"> <owl:DatatypeProperty rdf:ID="{Foundation.Core.ModelElement.name}"> <rdfs:domain rdf:resource="$classname" /> </owl:DatatypeProperty> </xsl:for-each> </xsl:template> The input file can be found at http://krisvandenbergh.be/uml_pricing.xml

    Read the article

  • Actually XSLT Lookup (Store variables during loop and use in it another template)

    - by krisvandenbergh
    Is there a way to store a variable/param during a for-each loop in a sort of array, and use it in another template, namely <xsl:template match="Foundation.Core.Classifier.feature">. All the classname values that appear during the for-each should be stored. How would you implement that in XSLT? Here's my current code. <xsl:for-each select="Foundation.Core.Class"> <xsl:for-each select="Foundation.Core.ModelElement.name"> <xsl:param name="classname"> <xsl:value-of select="Foundation.Core.ModelElement.name"/> </xsl:param> </xsl:for-each> <xsl:apply-templates select="Foundation.Core.Classifier.feature" /> </xsl:for-each> Here's the template in which the classname parameters should be used. <xsl:template match="Foundation.Core.Classifier.feature"> <xsl:for-each select="Foundation.Core.Attribute"> <owl:DatatypeProperty rdf:ID="{Foundation.Core.ModelElement.name}"> <rdfs:domain rdf:resource="$classname" /> </owl:DatatypeProperty> </xsl:for-each> </xsl:template> The input file can be found at http://krisvandenbergh.be/uml_pricing.xml

    Read the article

  • Why doesn't Gradle include transitive dependencies in compile / runtime classpath?

    - by Francis Toth
    I'm learning how Gradle works, and I can't understand how it resolves a project transitive dependencies. For now, I have two projects : projectA : which has a couple of dependencies on external libraries projectB : which has only one dependency on projectA No matter how I try, when I build projectB, gradle doesn't include any projectA dependencies (X and Y) in projectB's compile or runtime classpath. I've only managed to make it work by including projectA's dependencies in projectB's build script, which, in my opinion does not make any sense. These dependencies should be automatically attached to projectB. I'm pretty sure I'm missing something but I can't figure out what. I've read about "lib dependencies", but it seems to apply only to local projects like described here, not on external dependencies. Here is the build.gradle I use in the root project (the one that contains both projectA and projectB) : buildscript { repositories { mavenCentral() } dependencies { classpath 'com.android.tools.build:gradle:0.3' } } subprojects { apply plugin: 'java' apply plugin: 'idea' group = 'com.company' repositories { mavenCentral() add(new org.apache.ivy.plugins.resolver.SshResolver()) { name = 'customRepo' addIvyPattern "ssh://.../repository/[organization]/[module]/[revision]/[module].xml" addArtifactPattern "ssh://.../[organization]/[module]/[revision]/[module](-[classifier]).[ext]" } } sourceSets { main { java { srcDir 'src/' } } } idea.module { downloadSources = true } // task that create sources jar task sourceJar(type: Jar) { from sourceSets.main.java classifier 'sources' } // Publishing configuration uploadArchives { repositories { add project.repositories.customRepo } } artifacts { archives(sourceJar) { name "$name-sources" type 'source' builtBy sourceJar } } } This one concerns projectA only : version = '1.0' dependencies { compile 'com.company:X:1.0' compile 'com.company:B:1.0' } And this is the one used by projectB : version = '1.0' dependencies { compile ('com.company:projectA:1.0') { transitive = true } } Thank you in advance for any help, and please, apologize me for my bad English.

    Read the article

  • ClassNotFoundException error in implementing Bayesian algorithm in Apache Mahout on Hadoop

    - by Shweta
    Hi, I have a problem in executing the Bayesian algorithm in Mahout. I built it with Maven and the job file is in target directory. When run from terminal using hadoop, I'm getting the ClassNotFoundException error. What should be done? $HADOOP_HOME/bin/hadoop jar mahout-core-0.3-SNAPSHOT.job org.apache.mahout.classifier.bayes.mapreduce.bayes.bayesdriver -i test -o output Exception in thread "main" java.lang.ClassNotFoundException: org.apache.mahout.classifier.bayes.mapreduce.bayes.bayesdriver at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

    Read the article

  • Clojure / HBase: How to Import HBaseTestingUtility in v0.94.6.1

    - by David Williams
    In Clojure, if I want to start a test cluster using the hbase testing utility, I have to annotate my dependencies with: [org.apache.hbase/hbase "0.92.2" :classifier "tests" :scope "test"] First of all, I have no idea what this means. According to leiningens sample project.clj ;; Dependencies are listed as [group-id/name version]; in addition ;; to keywords supported by Pomegranate, you can use :native-prefix ;; to specify a prefix. This prefix is used to extract natives in ;; jars that don't adhere to the default "<os>/<arch>/" layout that ;; Leiningen expects. Question 1: What does that mean? Question 2: If I upgrade the version: [org.apache.hbase/hbase "0.94.6.1" :classifier "tests" :scope "test"] Then I receive a ClassNotFoundException Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration Whats going on here and how do I fix it?

    Read the article

  • Measuring the performance of classification algorithm

    - by Silver Dragon
    I've got a classification problem in my hand, which I'd like to address with a machine learning algorithm ( Bayes, or Markovian probably, the question is independent on the classifier to be used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classificator, with taking data overfitting problem into account. That is: given N[1..100] training samples, if I run the training algorithm on every one of the samples, and use this very same samples to measure fitness, it might stuck into a data overfitting problem -the classifier will know the exact answers for the training instances, without having much predictive power, rendering the fitness results useless. An obvious solution would be seperating the hand-tagged samples into training, and test samples; and I'd like to learn about methods selecting the statistically significant samples for training. White papers, book pointers, and PDFs much appreciated!

    Read the article

  • UML Receptions and AcceptEventActions

    - by Silli
    What shall be the relationship between the receptions of a class (was classifier before Aadaam correction) and the AcceptEventActions in the activity describing the behavior of its instances? I understand the former is related to signals reception of the type while the latter is related to runtime ReceiveSignalEvent events of the class instances (objects). But it is not totally clear to me how to express consistency among these constructs.

    Read the article

  • Resources related to data-mining and gaming on social networks

    - by darren
    Hi all I'm interested in the problem of patterning mining among players of social networking games. For example detecting cheaters of a game, given a company's user database. So far I have been following the usual recipe for a data mining project: construct a data warehouse that aggregates significant information select a classifier, and train it with a subsectio of records from the warehouse validate classifier with another test set lather, rinse, repeat Surprisingly, I've found very little in this area regarding literature, best practices, etc. I am hoping to crowdsource the information gathering problem here. Specifically what I'm looking for: What classifiers have worked will for this type of pattern mining (it seems highly temporal, users playing games, users receiving rewards, users transferring prizes etc). Are there any highly agreed upon attributes specific to social networking / gaming data? What is a practical amount of information that should be considered? One problem I've run into is data overload, where queries and data cleansing may take days to complete. Related to point above, what hardware resources are required to produce results? I've found it difficult to estimate the amount of computing power I will require for production use. It has become apparent that a white box in the corner does not have enough horse-power for such a project. Are companies generally resorting to cloud solutions? Are they buying clusters? Basically, any resources (theoretical, academic, or practical) about implementing a social networking / gaming pattern-mining program would be very much appreciated. Thanks.

    Read the article

  • How do you parse a paragraph of text into sentences? (perferrably in Ruby)

    - by henry74
    How do you take paragraph or large amount of text and break it into sentences (perferably using Ruby) taking into account cases such as Mr. and Dr. and U.S.A? (Assuming you just put the sentences into an array of arrays) UPDATE: One possible solution I thought of involves using a parts-of-speech tagger (POST) and a classifier to determine the end of a sentence: Getting data from Mr. Jones felt the warm sun on his face as he stepped out onto the balcony of his summer home in Italy. He was happy to be alive. CLASSIFIER Mr./PERSON Jones/PERSON felt/O the/O warm/O sun/O on/O his/O face/O as/O he/O stepped/O out/O onto/O the/O balcony/O of/O his/O summer/O home/O in/O Italy/LOCATION ./O He/O was/O happy/O to/O be/O alive/O ./O POST Mr./NNP Jones/NNP felt/VBD the/DT warm/JJ sun/NN on/IN his/PRP$ face/NN as/IN he/PRP stepped/VBD out/RP onto/IN the/DT balcony/NN of/IN his/PRP$ summer/NN home/NN in/IN Italy./NNP He/PRP was/VBD happy/JJ to/TO be/VB alive./IN Can we assume, since Italy is a location, the period is the valid end of the sentence? Since ending on "Mr." would have no other parts-of-speech, can we assume this is not a valid end-of-sentence period? Is this the best answer to the my question? Thoughts?

    Read the article

  • What is the basic pattern for using (N)Hibernate?

    - by Vilx-
    I'm creating a simple Windows Forms application with NHibernate and I'm a bit confused about how I'm supposed to use it. To quote the manual: ISession (NHibernate.ISession) A single-threaded, short-lived object representing a conversation between the application and the persistent store. Wraps an ADO.NET connection. Factory for ITransaction. Holds a mandatory (first-level) cache of persistent objects, used when navigating the object graph or looking up objects by identifier. Now, suppose I have the following scenario: I have a simple classifier which is a MSSQL table with two columns - ID (auto_increment) and Name (nvarchar). To edit this classifier I create a form which contains a single gridview and two buttons - OK and Cancel. The user can nearly directly edit the table in the gridview, and when he hits OK the changes he made are persisted to the DB (or if he hits cancel, nothing happens). Now, I have several questions about how to organize this: What should the lifetime of my ISession be? Should I create a single ISession for my whole application; an ISession for each of my forms (the application is single-threaded MDI); or an ISession for every DB operation/transaction? Does NHibernate offer some kind of built-in dirty tracking or must I do this myself? The manual mentions something like it here and there but does not go into details. How is this done? Is there not a huge overhead? Is it somehow tied with the cache(s) that NHibernate has? What are these caches for? Are they not specific to a single ISession? That is, if I use a seperate ISession for every transaction, won't it break the dirty tracking? How does the built-in dirty tracking detect deleted objects?

    Read the article

  • Naive Bayes matlab, row classification

    - by Jungle Boogie
    How do you classify a row of seperate cells in matlab? Atm I can classify single coloums like so: training = [1;0;-1;-2;4;0;1]; % this is the sample data. target_class = ['posi';'zero';'negi';'negi';'posi';'zero';'posi']; % target_class are the different target classes for the training data; here 'positive' and 'negetive' are the two classes for the given training data % Training and Testing the classifier (between positive and negative) test = 10*randn(25, 1); % this is for testing. I am generating random numbers. class = classify(test,training, target_class, 'diaglinear') % This command classifies the test data depening on the given training data using a Naive Bayes classifier Unlike the above im looking at wanting to classify: A B C Row A | 1 | 1 | 1 = a house Row B | 1 | 2 | 1 = a garden Can anyone help? Here is a code example from matlabs site: nb = NaiveBayes.fit(training, class) nb = NaiveBayes.fit(..., 'param1',val1, 'param2',val2, ...) I dont understand what param1 is or what val1 etc should be?

    Read the article

  • CodePlex Daily Summary for Monday, November 07, 2011

    CodePlex Daily Summary for Monday, November 07, 2011Popular ReleasesGoogleMap Control: GoogleMap Control 6.0: Major design changes to the control in order to achieve better scalability and extensibility for the new features comming with GoogleMaps API. GoogleMap control switched to GoogleMaps API v3 and .NET 4.0. GoogleMap control is 100% ScriptControl now, it requires ScriptManager to be registered on the pages where and before it is used. Markers, polylines, polygons and directions were implemented as ExtenderControl, instead of being inner properties of GoogleMap control. Better perfomance. Better...WDTVHubGen - Adds Metadata, thumbnails and subtitles to WDTV Live Hubs: V2.1: Version 2.1 (click on the right) this uses V4.0 of .net Version 2.1 adds the following features: (apologize if I forget some, added a lot of little things) Manual Lookup with TV or Movie (finally huh!), you can look up a movie or TV episode directly, you can right click on anythign, and choose manual lookup, then will allow you to type anything you want to look up and it will assign it to the file you right clicked. No Rename: a very popular request, this is an option you can set so that t...Bulk Copy Test Cases Tool for Microsoft Test Manager & TFS: Bulk Copy Test Cases Tool: A while ago I had written a blog post Microsoft Test Manager Test Case Versioning on how to manage Test Cases over multiple releases which required you to manually copy test cases individually. Now there is a tool to help with the bulk copying of Test Cases that updates the Iteration field at the same time.Self-Tracking Entity Generator for WPF and Silverlight: Self-Tracking Entity Generator v 0.9.9 Update 2: Self-Tracking Entity Generator v 0.9.9 for Entity Framework 4.0. No change to the self-tracking entity generator v 0.9.9. WPF sample (SchoolSample) is updated with unit testing for both ViewModel and Model classes.SubExtractor: Release 1020: Feature: added "baseline double quotes" character to selector box Feature: added option to save SRT files as ANSI (instead of previous UTF-8 only) Feature: made "Save Sup files to Source directory" apply to both Sup and Idx source files. Fix: removed SDH text (...) or [...] that is split over 2 lines Fix: better decision-making in when to prefix a line with a '-' because SDH was removedAcDown????? - Anime&Comic Downloader: AcDown????? v3.6.1: ?? ● AcDown??????????、??????,??????????????????????,???????Acfun、Bilibili、???、???、???、Tucao.cc、SF???、?????80????,???????????、?????????。 ● AcDown???????????????????????????,???,???????????????????。 ● AcDown???????C#??,????.NET Framework 2.0??。?????"Acfun?????"。 ????32??64? Windows XP/Vista/7 ????????????? ??:????????Windows XP???,?????????.NET Framework 2.0???(x86)?.NET Framework 2.0???(x64),?????"?????????"??? ??????????????,??????????: ??"AcDown?????"????????? ?? v3.6.1?? ??.hlv...Track Folder Changes: Track Folder Changes 1.1: Fixed exception when right-clicking the root nodeKinect Toolbox: Kinect Toolbox v1.1.0.2: This version adds support for the Kinect for Windows SDK beta 2.MapWindow 4: MapWindow GIS v4.8.6 - Final release - 32Bit: This is the final release of MapWindow v4.8. It has 4.8.6 as version number. This version has been thoroughly tested. If you do get an exception send the exception to us. Don't forget to include your e-mail address. Use the forums at http://www.mapwindow.org/phorum/ for questions. Please consider donating a small portion of the money you have saved by having free GIS tools: http://www.mapwindow.org/pages/donate.php What’s New in 4.8.6 (Final release) · A few minor issues have been fixed Wha...Kinect Mouse Cursor: Kinect Mouse Cursor 1.1: Updated for Kinect for Windows SDK v1.0 Beta 2!Coding4Fun Kinect Toolkit: Coding4Fun Kinect Toolkit 1.1: Updated for Kinect for Windows SDK v1.0 Beta 2!Async Executor: 1.0: Source code of the AsyncExecutorMedia Companion: MC 3.421b Weekly: Ensure .NET 4.0 Full Framework is installed. (Available from http://www.microsoft.com/download/en/details.aspx?id=17718) Ensure the NFO ID fix is applied when transitioning from versions prior to 3.416b. (Details here) TV Show Resolutions... Fix to show the season-specials.tbn when selecting an episode from season 00. Before, MC would try & load season00.tbn Fix for issue #197 - new show added by 'Manually Add Path' not being picked up. Also made non-visible the same thing in Root Folders...Nearforums - ASP.NET MVC forum engine: Nearforums v7.0: Version 7.0 of Nearforums, the ASP.NET MVC Forum Engine, containing new features: UI: Flexible layout to handle both list and table-like template layouts. Theming - Visual choice of themes: Deliver some templates on installation, export/import functionality, preview. Allow site owners to choose default list sort order for the forums. Forum latest activity. Visit the project Roadmap for more details. Webdeploy packages sha1 checksum: e6bb913e591543ab292a753d1a16cdb779488c10?????????? - ????????: All-In-One Code Framework ??? 2011-11-02: http://download.codeplex.com/Project/Download/FileDownload.aspx?ProjectName=1codechs&DownloadId=216140 ??????,11??,?????20????Microsoft OneCode Sample,????6?Program Language Sample,2?Windows Base Sample,2?GDI+ Sample,4?Internet Explorer Sample?6?ASP.NET Sample。?????????????。 ????,?????。http://i3.codeplex.com/Project/Download/FileDownload.aspx?ProjectName=1code&DownloadId=128165 Program Language CSImageFullScreenSlideShow VBImageFullScreenSlideShow CSDynamicallyBuildLambdaExpressionWithFie...Python Tools for Visual Studio: 1.1 Alpha: We’re pleased to announce the release of Python Tools for Visual Studio 1.1 Alpha. Python Tools for Visual Studio (PTVS) is an open-source plug-in for Visual Studio which supports programming with the Python programming language. This release includes new core IDE features, a couple of new sample libraries for interacting with Kinect and Excel, and many bug fixes for issues reported since the release of 1.0. For the core IDE features we’ve added many new features which improve the basic edit...BExplorer (Better Explorer): Better Explorer 2.0.0.631 Alpha: Changelog: Added: Some new functions in ribbon Added: Possibility to choose displayed columns Added: Basic Search Fixed: Some bugs after navigation Fixed: Attempt to fix slow navigation and slow start Known issues: - BreadcrumbBar fails on some situations - Basic search not work quite well in some situations Please if anyone find bugs be kind and report them at the Issue Tracker! Thanks!DotNetNuke® Community Edition: 05.06.04: Major Highlights Fixed issue with upgrades on systems that had upgraded the Telerik library to 6.0.0 Fixed issue with Razor Host upgrade to 5.6.3 The logic for module administration checks contains incorrect logic in 1 place, opening the possibility of a user with edit permissions gaining access to functionality they should not have through a particularly crafted url Security FixesBrowsers support the ability to remember common strings such as usernames/addresses etc. Code was adde...Terminals: Version 2.0 - Beta 3 Release: Beta 3 Refresh Dont forget to backup your config files BEFORE upgrading! The team has finally put the nail into the official release date for version 2.0. As bugs are winding down on the 2.0 Roadmap we decided to push out another build - the first 2.0 Beta build. Please take time to use and abuse this release. We left logging in place, and this is a debug build so be sure to submit your logs on each bug reported, and please do report all bugs! Check the source code page on the site, th...iTuner - The iTunes Companion: iTuner 1.4.4322: Added German (unverified, apologies if incorrect) Properly source invariant resources with correct resIDs Replaced obsolete lyric providers with working providers Fix Pseudolater to correctly morph every third char Fix null reference in CatalogBaseNew ProjectsA Blog: This is a blog plus personal web page frameworkAccess 1-D Intersection: This is an Access VBA Module containing functions that allow make it easy to determine overlaps in 1-D intervals. For instance if table A contains a range of 0-7 and Table B contains a range of 5-10, the intersection is 5-7.AkismetPC: A C# implementation of the popular anti-spam plugin Akismet. There aren't many .NET versions of Akismet so I decided to write one and that can be used with .NET blog engines such as Subtext, etc.AlertMonkey: A multicast chat client that enables users to send html, images, sounds, and files to connected users. Provides specialized alert types such as lunch and happy hour, as well as channel support.Azzeton: azzetonBKWork: private project.Blue: Blue is a web application for italian baseball and softball umpires.Build Javascript Models from .Net Classes: Build JavaScript Data Models from .Net Classes automaticallycmpp: cmppCRM 2011 TreeView for Dependent Picklist: This utility will allow CRM Customizer to configure Dependent Picklist items which will be shown as TreeView control on CRM form.DirSign: DirSign is a console exe that evaluates or checks directory signature. DirSign is used to check if something in a directory tree has changed (a file date or a file size or a new or missing file). You can use DirSign in scenario where you need to check if something changed since last time but where you can't install a file system watcher.epictactics: Game for WP7Export SharePoint 2010 External List to Excel: Export SharePoint 2010 external list to Excel with custom ribbon plugin. Export current external list with selected view to office 97 - 2003 or office 2007 - 2010.Floridum: Project for a XML Database.GNU ISO8583: GISO (GNU ISO) is a tool that makes it easier to analyze ISO 8583 financial transactions and also provides a platform to create a host simulator, capable of receiving requests and sending back the responses. It’s a WinForms application and it’s developed using C#.G's Syndication Pocket: G's Syndication Pocket is simple RSS Aggregate application. This is suitable for .NET Compact Framework. I checked it on Sharp's W-ZERO3.Hatena Netfx Library: .NET Library for Hatena Services.inohigo: a programming language that was developed by inohiro.Internet Cache Examiner: Internet Cache Examiner allows Internet Explorer INDEX.DAT files to be read directly, allowing the extraction of more information than is displayed in Internet Explorer, and without being limited to viewing only the activity of the current user. It's developed in C#.Javascript to IQueryable: javascript to IQueryable is an implementation that allows to write a simple query in javascript and then execute it on the server with EntityFramework or a linq provider that implement IQueryable.kisd: Just my code, wanted to keep it safe.LUCA UI for Silverlight 4: LUCA UI is a collection of flexible layout controls for Silverlight 4. Basically, using these controls you can create the same type of user-definable UI that Visual Studio and Expression Blend have.Messenger Game - Starter Kit: Kom godt i gang med at lave spil til Messenger med dette komplette Starter Kit. Indeholder et komplet netværksspil lavet med Messenger Activity API og Silverlight.Music Keys: Music KeysMyNote: MyNoteOpen Source Data System: DataSystem is a file based database system that is thread safe. It is a dynamically generated database meaning developers can either structure it outside the application prior or development. PhotoDesktop: Create background images for your desktop using hundreds of your photos off your local computer. (coming soon - use flickr [or other RSS] feeds)SharePoint Backup Augmentation Cmdlets: The SharePoint Backup Augmentation Cmdlets (SharePointBAC) provide administrators with additional PowerShell cmdlets to complement and extend SharePoint 2010's native backup and restore capabilities. SharePointBAC makes it possible to groom backup sets, archive backups, and more.SharpClassifier: C "Classifier" is an AI software component that tries to classify instances from given evidence (if shiny then diamond). A famous example is classifying email spam, separating it from ham. SharpClassifier currently only contains a single classifier - A Bayesian Naive Classifier. Most Bayesian Naive Classifiers for C# you'll find out there only handles two classes (spam/ham), but this implementation supports any number of classses.Shell Sort Web service and Application: this is a webservice of Sorting methode. use Shell sort methode to sorthing a unsorted number, and it can give a boundary as you input this project is made by Information System students, Ma Chung University , Malang - East Java - Indonesia [url:www.Machung.ac.id] Anna Letizia & SetiawanEka Prayuda Barbiezztissa@gmail.com & setya_09@hotmail.comSistema UELS: adsfasdfSorting Number use Insertion Sort on Web Service: This program can simulate the insertion sort easily.TA_Sorted_App01: First implementation of TA_Sorted Algorithm ThinkDeeper MVC framework: ThinkDeeper MVC is a WPF MVC for .NET 3.5. Typing Game: The Nottingham Game Developer's first game.xBlog: xBlog is a project to build a simple and extensible Blog Engine based on xml and linqXNA DebugDrawer Using Spritebatch: This project serves to show how to draw lines and rectangles using XNA's Spritebatch. This project uses XNA 4.0 and C# programming languageYet another Scedule Planner: YASP - Yet another Scedule Planner

    Read the article

  • Where would you start if you were trying to solve this PDF classification problem?

    - by burtonic
    We are crawling and downloading lots of companies' PDFs and trying to pick out the ones that are Annual Reports. Such reports can be downloaded from most companies' investor-relations pages. The PDFs are scanned and the database is populated with, among other things, the: Title Contents (full text) Page count Word count Orientation First line Using this data we are checking for the obvious phrases such as: Annual report Financial statement Quarterly report Interim report Then recording the frequency of these phrases and others. So far we have around 350,000 PDFs to scan and a training set of 4,000 documents that have been manually classified as either a report or not. We are experimenting with a number of different approaches including Bayesian classifiers and weighting the different factors available. We are building the classifier in Ruby. My question is: if you were thinking about this problem, where would you start?

    Read the article

  • How to identify a PDF classification problem?

    - by burtonic
    We are crawling and downloading lots of companies' PDFs and trying to pick out the ones that are Annual Reports. Such reports can be downloaded from most companies' investor-relations pages. The PDFs are scanned and the database is populated with, among other things, the: Title Contents (full text) Page count Word count Orientation First line Using this data we are checking for the obvious phrases such as: Annual report Financial statement Quarterly report Interim report Then recording the frequency of these phrases and others. So far we have around 350,000 PDFs to scan and a training set of 4,000 documents that have been manually classified as either a report or not. We are experimenting with a number of different approaches including Bayesian classifiers and weighting the different factors available. We are building the classifier in Ruby. My question is: if you were thinking about this problem, where would you start?

    Read the article

  • Error compiling Visual Studio 2010 EditorClassifier Extension Template

    - by nick.ueda
    I'm trying to create an Editor Classifier Template project and run it. When I attempt to build I get an error message stating: "Error trying to read the VSIX manifest file 'extension.vsixmanifest'. Exception has been thrown by target of invocation." Any thoughts? I've tried googling this but didn't have any luck. I am working with Visual Studio 2010 Ultimate and the VS 2010 SDK Beta 1. Thanks, Nick

    Read the article

  • Anyone know of a good open source spam checker in java or c#?

    - by Spines
    I'm creating a site where users can write articles and comment on the articles. I want to automatically check to see if a new article or comment is spam. What are good libraries for doing this? I looked at bayesian classifier libraries, but it seems that I would have to gather a large amount of samples and classify them all as spam or not spam myself... I'm looking for something that can hopefully just tell me right out of the box.

    Read the article

  • please help me to interpret the naive bayes result in weka..

    - by resmi
    Anybody please help me to interpret the following result generated in weka for classification using naive bayes.....Please explain clearly what is this Normal Distribution , Mean , StandardDev , WeightSum and Precision.Please help me.Am new in weka. ** Naive Bayes Classifier Class Normal: Prior probability = 0.5 1374195_at: Normal Distribution. Mean = 218.06 StandardDev = 6.0572 WeightSum = 3 Precision = 36.34333334 1373315_at: Normal Distribution. Mean = 1142.58 StandardDev = 21.1589 WeightSum = 3 Precision = 126.95333339999999

    Read the article

  • Information Extraction Toolkits

    - by MathGladiator
    I'm looking for information extraction libraries where I can have semi structured information that may have either hidden or incomplete data. I want to train some classifiers to pull out content based on the structure. I'm working on building a tool where I can select text in the browser, and it will generate (via some web service call) a classifier that can be used on other documents to pull out text. I'm primarily looking at how the structure of the document can be used to indicate what the content is.

    Read the article

  • Problem with Precision floating point operation in C

    - by Microkernel
    Hi Guys, For one of my course project I started implementing "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially Spam) using huge training data. Now I have problem implementing the algorithm because of the limitations in the C's datatype. ( Algorithm I am using is given here, http://en.wikipedia.org/wiki/Bayesian_spam_filtering ) PROBLEM STATEMENT: The algorithm involves taking each word in a document and calculating probability of it being spam word. If p1, p2 p3 .... pn are probabilities of word-1, 2, 3 ... n. The probability of doc being spam or not is calculated using Here, probability value can be very easily around 0.01. So even if I use datatype "double" my calculation will go for a toss. To confirm this I wrote a sample code given below. #define PROBABILITY_OF_UNLIKELY_SPAM_WORD (0.01) #define PROBABILITY_OF_MOSTLY_SPAM_WORD (0.99) int main() { int index; long double numerator = 1.0; long double denom1 = 1.0, denom2 = 1.0; long double doc_spam_prob; /* Simulating FEW unlikely spam words */ for(index = 0; index < 162; index++) { numerator = numerator*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom1 = denom1*(long double)(1 - PROBABILITY_OF_UNLIKELY_SPAM_WORD); } /* Simulating lot of mostly definite spam words */ for (index = 0; index < 1000; index++) { numerator = numerator*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom1 = denom1*(long double)(1- PROBABILITY_OF_MOSTLY_SPAM_WORD); } doc_spam_prob= (numerator/(denom1+denom2)); return 0; } I tried Float, double and even long double datatypes but still same problem. Hence, say in a 100K words document I am analyzing, if just 162 words are having 1% spam probability and remaining 99838 are conspicuously spam words, then still my app will say it as Not Spam doc because of Precision error (as numerator easily goes to ZERO)!!!. This is the first time I am hitting such issue. So how exactly should this problem be tackled?

    Read the article

  • Preoblem with Precision floating point operation in C

    - by Microkernel
    Hi Guys, For one of my course project I started implementing "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially Spam) using huge training data. Now I have problem implementing the algorithm because of the limitations in the C's datatype. ( Algorithm I am using is given here, http://en.wikipedia.org/wiki/Bayesian_spam_filtering ) PROBLEM STATEMENT: The algorithm involves taking each word in a document and calculating probability of it being spam word. If p1, p2 p3 .... pn are probabilities of word-1, 2, 3 ... n. The probability of doc being spam or not is calculated using Here, probability value can be very easily around 0.01. So even if I use datatype "double" my calculation will go for a toss. To confirm this I wrote a sample code given below. #define PROBABILITY_OF_UNLIKELY_SPAM_WORD (0.01) #define PROBABILITY_OF_MOSTLY_SPAM_WORD (0.99) int main() { int index; long double numerator = 1.0; long double denom1 = 1.0, denom2 = 1.0; long double doc_spam_prob; /* Simulating FEW unlikely spam words */ for(index = 0; index < 162; index++) { numerator = numerator*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD; denom1 = denom1*(long double)(1 - PROBABILITY_OF_UNLIKELY_SPAM_WORD); } /* Simulating lot of mostly definite spam words */ for (index = 0; index < 1000; index++) { numerator = numerator*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom2 = denom2*(long double)PROBABILITY_OF_MOSTLY_SPAM_WORD; denom1 = denom1*(long double)(1- PROBABILITY_OF_MOSTLY_SPAM_WORD); } doc_spam_prob= (numerator/(denom1+denom2)); return 0; } I tried Float, double and even long double datatypes but still same problem. Hence, say in a 100K words document I am analyzing, if just 162 words are having 1% spam probability and remaining 99838 are conspicuously spam words, then still my app will say it as Not Spam doc because of Precision error (as numerator easily goes to ZERO)!!!. This is the first time I am hitting such issue. So how exactly should this problem be tackled?

    Read the article

  • Agile: User Stories for Machine Learning Project?

    - by benjismith
    I've just finished up with a prototype implementation of a supervised learning algorithm, automatically assigning categorical tags to all the items in our company database (roughly 5 million items). The results look good, and I've been given the go-ahead to plan the production implementation project. I've done this kind of work before, so I know how the functional components of the software. I need a collection of web crawlers to fetch data. I need to extract features from the crawled documents. Those documents need to be segregated into a "training set" and a "classification set", and feature-vectors need to be extracted from each document. Those feature vectors are self-organized into clusters, and the clusters are passed through a series of rebalancing operations. Etc etc etc etc. So I put together a plan, with about 30 unique development/deployment tasks, each with time estimates. The first stage of development -- ignoring some advanced features that we'd like to have in the long-term, but aren't high enough priority to make it into the development schedule yet -- is slated for about two months worth of work. (Keep in mind that I already have a working prototype, so the final implementation is significantly simpler than if the project was starting from scratch.) My manager said the plan looked good to him, but he asked if I could reorganize the tasks into user stories, for a few reasons: (1) our project management software is totally organized around user stories; (2) all of our scheduling is based on fitting entire user stories into sprints, rather than individually scheduling tasks; (3) other teams -- like the web developers -- have made great use of agile methodologies, and they've benefited from modelling all the software features as user stories. So I created a user story at the top level of the project: As a user of the system, I want to search for items by category, so that I can easily find the most relevant items within a huge, complex database. Or maybe a better top-level story for this feature would be: As a content editor, I want to automatically create categorical designations for the items in our database, so that customers can easily find high-value data within our huge, complex database. But that's not the real problem. The tricky part, for me, is figuring out how to create subordinate user stories for the rest of the machine learning architecture. Case in point... I know that the algorithm requires two major architectural subdivisions: (A) training, and (B) classification. And I know that the training portion of the architecture requires construction of a cluster-space. All the Agile Development literature I've read seems to indicate that a user story should be the "smallest possible implementation that provides any business value". And that makes a lot of sense when designing a piece of end-user software. Start small, and then incrementally add value when users demand additional functionality. But a cluster-space, in and of itself, provides zero business value. Nor does a crawler, or a feature-extractor. There's no business value (not for the end-user, or for any of the roles internal to the company) in a partial system. A trained cluster-space is only possible with the crawler and feature extractor, and only relevant if we also develop an accompanying classifier. I suppose it would be possible to create user stories where the subordinate components of the system act as the users in the stories: As a supervised-learning cluster-space construction routine, I want to consume data from a feature extractor, so that I can exist. But that seems really weird. What benefit does it provide me as the developer (or our users, or any other stakeholders, for that matter) to model my user stories like that? Although the main story can be easily divided along architectural-component boundaries (crawler, trainer, classifier, etc), I can't think of any useful decomposition from a user's perspective. What do you guys think? How do you plan Agile user stories for sophisticated, indivisible, non-user-facing components?

    Read the article

< Previous Page | 1 2 3 4  | Next Page >