similarity analyzer - Page 8

Convert Chunk of Data into Tabular Format Using Perl

- by neversaint

I have data that looks like this:

    1:SRX000566
    Submitter: WoldLab
    Study: RNASeq expression profiling for ENCODE project(SRP000228)
    Sample: Human cell line GM12878(SRS000567)
    Instrument: Solexa 1G Genome Analyzer
    Total: 4 runs, 62.7M spots, 2.1G bases
    Run #1: SRR002055, 11373440 spots, 375323520 bases
    Run #2: SRR002063, 22995209 spots, 758841897 bases
    Run #3: SRR005091, 13934766 spots, 459847278 bases
    Run #4: SRR005096, 14370900 spots, 474239700 bases

    2:SRX000565
    Submitter: WoldLab
    Study: RNASeq expression profiling for ENCODE project(SRP000228)
    Sample: Human cell line GM12878(SRS000567)
    Instrument: Solexa 1G Genome Analyzer
    Total: 3 runs, 51.2M spots, 1.7G bases
    Run #1: SRR002052, 12607931 spots, 416061723 bases
    Run #2: SRR002054, 12880281 spots, 425049273 bases
    Run #3: SRR002060, 25740337 spots, 849431121 bases

    3:SRX012407
    Submitter: GEO
    Study: GSE17153: Illumina sequencing of small RNAs from C. elegans embryos(SRP001363)
    Sample: Caenorhabditis elegans(SRS006961)
    Instrument: Illumina Genome Analyzer II
    Total: 1 run, 3M spots, 106.8M bases
    Run #1: SRR029428, 2965597 spots, 106761492 bases

Is there a compact way to convert this into a tabular (tab-separated) format, with one row per chunk? In this case that would be 3 rows. I tried this, but it doesn't seem to work:

    perl -laF/\n/ -000ne"print join chr(9),@F" myfile.txt
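For comparison, the same transformation can be sketched in Python. This is only an illustration, not a fix for the Perl one-liner: it assumes each record sits on consecutive lines and that records are separated by blank lines (which the `-000` paragraph mode in the attempt suggests). The function name `chunks_to_rows` and the shortened sample are made up for the example.

```python
import re

def chunks_to_rows(text):
    """Join the lines of each blank-line-separated chunk with tabs."""
    rows = []
    # Split on one or more blank (or whitespace-only) lines.
    for chunk in re.split(r"\n\s*\n", text.strip()):
        fields = [line.strip() for line in chunk.splitlines() if line.strip()]
        if fields:
            rows.append("\t".join(fields))
    return rows

# Shortened sample in the assumed layout.
sample = """1:SRX000566
Submitter: WoldLab
Run #1: SRR002055, 11373440 spots, 375323520 bases

2:SRX000565
Submitter: WoldLab
Run #1: SRR002052, 12607931 spots, 416061723 bases
"""

rows = chunks_to_rows(sample)
for row in rows:
    print(row)
```

Each printed row holds one chunk, with the original lines as tab-separated columns.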

The algorithm used to generate recommendations in Google News?

- by Siddhant

Hi everyone. I'm studying recommendation engines, and I went through the paper that describes how Google News uses collaborative filtering to recommend news items that might interest a user. One interesting technique the paper mentions is minhashing. I read up on what it does, but I only have a fuzzy idea of it and there is a strong chance that I'm wrong. This is what I could make out of it:

1. Collect a set of all news items.
2. Define a hash function for a user. This hash function returns the index, in the list of all news items, of the first item that this user viewed.
3. Collect, say, "n" such values, and represent a user with this list of values.
4. Based on the similarity count between these lists, calculate the similarity between users as the number of common items. This reduces the number of comparisons a lot.
5. Based on these similarity measures, group users into different clusters.

This is just what I think it might be. In Step 2, instead of defining a constant hash function, we might vary the hash function so that it returns the index of a different element. One hash function could return the index of the first element from the user's list, another could return the index of the second element from the user's list, and so on. Given that the hash functions must satisfy the min-wise independent permutations condition, this sounds like a possible approach.

Could anyone please confirm whether what I think is correct? Or does the minhashing portion of Google News recommendations work in some other way? I'm new to the internal implementations of recommenders. Any help is appreciated a lot. Thanks!
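For reference, here is a minimal sketch of how MinHash is usually described, which differs slightly from the steps above: each hash function stands in for a random permutation of all items, and a user's value under that function is the minimum hash over the items the user viewed (not the first or second element of the user's own list). The fraction of positions where two signatures agree then estimates the Jaccard similarity of the two users' item sets. The sketch assumes items are identified by small non-negative integers; `make_hashers`, `signature`, and `estimated_jaccard` are illustrative names, the linear hashes only approximate min-wise independent permutations, and nothing here is taken from the Google News implementation.

```python
import random

def make_hashers(n, seed=0):
    """Build n random linear hash functions h(x) = (a*x + b) mod p."""
    rng = random.Random(seed)
    prime = 2_147_483_647  # the Mersenne prime 2**31 - 1
    params = [(rng.randrange(1, prime), rng.randrange(0, prime)) for _ in range(n)]
    # Bind a and b as defaults so each lambda keeps its own parameters.
    return [lambda x, a=a, b=b: (a * x + b) % prime for a, b in params]

def signature(item_ids, hashers):
    """MinHash signature: the minimum hash value of the set under each function."""
    return [min(h(i) for i in item_ids) for h in hashers]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two users agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

hashers = make_hashers(128, seed=42)
alice = signature({1, 2, 3, 4, 5}, hashers)   # item ids viewed by one user
bob = signature({3, 4, 5, 6, 7}, hashers)     # item ids viewed by another
print(estimated_jaccard(alice, bob))  # an estimate of |A ∩ B| / |A ∪ B| = 3/7
```

Users whose signatures agree on some band of positions can then be bucketed together, which is what avoids comparing every pair of users directly.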

How can I evaluate the connectedness of my nodes?

- by Travis Leleu

I've got a space that has nodes that are all interconnected, based on a "similarity score". I would like to determine how "connected" a node is with the others. My purpose is to find nodes that are poorly connected to make sure that the backlink from the other node is prioritized. Perhaps an example would help. I've got a web page that links to my other pages based on a similarity score. Suppose I have the pages: A, B, C, ... A has a backlink from every other page, so it's very well connected. It also has links to all my other pages (each line in the graph is essentially bidirectional). B only has 1 backlink, from A. C has a link from A and D. I would like to make sure that the A-B link is prioritized over the A-C link (even if the similarity score between C and A is higher than B and A). In short, I would like to evaluate which nodes are least and best connected, so that I can mangle the results to my means. I believe this is Graph Connectedness, but I'm at a loss to develop a (simple) algorithm that will help me here. Simply counting the backlinks to a node may be a starting point -- but then how do I take the next step, which is to properly weight the links on the original node (A, in the example above)?
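One possible starting point, sketched in Python under stated assumptions: count backlinks per page, then add a bonus to each similarity score that shrinks as the target's backlink count grows, so poorly connected pages move up the ranking. The names `backlink_counts` and `rank_targets`, the `boost` parameter, and the example scores are all hypothetical, and the additive bonus is just one of many reasonable weightings.

```python
def backlink_counts(links):
    """Count incoming links per page; links maps page -> pages it links to."""
    counts = {}
    for source, targets in links.items():
        counts.setdefault(source, 0)
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

def rank_targets(candidates, counts, boost=1.0):
    """Order link targets by similarity plus a bonus for having few backlinks.

    candidates maps target page -> similarity score with the source page.
    """
    def adjusted(target):
        return candidates[target] + boost / (1 + counts.get(target, 0))
    return sorted(candidates, key=adjusted, reverse=True)

# The situation from the question: B is linked only from A, C from A and D.
links = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A", "C"]}
counts = backlink_counts(links)

# Hypothetical similarity scores between A and its candidate targets.
candidates_for_a = {"B": 0.50, "C": 0.60, "D": 0.55}
order = rank_targets(candidates_for_a, counts)
print(order)
```

With these numbers B ends up ahead of C even though C has the higher raw similarity; tuning `boost` controls how strongly connectedness overrides similarity.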

Enforce SSIS naming conventions using BI-xPress

- by jamiet

A long long long time ago (in 2006 in fact) I published a blog post entitled Suggested Best Practises and naming conventions in which I suggested a bunch of acronyms that folks could use to prefix object names in their SSIS packages, thus allowing easier identification of those objects in log records.

If you have adopted these naming conventions (and I am led to believe that a bunch of people have) then you might like to know that you can now check for adherence to these conventions using a tool called BI-xPress from Pragmatic Works. BI-xPress includes a feature called the Best Practices Analyzer that scans your packages and assesses them according to rules that you specify. In addition, Pragmatic Works have made available a collection of these rules that adhere to the naming conventions I specified in 2006.

You can download this collection; however, I recommend you first read the accompanying article that demonstrates the capabilities of the Best Practices Analyzer. Pretty cool stuff.

@Jamiet

Low disk space: home/user folder occupies 94GB

- by tedtoy

I am low on disk space, and when I check the Disk Usage Analyzer (using gksudo baobab) it indicates that my home/teddy folder is using 94GB, but when I browse through its contents I can only account for about 1GB of that usage. I've tried sudo apt-get clean, deleted the cached package files from Synaptic Package Manager, and emptied the trash, but that has not changed the amount of free space I have. It seems similar to this problem, but using the root disk usage analyzer has not given any insight into what is consuming so much space. Any ideas?

Need a Holistic view of your Concurrent Processing?

- by cwarticki

Choose CP Analyzer. Go to Doc 1411723.1 for more details and script download.

The Concurrent Processing Analyzer is a Self-Service Health-Check script which reviews the overall Concurrent Processing Footprint and analyzes the current configurations and settings for the environment, providing feedback and recommendations on Best Practices. This is a non-invasive script which provides recommended actions to be performed on the instance it was run on. For production instances, always apply any changes to a recent clone to ensure an expected outcome.

- E-Business Applications Concurrent Processing Analyzer Overview
- E-Business Applications Concurrent Request Analysis
- E-Business Applications Concurrent Manager Analysis
- Identifies Concurrent System Setup and configurations
- Identifies and recommends Concurrent Best Practices
- Easy to add Tool for regular Concurrent Maintenance
- Execute Analysis anytime to compare trending from past outputs

Feedback welcome!

Manic Monday - More OpenWorld Solaris Sessions: Developers, Cloud, Customer Insights, Hardware Optimization

- by Larry Wake

We're overflowing with Monday sessions; literally more than one person can take in. Learn more about what's new in Oracle Solaris Studio, hear about the latest x86 and SPARC hardware optimizations, get some insights on cloud deployment strategies, and find out from your peers what they're doing with Oracle Solaris. If you're an OpenWorld attendee, go to Schedule Builder to guarantee your space in any session or lab. See yesterday's blog post and the "Focus on Oracle Solaris" guide for even more sessions.

Monday, October 1st:

10:45 AM - Maximizing Your SPARC T4 Oracle Solaris Application Performance (CON6382, Marriott Marquis - Golden Gate C3)
Hear how customers and commercial software partners have reached peak performance on SPARC T4 servers and engineered systems with Oracle Solaris Studio and its latest tools for analyzing, reporting, and improving runtime performance:
- Autoparallelizing, high-performance compilers
- Performance Analyzer (used to find performance hotspots)
- Thread Analyzer (to expose data races and deadlocks)
- Code Analyzer (used to discover latent memory corruption issues)

10:45 AM - Cloud Formation: Implementing IaaS in Practice with Oracle Solaris (CON8787, Moscone South 302)
Decisions, decisions--at the same time, we've got a session that covers why Oracle Solaris is the ideal OS for public or private clouds, IaaS or PaaS, with built-in features for elastic infrastructure, unrivaled security, superfast installation and deployment, nonstop availability, and crystal-clear observability. This session will include a customer study on how Oracle Solaris is used in the cloud today to implement the Oracle stack.

12:15 PM - Customer Insight: Oracle Solaris on Oracle Exadata, Oracle Exalogic, and SPARC SuperCluster (CON8760, Moscone South 270)
Hear from customers what benefits they have realized from using the Oracle stack on Oracle Exadata and Oracle's SPARC SuperCluster and from using Oracle Solaris on those engineered systems, taking advantage of built-in lightweight OS virtualization (Zones), enterprise reliability and scale, and other key features.

1:45 PM - Case Study: Mobile Tornado Uses Oracle Technology for Better RAS and TCO? (CON4281, Moscone West 2005)
Mobile Tornado develops and markets instant communication platforms, replacing traditional radio networks with cellular networks. Its critical concern is uptime. Find out how they've used Oracle Solaris, Netra SPARC T4, and Oracle Solaris Cluster, including Oracle Solaris ZFS and Zones, for their Oracle Database deployments to improve reliability and drive down cost.

3:15 PM - Technical Panel: Developing High Performance Applications on Oracle Solaris (CON7196, Marriott Marquis - Golden Gate C2)
Engineers from the Oracle Solaris, Oracle Database, and Oracle Tuxedo development teams, and Oracle ISV Engineering discuss how they develop high-performance enterprise applications that take advantage of Oracle's SPARC and x86 servers, with Oracle Solaris Studio and new Oracle Solaris 11 features. Topics will include developer tools, parallel frameworks, best practices, and methodologies, as well as insights and case studies on parallelizing and optimizing application performance on Oracle Solaris. Bring your best questions!

3:15 PM - x86 Power Management with Oracle Solaris: Current State, Opportunities, and Future (CON6271, Moscone West 2012)
Another option for this time slot: learn about how Intel Xeon and Oracle Solaris work together to reduce server power consumption. This presentation addresses some of the recent power management improvements in Oracle Solaris, opportunities to further improve energy efficiency, and some future directions for Oracle Solaris power management.

HTML Tidy in NetBeans IDE

- by Geertjan

Here is a first step in integrating HTML Tidy (via its JTidy implementation) into NetBeans IDE. The reason why I started doing this is because I want to integrate this into the pluggable analyzer functionality of NetBeans IDE that I recently blogged about, i.e., where the FindBugs functionality is found. So a logical first step is to get it working in an Action class, after which I can port it into the analyzer infrastructure:

    import java.awt.event.ActionEvent;
    import java.awt.event.ActionListener;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.io.StringWriter;
    import org.openide.awt.ActionID;
    import org.openide.awt.ActionReference;
    import org.openide.awt.ActionReferences;
    import org.openide.awt.ActionRegistration;
    import org.openide.cookies.EditorCookie;
    import org.openide.cookies.LineCookie;
    import org.openide.loaders.DataObject;
    import org.openide.text.Line;
    import org.openide.text.Line.ShowOpenType;
    import org.openide.util.Exceptions;
    import org.openide.util.NbBundle.Messages;
    import org.openide.windows.IOProvider;
    import org.openide.windows.InputOutput;
    import org.openide.windows.OutputEvent;
    import org.openide.windows.OutputListener;
    import org.openide.windows.OutputWriter;
    import org.w3c.tidy.Tidy;

    @ActionID(
            category = "Tools",
            id = "org.jtidy.TidyAction")
    @ActionRegistration(
            displayName = "#CTL_TidyAction")
    @ActionReferences({
        @ActionReference(path = "Loaders/text/html/Actions", position = 150),
        @ActionReference(path = "Editors/text/html/Popup", position = 750)
    })
    @Messages("CTL_TidyAction=Run HTML Tidy")
    public final class TidyAction implements ActionListener {

        private final DataObject context;
        private final OutputWriter writer;
        private EditorCookie ec = null;

        public TidyAction(DataObject context) {
            this.context = context;
            ec = context.getLookup().lookup(org.openide.cookies.EditorCookie.class);
            InputOutput io = IOProvider.getDefault().getIO("HTML Tidy", false);
            io.select();
            writer = io.getOut();
        }

        @Override
        public void actionPerformed(ActionEvent ev) {
            Tidy tidy = new Tidy();
            try {
                writer.reset();
                StringWriter stringWriter = new StringWriter();
                PrintWriter errorWriter = new PrintWriter(stringWriter);
                tidy.setErrout(errorWriter);
                tidy.parse(context.getPrimaryFile().getInputStream(), System.out);
                String[] split = stringWriter.toString().split("\n");
                for (final String string : split) {
                    final int end = string.indexOf(" c");
                    if (string.startsWith("line")) {
                        writer.println(string, new OutputListener() {
                            @Override
                            public void outputLineAction(OutputEvent oe) {
                                LineCookie lc = context.getLookup().lookup(LineCookie.class);
                                int lineNumber = Integer.parseInt(string.substring(0, end).replace("line ", ""));
                                Line line = lc.getLineSet().getOriginal(lineNumber - 1);
                                line.show(ShowOpenType.OPEN, Line.ShowVisibilityType.FOCUS);
                            }
                            @Override
                            public void outputLineSelected(OutputEvent oe) {}
                            @Override
                            public void outputLineCleared(OutputEvent oe) {}
                        });
                    }
                }
            } catch (IOException ex) {
                Exceptions.printStackTrace(ex);
            }
        }
    }

The string parsing above is ugly but gets the job done for now. A problem integrating this into the pluggable analyzer functionality is the limitation of its scope. The analyzer lets you select one or more projects, or individual files, but not a folder. So it doesn't work on folders in the Favorites window, for example, which is where I'd like to apply HTML Tidy, across multiple folders via the analyzer functionality. That's a bit of a bummer that I'm hoping to get around somehow.

Wireless Activity Monitoring for PCI DSS Compliance

- by dkusleika

In an effort to be PCI DSS compliant, I took a trustkeeper.net questionnaire. I failed the question that asks Is the presence of wireless access points tested for by using a wireless analyzer at least quarterly or by deploying a wireless IDS/IPS to identify all wireless devices in use? (SAQ #11.1) My only wireless access point is outside my firewall, so even if you cracked my wireless you couldn't get inside my domain (unless you crack that too). My firewall doesn't have IPS and I couldn't tell if it had IDS. I looked around for a wireless analyzer, but what I found was $500, which is a little pricey for my size business. And even if I got it, I'm not sure I would understand what it tells me. Surely there are smaller/less sophisticated businesses that take credit cards and have solved this. My questions are: What are the risks if someone were to crack my wireless? (Could they read all internet traffic? Just wireless traffic? Just use my internet connection?) And what is the best/cheapest way to test my connection point quarterly? Should I buy the $500 analyzer? Domain is Windows Server 2000. Firewall is Sonicwall Pro 2040. Router is 8 port D-link.

Wildcard searching and highlighting with Solr 1.4

- by andy

Hey guys, I've got a pretty much vanilla install of SOLR 1.4 apart from a few small config and schema changes.

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="echoParams">explicit</str>
        <str name="qf"> text </str>
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck.onlyMorePopular">false</str>
        <str name="spellcheck.extendedResults">false</str>
        <str name="spellcheck.count">1</str>
      </lst>
    </requestHandler>

The main field type I'm using for indexing is this:

    <fieldType name="textNoHTML" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

Now, when I perform a search using "q=search+term&hl=on" I get highlighting, and nice accurate scores.

BUT, for wildcard, I'm assuming you need to use "q.alt"? Is that true? If so my query looks like this: "q.alt=search*&hl=on"

When I use the above query, highlighting doesn't work, and all the scores are "1.0". What am I doing wrong? Is what I want possible without bypassing some of the really cool SOLR optimizations? Cheers!

Lucene: Wildcards are missing from index

- by Eleasar

Hi - I am building a search index that contains special names, containing ! and ? and & and + and so on. I have to treat the following searches differently:

    me & you
    me + you

But whatever I do (I tried QueryParser escaping before indexing, escaped it manually, tried different indexers...), if I check the search index with Luke they do not show up (question marks and @-symbols and the like do show up).

The logic behind this is that I am doing partial searches for a live suggestion (and the fields are not that large), so I split the name up into "m" and "me" and "+" and "y" and "yo" and "you" and then index it. That way it is much faster than a wildcard query search, and the index size is not a big problem. So what I need is for these special wildcard characters to be inserted into the index as well.

This is my code:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using Lucene.Net.Analysis;
    using Lucene.Net.Util;

    namespace AnalyzerSpike
    {
        public class CustomAnalyzer : Analyzer
        {
            public override TokenStream TokenStream(string fieldName, TextReader reader)
            {
                return new ASCIIFoldingFilter(new LowerCaseFilter(new CustomCharTokenizer(reader)));
            }
        }

        public class CustomCharTokenizer : CharTokenizer
        {
            public CustomCharTokenizer(TextReader input) : base(input)
            {
            }

            public CustomCharTokenizer(AttributeSource source, TextReader input) : base(source, input)
            {
            }

            public CustomCharTokenizer(AttributeFactory factory, TextReader input) : base(factory, input)
            {
            }

            protected override bool IsTokenChar(char c)
            {
                return c != ' ';
            }
        }
    }

The code to create the index:

    private void InitIndex(string path, Analyzer analyzer)
    {
        var writer = new IndexWriter(path, analyzer, true);

        //some multiline textbox that contains one item per line:
        var all = new List<string>(txtAllAvailable.Text.Replace("\r","").Split('\n'));

        foreach (var item in all)
        {
            writer.AddDocument(GetDocument(item));
        }

        writer.Optimize();
        writer.Close();
    }

    private static Document GetDocument(string name)
    {
        var doc = new Document();

        doc.Add(new Field(
            "name",
            DeNormalizeName(name),
            Field.Store.YES,
            Field.Index.ANALYZED));

        doc.Add(new Field(
            "raw_name",
            name,
            Field.Store.YES,
            Field.Index.NOT_ANALYZED));

        return doc;
    }

(The code uses Lucene.net version 1.9.x (EDIT: sorry - it was 2.9.x) but is compatible with Lucene for Java.) Thx
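The prefix-splitting step described in the question is not shown in the posted code (it presumably lives in `DeNormalizeName`). As a language-neutral illustration only, here is a small Python sketch of that idea: split on spaces, lowercase, and emit every leading prefix of each word, so that a standalone "+" or "&" survives as its own token. The function name `prefix_tokens` is made up for the example and is not part of Lucene or Lucene.Net.

```python
def prefix_tokens(name):
    """Return every leading prefix of each space-separated word, lowercased."""
    tokens = []
    for word in name.lower().split():
        for end in range(1, len(word) + 1):
            tokens.append(word[:end])
    return tokens

print(prefix_tokens("me + you"))  # ['m', 'me', '+', 'y', 'yo', 'you']
print(prefix_tokens("me & you"))  # ['m', 'me', '&', 'y', 'yo', 'you']
```

If tokens like these are expected in the index but do not appear in Luke, comparing them against what the analyzer chain actually emits for the same input is one way to narrow down where the characters get lost.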

How can I effectively test a scripting engine?

- by ChaosPandion

I have been working on an ECMAScript implementation and I am currently working on polishing up the project. As a part of this, I have been writing tests like the following:

    [TestMethod]
    public void ArrayReduceTest()
    {
        var engine = new Engine();
        var request = new ExecScriptRequest(@"
            var a = [1, 2, 3, 4, 5];
            a.reduce(function(p, c, i, o) {
                return p + c;
            });
        ");
        var response = (ExecScriptResponse)engine.PostWithReply(request);
        Assert.AreEqual((double)response.Data, 15D);
    }

The problem is that there are so many points of failure in this test and similar tests that it almost doesn't seem worth it. It almost seems like my effort would be better spent reducing coupling between modules. To write a true unit test I would have to assume something like this:

    [TestMethod]
    public void CommentTest()
    {
        const string toParse = "/*First Line\r\nSecond Line*/";
        var analyzer = new LexicalAnalyzer(toParse);
        {
            Assert.IsInstanceOfType(analyzer.Next(), typeof(MultiLineComment));
            Assert.AreEqual(analyzer.Current.Value, "First Line\r\nSecond Line");
        }
    }

Doing this would require me to write thousands of tests, which once again does not seem worth it.

Algorithm for voice comparison

- by Horace Ho

Given two recorded voices in digital format, is there an algorithm to compare the two and return a coefficient of similarity?

Lucene and Special Characters

- by Brandon

I am using Lucene.Net 2.0 to index some fields from a database table. One of the fields is a 'Name' field which allows special characters. When I perform a search, it does not find my document that contains a term with special characters. I index my field as such:

    Directory DALDirectory = FSDirectory.GetDirectory(@"C:\Indexes\Name", false);
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriter indexWriter = new IndexWriter(DALDirectory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

    Document doc = new Document();
    doc.Add(new Field("Name", "Test (Test)", Field.Store.YES, Field.Index.TOKENIZED));
    indexWriter.AddDocument(doc);

    indexWriter.Optimize();
    indexWriter.Close();

And I search doing the following:

    value = value.Trim().ToLower();
    value = QueryParser.Escape(value);

    Query searchQuery = new TermQuery(new Term(field, value));
    Searcher searcher = new IndexSearcher(DALDirectory);

    TopDocCollector collector = new TopDocCollector(searcher.MaxDoc());
    searcher.Search(searchQuery, collector);
    ScoreDoc[] hits = collector.TopDocs().scoreDocs;

If I perform a search for field as 'Name' and value as 'Test', it finds the document. If I perform the same search as 'Name' and value as 'Test (Test)', then it does not find the document. Even more strange, if I remove the QueryParser.Escape line and do a search for a GUID (which, of course, contains hyphens) it finds documents where the GUID value matches, but performing the same search with the value as 'Test (Test)' still yields no results.

I am unsure what I am doing wrong. I am using the QueryParser.Escape method to escape the special characters and am storing the field and searching by the Lucene.Net's examples. Any thoughts?
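A likely factor here is that a TermQuery is not analyzed: it looks up its text as one exact term, while the indexed field was split into tokens by the analyzer. The Python sketch below imitates that mismatch with a toy analyzer. It is an assumption-laden illustration, not Lucene code: `toy_analyze` only approximates StandardAnalyzer by lowercasing and keeping alphanumeric runs, and `term_lookup` stands in for an exact term match.

```python
import re

def toy_analyze(text):
    """Rough stand-in for an analyzer: lowercase, keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Terms that would end up in the index for the field value "Test (Test)".
index_terms = set(toy_analyze("Test (Test)"))

def term_lookup(term):
    """Stand-in for an exact, unanalyzed term match."""
    return term in index_terms

print(term_lookup("test"))              # True: a single indexed token
print(term_lookup("test \\(test\\)"))   # False: the escaped string is not a token

# Running the query text through the same analyzer yields matchable terms.
print(all(term_lookup(t) for t in toy_analyze("Test (Test)")))  # True
```

Under that assumption, escaping does not help because no single indexed term contains the parentheses or the space; parsing the query with the same analyzer used at index time is the usual way to get matching terms.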

Parsing Chunk of Data into Hash of Array With Perl

- by neversaint

I have data that looks like this:

    #info
    #info2

    1:SRX004541
    Submitter: UT-MGS, UT-MGS
    Study: Glossina morsitans transcript sequencing project(SRP000741)
    Sample: Glossina morsitans(SRS002835)
    Instrument: Illumina Genome Analyzer
    Total: 1 run, 8.3M spots, 299.9M bases
    Run #1: SRR016086, 8330172 spots, 299886192 bases

    2:SRX004540
    Submitter: UT-MGS
    Study: Anopheles stephensi transcript sequencing project(SRP000747)
    Sample: Anopheles stephensi(SRS002864)
    Instrument: Solexa 1G Genome Analyzer
    Total: 1 run, 8.4M spots, 401M bases
    Run #1: SRR017875, 8354743 spots, 401027664 bases

    3:SRX002521
    Submitter: UT-MGS
    Study: Massive transcriptional start site mapping of human cells under hypoxic conditions.(SRP000403)
    Sample: Human DLD-1 tissue culture cell line(SRS001843)
    Instrument: Solexa 1G Genome Analyzer
    Total: 6 runs, 27.1M spots, 977M bases
    Run #1: SRR013356, 4801519 spots, 172854684 bases
    Run #2: SRR013357, 3603355 spots, 129720780 bases
    Run #3: SRR013358, 3459692 spots, 124548912 bases
    Run #4: SRR013360, 5219342 spots, 187896312 bases
    Run #5: SRR013361, 5140152 spots, 185045472 bases
    Run #6: SRR013370, 4916054 spots, 176977944 bases

What I want to do is create a hash of arrays, with the first line of each chunk as the key and the SR## part of the lines matching "^Run" as the array members:

    $VAR = {
        'SRX004541' => ['SRR016086'],
        # etc
    }

But my construct doesn't work, and there must be a better way to do it.

    use Data::Dumper;

    my %bighash;
    my $head = "";
    my @temp = ();

    while ( <> ) {
        chomp;
        next if (/^\#/);

        if ( /^\d{1,2}:(\w+)/ ) {
            print "$1\n";
            $head = $1;
        }
        elsif (/^Run \#\d+: (\w+),.*/) {
            print "\t$1\n";
            push @temp, $1;
        }
        elsif (/^$/) {
            push @{$bighash{$head}}, [@temp];
            @temp = ();
        }
    }

    print Dumper \%bighash;
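Two things stand out in the posted loop: `push @{$bighash{$head}}, [@temp]` pushes an array reference, which yields a nested array rather than a flat list of run IDs, and the buffered runs are only stored when a blank line is read, so the last chunk is lost if the input does not end with one. The sketch below shows the same parse in Python with run IDs appended directly under the current key, which sidesteps both issues. It assumes the line layout shown above; `parse_runs` and the shortened sample are illustrative only.

```python
import re

def parse_runs(lines):
    """Map each record ID (e.g. SRX004541) to the list of its run IDs."""
    runs = {}
    head = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#info' style comments
        header = re.match(r"\d+:(\w+)", line)
        if header:
            head = header.group(1)
            runs[head] = []
            continue
        run = re.match(r"Run #\d+: (\w+),", line)
        if run and head is not None:
            runs[head].append(run.group(1))  # append directly, no temp buffer
    return runs

sample = """#info
#info2

1:SRX004541
Submitter: UT-MGS, UT-MGS
Run #1: SRR016086, 8330172 spots, 299886192 bases

3:SRX002521
Submitter: UT-MGS
Run #1: SRR013356, 4801519 spots, 172854684 bases
Run #2: SRR013357, 3603355 spots, 129720780 bases"""

result = parse_runs(sample.splitlines())
print(result)
```

Note that the sample deliberately has no trailing blank line, and the last record's runs are still captured.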

How to do a Lucene search in Android

- by xyz Sad

Lucene search logic on Android?

    public class TestAndroidLuceneActivity extends Activity {

        @Override
        public void onCreate(Bundle icicle) {
            super.onCreate(icicle);
            setContentView(R.layout.main);

            try {
                Directory directory = new RAMDirectory();
                Analyzer analyzer = new StandardAnalyzer();

                Document doc = new Document();
                doc.add(new Field("header", "ABC", Field.Store.YES,Field.Index.TOKENIZED));
                indexWriter.addDocument(doc);
                doc.add(new Field("header", "DEF", Field.Store.YES,Field.Index.TOKENIZED));
                indexWriter.addDocument(doc);
                doc.add(new Field("header", "GHI", Field.Store.YES,Field.Index.TOKENIZED));
                indexWriter.addDocument(doc);
                doc.add(new Field("header", "JKL", Field.Store.YES,Field.Index.TOKENIZED));
                indexWriter.addDocument(doc);

                indexWriter.optimize();
                indexWriter.close();

                IndexSearcher indexSearcher = new IndexSearcher(directory);
                QueryParser parser = new QueryParser("header", analyzer);
                // Query query = parser.parse("(" + "Anil" + ")");
                Query query = parser.parse("(" + "ABC" + ")");
                Hits hits = indexSearcher.search(query);

                for (int i = 0; i < hits.length(); i++) {
                    Document hitDoc = hits.doc(i);
                    Log.i("TestAndroidLuceneActivity", "Lucene: " +hitDoc.get("header"));
                    // Toast.makeText(this, hitDoc.get("header"),Toast.LENGTH_LONG).show();
                }

                indexSearcher.close();
                directory.close();
            } catch (Exception ex) {
                System.out.println(ex.getMessage());
            }
        }
    }

I have this code but I am not able to understand it. Please send me a related or modified version that works with main.xml and shows some output. The search for "ABC" does not return anything. Please tell me what the problem in the logic is. Is anything missing?

Approaches for Content-based Item Recommendations

- by PartlyCloudy

Hello, I'm currently developing an application where I want to group similar items. Items (like videos) can be created by users, and their attributes can be altered or extended later (like new tags). Instead of relying on users' preferences as most collaborative filtering mechanisms do, I want to compare item similarity based on the items' attributes (like similar length, similar colors, similar set of tags, etc.). The computation is necessary for two main purposes: suggesting x similar items for a given item, and clustering into groups of similar items.

My application so far follows an asynchronous design and I want to decouple this clustering component as far as possible. The creation of new items or the addition of new attributes for an existing item will be advertised by publishing events the component can then consume. Computations can be provided best-effort and "snapshotted", which means that I'm okay with the best result possible at a given point in time, although result quality will eventually increase.

So I am now searching for appropriate algorithms to compute both similar items and clusters. An important constraint is scalability. Initially the application has to handle a few thousand items, but later millions of items might be possible as well. Of course, computations will then be executed on additional nodes, but the algorithm itself should scale. It would also be nice if the algorithm supported some kind of incremental mode on partial changes of the data.

My initial thought of comparing each item with every other item and storing the numerical similarity sounds a little bit crude. Also, it requires n*(n-1)/2 entries for storing all similarities, and any change or new item will eventually cause n similarity computations.

Thanks in advance!

UPDATE tl;dr

To clarify what I want, here is my targeted scenario:
- Users generate entries (think of documents)
- Users edit entry metadata (think of tags)

And here is what my system should provide:
- A list of similar entries to a given item as a recommendation
- Clusters of similar entries

Both calculations should be based on:
- The metadata/attributes of entries (i.e. usage of similar tags)
- Thus, the distance of two entries using appropriate metrics
- NOT user votings, preferences or actions (unlike collaborative filtering). Although users may create entries and change attributes, the computation should only take into account the items and their attributes, and not the users associated with them (just like a system where only items and no users exist).

Ideally, the algorithm should support:
- permanent changes of attributes of an entry
- incrementally computing similar entries/clusters on changes
- scale
- something better than a simple distance table, if possible (because of the O(n²) space complexity)
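One way to avoid the full pairwise table, sketched here under simplifying assumptions: keep an inverted index from tag to items, so that a similarity query only scores items sharing at least one tag with the given item, and an attribute change only touches the affected tag postings. The sketch uses Jaccard similarity over tag sets as the metric; the class `TagIndex`, its methods, and the sample items are hypothetical, and other attribute types (length, color) would need their own distance terms.

```python
from collections import defaultdict

class TagIndex:
    """Inverted index from tags to items, updated incrementally."""

    def __init__(self):
        self.tags_by_item = {}
        self.items_by_tag = defaultdict(set)

    def update(self, item, tags):
        """Insert an item or replace its tags, touching only changed postings."""
        old = self.tags_by_item.get(item, set())
        new = set(tags)
        for tag in old - new:
            self.items_by_tag[tag].discard(item)
        for tag in new - old:
            self.items_by_tag[tag].add(item)
        self.tags_by_item[item] = new

    def similar(self, item, k=5):
        """Return up to k (score, item) pairs ranked by Jaccard similarity."""
        tags = self.tags_by_item.get(item, set())
        candidates = set()
        for tag in tags:
            candidates |= self.items_by_tag[tag]
        candidates.discard(item)
        scored = []
        for other in candidates:
            other_tags = self.tags_by_item[other]
            score = len(tags & other_tags) / len(tags | other_tags)
            scored.append((score, other))
        # Highest score first; ties broken by item name for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return scored[:k]

index = TagIndex()
index.update("v1", {"cats", "funny"})
index.update("v2", {"cats", "funny", "short"})
index.update("v3", {"news"})
print(index.similar("v1"))   # only v2 shares a tag with v1

index.update("v3", {"cats"})  # an attribute change arriving as an event
print(index.similar("v1"))    # v3 is now a candidate as well
```

Storage grows with the number of (item, tag) pairs rather than with n², and the same candidate sets could feed a clustering step; for very popular tags the postings get large, which is where techniques such as MinHash or locality-sensitive hashing are commonly brought in.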

How can I read and parse chunks of data into a Perl hash of arrays?

- by neversaint

I have data that looks like this:

    #info
    #info2

    1:SRX004541
    Submitter: UT-MGS, UT-MGS
    Study: Glossina morsitans transcript sequencing project(SRP000741)
    Sample: Glossina morsitans(SRS002835)
    Instrument: Illumina Genome Analyzer
    Total: 1 run, 8.3M spots, 299.9M bases
    Run #1: SRR016086, 8330172 spots, 299886192 bases

    2:SRX004540
    Submitter: UT-MGS
    Study: Anopheles stephensi transcript sequencing project(SRP000747)
    Sample: Anopheles stephensi(SRS002864)
    Instrument: Solexa 1G Genome Analyzer
    Total: 1 run, 8.4M spots, 401M bases
    Run #1: SRR017875, 8354743 spots, 401027664 bases

    3:SRX002521
    Submitter: UT-MGS
    Study: Massive transcriptional start site mapping of human cells under hypoxic conditions.(SRP000403)
    Sample: Human DLD-1 tissue culture cell line(SRS001843)
    Instrument: Solexa 1G Genome Analyzer
    Total: 6 runs, 27.1M spots, 977M bases
    Run #1: SRR013356, 4801519 spots, 172854684 bases
    Run #2: SRR013357, 3603355 spots, 129720780 bases
    Run #3: SRR013358, 3459692 spots, 124548912 bases
    Run #4: SRR013360, 5219342 spots, 187896312 bases
    Run #5: SRR013361, 5140152 spots, 185045472 bases
    Run #6: SRR013370, 4916054 spots, 176977944 bases

What I want to do is create a hash of arrays, with the first line of each chunk as the key and the SR## part of the lines matching "^Run" as the array members:

    $VAR = {
        'SRX004541' => ['SRR016086'],
        # etc
    }

But my construct doesn't work, and there must be a better way to do it.

    use Data::Dumper;

    my %bighash;
    my $head = "";
    my @temp = ();

    while ( <> ) {
        chomp;
        next if (/^\#/);

        if ( /^\d{1,2}:(\w+)/ ) {
            print "$1\n";
            $head = $1;
        }
        elsif (/^Run \#\d+: (\w+),.*/) {
            print "\t$1\n";
            push @temp, $1;
        }
        elsif (/^$/) {
            push @{$bighash{$head}}, [@temp];
            @temp = ();
        }
    }

    print Dumper \%bighash;

Read the article

Real Excel Templates I

- by Tim Dexter

As promised, I'm starting to document the new Excel templates that I teased you all with a few weeks back. Leslie is buried in 11g documentation and will not get to officially documenting the templates for a while. I'll do my best to be professional and not ramble on about this and that, although the weather here has finally turned and it's 'scorchio' here in Colorado today. Maybe our stand of Aspen will finally come into leaf ... but I digress.

Preamble

These templates are not actually that new. I helped in a small way to develop them a few years back with Excel 'meistress' Shirley for a company that was trying to use the Report Manager (RR) Excel FSG outputs under EBS 12. The functionality they needed was just not there in the RR FSG templates; those templates are actually XSL that is created from the RR Excel template builder and fed to BIP for processing. Think of Excel from our RTF templates and you'll be there, i.e. not really Excel but HTML masquerading as Excel. Although still under controlled release in EBS, the new templates have now made their way to the standalone release and are willing to share their Excel goodness. You get everything you have with the Excel Analyzer Excel templates plus so much more. Therein lies a question: what will happen to the Analyzer templates? My understanding is that both will come together into a single Excel template format some time in the post-11g release world. The new XLSX format for Excel 2007/10 is in the mix too, so watch this space.

What more do these templates offer? Well, you can structure data in the Excel output. Similar to RTF templates, you can create sheets of data that have master-detail n relationships. Although the Analyzer templates can do this, you have to get into macros, whereas BIP will do this all for you. You can also use native XSL functions in your data to manipulate it prior to rendering. BIP functions are not currently supported. The most impressive, for me at least, is the sheet 'bursting'.
You can split your hierarchical data across multiple sheets and dynamically name those sheets. Finally, you of course still get all the native Excel functionality.

Pre-reqs

- You must be on 10.1.3.4.1 plus the latest rollup patch, 9546699. You can patch up a BIP instance running with OBIEE, no problem.
- You need Excel 2000 or above to build the templates.
- Some patience: there is no Excel template builder for these new templates, so it's all going to have to be done by hand. It's not that tough but can get a little 'fiddly'.
- You cannot test the template from Excel; it has to be deployed and then run.

Limitations

The new templates are definitely superior to the Analyzer templates but there are a few limitations.

- Re-grouping is not supported. You can only follow a data hierarchy, not bend it to your will, unless you want to get into macros.
- No support for BIP functions. The templates support native XSL functions only.
- No template builder.

Getting Started

The templates make use of named cells and groups of cells to allow BIP to find the insertion point for data points. They also use a hidden sheet to store calculation mappings from named cells to XML data elements. To start with, in the great BIP tradition, we need some sample XML data. Because I wanted to show the master-detail output, we need some hierarchical data. If you have not yet gotten into the data templates, now is a good time; I wrote a post a while back starting from the simple to more complex. They generate ideal data sets for these templates.
I'm working with the following data set:

    <EMPLOYEES>
      <LIST_G_DEPT>
        <G_DEPT>
          <DEPARTMENT_ID>10</DEPARTMENT_ID>
          <DEPARTMENT_NAME>Administration</DEPARTMENT_NAME>
          <LIST_G_EMP>
            <G_EMP>
              <EMPLOYEE_ID>200</EMPLOYEE_ID>
              <EMP_NAME>Jennifer Whalen</EMP_NAME>
              <EMAIL>JWHALEN</EMAIL>
              <PHONE_NUMBER>515.123.4444</PHONE_NUMBER>
              <HIRE_DATE>1987-09-17T00:00:00.000-06:00</HIRE_DATE>
              <SALARY>4400</SALARY>
            </G_EMP>
          </LIST_G_EMP>
          <TOTAL_EMPS>1</TOTAL_EMPS>
          <TOTAL_SALARY>4400</TOTAL_SALARY>
          <AVG_SALARY>4400</AVG_SALARY>
          <MAX_SALARY>4400</MAX_SALARY>
          <MIN_SALARY>4400</MIN_SALARY>
        </G_DEPT>
        ...
      </LIST_G_DEPT>
    </EMPLOYEES>

Simple enough to follow and bread and butter stuff for an RTF template.

Building the Template

For an Excel template we need to start by thinking about how we want to render the data. Come up with a sample output in Excel. It's all dummy data, nothing marked up yet, with one row of data for each level. I have the department name and then a repeating row for the employees. You can apply Excel formatting to the layout. The total is going to be derived from a data element. We'll get to Excel functions later.

Marking Up Cells

Next we need to start marking up the cells with custom names to map them to data elements. The cell names need to follow a specific format:

- For data grouping, XDO_GROUP_?group_name?
- For data elements, XDO_?element_name?

Notice the question mark delimiter; the group_name and element_name are case sensitive. The next step is to find how to name cells; the easiest method is to highlight the cell and then type in the name. You can also find the Name Manager dialog. I use 2007 and it's available on the ribbon under the Formulas section. Go through the process of naming all the cells for the element values you have. Using my data set from above, you should end up with something like this in your 'Name Manager' dialog. You can update any mistakes you might have made through this dialog.
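To make the naming format concrete, here is a small Python sketch that walks a cut-down copy of the sample XML and prints the cell names you would expect to type into Excel. It is not part of BIP and not a template builder: the function name `cell_names` is invented for this example, and treating every element whose tag starts with `G_` as a repeating group is an assumption based only on this data set.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the sample data set, for illustration only
SAMPLE = """
<EMPLOYEES>
  <LIST_G_DEPT>
    <G_DEPT>
      <DEPARTMENT_NAME>Administration</DEPARTMENT_NAME>
      <LIST_G_EMP>
        <G_EMP>
          <EMP_NAME>Jennifer Whalen</EMP_NAME>
          <SALARY>4400</SALARY>
        </G_EMP>
      </LIST_G_EMP>
      <TOTAL_SALARY>4400</TOTAL_SALARY>
    </G_DEPT>
  </LIST_G_DEPT>
</EMPLOYEES>
"""


def cell_names(xml_text):
    """List candidate cell names for every group and leaf element."""
    names = []
    for elem in ET.fromstring(xml_text).iter():
        if len(elem) == 0:
            name = f"XDO_?{elem.tag}?"        # leaf element: a data cell
        elif elem.tag.startswith("G_"):
            name = f"XDO_GROUP_?{elem.tag}?"  # assumed repeating group: a cell range
        else:
            continue                           # wrappers such as LIST_G_DEPT
        if name not in names:
            names.append(name)
    return names


for name in cell_names(SAMPLE):
    print(name)
```

Because the names are case sensitive, generating the list from the XML itself is one way to avoid typos when naming cells by hand.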
Creating Groups

In the image above you can see there are a couple of named group cells. To create these, it's a simple case of highlighting the cells that make up the group and then naming them. For the EMP group, highlight the employee row and then type in the name XDO_GROUP_?G_EMP?. Notice the 10,000 total is outside of the G_EMP group. It's actually named XDO_?TOTAL_SALARY?, a query calculated value. For the department group, we need to include the department name cell and the sub EMP grouping and name it XDO_GROUP_?G_DEPT?. Notice the 10,000 total is included in the G_DEPT group. This will ensure it repeats at the department level.

Lastly, we do need to include a special sheet in the workbook. We will not have anything meaningful in there for now, but it needs to be present. Create a new sheet and name it XDO_METADATA. The name is important as the BIP rendering engine will be looking for it. For our current example we do not need anything other than the required stuff in our XDO_METADATA sheet, but it must be present. Easy enough to hide it. Here's what I have: the only cell that is important is the 'Data Constraints:' cell. The rest is optional. To save curious users getting distracted, hide the metadata sheet.

Deploying & Running Templates

We should now have a usable Excel template. Loading it into a report is easy enough using the browser UI, just like an RTF template. Set the template type to Excel. You will now be able to run the report and hopefully get something like this. You will not get the red highlighting; that's just some conditional formatting I added to the template using Excel functionality. Your dates are probably going to look raw too. I got around this for now using an Excel function on the cell:

=--REPLACE(SUBSTITUTE(E8,"T"," "),LEN(E8)-6,6,"")

Google to the rescue on that one. Try some other stuff out. There is also a way to avoid constantly loading the template through the UI.
If you have BIP running locally, or you can access the reports repository, then once you have loaded the template the first time you can just save the template directly into the report folder. I have put together a sample report using a sample data set, available here. Just drop the XML data file, EmpbyDeptExcelData.xml, into the 'demo files' folder and you should be good to go.

That's the basics. Next we'll start using some XSL functions in the template and move on to the 'bursting' across sheets.
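For anyone puzzling over the date formula used above, =--REPLACE(SUBSTITUTE(E8,"T"," "),LEN(E8)-6,6,""), the following Python sketch replays its string steps on the HIRE_DATE value from the sample data. The function name `excel_formula_text` is made up for the illustration, and the final `strptime` call only stands in for the `--` coercion; Excel would produce a date serial number there, not a Python datetime.

```python
from datetime import datetime


def excel_formula_text(raw):
    """Replay SUBSTITUTE and REPLACE from the formula on a raw timestamp."""
    text = raw.replace("T", " ")   # SUBSTITUTE(E8,"T"," ")
    start = len(raw) - 6           # LEN(E8)-6, a 1-based position in Excel
    # REPLACE(text, start, 6, "") deletes 6 characters beginning at `start`
    return text[:start - 1] + text[start - 1 + 6:]


raw = "1987-09-17T00:00:00.000-06:00"
cleaned = excel_formula_text(raw)
print(cleaned)  # 1987-09-17 00:00:00.000

# Stand-in for the leading "--", which makes Excel treat the text as a value
parsed = datetime.strptime(cleaned, "%Y-%m-%d %H:%M:%S.%f")
print(parsed)
```

Stepping through it shows that the six characters removed are `0-06:0` rather than `-06:00`. The result still looks right here because the last millisecond digit and the last offset digit are both 0, and the time zone offset is simply discarded, so treat the formula as a quick workaround rather than a general conversion.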

Read the article

SQL SERVER – Simple Explanation and Puzzle with SOUNDEX Function and DIFFERENCE Function

- by pinaldave

Earlier this week I asked how to swap the values of a column without using a CASE statement. Read here: A Puzzle – Swap Value of Column Without Case Statement. There were more than 50 solutions proposed in the comments, many of them creative. I have mentioned my personal favorites (different ones) here: Solution of Puzzle – Swap Value of Column Without Case Statement. However, I received lots of questions regarding one of the solutions, by SIJIN KUMAR V P, who used the SOUNDEX function. The request was to explain how SOUNDEX and DIFFERENCE work. Well, there is pretty decent documentation for the SOUNDEX function and the DIFFERENCE function over on MSDN, and if I attempt to explain these functions I will end up writing the same details that are available on MSDN. Instead of writing theory, we will try to learn these functions by using a couple of simple puzzles. Try to solve the puzzles using MSDN and see if you can learn something very quickly.

In simple words: SOUNDEX converts an alphanumeric string to a four-character code to find similar-sounding words or names. The first character of the code is the first character of character_expression, and the second through fourth characters of the code are numbers that represent the letters in the expression. Vowels in character_expression are ignored unless they are the first letter of the string. The DIFFERENCE function returns an integer value. The integer returned is the number of characters in the SOUNDEX values that are the same. The return value ranges from 0 through 4: 0 indicates weak or no similarity, and 4 indicates strong similarity or the same values.

Learning Puzzle 1: Now let us run the following four queries and observe their output.

SELECT SOUNDEX('SQLAuthority') SdxValue
SELECT SOUNDEX('SLTR') SdxValue
SELECT SOUNDEX('SaLaTaRa') SdxValue
SELECT SOUNDEX('SaLaTaRaM') SdxValue

When you look at the result set, all four values are the same.
The values are the same because, as far as the SQL Server SOUNDEX function is concerned, all four strings sound alike.

Learning Puzzle 2: Now let us run the following five queries and observe their output.

SELECT DIFFERENCE (SOUNDEX('SLTR'),SOUNDEX('SQLAuthority'))
SELECT DIFFERENCE (SOUNDEX('TH'),SOUNDEX('SQLAuthority'))
SELECT DIFFERENCE ('SQLAuthority',SOUNDEX('SQLAuthority'))
SELECT DIFFERENCE ('SLTR',SOUNDEX('SQLAuthority'))
SELECT DIFFERENCE ('SLTR','SQLAuthority')

When you look at the result set, you will get results ranging from 1 to 4. Here is how it works: a result of 0 means the two values have weak or no similarity, and a result of 4 means they are strongly similar or identical.

Have you ever used these two functions for a business need or on a production server? If yes, would you please leave a comment with your use cases. I believe it will be beneficial to everyone.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

Filed under: PostADay, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology
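To see why the four strings in Puzzle 1 collapse to the same code, here is a minimal Python sketch of the classic Soundex rules. It is an illustration only, not SQL Server's implementation, and the real function may differ in edge cases. The `difference` helper is a rough stand-in that counts matching positions in the two codes, which is an assumption about how the comparison works rather than documented behaviour.

```python
# Letter groups of classic Soundex; vowels, H, W and Y carry no code
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(word):
    """Classic four-character Soundex code (simplified; letters only)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        if digit and digit != previous:
            code += digit
        if c not in "HW":  # H and W do not separate letters with equal codes
            previous = digit
    return (code + "000")[:4]


def difference(a, b):
    """Approximate DIFFERENCE: number of matching positions in the two codes."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))


for word in ["SQLAuthority", "SLTR", "SaLaTaRa", "SaLaTaRaM"]:
    print(word, soundex(word))
```

In 'SQLAuthority' the Q shares a code with the leading S and is dropped, the vowels are skipped, and only the first three remaining consonant codes (L, T, R) are kept, which is exactly what 'SLTR' encodes to.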

Read the article

Unit testing statically typed functional code

- by back2dos

I wanted to ask you people in which cases it makes sense to unit test statically typed functional code, as written in Haskell, Scala, OCaml, Nemerle, F# or haXe (the last is what I am really interested in, but I wanted to tap into the knowledge of the bigger communities). I ask this because, from my understanding:

- One aspect of unit tests is to have the specs in runnable form. However, when employing a declarative style that directly maps the formalized specs to language semantics, is it even possible to express the specs in runnable form in a separate way that adds value?
- The more obvious aspect of unit tests is to track down errors that cannot be revealed through static analysis. Type-safe functional code is a good tool for coding extremely close to what your static analyzer understands. However, a simple mistake like using x instead of y (both being coordinates) in your code cannot be caught that way. Then again, such a mistake could also arise while writing the test code, so I am not sure whether it's worth the effort.
- Unit tests do introduce redundancy, which means that when requirements change, the code implementing them and the tests covering this code must both be changed. This overhead of course is about constant, so one could argue that it doesn't really matter. In fact, in languages like Ruby it really doesn't, compared to the benefits, but given how statically typed functional programming covers a lot of the ground unit tests are intended for, it feels like a constant overhead one can simply reduce without penalty.

From this I'd deduce that unit tests are somewhat obsolete in this programming style. Of course such a claim can only lead to religious wars, so let me boil this down to a simple question: when you use such a programming style, to what extent do you use unit tests and why (what quality is it you hope to gain for your code)?
Or the other way round: do you have criteria by which you can qualify a unit of statically typed functional code as covered by the static analyzer and hence in no need of unit test coverage?
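The "x instead of y" case mentioned in the question can be made concrete with a tiny example. The sketch below uses Python with type hints purely as a stand-in for the statically typed languages listed; the `Point` type and both functions are hypothetical. Both versions have the same signature and would satisfy a type checker, yet only a test on actual values tells them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


def translate_x(p: Point, dx: float) -> Point:
    """Shift a point horizontally by dx."""
    return Point(p.x + dx, p.y)


def translate_x_buggy(p: Point, dx: float) -> Point:
    """Same types and signature, but y is used where x was intended."""
    return Point(p.y + dx, p.y)


start = Point(1.0, 2.0)
print(translate_x(start, 3.0))        # Point(x=4.0, y=2.0)
print(translate_x_buggy(start, 3.0))  # Point(x=5.0, y=2.0)
```

One way to let the type system catch this class of mistake is to give the two coordinates distinct wrapper types, at which point the remaining need for a unit test shrinks, which is arguably the trade-off the question is asking about.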

Read the article

Ignore "Bad: new and old password are too similar"

- by user999

I receive this message when trying to change my password: "Bad: new and old password are too similar" The passwords' "similarity" is irrelevant for my needs, so I'd like to bypass this. I tried sudo passwd $my_username I thought this had worked because I got a message: passwd: password updated successfully However, the password change has no effect after leaving the terminal, and my old password is still the only one recognized. Any ideas? thanks

Read the article

How are Implicit-Heap dynamic Storage Binding and Dynamic type binding similar?

- by Appy

"Concepts of Programming languages" by Robert Sebesta says - Implicit Heap-Dynamic Storage Binding: Implicit Heap-Dynamic variables are bound to heap storage only when they are assigned values. It is similar to dynamic type binding. Can anyone explain the similarity with suitable examples. I understand the meaning of both the phrases, but I am an amateur when it comes to in-depth details.

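A short illustration may help with the question above. The sketch uses Python, which is not necessarily the language the book uses for this topic, as an example of a language where both bindings are made by the assignment itself: the variable has no declared type and no storage of its own until a value is assigned, and a later assignment can change both. The variable names are arbitrary.

```python
value = [1, 2, 3]           # assignment allocates a list on the heap and binds `value` to it
first_type = type(value).__name__

value = "now a string"      # rebinding: different heap storage and a different type
second_type = type(value).__name__

print(first_type, second_type)  # list str
```

The similarity the quoted passage points at is the timing: neither the storage nor the type is fixed by a declaration; both are determined at run time, each time a value is assigned.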
Read the article

Advisor Webcasts in July for the EBS Technology area

- by Oracle_EBS

For July 2012 we have scheduled 2 webcasts. The first is an E-Business Suite OAM Overview and Usage session. The second is a follow-up session on the E-Business Suite Workflow Analyzer. As always, we are running each webcast as 2 sessions for better global coverage:

E-Business Suite - OAM Overview and Monitoring

Agenda
- Oracle Applications Manager (OAM) Overview
- Log files
- Diagnostics and Logging
- Concurrent processing through OAM
- Applications Dashboard
- Troubleshooting
- Patch Management. Patch Wizard
- OAM "How To" Documents
- Questions & Answers

EMEA Session: July 10, 2012 at 09:00 AM UK / 10:00 AM CET / 13:30 India / 17:00 Japan / 18:00 Australia
Details & Registration: Note 1466056.1
Direct link to register in WebEx

US Session: July 11, 2012 at 18:00 UK / 19:00 CET / 10:00 AM Pacific / 11:00 AM Mountain / 01:00 PM Eastern
Details & Registration: Note 1466057.1
Direct link to register in WebEx

E-Business Suite - Workflow Analyzer - Follow-Up

Agenda
- Overview of Workflow Analyzer
- Enhancements implemented in the latest Release
- Questions & Answers

EMEA Session: July 24, 2012 at 09:00 AM UK / 10:00 AM CET / 13:30 India / 17:00 Japan / 18:00 Australia
Details & Registration: Note 1466058.1
Direct link to register in WebEx

US Session: July 25, 2012 at 18:00 UK / 19:00 CET / 10:00 AM Pacific / 11:00 AM Mountain / 01:00 PM Eastern
Details & Registration: Note 1466059.1
Direct link to register in WebEx

Schedules, recordings and presentations of the Advisor Webcasts run under the EBS Applications Technology area can be found in Note 1186338.1.
Current schedules of Advisor Webcasts for all Oracle products can be found in Note 740966.1.
Post-presentation recordings of the Advisor Webcasts for all Oracle products can be found in Note 740964.1.

Read the article

Upcoming EBS Webcasts for June, July, August 2012

- by user793553

See the following upcoming webcasts for June, July and August 2012. Flag Doc ID 740966.1 as a favourite to keep up to date with the latest advisor schedule. Additionally, see Doc ID 740964.1 for access to all archived advisor webcasts.

Oracle E-Business Suite

Oracle E-Business Suite
Title | Date | Summary
None at this time.

EBS Agile
Title | Date | Summary
None at this time.

EBS Applications Technologies Group (ATG)
Title | Date | Summary
EBS – OAM Tuning and Monitoring EMEA | July 10, 2012 | Abstract
EBS – OAM Tuning and Monitoring US | July 11, 2012 | Abstract
Workflow Analyzer Followup EMEA | July 24, 2012 | Abstract
Workflow Analyzer Followup US | July 25, 2012 | Abstract

EBS CRM & Industries
Title | Date | Summary
None at this time.

EBS Financials
Title | Date | Summary
EBS Fixed Assets: Achieve Success Using Proactive Tools For Fixed Assets Support | July 10, 2012 | Abstract
Overview and Flow of Oracle Project Resource Management | July 17, 2012 | Abstract
Leveraging My Oracle Support To Increase Knowledge | July 30, 2012 | Abstract

EBS HCM (HRMS)
Title | Date | Summary
Oracle Time and Labor (OTL) Rollback Functionality Session 1 | July 25, 2012 | Abstract
Oracle Time and Labor (OTL) Rollback Functionality Session 2 | July 25, 2012 | Abstract

EBS Manufacturing
Title | Date | Summary
Using Personalization in Oracle eAM | June 21, 2012 | Abstract
OM Guided Resolutions - Finding Known Resolutions Easily | July 17, 2012 | Abstract
Material Move Orders Flow | July 25, 2012 | Abstract
Diagnosing Signal 11 Issues In ASCP Planning | August 9, 2012 | Abstract
Interface Trip Stop - Best Practices and Debugging | August 21, 2012 | Abstract

EBS Procurement
Title | Date | Summary
Punchout in iProcurement | June 26, 2012 | Abstract

Search Results

Search found 607 results on 25 pages for 'similarity analyzer'.

Page 8/25 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15 | Next Page >
