Search Results

Search found 9016 results on 361 pages for 'regex libraries'.

Page 71/361 | < Previous Page | 67 68 69 70 71 72 73 74 75 76 77 78  | Next Page >

  • Ruby libraries for parsing .doc files?

    - by Platinum Azure
    Hi all, I was just wondering if anyone knew of any good libraries for parsing .doc files (and similar formats, like .odt) to extract text, yet also keep formatting information where possible for display on a website. Capability of doing similarly for PDFs would be a bonus, but I'm not looking as much for that. This is for a Rails project, if that helps at all. Thanks in advance!

  • c# Truncate HTML safely for article summary

    - by WickedW
    Hi all,

    Does anyone have a C# variation of this? I want to take some HTML and display it, without breaking, as a summary lead-in to an article.

    http://stackoverflow.com/questions/1193500/php-truncate-html-ignoring-tags

    Save me from reinventing the wheel! Thank you very much.

    ---------- edit ----------

    Sorry, new here, and you're right, I should have phrased the question better. Here's a bit more info.

    I wish to take an HTML string and truncate it to a set number of words (or even a character length) so I can show the start of it as a summary (which then leads to the main article). I wish to preserve the HTML so I can show the links etc. in the preview.

    The main issue I have to solve is that we may well end up with unclosed HTML tags if we truncate in the middle of one or more tags!

    The idea I have for a solution is to:

    a) truncate the HTML to N words (words are better, but chars are OK) first, being sure not to stop in the middle of a tag and truncate a required attribute
    b) work through the opened HTML tags in this truncated string (maybe stick them on a stack as I go?)
    c) then work through the closing tags and ensure they match the ones on the stack as I pop them off
    d) if any open tags are left on the stack after this, write their closing tags to the end of the truncated string, and the HTML should be good to go!

    -- edit 12112009

    Here is what I have bumbled together so far as a unit test file in VS2008; this 'may' help someone in future.

    - My hack attempts based on Jan's code are at the top, for a char version and a word version (DISCLAIMER: this is dirty, rough code on my part!)
    - I assume I am working with 'well-formed' HTML in all cases (but not necessarily a full document with a root node, as per the XML version)
    - Abel's XML version is at the bottom, but I have not yet got round to fully getting the tests to run on it (plus I need to understand the code) ...

    I will update when I get a chance to refine it. Having trouble with posting code; is there no upload facility on Stack?

    Thanks for all comments :)

        using System;
        using System.Collections.Generic;
        using System.Text.RegularExpressions;
        using System.Xml;
        using System.Xml.XPath;
        using Microsoft.VisualStudio.TestTools.UnitTesting;

        namespace PINET40TestProject
        {
            [TestClass]
            public class UtilityUnitTest
            {
                public static string TruncateHTMLSafeishChar(string text, int charCount)
                {
                    bool inTag = false;
                    int cntr = 0;
                    int cntrContent = 0;

                    // loop through html, counting only viewable content
                    foreach (Char c in text)
                    {
                        if (cntrContent == charCount) break;
                        cntr++;
                        if (c == '<') { inTag = true; continue; }
                        if (c == '>') { inTag = false; continue; }
                        if (!inTag) cntrContent++;
                    }

                    string substr = text.Substring(0, cntr);

                    // search for nonclosed tags
                    MatchCollection openedTags = new Regex("<[^/](.|\n)*?>").Matches(substr);
                    MatchCollection closedTags = new Regex("<[/](.|\n)*?>").Matches(substr);

                    // create stack
                    Stack<string> opentagsStack = new Stack<string>();
                    Stack<string> closedtagsStack = new Stack<string>();

                    // to be honest, this seemed like a good idea then I got lost along the way
                    // so logic is probably hanging by a thread!!
                    foreach (Match tag in openedTags)
                    {
                        string openedtag = tag.Value.Substring(1, tag.Value.Length - 2);
                        // strip any attributes, sure we can use regex for this!
                        if (openedtag.IndexOf(" ") >= 0)
                        {
                            openedtag = openedtag.Substring(0, openedtag.IndexOf(" "));
                        }
                        // ignore brs as self-closed
                        if (openedtag.Trim() != "br")
                        {
                            opentagsStack.Push(openedtag);
                        }
                    }

                    foreach (Match tag in closedTags)
                    {
                        string closedtag = tag.Value.Substring(2, tag.Value.Length - 3);
                        closedtagsStack.Push(closedtag);
                    }

                    if (closedtagsStack.Count < opentagsStack.Count)
                    {
                        while (opentagsStack.Count > 0)
                        {
                            string tagstr = opentagsStack.Pop();
                            if (closedtagsStack.Count == 0 || tagstr != closedtagsStack.Peek())
                            {
                                substr += "</" + tagstr + ">";
                            }
                            else
                            {
                                closedtagsStack.Pop();
                            }
                        }
                    }

                    return substr;
                }

                public static string TruncateHTMLSafeishWord(string text, int wordCount)
                {
                    bool inTag = false;
                    int cntr = 0;
                    int cntrWords = 0;
                    Char lastc = ' ';

                    // loop through html, counting only viewable content
                    foreach (Char c in text)
                    {
                        if (cntrWords == wordCount) break;
                        cntr++;
                        if (c == '<') { inTag = true; continue; }
                        if (c == '>') { inTag = false; continue; }
                        if (!inTag)
                        {
                            // do not count double spaces, and a space not in a tag counts as a word
                            if (c == 32 && lastc != 32) cntrWords++;
                            // remember the last visible character so double spaces count once
                            lastc = c;
                        }
                    }

                    string substr = text.Substring(0, cntr) + " ...";

                    // search for nonclosed tags
                    MatchCollection openedTags = new Regex("<[^/](.|\n)*?>").Matches(substr);
                    MatchCollection closedTags = new Regex("<[/](.|\n)*?>").Matches(substr);

                    // create stack
                    Stack<string> opentagsStack = new Stack<string>();
                    Stack<string> closedtagsStack = new Stack<string>();

                    foreach (Match tag in openedTags)
                    {
                        string openedtag = tag.Value.Substring(1, tag.Value.Length - 2);
                        // strip any attributes, sure we can use regex for this!
                        if (openedtag.IndexOf(" ") >= 0)
                        {
                            openedtag = openedtag.Substring(0, openedtag.IndexOf(" "));
                        }
                        // ignore brs as self-closed
                        if (openedtag.Trim() != "br")
                        {
                            opentagsStack.Push(openedtag);
                        }
                    }

                    foreach (Match tag in closedTags)
                    {
                        string closedtag = tag.Value.Substring(2, tag.Value.Length - 3);
                        closedtagsStack.Push(closedtag);
                    }

                    if (closedtagsStack.Count < opentagsStack.Count)
                    {
                        while (opentagsStack.Count > 0)
                        {
                            string tagstr = opentagsStack.Pop();
                            if (closedtagsStack.Count == 0 || tagstr != closedtagsStack.Peek())
                            {
                                substr += "</" + tagstr + ">";
                            }
                            else
                            {
                                closedtagsStack.Pop();
                            }
                        }
                    }

                    return substr;
                }

                public static string TruncateHTMLSafeishCharXML(string text, int charCount)
                {
                    // your data, probably comes from somewhere, or as params to a method
                    XmlDocument xml = new XmlDocument();
                    xml.LoadXml(text);

                    // create a navigator, this is our primary tool
                    XPathNavigator navigator = xml.CreateNavigator();
                    XPathNavigator breakPoint = null;

                    // find the text node we need:
                    while (navigator.MoveToFollowing(XPathNodeType.Text))
                    {
                        string lastText = navigator.Value.Substring(0, Math.Min(charCount, navigator.Value.Length));
                        charCount -= navigator.Value.Length;
                        if (charCount <= 0)
                        {
                            // truncate the last text. Here goes your "search word boundary" code:
                            navigator.SetValue(lastText);
                            breakPoint = navigator.Clone();
                            break;
                        }
                    }

                    // first remove text nodes, because Microsoft unfortunately merges them without asking
                    while (navigator.MoveToFollowing(XPathNodeType.Text))
                    {
                        if (navigator.ComparePosition(breakPoint) == XmlNodeOrder.After)
                        {
                            navigator.DeleteSelf();
                        }
                    }

                    // moves to parent, then move the rest
                    navigator.MoveTo(breakPoint);
                    while (navigator.MoveToFollowing(XPathNodeType.Element))
                    {
                        if (navigator.ComparePosition(breakPoint) == XmlNodeOrder.After)
                        {
                            navigator.DeleteSelf();
                        }
                    }

                    // moves to parent
                    // then remove *all* empty nodes to clean up (not necessary):
                    // TODO, add empty elements like <br />, <img /> as exclusion
                    navigator.MoveToRoot();
                    while (navigator.MoveToFollowing(XPathNodeType.Element))
                    {
                        while (!navigator.HasChildren && (navigator.Value ?? "").Trim() == "")
                        {
                            navigator.DeleteSelf();
                        }
                    }

                    // moves to parent
                    navigator.MoveToRoot();
                    return navigator.InnerXml;
                }

                [TestMethod]
                public void TestTruncateHTMLSafeish()
                {
                    // Case where we just make it to start of HREF (so effectively an empty link)

                    // 'simple' nested none attributed tags
                    Assert.AreEqual(@"<h1>1234</h1><b><i>56789</i>012</b>",
                        TruncateHTMLSafeishChar(@"<h1>1234</h1><b><i>56789</i>012345</b>", 12));

                    // In middle of a!
                    Assert.AreEqual(@"<h1>1234</h1><a href=""testurl""><b>567</b></a>",
                        TruncateHTMLSafeishChar(@"<h1>1234</h1><a href=""testurl""><b>5678</b></a><i><strong>some italic nested in string</strong></i>", 7));

                    // more
                    Assert.AreEqual(@"<div><b><i><strong>1</strong></i></b></div>",
                        TruncateHTMLSafeishChar(@"<div><b><i><strong>12</strong></i></b></div>", 1));

                    // br
                    Assert.AreEqual(@"<h1>1 3 5</h1><br />6",
                        TruncateHTMLSafeishChar(@"<h1>1 3 5</h1><br />678<br />", 6));
                }

                [TestMethod]
                public void TestTruncateHTMLSafeishWord()
                {
                    // zero case
                    Assert.AreEqual(@" ...", TruncateHTMLSafeishWord(@"", 5));

                    // 'simple' nested none attributed tags
                    Assert.AreEqual(@"<h1>one two <br /></h1><b><i>three ...</i></b>",
                        TruncateHTMLSafeishWord(@"<h1>one two <br /></h1><b><i>three </i>four</b>", 3),
                        "we have added ' ...' to end of summary");

                    // In middle of a!
                    Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four ...</b></a>",
                        TruncateHTMLSafeishWord(@"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four five </b></a><i><strong>some italic nested in string</strong></i>", 4));

                    // start of h1
                    Assert.AreEqual(@"<h1>one two three ...</h1>",
                        TruncateHTMLSafeishWord(@"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 3));

                    // more than words available
                    Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i> ...",
                        TruncateHTMLSafeishWord(@"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 99));
                }

                [TestMethod]
                public void TestTruncateHTMLSafeishWordXML()
                {
                    // zero case
                    Assert.AreEqual(@" ...", TruncateHTMLSafeishWord(@"", 5));

                    // 'simple' nested none attributed tags
                    string output = TruncateHTMLSafeishCharXML(
                        @"<body><h1>one two </h1><b><i>three </i>four</b></body>", 13);
                    Assert.AreEqual(@"<body>\r\n <h1>one two </h1>\r\n <b>\r\n <i>three</i>\r\n </b>\r\n</body>",
                        output,
                        "XML version, no ... yet and addeds '\r\n + spaces?' to format document");

                    // In middle of a!
                    Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four ...</b></a>",
                        TruncateHTMLSafeishCharXML(@"<body><h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four five </b></a><i><strong>some italic nested in string</strong></i></body>", 4));

                    // start of h1
                    Assert.AreEqual(@"<h1>one two three ...</h1>",
                        TruncateHTMLSafeishCharXML(@"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 3));

                    // more than words available
                    Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i> ...",
                        TruncateHTMLSafeishCharXML(@"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 99));
                }
            }
        }
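
    The stack idea in steps a) to d) of the question can also be sketched with a real HTML tokenizer instead of regular expressions. What follows is a minimal Python sketch, not a C# port and not the code from the linked PHP question. It assumes reasonably well-formed HTML, counts visible characters rather than words, silently drops comments, and uses only the standard library's html.parser module. The names truncate_html, _Truncator and VOID_TAGS are made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _Truncator(HTMLParser):
    """Copies markup to `out` until `max_chars` visible characters are emitted."""

    def __init__(self, max_chars):
        super().__init__(convert_charrefs=True)
        self.remaining = max_chars
        self.out = []          # pieces of the truncated document
        self.open_tags = []    # stack of tags opened but not yet closed
        self.done = max_chars <= 0

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br />: emit it, nothing to push.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close everything up to and including the matching open tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[:self.remaining]
        self.out.append(escape(piece, quote=False))  # re-escape &, <, >
        self.remaining -= len(piece)
        if self.remaining == 0:
            self.done = True


def truncate_html(html, max_chars):
    """Return `html` cut to `max_chars` visible characters with tags balanced."""
    parser = _Truncator(max_chars)
    parser.feed(html)
    parser.close()
    # Step d): whatever is still on the stack gets a closing tag, innermost first.
    closers = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closers


print(truncate_html('<h1>1234</h1><b><i>56789</i>012345</b>', 12))
```

    Because the tokenizer reports whole tags, the cut can never land inside a tag or an attribute, which was the main worry in step a). Counting words instead of characters would only change handle_data.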

  • linear algebra libraries for clusters

    - by Abruzzo Forte e Gentile
    Hi all, I need to develop applications doing linear algebra, eigenvalue computations, and linear equation solving over a cluster of PCs (I have a lot of machines available). I discovered the ScaLAPACK libraries, but they seem to have been developed a long time ago. Do you know if there are other libraries available that I should learn for doing math and linear algebra on a cluster? My language is C++ and, of course, I am a newbie to this topic. Kind regards to everybody, AFG

  • problem with linked libraries or classes??

    - by hemant
    I recently finished one project. Now, when I create a new navigation project in Xcode and try to run it in the simulator, the application crashes, and the error in the debugger window shows that I am missing some classes which I had used in my previous project (not in this one). In some cases it gives:

    Couldn't register com.yourcompany.GuessGame with the bootstrap server. Error: unknown error code. This generally means that another instance of this process was already running or is hung in the debugger.

    Is this some problem related to linked libraries?

  • Any simple shape recognition libraries for Java?

    - by Phil
    I am working on an on-screen keyboard for Android, and I need to recognize the starting points, turning points and end points of lines drawn by the user on the keyboard. A simple straightening function would be nice, as it is difficult to draw a perfectly straight line even with a stylus, not to mention on today's finger-only touchscreens. What I am trying to write is something like Swype. Are there any good libraries that I can use or refer to?
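
    One common way to get start, turning and end points from a wobbly stroke is polyline simplification. The sketch below uses the Ramer-Douglas-Peucker algorithm in plain Python as an illustration only; it is not taken from any library named in this listing, and the function names, the sample stroke and the tolerance value are assumptions chosen for the example. It would need porting to Java for Android.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_stroke(points, tolerance):
    """Keep only the points that deviate more than `tolerance` from a straight line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        # Everything in between is close enough: straighten to one segment.
        return [first, last]
    # Otherwise the farthest point is a turning point; recurse on both halves.
    left = simplify_stroke(points[:index + 1], tolerance)
    right = simplify_stroke(points[index:], tolerance)
    return left[:-1] + right


# A hand-drawn "L" shape: roughly right along the x axis, then up.
stroke = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (3.1, 1), (2.9, 2), (3, 3)]
print(simplify_stroke(stroke, 0.5))
```

    The first and last points of the result are the start and end of the stroke, and every point in between is a turning point; a larger tolerance straightens more aggressively.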

  • Recommended Bean Utility Libraries for Java

    - by Jim Ferrans
    I'm looking for a good, well-supported, and efficient Java library that uses reflection to automate JavaBean operations. These include making a deep copy of an arbitrary bean hierarchy (with nested lists and maps of beans), comparing two bean hierarchies for deep equality, and "transmorphing" one bean to another of a different class. Some possibilities include Apache Commons BeanUtils, Spring's BeanUtils, and Java's Bean support. Which libraries would you recommend?

  • Pyjamas + Django: project without any external libraries

    - by gruszczy
    I would like to create a small project using Django and Pyjamas. I tried googling for a solution on how to merge the two, but I found only projects that use external libraries based on JSON services. Could anyone give me some advice on how to build such a project so that I wouldn't have to use them? I would like to use the Django auth system, but I don't know how to build it all without Django templates and rendering.

  • Linking against multiple shared libraries that all linked against a common static library

    - by live2dream95
    Say you have two shared libraries, lib1.so and lib2.so, that both have libcommon.a statically linked into them. Would the compiler complain about ambiguous symbol references if you were to dynamically link both lib1.so and lib2.so? Or would the compiler be smart enough to know that the libcommon symbols are shared between lib1 and lib2 and allow you to dynamically link against both?

  • Storing third-party libraries in source control

    - by graham.reeds
    Should libraries that the application relies on be stored in source control? One part of me says they should and another part says no. It feels wrong to add a 20 MB library that dwarfs the entire app just because you rely on a couple of functions from it (albeit rather heavily). Should you just store the jar/dll, or maybe even the distributed zip/tar of the project? What do other people do?

  • compiling numpy with sunperf atlas libraries

    - by user288558
    I would like to use the sunperf libraries when compiling scipy and numpy. I tried using setupscons.py, which seems to check for SUNPERF libraries, but it didn't recognize where mine are. Here is a listing of /pkg/linux/SS12/sunstudio12.1 (that's where the sunperf library lives):

        wkerzend@mosura:/home/wkerzend>ls /pkg/linux/SS12/sunstudio12.1/lib/
        CCios/                   libdbx_agent.so@     libsunperf.so.3@
        amd64/                   libfcollector.so@    libtha.so@
        collector.jar@           libfsu.so@           libtha.so.1@
        dbxrc@                   libfsu.so.1@         locale/
        debugging.so@            libfui.so@           make.rules@
        er.rc@                   libfui.so.1@         rw7/
        libblacs_openmpi.so@     librtc.so@           sse2/
        libblacs_openmpi.so.1@   libscalapack.so@     stlport4/
        libcollectorAPI.so@      libscalapack.so.1@   svr4.make.rules@
        libcollectorAPI.so.1@    libsunperf.so@       tools_svc_mgr@

    I tried to specify this directory in sites.cfg, but I still get the following errors:

        Checking if g77 needs dummy main - MAIN__.
        Checking g77 name mangling - '_', '', lower-case.
        Checking g77 C compatibility runtime ...-L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../.. -L/lib/../lib64 -L/usr/lib/../lib64 -lfrtbegin -lg2c -lm
        Checking MKL ... Failed (could not check header(s) : check config.log in build/scons/scipy/integrate for more details)
        Checking ATLAS ... Failed (could not check header(s) : check config.log in build/scons/scipy/integrate for more details)
        Checking SUNPERF ... Failed (could not check symbol cblas_sgemm : check config.log in build/scons/scipy/integrate for more details))
        Checking Generic BLAS ... yes
        Checking for BLAS (Generic BLAS) ... Failed: BLAS (Generic BLAS) test could not be linked and run
        Exception: Could not find F77 BLAS, needed for integrate package:
          File "/priv/manana1/wkerzend/install_dir/scipy-0.7.1/scipy/integrate/SConstruct", line 2:
            GetInitEnvironment(ARGUMENTS).DistutilsSConscript('SConscript')
          File "/home/wkerzend/python_coala/numscons-0.10.1-py2.6.egg/numscons/core/numpyenv.py", line 108:
            build_dir = '$build_dir', src_dir = '$src_dir')
          File "/priv/manana1/wkerzend/python_coala/numscons-0.10.1-py2.6.egg/numscons/scons-local/scons-local-1.2.0/SCons/Script/SConscript.py", line 549:
            return apply(_SConscript, [self.fs,] + files, subst_kw)
          File "/priv/manana1/wkerzend/python_coala/numscons-0.10.1-py2.6.egg/numscons/scons-local/scons-local-1.2.0/SCons/Script/SConscript.py", line 259:
            exec _file_ in call_stack[-1].globals
          File "/priv/manana1/wkerzend/install_dir/scipy-0.7.1/build/scons/scipy/integrate/SConscript", line 15:
            raise Exception("Could not find F77 BLAS, needed for integrate package")
        error: Error while executing scons command. See above for more information.
        If you think it is a problem in numscons, you can also try executing the scons command with --log-level option for more detailed output of what numscons is doing, for example --log-level=0; the lowest the level is, the more detailed the output it.

    Any help is appreciated.

    Wolfgang

  • Makefile: finding include/lib for libraries installed through macports

    - by Henk
    Libraries/include files installed by macports go in /opt/local/lib and /opt/local/include, neither of which are scanned by gcc/ld by default. As a result, a project I'm working on won't compile in that environment. Should this be fixed by manually adding -L/opt/local/lib to my Makefile's LDFLAGS (and -I... as well), or is there some configuration that should be done to fix this globally on the computer?

  • Available alternative libraries in java to generate PDF documents

    - by Fazal
    I have been using XSL-FO and the FOP engine to generate PDF documents for the required data. This works great, but lately I have seen some limitations in FOP, especially when it comes to letting the user enter text in an HTML editor, which is then transformed to XSL-FO and given to the FOP driver. This brings me to ask this large community of well-informed individuals: what open source, or even non-open-source, libraries are available for generating PDF documents in Java?

  • Qmake project dependencies (linked libraries)

    - by Stick it to THE MAN
    I have a project that links to a number of shared libraries. Let's say project A depends on projects B and C. Ideally, I want to impose the following dependencies in my project file:

    - Rebuild project A if either B or C has been rebuilt since the last time project A was built
    - Use the output for the relevant configuration (i.e. if building project A in debug mode, then use the debug versions of the libs for projects B and C)

    Does anyone know how I may explicitly express such dependencies in my project file?

  • PHP beautifiers (libraries for formatting code)

    - by takeshin
    Previously, my intention was to ask: do you know any open source SQL formatter/beautifier library for PHP projects? But I think I'd better ask: which code formatting libraries written in PHP are the best? Let's list them all in one place. My picks:

    - for CSS syntax: CSSTidy
    - for PHP: PEAR's PHP_Beautifier
    - for HTML syntax: Tidy

  • iPhone plotting / charting libraries

    - by sashaeve
    I am looking for a good plotting library (line, pie, column charts) that lets the chart respond to user touches, something like the Stocks app. I found the core-plot library, but it seems that its interaction logic is not well covered. Please suggest which libraries can be used.

  • parallel java libraries

    - by jetru
    I'm looking for Java libraries/applications which are parallel and feature objects that can be queried in parallel. That is, there are objects on which multiple types of operations can be performed from different threads, and these operations are synchronized. It would be helpful if someone could give me ideas of where I could find such applications as well. EDIT: Actually, the language doesn't matter so much, so C++, Python, anything is welcome.
