Search Results

Search found 7251 results on 291 pages for 'pdf parsing'.

Page 20/291 | < Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27 | Next Page >

Parsing text file in python

- by Ockonal

Hello, I have html-file. I have to replace all text between this: [%anytext%]. As I understand, it's very easy to do with BeautifulSoup for parsing hmtl. But what is regular expression and how to remove&write back text data?

Read the article
MSBuild 4.0 Regex parsing

- by Chandam

I have heard that MSBuild 4.0 has increased Regex parsing support. However, I am unable to find any detailed documentation/links/material on this. Can anyone give a brief description of the new features and/or possibly give pointers to more material? Thanks in advance.

Read the article
Customized command line parsing in Python

- by Moshe

I'm writing a shell for a project of mine, which by design parses commands that looks like this: COMMAND_NAME ARG1="Long Value" ARG2=123 [email protected] My problem is that Python's command line parsing libraries (getopt and optparse) forces me to use '-' or '--' in front of the arguments. This behavior doesn't match my requirements. Any ideas how can this be solved? Any existing library for this?

Read the article
Python PLY zero or more occurrences of a parsing item

- by None

I am using Python with PLY to parse LISP-like S-Expressions and when parsing a function call there can be zero or more arguments. How can I put this into the yacc code. This is my function so far: def p_EXPR(p): '''EXPR : NUMBER | STRING | LPAREN funcname [EXPR] RPAREN''' if len(p) == 2: p[0] = p[1] else: p[0] = ("Call", p[2], p[3:-1]) I need to replace "[EXPR]" with something that allows zero or more EXPR's. How can I do this?

Read the article
Clarification needed about Python CSV file format parsing

- by HH

Format is like: CHINA;2002-06-25 00:00:00.000;5,60 CHINA;2002-06-26 00:00:00.000;5,32 CHINA;2002-06-27 00:00:00.000;5,31 and I try to use Python's CSV tools to parse it but cannot understand the paragraph, source: And while the module doesn’t directly support parsing strings, it can easily be done: import csv for row in csv.reader(['one,two,three']): print row Could someone clarify the line ['one,two,three']? How would you use it with format A;B;C?

Read the article
Expecting End of File Exception in parsing xml data for Blackberry application

- by tejaswini

Hi All, I am getting parser exception as "Expecting End of File" while parsing xml data for Blackberry application? How do I fix it?

Read the article
Haskell -> After parsing how to work with strings

- by bito08

Hello after doing the parsing with a script in Haskell I got a file with the 'appearance' of lists of strings. However when I call the file content with the function getContents or hGetContents, ie, reading the contents I get something like: String with lines (schematically what I want is: "[" aaa "," bbb "" ccc "]" - ["aaa", "bbb" "ccc"]). I have tried with the read function but without results. I need to work with these lists of strings to concatenating them all in a single list. Thanks.

Read the article
Parsing Huge XML Files in PHP

- by Ian

I'm trying to parse the dmoz content/structures xml files into mysql, but all existing scripts to do this are very old and don't work well. How can I go about opening a large (+1GB) xml file in php for parsing?

Read the article
Android Google-Shopping API force closes while parsing

- by Sam Jackson

I'm trying to send a request to the Google-Shopping API with the following static method which I think is working: public static String GET_TITLE(String url) throws JSONException { InputStream is = null; String result = ""; JSONObject jArray = null; // http post try { HttpClient httpclient = new DefaultHttpClient(); HttpPost httppost = new HttpPost(url); HttpResponse response = httpclient.execute(httppost); HttpEntity entity = response.getEntity(); is = entity.getContent(); } catch(Exception e) { Log.e("log_tag", "Error in http connection "+e.toString()); } The URL I'm passing is this BTW: https://www.googleapis.com/shopping/search/v1/public/products/country=US&q=shirts&alt=json &rankBy=relevancy&key=AIzaSyDRKgGmJrdG6pV6DIg2m-nmIbXydxvpjww Next I try to parse this response (where I think the problem comes in) in the same method: try { jArray = new JSONObject(result); } catch(JSONException e){ Log.e("log_tag", "Error parsing data "+e.toString()); } JSONObject itemObject = jArray.getJSONObject("items"); JSONObject productObject = itemObject.getJSONObject("product"); String attributeGoogleId = productObject.getString("googleId"); String attributeProviderId = productObject.getString("providerId"); String attributeTitle = productObject.getString("title");*/ String attributePrice = productObject.getString("price"); JSONObject popupObject = productObject.getJSONObject("popup"); return attributeTitle; } This has been so frustrating, I know it should be simple but everywhere I look I just can't quite get it to work, I'm not exactly sure what the error is since I'm testing it on my HTC Desire because my emulator gives an 'invalid command-line parameter' error when starting, but that's a different issue, anyway, thanks in advance! EDIT: The first one makes it look like there's a problem with the URL, should I change it and see if it makes a difference? 04-01 12:09:05.142: ERROR/log_tag(24968): Error in http connection java.net.UnknownHostException: www.googleapis.com 04-01 12:09:05.142: ERROR/log_tag(24968): Error converting result java.lang.NullPointerException 04-01 12:09:05.142: ERROR/log_tag(24968): Error parsing data org.json.JSONException: End of input at character 0 of 04-01 12:09:05.142: DEBUG/AndroidRuntime(24968): Shutting down VM 04-01 12:09:05.142: WARN/dalvikvm(24968): threadid=1: thread exiting with uncaught exception (group=0x400259f8) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): FATAL EXCEPTION: main 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): java.lang.RuntimeException: Failure delivering result ResultInfo{who=null, request=0, result=-1, data=Intent { act=com.google.zxing.client.android.SCAN flg=0x80000 (has extras) }} to activity {com.spectrum.stock/com.spectrum.stock.CaptureActivity}: java.lang.NullPointerException 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at android.app.ActivityThread.deliverResults(ActivityThread.java:3734) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at android.app.ActivityThread.handleSendResult(ActivityThread.java:3776) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at android.app.ActivityThread.access$2800(ActivityThread.java:135) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2166) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at android.os.Handler.dispatchMessage(Handler.java:99) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at android.os.Looper.loop(Looper.java:144) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at android.app.ActivityThread.main(ActivityThread.java:4937) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at java.lang.reflect.Method.invokeNative(Native Method) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at java.lang.reflect.Method.invoke(Method.java:521) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at dalvik.system.NativeStart.main(Native Method) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): Caused by: java.lang.NullPointerException 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at com.spectrum.stock.JSONResponse.GET_TITLE(JSONResponse.java:61) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at com.spectrum.stock.CaptureActivity.onActivityResult(CaptureActivity.java:78) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at android.app.Activity.dispatchActivityResult(Activity.java:3931) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): at android.app.ActivityThread.deliverResults(ActivityThread.java:3730) 04-01 12:09:05.152: ERROR/AndroidRuntime(24968): ... 11 more 04-01 12:09:05.162: WARN/ActivityManager(96): Force finishing activity com.spectrum.stock/.CaptureActivity

Read the article
Trouble with ITextSharp - Converting XML to PDF

- by AllenG

Okay... I'm trying to use the most recent version of ITextSharp to turn an XML file into a PDF. It isn't working. The documentation on SourceForge doesn't seem to have kept up with the actual releases; the code in the provided example won't even compile under the newest version. Here is my test XML: <Remittance> <RemitHeader> <Payer>BlueCross</Payer> <Provider>Maricopa</Provider> <CheckDate>20100329</CheckDate> <CheckNumber>123456789</CheckNumber> </RemitHeader> <RemitDetail> <NPI>NPI_GOES_HERE</NPI> <Patient>Patient Name</Patient> <PCN>0034567</PCN> <DateOfService>20100315</DateOfService> <TotalCharge>125.57</TotalCharge> <TotalPaid>55.75</TotalPaid> <PatientShare>35</PatientShare> </RemitDetail> </Remittance> And here is the code I'm attempting to use to turn that into a PDF. Document doc = new Document(PageSize.LETTER, 36, 36, 36, 36); iTextSharp.text.pdf.PdfWriter.GetInstance(doc, new StreamWriter(fileOutputPath).BaseStream); doc.Open(); SimpleXMLParser.Parse((ISimpleXMLDocHandler)doc, new StreamReader(fileInputPath).BaseStream); doc.Close(); Now, I was pretty sure the (ISimpleXMLDocHandler)doc piece wasn't going to work, but I can't actually find anything in the source that both a) implements ISimleXMLDocHandler and b) will accept a standard XML document and parse it to PDF. FYI- I did try an older version which would compile using the example code from sourceforge, but it wasn't working either.

Read the article
Using Java PDFBox library to write Russian PDF

- by Brad

I am using a Java library called PDFBox trying to write text to a PDF. It works perfect for English text, but when i tried to write Russian text inside the PDF the letters appeared so strange. It seems the problem is in the font used, but i am not so sure about that, so i hope if anyone could guide me through this. Here is the important code lines : PDTrueTypeFont font = PDTrueTypeFont.loadTTF( pdfFile, new File( "fonts/VREMACCI.TTF" ) ); // Windows Russian font imported to write the Russian text. font.setEncoding( new WinAnsiEncoding() ); // Define the Encoding used in writing. // Some code here to open the PDF & define a new page. contentStream.drawString( "??????? ????????????" ); // Write the Russian text. The WinAnsiEncoding source code is : Click here --------------------- Edit on 18 November 2009 After some investigation, i am now sure it is an Encoding problem, this could be solved by defining my own Encoding using the helpful PDFBox class called DictionaryEncoding. I am not sure how to use it, but here is what i have tried until now : COSDictionary cosDic = new COSDictionary(); cosDic.setString( COSName.getPDFName("Ercyrillic"), "0420 " ); // Russian letter. font.setEncoding( new DictionaryEncoding( cosDic ) ); This does not work, as it seems i am filling the dictionary in a wrong way, when i write a PDF page using this it appears blank. The DictionaryEncoding source code is : Click here Thanks . . .

Read the article
Multiselect Form Field in PDF

- by Jason R. Coombs

Using PDF, is it possible to create a single form element with multiple fields of which several can be selected? For example, in HTML, one can create a set of checkboxes associated with the same field name: <div>Select one for Member of the School Board</div> <input type="checkbox" name="field(school)" value="vote1"> <span class="label">Libby T. Garvey</span><br/> <input type="checkbox" name="field(school)" value="vote2"> <span class="label">Emma N. Violand-Sanchez</span><br/> In this case, the field name is "field(school)", and when the form is submitted, "field(school)" can be supplied 0, 1, or 2 times. Is there an equivalent construct in PDF where a single field can have multiple values. So far in my investigation, it appears that if fields are assigned the same name, it is only possible to select one field. If it is possible to implement this in PDF, what is this construct called and how can it be implemented? Edit: To clarify, I am aware that a PDF can contain multiple form fields with different field names, and those can be selected independently, but then the grouping is implicit and not explicit as with the HTML form. I would like to use a construct that makes the grouping of options explicit, and preferably allows for restrictions (e.g. at least one required, no more than 2 allowed, etc).

Read the article
Using Java PDFBox library to write Russian PDF

- by Brad

Hello , I am using a Java library called PDFBox trying to write text to a PDF. It works perfect for English text, but when i tried to write Russian text inside the PDF the letters appeared so strange. It seems the problem is in the font used, but i am not so sure about that, so i hope if anyone could guide me through this. Here is the important code lines : PDTrueTypeFont font = PDTrueTypeFont.loadTTF( pdfFile, new File( "fonts/VREMACCI.TTF" ) ); // Windows Russian font imported to write the Russian text. font.setEncoding( new WinAnsiEncoding() ); // Define the Encoding used in writing. // Some code here to open the PDF & define a new page. contentStream.drawString( "??????? ????????????" ); // Write the Russian text. The WinAnsiEncoding source code is : Click here --------------------- Edit on 18 November 2009 After some investigation, i am now sure it is an Encoding problem, this could be solved by defining my own Encoding using the helpful PDFBox class called DictionaryEncoding. I am not sure how to use it, but here is what i have tried until now : COSDictionary cosDic = new COSDictionary(); cosDic.setString( COSName.getPDFName("Ercyrillic"), "0420 " ); // Russian letter. font.setEncoding( new DictionaryEncoding( cosDic ) ); This does not work, as it seems i am filling the dictionary in a wrong way, when i write a PDF page using this it appears blank. The DictionaryEncoding source code is : Click here Thanks . . .

Read the article
Sending mail with a Php with a pdf attachment

- by Jake

Hi, I'm trying to send an email from the php mail command. I've been able to what I've tried so far, but can't seem to get it to work with an attachment. I've looked around the web and the best code I've found led me to this: $fileatt_name = 'JuneFlyer.pdf'; $fileatt_type = 'application/pdf'; $fileatt = 'JuneFlyer.pdf'; $file = fopen($fileatt,'rb'); $data = fread($file,filesize($fileatt)); $data = chunk_split(base64_encode($data)); $MAEmail = "[email protected]"; mail("$email_address", "$subject", "$message", "From: ".$MAEmail."\n". "MIME-Version: 1.0\n". "Content-type: text/html; charset=iso-8859-1". "--{$mime_boundary}\n" . "Content-Type: {$fileatt_type};\n" . " name=\"{$fileatt_name}\"\n" . "Content-Disposition: attachment;\n" . " filename=\"{$fileatt_name}\"\n" . "Content-Transfer-Encoding: base64\n\n" .$data. "\n\n" ); There are two problems when I do this. First, the contents of the email dissappear. Second, there is an error on the attachment. "Adobe Reader could not open June_flyer.pdf because it is either not a supported file type or because the file has been damaged (for example it was sent as an email attachment and wasn't correctly decoded)" Any ideas of how to deal with this? Thanks, JB

Read the article
Removing PDF attachments via itext

- by r00fus

I'm trying to remove attachments from a number of my pdf files (I can extract via pdftk, so preserving them is not an issue). I coded the following based on an example found from a google search: import java.io.FileOutputStream; import java.io.IOException; import com.lowagie.text.*; import com.lowagie.text.pdf.*; class pdfremoveattachment { public static void main(String[] args) throws IOException, DocumentException { if (args.length < 2) { System.out.println("Usage: java PDFremoveAttachments source.pdf target.pdf"); System.exit(1); } PdfReader sourcePDF = new PdfReader(args[0]); removeAttachments(sourcePDF); FileOutputStream targetFOS = new FileOutputStream(args[1]); PdfStamper targetPDF = new PdfStamper(sourcePDF,targetFOS); targetPDF.close(); } public static void removeAttachments(PdfReader reader) { PdfDictionary catalog = reader.getCatalog(); // System.out.println(catalog.getAsDict(PdfName.NAME)); // this would print "null" PdfDictionary names = (PdfDictionary)PdfReader.getPdfObject(catalog.get(PdfName.NAMES)); // System.out.println(names.toString()); //this returns NPE if( names != null) { PdfDictionary files = (PdfDictionary) PdfReader.getPdfObject(names.get(PdfName.FILEATTACHMENT)); if( files!= null ){ for( Object key : files.getKeys() ){ files.remove((PdfName)key); } files = (PdfDictionary) PdfReader.getPdfObject(names.get(PdfName.EMBEDDEDFILES)); } if( files!= null ){ for( Object key : files.getKeys() ){ files.remove((PdfName)key); } reader.removeUnusedObjects(); } } } } If anyone knows how I might be able to remove attachments, I'd greatly appreciate a reply.

Read the article
Generating PDF document using XSLT

- by Nair

I have one huge XML document. I have set of XSL representing each node in the XML. These XSL also have java script to generate the dynamic content. It uses images which are in seperate images folder and it uses fonts as well. At present, I have a program which displays all the nodes that can be transformed and user click on one of the node and the program performs XSLT and display the content in HTML format on IE screen. I want to write a program (.Net , C# or any .Net language) which will allow user to do XSLT tranform on all the available notes and create one PDF document. My initial requirement was to display all the document in the IE itself, so I reused the existing code, and foreach node, perform XSLT and then append it to the current HTML with a page break and it worked ok till we hit huge files. So the requirement changed to create one PDF file with all the nodes. I have couple of questions, 1. What is the best way to create PDF file using XSLT transformation? 2. Since the images are relative path, if we generate the XSLT in html and then write it to a output stream will it loose the images? 3. Will the font be preserved in the PDF document? Really appriciate if someone could point me to some good example that I can take and run with it. Thanks a lot.

Read the article
Display pdf file inline in Rails app

- by Martas

Hi, I have a pdf file attachment saved in the cloud. The file is attached using attachment_fu. All I do to display it in the view is: <%= image_tag @model.pdf_attachment.public_filename %> When I load the page with this code in the browser, it does what I want: it displays the attached pdf file. But only on Mac. On Windows, browsers will display a broken image placeholder. Chrome's Developer Tools report: "Resource interpreted as image but transferred with MIME type application/pdf." I also tried sending the file from controller: in PdfAttachmentController: def send_pdf_attachment pdf_attachment = PdfAttachment.find params[:id] send_file pdf_attachment.public_filename, :type => pdf_attachment.content_type, :file_name => pdf_attachment.filename, :disposition => 'inline' end in routes.rb: map.send_pdf_attachment '/pdf_attachments/send_pdf_attachment/:id', :controller => 'pdf_attachments', :action => 'send_pdf_attachment' and in the view: <%= send_pdf_attachment_path @model.pdf_attachment %> or <%= image_tag( send_pdf_attachment_path @model.pdf_attachment ) %> And that doesn't display the file on Mac (I didn't try on Windows), it displays the path: pdf_attachments/send_pdf_attachment/35 So, my question is: what do I do to properly display a pdf file inline? Thanks martin

Read the article
Which pdf elements could cause crashes?

- by Felixyz

This is a very general question but it's based on a specific problem. I've created a pdf reader app for the iPad and it works fine except for certain pdf pages which always crash the app. We now found out that the very same pages cause Safari to crash as well, so as I had started to suspect the problem is somewhere in Apple's pdf rendering code. From what I have been able to see, the crashing pages cause the rendering libraries to start allocating memory like mad until the app is killed. I have nothing else to help me pinpoint what triggers this process. It doesn't necessarily happen with the largest documents, or the ones with the most shapes. In fact, we haven't found any parameter that helps us predict which pages will crash and which not. Now we just discovered that running the pages through a consumer program that lets you merge docs gets rid of the problem, but I haven't been able to detect which attribute or element it is that is the key. Changing documents by hand is also not an option for us in the long run. We need to run an automated process on our server. I'm hoping someone with deeper knowledge about the pdf file format would be able to point me in a reasonable direction to look for document features that could cause this kind of behavior. All I've found so far is something about JBIG2 images, and I don't think we have any of those.

Read the article
PDF search on the iPhone

- by pt2ph8

After two days trying to read annotations from a PDF using Quartz, I've managed to do it and posted my code. Now I'd like to do the same for another frequently asked question: searching PDF documents with Quartz. Same situation as before, this question has been asked many times with almost no practical answers. So I need some pointers first, as I still haven't implemented this myself. What I tried: I tried using CGPDFScannerScan handling the TJ and Tj operators - returns the right text on some PDF, whereas on other documents it returns mostly random letters. Maybe it's related to text encoding? Someone pointed out that text blocks (marked by BT/ET operators) should be handled instead, but I still haven't managed to do so. Anyone managed to extract text from any PDF? After that, searching should be easy by storing all the text in a NSMutableString and using rangeOfString (if there's a better way please let me know). But then how to highlight the result? I know there are a few operators to find the glyph sizes, so I could calculate the resulting rect based on those values, but I've been reading the spec for hours... it's a bloated mess and I'm going insane. Anyone with a practical explanation? Thanks.

Read the article
Using JSON.NET for dynamic JSON parsing

- by Rick Strahl

With the release of ASP.NET Web API as part of .NET 4.5 and MVC 4.0, JSON.NET has effectively pushed out the .NET native serializers to become the default serializer for Web API. JSON.NET is vastly more flexible than the built in DataContractJsonSerializer or the older JavaScript serializer. The DataContractSerializer in particular has been very problematic in the past because it can't deal with untyped objects for serialization - like values of type object, or anonymous types which are quite common these days. The JavaScript Serializer that came before it actually does support non-typed objects for serialization but it can't do anything with untyped data coming in from JavaScript and it's overall model of extensibility was pretty limited (JavaScript Serializer is what MVC uses for JSON responses). JSON.NET provides a robust JSON serializer that has both high level and low level components, supports binary JSON, JSON contracts, Xml to JSON conversion, LINQ to JSON and many, many more features than either of the built in serializers. ASP.NET Web API now uses JSON.NET as its default serializer and is now pulled in as a NuGet dependency into Web API projects, which is great. Dynamic JSON Parsing One of the features that I think is getting ever more important is the ability to serialize and deserialize arbitrary JSON content dynamically - that is without mapping the JSON captured directly into a .NET type as DataContractSerializer or the JavaScript Serializers do. Sometimes it isn't possible to map types due to the differences in languages (think collections, dictionaries etc), and other times you simply don't have the structures in place or don't want to create them to actually import the data. If this topic sounds familiar - you're right! I wrote about dynamic JSON parsing a few months back before JSON.NET was added to Web API and when Web API and the System.Net HttpClient libraries included the System.Json classes like JsonObject and JsonArray. With the inclusion of JSON.NET in Web API these classes are now obsolete and didn't ship with Web API or the client libraries. I re-linked my original post to this one. In this post I'll discus JToken, JObject and JArray which are the dynamic JSON objects that make it very easy to create and retrieve JSON content on the fly without underlying types. Why Dynamic JSON? So, why Dynamic JSON parsing rather than strongly typed parsing? Since applications are interacting more and more with third party services it becomes ever more important to have easy access to those services with easy JSON parsing. Sometimes it just makes lot of sense to pull just a small amount of data out of large JSON document received from a service, because the third party service isn't directly related to your application's logic most of the time - and it makes little sense to map the entire service structure in your application. For example, recently I worked with the Google Maps Places API to return information about businesses close to me (or rather the app's) location. The Google API returns a ton of information that my application had no interest in - all I needed was few values out of the data. Dynamic JSON parsing makes it possible to map this data, without having to map the entire API to a C# data structure. Instead I could pull out the three or four values I needed from the API and directly store it on my business entities that needed to receive the data - no need to map the entire Maps API structure. Getting JSON.NET The easiest way to use JSON.NET is to grab it via NuGet and add it as a reference to your project. You can add it to your project with: PM> Install-Package Newtonsoft.Json From the Package Manager Console or by using Manage NuGet Packages in your project References. As mentioned if you're using ASP.NET Web API or MVC 4 JSON.NET will be automatically added to your project. Alternately you can also go to the CodePlex site and download the latest version including source code: http://json.codeplex.com/ Creating JSON on the fly with JObject and JArray Let's start with creating some JSON on the fly. It's super easy to create a dynamic object structure with any of the JToken derived JSON.NET objects. The most common JToken derived classes you are likely to use are JObject and JArray. JToken implements IDynamicMetaProvider and so uses the dynamic keyword extensively to make it intuitive to create object structures and turn them into JSON via dynamic object syntax. Here's an example of creating a music album structure with child songs using JObject for the base object and songs and JArray for the actual collection of songs:[TestMethod] public void JObjectOutputTest() { // strong typed instance var jsonObject = new JObject(); // you can explicitly add values here using class interface jsonObject.Add("Entered", DateTime.Now); // or cast to dynamic to dynamically add/read properties dynamic album = jsonObject; album.AlbumName = "Dirty Deeds Done Dirt Cheap"; album.Artist = "AC/DC"; album.YearReleased = 1976; album.Songs = new JArray() as dynamic; dynamic song = new JObject(); song.SongName = "Dirty Deeds Done Dirt Cheap"; song.SongLength = "4:11"; album.Songs.Add(song); song = new JObject(); song.SongName = "Love at First Feel"; song.SongLength = "3:10"; album.Songs.Add(song); Console.WriteLine(album.ToString()); } This produces a complete JSON structure: { "Entered": "2012-08-18T13:26:37.7137482-10:00", "AlbumName": "Dirty Deeds Done Dirt Cheap", "Artist": "AC/DC", "YearReleased": 1976, "Songs": [ { "SongName": "Dirty Deeds Done Dirt Cheap", "SongLength": "4:11" }, { "SongName": "Love at First Feel", "SongLength": "3:10" } ] } Notice that JSON.NET does a nice job formatting the JSON, so it's easy to read and paste into blog posts :-). JSON.NET includes a bunch of configuration options that control how JSON is generated. Typically the defaults are just fine, but you can override with the JsonSettings object for most operations. The important thing about this code is that there's no explicit type used for holding the values to serialize to JSON. Rather the JSON.NET objects are the containers that receive the data as I build up my JSON structure dynamically, simply by adding properties. This means this code can be entirely driven at runtime without compile time restraints of structure for the JSON output. Here I use JObject to create a album 'object' and immediately cast it to dynamic. JObject() is kind of similar in behavior to ExpandoObject in that it allows you to add properties by simply assigning to them. Internally, JObject values are stored in pseudo collections of key value pairs that are exposed as properties through the IDynamicMetaObject interface exposed in JSON.NET's JToken base class. For objects the syntax is very clean - you add simple typed values as properties. For objects and arrays you have to explicitly create new JObject or JArray, cast them to dynamic and then add properties and items to them. Always remember though these values are dynamic - which means no Intellisense and no compiler type checking. It's up to you to ensure that the names and values you create are accessed consistently and without typos in your code. Note that you can also access the JObject instance directly (not as dynamic) and get access to the underlying JObject type. This means you can assign properties by string, which can be useful for fully data driven JSON generation from other structures. Below you can see both styles of access next to each other:// strong type instance var jsonObject = new JObject(); // you can explicitly add values here jsonObject.Add("Entered", DateTime.Now); // expando style instance you can just 'use' properties dynamic album = jsonObject; album.AlbumName = "Dirty Deeds Done Dirt Cheap"; JContainer (the base class for JObject and JArray) is a collection so you can also iterate over the properties at runtime easily:foreach (var item in jsonObject) { Console.WriteLine(item.Key + " " + item.Value.ToString()); } The functionality of the JSON objects are very similar to .NET's ExpandObject and if you used it before, you're already familiar with how the dynamic interfaces to the JSON objects works. Importing JSON with JObject.Parse() and JArray.Parse() The JValue structure supports importing JSON via the Parse() and Load() methods which can read JSON data from a string or various streams respectively. Essentially JValue includes the core JSON parsing to turn a JSON string into a collection of JsonValue objects that can be then referenced using familiar dynamic object syntax. Here's a simple example:public void JValueParsingTest() { var jsonString = @"{""Name"":""Rick"",""Company"":""West Wind"", ""Entered"":""2012-03-16T00:03:33.245-10:00""}"; dynamic json = JValue.Parse(jsonString); // values require casting string name = json.Name; string company = json.Company; DateTime entered = json.Entered; Assert.AreEqual(name, "Rick"); Assert.AreEqual(company, "West Wind"); } The JSON string represents an object with three properties which is parsed into a JObject class and cast to dynamic. Once cast to dynamic I can then go ahead and access the object using familiar object syntax. Note that the actual values - json.Name, json.Company, json.Entered - are actually of type JToken and I have to cast them to their appropriate types first before I can do type comparisons as in the Asserts at the end of the test method. This is required because of the way that dynamic types work which can't determine the type based on the method signature of the Assert.AreEqual(object,object) method. I have to either assign the dynamic value to a variable as I did above, or explicitly cast ( (string) json.Name) in the actual method call. The JSON structure can be much more complex than this simple example. Here's another example of an array of albums serialized to JSON and then parsed through with JsonValue():[TestMethod] public void JsonArrayParsingTest() { var jsonString = @"[ { ""Id"": ""b3ec4e5c"", ""AlbumName"": ""Dirty Deeds Done Dirt Cheap"", ""Artist"": ""AC/DC"", ""YearReleased"": 1976, ""Entered"": ""2012-03-16T00:13:12.2810521-10:00"", ""AlbumImageUrl"": ""http://ecx.images-amazon.com/images/I/61kTaH-uZBL._AA115_.jpg"", ""AmazonUrl"": ""http://www.amazon.com/gp/product/…ASIN=B00008BXJ4"", ""Songs"": [ { ""AlbumId"": ""b3ec4e5c"", ""SongName"": ""Dirty Deeds Done Dirt Cheap"", ""SongLength"": ""4:11"" }, { ""AlbumId"": ""b3ec4e5c"", ""SongName"": ""Love at First Feel"", ""SongLength"": ""3:10"" }, { ""AlbumId"": ""b3ec4e5c"", ""SongName"": ""Big Balls"", ""SongLength"": ""2:38"" } ] }, { ""Id"": ""7b919432"", ""AlbumName"": ""End of the Silence"", ""Artist"": ""Henry Rollins Band"", ""YearReleased"": 1992, ""Entered"": ""2012-03-16T00:13:12.2800521-10:00"", ""AlbumImageUrl"": ""http://ecx.images-amazon.com/images/I/51FO3rb1tuL._SL160_AA160_.jpg"", ""AmazonUrl"": ""http://www.amazon.com/End-Silence-Rollins-Band/dp/B0000040OX/ref=sr_1_5?ie=UTF8&qid=1302232195&sr=8-5"", ""Songs"": [ { ""AlbumId"": ""7b919432"", ""SongName"": ""Low Self Opinion"", ""SongLength"": ""5:24"" }, { ""AlbumId"": ""7b919432"", ""SongName"": ""Grip"", ""SongLength"": ""4:51"" } ] } ]"; JArray jsonVal = JArray.Parse(jsonString) as JArray; dynamic albums = jsonVal; foreach (dynamic album in albums) { Console.WriteLine(album.AlbumName + " (" + album.YearReleased.ToString() + ")"); foreach (dynamic song in album.Songs) { Console.WriteLine("\t" + song.SongName); } } Console.WriteLine(albums[0].AlbumName); Console.WriteLine(albums[0].Songs[1].SongName); } JObject and JArray in ASP.NET Web API Of course these types also work in ASP.NET Web API controller methods. If you want you can accept parameters using these object or return them back to the server. The following contrived example receives dynamic JSON input, and then creates a new dynamic JSON object and returns it based on data from the first:[HttpPost] public JObject PostAlbumJObject(JObject jAlbum) { // dynamic input from inbound JSON dynamic album = jAlbum; // create a new JSON object to write out dynamic newAlbum = new JObject(); // Create properties on the new instance // with values from the first newAlbum.AlbumName = album.AlbumName + " New"; newAlbum.NewProperty = "something new"; newAlbum.Songs = new JArray(); foreach (dynamic song in album.Songs) { song.SongName = song.SongName + " New"; newAlbum.Songs.Add(song); } return newAlbum; } The raw POST request to the server looks something like this: POST http://localhost/aspnetwebapi/samples/PostAlbumJObject HTTP/1.1User-Agent: FiddlerContent-type: application/jsonHost: localhostContent-Length: 88 {AlbumName: "Dirty Deeds",Songs:[ { SongName: "Problem Child"},{ SongName: "Squealer"}]} and the output that comes back looks like this: { "AlbumName": "Dirty Deeds New", "NewProperty": "something new", "Songs": [ { "SongName": "Problem Child New" }, { "SongName": "Squealer New" } ]} The original values are echoed back with something extra appended to demonstrate that we're working with a new object. When you receive or return a JObject, JValue, JToken or JArray instance in a Web API method, Web API ignores normal content negotiation and assumes your content is going to be received and returned as JSON, so effectively the parameter and result type explicitly determines the input and output format which is nice. Dynamic to Strong Type Mapping You can also map JObject and JArray instances to a strongly typed object, so you can mix dynamic and static typing in the same piece of code. Using the 2 Album jsonString shown earlier, the code below takes an array of albums and picks out only a single album and casts that album to a static Album instance.[TestMethod] public void JsonParseToStrongTypeTest() { JArray albums = JArray.Parse(jsonString) as JArray; // pick out one album JObject jalbum = albums[0] as JObject; // Copy to a static Album instance Album album = jalbum.ToObject<Album>(); Assert.IsNotNull(album); Assert.AreEqual(album.AlbumName,jalbum.Value<string>("AlbumName")); Assert.IsTrue(album.Songs.Count > 0); } This is pretty damn useful for the scenario I mentioned earlier - you can read a large chunk of JSON and dynamically walk the property hierarchy down to the item you want to access, and then either access the specific item dynamically (as shown earlier) or map a part of the JSON to a strongly typed object. That's very powerful if you think about it - it leaves you in total control to decide what's dynamic and what's static. Strongly typed JSON Parsing With all this talk of dynamic let's not forget that JSON.NET of course also does strongly typed serialization which is drop dead easy. Here's a simple example on how to serialize and deserialize an object with JSON.NET:[TestMethod] public void StronglyTypedSerializationTest() { // Demonstrate deserialization from a raw string var album = new Album() { AlbumName = "Dirty Deeds Done Dirt Cheap", Artist = "AC/DC", Entered = DateTime.Now, YearReleased = 1976, Songs = new List<Song>() { new Song() { SongName = "Dirty Deeds Done Dirt Cheap", SongLength = "4:11" }, new Song() { SongName = "Love at First Feel", SongLength = "3:10" } } }; // serialize to string string json2 = JsonConvert.SerializeObject(album,Formatting.Indented); Console.WriteLine(json2); // make sure we can serialize back var album2 = JsonConvert.DeserializeObject<Album>(json2); Assert.IsNotNull(album2); Assert.IsTrue(album2.AlbumName == "Dirty Deeds Done Dirt Cheap"); Assert.IsTrue(album2.Songs.Count == 2); } JsonConvert is a high level static class that wraps lower level functionality, but you can also use the JsonSerializer class, which allows you to serialize/parse to and from streams. It's a little more work, but gives you a bit more control. The functionality available is easy to discover with Intellisense, and that's good because there's not a lot in the way of documentation that's actually useful. Summary JSON.NET is a pretty complete JSON implementation with lots of different choices for JSON parsing from dynamic parsing to static serialization, to complex querying of JSON objects using LINQ. It's good to see this open source library getting integrated into .NET, and pushing out the old and tired stock .NET parsers so that we finally have a bit more flexibility - and extensibility - in our JSON parsing. Good to go! Resources Sample Test Project http://json.codeplex.com/© Rick Strahl, West Wind Technologies, 2005-2012Posted in .NET Web Api AJAX Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Simple libxml2 HTML parsing example, using Objective-c, Xcode, and HTMLparser.h

- by Stu

Please can somebody show me a simple example of parsing some HTML using libxml. #import <libxml2/libxml/HTMLparser.h> NSString *html = @"<ul><li><input type=\"image\" name=\"input1\" value=\"string1value\" /></li><li><input type=\"image\" name=\"input2\" value=\"string2value\" /></li></ul><span class=\"spantext\"><b>Hello World 1</b></span><span class=\"spantext\"><b>Hello World 2</b></span>"; 1) Say I want to parse the value of the input whose name = input2. Should output "string2value". 2) Say I want to parse the inner contents of each span tag whose class = spantext. Should output: "Hello World 1" and "Hello World 2".

Read the article
How to parse invalid HTML with Perl?

- by bodacydo

I maintain a database of articles with HTML formatting. Unfortunately the editors who wrote articles didn't know proper HTML, so they often have written stuff like: <div class="highlight"><html><head></head><body><p>Note that ...</p></html></div> I tried using HTML::TreeBuilder to parse this HTML but after parsing it and dumping the resulting tree, all the elements between <div class="highlight">...</div> are gone. I'm left with just <div class="highlight"></div>. The editors often have also done things like: <div class="article"><style>@font-face { font-family: "Cambria"; }</style>Article starts here</div> Parsing this with HTML::TreeBuilder results in empty <div class="article"></div> again. Any ideas how to approach this broken HTML and actually make sense out of it?

Read the article
SSIS parsing of an irregular flat file?

- by ElHaix

I'm pretty familiar with SSIS parsing of regular delimited text data files, however, I'm looking for some advice on an approach to tackle a file that looks like this test file: ISA*00* *00* *01*220220220 *ZZ*RL CODE 01*060327*1212*U*00300*000008859*0*P*:~ GS*RA*CPA-BPT*LOCALUTILITY*060319*1212*970819003*X*003030~ ST*820*000000001~ BPR*C*321.91*C*X12*CBC*04*000300488**9918939***04*000300002**1598564*070319~ TRN*1*00075319970819105029~ REF*RR*0003199708190000174858~ DTM*097*070318~ DTM*107*070318~ N1*PR*DIRECT PAYMENT~ N1*PE*ABC CORPORATE BILLER*ZZ*90005836~ ENT*1~ N1*PR*BILLING - TEST - NATTRASS~ RMR*CR*0009381082105011**142.15~ REF*TN*000303965~ DTM*109*070316~ ENT*2~ N1*PR*BILL FREID TEST~ RMR*CR*0011010451800011**179.76~ REF*TN*000304189~ The 321.91 is the total of the transaction. I would prefer to do this with SSIS, but could also do create a C# parser. Suggestions would be appreciated. Thank you.

Read the article
Excel parsing (xls) date error

- by tau-neutrino

I'm working on a project where I have to parse excel files for a client to extract data. An odd thing is popping up here: when I parse a date in the format of 5/9 (may 9th) in the excel sheet, I get 39577 in my program. I'm not sure if the year is encoded here (it is 2008 for these sheets). Are these dates the number of days since some sort of epoch? Does anyone know how to convert these numbers to something meaningful? I'm not looking for a solution that would convert these properly at time of parsing from the excel file (we already have thousands of extracted files that required a human to select relevant information - re-doing the extraction is not an option).

Read the article
Adding text over existing PDFs using reportlab

- by Shane

I'm interested in filling out existing PDF forms programatically. All I really need to do is pull information from user input and then place the appropriate text over an existing PDF in the appropriate locations. I can already do this with reportlab by feeding the same sheet of paper into a printer, twice, but this just really rubs me the wrong way. I'm tempted to just personally reverse engineer each existing PDF and draw every line and character myself before adding the user-inputted text, but I wanted to check to see if there was an easy way to take an existing PDF and set it as a background for some extra text. I'd really prefer to use python as it's the only language I feel comfortable with. I also realize that I could just scan the document itself and use the resulting raster image as a background, but I would prefer the precision of vector graphics. It seems like ReportLab has a commercial product with this functionality, and the specific function I'm looking for is in it (copyPages) - but it seems like overkill to pay for a 4 figure product for a single, simple function for a nonprofit use.

Read the article

Search Results

Search found 7251 results on 291 pages for 'pdf parsing'.

Page 20/291 | < Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27 | Next Page >

- by Ockonal

- by Chandam

- by Moshe

- by None

- by HH

- by tejaswini

- by bito08

- by Ian

- by Sam Jackson

- by AllenG

- by Brad

- by Jason R. Coombs

- by Brad

- by Jake

- by r00fus

- by Nair

- by Martas

- by Felixyz

- by pt2ph8

- by Rick Strahl

- by Stu

- by bodacydo

- by ElHaix

- by tau-neutrino

- by Shane

< Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27 | Next Page >