How best to use XPath with very large XML files in .NET?

Posted by glenatron on Stack Overflow
Published on 2009-01-02T16:39:24Z

Filed under: c# | xpath | Xml | large-files | .NET

I need to do some processing on fairly large XML files (large here meaning potentially upwards of a gigabyte) in C#, including performing some complex XPath queries. The problem is that the standard way I would normally do this, through the System.Xml libraries, loads the whole file into memory before doing anything with it, which can cause memory problems with files of this size.

I don't need to update the files at all, just read them and query the data they contain. Some of the XPath queries are quite involved and span several levels of parent-child relationships; I'm not sure whether this will affect the ability to use a stream reader rather than loading the data into memory as a block.

One way I can see of making it work is to perform the simple analysis using a stream-based approach and perhaps wrap the XPath statements in XSLT transformations that I could run across the files afterward, although it seems a little convoluted.

Alternatively, I know that there are some elements that the XPath queries will not run across, so I guess I could break the document up into a series of smaller fragments based on its original tree structure, which could perhaps be small enough to process in memory without causing too much havoc.
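
To make the fragment idea concrete, here is a minimal sketch, not a tested solution for gigabyte-sized inputs. It assumes the queries never need to look outside one repeating element, and it uses XmlReader together with XNode.ReadFrom to materialise one fragment at a time before running a relative XPath expression against it. The class name FragmentXPath, the method names, and the fragmentName parameter are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper: names are illustrative, not from the original post.
public static class FragmentXPath
{
    // Streams the input and yields one XElement per element whose local name
    // is fragmentName, so only one fragment is materialised at a time.
    public static IEnumerable<XElement> StreamFragments(TextReader input, string fragmentName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == fragmentName)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() is needed.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Runs a relative XPath expression against each fragment in turn and
    // collects the string values of the matching elements.
    public static List<string> Query(TextReader input, string fragmentName, string xpath)
    {
        var results = new List<string>();
        foreach (XElement fragment in StreamFragments(input, fragmentName))
        {
            foreach (XElement hit in fragment.XPathSelectElements(xpath))
            {
                results.Add(hit.Value);
            }
        }
        return results;
    }
}
```

For a real file, a StreamReader over the file would be passed as the input, for example FragmentXPath.Query(new StreamReader(path), "order", "item[price > 10]/name"), where the element names are again made up. The main limitation is that each XPath expression is evaluated relative to the fragment element, so any query that needs ancestors or siblings outside the fragment will not work with this approach.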

I've tried to explain my objective here, so if I'm barking up entirely the wrong tree in terms of general approach, I'm sure you folks can set me right...

© Stack Overflow or respective owner
