Separating text strings into a table of individual words in SQL via XML.

Posted by Phil Factor on Simple Talk See other posts from Simple Talk or by Phil Factor
Published on Tue, 13 May 2014 14:35:11 +0000 Indexed on 2014/05/26 21:56 UTC
Read the original article Hit count: 269

Filed under:

Nearly nine years ago, Mike Rorke of the SQL Server 2005 XML team blogged ‘Querying Over Constructed XML Using Sub-queries’. I remember reading it at the time without being able to think of a use for what he was demonstrating. Just a few weeks ago, whilst preparing my article on searching strings, I got out my trusty function for splitting strings into words and something reminded me of the old blog.

I’d been trying to think of a way of using XML to split strings reliably into words. The routine I devised turned out to be slightly slower than the iterative word chop I’ve always used in the past, so I didn’t publish it. It was then I suddenly remembered the old routine. Here is my version of it.

I’ve unwrapped it from its obvious home in a function or procedure just so it is easy to appreciate. What it does is to chop a text string into individual words using XQuery and the good old nodes() method. I’ve benchmarked it and it is quicker than any of the SQL ways of doing it that I know about. Obviously, you can’t use the trick I described here to do it, because it is awkward to use REPLACE() on 1…n characters of whitespace. I’ll carry on using my iterative function since it is able to tell me the location of each word as a character-offset from the start, and also because this method leaves punctuation in (removing it takes time!). However, I can see other uses for this in passing lists as input or output parameters, or as return values.

 

if exists (Select * from sys.xml_schema_collections where name like 'WordList')

  drop XML SCHEMA COLLECTION WordList

go

create xml schema collection WordList as '

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="words">

       <xs:simpleType>

              <xs:list itemType="xs:string" />

       </xs:simpleType>

</xs:element>

</xs:schema>'

 

go

 

DECLARE @string VARCHAR(MAX)

–we'll get some sample data from the great Ogden Nash

Select @String='This is a song to celebrate banks,

Because they are full of money and you go into them and all

you hear is clinks and clanks,

Or maybe a sound like the wind in the trees on the hills,

Which is the rustling of the thousand dollar bills.

Most bankers dwell in marble halls,

Which they get to dwell in because they encourage deposits

and discourage withdrawals,

And particularly because they all observe one rule which woe

betides the banker who fails to heed it,

Which is you must never lend any money to anybody unless

they don''t need it.

I know you, you cautious conservative banks!

If people are worried about their rent it is your duty to deny

them the loan of one nickel, yes, even one copper engraving

of the martyred son of the late Nancy Hanks;

Yes, if they request fifty dollars to pay for a baby you must

look at them like Tarzan looking at an uppity ape in the

jungle,

And tell them what do they think a bank is, anyhow, they had

better go get the money from their wife''s aunt or ungle.

But suppose people come in and they have a million and they

want another million to pile on top of it,

Why, you brim with the milk of human kindness and you

urge them to accept every drop of it,

And you lend them the million so then they have two million

and this gives them the idea that they would be better off

with four,

So they already have two million as security so you have no

hesitation in lending them two more,

And all the vice-presidents nod their heads in rhythm,

And the only question asked is do the borrowers want the

money sent or do they want to take it withm.

Because I think they deserve our appreciation and thanks,

the jackasses who go around saying that health and happi-

ness are everything and money isn''t essential,

Because as soon as they have to borrow some unimportant

money to maintain their health and happiness they starve

to death so they can''t go around any more sneering at good

old money, which is nothing short of providential. '

 

–we now turn it into XML

declare @xml_data xml(WordList) 

set @xml_data='<words>'+ replace(@string,'&', '&amp;')+'</words>'

 

 select T.ref.value('.', 'nvarchar(100)')

 from (Select @xml_data.query('

                     for $i in data(/words) return

                     element li { $i }

              '))  A(list)

cross apply A.List.nodes('/li') T(ref)

 

 

…which gives (truncated, of course)…

Clip2

© Simple Talk or respective owner

Related posts about Uncategorized