Best way to split a string by word (SQL Batch separator)
- by Paul Kohler
I have a class I use to "split" a string of SQL commands by a batch separator - e.g. "GO" - into a list of SQL commands that are run in turn etc.
...
private static IEnumerable<string> SplitByBatchIndecator(string script, string batchIndicator)
{
string pattern = string.Concat("^\\s*", batchIndicator, "\\s*$");
RegexOptions options = RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Multiline;
foreach (string batch in Regex.Split(script, pattern, options))
{
yield return batch.Trim();
}
}
My current implementation uses a Regex with yield but I am not sure if it's the "best" way.
It should be quick
It should handle large strings (I have some scripts that are 10mb in size for example)
The hardest part (that the above code currently does not do) is to take quoted text into account
Currently the following SQL will incorrectly get split:
var batch = QueryBatch.Parse(@"-- issue...
insert into table (name, desc)
values('foo', 'if the
go
is on a line by itself we have a problem...')");
Assert.That(batch.Queries.Count, Is.EqualTo(1), "This fails for now...");
I have thought about a token based parser that tracks the state of the open closed quotes but am not sure if Regex will do it.
Any ideas!?