Finding Common Phrases in SQL Server TEXT Column
- by regex
Short Desc:
I'm curious to see if I can use SQL Analysis services or some other SQL Server service to mine some data for me that will show commonalities between SQL TEXT fields in a dataset.
Long Desc
I am looking at a subset of data that consists of about 10,000 rows of TEXT blobs which are used as a notes column in a issue tracking (ticketing) software. I would like to use something out of the box (without having to build something) that might be able to parse through all of the rows and find commonly used byte sequences in the "Notes" column. In other words, I want to find commonly used phrases (two to three word phrases, so 9 - 20 character sections of the TEXT blob). This will help me better determine if associate's notes contain similar phrases (troubleshooting techniques) that we could standardize in our troubleshooting process flow.
Closing Note
I'd really rather not build an application to do this as my method will probably not be the most efficient way to do it.
Hopefully all this makes sense. Please let me know in the comments if anything needs clarification.