Mike Fal (b | t) is hosting this month's T-SQL Tuesday on Trick Shots. I love this choice because I've been preoccupied with sneaky/tricky/evil SQL Server stuff for a long time and have been presenting on it for the past year. Mike's directives were "Show us a cool trick or process you developed…It doesn’t have to be useful", which most of my blogging definitely fits, and "Tell us what you learned from this trick…tell us how it gave you insight in to how SQL Server works", which is definitely a new concept. I've done a lot of reading and watching on SQL Server Internals and even attended training, but sometimes I need to go explore on my own, using my own tools and techniques. It's an itch I get every few months, and, well, it sure beats workin'.
I've found some people to be intimidated by SQL Server's internals, and I'll admit there are A LOT of internals to keep track of, but there are tons of excellent resources that clearly document most of them, and show how knowing even the basics of internals can dramatically improve your database's performance. It may seem like rocket science, or even brain surgery, but you don't have to be a genius to understand it.
Although being an "evil genius" can help you learn some things they haven't told you about. ;)
This blog post isn't a traditional "deep dive" into internals; it's more of an approach to finding out how a program works. It utilizes an extremely handy tool from an even more extremely handy suite of tools, Sysinternals. I'm not the only one who finds Sysinternals useful for SQL Server: Argenis Fernandez (b | t), Microsoft employee and former T-SQL Tuesday host, has an excellent presentation on how to troubleshoot SQL Server using Sysinternals, and I highly recommend it. Argenis didn't cover the Strings.exe utility, but I'll be using it to "hack" the SQL Server executable (DLL and EXE) files.
Please note that I'm not promoting software piracy or the use of these techniques to attack SQL Server via internal knowledge. This is strictly educational and doesn't reveal any proprietary Microsoft information. And since Argenis works for Microsoft and demonstrated Sysinternals with SQL Server, I'll just let him take the blame for it. :P (The truth is I'd used Strings.exe on SQL Server before I ever met Argenis.)
Once you download and install Strings.exe you can run it from the command line. For our purposes we'll want to run this in the Binn folder of your SQL Server instance (I'm referencing SQL Server 2012 RTM):
cd "C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn"
C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn> strings *sql*.dll > sqldll.txt
C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn> strings *sql*.exe > sqlexe.txt
I've limited myself to DLLs and EXEs that have "sql" in their names. There are quite a few more but I haven't examined them in any detail. (Homework assignment for you!)
If you run this yourself you'll get 2 text files, one with all the extracted strings from every SQL DLL file, and the other with the SQL EXE strings. You can open these in Notepad, but you're better off using Notepad++, EditPad, Emacs, Vim or another more powerful text editor, as these will be several megabytes in size.
And when you do open it…you'll find…a TON of gibberish. (If you think that's bad, just try opening the raw DLL or EXE file in Notepad. And by the way, don't do this in production, or even on a running instance of SQL Server.) Even if you don't clean up the file, you can still use your editor's search function to find a keyword like "SELECT" or some other item you expect to be there. As dumb as this sounds, I sometimes spend my lunch break just scanning the raw text for anything interesting. I'm boring like that.
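If you're wondering what a strings-style tool is doing under the hood, the basic idea can be sketched in a few lines of Python. This is only a rough illustration and not how Strings.exe itself is implemented: the function name `extract_strings`, the 4-character minimum and the sample bytes are all assumptions made for the example, and it only looks for runs of plain single-byte ASCII.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of printable ASCII that is at least min_len bytes long."""
    # 0x20-0x7e covers space through tilde, i.e. the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up bytes standing in for a slice of a binary file (an assumption, not real DLL content).
sample = b"\x00\x01MaxDOPSetToOne\x00\xff\x10ab\x00sp_recompile\x00"
print(extract_strings(sample))  # ['MaxDOPSetToOne', 'sp_recompile']
```

The short run `ab` is dropped because it falls under the minimum length, which is how most of the noise gets filtered out. To try it on a real file you could pass in the result of `open(path, "rb").read()`, preferably against a copy of the file rather than one belonging to a running instance.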
Sometimes though, having these files available can lead to some incredible learning experiences. For me the most recent time was after reading Joe Sack's post on non-parallel plan reasons. He mentions a new SQL Server 2012 execution plan element called NonParallelPlanReason, and demonstrates a query that generates "MaxDOPSetToOne". Joe (formerly on the Microsoft SQL Server product team, so he knows this stuff) mentioned that this new element was not currently documented and tried a few more examples to see what other reasons could be generated.
Since I'd already run Strings.exe on the SQL Server DLLs and EXE files, it was easy to run grep/find/findstr for MaxDOPSetToOne on those extracts. Once I found which file it belonged to (sqlmin.dll), I opened the text to see if the other reasons were listed. As you can see in my comment on Joe's blog, there were about 20 additional non-parallel reasons. And while it's not "documentation" of this underdocumented feature, the names are pretty self-explanatory about what can prevent parallel processing. I especially like the ones about cursors (more ammo!) and am curious about the PDW compilation and Cloud DB replication reasons.
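The "find the keyword, then look at its neighbours" step can be sketched in Python as well. This is a minimal sketch under stated assumptions: `find_with_context` is a hypothetical helper, and the `extract` list is a made-up stand-in for a few lines of an extract file (only MaxDOPSetToOne and NoParallelHekatonPlan are real names from this post; the rest are placeholders).

```python
def find_with_context(lines, keyword, context=2):
    """Yield (line_number, nearby_lines) for each case-insensitive match of keyword."""
    needle = keyword.lower()
    for index, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, index - context)
            # Line numbers are 1-based, like the ones a text editor shows.
            yield index + 1, lines[start:index + context + 1]


# Placeholder lines standing in for an extract; not real contents of sqlmin.dll.
extract = [
    "x9$k",
    "PlaceholderReasonA",
    "MaxDOPSetToOne",
    "NoParallelHekatonPlan",
    "@@q!",
]

for line_number, nearby in find_with_context(extract, "maxdopsettoone", context=1):
    print(line_number, nearby)
```

Against a real extract you would load the lines first, for example with `open("sqldll.txt", errors="replace").read().splitlines()`. Showing a little context around each hit is the useful part, since related names tend to sit next to each other in the extracted text.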
One reason completely stumped me: NoParallelHekatonPlan. What the heck is a hekaton? Google and Wikipedia were vague, and the top results were not in English. I found one reference to Greek, stating "hekaton" can be translated as "hundredfold"; with a little more Wikipedia-ing this leads to hecto, the prefix for "one hundred" as a unit of measure. I'm not sure why Microsoft chose hekaton for such a plan name, but having already learned some Greek I figured I might as well dig some more in the DLL text for hekaton. Here's what I found:
hekaton_slow_param_passing
Occurs when a Hekaton procedure call dispatch goes to slow parameter passing code path
The reason why Hekaton parameter passing code took the slow code path
hekaton_slow_param_pass_reason
sp_deploy_hekaton_database
sp_undeploy_hekaton_database
sp_drop_hekaton_database
sp_checkpoint_hekaton_database
sp_restore_hekaton_database
e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\hkproc.cpp
e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matgen.cpp
e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matquery.cpp
e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\sqlmeta.cpp
e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\resultset.cpp
Interesting! The first 4 entries mention parameters and "slow code". Could this be the foundation of the mythical DBCC RUNFASTER command? Have I been passing my parameters the slow way all this time?
And what about those sp_xxxx_hekaton_database procedures? Could THEY be the secret to a faster SQL Server? Could they promise a "hundredfold" improvement in performance? Are these special, super-undocumented DIB (databases in black)?
I decided to look in the SQL Server system views for any objects with hekaton in the name, or references to them, in hopes of discovering some new code that would answer all my questions:
SELECT name FROM sys.all_objects
WHERE name LIKE '%hekaton%';
SELECT name FROM sys.all_objects
WHERE object_definition(OBJECT_ID) LIKE '%hekaton%';
Which revealed:
name
------------------------
(0 row(s) affected)
name
------------------------
sp_createstats
sp_recompile
sp_updatestats
(3 row(s) affected)
Hmm. Well that didn't find much. Looks like these procedures are seriously undocumented, unknown, perhaps forbidden knowledge. Maybe a part of some unspeakable evil?
(No, I'm not paranoid, I just like mysteries and thought that punching this up with that kind of thing might keep you reading. I know I'd fall asleep without it.)
OK, so let's check out those 3 procedures and see what they reveal when I search for "Hekaton":
sp_createstats:
-- filter out local temp tables, Hekaton tables, and tables for which current user has no permissions
-- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables
OK, that's interesting, let's go looking down a little further:
((@table_type<>'U') or (0 = OBJECTPROPERTY(@table_id, 'TableIsInMemory'))) and -- Hekaton table
Wellllll, that tells us a few new things:
There's such a thing as Hekaton tables (UPDATE: I'm not the only one to have found them!)
They are not standard user tables and probably not in memory. (UPDATE: I misinterpreted this because I didn't read all the code when I wrote this blog post.)
The OBJECTPROPERTY function has an undocumented TableIsInMemory option
Let's check out sp_recompile:
-- (3) Must not be a Hekaton procedure.
And once again go a little further:
if (ObjectProperty(@objid, 'IsExecuted') <> 0 AND
ObjectProperty(@objid, 'IsInlineFunction') = 0 AND
ObjectProperty(@objid, 'IsView') = 0 AND
-- Hekaton procedure cannot be recompiled
-- Make them go through schema version bumping branch, which will fail
ObjectProperty(@objid, 'ExecIsCompiledProc') = 0)
And now we learn that Hekaton procedures also exist, they can't be recompiled, there's a "schema version bumping branch" somewhere, and OBJECTPROPERTY has another undocumented option, ExecIsCompiledProc. (If you experiment with this you'll find this option returns NULL; I think it only works when called from a system object.)
This is neat! Sadly, sp_updatestats doesn't reveal anything new; its comments about Hekaton are the same as those in sp_createstats. But we've ALSO discovered undocumented features for the OBJECTPROPERTY function, which we can now search for:
SELECT name, object_definition(OBJECT_ID) FROM sys.all_objects
WHERE object_definition(OBJECT_ID) LIKE '%OBJECTPROPERTY(%'
I'll leave that to you as more homework. I should add that searching the system procedures was recommended long ago by the late, great Ken Henderson, in his Guru's Guide books, as a great way to find undocumented features. That seems to be really good advice!
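If you save the definitions that query returns, a few lines of Python can list which property names get passed to OBJECTPROPERTY, which makes the unfamiliar ones easier to spot. This is a sketch under assumptions: `property_names` and the regular expression are illustrative only, and the snippet is limited to the lines quoted earlier in this post rather than a full procedure definition.

```python
import re

# Matches OBJECTPROPERTY(<anything up to the first comma>, '<PropertyName>'), in any letter case.
PROPERTY_CALL = re.compile(r"OBJECTPROPERTY\s*\(\s*[^,]+,\s*'(\w+)'\s*\)", re.IGNORECASE)


def property_names(definition: str) -> list[str]:
    """Return the distinct property names passed to OBJECTPROPERTY, sorted."""
    return sorted(set(PROPERTY_CALL.findall(definition)))


# Only the lines quoted above from sp_createstats and sp_recompile.
snippet = """
((@table_type<>'U') or (0 = OBJECTPROPERTY(@table_id, 'TableIsInMemory'))) and -- Hekaton table
if (ObjectProperty(@objid, 'IsExecuted') <> 0 AND
ObjectProperty(@objid, 'IsInlineFunction') = 0 AND
ObjectProperty(@objid, 'IsView') = 0 AND
ObjectProperty(@objid, 'ExecIsCompiledProc') = 0)
"""

print(property_names(snippet))
```

Comparing a list like this against the documented OBJECTPROPERTY options is then a manual step; the sketch only gathers the candidates.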
Now if you're a programmer/hacker, you've probably been drooling over the last 5 entries for hekaton, because these are the names of source code files for SQL Server! Does this mean we can access the source code for SQL Server? As The Oracle suggested to Neo, can we return to The Source???
Actually, no.
Well, maybe a little bit. While you won't get the actual source code from the compiled DLL and EXE files, you'll get references to source files, debugging symbols, variables and module names, error messages, and even the startup flags for SQL Server. And if you search for "DBCC" or "CHECKDB" you'll find a really nice section listing all the DBCC commands, including the undocumented ones. Granted, those are pretty easy to find online, but you may be surprised what those web sites DIDN'T tell you! (And neither will I. Go look for yourself!) And as we saw earlier, you'll also find execution plan elements, query processing rules, and who knows what else. It's also instructive to see how Microsoft organizes their source directories and how various components (storage engine, query processor, Full Text, AlwaysOn/HADR) are split into smaller modules. There are over 2000 source file references, so go do some exploring!
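One way to start exploring those source file references is to pull the paths out of an extract and count them by directory. Here is a minimal Python sketch, assuming the paths look like the five Hekaton ones listed earlier (a drive letter, backslashes and a .cpp extension); `count_by_directory` and the pattern are hypothetical, and other file extensions would need the pattern adjusted.

```python
import re
from collections import Counter
from pathlib import PureWindowsPath

# A drive letter, a backslash, then non-space characters up to a .cpp extension.
SOURCE_PATH = re.compile(r"[A-Za-z]:\\\S+?\.cpp\b")


def count_by_directory(lines):
    """Count source file references per parent directory."""
    counts = Counter()
    for line in lines:
        for path in SOURCE_PATH.findall(line):
            # PureWindowsPath parses Windows paths on any operating system.
            counts[str(PureWindowsPath(path).parent).lower()] += 1
    return counts


# The five paths found earlier, plus one non-path line that should be ignored.
sample = [
    r"e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\hkproc.cpp",
    r"e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matgen.cpp",
    r"e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matquery.cpp",
    r"e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\sqlmeta.cpp",
    r"e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\resultset.cpp",
    "sp_drop_hekaton_database",
]

for directory, count in count_by_directory(sample).most_common():
    print(count, directory)
```

Run against a full extract, a tally like this gives a rough map of which directories contribute the most files, which is a reasonable place to begin browsing.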
So what did we learn? We can pull strings out of executable files, search them for known items, browse them for unknown items, and use the results to examine internal code to learn even more things about SQL Server. We've even learned how to use command-line utilities! We are now 1337 h4X0rz! (Not really. I hate that leetspeak crap.)
Although, I must confess I might've gone too far with the "conspiracy" part of this post. I apologize for that, it's just my overactive imagination. There's really no hidden agenda or conspiracy regarding SQL Server internals. It's not The Matrix. It's not like you'd find anything like that in there:
Attach Matrix Database
DM_MATRIX_COMM_PIPELINES
MATRIXXACTPARTICIPANTS
dm_matrix_agents
Alright, enough of this paranoid ranting! Microsoft are not really evil! It's not like they're The Borg from Star Trek:
ALTER FEDERATION DROP
ALTER FEDERATION SPLIT
DROP FEDERATION
#tsql2sday