Check Your BIOS Settings

In March, I had just received three new servers for a new cluster (Dell R910s) I was creating for our flagship application’s database.  I had just gotten the cluster setup and had moved a copy of our Production database to the cluster to test it out before we migrated to it officially.

The next week I was sitting in the SQLskills Immersion Event in Tampa when I get a page and e-mail with a Sev 24, Error 832.  This error can mean something truly bad has happened, that the memory has corrupted the data while in the Buffer Cache.

I kicked off a CheckDB to see if the corruption was anywhere else in the database, but since I was sitting in the classroom with Paul Randal (Blog | Twitter), the corruption guru himself, I shot him an e-mail letting him know what I was seeing and asking if he had any suggestions.  He suggested that the corruption probably wasn’t on disk, since the 832 specifically means that the corruption happened after the time the page was read into memory, so that was somewhat of a relief.  Since we suspected the box had bad memory, I flipped the instance to another node in the cluster and restarted the CheckDB.

Since this cluster was still in test, there was no real concern from a business perspective, but I wanted my infrastructure guys to test the box as soon as possible, in case there was a hardware issue that would delay the roll-out to production coming up.  So, they started running memory tests as well as all the other Dell diagnostics.  We checked BIOS versions, Windows patches, memory over and over again, and nothing looked wrong, at all.

Finally, after working with Dell Support for several hours, they found that the problem was two BIOS settings, that would normally be disabled for our servers, were enabled.  C-State (CPU States) and C1E (Processor Core Speed and Voltage) were both configured, which was causing problems with the stability of the RAM.  These had been enabled because the server in question had been bought off of the Dell Outlet, and the previous owner must have enabled them, because they are not on by default.

After they disabled those two settings in the BIOS, the server has been stable ever since.  It certainly gave me a scare, but I learned a lesson from it: double-check your BIOS settings, and make sure you know what those settings really mean.

Intellisense Space-Completion Change in SSMS 2012

Microsoft added Intellisense to SSMS in 2008 and many of us have come to love this feature.  However, in 2012, one of my co-workers noticed a problem (from his perspective) in the Intellisense completion action.  The default configuration of SSMS 2012 is such that you can no longer complete an Intellisense selection using a “Space”.  I never noticed this issue because I use “Tab” to do my completions, which still works correctly.  My co-worker was further annoyed to find that there is no way to change this in the SSMS “Tools–>Options” dialog, as you might expect.  However, there is a fix available.  As noted in this StackOverflow Article, if you hit CTRL+ALT+SPACE, the input mode is changed and space works once again.  I hope this helps someone who may have been frustrated by this.

Problems with TCP Chimney Offload (TOE)

Over the past few years, several of my servers have had odd intermittent network connection issues where user sessions would “Time-Out” immediately, even when their timeout was set to 30 seconds (or more).  After digging into the problem from multiple angles with my network guys, I stumbled onto an article that showed promise.  I am re-publishing this information here for quick reference and in case someone else has a need to investigate this as a possible issue/solution.  Please see this TechNet entry for more information.

The TCP Offload Engine (TOE) is intended to alleviate problems with high connections by offloading certain workloads onto hardware.  While this is a great idea in theory, in practice it can sometimes cause more problems than solutions.  It is fairly non-interruptive to try turning the TCP Chimney Offload off to test if this is a cause of a strange network connection issue you may be having, so it is worth a try.

First, check to see if you have TCP Chimney Offload enabled:

netsh int tcp show global

A list of global parameters will be returned, look for the one that says, “Chimney Offload State”.  If it is set to anything other than “disabled”, try setting the parameter to “disabled”.  You will need to be in an elevated command prompt to issue the following command:

netsh int tcp set global chimney=disabled

After issuing this command, observe your system to see if you have a change, either for better or worse, with regard to your network connections.  Again, please see this TechNet entry for more information.

Command Line and Quotes: Fun with C#

Most DBAs don’t get to jump into programming often, but I came from a development background, so I try to use some legitimate code every once in a while.  It helps me keep that skill-set fresh.  Recently, in my company, we had a need for a unified solution for dealing with archiving, deleting, compressing, and so forth of various types of files.  Having an application that can be passed a bunch of parameters and can be triggered by a simple scheduled task is very useful in many aspects of IT.

We used to use ABAKT for these sort of operations, but the project was abandoned a long time ago and we wanted something that we could add to as we needed.  So, I had a few hours of downtime one weekend and decided to write one.  Everything went very well with the project and it worked almost flawlessly.  There was just one problem I ran into that stumped me for a couple days.

The problem cropped up when I was trying to pass a UNC to the application that had a space in it.  Such as:

\\someserver.mydomain\theShare\folder\some folder with spaces\anotherFolder\

Initially, I thought, well that’s no problem, just encapsulate the string in double-quotes and we are fine, as so:

"\\someserver.mydomain\theShare\folder\some folder with spaces\anotherFolder\"

However, the command line interface interprets that last \” as an escaped double-quote and NOT a closing double-quote to the beginning of the UNC.  This, in turn, causes my C# argument parser to see the string as 4 parameters, rather than 1 long one.  The parameters passed to my C# application would be:

  • \\someserver.mydomain\theShare\folder\some
  • folder
  • with
  • spaces\anotherFolder”

The way this issue presented really threw me off track several times.  I couldn’t figure out why my parameters were so messed up.  My solution was to escape out the last backslash, this worked perfectly:

"\\someserver.mydomain\theShare\folder\some folder with spaces\anotherFolder\\"

A little deviation from SQL stuff, but some useful information I wish I could have found when i was working on this project.

Fundamentals Matter

I am currently interviewing candidates for a new Junior DBA at my company.  Even though the position for which I am hiring is a junior position, I do expect the candidates to have a base level of technology, know the basics of server administration, and have some strong critical thinking skills.  So, to test the way each potential employee thinks, I ask a lot of intermediate to sometimes advanced SQL questions.  My thinking is that, 1) I want to know what level the candidate is really at, and 2) I want to discover how they respond to questions they haven’t faced before.  Have they dug down deep into the material, trying to learn as much as they could?  Have they been casual observers of technology?  Have they tried to test out ideas they’ve heard?  I certainly don’t consider my questions pass/fail, they are just an indication of where the person sitting across from me is in their technical career.

I have had several candidates in so far and a few of them have come to me via a recruiter.  After I had wrapped up with the recruiter regarding some of the candidates, I had mentioned to him that some of them, while completely qualified for this position, may want to stretch themselves a bit, because I see a lot of potential there.  He asked if i could clarify, so I started trying to put it into a concise idea.  I realized that the core of what I was trying to get across was simple, people who are starting into their technical career need to learn how to think about technology, not just learn the syntax and the base skills.  So, I came up with these suggestions, mostly specific to SQL Server, that I thought would get the gears turning about how to start digging.  These are not an end unto themselves, but simple stepping stones into building a technology paradigm; in this case, using the vehicle of SQL Server.

  1. Dig into how the syntax works.  Start by opening up SSMS and going through the different wizards.  Select different options and parameters, and then click the “Script” button at the top of the dialog to produce a script that you can read through.  Start working through it and find which parts of the syntax are new to you.  If you find a parameter you are not familiar with, or if you come across an entire group of keywords you have never used before, start researching them.  Find out what the code is actually doing.  Books Online is a great resource to start reading up on the syntax, but then expand your research and test it out for yourself.  Books Online DOES have errors in it, as do hundreds of blogs, forums, and twitter-feeds.  All of this information is coming from some human, double-check their work.
  2. Learn the first layer of Internals.  Read up on how the Transaction Log functions and the role it plays in the database.  Learn what VLFs are and how they work within the Transaction Log.  Learn the process data goes through when queried, updated, and deleted.  There is a lot going on here and understanding the basic principles will help you immensely.
  3. Start building a paradigm.  I like to think that there is no technology that we posses today that was not built up through a quasi-logical process, placing one building block on another along the way.  This enables me to build a paradigm that I can start plugging new information into.  The more robust and thorough my model is, the better I can assimilate new ideas.  If I come across something that doesn’t fit, I either have to recognize the failing in my paradigm and adjust my model to be more correct, or I have to find out why that new information is wrong.  Those are the only two options I see, either I don’t understand something as well as I need to, or the new information is not completely accurate.  It is up to me to find out which is true.  Your brain may not work the exact same way that mine does, but having some type of model in your head of how different systems interact, helps you predict functionality and troubleshoot anomalies.
  4. Start to recognize false information.  A lot of ideas get passed around the SQL community, and some of them are just straight-out wrong.  There are some DBAs who believe everything they read or hear, this leads to a lot of myths being shoveled back and forth that are not true, or are not true in all cases.  Knowing this information, or at least being familiar with it, will give you a leg-up.  Start with Paul Randal’s “Common SQL Server Myths” document.  This is a great summary of some of the information out there that is just plain wrong, and it will give you some good tools to start watching out for bad information that doesn’t fit into your paradigm.

Applying this type of thought to different areas of technology can help you start to think critically and proactively about the technologies that interest you most.  So, when I speak of fundamentals, I am not talking about features and syntax, I am speaking of the fundamentals of being a technologist.  This is a reminder that I myself need to come back to again and again, and I hope it can help someone else as well.

SQLskills Immersion Event 3 Wrap-Up

I am back home after my 3rd SQLskills Immersion Event. This has been a very full week of deep learning,  hundreds of questions, some fun networking, and intelligent (and sometimes sophomoric) discussions ranging from deep dives on SQL Engine internals to “the answer to life, the universe, and everything”.

If you have never attended a SQLskills Immersion Event and you are serious about SQL Server, let me please encourage you to sign up and attend, starting with IE1 (internals and performance). While there is no requirement to take the classes in order, I can attest to the value of doing so. Paul (Blog | Twitter) and Kimberly (Blog | Twitter) lay a very strong framework for everything else in that first week and then they, along with Jonathan (Blog | Twitter) and Joe (Blog | Twitter), build on that framework for the next two classes. On at least 20 different occasions, someone who had skipped the first class would ask a question or be confused on a point that had been explained in detail at the previous event. I understand that the names of IE2 or IE3 may sound more appealing to you, because of something you are facing in your current job, but some of the lessons in these classes are at the 400/500 level, you will benefit from having that deep understanding before progressing.

There were several highlights from this week that I would like to share. Restore strategies, including Kimberly’s USB drive Partial Database Availability and Online Piece-Meal Restore (she goes into more depth in the class than she did at her PASS Summit session). Paul went through a ton of corruption situations and the most productive steps to take to wade through the process of digging out of them, each and every one of them starting with, “Do Not Panic”. Jonathan showed us some great tips and lessons learned regarding Clustering Strategies (my favorite quote from that module: “I carry at least 5 separate clusters with me on my laptop at all times”, what a bragger! :) ). And Joe, SQLskills’ latest addition, did a fantastic job representing the internals of replication and mirroring. There was a lot more, but it would be unreasonable to try to list it all here, just go check them out for yourself.

After these three weeks I feel like I understand what I do on a daily basis even better and have garnered great tips to make my environment more robust and prepare me for new challenges as they arise.
In addition to all of the great learning, I also had a great time hanging out with the SQLskills group (if you don’t know them, they are genuinely awesome people). I also met a lot of great people from the class. Almost all of the attendees are folks who are deeply interested in SQL top-to-bottom, but also have a wide range of other interests, and love to chat about everything from the LHC at CERN to TopGear. I got to know several of them quite well and they were a lot of fun to chat with and bounce ideas off of.

To wrap up, I would highly recommend the Immersion classes. Don’t worry about your skill-set going into IE1, if you are eager to learn, can focus on the classes, and work on applying the knowledge both at work and in the many labs they give you to work through, you will be able to follow everything they are talking about and drastically round out your SQL knowledge.

The Day Before SQLskills Immersion Event 3 – High Availability and Disaster Recovery

Tomorrow starts my third SQLskills Immersion Event.  I took the first two in Chicago (May & Oct 2011), and they were both phenomenal!  They are led by my friends, Paul Randal (Blog | Twitter), Kimberly Tripp (Blog | Twitter), Jonathan Kehayias (Blog | Twitter), and Joe Sack (Blog | Twitter).  So, we have all gathered here in Tampa, FL to improve our knowledge.

Each class has been chocked full of great knowledge with tons of very deep dives and little-known facts.  I am expecting IE3 to be no different.  HA and DR are a huge part of my job and I am always trying to improve on these aspects of my databases.  I am hoping to walk away from this class with new tips, tricks, and ideas to not only improve my current environment, but grow my understanding of SQL and improve my “craft” in general.  One thing is certain:  if you are at a SQLskills Immersion Event, you are serious about deepening your knowledge of SQL.

One extra-curricular note about SQLskills Immersion Events: when they say “Immersion” they mean it!  You have the option to spend a lot of time with your fellow students as well as the teachers.  I love connecting with others in the community, visiting with those I’ve met before, and meeting new people as well.   It is very refreshing to spend time with others who not only love SQL, but also love talking about a wide range of other topics; sometimes we even get into a little philosophy.  :-D

I plan to post an update later in the week or at the end of the event.  Until then, I’ll be fully “immersed” in SQL knowledge!

What My Friends Think I Do…

I had to whip this up to add to the evil Meme Noise.  Enjoy, or Sorry, depending on your reaction:

Database Administrators: What My Friends Think I Do…

Dealing with Big Data – T-SQL Tuesday #027

T-SQL Tuesday

As part of T-SQL Tuesday, I wanted to discuss one of the experiences I have had with dealing with Big Data in SQL.

Over the course of the last few years, our Development group has created a new flagship product that our entire enterprise is now using for day-to-day activities.  It is a system that allows the business users to have direct power over the way their data is calculated and manipulated without us having to make direct changes to compiled code or database procedures.  As a result, we had to create an extremely abstracted data model, in which no data can really be grouped by any kind of “business” idea, and only by general types of data, such as “inputs”, “calculated data”, and “configuration”.  This is all fed into our system from various sources, recorded, versioned, calculated, and reported out through various methods.

It is also a requirement that we keep every version of any data we receive or calculate, as well as every version of the “equations” our business users have created, referencing these data.  This enables them to better explain and modify business decisions based on the data available at the time that operations made the decisions they did.  It also makes audits and compliance checking completely traceable and transparent.

As you may realize, this makes for an enormously complex, but flexible, and powerful, system.  It also creates a very quickly growing amount of data.  This system has only been in production for 13 months so far, and we already have added 3.2 TB of data (with about 6 billion rows in our largest table) and the growth-rate is increasing as we incorporate more areas of business into this model.  We are currently experiencing a growth of over 350GB per month in our production database and we are expecting this to continue to expand to over 500 GB per month by the end of the year.  Due to the nature of our industry, the data is continually resettled by the market and this is done heavily for the first 6 months.  So, the data for 6 months ago will be at least 4 times as large as the data that is processed today.  This is, by no means, the largest database or the largest growth rates out there, but for us, it’s a huge jump from our legacy systems.

So, we have had to rethink the way we address this data.  Our old ways of optimizing and handling the database didn’t translate 1:1 to this new VLDB.  This manifested specifically in one particular design consideration, early on in development.

When I was still developing the database, we were only receiving 5 to 10 GB of data per day in inputs and calcualated data, however, I knew this was still the early stages and that the amount of data was going to grow significantly and I wanted to be ready before we got there.  So, I started testing partitioning schemes to divide up the data, to make the data easily queried, but still seperated enough.

Initially, I tried to create a variation of the sliding window scenario, based on months in the future, dividing out into weeks in the future, dividing out into days for the days surrounding “today”, collapsing back into weeks and months in the past.  The scenario was very complex to setup, but I got the management script created and it was working, but it caused an issue on the switch-out, in which data was unavailable for too long while trying to collapse the weeks into months.  With our resettlement timeline, this was unacceptable.  So, I called Kimberly Tripp (Blog | Twitter), the partitioning expert.

After a conversation about what I was trying to accomplish, she suggested that I stop trying to do a rolling window scenario for this particular case.  Instead, since I wanted the granularity of a single day, go ahead and partition at a day, but create a single table for each year, and then wrap up those years using a partitioned view.  I implemented a variation of this, based on creating a BIGINT that contains both the day in question and an indicator for which market I was dealing with (so that I could have my data split between markets as well).

This has proven to be a very successful method that has kept database round-trips, consisting of 7 to 12 million rows each, very quick and completely within the scope of our users’ needs.  It has also made my partition maintenance very simple.  I only have to add another year out into the future once a year, to keep the machine rolling.  And, once my data has gotten to a place where the business users can allow me to archive off old data, I’ll have a clear delineation upon which to divide my data.  It has been a clear win for my situation and shows promise of scalability far beyond where we are even today.

Resolving Windows Authentication Annoyances using Klist

In my SQL environment, I manage all of my SQL Server rights by Windows Authentication, based upon Active Directory groups. This allows me to centralize my security administration into groups for specific types of users within my environment. I also get the benefit of my developers not ever accessing my databases with connection strings containing a clear-text password to my databases.

I have had two specific issues arising from managing my security in this way. One of my problems is that this allows my domain administrators to have some level of control over who might have access to a database. I have, on at least two occasions, had an over-zealous admin assign rights to a user “because the user asked for it” without my knowledge. I have already posted my solution to this problem in my post, Monitoring Active Directory Groups for Windows Authentication in SQL Server Using PowerShell.  In this post, I would like to discuss my other issue, propagating changes in Active Directory quickly.

When I first started using Windows Authentication for my SQL Servers, based upon Active Directory groups, I would notice that I would add a user to a group in Active Directory and it would take a long time before the user was actually able to use the rights; sometimes they even had to reboot.  This led to a lot of frustration on both my side and the users’.  It even caused us to jump back to SQL Authentication on some time-sensitive rollouts, because we had no idea why the Active Directory rights were “not working”.  So, after a few embarrassing episodes, I decided to dig into why this was happening.

To me, it made no sense why a Microsoft core enterprise system (SQL Server) talking to another Microsoft core enterprise system (Active Directory) couldn’t figure out that new permissions had been granted to a user’s group.  So, I dug a little into how Kerberos works and found that the group membership is passed in the Kerberos ticket, not just a reference back to Active Directory (which I had thought).  So, this meant that when a ticket was granted to a user’s session, the groups that the particular user was a member of were held until either the ticket was cleared or expired.

It can be useful to see what tickets you have open, which you can do by typing “klist” into a command prompt.  This, however, only lets you see your current tickets.

So, that led me to looking into what actions cause your Kerberos tickets to regenerate.  Remember that little message bubble that shows up in one of your sessions right after you have changed your password?

When you lock and unlock your computer, you are causing Windows to request new Kerberos tickets.  This also explains why rebooting a user’s machine caused the rights to start working, even though it was complete overkill.

Another way to force Windows to request new Kerberos tickets is to run “klist purge” from the command prompt.  This explicitly asks Windows to dump your currently Kerberos tickets and thus, request new ones.

Although this is a simple problem, solving it finally relieved a nagging headache I had experienced from time to time.