Thursday, August 18, 2011

Monitor Databases in DAGs

A few days ago, someone at the Microsoft Forums asked if there was a script to alert an administrator when Exchange performs a failover of databases in a DAG.

This is something I have wanted to do for a long time but never got around to... So here is my current solution (which might get improved in the future).


With Exchange 2010 and DAGs, it is important to monitor whenever a database automatically fails over to another server. Although everything keeps working without any problems for end users (hopefully), administrators still have to investigate why a failover happened.

If you have Exchange deployed across multiple AD sites and a database fails over to a server in another site, this will probably affect the way your users access OWA, for example.

Databases in a DAG, and therefore with multiple copies, have the ActivationPreference attribute that shows which servers have preference over the others to mount the database in case of a disaster or a manual switchover.

The output below is an example of what you get if you run the following command in an environment with at least one DAG and multiple database copies:

Get-MailboxDatabase | Sort Name | Select Name, ActivationPreference


Name    ActivationPreference
----    --------------------
ADB1    {[MBXA1, 1], [MBXA2, 2]}
ADB2    {[MBXA1, 1], [MBXA2, 2]}
ADB3    {[MBXA1, 1], [MBXA2, 2]}
...
MDB1    {[MBX1, 1], [MBX2, 2], [MBX3, 3], [MBX4, 4]}
MDB2    {[MBX1, 1], [MBX2, 2], [MBX3, 3], [MBX4, 4]}
MDB3    {[MBX1, 1], [MBX2, 2], [MBX3, 3], [MBX4, 4]}
...

Based on the ActivationPreference attribute, we can monitor if databases are currently active on the servers that they should be, i.e., on servers with an ActivationPreference of 1.

To check this, we can use the following script:



Get-MailboxDatabase | Sort Name | ForEach {
 $db = $_.Name
 $curServer = $_.Server.Name
 $ownServer = $_.ActivationPreference | ? {$_.Value -eq 1}

 Write-Host "$db on $curServer should be on $($ownServer.Key) - " -NoNewLine

 If ($curServer -ne $ownServer.Key)
 {
  Write-Host "WRONG" -ForegroundColor Red
 }
 Else
 {
  Write-Host "OK" -ForegroundColor Green
 }
}



This simply compares the server where the database is currently active with the server that has an ActivationPreference of 1. If they differ, it writes WRONG in red to let the administrator know.
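For a script that runs unattended, returning objects can be handier than writing colored text. The sketch below assumes input objects shaped like the ones used above (Name, Server.Name, and ActivationPreference entries with Key and Value). The names Get-MisplacedDatabase and New-SampleDatabase, and the sample data, are hypothetical and only there so the example runs without an Exchange server.

```powershell
# Hypothetical helper: emits one object per database that is not active
# on its preferred server (the one with ActivationPreference = 1).
function Get-MisplacedDatabase {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Database
    )
    process {
        $preferred = $Database.ActivationPreference | Where-Object { $_.Value -eq 1 }
        $current   = [string] $Database.Server.Name
        $expected  = [string] $preferred.Key

        if ($current -ne $expected) {
            New-Object PSObject -Property @{
                Database  = $Database.Name
                ActiveOn  = $current
                Preferred = $expected
            }
        }
    }
}

# Made-up stand-in for one Get-MailboxDatabase result (not real Exchange objects).
function New-SampleDatabase ($name, $activeOn) {
    New-Object PSObject -Property @{
        Name                 = $name
        Server               = (New-Object PSObject -Property @{ Name = $activeOn })
        ActivationPreference = @(
            (New-Object PSObject -Property @{ Key = 'MBX1'; Value = 1 }),
            (New-Object PSObject -Property @{ Key = 'MBX2'; Value = 2 })
        )
    }
}

$sample = @(
    (New-SampleDatabase 'MDB1' 'MBX1'),   # active on its preferred server
    (New-SampleDatabase 'MDB2' 'MBX2')    # failed over to the second copy
)

$misplaced = @($sample | Get-MisplacedDatabase)
$misplaced | ForEach-Object { "$($_.Database) on $($_.ActiveOn) should be on $($_.Preferred)" }
```

Against a real organization the call would presumably be Get-MailboxDatabase | Get-MisplacedDatabase, assuming the Key values convert to the server name the same way they do in the comparison above.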

But since we are at it, why not also check for the status of the database and the state of its content index? This can be checked using the Get-MailboxDatabaseCopyStatus cmdlet.

According to the Monitoring High Availability and Site Resilience TechNet article, here are all the possible values for the database copy status:


Database Copy Status
Failed - The mailbox database copy is in a Failed state because it isn't suspended, and it isn't able to copy or replay log files. While in a Failed state and not suspended, the system will periodically check whether the problem that caused the copy status to change to Failed has been resolved. After the system has detected that the problem is resolved, and barring no other issues, the copy status will automatically change to Healthy;

Seeding - The mailbox database copy is being seeded, the content index for the mailbox database copy is being seeded, or both are being seeded. Upon successful completion of seeding, the copy status should change to Initializing;

SeedingSource - The mailbox database copy is being used as a source for a database copy seeding operation;

Suspended - The mailbox database copy is in a Suspended state as a result of an administrator manually suspending the database copy by running the Suspend-MailboxDatabaseCopy cmdlet;

Healthy - The mailbox database copy is successfully copying and replaying log files, or it has successfully copied and replayed all available log files;

ServiceDown - The Microsoft Exchange Replication service isn't available or running on the server that hosts the mailbox database copy;

Initializing - The mailbox database copy will be in an Initializing state when a database copy has been created, when the Microsoft Exchange Replication service is starting or has just been started, and during transitions from Suspended, ServiceDown, Failed, Seeding, SinglePageRestore, LostWrite, or Disconnected to another state. While in this state, the system is verifying that the database and log stream are in a consistent state. In most cases, the copy status will remain in the Initializing state for about 15 seconds, but in all cases, it should generally not be in this state for longer than 30 seconds;

Resynchronizing - The mailbox database copy and its log files are being compared with the active copy of the database to check for any divergence between the two copies. The copy status will remain in this state until any divergence is detected and resolved;

Mounted - The active copy is online and accepting client connections. Only the active copy of the mailbox database copy can have a copy status of Mounted;

Dismounted - The active copy is offline and not accepting client connections. Only the active copy of the mailbox database copy can have a copy status of Dismounted;

Mounting - The active copy is coming online and not yet accepting client connections. Only the active copy of the mailbox database copy can have a copy status of Mounting;

Dismounting - The active copy is going offline and terminating client connections. Only the active copy of the mailbox database copy can have a copy status of Dismounting;

DisconnectedAndHealthy - The mailbox database copy is no longer connected to the active database copy, and it was in the Healthy state when the loss of connection occurred. This state represents the database copy with respect to connectivity to its source database copy. It may be reported during DAG network failures between the source copy and the target database copy;

DisconnectedAndResynchronizing - The mailbox database copy is no longer connected to the active database copy, and it was in the Resynchronizing state when the loss of connection occurred. This state represents the database copy with respect to connectivity to its source database copy. It may be reported during DAG network failures between the source copy and the target database copy;

FailedAndSuspended - The Failed and Suspended states have been set simultaneously by the system because a failure was detected, and because resolution of the failure explicitly requires administrator intervention. An example is if the system detects unrecoverable divergence between the active mailbox database and a database copy. Unlike the Failed state, the system won't periodically check whether the problem has been resolved, and automatically recover. Instead, an administrator must intervene to resolve the underlying cause of the failure before the database copy can be transitioned to a healthy state;

SinglePageRestore - This state indicates that a single page restore operation is occurring on the mailbox database copy;



Based on these values, we want the Status attribute to be either Mounted (true for the server where the database is mounted) or Healthy (for the servers that hold a copy of it). For the ContentIndexState attribute, we want it to be always Healthy.

To monitor both of these attributes, we can use the following command:


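Get-MailboxDatabase | Sort Name | Get-MailboxDatabaseCopyStatus | ForEach {
 # Compare exact values: -notmatch is a regex test, so "Dismounted" would match "Mounted"
 # and "DisconnectedAndHealthy" would match "Healthy", hiding both problems
 If ((@("Mounted", "Healthy") -notcontains "$($_.Status)") -or ("$($_.ContentIndexState)" -ne "Healthy"))
 {
  Write-Host "`n$($_.Name) - Status: $($_.Status) - Index: $($_.ContentIndexState)" -ForegroundColor Red
 }
}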
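One thing to keep in mind with checks like this: -match and -notmatch perform case-insensitive regular expression matching, so "Dismounted" matches "Mounted" and "DisconnectedAndHealthy" matches "Healthy". A minimal sketch of an exact comparison follows, using a hypothetical helper named Test-CopyHealthy and plain strings in place of the real status values:

```powershell
# Substring matching lets unhealthy states slip through:
'Dismounted' -notmatch 'Mounted'               # False, so it would not be flagged
'DisconnectedAndHealthy' -notmatch 'Healthy'   # False as well

# Hypothetical helper: exact (case-insensitive) comparison against the allowed values.
function Test-CopyHealthy ([string] $Status, [string] $ContentIndexState) {
    $goodStatus = @('Mounted', 'Healthy')
    return (($goodStatus -contains $Status) -and ($ContentIndexState -eq 'Healthy'))
}

Test-CopyHealthy 'Mounted' 'Healthy'       # True
Test-CopyHealthy 'Dismounted' 'Healthy'    # False
Test-CopyHealthy 'Healthy' 'Failed'        # False
```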



Now, let’s put everything together and have the script send an e-mail to the administrator if something is wrong with any database! This way, we can create a scheduled task to run this script every 2 minutes, for example.

Let’s also compare the AD site of the server currently hosting the database with the AD site of the server that should be hosting it. As I mentioned before, this is important as it can change the way users access OWA.

You can also download the entire script from here.

Function getExchangeServerADSite ([String] $excServer)
{
 # We could use WMI to check for the domain, but I think this method is better
 # Get-WmiObject Win32_NTDomain -ComputerName $excServer

 $configNC =([ADSI]"LDAP://RootDse").configurationNamingContext
 $search = new-object DirectoryServices.DirectorySearcher([ADSI]"LDAP://$configNC")
 $search.Filter = "(&(objectClass=msExchExchangeServer)(name=$excServer))"
 $search.PageSize = 1000
 [Void] $search.PropertiesToLoad.Add("msExchServerSite")

 Try {
  $adSite = [String] ($search.FindOne()).Properties.Item("msExchServerSite")
  Return ($adSite.Split(",")[0]).Substring(3)
 } Catch {
  Return $null
 }
}



[Bool] $bolFailover = $False
[String] $errMessage = $null

Get-MailboxDatabase | Sort Name | ForEach {
 $db = $_.Name
 $curServer = $_.Server.Name
 $ownServer = $_.ActivationPreference | ? {$_.Value -eq 1}

 # Compare the server where the DB is currently active to the server where it should be
 If ($curServer -ne $ownServer.Key)
 {
  # Compare the AD sites of both servers
  $siteCur = getExchangeServerADSite $curServer
  $siteOwn = getExchangeServerADSite $ownServer.Key
  
  If ($siteCur -ne $null -and $siteOwn -ne $null -and $siteCur -ne $siteOwn)
  {
   $errMessage += "`n$db on $curServer should be on $($ownServer.Key) (DIFFERENT AD SITE: $siteCur)!" 
  }
  Else
  {
   $errMessage += "`n$db on $curServer should be on $($ownServer.Key)!"
  }

  $bolFailover = $True
 }
}

$errMessage += "`n`n"

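Get-MailboxDatabase | Sort Name | Get-MailboxDatabaseCopyStatus | ForEach {
 # Compare exact values: -notmatch is a regex test, so "Dismounted" would match "Mounted"
 # and "DisconnectedAndHealthy" would match "Healthy", hiding both problems
 If ((@("Mounted", "Healthy") -notcontains "$($_.Status)") -or ("$($_.ContentIndexState)" -ne "Healthy"))
 {
  $errMessage += "`n$($_.Name) - Status: $($_.Status) - Index: $($_.ContentIndexState)"
  $bolFailover = $True
 }
}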

If ($bolFailover)
{
 Send-MailMessage -From "admin_nuno@letsexchange.com" -To "exchange.alerts@letsexchange.com" -Subject "DAG NOT Healthy!" -Body $errMessage -Priority High -SMTPserver "mail.letsexchange.com"
 Schtasks.exe /Change /TN "MonitorDAG" /DISABLE
}
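The getExchangeServerADSite function relies on msExchServerSite holding the distinguished name of the site object, and it keeps only the first component minus its CN= prefix. Here is a small sketch of just that parsing step, with a made-up DN and a hypothetical helper name:

```powershell
# Hypothetical helper isolating the string handling used in getExchangeServerADSite.
function Get-SiteNameFromDN ([string] $distinguishedName) {
    # "CN=Lisbon,CN=Sites,..." -> first component "CN=Lisbon" -> drop the 3-character "CN=" prefix
    return ($distinguishedName.Split(",")[0]).Substring(3)
}

# Made-up example value; a real msExchServerSite value would come from Active Directory.
$sampleDN = "CN=Lisbon,CN=Sites,CN=Configuration,DC=letsexchange,DC=com"
Get-SiteNameFromDN $sampleDN    # Lisbon
```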




As always, sorry for the format of the code...
At the end of the script, if an e-mail is sent, the scheduled task (MonitorDAG in this example) gets disabled; otherwise you would receive an e-mail every two minutes until you resolve the issue...
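A different way to avoid repeated e-mails, which is not what the script above does, would be to remember the last alert text and only send again when it changes. A rough sketch under that assumption follows; the helper name Test-AlertChanged and the state file location are hypothetical:

```powershell
# Hypothetical helper: returns $true only when the alert text differs from the
# one stored by the previous run, and remembers the new text for next time.
function Test-AlertChanged ([string] $Message, [string] $StateFile) {
    $previous = $null
    if (Test-Path $StateFile) {
        $previous = [System.IO.File]::ReadAllText($StateFile)
    }
    if ($Message -eq $previous) { return $false }
    [System.IO.File]::WriteAllText($StateFile, $Message)
    return $true
}

# Assumed location for the state file; any writable path would do.
$stateFile = Join-Path ([System.IO.Path]::GetTempPath()) "MonitorDAG.last"
if (Test-Path $stateFile) { Remove-Item $stateFile }   # start clean for this demo

Test-AlertChanged "MDB1 on MBX2 should be on MBX1!" $stateFile   # True: first time seen
Test-AlertChanged "MDB1 on MBX2 should be on MBX1!" $stateFile   # False: same problem, no new e-mail
Test-AlertChanged "MDB2 - Status: Failed" $stateFile             # True: something changed
```

In the monitoring script, the idea would be to send the e-mail only when both $bolFailover and Test-AlertChanged $errMessage $stateFile are true, and to delete the state file once everything is healthy again.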

Please note that there are more attributes that can and should be monitored! For example, you could run the Test-ReplicationHealth cmdlet to view replication status information about mailbox database copies.

Hope this helps!

12 comments:

  1. Excellent script! Is there a way to trigger this on an event in event viewer instead of running it every 2 minutes? I know how to trigger the scheduled task on event, but I don't know which event ID that would be...

  2. I found that triggering on Event ID 306 in Event Viewer, Microsoft, Exchange, HighAvailability, Operational will activate the script. I haven't tried this live yet.

    "Event ID 306 is what is logged on the Primary Active Manager (PAM) server when a database is moved to another node. [...] The event description doesn’t distinguish between switchovers and failovers, so you might think there isn’t a way to be alerted only when a failover occurs. Exchange logs, however, much more data than what is shown in the description. Click on the Details tab and within the Friendly view, you will see what Exchange actually logs about the event"

    Information found here: http://www.flobee.net/configure-scom-to-alert-when-an-exchange-database-fails-over/

    Replies
    1. Hi,

      Thank you for your comments and for the feedback! :)
      Having SCOM is obviously the ideal scenario. However, not every organization running an Exchange DAG has SCOM in place unfortunately...
      Thank you for the tip!

      Best regards,
      Nuno

  3. HI Nuno,
    Thanks for the write-up. I need some help with your amazing scripting ability. The requirement is just 10% of what you mentioned in this blog.


    The below cmdlet returns the number of databases mounted on the server

    Get-MailboxDatabaseCopyStatus -Server EXCH1 | where {$_.status -eq "mounted" } | Measure-Object

    Count :15
    Average :
    Sum :
    Maximum :
    Minimum :
    Property :

    I need help writing a PowerShell script which will trigger an e-mail if the value of Count is more than 15.
    The server names and the databases on each server can be given as input in a .csv file, to make it easier to manage.

    Thanks for your time and efforts.

    Replies
    1. Hi G9,

      Sure, it is a pleasure to help. If you just want something really, really basic, you can do something like this:

      While ($True) {
      If (@(Get-MailboxDatabaseCopyStatus -Server EXCH1 | ? {$_.Status -eq "mounted"}).Count -gt 15) {
      Send-MailMessage -From ExchangeAdmin@domain.com -To user@domain.com -Subject "Mounted DBs!" -Priority High -SmtpServer mail.domain.com
      Exit
      }

      Start-Sleep -Seconds 60
      }

      After this, you can start adding other things like a nice HTML body with which DBs are mounted and dismounted, check all servers in the environment, etc...

      Does this help?

      Regards,
      Nuno

  4. Can this be run on a 2-node Exchange 2013 DAG? Also, which server runs this script as a scheduled task? Would it be the witness server? Thanks

    Replies
    1. Yes it can! You can run the script from any Exchange server, even one that is part of the same org, but not part of the DAG you want to monitor :)

  5. Hi Nuno!
    Please help me. I am a beginner in PowerShell, but I need a script which sends an e-mail when the Mounted status changes.
    Thanks

    Replies
    1. Hi! What do you mean, from when a DB gets dismounted?

    2. When the mounted DB copy switches from server1 to server2

    3. But that's what this script is for... Is it not working for you?

    4. Sorry, my mistake. It works, thank you!
