BES Logs ... What's in them? Part 5 of 20 (CTRL log)
Posted 06-26-2009 at 04:00 AM by hdawg
Tags bes log, blackberry controller, blackberry controller log, blackberry logs, blackberry server logs, ctrl log
Welcome to Part 5 of at least a 20 part series looking at the BES logs. (Part 4)
In this post we'll look at my favorite log; the BlackBerry Controller Log (CTRL). Nine times out of ten this is always the first log I go to when troubleshooting ... and if I'm troubleshooting message delivery it is the first log I go to.
Before we get into it, I want you to know that this is my favorite log is because it is no nonsense and clutter free. Only two things ever effectively show up in this log ... a
or a 
The BlackBerry Controller doesn't deliver messages, doesn't manage IT Policy, doesn't perform encryption and decryption, and certainly doesn't manage application deployment. The BlackBerry Controller is the watchdog of BES. It monitors the health of the other BES services, and gives them a kick start back to life if they die ... to a certain degree.
Just like all the other logs, this one will turn over daily and at the top you'll see a list of processes and information about the server and in addition you'll see:
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart BES Agent(s) on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Dispatcher(s) on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Sync Server on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Policy Server on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart MDS on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Attachment Server on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Router on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart DB Consistency on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Instant Messanging on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Instant Messanging Connector on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart MDS Services on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will wait for WaitCount = 6 to restart BES Agent(s) on hung threads
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] The number of BES Agent restarts is limited to 10 times per 24 hours
[20000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller won't generate userdump files. Userdump.exe not found.
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] No blackout period found for BES Agent restart on hung thread
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart agents that did not send 2 heartbeats in a row
Take a look at the BES Agent info above; the Controller will do restarts if the service needs it, but once it gets to a certain threshold: The number of BES Agent restarts is limited to 10 times per 24 hours, the Controller will give up: [20406] (06/01 14:53:10.419):{0x22AC} 'TESTBES1' agent 1: will not restart - reached the maximum of 10 restarts per 24 hours. Restarting the BlackBerry Dispatcher service or killing a process through task manager will not provide a resolution. The only way to recover a server from this notification is to restart the operating system; reboot.
If you decide to not reboot, the CTRL log will generate entries once a minute letting you know that it isn't going to restart the Messaging Agent. Since the CTRL and ALRT are B.F.F.'s, every one of these alerts will also show up in the ALRT log, which we all know will show up in your Windows Application Log. If you've got some monitoring in place you could have it notify you of such an issue ... honestly though, by this point you've probably already received a few calls about message delays or people not getting email.
A healthy CTRL log will have a log full of these entries (notice the 10 minute increments):
[30000] (06/01 04:06:06.987):{0xDD4} Performing system health check (BlackBerry Controller Version 4.1.5.40)
[30000] (06/01 04:16:06.696):{0xDD4} Performing system health check (BlackBerry Controller Version 4.1.5.40)
[30000] (06/01 04:26:06.405):{0xDD4} Performing system health check (BlackBerry Controller Version 4.1.5.40)
This is a way for the service to say: Hey, I'm working here ... all is good.
An unhealthy CTRL log (and more importantly an unhealthy BES) will look like this:
[30000] (06/01 14:12:46.435):{0x22B0} Performing system health check (BlackBerry Controller Version 4.1.5.40)
[30000] (06/01 14:21:04.038):{0x22AC} 'TESTBES1': controlling 1 agents (1)
[30000] (06/01 14:21:04.061):{0x22AC} 'Unnamed' extension 1 started as process 11236
[30000] (06/01 14:21:04.065):{0x22AC} 'TESTBES1' agent 1 started as process 10348
[30000] (06/01 14:21:05.123):{0x22AC} 'Unnamed' extension 1: UDP log port is 4185
[30000] (06/01 14:21:05.431):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:21:07.095):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:21:07.095):{0x22AC} 'TESTBES1' agent 1 will start in 7 seconds
[30000] (06/01 14:21:17.097):{0x22AC} 'TESTBES1' agent 1 started as process 8680
[30000] (06/01 14:21:17.929):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:21:19.263):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:21:19.263):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:21:29.265):{0x22AC} 'TESTBES1' agent 1 started as process 10160
[30000] (06/01 14:21:30.082):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:21:31.433):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:21:31.433):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:21:41.450):{0x22AC} 'TESTBES1' agent 1 started as process 9868
[30000] (06/01 14:21:42.338):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:21:43.601):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:21:43.601):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:21:53.618):{0x22AC} 'TESTBES1' agent 1 started as process 8692
[30000] (06/01 14:21:54.465):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:21:55.771):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:21:55.771):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:22:05.786):{0x22AC} 'TESTBES1' agent 1 started as process 11112
[30000] (06/01 14:22:06.582):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:22:07.954):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:22:07.954):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:22:17.955):{0x22AC} 'TESTBES1' agent 1 started as process 3376
[30000] (06/01 14:22:18.737):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:22:20.122):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:22:20.122):{0x22AC} 'TESTBES1' agent 1 will start in 7 seconds
[30000] (06/01 14:22:30.124):{0x22AC} 'TESTBES1' agent 1 started as process 9308
[30000] (06/01 14:22:30.967):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:22:32.291):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:22:32.291):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:22:42.292):{0x22AC} 'TESTBES1' agent 1 started as process 6344
[30000] (06/01 14:22:43.094):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:22:44.460):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:22:44.460):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:22:46.472):{0x22B0} Performing system health check (BlackBerry Controller Version 4.1.5.40)
[30000] (06/01 14:22:54.476):{0x22AC} 'TESTBES1' agent 1 started as process 11248
[30000] (06/01 14:22:55.323):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:22:56.629):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:22:56.629):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:23:06.645):{0x22AC} 'TESTBES1' agent 1 started as process 9564
[30000] (06/01 14:23:07.537):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:23:08.812):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[20406] (06/01 14:23:08.821):{0x22AC} 'TESTBES1' agent 1: will not restart - reached the maximum of 10 restarts per 24 hours
[20406] (06/01 14:24:08.856):{0x22AC} 'TESTBES1' agent 1: will not restart - reached the maximum of 10 restarts per 24 hours
[20406] (06/01 14:25:08.907):{0x22AC} 'TESTBES1' agent 1: will not restart - reached the maximum of 10 restarts per 24 hours
[20406] (06/01 14:26:08.955):{0x22AC} 'TESTBES1' agent 1: will not restart - reached the maximum of 10 restarts per 24 hours
In the matter of two minutes, the BlackBerry Controller restarted messaging agent #1 on TESTBES1 10 times and then gave up. The cause for these errors could be one of or multiple things (in no particular order):
Regardless of the root cause, something is wrong and it requires immediate attention.
Straight forward and to the point, alerts and errors when something is really wrong; that is why I love the CTRL log.
If you've got any questions about other specific entries in the CTRL log feel free to post a question / comment in response to this post, or in the forum.
Tune in tomorrow where I'm going to take a break from looking at a specific log file, but will instead provide some information on what tools I use for log file analysis. We'll get back to the named log files after the weekend.
Thanks again to everyone for reading ... if you've made it through this first week
. If you've got any feedback or ideas for what you'd like to see more on please drop me a line.
In this post we'll look at my favorite log; the BlackBerry Controller Log (CTRL). Nine times out of ten this is always the first log I go to when troubleshooting ... and if I'm troubleshooting message delivery it is the first log I go to.
Before we get into it, I want you to know that this is my favorite log is because it is no nonsense and clutter free. Only two things ever effectively show up in this log ... a
The BlackBerry Controller doesn't deliver messages, doesn't manage IT Policy, doesn't perform encryption and decryption, and certainly doesn't manage application deployment. The BlackBerry Controller is the watchdog of BES. It monitors the health of the other BES services, and gives them a kick start back to life if they die ... to a certain degree.
Just like all the other logs, this one will turn over daily and at the top you'll see a list of processes and information about the server and in addition you'll see:
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart BES Agent(s) on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Dispatcher(s) on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Sync Server on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Policy Server on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart MDS on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Attachment Server on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Router on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart DB Consistency on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Instant Messanging on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart Instant Messanging Connector on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart MDS Services on crash
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will wait for WaitCount = 6 to restart BES Agent(s) on hung threads
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] The number of BES Agent restarts is limited to 10 times per 24 hours
[20000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller won't generate userdump files. Userdump.exe not found.
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] No blackout period found for BES Agent restart on hung thread
[30000] (06/01 00:00:00.247):{0x1A50} [CFG] Controller will restart agents that did not send 2 heartbeats in a row
Take a look at the BES Agent info above; the Controller will do restarts if the service needs it, but once it gets to a certain threshold: The number of BES Agent restarts is limited to 10 times per 24 hours, the Controller will give up: [20406] (06/01 14:53:10.419):{0x22AC} 'TESTBES1' agent 1: will not restart - reached the maximum of 10 restarts per 24 hours. Restarting the BlackBerry Dispatcher service or killing a process through task manager will not provide a resolution. The only way to recover a server from this notification is to restart the operating system; reboot.
If you decide to not reboot, the CTRL log will generate entries once a minute letting you know that it isn't going to restart the Messaging Agent. Since the CTRL and ALRT are B.F.F.'s, every one of these alerts will also show up in the ALRT log, which we all know will show up in your Windows Application Log. If you've got some monitoring in place you could have it notify you of such an issue ... honestly though, by this point you've probably already received a few calls about message delays or people not getting email.
A healthy CTRL log will have a log full of these entries (notice the 10 minute increments):
[30000] (06/01 04:06:06.987):{0xDD4} Performing system health check (BlackBerry Controller Version 4.1.5.40)
[30000] (06/01 04:16:06.696):{0xDD4} Performing system health check (BlackBerry Controller Version 4.1.5.40)
[30000] (06/01 04:26:06.405):{0xDD4} Performing system health check (BlackBerry Controller Version 4.1.5.40)
This is a way for the service to say: Hey, I'm working here ... all is good.
An unhealthy CTRL log (and more importantly an unhealthy BES) will look like this:
[30000] (06/01 14:12:46.435):{0x22B0} Performing system health check (BlackBerry Controller Version 4.1.5.40)
[30000] (06/01 14:21:04.038):{0x22AC} 'TESTBES1': controlling 1 agents (1)
[30000] (06/01 14:21:04.061):{0x22AC} 'Unnamed' extension 1 started as process 11236
[30000] (06/01 14:21:04.065):{0x22AC} 'TESTBES1' agent 1 started as process 10348
[30000] (06/01 14:21:05.123):{0x22AC} 'Unnamed' extension 1: UDP log port is 4185
[30000] (06/01 14:21:05.431):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:21:07.095):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:21:07.095):{0x22AC} 'TESTBES1' agent 1 will start in 7 seconds
[30000] (06/01 14:21:17.097):{0x22AC} 'TESTBES1' agent 1 started as process 8680
[30000] (06/01 14:21:17.929):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:21:19.263):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:21:19.263):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:21:29.265):{0x22AC} 'TESTBES1' agent 1 started as process 10160
[30000] (06/01 14:21:30.082):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:21:31.433):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:21:31.433):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:21:41.450):{0x22AC} 'TESTBES1' agent 1 started as process 9868
[30000] (06/01 14:21:42.338):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:21:43.601):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:21:43.601):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:21:53.618):{0x22AC} 'TESTBES1' agent 1 started as process 8692
[30000] (06/01 14:21:54.465):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:21:55.771):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:21:55.771):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:22:05.786):{0x22AC} 'TESTBES1' agent 1 started as process 11112
[30000] (06/01 14:22:06.582):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:22:07.954):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:22:07.954):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:22:17.955):{0x22AC} 'TESTBES1' agent 1 started as process 3376
[30000] (06/01 14:22:18.737):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:22:20.122):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:22:20.122):{0x22AC} 'TESTBES1' agent 1 will start in 7 seconds
[30000] (06/01 14:22:30.124):{0x22AC} 'TESTBES1' agent 1 started as process 9308
[30000] (06/01 14:22:30.967):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:22:32.291):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:22:32.291):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:22:42.292):{0x22AC} 'TESTBES1' agent 1 started as process 6344
[30000] (06/01 14:22:43.094):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:22:44.460):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:22:44.460):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:22:46.472):{0x22B0} Performing system health check (BlackBerry Controller Version 4.1.5.40)
[30000] (06/01 14:22:54.476):{0x22AC} 'TESTBES1' agent 1 started as process 11248
[30000] (06/01 14:22:55.323):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:22:56.629):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[30000] (06/01 14:22:56.629):{0x22AC} 'TESTBES1' agent 1 will start in 8 seconds
[30000] (06/01 14:23:06.645):{0x22AC} 'TESTBES1' agent 1 started as process 9564
[30000] (06/01 14:23:07.537):{0x22AC} 'TESTBES1' agent 1: UDP log port is 4085
[30000] (06/01 14:23:08.812):{0x22AC} 'TESTBES1' agent 1 stopped. Exit code = 5305
[20406] (06/01 14:23:08.821):{0x22AC} 'TESTBES1' agent 1: will not restart - reached the maximum of 10 restarts per 24 hours
[20406] (06/01 14:24:08.856):{0x22AC} 'TESTBES1' agent 1: will not restart - reached the maximum of 10 restarts per 24 hours
[20406] (06/01 14:25:08.907):{0x22AC} 'TESTBES1' agent 1: will not restart - reached the maximum of 10 restarts per 24 hours
[20406] (06/01 14:26:08.955):{0x22AC} 'TESTBES1' agent 1: will not restart - reached the maximum of 10 restarts per 24 hours
In the matter of two minutes, the BlackBerry Controller restarted messaging agent #1 on TESTBES1 10 times and then gave up. The cause for these errors could be one of or multiple things (in no particular order):
- Exchange Server down: If the Messaging Agent can't connect to an Exchange server it'll keep trying and eventually die
- Overloaded Exchange Server: BES and other clients may be submitting too many requests and Exchange may just stop responding.
- Network outage / latency: BES requires a reliable, low latency (<35ms RTT ping) connection to an Exchange server. This is why it is always best practice to keep your BES & Exchange servers close to each other / on the same LAN.
- Other Network Infrastructure: There could be problems with Active Directory or DNS that eventually cause this failure.
Regardless of the root cause, something is wrong and it requires immediate attention.
Straight forward and to the point, alerts and errors when something is really wrong; that is why I love the CTRL log.
If you've got any questions about other specific entries in the CTRL log feel free to post a question / comment in response to this post, or in the forum.
Tune in tomorrow where I'm going to take a break from looking at a specific log file, but will instead provide some information on what tools I use for log file analysis. We'll get back to the named log files after the weekend.
Thanks again to everyone for reading ... if you've made it through this first week
Total Comments 2
Comments
-
The ctl message below mean BES lost connection with Exchange? what will happen if waitcount goes up to 10? BES may have to reboot to rebuild connect with Exchange?
Thanks alot!
Bill
-------------------------------------------------------
[30000] (11/18 10:25:42.105):{0xA2C} Performing system health check (BlackBerry Controller Version 4.1.6.54)
[30000] (11/18 10:25:56.714):{0xA20} 'besserver' agent 1: hung threads detected. WaitCount = 2
[30000] (11/18 10:25:56.714):{0xA20} 'besserver' agent 1: hung threads detected. WaitCount = 2
[30000] (11/18 10:25:56.714):{0xA20} 'besserver' agent 1: hung threads detected. WaitCount = 2
[30000] (11/18 10:25:56.729):{0xA20} 'besserver' agent 1: hung threads detected. WaitCount = 2
[30000] (11/18 10:25:56.729):{0xA20} 'besserver' agent 1: hung threads detected. WaitCount = 2Posted 11-18-2009 at 12:48 PM by bizheng
-
Posted 11-18-2009 at 12:54 PM by hdawg












