Oracle Enterprise Manager Ops Center provides a feature called "OS Analytics". This feature gives you a better understanding of how the operating system is being utilized. You can research historical usage as well as real-time data. This post shows how you can benefit from OS Analytics and how it works behind the scenes.
The recording of our call to discuss this blog is available here:
https://oracleconferencing.webex.com/oracleconferencing/ldr.php?AT=pb&SP=MC&rID=71517797&rKey=4ec9d4a3508564b3
Download the presentation here
See also:
Blog about Alert Monitoring and Problem Notification
Blog about Using Operational Profiles to Install Packages and other content
Here is a quick summary of what you can do with OS Analytics in Ops Center:
View historical charts and real-time values of CPU, memory, network and disk utilization
Find the top CPU and memory processes in real time or on a specific day in the past
Determine proper monitoring thresholds based on historical data
Drill down into process details
Where to start
To start with OS Analytics, choose the OS asset in the tree and click the Analytics tab.
The tab shows CPU, memory and network utilization, along with the current top 5 processes in each category:
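Conceptually, each top-5 list is one snapshot of per-process samples ranked by a single metric. The minimal Python sketch below shows that idea; the `Process` record and its fields (`pid`, `name`, `cpu_pct`, `mem_mb`) are hypothetical and are not how Ops Center stores its data.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical per-process sample (not Ops Center's actual data model)."""
    pid: int
    name: str
    cpu_pct: float  # CPU utilization in percent
    mem_mb: float   # resident memory in MB


def top_n(processes, key, n=5):
    """Return the n processes with the largest `key` value, largest first."""
    return heapq.nlargest(n, processes, key=key)


if __name__ == "__main__":
    snapshot = [
        Process(101, "java", 42.0, 2048.0),
        Process(202, "pkgrecv", 75.5, 512.0),
        Process(303, "sshd", 0.1, 8.0),
    ]
    # Rank the same snapshot once per category, as the Analytics tab does per chart.
    for label, key in (("CPU", lambda p: p.cpu_pct), ("Memory", lambda p: p.mem_mb)):
        print(label, [p.name for p in top_n(snapshot, key)])
```

Using `heapq.nlargest` avoids sorting the whole process list when only the top few entries are needed.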
In the screen above, you can click any of the top 5 processes to see a more detailed view of that process. Here is an example for one of them:
One of the cool things is that you can see the process tree for this process, along with its port bindings and open file descriptors.
Next, click the "Processes" tab to see real-time information about all the processes on the machine:
An interesting column is the "Target" column. If you configured Ops Center to work with Enterprise Manager Cloud Control, the two products communicate with each other and Ops Center displays the correlated Cloud Control target in this table. If you use Ops Center on its own, this column remains empty.
The "Threshold" tab is particularly helpful: you can view historical trends of different monitored values and, based on the graph, determine what the monitoring thresholds should be:
You can ask Ops Center to suggest monitoring levels based on the historical values, or you can set your own. The colors in the graph represent the currently configured levels (red for critical, yellow for warning and blue for information), so you can quickly see how they are positioned against real data.
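This post does not say how Ops Center computes its suggested levels. Purely to illustrate deriving thresholds from history, the sketch below takes percentiles of past samples, with warning at the 95th and critical at the 99th percentile. The function name `suggest_thresholds` and the percentile choices are assumptions, not Ops Center's method.

```python
import statistics


def suggest_thresholds(samples, warning_pct=95, critical_pct=99):
    """Return (warning, critical) levels as percentiles of historical samples.

    Illustrative heuristic only; this is not the algorithm Ops Center uses.
    """
    if len(samples) < 2:
        raise ValueError("need at least two historical samples")
    if not 1 <= warning_pct < critical_pct <= 99:
        raise ValueError("expected 1 <= warning_pct < critical_pct <= 99")
    # With n=100, quantiles() returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[warning_pct - 1], cuts[critical_pct - 1]
```

Higher percentiles produce fewer alerts; lower ones flag more of the machine's normal peaks, which is exactly the trade-off the graph helps you judge.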
Note that for longer periods, Ops Center smooths the data by averaging it. When examining spiky values such as CPU usage, use shorter, more detailed time frames, such as one hour or one day.
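A quick illustration of why this matters: the sketch below assumes one CPU sample per minute and a simple mean-per-bucket downsampling (Ops Center's exact smoothing method is not described in this post). A two-minute spike to 95% shrinks to 8% in a one-hour average.

```python
def bucket_averages(samples, bucket_size):
    """Downsample by averaging consecutive, non-overlapping buckets of samples."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    averages = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        averages.append(sum(bucket) / len(bucket))
    return averages


if __name__ == "__main__":
    # One hypothetical CPU sample per minute: a steady 5% load with a 2-minute spike to 95%.
    cpu = [5.0] * 60
    cpu[30] = cpu[31] = 95.0
    print("peak of raw samples:", max(cpu))               # 95.0
    print("one-hour average:", bucket_averages(cpu, 60))  # [8.0]
    print("10-minute averages:", bucket_averages(cpu, 10))  # [5.0, 5.0, 5.0, 23.0, 5.0, 5.0]
```

The shorter the bucket, the closer the averaged value stays to the real peak.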
Applying new monitoring values
When you first apply new values to monitored attributes, a pop-up asks whether it is OK to take this asset out of its current Monitoring Policy. This is fine if you want either custom monitoring for a specific machine, or to use this machine as a "gold image" and extract a Monitoring Policy from it. You can later apply the new Monitoring Policy to other machines and also set it as the default Monitoring Policy.
Once you are done applying the different monitoring values, you can review and change them in the "Monitoring" tab. You can also click "Extract a Monitoring Policy" in the Actions pane on the right to save all the new values to a new Monitoring Policy, which you can then find under "Plan Management" -> "Monitoring Policies".
Visiting the past
Under the "History" tab you can "go back in time". This is very
helpful when you know that a machine was busy a few hours ago (perhaps
in the middle of the night?), but you were not around to take a look at
it in real time. Here's a view into yesterday's data on one of the
machines:
You can see an interesting CPU spike happening at around 3:30 am
along with some memory use. In the bottom table you can see the top 5
CPU and Memory consumers at the requested time. Very quickly you can see
that this spike is related to the Solaris 11 IPS repository
synchronization process using the "pkgrecv" command.
The "time machine" doesn't stop there: you can also view historical data to determine which zone was the busiest at a given time:
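Finding the busiest zone at time T comes down to looking up each zone's most recent sample at or before T and taking the maximum. The sketch below assumes per-zone CPU samples held as time-sorted (epoch seconds, percent) pairs; this in-memory layout and the function names are hypothetical, not how Ops Center stores or queries zone data.

```python
import bisect


def value_at(samples, when):
    """Return the latest value at or before `when` from time-sorted (epoch, value) pairs."""
    times = [t for t, _ in samples]
    i = bisect.bisect_right(times, when)
    return samples[i - 1][1] if i else None


def busiest_zone(zone_samples, when):
    """Return (zone, value) for the zone with the highest value at `when`, or None."""
    best = None
    for zone, samples in zone_samples.items():
        value = value_at(samples, when)
        # Zones with no sample yet at `when` are skipped.
        if value is not None and (best is None or value > best[1]):
            best = (zone, value)
    return best
```

For example, `busiest_zone({"web": [(100, 10.0), (200, 80.0)], "db": [(100, 50.0)]}, 150)` returns `("db", 50.0)`.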
Under the hood
The collected data is stored on each agent under /var/opt/sun/xvm/analytics/historical/:
An "os.zip" file exists for the main OS. Inside, you will find many small text files, each named after the epoch timestamp at which it was taken.
If you have any zones, there is a file called "guests.zip" containing the same small files for all the zones, as well as a folder for each zone, named after the zone, with its own "os.zip" inside.
If this is the Enterprise Controller or the Proxy Controller, you will have folders called "proxy" and "sat", in which you will find the "os.zip" for that controller.
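Because the sample files are named by epoch timestamp, they are easy to line up in time. The Python sketch below lists the entries of such an archive, oldest first, with UTC timestamps. It assumes each sample's file name (before any extension) is a Unix epoch in seconds and that the archive opens with Python's `zipfile`; the format of the individual sample files is not described here, so `read_sample` simply returns raw text. All function names are hypothetical.

```python
import zipfile
from datetime import datetime, timezone
from pathlib import PurePosixPath


def sample_time(member_name):
    """Parse an epoch-named archive entry into a UTC datetime, or return None."""
    if member_name.endswith("/"):
        return None  # directory entry (e.g. a zone folder), not a sample
    stem = PurePosixPath(member_name).name.split(".", 1)[0]
    if not (stem.isascii() and stem.isdigit()):
        return None
    return datetime.fromtimestamp(int(stem), tz=timezone.utc)


def order_samples(member_names):
    """Return (UTC datetime, name) pairs for epoch-named entries, oldest first."""
    pairs = [(sample_time(name), name) for name in member_names]
    return sorted((ts, name) for ts, name in pairs if ts is not None)


def list_samples(zip_path):
    """List the samples inside an os.zip or guests.zip archive, oldest first."""
    with zipfile.ZipFile(zip_path) as zf:
        return order_samples(zf.namelist())


def read_sample(zip_path, member_name):
    """Return the raw text of one sample file."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(member_name).decode("utf-8", errors="replace")
```

For example, assuming the main OS archive sits directly in that directory: `for ts, name in list_samples("/var/opt/sun/xvm/analytics/historical/os.zip"): print(ts.isoformat(), name)`.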
For debugging purposes, you can also view the script that collects the data:
On Linux, it is located at /opt/sun/xvmoc/private/os_analytics/collect.
To redirect the script's standard error output into a file for debugging, create the following file; the output will then be written to it:
# touch /tmp/.collect.stderr
The temporary data is collected under /var/opt/sun/xvm/analytics/.collectdb until it is zipped.
To review the Analytics properties, view /opt/sun/n1gc/lib/XVM.properties on each agent. Find the section "Analytics configurable properties for OS and VSC" to see the Analytics-specific values.
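If you prefer to pull out just that section programmatically, the Python sketch below is one way to do it. It assumes the file follows Java .properties conventions (`key=value` lines, `#` or `!` comments) and that the section ends at the first blank line after its entries; neither is confirmed by this post, and the function names are hypothetical. The marker text is the section name quoted above.

```python
from pathlib import Path

SECTION_MARKER = "Analytics configurable properties for OS and VSC"


def analytics_properties(text, marker=SECTION_MARKER):
    """Return key/value pairs that follow the marker comment in .properties text.

    Assumes 'key=value' lines and that the section ends at the first blank
    line after its properties begin (assumptions about XVM.properties).
    """
    props = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_section:
            in_section = marker in line
            continue
        if not line:
            if props:
                break  # blank line after the entries: assume the section ended
            continue
        if line.startswith(("#", "!")):
            continue  # comment lines inside the section
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def read_analytics_properties(path="/opt/sun/n1gc/lib/XVM.properties"):
    """Read the agent's XVM.properties and return the Analytics section."""
    return analytics_properties(Path(path).read_text(encoding="utf-8", errors="replace"))
```

Lines using the `key: value` separator are not handled by this sketch.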
I hope you find this helpful! Please post questions in the comments below.
Eran Steiner