FMW Diagnostic Framework: Automatic Capture of Diagnostic Data Upon First Failure!

Posted by Daniel Mortimer on Oracle Blogs
Published on Fri, 2 Nov 2012 12:58:38 +0000

Introduction

There is nothing more frustrating than a problem that "cannot be reproduced". Logs and configuration files have been analysed, but there just isn't enough information to establish the root cause. The issue may be closed, but you are left with the feeling that the problem will rear its ugly head again in the future. The trouble is, to resolve such issues you need to capture diagnostic data at the exact time the incident occurs. Step forward Fusion Middleware Diagnostic Framework!

Diagnostic Framework monitors WebLogic Managed Servers and delivers "Automatic capture of diagnostic data upon first failure". To quote from:

Oracle Fusion Middleware Administrator's Guide 11g Release 1 (11.1.1)
Chapter 13 Diagnosing Problems

"When a critical error occurs ... the Diagnostic Framework automatically collects diagnostics, such as thread dumps, DMS metric dumps, and WebLogic Diagnostics Framework (WLDF) server image dumps ... The data is stored in a file-based repository and is accessible with command-line utilities."

In other words, the data collected upon first failure (especially the thread and image dumps) provides a snapshot of the system as, or immediately after, the problem occurs. The table below shows the types of WebLogic Server issues that fall within the scope of Diagnostic Framework.

How to Configure Diagnostic Framework?

Depending on your Fusion Middleware product choice, you may not need to do anything! Diagnostic Framework is automatically installed, configured and initiated for any WebLogic Domain that has the Oracle Java Required Files (JRF) template applied. This template is applied by default whenever you configure WebLogic Managed Servers for products such as:

  • Portal / Forms / Reports / Discoverer
  • Identity Management (OID, OAM, OIM, etc.)
  • WebCenter
  • SOA

Check your WebLogic Domain directory structure. If you have an "adr" subdirectory under

DOMAIN_HOME/servers/<servername>/

then the JRF template has been applied and Diagnostic Framework is in play.
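If you look after several servers, this directory check is easy to script. The following is a minimal Python 3 sketch, not an Oracle-supplied tool: it assumes the DOMAIN_HOME/servers/<servername>/adr layout described above and simply reports which server directories contain an "adr" subdirectory. The helper name servers_with_adr is hypothetical.

```python
from pathlib import Path


def servers_with_adr(domain_home):
    """Return the names of servers under DOMAIN_HOME/servers that have an 'adr' subdirectory.

    Assumption: the standard DOMAIN_HOME/servers/<servername>/adr layout.
    An empty list means no adr directory was found (or no servers directory exists).
    """
    servers_dir = Path(domain_home) / "servers"
    if not servers_dir.is_dir():
        return []
    # A server counts only if its directory contains an 'adr' directory.
    return sorted(
        server.name
        for server in servers_dir.iterdir()
        if (server / "adr").is_dir()
    )
```

For example, servers_with_adr("/oracle/middleware/user_projects/domains/mydomain") returns the names of the servers in that domain where Diagnostic Framework is in play.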

Should the "adr" subdirectory not exist, review the advice given in the My Oracle Support article

How to Apply FMW ( EM ) Control and JRF to a WebLogic Domain and Managed Servers [ID 947043.1]

If you are working with a standalone WebLogic Server solution and applying Oracle JRF is not acceptable, consider using WLDF (WebLogic Diagnostic Framework) directly; Fusion Middleware Diagnostic Framework makes use of WLDF under the covers. A couple of useful links about WLDF are listed below.

How to Get Started With Diagnostic Framework

To be frank, the Fusion Middleware Administrator's Guide is the best place to start your learning:

Oracle Fusion Middleware Administrator's Guide 11g Release 1 (11.1.1)
Chapter 13 Diagnosing Problems

There is a lot of reading there, but if you are in a hurry and just want to get the right information to Oracle Support to help resolve your issue, check out the next section.

How to Upload Diagnostic Framework Incident Data to Oracle Support

Some Background Information

There are three interfaces to the Repository:

  1. Enterprise Manager Cloud Control (Support Workbench)
  2. WLST (Command Line)
  3. ADRCI (Command Line)

Enterprise Manager Cloud Control does provide a nice GUI to search, view and package Diagnostic Framework incidents. However, this software is not to be confused with Fusion Middleware (EM) Control. Cloud Control (formerly known as Grid Control) is part of the Enterprise Manager media package and has its own installation and configuration story. Therefore, for the benefit of those yet to install and play with Cloud Control, I am going to describe how to use the command-line tools.

Ideally, you would need only one command-line interface, but currently I suggest using both, mainly because ADRCI SHOW INCIDENTS does not reveal the description behind the Diagnostic Framework error code.

Instructions:

Note:

WLST and ADRCI are case sensitive when it comes to handling parameter values. If you make a mistake, expect an unfriendly syntax error message.

1) Find the incident

Note:

The managed server you are troubleshooting should be up and running. If it is down, ensure the domain's Admin Server is accessible. If you can connect to neither the Admin Server nor the Managed Server, the example WLST commands will not work.

a) Launch WLST 

MW_HOME/oracle_common/common/bin/wlst.sh

Note: Use the WLST that resides in the "oracle_common" directory (not WL_HOME/common/bin). Otherwise the Diagnostic Framework commands are not loaded and you will get an error like the one below:

Traceback (innermost last):
  File "<console>", line 1, in ?
NameError: listIncidents

b) Connect to the managed server or the Admin Server, e.g.

wls:/offline> connect('weblogic','welcome1','t3://localhost:7020')

c) Run the command

wls:/MyDomain/serverConfig> listIncidents()

This will list the incidents for the server to which you have connected. If you have connected to the Admin Server and want to list the incidents for a managed server within the domain, use the command:

wls:/MyDomain/serverConfig> listIncidents(adrHome='diag/ofm/MyDomain/MyManagedServer', server='MyManagedServer')
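If you need to repeat steps a) to c) regularly, the WLST commands can be saved in a script file and passed to wlst.sh. The Python 3 sketch below only generates the text of such a script. It is not part of Oracle's tooling, the helper name build_list_incidents_script is hypothetical, and it uses only the connect(), listIncidents(), disconnect() and exit() WLST commands. For illustration it embeds the password in plain text, which you would avoid outside a test system.

```python
def build_list_incidents_script(user, password, url, server=None, adr_home=None):
    """Return the text of a WLST script that lists Diagnostic Framework incidents.

    Hypothetical helper. Pass server and adr_home only when connecting to the
    Admin Server to list a managed server's incidents. Note: the password ends
    up in plain text in the script.
    """
    # %r produces quoted string literals that WLST (Jython) accepts.
    lines = ["connect(%r, %r, %r)" % (user, password, url)]
    if server and adr_home:
        lines.append("listIncidents(adrHome=%r, server=%r)" % (adr_home, server))
    else:
        lines.append("listIncidents()")
    lines += ["disconnect()", "exit()"]
    return "\n".join(lines) + "\n"
```

Save the returned text to a file such as list_incidents.py and run MW_HOME/oracle_common/common/bin/wlst.sh list_incidents.py, using the oracle_common WLST as noted in step a).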

Example output

Incident Id     Problem Key              Incident Time
        1       DFW-99998 [java.lang.NullPointerException][oracle.error.simulator.ErrorSimulator.createNullPointerException][errorWebApp_1-0-0-0]        Fri Nov 02 10:38:46 GMT 2012

The bracketed text following the DFW-99998 error code is the description you do not see when using the ADRCI SHOW INCIDENTS command.
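When the listing is long, you may want to extract the incident IDs programmatically. The Python 3 sketch below parses listIncidents() output captured as text. The regular expression is an assumption based only on the example row above: each row on a single line, a code such as DFW-99998, the bracketed description, then a timestamp like "Fri Nov 02 10:38:46 GMT 2012". Adjust it if your output differs. The name parse_incidents is hypothetical.

```python
import re

# Assumed row layout (from the example output): id, error code, description, timestamp.
# Timezones such as "GMT+01:00" would not match \w+ and need a wider pattern.
_ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+"
    r"(?P<code>[A-Z]+-\d+)\s*"
    r"(?P<description>.*?)\s+"
    r"(?P<time>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \w+ \d{4})\s*$"
)


def parse_incidents(output):
    """Return one dict (id, code, description, time) per incident row in the output.

    Header lines and wrapped or partial rows do not match and are skipped.
    """
    incidents = []
    for line in output.splitlines():
        match = _ROW.match(line)
        if match:
            incidents.append(match.groupdict())
    return incidents
```

The "id" value of each dict is the incident ID you need for step 2.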

Make a note of the incident ID. You are now ready to move to step 2.

2) Package the incident

a) Set up the environment (the example commands below are for Unix):

cd <DOMAIN_HOME>/bin
. ./setDomainEnv.sh

If you want ADRCI to run a Remote Diagnostic Agent (RDA) collection (recommended) when the package is generated, point ORACLE_HOME at the oracle_common directory:

ORACLE_HOME=$MW_HOME/oracle_common; export ORACLE_HOME

To prevent ADRCI from running RDA when the package is generated, point ORACLE_HOME at the WL_HOME/server/adr directory:

ORACLE_HOME=$WL_HOME/server/adr; export ORACLE_HOME
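If you drive ADRCI from a script rather than an interactive shell, the same ORACLE_HOME choice can be made in code. This Python 3 sketch, with the hypothetical helper adrci_environment, returns a copy of the environment with ORACLE_HOME set to one of the two locations above. It does not replace sourcing setDomainEnv.sh, so run it from a shell where that has already been done.

```python
import os


def adrci_environment(mw_home, wl_home, run_rda=True, base_env=None):
    """Return a copy of the environment with ORACLE_HOME set for ADRCI.

    run_rda=True  -> ORACLE_HOME=MW_HOME/oracle_common (RDA runs at generate time)
    run_rda=False -> ORACLE_HOME=WL_HOME/server/adr    (RDA is skipped)
    The caller's environment (or base_env) is copied, never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    if run_rda:
        env["ORACLE_HOME"] = os.path.join(mw_home, "oracle_common")
    else:
        env["ORACLE_HOME"] = os.path.join(wl_home, "server", "adr")
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run() when launching adrci.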

b) Launch adrci

$WL_HOME/server/adr/adrci

c) Set BASE and HOMEPATH

adrci> SET BASE /oracle/middleware/user_projects/domains/mydomain/servers/mymanagedserver/adr
adrci> SET HOMEPATH diag/ofm/mydomain/mymanagedserver
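BASE and HOMEPATH follow directly from the domain and server names. The Python 3 sketch below, with hypothetical helper names, builds the two SET commands using the pattern in the example above: BASE is DOMAIN_HOME/servers/<servername>/adr and HOMEPATH is diag/ofm/<domainname>/<servername>. That pattern is inferred from the example rather than from a specification. Because ADRCI is case sensitive, pass the names exactly as they appear on disk.

```python
import posixpath


def adr_locations(domain_home, domain_name, server_name):
    """Return (base, homepath) following the layout shown in the example (an assumption)."""
    base = posixpath.join(domain_home, "servers", server_name, "adr")
    homepath = posixpath.join("diag", "ofm", domain_name, server_name)
    return base, homepath


def adrci_set_commands(domain_home, domain_name, server_name):
    """Return the SET BASE and SET HOMEPATH commands for one managed server."""
    base, homepath = adr_locations(domain_home, domain_name, server_name)
    return ["SET BASE " + base, "SET HOMEPATH " + homepath]
```

For the example domain, adrci_set_commands("/oracle/middleware/user_projects/domains/mydomain", "mydomain", "mymanagedserver") yields the two commands shown above.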

d) Optionally, run SHOW INCIDENTS, e.g.

adrci> SHOW INCIDENTS -MODE DETAIL
ADR Home = /oracle/middleware/user_projects/domains/mydomain/servers/mymanagedserver/adr/diag/ofm/mydomain/mymanagedserver:
*************************************************************************

**********************************************************
INCIDENT INFO RECORD 1
**********************************************************
   INCIDENT_ID                   1
   STATUS                        ready
   CREATE_TIME                   2012-11-02 10:38:46.468000 +00:00
   PROBLEM_ID                    1
   CLOSE_TIME                    <NULL>
   FLOOD_CONTROLLED              none
   ERROR_FACILITY                DFW
   ERROR_NUMBER                  99998
   ERROR_ARG1                    <NULL>
   ERROR_ARG2                    <NULL>
   ERROR_ARG3                    <NULL>
   ERROR_ARG4                    <NULL>
   ERROR_ARG5                    <NULL>
   ERROR_ARG6                    <NULL>
   ERROR_ARG7                    <NULL>
   ERROR_ARG8                    <NULL>
   ERROR_ARG9                    <NULL>
   ERROR_ARG10                   <NULL>
   ERROR_ARG11                   <NULL>
   ERROR_ARG12                   <NULL>
   SIGNALLING_COMPONENT          <NULL>
   SIGNALLING_SUBCOMPONENT       <NULL>
   SUSPECT_COMPONENT             <NULL>
   SUSPECT_SUBCOMPONENT          <NULL>
   ECID                          5162744c6a2eea5e:155ff445:13ac0aae7cb:-8000-0000000000000325
   IMPACTS                       0
1 rows fetched
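If you want to process the detailed listing in a script, for example to pick out ERROR_FACILITY and ERROR_NUMBER, the record layout above is simple to parse. This Python 3 sketch assumes each record starts with an "INCIDENT INFO RECORD n" banner and each field sits on one line as an upper-case NAME, whitespace, then the value. Values wrapped onto a second line are not re-joined. The name parse_incident_records is hypothetical.

```python
def parse_incident_records(output):
    """Turn 'SHOW INCIDENTS -MODE DETAIL' output into a list of dicts, one per record.

    Assumptions: records begin with an 'INCIDENT INFO RECORD n' banner and each
    field is 'NAME   value' on a single line. '<NULL>' values become None.
    """
    records = []
    current = None
    for line in output.splitlines():
        text = line.strip()
        if text.startswith("INCIDENT INFO RECORD"):
            current = {}
            records.append(current)
            continue
        # Skip anything before the first record, blank lines and '****' separators.
        if current is None or not text or text.startswith("*"):
            continue
        parts = text.split(None, 1)
        # Field names are upper case, which also skips lines such as '1 rows fetched'.
        if len(parts) == 2 and parts[0].isupper():
            value = parts[1].strip()
            current[parts[0]] = None if value == "<NULL>" else value
    return records
```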

e) Create a logical package

IPS CREATE PACKAGE INCIDENT incident_number

e.g.

adrci> IPS CREATE PACKAGE INCIDENT 1
Created package 1 based on incident id 1, correlation level typical

f) Generate the package

IPS GENERATE PACKAGE package_number IN path

e.g.

adrci> IPS GENERATE PACKAGE 1 IN /tmp
Generated package 1 in file /tmp/DFW99998j_20121102113633_COM_1.zip, mode complete

Note:

If the generate package command hangs, ADRCI may be experiencing an issue when running RDA. To avoid such trouble, exit ADRCI, point the ORACLE_HOME environment variable at WL_HOME/server/adr (see step 2a) and generate the package again.
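Steps c), e) and f) can also be scripted. The Python 3 sketch below assumes that the ADRCI shipped in WL_HOME/server/adr supports the batch mode documented for Oracle Database ADRCI (adrci exec="command; command"); treat that as an assumption to verify on your installation. Each invocation starts a fresh ADRCI session, so BASE and HOMEPATH are set every time. The package number is read from the "Created package" line and the zip path from the "Generated package" line, as in the examples above. A timeout guards against the RDA hang described in the note. The names run_adrci and package_incident are hypothetical.

```python
import re
import subprocess


def run_adrci(adrci_cmd, base, homepath, command, env=None, timeout=600):
    """Run one ADRCI command non-interactively and return its standard output.

    Assumption: adrci accepts exec="cmd1; cmd2". Raises subprocess.TimeoutExpired
    if ADRCI does not finish within `timeout` seconds (e.g. an RDA hang).
    """
    batch = "SET BASE %s; SET HOMEPATH %s; %s" % (base, homepath, command)
    result = subprocess.run(
        list(adrci_cmd) + ["exec=" + batch],
        capture_output=True, text=True, env=env, timeout=timeout, check=True,
    )
    return result.stdout


def package_incident(adrci_cmd, base, homepath, incident_id, out_dir, env=None, timeout=600):
    """Create and generate an IPS package for one incident; return the zip file path."""
    created = run_adrci(adrci_cmd, base, homepath,
                        "IPS CREATE PACKAGE INCIDENT %d" % incident_id, env, timeout)
    match = re.search(r"Created package (\d+)", created)
    if match is None:
        raise RuntimeError("unexpected ADRCI output: %r" % created)
    package = int(match.group(1))
    generated = run_adrci(adrci_cmd, base, homepath,
                          "IPS GENERATE PACKAGE %d IN %s" % (package, out_dir), env, timeout)
    match = re.search(r"in file (\S+\.zip)", generated)
    if match is None:
        raise RuntimeError("unexpected ADRCI output: %r" % generated)
    return match.group(1)
```

For example, package_incident([wl_home + "/server/adr/adrci"], base, homepath, 1, "/tmp") would return the path of the zip file to upload in step 3, where wl_home holds your WL_HOME path.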

3) Upload the package zip to Oracle Support via your Service Request

a) Log into My Oracle Support and locate your Service Request

b) Click on "Add Attachments"

c) Upload the zip file

© Oracle Blogs or respective owner
