How to automate a monitoring system for ETL runs
- by Jeffrey McDaniel
Upon completion of the Primavera ETL process there are a few ways to determine if the process finished successfully. First, in the <installation directory>\log folder, there is a staretlprocess.log and staretl.html files. These files will give the output results of the ETL run. The staretl.html file will give a detailed summary of each step of the process, its run time, and its status. The .log file, based on the logging level set in the Configuration tool, can give extensive information about the ETL process. The log file can be used as a validation for process completion.
To automate the monitoring of these log files, perform the following steps:
1. Write a custom application to parse through the log file and search for [ERROR] . In most cases, a major [ERROR] could cause the ETL process to fail. Searching the log and finding this value is worthy of an alert.
2. Determine the total number of steps in the ETL process, and validate that the log file recorded and entry for the final step. For example validate that your log file contains an entry for Step 39/39 (could be different based on the version you are running). If there is no Step 39/39, then either the process is taking longer than expected or it didn't make it to the end. Either way this would be a good cause for an alert.
3. Check the last line in the log file. The last line of the log file should contain an indication that the ETL run completed successfully. For example, the last line of a log file will say (results could be different based on Reporting Database versions): [INFO] (Message) Finished Writing Report
4. You could write an Ant script to execute the ETL process and have it set to - failonerror="true" - and from there send results to an external tool to monitor the jobs, send to email, or send to database.
With each ETL run, the log file appends to the existing log file by default. Because of this behavior, I would recommend renaming the existing log files before running a new ETL process. By doing this, only log entries for the currently running ETL process is recorded in the new log files. Based on these log entries, alerts can be setup to notify the administrator or DBA.
Another way to determine if the ETL process has completed successfully is to monitor the etl_processmaster table. Depending on the Reporting Database version this could be in the Stage or Star databases. As of Reporting Database 2.2 and higher this would be in the Star database. The etl_processmaster table records entries for the ETL run along with a Start and Finish time. If the ETl process has failed the Finish date should be null. This table can be queried at a time when ETL process is expected to be finished and if null send an alert. These are just some options. There are additional ways this can be accomplished based around these two areas - log files or database.
Here is an additional query to gather more information about your ETL run (connect as Staruser):
SELECT SYSDATE,test_script,decode(loc, 0, PROCESSNAME, trim(SUBSTR(PROCESSNAME, loc+1))) PROCESSNAME
,duration duration from ( select (e.endtime - b.starttime) * 1440 duration, to_char(b.starttime, 'hh24:mi:ss') starttime, to_char(e.endtime, 'hh24:mi:ss') endtime,
b.PROCESSNAME, instr(b.PROCESSNAME, ']') loc, b.infotype test_script from ( select processid, infodate starttime, PROCESSNAME, INFOMSG, INFOTYPE from etl_processinfo
where processid = (select max(PROCESSID) from etl_processinfo) and infotype = 'BEGIN' ) b
inner Join ( select processid, infodate endtime, PROCESSNAME, INFOMSG, INFOTYPE from etl_processinfo
where processid = (select max(PROCESSID) from etl_processinfo) and infotype = 'END' ) e on b.processid = e.processid
and b.PROCESSNAME = e.PROCESSNAME order by b.starttime)