Search Results

Search found 63 results on 3 pages for 'respawn'.

Page 3/3 | < Previous Page | 1 2 3

Do we need to explicitly pass php.ini's location to php-fpm?

- by F21

I am seeing a strange issue where my php.ini is not used if I do not explicitly pass it to php-fpm when starting it. This is the upstart script I am using: start on (filesystem and net-device-up IFACE=lo) stop on runlevel [016] pre-start script mkdir -p /run/php end script expect fork respawn exec /usr/local/php/sbin/php-fpm --fpm-config /etc/php/php-fpm.conf If PHP is started with the above, my php.ini is never used, even though it is in Configuration File (php.ini) Path. This is the relevant part from phpinfo(): Configuration File (php.ini) Path /etc/php/ Loaded Configuration File (none) Scan this dir for additional .ini files (none) Additional .ini files parsed (none) If I modify the last line of the upstart script to point php-fpm to php.ini explicitly: exec /usr/local/php/sbin/php-fpm --fpm-config /etc/php/php-fpm.conf -c /etc/php/php.ini Then we see that the php.ini is loaded: Configuration File (php.ini) Path /etc/php/ Loaded Configuration File /etc/php/php.ini Scan this dir for additional .ini files (none) Additional .ini files parsed (none) Why is this the case? Is this a quirk in php-fpm? Minor update: This also seems to be a problem for php5-fpm installed using apt-get. I did a test install in a Ubuntu Server 12.04 virtual machine by running the following: sudo apt-get install nginx php5-fpm PHP-FPM and nginx were started after installation and everything seemed fine. I then uncommented php's settings in nginx's configuration and placed a test phpinfo() file to inspect PHP's settings. The relevant bits are: Configuration File (php.ini) Path /etc/php5/fpm Loaded Configuration File (none) Scan this dir for additional .ini files /etc/php5/fpm/conf.d Additional .ini files parsed /etc/php5/fpm/conf.d/10-pdo.ini I noted that no php.ini was loaded either. However, if I go to /etc/php5/fpm, I can see that a php.ini exists. I also checked the start up scripts for PHP-FPM and the -c parameter was not used to link the ini file to PHP. This can potentially be confusing for people who would expect php.ini to be loaded automatically by PHP-FPM.

Read the article
SuperMicro BMC on OpenSuSE Linux --cannot access from LAN

- by Kendall

Hi, I have an (old) SMC-001 IPMI device on an (old) X6DVL-EG2 motherboard. My problem is that I cannot access the BMC from LAN. I'm also getting some interesting output from ipmitool. First, the setup. I enable Console Redirection in the BIOS, turn BIOS Redirection after POSt to "disabled". I then modprobe'ed for ipmi_msghandler, ipmi_devintf and ipmi_si. I then found ipmi0 under /dev. So far so good. Since I want console redirection over serial, I modified /boot/grub/menu.lst: http://pastebin.com/YYJmhusQ I then modified "/etc/inittab" as follows: S1:12345:respawn:/sbin/agetty -L 19200 ttyS1 ansi Networking I set as following, using "ipmitool" ipaddr: 192.168.3.164 netmask: 255.255.255.0 defgw: 192.168.3.1 The above are correct for my environment. To test it I do: ipmitool -I open chassis power off which responds by powering off the machine. When I to access from another computer on the network, however, I get an error message: host# ipmitool -I lanplus -H 192.168.10.164 -U Admin -a chassis power status Error: Unable to establish LAN session Unable to get Chassis Power Status "Admin" seems to be a valid user name: host# ipmitool -I open user list 1 2 Admin true false true USER The interesting output from ipmitool I initially mentioned: host # ipmitool -I open lan set 1 access on Set Channel Access for channel 1 failed: Request data field length limit exceeded Also, newload4:/home/gjones # ipmitool channel info 1 Channel 0x1 info: Channel Medium Type : 802.3 LAN Channel Protocol Type : IPMB-1.0 Session Support : session-less Active Session Count : 0 Protocol Vendor ID : 7154 Get Channel Access (volatile) failed: Request data field length limit exceeded The output of "ipmitool -I open lan print 1" is here: http://pastebin.com/UZyL6yyE Any help/suggestions is greatly appreciated; I've been working with this thing for a few hours now with no success.

Read the article
PHP-FPM processes holding onto MongoDB connection states

- by Brendan

For the relevant part of our server stack, we're running: NGINX 1.2.3 PHP-FPM 5.3.10 with PECL mongo 1.2.12 MongoDB 2.0.7 CentOS 6.2 We're getting some strange, but predictable behavior when the MongoDB server goes away (crashes, gets killed, etc). Even with a try/catch block around the connection code, i.e: try { $mdb = new Mongo('mongodb://localhost:27017'); } catch (MongoConnectionException $e) { die( $e->getMessage() ); } $db = $mdb->selectDB('collection_name'); Depending on which PHP-FPM workers have connected to mongo already, the connection state is cached, causing further exceptions to go unhandled, because the $mdb connection handler can't be used. The troubling thing is that the try does not consistently fail for a considerable amount of time, up to 15 minutes later, when -- I assume -- the php-fpm processes die/respawn. Essentially, the behavior is that when you hit a worker that hasn't connected to mongo yet, you get the die message above, and when you connect to a worker that has, you get an unhandled exception from $mdb->selectDB('collection_name'); because catch does not run. When PHP is a single process, i.e. via Apache with mod_php, this behavior does not occur. Just for posterity, going back to Apache/mod_php is not an option for us at this time. Is there a way to fix this behavior? I don't want the connection state to be inconsistent between different php-fpm processes.

Read the article
upstart scripts: run a task after networking goes up

- by The Journeyman geek

I'm working on moving my current server setup to newer hardware, and migrating from ubuntu karmic koala to lucid lynx. Currently i'm using gw6c (compiled from the gogo6 website, as opposed to the version from the repositories) to get ipv6 access for my systems. On the karmic koala system, i used simple init.d script to get the ipv6 client started #! /bin/sh /usr/local/gw6c/bin/gw6c -f /usr/local/gw6c/bin/gw6c.conf I figured since this runs at any runlevel, it should translate to respawn console none start on startup stop on shutdown script exec /usr/local/gw6c/bin/gw6c -f /usr/local/gw6c/bin/gw6c.conf emit free6_ipv6_started end script this works fine started from initctrl, but it apparently fails to start when it boots. - its status being stop/waiting. It works fine (and respawns) when started otherwise.Any ideas on where i'm going wrong, and what would be the appropriate 'start on' arguement? EDIT: the exact error is 'init: gw6c main process (xxx) ended with status 8' followed by the process respawning , with xxx being a PID i suspect. I'm also half suspecting this is cause gw6c starts before networking does, and i need my eth0 up before gw6c is

Read the article
iPhone universal app. MoviePlayer.framwork problem.

- by e40pud

I have application based on 3.0 iPhone OS SDK One of tasks is playing video (I use MPMoviePlayerController for this task) Now I try to make universal app working on both 3.0 and 3.2 OS I did all steps described in apple documentation: Upgrade Current Target for iPad; make run-time checking for symbols using [[UIDevice currentDevice] respondsToSelector:@selector(userInterfaceIdiom)] function. But when I start my application on device - iPhone with OS 3.1.3 my apllication is crashes with next log: Tue May 25 18:00:28 unknown SpringBoard[24] <Notice>: MultitouchHID(208b30) uilock state: 1 -> 0 Tue May 25 18:00:29 unknown SpringBoard[24] <Notice>: MultitouchHID(292580) device bootloaded Tue May 25 18:00:34 unknown UIKitApplication:...[0xaa0f][1517] <Notice>: dyld: Symbol not found: _MPMoviePlayerWillEnterFullscreenNotification Tue May 25 18:00:34 unknown UIKitApplication:...[0xaa0f][1517] <Notice>: Referenced from: /var/mobile/Applications/876EA35E-5756-436B-A9E2-5481D4D62050/....app/... Tue May 25 18:00:34 unknown UIKitApplication:...[0xaa0f][1517] <Notice>: Expected in: /System/Library/Frameworks/MediaPlayer.framework/MediaPlayer Tue May 25 18:00:35 unknown kernel[0] <Debug>: launchd[1517] Builtin profile: container (seatbelt) Tue May 25 18:00:35 unknown kernel[0] <Debug>: launchd[1517] Container: /private/var/mobile/Applications/876EA35E-5756-436B-A9E2-5481D4D62050 (seatbelt) Tue May 25 18:00:35 unknown ReportCrash[1518] <Notice>: Formulating crash report for process cnetmobile[1517] Tue May 25 18:00:36 unknown com.apple.launchd[1] <Warning>: (UIKitApplication:...[0xaa0f]) Job appears to have crashed: Trace/BPT trap Tue May 25 18:00:36 unknown com.apple.launchd[1] <Warning>: (UIKitApplication:...[0xaa0f]) Throttling respawn: Will start in 2147483646 seconds Tue May 25 18:00:36 unknown SpringBoard[24] <Warning>: Application '...' exited abnormally with signal 5: Trace/BPT trap Tue May 25 18:00:36 unknown ReportCrash[1518] <Error>: Saved crashreport to /var/mobile/Library/Logs/CrashReporter/..._2010-05-25-180034_...-iPhone.plist using uid: 0 gid: 0, synthetic_euid: 501 egid: 0 Tue May 25 18:01:36 unknown SpringBoard[24] <Notice>: MultitouchHID(208b30) uilock state: 0 -> 1 As you can see the error is "Symbol not found: _MPMoviePlayerWillEnterFullscreenNotification". This symbol is notification available in MediaPlayer.framework starting from iPhone OS 3.2 So, what am I doing wrong? What I should do to have universal application working correct on OS 3.2 (with new available functionality) and older OSes (with their functionality)?

Read the article
Stopping process in /etc/inittab kills spawned process. Doesn't happen in rc.local.

- by Paul

Hi, I'm trying to execute a firmware upgrade while my programming is running in inittab. My program will run 2 commands. One to extract the installer script from the tarball and the other to execute the installer script. In my code I'm using the system() function call. These are the 2 command strings below, system ( "tar zvxf tarball.tar.gz -C / installer.sh 2>&1" ); system( "nohup installer.sh tarball >/dev/null 2>&1 &" ); The installer script requires the tarball to be an argument. I've tried using sudo but i still have the same problem. I've tried nohup with no success. The installer script has to kill my program when doing the firmware upgrade but the installer script will stay alive. If my program is run from the command line or rc.local, on my target device, my upgrade works fine, i.e. when my program is killed my installer script continues. But I need to run my program from /etc/inittab so it can respawn if it dies. To stop my program in inittab the installer script will hash it out and execute "telinit q". This is where my program dies (but thats what I want it to do), but it also kills my installer script. Does anyone know why this is happening and what can I do to solve it? Thanks in advance.

Read the article
Setting up Apache and PHP on Mac OS X Snow Leopard

- by Martin Bean

I've recently purchased an Apple iMac. Unfortunately, enabling Apache and PHP has thrown up some problems. I enabled Mac's built-in Web Sharing through System Preferences, at which point I got an output and could add HTML files to my user directory. However, PHP files were being displayed rather than interpreted. I then discovered this is because PHP isn't enabled by default on Mac's Apache set-up. After a quick Google search, I came across this page: http://developer.apple.com/mac/articles/internet/phpeasyway.html I proceeded to the section, Enabling PHP in Apache, copying and pasting the following code snippet into a new Terminal window and hitting Return: set admin_email to (do shell script "defaults read AddressBookMe ExistingEmailAddress") user_www=$HOME/Sites filename=php-test user_index=${user_www}/${filename}.php user_db=${user_www}/${filename}-db.sqlite3 # NOTE: Having a writeable database in your home directory can be a security risk! conf=`apachectl -V | awk -F= '/SERVER_CONFIG/ {print \$2}'| sed 's/"//g'` conf_old=$conf.$$ conf_new=/tmp/php_conf.new touch $user_db chmod a+r $user_index chmod a+w $user_db chmod a+w $user_www echo "Enabling PHP in $conf ..." sed '/#LoadModule php5_module/s/#LoadModule/LoadModule/' $conf | sed "s^[email protected]^<b>\$admin_email</b>^" > $conf_new echo "(Re)Starting Apache ..." osascript <<EOF do shell script "/bin/mv -f $conf $conf_old; /bin/mv $conf_new $conf; /usr/sbin/apachectl restart" with administrator privileges EOF Unfortunately, this has completed thrown Apache and now nothing is being served; instead I'm receiving "Failed to open page" errors because it cannot connect to the server, despite Web Sharing still being active in System Preferences. So therefore I guess my question is this: how can I undo the changes made by the copy-and-pasting of the above code snippet? Admittedly, I don't understand what the above did; I just thought it looked like a Terminal command and tried it. I have no experience in setting up Apache on Mac OS X (and I've only installed XAMPP and WampServer on Windows). So any points on reversing the aforementioned, and then successfully enabling PHP would be great. EDIT: I've discovered, via Console, the following error message is being recorded when trying to browse to 127.0.0.1... (org.apache.httpd) Throttling respawn: Will start in 10 seconds no listening sockets available, shutting down Unable to open logs (org.apache.httpd[13453]) Exited with exit code: 1 Does this point any more to the issue? EDIT #2: I'm now getting this in Console... 15/02/2010 21:24:14 osascript[3597] Error loading /Library/ScriptingAdditions/Adobe Unit Types.osax/Contents/MacOS/Adobe Unit Types: dlopen(/Library/ScriptingAdditions/Adobe Unit Types.osax/Contents/MacOS/Adobe Unit Types, 262): no suitable image found. Did find: /Library/ScriptingAdditions/Adobe Unit Types.osax/Contents/MacOS/Adobe Unit Types: no matching architecture in universal wrapper

Read the article
GI ????

- by Allen Gao

Normal 0 7.8 ? 0 2 false false false MicrosoftInternetExplorer4 classid="clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D" id=ieooui st1\:*{behavior:url(#ieooui) } /* Style Definitions */ table.MsoNormalTable {mso-style-name:????; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman"; mso-ansi-language:#0400; mso-fareast-language:#0400; mso-bidi-language:#0400;} ??????????11gR2 GI ?????????,??????GI????????????????????? ????????GI???????3???,ohasd??,??????,??????? ??,ohasd??? 1. /etc/inittab?????? h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null ???,??????? root 4865 1 0 Dec02 ? 00:01:01 /bin/sh /etc/init.d/init.ohasd run ??????????????,???? +init.ohasd ????????? + os????????? + ??S* ohasd????, ??S96ohasd + GI????????(crsctl enable crs) ??,ohasd.bin ??????,????OLR????,??,??ohasd.bin??????,?????OLR??????????????OLR???$GRID_HOME/cdata/${HOSTNAME}.olr 2. ohasd.bin????????agents(orarootagent, oraagent, cssdagnet ? cssdmonitor) ???????????????,?????agent??????,??????????$GRID_HOME/bin ???????????,??,?????????,??corruption. ???,??????? 1. Mdnsd ??????(Multicast)???????????????????,??????????????????????????? 2. Gpnpd ????,??????????bootstrap ??,??????????????gpnp profile???,?????mdnsd??????,???????????,?????????????,??gpnp profile (<gi_home>/gpnp/profiles/peer/profile.xml)?????????? 3. Gipcd ????,????????????????(cluster interconnect)?????,???????gpnpd???,??,??????????,?????gpnpd ??????? 4. Ocssd.bin ?????????????gpnp profile?????????(Voting Disk),????gpnpd ??????????,?????????????,??ocssd.bin ??????,?????????? + gpnp profile ?????????? + gpnpd ??????? + ??????asm disk ??????????? + ??????????? 5. ??????????:ora.ctssd, ora.asm, ora.cluster_interconnect.haip, ora.crf, ora.crsd ?? ??:????????????????ocssd.bin, gpnpd.bin ? gipcd.bin ????,??gpnpd.bin????,ocssd.bin ? gipcd.bin ?????????,?gpnpd.bin????????,ocssd.bin ? gipcd.bin ????????gpnp profile?????????? ??,????????????,?????crsd????????? 1. Crsd?????????????OCR,????OCR????ASM?,???? ASM??????,??OCR???ASM??????????OCR???????,???????????????? 2. Crsd ?????agents(orarootagent, oraagent_<rdbms_owner>, oraagent_<gi_owner> )???agent????,??????????$GRID_HOME/bin ???????????,??,?????????,??corruption. 3. ???????? ora.net1.network : ????,?????????????,scanvip, vip, listener?????????????,??????????,vip, scanvip ?listener ??offline,?????????????? ora.<scan_name>.vip:scan???vip??,?????3?? ora.<node_name>.vip : ?????vip ?? ora.<listener_name>.lsnr: ???????????????,?11gR2??,listener.ora???????,????????? ora.LISTENER_SCAN<n>.lsnr: scan ????? ora.<????>.dg: ASM ????????????????mount???,dismount???? ora.<????>.db: ???????11gR2????????????,??????????rac ????????,??????????,???????“USR_ORA_INST_NAME@SERVERNAME(<node name> )”???????,??????????ASM???,???????????????????,??dependency?????????,??????????????????,???dependancy???????,??????(crsctl modify res ……)? ora.<???>.svc:?????????11gR2 ??,?????????,???10gR2??,???????????,srv ?cs ????? ora.cvu :?????11.2.0.2???,???????cluvfy??,???????????????? ora.ons : ONS??,????????,????? ??,?????GI??????????????????? $GRID_HOME/log/<node_name>/ocssd <== ocssd.bin ?? $GRID_HOME/log/<node_name>/gpnpd <== gpnpd.bin ?? $GRID_HOME/log/<node_name>/gipcd <== gipcd.bin ?? $GRID_HOME/log/<node_name>/agent/crsd <== crsd.bin ?? $GRID_HOME/log/<node_name>/agent/ohasd <== ohasd.bin ?? $GRID_HOME/log/<node_name>/mdnsd <== mdnsd.bin ?? $GRID_HOME/log/<node_name>/client <== ????GI ??(ocrdump, crsctl, ocrcheck, gpnptool??)??????????? $GRID_HOME/log/<node_name>/ctssd <== ctssd.bin ?? $GRID_HOME/log/<node_name>/crsd <== crsd.bin ?? $GRID_HOME/log/<node_name>/cvu <== cluvfy ????????? $GRID_HOME/bin/diagcollection.sh <== ????????????????? ??,????????(/var/tmp/.oracle ? /tmp/.oracle),??????????????????ipc???,??,?????????????????????,???GI?????????????????????,??????????GI??????????????

Read the article
Using Upstart to manage Unicorn w/ rbenv + bundler binstubs w/ ruby-local-exec shebang

- by codykrieger

Alright, this is melting my brain. It might have something to do with the fact that I don't understand Upstart as well as I should. Sorry in advance for the long question. I'm trying to use Upstart to manage a Rails app's Unicorn master process. Here is my current /etc/init/app.conf: description "app" start on runlevel [2] stop on runlevel [016] console owner # expect daemon script APP_ROOT=/home/deploy/app PATH=/home/deploy/.rbenv/shims:/home/deploy/.rbenv/bin:$PATH $APP_ROOT/bin/unicorn -c $APP_ROOT/config/unicorn.rb -E production # >> /tmp/upstart.log 2>&1 end script # respawn That works just fine - the Unicorns start up great. What's not great is that the PID detected is not of the Unicorn master, it's of an sh process. That in and of itself isn't so bad, either - if I wasn't using the automagical Unicorn zero-downtime deployment strategy. Because shortly after I send -USR2 to my Unicorn master, a new master spawns up, and the old one dies...and so does the sh process. So Upstart thinks my job has died, and I can no longer restart it with restart or stop it with stop if I want. I've played around with the config file, trying to add -D to the Unicorn line (like this: $APP_ROOT/bin/unicorn -c $APP_ROOT/config/unicorn.rb -E production -D) to daemonize Unicorn, and I added the expect daemon line, but that didn't work either. I've tried expect fork as well. Various combinations of all of those things can cause start and stop to hang, and then Upstart gets really confused about the state of the job. Then I have to restart the machine to fix it. I think Upstart is having problems detecting when/if Unicorn is forking because I'm using rbenv + the ruby-local-exec shebang in my $APP_ROOT/bin/unicorn script. Here it is: #!/usr/bin/env ruby-local-exec # # This file was generated by Bundler. # # The application 'unicorn' is installed as part of a gem, and # this file is here to facilitate running it. # require 'pathname' ENV['BUNDLE_GEMFILE'] ||= File.expand_path("../../Gemfile", Pathname.new(__FILE__).realpath) require 'rubygems' require 'bundler/setup' load Gem.bin_path('unicorn', 'unicorn') Additionally, the ruby-local-exec script looks like this: #!/usr/bin/env bash # # `ruby-local-exec` is a drop-in replacement for the standard Ruby # shebang line: # # #!/usr/bin/env ruby-local-exec # # Use it for scripts inside a project with an `.rbenv-version` # file. When you run the scripts, they'll use the project-specified # Ruby version, regardless of what directory they're run from. Useful # for e.g. running project tasks in cron scripts without needing to # `cd` into the project first. set -e export RBENV_DIR="${1%/*}" exec ruby "$@" So there's an exec in there that I'm worried about. It fires up a Ruby process, which fires up Unicorn, which may or may not daemonize itself, which all happens from an sh process in the first place...which makes me seriously doubt the ability of Upstart to track all of this nonsense. Is what I'm trying to do even possible? From what I understand, the expect stanza in Upstart can only be told (via daemon or fork) to expect a maximum of two forks.

Read the article
Cannot read status the monit daemon, even with allowed group

- by jefflunt

I cannot seem to get monit status or other CLI commands to work. I've built monit v5.8 to run on a Raspberry Pi. I'm able to add services to be monitored, and the web interface can be accessed just fine, as I've set it up for public read-only access (it's a test server, not my final production setup, so not a big deal right now). Problem is, when I run monit status while logged in as root I get: # monit status monit: cannot read status from the monit daemon I also have monit started on boot via this /etc/inittab file entry: mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc I've verified that monit is running, and I'm getting email alerts anytime I either kill the monit process manually, or reboot my raspberry pi. So, next I check my monitrc file permissions to see which group is allowed access. # ls -al /etc/monitrc -rw------- 1 root root 2359 Aug 24 14:48 /etc/monitrc Here's my relevant allow section of the control file. set httpd port 80 allow [omitted] readonly allow @root allow localhost allow 0.0.0.0/0.0.0.0 Also tried setting permissions on this file to 640 to allow group read permissions, but no matter what I try I either get the same error as noted above, or when the permissions are set to 640 I get: # monit status monit: The control file '/etc/monitrc' must have permissions no more than -rwx------ (0700); right now permissions are -rw-r----- (0640). What am I missing here? I know that the httpd must be enabled, as that's the interface that the CLI uses to get information (or so I've read), so I've done that. And in terms of monit doing its monitoring job and sending email alerts, that's all working as well. Here's my entire monitrc file - again, this is version v5.8, and it was build with both PAM and SSL support. The process runs under the root user: # Global settings set daemon 300 with start delay 5 set logfile /var/log/monit.log set pidfile /var/run/monit.pid set idfile /var/run/.monit.id set statefile /var/run/.monit.state # Mail alerts ## Set the list of mail servers for alert delivery. Multiple servers may be ## specified using a comma separator. If the first mail server fails, Monit # will use the second mail server in the list and so on. By default Monit uses # port 25 - it is possible to override this with the PORT option. # set mailserver smtp.gmail.com port 587 username [omitted] password [omitted] using tlsv1 ## Send status and events to M/Monit (for more informations about M/Monit ## see http://mmonit.com/). By default Monit registers credentials with ## M/Monit so M/Monit can smoothly communicate back to Monit and you don't ## have to register Monit credentials manually in M/Monit. It is possible to ## disable credential registration using the commented out option below. ## Though, if safety is a concern we recommend instead using https when ## communicating with M/Monit and send credentials encrypted. # # set mmonit http://monit:[email protected]:8080/collector # # and register without credentials # Don't register credentials # # ## Monit by default uses the following format for alerts if the the mail-format ## statement is missing:: set mail-format { from: [email protected] subject: $SERVICE $DESCRIPTION message: $EVENT Service: $SERVICE Date: $DATE Action: $ACTION Host: $HOST Description: $DESCRIPTION Monit instance provided by chicagomeshnet.com } # Web status page set httpd port 80 allow [omitted] readonly allow @root allow localhost allow 0.0.0.0/0.0.0.0 ## You can set alert recipients whom will receive alerts if/when a ## service defined in this file has errors. Alerts may be restricted on ## events by using a filter as in the second example below.

Read the article
Unexpected start of already-primary server processes when heartbeat on secondary is stopped.

- by vorik

Hi, I've got an active-passive Heartbeat cluster with Apache, MySQL, ActiveMQ and DRBD. Today, I wanted to perform hardware-maintenance on the secondary node (node04), so I stopped the heartbeat service before shutting it down. Then, the primary node (node03) received a shutdown notice from the secondary node (node04). This logging comes from the primary node: node03 heartbeat[4458]: 2010/03/08_08:52:56 info: Received shutdown notice from 'node04.companydomain.nl'. heartbeat[4458]: 2010/03/08_08:52:56 info: Resources being acquired from node04.companydomain.nl. harc[27522]: 2010/03/08_08:52:56 info: Running /etc/ha.d/rc.d/status status heartbeat[27523]: 2010/03/08_08:52:56 info: Local Resource acquisition completed. mach_down[27567]: 2010/03/08_08:52:56 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired mach_down[27567]: 2010/03/08_08:52:56 info: mach_down takeover complete for node node04.companydomain.nl. heartbeat[4458]: 2010/03/08_08:52:56 info: mach_down takeover complete. harc[27620]: 2010/03/08_08:52:56 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp ip-request-resp[27620]: 2010/03/08_08:52:56 received ip-request-resp drbddisk OK yes ResourceManager[27645]: 2010/03/08_08:52:56 info: Acquiring resource group: node03.companydomain.nl drbddisk Filesystem::/dev/drbd0::/data::ext3 mysql apache::/etc/httpd/conf/httpd.conf LVSSyncDaemonSwap::master monitor activemq tivoli-cluster MailTo::[email protected]::DRBDFailureDrisAcc MailTo::[email protected]::DRBDFailureDrisAcc 1.2.3.212 ResourceManager[27645]: 2010/03/08_08:52:56 info: Running /etc/ha.d/resource.d/drbddisk start Filesystem[27700]: 2010/03/08_08:52:57 INFO: Running OK ResourceManager[27645]: 2010/03/08_08:52:57 info: Running /etc/ha.d/resource.d/mysql start mysql[27783]: 2010/03/08_08:52:57 Starting MySQL[ OK ] apache[27853]: 2010/03/08_08:52:57 INFO: Running OK ResourceManager[27645]: 2010/03/08_08:52:57 info: Running /etc/ha.d/resource.d/monitor start monitor[28160]: 2010/03/08_08:52:58 ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/activemq start activemq[28210]: 2010/03/08_08:52:58 Starting ActiveMQ Broker... ActiveMQ Broker is already running. ResourceManager[27645]: 2010/03/08_08:52:58 ERROR: Return code 1 from /etc/ha.d/resource.d/activemq ResourceManager[27645]: 2010/03/08_08:52:58 CRIT: Giving up resources due to failure of activemq ResourceManager[27645]: 2010/03/08_08:52:58 info: Releasing resource group: node03.companydomain.nl drbddisk Filesystem::/dev/drbd0::/data::ext3 mysql apache::/etc/httpd/conf/httpd.conf LVSSyncDaemonSwap::master monitor activemq tivoli-cluster MailTo::[email protected]::DRBDFailureDrisAcc MailTo::[email protected]::DRBDFailureDrisAcc 1.2.3.212 ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/IPaddr 1.2.3.212 stop IPaddr[28329]: 2010/03/08_08:52:58 INFO: ifconfig eth0:0 down IPaddr[28312]: 2010/03/08_08:52:58 INFO: Success ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/MailTo [email protected] DRBDFailureDrisAcc stop MailTo[28378]: 2010/03/08_08:52:58 INFO: Success ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/MailTo [email protected] DRBDFailureDrisAcc stop MailTo[28433]: 2010/03/08_08:52:58 INFO: Success ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/tivoli-cluster stop ResourceManager[27645]: 2010/03/08_08:52:58 info: Running /etc/ha.d/resource.d/activemq stop activemq[28503]: 2010/03/08_08:53:01 Stopping ActiveMQ Broker... Stopped ActiveMQ Broker. ResourceManager[27645]: 2010/03/08_08:53:01 info: Running /etc/ha.d/resource.d/monitor stop monitor[28681]: 2010/03/08_08:53:01 ResourceManager[27645]: 2010/03/08_08:53:01 info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master stop LVSSyncDaemonSwap[28714]: 2010/03/08_08:53:02 info: ipvs_syncmaster down LVSSyncDaemonSwap[28714]: 2010/03/08_08:53:02 info: ipvs_syncbackup up LVSSyncDaemonSwap[28714]: 2010/03/08_08:53:02 info: ipvs_syncmaster released ResourceManager[27645]: 2010/03/08_08:53:02 info: Running /etc/ha.d/resource.d/apache /etc/httpd/conf/httpd.conf stop apache[28782]: 2010/03/08_08:53:03 INFO: Killing apache PID 18390 apache[28782]: 2010/03/08_08:53:03 INFO: apache stopped. apache[28771]: 2010/03/08_08:53:03 INFO: Success ResourceManager[27645]: 2010/03/08_08:53:03 info: Running /etc/ha.d/resource.d/mysql stop mysql[28851]: 2010/03/08_08:53:24 Shutting down MySQL.....................[ OK ] ResourceManager[27645]: 2010/03/08_08:53:24 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 stop Filesystem[29010]: 2010/03/08_08:53:25 INFO: Running stop for /dev/drbd0 on /data Filesystem[29010]: 2010/03/08_08:53:25 INFO: Trying to unmount /data Filesystem[29010]: 2010/03/08_08:53:25 ERROR: Couldn't unmount /data; trying cleanup with SIGTERM Filesystem[29010]: 2010/03/08_08:53:25 INFO: Some processes on /data were signalled Filesystem[29010]: 2010/03/08_08:53:27 INFO: unmounted /data successfully Filesystem[28999]: 2010/03/08_08:53:27 INFO: Success ResourceManager[27645]: 2010/03/08_08:53:27 info: Running /etc/ha.d/resource.d/drbddisk stop heartbeat[4458]: 2010/03/08_08:53:29 WARN: node node04.companydomain.nl: is dead heartbeat[4458]: 2010/03/08_08:53:29 info: Dead node node04.companydomain.nl gave up resources. heartbeat[4458]: 2010/03/08_08:53:29 info: Link node04.companydomain.nl:eth0 dead. heartbeat[4458]: 2010/03/08_08:53:29 info: Link node04.companydomain.nl:eth1 dead. hb_standby[29193]: 2010/03/08_08:53:57 Going standby [foreign]. heartbeat[4458]: 2010/03/08_08:53:57 info: node03.companydomain.nl wants to go standby [foreign] Soo... What just happened here??? Heartbeat on node04 stopped and told node03, which was the active node at the time. Somehow, node03 decided to start the cluster processes that were already running. (For the processes that are not critical, I always return a 0 from the startupscript so it does not stops the entire cluster when a non-essential part fails.) When starting ActiveMQ, it returns status 1 because it is already running. This fails the node and shuts everything down. As heartbeat is not running on the secondary node, it cannot failover to there. When I tried to run ha_takeover to restart the resources, absolutely nothing happened. Only after I restarted heartbeat on the primary node the resources could be started (after a delay of 2 minutes). These are my questions: Why does heartbeat on the primary node try to start the cluster processes again? Why did ha_takeover not work? What can I do to prevent this from happening? Server configuration: DRBD: version: 8.3.7 (api:88/proto:86-91) GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by [email protected], 2010-01-20 09:14:48 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate B r---- ns:0 nr:6459432 dw:6459432 dr:0 al:0 bm:301 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0 uname -a Linux node04 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:04 EST 2010 x86_64 x86_64 x86_64 GNU/Linux haresources node03.companydomain.nl \ drbddisk \ Filesystem::/dev/drbd0::/data::ext3 \ mysql \ apache::/etc/httpd/conf/httpd.conf \ LVSSyncDaemonSwap::master \ monitor \ activemq \ tivoli-cluster \ MailTo::[email protected]::DRBDFailureDrisAcc \ MailTo::[email protected]::DRBDFailureDrisAcc \ 1.2.3.212 ha.cf debugfile /var/log/ha-debug logfile /var/log/ha-log keepalive 500ms deadtime 30 warntime 10 initdead 120 udpport 694 mcast eth0 225.0.0.3 694 1 0 mcast eth1 225.0.0.4 694 1 0 auto_failback off node node03.companydomain.nl node node04.companydomain.nl respawn hacluster /usr/lib64/heartbeat/dopd apiauth dopd gid=haclient uid=hacluster Thank you very much in advance, Ger Apeldoorn

Read the article
Why does Mac OS X Software Update not work when machine uses Active Directory?

- by Lyndsey Ferguson

My company's IT department is mostly a Windows run operation and in order to become more secure, they are altering the way that the Macintosh computers login to our internal network so that they use Active Directory like their Windows counterparts. I have been given Administrative permission on my Mac and I am able to do most of what I used to be able to do in terms of authentication of software installations. However, there is a problem: the "Software Update" feature doesn't work. What happens is that when I try to get the Mac to perform its Software Updates from the Apple menu, the normal window appears listing what has to be updated; I am able to select what to update and click the "Update" button, but then nothing happens. It doesn't ask for authentication like it used to, the computer doesn't perform any download or installation (it does sometimes ask me to agree to license agreements for iTunes). I can download the updates individually and install them without any issues, but the auto-update fails. I'd rather use the Software Update menu item like I used to: it is much more convenient. Any suggestions on how I can fix this? EDIT Nov 19th, 2009, 10:09 EST: I have posted this question to the Apple Mac OS X Snow Leopard support forum. EDIT Nov 19th, 2009, 12:39 EST:Yes, the Terminal command "sudo softwareupdate --install --all" does work flawlessly. I want to avoid that as my co-workers are generally not comfortable on the Mac. I also tried Chealion's suggestion to delete "~/Library/Preferences/com.apple.SoftwareUpdate.plist" and "/Library/Preferences/com.apple.SoftwareUpdate.plist", Software Update still fails. However, I did get diagnostic messages in the Console (below). I've deleted the MS Office Package Receipts and examined the suhelperd (Software Update Helper Daemon?); it appears that suhelperd is crashing and that explains why it doesn't work. I've submitted a bug report to Apple (radar://7408619). Here are the Console diagnostic messages: 11/19/09 12:36:44 PM com.apple.suhelperd[66829] terminate called after throwing an instance of 'NSException' 11/19/09 12:36:47 PM com.apple.launchd[1] (com.apple.suhelperd[66829]) Job appears to have crashed: Abort trap 11/19/09 12:36:48 PM com.apple.ReportCrash.Root[66830] 2009-11-19 12:36:48.275 ReportCrash[66830:2703] Saved crash report for suhelperd[66829] version ??? (???) to /Library/Logs/DiagnosticReports/suhelperd_2009-11-19-123648_localhost.crash 11/19/09 12:36:54 PM com.apple.launchd[1] (com.apple.suhelperd) Throttling respawn: Will start in 1 seconds 11/19/09 12:36:55 PM com.apple.suhelperd[66836] terminate called after throwing an instance of 'NSException' 11/19/09 12:36:55 PM com.apple.launchd[1] (com.apple.suhelperd[66836]) Job appears to have crashed: Abort trap 11/19/09 12:36:56 PM com.apple.ReportCrash.Root[66830] 2009-11-19 12:36:56.017 ReportCrash[66830:2f03] Saved crash report for suhelperd[66836] version ??? (???) to /Library/Logs/DiagnosticReports/suhelperd_2009-11-19-123655_localhost.crash 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_automator.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_automator_workflow.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_autoupdate.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_clipart.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_core.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_dock.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_entourage.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_entourage_help_std.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_equationeditor.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_errorreporting.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_excel.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_excel_help_std.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_fonts.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_graph.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_helpviewer.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_launch.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_ooxml.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_orgchart.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_powerpoint.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_powerpoint_help_std.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_brazilian.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_danish.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_dutch.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_english.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_finnish.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_french.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_german.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_italian.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_japanese.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_norwegian.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_portuguese.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_spanish.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_proofing_swedish.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_required.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_silverlight.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_sounds.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_word.pkg 11/19/09 12:36:58 PM Software Update[66826] PackageKit: *** Missing bundle identifier: /Library/Receipts/Office2008_en_word_help_std.pkg 11/19/09 12:37:26 PM com.apple.suhelperd[66839] terminate called after throwing an instance of 'NSException' 11/19/09 12:37:26 PM com.apple.launchd[1] (com.apple.suhelperd[66839]) Job appears to have crashed: Abort trap 11/19/09 12:37:26 PM com.apple.ReportCrash.Root[66830] 2009-11-19 12:37:26.929 ReportCrash[66830:2b07] Saved crash report for suhelperd[66839] version ??? (???) to /Library/Logs/DiagnosticReports/suhelperd_2009-11-19-123726_localhost.crash And here is the suhelperd crash report: Process: suhelperd [66839] Path: /System/Library/PrivateFrameworks/SoftwareUpdate.framework/Versions/A/Resources/suhelperd Identifier: suhelperd Version: ??? (???) Code Type: X86-64 (Native) Parent Process: launchd [1] Date/Time: 2009-11-19 12:37:26.473 -0500 OS Version: Mac OS X 10.6.2 (10C540) Report Version: 6 Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Crashed Thread: 0 Dispatch queue: com.apple.main-thread Application Specific Information: abort() called *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[NSCFArray objectAtIndex:]: index (0) beyond bounds (0)' *** Call stack at first throw: ( 0 CoreFoundation 0x00007fff859a9444 __exceptionPreprocess + 180 1 libobjc.A.dylib 0x00007fff8787e0f3 objc_exception_throw + 45 2 CoreFoundation 0x00007fff859a9267 +[NSException raise:format:arguments:] + 103 3 CoreFoundation 0x00007fff859a91f4 +[NSException raise:format:] + 148 4 Foundation 0x00007fff855da080 _NSArrayRaiseBoundException + 122 5 Foundation 0x00007fff8553cb81 -[NSCFArray objectAtIndex:] + 75 6 Admin 0x00007fff8107920e +[User(UserPrivate) _userWithInfo:attributes:] + 71 7 Admin 0x00007fff81080d6b +[User findUserByID:searchParent:] + 404 8 suhelperd 0x0000000100001274 0x0 + 4294972020 9 suhelperd 0x0000000100002240 0x0 + 4294976064 10 suhelperd 0x00000001000053b1 0x0 + 4294988721 11 suhelperd 0x00000001000044b3 0x0 + 4294984883 12 suhelperd 0x0000000100004154 0x0 + 4294984020 13 libSystem.B.dylib 0x00007fff83eb60d8 mach_msg_server + 357 14 suhelperd 0x00000001000036eb 0x0 + 4294981355 15 suhelperd 0x0000000100002a1f 0x0 + 4294978079 16 suhelperd 0x0000000100001080 0x0 + 4294971520 ) Thread 0 Crashed: Dispatch queue: com.apple.main-thread 0 libSystem.B.dylib 0x00007fff83e86fe6 __kill + 10 1 libSystem.B.dylib 0x00007fff83f27e32 abort + 83 2 libstdc++.6.dylib 0x00007fff873cf5d2 __tcf_0 + 0 3 libobjc.A.dylib 0x00007fff87881d29 _objc_terminate + 100 4 libstdc++.6.dylib 0x00007fff873cdae1 __cxxabiv1::__terminate(void (*)()) + 11 5 libstdc++.6.dylib 0x00007fff873cdb16 __cxxabiv1::__unexpected(void (*)()) + 0 6 libstdc++.6.dylib 0x00007fff873cdbfc __gxx_exception_cleanup(_Unwind_Reason_Code, _Unwind_Exception*) + 0 7 libobjc.A.dylib 0x00007fff8787e192 object_getIvar + 0 8 com.apple.CoreFoundation 0x00007fff859a9267 +[NSException raise:format:arguments:] + 103 9 com.apple.CoreFoundation 0x00007fff859a91f4 +[NSException raise:format:] + 148 10 com.apple.Foundation 0x00007fff855da080 _NSArrayRaiseBoundException + 122 11 com.apple.Foundation 0x00007fff8553cb81 -[NSCFArray objectAtIndex:] + 75 12 com.apple.framework.Admin 0x00007fff8107920e +[User(UserPrivate) _userWithInfo:attributes:] + 71 13 com.apple.framework.Admin 0x00007fff81080d6b +[User findUserByID:searchParent:] + 404 14 suhelperd 0x0000000100001274 0x100000000 + 4724 15 suhelperd 0x0000000100002240 0x100000000 + 8768 16 suhelperd 0x00000001000053b1 0x100000000 + 21425 17 suhelperd 0x00000001000044b3 0x100000000 + 17587 18 suhelperd 0x0000000100004154 0x100000000 + 16724 19 libSystem.B.dylib 0x00007fff83eb60d8 mach_msg_server + 357 20 suhelperd 0x00000001000036eb 0x100000000 + 14059 21 suhelperd 0x0000000100002a1f 0x100000000 + 10783 22 suhelperd 0x0000000100001080 0x100000000 + 4224 Thread 1: Dispatch queue: com.apple.libdispatch-manager 0 libSystem.B.dylib 0x00007fff83e51bba kevent + 10 1 libSystem.B.dylib 0x00007fff83e53a85 _dispatch_mgr_invoke + 154 2 libSystem.B.dylib 0x00007fff83e5375c _dispatch_queue_invoke + 185 3 libSystem.B.dylib 0x00007fff83e53286 _dispatch_worker_thread2 + 244 4 libSystem.B.dylib 0x00007fff83e52bb8 _pthread_wqthread + 353 5 libSystem.B.dylib 0x00007fff83e52a55 start_wqthread + 13 Thread 2: 0 libSystem.B.dylib 0x00007fff83e529da __workq_kernreturn + 10 1 libSystem.B.dylib 0x00007fff83e52dec _pthread_wqthread + 917 2 libSystem.B.dylib 0x00007fff83e52a55 start_wqthread + 13 Thread 0 crashed with X86 Thread State (64-bit): rax: 0x0000000000000000 rbx: 0x00007fff707d7298 rcx: 0x00007fff5fbff868 rdx: 0x0000000000000000 rdi: 0x0000000000010517 rsi: 0x0000000000000006 rbp: 0x00007fff5fbff880 rsp: 0x00007fff5fbff868 r8: 0x00007fff707da9e0 r9: 0x0000000000000063 r10: 0x00007fff83e83026 r11: 0x0000000000000202 r12: 0x00007fff85a2dca1 r13: 0x0000000000000000 r14: 0x00007fff70bea228 r15: 0x00007fff5fbffb10 rip: 0x00007fff83e86fe6 rfl: 0x0000000000000202 cr2: 0x00007fff70e3afd0

Read the article
Heartbeat won't successfully start up resources from a cold boot when a failed node is present

- by Matthew

I currently have two ubuntu servers running Heartbeat and DRBD. The servers are directory connected with a 1000Mbps crossover cable on eth1 and have access to an IP camera LAN on eth0. Now, let's say that one node is down and the remaining functional node is booting after having been shut down. The node that is still functioning won't start up heartbeat and provide access to the drbd resource from a cold boot. I have to manually restart heartbeat by sudo service heartbeat restart to get everything up and running. How can I get it to start fine from a cold start, when only one server is present? Here is the ha.cf: debug /var/log/ha-debug logfile /var/log/ha-log logfacility none keepalive 2 deadtime 10 warntime 7 initdead 60 ucast eth1 192.168.2.2 ucast eth0 10.1.10.201 node EMserver1 node EMserver2 respawn hacluster /usr/lib/heartbeat/ipfail ping 10.1.10.22 10.1.10.21 10.1.10.11 auto_failback off Some material from the syslog: harc[4604]: 2012/11/27_13:54:49 info: Running /etc/ha.d//rc.d/status status mach_down[4632]: 2012/11/27_13:54:49 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired mach_down[4632]: 2012/11/27_13:54:49 info: mach_down takeover complete for node emserver2. Nov 27 13:54:49 EMserver1 heartbeat: [4586]: info: Initial resource acquisition complete (T_RESOURCES(us)) Nov 27 13:54:49 EMserver1 heartbeat: [4586]: info: mach_down takeover complete. IPaddr[4679]: 2012/11/27_13:54:49 INFO: Resource is stopped Nov 27 13:54:49 EMserver1 heartbeat: [4605]: info: Local Resource acquisition completed. harc[4713]: 2012/11/27_13:54:49 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp ip-request-resp[4713]: 2012/11/27_13:54:49 received ip-request-resp IPaddr::10.1.10.254 OK yes ResourceManager[4732]: 2012/11/27_13:54:50 info: Acquiring resource group: emserver1 IPaddr::10.1.10.254 drbddisk::r0 Filesystem::/dev/drbd1::/shr::ext4 nfs-kernel-server IPaddr[4759]: 2012/11/27_13:54:50 INFO: Resource is stopped ResourceManager[4732]: 2012/11/27_13:54:50 info: Running /etc/ha.d/resource.d/IPaddr 10.1.10.254 start IPaddr[4816]: 2012/11/27_13:54:50 INFO: Using calculated nic for 10.1.10.254: eth0 IPaddr[4816]: 2012/11/27_13:54:50 INFO: Using calculated netmask for 10.1.10.254: 255.255.255.0 IPaddr[4816]: 2012/11/27_13:54:50 INFO: eval ifconfig eth0:0 10.1.10.254 netmask 255.255.255.0 broadcast 10.1.10.255 IPaddr[4804]: 2012/11/27_13:54:50 INFO: Success ResourceManager[4732]: 2012/11/27_13:54:50 info: Running /etc/ha.d/resource.d/drbddisk r0 start Filesystem[4965]: 2012/11/27_13:54:50 INFO: Resource is stopped ResourceManager[4732]: 2012/11/27_13:54:50 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /shr ext4 start Filesystem[5039]: 2012/11/27_13:54:50 INFO: Running start for /dev/drbd1 on /shr Filesystem[5033]: 2012/11/27_13:54:51 INFO: Success ResourceManager[4732]: 2012/11/27_13:54:51 info: Running /etc/init.d/nfs-kernel-server start Nov 27 13:55:00 EMserver1 heartbeat: [4586]: info: Local Resource acquisition completed. (none) Nov 27 13:55:00 EMserver1 heartbeat: [4586]: info: local resource transition completed. Nov 27 13:57:46 EMserver1 heartbeat: [4586]: info: Heartbeat shutdown in progress. (4586) Nov 27 13:57:46 EMserver1 heartbeat: [5286]: info: Giving up all HA resources. ResourceManager[5301]: 2012/11/27_13:57:46 info: Releasing resource group: emserver1 IPaddr::10.1.10.254 drbddisk::r0 Filesystem::/dev/drbd1::/shr::ext4 nfs-kernel-server ResourceManager[5301]: 2012/11/27_13:57:46 info: Running /etc/init.d/nfs-kernel-server stop ResourceManager[5301]: 2012/11/27_13:57:46 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /shr ext4 stop Filesystem[5372]: 2012/11/27_13:57:46 INFO: Running stop for /dev/drbd1 on /shr Filesystem[5372]: 2012/11/27_13:57:47 INFO: Trying to unmount /shr Filesystem[5372]: 2012/11/27_13:57:47 INFO: unmounted /shr successfully Filesystem[5366]: 2012/11/27_13:57:47 INFO: Success ResourceManager[5301]: 2012/11/27_13:57:47 info: Running /etc/ha.d/resource.d/drbddisk r0 stop ResourceManager[5301]: 2012/11/27_13:57:47 info: Running /etc/ha.d/resource.d/IPaddr 10.1.10.254 stop IPaddr[5509]: 2012/11/27_13:57:47 INFO: ifconfig eth0:0 down IPaddr[5497]: 2012/11/27_13:57:47 INFO: Success Nov 27 13:57:47 EMserver1 heartbeat: [5286]: info: All HA resources relinquished. Nov 27 13:57:48 EMserver1 heartbeat: [4586]: info: killing /usr/lib/heartbeat/ipfail process group 4603 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBFIFO process 4589 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBWRITE process 4590 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBREAD process 4591 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBWRITE process 4592 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBREAD process 4593 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBWRITE process 4594 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBREAD process 4595 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBWRITE process 4596 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBREAD process 4597 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBWRITE process 4598 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: killing HBREAD process 4599 with signal 15 Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4589 exited. 11 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4596 exited. 10 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4598 exited. 9 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4590 exited. 8 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4595 exited. 7 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4591 exited. 6 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4592 exited. 5 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4593 exited. 4 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4597 exited. 3 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4594 exited. 2 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: Core process 4599 exited. 1 remaining Nov 27 13:57:49 EMserver1 heartbeat: [4586]: info: emserver1 Heartbeat shutdown complete. Here is some more from the log ResourceManager[2576]: 2012/11/28_16:32:42 info: Acquiring resource group: emserver1 IPaddr::10.1.10.254 drbddisk::r0 Filesystem::/dev/drbd1::/shr::ext4 nfs-kernel-server IPaddr[2602]: 2012/11/28_16:32:42 INFO: Running OK Filesystem[2653]: 2012/11/28_16:32:43 INFO: Running OK Nov 28 16:32:52 EMserver1 heartbeat: [1695]: WARN: node emserver2: is dead Nov 28 16:32:52 EMserver1 heartbeat: [1695]: info: Dead node emserver2 gave up resources. Nov 28 16:32:52 EMserver1 ipfail: [1807]: info: Status update: Node emserver2 now has status dead Nov 28 16:32:52 EMserver1 heartbeat: [1695]: info: Link emserver2:eth1 dead. Nov 28 16:32:53 EMserver1 ipfail: [1807]: info: NS: We are still alive! Nov 28 16:32:53 EMserver1 ipfail: [1807]: info: Link Status update: Link emserver2/eth1 now has status dead Nov 28 16:32:55 EMserver1 ipfail: [1807]: info: Asking other side for ping node count. Nov 28 16:32:55 EMserver1 ipfail: [1807]: info: Checking remote count of ping nodes. Nov 28 16:32:57 EMserver1 heartbeat: [1695]: info: Heartbeat shutdown in progress. (1695) Nov 28 16:32:57 EMserver1 heartbeat: [2734]: info: Giving up all HA resources. ResourceManager[2751]: 2012/11/28_16:32:57 info: Releasing resource group: emserver1 IPaddr::10.1.10.254 drbddisk::r0 Filesystem::/dev/drbd1::/shr::ext4 nfs-kernel-server ResourceManager[2751]: 2012/11/28_16:32:57 info: Running /etc/init.d/nfs-kernel-server stop ResourceManager[2751]: 2012/11/28_16:32:57 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd1 /shr ext4 stop Filesystem[2829]: 2012/11/28_16:32:57 INFO: Running stop for /dev/drbd1 on /shr Filesystem[2829]: 2012/11/28_16:32:57 INFO: Trying to unmount /shr Filesystem[2829]: 2012/11/28_16:32:58 INFO: unmounted /shr successfully Filesystem[2823]: 2012/11/28_16:32:58 INFO: Success ResourceManager[2751]: 2012/11/28_16:32:58 info: Running /etc/ha.d/resource.d/drbddisk r0 stop ResourceManager[2751]: 2012/11/28_16:32:58 info: Running /etc/ha.d/resource.d/IPaddr 10.1.10.254 stop IPaddr[2971]: 2012/11/28_16:32:58 INFO: ifconfig eth0:0 down IPaddr[2958]: 2012/11/28_16:32:58 INFO: Success Nov 28 16:32:58 EMserver1 heartbeat: [2734]: info: All HA resources relinquished. Nov 28 16:32:59 EMserver1 heartbeat: [1695]: info: killing /usr/lib/heartbeat/ipfail process group 1807 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBFIFO process 1777 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBWRITE process 1778 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBREAD process 1779 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBWRITE process 1780 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBREAD process 1781 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBWRITE process 1782 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBREAD process 1783 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBWRITE process 1784 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBREAD process 1785 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBWRITE process 1786 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: killing HBREAD process 1787 with signal 15 Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1778 exited. 11 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1779 exited. 10 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1780 exited. 9 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1781 exited. 8 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1782 exited. 7 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1783 exited. 6 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1784 exited. 5 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1785 exited. 4 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1786 exited. 3 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1787 exited. 2 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: Core process 1777 exited. 1 remaining Nov 28 16:33:01 EMserver1 heartbeat: [1695]: info: emserver1 Heartbeat shutdown complete. If I restarted heartbeat at this point... the resources heartbeat controls would start up fine.... please help!

Read the article

Search Results

Search found 63 results on 3 pages for 'respawn'.

Page 3/3 | < Previous Page | 1 2 3

- by F21

- by Kendall

- by Brendan

- by The Journeyman geek

- by e40pud

- by Paul

- by Martin Bean

- by Allen Gao

- by codykrieger

- by jefflunt

- by vorik

- by Lyndsey Ferguson

- by Matthew

< Previous Page | 1 2 3