After a little research on the Internet I found a few suitable solutions. I chose to run the 'uptime' command over a remote SSH connection in a loop. To add more value, the script sends a mail whenever the load crosses a threshold value.
This threshold varies with the application and the CPU power of the machine, so finding the right value takes some trial and error. Once the script was working, he was amazed and appreciative.
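As a rough illustration of the idea above, the sketch below pulls the 1-minute load average out of a line of 'uptime' output and compares it with two thresholds. The helper names load_from_uptime and load_level and the sample uptime line are made up for this example; the values 14 and 19 are the ones used in the script further down, and you would tune them for your own servers.

```shell
#!/bin/bash
# Hypothetical helper: print the integer part of the 1-minute load average
# found in a line of 'uptime' output passed as the first argument.
load_from_uptime() {
    echo "$1" | sed -n 's/.*load average[s]*: *\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: classify a load value.
# $1 = load, $2 = warning threshold, $3 = critical threshold
load_level() {
    local load=$1 warn=$2 crit=$3
    if [ "$load" -gt "$crit" ]; then
        echo CRITICAL
    elif [ "$load" -gt "$warn" ]; then
        echo WARN
    else
        echo OK
    fi
}

# Made-up sample line, only for demonstration
sample=' 10:15:01 up 12 days,  3:42,  2 users,  load average: 16.08, 12.31, 9.77'
load_level "$(load_from_uptime "$sample")" 14 19
```

Matching on the text "load average" rather than on a fixed comma position is a choice made for this sketch; it keeps working when the machine has been up for less than a day and the uptime line has fewer commas.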
This script can run forever at a specified time interval. You can also use the 'at' command or 'crontab' for this task. I prepared a 'bash' script that works on Solaris and also on Linux.
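One possible way to get the "run forever at an interval" behaviour without cron is a small wrapper loop. This is only a sketch: run_every is a hypothetical helper, and loadcheck.sh is an assumed name for the monitoring script.

```shell
#!/bin/bash
# Hypothetical helper: run a command repeatedly with a pause between runs.
# $1 = pause in seconds, $2 = number of runs (0 means run forever),
# remaining arguments = the command to run.
run_every() {
    local pause=$1 runs=$2 done_runs=0
    shift 2
    while [ "$runs" -eq 0 ] || [ "$done_runs" -lt "$runs" ]; do
        "$@"
        done_runs=$((done_runs + 1))
        sleep "$pause"
    done
}

# Real use would look like this (assumed script name), every 5 minutes, forever:
#   run_every 300 0 ./loadcheck.sh
# The cron alternative would be a crontab line such as (assumed path):
#   0,15,30,45 * * * * /path/to/loadcheck.sh

# Short demonstration that stops after two runs
run_every 0 2 echo "checking load"
```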
Before running this script we need to establish password-less connections to all the remote machines with the 'ssh-keygen' command. Public key authentication is a good choice for password-less connections to remote UNIX machines. Here, you can use any of the supported key algorithms, such as RSA or DSA.
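The key setup could look roughly like the sketch below. The helper names make_key and push_key are hypothetical, the key path is an assumption, and an RSA key with an empty passphrase is used only because the monitoring script has to log in unattended.

```shell
#!/bin/bash
# Hypothetical helper: create an RSA key pair with an empty passphrase at
# the given path, unless a key already exists there.
make_key() {
    local key=$1
    [ -f "$key" ] || ssh-keygen -q -t rsa -N "" -f "$key"
}

# Hypothetical helper: append the public key to authorized_keys on a remote
# host. The password is asked for this one time; later logins can use the key.
push_key() {
    local key=$1 server=$2
    ssh "$server" 'mkdir -p ~/.ssh && chmod 700 ~/.ssh &&
        cat >> ~/.ssh/authorized_keys &&
        chmod 600 ~/.ssh/authorized_keys' < "$key.pub"
}

# Usage (not run here, because it needs real hosts):
#   make_key "$HOME/.ssh/id_rsa"
#   for server in `cat prodServers.txt`; do
#       push_key "$HOME/.ssh/id_rsa" "$server"
#   done
```

After this, 'ssh myprod.server1.com uptime' should return without asking for a password.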
Customization/Cosmetics for this script
When you run this script at your prompt, the servers with a high load average are shown in red, which makes it easier to act on them quickly. I kept the list of all servers in a plain text file and read it line by line into an array for looping.
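The two ideas in this paragraph can be sketched in isolation as follows. read_servers and red are hypothetical helpers, and skipping blank lines and '#' comment lines is an extra assumption that the script below does not make.

```shell
#!/bin/bash
# Hypothetical helper: print the host names in a list file, one per line,
# skipping blank lines and lines that start with '#'.
read_servers() {
    local host rest
    while read -r host rest; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        echo "$host"
    done < "$1"
}

# Hypothetical helper: print text wrapped in the ANSI escape codes for red
red() {
    printf '\033[31m%s\033[m\n' "$1"
}

# Demonstration with a temporary list file
list=$(mktemp)
printf 'myprod.server1.com\n\n# spare\nmyprod.server2.com\n' > "$list"

servers=( $(read_servers "$list") )   # load the host names into an array
for server in "${servers[@]}"; do
    red "$server"
done
rm -f "$list"
```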
#!/bin/bash
#======================================================
# This script will check CPU Load, network ping status
# and also checks diskspace on every machine
#======================================================
RECIPIENTS="xyz@gmail.com"
LOG=./load.log
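check_load()
{
    # Integer part of the 1-minute load average from the uptime output
    # (assumes the 'up N days, hh:mm, N users, load average: ...' layout)
    loadnow=`echo $msg | cut -d, -f4 | cut -d: -f2 | cut -d. -f1`
    # Disk usage: next-to-last field of the combined uptime/df output
    d=`echo $msg | awk '{print $((NF-1))}'`
    SD=`date "+%Y-%h-%d@%H:%M:%S"`
    echo $SD '****'
    # Test the higher threshold first; otherwise the critical branch is never reached
    if [ $loadnow -gt 19 ]; then
        echo -e '\033[31m' $server ' ' $loadnow '\033[m'>>$LOG
        echo $SD $server ' ' $loadnow |mailx -s LOAD_CRITICAL $RECIPIENTS
    elif [ $loadnow -gt 14 ]; then
        echo -e '\033[31m' $server ' ' $loadnow '\033[m'>>$LOG
        echo $SD $server ' ' $loadnow |mailx -s LOAD_WARN $RECIPIENTS
    else
        echo -e $server '\t' $loadnow '\t' $p '\t'$d >>$LOG
    fi
}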
#==============================================================
# M A I N S C R I P T
#==============================================================
if [ -f $LOG ]
then
    rm $LOG
fi
serlist=`cat prodServers.txt`
echo -e "========================================================">>$LOG
echo -e " HOSTNAME CPU Load Network status Disk Space">>$LOG
echo -e "========================================================">>$LOG
for server in $serlist
do
    echo 'connecting to ' $server
    msg=`ssh $server "uptime; df -k |grep /WASLogs |awk '{print \$5}'"`
    p=`ping $server 56 2 |grep loss | awk -F',' '{ print $3 }'`
    check_load
done
cat $LOG
Please make sure that the prodServers.txt file is in the same directory as the script. A sample prodServers.txt file looks like this:
myprod.server1.com
myprod.server2.com
...
myprod.server20.com
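If you want the script to complain early when the list is missing, a check along these lines could go near the top. It is only a sketch: have_server_list is a hypothetical helper, and looking next to the script with dirname is an assumption about how the script is started.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given file exists and is not empty
have_server_list() {
    [ -s "$1" ]
}

# Assumption: the list sits in the same directory as the script
LIST="$(dirname "$0")/prodServers.txt"
if have_server_list "$LIST"; then
    echo "using server list $LIST"
else
    echo "server list $LIST is missing or empty" >&2
fi
```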