Nagios Configuration

Contents

Object Configuration Overview
Object Definitions
Development Topics

Object Configuration Overview

See Also: Configuration Overview, Object Definitions

What Are Objects?

Objects are all the elements involved in the monitoring and notification logic. Types of objects include:

• Services
• Service Groups
• Hosts
• Host Groups
• Contacts
• Contact Groups
• Commands
• Time Periods
• Notification Escalations
• Notification and Execution Dependencies

More information on what objects are and how they relate to each other can be found below.

Where Are Objects Defined?

Objects can be defined in one or more configuration files and/or directories that you specify using the cfg_file and/or cfg_dir directives in the main configuration file.

Tip: When you follow the quickstart installation guide, several sample object configuration files are placed in /usr/local/nagios/etc/objects/. You can use these sample files to see how object inheritance works and learn how to write your own object definitions.
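
As a minimal sketch, the main configuration file might point at object files and directories as shown here. The two file names assume the quickstart sample files are present, and the servers directory is a hypothetical path used only for illustration.

```
# Main configuration file (nagios.cfg)

# Load individual object configuration files.
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

# Load every .cfg file found in a directory (hypothetical path).
cfg_dir=/usr/local/nagios/etc/servers
```

Both directives can be repeated, so a configuration can be split across as many files and directories as is convenient.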

How Are Objects Defined?

Objects are defined in a flexible template format, which can make it much easier to manage your Nagios configuration in the long term. Basic information on how to define objects in your configuration files can be found in the Object Definitions section.

Once you are familiar with the basics of defining objects, read up on object inheritance, as it will make your configuration more robust. Seasoned users can exploit some advanced features of object definitions described in the documentation on object tricks.
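
To illustrate the template format, here is a hedged sketch of one host inheriting from a template. The template name generic-router is an assumption made for this example. The name, use, and register directives belong to the object inheritance mechanism, and the remaining values are borrowed from the example host definition later in this document.

```
# Template: holds shared settings and is never registered as a real host.
define host{
    name                    generic-router      ; template name (hypothetical)
    check_command           check-host-alive
    max_check_attempts      5
    check_period            24x7
    contact_groups          router-admins
    notification_interval   30
    notification_period     24x7
    register                0                   ; template only, not a real host
    }

# Real host: inherits the settings above and adds its own values.
define host{
    use                     generic-router
    host_name               bogus-router
    alias                   Bogus Router #1
    address                 192.168.1.254
    }
```

With this layout, a setting shared by many routers is changed in one place instead of in every host definition.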

Objects Explained

Some of the main object types are explained in greater detail below.

Hosts are one of the central objects in the monitoring logic. Important attributes of hosts are as follows:

• Hosts are usually physical devices on your network (servers, workstations, routers, switches, printers, etc.).
• Hosts have an address of some kind (e.g. an IP or MAC address).
• Hosts have one or more services associated with them.
• Hosts can have parent/child relationships with other hosts, often representing real-world network connections, which are used in the network reachability logic.

Host Groups are groups of one or more hosts. Host groups can make it easier to (1) view the status of related hosts in the Nagios web interface and (2) simplify your configuration through the use of object tricks.

Services are one of the central objects in the monitoring logic. Services are associated with hosts and can be:

• Attributes of a host (CPU load, disk usage, uptime, etc.)
• Services provided by the host (HTTP, POP3, FTP, SSH, etc.)
• Other things associated with the host (DNS records, etc.)

Service Groups are groups of one or more services. Service groups can make it easier to (1) view the status of related services in the Nagios web interface and (2) simplify your configuration through the use of object tricks.

Contacts are people involved in the notification process:

• Contacts have one or more notification methods (cellphone, pager, email, instant messaging, etc.).
• Contacts receive notifications for the hosts and services they are responsible for.

Contact Groups are groups of one or more contacts. Contact groups can make it easier to define all the people who get notified when certain host or service problems occur.

Time Periods are used to control:

• When hosts and services can be monitored
• When contacts can receive notifications

Information on how time periods work can be found in the time periods documentation.
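
For reference, a time period such as the 24x7 period used by the example host definition later in this document could be defined along these lines. This is a sketch modeled on the sample configuration files, and the alias text is illustrative.

```
define timeperiod{
    timeperiod_name     24x7
    alias               24 Hours A Day, 7 Days A Week
    sunday              00:00-24:00
    monday              00:00-24:00
    tuesday             00:00-24:00
    wednesday           00:00-24:00
    thursday            00:00-24:00
    friday              00:00-24:00
    saturday            00:00-24:00
    }
```

The same name can then be given to both check_period and notification_period in host and service definitions.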

Commands are used to tell Nagios what programs, scripts, etc. it should execute to perform:

• Host and service checks
• Notifications
• Event handlers
• and more...
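
As an illustration, a host check command like the check-host-alive command referenced in the example host definition might be declared as follows. This sketch assumes the standard check_ping plugin is installed and that the $USER1$ macro points to the plugin directory. The thresholds are illustrative, not recommendations.

```
define command{
    command_name    check-host-alive
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
    }
```

Nagios substitutes $HOSTADDRESS$ with the address of the host being checked before running the command line.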

Object Definitions

See Also: Object Configuration Overview, Object Tricks, Object Inheritance, Custom Object Variables

Introduction

One of the features of Nagios' object configuration format is that you can create object definitions that inherit properties from other object definitions. An explanation of how object inheritance works can be found in the object inheritance documentation. I strongly suggest that you familiarize yourself with object inheritance once you have read over the documentation presented below, as it will make the job of creating and maintaining object definitions much easier than it otherwise would be. Also, read up on the object tricks that offer shortcuts for otherwise tedious configuration tasks.

When creating and/or editing configuration files, keep the following in mind:

1. Lines that start with a '#' character are taken to be comments and are not processed.
2. Directive names are case-sensitive.
3. Characters that appear after a semicolon (;) in configuration lines are treated as comments and are not processed.
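
The three rules can be seen together in a short fragment. This is a sketch, not a complete host definition, and it reuses values from the example host definition later in this document.

```
# This entire line is a comment and is not processed.
define host{
    host_name   bogus-router        ; everything after the semicolon is ignored
    alias       Bogus Router #1     ; this '#' is not at the start of the line, so it is part of the alias
    address     192.168.1.254
    }
# Directive names are case-sensitive: host_name is recognized, Host_Name is not.
```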

Retention Notes

It is important to point out that several directives in host, service, and contact definitions may not be picked up by Nagios when you change them in your configuration files. Object directives that can exhibit this behavior are marked with an asterisk (*). This happens because Nagios honors values stored in the state retention file over values found in the config files, assuming you have state retention enabled on a program-wide basis and the value of the directive was changed at runtime with an external command.

One way to get around this problem is to disable the retention of non-status information using the retain_nonstatus_information directive in the host, service, and contact definitions. Disabling this directive causes Nagios to take the initial values for these directives from your config files, rather than from the state retention file, when it (re)starts.
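
A hedged sketch of how the two levels fit together: program-wide retention is switched on in the main configuration file, and a single host opts out of retaining non-status values. The retention file path is an assumption based on the default installation prefix.

```
# Main configuration file: program-wide state retention.
retain_state_information=1
state_retention_file=/usr/local/nagios/var/retention.dat

# Host definition fragment: on (re)start, take non-status values
# for this host from the config files, not the retention file.
define host{
    host_name                       bogus-router
    retain_nonstatus_information    0
    }
```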

Sample Configuration Files

Note: Sample object configuration files are installed in the /usr/local/nagios/etc/ directory when you follow the quickstart installation guide.

Object Types

Host definitions

Host group definitions

Service definitions

Service group definitions

Contact definitions

Contact group definitions

Time period definitions

Command definitions

Service dependency definitions

Service escalation definitions

Host dependency definitions

Host escalation definitions

Extended host information definitions

Extended service information definitions

Host Definition

Description:

A host definition is used to define a physical server, workstation, device, etc. that resides on your network.

Definition Format:

Note: Directives in red are required, while those in black are optional.

define host{
    host_name                       host_name
    alias                           alias
    display_name                    display_name
    address                         address
    parents                         host_names
    hostgroups                      hostgroup_names
    check_command                   command_name
    initial_state                   [o,d,u]
    max_check_attempts              #
    check_interval                  #
    retry_interval                  #
    active_checks_enabled           [0/1]
    passive_checks_enabled          [0/1]
    check_period                    timeperiod_name
    obsess_over_host                [0/1]
    check_freshness                 [0/1]
    freshness_threshold             #
    event_handler                   command_name
    event_handler_enabled           [0/1]
    low_flap_threshold              #
    high_flap_threshold             #
    flap_detection_enabled          [0/1]
    flap_detection_options          [o,d,u]
    process_perf_data               [0/1]
    retain_status_information       [0/1]
    retain_nonstatus_information    [0/1]
    contacts                        contacts
    contact_groups                  contact_groups
    notification_interval           #
    first_notification_delay        #
    notification_period             timeperiod_name
    notification_options            [d,u,r,f,s]
    notifications_enabled           [0/1]
    stalking_options                [o,d,u]
    notes                           note_string
    notes_url                       url
    action_url                      url
    icon_image                      image_file
    icon_image_alt                  alt_string
    vrml_image                      image_file
    statusmap_image                 image_file
    2d_coords                       x_coord,y_coord
    3d_coords                       x_coord,y_coord,z_coord
    }

Example Definition:

define host{
    host_name                       bogus-router
    alias                           Bogus Router #1
    address                         192.168.1.254
    parents                         server-backbone
    check_command                   check-host-alive
    check_interval                  5
    retry_interval                  1
    max_check_attempts              5
    check_period                    24x7
    process_perf_data               0
    retain_nonstatus_information    0
    contact_groups                  router-admins
    notification_interval           30
    notification_period             24x7
    notification_options            d,u,r
    }

Directive Descriptions:

host_name: This directive defines a short name used to identify the host. It is used in host group and service definitions to reference this particular host. Hosts can have multiple services (which are monitored) associated with them. When used properly, the $HOSTNAME$ macro will contain this short name.

alias: This directive defines a longer name or description used to identify the host. It is provided to let you more easily identify a particular host. When used properly, the $HOSTALIAS$ macro will contain this alias/description.

address: This directive defines the address of the host. Normally this is an IP address, although it could really be anything you want (so long as it can be used to check the status of the host). You can use an FQDN to identify the host instead of an IP address, but if DNS services are not available this could cause problems. When used properly, the $HOSTADDRESS$ macro will contain this address. Note: If you do not specify an address directive in a host definition, the name of the host will be used as its address. A word of caution about doing this, however: if DNS fails, most of your service checks will fail because the plugins will be unable to resolve the host name.

display_name: This directive defines an alternate name that should be displayed in the web interface for this host. If not specified, this defaults to the value you specify for the host_name directive. Note: The current CGIs do not use this option, although future versions of the web interface will.

parents: This directive defines a comma-delimited list of short names of the "parent" hosts for this particular host. Parent hosts are typically routers, switches, firewalls, etc. that lie between the monitoring host and a remote host. The router, switch, etc. that is closest to the remote host is considered to be that host's "parent". Read the "Determining Status and Reachability of Network Hosts" document for more information. If this host is on the same network segment as the host doing the monitoring (without any intermediate routers, etc.), the host is considered to be on the local network and will not have a parent host. Leave this value blank if the host does not have a parent host (i.e. it is on the same segment as the Nagios host). The order in which you specify parent hosts has no effect on how things are monitored.
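
A short sketch of a parent chain follows. These are fragments only, with other required directives omitted. The host remote-web and both 192.168.x addresses other than 192.168.1.254 are hypothetical.

```
# On the same segment as the Nagios host: no parents directive.
define host{
    host_name   server-backbone
    address     192.168.1.1         ; hypothetical address
    }

# Reached through server-backbone.
define host{
    host_name   bogus-router
    address     192.168.1.254
    parents     server-backbone
    }

# Hypothetical host behind the router: its closest upstream device is its parent.
define host{
    host_name   remote-web
    address     192.168.2.10        ; hypothetical address
    parents     bogus-router
    }
```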

hostgroups: This directive identifies the short name(s) of the hostgroup(s) that the host belongs to. Multiple hostgroups should be separated by commas. This directive may be used as an alternative to (or in addition to) using the members directive in hostgroup definitions.

check_command: This directive specifies the short name of the command that should be used to check if the host is up or down. Typically, this command would try to ping the host to see if it is "alive". The command must return a status of OK (0) or Nagios will assume the host is down. If you leave this argument blank, the host will not be actively checked. Thus, Nagios will likely always assume the host is up (it may show up as being in a "PENDING" state in the web interface). This is useful if you are monitoring printers or other devices that are frequently turned off. The maximum amount of time that the host check command can run is controlled by the host_check_timeout option.

initial_state: By default Nagios will assume that all hosts are in UP states when it starts. You can override the initial state for a host by using this directive. Valid options are: o = UP, d = DOWN, and u = UNREACHABLE.

max_check_attempts: This directive defines the number of times that Nagios will retry the host check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the host check. Note: If you do not want to check the status of the host, you must still set this to a minimum value of 1. To bypass the host check, just leave the check_command option blank.

check_interval: This directive defines the number of "time units" between regularly scheduled checks of the host. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.

retry_interval: This directive defines the number of "time units" to wait before scheduling a re-check of the host. Hosts are rescheduled at the retry interval when they have changed to a non-UP state. Once the host has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
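
To make the "time unit" arithmetic concrete, here is a sketch using the values from the example host definition, assuming interval_length is left at its default. The host block is a fragment with other required directives omitted.

```
# Main configuration file: one "time unit" is 60 seconds (the default).
interval_length=60

# Host definition fragment.
define host{
    host_name           bogus-router
    check_interval      5   ; 5 x 60 s: checked every 5 minutes while UP
    retry_interval      1   ; 1 x 60 s: re-checked every minute after a non-UP result
    max_check_attempts  5   ; up to 5 attempts before reverting to check_interval
    }
```

If interval_length were changed to 1, the same numbers would be read as seconds instead of minutes.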

active_checks_enabled *: This directive determines whether or not active checks (either regularly scheduled or on-demand) of this host are enabled. Values: 0 = disable active host checks, 1 = enable active host checks (default).

passive_checks_enabled *: This directive determines whether or not passive checks are enabled for this host. Values: 0 = disable passive host checks, 1 = enable passive host checks (default).

check_period: This directive specifies the short name of the time period during which active checks of this host can be made.

obsess_over_host *: This directive determines whether or not checks for the host will be "obsessed" over using the ochp_command.

check_freshness *: This directive determines whether or not freshness checks are enabled for this host. Values: 0 = disable freshness checks, 1 = enable freshness checks (default).

freshness_threshold: This directive specifies the freshness threshold (in seconds) for this host. If you set this directive to a value of 0, Nagios will determine a freshness threshold to use automatically.
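
These directives are often combined for hosts whose results arrive passively. The fragment below is a sketch under that assumption. The host name passive-host and the 900-second threshold are hypothetical, and other required directives are omitted.

```
define host{
    host_name               passive-host    ; hypothetical host name
    active_checks_enabled   0               ; no active checks of this host
    passive_checks_enabled  1               ; accept passive check results
    check_freshness         1               ; enable freshness checks
    freshness_threshold     900             ; in seconds; 0 lets Nagios choose automatically
    }
```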

event_handler: This directive specifies the short name of the command that should be run whenever a change in the state of the host is detected (i.e. whenever it goes down or recovers). Read the documentation on event handlers for a more detailed explanation of how to write scripts for handling events. The maximum amount of time that the event handler command can run is controlled by the event_handler_timeout option.

event_handler_enabled *: This directive determines whether or not the event handler for this host is enabled. Values: 0 = disable host event handler, 1 = enable host event handler.

low_flap_threshold: This directive specifies the low state change threshold used in flap detection for this host. More information on flap detection can be found in the flap detection documentation. If you set this directive to a value of 0, the program-wide value specified by the low_host_flap_threshold directive will be used.

high_flap_threshold: This directive specifies the high state change threshold used in flap detection for this host. More information on flap detection can be found in the flap detection documentation. If you set this directive to a value of 0, the program-wide value specified by the high_host_flap_threshold directive will be used.

flap_detection_enabled *: This directive determines whether or not flap detection is enabled for this host. More information on flap detection can be found in the flap detection documentation. Values: 0 = disable host flap detection, 1 = enable host flap detection.

flap_detection_options: This directive determines what host states the flap detection logic will use for this host. Valid options are a combination of one or more of the following: o = UP states, d = DOWN states, u = UNREACHABLE states.

process_perf_data *: This directive determines whether or not the processing of performance data is enabled for this host. Values: 0 = disable performance data processing, 1 = enable performance data processing.

retain_status_information: This directive determines whether or not status-related information about the host is retained across program restarts. This is only useful if you have enabled state retention using the retain_state_information directive. Values: 0 = disable status information retention, 1 = enable status information retention.

retain_nonstatus_information: This directive determines whether or not non-status information about the host is retained across program restarts. This is only useful if you have enabled state retention using the retain_state_information directive. Values: 0 = disable non-status information retention, 1 = enable non-status information retention.

contacts: This is a list of the short names of the contacts that should be notified whenever there are problems (or recoveries) with this host. Multiple contacts should be separated by commas. This is useful if you want notifications to go to just a few people and don't want to configure contact groups. You must specify at least one contact or contact group in each host definition.

contact_groups: This is a list of the short names of the contact groups that should be notified whenever there are problems (or recoveries) with this host. Multiple contact groups should be separated by commas. You must specify at least one contact or contact group in each host definition.

notification_interval: This directive defines the number of "time units" to wait before re-notifying a contact that this host is still down or unreachable. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will not re-notify contacts about problems for this host; only one problem notification will be sent out.

first_notification_delay: This directive defines the number of "time units" to wait before sending out the first problem notification when this host enters a non-UP state. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will start sending out notifications immediately.

notification_period: This directive specifies the short name of the time period during which notifications of events for this host can be sent out to contacts. If a host goes down, becomes unreachable, or recovers during a time which is not covered by the time period, no notifications will be sent out.

notification_options: This directive determines when notifications for the host should be sent out. Valid options are a combination of one or more of the following: d = send notifications on a DOWN state, u = send notifications on an UNREACHABLE state, r = send notifications on recoveries (UP state), f = send notifications when the host starts and stops flapping, and s = send notifications when scheduled downtime starts and ends. If you specify n (none) as an option, no host notifications will be sent out. If you do not specify any notification options, Nagios will assume that you want notifications to be sent out for all possible states. Example: If you specify d,r in this field, notifications will only be sent out when the host goes DOWN and when it recovers from a DOWN state.
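
The notification directives might be combined as in this sketch. It is a fragment with other required directives omitted. The values are illustrative and assume the default interval_length of 60.

```
define host{
    host_name                   bogus-router
    notification_options        d,r     ; notify on DOWN and on recovery only
    first_notification_delay    0       ; send the first problem notification immediately
    notification_interval       30      ; re-notify every 30 minutes while the problem persists
    notification_period         24x7    ; time period during which notifications may be sent
    notifications_enabled       1
    }
```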

notifications_enabled *: This directive determines whether or not notifications for this host are enabled. Values: 0 = disable host notifications, 1 = enable host notifications.

stalking_options: This directive determines which host states "stalking" is enabled for. Valid options are a combination of one or more of the following: o = stalk on UP states, d = stalk on DOWN states, and u = stalk on UNREACHABLE states. More information on state stalking can be found in the state stalking documentation.

notes: This directive defines an optional string of notes pertaining to the host. If you specify a note here, you will see it in the extended information CGI (when you are viewing information about the specified host).

notes_url: This variable defines an optional URL that can be used to provide more information about the host. If you specify a URL, you will see a red folder icon in the CGIs (when you are viewing host information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will be the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/). This can be very useful if you want to make detailed information on the host, emergency contact methods, etc. available to other support staff.

action_url: This directive defines an optional URL that can be used to provide more actions to be performed on the host. If you specify a URL, you will see a red "splat" icon in the CGIs (when you are viewing host information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will be the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/).

icon_image: This variable defines the name of a GIF, PNG, or JPG image that should be associated with this host. This image will be displayed in various places in the CGIs. The image will look best if it is 40x40 pixels in size. Images for hosts are assumed to be in the logos/ subdirectory in your HTML images directory (i.e. /usr/local/nagios/share/images/logos).

icon_image_alt: This variable defines an optional string that is used in the ALT tag of the image specified by the icon_image argument.

vrml_image: This variable defines the name of a GIF, PNG, or JPG image that should be associated with this host. This image will be used as the texture map for the specified host in the statuswrl CGI. Unlike the image you use for the icon_image variable, this one should probably not have any transparency. If it does, the host object will look a bit weird. Images for hosts are assumed to be in the logos/ subdirectory in your HTML images directory (i.e. /usr/local/nagios/share/images/logos).

statusmap_image: This variable defines the name of an image that should be associated with this host in the statusmap CGI. You can specify a JPEG, PNG, or GIF image if you want, although I would strongly suggest using a GD2 format image, as other image formats will result in a lot of wasted CPU time when the statusmap image is generated. GD2 images can be created from PNG images by using the pngtogd2 utility supplied with Thomas Boutell's gd library. The GD2 images should be created in uncompressed format in order to minimize CPU load when the statusmap CGI is generating the network map image. The image will look best if it is 40x40 pixels in size. You can leave this option blank if you are not using the statusmap CGI. Images for hosts are assumed to be in the logos/ subdirectory in your HTML images directory (i.e. /usr/local/nagios/share/images/logos).

2d_coords: This variable defines coordinates to use when drawing the host in the statusmap CGI. Coordinates should be given as positive integers, as they correspond to physical pixels in the generated image. The origin for drawing (0,0) is in the upper left hand corner of the image and extends in the positive x direction (to the right) along the top of the image and in the positive y direction (down) along the left hand side of the image. For reference, the size of the icons drawn is usually about 40x40 pixels (text takes a little extra space). The coordinates you specify here are for the upper left hand corner of the host icon that is drawn. Note: Don't worry about the maximum x and y coordinates that you can use. The CGI will automatically calculate the maximum dimensions of the image it creates based on the largest x and y coordinates you specify.

3d_coords: This variable defines coordinates to use when drawing the host in the statuswrl CGI. Coordinates can be positive or negative real numbers. The origin for drawing is (0.0,0.0,0.0). For reference, the size of the host cubes drawn is 0.5 units on each side (text takes a little more space). The coordinates you specify here are used as the center of the host cube.
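
As a small illustration of the two coordinate formats, the fragment below places a host in both maps. The coordinate values are arbitrary and other required directives are omitted.

```
define host{
    host_name   bogus-router
    2d_coords   100,250         ; pixels: upper left corner of the icon in the statusmap CGI
    3d_coords   0.0,1.5,-2.0    ; real numbers: center of the host cube in the statuswrl CGI
    }
```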

Host Group Definition

Description:

A host group definition is used to group one or more hosts together for simplifying configuration with object tricks or display purposes in the CGIs.

Definition Format:

Note: Directives in red are required, while those in black are optional.

define hostgroup{
    hostgroup_name      hostgroup_name
    alias               alias
    members             hosts
    hostgroup_members   hostgroups
    notes               note_string
    notes_url           url
    action_url          url
    }

Example Definition:

define hostgroup{
    hostgroup_name      novell-servers
    alias               Novell Servers
    members             netware1,netware2,netware3,netware4
    }

Directive Descriptions:

hostgroup_name: This directive defines a short name used to identify the host group.

alias: This directive defines a longer name or description used to identify the host group. It is provided to let you more easily identify a particular host group.

members: This is a list of the short names of hosts that should be included in this group. Multiple host names should be separated by commas. This directive may be used as an alternative to (or in addition to) the hostgroups directive in host definitions.

hostgroup_members: This optional directive can be used to include hosts from other "sub" host groups in this host group. Specify a comma-delimited list of short names of other host groups whose members should be included in this group.

notes: This directive defines an optional string of notes pertaining to the host group. If you specify a note here, you will see it in the extended information CGI (when you are viewing information about the specified host group).

notes_url: This variable defines an optional URL that can be used to provide more information about the host group. If you specify a URL, you will see a red folder icon in the CGIs (when you are viewing hostgroup information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will be the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/). This can be very useful if you want to make detailed information on the host group, emergency contact methods, etc. available to other support staff.

action_url: This directive defines an optional URL that can be used to provide more actions to be performed on the host group. If you specify a URL, you will see a red "splat" icon in the CGIs (when you are viewing hostgroup information) that links to the URL you specify here. Any valid URL can be used. If you plan on using relative paths, the base path will be the same as what is used to access the CGIs (i.e. /cgi-bin/nagios/).
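
The members and hostgroup_members directives can be combined, as in this sketch. The group name all-devices and its alias are hypothetical. The novell-servers group is the one from the example definition above.

```
define hostgroup{
    hostgroup_name      all-devices         ; hypothetical group name
    alias               All Devices
    members             bogus-router        ; hosts listed individually
    hostgroup_members   novell-servers      ; plus every member of this "sub" host group
    }
```

The same membership could also be expressed from the host side with the hostgroups directive in each host definition.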

Service Definition

Description:

A service definition is used to identify a "service" that runs on a host. The term "service" is used very loosely. It can mean an actual service that runs on the host (POP, SMTP, HTTP, etc.) or some other type of metric associated with the host (response to a ping, number of logged in users, free disk space, etc.). The different arguments to a service definition are outlined below.

Definition Format:

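Note: Directives in red are required, while those in black are optional. Where the colors are not shown, the required directives in the Nagios 3.x documentation are host_name, service_description, check_command, max_check_attempts, check_interval, retry_interval, check_period, notification_interval, notification_period, contacts, and contact_groups.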

define service{
    host_name                       host_name
    hostgroup_name                  hostgroup_name
    service_description             service_description
    display_name                    display_name
    servicegroups                   servicegroup_names
    is_volatile                     [0/1]
    check_command                   command_name
    initial_state                   [o,w,u,c]
    max_check_attempts              #
    check_interval                  #
    retry_interval                  #
    active_checks_enabled           [0/1]
    passive_checks_enabled          [0/1]
    check_period                    timeperiod_name
    obsess_over_service             [0/1]
    check_freshness                 [0/1]
    freshness_threshold             #
    event_handler                   command_name
    event_handler_enabled           [0/1]
    low_flap_threshold              #
    high_flap_threshold             #
    flap_detection_enabled          [0/1]
    flap_detection_options          [o,w,c,u]
    process_perf_data               [0/1]
    retain_status_information       [0/1]
    retain_nonstatus_information    [0/1]
    notification_interval           #
    first_notification_delay        #
    notification_period             timeperiod_name
    notification_options            [w,u,c,r,f,s]
    notifications_enabled           [0/1]
    contacts                        contacts
    contact_groups                  contact_groups
    stalking_options                [o,w,u,c]
    notes                           note_string
    notes_url                       url
    action_url                      url
    icon_image                      image_file
    icon_image_alt                  alt_string
    }

Example Definition:

define service{
