Zabbix for network and server monitoring. The requirements and criteria for each section are as follows:
Hardware Health Monitoring:
Cisco Switches: 2 Nos. Use SNMP to monitor metrics such as CPU utilization, memory usage, temperature, fan speed, and power supply status.
Mellanox Switch: As with the Cisco switches, use SNMP to monitor hardware health metrics.
Supermicro and Dell Servers: 5 Nos. Monitor CPU temperature, fan speed, power supply status, and other relevant hardware metrics using SNMP or other appropriate methods.
Zabbix Templates: Create custom templates for each device type, if required, to collect the relevant data.
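As an illustration of the custom-template requirement, the following minimal sketch builds the JSON-RPC request bodies that the Zabbix API accepts for `template.create` and `item.create`. It only constructs the payloads and does not send them. The template name, group ID, template ID, and item key are hypothetical placeholders, and the OID (intended as cpmCPUTotal5minRev from CISCO-PROCESS-MIB) is an assumption that should be verified against the MIBs of the actual device.

```python
import json


def jsonrpc(method, params, request_id=1):
    """Wrap params in the JSON-RPC 2.0 envelope used by the Zabbix API."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def template_payload(template_name, group_id):
    """Request body for creating an empty template in an existing group."""
    return jsonrpc(
        "template.create",
        {"host": template_name, "groups": [{"groupid": group_id}]},
    )


def snmp_item_payload(template_id, name, key, oid, delay="1m"):
    """Request body for adding one SNMP-polled item to a template."""
    return jsonrpc(
        "item.create",
        {
            "hostid": template_id,  # for template items, the template's ID
            "name": name,
            "key_": key,
            "type": 20,             # 20 = SNMP agent
            "snmp_oid": oid,
            "value_type": 3,        # 3 = numeric (unsigned)
            "delay": delay,
        },
    )


# Hypothetical values, for illustration only.
template = template_payload("Custom Cisco Switch Health", "12")
cpu_item = snmp_item_payload(
    template_id="10500",
    name="CPU utilization (5 min)",
    key="cisco.cpu.util5m",
    oid="1.3.6.1.4.1.9.9.109.1.1.1.1.8.1",  # assumed OID; check the device MIB
)
print(json.dumps(template, indent=2))
print(json.dumps(cpu_item, indent=2))
```

The same two helpers could be reused for the Mellanox switch and the servers by changing only the template name, item keys, and OIDs, which keeps one template per device type.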
Proxmox Health Monitoring:
OS Metrics: Monitor Proxmox OS metrics (CPU, memory, disk, network) using Zabbix agents or SNMP.
Virtual Machines (VMs):
Linux VMs: Install Zabbix agents on Linux VMs to monitor OS-level metrics.
Windows VMs: Use SNMP or Zabbix agents to monitor Windows VMs.
Cluster Monitoring:
Monitor cluster status, resource usage, and failover events.
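One possible way to turn cluster status into a single health value is sketched below. It assumes the input is the already-parsed list returned by the Proxmox API endpoint `/cluster/status`, with one entry of type "cluster" carrying a `quorate` flag and one entry per node carrying an `online` flag. The state names ("healthy", "degraded", "critical") and the sample data are hypothetical.

```python
def summarize_cluster(entries):
    """Reduce a parsed cluster-status list to one health summary."""
    cluster = next((e for e in entries if e.get("type") == "cluster"), None)
    nodes = [e for e in entries if e.get("type") == "node"]
    offline = sorted(n["name"] for n in nodes if not n.get("online"))
    quorate = bool(cluster and cluster.get("quorate"))

    if not quorate:
        state = "critical"   # no quorum: the cluster cannot manage guests safely
    elif offline:
        state = "degraded"   # quorum holds, but at least one node is down
    else:
        state = "healthy"
    return {"state": state, "quorate": quorate, "offline_nodes": offline}


# Hypothetical sample resembling a three-node cluster with one node offline.
sample = [
    {"type": "cluster", "name": "lab", "quorate": 1, "nodes": 3},
    {"type": "node", "name": "pve1", "online": 1},
    {"type": "node", "name": "pve2", "online": 1},
    {"type": "node", "name": "pve3", "online": 0},
]
print(summarize_cluster(sample))
```

A summary like this could be fed to Zabbix as a single item value, so that a change from "healthy" to "degraded" or "critical" marks a failover-relevant event.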
Alerting:
Thresholds: Define flexible thresholds for each metric.
Alert Channels: Configure email, SMS, or other notification channels.
Escalation Logic: Set up escalation rules based on severity (e.g. critical alerts notify admins immediately).
Maintenance Windows: Suppress alerts during maintenance.
Dependencies: Define dependencies between alerts to reduce noise.
Anomaly Detection: Dynamically adjust baseline values for expected behavior.
Auto-Remediation: Execute remote commands/scripts to resolve issues.
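The interaction between maintenance windows, dependencies, and escalation can be sketched as a small routing function. This is not how Zabbix implements these features internally; it is a simplified model of the intended behavior. The escalation policy, channel names, and host names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical policy: (minimum alert age, channels added at that stage).
ESCALATION = {
    "critical": [(timedelta(0), ["email", "sms"]),
                 (timedelta(minutes=15), ["sms-manager"])],
    "warning": [(timedelta(0), ["email"]),
                (timedelta(hours=1), ["sms"])],
}


def in_maintenance(host, now, windows):
    """True if any (host, start, end) window covers this host right now."""
    return any(h == host and start <= now < end for h, start, end in windows)


def route_alert(alert, now, windows, parents, down_hosts):
    """Return the channels to notify, or [] if the alert is suppressed."""
    host = alert["host"]
    if in_maintenance(host, now, windows):
        return []                      # suppressed: planned maintenance
    if parents.get(host) in down_hosts:
        return []                      # suppressed: upstream device is down
    age = now - alert["started"]
    channels = []
    for min_age, targets in ESCALATION.get(alert["severity"], []):
        if age >= min_age:             # every stage reached so far applies
            channels.extend(targets)
    return channels


now = datetime(2024, 1, 1, 12, 0)
alert = {"host": "pve1", "severity": "critical",
         "started": now - timedelta(minutes=20)}
# pve1 sits behind switch sw1; while sw1 is down, alerts for pve1 are noise.
print(route_alert(alert, now, [], {"pve1": "sw1"}, {"sw1"}))
print(route_alert(alert, now, [], {"pve1": "sw1"}, set()))
```

Checking suppression before escalation means a host in maintenance, or one behind a failed switch, never reaches any notification channel, which is the noise reduction that the dependency and maintenance items aim for.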
Documentation & Troubleshooting Manual
This is the outline of the activities to be carried out.
...