Cassandra Expert
Posted on January 9, 2026
Job Description
Overview
Experience required: 4-5 years
Location: Remote
BGV: Yes
Key Responsibilities
- Strong, hands-on experience managing production-grade Apache Cassandra clusters
- Experience defining and enforcing Cassandra best practices, governance, and operational standards
- Ability to create detailed runbooks and SOPs for:
  - Node addition and removal
  - Cluster rebalancing
  - Repair operations
  - Version upgrades (experience with Cassandra 4.x required; upgrade planning to 5.x expected)
- Proven experience in Cassandra performance tuning, including JVM tuning, cache configuration, and thread pool and timeout tuning
- Strong understanding of, and hands-on ability to identify and resolve:
  - Hot partitions
  - Read/write amplification issues
  - High latency and failure scenarios
- Deep understanding of Cassandra architecture and internals
- Experience reviewing and optimizing:
  - Cluster topology (replication strategy, consistency level, etc.)
  - Disk, memory, and storage layouts
- Ability to define and maintain capacity planning guidelines based on data growth and workload patterns
- Hands-on experience setting up monitoring and alerting for Cassandra clusters
- Ability to monitor and alert on critical metrics such as:
  - Read/write latency
  - Read/write failures
  - Repair health
  - Disk usage and storage trends
- Experience defining and executing backup and restore strategies:
  - Snapshots vs incremental backups
  - Backup validation and restore drills
- Ability to plan and execute DR simulations and ensure operational readiness
- Strong Linux fundamentals and troubleshooting skills
- Automation and scripting skills (Shell/Python)
- Experience operating Cassandra in cloud environments (AWS/GCP/Azure)
Preferred Skills
- Hands-on experience with Cassandra 5.x or large-scale version upgrades
- Experience with infrastructure as code (Terraform, Ansible, etc.)
- Exposure to SRE practices (SLIs, SLOs, error budgets)
- Experience integrating Cassandra monitoring with tools like:
  - Prometheus & Grafana
  - Datadog, New Relic, or similar observability platforms
- Experience optimizing cost efficiency for large-scale database operations
- Exposure to Kubernetes-based Cassandra deployments
Qualifications
- Prior experience on projects involving Cassandra.
Other Details
- Standardize Cassandra Best Practices & Governance
- Create runbooks and SOPs for:
  - Node addition/removal
  - Rebalancing
  - Repair operations
  - Version upgrade (we want to move from 4.1.3 to 5.x)
- Establish capacity planning guidelines
- Set up monitoring, alerting, and observability for Cassandra DB:
  - Monitoring for latency, read/write failures, repair health, and disk usage
- Current State Analysis and Tuning
- Deep-dive review of existing setup:
  - Cluster topology (replication, etc.)
  - Disk, memory, and JVM tuning
- Tune JVM, cache, thread pool, and timeout settings
- Identify and fix:
  - Hot partitions
  - Read/write amplification issues, etc.
- Operational reliability and DR
- Define backup and restore strategy:
  - Snapshots vs incremental backups
  - Restore drills
- Cost efficiency of operating the database
Required Skills
governance
apache cassandra clusters