We are currently seeking a talented DevOps Engineer who is passionate about operating large-scale distributed systems and solving enterprise-scale problems. Come join our Analytics Data team to build the future data platform. As a Principal Engineer, you will play a critical role in ensuring the reliability and performance of our platform. We are looking for someone who is passionate about building and maintaining scalable, highly available systems and has a strong background in infrastructure and automation. If you thrive in a fast-paced, collaborative environment and have a track record of successfully operating distributed systems at scale, we want to hear from you!
Responsibilities:
As a Principal DevOps Engineer, you will work to improve the reliability and performance of Zscaler's data platform.
You will partner with our product engineering teams to design, build, operate, and automate distributed applications that are critical to the function of Zscaler's business.
Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
Scale systems sustainably through automation, and drive changes that improve reliability and velocity.
Establish and practice low-noise incident response rotations and blameless postmortems to prevent problem recurrence.
Develop documentation and capacity plans, and debug the hardest problems on large distributed systems.
Collaborate with software engineers to establish, maintain, and optimize functional and performance SLAs.
Operate and maintain modern data lake technologies such as Hadoop and Spark.
Participate in on-call rotation.
Preferred Qualifications:
Proficient in at least one modern programming language
Systematic problem-solving methods and effective communication skills.
Experience with containers and container orchestration systems such as Kubernetes
Good technical understanding of distributed systems architecture and data processing frameworks.
Hands-on experience managing big data processing and distributed systems such as Hadoop, Spark, and Kafka.
10+ years of experience designing, building, and supporting large-scale systems in production.
Ability to drive decisions and deliver results across multiple teams and functions
Experience deploying, managing, and operating scalable and fault-tolerant Linux infrastructure.
Hands-on operational experience with performance measurement and benchmarking.
Hands-on experience with one or more public cloud providers (AWS, Azure, or GCP).
Ability to prioritize tasks and work independently.