Site Reliability Engineer – DevOps

ZPE Systems is looking for a Site Reliability Engineer – DevOps to join our team in Fremont, California.
You must be proactive, a team player, and passionate about technology. You will participate in different
projects with multicultural teams distributed throughout the world. You'll work directly with
developers to help with day-to-day builds, automation, and management of their infrastructure. If you're
looking for an opportunity to work and grow, this might be the right place for you!

Position Information

Type: Full-time
Location: USA and international locations

Responsibilities:
  • Build and maintain application platforms that are reliable, scalable, and performant
  • Support application development teams in the design and development of new applications, ensuring that the designs are reliable, efficient, and optimized to meet the performance needs of the business
  • Facilitate capacity planning
  • Build and maintain application development systems and processes to facilitate effective change management
  • Automate and standardize repeatable tasks
  • Develop and execute monitoring strategies to analyze performance trends and ensure rapid issue response
  • Respond to performance and availability issues for application platforms, and resolve issues in response to reported incidents
  • Investigate and analyze the root causes of defects, and conduct postmortems
  • Provide on-call coverage for supported applications to ensure performance and availability within service levels

Minimum Qualifications:

  • 2+ years of experience developing and operating distributed systems
  • 2+ years of Linux server administration
  • Knowledge of networking principles and how they relate to the architecture and performance of distributed systems
  • Fluent in at least one programming language (Python or Golang preferred)
  • Experience building and maintaining container infrastructure (Docker, Rancher, Kubernetes, etc.)
  • Experience working with tools like Terraform and Ansible
  • Experience administering, monitoring, and performance tuning web application platform technologies
  • Follows best practices
  • Experienced and comfortable working in Git
  • Knowledge of Scrum and other Agile methodologies
  • Strong troubleshooting abilities
  • Strong customer service mindset
  • Attention to detail
  • Self-motivated and diligent
  • Eligible to work in the United States
  • Fluent English, written and spoken; excellent communication skills

Preferred Qualifications:

  • Strong Terraform and Ansible experience
  • Strong experience with Grafana, Prometheus, and Loki
  • Strong experience with Kubernetes
  • Experience with AWS, GCP, and Linode
  • Experience building and maintaining CI/CD pipelines
  • Experience with Cassandra, PostgreSQL, MongoDB, Django, and Golang

Additional Eligibility Requirements

Other Duties:

Please note that this job description is not a comprehensive list of the activities, duties, or responsibilities required of the employee in this role. Duties, responsibilities, and activities may change at any time, with or without notice.

Are You a Good Fit?

Send your resume with the subject “Site Reliability Engineer – DevOps”.