About ZPE Systems, Inc.
ZPE Systems solves the networking problems of large enterprises, including 6 of the top 10 global tech giants, to meet increasing demands for infrastructure availability, security, and scalability.
ZPE Systems develops and manufactures secure remote in-band and out-of-band management solutions that let enterprises access, control, manage, and automate critical IT infrastructure from the data center to the edge.
Companies that maintain or operate many data centers, colos, campuses, and branch locations, such as those in healthcare, supply chain, education, government and finance, trust ZPE’s Intel-based serial consoles, services routers, and cloud management software to eliminate human error, close security gaps, and resolve interoperability issues.
ZPE Systems was founded in Silicon Valley in 2013 and has sales and support offices worldwide. It continues to expand through a growing network of trusted partners and service providers.
About the Position
ZPE Systems is looking for a Site Reliability Engineer – DevOps to join our team in Fremont, California.
You must be proactive, a team player, and passionate about technology. You will participate in different
projects made up of multicultural teams distributed throughout the world. You’ll work directly with
developers to help with day-to-day builds, automation, and management of their infrastructure. If you’re
looking for an opportunity to work and grow, this might be the right place for you!
Responsibilities
- Build and maintain application platforms that are reliable, scalable, and performant
- Support application development teams in the design and development of new applications, ensuring that the designs are reliable, efficient, and optimized to meet the performance needs of the business
- Facilitate capacity planning
- Build and maintain application development systems and processes to facilitate effective change management
- Automate and standardize repeatable tasks
- Develop and execute monitoring strategies to analyze performance trends and ensure rapid issue response
- Respond to performance and availability issues for application platforms, and resolve issues in response to reported incidents
- Investigate and analyze the root causes of defects and conduct postmortems
- Provide on-call coverage for supported applications to ensure performance and availability within service levels
Requirements
- 2+ years of experience developing and operating distributed systems
- 2+ years of Linux server administration
- Knowledge of networking principles and how they relate to the architecture and performance of distributed systems
- Fluent in at least one programming language (Python or Golang preferred)
- Experience building and maintaining container infrastructure (Docker, Rancher, Kubernetes, etc.)
- Experience working with tools like Terraform and Ansible
- Experience administering, monitoring, and performance tuning web application platform technologies
- Follows best practices
- Experienced and comfortable working in Git
- Knowledge of Scrum & Agile methodologies
- Strong troubleshooting abilities
- Strong customer service mindset
- Attention to detail
- Self-motivated and diligent
- Eligible to work in the United States
- Fluent in written and spoken English; excellent communication skills
Preferred Qualifications
- Strong Terraform and Ansible experience
- Strong experience with Grafana, Prometheus, and Loki
- Strong experience with Kubernetes
- Experience with AWS, GCP, and Linode
- Experience building and maintaining CI/CD pipelines
- Experience with Cassandra, PostgreSQL, MongoDB, Django, and Golang
ZPE Systems is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of ZPE Systems to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran or special disabled veteran status, marital status, pregnancy, genetic information, or any other legally protected status.
Are you a Good Fit?
Send your resume to firstname.lastname@example.org with the subject “Site Reliability Engineer – DevOps”.