Site Reliability Engineering Team Lead - Real-time Telemetry Tooling

Full-Time
South Jordan, UT
Willis Towers Watson

Job Description

Our engineering team has built the largest private Medicare marketplace in the country. We passionately focus on the continuous improvement of the systems we build and the culture we promote. We build a platform that gives customers shopping for insurance the best possible support, and on which our insurance carriers can be confident that their products are accurately and impartially represented.
We are looking for an Engineering Team Lead for our Insights Engineering team. This position is responsible for leading the design, deployment, scaling, and maintenance of the platforms that two dozen engineering teams use to instrument their systems for real-time telemetry, monitoring, and alerting. This team is embedded within our Platform Engineering Group.

We operate a complex, multi-tenant, hybrid cloud and on-premises infrastructure that spans both Windows and Linux. We strive for security, reliability, and automation in line with DevOps and Site Reliability Engineering principles. If you are passionate about learning and improving through metrics and automation, and about instilling that mindset in others, we want to hear from you.

The Role

Communication

Keep leadership well informed of your team's direction and focus
Ensure that your entire team is well informed of changes and project status
Explore new ways of improving communication among your team and with other teams
Promote inclusion and collaboration across functional disciplines
Conduct 1-on-1 meetings with all team members
Write and maintain architectural, stakeholder, and policy documentation

Innovation

Encourage and inspire others to innovate
Look for new ways to improve our processes
Look for new ways to improve the quality of our infrastructure
Look for new ways to increase the velocity with which your team delivers, leveraging expertise from various functional disciplines
Look for new ways to remediate production incidents more quickly and safely
Encourage participation in department Communities of Practice

Productivity

Hold everyone accountable for being on time and staying productive
Adhere to and advocate for best practices including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies

Initiative

Know what needs to be worked on and keep the team focused on the goal
Provide timely assistance and remediation solutions during critical situations and production incidents
Take ultimate responsibility for delivering on time and at the highest possible quality

Group Culture

Lead and influence across the organization without relying on hierarchy
Guide the culture and attitude of the team toward an optimistic, proactive, and encouraging direction
Foster an environment where it is safe to fail and to learn from failure

The Requirements

Hands-on Engineering
- 10+ years of hands-on technical experience with many of the following technologies:
  - Windows and Linux servers
  - Infrastructure monitoring tools like Zabbix, Sensu, Nagios, SolarWinds, etc.
  - Application performance monitoring tools like New Relic, Datadog, etc.
  - Log aggregation tools like Sumo Logic, Splunk, ELK, etc.
  - Cloud platforms, preferably Azure
  - Secrets management with Consul and Vault or similar systems
  - Configuration management and infrastructure-as-code tools like Salt and Terraform
  - Continuous Integration and Continuous Delivery with tools like TeamCity, Octopus Deploy, Concourse, or Azure DevOps
  - Firewalls and load balancers such as F5
  - Web servers including IIS, NGINX, and Tomcat
- Proficiency and comfort with:
  - One or more programming languages, such as Python or Go
  - One or more scripting languages, such as PowerShell or Bash
  - Command-line interfaces
  - Networking infrastructure
  - Git

People Management
- 3+ years of experience with, and responsibility for, the following HR concerns regarding your team members:
  - Recognize and address problematic behavior through coaching, and discuss corrective actions
  - Reward colleagues who go above and beyond their job duties
  - Become familiar with the career aspirations of each team member, and help set short- and long-term goals that support them in those pursuits
  - Resolve conflicts between individuals, and know when to involve leadership or HR
  - Manage paid-time-off and work-from-home requests
  - Interview, hire, and onboard high-quality candidates

Bachelor's degree or equivalent experience strongly preferred; high school diploma required

Equal Opportunity Employer (EOE), including individuals with disabilities and veterans
