About EY GDS :
EY Global Delivery Services (GDS) comprises 29,500 specialists who provide IT, HR, finance, project management and strategic business services to EY member firms around the globe.
In addition, we deliver support and solutions to clients all over the world.
Our Team’s Culture :
EY’s mission is to build a better working world, and we can’t do it without the right people: exceptional people known for their thought leadership and entrepreneurial spirit, who want to work with the best, constantly learn and create positive change.
In the Client Technology Platforms (CTP) Team within the Client Technology (CT) organization, we use innovative and superior technology capabilities while maximizing efficiency to allow further investment in ongoing growth.
Our evolving approach adopts the innovative and cultural approaches of technology leaders :
To build high performing and independent global teams, we value integrity, respect, collaboration and excellence
We foster continuous learning through purposeful and broad sharing of knowledge
We evolve and challenge teams by communicating in a respectful, direct and open manner
In pursuit of speed and flexibility, we focus on outcomes rather than rules
To foster creativity, we want work to be fun, exciting and rewarding
We automate the repeatable to ensure speed and quality
We exercise good judgement in balancing strategic goals with tactical needs
EY GDS has a positive, diverse, and supportive culture. We look for people who are curious and inventive, and who work to be a little better every single day.
In our work together we aim to be smart, humble, hardworking and, above all, collaborative.
The opportunity :
Site Reliability Engineers (SRE) at EY GDS fill the mission-critical role of ensuring that our complex systems are healthy, monitored, automated, and designed to scale.
You will use your background as an engineering generalist to work closely with our development teams from the early stages of design all the way through identifying and resolving production issues.
The ideal candidate will be passionate about an operations role that involves deep knowledge of both the application and the product, and will also believe that automation is a key component to operating large-scale systems.
Our SRE team solves incredibly difficult problems using the best tools available for the job, and is rapidly extending its use of new technologies.
They spend just as much of their time working on systems as they do writing code. You’ll be tasked with all manner of work: building operational tooling, automating operational workflows, performing architecture and design reviews, investigating system failures and complex outages, improving our monitoring infrastructure, defining service level objectives and agreements for EY products and flows, and much more.
Essential Functions of the Job :
Gain deep knowledge of our complex applications.
Serve as a primary point of contact responsible for the overall health, performance, and capacity of one or more of our technology products.
Strong experience with Azure or AWS (design, SDKs, best practices).
Familiar with design principles of monitoring and alerting systems.
Design, implement, and maintain robust monitoring and alerting to improve performance and reliability.
As part of this role, you are responsible for the SLAs and SLOs.
Experience implementing industry standard security best practices.
Experience with automation, configuration management, and developing infrastructure as code.
Use engineering best practices to deliver high-quality production code, utilize automated testing, and build reusable components.
Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale Windows and Linux environment.
Work closely with development teams to ensure that platforms are designed with "operability" in mind.
Function well in a fast-paced, rapidly changing environment.
Participate in the operations on-call rotation, triaging and addressing production issues.
Skills and Experience Requirements :
B.S. or higher in Computer Science or other technical discipline, or related practical experience.
Programming skills (.NET and PowerShell, Python, Ruby, Java / Scala or C).
5+ years of experience in a large-scale Microsoft and Linux operations role.
Experience in designing, analyzing, and troubleshooting large-scale distributed systems.
Debug production issues across services and levels of the stack.
Experience with one or more orchestration or deployment tools, such as Azure Resource Manager (ARM), Terraform or Ansible.
Familiarity with Git or other source control systems.
Experience with TFS or Visual Studio Team Services (VSTS).
Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
Experience in working with Public Clouds (Microsoft Azure is a plus).
PowerShell or Python experience, specifically for systems automation.
Working knowledge of the TCP / IP stack, internet routing and load balancing.
Experience with monitoring and alerting using technologies like Prometheus, Sensu, Nagios, Kafka, Wavefront, BigPanda, DataDog, PagerDuty.
Experience designing, implementing, and deploying Docker, Kubernetes, and serverless workloads (Azure Functions or AWS Lambda).
Previous experience working with geographically distributed coworkers.
Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
Creative thinker and strong problem solver with meticulous attention to detail
Highly organized, creative, motivated, and passionate about achieving results
What working at EY GDS offers :
This role offers you the unique opportunity to work for some of the leading European companies across all sectors as part of our international and multidisciplinary EY teams.
If you can confidently demonstrate that you meet the criteria above, please contact us as soon as possible. Apply now to make your mark.