Job Description
We are an innovative technology company specializing in the development and marketing of scalable, high-performance community platforms in the online dating sector, an industry that remains stable and continues to grow even in times of crisis.
As a Site Reliability Engineer with a focus on Observability, you will ensure full transparency and deep insights into our IT systems. You can expect exciting challenges, cutting-edge technologies, and a strong team that embraces automation, efficiency, and continuous improvement.
- Development and Enhancement of the Observability Strategy:
  - Design and operation of a comprehensive observability strategy for our platforms and systems.
  - Ensuring optimal monitoring of performance, availability, and security.
- Monitoring, Logging, and Tracing Solutions:
  - Implementation and optimization of monitoring, logging, and tracing technologies.
  - Evaluation and integration of new observability tools and best practices.
- Dashboards, Alerting, and Analytics:
  - Development and maintenance of dashboards for comprehensive system monitoring.
  - Setup of alerting and analytics mechanisms for early error detection and resolution.
- Collaboration and Integration:
  - Close collaboration with developers, DevOps teams, and other stakeholders.
  - Integration of observability practices into the software development lifecycle.
- Performance Analysis & Capacity Planning:
  - Conducting performance analyses to optimize system efficiency.
  - Capacity planning to ensure a stable and scalable operation.
- Incident Response & Root Cause Analysis:
  - Supporting automation of operational processes and improving SRE principles.
  - Conducting post-mortems and root cause analyses after incidents to drive continuous improvement.
- At least 5 years of professional experience in Site Reliability Engineering, DevOps, or Software Development with a focus on Automation & Configuration Management.
- Language skills: Very good proficiency in German (fluent in spoken and written form) and/or English (fluent in spoken and written form).
- Strong expertise in observability tools such as Prometheus and Grafana.
- Experience with logging technologies like ELK Stack or Splunk.
- Knowledge of cloud technologies (AWS, Azure) and container orchestration (Kubernetes, Docker).
- Good programming and scripting skills (e.g., Python).
- Experience with Infrastructure-as-Code (Terraform, Ansible) is a plus.
- Strong analytical skills and a solution-oriented mindset.
- Team player with excellent communication skills and the ability to explain complex technical concepts in a clear and understandable manner.
- Strong team spirit
- Permanent employment contract
- Flexible working hours and mobile working
- Individual training and development opportunities
- Subsidy for an Urban Sports Club membership (M or L)
- Three weekly fitness sessions with a professional trainer
- Regular team events to foster a strong team spirit
- JobRad bike leasing, so you can ride your dream bike for both work and personal use
- Option to work remotely from abroad for a limited time in coordination with your team
- Subsidy for the 58-euro ticket ("Deutschlandticket")