Senior Site Reliability Engineer – Observability (m/f/d)

iVentureGroup GmbH

Job Description

We are an innovative technology company specializing in the development and marketing of scalable, high-performance community platforms in the online dating sector, an industry that remains stable and continues to grow even in times of crisis.

As a Site Reliability Engineer with a focus on Observability, you will ensure full transparency and deep insights into our IT systems. You can expect exciting challenges, cutting-edge technologies, and a strong team that embraces automation, efficiency, and continuous improvement.

Responsibilities


  • Development and Enhancement of the Observability Strategy:
    • Design and operation of a comprehensive observability strategy for our platforms and systems.
    • Ensuring optimal monitoring of performance, availability, and security.
  • Monitoring, Logging, and Tracing Solutions:
    • Implementation and optimization of monitoring, logging, and tracing technologies.
    • Evaluation and integration of new observability tools and best practices.
  • Dashboards, Alerting, and Analytics:
    • Development and maintenance of dashboards for comprehensive system monitoring.
    • Setup of alerting and analytics mechanisms for early error detection and resolution.
  • Collaboration and Integration:
    • Close collaboration with developers, DevOps teams, and other stakeholders.
    • Integration of observability practices into the software development lifecycle.
  • Performance Analysis & Capacity Planning:
    • Conducting performance analyses to optimize system efficiency.
    • Capacity planning to ensure stable and scalable operation.
  • Incident Response & Root Cause Analysis:
    • Supporting the automation of operational processes and advancing SRE practices.
    • Conducting post-mortems and root cause analyses after incidents to drive continuous improvement.

Requirements

  • At least 5 years of professional experience in Site Reliability Engineering, DevOps, or Software Development with a focus on Automation & Configuration Management.
  • Language skills: Fluent German and/or English, both spoken and written.
  • Strong expertise in observability tools such as Prometheus and Grafana.
  • Experience with logging technologies like ELK Stack or Splunk.
  • Knowledge of cloud technologies (AWS, Azure) and container orchestration (Kubernetes, Docker).
  • Good programming and scripting skills (e.g., Python).
  • Experience with Infrastructure-as-Code (Terraform, Ansible) is a plus.
  • Strong analytical skills and a solution-oriented mindset.
  • Team player with excellent communication skills and the ability to explain complex technical concepts in a clear and understandable manner.

Benefits

  • Strong team spirit
  • Permanent employment contract
  • Flexible working hours and mobile working
  • Individual training and development opportunities
  • Subsidy for an Urban Sports Club membership (M or L)
  • Three weekly fitness sessions with a professional trainer
  • Regular team events to foster a strong team spirit
  • JobRad bike leasing: ride your dream bike for both work and personal use
  • Option to work remotely from abroad for a limited time in coordination with your team
  • Subsidy for the 58-euro ticket ("Deutschlandticket")