Job Description
We are an innovative technology company shaping the IT landscape of the future with cutting-edge cloud and automation technologies. Our focus is on scalable, highly available, and secure IT infrastructures. Among other things, we specialize in developing and marketing scalable, high-performance community platforms in the online dating sector, an industry that remains stable and continues to grow even in times of crisis.
Join our team and leverage your expertise in Site Reliability Engineering (SRE) and IT transformation to further develop our platforms and deliver innovative solutions for our customers and internal teams. You can expect exciting challenges, the latest technologies, and a strong team that values collaboration, automation, and continuous improvement.
- Automation of Operational Processes
  - Development and Implementation of Infrastructure-as-Code (IaC) Solutions (e.g., Terraform, Ansible).
  - Automation of Recurring Tasks such as deployments, scaling, and troubleshooting.
  - Building and Maintaining CI/CD Pipelines for fast and secure software delivery.
- Transformation of Traditional Linux Systems into Kubernetes
  - Migration of Traditional Linux Servers to Kubernetes environments.
  - Supporting Software Development Teams in utilizing Kubernetes for their workloads.
  - Automation of Network Components in the on-premise data center.
- Secure Secret Management
  - Development and Implementation of a Secret Management System (e.g., HashiCorp Vault).
  - Integration of Critical IT Infrastructure Components.
  - Providing Simple but Secure Access to credentials for applications and users.
- Error Management & Self-Healing Systems
  - Implementation of Automated Recovery Mechanisms to minimize system downtime.
  - Development of Proactive Monitoring Solutions for early problem detection.
  - Utilization of Event-Driven Automation (e.g., Lambda functions, webhooks) for autonomous anomaly response.
- Infrastructure Optimization
  - Automated Scaling of Cloud and On-Premise Systems based on load predictions.
  - Ensuring Compliance with Security Policies through Compliance-as-Code.
  - Resource Optimization via Automated Capacity Planning.
- Incident Response & Troubleshooting
  - Development of Runbooks and Playbooks for automated incident management.
  - Utilization of AIOps for log and metric analysis to enable early fault detection.
  - Collaboration with Developers and Operations Teams to accelerate issue resolution through automation.
- At least 5 years of professional experience in Site Reliability Engineering, DevOps, or Software Development with a focus on Automation & Configuration Management.
- Language skills: Fluent spoken and written German and/or English.
- This role requires on-site presence in our Hamburg office two days a week for team meetings. We support flexible work models while ensuring effective collaboration within the team.
- Strong expertise in Infrastructure as Code (IaC) and Configuration Management (e.g., Terraform, Ansible).
- Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, GitHub Actions, GitLab CI, ArgoCD).
- Experience with monitoring and logging solutions (e.g., Prometheus, Grafana).
- Knowledge of container technologies & orchestration (e.g., Kubernetes, Docker).
- Experience with cloud platforms (AWS, Azure, Google Cloud).
- Good programming and scripting skills (Python, Bash).
- Problem-solving mindset and strong teamwork skills.
- Strong team spirit
- Permanent employment contract
- Flexible working hours and mobile working
- Individual training and development opportunities
- Subsidy for an Urban Sports Club membership (M or L)
- Three weekly fitness sessions with a professional trainer
- Regular team events to foster a strong team spirit
- JobRad bike leasing: ride your dream bike for both work and personal use
- Option to work remotely from abroad for a limited time in coordination with your team
- Subsidy for the 58-euro ticket ("Deutschlandticket")