Senior DevOps / SRE Engineer
Software Engineering
Prague, Czechia
We are looking for a Senior Site Reliability Engineer to join our Site Reliability Engineering (SRE) team. In this role, you'll drive the reliability, scalability, and performance of our platform, ensuring our systems remain stable as we grow. We value innovation and are seeking someone eager to bring fresh ideas – especially around building automation that reduces manual effort and improving distributed systems resilience.
This isn't a top-down organization; our engineers are the ones who flag technical challenges and design the solutions. You will collaborate closely with Platform Engineering, Security, AI Platform, and Product teams to design durable systems and make data-driven operational decisions.
What You'll Do
Collaborate with Engineering, Platform, and Security teams to embed SRE best practices early in system design.
Lead advancements in observability, monitoring, alerting, and incident-response workflows.
Analyze platform performance to contribute to cost optimization, performance tuning, and resilience planning.
Build infrastructure and automation tooling that improves platform reliability and enhances deployment safety.
Diagnose and resolve complex production issues across distributed systems, and drive open post-incident reviews so failures translate into durable improvements.
Strengthen system consistency and author clear, concise documentation for runbooks and operational processes.
Who You Are
4+ years of experience in SRE, DevOps, platform engineering, or similar production-facing roles.
Strong problem-solving and debugging skills in distributed systems, applied to keeping the platform stable.
Eager to share operational guidelines, champion SRE practices across teams, and openly discuss what we can learn from system failures.
Excellent communication skills (English is our default language) with a genuine, collaborative approach to working across diverse engineering teams.
Strong hands-on experience with cloud environments (AWS, GCP, or similar) and proficiency with infrastructure-as-code and CI/CD pipelines.
Familiarity with Kubernetes (or another container orchestration platform), event-driven architectures, or supporting ML/AI workloads and GPU infrastructure.
What Success Looks Like:
Within 3 Months:
Fully onboarded into the Rossum ecosystem, gaining a deep understanding of our infrastructure, observability stack, and SRE processes while building relationships across the team.
A clear grasp of how Rossum and Coupa work together and of our shared roadmap.
Initial Impact Goal: Improve a small reliability issue or add value to an existing automation or monitoring area.
Within 6 Months:
Independently managing key responsibilities, owning recurring reliability tasks, and identifying areas for strategic improvement.
Actively participating in the alignment of processes within the new Coupa organizational structure.
Operational KPI: Implement measurable enhancements to alert quality, CI/CD reliability, or service health metrics.
Within 12 Months:
Recognized as a subject matter expert within the team and confident navigating the global Coupa ecosystem.
Successfully contributing to Rossum's mission at a massive scale using new global resources.
Long-Term Strategic Goal: Lead a major reliability or infrastructure initiative, providing technical recommendations to guide our long-term reliability strategy.
Why Join Us?
At Rossum, we're on a mission to free the world from boring manual data entry. Our AI platform helps companies save millions of hours, allowing professionals to focus on creative, impactful work.
In an exciting move for our future, we have joined forces with Coupa, the world's leading unified platform for Business Spend Management. By combining Rossum's cutting-edge document AI with Coupa's global ecosystem, we are uniquely positioned to redefine how businesses operate at a massive scale. You can read more about this milestone and our shared vision in the official announcement.
What sets us apart?
Cutting-edge AI technology reshaping how businesses operate globally.
A collaborative, supportive environment where autonomy thrives.
Opportunities to grow in a fast-scaling company.
A culture that values diversity, empathy, and genuine connection.
As part of the Coupa family, you'll enjoy the agility of a fast-moving, innovation-focused team with the stability and reach of a global market leader. For you, this means an even greater opportunity to make an impact, access new global markets, and grow your career within a collaborative culture that values autonomy, diversity, and genuine connection. Together, we're not just automating data—we're giving time back to the world's professionals.
What we offer
Future with Coupa: We are currently in an integration phase, during which we are reviewing and aligning our total rewards programs. Our goal is to blend Rossum's local culture with Coupa's global standards to provide you with a long-term future featuring clear career pathways, tailored learning journeys, and world-class development opportunities.
Current Benefits:
Flexible working models with a base in vibrant Prague and options for a hybrid setup.
Competitive benefits designed to support your well-being, growth, and work-life harmony.
5 weeks of vacation, 5 sick/personal days, and 2 extra weeks of paternity leave.
Personal development, education, and language courses budget.
High-end tech (MacBook, external monitor, keyboard of your choice) and a MultiSport card.
Team offsites, regular meetups, and a friendly, ambitious team.
Ready to make an impact in your next role? Apply now!