Staff Infrastructure Engineer - Augmented Developer Experience
Shopify · salary not specified · Global · company website · posted May 28, 2026
Job description
Step onto the critical path of every engineer shipping code at Shopify. Imagine owning the AI system that reviews every pull request in the company, flags security vulnerabilities, generates regression tests, and recently scanned one of the largest monorepos in the industry for bugs within a few hours, for roughly the cost of a team lunch. We're looking for a Staff Infrastructure Engineer to architect, operate, and evolve the asynchronous, AI-powered workflows that millions of developer interactions depend on. This is a software engineering role with deep ML-system exposure, not a modeling role. If you've built infrastructure where ML or LLM outputs are first-class inputs, where "quality" is a percentage rather than a pass/fail, and where shipping a 50% improvement is real progress, this is the seat. Join a remote-first, AI-native team, ship continuously, and set the technical bar as we deepen our ML-systems muscle.
RESPONSIBILITIES
- Architect, optimize, and own the asynchronous AI-powered workflows that sit on the critical path of every Shopify pull request — designed for high throughput, low latency, and continuous reliability at very large scale.
- Lead the design and rollout of new surfaces — anti-pattern detection, security vulnerability scanning, automated test generation, monorepo-wide bug scanning — and bring them from prototype to production.
- Drive reliability, latency, and cost for systems where output quality is probabilistic and improvement is measured in percentage points: bug catch rate, comment acceptance, sentiment trend.
- Run A/B tests in production as routine practice, back-test against historical incident data, and iterate based on precision, recall, and developer sentiment.
- Partner with the CI/CD and source-control teams as we integrate with Shopify's next-generation developer platform.
- Set technical direction across ambiguous systems work and make clear trade-offs between customer impact, maintainability, and extensibility — mentoring engineers across all levels as the team grows.
- Participate in the team's on-call rotation, supporting production reliability for systems that fire on every PR.
QUALIFICATIONS
- Proven, hands-on expertise building and operating asynchronous workflow systems in production at scale — in addition to request/response services.
- Track record shipping software on customer-critical paths with rigorous deploy, observability, and rollback practices.
- Comfort with non-deterministic systems: you treat "50% improvement" as real progress and percentage-based rollouts as the norm, and you don't reach for a fix when the answer isn't 100%.
- Demonstrated experience designing systems where ML predictions or LLM outputs are a first-class input, including thinking about evaluation, drift, and quality.
- Strong software engineering skills in any modern language. Our stack is Ruby, Go, and Python, but none of them is a hard requirement.
- Staff-scope leadership: setting direction on ambiguous systems work and framing clear trade-offs across customer impact, maintainability, and extensibility.
- Fluency with AI coding tools (Claude Code, Cursor, etc.) as part of your daily workflow.
NICE TO HAVES
- Direct exposure to LLM-based systems — prompting, evaluation harnesses, cost/latency tradeoffs.
- Background in search, recommendations, or other ML-driven product infrastructure.
- Experience working alongside MLEs or applied scientists in a production setting.
At Shopify, we pride ourselves on moving quickly, not just in shipping but in our hiring process as well. If you're ready to apply, please be prepared to interview with us within the week. Our goal is to complete the entire interview loop within 30 days. You will be expected to complete a live pair programming session, so come prepared with your own IDE.