Software Engineer II – AI Infrastructure

Annapolis Junction, MD

Location: Annapolis Junction, MD
Work Schedule: Full-Time, Onsite
Clearance Required: Active TS/SCI with Full Scope Polygraph (FSP)
Salary Range: $193,000 - $306,000

Overview

Join us in building the next generation of AI infrastructure that will power innovation across critical mission environments.

We are seeking an experienced Software Engineer II to support an advanced AI Infrastructure Team responsible for developing and maintaining the platform that serves as the foundation for enterprise AI capabilities. This role focuses on AI inference services while supporting a broader ecosystem of AI-enabled applications, including Retrieval-Augmented Generation (RAG), autonomous agents, and emerging machine learning technologies.

The ideal candidate is a highly skilled engineer who can independently design, build, deploy, and operate scalable infrastructure solutions while helping shape the future of AI adoption across mission-critical environments.

Key Responsibilities

AI Infrastructure & Platform Engineering

  • Design, implement, and optimize infrastructure supporting AI model inference at scale.
  • Develop, deploy, and maintain production AI services and applications.
  • Support emerging AI technologies, including:
    • Retrieval-Augmented Generation (RAG)
    • Agentic AI Systems
    • Large Language Model (LLM) Platforms
    • AI Inference Services
  • Build highly available, reliable, and scalable AI platform components.
  • Navigate ambiguous requirements and define practical, scalable technical solutions.

Cloud & Systems Engineering

  • Design and manage cloud-native infrastructure within AWS environments.
  • Automate infrastructure provisioning and configuration using Infrastructure-as-Code (IaC) principles.
  • Support Kubernetes deployments and administration across production environments.
  • Integrate systems across diverse platforms and technologies.
  • Optimize high-volume web applications and distributed systems for performance and reliability.

Observability & Operations

  • Implement monitoring, logging, and observability solutions across AI services and infrastructure.
  • Develop operational dashboards and alerting capabilities using:
    • Grafana
    • Prometheus
    • OpenTelemetry
    • Application Performance Monitoring (APM) tools
  • Support incident response, troubleshooting, and root cause analysis efforts.

DevOps & Automation

  • Develop and maintain CI/CD pipelines.
  • Improve deployment automation and operational efficiency.
  • Promote DevOps best practices across engineering teams.
  • Drive adoption of modern engineering tools and methodologies.

Security & Collaboration

  • Contribute to secure AI system design and implementation.
  • Support compliance with organizational security requirements.
  • Provide technical guidance and informal mentorship to junior engineers.
  • Collaborate with software engineers, data scientists, platform engineers, and mission stakeholders.

Required Qualifications

Education

  • Bachelor's degree in Computer Science, Software Engineering, Computer Engineering, Information Systems, or a related technical discipline.

Substitution:

  • Four (4) additional years of directly related experience may be substituted for a bachelor's degree.

Experience

  • Eight (8) or more years of software engineering experience.
  • Proven experience building and supporting production systems at scale.
  • Experience designing and supporting high-volume web applications.
  • Experience integrating complex systems across multiple technologies and platforms.
  • Experience supporting cloud-native infrastructure in AWS.
  • Experience administering and deploying applications within Kubernetes environments.

Technical Skills

  • Strong Python development skills.
  • AWS Cloud Engineering
  • Kubernetes
  • Infrastructure as Code (IaC)
  • CI/CD Pipelines
  • DevOps Methodologies
  • Monitoring and Observability Platforms
  • Distributed Systems Architecture
  • Performance Optimization
  • Systems Integration

Observability Technologies

Experience with one or more of the following:

  • OpenTelemetry
  • Grafana
  • Prometheus
  • Application Performance Monitoring (APM) Solutions

Professional Skills

  • Strong problem-solving and analytical abilities.
  • Ability to thrive in ambiguous and rapidly evolving environments.
  • Strong organizational influence and change management skills.
  • Excellent written and verbal communication skills.
  • Ability to work independently and collaboratively within highly technical teams.

Desired Qualifications

Candidates with one or more of the following qualifications are highly desired:

  • Experience with AI inference serving technologies such as:
    • vLLM
    • LiteLLM
    • Similar inference platforms
  • Experience with agentic AI frameworks such as:
    • LangChain
    • LangGraph
    • Similar orchestration frameworks
  • Experience with:
    • Vector databases
    • Embedding systems
    • Semantic search technologies
  • Knowledge of:
    • High-Performance Computing (HPC)
    • Distributed Computing Systems
  • Experience supporting production AI/ML environments.

Compensation

Salary Range: $193,000 - $306,000

Compensation is based on experience, education, technical expertise, and overall alignment with program requirements.

Benefits

Medical Coverage

Choose from three comprehensive medical plans through Aetna. The company pays 80% of monthly premiums for employees.

Health Savings Account (HSA)

  • Pre-tax contributions for qualified medical expenses
  • Company contributes 50% of the annual deductible (prorated based on start date)

Dental Coverage

  • Aetna Passive PPO Max Plan
  • Company pays 80% of monthly premiums

Vision Coverage

  • Aetna Vision Preferred Premier 24M Plan
  • Company pays 80% of monthly premiums

Life Insurance

  • 100% Company-Paid Life Insurance
  • Accidental Death & Dismemberment (AD&D) Coverage

Short-Term Disability

  • 100% Company-Paid
  • Pays 60% of earnings up to $1,500 per week for up to 12 weeks

Retirement Plan

  • Automatic 6% employer contribution to 401(k)
  • Fully vested from day one
  • Employee contributions encouraged but not required

Paid Time Off & Holidays

  • 5–6 weeks of PTO depending on tenure
  • 11 paid holidays annually

Professional Development

  • $5,000 annual tuition reimbursement
  • Paid training, certifications, and industry conferences
  • Ongoing support for technical growth and career advancement

Why Join Us?

This is an opportunity to help shape the future of AI infrastructure while supporting critical mission objectives. You'll work alongside top-tier engineers building scalable AI platforms, deploying cutting-edge technologies, and solving some of the most challenging problems in modern software engineering.

https://www.staffed4u.com/ 
