Software Engineer, IoT Reliability

Qualifications

5+ years of professional experience in software engineering, embedded systems, or systems-level development
Background managing, deploying, or supporting distributed fleets of Linux-based IoT or edge devices
Proficiency in Python, with strong command-line experience using Bash
Hands-on familiarity with Docker and containerized development workflows
Experience using observability tools such as Prometheus, OpenTelemetry, Grafana, Datadog, or comparable systems
Ability to debug issues across hardware interfaces, OS subsystems, networking layers, and application logic
Understanding of wireless technologies such as cellular (LTE/5G), Wi-Fi, or low-power IoT connectivity (e.g., LoRa/LPWAN)
Experience creating internal tooling, automation pipelines, or dashboards that improve device visibility and operational efficiency
Strong written and verbal communication skills with the ability to clearly articulate technical findings
Comfort working in an ownership-driven, cross-functional engineering environment
Willingness to travel occasionally (~10–20%) for on-site validation, diagnostics, or testing

Responsibilities

Design systems that measure, analyze, and improve the long-term reliability and performance of a large IoT device fleet
Build internal tools for device provisioning, configuration management, diagnostics, and field validation
Develop automation pipelines supporting firmware rollouts, software updates, and fleet-wide configuration changes
Investigate device anomalies through telemetry, logs, networking traces, and real-time metrics
Identify patterns or systemic issues and propose scalable solutions across multiple hardware generations
Support integration of IoT devices with backend services using event-driven or publish–subscribe communication frameworks
Collaborate closely with hardware, firmware, and operations teams to diagnose and resolve field issues
Conduct performance tuning and functionality testing for sensors, connectivity modules, or device-specific features
Perform periodic fleet audits to monitor degradation, identify reliability risks, and recommend architectural improvements
Create and maintain technical documentation, troubleshooting guides, operational runbooks, and internal playbooks
Participate in occasional field deployments or on-site testing where remote replication is insufficient

Compensation

If you are interested in learning more about this role, please fill out the form below or reach out to us on LinkedIn!