Understanding the Sidecar Pattern: How Atlassian Reduced Latency by 70%
Some architecture patterns earn their popularity by solving a recurring problem cleanly. The Sidecar Pattern is one of them: it moves cross-cutting work out of the application and into a companion process. In this blog post, we’ll dive into the Sidecar Pattern, explore how Atlassian used it to cut latency by 70% for a team calling their Tenant Context Service (TCS), and discuss when and how you can leverage this pattern in your own systems.
What is the Sidecar Pattern?
The Sidecar Pattern is a microservices architecture pattern where a separate process (the "sidecar") is deployed alongside the main application. This sidecar handles specific tasks, such as communication with external services, logging, monitoring, or security, without burdening the main application. Think of it as a motorcycle sidecar: it’s attached to the main vehicle (the application) and assists it in performing additional tasks.
The key advantage of the Sidecar Pattern is that it decouples cross-cutting concerns from the main application, making the system more modular, maintainable, and scalable.
Atlassian’s Tenant Context Service (TCS): A Case Study
Atlassian, the company behind popular tools like Jira, Confluence, and Bitbucket, relies heavily on its Tenant Context Service (TCS). TCS is a critical service that:
- Identifies which tenant (customer) a request belongs to.
- Fetches tenant metadata for further processing.
- Is called multiple times in every customer request across all Atlassian products.
TCS is a high-throughput, low-latency service, with an average response time of 5-6 milliseconds and four to five nines of availability (99.99% to 99.999%). Because nearly every customer request depends on it, a TCS outage affects the entire Atlassian ecosystem.
The Problem: High Latency and Failures
Despite TCS’s robust performance, one internal team at Atlassian reported high latency and frequent failures when using the service. While the P99 latency (99th percentile latency) for TCS was in the single-digit milliseconds, this team observed significantly higher latencies.
Upon investigation, the root cause was clear: the team’s client code was inefficient. They were:
- Making sequential calls instead of parallel ones.
- Not following best practices for interacting with TCS.
This inefficiency led to poor performance, even though the TCS service itself was operating optimally.
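To make the sequential-versus-parallel point concrete, here is a minimal Python sketch. `lookup_tenant` is a hypothetical stand-in that sleeps for 50 ms to simulate one TCS call. It is not Atlassian's client code, and the real lookups may look quite different.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def lookup_tenant(tenant_id: str) -> dict:
    """Hypothetical stand-in for one TCS call; the sleep simulates network latency."""
    time.sleep(0.05)
    return {"tenant_id": tenant_id, "shard": f"shard-{len(tenant_id)}"}


def lookup_sequential(tenant_ids):
    # Each call waits for the previous one, so total latency grows
    # linearly with the number of lookups.
    return [lookup_tenant(t) for t in tenant_ids]


def lookup_parallel(tenant_ids, max_workers=8):
    # Independent lookups overlap, so total latency is close to that of
    # the slowest single call. pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lookup_tenant, tenant_ids))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for one call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Both functions return the same results in the same order. Only the elapsed time differs, which is why a slow client can make a fast service look slow.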
The Solution: Introducing the Sidecar
Instead of asking the team to rewrite their client code, the TCS team decided to solve the problem for all teams by introducing a Sidecar. Here’s how it worked:
What is a Sidecar?
- A Sidecar is a separate process that runs alongside the main application on the same machine or instance.
- It handles all interactions with the TCS service, ensuring that best practices (e.g., retries, parallelization, failure handling) are followed.
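As an illustration of one such best practice, retries with failure handling could be sketched roughly as follows. The source does not describe Atlassian's retry policy, so `TransientError`, the attempt count, and the backoff values are all assumptions made for the example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. timeouts)."""


def call_with_retries(fn, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call fn(); on TransientError, retry with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # The base delay doubles on each attempt; random jitter keeps
            # many clients from retrying at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

Placing this logic in the Sidecar means it is written once, instead of being reimplemented (or forgotten) in every client.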
How Does It Work?
- The main application communicates with the Sidecar over HTTP on localhost, so the call never leaves the machine and avoids a cross-network round trip.
- The Sidecar, in turn, communicates with the TCS service, applying all the necessary optimizations.
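The two hops described above can be sketched with Python's standard library alone. This is a toy under stated assumptions, not Atlassian's Sidecar: the `/tenants/<tenant_id>` route, the response shape, and `fetch_from_tcs` (a stub standing in for the remote TCS call) are all hypothetical.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def fetch_from_tcs(tenant_id: str) -> dict:
    """Hypothetical stub for the remote TCS call; a real sidecar would
    apply retries, parallelization, and failure handling here."""
    return {"tenant_id": tenant_id, "region": "example-region"}


class SidecarHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumed route: GET /tenants/<tenant_id>
        prefix = "/tenants/"
        if not self.path.startswith(prefix) or len(self.path) == len(prefix):
            self.send_error(404, "unknown route")
            return
        body = json.dumps(fetch_from_tcs(self.path[len(prefix):])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def start_sidecar(port: int = 0) -> ThreadingHTTPServer:
    """Bind to loopback only; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), SidecarHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_tenant(port: int, tenant_id: str) -> dict:
    """What the main application does: one plain HTTP GET to localhost."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        conn.request("GET", f"/tenants/{tenant_id}")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

The application side is only `get_tenant`. Everything about how TCS is actually called stays behind the Sidecar's local endpoint.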
Why Not a Library?
The TCS team considered writing a library for clients to use but decided against it for two main reasons:
- Language Agnosticism: A library would need to be written in multiple languages (e.g., Go, Java, Rust) to support different teams, which is cumbersome.
- Ease of Use: A Sidecar simplifies communication by using local HTTP calls, making it easier to implement and maintain.
Benefits of the Sidecar Pattern
The introduction of the Sidecar brought several benefits:
Improved Latency:
- Teams using the Sidecar observed significant latency improvements, even those already following best practices.
- For example, one team saw their latency drop by 70%.
Reduced Load on TCS:
- By enforcing best practices, the Sidecar reduced the total number of requests to TCS, optimizing its performance.
Horizontal Solution:
- Instead of solving the problem for one team, the Sidecar provided a universal solution that benefited all teams.
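The text does not say which practices reduced the number of requests reaching TCS. One common technique a sidecar could use is a short-lived cache, sketched below purely as an assumption. `TTLCache` and its parameters are hypothetical names, and the class is not thread-safe as written.

```python
import time


class TTLCache:
    """Tiny time-based cache: entries expire ttl_seconds after they are loaded."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader    # function that performs the real upstream call
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # fresh entry: no upstream request
        value = self._loader(key)  # miss or expired: exactly one upstream request
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a cache like this, repeated lookups for the same tenant within the TTL window cost one upstream request instead of many. The trade-off is that tenant metadata may be stale for up to `ttl_seconds`.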
When to Use the Sidecar Pattern
The Sidecar Pattern is particularly useful in the following scenarios:
Cross-Cutting Concerns:
- Use a Sidecar to handle tasks like logging, metrics collection, security, or service communication without burdening the main application.
Language Agnosticism:
- If your system involves multiple programming languages, a Sidecar can provide a consistent interface for all teams.
Performance Optimization:
- A Sidecar can enforce best practices (e.g., retries, parallelization) to optimize performance.
Observability:
- Agents like Fluentd or a Prometheus exporter can be deployed as Sidecars to collect and forward logs and metrics from the main application.
Key Takeaways
Sidecar Pattern Optimizes Systems:
- When implemented correctly, the Sidecar Pattern can significantly improve system performance.
- It is commonly used for metrics collection, log aggregation, and other observability tasks.
Build Horizontal Solutions:
- Instead of solving problems for individual teams, build solutions that benefit the entire ecosystem.
- This approach ensures that optimizations are implemented once and reused across the board.
Sidecar vs. Library:
- A Sidecar is language-agnostic and easier to maintain than a library, especially in polyglot environments.
Real-World Examples of Sidecar Usage
Metrics and Log Collection:
- A log shipper such as Fluentd, or a Prometheus exporter, runs next to the application and forwards its logs or metrics to a central backend.
Service Mesh:
- In a service mesh architecture (e.g., Istio, Linkerd), Sidecars are used to handle service-to-service communication, security, and observability.
API Gateways:
- A Sidecar can act as an API gateway, handling authentication, rate limiting, and request routing.
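Of the gateway duties listed above, rate limiting is the easiest to show in a few lines. The token bucket below is one well-known approach, offered as a sketch. `TokenBucket` and its parameters are illustrative names and do not come from any particular gateway or mesh.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self._rate = rate_per_sec
        self._capacity = capacity
        self._tokens = float(capacity)  # start with a full bucket
        self._clock = clock             # injectable so tests need not sleep
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1  # spend one token for this request
            return True
        return False           # bucket empty: reject or delay the request
```

A sidecar would call `allow()` before forwarding each request and return an error (for example HTTP 429) when it yields `False`.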
Conclusion
The Sidecar Pattern is a powerful tool for optimizing system performance and ensuring best practices are followed. Atlassian’s implementation of the Sidecar Pattern for TCS demonstrates how it can reduce latency, improve efficiency, and provide a universal solution for multiple teams.