Centralized Log Management in AWS: A Primer

The many benefits of centralized logging in AWS and the challenges to overcome
12.09.2023

What do an e-commerce website, a practice management system, an accounting service, and an infrastructure monitoring platform have in common? Logs. Billions of logs. What happens when you migrate their workloads to cloud providers such as AWS? You add operational log data from the cloud services and infrastructure you use, plus log trails from events within your cloud accounts. In short, trillions of logs, with much greater diversity in purpose and stakeholders. Whatever log management solution you had, it must keep up with the new requirements. Or be replaced by something that can.

This article sets the stage for a blog series on implementing modern log management solutions when entering the AWS cloud. Now replace modern with centralized. How is “centralizing” a modern thing to do in the distributed cloud era? What business advantages can it offer? What are the technical and operational challenges? Which ones are specific to AWS? Let’s get to it.

The Case for Centralized Log Management

Picture this scenario. A business is powered by a network of intertwined applications and services running in a data center. Several teams are responsible for the smooth operation of their segment of the IT landscape - e.g., product teams own microservices running on Kubernetes clusters, DBA teams maintain databases deployed on bare-metal servers, and platform teams operate the underlying infrastructure. Each software system generates logs with different formats, granularity, and target users.

Adopting a decentralized log management approach is - to put it bluntly - inefficient and frustrating. If logs are scattered across multiple systems and analyzed differently depending on their type, it really doesn’t matter how detailed or informative they are. When issues arise, teams squander hours piecing together a coherent picture of what’s going on, manually collecting and correlating logs across upstream and downstream sources. Meanwhile, you’re not selling your products, you’re enraging account holders by denying them access to your online banking services, your truck drivers can’t leave your warehouses, your GDPR fines are skyrocketing due to non-compliance, and so on. In a nutshell, you’re burning money, losing reputation, and undermining engineers’ morale and performance.

By centralizing log management and analytics, companies with a strong IT presence experience a multitude of benefits: shorter time to detect and respond to failures, operational efficiency, enhanced security, and simplified compliance with regulatory requirements. Let’s discuss these points in more detail.

Every Tool is a Silo

We recently supported a client developing an event-driven architecture on Kubernetes in a hybrid cloud. Applications depended on self-managed and SaaS services such as Apache Kafka and Google Spanner. Troubleshooting issues that crossed microservice boundaries was a pain in the neck. Application and platform logs resided in distinct Elasticsearch clusters, cloud service logs were available through the Cloud Logging service UI, and other infrastructure logs were restricted to the platform team via a separate tool. Sound familiar? You’re not alone.

In contrast, consolidating logs within a central analytics tool can dissolve silos between domains, functions, and teams. This means not only accelerating root cause analysis and incident response, but also detecting anomalies and potential issues before they become critical and impact user engagement and revenue generation. To make it more concrete, imagine that the UX team of a social media app has detected a significant increase in page load times and a consequent drop in user activity; with logs in a central place, they could identify a bottleneck in one of the backend services causing the delays and team up with the developers to optimize that service. Or, as another example, say a data analyst working for a marketplace has inspected the frontend search logs to identify the least-requested product categories; using the same centralized logging tool, they could then create dashboards to coordinate marketing and development efforts to boost conversions.

Ultimately, centralized logging can serve both as a collaboration tool across engineering teams and as an additional resource to align technical and business decision-makers.

Efficiency Up, Costs Down

Another benefit of integrating logs into a central solution is higher resource efficiency and lower costs. As an example, let’s focus on storage efficiency. Centralizing logs makes it easier to avoid duplication, filter noise, improve data compression, and spot “chatty” log sources that could be sampled or refactored. We are going to cover all of these features in the upcoming pieces of this blog series.

Features such as automated log retention policies can also free disk space from old data or move it to cheaper storage. Sure, one could achieve the same in distributed logging solutions too, but it would demand greater effort and planning. Automating log retention policies goes beyond storage savings, as it also contributes to upholding data compliance standards, our next topic.
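To make the retention idea concrete, here is a minimal sketch that assumes the central logs are archived in an Amazon S3 bucket. It only builds a lifecycle configuration document; the prefix `app-logs/`, the 90-day and 365-day thresholds, and the helper name `build_log_lifecycle` are hypothetical choices for illustration, not recommendations.

```python
import json


def build_log_lifecycle(prefix: str, archive_after_days: int, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration for objects under a log prefix.

    Objects move to a cheaper storage class after `archive_after_days`
    and are deleted after `expire_after_days`.
    """
    if not 0 < archive_after_days < expire_after_days:
        raise ValueError("archive_after_days must be positive and smaller than expire_after_days")
    return {
        "Rules": [
            {
                # Rule ID derived from the prefix, e.g. "app-logs/" -> "retain-app-logs"
                "ID": f"retain-{prefix.strip('/').replace('/', '-')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Hypothetical prefix and durations; align them with your own compliance rules.
config = build_log_lifecycle("app-logs/", archive_after_days=90, expire_after_days=365)
print(json.dumps(config, indent=2))
```

With boto3, a dictionary of this shape could be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call. Defining the rule once, in the central bucket, is what makes the policy cheap to maintain and easy to audit.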

Streamline Data Compliance

Our clients in regulated sectors have stringent data protection and governance requirements. Take a financial institution operating its workloads on AWS, for example. Bringing all logs from diverse services (e.g., Lambda, ECS, RDS) into a single location makes it easier to enforce strong encryption and strict access control consistently. This way, only authorized users can analyze logs that could convey sensitive financial transaction details. Central storage also streamlines the standardization of log retention, archival, and auditing processes - all crucial aspects of complying with regulations such as GDPR, which in turn reduces the risk and cost of fines.

Ironically, reducing security and compliance complexity is an often-overlooked benefit of centralized logging despite data protection and privacy regulations ranking among the top concerns of companies undertaking a cloud migration journey.

The Bumps in the Road

So far, we’ve covered the advantages of centralized logging compared to its decentralized counterpart. However, as with any solution, it comes with its own set of challenges and drawbacks. That’s where we’ll shift our attention in the remainder of this article.

Centralization Challenges

As we like to say, there are no silver bullets. Storing and analyzing all logs using a centralized tool has shortcomings too. We summarize a few of them below.

Reliability. When you hear “centralized,” you should think of a single point of failure. Downtime hinders monitoring of the entire workload, delays the detection of security incidents, and compromises compliance efforts due to data loss. While managed AWS services for centralized logging are designed for high availability, self-managed solutions need redundancy and failover, which can be expensive.

Data Volume. It may be obvious, but it is still worth noting the inherent challenge of centralizing logs when their volume grows by 35% every year; at that rate, it doubles roughly every 2.3 years. Sound capacity planning, dynamic scalability, and strategic retention policies can prevent logging from becoming a bottleneck.

Data Variety. Logs come in various shapes and sizes, and their variety keeps expanding with the constant release of new AWS services and the consolidation of cloud native and hybrid architectures. While techniques such as normalization, parsing, and aggregations can standardize these formats for correlation analysis, integrating them into the logging pipeline of your AWS workloads requires careful planning and additional infrastructure provisioning.

Security Risks. While centralized logging reduces the attack surface, it also magnifies the impact of a data breach. Inadequate access control and encryption could give attackers valuable information for further, more severe attacks.

Resistance to Change. Transitioning to centralized logging is more than just a technical shift; it may require organizational culture changes. Teams may resist departing from their familiar logging practices and tooling or be reluctant to share visibility over their logs, fearing a loss of autonomy or data protection issues. It is therefore crucial to involve teams actively and to emphasize the benefits of shared visibility and insights through log centralization, both for the teams and for the company.

Vendor Lock-in. When setting up centralized logging in AWS, you have several native services and third-party tools you can choose from. Centralizing everything could result in a situation where migrating away from the selected solution becomes prohibitively expensive if it no longer meets your organization’s performance or feature needs.
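To illustrate the normalization and parsing mentioned under Data Variety, the following sketch maps two differently shaped log lines onto one common schema so they can be sorted and correlated. Everything specific in it is an assumption made for the example: the JSON keys (`time`, `level`, `msg`), the access-log layout, the source names, and the target fields.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical access-log layout: <ip> [<timestamp>] "<request>" <status>
ACCESS_LINE = re.compile(
    r'(?P<ip>\S+) \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})'
)


def normalize_json_log(line: str, source: str) -> dict:
    """Map a JSON application log (assumed keys: time, level, msg) to the common schema."""
    record = json.loads(line)
    ts = datetime.fromisoformat(record["time"]).astimezone(timezone.utc)
    return {
        "timestamp": ts.isoformat(),
        "source": source,
        "level": record["level"].upper(),
        "message": record["msg"],
    }


def normalize_access_log(line: str, source: str) -> dict:
    """Map a web access log line to the common schema; 5xx responses become ERROR."""
    match = ACCESS_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized access log line: {line!r}")
    ts = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z").astimezone(timezone.utc)
    status = int(match["status"])
    return {
        "timestamp": ts.isoformat(),
        "source": source,
        "level": "ERROR" if status >= 500 else "INFO",
        "message": f'{match["request"]} -> {status}',
    }


# Two made-up lines from different systems, written in different time zones.
app_line = '{"time": "2023-09-12T10:15:00+02:00", "level": "warn", "msg": "payment retry"}'
web_line = '203.0.113.7 [12/Sep/2023:08:15:02 +0000] "GET /checkout HTTP/1.1" 502'

# Timestamps are all UTC ISO 8601 strings, so sorting them as text is chronological.
events = sorted(
    [
        normalize_access_log(web_line, "edge-proxy"),
        normalize_json_log(app_line, "checkout-service"),
    ],
    key=lambda event: event["timestamp"],
)
for event in events:
    print(event)
```

Once both records share timestamps in UTC and a common set of fields, the payment retry and the failed checkout request two seconds later line up on a single timeline, which is the kind of correlation that separate, format-specific tools make hard.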

None of these points represents an insurmountable issue - in fact, potential countermeasures have already been suggested in the brief descriptions above. In the following parts of this blog series, we will further discuss overcoming these challenges when adopting AWS cloud solutions like Amazon CloudWatch Logs and Amazon OpenSearch Service. However, to get there, we need to get our workloads on AWS first. Let us then present a couple more considerations for centralizing logs in AWS.

Every AWS Account is a Silo

We always recommend that our clients adopt a multi-account strategy right from the start of their AWS cloud journey. This strategy delineates clear security and billing boundaries between different teams and business operations, helping minimize the ripple effects of operational incidents or changes.

So now each team has its own AWS account and is funneling application and service logs to - say - Amazon CloudWatch Logs. Log centralization accomplished? Not entirely: each team’s view is still limited to its own workload. Oversimplifying, the best practice is to dedicate an account to logging and use it as a central repository for logs, potentially including those from on-premises data centers (see diagram below). This removes the information silos between organizational units and teams, with the benefits discussed earlier in this article.

Figure: A simplistic AWS multi-account setup for centralized logging.

In reality, of course, the situation is more complex than the diagram above suggests. Additional factors such as compliance (e.g., logs must be stored in the same region as their sources) and data transfer costs must be considered.
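As one possible illustration of the dedicated logging account, the sketch below builds the access policy that a CloudWatch Logs destination in the logging account would need so that workload accounts can subscribe their log groups to it. It is only one of several ways to centralize logs across accounts, and every identifier in it is hypothetical: the account IDs, the region, the destination name `central-logs`, and the helper function itself.

```python
import json
from typing import List


def destination_access_policy(region: str, logging_account: str,
                              destination_name: str, source_accounts: List[str]) -> str:
    """Return a JSON policy letting source accounts subscribe to a central log destination."""
    destination_arn = (
        f"arn:aws:logs:{region}:{logging_account}:destination:{destination_name}"
    )
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowWorkloadAccounts",
                "Effect": "Allow",
                # Account IDs of the teams that are allowed to send logs.
                "Principal": {"AWS": sorted(source_accounts)},
                "Action": "logs:PutSubscriptionFilter",
                "Resource": destination_arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical account IDs: 999999999999 is the logging account, the others are workloads.
policy = destination_access_policy(
    region="eu-central-1",
    logging_account="999999999999",
    destination_name="central-logs",
    source_accounts=["222222222222", "111111111111"],
)
print(policy)
```

With boto3, the logging account could attach such a document to its destination through the CloudWatch Logs client's `put_destination_policy` call, and each workload account could then point a log group at the destination with `put_subscription_filter`. Keeping the list of allowed accounts in one place also gives a simple audit trail of who ships logs to the central account.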

Coordinate Phased Migrations

Large-scale AWS migrations are easier to swallow when workloads are migrated piece by piece. Easier doesn’t mean easy: a great deal of planning, preparation, and coffee is involved. With multiple teams to coordinate and moving parts to monitor, a centralized logging solution is paramount to success, as it offers insights into the health and performance of each migrated workload.

However, establishing a uniform observability process across migration teams demands considerable effort, a well-structured approach, and discipline. Quite a demanding feat. Been there.

What’s Next: AWS Services for Centralized Logging

This article provided a broad introduction to our upcoming series on centralized log management in AWS. Our focus will shift to more technical discussions starting from the next article, where we’ll compare Amazon CloudWatch Logs and Amazon OpenSearch Service. If you are a regular reader of our blog, you may sense where this is going. But, well, it’s all about the journey, and we promise it will be an objective and insightful one.