Athena and CloudTrail: A Marriage Made in the Cloud
One of the first things that came to mind when AWS announced Amazon Athena at re:Invent 2016 was querying CloudTrail logs. Over the past month I had intended to set this up, but current needs dictated that I do it quickly. When I went looking at JSON imports for Hive/Presto, I was quite confused. Of course, as a trusty technologist, I went to Google. Much to my surprise, no one had published an article about using Athena to do this; I was only able to locate EMR-based posts that used a custom SerDe to support the nested CloudTrail format.
I had mild success at first, but thanks to some Athena gurus, I was able to get the magic piece in place.
I have to give credit to AWS for their help with a few issues and for their amazing documentation on the event types.
I have provided references at the end of each field section and the end of the post with specific and broader details for the event fields and their uses.
To set all of this up, you first must have your CloudTrail logs in a single S3 bucket. This works with a single account or with many. I purposely set up delivery to a single bucket and then created one table per source in Athena under a common database.
This is an example create table statement, which shows the table and field syntax I used for the tables below.
CloudTrail Record Query Columns
These are the columns you can reference in your queries; I have grouped them by purpose. This is not a full list of all CloudTrail fields, so if you need others, such as VpcEndpoint, you should add them to the schema.
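The dotted column names in the tables that follow mirror the nesting of a CloudTrail log file: a top-level `Records` array in which every element is one event record with nested objects such as `userIdentity`. As a minimal sketch of that layout, the Python below resolves a dotted column name against one record. The sample record and the `get_field` helper are hypothetical and exist only for illustration; they are not part of Athena or CloudTrail.

```python
import json

# Hypothetical, heavily trimmed CloudTrail log file content: one top-level
# "Records" array whose elements are the individual event records.
SAMPLE_LOG = json.loads("""
{"Records": [
  {"eventName": "DescribeLoadBalancers",
   "eventSource": "elasticloadbalancing.amazonaws.com",
   "sourceIPAddress": "198.51.100.7",
   "userIdentity": {"type": "IAMUser",
                    "accountId": "111111111111",
                    "arn": "arn:aws:iam::111111111111:user/alice"}}
]}
""")


def get_field(record, column):
    """Resolve a dotted column such as 'record.userIdentity.arn'.

    Returns None when any step of the path is missing from the record.
    """
    parts = column.split(".")
    if parts[0] == "record":  # the prefix names the record itself
        parts = parts[1:]
    value = record
    for part in parts:
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


first = SAMPLE_LOG["Records"][0]
print(get_field(first, "record.userIdentity.arn"))
print(get_field(first, "record.responseElements.credentials.expiration"))
```

The second lookup targets a field that this sample record does not carry, so the helper returns `None`. Many of the nested fields listed below are only present for certain event types, which is worth keeping in mind when filtering on them.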
Event ID Fields
|record.eventID||GUID generated by CloudTrail to uniquely identify each event|
|record.sharedEventID||GUID generated by CloudTrail to uniquely identify CloudTrail events from the same AWS action that is sent to different AWS accounts|
|record.eventName||The requested action, which is one of the actions in the API for that service. (example: DescribeLoadBalancers)|
|record.eventSource||The service that the request was made to (e.g. ec2.amazonaws.com)|
|record.eventTime||The date and time the request was made, in coordinated universal time (UTC)|
|record.eventType||Identifies the type of event that generated the event record: one of AwsApiCall, ConsoleSignin, or AwsServiceEvent (an event related to the trail itself; this can occur when another account makes a call with a resource that you own)|
|record.eventVersion||The version of the log event format|
|record.sourceIPAddress||The IP address that the request was made from. When the console is used, it will report console.amazonaws.com|
|record.requestId||The value that identifies the request, generated by the service being called|
|record.requestParameters||The parameters, if any, that were sent with the request|
|record.resources||An array of the resources accessed in the event, used most often by STS or KMS|
|record.resources.accountId||The account ID of the impacted element|
|record.responseElements.assumedRoleUser.arn||The ARN of the assumed role for the unique session|
|record.responseElements.assumedRoleUser.assumedRoleId||The ID of the assumed role for the unique session|
|record.responseElements.credentials.accessKeyId||The access key of the caller|
|record.responseElements.credentials.expiration||The expiration of the current session|
|record.responseElements.credentials.sessionToken||The active token for the session|
|record.userAgent||The agent through which the request was made|
|record.recipientAccountId||Represents the account ID that received this event. It may differ from the calling account if cross-account access occurred, and it will differ on the "remote" end|
|record.userIdentity.accountId||The account that owns the entity that granted permissions for the request|
|record.userIdentity.arn||The Amazon Resource Name (ARN) of the principal that made the call|
|record.userIdentity.invokedBy||The name of the AWS service that made the request, if any|
|record.userIdentity.principalId||A unique identifier for the entity that made the call. For requests made with temporary security credentials, this value includes the session name that is passed to the AssumeRole, AssumeRoleWithWebIdentity, or GetFederationToken API call|
|record.userIdentity.sessionContext.attributes.creationDate||The date and time when the temporary security credentials were issued|
|record.userIdentity.sessionContext.attributes.mfaAuthenticated||The value is true if the root user or IAM user whose credentials were used for the request also was authenticated with an MFA device; otherwise, false|
|record.userIdentity.sessionContext.sessionIssuer.accountId||The account that owns the entity that was used to get credentials|
|record.userIdentity.sessionContext.sessionIssuer.arn||The ARN of the entity (account, IAM user, or role) that was used to get credentials|
|record.userIdentity.sessionContext.sessionIssuer.type||The source of the temporary security credentials, such as Root, IAMUser, or Role|
|record.userIdentity.sessionContext.sessionIssuer.userName||The friendly name of the user or role that issued the session. The value that appears depends on the sessionIssuer identity type. See reference material for more information|
|record.userIdentity.type||The type of the identity, which is one of: Root, IAMUser, AssumedRole, FederatedUser, AWSAccount (cross-account access), AWSService (access performed by an AWS service such as Elastic Beanstalk)|
Find all event names by ARN and by source IP address, count them, and list the highest totals first
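To make the grouping logic concrete, here is a minimal Python sketch that applies the same idea locally to a handful of records, which can be handy for sanity-checking query results against a small sample. It is not the Athena query itself. The sample records and the `count_events` helper are hypothetical; the field names come from the column tables above.

```python
from collections import Counter

# Hypothetical sample records, trimmed to the fields this grouping needs.
RECORDS = [
    {"eventName": "DescribeInstances", "sourceIPAddress": "198.51.100.7",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/alice"}},
    {"eventName": "ListBuckets", "sourceIPAddress": "198.51.100.9",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/bob"}},
    {"eventName": "DescribeInstances", "sourceIPAddress": "198.51.100.7",
     "userIdentity": {"arn": "arn:aws:iam::111111111111:user/alice"}},
]


def count_events(records):
    """Count records per (eventName, ARN, source IP), highest totals first."""
    counts = Counter()
    for record in records:
        identity = record.get("userIdentity") or {}
        key = (record.get("eventName"),
               identity.get("arn"),
               record.get("sourceIPAddress"))
        counts[key] += 1
    # most_common() returns (key, count) pairs sorted by descending count.
    return counts.most_common()


for (event_name, arn, ip), total in count_events(RECORDS):
    print(total, event_name, arn, ip)
```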
Find all events where cross-account access occurred, group them by the source and the ARN, and count the totals
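A local Python sketch of this filter and grouping follows, again only to illustrate the logic rather than to reproduce the Athena query. It rests on two assumptions: that "cross-account" means `recipientAccountId` differs from `userIdentity.accountId`, and that "the source" means `eventSource`. The sample records and the `cross_account_totals` helper are hypothetical.

```python
from collections import Counter

# Hypothetical sample records: the first two are received by an account
# other than the caller's; the third stays within one account.
RECORDS = [
    {"eventSource": "sts.amazonaws.com", "recipientAccountId": "222222222222",
     "userIdentity": {"accountId": "111111111111",
                      "arn": "arn:aws:iam::111111111111:user/alice"}},
    {"eventSource": "sts.amazonaws.com", "recipientAccountId": "222222222222",
     "userIdentity": {"accountId": "111111111111",
                      "arn": "arn:aws:iam::111111111111:user/alice"}},
    {"eventSource": "ec2.amazonaws.com", "recipientAccountId": "222222222222",
     "userIdentity": {"accountId": "222222222222",
                      "arn": "arn:aws:iam::222222222222:user/carol"}},
]


def cross_account_totals(records):
    """Count cross-account records per (eventSource, ARN).

    Assumption: a record is cross-account when the caller's account ID and
    the recipient account ID are both present and differ.
    """
    counts = Counter()
    for record in records:
        identity = record.get("userIdentity") or {}
        caller = identity.get("accountId")
        recipient = record.get("recipientAccountId")
        if caller and recipient and caller != recipient:
            counts[(record.get("eventSource"), identity.get("arn"))] += 1
    return counts.most_common()


for (source, arn), total in cross_account_totals(RECORDS):
    print(total, source, arn)
```

Filtering on `userIdentity.type` equal to `AWSAccount` is another plausible reading of "cross-account"; swapping the condition in the `if` statement would express that variant.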
Document Reference: http://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference.html