How To Handle Big Data With Node.js, AWS Kinesis, AWS S3, and AWS Athena
Build a Node.js application that manages and stores data in AWS S3

Imagine we are building an application that receives a constant stream of data. The data arrives in large volumes, needs to be stored, never changes once written, and needs to be queried later. Examples include web analytics, IoT signals, and a voting application. AWS provides a number of tools for this kind of use case.
In this tutorial, we will build a simple application that receives the data from a Node.js server, stores it in AWS S3, and enables SQL queries with AWS Athena. Our data pipeline looks something like this:

Step 1. Create an AWS S3 Bucket
In your AWS Management Console, head over to Amazon S3, and click on Create Bucket. Enter a bucket name and proceed to create this bucket. We will use this bucket later in the process.
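The console rejects bucket names that break the S3 naming rules, so it can help to check a candidate name first. The snippet below is a minimal sketch covering only a subset of the rules for general-purpose buckets (length, allowed characters, no consecutive dots, not shaped like an IP address). The helper name `isValidBucketName` and the sample names are made up for illustration and are not part of any AWS SDK.

```javascript
// Hypothetical helper: checks a subset of the S3 bucket naming rules.
// It does not cover reserved prefixes/suffixes or global uniqueness.
function isValidBucketName(name) {
  if (typeof name !== 'string') return false;

  // 3-63 characters; lowercase letters, digits, dots, and hyphens only;
  // must start and end with a letter or digit.
  if (!/^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/.test(name)) return false;

  // Two adjacent dots are not allowed.
  if (name.includes('..')) return false;

  // Names formatted like an IPv4 address are not allowed.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(name)) return false;

  return true;
}

// Example bucket names (illustrative only).
console.log(isValidBucketName('voting-app-data'));
console.log(isValidBucketName('Voting_App'));
```

Even a name that passes this check can still be rejected if another AWS account already owns it, since bucket names are globally unique.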

Step 2. Create an AWS Kinesis Data Stream
In your AWS Management Console, head over to Amazon Kinesis, and go to the Data Streams tab. Then click on “Create data stream.”

Next, enter a name for your stream. We are going with voting-app. You can choose any name here; it will be used in our Node.js application later. Under “Data stream capacity,” enter 1 for “Number of open shards.” The same page has a Shard estimator in case you would like to calculate how many shards you need. You can always edit this number later, so we are going with 1 for now.
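For a rough idea of what the shard count means, here is a back-of-the-envelope sketch based on the published per-shard limits for provisioned streams: about 1 MB/s or 1,000 records/s of writes, and about 2 MB/s of reads per shard. This is not the console's Shard estimator, and the function name `estimateShards` and its parameters are assumptions made for this example.

```javascript
// Hypothetical helper: estimates shards from per-shard limits
// (1 MB/s or 1,000 records/s in, 2 MB/s out). Not the console's estimator.
function estimateShards({ recordsPerSecond, avgRecordSizeKB, consumers = 1 }) {
  // Incoming write bandwidth in MB/s.
  const writeMBps = (recordsPerSecond * avgRecordSizeKB) / 1024;

  const byWriteBytes = writeMBps / 1;              // 1 MB/s write limit per shard
  const byWriteRecords = recordsPerSecond / 1000;  // 1,000 records/s limit per shard
  const byRead = (writeMBps * consumers) / 2;      // 2 MB/s read limit per shard

  // Take the tightest constraint, round up, and never go below one shard.
  return Math.max(1, Math.ceil(Math.max(byWriteBytes, byWriteRecords, byRead)));
}

// A small voting app: 100 votes per second at roughly 1 KB each.
console.log(estimateShards({ recordsPerSecond: 100, avgRecordSizeKB: 1 })); // 1
```

With a load like the one in the example, a single shard is enough, which matches the choice of 1 above.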