Reading an Excel File in an S3 Bucket using TypeScript

Reading an Excel File in an S3 Bucket using TypeScript

Clock Icon2023.09.28 06:35

Hi, this is Charu from Games Solution Department, Classmethod. In this blog, we will be reading an excel file in an S3 bucket using Typescript and put the data in the dynamoDB table. We'll use the AWS SDK to interact with Amazon S3, and the xlsx library to parse the Excel file.

Let's get started:

To get started, you can upload an excel file in your S3 bucket and note down the object key.

Let's start by setting up a new TypeScript project and installing the necessary dependencies.

1. Create a new directory for your project and navigate to it in your terminal.

2. Initialise a new Node.js project with TypeScript support:

npm init -y

npm install typescript aws-sdk xlsx

Now, let's write the TypeScript code to read an Excel file from an S3 bucket.

Import the Libraries and complete the configuration

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";
import * as XLSX from 'xlsx';

// Replace with your S3 bucket and file details
const bucketName = 'your-s3-bucket-name';
const fileName = 'your-excel-file.xlsx';

// AWS S3 client setup
const s3 = new S3Client({
  region: "your-region",
});

// AWS DynamoDB client setup
const dynamodb = new DynamoDBClient({
  region: "your-region",
});

Reading the Data as Chunks

// Function to convert a stream to a buffer
function streamToBuffer(stream: any): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on('data', (chunk: Buffer) => chunks.push(chunk));
    stream.on('error', reject);
    stream.on('end', () => resolve(Buffer.concat(chunks)));
  });
}

This function takes one argument i.e. stream. A stream is a source of data that you can read from, such as a file stream or an HTTP response stream. We are converting the stream to Buffer. The function returns a promise that resolves to a Buffer. Promises are the way to handle asynchronous operations in JavaScript.

const chunks: Buffer[] = []:

Next, we have initialised an empty array called chunks to store the chunks of data read from the stream. A Buffer is a built-in Node.js class for working with binary data.

stream.on('data', (chunk: Buffer) => chunks.push(chunk)):

When data is available in the stream, the provided callback function is executed. In this case, we push each chunk of data onto the chunks array.

stream.on('error', reject):

If an error occurs while reading the stream, we reject the promise with the error. This allows the caller to handle and propagate the error appropriately.

stream.on('end', () => resolve(Buffer.concat(chunks))):

When the stream ends, meaning all data has been read, the provided callback function is executed. At this point, we resolve the promise by concatenating all the chunks into a single Buffer using Buffer.concat(chunks).

Here goes the rest of the code,

async function main() {
  try {
    // Define S3 object parameters
    const params = {
      Bucket: bucketName,
      Key: fileName,
    };

    // Retrieve the Excel file from S3
    const s3Response = await s3.send(new GetObjectCommand(params));

    // Parse the Excel file from the S3 response
    const workBook = XLSX.read(await streamToBuffer(s3Response.Body), {
      type: 'buffer',
    });

    // Get the first sheet in the Excel file and convert Excel sheet data to JSON
    const sheet = workBook.SheetNames[0];
    const workSheet = workBook.Sheets[sheet];
    const rows = XLSX.utils.sheet_to_json(workSheet);

    // Replace with your DynamoDB table name
    const TABLE_NAME = 'your-dynamodb-table-name';

    // Insert data into DynamoDB
    for (let row of rows) {
      try {
        await dynamodb.send(
          new PutItemCommand({
            TableName: TABLE_NAME,
            Item: {
              userId: { S: row.userId},
              fullName: { S: (row.firstName + ' ' + row.lastName) },
              Department: { S: row.Department },
              workLocation: { S: row.workLocation},
              email: { S: row.email},
            },
          })
        );
        console.log('Data inserted for user:', row.userId);
      } catch (error) {
        console.log('Error inserting data', error);
      }
    }

    console.log('Excel data imported successfully.');
  } catch (error) {
    console.log('S3 Error fetching data', error);
  }
}

// Execute the main function
main();

const sheet = workBook.SheetNames[0];

const workSheet = workBook.Sheets[sheet];

const rows = XLSX.utils.sheet_to_json(workSheet);

These lines will take the first sheet of the excel file and read that sheet only. The sheet_to_json function provided by xlsx library accepts a worksheet object as a parameter and returns an array of JSON objects.

Conclusion:

That's it! You've successfully created a TypeScript application to read an Excel file from an S3 bucket and insert its data into DynamoDB. You can now adapt and extend this code to fit your specific use case.

Thank you for reading!

Happy Learning:)

この記事をシェアする

facebook logohatena logotwitter logo

© Classmethod, Inc. All rights reserved.