Saturday, May 28, 2016

Node JS streams

Streams in NodeJS

Streams in NodeJS are not as complicated as people perceive, although they can go from simple to complex in the blink of an eye.

Streams let you express a complicated task in a couple of lines of code, with the complexity abstracted away in a module, so to say. This promotes simplicity and reusability, and in the process gets more value out of the code. If you compare code written with and without streams, you will often find that 2 or 3 lines of stream code replace some 30 to 40 lines, which matters especially for people who build APIs and SDKs.

Let's start small, with a modest activity: copying a file from one location/directory to another. You might ask why such a mundane activity.

var fs = require('fs');
fs.createReadStream(sourceFile).pipe(fs.createWriteStream(targetFile));

Where is the validation of the source file, the destination path and so on? One step at a time. This is a simple file copy operation. Note that wrapping this line in a try-catch block is not enough: stream errors, such as a missing source file, are emitted asynchronously as 'error' events, so attach an 'error' handler to each stream to deal with them. From here you can let your imagination go wild and build on this solution to copy directories, drives or more. Our next step is something more useful from a day-to-day perspective.
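Here is a minimal sketch of the same copy with error reporting, assuming a Node version that ships stream.pipeline (10 or later). The helper name copyFile is made up for illustration and is not part of any library.

```javascript
var fs = require('fs');
var stream = require('stream');

// copyFile is a hypothetical helper name used only for this sketch.
// It copies sourceFile to targetFile and reports the outcome through done(err).
function copyFile(sourceFile, targetFile, done) {
    stream.pipeline(
        fs.createReadStream(sourceFile),
        fs.createWriteStream(targetFile),
        done // receives an error if either stream fails, otherwise no error
    );
}
```

Unlike a bare pipe(), pipeline forwards an error from any stage to the single callback and cleans up the other streams.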

More than copying a file

Downloading a file from a remote location. Say you want to download a piece of software from a remote location over ftp/http/https. Automating such tasks makes life easier for us. For example, say you need to download a huge file from a remote location.

var fs = require('fs');
var request = require('request');
var path = require('path');
var REMOTE_LINK = "https://download.sublimetext.com/sublime_text_3_build_3103_x64.tar.bz2";
var LOCAL_DOWNLOAD_PATH = path.join(process.env.HOME, "Downloads");
var downloadFile = path.join(LOCAL_DOWNLOAD_PATH, path.basename(REMOTE_LINK));

request(REMOTE_LINK).pipe(fs.createWriteStream(downloadFile));

We can build variations on this. You can attach event handlers that tell you when streaming is done, as in the following, and more.

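request
 .get(REMOTE_LINK)
 .on('response', function(response) {
  // The response has no length property; the size comes from the
  // Content-Length header, which may be absent (e.g. chunked responses)
  console.log(response.headers['content-length']);
 })
 .on('error', function(err) {
  console.log('ERROR: Failed to download file: ' + err);
 })
 .on('end', function() {
  console.log('Completed file download successfully');
 })
 .pipe(fs.createWriteStream(downloadFile));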

The code above is not only methodical but also serves as documentation, with a clear log at each step. Even a novice developer will be able to follow it with ease.
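The request package has since been deprecated by its maintainers, so here is a rough sketch of the same download using only Node's core http/https modules. The helper name downloadTo is hypothetical, and the sketch assumes the server answers with a plain 200 response (it does not follow redirects).

```javascript
var fs = require('fs');
var http = require('http');
var https = require('https');
var stream = require('stream');

// downloadTo is a hypothetical helper name used only for this sketch.
// It streams the body at url into targetFile and calls done(err) when finished.
function downloadTo(url, targetFile, done) {
    var client = url.indexOf('https:') === 0 ? https : http;
    client.get(url, function (response) {
        if (response.statusCode !== 200) {
            response.resume(); // discard the body so the socket is released
            return done(new Error('Unexpected status code: ' + response.statusCode));
        }
        // The response is a readable stream, so it can be piped like a file
        stream.pipeline(response, fs.createWriteStream(targetFile), done);
    }).on('error', done); // connection-level failures (DNS, refused, ...)
}
```

It would be called as downloadTo(REMOTE_LINK, downloadFile, callback), with the callback logging success or failure.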

What we downloaded was a bz2 file. What if we need to download and unzip it in one shot?
That will be an exercise for you. One hint I can share: look on npm for packages that handle 'untar', 'gunzip' and so on.

Let's look at a related example: how to unzip a .tar.gz file to get the .tar. Here we go.

var fs = require('fs');
var zlib = require('zlib');

fs.createReadStream(sourceFile)
 .pipe(zlib.createGunzip())
 .pipe(fs.createWriteStream(unzippedFile));
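A chain like this has three places where things can go wrong: reading, decompressing and writing. The sketch below, assuming Node 10 or later for stream.pipeline, funnels all three into one callback. The helper name gunzipFile is hypothetical.

```javascript
var fs = require('fs');
var zlib = require('zlib');
var stream = require('stream');

// gunzipFile is a hypothetical helper name used only for this sketch.
function gunzipFile(sourceFile, unzippedFile, done) {
    stream.pipeline(
        fs.createReadStream(sourceFile),
        zlib.createGunzip(),                // emits an error on input that is not gzip
        fs.createWriteStream(unzippedFile),
        done                                // called once, with an error if any stage failed
    );
}
```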

Alright, we unzipped a file, but we still just have a .tar file. How about extracting the tar after we unzip it?

var tar = require('tar-fs');
var fs = require('fs');

if(process.argv.length < 4) {
    console.log('ERROR: Please pass the TAR file name followed by the location to extract it to');
    process.exit(1);
}

const SOURCE_FILE = process.argv[2];
var DEST_PATH = process.argv[3];

// Stream errors arrive as 'error' events, so a try-catch would not see them
function onError(e) {
    console.log(e);
    process.exit(1);
}

var untarFile = function(sourceFile, targetPath) {
    fs.createReadStream(sourceFile)
        .on('error', onError)
        .pipe(tar.extract(targetPath))
        .on('error', onError);
};

untarFile(SOURCE_FILE, DEST_PATH);

Transformation solutions

So far we have seen how to write NodeJS stream code to download, unzip or untar a file. How about a slightly more serious activity, relevant to translating/transforming data? Say we get a stream of data and wish to convert it to upper case and pass it along for further processing. Although this does not look like a day-to-day business problem, it helps to showcase the potential of NodeJS streams. I'd consider this a seed solution for your business problems.

CapitalizingStream.js

var inherits = require('util').inherits;
var Transform = require('stream').Transform;

module.exports = CapitalizingTransformStream;

function CapitalizingTransformStream(options) {
    Transform.call(this, options);
}

inherits(CapitalizingTransformStream, Transform);

function _transform(chunk, encoding, callback) {
    // Chunks arrive as Buffers (encoding is 'buffer') unless decodeStrings is off
    if(encoding === 'buffer') {
        chunk = chunk.toString();
    }
    callback(null, chunk.toUpperCase());
}

CapitalizingTransformStream.prototype._transform = _transform;
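On current Node versions the same stream can be written more compactly by passing a transform function to the Transform constructor. The sketch below is an alternative to the module above, not a replacement for it. The factory name createCapitalizingStream is made up for illustration.

```javascript
var Transform = require('stream').Transform;

// createCapitalizingStream is a hypothetical factory name for this sketch.
function createCapitalizingStream() {
    return new Transform({
        transform: function (chunk, encoding, callback) {
            // chunk is a Buffer by default; convert it before upper-casing.
            // Caveat: a multi-byte character split across two chunks
            // would not be decoded correctly by this simple approach.
            callback(null, chunk.toString().toUpperCase());
        }
    });
}
```

A quick way to try it is process.stdin.pipe(createCapitalizingStream()).pipe(process.stdout), which echoes whatever you type in upper case.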

ConvertCase.js

var net = require('net');
var CapitalizingStream = require('./CapitalizingStream');

function handleConnection(conn) {
    var remoteAddress = conn.remoteAddress + ':' + conn.remotePort;
    console.log('New client connection from %s', remoteAddress);
   
    // handle connection to service
    var service = new CapitalizingStream();
    service.once('error', onServiceError);

    conn.once('close', onConnectionClose);
    conn.on('error', onConnectionError);

    conn.pipe(service).pipe(conn);

    function onServiceError(err) {
        console.log('ERROR: %s', err.message);
    }

    function onConnectionClose() {
        console.log('Connection closed on :%s', remoteAddress);
    }

    function onConnectionError(err) {
        console.log('Connection %s error %s', remoteAddress, err.message);
    }
}

var server = net.createServer();
server.on('connection', handleConnection);
server.listen(9000, function() {
    console.log('Server listening to %j', server.address());
});

Though the above solution may not have much industrial value, here is something you can try as an exercise to check your stream skills:
  • Convert CSV to HTML. Pass the CSV contents from a remote location
  • Accept the content and convert it to HTML
  • Send it back
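
As a possible starting point for the exercise, here is a rough sketch of a CSV-to-HTML transform stream. It assumes simple CSV: fields separated by plain commas, with no quoted commas or embedded newlines. All function names in it are hypothetical.

```javascript
var Transform = require('stream').Transform;
var StringDecoder = require('string_decoder').StringDecoder;

// Escapes the characters that are special in HTML text.
function escapeHtml(text) {
    return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Converts one CSV line to a table row (assumes no quoted fields).
function toRow(line) {
    var cells = line.split(',').map(function (cell) {
        return '<td>' + escapeHtml(cell.trim()) + '</td>';
    });
    return '<tr>' + cells.join('') + '</tr>\n';
}

// createCsvToHtmlStream is a hypothetical factory name for this sketch.
function createCsvToHtmlStream() {
    var decoder = new StringDecoder('utf8'); // keeps split multi-byte characters intact
    var pending = '';                        // partial line carried over between chunks
    var started = false;

    return new Transform({
        transform: function (chunk, encoding, callback) {
            if (!started) { this.push('<table>\n'); started = true; }
            var lines = (pending + decoder.write(chunk)).split(/\r?\n/);
            pending = lines.pop(); // the last piece may be an incomplete line
            for (var i = 0; i < lines.length; i++) {
                if (lines[i] !== '') { this.push(toRow(lines[i])); }
            }
            callback();
        },
        flush: function (callback) {
            if (!started) { this.push('<table>\n'); }
            pending += decoder.end();
            if (pending !== '') { this.push(toRow(pending)); }
            this.push('</table>\n');
            callback();
        }
    });
}
```

In ConvertCase.js it could take the place of the capitalizing stream, as in conn.pipe(createCsvToHtmlStream()).pipe(conn). The closing </table> is only sent once the client ends its side of the connection.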

More automation with NodeJS

  • Downloading files in clusters for setup, a sort of container for downloads. Such a solution helps people set up an OS, their dev environment etc. Something like Docker, on a much, much smaller scale
  • Download your favorites/bookmarks from YouTube. Persist the links in a store, periodically download them and convert mp3 to, say, mp4 or other formats
  • Reverse proxy - Improve application performance through a reverse proxy. Build gateway engines.
  • Build an easy cache - An addition to the above approach.
