Here is a screencast on submitting a word count MapReduce program written in Python. For some reason Windows Media Player is not able to play the file, but VLC is able to.
Here is the code for the mapper (1) and the reducer (1). Note that Hadoop provides Streaming (1) feature for writing MapReduce programs in non Java languages.
As observed in the screencast the Virtual Machine (VM) has all the necessary pieces to get easily started with Big Data and is provided as part of the Big Data training. The VM is updated regularly to add new frameworks and update the existing ones.
Hadoop tries to hide the underlying infrastructure details, so the MapReduce code and the command to submit a job is all the same for a single node and a thousand node cluster.

No comments:
Post a Comment