What Is Apache Hama?

Written by Caitlin Davidson

Apache Hama Defined

Apache Hama is a distributed computing framework for big data analytics. It is based on the Bulk Synchronous Parallel (BSP) computing model and targets scientific computations such as matrix, graph and network algorithms.

Apache Hama consists of three major components: BSPMaster, GroomServers and ZooKeeper.

  • BSPMaster is responsible for:
    •       Maintaining groom server status
    •       Controlling super steps in a cluster
    •       Maintaining job progress information
    •       Scheduling jobs and assigning tasks to groom servers
    •       Disseminating execution class across groom servers
    •       Handling faults
    •       Providing users with the cluster control interface
  • A groom server (groom for short) is a process that performs the BSP tasks assigned by BSPMaster.
  • ZooKeeper is used to manage efficient barrier synchronisation of the BSPPeers.
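
The superstep cycle described above (local computation, message passing, then a barrier) can be illustrated without Hama itself. The following is a minimal Python sketch, not Hama code: threads stand in for BSP peers, `threading.Barrier` stands in for the ZooKeeper-managed barrier, and the names `run_bsp_sum`, `inboxes` and `peer` are hypothetical, chosen only for this illustration.

```python
import threading


def run_bsp_sum(chunks):
    """Simulate two BSP supersteps: each peer sums its chunk, broadcasts
    the partial sum, waits at a barrier, then adds up what it received."""
    n = len(chunks)
    barrier = threading.Barrier(n)        # stand-in for the ZooKeeper barrier
    inboxes = [[] for _ in range(n)]      # one message queue per peer
    locks = [threading.Lock() for _ in range(n)]
    results = [None] * n

    def peer(rank):
        # Superstep 1: local computation, then message passing.
        partial = sum(chunks[rank])
        for dest in range(n):
            with locks[dest]:
                inboxes[dest].append(partial)
        barrier.wait()                    # no peer continues until all have sent
        # Superstep 2: every message sent before the barrier is now visible.
        results[rank] = sum(inboxes[rank])

    threads = [threading.Thread(target=peer, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Every peer ends up holding the global sum.
    print(run_bsp_sum([[1, 2, 3], [4, 5], [6]]))
```

Because no peer reads its inbox until all peers have passed the barrier, the result does not depend on thread scheduling. This is the property that barrier synchronisation is meant to provide in a real cluster.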

Advantages of Apache Hama include:

  • It provides BSP primitives rather than graph processing APIs, enabling programmers to operate at a lower level.
  • It uses the BSP model to avoid conflicts and deadlocks during communication.
  • It provides explicit support for message passing.
  • It is primarily Java-based but also supports C++ programs.
  • It is open-source software: the source code is freely available, so users can modify it to suit their needs.
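
To make the first point concrete, here is a rough sketch of how a graph algorithm can be built from nothing more than a send primitive and a sync primitive. It is a sequential Python simulation under stated assumptions, not Hama's API: `bsp_max_value`, `send` and `sync` are hypothetical names, and the graph is assumed to be undirected with every edge endpoint present in `values`. Each vertex repeatedly tells its neighbours the largest value it has seen until no messages remain.

```python
from collections import defaultdict


def bsp_max_value(values, edges):
    """Propagate the maximum value through an undirected graph using only
    two primitives: send(dest, msg) and a sync step between supersteps."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    current = dict(values)
    outbox = defaultdict(list)            # messages queued during a superstep

    def send(dest, msg):
        outbox[dest].append(msg)

    def sync():
        """Barrier: hand over all queued messages and start a new superstep."""
        delivered = dict(outbox)
        outbox.clear()
        return delivered

    # Superstep 0: every vertex announces its value to its neighbours.
    for v in current:
        for n in neighbours[v]:
            send(n, current[v])
    supersteps = 1
    inbox = sync()

    # Later supersteps: a vertex sends again only if its value increased.
    while inbox:
        for v, msgs in inbox.items():
            best = max(msgs)
            if best > current[v]:
                current[v] = best
                for n in neighbours[v]:
                    send(n, best)
        supersteps += 1
        inbox = sync()
    return current, supersteps


if __name__ == "__main__":
    result, steps = bsp_max_value(
        {"a": 3, "b": 6, "c": 2, "d": 1},
        [("a", "b"), ("b", "c"), ("c", "d")],
    )
    print(result, steps)
```

The algorithm halts when a superstep produces no messages. Working at this level means the programmer decides what is sent and when, which is the trade-off of having primitives instead of a ready-made graph API.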

In Data Defined, we help make the complex world of data more accessible by explaining some of the most complex aspects of the field.