Integrating LVM with Hadoop and Providing Elasticity to DataNode Storage
In this article we are going to learn about LVM on Hadoop, so let's start with the very basics: what is Hadoop?
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.
Now we have to integrate LVM with Hadoop, so let's get to know LVM:
LVM (Logical Volume Manager) allows for very flexible disk space management. It lets you collect multiple physical hard drives and partitions into a single volume group, which can then be divided into logical volumes, and it can add disk space to a logical volume and its filesystem while that filesystem is mounted and active.
You can set up LVM using these commands (a consolidated sketch follows the list):
- pvcreate — This command converts an attached disk (or partition) into a physical volume.
- vgcreate — This command creates a volume group from the specified physical volumes.
- vgdisplay — This command displays all the available volume groups.
- lvcreate — This command creates a logical volume from a volume group.
- lvdisplay — This command displays all the logical volumes created from the volume groups.
- lvextend — This command extends a logical volume using free space available in its volume group.
- resize2fs — This command grows the filesystem into the newly extended space without reformatting it.
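Putting these together, the end-to-end flow looks like this (a minimal sketch; the disk names /dev/xvdb and /dev/xvdc and the names myvg/mylv are placeholders you'd replace with your own):
pvcreate /dev/xvdb /dev/xvdc        # turn the raw disks into physical volumes
vgcreate myvg /dev/xvdb /dev/xvdc   # pool them into one volume group
lvcreate --size 2G -n mylv myvg     # carve out a logical volume
mkfs.ext4 /dev/myvg/mylv            # format it (creates the inode table)
mount /dev/myvg/mylv /data          # mount it where the datanode stores blocks
lvextend --size +2G /dev/myvg/mylv  # later: grow it on the fly
resize2fs /dev/myvg/mylv            # grow the filesystem to match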
Now let's get to the task and walk through the concept in a detailed and easy way —
Step 1 — Attach a hard disk to the datanode; this is where LVM comes into play.
In a Hadoop cluster we have one master node (namenode) and one or more datanodes making up the entire cluster, so we have to attach a drive to the datanode and use LVM to increase its storage on the fly (like the pay-as-you-go model in AWS).
I am performing this task on AWS, so I have to launch an EC2 instance and attach EBS storage to it. You can do the same through the web UI or use the command line.
The following commands are used to do this:
1 - Go to EBS storage and create two volumes, or run the command below (once per volume):
aws ec2 create-volume --availability-zone ap-south-1a --size 4 --volume-type gp2
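If you are scripting this, the AWS CLI's standard --query and --output options let you capture the new volume's ID directly (the variable name here is just an example):
VOL_ID=$(aws ec2 create-volume --availability-zone ap-south-1a --size 4 --volume-type gp2 --query 'VolumeId' --output text)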
2 - Now attach the volume to the datanode instance using the following command:
aws ec2 attach-volume --instance-id <your-instance-id> --volume-id <volume-id> --device /dev/sdb
If the command succeeds, the output will show the volume's attachment state; if not, kindly restart the process.
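One way to double-check the attachment from the CLI (assuming you noted the volume ID from the previous step) is describe-volumes, whose Attachments section should report the state as attached:
aws ec2 describe-volumes --volume-ids <volume-id>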
So far we have attached two hard disks to the datanode. To list the available hard disks on the system, run the fdisk -l command; the output should show the new devices (here /dev/xvdb and /dev/xvdc).
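If you prefer a compact tree view of the disks and their partitions, lsblk works too:
lsblk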
Now we have to convert the drives into physical volumes using the given command:
pvcreate /dev/xvdb /dev/xvdc
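To confirm the physical volumes were created, you can run pvdisplay (or its compact cousin pvs), in the same spirit as vgdisplay and lvdisplay below:
pvdisplay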
Now we have to create a volume group from the physical volumes; the following command will help you do this:
vgcreate vgname /dev/xvdb /dev/xvdc
You can choose any name you like for vgname.
The vgdisplay command is used to check whether the VG was created or not:
Now we have to create a logical volume from the volume group; the following command is used to do this:
lvcreate --size 2G -n lvname vgname
The lvdisplay command is used to verify that the logical volume was created.
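As a side note, if you would rather hand the LV all the remaining space in the volume group instead of a fixed size, lvcreate also accepts extent-based sizing:
lvcreate -l 100%FREE -n lvname vgname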
It is very important to format any volume before using it, because formatting creates the inode table.
So we have to format the volume; the following command is used:
mkfs.ext4 /dev/mapper/vgname-lvname
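One caveat worth knowing: the resize2fs command we use later only works for ext2/ext3/ext4 filesystems. If you formatted the LV with XFS instead (optional, and not what this article does), the growing step would use xfs_growfs against the mount point:
mkfs.xfs /dev/mapper/vgname-lvname
xfs_growfs /data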
Now we have to mount the Logical Volume to the Hadoop DataNode directory:
To mount the volume to the datanode directory, use the command below (create the mount point first with mkdir /data if it doesn't exist):
mount /dev/mapper/vgname-lvname /data
The complete volume will be mounted onto the /data directory. To verify whether it mounted successfully, run the df -hT command.
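For the datanode to actually use this mount, its data directory must point at /data in hdfs-site.xml. A minimal sketch, assuming Hadoop 2.x or later (older 1.x releases use the dfs.data.dir property instead):
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data</value>
  </property>
</configuration>
If you also want the mount to survive reboots, add a matching entry for the LV to /etc/fstab.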
Now we have to provide elasticity to the Hadoop DataNode using LVM “on the fly”.
So far the datanode's LV has a limit of 2 GB, which it can fill at any time. Since we implemented LVM, we can easily extend the LV partition on the fly using just two commands (lvextend and resize2fs); for example, extending by 4 GB leaves us with one LV of 6 GB connected to the Hadoop DataNode directory.
lvextend --size +4G /dev/mapper/vgname-lvname
You can repeat the extension any time more space is needed, for example:
lvextend --size +1G /dev/mapper/vgname-lvname
Now we have to grow the filesystem over the extended space, without reformatting; the given command is helpful for this:
resize2fs /dev/vgname/lvname
This command grows the filesystem into the newly unallocated space, extending its metadata (including inode tables) without recreating the filesystem or losing data.
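As a shortcut, lvextend can run the filesystem resize for you in the same step via its -r (--resizefs) flag, so the two commands collapse into one:
lvextend -r --size +1G /dev/mapper/vgname-lvname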
You can see the volume is extended to the given size! Run df -hT again to confirm.
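To confirm that Hadoop itself sees the extra capacity, you can also check the datanode's configured capacity in the cluster report (on old Hadoop 1.x clusters the equivalent is hadoop dfsadmin -report):
hdfs dfsadmin -report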
TASK_COMPLETED
thank you :)
To know more about LVM or Hadoop, or for help with any errors, ping me on LinkedIn.