How to copy files between two CDH clusters

Sometimes we need to copy content from one CDH5 cluster to another CDH5 cluster. We can use distcp to do that.


hadoop distcp -m 10 -prbugpcaxt hdfs://ACTIVE_NAME_NODE_OF_CLUSTER_A:8020/FILE_PATH hdfs://ACTIVE_NAME_NODE_OF_CLUSTER_B:8020/FILE_PATH

-m sets the maximum number of simultaneous copies, that is, the number of map tasks used for the copy.

-p preserves file attributes on the copied files. Each letter after -p selects one attribute: r: replication number, b: block size, u: user, g: group, p: permission, c: checksum type, a: ACL, x: XAttr, t: timestamp.
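To make the letter codes easier to read, here is a small sketch that expands a -p flag string into attribute names. The function name describe_preserve is hypothetical and is not part of Hadoop. It only restates the mapping listed above and does not call distcp.

```shell
#!/bin/sh
# Expand a distcp -p flag string (for example "bugp") into attribute names.
# describe_preserve is a hypothetical helper, not a Hadoop command.
describe_preserve() {
  flags="$1"
  out=""
  while [ -n "$flags" ]; do
    rest="${flags#?}"           # everything after the first character
    letter="${flags%"$rest"}"   # the first character itself
    case "$letter" in
      r) name="replication" ;;
      b) name="block-size" ;;
      u) name="user" ;;
      g) name="group" ;;
      p) name="permission" ;;
      c) name="checksum-type" ;;
      a) name="acl" ;;
      x) name="xattr" ;;
      t) name="timestamp" ;;
      *) name="unknown($letter)" ;;
    esac
    out="${out:+$out,}$name"    # join names with commas
    flags="$rest"
  done
  printf '%s\n' "$out"
}

describe_preserve bugp
```

Running it with bugp prints block-size,user,group,permission, which is the set of attributes that a -pbugp copy keeps.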


For example, to run with up to 100 maps and preserve only block size, user, group, and permission:

hadoop distcp -m 100 -pbugp hdfs:// hdfs://
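When the same copy is run for several paths, it can help to assemble the command from variables and review it before running it. The sketch below only prints the command. The function name build_distcp_cmd, the host names, and the path are made-up placeholders, and port 8020 is taken from the first command in this post.

```shell
#!/bin/sh
# Print (rather than run) a distcp command for two clusters.
# build_distcp_cmd is a hypothetical helper; hosts and path are placeholders.
build_distcp_cmd() {
  src_nn="$1"     # active NameNode host of the source cluster
  dst_nn="$2"     # active NameNode host of the destination cluster
  path="$3"       # absolute HDFS path, assumed identical on both sides
  maps="$4"       # value for -m
  preserve="$5"   # letters for -p
  printf 'hadoop distcp -m %s -p%s hdfs://%s:8020%s hdfs://%s:8020%s\n' \
    "$maps" "$preserve" "$src_nn" "$path" "$dst_nn" "$path"
}

build_distcp_cmd nn-a.example.com nn-b.example.com /data/logs 100 bugp
```

Printing first makes it easy to confirm that both URIs point at the intended NameNodes before any data is moved.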

If you hit a memory error like the one below during a map or reduce operation, pass the memory arguments (-Dmapreduce.map.memory.mb and -Dmapreduce.reduce.memory.mb) to raise the container limits.

Container is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 6.1 GB of 2.1 GB virtual memory used. Killing container.

hadoop distcp -Dmapreduce.map.memory.mb=2000 -Dmapreduce.reduce.memory.mb=2000 -m 100 -pbugp hdfs:// hdfs://
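As a rough sketch of how a wrapper script might react to this error, the snippet below checks a log line for the kill message and suggests a larger memory setting for the next attempt. The function names, the 1024 MB starting value, and the doubling rule are all assumptions made for illustration. mapreduce.map.memory.mb is the map-side counterpart of the reduce property and is included on the assumption that map containers are the ones being killed.

```shell
#!/bin/sh
# Hypothetical helpers; neither function is part of Hadoop.

# Succeeds if the given log line is the "beyond physical memory limits" message.
is_memory_limit_error() {
  printf '%s\n' "$1" | grep -q 'running beyond physical memory limits'
}

# Doubles the container memory (in MB) for the next attempt (assumed policy).
next_memory_mb() {
  echo $(( $1 * 2 ))
}

log_line='Container is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 6.1 GB of 2.1 GB virtual memory used. Killing container.'
mem_mb=1024   # assumed current container size

if is_memory_limit_error "$log_line"; then
  mem_mb=$(next_memory_mb "$mem_mb")
  echo "retry with -Dmapreduce.map.memory.mb=$mem_mb -Dmapreduce.reduce.memory.mb=$mem_mb"
fi
```

In practice the right value depends on the cluster's YARN container limits, so treat the doubling as a starting point rather than a rule.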

If a file already exists at the destination, distcp does not replace it by default. To overwrite destination files that differ from the source, pass the -update option, as in the command below.

hadoop distcp -update hdfs:// hdfs://
