Category Archives: Hadoop – Commands

Search for a file in HDFS using Solr Find tool

HdfsFindTool is essentially the HDFS version of the Linux file system find command. The command walks one or more HDFS directory trees, finds all HDFS files that match the specified expression, and applies selected actions to them. By default, it prints the list of matching HDFS file paths to stdout, one path per line. Search…
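Because HdfsFindTool mirrors the Linux find command, the behaviour described above can be tried out locally first. The sketch below runs plain find on a throwaway directory and prints the matches one path per line. The HDFS invocation appears only as a comment: the jar location and the HDFS path in it are assumptions and depend on the distribution in use.

```shell
#!/bin/bash
# Local analogue of the HDFS search: walk a directory tree and print
# every matching file path, one per line (find's default action).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/logs/2020"
touch "$demo_dir/logs/2020/app.gz" "$demo_dir/logs/2020/app.txt" "$demo_dir/logs/other.gz"

# Match regular files whose name ends in .gz; sort for stable output.
matches=$(find "$demo_dir" -type f -name '*.gz' | sort)
echo "$matches"

# HDFS counterpart (assumed jar location and hypothetical path; adjust both):
# hadoop jar /path/to/search-mr-job.jar org.apache.solr.hadoop.HdfsFindTool \
#     -find hdfs:///user/example/logs -type f -name '*.gz'

rm -rf "$demo_dir"
```

The expression syntax (-type, -name) is the same in both cases, so an expression tested locally should carry over to the HDFS tool largely unchanged.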

Tracking YARN logs

Create a script to get YARN logs:

    $ vim hadoop_logs.sh

    #!/bin/bash
    # Usage: hadoop_logs.sh APPLICATION_ID [CONTAINER_ID NODE_ADDRESS]
    APPLICATION_ID=$1
    CONTAINER_ID=$2
    NODE_ADDRESS=$3

    if [ $# -eq 1 ]; then
        yarn logs -applicationId "${APPLICATION_ID}"
    elif [ $# -eq 3 ]; then
        yarn logs -applicationId "${APPLICATION_ID}" -containerId "${CONTAINER_ID}" -nodeAddress "${NODE_ADDRESS}"
    else
        echo "you must specify 1 or 3 arguments"
    fi

Create a symlink:

    $ ln…
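The argument handling in the script can be checked without a cluster. The following is a dry-run sketch: the helper name build_yarn_logs_cmd is hypothetical, and it only prints the yarn logs command that would be run instead of executing it. The application, container, and node values are made-up placeholders.

```shell
#!/bin/bash
# Dry-run helper (hypothetical name): prints the yarn command for
# 1 argument (application only) or 3 arguments (application, container, node).
build_yarn_logs_cmd() {
    if [ $# -eq 1 ]; then
        echo "yarn logs -applicationId $1"
    elif [ $# -eq 3 ]; then
        echo "yarn logs -applicationId $1 -containerId $2 -nodeAddress $3"
    else
        echo "you must specify 1 or 3 arguments" >&2
        return 1
    fi
}

# Placeholder IDs and node address, for illustration only.
build_yarn_logs_cmd application_1_0001
build_yarn_logs_cmd application_1_0001 container_1_0001_01_000002 worker1.example.com:8041
```

Printing the command first makes it easy to confirm that the right flags are chosen for each argument count before pointing the script at a real cluster.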

Hadoop / HDFS Commands

A few useful Hadoop commands.

Uncompress a gz file from HDFS to HDFS:

    $ hadoop fs -text /hdfs_path/compressed_file.gz | hadoop fs -put - /hdfs_path/uncompressed-file.txt

To uncompress while copying from local to HDFS directly:

    $ gunzip -c filename.txt.gz | hadoop fs -put - /user/dc-user/filename.txt

Hadoop commands for reporting purposes:

    $ hdfs fsck /hdfs_path
    $ hdfs fsck /hdfs_path -files -locations
    $ hadoop…
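The uncompress-while-copying pipelines rely on streaming: the decompressor writes to stdout and the next command reads from stdin (the lone "-" argument). Here is a minimal local sketch of that pattern, with cat standing in for hadoop fs -put so it runs without a cluster. The file names and contents are invented for the demonstration.

```shell
#!/bin/bash
# Build a small gzip file to work with (hypothetical sample content).
work_dir=$(mktemp -d)
printf 'line one\nline two\n' > "$work_dir/filename.txt"
gzip -c "$work_dir/filename.txt" > "$work_dir/filename.txt.gz"

# Stream the decompressed bytes through a pipe; "cat -" reads stdin,
# playing the role that "hadoop fs -put - <hdfs_path>" plays on a cluster.
gunzip -c "$work_dir/filename.txt.gz" | cat - > "$work_dir/copy.txt"

restored=$(cat "$work_dir/copy.txt")
echo "$restored"

rm -rf "$work_dir"
```

Because the data flows through the pipe, no uncompressed intermediate file has to be written to local disk before the upload.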