Category Archives: Hadoop – MapReduce Code – Python

Search for a pattern in HDFS files – python script

Problem: Search a pattern in HDFS files and return the filename which contains this pattern. For example, below are our input files: $vim log1.out [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test [Wed Oct 11… Read More »