Thursday 18 May 2017

7 Reasons you should learn Python now

Python is a favorite among many developers for its strong emphasis on readability and efficiency, especially when compared to other languages like Java, PHP, or C++.

Sure, it’s old, but it’s 1980s old, not Cobol or Fortran old. Besides, if something works, why change it, especially when there are so many ways to improve it.

Actually, depending on how you view it, longevity is a good thing in itself—a sign of stability and reliability.

If you’re like many people who first started out with Java, C, or Perl, the learning curve for Python is practically nonexistent. But the fact that it’s easy to learn is also the reason why some people don’t see Python as a necessary programming skill.

I’ll be honest with you, my love of Python didn’t really develop until a few years ago. It took a long career of painful lessons to appreciate everything this language and platform have to offer. My goal with this short post is to save you the same pain, and convince you why Python is something you need to know.

Python is easy to learn
Well, at least it’s “easier” when compared to many of the other programming languages available to you. There isn’t a lot of ceremony to Python’s syntax, which makes it readable even when you’re not a Python expert. My experience is that learning and teaching Python through examples is easier than approaching, say, Ruby or Perl the same way, since the syntax of Python has far fewer rules and special cases. The focus isn’t on language intricacies, it’s on what you want to accomplish with your code.

Python is a language of choice
Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language, and it is currently one of the most popular languages in IT. Python has been adopted as a language of choice for almost every domain in IT, including Web Development, Cloud Computing (AWS, OpenStack, VMware, Google Cloud, etc.), Infrastructure Automation, Software Testing, Mobile Testing, Big Data and Hadoop, and Data Science.

Python lets you build more functionality with fewer lines of code
Python is a quick study for anyone. With practice, you could build a rudimentary game in two days tops, even if you start out knowing absolutely nothing about programming.
Another factor that makes Python an attractive programming language for novices is its readability and efficiency.

Python is a versatile language and platform
Python will be 28 years old in 2017. Even though that’s older than many of my readers, it remains highly relevant because it can be applied to pretty much any software development or operations scenario you can find today. Managing local or cloud infrastructure? Python applies. Developing websites? Yep, it applies there too. Need to work against a SQL database? It does that. Need a custom function for Hive or Pig? Covered. Just building a small tool for yourself? Python’s simplicity makes it a great choice. Need a language that supports the rigor of object-oriented design? Python’s features make it relevant here, too. In short, investing a little effort into learning Python will give you skills that apply across a wide range of job roles.

Python has one of the most mature package libraries around
Once you know the language, you can leverage the platform. Python is backed by PyPI (pronounced “pie pee eye” and perusable online at pypi.org), which is a repository of more than 85,000 Python modules and scripts you can use immediately. These modules deliver prepackaged functionality to your local Python environment and solve problems as diverse as working with databases, implementing computer vision, executing advanced data analytics such as sentiment analysis, or building RESTful web services.

Python is a commonly-used language in data science
Whatever job you’re reaching for, data will be a part of it. IT Ops, software development, marketing, etc … they’re all drowning in data and thirsting for wisdom. Soon data analytics skills will be as necessary as coding skills, and Python has a strong presence in both areas. Next to the language R, Python is the most used language in modern data science; in fact, Python job postings outnumber R postings in the data science arena. The skills you develop learning Python will transfer directly to building these analytics skills.

Python is cross-platform and Open Source
Python’s been running cross-platform and developed as Open Source for more than 20 years. If you need code that works on Linux, Windows, and MacOS, Python provides. Moreover, it’s backed by decades of bug-squashing and kink-straightening to ensure that your code works as intended wherever you run it.

Python is flexible
There are several robust Python implementations integrated with other programming languages:
* CPython, the reference implementation, written in C
* Jython, Python running on the Java Virtual Machine
* IronPython, which is designed for compatibility with .NET and C#
* PyObjC, Python bridged with Objective-C toolkits
* RubyPython, a bridge between the Ruby and Python interpreters

Why you should know Python
There aren’t a lot of languages that can offer the versatility and simplicity of Python; there are even fewer that can do so alongside the decades of thought, effort, and community that have gone into Python. Whether you’re new to code or a script-spewing guru, Python is something you need to know.

Top 25 Python Interview Questions Prepared by Experts

1. What is JSON? How would you convert JSON data into Python data?
JSON stands for JavaScript Object Notation. It is a popular data format, used, for example, for storing data in NoSQL databases. Generally, JSON is built on 2 structures:
  1.  A collection of <name, value> pairs.
  2.  An ordered list of values.
    As Python ships with a JSON parser, JSON-based data maps naturally onto a dictionary in Python. You can convert JSON data into Python objects using the load()/loads() functions of the json module.
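A minimal sketch of the conversion using the standard json module (the sample string below is made up for illustration; json.load() works the same way on file objects):
import json
raw = '{"Company-Name": "myTectra", "Company-id": 1}'
data = json.loads(raw)        # parse a JSON string into a Python dict
print(data["Company-Name"])   # prints: myTectra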

2. How are the functions help() and dir() different?
Both functions are accessible from the Python interpreter and are used for viewing a consolidated dump of built-in objects.
  • help() – displays the documentation string of the given object. It is used to see the help related to modules, keywords, attributes, etc. For example, to view the help for the string datatype, execute help(str); it will display the documentation for the str class. Calling help() with no arguments opens an interactive help prompt (help>).
  • At the help> prompt, type modules to list the available modules, modules str to see module documentation related to 'str', keywords to list Python keywords, and topics to list help topics.
  • dir() – displays the defined symbols (names) of the given object. Eg: >>>dir(str) will only display the symbols defined on str.

3. Which command do you use to exit the help window or help command prompt?
quit
When you type quit at help’s command prompt, the help utility closes automatically and you are returned to the Python shell prompt.

4. Do the functions help() and dir() list the names of all the built-in functions and variables? If not, how would you list them?
No. Built-in functions such as max(), min(), filter(), map(), etc. are not immediately apparent, as they are available as part of a standard module. To view them, we can pass the builtins module as an argument to dir(). It will display the built-in functions, exceptions, and other objects as a list.
>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError', ...]

5. Explain how Python does compile-time and run-time code checking.
Python performs some amount of compile-time checking, but most checks, such as type and name checks, are postponed until code execution. Consequently, if the Python code references a user-defined function that does not exist, the code will compile successfully. In fact, the code will fail with an exception only when the execution path actually reaches the reference to the function that does not exist.
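A small sketch of this behaviour (the function names here are made up): the module compiles and runs until the missing name is actually looked up.
def report():
    # build_summary is never defined anywhere, yet this compiles fine
    return build_summary()

print("module loaded")   # runs without any complaint
report()                 # only now: NameError: name 'build_summary' is not defined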

6. Why is all the memory not de-allocated/freed when Python exits?
When Python exits, modules or objects that have circular references to other objects, as well as objects that are referenced from the global namespace, are not always de-allocated or freed.
It is also impossible to deallocate those portions of memory that are reserved by the C library.
On exit, Python does try to deallocate/destroy every other object using its own efficient clean-up mechanism.

7. Explain Python's zip() function.
zip() takes multiple lists, say list1, list2, etc., and transforms them into a single list of tuples by taking the corresponding elements of the lists that are passed as parameters.
list1 = ['A', 'B', 'C']
list2 = [10, 20, 30]
zip(list1, list2)   # results in a list of tuples: [('A', 10), ('B', 20), ('C', 30)]
Whenever the given lists are of different lengths, zip stops generating tuples when the shortest list is exhausted. (In Python 3, zip() returns an iterator, so wrap it in list() to see the tuples.)

8. Explain Python's pass by reference vs. pass by value (or explain Python's parameter-passing mechanism).
In Python, by default, all parameters (arguments) are passed "by reference" to functions: the function receives a reference to the very object the caller passed. Thus, if you change (mutate) the value of a mutable parameter within a function, the change is reflected in the calling code. We observe "pass by value"-like behaviour when the arguments are of immutable types such as numbers, strings, and tuples: since these objects cannot be modified in place, rebinding the parameter inside the function does not affect the caller.
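A short sketch contrasting the two behaviours (the function and variable names are made up):
def append_item(items):
    items.append(99)        # mutates the caller's list in place

def rebind(n):
    n = n + 1               # rebinds the local name only

values = [1, 2, 3]
append_item(values)
print(values)               # [1, 2, 3, 99] - the change is visible to the caller

count = 10
rebind(count)
print(count)                # 10 - immutable int, the caller is unaffected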

9. As everything in Python is an object, explain the characteristics of Python's objects.
  • Python’s objects are instances of classes, so they are created at the time of instantiation. Eg: object-name = class-name(arguments)
  • One or more variables can reference the same object in Python.
  • Every object holds a unique id, which can be obtained using the id() function. Eg: id(obj-name) returns the unique id of the given object.
  • Every object is either mutable or immutable, based on the type of data it holds.
  • Whenever an object is no longer referenced in the code, it is automatically garbage collected (destroyed).
  • The contents of an object can be converted into a string representation using the str() or repr() methods.

10. Explain how to overload constructors or methods in Python.
Python’s constructor, __init__(), is the first method of a class. Whenever we instantiate an object, __init__() is automatically invoked by Python to initialize the members of that object. Python does not support traditional constructor or method overloading (defining the same name twice simply replaces the first definition); instead, a single __init__() or method is written with default argument values or *args/**kwargs to handle the different call signatures.
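A minimal sketch of emulating an overloaded constructor with default arguments (the class and field names are made up):
class Employee(object):
    def __init__(self, name, salary=0, department="General"):
        # one constructor covers several call signatures
        self.name = name
        self.salary = salary
        self.department = department

e1 = Employee("Asha")
e2 = Employee("Ravi", 50000)
e3 = Employee("Mia", 65000, "Data Science")
print(e1.department, e2.salary, e3.name)   # General 50000 Mia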

11. Which Python statement is used whenever a statement is required syntactically but the program needs no action?
pass – the no-operation statement in Python.
If we want to load a module or open a file, and continue with other tasks even if the requested module/file does not exist, we can use a try-except block with a pass statement in the except block:
try:
    import mymodule
    myfile = open("C:\\myfile.csv")
except:
    pass

12. What is web scraping? How do you achieve it in Python?
Web scraping is a way of extracting large amounts of information available on web sites and saving it onto the local machine or into database tables.
To scrape the web: load the web page you are interested in (for example, using the "requests" module), then
parse the HTML from the web page to find the interesting information. Python has several modules for scraping the web, such as urllib2, scrapy, pyquery, and BeautifulSoup.
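A small sketch using requests plus BeautifulSoup, assuming both packages are installed; the URL is just a placeholder:
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com")        # load the page
soup = BeautifulSoup(response.text, "html.parser")    # parse the HTML
for link in soup.find_all("a"):                       # pull out every hyperlink
    print(link.get("href"))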

13. What is a Python module?
A module is a Python script that generally contains import statements, functions, classes, variable definitions, and runnable Python code, and it “lives” in a file with a ‘.py’ extension. Zip files and DLL files can also act as modules. Inside the module, you can refer to the module’s name via the string stored in the global variable __name__.
A module can be imported by other modules in one of two ways. They are
  1.  import module-name
  2.  from module-name import name(s)
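A tiny illustration of both import forms, using the standard math module:
import math                 # form 1: import the whole module
print(math.sqrt(16))        # 4.0

from math import sqrt, pi   # form 2: import specific names from the module
print(sqrt(16), pi)         # 4.0 3.141592653589793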

14. Name the file-related modules in Python.
Python provides libraries/modules with functions that enable you to manipulate text files and binary files on the file system. Using them you can create files, update their contents, and copy and delete files. The libraries are: os, os.path, and shutil.
Here, the os and os.path modules include functions for accessing the filesystem,
while the shutil module enables you to copy and delete files.
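A brief sketch of these modules in action (the file names are placeholders):
import os
import os.path
import shutil

if os.path.exists("report.txt"):             # os.path: query the filesystem
    shutil.copy("report.txt", "backup.txt")  # shutil: copy a file
    os.remove("report.txt")                  # os: delete the original
print(os.listdir("."))                       # os: list the current directory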

15. Explain the use of the with statement.
In Python, the “with” statement is generally used to open a file, process the data present in the file, and close the file without explicitly calling the close() method. The “with” statement makes exception handling simpler by providing the cleanup activities automatically.
General form of with:
with open("file_name", "mode") as file_var:
    # processing statements
Note: there is no need to close the file by calling file_var.close().

16. Explain all the file processing modes supported by Python.
Python allows you to open files in read-only, write-only, append, and read-write modes, specified with the flags "r", "w", "a", and "r+"/"w+"/"a+" respectively.
A text file is opened in text mode by default, or explicitly by adding "t" to the mode, e.g. "rt", "wt", "at". A binary file is opened by adding "b" to the mode, e.g. "rb", "wb", "ab", "r+b".
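A compact sketch of the common mode strings (the file names are placeholders):
with open("notes.txt", "w") as f:     # write-only text mode, truncates/creates the file
    f.write("hello\n")
with open("notes.txt", "r") as f:     # read-only text mode (the default)
    print(f.read())
with open("notes.txt", "a") as f:     # append text mode
    f.write("more\n")
with open("image.bin", "wb") as f:    # write-only binary mode
    f.write(b"\x00\x01")
with open("notes.txt", "r+") as f:    # read-write text mode
    print(f.readline())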

17. Explain how to redirect the output of a Python script from standard output (i.e., the monitor) to a file.
There are two possible ways of redirecting the output from standard output to a file.
  1.  Open an output file in “write” mode, assign it to the sys.stdout attribute, and then print the contents:
    import sys
    sys.stdout = open("outputfile", "w")
    print("testing")
  2. Create a Python script, say redirect_output.py, containing a print statement, and redirect its output to a file while executing it at the command prompt.
    Eg: redirect_output.py has the following code:
    print("testing")
    execution: python redirect_output.py > outputfile

18. Explain the shortest way to open a text file and display its contents.
The shortest way to open a text file is by using the “with” statement as follows:
with open("file-name", "r") as fp:
    fileData = fp.read()
# print the contents of the file
print(fileData)

19. How do you create a dictionary which can preserve the order of pairs?
We know that regular Python dictionaries iterate over <key, value> pairs in an arbitrary order, hence they do not preserve the insertion order of <key, value> pairs.
Python 2.7 introduced the OrderedDict class in the collections module. It provides the same interface as a general dictionary, but it traverses keys and values in the order in which each key was first inserted.
from collections import OrderedDict
d = OrderedDict([('Company-id', 1), ('Company-Name', 'myTectra')])
d.items()   # displays the output as: [('Company-id', 1), ('Company-Name', 'myTectra')]

20. When is a dictionary used instead of a list?
Dictionaries are best suited when the data is labelled, i.e., the data is a record with field names.
Lists are a better option for storing collections of un-labelled items, say all the files and sub-directories in a folder.
Generally, a search operation on a dictionary object is faster than searching a list object.
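A quick sketch of the difference (the record fields are made up):
employee = {"name": "Asha", "id": 101, "dept": "QA"}   # labelled record: use a dict
print(employee["dept"])                                # constant-time lookup by key

files = ["a.txt", "b.txt", "c.txt"]                    # un-labelled collection: use a list
print("b.txt" in files)                                # linear search through the list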

21. What is the use of enumerate() in Python?
Using the enumerate() function, you can iterate through a sequence and retrieve the index position and its corresponding value at the same time.
>>> for i, v in enumerate(['Python', 'Java', 'C++']):
...     print(i, v)
0 Python
1 Java
2 C++

22. How many kinds of sequences are supported by Python? What are they?
Python 2 supports 7 sequence types. They are str, unicode, list, tuple, bytearray, buffer, and xrange. In Python 3, unicode, buffer, and xrange no longer exist: str is Unicode, bytes/memoryview replace buffer, and range replaces xrange.

23. How do you perform pattern matching in Python? Explain.
Regular expressions (REs, or regexes) enable us to specify expressions that can match specific “parts” of a given string. For instance, we can define a regular expression to match a single character or a digit, a telephone number, or an email address, etc. Python’s “re” module provides this regular-expression support: it offers methods for searching text strings and replacing text strings, along with methods for splitting text strings based on the defined pattern.

24. Name a few methods for matching and searching for occurrences of a pattern in a given text string.
There are 4 different methods in the “re” module to perform pattern matching. They are:
match() – matches the pattern only at the beginning of the string.
search() – scans the string and looks for a location where the pattern matches.
findall() – finds all occurrences of the match and returns them as a list.
finditer() – finds all occurrences of the match and returns them as an iterator.
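A brief sketch of the four methods on a throwaway string (the text and pattern are made up):
import re

text = "order 12, order 34, order 56"
print(re.match(r"order \d+", text).group())    # 'order 12' - only at the start
print(re.search(r"\d+", text).group())         # '12' - first match anywhere
print(re.findall(r"\d+", text))                # ['12', '34', '56'] - list of all matches
for m in re.finditer(r"\d+", text):            # iterator of match objects
    print(m.group(), m.start())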

25. Explain the split(), sub(), and subn() methods of Python's "re" module.
To modify strings, Python’s “re” module provides 3 methods. They are:
split() – uses a regex pattern to “split” a given string into a list.
sub() – finds all substrings where the regex pattern matches and then replaces them with a different string.
subn() – similar to sub(), but returns the new string along with the number of replacements made.
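A short sketch of the three methods (the sample string is made up):
import re

text = "a1b22c333d"
print(re.split(r"\d+", text))      # ['a', 'b', 'c', 'd'] - split on runs of digits
print(re.sub(r"\d+", "-", text))   # 'a-b-c-d' - replace every run of digits
print(re.subn(r"\d+", "-", text))  # ('a-b-c-d', 3) - new string plus replacement count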

Top 25 Hadoop Interview Questions Prepared by Experts


1) Compare Hadoop & Spark
                     
Criteria               Hadoop                      Spark
Dedicated storage      HDFS                        None
Speed of processing    Average                     Excellent
Libraries              Separate tools available    Spark Core, SQL, Streaming, MLlib, GraphX

2)    What are real-time industry applications of Hadoop?
Hadoop, well known as Apache Hadoop, is an open-source software platform for scalable and distributed computing of large volumes of data. It provides rapid, high-performance and cost-effective analysis of structured and unstructured data generated on digital platforms and within the enterprise. It is used in almost all departments and sectors today. Some of the instances where Hadoop is used:
  • Managing traffic on streets.
  • Stream processing.
  • Content Management and Archiving Emails.
  • Processing Rat Brain Neuronal Signals using a Hadoop Computing Cluster.
  • Fraud detection and Prevention.
  • Advertisements Targeting Platforms are using Hadoop to capture and analyze click stream, transaction, video and social media data.
  • Managing content, posts, images and videos on social media platforms.
  • Analyzing customer data in real-time for improving business performance.
  • Public sector fields such as intelligence, defense, cyber security and scientific research.
  • Financial agencies are using Big Data Hadoop to reduce risk, analyze fraud patterns, identify rogue traders, more precisely target their marketing campaigns based on customer segmentation, and improve customer satisfaction.
  • Getting access to unstructured data like output from medical devices, doctor’s notes, lab results, imaging reports, medical correspondence, clinical data, and financial data.

3)    How is Hadoop different from other parallel computing systems?
Hadoop is a distributed file system, which lets you store and handle massive amounts of data on a cloud of machines while handling data redundancy. The primary benefit is that since data is stored in several nodes, it is better to process it in a distributed manner: each node can process the data stored on it instead of spending time moving it over the network.
By contrast, in a relational database computing system, you can query data in real time, but it is not efficient to store data in tables, records and columns when the data is huge.
Hadoop also provides a scheme to build a column database with Hadoop HBase, for runtime queries on rows.

4)    What all modes Hadoop can be run in?
Hadoop can run in three modes:
  • Standalone Mode: The default mode of Hadoop, it uses the local file system for input and output operations. This mode is mainly used for debugging purposes, and it does not support the use of HDFS. Further, in this mode, no custom configuration is required for the mapred-site.xml, core-site.xml, and hdfs-site.xml files. It is much faster than the other modes.
  • Pseudo-Distributed Mode (Single Node Cluster): In this case, you need configuration for all three files mentioned above. All daemons run on one node, and thus both the Master and Slave nodes are the same machine.
  • Fully Distributed Mode (Multiple Node Cluster): This is the production phase of Hadoop (what Hadoop is known for), where data is used and distributed across several nodes on a Hadoop cluster. Separate nodes are allotted as Master and Slaves.

5)    Explain the major difference between HDFS block and InputSplit.
In simple terms, a block is the physical representation of data, while a split is the logical representation of the data present in a block. The split acts as an intermediary between the block and the mapper.
Suppose the record "myTectra" ends up physically stored across two blocks:
Block 1: myTect
Block 2: ra

Now, the map reading Block 1 will read from "m" up to the end of that block, but it does not know how to process the remainder ("ra") sitting in Block 2 at the same time. Here InputSplit comes into play: it forms a logical group spanning Block 1 and Block 2, so the record is processed as a single unit.
The framework then forms key-value pairs using the InputFormat and RecordReader and sends them to the map for further processing. With InputSplit, if you have limited resources, you can increase the split size to limit the number of maps. For instance, if a file consists of 10 blocks of 64MB each (640MB in total) and resources are limited, you can set the 'split size' to 128MB. This forms logical groups of 128MB, with only 5 maps executing at a time.
However, if splitting is disabled for the file, the whole file forms one InputSplit and is processed by a single map, consuming more time when the file is big.

6)    What is distributed cache and what are its benefits?
Distributed Cache, in Hadoop, is a service provided by the MapReduce framework to cache files when needed. Once a file is cached for a specific job, Hadoop makes it available on each data node, both on disk and in memory, where the map and reduce tasks are executing. Later, you can easily access and read the cache file and populate any collection (like an array or hashmap) in your code.
Benefits of using distributed cache are:
  • It distributes simple read-only text/data files and/or complex types like jars, archives and others. These archives are then un-archived at the slave node.
  • Distributed cache tracks the modification timestamps of cache files, which ensures that the files are not modified while a job is executing.

7)    Explain the difference between NameNode, Checkpoint NameNode and BackupNode.
  • NameNode is the core of HDFS that manages the metadata – the information of what file maps to what block locations and what blocks are stored on what datanode. In simple terms, it’s the data about the data being stored. NameNode supports a directory tree-like structure consisting of all the files present in HDFS on a Hadoop cluster. It uses following files for namespace:
    fsimage file – keeps track of the latest checkpoint of the namespace.
    edits file – a log of changes that have been made to the namespace since the last checkpoint.
  • Checkpoint NameNode has the same directory structure as the NameNode, and creates checkpoints for the namespace at regular intervals by downloading the fsimage and edits files and merging them within its local directory. The new image after merging is then uploaded back to the NameNode.
    There is a similar node like Checkpoint, commonly known as Secondary Node, but it does not support the ‘upload to NameNode’ functionality.
  • Backup Node provides similar functionality as Checkpoint, enforcing synchronization with NameNode. It maintains an up-to-date in-memory copy of file system namespace and doesn’t require getting hold of changes after regular intervals. The backup node needs to save the current state in-memory to an image file to create a new checkpoint.

8)   What are the most common Input Formats in Hadoop?
There are three most common input formats in Hadoop:
  • Text Input Format: Default input format in Hadoop.
  • Key Value Input Format: used for plain text files where the files are broken into lines
  • Sequence File Input Format: used for reading files in sequence

9)    Define DataNode and how does NameNode tackle DataNode failures?
A DataNode stores data in HDFS; it is the node where the actual data resides in the file system. Each DataNode sends a heartbeat message to notify the NameNode that it is alive. If the NameNode does not receive a message from a DataNode for 10 minutes, it considers that DataNode dead or out of place, and starts replicating the blocks that were hosted on it so that they are hosted on some other DataNode. A BlockReport contains the list of all blocks on a DataNode; using it, the system starts to replicate what was stored on the dead DataNode.
The NameNode manages the replication of data blocks from one DataNode to another. In this process, the replication data is transferred directly between DataNodes, so the data never passes through the NameNode.

10)    What are the core methods of a Reducer?
The three core methods of a Reducer are:
setup() – used for configuring various parameters like input data size and distributed cache: public void setup(context)
reduce() – the heart of the reducer, called once per key with the associated values: public void reduce(key, values, context)
cleanup() – called to clean up temporary files, only once at the end of the task: public void cleanup(context)

11)    What is SequenceFile in Hadoop?
Extensively used in MapReduce I/O formats, a SequenceFile is a flat file containing binary key/value pairs. Map outputs are stored internally as SequenceFiles. It provides Reader, Writer and Sorter classes. The three SequenceFile formats are:
  • Uncompressed key/value records.
  • Record-compressed key/value records – only the ‘values’ are compressed here.
  • Block-compressed key/value records – both keys and values are collected in ‘blocks’ separately and compressed; the size of the ‘block’ is configurable.

12)    What is Job Tracker role in Hadoop?
The Job Tracker’s primary functions are resource management (managing the Task Trackers), tracking resource availability, and task life-cycle management (tracking task progress and fault tolerance). It is a process that runs on a separate node, often not on a DataNode. The Job Tracker communicates with the NameNode to identify data locations, finds the best Task Tracker nodes to execute tasks on given nodes, monitors individual Task Trackers, and submits the overall job status back to the client. It tracks the execution of MapReduce workloads local to the slave nodes.

13)    What is the use of RecordReader in Hadoop?
Since Hadoop splits data into various blocks, a RecordReader is used to read the split data into a single record. For instance, if our input data is split like:
Row1: Welcome to
Row2: Intellipaat
it will be read as “Welcome to Intellipaat” using the RecordReader.

14)   What is Speculative Execution in Hadoop?
One limitation of Hadoop is that, by distributing the tasks across several nodes, a few slow nodes can limit the rest of the program. There are various reasons for tasks to be slow, and they are sometimes not easy to detect. Instead of identifying and fixing the slow-running tasks, Hadoop tries to detect when a task runs slower than expected and then launches an equivalent task as a backup. This backup mechanism in Hadoop is called Speculative Execution.
It creates a duplicate task on another disk, so the same input can be processed multiple times in parallel. When most tasks in a job come to completion, the speculative execution mechanism schedules duplicate copies of the remaining (slower) tasks across the nodes that are currently free. When these tasks finish, the JobTracker is notified. If other copies are still executing speculatively, Hadoop notifies the TaskTrackers to quit those tasks and reject their output.
Speculative execution is enabled by default in Hadoop. To disable it, set the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false.

15)    What happens if you try to run a Hadoop job with an output directory that is already present?
It will throw an exception saying that the output file directory already exists.
To run the MapReduce job, you need to ensure that the output directory does not exist before in the HDFS.
To delete the directory before running the job, you can use the shell:
hadoop fs -rmr /path/to/your/output/
Or via the Java API:
FileSystem.get(conf).delete(outputDir, true);

16)    How can you debug Hadoop code?
First, check the list of MapReduce jobs currently running. Next, we need to see that there are no orphaned jobs running; if yes, you need to determine the location of RM logs.
  1. Run: "ps -ef | grep -i ResourceManager"
    and look for log directory in the displayed result. Find out the job-id from the displayed list and check if there is any error message associated with that job.
  2. On the basis of RM logs, identify the worker node that was involved in execution of the task.
  3. Now, log in to that node and run: "ps -ef | grep -i NodeManager"
  4. Examine the Node Manager log. The majority of errors come from user level logs for each map-reduce job.

17)   How to configure Replication Factor in HDFS?
hdfs-site.xml is used to configure HDFS. Changing the dfs.replication property in hdfs-site.xml will change the default replication for all files placed in HDFS.
You can also modify the replication factor on a per-file basis using the Hadoop FS shell:
[training@localhost ~]$ hadoop fs -setrep -w 3 /my/file
Conversely, you can also change the replication factor of all the files under a directory:
[training@localhost ~]$ hadoop fs -setrep -w 3 -R /my/dir

18)    How to compress mapper output but not the reducer output?
To achieve this compression, you should set:
conf.set("mapreduce.map.output.compress", true)
conf.set("mapreduce.output.fileoutputformat.compress", false)

19)    What is the difference between Map Side join and Reduce Side Join?
A map side join is performed when the data reaches the map; you need a strict structure for defining a map side join. On the other hand, a reduce side join (repartitioned join) is simpler than a map side join, since the input datasets need not be structured. However, it is less efficient, as it has to go through the sort and shuffle phases, which come with network overheads.

20)    How can you transfer data from Hive to HDFS?
By writing the query:
hive> insert overwrite directory '/' select * from emp;
You can write your query for the data you want to import from Hive to HDFS. The output you receive will be stored in part files in the specified HDFS path.

21)   What companies use Hadoop, any idea?
Yahoo! (the biggest contributor to the creation of Hadoop) – the Yahoo search engine uses Hadoop; Facebook – developed Hive for analysis; Amazon, Netflix, Adobe, eBay, Spotify, and Twitter also use Hadoop.

22)    In Hadoop what is InputSplit?
It splits input files into chunks and assigns each split to a mapper for processing.

23)    Mention Hadoop core components?
Hadoop core components include,
  • HDFS
  • MapReduce

24)    What is NameNode in Hadoop?
NameNode in Hadoop is where Hadoop stores all the file location information in HDFS. It is the master node on which job tracker runs and consists of metadata.

25)    Mention what are the data components used by Hadoop?
Data components used by Hadoop are
  • Pig
  • Hive