14 Cards in this Set

  • Front
  • Back

Problems in traditional computing

Storing huge volumes of data, storing heterogeneous data, and processing it fast


Solutions by Hadoop

Data is stored distributed across DataNodes; a variety of data types can be stored; the processing logic is moved to the nodes holding the data and executed in parallel
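As an illustration of moving processing logic to the data, here is a minimal sketch of the classic Hadoop MapReduce word count, assuming the standard org.apache.hadoop.mapreduce Java API; class names and input/output paths are illustrative:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // The mapper runs on the DataNodes that hold the input blocks (data locality).
  public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    public void map(Object key, Text value, Context ctx) throws IOException, InterruptedException {
      StringTokenizer it = new StringTokenizer(value.toString());
      while (it.hasMoreTokens()) { word.set(it.nextToken()); ctx.write(word, ONE); }
    }
  }
  // The reducer aggregates the partial counts produced in parallel by the mappers.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> vals, Context ctx) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : vals) sum += v.get();
      ctx.write(key, new IntWritable(sum));
    }
  }
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}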

Hadoop definition

Open-source software framework for storing and processing big data in a distributed manner on large clusters of commodity hardware

Features of Hadoop

Reliable, flexible, scalable, economical

HDFS definition

Java-based distributed file system that lets you store big data across multiple nodes in a Hadoop cluster. Provided as the storage service of Hadoop.
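A minimal sketch of using HDFS as a storage service from Java, assuming the org.apache.hadoop.fs client API; the NameNode address and file path are illustrative:

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:9000");   // illustrative NameNode address
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/user/demo/hello.txt");        // illustrative HDFS path
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));  // data is written to DataNodes
    }
    try (FSDataInputStream in = fs.open(file)) {
      byte[] buf = new byte[16];
      int n = in.read(buf);
      System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
    }
    fs.close();
  }
}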

Advantages of HDFS

Distributed storage, distributed and parallel computation, horizontal scalability (no need to upgrade existing hardware; new nodes are added to the running cluster on the go)

HDFS architecture

HDFS architecture definition

HDFS is a block-structured file system: each file is divided into blocks of a predetermined size, and the blocks are stored on one or more DataNodes
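A small sketch, assuming the same FileSystem client API, that inspects how a file is split into blocks and which DataNodes hold them; the file path is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus st = fs.getFileStatus(new Path("/user/demo/big.log"));  // illustrative path
    System.out.println("block size: " + st.getBlockSize() + " bytes");
    // One BlockLocation per block; the hosts are the DataNodes holding a replica.
    for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
      System.out.println("offset " + loc.getOffset() + " length " + loc.getLength()
          + " on " + String.join(",", loc.getHosts()));
    }
    fs.close();
  }
}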

Namenode or Master node

Maintains and manages the slave nodes. A highly available server that manages the file system namespace and regulates client access to files; it never stores user data itself. Records the metadata of all files in the cluster and every modification to it, including all blocks in HDFS and their locations. Takes care of the replication factor of blocks. Regularly receives a heartbeat and a block report from every DataNode to confirm it is alive; if a DataNode fails, it chooses another DataNode for the replicas and rebalances disk usage.

Datanode (a commodity hardware)

A block server that stores data in the local file system. Slave daemons that run on each slave machine and store the actual data. Serve low-level read/write requests from clients. Send a heartbeat to the NameNode periodically.

Secondary namenode (checkpoint node)

Works as a helper daemon to the NameNode. Performs regular checkpoints of HDFS: it reads the file system metadata from the NameNode's RAM and writes it to disk.

Blocks

The smallest contiguous location on disk where data is stored. In any file system, a file is stored as a collection of blocks. The block size is kept large in HDFS because the data is huge (on the order of petabytes or more).
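A hedged sketch showing that the block size can also be chosen per file at create time via the FileSystem API; the path and the 256 MB value are illustrative, and the cluster-wide default comes from the dfs.blocksize setting:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithBlockSize {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/user/demo/large.dat");   // illustrative path
    long blockSize = 256L * 1024 * 1024;         // 256 MB blocks for this file
    short replication = 3;                       // default replication factor
    try (FSDataOutputStream out = fs.create(p, true, 4096, replication, blockSize)) {
      out.writeBytes("payload...");              // data is split into blocks as the file grows
    }
    System.out.println("block size: " + fs.getFileStatus(p).getBlockSize());
    fs.close();
  }
}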

Replication management

Blocks are replicated to provide fault tolerance; the default replication factor is 3. The NameNode controls the replication factor: if a block is over- or under-replicated, it deletes or adds replicas as needed.
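A brief sketch of two common ways to influence replication, assuming the standard dfs.replication configuration key and the FileSystem.setReplication call; the path and values are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Replication {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setInt("dfs.replication", 3);           // default replication factor for new files
    FileSystem fs = FileSystem.get(conf);

    // Ask the NameNode to raise the replication factor of one existing file;
    // it then schedules extra replicas (or removes surplus ones if lowered).
    boolean ok = fs.setReplication(new Path("/user/demo/important.csv"), (short) 5);
    System.out.println("replication change accepted: " + ok);
    fs.close();
  }
}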

HDFS architecture overall