Use LEFT and RIGHT arrow keys to navigate between flashcards;
Use UP and DOWN arrow keys to flip the card;
H to show hint;
A reads text to speech;
20 Cards in this Set
- Front
- Back
Test disk I/O speed |
Run hdparm -t on the device to test disk I/O speed. Speed should be 70 MB/sec or more; anything less is an indication of a problem. |
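As a sketch, the 70 MB/sec floor could be checked in a script. The parse_hdparm_speed helper and the captured sample line are assumptions for illustration; a real check would pipe actual `hdparm -t /dev/sda` output (run as root, device name is hypothetical) into the helper.

```shell
# Sketch: compare hdparm's reported throughput against the 70 MB/sec floor.
# parse_hdparm_speed is a hypothetical helper, not part of hdparm itself.
parse_hdparm_speed() {
  # `hdparm -t <device>` ends with a line like:
  #   Timing buffered disk reads: 210 MB in  3.00 seconds =  70.00 MB/sec
  grep -o '[0-9.]* MB/sec' | awk '{print $1}'
}

# Sample output used here so the sketch runs without root or a real disk:
SAMPLE=' Timing buffered disk reads: 210 MB in  3.00 seconds =  70.00 MB/sec'
SPEED=$(printf '%s\n' "$SAMPLE" | parse_hdparm_speed)

# Integer comparison on the whole part of the MB/sec figure
if [ "${SPEED%.*}" -ge 70 ]; then
  echo "disk OK: ${SPEED} MB/sec"
else
  echo "disk SLOW: ${SPEED} MB/sec"
fi
```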
|
OS parameters |
Set vm.swappiness to 0 in /etc/sysctl.conf. Use the ext3 or ext4 filesystem (ext4 recommended). Increase the ulimit for the mapred and hdfs users to at least 32k, 64k recommended (/etc/security/limits.conf). Disable IPv6. Disable SELinux. Install and configure the NTP daemon. |
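The swappiness and ulimit settings from the card might look like this (the 64k recommendation is shown; nofile is the open-files limit item):

```
# /etc/sysctl.conf
vm.swappiness = 0

# /etc/security/limits.conf  ("-" sets both soft and hard limits)
mapred  -  nofile  65536
hdfs    -  nofile  65536
```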
|
|
Unix user accounts |
The HDFS, MapReduce, and YARN services are usually run as separate users, named hdfs, mapred, and yarn, respectively. They all belong to the same hadoop group. |
|
Formatting HDFS filesystem |
The formatting process creates an empty filesystem by creating the storage directories and the initial versions of the namenode’s persistent data structures. Datanodes are not involved in the initial formatting process, since the namenode manages all of the filesystem’s metadata, and datanodes can join or leave the cluster dynamically. |
|
start-dfs.sh |
As the hdfs user: starts a namenode on each machine returned by hdfs getconf -namenodes; starts a datanode on each machine listed in the slaves file; starts a secondary namenode on each machine returned by hdfs getconf -secondarynamenodes. |
|
start-yarn.sh |
As the yarn user: starts a resource manager on the local machine; starts a node manager on each machine listed in the slaves file. |
|
Environment variables |
In hadoop-env.sh: HADOOP_CLASSPATH, HADOOP_HEAPSIZE, JAVA_HOME, HADOOP_NAMENODE_OPTS, HADOOP_LOG_DIR, HADOOP_IDENT_STRING, HADOOP_SSH_OPTS. In yarn-env.sh: YARN_RESOURCEMANAGER_HEAPSIZE. |
|
Important HDFS daemon properties |
fs.defaultFS (core-site.xml); dfs.namenode.name.dir, dfs.datanode.data.dir, dfs.namenode.checkpoint.dir (all in hdfs-site.xml). |
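As a sketch, the four properties on this card might be set as follows; the host name and directory paths are hypothetical:

```xml
<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020/</value>  <!-- hypothetical host -->
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/disk1/hdfs/name,/disk2/hdfs/name</value>  <!-- redundant copies -->
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/disk1/hdfs/data,/disk2/hdfs/data</value>  <!-- round-robin storage -->
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/disk1/hdfs/namesecondary</value>
</property>
```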
|
Important YARN daemon properties |
All in yarn-site.xml: yarn.resourcemanager.hostname; yarn.resourcemanager.address (${yarn.resourcemanager.hostname}:8032); yarn.nodemanager.local-dirs; yarn.nodemanager.aux-services; yarn.nodemanager.resource.memory-mb (default 8192); yarn.nodemanager.resource.cpu-vcores (default 8); yarn.nodemanager.vmem-pmem-ratio (default 2.1). |
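A minimal yarn-site.xml sketch of a few of these properties; the host name is hypothetical, and the memory/vcore values are the defaults quoted on the card:

```xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>resourcemanager-host</value>  <!-- hypothetical host -->
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
```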
|
MapReduce job memory/CPU properties |
mapreduce.map.memory.mb (1024); mapreduce.reduce.memory.mb (1024); mapred.child.java.opts (-Xmx200m); mapreduce.map.java.opts (-Xmx200m); mapreduce.reduce.java.opts (-Xmx200m); mapreduce.map.cpu.vcores (1); mapreduce.reduce.cpu.vcores (1). |
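A sketch of raising a map task's memory in mapred-site.xml. The Java heap (-Xmx) must fit inside the container size set by the corresponding memory.mb property; the values below are illustrative, not the defaults:

```xml
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>  <!-- container size requested from YARN -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx800m</value>  <!-- heap smaller than the container, leaving headroom -->
</property>
```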
|
Default RPC Ports |
Namenode 8020; Datanode 50020; Job History 10020; Resource Manager 8032; Resource Manager Admin 8033; Resource Manager Scheduler 8030; Resource Manager Resource Tracker 8031; Node Manager 0 (ephemeral port chosen at startup); Node Manager Localizer 8040. |
|
Default HTTP ports |
Namenode 50070; Secondary Namenode 50090; Datanode 50075; Job History 19888; MapReduce Shuffle 13562; Resource Manager 8088; Node Manager 8042. |
|
Cluster membership |
In hdfs-site.xml (for datanodes): dfs.hosts (include file name), dfs.hosts.exclude (exclude file name). In yarn-site.xml (for node managers): yarn.resourcemanager.nodes.include-path, yarn.resourcemanager.nodes.exclude-path. |
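The datanode include/exclude files might be wired up like this (file paths are hypothetical):

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/include</value>  <!-- hypothetical path -->
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/exclude</value>  <!-- hypothetical path -->
</property>
```

Changes to the include/exclude files are typically applied without a restart via hdfs dfsadmin -refreshNodes.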
|
I/O buffer size |
Default 4 KB; recommended 128 KB. Set the io.file.buffer.size property, in bytes, in core-site.xml. |
|
Block Size |
Default 128 MB. Set the dfs.blocksize property, in bytes, in hdfs-site.xml. |
|
Datanode reserve storage space |
Set the dfs.datanode.du.reserved property, in bytes, in hdfs-site.xml. |
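The three size-related properties from the cards above, expressed in bytes (128 KB = 131072, 128 MB = 134217728; the 10 GB reserve is a hypothetical value):

```xml
<!-- core-site.xml -->
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>  <!-- 128 KB -->
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>  <!-- 128 MB -->
</property>
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>  <!-- 10 GB, hypothetical reserve -->
</property>
```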
|
Trash |
The Hadoop filesystem has a trash facility, controlled by fs.trash.interval (in minutes; default 0, meaning trash is disabled). It is a user-level feature: there is a .Trash folder for every user. hadoop fs -expunge forces the trash to be emptied. |
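Enabling trash is a single property in core-site.xml; the one-day (1440-minute) retention below is a hypothetical choice, not a default:

```xml
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>  <!-- keep deleted files for 24 hours -->
</property>
```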
|
Reduce Slow start |
Default 5% (0.05). Setting mapreduce.job.reduce.slowstart.completedmaps to a higher value, such as 0.80 (80%), helps improve throughput. |
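The 80% figure from the card, as it might appear in mapred-site.xml:

```xml
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.80</value>  <!-- start reducers after 80% of maps complete -->
</property>
```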
|
Short circuit local read |
Enable short-circuit local reads by setting dfs.client.read.shortcircuit to true. The path is set using the property dfs.domain.socket.path, and must be a path that only the datanode user (typically hdfs) or root can create, such as /var/run/hadoop-hdfs/dn_socket. |
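Both properties from the card, sketched in hdfs-site.xml using the socket path given above:

```xml
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn_socket</value>  <!-- creatable only by hdfs or root -->
</property>
```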
|
Configuration Precedence |
Highest to lowest: code, CLI, client, slave, cluster, default. If a value in a configuration file is marked final, it overrides all others. |
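A sketch of marking a property final so clients and job code cannot override it (host name hypothetical):

```xml
<!-- core-site.xml on the cluster -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020/</value>  <!-- hypothetical host -->
  <final>true</final>  <!-- wins over code, CLI, and client configs -->
</property>
```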