Contents
1 Requirements
2 HDFS Parameter Tuning
(1) Modify hadoop-env.sh
(2) Modify hdfs-site.xml
(3) Modify core-site.xml
(4) Distribute the configuration
3 MapReduce Parameter Tuning
(1) Modify mapred-site.xml
(2) Distribute the configuration
4 Yarn Parameter Tuning
(1) Modify the yarn-site.xml parameters as follows
(2) Distribute the configuration
5 Run the Program
(1) Restart the cluster
(2) Run the WordCount program
(3) Observe the Yarn job execution page
1 Requirements
(1) Requirement: count the number of occurrences of each word in 1 GB of data. The cluster has 3 servers, each with 4 GB of memory and a 4-core, 4-thread CPU.
(2) Requirement analysis:
1 GB / 128 MB = 8 MapTasks; 1 ReduceTask; 1 MRAppMaster
On average, that is 10 tasks / 3 nodes ≈ 3 tasks per node (distributed as 4, 3, 3).
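The 8-way split assumes the default HDFS block size of 128 MB (134217728 bytes), so the 1 GB input yields 8 blocks and therefore 8 MapTasks. A quick sanity check, assuming the cluster is up and $HADOOP_HOME/bin is on the PATH:

hdfs getconf -confKey dfs.blocksize   # 134217728 bytes = 128 MB by default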
2 HDFS Parameter Tuning
(1) Modify hadoop-env.sh
Set both the NameNode and the DataNode JVM heap (-Xmx) to 1024 MB:
export HDFS_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,RFAS -Xmx1024m"
export HDFS_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS -Xmx1024m"
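To confirm the new heap sizes after the restart in step 5, one option (a sketch, assuming JDK 8, where jmap -heap is still available; the pids come from jps) is:

jps                          # note the NameNode / DataNode process ids
jmap -heap <NameNode-pid>    # MaxHeapSize should report roughly 1024 MB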
(2) Modify hdfs-site.xml
<property>
    <name>dfs.namenode.handler.count</name>
    <value>21</value>
</property>
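The value 21 is not arbitrary: a commonly cited rule of thumb sets dfs.namenode.handler.count to 20 × ln(cluster size). With 3 DataNodes that gives about 21, which you can reproduce with a one-liner (assuming python3 is installed):

python3 -c "import math; print(int(20 * math.log(3)))"   # prints 21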
(3) Modify core-site.xml
<property>
    <name>fs.trash.interval</name>
    <value>60</value>
</property>
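fs.trash.interval is expressed in minutes, so this keeps deleted files recoverable for 60 minutes (the default of 0 disables the trash entirely). Deleted files land in the current user's trash directory, which you can inspect with something like (path assumed, adjust the user name):

hadoop fs -ls /user/$USER/.Trash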
(4) Distribute the configuration
xsync hadoop-env.sh hdfs-site.xml core-site.xml
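xsync is not a stock Hadoop command but the custom rsync-based distribution script used throughout this series. If you do not have it, a minimal sketch with the same effect (assuming hosts named hadoop102, hadoop103 and hadoop104, passwordless ssh, and that the config files live under $HADOOP_HOME/etc/hadoop on every node) is:

for host in hadoop102 hadoop103 hadoop104; do
    rsync -av $HADOOP_HOME/etc/hadoop/ $host:$HADOOP_HOME/etc/hadoop/
done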
3 MapReduce Parameter Tuning
(1) Modify mapred-site.xml
<property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>100</value>
</property>
<property>
    <name>mapreduce.map.sort.spill.percent</name>
    <value>0.80</value>
</property>
<property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>10</value>
</property>
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>-1</value>
    <description>The amount of memory to request from the scheduler for each map task. If this is not specified or is non-positive, it is inferred from mapreduce.map.java.opts and mapreduce.job.heap.memory-mb.ratio. If java-opts are also not specified, we set it to 1024.</description>
</property>
<property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
</property>
<property>
    <name>mapreduce.map.maxattempts</name>
    <value>4</value>
</property>
<property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>5</value>
</property>
<property>
    <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
    <value>0.70</value>
</property>
<property>
    <name>mapreduce.reduce.shuffle.merge.percent</name>
    <value>0.66</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>-1</value>
    <description>The amount of memory to request from the scheduler for each reduce task. If this is not specified or is non-positive, it is inferred from mapreduce.reduce.java.opts and mapreduce.job.heap.memory-mb.ratio. If java-opts are also not specified, we set it to 1024.</description>
</property>
<property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>2</value>
</property>
<property>
    <name>mapreduce.reduce.maxattempts</name>
    <value>4</value>
</property>
<property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.05</value>
</property>
<property>
    <name>mapreduce.task.timeout</name>
    <value>600000</value>
</property>
(2) Distribute the configuration
xsync mapred-site.xml
4 Yarn Parameter Tuning
(1) Modify the yarn-site.xml parameters as follows
<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
    <description>Number of threads to handle scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.client.thread-count</name>
    <value>8</value>
</property>
<property>
    <description>Enable auto-detection of node capabilities such as memory and CPU.</description>
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>false</value>
</property>
<property>
    <description>Flag to determine if logical processors (such as hyperthreads) should be counted as cores. Only applicable on Linux when yarn.nodemanager.resource.cpu-vcores is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true.</description>
    <name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
    <value>false</value>
</property>
<property>
    <description>Multiplier to determine how to convert physical cores to vcores. This value is used if yarn.nodemanager.resource.cpu-vcores is set to -1 (which implies auto-calculate vcores) and yarn.nodemanager.resource.detect-hardware-capabilities is set to true. The number of vcores will be calculated as number of CPUs * multiplier.</description>
    <name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
    <value>1.0</value>
</property>
<property>
    <description>Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated (in case of Windows and Linux). In other cases, the default is 8192MB.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>
<property>
    <description>Number of vcores that can be allocated for containers. This is used by the RM scheduler when allocating resources for containers. This is not used to limit the number of CPUs used by YARN containers. If it is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically determined from the hardware in case of Windows and Linux. In other cases, number of vcores is 8 by default.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
</property>
<property>
    <description>The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.</description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>
<property>
    <description>The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
</property>
<property>
    <description>The minimum allocation for every container request at the RM in terms of virtual CPU cores. Requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.</description>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>
<property>
    <description>The maximum allocation for every container request at the RM in terms of virtual CPU cores. Requests higher than this will throw an InvalidResourceRequestException.</description>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>
<property>
    <description>Whether virtual memory limits will be enforced for containers.</description>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <description>Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.</description>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
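These values mirror the hardware from the requirement analysis: each NodeManager advertises 4096 MB and 4 vcores to the ResourceManager, and individual containers are bounded between 1024 MB / 1 vcore and 2048 MB / 2 vcores. After the restart below you can check what each NodeManager actually registered with (the node ids for the second command come from the first):

yarn node -list
yarn node -status <node-id>   # shows the memory and vcore capacity of that node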
(2) Distribute the configuration
xsync yarn-site.xml
5 Run the Program
(1) Restart the cluster
sbin/stop-yarn.sh
sbin/start-yarn.sh
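The two commands above only bounce Yarn (run them on the ResourceManager node, hadoop103 here, judging by the 8088 address below). Because hadoop-env.sh, hdfs-site.xml and core-site.xml were also changed, HDFS has to be restarted as well before those settings take effect; a sketch using the standard scripts, run from $HADOOP_HOME on the NameNode host:

sbin/stop-dfs.sh
sbin/start-dfs.sh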
(2) Run the WordCount program
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output
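This assumes the 1 GB test data is already in /input and that /output does not exist yet (the job fails if the output directory is present). If the input still needs to be staged, a minimal preparation step looks like this, where word.txt is a hypothetical local file holding the test data:

hadoop fs -mkdir -p /input
hadoop fs -put word.txt /input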
(3) Observe the Yarn job execution page
http://hadoop103:8088/cluster/apps