HiBench 7.x 使用问题整理

小编 2026-06-16 阅读:514 评论:0
一. 介绍 HiBench是一款用于hadoop集群性能测试的开源工具。支持MR,HIVE,SPARK等计算框架,且支持多种维度的测试。早在HiBench5.0的时候作者就用过,个人感觉还不错,无论是ap...

一. 介绍

HiBench是一款用于hadoop集群性能测试的开源工具。支持MR,HIVE,SPARK等计算框架,且支持多种维度的测试。早在HiBench5.0的时候作者就用过,个人感觉还不错,无论是apache hadoop,cdh还是hdp都可以支持。

最新发现,HiBench 7.x 版本发布了。因此就趁着机会,结合新版本总结一下使用中遇到的问题,助大家过坑。

HiBench的github地址如下:https://github.com/intel-hadoop/HiBench

 

环境:

OS:CentOS 7.3

Hadoop: HDP 2.6.5.x

HiBench: 7.1

 

二. 安装

安装方式:

使用maven命令即可,具体请见:https://github.com/intel-hadoop/HiBench/blob/master/docs/build-hibench.md

 

问题整理:

maven构建问题:

问题描述:

maven构建过程中报以下错误(以mahout为例):

INFO] --- download-maven-plugin:1.2.0:wget (default) @ mahout ---
Downloading: http://archive.apache.org/dist/mahout/0.11.0//apache-mahout-distribution-0.11.0.tar.gz
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[WARNING] Could not get content
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[WARNING] Retrying (1 more)
Downloading: http://archive.apache.org/dist/mahout/0.11.0//apache-mahout-distribution-0.11.0.tar.gz
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[WARNING] Could not get content
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
	at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
	at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
	at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
	at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
	at com.googlecode.WGet.doGet(WGet.java:293)
	at com.googlecode.WGet.execute(WGet.java:223)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)

层主在构建\"hadoopbench-sql\",\"mahout\"和\"nutchindexing\"时都遇到了类似问题。

 

问题原因:

网站访问协议更换导致的对应文件下载异常。

 

解决方法:

方式1. 将pom.xml中的\"http\"修改为\"https\",例如:

<properties>
    <repo>https://archive.apache.org</repo>
    <file>dist/hive/hive-0.14.0/apache-hive-0.14.0-bin.tar.gz</file>
</properties>

方式2. 将对应文件在\"https://archive.apache.org/dist\"此源中下载至本地http,再修改pom.xml中的文件路径,例如:

<properties>
    <repo>http://private.repo.io/archive.apache.org</repo>
    <file>dist/hive/hive-0.14.0/apache-hive-0.14.0-bin.tar.gz</file>
</properties>

第一种方式最简单。如果网速不给力,请用第二种方法。

 

三. 使用

使用方式:

具体不同的测试请参见:https://github.com/intel-hadoop/HiBench/blob/master/docs/ 中的\"run-*.md\"

 

问题整理:

===================================================================

权限问题:

问题描述:

执行HiBench命令报错:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode=\"/var/tmp\":hdfs:hdfs:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:325)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:246)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1934)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1917)
	at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4185)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1109)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:645)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
	at org.apache.hadoop.ipc.Client.call(Client.java:1498)
	at org.apache.hadoop.ipc.Client.call(Client.java:1398)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:610)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:290)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:202)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:184)
	at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3099)
	... 21 more

问题原因:

HiBench在使用过程中,会需要在HDFS下建立目录,如果默认使用root用户进行测试,且HDFS开启简单权限模式的话(hdfs为超级用户),自然会权限限制。

 

解决方法:

预先创建以下目录,并修改权限(暴力点可以直接给777,或是确定进行测试用户具备相应目录的写权限):

注:部分目录可通过conf/hibench.conf进行修改,以下目录参考是以默认配置文件为基准的。

/HiBench
/var
/user/root # root是作者用来测试的用户

===================================================================

sparkbench 日志没有输出到job history的问题:

问题描述:

使用默认的HiBench的运行命令执行spark作业测试,例如:

bin/workloads/sql/scan/spark/run.sh

该spark作业的日志将不会出现在job history server中。

 

问题原因:

HiBench的作业默认使用的是自生成的spark.conf而不是使用实际环境下的spark-defaults.conf。

 

解决方法:

首先,根据实际环境下的spark-defaults.conf找到以下用于规定日志输出的三个参数:

spark.eventLog.enabled
spark.yarn.historyServer.address
spark.eventLog.dir

将上述三个参数拼接成用于spark-submit的conf信息:

--conf \"spark.eventLog.enabled=true\" --conf \"spark.yarn.historyServer.address=node1.com:18081\" --conf \"spark.eventLog.dir=hdfs:///spark2-history/\" 

修改HiBench的\"bin/functions/workload_functions.sh\"文件,将上述conf信息添加至提交命令中,大概在216行(修改前请备份):

修改前:

if [[ \"$CLS\" == *.py ]]; then
        LIB_JARS=\"$LIB_JARS --jars ${SPARKBENCH_JAR}\"
        SUBMIT_CMD=\"${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${CLS} $@\"
else        
        SUBMIT_CMD=\"${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${SPARKBENCH_JAR} $@\"
fi

修改后:

if [[ \"$CLS\" == *.py ]]; then
        LIB_JARS=\"$LIB_JARS --jars ${SPARKBENCH_JAR}\"
        SUBMIT_CMD=\"${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --conf \"spark.eventLog.enabled=true\" --conf \"spark.yarn.historyServer.address=master1.com:18081\" --conf \"spark.eventLog.dir=hdfs:///spark2-history/\" --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${CLS} $@\"
else
        SUBMIT_CMD=\"${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --conf \"spark.eventLog.enabled=true\" --conf \"spark.yarn.historyServer.address=master1.com:18081\" --conf \"spark.eventLog.dir=hdfs:///spark2-history/\" --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${SPARKBENCH_JAR} $@\"
fi

===================================================================

Streaming原始数据由HDFS向Kafka输入失败的问题:

问题描述:

执行以下命令将在HDFS生成的种子数据发往Kafka:

bin/workloads/streaming/identity/prepare/dataGen.sh

发生异常:

====================================================
log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.producer.ProducerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread \"pool-1-thread-1\" java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:144)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
        at com.intel.hibench.datagen.streaming.util.SourceFileReader.getReader(SourceFileReader.java:33)
        at com.intel.hibench.datagen.streaming.util.CachedData.<init>(CachedData.java:56)
        at com.intel.hibench.datagen.streaming.util.CachedData.getInstance(CachedData.java:43)
        at com.intel.hibench.datagen.streaming.util.KafkaSender.<init>(KafkaSender.java:55)
        at com.intel.hibench.datagen.streaming.DataGenerator$DataGeneratorJob.run(DataGenerator.java:116)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: mycluster
        ... 20 more

 

问题原因:

作者本人的hadoop集群的HDFS是采用高可用部署的,namenode为active/standby模式,拥有一个namespace叫\"mycluster\",因此默认在HiBench的conf/hadoop.conf中的配置信息为:

# The root HDFS path to store HiBench data
hibench.hdfs.master       hdfs://mycluster

 

具体原因应该出自数据传输的实现,请见\"com.intel.hibench.datagen.streaming.DataGenerator\"。目测是该实现不支持namespace的访问方式,本人没有细究,有兴趣的可以去看下上述类的源码。

 

解决方法:

将上述conf/hadoop.conf中的配置信息修改为指定的namenode信息即可,例如:

# The root HDFS path to store HiBench data
hibench.hdfs.master       hdfs://node1.com:8020

===================================================================

其它:

SequenceFile格式化:

很多默认的HiBench的测试结果文件都是以SequenceFile进行输出的(mahout必须是SequenceFile),具体请见各个作业的执行脚本,例如Wordcount的run.sh(截取部分):

run_hadoop_job ${HADOOP_EXAMPLES_JAR} wordcount \\
    -D mapreduce.job.maps=${NUM_MAPS} \\
    -D mapreduce.job.reduces=${NUM_REDS} \\
    -D mapreduce.inputformat.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat \\
    -D mapreduce.outputformat.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat \\
    -D mapreduce.job.inputformat.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat \\
    -D mapreduce.job.outputformat.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat \\
    ${INPUT_HDFS} ${OUTPUT_HDFS}
END_TIME=`timestamp`

为了方便阅读或是结果数据的后续使用,可通过如下方式进行转化:

使用FS SHELL进行文本输出:

sudo -u hdfs hadoop fs -text /path/xxx

使用MR(转载)实现文件转换:

https://blog.csdn.net/sinat_29508201/article/details/49155127

使用Spark(pyspark):

# SparkSession available as \'spark\'
reader=sc.sequenceFile(\"xxxx\",keyClass=\"org.apache.hadoop.io.Text\",valueClass=\"org.apache.hadoop.io.IntWritable\")

# 直接打印出来
def out(x):
    print x
reader.foreach(out)

# 转为sql
df=spark.read.json(reader.map(lambda x: x))

 

版权声明

本文仅代表作者观点,不代表百度立场。
本文系作者授权百度百家发表,未经许可,不得转载。

热门文章
  • 机房智能化温湿度解决方式之POE供电以太网温湿度传感器

    机房智能化温湿度解决方式之POE供电以太网温湿度传感器
    机房智能化温湿度解决方式之POE供电以太网温湿度传感器 北京盈创力和电子科技有限公司 智能型TCP网口温湿度记录仪 北京IP网络温湿度记录仪厂家,北京盈创力和 北京智能型TCP网口温湿度记录仪IP网络温湿度记录仪是一种新型的基于TCP/IP协议双绞线以太网标准温湿度采集模块,利用它可以实现现场温度值、相对湿度值的采集,同时利用其自身的RJ45通信接口可以方便地和机房监控主机或交换机集线器进行联网。 工作于-40℃~85℃工业级带...
  • Sequential Monte Carlo Methods (SMC) 序列蒙特卡洛/粒子滤波/Bootstrap Filtering

    Sequential Monte Carlo Methods (SMC) 序列蒙特卡洛/粒子滤波/Bootstrap Filtering
    Problem Statement 我们考虑一个具有马尔可夫性质、非线性、非高斯的状态空间模型(State Space Model):对于一个时间序列上的观测结果{yt,t∈N}\\{ y_t , t \\in N \\}{yt​,t∈N},我们认为每个观测结果yty_tyt​的生成依赖于一个无法直接观察的隐变量xt∈{xt,t∈N}x_t \\in \\{x_t , t \\in N \\}xt​∈{xt​,t∈N},即:p(...
  • HTTP状态保持的原理

    HTTP状态保持的原理
    a)在用户登录之后,浏览器返回响应的时候会在响应中添加上cookieb)浏览器接收到cookie之后会自动保存c)当用户再次请求同一服务器中的其他网页的时候,浏览器会自动带上之前保存的cookied)服务接收到请求之后可以请 request 对象中取到cookie 判断当前用户是否登录  Http是无状态的,就是连接时数据互通,关闭后...
  • Hive 系统函数及示例

    Hive 系统函数及示例
    查看所有系统函数 show functions; 函数分类 内置函数【系统函数】 数学函数: floor、round、ceil、cos、log2等 字符串函数: length、reverse、trim、lower、get_json_object、repeat等 收集函数: size 转换函数: cast 日期函数: year、month、datediff、date、date_add等 条件函数: coalesce、case…w...
  • CSRF的原理和防范措施

    CSRF的原理和防范措施
    a)攻击原理:i.用户C访问正常网站A时进行登录,浏览器保存A的cookieii.用户C再访问攻击网站B,网站B上有某个隐藏的链接或者图片标签会自动请求网站A的URL地址,例如表单提交,传指定的参数iii.而攻击网站B在访问网站A的时候,浏览器会自动带上网站A的cookieiv.所以网站A在接收到请求之后可判断当前用户是登录状态,所以...
标签列表