Environment: Hortonworks (HDP) 2.3, Ambari 2.1.1, Hadoop 2.7.1
1. Download the RHadoop packages
Download the R source tarball from https://cran.r-project.org/src/base/R-3/
The files I downloaded:
https://cran.r-project.org/src/base/R-3/R-3.2.3.tar.gz
https://github.com/RevolutionAnalytics/rmr2/releases/download/3.3.1/rmr2_3.3.1.tar.gz
https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.8.tar.gz
https://github.com/RevolutionAnalytics/rhbase/blob/master/build/rhbase_1.2.1.tar.gz
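One caveat when fetching these from the command line: the two links containing /blob/ open GitHub's file-view page, so wget would save HTML instead of the tarball. A minimal sketch, assuming GitHub's usual /blob/ to /raw/ URL convention; to_raw_url is a helper name made up for this example, and the loop only prints the wget commands instead of running them:

```shell
# Hypothetical helper: turn a GitHub "blob" page URL into its "raw" download form.
# URLs without /blob/ pass through unchanged.
to_raw_url() {
  printf '%s\n' "$1" | sed 's#/blob/#/raw/#'
}

urls="
https://cran.r-project.org/src/base/R-3/R-3.2.3.tar.gz
https://github.com/RevolutionAnalytics/rmr2/releases/download/3.3.1/rmr2_3.3.1.tar.gz
https://github.com/RevolutionAnalytics/rhdfs/blob/master/build/rhdfs_1.0.8.tar.gz
https://github.com/RevolutionAnalytics/rhbase/blob/master/build/rhbase_1.2.1.tar.gz
"

# Dry run: print the download commands; drop the echo to actually download.
for u in $urls; do
  echo "wget $(to_raw_url "$u")"
done
```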
2. Install R on CentOS 6.5
First install the build dependencies, then unpack and build R from source:
# yum install gcc-gfortran
# yum install gcc gcc-c++
# yum install readline-devel
# yum install libXt-devel
# tar xvf R-3.2.3.tar.gz
# cd R-3.2.3
# ./configure
# make
# make install
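After make install it is worth checking that both R and Rscript are reachable, since Hadoop streaming calls Rscript later on. A small sketch; missing_cmds is a helper name invented here, not part of R or Hadoop:

```shell
# Hypothetical helper: print every given command that is not found on PATH.
missing_cmds() {
  for c in "$@"; do
    command -v "$c" >/dev/null 2>&1 || printf '%s\n' "$c"
  done
}

missing=$(missing_cmds R Rscript)
if [ -n "$missing" ]; then
  echo "not on PATH:" $missing
else
  echo "R and Rscript found"
fi
```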
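3. Confirm the Java environment variables
RHadoop depends on the rJava package. Before installing rJava, make sure the Java environment variables are configured, then let R pick up the JVM configuration with R CMD javareconf.
Append the following to the end of /etc/profile: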
########################################
export JAVA_HOME=/usr/java/jdk1.7.0_79
export JRE_HOME=/usr/java/jdk1.7.0_79/jre
export PATH=$JAVA_HOME/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_CMD=/usr/bin/hadoop
export HADOOP_STREAMING=/usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export JAVA_HOME JRE_HOME PATH CLASSPATH
########################################
[root@dataserver R-3.2.3]# source /etc/profile
[root@dataserver R-3.2.3]# R CMD javareconf
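Before going further, a quick sanity check that the variables from /etc/profile are set in the current shell and point at paths that exist can save some debugging. This is only a sketch; check_path_var is a name made up for this example:

```shell
# Hypothetical helper: verify that the named variable is set and its value exists on disk.
check_path_var() {
  name=$1
  eval "value=\${$name:-}"   # indirect lookup of the variable called $name
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  fi
  if [ ! -e "$value" ]; then
    echo "$name points to a missing path: $value"
    return 1
  fi
  echo "$name=$value"
}

# Report on each variable; keep going even if one check fails.
for v in JAVA_HOME HADOOP_CMD HADOOP_STREAMING HADOOP_HOME; do
  check_path_var "$v" || true
done
```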
4. Install the R packages that RHadoop depends on, so that the RHadoop packages work properly
[root@dataserver R-3.2.3]# R
> install.packages("rJava")
> install.packages("reshape2")
> install.packages("Rcpp")
> install.packages("iterators")
> install.packages("itertools")
> install.packages("digest")
> install.packages("RJSONIO")
> install.packages("functional")
> install.packages("bitops")
> install.packages("caTools")
> quit()
Or install them in a single call:
> install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"))
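The same install can also be run without an interactive R session. The sketch below only builds the R expression; r_install_expr is a helper invented for this example, and the CRAN mirror URL is my assumption (any mirror works):

```shell
# Hypothetical helper: build an install.packages() call for the given package names.
r_install_expr() {
  list=""
  for p in "$@"; do
    list="${list:+$list, }\"$p\""   # append "pkg", comma-separated
  done
  printf 'install.packages(c(%s), repos = "https://cloud.r-project.org")\n' "$list"
}

# Print the expression; to run it for real: Rscript -e "$(r_install_expr rJava Rcpp)"
r_install_expr rJava Rcpp RJSONIO bitops digest functional stringr plyr reshape2 caTools
```

Passing repos explicitly avoids the interactive mirror prompt that install.packages() otherwise shows.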
5. Install the RHadoop packages
[root@dataserver R-3.2.3]# export HADOOP_CMD=/usr/bin/hadoop
[root@dataserver R-3.2.3]# export HADOOP_STREAMING=/usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar
[root@dataserver R-3.2.3]# R CMD INSTALL rhdfs_1.0.8.tar.gz
[root@dataserver R-3.2.3]# R CMD INSTALL rmr2_3.3.1.tar.gz
[root@dataserver R-3.2.3]# R CMD INSTALL rhbase_1.2.1.tar.gz
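The three installs can be wrapped in a loop that stops at the first failure. A rough sketch, assuming the tarballs sit in the current directory; install_tarballs and the DRY_RUN switch are names I made up, and by default the loop only prints what it would run:

```shell
# DRY_RUN=1 (default here) prints the commands; set DRY_RUN=0 to really install.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: install the given R source tarballs in order.
install_tarballs() {
  for t in "$@"; do
    if [ "$DRY_RUN" = 1 ]; then
      echo "R CMD INSTALL $t"
    elif [ -f "$t" ]; then
      R CMD INSTALL "$t" || return 1
    else
      echo "missing tarball: $t" >&2
      return 1
    fi
  done
}

install_tarballs rhdfs_1.0.8.tar.gz rmr2_3.3.1.tar.gz rhbase_1.2.1.tar.gz
```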
6. Use the RHadoop packages
[root@dataserver R-3.2.3]# R
> library(rhdfs)
> hdfs.init()
> hdfs.ls("/")
If HADOOP_HOME was not already exported in the shell that started R, set it inside the session:
> Sys.setenv(HADOOP_HOME = "/usr/hdp/current/hadoop-client")
> library(rmr2)
A plain R program:
> small.ints = 1:10
> sapply(small.ints, function(x) x^2)
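The same computation as a MapReduce program in R:
> small.ints = to.dfs(1:10)
> out = mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
> from.dfs(out)
mapreduce() returns a handle to its output in HDFS. In my run that output was the temporary path /tmp/RtmpWnzxl4/file5deb791fcbd5; the path differs on every run, so pass the returned object to from.dfs() rather than typing the path.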
If the following exception appears:
Caused by: java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    ... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 24 more
then the streaming task cannot find Rscript on its PATH. Create a symbolic link:
ln -s /usr/local/bin/Rscript /usr/bin/Rscript
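The link command fails if /usr/bin/Rscript already exists, so a guarded version may be handy when repeating this on several machines (the map and reduce tasks call Rscript on whichever node they run, so presumably every worker node needs it). A sketch only; link_if_missing is a helper name invented here:

```shell
# Hypothetical helper: create a symlink dst -> src unless dst already exists.
link_if_missing() {
  src=$1
  dst=$2
  if [ ! -x "$src" ]; then
    echo "source not found: $src"
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "already present: $dst"
    return 0
  fi
  ln -s "$src" "$dst" && echo "linked $dst -> $src"
}

# Example call (needs root):
# link_if_missing /usr/local/bin/Rscript /usr/bin/Rscript
```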
Installing R on CentOS 7 is much simpler. The steps are:
yum install epel-release
yum install R
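To decide which route applies on a given machine, the major release number can be read from /etc/redhat-release. A minimal sketch, assuming the usual "CentOS ... release N.M" wording of that file; centos_major is a made-up helper name and the script only prints a suggestion:

```shell
# Hypothetical helper: extract the major version from a redhat-release style string.
centos_major() {
  printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\).*/\1/p'
}

if [ -r /etc/redhat-release ]; then
  major=$(centos_major "$(cat /etc/redhat-release)")
  if [ "$major" = 7 ]; then
    echo "CentOS 7: yum install epel-release && yum install R"
  else
    echo "release $major: build R from source as in step 2"
  fi
fi
```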