Lesson 12: SparkR and MLlib Machine Learning (a Guided Read of the Spark Source Code)

Lesson notice: these Week 12 videos and slides are teaching materials for the Dataguru (炼数成金) online course. All materials may be used only within the course and must not be distributed outside it; violators may be held legally and financially liable. Course details: Dataguru (炼数成金) training.

This lesson covers:
Introduction to SparkR
SparkR examples
Spark MLlib

SparkR started in September 2013 as an independent project. In January 2014 the SparkR project was open-sourced on GitHub (github.com/amplab-extras/SparkR-pkg).

SparkR architecture: SparkR consists of the SparkR R package and a JVM backend. R code calls into the package, which forwards the work to Spark through the JVM backend (SparkR-pkg relies on rJava for this bridge, which is why rJava is installed below).
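To make that division of labor concrete, here is a minimal sketch of driving Spark from R, assuming the SparkR-pkg RDD API (sparkR.init / parallelize / lapply / collect); the master URL and data are placeholders:

library(SparkR)
# sparkR.init starts (or connects to) the JVM backend and returns a SparkContext handle
sc <- sparkR.init(master = "local")
# parallelize ships a local vector to the cluster as an RDD with 2 partitions
rdd <- parallelize(sc, 1:100, 2)
# SparkR's lapply generic runs the function on the workers, not in the local R session
squares <- lapply(rdd, function(x) x * x)
# collect brings the distributed result back as a local R list
head(collect(squares))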

Compiling and installing R 3.1.1

Check the operating system:

[root@feng03 R-3.1.1]# lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.6 (Final)
Release:        6.6
Codename:       Final

Required dependency packages

1. Install gcc:

[root@feng03 ~]# yum install gcc
Loaded plugins: fastestmirror, security
Setting up Install Process
...
Dependencies Resolved

 Package      Arch     Version          Repository  Size
Installing:
 gcc          x86_64   4.4.7-16.el6     base        10 M
Installing for dependencies:
 cloog-ppl    x86_64   0.15.7-1.2.el6   base        93 k
 ...
Updating for dependencies:
 libgcc       x86_64   4.4.7-16.el6     base        103 k

Transaction Summary
Install       5 Package(s)
Upgrade       2 Package(s)

Installed:
  gcc.x86_64 0:4.4.7-16.el6
Dependency Installed:
  cloog-ppl.x86_64 0:0.15.7-1.2.el6   cpp.x86_64 0:4.4.7-16.el6
  mpfr.x86_64 0:2.4.1-6.el6           ppl.x86_64 0:0.10.2-11.el6
Complete!

Install the remaining build dependencies the same way:

yum install gcc-c++
yum install gcc-gfortran
yum install pcre-devel
yum install tcl-devel
yum install zlib-devel
yum install bzip2-devel
yum install libX11-devel
yum install readline-devel
yum install libXt-devel
yum install tk-devel
yum install tetex-latex

2. Download the R source from a CRAN mirror:

[jifeng@feng03 r]$ wget http://<mirror>/cran/src/base/R-3/R-3.1.1.tar.gz

3. Extract:

[jifeng@feng03 r]$ tar -zxf R-3.1.1.tar.gz

4. Configure and install:

[root@feng03 R-3.1.1]# ./configure --enable-R-shlib
[root@feng03 R-3.1.1]# make && make install

5. Start R:

[root@feng03 R-3.1.1]# R

R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
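One detail worth noting: --enable-R-shlib builds R as a shared library (libR.so), which rJava needs in the next step. A quick way to see where that library lives, from the new R session (this check is not in the slides; the path matches the -L/usr/local/lib64/R/lib flag that appears in the build log later):

# R.home("lib") prints R's library directory; with --enable-R-shlib it contains libR.so
R.home("lib")
# [1] "/usr/local/lib64/R/lib"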

Installing R packages for SparkR

In the R shell, install rJava first (choose a CRAN mirror when prompted):

install.packages("rJava")

Then install devtools:

install.packages("devtools")

On this CentOS machine the install fails because several system libraries are missing:

Configuration failed because libcurl was not found. Try installing:
 * deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)
 * rpm: libcurl-devel (Fedora, CentOS, RHEL)
 * csw: libcurl_dev (Solaris)
If libcurl is already installed, check that pkg-config is in your PATH and
PKG_CONFIG_PATH contains a libcurl.pc file. If pkg-config is unavailable you can
set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
configure: error: OpenSSL library required
Please install:
  libssl-dev (deb) or openssl-devel (rpm)
See config.log for more details
ERROR: configuration failed for package 'git2r'
* removing '/home/jifeng/R/x86_64-unknown-linux-gnu-library/3.1/git2r'
ERROR: dependency 'curl' is not available for package 'httr'
* removing '/home/jifeng/R/x86_64-unknown-linux-gnu-library/3.1/httr'
ERROR: dependencies 'curl', 'xml2' are not available for package 'rversions'
* removing '/home/jifeng/R/x86_64-unknown-linux-gnu-library/3.1/rversions'
ERROR: dependencies 'httr', 'curl', 'rversions', 'git2r' are not available for package 'devtools'
* removing '/home/jifeng/R/x86_64-unknown-linux-gnu-library/3.1/devtools'
The downloaded source packages are in
'/tmp/Rtmp1A16li/downloaded_packages'
Warning messages:
1: In install.packages("devtools") :
  installation of package 'xml2' had non-zero exit status
2: In install.packages("devtools") :
  installation of package 'curl' had non-zero exit status
3: In install.packages("devtools") :
  installation of package 'git2r' had non-zero exit status
4: In install.packages("devtools") :
  installation of package 'httr' had non-zero exit status
5: In install.packages("devtools") :
  installation of package 'rversions' had non-zero exit status
6: In install.packages("devtools") :
  installation of package 'devtools' had non-zero exit status

Following the hints, quit the R shell and install the missing system libraries:

[root@feng03 ~]# yum install libcurl-devel
[root@feng03 ~]# yum install openssl-devel
[root@feng03 ~]# yum install libxml2-devel

Back in the R shell, install the packages that failed, then devtools again:

install.packages("git2r")
install.packages("xml2")
install.packages("rversions")
install.packages("devtools")

Installing SparkR

With devtools in place, try installing SparkR directly from GitHub:

library(devtools)
install_github("amplab-extras/SparkR-pkg", subdir="pkg")

The download will not complete from this machine, so this route fails. Install from the tarball instead.

Installing SparkR from the GitHub tarball

Fetch the tarball of the master branch from the project page (/amplab-extras/SparkR-pkg/tarball/master) and unpack it:

[jifeng@feng03 r]$ ls
master  R-3.1.1  R-3.1.1.tar.gz
[jifeng@feng03 r]$ mv master SparkR-pkg.gz
[jifeng@feng03 r]$ ls
R-3.1.1  R-3.1.1.tar.gz  SparkR-pkg.gz
[jifeng@feng03 r]$ tar zxf SparkR-pkg.gz
[jifeng@feng03 r]$ ls
amplab-extras-SparkR-pkg-e532627  R-3.1.1  R-3.1.1.tar.gz  SparkR-pkg.gz
[jifeng@feng03 r]$ cd amplab-extras-SparkR-pkg-e532627/
[jifeng@feng03 amplab-extras-SparkR-pkg-e532627]$ ls
BUILDING.md  create-docs.sh  examples  install-dev.bat  install-dev.sh  LICENSE  pkg  README.md  run-tests.sh  sparkR  SparkR_IDE_Setup.sh  TODO.md

Run the install script:

[jifeng@feng03 amplab-extras-SparkR-pkg-e532627]$ ./install-dev.sh
* installing *source* package 'SparkR' ...
** libs
** arch -
./sbt/sbt assembly
Attempting to fetch sbt
# 100.0%
Launching sbt from sbt/sbt-launch-0.13.6.jar
Error: Invalid or corrupt jarfile sbt/sbt-launch-0.13.6.jar
make: *** [target/scala-2.10/sparkr-assembly-0.1.jar] Error 1
ERROR: compilation failed for package 'SparkR'
* removing '/home/jifeng/r/amplab-extras-SparkR-pkg-e532627/lib/SparkR'

The sbt launcher that the script fetched is corrupt. Look at the scripts to see where it comes from:

[jifeng@feng03 amplab-extras-SparkR-pkg-e532627]$ cat ./install-dev.sh
...
# Install R package
R CMD INSTALL --library=$LIB_DIR pkg/

[jifeng@feng03 sbt]$ cat sbt
SBT_VERSION=`awk -F "=" '/sbt.version/ {print $2}' ./project/build.properties`
URL1=http://.../typesafe/ivy-releases/org.scala-sbt/sbt-launch/${SBT_VERSION}/sbt-launch.jar
URL2=http://.../typesafe/ivy-releases/org.scala-sbt/sbt-launch/${SBT_VERSION}/sbt-launch.jar
JAR=sbt/sbt-launch-${SBT_VERSION}.jar
...
printf "Launching sbt from $JAR\n"
java -Xmx1200m -XX:MaxPermSize=350m -XX:ReservedCodeCacheSize=256m -jar $JAR "$@"

Change JAR to point at the sbt launcher already installed on this machine:

JAR=/home/jifeng/sbt/bin/sbt-launch.jar

Run the install script again:

[jifeng@feng03 amplab-extras-SparkR-pkg-e532627]$ ./install-dev.sh
* installing *source* package 'SparkR' ...
** libs
** arch -
./sbt/sbt assembly
Launching sbt from /home/jifeng/sbt/bin/sbt-launch.jar
Getting org.scala-sbt sbt 0.13.6 ...
downloading https://.../typesafe/ivy-releases/org.scala-sbt/sbt/0.13.6/jars/sbt.jar ...
  [SUCCESSFUL ] org.scala-sbt#sbt;0.13.6!sbt.jar (14210ms)
downloading https://.../typesafe/ivy-releases/org.scala-sbt/main/0.13.6/jars/main.jar ...
  [SUCCESSFUL ] org.scala-sbt#main;0.13.6!main.jar (56723ms)
downloading https://.../compiler-interface-bin.jar ...
  [SUCCESSFUL ] compiler-interface-bin.jar (33548ms)
downloading https://.../compiler-interface-src.jar ...
  [SUCCESSFUL ] compiler-interface-src.jar (17777ms)
...
[success] Total time: 592 s, completed Sep 30, 2015 10:15:33 PM
cp -f target/scala-2.10/sparkr-assembly-0.1.jar ./inst/
R CMD SHLIB -o SparkR.so string_hash_code.c
make[1]: Entering directory `/home/jifeng/r/amplab-extras-SparkR-pkg-e532627/pkg/src'
gcc -std=gnu99 -I/usr/local/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -g -O2 -c string_hash_code.c -o string_hash_code.o
gcc -std=gnu99 -shared -L/usr/local/lib64 -o SparkR.so string_hash_code.o -L/usr/local/lib64/R/lib -lR
make[1]: Leaving directory `/home/jifeng/r/amplab-extras-SparkR-pkg-e532627/pkg/src'
installing to /home/jifeng/r/amplab-extras-SparkR-pkg-e532627/lib/SparkR/libs
** R
** inst
** preparing package for lazy loading
Creating a generic function for 'lapply' from package 'base' in package 'SparkR'
Creating a generic function for 'Filter' from package 'base' in package 'SparkR'
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (SparkR)
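install-dev.sh places the built package under the lib directory inside the repository. As a quick smoke test (the path comes from the transcript above; the test itself is not in the slides), load it from there and run a trivial round trip through the JVM backend:

# Load SparkR from the repo-local library directory
library(SparkR, lib.loc = "/home/jifeng/r/amplab-extras-SparkR-pkg-e532627/lib")
# Start a local Spark backend and get a SparkContext handle
sc <- sparkR.init(master = "local")
# Distribute ten numbers and bring them straight back
collect(parallelize(sc, 1:10, 2))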

SparkR example: pi

./sparkR examples/pi.R local

Running sparkR (from the project README):

If you have installed it directly from GitHub, you can include the SparkR package and then initialize a SparkContext. For example, to run with a local Spark master you can launch R and then run:

library(SparkR)
sc <- sparkR.init(master="local")

If you have cloned and built SparkR, you can start using it by launching the SparkR shell with ./sparkR.

SparkR also comes with several sample programs in the examples directory. To run one of them, use ./sparkR <filename> <args>. For example:

./sparkR examples/pi.R local[2]

You can also run the unit tests for SparkR (this needs the testthat package) by running:

install.packages("testthat")
./run-tests.sh
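For reference, here is a minimal sketch of what a pi.R-style program can look like with the SparkR-pkg API (sparkR.init / parallelize / lapply / reduce); the sample and slice counts are illustrative, not the bundled example's exact code:

library(SparkR)

# The first command-line argument is the Spark master, e.g. "local[2]"
args <- commandArgs(trailing = TRUE)
sc <- sparkR.init(args[[1]], "PiR")

slices <- 2
n <- 100000 * slices

# One Monte-Carlo trial: does a random point land inside the unit circle?
hit <- function(i) {
  x <- runif(1, -1, 1)
  y <- runif(1, -1, 1)
  if (x * x + y * y < 1) 1 else 0
}

# Run the trials on the workers and sum the hits back on the driver
count <- reduce(lapply(parallelize(sc, 1:n, slices), hit), function(a, b) a + b)
cat("Pi is roughly", 4.0 * count / n, "\n")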

SparkR DataFrames

(This part uses the SparkR that ships with Spark 1.4, started via ./bin/sparkR.)

./bin/sparkR --master spark://feng03:7077

Start from a SparkContext and an SQLContext:

sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

Construct a DataFrame from a local R data frame:

df <- createDataFrame(sqlContext, faithful)
head(df)

Construct a DataFrame from a data source:

people <- read.df(sqlContext, "file:///home/jifeng/spark-1.4.0-bin-hadoop2.6/examples/src/main/resources/people.json", "json")
head(people)
printSchema(people)

Run SQL queries from SparkR:

registerTempTable(people, "people")
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")
head(teenagers)
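The same query can also be written with the SparkR DataFrame API instead of SQL; a small sketch using the people DataFrame above (select and filter are Spark 1.4 SparkR functions; the cutoff value is illustrative):

# Keep only the name column
head(select(people, people$name))

# Keep only rows whose age is below 20
head(filter(people, people$age < 20))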

Spark example (Spark MLlib k-means, in Scala)

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.Row

// sqldata holds rows of (locationid, allnum, allamount) produced by a Spark SQL query
val parsedData = sqldata.map { case Row(locationid, allnum, allamount) =>
  val features = Array[Double](allnum.toString.toDouble, allamount.toString.toDouble)
  Vectors.dense(features)
}
parsedData.collect().foreach(println)

// Cluster the data set into 3 classes with 20 iterations to build the model
// (note: this runs with the partition count configured earlier)
val numClusters = 3
val numIterations = 20
val model = KMeans.train(parsedData, numClusters, numIterations)

// Use the model to classify the input data and print the result
val result1 = sqldata.map { case Row(locationid, allnum, allamount) =>
  val features = Array[Double](allnum.toString.toDouble, allamount.toString.toDouble)
  val linevectore = Vectors.dense(features)
  val prediction = model.predict(linevectore)
  locationid + " " + allnum + " " + allamount + " " + prediction
}
result1.collect().foreach(println)

// Save the result to a file
val result2 = sqldata.map { case Row(locationid, allnum, allamount) =>
  val features = Array[Double](allnum.toString.toDouble, allamount.toString.toDouble)
  val linevectore = Vectors.dense(
