Scala WordCount: notes on two environments


IDEA (local) environment:

package wordcount

import org.apache.spark.{SparkConf, SparkContext}

object wordCountScala extends App {
  // setMaster("local") runs Spark inside the IDE; drop it when submitting to a cluster
  val conf = new SparkConf().setAppName("Wordcount").setMaster("local")
  val sc = new SparkContext(conf)

  val line = sc.textFile("D:\\win7远程\\14期大数据潭州课件\\第三阶段:实时开发(plus)\\2020-0105-Spark-SQL\\数据\\wordcount.txt")

  val result = line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

  result.foreach(println)
}
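To run this from IDEA, the project only needs spark-core on the classpath. A minimal build.sbt sketch; the name and version match the jar used in the spark-submit command further down, while the Scala and Spark versions are assumptions to adjust to your cluster:

// minimal build.sbt sketch; the Scala and Spark versions below are assumptions
name := "scalaTest"
version := "1.0-SNAPSHOT"
scalaVersion := "2.12.15"

libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.2"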

HDFS (cluster) environment:

package wordcount

import org.apache.spark.{SparkConf, SparkContext}

object wordCountScala_HDFS extends App {
  // no setMaster here: the master is supplied on the spark-submit command line
  val conf = new SparkConf().setAppName("Wordcount")
  val sc = new SparkContext(conf)

  val line = sc.textFile("hdfs://bigdata166:9000/testdata/wordcount.txt")

  val result = line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

  result.foreach(println)  // printed on the executors
  result.collect()         // brings the result back to the driver
}

 

[root@bigdata166 bin]# ./spark-submit --master spark://bigdata166:7077 --class wordcount.wordCountScala_HDFS ../testjar/scalaTest-1.0-SNAPSHOT.jar
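If the standalone cluster is short on memory (see the pitfalls below), the same submission can carry explicit resource flags. A sketch only; the memory and core values are assumptions, not required settings:

./spark-submit \
  --master spark://bigdata166:7077 \
  --class wordcount.wordCountScala_HDFS \
  --executor-memory 1g \
  --total-executor-cores 2 \
  ../testjar/scalaTest-1.0-SNAPSHOT.jar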

 

Other pitfalls:

If spark-shell is left running, it keeps holding cluster resources and a submitted job can fail for lack of memory; the console then tells you to check the Spark UI page.

With collect() the output seems to end up in the driver log, while a plain println inside foreach seems to go to the executors' stdout.
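A minimal sketch of the difference, assuming the same result RDD as in the job above:

// runs on the executors: each executor prints to its own stdout,
// so little or nothing shows up on the driver console
result.foreach(println)

// collect() ships the data back to the driver first, so the println
// output appears in the driver's console / log (fine for small results)
result.collect().foreach(println)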

 
