admin 管理员组

文章数量: 887044

SparkSql

2020.12.08号作业题

1.启动redis

redis-server /usr/local/redis/redis.conf

2.写代码

问题1.计算出总的成交量总额(结果保存到redis中)
问题2.计算每个商品分类的成交量(结果保存到redis中)
问题3.计算每个省份的成交总额(结果保存到redis)

object Work {def main(args: Array[String]): Unit = {Logger.getLogger("org").setLevel(Level.WARN)//1.连接redisval pool: JedisPool = new JedisPool(new GenericObjectPoolConfig, "qianfeng01", 6379)val jedis: Jedis = pool.getResourcejedis.auth("123456")//2.拿到数据val spark: SparkSession = SparkSession.builder().appName("work").master("local").getOrCreate()import spark.implicits._val df: DataFrame = spark.read.format("csv").load("data/producer.csv")val df1: DataFrame = df.toDF("id", "ip", "producer", "type", "price","province")df1.printSchema()df1.createTempView("t1")//问题1.计算出总的成交量总额(结果保存到redis中)val sql="""|select sum(cast(price as int)) as sum|from t1|""".stripMargin//spark.sql(sql).show()val frame: DataFrame = spark.sql(sql)val rows: Array[Row] = frame.collect()for(a<-rows){//保存到redisjedis.set("总的成交量总额",a.get(0).toString)}//问题2.计算每个商品分类的成交量(结果保存到redis中)val sql1="""|select producer,sum(cast(price as int)) as sum|from t1|group by producer|""".stripMarginval frame1: DataFrame = spark.sql(sql1)frame1.show()val rows1: Array[Row] = frame1.collect()for(b<-rows1){//保存到redisjedis.set(b.get(0).toString,b.get(1).toString)}//问题3.计算每个省份的成交总额(结果保存到redis)val sql2="""|select province,sum(cast(price as int)) as sum|from t1|group by province|""".stripMarginval frame2: DataFrame = spark.sql(sql2)frame2.show()val rows2: Array[Row] = frame2.collect()for(row<-rows2){//保存到redisjedis.set(row.get(0).toString,row.get(1).toString)}}}

3.参考数据

A0001,202.106.196.115,手机,iphone8,8000,海南省
A0002,202.106.196.116,服装,Tshirt,450,湖南省
A0003,202.106.196.117,药品,阿莫西林,40,广东省
A0004,202.106.196.118,药品,板蓝根,23,湖北省
A0005,202.106.196.119,手机,iphone9,8000,海南省
A0006,202.106.196.120,服装,Tshirt,320,海南省
A0007,202.106.196.121,药品,阿莫西林,40,湖北省
A0008,202.106.196.122,药品,板蓝根,23,湖北省
A0009,202.106.196.123,手机,iphone10,8000,湖北省
A0010,202.106.196.124,服装,Tshirt,450,湖北省
A0011,202.106.196.125,药品,阿莫西林,40,湖北省
A0012,202.106.196.126,药品,板蓝根,23,广东省
A0013,202.106.196.127,手机,iphone11,8000,湖南省
A0014,202.106.196.128,服装,Tshirt,450,湖南省
A0015,202.106.196.129,药品,阿莫西林,40,湖南省
A0016,202.106.196.130,药品,板蓝根,23,湖南省
A0017,202.106.196.131,手机,iphone12,9999,广东省
A0018,202.106.196.132,服装,Tshirt,340,湖南省

4.pom坐标

   <dependencies><dependency><groupId>org.apache.spark</groupId><artifactId>spark-streaming_2.11</artifactId><version>2.2.3</version></dependency><dependency><groupId>org.apache.spark</groupId><artifactId>spark-streaming-kafka-0-10_2.11</artifactId><version>2.2.3</version></dependency><dependency><groupId>redis.clients</groupId><artifactId>jedis</artifactId><version>3.0.0</version></dependency><dependency><groupId>org.apache.spark</groupId><artifactId>spark-core_2.11</artifactId><version>2.2.3</version></dependency><dependency><groupId>org.apache.spark</groupId><artifactId>spark-sql_2.11</artifactId><version>2.2.3</version></dependency><dependency><groupId>org.apache.spark</groupId><artifactId>spark-hive_2.11</artifactId><version>2.2.3</version></dependency><dependency><groupId>mysql</groupId><artifactId>mysql-connector-java</artifactId><version>5.1.28</version></dependency></dependencies>

本文标签: SparkSql