admin 管理员组

文章数量: 887007

hive sql常用题目测试

作业题

1、找出全部夺得3连贯的队伍

team,year
活塞,1990
公牛,1991
公牛,1992
公牛,1993
火箭,1994
火箭,1995
公牛,1996
公牛,1997
公牛,1998
马刺,1999
湖人,2000
湖人,2001
湖人,2002
马刺,2003
活塞,2004
马刺,2005
热火,2006
马刺,2007
凯尔特人,2008
湖人,2009
湖人,2010create table t1(
team string,
year int
)row format delimited fields terminated by ',';load data local inpath "/root/data/t1.dat" into table t1;

解题思路
– 连续值的求解,面试中常见的问题。这也是同一类,基本都可按照以下思路进行
– 1、使用 row_number 在组内给数据编号(rownum)
– 2、某个值(这里选用年份)- rownum = rank,得到结果可以作为后面分组计算的依据
– 3、根据求得的rank,作为分组条件,求终结果

with tmp as (select team ,year, year-row_number() over(partition by team order by year) as rank from t1
)
select distinct(team),numbers from(select team ,rank,count(*) numbers from tmp group by team,rank having numbers=3 ) tmp1 

结果

2、找出每个id在在一天之内所有的波峰与波谷值

id,time,price
sh66688,9:35,29.48
sh66688,9:40,28.72
sh66688,9:45,27.74
sh66688,9:50,26.75
sh66688,9:55,27.13
sh66688,10:00,26.30
sh66688,10:05,27.09
sh66688,10:10,26.46
sh66688,10:15,26.11
sh66688,10:20,26.88
sh66688,10:25,27.49
sh66688,10:30,26.70
sh66688,10:35,27.57
sh66688,10:40,28.26
sh66688,10:45,28.03
sh66688,10:50,27.36
sh66688,10:55,26.48
sh66688,11:00,27.41
sh66688,11:05,26.70
sh66688,11:10,27.35
sh66688,11:15,27.35
sh66688,11:20,26.63
sh66688,11:25,26.35
sh66688,11:30,26.81
sh66688,13:00,29.45
sh66688,13:05,29.41
sh66688,13:10,29.10
sh66688,13:15,28.24
sh66688,13:20,28.20
sh66688,13:25,28.59
sh66688,13:30,29.49
sh66688,13:35,30.45
sh66688,13:40,30.31
sh66688,13:45,30.17
sh66688,13:50,30.55
sh66688,13:55,30.75
sh66688,14:00,30.03
sh66688,14:05,29.61
sh66688,14:10,29.96
sh66688,14:15,30.79
sh66688,14:20,29.82
sh66688,14:25,30.09
sh66688,14:30,29.61
sh66688,14:35,29.88
sh66688,14:40,30.36
sh66688,14:45,30.88
sh66688,14:50,30.73
sh66688,14:55,30.76
sh88888,9:35,67.23
sh88888,9:40,66.56
sh88888,9:45,66.73
sh88888,9:50,67.43
sh88888,9:55,67.49
sh88888,10:00,68.34
sh88888,10:05,68.13
sh88888,10:10,67.35
sh88888,10:15,68.13
sh88888,10:20,69.05
sh88888,10:25,69.82
sh88888,10:30,70.62
sh88888,10:35,70.59
sh88888,10:40,70.40
sh88888,10:45,70.29
sh88888,10:50,70.53
sh88888,10:55,70.92
sh88888,11:00,71.13
sh88888,11:05,70.24
sh88888,11:10,70.37
sh88888,11:15,69.79
sh88888,11:20,69.73
sh88888,11:25,70.52
sh88888,11:30,71.23
sh88888,13:00,72.85
sh88888,13:05,73.76
sh88888,13:10,74.72
sh88888,13:15,75.48
sh88888,13:20,75.80
sh88888,13:25,76.74
sh88888,13:30,77.22
sh88888,13:35,77.12
sh88888,13:40,76.90
sh88888,13:45,77.80
sh88888,13:50,78.75
sh88888,13:55,78.30
sh88888,14:00,78.68
sh88888,14:05,78.99
sh88888,14:10,78.35
sh88888,14:15,78.37
sh88888,14:20,78.07
sh88888,14:25,78.80
sh88888,14:30,79.78
sh88888,14:35,79.72
sh88888,14:40,80.71
sh88888,14:45,79.92
sh88888,14:50,80.49
sh88888,14:55,80.44最终结果与此类似:
id	time	price	feature
sh66688	10:05	27.09	波峰
sh66688	10:15	26.11	波谷
sh66688	10:25	27.49	波峰
sh66688	10:30	26.7	波谷
sh66688	10:40	28.26	波峰
sh66688	10:55	26.48	波谷
sh66688	11:00	27.41	波峰
sh66688	11:05	26.7	波谷create table t2(
id string,
time string,
price double
)row format delimited fields terminated by ',';load data local inpath "/root/data/t2.dat" into table t2;

解题思路:
1.求当前行与前一行数据字段的(这里取price)增量
2.当前行的增量和后面一行的增量正负号不一致时,会出现波峰和波谷
波峰出现在,当前一行的增量为正,后面一行的增量为负。
波谷出现在,当前一行的增量为负,后面一行的增量为正。
3.注意时间的排序要求,可以将时间字符转换成数字类型后进行排序
4.对于第一行不论第二行的增量为正还是为负,第一行都是波峰或波谷。因此在对第一行求增量的时候,该增量要与第二行的增量相反。

with tmp as (
select id,(case when length(time)=4 then concat('0',time) else time end) time,price,(price-lag(price) over(partition by id order by cast(replace(time,":","") as bigint))) priceMinusbefore
from t2
)
select id,time,price,value from (select id,time,price,priceMinusbefore,afterPriceMinusbefore,(case when isnull(priceMinusbefore) then (case when (-(lead(priceMinusbefore) over(partition by id order by cast(replace(time,":","") as bigint))))>0 then "bofeng" else "bogu" end)else (case  when priceMinusbefore>0 and afterPriceMinusbefore<0 then "bofeng"when priceMinusbefore<0 and afterPriceMinusbefore>0 then "bogu" else "normal" end)end)valuefrom (select id,time,price,priceMinusbefore,lead(priceMinusbefore) over(partition by id order by cast(replace(time,":","") as bigint)) afterPriceMinusbeforefrom tmp)tmp1)tmp2
where value in("bofeng","bogu") order by cast(replace(time,":","") as bigint)

结果

3、写SQL

3.1、每个id浏览时长、步长
3.2、如果两次浏览之间的间隔超过30分钟,认为是两个不同的浏览时间;再求每个id浏览时长、步长

备注:请仔细阅读计算规则

测试数据

id	dt	browseid
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 17:02	.html?show=IEEE1FIJ3106A1H062HA
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 17:23	.html?show=EC1JGEC8G3HJC82JIHCD
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 17:09	.html?show=G75J62JE63JE3678G98F
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 17:12	.html?show=1957CGIA1702C1J9F0GH
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 17:31	.html?show=BJ9CJJ6F0GH0CDGGHCCB
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 17:34	.html?show=G5472AH6G1I61CGF9HGC
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 18:46	.html?show=E771D8I4JJ0DE4DF575C
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 18:13	.html?show=3EG4D1108IC3B446G2EB
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 18:14	.html?show=99B480535EC2FA31DJ92
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 18:11	.html?show=JGH3HJ36D7GHIEHEEFI6
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 18:20	.html?show=6A6799246J9J4B6IC9HI
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 18:25	.html?show=697DI68E5F133A1DD96D
934e8bee978a42c7a8dbb4cfa8af0b4f	2020/05/28 18:41	.html?show=1C72FF96F549G4A458BI
32258fe7130844399859aec54b6df5ff	2020/05/28 03:47	.html?show=A9IB685E7CJ9DIAB2244
32258fe7130844399859aec54b6df5ff	2020/05/28 03:33	.html?show=A70H86DF1EHG2E57H1HE
32258fe7130844399859aec54b6df5ff	2020/05/28 03:21	.html?show=366DJ2870404637EC19D
32258fe7130844399859aec54b6df5ff	2020/05/28 03:54	.html?show=DGCC1FH06B69I9B1GA08
32258fe7130844399859aec54b6df5ff	2020/05/28 03:48	.html?show=JAF2067192A1H53IJ00G
32258fe7130844399859aec54b6df5ff	2020/05/28 03:08	.html?show=I1J8G8075B7G5IDA326C
32258fe7130844399859aec54b6df5ff	2020/05/28 05:09	.html?show=J7BGJ4B50GFHG4FEJCB6
32258fe7130844399859aec54b6df5ff	2020/05/28 05:26	.html?show=6J9JJ89EADI7DI0H82C3
32258fe7130844399859aec54b6df5ff	2020/05/28 05:11	.html?show=6J224ECEABC7C9I62763
32258fe7130844399859aec54b6df5ff	2020/05/28 05:34	.html?show=GH06BC9D6I2G7H3D79B8
32258fe7130844399859aec54b6df5ff	2020/05/28 05:18	.html?show=H426J08J6H4JJB74HFJE
32258fe7130844399859aec54b6df5ff	2020/05/28 05:50	.html?show=0C78E264AHEADEJ26643
32258fe7130844399859aec54b6df5ff	2020/05/28 05:45	.html?show=A4702EJ6E5DJIA475AF1
de0096ad04ec4273b0462c7da7d79653	2020/05/28 07:04	.html?show=3I84DE05EH1AB6D13B3G
de0096ad04ec4273b0462c7da7d79653	2020/05/28 07:21	.html?show=CC6C7J0G326G2BJ3D179
de0096ad04ec4273b0462c7da7d79653	2020/05/28 17:48	.html?show=7HJ4BIAGHD73F49G9JJC
de0096ad04ec4273b0462c7da7d79653	2020/05/28 17:56	.html?show=H1110A1AA14H64DA876C
de0096ad04ec4273b0462c7da7d79653	2020/05/28 17:50	.html?show=664G909C9EG7JC63IB7D
de0096ad04ec4273b0462c7da7d79653	2020/05/28 07:53	.html?show=FGEC1A7I60JJDAJGAF1A
de0096ad04ec4273b0462c7da7d79653	2020/05/28 07:00	.html?show=240EB7E488G6FH0G27JF
de0096ad04ec4273b0462c7da7d79653	2020/05/28 17:48	.html?show=03FI36B82792CEBJHI29
de0096ad04ec4273b0462c7da7d79653	2020/05/28 17:14	.html?show=JHF32AB5EH58HEC2F63G
de0096ad04ec4273b0462c7da7d79653	2020/05/28 17:50	.html?show=HDCJCA8JE1BF1IFE6HF6
de0096ad04ec4273b0462c7da7d79653	2020/05/28 17:58	.html?show=I8D80BJFC3F3FEGGHA5C
de0096ad04ec4273b0462c7da7d79653	2020/05/28 17:13	.html?show=2GHE5B24F5ABC13I6EB4
de0096ad04ec4273b0462c7da7d79653	2020/05/28 17:39	.html?show=F6G8632470DAE5E760BG
de0096ad04ec4273b0462c7da7d79653	2020/05/28 07:15	.html?show=1E7F19G856JA9JD8AB9D
de0096ad04ec4273b0462c7da7d79653	2020/05/28 07:43	.html?show=5E96FFJA82E1I2BF2FEE
de0096ad04ec4273b0462c7da7d79653	2020/05/28 07:14	.html?show=89H155HCJ41H228010I5
de0096ad04ec4273b0462c7da7d79653	2020/05/28 07:36	.html?show=B38GA2D1E10EBFE8F6F7
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 08:24	.html?show=FC6FD5F45B12ABIF02GD
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 08:06	.html?show=HG4HE7CGI00A7A1F2J5F
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 08:51	.html?show=JB27071067EGBE8D060C
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 08:33	.html?show=D42J0IC234DIA481EF82
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 08:09	.html?show=H96I861CGIGIF571H2JJ
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 08:24	.html?show=H93JAJFDJH19HEF1E918
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 08:47	.html?show=FG0BGA0CFDF6270IJE32
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 08:17	.html?show=80JBIA9GFAJ76FD980AE
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 18:46	.html?show=1G50E0G0804JAJH2HBA1
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 18:00	.html?show=21DGJA045F8E64JHA0D6
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 18:08	.html?show=44EDBC5B43A444FH001C
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 18:10	.html?show=A1I8GI28GF0B14E97D64
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 18:51	.html?show=FIHCCA16AJDA32EC4332
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 18:56	.html?show=78GH99D70424B013G303
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 18:20	.html?show=F31BH181E6E8JJ4AD295
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 18:16	.html?show=47G414184H33E14DH159
307d9dce3b7f495ab8ad6033f8c54930	2020/05/28 18:10	.html?show=JJ189D6F4HD6F0E7A2AG
f5ae36c6cdda40d5954e08a2d14954a7	2020/05/28 14:51	.html?show=7DA832A31BI430197F48
f5ae36c6cdda40d5954e08a2d14954a7	2020/05/28 14:32	.html?show=ACDFI9730A2B646I0270
f5ae36c6cdda40d5954e08a2d14954a7	2020/05/28 14:11	.html?show=999AGGBH0DC2E35J097B
f5ae36c6cdda40d5954e08a2d14954a7	2020/05/28 14:32	.html?show=58IF72BB8F74ID23GE87
f5ae36c6cdda40d5954e08a2d14954a7	2020/05/28 14:37	.html?show=460DIAJ135CC950J3C77
f5ae36c6cdda40d5954e08a2d14954a7	2020/05/28 14:35	.html?show=81II8DB2J2IF8AIFJ67F
80ea80b2e5a64cbebfaf34aa797125f0	2020/05/28 03:22	.html?show=H02AJA95GBE98768ADHF
80ea80b2e5a64cbebfaf34aa797125f0	2020/05/28 03:07	.html?show=A4EDIFDEJB2J40I64F04
80ea80b2e5a64cbebfaf34aa797125f0	2020/05/28 03:56	.html?show=5I687EDH2C3A1JJAF57D
80ea80b2e5a64cbebfaf34aa797125f0	2020/05/28 03:58	.html?show=929I8CG2CDB9AE0268JI
80ea80b2e5a64cbebfaf34aa797125f0	2020/05/28 03:27	.html?show=D8C799HJ092G9I4230EH
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 02:17	.html?show=876583DAG4FIEI637F6E
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 02:46	.html?show=936BIDF2F4352A39H6FF
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 02:17	.html?show=6J2A9DDHB4787CD90134
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 02:35	.html?show=F87D3565JD253ED6FIHE
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 02:56	.html?show=DB23C4598E005CGHH06D
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 02:53	.html?show=B9B3GFIB5EIC9E32J5IJ
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 02:16	.html?show=20BBBA585JF22I953GBG
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 02:23	.html?show=IAG945B35D5DA6F1E992
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 02:52	.html?show=47BH0G2A7IGFHIH61A85
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 02:45	.html?show=IDE79JG4DI0J6508F0HH
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 22:32	.html?show=5JA75D16G22BE0H881G2
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 22:47	.html?show=F58GI58H74989HD65173
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 22:46	.html?show=G24C84DCG9FD5GFBFCEE
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 22:54	.html?show=90BIIIIA346E50A5DA67
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 22:17	.html?show=9F3AB45F1C3HAH58B8B8
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 22:16	.html?show=1J85244F2F81JCBGHH9C
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 22:56	.html?show=73E0E5J74EA8A5B8C0FD
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 22:14	.html?show=DCIJI9H51I5BHGC1587E
95273392ab1a4579914273cdd1f3a3ae	2020/05/28 22:58	.html?show=FI5EICD1F25F005J3CJG
022f86d4533740ad914f233cbd9c4430	2020/05/28 22:13	.html?show=87F3GAA5DH97H6G6I3J2
022f86d4533740ad914f233cbd9c4430	2020/05/28 22:51	.html?show=8775525F5EG213C94EE7
022f86d4533740ad914f233cbd9c4430	2020/05/28 22:33	.html?show=3HHEBFA7BA8J8BH3GGIG
022f86d4533740ad914f233cbd9c4430	2020/05/28 22:37	.html?show=BD7FE2HE8AED2F5J6818
022f86d4533740ad914f233cbd9c4430	2020/05/28 22:56	.html?show=DE8IH40GG47E096E0BE5
022f86d4533740ad914f233cbd9c4430	2020/05/28 22:38	.html?show=24AEBB54FA71D9F7JIDA
022f86d4533740ad914f233cbd9c4430	2020/05/28 22:05	.html?show=5JI210IFIJ0I707G4II7
022f86d4533740ad914f233cbd9c4430	2020/05/28 22:39	.html?show=9BCGBGHBE73IG5AF4569

建表导入数据:

-- 建表语句
create table t3(id string,dt string,browseid string
)row format delimited fields terminated by '\t';-- 导入数据
load data local inpath "/root/data/t3.dat" into table t3;

计算规则说明:

id dt browseid
id:唯一的用户id
dt:用户在这个时间点点击进入了一个页面
browseid:用户浏览了哪个页面

简化数据(以下为某个用户,在一天内的浏览记录):
1 08:20 1.html
1 08:23 2.html
1 08:24 3.html
1 08:40 4.html
1 09:33 5.html
1 09:40 6.html
1 09:30 7.html
1 09:36 8.html
1 09:37 9.html
1 09:41 a.html

3.1、每个id浏览时长、步长
用户1的浏览时长 = 09:41 - 08:20 = 81分钟
用户1的浏览步长 = count数 = 10次
3.1 解答

select id ,date_format(to_date(from_unixtime(UNIX_TIMESTAMP(dt,"yyyy/MM/dd hh:mm"))),"yyyy-MM-dd") daytime,count(*) count,(max(UNIX_TIMESTAMP(dt,"yyyy/MM/dd hh:mm"))-min(UNIX_TIMESTAMP(dt,"yyyy/MM/dd hh:mm")))/60 totalminutes
from t3 group by id , date_format(to_date(from_unixtime(UNIX_TIMESTAMP(dt,"yyyy/MM/dd hh:mm"))),"yyyy-MM-dd");

结果

3.2、如果两次浏览之间的间隔超过30分钟,认为是两个不同的浏览时间;再求每个id浏览时长、步长
用户1在 8:40 - 09:30 的间隔超过了30。生产中认为:
用户1在 08:20 - 08:40 浏览一次网站。这次浏览时长为20分钟,步长为4
用户1在 09:30 - 09:41 又浏览一次网站。这次浏览时长为11分钟,步长为6

对于测试数据SQL1的结果:
934e8bee978a42c7a8dbb4cfa8af0b4f 104.0 13

对于测试数据SQL2的结果:
934e8bee978a42c7a8dbb4cfa8af0b4f 32.0 6
934e8bee978a42c7a8dbb4cfa8af0b4f 35.0 7

解题思路
1 对时间进行升序排列,按照id进行分区,并计算当前时间到该分区第一个值之间的间隔timeintervals,用该时间间隔除以30得到第二个分区字段steps
2 按照id和steps进行分区,并且按id和steps进行分组和排序,用该分区的最大时间戳减去最小时间戳计算得到时长,使用count(*)得到步长

with tmp as (select id ,UNIX_TIMESTAMP(dt,"yyyy/MM/dd hh:mm") currentminute,dt,first_value(UNIX_TIMESTAMP(dt,"yyyy/MM/dd hh:mm")) over (partition by id order by(UNIX_TIMESTAMP(dt,"yyyy/MM/dd hh:mm")) ) firstcurrentmintusfrom t3    
)
select id,(max(currentminute)-min(currentminute))/60 timebetween ,count(*) totalcount,stepsfrom (select id,dt,currentminute,(currentminute-firstcurrentmintus)/60 timeintervals,floor((currentminute-firstcurrentmintus)/(60*30)) stepsfrom tmp order by id ,currentminute)tmp1 group by id,steps order by id,steps

结果

本文标签: hive sql常用题目测试