历史零售数据,做销售表现的对比,复盘分析(组图)

某零售行业拥有多家连锁零售门店,现为12月份,我们现在需要调出历史零售数据,做销售表现的对比,复盘分析今年的销售表现,看去年制定的KPI现在达标情况、做同比环比。 #
数据表的关系如下图: #
在hive中创建表: #
CREATE TABLE tbDate(dateID string,theyearmonth string,theyear string,themonth string,thedate string,theweek string,theweeks string,thequot string,thetenday string,thehalfmonth string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ;
CREATE TABLE tbStock(ordernumber STRING,locationid string,dateID string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ;
CREATE TABLE tbStockDetail(ordernumber STRING,rownum int,itemid string,qty int,price int ,amount int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ;
#
#
导入数据:
#
#LOAD DATA LOCAL INPATH '/home/yuanyi/tbDate.txt' INTO TABLE tbDate; LOAD DATA LOCAL INPATH '/home/yuanyi/tbStock.txt' INTO TABLE tbStock; LOAD DATA LOCAL INPATH '/home/yuanyi/tbStockDetail.txt' INTO TABLE tbStockDetail;
#
先把表聚合起来,方便后续使用: #
#hive> create table agg_tb_1 as > select sd.*,s.locationid,s.dateid,d.theyearmonth,d.theyear,d.themonth,d.thedate,d.theweek > from tbstockdetail sd join tbstock s > on sd.ordernumber=s.ordernumber > join tbdate d > on s.dateid=d.dateid;
#
显示创建成功后验证:
#
select * from agg_tb_1 limit 5;
#
#
能查看到五行记录 #
#hive> select * from agg_tb_1 limit 5; --得到287942条数据;
#
我们去开始提取和计算这条数据:
这里假设去年老板用历史占比法去制定每个月的KPI。 #
历史占比法:
#
即根据前一年,或者前几年中的每个维度的销售额对于整体销售额的占比达标率怎么算,去计算将来目标该如何分配到每个月。 #
举例来说,现在将目标拆分成一个月份维度,如下图第三列是2006年每个月的销售额,第四列是2006年整年销售额,老板希望2007年整体销售额提升20%,即: x 1.2 = ,按2006年每个月占全年销售额的占比,用总KPI x 该占比分配到2007年每个月的KPI,即第六列 #
一维分配KPI: #
下面计算上面例子中的一维分配KPI: #
Hive SQL实现代码:
hive> select c.*,c.year_amount*1.2*c.zhanbi as next_year_kpi
> from (select a.*,b.year_amount,round(a.amount/b.year_amount,4) as zhanbi
> from (select theyearmonth as yearmonth,sum(qty) as qty,sum(amount) as amount
> from agg_tb_1 where theyear =2006 group by theyearmonth order by theyearmonth)a join (select distinct theyearmonth as yearmonth,theyear as theyear, sum(amount) over (partition by theyear) as year_amount from agg_tb_1 where theyear =2006 order by theyear,theyearmonth)b
> on a.yearmonth=b.yearmonth order by a.yearmonth)c;
#
#
Hive SQL运行验证结果: #
现在是2007年年末,已有2007年全年销售数据,要去计算每个月的达标率,这里还是会用到上面代码的结果,并用lag()函数做位置偏移,使2006年算的均往下移至和2007年对齐,再算达标率: #
预计效果如图: #
Hive SQL实现代码:
hive> select
> d.*,
> lag(d.next_year_kpi,12,null) over (order by d.yearmonth) as kpi,
> concat(round(d.amount/(lag(d.next_year_kpi,12,0) over (order by d.yearmonth))*100,2),"%") as achieving_rate
> from
> (
> select c.*,c.year_amount*1.2*c.zhanbi as next_year_kpi
> from
> (
> select a.*,b.year_amount,round(a.amount/b.year_amount,4) as zhanbi
> from
> (select theyearmonth as yearmonth,theyear as theyear,sum(qty) as qty,sum(amount) as amount
> from agg_tb_1
> where theyear in (2006,2007)
> group by theyearmonth,theyear
> order by theyearmonth,theyear
> )a
> join
> (select
> distinct theyear as theyear,
> sum(amount) over (partition by theyear) as year_amount
> from agg_tb_1
> where theyear in (2006,2007)
> order by theyear
> )b
> on a.theyear=b.theyear
> order by a.yearmonth
> )c
> )d;
#
Hive SQL运行验证结果:
二维分配KPI:
假如我们希望再细分维度,希望对每个门店的每个月做分配KPI,同样是用历史占比法。
#
但要注意的是细分维度后,对于门店,可能有特殊情况,比如说:门店在2006年的某个月才开始开业,或者2007年的某个月或某几个月暂定停业。 #
这样会缺失对应的月份和门店数据,lag()偏移多少的参数无法确认。所以这里就要用两个表去链接:
#
先创建视图 #
#hive> create view location_monthly as > select a.*,b.year_amount,a.amount/b.year_amount as zhanbi > from > (select theyearmonth as yearmonth, locationid as location,theyear as theyear,sum(qty) as qty,sum(amount) as amount > from agg_tb_1 > where theyear in (2006,2007) > group by theyearmonth,locationid,theyear > order by theyearmonth,locationid,theyear)a > join > (select distinct theyear as theyear, > sum(amount) over (partition by theyear) as year_amount > from agg_tb_1 > where theyear in (2006,2007) > order by theyear)b > on a.theyear=b.theyear > order by a.yearmonth,a.location;
#
创建成功后,查看2006年整体销售额,并算出x1.2是多少。 因为hql不能 @变量=xxx,所以只能用set达标率怎么算,先用计算出数值,再用set定义。
这里可以对比一下MySQL和Hive SQL
Hive SQL: #
hive> set kpi=16419378;
--在hive中用set 变量去查看变量被赋值了什么
hive> set kpi;
kpi=16419378
hive> select * from(
> select
> a.*,
> a.year_amount*1.2 as next_year_kpi,
> b.zhanbi as last_year_zhanbi,
> ${hiveconf:kpi}*b.zhanbi as kpi,
> a.amount/(${hiveconf:kpi}*b.zhanbi) as attach_rate
> from
> location_monthly a left join location_monthly b
> on a.theyear-1=b.theyear
> and a.location = b.location
> and substr(a.yearmonth,5,2)=substr(b.yearmonth,5,2))a
> where kpi is not null;
#
Hive SQL运行效果:
#
MySQL:
#
select @kpi:=sum(amount)*1.2 from agg_stock where theyear=2006; select @kpi; --可以看到kpi的值为16419378 select * from( select a.*, a.year_amount*1.2 as next_year_kpi, b.zhanbi as last_year_zhanbi, @kpi*b.zhanbi as kpi, concat(round(a.amount/(@kpi*b.zhanbi)*100,2),"%") as attach_rate from location_monthly a left join location_monthly b --b表为2007年,-1年为2006年和a表链接 on a.theyear-1=b.theyear --地址相同的链接,月份相同的链接 and a.location = b.location and substr(a.yearmonth,5,2)=substr(b.yearmonth,5,2) )c --只看2007年有被分配到kpi的数据,也可以在select中选择所需字段 where kpi is not null;
#
#
MySQL在的效果如图:
#
同时可以用order by ,去更方便去看每个门店在每个月的销售额和达标率,或者用where去筛选门店看某个门店的两年对比情况。
#
举个例子: #
#select * from( select a.*, a.year_amount*1.2 as next_year_kpi, b.zhanbi as last_year_zhanbi, @kpi*b.zhanbi as kpi, concat(round(a.amount/(@kpi*b.zhanbi)*100,2),"%") as attach_rate from location_monthly a left join location_monthly b --b表为2007年,-1年为2006年和a表链接 on a.theyear-1=b.theyear --地址相同的链接,月份相同的链接 and a.location = b.location and substr(a.yearmonth,5,2)=substr(b.yearmonth,5,2) )c --只看2007年有被分配到kpi的数据,也可以在select中选择所需字段 where location = "RM" order by location,yearmonth;
#
这里可以看出RM这家店有可能是2006年6月新开业的,在2006奶奶1月到5月都没有数据,用历史占比发去计算2007年的6月-12月kpi,可以看出该门店的达标率情况不错。
#
计算每个月销售额同比、环比: #
期望达到效果: #
先存一个2006年和2007年的表:
#
hive> create view cal_diff as
> select
> theyearmonth,
> sum(qty) as qty,
> sum(amount) as amount
> from agg_tb_1
> where theyear in(2006,2007)
> group by theyearmonth
> order by theyearmonth;
#
#
同样可以用lag() 去算同比、环比
#
hive> select > *, > concat(round((amount-(lag(amount,1,null) over(order by theyearmonth)))/amount*100,2),"%") as huanbi, > concat(round((amount-(lag(amount,12,null) over(order by theyearmonth)))/amount*100,2),"%") as tongbi > from > cal_diff > limit 12,12;
#
Hive SQL运行验证结果: #
这里我们可以看到,除了一月和三月,其他都同比上升。 2月同比上升幅度最大,超过了50%。 #
注意千万不要用where筛选2007年的数据,同比就会显示不出来,因为算不了2006年的结果,如图: #