Skip to content

sql練習

Published: at 上午05:25

利用 2022/11 捷運 進出入資料來做 sql 資料分析練習

http://tcgmetro.blob.core.windows.net/stationod/%E8%87%BA%E5%8C%97%E6%8D%B7%E9%81%8B%E6%AF%8F%E6%97%A5%E5%88%86%E6%99%82%E5%90%84%E7%AB%99OD%E6%B5%81%E9%87%8F%E7%B5%B1%E8%A8%88%E8%B3%87%E6%96%99_202211.csv

使用的資料庫為 Postgresql

基本欄位有

日期 , 進站, 出站 , 人次

相對應 創立的欄位是 “date” ,“in” ,“out” ,people_count

  1. 總共有幾筆資料
select count(*) count from public."mrt-tran"
count
8921430
  1. 每日每趟旅程 總人數
SELECT  "date" ,"in" ,"out"  ,sum(people_count) sum_count
FROM public."mrt-tran" group by "date" ,"in" ,"out"  order by sum_count desc
  1. 當月進站抵達最高人次

select
	cc.gg ,
	bb."out" ,
	cc.ss
from
	(
	select
		aa."in" gg,
		max(aa.s) ss
	from
		(
		select
			"in",
			"out" ,
			sum(people_count) s
		from
			public."mrt-tran"
		group by
			"in",
			"out") aa
	group by
		aa."in") cc
left join (
	select
		"in",
		"out" ,
		sum(people_count) s
	from
		public."mrt-tran"	group by
			"in",
			"out"
) bb on
	cc.ss = bb.s
  1. 因為數據量大 如果一直查詢 重算 每次都會花很久時間 所以postgresql有個功能叫做物化視圖 適合放冷資料 不會需要最新資料 的東西
CREATE MATERIALIZED VIEW "mrt-monthly-count" AS SELECT TO_CHAR( de :: DATE, 'yyyy-mm') de,sum(cc) cc from "mrt-tran" group by TO_CHAR( de :: DATE, 'yyyy-mm')  order by cc
CREATE MATERIALIZED VIEW "mrt_dayliy_cc" as select de ,fin, fout,sum(cc) cc from "mrt-tran" group by de ,fin, fout

select fin,sum(cc)cc  from mrt_dayliy_cc mdc group by fin order by cc desc

ref

http://www.postgres.cn/docs/9.5/rules-materializedviews.html