聚合函数与 BOOLEAN
SUM()
和AVG()
可以(注意COUNT()
不行)处理BOOLEAN
类型的数据,一些难以使用WHERE
的判断可以利用该特性进行简单实现。
案例:AVG() 与 BOOLEAN
select
round(
avg(
order_date = customer_pref_delivery_date
) * 100,
2
) as immediate_percentage
from
Delivery
where
(customer_id, order_date) in (
select
customer_id,
min(order_date)
from
Delivery
group by
customer_id
);
日期处理
DATEDIFF()
如果想获取两个日期之间的差值,最好的方式是利用DATEDIFF()
,而不是直接相减。如果直接相减,在MySQL中实际上是将这两个日期首先转换成了数字,格式通常为YYYYMMDD
,然后再进行数学减法操作,显然这一操作并不总是可靠的。
语法
DATEDIFF(end_date, start_date)
案例:DATEDIFF() 的使用场景
https://leetcode.cn/problems/rising-temperature/description/?envType=study-plan-v2&envId=sql-free-50
select
w1.id
from
Weather w1
join Weather w2 on datediff(w1.recordDate, w2.recordDate) = 1
where
w1.temperature > w2.temperature;
TIMESTAMPDIFF()
实际上是DATEDIFF()
的加强版,可以进一步定位到小时、分钟和秒。
语法
TIMESTAMPDIFF(format, start_date, end_date)
注意顺序和DATEDIFF()
不同。format
可用的格式有DAY
, HOUR
, MINUTE
和SECOND
。
DATE_FORMAT()
用于以不同的格式显示日期/时间数据(把原始的日期转换为需要的格式)。
语法
DATE_FORMAT(date, format)
format
可用格式和使用范例见:https://www.w3school.com.cn/sql/func_date_format.asp
案例:DATE_FORMAT() 的使用场景
https://leetcode.cn/problems/monthly-transactions-i/?envType=study-plan-v2&envId=sql-free-50
select
date_format(trans_date, '%Y-%m') as month,
country,
count(*) as trans_count,
sum(
if(state = 'approved', 1, 0)
) as approved_count,
sum(amount) as trans_total_amount,
sum(
if(state = 'approved', amount, 0)
) as approved_total_amount
from
Transactions
group by
month,
country;
INTERVAL 关键词
INTERVAL
当关键词使用时(不同于INTERVAL()
函数),表示为设置时间间隔,如INTERVAL 1 DAY
表示间隔1天,INTERVAL -1 MONTH
表示反向间隔1个月。可以直接用于进行算数运算,也可以用在DATE_ADD()
和DATE_SUB()
等函数中,后者详见https://blog.csdn.net/weixin_44641729/article/details/103793995。
案例:INTERVAL 关键词的使用场景
https://leetcode.cn/problems/game-play-analysis-iv/?envType=study-plan-v2&envId=sql-free-50
select
round(
count(distinct player_id) / (
select
count(distinct player_id)
from
Activity
),
2
) as fraction
from
Activity
where
(player_id, event_date) in (
select
player_id,
min(event_date) + interval 1 day
from
Activity
group by
player_id
);
关于NULL
无法使用比较运算符(=
、<
、<>
等)处理NULL
。此时需要使用IS NULL
· IS NOT NULL
· IFNULL()
· COALESCE()
。
IFNULL()
接受两个参数,如果第一个参数不为NULL
,则返回第一个参数;否则返回第二个参数。COALESCE()
接受多个参数,返回参数列表中的第一个非NULL
值;如果所有参数都是NULL
,则返回NULL
,以避免在应用逻辑中出现空值引发的错误。
语法
SELECT column FROM table WHERE column IS NULL;
SELECT column FROM table WHERE column IS NOT NULL;
SELECT IFNULL(expression_1, expression_2) column_alias FROM table;
SELECT COALESCE (*expressions) column_alias FROM table;
案例:IFNULL() 的使用场景
https://leetcode.cn/problems/average-selling-price/?envType=study-plan-v2&envId=sql-free-50
select
p.product_id,
ifnull(
round(
sum(p.price * u.units) / sum(u.units),
2
),
0
) average_price
from
Prices p
left join UnitsSold u on p.product_id = u.product_id
and u.purchase_date between p.start_date
and p.end_date
group by
p.product_id;
案例:COALESCE()的使用场景
见下述“UNION ALL的使用场景”。这个案例中的订单终止时间根据订单性质的不同,会在3个不同的column中取得。
GROUP BY 和 WHERE · HAVING
有GROUP BY
时,WHERE
关键字是在分组前对原始数据进行筛选,HAVING
则是在分组后对分组的结果进行筛选。因此,WHERE
不能和聚合函数一起使用,但HAVING
能。
语法
SELECT column FROM table WHERE expression GROUP BY column;
SELECT column FROM table GROUP BY column HAVING expression;
案例:HAVING 的使用场景
select
e.name
from
Employee e
join (
select
managerId
from
Employee
where
managerId is not null
group by
managerId
having
count(id) >= 5
) sub on e.id = sub.managerId;
EXISTS
EXISTS
通常用来检查一个子查询是否返回了至少一行数据。如果子查询返回至少一行数据,EXISTS
返回TRUE
,否则返回FALSE
。在某些复杂的查询中,EXISTS
比JOIN
更有效率,因为EXISTS
在找到第一条符合条件的记录后就会停止搜索。
其适用的语句包括CASE WHEN [NOT] EXISTS () THEN...
和WHERE [NOT] EXISTS ()
等。其子查询内部通常使用SELECT 1 FROM...
惯例格式,虽然SELECT * FROM...
也可以达成同样的效果,但前者更加直观明了。
案例:EXISTS的使用场景
https://www.nowcoder.com/practice/f72d3fc27dc14f3aae76ee9823ccca6b
with exam_record_densed as (
select
uid,
sum(if(score, 0, 1)) incomplete_cnt,
round(sum(if(score, 0, 1)) / count(*), 3) incomplete_rate
from
exam_record
group by
uid
),
exam as (
select
u.uid,
level,
incomplete_cnt,
incomplete_rate
from
exam_record_densed e
right join user_info u on e.uid = u.uid
),
org_result as (
select
uid,
ifnull(incomplete_cnt, 0) imcomplete_cnt,
ifnull(incomplete_rate, 0.000) incomplete_rate
from
exam
where
level = 0
and exists (
select
1
from
exam
where
level = 0
and incomplete_cnt > 2
)
union all
select
uid,
incomplete_cnt,
incomplete_rate
from
exam
where
incomplete_cnt is not null
and not exists (
select
1
from
exam
where
level = 0
and incomplete_cnt > 2
)
)
select
*
from
org_result
order by
incomplete_rate;
IF · CASE
IF()
接受三个参数:条件、真返回值和假返回值。
CASE
语句是SQL中最灵活的条件逻辑处理工具,可以在更复杂的逻辑判断中使用。
语法
IF(condition, true_value, false_value)
CASE WHEN condition_1 THEN result_1 WHEN condition_2 THEN result_2 ELSE default_result END
当然,如果condition是一个特定列,只是对这个特定列的取值进行判断,则CASE语句可以进一步简写为:
CASE column WHEN value_1 THEN result_1 WHEN value_2 THEN result_2 ELSE default_result END
此外,IF
、CASE
的结果是一个值,显然可以和其他函数或子句连用,如聚合函数、分组、排序等。
案例:CASE 的使用场景
https://leetcode.cn/problems/exchange-seats/description/?envType=study-plan-v2&envId=sql-free-50
select
*
from
(
select
case when mod(id, 2) = 0 then id - 1 when mod(id, 2) = 1
and id < (
select
max(id)
from
Seat
) then id + 1 else id end as id,
student
from
Seat
) sub
order by
id;
算术运算 · 取整运算
MySQL支持基本算数运算符,即+
、-
、*
、/
、%
的使用。取余有多种表示方法:MOD(x, y)
或 x MOD y
或 x % y
。
MySQL取整运算包括四舍五入取整(ROUND(num, 0)
)、向上取整(CEILING(num)
或者CEIL(num)
)和向下取整(FLOOR(num)
)。
案例:取余的使用场景
见上述“案例:CASE 的使用场景”。
案例:向上取整的使用场景
https://www.nowcoder.com/practice/b626ff9e2ad04789954c2132c74c0513
with ranked_scores as (
select
id,
job,
score,
row_number() over(
partition by job
order by
score desc
) row_order,
rank() over(
partition by job
order by
score desc
) t_rank,
count(*) over(partition by job) total_count
from
grade
),
median_score as (
select
id,
job,
score,
row_order,
t_rank,
ceil(total_count / 2.0) mid_point,
case
when mod(total_count, 2) = 0 then ceil(total_count / 2.0) + 1
else ceil(total_count / 2.0)
end second_mid_point
from
ranked_scores
)
select
distinct id,
job,
score,
t_rank
from
median_score
where
row_order = mid_point
or row_order = second_mid_point
order by
id;
当然,取分位数的不止这一种需求形式:
https://www.nowcoder.com/practice/165d88474d434597bcd2af8bf72b24f1
with acum_cnt as (
select
grade,
number,
sum(number) over(
order by
grade
) acum_num
from
class_grade
),
get_median_num as (
select
ceil(
max(acum_num) / 2.0
) left_median,
floor(
max(acum_num) / 2.0 + 1
) right_median
from
acum_cnt
) (
select
grade
from
acum_cnt
where
acum_num >= (
select
left_median
from
get_median_num
)
limit
1
)
union
(
select
grade
from
acum_cnt
where
acum_num >= (
select
right_median
from
get_median_num
)
limit
1
);
DISTINCT · LIMIT
DISTINCT
可以对SELECT
结果集去重。LIMIT
可以选择性输出结果集中的行,常与ORDER BY
连用,可以用OFFSET
设置偏移量。
语法
SELECT DISTINCT column FROM table;
SELECT column FROM table LIMIT row_count OFFSET offset;
案例:LIMIT ... OFFSET ... 的使用场景
select
ifnull(
(
select
distinct salary
from
Employee
order by
salary desc
limit
1 offset 1
),
null
) as SecondHighestSalary;
窗口函数
由于本节内容过多,故将其置于新文章中。
UNION · UNION ALL
用以合并多个SELECT
结果集。UNION
有去重的效果,UNION ALL
没有。
语法
SELECT column FROM table1 UNION SELECT column FROM table2;
SELECT column FROM table1 UNION ALL SELECT column FROM table2;
案例:UNION ALL 的使用场景
UNION ALL
在此用于将分属不同column的对立类别按行融合进一个column,并添加类别属性。
https://www.nowcoder.com/practice/f301eccab83c42ab8dab80f28a1eef98
with
record_in_range as (
select
city,
event_time org_time,
coalesce(start_time, finish_time, end_time) term_time
from
tb_get_car_record r
left join tb_get_car_order o on r.order_id = o.order_id
where
date (event_time) between '2021-10-01' and '2021-10-31'
),
unioned as (
(
select
city,
org_time event_time,
1 operation
from
record_in_range
)
union all
(
select
city,
term_time event_time,
-1 operation
from
record_in_range
)
),
windowed as (
select
city,
sum(operation) over (
partition by
city
order by
event_time,
operation desc
) wait_uv
from
unioned
)
select
city,
max(wait_uv) max_wait_uv
from
windowed
group by
city
order by
max_wait_uv,
city;
RECURSIVE
MySQL的递归方法,常用于给出处于某个区间内(如日期)的结果集,或者查询层级关系等。
比如,现在要创造一个2024年1月每日96点的临时结果集,可以:
with recursive date_range as (
select
'2024-01-01' dt
union all
select
dt + interval 1 day
from
date_range
where
dt + interval 1 day < '2024-02-01'
),
recursive time_range as (
select
1 period
union all
select
period + 1
from
time_range
where
period + 1 < 97
)
select
dt,
period
from
date_range cross
join time_range
order by
dt,
period;
字符串操作
SUBSTRING()
字符串截取函数SUBSTRING()
相对于LEFT()
和RIGHT()
功能更全面,如SUBSTRING(name, 3)
表示从正序第3个字符开始,截取之后的所有字符;SUBSTRING(name, 5, 3)
表示从正序第5个字符开始,截取之后连续的3个字符;SUBSTRING(name, -4, 2)
表示从倒序第4个字符开始,截取之前(从正序看是之后)连续的2个字符。
案例:SUBSTRING() 的使用场景
select
user_id,
concat(
upper(
substring(name, 1, 1)
),
lower(
substring(
name
from
2
)
)
) name
from
Users
order by
user_id;
LIKE 和通配符
通配符(%
、_
、[charlist]
、[!charlist]
)必须和LIKE
模糊查询运算符一起使用。通配符含义见https://www.w3school.com.cn/sql/sql_wildcards.asp。
语法
SELECT column FROM table WHERE column LIKE pattern;
案例:LIKE 的使用场景
https://leetcode.cn/problems/patients-with-a-condition/?envType=study-plan-v2&envId=sql-free-50
select
*
from
Patients
where
conditions like 'DIAB1%'
or conditions like '% DIAB1%';
REPLACE()
REPLACE()
用于替换字符串中的字符。
语法
REPLACE(str, 'char1', 'char2')
案例:REPLACE()的使用场景
https://www.nowcoder.com/practice/e3870bd5d6744109a902db43c105bd50
这其实是REPLACE()
的巧用,在统计字符串中某字符出现次数时,将该字符替换为空字符,便于作差求解。
select
id,
(
length(string) - length(replace(string, ',', ''))
) cnt
from
strings;
CONCAT() 和 GROUP_CONCAT()
CONCAT()
用来将两个或多个字符串参数连接成一个字符串。GROUP_CONCAT()
则是用在分组操作中的,它会将同一个分组内的多条记录的某个字段值连接成一个字符串返回。
语法
CONCAT(str1, str2, ...)
GROUP_CONCAT([DISTINCT] column [ORDER BY ASC/DESC column] [SEPERATOR 'seperator'])
案例:CONCAT() 的使用场景
见上述“案例:SUBSTRING()
的使用场景”。
案例:GROUP_CONCAT() 的使用场景
select
sell_date,
count(*) num_sold,
group_concat(
product
order by
product asc separator ','
) products
from
(
select
distinct *
from
Activities
) sub
group by
sell_date
order by
sell_date;
REGEXP 和正则匹配字符
MySQL中使用REGEXP
或RLIKE
操作符来进行正则表达式匹配,两个操作符功能等价。正则匹配有很多不同类型的模式匹配字符,详见https://www.runoob.com/mysql/mysql-regexp.html。
语法
column REGEXP 'match_method'
案例:REGEXP 的使用场景
https://leetcode.cn/problems/find-users-with-valid-e-mails/?envType=study-plan-v2&envId=sql-free-50
select
*
from
Users
where
mail regexp '^[A-Za-z][A-Za-z0-9_.-]*@leetcode\\.com$';
意犹未尽?
现在是时候展现SQL大师的风采了:
https://www.nowcoder.com/practice/aef5adcef574468c82659e8911bb297f
博主把自己答题的过程原封不动粘贴于此看来平时没少写code:
with preprocess as (select distinct uid, date(in_time) date from tb_user_log where artical_id = 0 and sign_in = 1 and date(in_time) between '2021-07-07' and '2021-10-31'),
add_date_rnk as (select uid, date, row_number() over(partition by uid order by date) row_num from preprocess),
add_date_group as (select uid, date, (date - interval row_num day) date_sub_row from add_date_rnk),
add_consec_num as (select uid, date, row_number() over(partition by uid, date_sub_row order by date) consec_date from add_date_group),
add_prize as (select uid, date_format(date, '%Y%m') month, case consec_date % 7 when 3 then 3 when 0 then 7 else 1 end coin_this_day from add_consec_num)
select uid, month, sum(coin_this_day) coin from add_prize group by month, uid order by month, uid;
后续
差不多半年后,博主二刷此题:
with signed_day as (select uid, date(in_time) dt, row_number() over(partition by uid order by date(in_time)) date_order from tb_user_log where artical_id = 0 and sign_in = 1 and date(in_time) between '2021-07-07' and '2021-10-31'),
add_date_group as (select uid, date_format(dt, '%Y%m') month, row_number() over(partition by uid, dt - interval date_order day order by dt) succ_day from signed_day)
select uid, month, sum(case succ_day % 7 when 3 then 3 when 0 then 7 else 1 end) coin from add_date_group group by uid, month order by month, uid;
思路基本没变,但是代码组织方式有一定变化,精简了很多……