Database: SQL statement

How to use the database:
online transaction processing. Interaction with web/APP
data analytics

Data model: the concept of describing data, data relationship, data semantics and consistency constraints; the concept used to describe the data in the database is essentially how to organize the data
schema: Given a data model, the application stores data through this model
Tables also become relationships
database schema: schema. physical schema, logical schema, sub-schema

The SQL statement is Non-procedural (specify what data is needed, not how to obtain it)

1 relational model

Can be divided into three modules
Storage management: interacting with the file system
Query processing (works at the view level)
affairs management

The relational model is a collection of tables.
Each table has a unique name and is called a relation. Rows are called Tuples and columns are called attributes.

The legal range of values for an attribute is called a domain.
Atomic field: The elements in the field are indivisible, simply understood as not a set of values. Here more emphasis is placed on how it is used (whether it is used as a whole), rather than the value itself.

null value: unknown or does not exist

relation schema: equivalent to the variable declaration keyword (int) in the program; ———————— logic design, including table name, Attribute, AttributeType, (Constraint)
relation: equivalent to specific variables (

x)
relation instance: equivalent to the specific value of the variable (10). ———————————————— at a certain point in time Contents in the table (data may be changing all the time, add, delete, check and modify)

(1) Why is it called a relationship, each row is a relationship, and the entire table is a relation?
Assuming that there is a table that stores course IDs and prerequisite course IDs, each row represents such a relationship: a course is a prerequisite course for another course, so each row is called a relationship, and the entire table is called relation.

superkey: A set of attributes used to distinguish different tuples (one or more, can contain irrelevant attributes)
candidate keys: the smallest superkeys (not necessarily the smallest number, but does not contain irrelevant attributes);
primary keys: selected from candidate keys, it is best not to change afterwards.

Foreign Key Constraints: Relationships

r_1

The value of attribute A of r1?, which appears in the relation

r_2

r2? in the domain of primary key B. A is called

r_1

Foreign key to r1?.
Referential integrity constraints: not required to be primary keys.

1.1 Relational Algebra

\sigma

σ–select: select tuple

\Pi

Π–Projection: pick properties

\times

×–Cartesian product: Different relationships may have the same attribute, so it is necessary to add the relationship name to distinguish (instructor.ID, teaches.ID). The Cartesian product with itself is still problematic and requires a rename operation.

\Join

?–join: Can be generated by Cartesian product and select
Collection operation: Note that there must be the same attributes (and the same type)
The result of a relational algebra expression has no name and needs to use the Rename operation

1.2 SQL

(1) SQL or declarative language: Take sorting as an example, you only need to specify the data that needs to be sorted, without specifying what algorithm to use to complete the sorting.
SQL can be regarded as a collection of DML (addition, deletion, query and modification), DDL (defining schema and creating tables to store data), and DCL (security).

(2) SQL is actually based on bag, there is no fixed position to allow repeated elements
list allows repeated elements, but the position is fixed
set has no fixed position, but does not allow repeated elements

1.2.1 Basic SQL

(1) The char(n) type is a fixed-length character string, and spaces will be added if the length is not enough.
When comparing two char type strings, if the lengths are different, spaces will be appended to the shorter char variable to make the lengths the same.
When the char type is compared with the varchar type, spaces may or may not be appended, depending on the specific implementation.
It is recommended to use varchar

(2) Definition
create table create relationship
drop table

r; delete completely, that is, the pattern is also deleted
delete from

r; delete

All tuples in r, still retaining relations

r
alter table

r add

A \;D

AD; add a new attribute to an existing relationship, and all tuple new attributes are assigned a null value
alter table

r drop

A; delete attributes, many systems do not support

(3) Operation

From is followed by input. If there are multiple relationships, it can be considered as the Cartesian product of each relationship.
select is used to select attributes, which can be followed by the ±*/ operation of attributes, but there will be no default name; if you need to remove repeated elements, you need to specify select distinct, and all is the explicit repetition and the default method. instruction.* indicates all its attributes
where is the condition that the specified tuple needs to meet; logical comparison operators can be added; where salary between 90000 and 100000;
as is used to rename and can appear after select or from
like means pattern matching: % matches a string, _ matches a character, and is case-sensitive; if it does not match, it means not like
order by sorting, if there are multiple rules separated by commas, for example order by salary desc, name asc; ascending and descending order are asc/desc respectively

(4) Set operations: Duplicate elements will be deleted. If you want to keep them, you need to add all later; NULL participates in operations

union
intersect
except
If a value is Null in a mathematical operation, the result of the expression is Null. The result of the logical operation involving Null is unknown; if the result after where is false or unknown, the tuple will not be added to the result. Note: If two attribute values are Null, distinct will consider them equal. The above collection operations will also delete duplicate elements, so such a determination will also be made.

(5) Aggregate Functions: The input is a collection of values.

The input of sum and avg must be numbers
min, max, count
group by grouping application function; to ensure that the only attributes that appear in the select statement but are not aggregated are the attributes in the group by
having applies conditions in groups; also ensure that attributes appear in group by. If you want to select the aggregated data, you cannot use the result of the aggregation function in where, because it selects the input tuple. The correct way is to write after HAVING
Apart from
c

o

u

no

t

(

?

)

count(*)

count(?), the rest ignore Null values, where

?

*

? Represents all attributes, or written as count(1), which means adding 1 for each tuple
Aggregate functions can only appear in select clauses, and multiple aggregate functions can be placed
count, sum, avg support distinct

(6) Nested query
9. in/not in is used after where, the front is the attribute name, and it can be followed by a subquery or an enumeration set
10. >some means at least one, >all
11. exists If the subquery is not empty, return true
12. Does unique have duplicate tuples?
13. In the from clause; the result of the select-from-where statement is a relationship, so it can be written after from;
You need to add the lateral keyword to refer to the closure variable.
14. with defines a temporary relationship
15. The subquery result has only one tuple, can only be used in select, where, and having, if you want to use it in FROM, you need to use the following CTE technology; like a single value, it can be used in in the expression.

(7) Modification
16. insert into
17. update set
10.case end is similar to {}; when then, else is similar to if statement
11.coalesce(sum(credits), 0) If sum(credits) is null, return 0

1.2.1 Intermediate SQL

(1) join

from
r

1

r_1

r1? natural join

r

2

r_2

r2?; In different relationships, it is generally required that attributes with the same name need to be equal; duplicate attributes will only appear once in the result, and are located in the front of the attribute order of the result relationship.
join … using (); used to specify the properties of the connection
join … on ; specify the condition of join after on; the repeated attribute in this way will appear twice in the result; it is somewhat repeated with where, and it is somewhat different from where in the outer join. If the condition after on is not satisfied, it will be performed null-padded, and if on true writes the condition to where, it will not be filled.
outer join; keep some attribute mismatch information; left/right/full outer join

(2) The view will not store the result and needs to be recalculated every time. Stores query expression; similar to define in c, when used, the view name will be replaced by query expression

create view
v

v

v as ; Once created, it is available later and needs to be explicitly deleted
materialized views. The result is stored and the view is recalculated if the relationship used to generate the view changes.
View is updatable, if only one relationship after from is satisfied
transaction, begin…commit/rollback

(3) Integrity constraints

alter table table-name add/drop constraint xxx; Add an integrity constraint to the relationship, first check the relationship, if it is satisfied, it will be added, otherwise it will be rejected
unique(); attribute is superkey, but can still be null unless explicitly declared not null
check(\P); Each tuple needs to meet this condition, which can be a null value
on delete/update cascade; For the referenced relationship, if the foreign key constraint is violated when updating it, the tuple in the referenced relationship can be changed instead of rejecting the command.
constraint minsalary check (x); Specify a name for the check (x) constraint
set constraints constraint-list deferred ; During the execution of the transaction, the operation temporarily violates the integrity constraint, and it will be checked at the end of the transaction;
create assertion check ;
for all X, P(X) is expressed as not exists (X such that not P(X))

(4) Data type and mode

date, time, timestamp, interval
cast (e as t) ; type conversion
clob/blob large object type, which slices the data and obtains them one by one
custom data types distinct types;
create type Dollars as numeric(12,2) final; Only the same type (must be Dollars, even if the underlying type is the same) can be assigned
create domain DDollars as numeric(12,2) not null; can add constraints and set default values, not strongly typed
identity/auto increment Automatically generate ID
create table temp instructor like instructor generate new relation based on existing relation, with same schema
create table t1 as (select-from-where) with data;
Three-level naming structure; the top level is a directory

(5) index; because it belongs to the physical mode, it is not part of the SQL standard

create index < index-name> on < relation-name> (< attribute-list>); does not need to read the whole relation, just the index part
drop index < index-name>;

(6) Authority

grant/revoke…on…to/from; read (select), insert, update (can be given by attribute), delete
create role…;
with grant option; permissions can be granted to other users

1.2.2 Advanced SQL

(1)

connect; first establish a connection
statement; used to send SQL statements to the DB server; opening connection and statement consumes resources and needs to be explicitly closed
ResultSet class; result acquisition, one tuple at a time; next() tests whether there are unacquired tuples; getString() can obtain any basic SQL type
Prepared statements; the same query only needs to be compiled once and can be run multiple times with different parameters
Metadata Features; how many attributes, attribute names, attribute types
Embedded SQL needs to be processed by a special preprocessor before it can be compiled using the host language; EXEC SQL ;

(2) function

create function; returns integer
return table; return table; run function overload
declare, variable declaration
In contrast to aggregate functions, string functions, mathematical functions, and date functions can appear anywhere in the query; use ‖ to concatenate strings; the implementation of date functions in various databases is chaotic, and there are great differences

(3) Triggers

after update of takes on grade; only specific attributes will trigger

(4) transitive closure

create temporary table; creating a temporary table is equivalent to the feeling of a temporary variable, that is, in a concurrent scenario, each instance has its own copy
use iterative processing
with recursive; created a recursive view; requires that the query must be monotonic

(5)

output control
INTO writes the output to a table and creates a new table; insert into inserts into an existing table.
limit limits the number of output tuple results, offset controls skipping the first few tuples
Nested query: similar to nested for loops, except that it is out of order and does not traverse every tuple. It is essentially based on bags
rank(); For those with the same value, the rank value is the same, for example, there may be two at most, they are both 1, and the second largest is 3
The window function can be executed incrementally, and over indicates how to split the data;
rows no no n preceding; the first n values, the first one is only itself.
rows unbounded preceding means all previous values
row_num() represents the order in which it appears in the output
rank() is the order within the group after grouping and sorting. If there is no sorting, it is all 1
Using group by and aggregation functions will lose some tuples, but using window function will keep them. For example, to find the highest score, the aggregation function will only give the highest score, while the window function will also give other people’s scores, followed by the highest score of the current group.
pivot(); The attribute value becomes the attribute name pivot-table.
group by rollup
CTE, using the result of the subquery, is equivalent to a temporary variable
The with name as() clause will be executed one step before you execute the normal query
Can be executed recursively, recursive
The current understanding is: The result of the iterative query will enter the next iteration process, so if the iterative query does not generate a tuple, the entire iteration will end.

1.2.4 cmu2022 practice

Each command in sqlite3 needs to be followed by a semicolon “;”
.tables ; View all current tables
.schema table_name; view the schema of the relationship

The title in q2 refers to primary_title
The calculation method of DECADE in q5 adopts the method of floor(premiered/10)*10
NTILE() is also a window function, which divides tuples into specified buckets. Found in use

1.
SELECT name, round(AVG(rating), 2) as avf_rat, NTILE(10) over(ORDER BY avf_rat) but_idx
Error Error: no such column: avf_rat
2.
SELECT distinct title, ROW_NUMBER() OVER(ORDER BY title)
If done in this way, distinct will fail

The sorting in over() cannot use the renamed attribute, you need to use the CTE method
The 9th of the requirements in q9 should be sorted in ascending order first to find the data in the 9th bucket. Sort the data in descending order
In q10, CTE recursive needs to be used to concatenate all strings, and a new column needs to be created for unique identification to facilitate subsequent iterative operations.