Database: SQL statement

How to use the database:
online transaction processing. Interaction with web/APP
data analytics

Data model: the concept of describing data, data relationship, data semantics and consistency constraints; the concept used to describe the data in the database is essentially how to organize the data
schema: Given a data model, the application stores data through this model
Tables also become relationships
database schema: schema. physical schema, logical schema, sub-schema

The SQL statement is Non-procedural (specify what data is needed, not how to obtain it)

1 relational model

Can be divided into three modules
Storage management: interacting with the file system
Query processing (works at the view level)
affairs management

The relational model is a collection of tables.
Each table has a unique name and is called a relation. Rows are called Tuples and columns are called attributes.

The legal range of values for an attribute is called a domain.
Atomic field: The elements in the field are indivisible, simply understood as not a set of values. Here more emphasis is placed on how it is used (whether it is used as a whole), rather than the value itself.

null value: unknown or does not exist

relation schema: equivalent to the variable declaration keyword (int) in the program; ———————— logic design, including table name, Attribute, AttributeType, (Constraint)
relation: equivalent to specific variables (

x

x

x)
relation instance: equivalent to the specific value of the variable (10). ———————————————— at a certain point in time Contents in the table (data may be changing all the time, add, delete, check and modify)

(1) Why is it called a relationship, each row is a relationship, and the entire table is a relation?
Assuming that there is a table that stores course IDs and prerequisite course IDs, each row represents such a relationship: a course is a prerequisite course for another course, so each row is called a relationship, and the entire table is called relation.

superkey: A set of attributes used to distinguish different tuples (one or more, can contain irrelevant attributes)
candidate keys: the smallest superkeys (not necessarily the smallest number, but does not contain irrelevant attributes);
primary keys: selected from candidate keys, it is best not to change afterwards.

Foreign Key Constraints: Relationships

r

1

r_1

The value of attribute A of r1?, which appears in the relation

r

2

r_2

r2? in the domain of primary key B. A is called

r

1

r_1

Foreign key to r1?.
Referential integrity constraints: not required to be primary keys.

1.1 Relational Algebra

σ

\sigma

σ–select: select tuple

Π

\Pi

Π–Projection: pick properties

x

\times

×–Cartesian product: Different relationships may have the same attribute, so it is necessary to add the relationship name to distinguish (instructor.ID, teaches.ID). The Cartesian product with itself is still problematic and requires a rename operation.

?

\Join

?–join: Can be generated by Cartesian product and select
Collection operation: Note that there must be the same attributes (and the same type)
The result of a relational algebra expression has no name and needs to use the Rename operation

1.2 SQL

(1) SQL or declarative language: Take sorting as an example, you only need to specify the data that needs to be sorted, without specifying what algorithm to use to complete the sorting.
SQL can be regarded as a collection of DML (addition, deletion, query and modification), DDL (defining schema and creating tables to store data), and DCL (security).

(2) SQL is actually based on bag, there is no fixed position to allow repeated elements
list allows repeated elements, but the position is fixed
set has no fixed position, but does not allow repeated elements

1.2.1 Basic SQL

(1) The char(n) type is a fixed-length character string, and spaces will be added if the length is not enough.
When comparing two char type strings, if the lengths are different, spaces will be appended to the shorter char variable to make the lengths the same.
When the char type is compared with the varchar type, spaces may or may not be appended, depending on the specific implementation.
It is recommended to use varchar

(2) Definition
create table create relationship
drop table

r

r

r; delete completely, that is, the pattern is also deleted
delete from

r

r

r; delete

r

r

All tuples in r, still retaining relations

r

r

r
alter table

r

r

r add

A

D.

A \;D

AD; add a new attribute to an existing relationship, and all tuple new attributes are assigned a null value
alter table

r

r

r drop

A

A

A; delete attributes, many systems do not support

(3) Operation

  1. From is followed by input. If there are multiple relationships, it can be considered as the Cartesian product of each relationship.
  2. select is used to select attributes, which can be followed by the ±*/ operation of attributes, but there will be no default name; if you need to remove repeated elements, you need to specify select distinct, and all is the explicit repetition and the default method. instruction.* indicates all its attributes
  3. where is the condition that the specified tuple needs to meet; logical comparison operators can be added; where salary between 90000 and 100000;
  4. as is used to rename and can appear after select or from
  5. like means pattern matching: % matches a string, _ matches a character, and is case-sensitive; if it does not match, it means not like
  6. order by sorting, if there are multiple rules separated by commas, for example order by salary desc, name asc; ascending and descending order are asc/desc respectively

(4) Set operations: Duplicate elements will be deleted. If you want to keep them, you need to add all later; NULL participates in operations

  1. union
  2. intersect
  3. except
  4. If a value is Null in a mathematical operation, the result of the expression is Null. The result of the logical operation involving Null is unknown; if the result after where is false or unknown, the tuple will not be added to the result. Note: If two attribute values are Null, distinct will consider them equal. The above collection operations will also delete duplicate elements, so such a determination will also be made.

(5) Aggregate Functions: The input is a collection of values.

  1. The input of sum and avg must be numbers
  2. min, max, count
  3. group by grouping application function; to ensure that the only attributes that appear in the select statement but are not aggregated are the attributes in the group by
  4. having applies conditions in groups; also ensure that attributes appear in group by. If you want to select the aggregated data, you cannot use the result of the aggregation function in where, because it selects the input tuple. The correct way is to write after HAVING
  5. Apart from

    c

    o

    u

    no

    t

    (

    ?

    )

    count(*)

    count(?), the rest ignore Null values, where

    ?

    *

    ? Represents all attributes, or written as count(1), which means adding 1 for each tuple

  6. Aggregate functions can only appear in select clauses, and multiple aggregate functions can be placed
  7. count, sum, avg support distinct

(6) Nested query
9. in/not in is used after where, the front is the attribute name, and it can be followed by a subquery or an enumeration set
10. >some means at least one, >all
11. exists If the subquery is not empty, return true
12. Does unique have duplicate tuples?
13. In the from clause; the result of the select-from-where statement is a relationship, so it can be written after from;
You need to add the lateral keyword to refer to the closure variable.
14. with defines a temporary relationship
15. The subquery result has only one tuple, can only be used in select, where, and having, if you want to use it in FROM, you need to use the following CTE technology; like a single value, it can be used in in the expression.

(7) Modification
16. insert into
17. update set
10.case end is similar to {}; when then, else is similar to if statement
11.coalesce(sum(credits), 0) If sum(credits) is null, return 0

1.2.1 Intermediate SQL

(1) join

  1. from

    r

    1

    r_1

    r1? natural join

    r

    2

    r_2

    r2?; In different relationships, it is generally required that attributes with the same name need to be equal; duplicate attributes will only appear once in the result, and are located in the front of the attribute order of the result relationship.

  2. join … using (); used to specify the properties of the connection
  3. join … on ; specify the condition of join after on; the repeated attribute in this way will appear twice in the result; it is somewhat repeated with where, and it is somewhat different from where in the outer join. If the condition after on is not satisfied, it will be performed null-padded, and if on true writes the condition to where, it will not be filled.
  4. outer join; keep some attribute mismatch information; left/right/full outer join

(2) The view will not store the result and needs to be recalculated every time. Stores query expression; similar to define in c, when used, the view name will be replaced by query expression

  1. create view

    v

    v

    v as ; Once created, it is available later and needs to be explicitly deleted

  2. materialized views. The result is stored and the view is recalculated if the relationship used to generate the view changes.
  3. View is updatable, if only one relationship after from is satisfied
  4. transaction, begin…commit/rollback

(3) Integrity constraints

  1. alter table table-name add/drop constraint xxx; Add an integrity constraint to the relationship, first check the relationship, if it is satisfied, it will be added, otherwise it will be rejected
  2. unique(); attribute is superkey, but can still be null unless explicitly declared not null
  3. check(\P); Each tuple needs to meet this condition, which can be a null value
  4. on delete/update cascade; For the referenced relationship, if the foreign key constraint is violated when updating it, the tuple in the referenced relationship can be changed instead of rejecting the command.
  5. constraint minsalary check (x); Specify a name for the check (x) constraint
  6. set constraints constraint-list deferred ; During the execution of the transaction, the operation temporarily violates the integrity constraint, and it will be checked at the end of the transaction;
  7. create assertion check ;
    for all X, P(X) is expressed as not exists (X such that not P(X))

(4) Data type and mode

  1. date, time, timestamp, interval
  2. cast (e as t) ; type conversion
  3. clob/blob large object type, which slices the data and obtains them one by one
  4. custom data types distinct types;
    create type Dollars as numeric(12,2) final; Only the same type (must be Dollars, even if the underlying type is the same) can be assigned
    create domain DDollars as numeric(12,2) not null; can add constraints and set default values, not strongly typed
  5. identity/auto increment Automatically generate ID
  6. create table temp instructor like instructor generate new relation based on existing relation, with same schema
    create table t1 as (select-from-where) with data;
  7. Three-level naming structure; the top level is a directory

(5) index; because it belongs to the physical mode, it is not part of the SQL standard

  1. create index < index-name> on < relation-name> (< attribute-list>); does not need to read the whole relation, just the index part
  2. drop index < index-name>;

(6) Authority

  1. grant/revoke…on…to/from; read (select), insert, update (can be given by attribute), delete
  2. create role…;
  3. with grant option; permissions can be granted to other users
1.2.2 Advanced SQL

(1)

  1. connect; first establish a connection
  2. statement; used to send SQL statements to the DB server; opening connection and statement consumes resources and needs to be explicitly closed
  3. ResultSet class; result acquisition, one tuple at a time; next() tests whether there are unacquired tuples; getString() can obtain any basic SQL type
  4. Prepared statements; the same query only needs to be compiled once and can be run multiple times with different parameters
  5. Metadata Features; how many attributes, attribute names, attribute types
  6. Embedded SQL needs to be processed by a special preprocessor before it can be compiled using the host language; EXEC SQL ;

(2) function

  1. create function; returns integer
  2. return table; return table; run function overload
  3. declare, variable declaration
  4. In contrast to aggregate functions, string functions, mathematical functions, and date functions can appear anywhere in the query; use ‖ to concatenate strings; the implementation of date functions in various databases is chaotic, and there are great differences

(3) Triggers

  1. after update of takes on grade; only specific attributes will trigger

(4) transitive closure

  1. create temporary table; creating a temporary table is equivalent to the feeling of a temporary variable, that is, in a concurrent scenario, each instance has its own copy
  2. use iterative processing
  3. with recursive; created a recursive view; requires that the query must be monotonic

(5)

  1. output control
    INTO writes the output to a table and creates a new table; insert into inserts into an existing table.
    limit limits the number of output tuple results, offset controls skipping the first few tuples
    Nested query: similar to nested for loops, except that it is out of order and does not traverse every tuple. It is essentially based on bags
  2. rank(); For those with the same value, the rank value is the same, for example, there may be two at most, they are both 1, and the second largest is 3
  3. The window function can be executed incrementally, and over indicates how to split the data;
    rows no no n preceding; the first n values, the first one is only itself.
    rows unbounded preceding means all previous values
    row_num() represents the order in which it appears in the output
    rank() is the order within the group after grouping and sorting. If there is no sorting, it is all 1
    Using group by and aggregation functions will lose some tuples, but using window function will keep them. For example, to find the highest score, the aggregation function will only give the highest score, while the window function will also give other people’s scores, followed by the highest score of the current group.
  4. pivot(); The attribute value becomes the attribute name pivot-table.
  5. group by rollup
  6. CTE, using the result of the subquery, is equivalent to a temporary variable
    The with name as() clause will be executed one step before you execute the normal query
    Can be executed recursively, recursive
    The current understanding is: The result of the iterative query will enter the next iteration process, so if the iterative query does not generate a tuple, the entire iteration will end.

1.2.4 cmu2022 practice

Each command in sqlite3 needs to be followed by a semicolon “;”
.tables ; View all current tables
.schema table_name; view the schema of the relationship

  1. The title in q2 refers to primary_title
  2. The calculation method of DECADE in q5 adopts the method of floor(premiered/10)*10
  3. NTILE() is also a window function, which divides tuples into specified buckets. Found in use
1.
SELECT name, round(AVG(rating), 2) as avf_rat, NTILE(10) over(ORDER BY avf_rat) but_idx
Error Error: no such column: avf_rat
2.
SELECT distinct title, ROW_NUMBER() OVER(ORDER BY title)
If done in this way, distinct will fail

The sorting in over() cannot use the renamed attribute, you need to use the CTE method
The 9th of the requirements in q9 should be sorted in ascending order first to find the data in the 9th bucket. Sort the data in descending order
In q10, CTE recursive needs to be used to concatenate all strings, and a new column needs to be created for unique identification to facilitate subsequent iterative operations.