3. Give sufficient rules for a database to be able to perform recovery when: (a) undo logging is being used (b) redo logging is being used, (c) undo-redo logging is being used. (U1) If T modifies X, then a log of the form ⟨T,X,v⟩ must be written before the new value of X is written to disk. (U2) If a transaction commits, then its COMMIT log record must be written to disk only after all database elements changed by the transaction have been written to disk, but as soon thereafter as possible. [March 25th - Slide 15] (R1) [Also called the write-ahead logging (WAL) rule] Before modifying X on disk, all log records pertaining to this modification of X (that is, both the update record ⟨T,X,v⟩ and the ⟨COMMIT T⟩ record) must appear on disk. [March 25th - Slide 15] (UR1) Before modifying X on disk because of T, it is necessary that the update record ⟨T,X,v,w⟩ appear on disk. (v is the old value; w is the new) (UR2) A ⟨COMMIT T⟩ must be flushed to disk as soon as it appears in the log. [April 6th - Slide 12]
4. Explain how a nonquiescent checkpoint is done if we are using undo-redo logging. Explain how an non-quiescent archive is performed if we are using undo-redo logging. Checkpoint [April 6th - Slide 6] 1) Write a ⟨START CKPT(T1,...,Tk)⟩ to the log 2) Write all dirty buffers to disk. 3) Write an ⟨END CKPT⟩, flush the log. If this kind of checkpoint is being used and a crash occurs, then we look backwards through the log for the first <Start Ckpt (T1 .. Tk)> or <End ckpt>. If we see an <End ckpt>, we know we only have to consider transactions that started after the <Start Ckpt (T1 .. Tk)> If we see a <Start Ckpt (T1 .. Tk)> but no <End ckpt>, then we need to only consider after the transactions T1...Tk began. Archive [April 8th - Slide 4] 1) Write a ⟨START DUMP⟩ record. 2) Perform a check point. 3) Perform a full or incremental dump of the data disks as desired, copying the data in some fixed order, so don't copy the same info twice and eventually finish copying. 4) Copy enough of the log that it includes at least the prefix of the log up to an including the end of item (2). 5) Write a log record ⟨END DUMP⟩. We can now throw away old log from last archive to previous checkpoint.
t1(gets lock 1), t1(gets lock 2), t1(gets exclusive lock 3), t1 does work, t1(unlocks lock 1), t1(unlocks lock 2), t1(unlocks lock 3)
Group: Haitao Huang, Yulong Ran, Tuong Nguyen
1. Suppose T(R)=30000 and T(S)=20000 and S and R share only one attribute Y where V(R,Y)=100, V(S,Y) =1000.. What would be our cost-based plan selection estimate for the number of tuples returned by W=σA<c(R)? R⋈S ? W=σA<c(R): T(W) = T(R) / 3 = 30000/3 = 10000. R⋈S: T(R⋈S) =T(R)⋅T(S)/max(V(R,Y),V(S,Y)) = 30000 * 20000 / 1000 = 600000
2. Briefly explain how the SQL ANALYZE command works for sqlite as well as what kind of statistics are stored in the table sqlite_stat1. The ANALYZE command in SQLITE gathers statistics about tables and indices and stores the collected information in internal tables of the database where the query optimizer can access the information and use it to help make better query planning choices. If no arguments are given, all attached databases are analyzed. If a schema name is given as the argument, then all tables and indices in that one database are analyzed. By default, the statistical information in sqlite will be stored in the table sqlite_stat1 and will be used by the query optimizer to determine the query plan.sqlite_stat1 will have the following schema: sqlite_stat1(tbl,idx,stat); There is normally one row per index, with the index identified by the name in the sqlite_stat1.idx column. The sqlite_stat1.tbl column is the name of the table to which the index belongs. In each such row, the sqlite_stat.stat column will be a string consisting of a list of integers followed by zero or more arguments. The first integer in this list is the approximate number of rows in the index. (The number of rows in the index is the same as the number of rows in the table, except for partial indexes.) The second integer is the approximate number of rows in the index that have the same value in the first column of the index. The third integer is the number number of rows in the index that have the same value for the first two columns. The N-th integer (for N>1) is the estimated average number of rows in the index which have the same value for the first N-1 columns. For a K-column index, there will be K+1 integers in the stat column. If the index is unique, then the last integer will be 1.(Edited: 2020-05-11)
----------------------------------------------------------------- Part 1 solution: A mediator might expose the schema: GovEmpMediator(socialSecurity, agency, salary, fname, lname) ----------------------------------------------------------------- ----------------------------------------------------------------- Part 2 solution: Now suppose we receive the query for employees with million dollar salaries: SELECT fname, lname FROM GovEmpMediator WHERE salary>1000000; For a Federal employee database, we would rewrite this as: SELECT first_name, last_name FROM FedEmps WHERE payGrade>5 For a State employee database, we would rewrite this as: SELECT first, last FROM CaliforniaEmployees WHERE salary>1000000; -----------------------------------------------------------------(Edited: 2020-05-11)
Give the parallel algorithm from class for computing the join of two relations R⋈S.
Assume R is the smaller table:
Similarly, if S is the smaller table, we repeat this algorithm with appropriately swapped table values.