ssa7MyMH

· 6 years ago · Jun 09, 2019, 07:06 AM
1Lecture 1
2==========
3Transaction:
4	- a unit of execution that contains one or more read/write operations on a database
5	- the DBMS's abstract "view" of a user program interacting with a DB, i.e. a sequence of read/write operations
6	
7Transaction properties (ACID): properties a DBMS must ensure for a transaction in order to maintain data in the presence of concurrent execution / system failures
8	- atomicity:
9		- a transaction is atomic
10		- either all the operations in the transaction are executed or none are
11		
12	- consistency:
13		- a transaction must preserve the consistence of the DB after execution
14		- consistent: transactions do not violate the integrity constraints specified on the DB (e.g. restrictions declared via CREATE TABLE statements)
15	
16	- isolation:
17		- a transaction is protected from the effects of concurrently scheduling other transactions
18		- i.e. users must not deal with arbitrary effects produced by other concurrently running transactions (a transaction is seen as if it were in single-user mode)
19		
20	- durability:
21		- commited changes are persisted to the DB
22		- the effects of a commited transaction should persist even if a system crash occurs before all the changes were written to disk
23
24Write-Ahead Log property:
25	- changes are written to the log (on the disk) before being reflected in the DB
26	- ensures atomicity and durability
27	
28Interleaved Execution:
29	- 2 transactions only read a data object => no conflict, order of execution not important
30	- 2 transactions read/write different data objects => no conflict, order of execution not important
31	- 1 transaction writes and 1 transaction reads/writes on the same data object => conflict, order of execution is important
32	
33	Conflicts:
34		- WR conflict: T2 is reading a data object previously written by T1 (dirty reads)
35		- RW conflict: T2 is writing a data object previously read by T1 (unrepeatable reads)
36		- WW conflict: T2 is writing a data object previously written by T1 (overwriting uncommited data)
37		
38Lecture 2
39==========	
40Schedule: a list of operations (read/write/commit/abort) of n transactions with the property that the order of the operations in each transaction is preserved
41
42Serial schedule: a schedule in which the actions of different transactions are not interleaved
43Non-serial schedule: a schedule in which the actions of different transactions are interleaved
44
45Serializability:
46	- C - set of transactions
47	- Sch(C) - set of schedules for C
48	
49	- Serializable schedule:
50		- a schedule S from Sch(C) is serializable if the effect of S on a consistent DB instance is identical to the effect of some serial schedule S0 from Sch(C)
51		
52	- Objective:
53		- finding interleaved schedules that permit the transactions to execute concurrently without interfering with each other (such schedules result in a correct DB state)
54		
55	- The order of read and write operations is important
56	
57	- Actions that cannot be swapped in a schedule:
58		- actions belonging to the same transaction
59		- actions in different transactions operating on the same object if at least one of them is a write
60		
61	2 schedules are conflict equivalent if conflict(S1) = conflict(S2), i.e.:
62		- S1 and S2 contain the same operations of the same transactions
63		- every pair of conflicting operators is ordered the same in S1 and S2
64		
65	Conflict serializability:
66		- C - set of transactions
67		- Sch(C) - set of schedules for C
68		- Let S be a schedule in Sch(C)
69		- S is conflict serializable if there exists a serial schedule S0 from Sch(C) such that conflict(S) = conflict(S0), i.e. S is conflict equivalent to some serial schedule.
70		
71	A schedule S from Sch(C) is conflict serializable if and only if its precedence graph is acyclic.
72	
73	Precedence graph:
74		1. Create a node Ti for every commited transaction in the schedule
75		2. Arc (Ti, Tj) if Tj executes Read(A) after a Write(A) by Ti
76		3. Arc(Ti, Tj) if Tj executes Write(A) after a Read(A) by Ti
77		4. Arc(Ti, Tj) if Tj executes Write(A) after a Write(A) by Ti
78		S is conflict serializable if and only if the resulting graph is acyclic
79
80	- Every conflict serializable schedule is serializable (when items can only be updated, no insert/delete)
81	- There are serializable schedules that are not conflict serializable
82	- Conflict serializability is a sufficient condition for serializability but it is not a necessary one
83	
84	View serializability:
85		- sufficient condition for serializability
86		- based on view-equivalence
87		
88		S1 and S2 are view equivalent:
89			- if Ti reads the initial value V from S1, then Ti also reads the initial value of V in S2
90			- if Ti reads the value of V written by Tj in S1, then Ti also reads the value of V written by Tj in S2
91			- if Ti writes the final value of V in S1, then Ti also writes the final value of V in S2
92			
93		A schedule S is view serializable if there exists a serial schedule S0 from Sch(C) such that S is view equivalent to S0
94		
95	Recoverable schedules:
96		- a schedule in which a transaction T commits only after all transactions whose changes T read commit
97		
98		A schedule in which a transaction T is reading only changes of commited transactions is said to avoid cascading aborts
99		Avoiding cascading aborts => recoverable schedules
100		
101	Lock-Based concurrency control:
102		- technique used to guarantee serializable and recoverable schedules
103		- lock:
104			- tool used by the transaction manager to control concurrent access to data
105			- prevents a transaction from accessing a data object while another transaction accesses it
106		
107		- transaction protocol:
108			- a set of rules enforced by the transaction manager and obeyed by all transactions
109			- example: simple protocol: before a transaction can read/write an object, it must acquire the appropriate lock on that object
110			- locks in conjuction with transaction protocols allow interleaved executions
111			
112		SLock (shared/read lock):
113			- only read an object, not write it
114		XLock (exclusive/write lock):
115			- read/write an object
116			
117		If a transaction holds a SLock on an object, other transactions can acquire an SLock on that object, but not XLock.
118		Lock upgrade:
119			- An SLock granted to a transaction can be upgraded to an XLock
120			
121		Transactions are issuing lock requests to the lock manager
122		Locks are held until being explicitly released by transactions
123		Lock acquire/lock release requests are automatically inserted into transactions by the DBMS
124		Locking/unlocking - atomic operations
125		
126		Lock table:
127			- structure used by the lock manager to keep track of granted locks/lock requests
128			Entry in the lock table (corresponding to one data object):
129				- number of transactions holding a lock on that object
130				- lock type (SLock/XLock)
131				- pointer to a queue of lock requests
132				
133		Transactions table:
134			- structure maintainted by the DBMS
135			- one entry per transaction
136			- keeps a list of locks held by every transaction
137			
138		Strict Two-Phase Locking (2PL):
139			- all the locks held by a transaction are automatically released when it completes execution
140			- Strict 2PL allows only serializable schedules
141			
142		Deadlocks: lock-based concurrency control techniques can lead to deadlocks
143			- cycle of transactions waiting for one another to release a locked resource
144			- normal execution can no longer continue without an external intervention (i.e. until the deadlock is resolved)
145
146			Deadlock management:
147				- prevention:
148					- assign transactions timestamp-based priorities
149					2 policies:
150						- Wait-Die:
151							- if Ti.priority > Tj.priority, Ti can wait; otherwise, Ti is aborted
152						- Wound-Wait:
153							- if Ti.priority > Tj.priority, Tj is aborted; otherwise Ti can wait
154					- if an aborted transaction is restarted, it's assigned its original timestamp
155					
156					a. Waits-for graph:	
157						- 1 node per active transaction
158						- arc from Ti to Tj if Ti is waiting for Tj to release a lock
159						Cycle in the lock => deadlock
160						DBMS periodically checks if there are cycles in the waits-for graph
161						
162					b. Timeout mechanism:
163						- very simple, practical method
164						- if a transaction T has been waiting for too long for a lock on an object, a deadlock is assumed to exist and T is terminated
165				
166				Choosing the deadlock victim:
167					- number of locks held
168					- number of objects modified by the transaction
169					...
170					The policy should be "fair", i.e. if a transaction has been chosen as the victim several times, it should eventually be allowed to proceed
171				- detection:
172					- allow deadlocks to occur and resolve them when they arise
173					
174		2PL:
175			- once a transaction releases a lock, it cannot request other locks
176			
177			Phase 1 (growing phase):
178				- transaction acquires locks
179				
180			Phase 2 (shrinking phase):
181				- transaction releases locks
182				
183			If all transactions in C obey 2PL, then any schedule S from Sch(C) that completes normally is serializable
184			
185			Transaction Ti wrote object A => transaction Tj can read/write A only after Ti's completion (commit/abort)
186			Strict schedules => Avoiding cascading aborts => recoverable schedules
187			
188			Strict 2PL allows only strict schedules
189			
190Lecture 3
191==========
192The Phantom problem:
193	- in the presence of insert operations, conflict serializability does not guarantee serializability
194	
195Isolation levels:
196	- determine the degree to which a transaction is isolated from the changes made by other concurrently running transactions
197	
198	4 isolation levels:
199		- read uncommited
200		- read commited
201		- repeatable read
202		- serializable
203		
204		Read uncommited:
205			- a transaction must acquire an exclusive lock prior to writing an object
206			- no locks are requested when reading objects
207			- XLocks are released at the end of the transaction
208			- Lowest degree of isolation
209			
210		Read commited:
211			- a transaction must acquire an XLock when writing an object
212			- a transaction must acquire a SLock when reading an object
213			- SLock released instantly
214			- XLock released at the end of the transaction
215			
216		Repeatable read:
217			- XLock when writing
218			- SLock when reading
219			- SLocks and XLocks are released at the end of the transaction
220			
221		Serializable:
222			- Acquire locks on objects before reading/writing
223			- A transaction can also acquire locks on sets of objects that must remain unmodified
224			- Locks are released at the end of the transaction
225			- Highest degree of isolation
226			
227Lecture 4
228==========
229Recovery manager ensures 2 important properties of transactions:
230	- atomicity (effects of uncommited transactions are undone)
231	- durability (effects of commited transactions survive system crashes)
232	
233	file:///C:/Users/necso/Desktop/DBMS/Lectures&Seminaries/Week4/DBMSs_lecture4.pdf (slide 3 - picture)
234	
235Transaction failures:
236	- system failure (hardware failures, bugs in the OS...; all running transactions terminate):
237		- internal memory lost
238		- external memory not affected
239		
240	- application error (divison by 0, infinte loop etc)
241	
242	- action by the Transaction Manager:
243		- a transaction is chosen as the deadlock victim and terminated
244		- the transaction might execute successfully if executed again
245		
246	- self-abort:
247		- based on some computations, a transaction can decide to terminate and undo its actions
248		- special statements (e.g. ABORT, ROLLBACK)
249		
250Normal execution:
251	- transaction:
252		- reads/writes DB objects
253	- read DB object O:
254		- bring O from disk into a frame in the Buffer Pool (BP)
255		- copy O's value into a program variable
256	- write DB object O:
257		- modify an in-memory copy of O (in the BP)
258		- write the in-memory copy to disk
259		
260Buffer Management:
261	- higher level layer L in the DBMS:
262		- asks the BM for page P
263		- if P not in BP, BM brings it into a frame F in the BP
264		- when P is no longer needed:
265			- L releases P
266			- the BM is notified, so F can be reused
267			- if P has been modified, the BM is notified and propagates the changes in the BP to the disk
268			
269			(continuation slide 10 - le-am facut pe-astea semestrul trecut)
270	
271Writing objects: transaction T changes Object O (in the BP)
272	- steal approach:
273		- T's changes can be written to disk while T is in progress
274		
275		Advantage:
276			- changes of aborted transactions don't have to be undone
277		Drawback:
278			- assumption: all pages modified by active transactions can fit in the BP
279			
280	- no-steal approach:
281		- T's changes cannot be written to disk before T commits
282		
283	- force approach:
284		- T's changes are immediately forced to disk when T commits
285		
286		Advantage:
287			- actions of commited transactions don't have to be redone
288		Drawback:
289			- can result in excessive I/O
290			
291	- no-force approach:
292		- T's changes are not forced to disk when T commits
293			
294	Steal, no-force approach: used by most systems
295	
296ARIES:
297	- recovery algorithm
298	- steal, no-force
299	- three phases:
300		- analysis
301		- redo
302		- undo
303		
304	- fundamental principle: Write-Ahead logging
305		- a change to an object O is first recorded in the log (e.g. in log record LR)
306		- LR must be written to disk before the change to O is written to disk
307		
308	Analysis:
309		- determine active transactions at the time of the crash
310		- determine dirty pages (pages in BP whose changes have not been written to disk yet)
311		
312	Redo:
313		- reapply all changes (in normal order) (starting from a certain record in the log), i.e. bring the DB to the state it was in when the crash occured
314		
315	Undo:
316		- undo changes of uncommited transactions (in reverse order)
317		
318	System restarts after a crash:
319		Restart: 3 phases:
320			- Analysis:
321				- reconstructs state at the most recent checkpoint
322				- scans the log forward from the most recent checkpoint
323
324				- identifies:
325					- active transactions at the time of crash (to be undone)
326					- potentially dirty pages at the time of crash
327					- the starting point for Redo
328
329			- Redo:
330				- repeats history (reapplies changes to dirty pages)
331				- all updates are reapplied (regardless of whether that transaction commited or not)
332				- starting point is determined during Analysis
333				- scans the log forward until the last record
334
335			- Undo:
336				- effects of active transactions at the time of crash are undone
337				- changes are undone in the opposite order
338				
339Security (End of lecture 5, lecture 6)
340========
341- DB protection:
342	- security:
343		- protecting the data against unauthorized users
344		- users have the right to do what they are trying to do
345		
346	- integrity:
347		- protecting the data against authorized users
348		- the operations users are trying to execute are correct
349		
350Aspects related to security:
351	- legal, ethical:
352		- a person searches for a password or accidentally finds such a password and uses it
353	- physical controls:
354		- e.g. the computer room is locked/guarded
355	- software controls:
356		- files protection
357	- operational problems:
358		- if a password mechanism is used, how are the passwords kept secret and how often should a user change his/her password?
359		
360Data security/integrity - similarities:
361	- system enforces certain constraints (that users cannot violate)
362	- constraints are:
363		- specified in a declarative language
364		- saved in the system catalog
365	- the system monitors user's actions to enforce the specified constraints
366	
367DBMS's authorization subsystem (security subsystem):
368	- checks any given access request against the applicable constraints
369	- access request consists of:
370		- requested object
371		- requested operation
372		- requesting user
373
374	- the system must recognize the source of request:
375		- authentication mechanism:
376			- user && password
377			- fingerprint readers, voice verifiers, retina scanners, etc
378			
379	- main approaches to data security in a DBMS:
380		- discretionary control:
381			- users have different access right on different objects
382			- the operations users are allowed are explicitly specified
383			- anything not explicitly authorized is forbidden
384			
385		- mandatory control:
386			- every object has a classification (e.g. top secret, secret, etc)
387			- every user has a clearance level (same options as for classification levels)
388			- levels - strict ordering
389			
390	Bell and La Padula rules:
391		- user x can retrieve object y if the clearance level of x is >= classification level of y
392		- user x can update object y if cleareance of x == classification of y
393		
394	DB user management:
395		- user:
396			- username
397			- password
398			- additional information: time intervals during which access is allowed; privileges; answers to some questions
399			
400		SQL statements: CREATE/ALTER/DROP user
401		Privileges:
402			- GRANT/REVOKE privilege_list ON component TO/FROM user_list [options]
403			
404		DB admin is monitoring the DB and keeping track of all operations performed by users via audit trails that contain:
405			- executed SQL statements
406			- user names, IPs, dates and times
407			- affected objects
408			- old values and new values for changed records
409			
410		SQL injection:
411			- application -> execution of a SQL statement: fixed statement, statement that is generated using input data from the user
412			- a statement can be changed (when being generated) due to data supplied by the user
413			- such a change is attemped when an user is trying to obtain additional access rights
414			- the user inputs code into input variables; the code is concatenated with SQL statements and executed
415			- the DBMS executes all syntactically valid statements
416			
417			1. Changing an authentication statement:
418				- errors when executing the statement -> use error messages in subsequent executions
419				
420				- authenticating with an username && password:
421					SELECT ... WHERE user = "uname" AND password = "upassword"
422					- if the statement returns at least 1 record, authentication is successful
423					
424				Values that change the SQL statement (modify the action intended by the programmer):
425					- user: a; password: " OR 1 = 1 --
426					
427			- SELECT ... WHERE ... = "v" ...
428			Possible value entered by the user:
429				x" OR 1 = (SELECT COUNT(*) FROM table) --
430					error -> change table name
431					- one can obtain the name of a table
432					
433				x" AND user IS NULL --
434					error -> change column name
435					- one can obtain the name of a column
436					
437				x" AND user LIKE "%pop%" --
438					error/no data -> change the name of the column or the string in the condition
439					- one can obtain the name of a user
440					
441				0; DROP TABLE users
442					
443			Prevention:
444				- data validation:
445					- regular expressions, allow only certain types of characters in the input data
446					
447				- modify problematic characters:
448					- double the single/double quotes
449					- precede single and double quotes with \
450					- use parametrized statements:
451						SELECT ... WHERE user = ? AND password = ?, where ? is a parameter
452						- such a statement has a collection of parameters
453						
454			Data enconding:
455				- if an intruder gains physical access to the server or the data center, we need to encode data
456				- storing/transmitting encoded data => stolen data is illegible for the intruder
457				- some DBMSs store data in an encoded form; using the data is impossible/extremely difficult (decoding cost is enormous)
458				
459				Codes and ciphers:
460					- code:
461						- replace one word or phrase with another word/number/symbol
462					- cipher:
463						- replace letters with a letter/number/symbol
464						
465				Steganography:
466					- hide the existence of the messages
467				
468				Cryptography:
469					- hide the meaning of the message - encryption
470					- transposition:
471						- rearrange letters in the message (create anagrams)
472						- every letter keeps its identity, but changes position
473						
474					- substitution:
475						- pair the letters of the alphabet (randomly)
476						- replace each letter in the message with its pair
477						- every letter retains its position, but changes its identity
478						
479						- plaintext:
480							- message before encryption
481						
482						- ciphertext:
483							- message after encryption
484						
485						- plain alphabet:
486							- alphabet used to write the message
487							
488						- cipher alphabet:
489							- alphabet used to encrypt the message
490						
491						Example:
492							- shift the original alphabet by 1 position (replace A with B, B with C, etc)
493							
494				Algorithm:
495					- general encryption method
496					- not secret
497					
498				Key:
499					- details of a particular encryption
500					- must be kept secret
501					- large number of keys
502					
503				The key distribution problem:
504					- if 2 parties want to exchange a message, they must first exchange a secret key:
505						- meet in person, use couriers
506						- logistical difficulties, costs way too much
507						
508					- Whitfield Diffie, Martin Hellman, Ralph Merkle:
509						- one-way functions in modular arithmetic
510						
511						One-way function: Y^x(mod P) (lecture 6, slide 27)
512						
513				Data encoding methods:
514				
515				1. Use a secret encryption key:
516					- data: disciplina baze de date
517					- secret key: student
518					
519					lecture 6, slide 28, explicat bines
520					
521				2. RSA:
522					- r = p * q
523						- r can pe publicly announced
524						- p, q primes
525					
526					2. Choose a number such that:
527						- c co-prime with (p-1)(q-1)
528							c > (p-1)(q-1)
529							
530					- c is used as a public encryption key
531				
532				3. Determine a secret decryption key d that verifies:
533					d * c = 1 % (p-1)(q-1)
534					
535				Encryption:
536					- determine code v:
537						v = m^c % r
538				
539				Decryption:
540					v^d % r
541					
542Lecture 7
543==========
544	Index I with search key <a, b, c>
545	
546	a = 10 AND b = 5 AND c = 2 => B+ tree index, Hash index
547	a = 10 AND b = 5 => B+ tree index
548	b = 5 => Nothing
549	b = 5 and c = 2 => Nothing
550	d = 2 => Nothing
551	a = 20 AND b = 10 AND c = 5 AND d > 11 => B+ tree partly, Hash index partly
552	
553	Tree index: I matches C if C contains exactly one term for each attribute in a prefix of the search key of I
554	Ex: if key <a, b, c>, prefix: <a> or <a, b>
555	
556	Hash index: I matches C if C contains exactly one term for each attribute in the search key of I
557	Ex: if key <a, b, c> then a = 1 AND b = 1 AND c = 1
558	
559	Joins - implementation techniques:
560		- iteration:
561			- simple/page oriented nested loops join
562			- block nested loops join
563		- indexing:
564			- index nested loops join
565		- partitioning:
566			- sort-merge join
567			- hash join
568			
569		Equality join:
570			- join condition : Ei = Sj
571			
572		Exams E:
573			- M pages
574			- pE records/page
575		
576		Students S:
577			- N pages
578			- pS records/page
579			
580		Evaluation: number of I/O operations
581		
582		Simple nested loops join:
583			- foreach tuple in foreach tuple
584			- Total cost: M + pE * M * N
585			
586		Page-oriented loops join:
587			- foreach page in foreach page
588			- small improvement from simple nested loops join
589			- Total cost: M + M * N
590			
591		Block nested loops join:
592			- B = B - 2
593			- calc no of S blocks
594			- Total cost: M + B * N
595			
596		Index nested loops join:
597			- Total cost: M + ( (M*pE) * cost of finding corresponding records in S)
598			
599		Sorting:
600			- if data fits into available main memory:
601				- use an internal sorting algorithm
602				
603			- if data doesn't fit into available main memory:
604				- use an external sorting algorithm:
605					- objective: minimize cost of disk access
606					- create runs: sort records in the data source that fit into main memory
607					- place runs into temporary files
608					- merge runs using an external sorting algorithm
609					
610				Simple Two-Way merge sort:
611					- 3 buffer pages
612					- repeated passes over the data
613					- even large data sources can be sorted with a small amount of main memory
614					
615					N = number of pages
616					Total cost: 2 * N * ([log2(N)] + 1) I/Os
617
618				External Two-Way merge sort:
619					- simple 2way merge sort - buffer pages are not used effectively
620					
621					B - buffer pages
622					N - number of pages to sort
623					Total cost: 2 * N * ([log(B-1)[N/B]] + 1) I/Os
624					
625				Sort-Merge join:
626					M - no of pages on object Exams
627					N - no of pages on object Students
628					E join S => 
629						- cost: O(MlogM) to sort E
630						- cost: O(NlogN) to sort S
631						- merge cost:
632							- M + N I/Os
633							- M * N I/Os worst case
634							
635				Hash join:
636					- partitioning phase (building phase)
637					- probing phase (matching phase)
638					
639					Cost:	
640						partitioning - E,S - read, written once: 2*(M+N) I/Os
641						probing - scan each partition once: M+N I/Os
642						=> total cost: 3 * (M+N) I/Os
643						
644Lecture 10
645==========
646- linear trees:
647	- at least one child of a join node is a base relation
648- left-deep trees:
649	- right child of each join node is a base relation
650- bushy tree:	
651	- not linear
652	
653Lecture 11
654==========
655Centralized DB systems:
656	- all data: single site
657	- transaction processing: sequential
658	- processor fails => system fails
659	
660Distributed systems:
661	- multiple processors (+ memories)
662	- autonomous components
663	- data is stored at several sites
664	- each site is managed by a DBMS that can run independently of the other sites
665	- location of data - impact on:	
666		- query processing
667		- query optimization
668		- concurrency control
669		- recovery
670		
671	- impact of data distribution - transparent:
672		- users should be able to write queries without knowing/specifying the actual location of the data
673		- cost-based query optimization that takes into account communication costs && differences in local computation costs
674		
675	- distributed transaction atomicity:
676		- users should be able to write transactions accessing multiple sites just as they would write local transactions
677		- transactions are atomic:
678			- all changes persist if the transaction commits, none persist if it aborts
679			
680	- Homogenous:
681		- every site runs the same DBMS software
682		
683	- Heterogenous (multi db system):
684		- different sites runs different DBMSs
685		
686	- Gateway:
687		- a software component that accepts requests (in some subset of SQL), submits them to local DBMSs, and requests the answers to the requestors
688		
689	Challenges (for distributed DBs):
690		- deciding where to store data
691		- 2 sub-problems:
692			- fragmentation
693			- allocation
694		- distributed concurrency control:
695			- transaction schedules must be globally serializable
696			- distributed deadlock management
697		- reliability of distributed databases:
698			- transaction failures:
699				- one or more processors may fail
700				- the network may fail
701			- data must be synchronized
702			
703	Client-Server systems:
704		- the client submits requests to a single site
705		- query processing is done at the server
706	Collaborating Server Systems:
707		- queries can span several sites
708		
709	Storing data:
710		- relations stored at several sites
711		- accessing relations at remote sites => message-passing costs
712		- reduce costs:
713			- fragment a relation across several sites:
714				- store fragments where they are most often accessed
715			- replicate a relation at each site where it's needed the most
716			
717		Fragmentation:
718			- break a relation into smaller relations (fragments)
719			- store the fragments instead of the relation itself
720			
721			3 types:
722				- horizontal:
723					- fragment: subset of rows
724					- n selection predicates => n fragments (n record sets)
725					- fragments should be disjoint
726					- reconstruct the original relation:
727						- take the union of the horizontal fragments
728					- performed with selection predicates
729						
730				- vertical:
731					- fragment: subset of columns
732					- performed using projection operators:
733						- must obtain a good decomposition
734						- reconstruction operator - natural join
735				
736				- hybrid:
737					- horizontal + vertical fragmentation
738		
739		Replication:
740			- store multiple copies of:
741				- a relation
742				- a relation fragment
743				
744			How are the copies of the data kept current when the relation is changed?
745			- types:	
746				- synchronous:
747					- transaction T modified relation R
748					- before T commits, it synchronizes all copies of R
749					
750					- transaction T, object O (with several copies)
751					- no matter which copy of O it accesses, T should see the same value
752					- 2 basic techniques:
753						- voting:
754							- to modify O, T must write a majority of its copies
755							- when reading O, T must read enough copies to make sure it's seeing at least one current copy
756							- e.g. 10 copies; 7 copies written; 4 copies should be read
757							- each copy has a version number
758							- not an attractive approach in most cases, because reads are usually much more common than writes
759							
760						- read-any write-all:
761							- modify O:
762								- T must write all copies
763							- read O:
764								T can read any copy
765							- fast reads, slower copies (relative to the voting technique)
766							- most common approach to synchronous replication
767							
768					Costs:
769						- before an update transaction T can commit, it must lock all copies of the modified relation/fragment
770							- T sends lock requests to remote sites
771							- while waiting for the response, T hold on to other locks
772						- site/link failures:
773							- T cannot commit until the network/sites are back up
774						- no failures, locks immediately obtained:
775							- T must follow an expensive commit protocol
776						
777						=> asynchronous replication is more used
778					
779				- asynchronous:
780					- transaction T modifies relation R
781					- R's copies are synchronized periodically
782					- i.e. it's possible that some of R's copies are outdated for brief periods of time
783					- distributed data independence is compromised
784					- a lot of current systems use this approach
785				
786					- readers can just access one copy of O
787					- 2 approaches:
788						- difference between the 2 approaches: number of updatable copies (master copies)
789						
790						- primary site replication:
791							- exactly one copy of an object O is designed as the primary/master copy
792							- primary copy is published
793							- other sites can subscribe to (fragments of) the primary copy - secondary copies
794							- secondary copies cannot be updated
795							
796							- main issue:
797								- how are the changes to the primary copy propagated to the secondary copies?
798									- capture the changes made by commited transactions
799									- apply these changes to the secondary copies
800									
801							- capture:
802								- log-based capture:
803									- the log (used for recovery) is used to generate the Change Data Table (CDT) structure
804									- write log tail to stable storage => write all records affecting replicated relations to the CDT
805									- changes of aborted transactions must be removed from the CDT
806									- in the end, CDT contains only update log records of commited transactions
807								
808								- procedural capture:
809									- capture is performed through a procedure that is automatically invoked (e.g. a trigger)
810									- typically, the procedure just takes a snapshot of the primary copy
811									
812								- log based:
813									- smaller overhead
814									- smaller delay
815									- but it depends on proprietary log details
816									
817							- apply:
818								- applies changes collected in the Capture step (from the CDT/snapshot) to the secondary copies
819								- primary site can continuously send the CDT
820								- secondary sites can periodically request a snapshot or the CDT from the primary site
821								- each secondary site runs a copy of the apply process
822								
823							- log-based capture + continuous apply:
824								- minimizes delay in propagating changes
825							- procedural capture + application-driven apply:
826								- most flexible way to process changes
827								
828						- peer-to-peer replication:
829							- several copies of object O can be master copies (i.e. updatable)
830							- changes to master copy must be propagated to the other copies
831							- conflict resolution strategy:
832								- 2 master copies are changed in a conflicting manner:
833								- in general - ad hoc approaches to conflict resolution
834							
835							- best utilized when conflicts do not arise:
836								- each master site owns a fragment (usually a horizontal fragment):
837									- any 2 fragments updatable by different master sites are disjoint
838								- updating rights are held by one master site at a time