Subject:  TECH: Internals of Recovery
Type:  REFERENCE
Creation Date:  13-SEP-1996

Oracle7 v7.2 Recovery Outline

Authors: Andrea Borr & Bill Bridge
Version: 1 May 3, 1995

Abstract

This document gives an overview of how database recovery works
in Oracle7 version 7.2. It is assumed that the reader is familiar
with the Database Administrator's Guide for Oracle7 version 7.2.
The intention of this document is to describe the recovery
algorithms and data structures, providing more details than the
Administrator's Guide.

Table of Contents

1	Introduction
	1.1	Instance Recovery and Media Recovery: Common Mechanisms
	1.2	Instance Failure and Recovery, Crash Failure and Recovery
	1.3	Media Failure and Recovery

2	Fundamental Data Structures
	2.1	Controlfile
		2.1.1	Database Info Record (Controlfile)
		2.1.2	Datafile Record (Controlfile)
		2.1.3	Thread Record (Controlfile)
		2.1.4	Logfile Record (Controlfile)
		2.1.5	Filename Record (Controlfile)
		2.1.6	Log-History Record (Controlfile)
	2.2	Datafile Header
	2.3	Logfile Header
	2.4	Change Vector
	2.5	Redo Record
	2.6	System Change Number (SCN)
	2.7	Redo Logs
	2.8	Thread of Redo
	2.9	Redo Byte Address (RBA)
	2.10	Checkpoint Structure
	2.11	Log History
	2.12	Thread Checkpoint Structure
	2.13	Database Checkpoint Structure
	2.14	Datafile Checkpoint Structure
	2.15	Stop SCN
	2.16	Checkpoint Counter
	2.17	Tablespace-Clean-Stop SCN
	2.18	Datafile Offline Range

3	Redo Generation
	3.1	Atomic Changes
	3.2	Write-Ahead Log
	3.3	Transaction Commit
	3.4	Thread Checkpoint
	3.5	Online-Fuzzy Bit
	3.6	Datafile Checkpoint
	3.7	Log Switch
	3.8	Archiving Log Switches
	3.9	Thread Open
	3.10	Thread Close
	3.11	Thread Enable
	3.12	Thread Disable

4	Hot Backup
	4.1	BEGIN BACKUP
	4.2	File Copy
	4.3	END BACKUP
	4.4	"Crashed" Hot Backup

5	Instance Recovery
	5.1	Detection of the Need for Instance Recovery
	5.2	Thread-at-a-Time Redo Application
	5.3	Current Online Datafiles Only
	5.4	Checkpoints
	5.5	Crash Recovery Completion

6	Media Recovery
	6.1	When to Do Media Recovery
	6.2	Thread-Merged Redo Application
	6.3	Restoring Backups
	6.4	Media Recovery Commands
		6.4.1	RECOVER DATABASE
		6.4.2	RECOVER TABLESPACE
		6.4.3	RECOVER DATAFILE
	6.5	Starting Media Recovery
	6.6	Applying Redo, Media Recovery Checkpoints
	6.7	Media Recovery and Fuzzy Bits
		6.7.1	Media-Recovery-Fuzzy
		6.7.2	Online-Fuzzy
		6.7.3	Hotbackup-Fuzzy
	6.8	Thread Enables
	6.9	Thread Disables
	6.10	Ending Media Recovery (Case of Complete Media Recovery)
	6.11	Automatic Recovery
	6.12	Incomplete Recovery
		6.12.1	Incomplete Recovery UNTIL Options
		6.12.2	Incomplete Recovery and Consistency
		6.12.3	Incomplete Recovery and Datafiles Known to the Controlfile
		6.12.4	Resetlogs Open after Incomplete Recovery
		6.12.5	Files Offline during Incomplete Recovery
	6.13	Backup Controlfile Recovery
	6.14	CREATE DATAFILE: Recover a Datafile Without a Backup
	6.15	Point-in-Time Recovery Using Export/Import

7	Block Recovery
	7.1	Block Recovery Initiation and Operation
	7.2	Buffer Header RBA Fields
	7.3	PMON vs. Foreground Invocation

8	Resetlogs
	8.1	Fuzzy Files
	8.2	Resetlogs SCN and Counter
	8.3	Effect of Resetlogs on Threads
	8.4	Effect of Resetlogs on Redo Logs
	8.5	Effect of Resetlogs on Online Datafiles
	8.6	Effect of Resetlogs on Offline Datafiles
	8.7	Checking Dictionary vs. Controlfile on Resetlogs Open

9	Recovery-Related V$ Fixed-Views
	9.1	V$LOG
	9.2	V$LOGFILE
	9.3	V$LOG_HISTORY
	9.4	V$RECOVERY_LOG
	9.5	V$RECOVER_FILE
	9.6	V$BACKUP

10	Miscellaneous Recovery Features
	10.1	Parallel Recovery (v7.1)
		10.1.1	Parallel Recovery Architecture
		10.1.2	Parallel Recovery System Initialization Parameters
		10.1.3	Media Recovery Command Syntax Changes
	10.2	Redo Log Checksums (v7.2)
	10.3	Clear Logfile (v7.2)


1  Introduction

The Oracle RDBMS provides database recovery facilities capable
of preserving database integrity in the face of two major failure
modes:

1.	Instance failure: loss of the contents of a buffer cache, or data
residing in memory.

2.	Media failure: loss of database file storage on disk.

Each of these two major failure modes raises its own set of
challenges for database integrity. For each, there is a set of
requirements that a recovery utility addressing that failure mode
must satisfy.

Although recovery processing for the two failure modes has much
in common, the requirements differ enough to motivate the
implementation of two different recovery facilities:

1.	Instance recovery: recovers data lost from the buffer cache
due to instance failure.

2.	Media recovery: recovers data lost from disk storage.

1.1  Instance Recovery and Media Recovery: Common Mechanisms

Both instance recovery and media recovery depend for their
operation on the redo log. The redo log is organized into redo
threads, referred to hereafter simply as threads. The redo log of a
single-instance (non-Parallel Server option) database consists of a
single thread. A Parallel Server redo log has a thread per instance.

A redo log thread is a set of operating system files in which an
instance records all changes it makes - committed and
uncommitted - to memory buffers containing datafile blocks.
Since this includes changes made to rollback segment blocks, it
follows that rollback data is also (indirectly) recorded in the redo
log.

The first phase of both instance and media recovery processing is
roll-forward. Roll-forward is the task of the RDBMS recovery
layer. During roll-forward, changes recorded in the redo log are
re-applied (as needed) to the datafiles. Because changes to rollback
segment blocks are recorded in the redo log, roll-forward also
regenerates the corresponding rollback data. When the recovery
layer finishes its task, all changes recorded in the redo log have
been restored by roll-forward. At this point, the datafile blocks
contain not only all committed changes, but also any uncommitted
changes recorded in the redo log.

The second phase of both instance and media recovery processing
is roll-back. Roll-back is the task of the RDBMS transaction layer.
During roll-back, undo information from rollback segments (as
well as from save-undo/deferred rollback segments, if appropriate)
is used to undo uncommitted changes that were applied during the
roll-forward phase.

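The two phases can be illustrated with a toy model. The following
Python sketch is not Oracle code: the dictionary layout, the three
record kinds ("data", "undo", "commit") and the function names are
assumptions made purely for illustration. It shows only that
roll-forward reapplies every logged change, including changes to
rollback data, and that roll-back then uses the regenerated rollback
data to remove the uncommitted changes.

```python
def roll_forward(db, redo_log):
    """Phase 1 (recovery layer): reapply every logged change in log order."""
    for kind, *args in redo_log:
        if kind == "data":                 # change to a datablock
            block, new_value = args
            db["blocks"][block] = new_value
        elif kind == "undo":               # change to a rollback segment block
            txn, block, old_value = args
            db["rollback"].append((txn, block, old_value))
        elif kind == "commit":             # commit marker for a transaction
            (txn,) = args
            db["committed"].add(txn)


def roll_back(db):
    """Phase 2 (transaction layer): undo changes of uncommitted transactions."""
    for txn, block, old_value in reversed(db["rollback"]):
        if txn not in db["committed"]:
            db["blocks"][block] = old_value


# Stale on-disk state: neither T1's nor T2's change reached the datafiles.
db = {"blocks": {"A": 1, "B": 2}, "rollback": [], "committed": set()}
redo_log = [
    ("undo", "T1", "A", 1), ("data", "A", 10), ("commit", "T1"),
    ("undo", "T2", "B", 2), ("data", "B", 20),      # T2 never committed
]

roll_forward(db, redo_log)
after_roll_forward = dict(db["blocks"])   # committed and uncommitted changes
roll_back(db)
after_roll_back = dict(db["blocks"])      # only T1's committed change remains
```

The toy applies every record unconditionally, whereas the text says
changes are re-applied "as needed".
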
1.2  Instance Failure and Recovery, Crash Failure and Recovery

Instance failure, a failure resulting in the loss of the instance's
buffer cache, occurs when an instance is aborted, either
unexpectedly or expectedly. Examples of reasons for unexpected
instance aborts are operating system crash, power failure, or
background process failure. Examples of reasons for expected
instance aborts are use of the commands SHUTDOWN ABORT
and STARTUP FORCE.

Crash failure is the failure of all instances accessing a database. In
the case of a single-instance (non-Parallel Server option) database,
the terms crash failure and instance failure are used
interchangeably. Crash recovery (equivalent to instance recovery in
this case) is the process of recovering all online datafiles to a
consistent state following a crash. This is done automatically in
response to the ALTER DATABASE OPEN command.

In the case of the Parallel Server option, the term crash failure is
used to refer to the simultaneous failures of all open instances.
Parallel Server crash recovery is the process of recovering all
online datafiles to a consistent state after all instances accessing the
database have failed. This is done automatically in response to the
ALTER DATABASE OPEN command. Parallel Server instance
failure refers to the failure of an instance while a surviving instance
continues in operation. Parallel Server instance recovery is the
automatic recovery by a surviving instance of a failed instance.

Instance failure impairs database integrity because it results in loss
of the instance's dirty buffer cache. A "dirty" buffer is one whose
memory version differs from its disk version. An instance that
aborts has no opportunity to write out "dirty" buffers so as to
prevent database integrity breakage on disk following a crash. Loss
of the dirty buffer cache is a problem because the cache
manager uses algorithms optimized for OLTP performance rather
than for crash-tolerance. Examples of performance-optimizing
cache management algorithms that make the task of instance
recovery more difficult are as follows:

-	LRU (least recently used) based buffer replacement

-	no-datablock-force-at-commit (see 3.3).

As a consequence of the performance-oriented cache management
algorithms, instance failure can cause database integrity breakage
as follows:

A.	At crash time, the datafiles on disk might contain some but not
all of a set of datablock changes that constitute a single atomic
change to the database with respect to structural integrity
(see 2.5).

B.	At crash time, the datafiles on disk might contain some
datablocks modified by uncommitted transactions.

C.	At crash time, the datafiles on disk might contain some
datablocks missing changes from committed transactions.

During instance recovery, the RDBMS recovery layer repairs
database integrity breakages A and C. It also enables subsequent
repair - by the RDBMS transaction layer - of database integrity
breakage B.

In addition to the requirement that it repair any integrity breakages
resulting from the crash, instance recovery must meet the following
requirements:

1.	Instance recovery must accomplish the repair using the current
online datafiles (as left on disk after the crash).

2.	Instance recovery must use only the online redo logs. It must
not require use of the archived logs. Although instance
recovery could work successfully from archived logs (except for a
database running in NOARCHIVELOG mode), it could not
work autonomously (requirement 4) if an operator were
required to restore archived logs.

3.	The invocation of instance recovery must be automatic,
implicit at the next database startup.

4.	Detection of the need for repair and the repair itself must
proceed autonomously, without operator intervention.

5.	The duration of the roll-forward phase of instance recovery is
governed by both RDBMS internal mechanisms (checkpoint)
and user-configurable parameters (e.g. number and sizes of
logfiles, checkpoint-frequency tuning parameters, parallel
recovery parameters).

As seen above, Oracle's buffer cache component is optimized for
OLTP performance rather than for crash-tolerance. This document
describes some of the mechanisms used by the cache and recovery
components to solve the problems posed by use of
performance-optimizing cache algorithms such as LRU buffer
replacement and no-datablock-force-at-commit. These mechanisms
enable instance recovery to meet its requirements while allowing
optimal OLTP performance. These mechanisms include:

-	Log-Force-at-Commit: see 3.3.
Facilitates repair of breakage type C by guaranteeing that, at
transaction commit time, all of the transaction's redo records,
including its "commit record," are stored on disk in the online
redo log.

-	Checkpointing: see 3.4, 3.6.
Bounds the amount of transaction redo that instance recovery
must potentially apply.
Works in conjunction with online-log switch management to
ensure that instance recovery can be accomplished using only
online logs and current online datafiles.

-	Online-Log Switch Management: see 3.7.
Works in conjunction with checkpointing to ensure that
instance recovery can be accomplished using only online logs
and current online datafiles. It guarantees that the current
checkpoint is beyond an online logfile before that logfile is
reused.

-	Write-Ahead-Log: see 3.2.
Facilitates repair of breakage types A and B by guaranteeing
that: (i) at crash time there are no changes in the datafiles that
are not in the redo log; (ii) no datablock change was written to
disk without first writing to the log sufficient information to
enable undo of the change should a crash intervene before
commit.

-	Atomic Redo Record Generation: see 3.1.
Facilitates repair of breakage types A and B.

-	Thread-Open Flag: see 5.1.
Enables detection at startup time of the need for crash
recovery.

1.3  Media Failure and Recovery

Instance failure affects logical database integrity. Because instance
failure leaves a recoverable version of the online datafiles on the
post-crash disk, instance recovery can use the online datafiles as a
starting point.

Media failure, on the other hand, affects physical storage media
integrity or accessibility. Because the original datafile copies are
damaged, media recovery uses restored backup copies of the
datafiles as a starting point. Media recovery then uses the redo log
to roll-forward these files, either to a consistent present state or to a
consistent past state. Media recovery is run by issuing one of the
following commands: RECOVER DATABASE, RECOVER
TABLESPACE, RECOVER DATAFILE.

Depending on the failure scenario, a media failure has the potential
for causing database integrity breakages similar to those caused by
an instance failure. For example, an integrity breakage of type A,
B, or C could result if I/O accessibility to a datablock were lost
between the time the block was read into the buffer cache and the
time DBWR attempted to write out an updated version of the
block. More typical, however, is the case of a media failure that
results in the permanent loss of the current version of a datafile, and
hence of all updates to that datafile that occurred since the last time
the file was backed up.

Before media recovery is invoked, backup copies of the damaged
datafiles are restored. Media recovery then applies relevant
portions of the redo log to roll-forward the datafile backups,
making them current. Current implies a pre-failure state consistent
with the rest of the database.

Media recovery and instance recovery have in common the
requirement to repair database integrity breakages A-C. However,
media recovery and instance recovery differ with respect to
requirements 1-5. The requirements for media recovery are as
follows:

1.	Media recovery must accomplish the repair using restored
backups of damaged datafiles.

2.	Media recovery can use archived logs as well as the online
logs.

3.	Invocation of media recovery is explicit, by operator
command.

4.	Detection of media failure (i.e. the need to restore a backup) is
not automatic. Once a backup has been restored, however,
detection of the need to recover it via media recovery is
automatic.

5.	The duration of the roll-forward phase of media recovery is
governed solely by user policy (e.g. frequency of backups,
parallel recovery parameters) rather than by RDBMS internal
mechanisms.



2  Fundamental Data Structures

2.1  Controlfile

The controlfile contains records that describe and keep state
information about all the other files of the database.

The controlfile contains the following categories of records:

-	Database Info Record (1)

-	Datafile Records (1 per datafile)

-	Thread Records (1 per thread)

-	Logfile Records (1 per logfile)

-	Filename Records (1 per datafile or logfile group member)

-	Log-History Records (1 per completed logfile)

Fields of the controlfile records referenced in the remainder of this
document are listed below, together with the number(s) of the
section(s) describing their use:

2.1.1  Database Info Record (Controlfile)

-	resetlogs timestamp: 8.2

-	resetlogs SCN: 8.2

-	enabled thread bitvec: 8.3

-	force archiving SCN: 3.8

-	database checkpoint thread (thread record index): 2.13, 3.10

2.1.2  Datafile Record (Controlfile)

-	checkpoint SCN: 2.14, 3.4

-	checkpoint counter: 2.16, 5.3, 6.2

-	stop SCN: 2.15, 6.5, 6.10, 6.13

-	offline range (offline-start SCN, offline-end checkpoint): 2.18

-	online flag

-	read-enabled, write-enabled flags (1-1: read/write, 1-0:
read-only)

-	filename record index

2.1.3  Thread Record (Controlfile)

-	thread checkpoint structure: 2.12, 3.4, 8.3

-	thread-open flag: 3.9, 3.11, 8.3

-	current log (logfile record index)

-	head and tail (logfile record indices) of list of logfiles in
thread: 2.8

2.1.4  Logfile Record (Controlfile)

-	log sequence number: 2.7

-	thread number: 8.4

-	next and previous (logfile record indices) of list of logfiles in
thread: 2.8

-	count of files in group: 2.8

-	low SCN: 2.7

-	next SCN: 2.7

-	head and tail (filename record indices) of list of filenames in
group: 2.8

-	"being cleared" flag: 10.3

-	"archiving not needed" flag: 10.3

2.1.5  Filename Record (Controlfile)

-	filename

-	filetype

-	next and previous (filename record indices) of list of filenames
in group: 2.8

2.1.6  Log-History Record (Controlfile)

-	thread number: 2.11

-	log sequence number: 2.11

-	low SCN: 2.11

-	low SCN timestamp: 2.11

-	next SCN: 2.11

2.2  Datafile Header

Fields of the datafile header referenced in the remainder of this
document are listed below, together with the number(s) of the
section(s) describing their use:

-	datafile checkpoint structure: 2.14

-	backup checkpoint structure: 4.1

-	checkpoint counter: 2.16, 3.4, 5.3, 6.2

-	resetlogs timestamp: 8.2

-	resetlogs SCN: 8.2

-	creation SCN: 8.1

-	online-fuzzy bit: 3.5, 6.7.1, 8.1

-	hotbackup-fuzzy bit: 4.1, 4.4, 6.7.1, 8.1

-	media-recovery-fuzzy bit: 6.7.1, 8.1

2.3  Logfile Header

Fields of the logfile header referenced in the remainder of this
document are listed below, together with the number(s) of the
section(s) describing their use:

-	thread number: 2.7

-	sequence number: 2.7

-	low SCN: 2.7

-	next SCN: 2.7

-	end-of-thread flag: 6.10

-	resetlogs timestamp: 8.2

-	resetlogs SCN: 8.2

2.4  Change Vector

A change vector describes a single change to a single datablock. It
has a header that gives the Data Block Address (DBA) of the block,
the incarnation number, the sequence number, and the operation.
The header is followed by information that depends on the
operation. The incarnation number and sequence number are copied
from the block header when the change vector is constructed. When
a block is made "new," the incarnation number is set to a value that
is greater than its previous incarnation number and the sequence
number is set to one. The sequence number on the block is
incremented after every change is applied.

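The bookkeeping described above can be sketched as follows. This is
a minimal Python illustration, not the actual block or change vector
layout: the class names, the field types, the initial values, and the
choice to raise the incarnation number by exactly one are assumptions.
Only the three rules come from the text: the change vector copies the
incarnation and sequence numbers from the block header, applying a
change increments the block's sequence number, and making a block
"new" raises the incarnation number and resets the sequence number to
one.

```python
from dataclasses import dataclass


@dataclass
class BlockHeader:
    dba: int                 # Data Block Address
    incarnation: int = 1     # assumed starting value
    sequence: int = 1        # assumed starting value


@dataclass(frozen=True)
class ChangeVector:
    dba: int
    incarnation: int
    sequence: int
    operation: str
    payload: bytes = b""     # operation-dependent information


def build_change_vector(block, operation, payload=b""):
    """Copy incarnation and sequence from the block header at build time."""
    return ChangeVector(block.dba, block.incarnation, block.sequence,
                        operation, payload)


def apply_change(block, cv):
    """Apply one change; the operation-specific work is omitted here."""
    block.sequence += 1      # incremented after every applied change


def make_new(block):
    """Reuse the block: higher incarnation, sequence restarts at one."""
    block.incarnation += 1   # any greater value would satisfy the text
    block.sequence = 1


block = BlockHeader(dba=100)
cv1 = build_change_vector(block, "insert row")
apply_change(block, cv1)
cv2 = build_change_vector(block, "update row")
apply_change(block, cv2)
make_new(block)
```

After these calls cv1 and cv2 carry sequence numbers 1 and 2, and the
block header shows incarnation 2 with sequence 1.
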
2.5  Redo Record

A redo record is a group of change vectors describing a single
atomic change to the database. For example, a transaction's first
redo record might group a change vector for the transaction table
(rollback segment header), a change vector for the undo block
(rollback segment), and a change vector for the datablock. A
transaction can generate multiple redo records. The grouping of
change vectors into a redo record allows multiple database blocks
to be changed so that either all changes occur or no changes occur,
despite arbitrary intervening failures. This atomicity guarantee is
one of the fundamental jobs of the cache layer. Recovery preserves
redo record atomicity across failures.

2.6  System Change Number (SCN)

An SCN defines a committed version of the database. A query
reports the contents of the database as it looked at some specific
SCN. An SCN is allocated and saved in the header of a redo record
that commits a transaction. An SCN may also be saved in a record
when it is necessary to mark the redo as being allocated after a
specific SCN. SCN's are also allocated and stored in other data
structures such as the controlfile or datafile headers. An SCN is at
least 48 bits long. SCN's can therefore be allocated at a rate of
16,384 per second for over 534 years before they run out.
We will run out of SCN's in June, 2522 AD (we use 31-day months
for time stamps).

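The 534-year figure can be checked with a few lines of arithmetic.
The Python sketch below uses only the numbers given above (48 bits,
16,384 SCN's per second, 31-day months); the variable names are
illustrative.

```python
SCN_BITS = 48
RATE_PER_SECOND = 16_384             # 2**14 SCN's per second
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 12 * 31              # 31-day months, as noted in the text

total_scns = 2 ** SCN_BITS
seconds = total_scns // RATE_PER_SECOND          # 2**34 seconds
years = seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)

print(f"{seconds} seconds, about {years:.1f} years")
```

With 372-day years the result is a little over 534 years, matching
the text. With calendar years of about 365.25 days the same number
of seconds would come to roughly 544 years.
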
2.7  Redo Logs

All changes to database blocks are made by constructing a redo
record for the change, saving this record in a redo log, then
applying the change vectors to the datablocks. Recovery is the
process of applying redo to old versions of datablocks to make
them current. This is necessary when the current version has been
lost.

When a redo log becomes full it is closed and a log switch occurs.
Each log is identified by its thread number (see below), sequence
number (within thread), and the range of SCN's spanned by its redo
records. This information is stored in the thread number, sequence
number, low SCN, and next SCN fields of the logfile header.

The redo records in a log are ordered by SCN. Moreover, redo
records containing change vectors for a given block occur in
increasing SCN order across threads (case of Parallel Server). Only
some records have SCN's in their header, but every record is
applied after the allocation of the SCN appearing with or before it
in the log. The header of the log contains the low SCN and the next
SCN. The low SCN is the SCN associated with the first redo record
(unless there is an SCN in its header). The next SCN is the low
SCN of the log with the next higher sequence number for the same
thread. The current log of an enabled thread has an infinite next
SCN, since there is no log with a higher sequence number.

2.8  Thread of Redo

The redo generated by an instance - by each instance in the
Parallel Server case - is called a thread of redo. A thread
consists of an online portion and (in ARCHIVELOG mode) an
archived portion. The online portion of a thread consists of
two or more online logfile groups. Each group consists of one
or more replicated members. The set of members in a group is
referred to variously as a logfile group, group, redo log, online log,
or simply log. A redo log contains only redo generated by one
thread. Log sequence numbers are independently allocated for each
thread. Each thread switches logs independently.

For each logfile, there is a controlfile record that describes it. The
index of a log's controlfile record is referred to as its log number.
Note that log numbers are equivalent to log group numbers, and are
globally unique (across all threads). The list of a thread's logfile
records is anchored in the thread record (i.e. via head and tail
logfile record indices), and linked through the logfile records, each
of which stores the thread number. The logfile record also has fields
identifying the number of group members, as well as the head and
tail (i.e. filename record indices) of the list (linked through
filename records) of filenames in the group.

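The record linkage just described amounts to two levels of linked
lists addressed by record index. The following Python sketch shows
how such lists might be walked. It is an illustration under
assumptions: the class and field names are invented, only the forward
links are modelled, and None stands in for whatever end-of-list
marker the controlfile really uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilenameRecord:
    filename: str
    next: Optional[int] = None    # next filename record in the group


@dataclass
class LogfileRecord:
    thread: int
    sequence: int
    file_head: Optional[int]      # first filename record of the group
    next: Optional[int] = None    # next logfile record in the thread


@dataclass
class ThreadRecord:
    log_head: Optional[int]       # first logfile record of the thread


def logs_of_thread(thread_rec, logfile_recs):
    """Yield (log number, logfile record) by following the thread's list."""
    n = thread_rec.log_head
    while n is not None:
        yield n, logfile_recs[n]
        n = logfile_recs[n].next


def members(log, filename_recs):
    """Yield the filenames of one logfile group."""
    i = log.file_head
    while i is not None:
        yield filename_recs[i].filename
        i = filename_recs[i].next


# Hypothetical controlfile contents: thread 1 owns logs 1 and 2.
filename_recs = {1: FilenameRecord("log1a", next=2),
                 2: FilenameRecord("log1b"),
                 3: FilenameRecord("log2a"),
                 4: FilenameRecord("log3a")}
logfile_recs = {1: LogfileRecord(1, 41, file_head=1, next=2),
                2: LogfileRecord(1, 42, file_head=3),
                3: LogfileRecord(2, 17, file_head=4)}
thread1 = ThreadRecord(log_head=1)

thread1_logs = [n for n, _ in logs_of_thread(thread1, logfile_recs)]
log1_members = list(members(logfile_recs[1], filename_recs))
```

Here the dictionary key plays the role of the log number, that is,
the index of the log's controlfile record.
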
2.9  Redo Byte Address (RBA)

An RBA points to a specific location in a particular redo thread. It
is ten bytes long and has three components: log sequence number,
block number within log, and byte number within block.

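A ten-byte, three-component address of this kind could be modelled
as shown below. The text gives only the total length and the three
components; the split into 4 + 4 + 2 bytes, the little-endian byte
order, and all names in this Python sketch are assumptions. Because
the components are listed from most to least significant, ordinary
tuple comparison orders RBA values by position within the thread.

```python
import struct
from typing import NamedTuple

# Assumed layout: 4-byte sequence, 4-byte block, 2-byte offset, no padding.
RBA_FORMAT = "<IIH"


class RBA(NamedTuple):
    sequence: int   # log sequence number
    block: int      # block number within log
    offset: int     # byte number within block


def pack_rba(rba):
    """Encode an RBA into its assumed ten-byte form."""
    return struct.pack(RBA_FORMAT, *rba)


def unpack_rba(raw):
    """Decode ten bytes back into an RBA."""
    return RBA(*struct.unpack(RBA_FORMAT, raw))


a = RBA(sequence=41, block=7, offset=100)
b = RBA(sequence=41, block=8, offset=0)
c = RBA(sequence=42, block=0, offset=0)
raw = pack_rba(a)
```

In this model a < b < c, because a later log sequence number always
sorts after an earlier one, and within a log a later block sorts
after an earlier one.
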
2.10  Checkpoint Structure

The checkpoint structure is a data structure that defines a point in
all the redo ever generated for a database. Checkpoint structures
are stored in datafile headers and in the per-thread records of the
controlfile. They are used by recovery to know where to start
reading the log thread(s) for redo application.

The key fields of the checkpoint structure are the checkpoint SCN
and the enabled thread bitvec.

The checkpoint SCN effectively demarcates a specific location in
each enabled thread (for a definition of enabled see 3.11). For each
thread, this location is where redo was being generated at some
point in time within the resolution of one commit. The redo record
headers in the log can be scanned to find the first redo record that
was allocated at the checkpoint SCN or higher.

The enabled thread bitvec is a mask defining which threads were
enabled at the time the checkpoint SCN was allocated. Note that a
bit is set for each thread that was enabled, regardless of whether it
was open or closed. Every thread that was enabled has a redo log
that contains the checkpoint SCN. A log containing this SCN is
guaranteed to exist (either online or archived).

The checkpoint structure also stores the time that the checkpoint
SCN was allocated. This timestamp is only used to print a message
to aid a person looking for a log.

In addition, the checkpoint structure stores the number of the
thread that allocated the checkpoint SCN and the current RBA in
that thread when the checkpoint SCN was allocated. Having an
explicitly-stored thread RBA (as opposed to only having the
checkpoint SCN as an implicit thread location "pointer") makes the
log sequence number (part of the RBA) and archived log name
readily available for the single-instance (i.e. single-thread,
non-Parallel Server) case.

A checkpoint structure for a port that supports up to 1023 threads
of redo is 150 bytes long. A VMS checkpoint is 30 bytes and
supports up to 63 threads of redo.

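The two sizes quoted above are consistent with a structure made of a
fixed part plus one bit per thread. The Python sketch below tests
that reading; it is an inference from the numbers, not a statement of
the actual layout. The assumption that bit 0 is present but unused
(so that 1023 threads need 1024 bits) and the helper names are
illustrative.

```python
def bitvec_bytes(max_threads):
    """Bytes needed for one bit per thread number 0..max_threads."""
    return (max_threads + 1 + 7) // 8


def set_enabled(bitvec, thread):
    """Return the bit vector with the given thread marked enabled."""
    return bitvec | (1 << thread)


def is_enabled(bitvec, thread):
    """True if the given thread's bit is set."""
    return bool((bitvec >> thread) & 1)


# Sizes quoted in the text: 150 bytes for 1023 threads, 30 for 63 (VMS).
fixed_part_large = 150 - bitvec_bytes(1023)
fixed_part_vms = 30 - bitvec_bytes(63)

# A checkpoint taken while threads 1 and 3 were enabled.
enabled = set_enabled(set_enabled(0, 1), 3)
```

Under these assumptions both ports leave the same 22 bytes for the
remaining fields (SCN, timestamp, thread number, RBA), which suggests
that the size difference between ports is entirely due to the bit
vector.
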
2.11  Log History

The controlfile can be configured (using the MAXLOGHISTORY
clause of the CREATE DATABASE or CREATE CONTROLFILE
command) to contain a history record for every logfile that is
completed. Log history records are small (24 bytes on VMS). They
are overwritten in a circular fashion so that the oldest information
is lost.

For each logfile, the log-history controlfile record contains the
thread number, log sequence number, low SCN, low SCN
timestamp, and next SCN (i.e. low SCN of the next log in
sequence). The purpose of the log history is to reconstruct archived
logfile names from an SCN and thread number. Since a log
sequence number is contained in the checkpoint structure (part of
the RBA), single thread (i.e. non-Parallel Server) databases do not
need log history to construct archived log names.

The fields of the log history records are viewable via the
V$LOG_HISTORY "fixed-view" (see Section 9 for a description
of the recovery-related "fixed-views"). Additionally,
V$RECOVERY_LOG, which displays information about archived
logs needed to complete media recovery, is derived from
information in the log history records. Although log history is not
strictly needed for easy administration of single-instance
(non-Parallel Server) databases, enabling use of V$LOG_HISTORY and
V$RECOVERY_LOG might be a reason to configure it.

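The lookup that log history makes possible can be sketched in a few
lines of Python. This is an illustration only: the record class, the
half-open range test (low SCN <= SCN < next SCN, following the
definition of next SCN in 2.7), the capacity of three records, and
the archived log name pattern are all assumptions. A bounded deque
stands in for the circular overwrite of the oldest records.

```python
from collections import deque
from typing import NamedTuple, Optional


class LogHistory(NamedTuple):
    thread: int
    sequence: int
    low_scn: int
    next_scn: int


def find_sequence(history, thread, scn) -> Optional[int]:
    """Return the sequence number of the thread's log that spans scn."""
    for rec in history:
        if rec.thread == thread and rec.low_scn <= scn < rec.next_scn:
            return rec.sequence
    return None          # record overwritten, or log not yet completed


def archived_name(thread, sequence):
    """Hypothetical naming pattern; real names are site-configured."""
    return f"arch_{thread}_{sequence}.log"


history = deque(maxlen=3)            # tiny capacity to show overwriting
for rec in [LogHistory(1, 40, 1000, 1500),
            LogHistory(2, 7, 1200, 1800),
            LogHistory(1, 41, 1500, 2100),
            LogHistory(1, 42, 2100, 2600)]:
    history.append(rec)              # fourth append drops sequence 40

seq = find_sequence(history, thread=1, scn=1600)
name = archived_name(1, seq)
lost = find_sequence(history, thread=1, scn=1100)
```

The lookup for SCN 1600 in thread 1 resolves to sequence 41, while
SCN 1100 can no longer be resolved because its record was the oldest
and has been overwritten.
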
2.12  Thread Checkpoint Structure

Each enabled thread's controlfile record contains a checkpoint
structure called the thread checkpoint. The SCN field in this
structure is known as the thread checkpoint SCN. The thread
number and RBA fields in this structure refer to the associated
thread.

The thread checkpoint structure is updated each time an instance
checkpoints its thread (see 3.4). During such thread checkpoint
events, the instance associated with the thread writes to disk in the
online datafiles all dirty buffers modified by redo generated before
the thread checkpoint SCN.

A thread checkpoint event guarantees that all
pre-thread-checkpoint-SCN redo generated in that thread for all online
datafiles has been written to disk. (Note that if the thread is closed,
then there is no redo beyond the thread checkpoint SCN; i.e. the
RBA points just past the last redo record in the current log.)

It is the job of instance recovery to ensure that all of the thread's
redo for all online datafiles is applied. Because of the guarantee
that all of the thread's redo prior to the thread checkpoint SCN has
already been applied, instance recovery can make the guarantee
that, by starting redo application at the thread checkpoint SCN, and
continuing through end-of-thread, all of the thread's redo will have
been applied.

2.13  Database Checkpoint Structure

The database checkpoint structure is the thread checkpoint of the
thread that has the lowest checkpoint SCN of all the open threads.
The number of the database checkpoint thread - the number of
the thread whose thread checkpoint is the current database
checkpoint - is recorded in the database info record of the
controlfile. If there are no open threads, then the database
checkpoint is the thread checkpoint that contains the highest
checkpoint SCN of all the enabled threads.

Since each instance guarantees that all redo generated before its
own thread checkpoint SCN has been written, and since the
database checkpoint SCN is the lowest of the thread checkpoint
SCNs, it follows that all pre-database-checkpoint-SCN redo in all
instances has been written to all online datafiles.

Thus, all pre-database-checkpoint-SCN redo generated in all
threads for all online datafiles is guaranteed to be in the files on
disk already. This is described by saying that the online datafiles
are checkpointed at the database checkpoint. This is the rationale
for using the database checkpoint to update the online datafile
checkpoints (see below) when an instance checkpoints its thread
(see 3.4).

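The selection rule for the database checkpoint thread can be written
down directly. The Python sketch below is a minimal illustration; the
ThreadState class and its field names are assumptions, and the
function assumes that at least one thread is enabled.

```python
from typing import NamedTuple


class ThreadState(NamedTuple):
    number: int
    enabled: bool
    is_open: bool
    checkpoint_scn: int


def database_checkpoint_thread(threads):
    """Return the number of the thread whose checkpoint is the
    database checkpoint."""
    open_threads = [t for t in threads if t.enabled and t.is_open]
    if open_threads:
        # Lowest thread checkpoint SCN among the open threads.
        return min(open_threads, key=lambda t: t.checkpoint_scn).number
    # No open threads: highest checkpoint SCN among enabled threads.
    enabled = [t for t in threads if t.enabled]
    return max(enabled, key=lambda t: t.checkpoint_scn).number


running = [ThreadState(1, True, True, 500),
           ThreadState(2, True, True, 450),
           ThreadState(3, True, False, 900)]   # closed, so ignored
all_closed = [ThreadState(1, True, False, 500),
              ThreadState(2, True, False, 450)]

open_case = database_checkpoint_thread(running)
closed_case = database_checkpoint_thread(all_closed)
```

In the first case thread 2 is chosen because it has the lowest
checkpoint SCN among the open threads. In the second case, with
every thread closed, thread 1 is chosen because its checkpoint SCN is
the highest.
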
2.14  Datafile Checkpoint Structure

The header of each datafile contains a checkpoint structure known
as the datafile checkpoint. The SCN field in this structure is known
as the datafile checkpoint SCN.

All pre-checkpoint-SCN redo generated in all threads for a given
datafile is guaranteed to be in the file on disk already. An online
datafile has its checkpoint SCN replicated in its controlfile record.
Note: Oracle's recovery layer code is designed to "tolerate" a
discrepancy in checkpoint SCN between the file header and the
controlfile record. These values could get out of sync should an
instance failure occur between the time the file header was updated
and the time the controlfile "transaction" committed. (Note: A
controlfile "transaction" is an RDBMS internal mechanism,
independent of the Oracle transaction layer, that allows an
arbitrarily large update to the controlfile to be "committed"
atomically.)

The execution of a datafile checkpoint (see 3.6) for a given datafile
updates the checkpoint structure in the file header, and guarantees
that all pre-checkpoint-SCN redo generated in all threads for that
datafile is on disk already.

A thread checkpoint event (see 3.4) guarantees that all
pre-database-checkpoint-SCN redo generated in all threads for all
online datafiles has been written to disk. The execution of a thread
checkpoint may advance the database checkpoint (e.g. in the
single-instance case; or if the thread having the oldest checkpoint
changed from being the current thread to another thread). If the
database checkpoint does advance, then the new database
checkpoint is used to update the datafile checkpoints of all the
online datafiles (except those in hot backup: see Section 4).

It is the job of media recovery (see Section 6) to ensure that all redo
for a recovery-datafile (i.e. a datafile being media-recovered)
generated in any thread through the recovery end-point is applied.
Because of the guarantee that all recovery-datafile-redo generated
in any enabled thread prior to that datafile's checkpoint SCN has
808already been applied, media recovery can make the guarantee that,  
809by starting redo application in each enabled thread with the datafile  
810checkpoint SCN and continuing through the recovery end-point  
811(e.g. end-of-thread on all threads in the case of complete media  
812recovery), all redo for the recovery-datafile from all threads will  
813have been applied. 
814 
815Since the datafile checkpoint is stored in the header of the datafile  
816itself, it is also present in backup copies of the datafile. It is the job  
817of hot backup (see Section 4) to ensure that - despite the  
818occurrence of ongoing updates to the datafile during the backup  
819copy operation - the version of the datafile's checkpoint captured  
820in the backup copy satisfies the checkpoint-SCN guarantee with  
821respect to the versions of the datafile's datablocks captured in the  
822backup copy. 
823 
8242.15  Stop SCN 
825 
826Each datafile's controlfile record has a field called the stop SCN. If  
827the file is offline or read-only, the stop SCN is the SCN beyond  
828which no further redo exists for that datafile. If the file is online and  
829any instance has the database open, the stop SCN is set to  
830"infinity." The stop SCN is used during media recovery to  
831determine when redo application for a particular datafile can stop.  
832This ensures that media recovery will terminate when recovering  
833an offline file while the database is open. 
834 
835The stop SCN is set whenever a datafile is taken offline or set read- 
836only. This is true whether the offline was "immediate" (due to an I/ 
837O error, or due to taking the file's tablespace offline "immediate"),  
838"temporary" (due to taking the file's tablespace offline  
839"temporary"), or "normal" (due to taking the file's tablespace  
840offline "normal"). However, in the case of a datafile taken offline  
841"immediate," there is no file checkpoint (see 3.6), and dirty buffers  
842are discarded. Hence, media recovery may need to apply redo from  
843before the stop SCN in order to bring the datafile online. However,  
844media recovery does not need to look for redo after the stop SCN,  
845since it does not exist. If the stop SCN is equal to the datafile  
846checkpoint SCN, then the file does not need recovery. 
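
The way media recovery might use the stop SCN can be sketched as
below. The function name and the use of math.inf for the "infinity"
stop SCN are assumptions made for illustration; the real controlfile
representation is not described here.

```python
import math
from typing import Optional, Tuple

INFINITY = math.inf  # stand-in for the "infinity" stop SCN of an online file


def redo_range_to_apply(checkpoint_scn: int,
                        stop_scn: float) -> Optional[Tuple[int, float]]:
    """Return the SCN range media recovery must cover, or None."""
    if stop_scn == checkpoint_scn:
        # Checkpointed exactly at the stop SCN: no recovery needed.
        return None
    # Redo application starts at the datafile checkpoint SCN and never
    # needs to look beyond the stop SCN (which may be INFINITY).
    return (checkpoint_scn, stop_scn)
```

A file taken offline "immediate" at SCN 100 with a header checkpoint
of 80 yields the range (80, 100): redo before the stop SCN is still
needed, but nothing after it.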
847 
8482.16  Checkpoint Counter 
849 
850There is a checkpoint counter kept in both the datafile header and  
851in the datafile's controlfile record. Its purpose is to allow detection  
852of the fact that a datafile or controlfile is a restored backup. 
853 
854The checkpoint counter is incremented every time checkpoints of  
855online files are being advanced (e.g. by thread checkpoint). Thus  
856the datafile's checkpoint counter is incremented even though the  
857datafile's checkpoint is not being advanced because the file is in hot  
858backup (see Section 4), or because its checkpoint SCN is already  
859beyond that of the intended checkpoint (e.g. the file is new or has  
860undergone a recent datafile checkpoint). 
861 
862The old value of the checkpoint counter - matching the  
863checkpoint counter in the datafile's controlfile record - is also  
864remembered in the file header. It is usually one less than the current  
865counter in the header, but may differ from the current counter by  
866more than one if the previous file header update failed after the  
867header was written but before the controlfile "transaction"  
868committed. 
869 
870A mismatch in checkpoint counters between the datafile header and  
871the datafile's controlfile record is used to detect when a backup  
872datafile (or a backup controlfile) has been restored. 
873 
8742.17  Tablespace-Clean-Stop SCN 
875 
876TS$, a data dictionary table that describes tablespaces, has a  
877column called the tablespace-clean-stop-SCN. It identifies an SCN  
878at which a tablespace was taken offline or set read-only "cleanly":  
879i.e. after checkpointing its datafiles (see 3.6). The SCN at which the  
880datafiles are checkpointed is recorded in TS$ as the  
881tablespace-clean-stop SCN. It allows such a "clean-stopped"  
882tablespace to survive (i.e. not need to be dropped after) a  
883RESETLOGS open (see 8.6). During media recovery, prior to  
884resetlogs, the "clean-stopped" tablespace would be set offline.  
885After resetlogs, the tablespace - which needs no recovery - is  
886permitted to be brought online and/or set read-write. (An  
887immediate backup of the tablespace is recommended). 
888 
889The tablespace-clean-stop SCN is set to zero (after being set  
890momentarily to "infinity" during datafile state transition) when  
891bringing an offline-clean tablespace online, or setting a read-only  
892tablespace read-write. The tablespace-clean-stop SCN is also  
893zeroed when taking a tablespace offline "immediate" or  
894"temporary." 
895 
896A tablespace that has a non-zero tablespace-clean-stop SCN in TS$  
897is clean at that SCN: the tablespace currently contains all redo up  
898through that SCN, and no redo for the tablespace beyond that SCN  
899exists. If the tablespace's datafiles are still in the state they had  
900when the tablespace was taken offline "normal" or set read-only -  
901i.e. they are not restored backups, are not fuzzy, and are  
902checkpointed at the clean-stop SCN - then the tablespace can be  
903brought online without recovery. Note that the semantics of the  
904tablespace-clean-stop SCN differ from those of a constituent  
905datafile's stop SCN in the datafile's controlfile record. The  
906controlfile stop SCN designates an SCN beyond which no redo for  
907the datafile exists. This does not imply that the datafile currently  
908contains all redo up through that SCN. 
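
The conditions under which a clean-stopped tablespace can be brought
online without recovery can be sketched as a simple predicate. The
DatafileState record and its field names are hypothetical; only the
three conditions themselves come from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatafileState:
    checkpoint_scn: int       # datafile checkpoint SCN from the header
    is_fuzzy: bool            # any fuzzy bit set in the header
    is_restored_backup: bool  # e.g. detected via checkpoint counters


def can_online_without_recovery(clean_stop_scn: int,
                                files: List[DatafileState]) -> bool:
    """True if every datafile is still in its clean-stopped state."""
    if clean_stop_scn == 0:
        # Zero means the tablespace was not stopped cleanly.
        return False
    return all(
        not f.is_restored_backup
        and not f.is_fuzzy
        and f.checkpoint_scn == clean_stop_scn
        for f in files
    )
```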
909 
910The tablespace-clean-stop SCN is stored in TS$ rather than in the  
911controlfile so that it is covered by redo and will finish in the correct  
912state - i.e. reflecting the correct online/offline state of the  
913tablespace - following an incomplete recovery (see 6.12). Its  
914value will not be lost if a backup controlfile is restored, or if a new  
915controlfile is created. Furthermore, the presence of the tablespace- 
916clean-stop SCN in TS$ allows an offline normal (or read-only)  
917tablespace to survive (not need to be dropped after) a  
918RESETLOGS open, since it is known that no redo application is  
919needed to bring it online (see 8.6 for more detail). Thus, for  
920example, an offline normal (or read-only) tablespace that was  
921offline during an incomplete recovery can be brought online (or set  
922read-write) subsequent to a RESETLOGS open. Without the  
923tablespace-clean-stop SCN, there would be no way of knowing that  
924the tablespace does not need recovery using redo that was  
925discarded by the resetlogs. The only alternative would have been to  
926force the tablespace to be dropped. 
927 
9282.18  Datafile Offline Range 
929 
The offline-start SCN and offline-end checkpoint fields of the
controlfile datafile record describe the offline range. If valid, they
delimit a log range guaranteed not to contain any redo for the
datafile. Thus, media recovery can skip this log range when
recovering the datafile, obviating the need to access old archived
log data (which may be unavailable or unusable due to resetlogs: see
Section 7). This optimization aids in recovering a datafile that is
presently online (or read-write), but that was offline-clean (or read-
only) for a long time, and whose last backup dates from that time.
For example, this would be the case if, after a RESETLOGS open,
an offline normal (or read-only) tablespace had been brought online
(or set read-write), but not yet backed up.
942 
943When a datafile transitions from offline-clean to online (or from  
944read-only to read-write), the offline range is set as follows: The  
945offline-start SCN is set from the tablespace-clean-stop SCN saved  
946when setting the file offline (or read-only). The offline-end  
947checkpoint is set from the file checkpoint taken when setting the  
948file online (or read-write).  
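
How the offline range lets media recovery skip logs can be sketched
as follows. This is an illustration under stated assumptions: each
log is modeled as a half-open SCN interval (low SCN, next SCN), the
offline range is reduced to two SCNs, and the function name is
hypothetical.

```python
from typing import List, Tuple

Log = Tuple[int, int]  # (low SCN, next SCN): holds redo with low <= SCN < next


def logs_needed(logs: List[Log], start_scn: int,
                offline_start: int, offline_end: int) -> List[Log]:
    """Return the logs media recovery must read for one datafile."""
    needed = []
    for low, nxt in logs:
        if nxt <= start_scn:
            continue  # entirely before the recovery starting point
        if low >= offline_start and nxt <= offline_end:
            continue  # entirely inside the offline range: no redo for the file
        needed.append((low, nxt))
    return needed
```

For logs covering SCNs 0-100, 100-200, 200-300 and 300-400, a backup
checkpointed at SCN 150 and an offline range of 150 through 300, only
the logs 100-200 and 300-400 are read.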
949 
9583  Redo Generation 
959 
960Redo is generated to describe all changes made to database blocks.  
961This section describes the various operations that occur while the  
962database is open and generating redo. 
963 
9643.1  Atomic Changes 
965 
966The most fundamental operation is to atomically change a set of  
967datablocks. A foreground process intending to change one or more  
968datablocks first acquires exclusive access to cache buffers  
969containing those blocks. It then constructs the change vectors  
970describing the changes. Space is allocated in the redo log buffer to  
971hold the redo record. The redo log buffer - the buffer from which  
972LGWR writes the redo log - is located in the SGA (System  
973Global Area). It may be necessary to ask LGWR to write the buffer  
974to the redo log in order to make space. If the log is full, LGWR  
975may need to do a log switch in order to make the space available.  
976Note that allocating space in the redo buffer also allocates space in  
977the logfile. Thus, even though the redo buffer has been written, it  
978may not be possible to allocate redo log space. After the space is  
979allocated, the foreground process builds the redo record in the redo  
980buffer. Only after the redo record has been built in the redo buffer  
981may the datablock buffers be changed. Writing the redo to disk is  
982the real change to the database. Recovery ensures that all changes  
983that make it into the redo log make it into the datablocks (except in  
984the case of incomplete recovery). 
985 
9863.2  Write-Ahead Log 
987 
988Write-ahead log is a cache-enforced protocol governing the order  
989in which dirty datablock buffers are written vs. when the redo log  
990buffer is written. According to write-ahead log protocol, before  
991DBWR can write out a cache buffer containing a modified  
992datablock, LGWR must write out the redo log buffer containing  
993redo records describing changes to that datablock. 
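
The ordering constraint can be sketched with a toy cache. This is not
Oracle's implementation: the RedoLog and BufferCache classes, their
methods, and the use of list positions in place of redo byte
addresses are all assumptions made for illustration.

```python
class RedoLog:
    def __init__(self):
        self.buffered = []  # redo records still in the log buffer
        self.on_disk = []   # redo records already written by "LGWR"

    def append(self, record) -> int:
        self.buffered.append(record)
        # Log position just after this record.
        return len(self.on_disk) + len(self.buffered)

    def flush_to(self, position: int) -> None:
        # Write buffered redo, oldest first, up to the given position.
        while len(self.on_disk) < position and self.buffered:
            self.on_disk.append(self.buffered.pop(0))


class BufferCache:
    def __init__(self, log: RedoLog):
        self.log = log
        self.dirty = {}     # block id -> (data, log position of latest redo)
        self.datafile = {}  # block id -> data on "disk"

    def change(self, block_id, data) -> None:
        pos = self.log.append((block_id, data))  # redo is built first
        self.dirty[block_id] = (data, pos)       # then the buffer changes

    def write_block(self, block_id) -> None:
        data, pos = self.dirty.pop(block_id)
        self.log.flush_to(pos)                   # redo reaches disk first
        self.datafile[block_id] = data           # only then the datablock
```

Writing a dirty block forces out only as much of the log buffer as
describes that block's changes; later redo may stay buffered.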
994 
995Note that write-ahead log is independent of log-force-at-commit  
996(see 3.3). 
997 
998Note also that write-ahead log protocol only applies to datafile  
999writes that originate from the buffer cache. In particular, write- 
1000ahead log does not apply to so-called direct path writes (e.g.  
1001originating from direct path load, table create via subquery, or  
1002index create). Direct path writes (targeted above the segment high- 
1003water mark) originate not as writes out of the buffer cache, but as  
1004bulk-writes out of the foreground process' data space. Indeed,  
1005correct handling of direct path writes by media recovery dictates a  
1006write-behind-log protocol. (The basic reason is that, because the  
1007bulk-writes do not go through the buffer cache, there is no  
1008mechanism to guarantee their completion at checkpoint). 
1009 
1010One guarantee made by write-ahead log protocol is that there are  
1011no changes in the datafiles that are not in the redo log, regardless of  
1012intervening failure. This is what enables recovery to preserve the  
1013guarantee of redo record atomicity despite intervening failure. 
1014 
1015Another guarantee made by write-ahead log protocol is that no  
1016datablock change can be written to disk without first writing to the  
1017redo log sufficient information to enable the change to be undone  
1018should the transaction fail to commit. That undo-enabling  
1019information is written to the redo log in the form of "redo" for the  
1020rollback segment. 
1021 
1022Write-ahead log protocol plays a key role in enabling the  
1023transaction layer to preserve the guarantee of transaction atomicity  
1024despite intervening failure. 
1025 
10263.3  Transaction Commit 
1027 
1028Transaction commit allocates an SCN and builds a commit redo  
1029record containing that SCN. The commit is complete when all of  
1030the transaction's redo (including commit redo record) is on disk in  
1031the log. Thus, commit forces the redo log to disk - at least up to  
1032and including the transaction's commit record. This is termed log- 
1033force-at-commit. 
1034 
1035Recovery is designed such that it is sufficient to write only the redo  
1036log at commit time - rather than all datablocks changed by the  
1037transaction - in order to guarantee transaction durability despite  
1038intervening failure. This is termed no-datablock-force-at-commit. 
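
A toy model of log-force-at-commit is sketched below. The Log class,
the commit function and the tuple-shaped redo records are
hypothetical; the point is only that commit forces the log through
the commit record and writes no datablocks.

```python
class Log:
    def __init__(self):
        self.records = []  # all redo generated, in order
        self.flushed = 0   # number of leading records known to be on disk

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records)  # position just after this record

    def force(self, upto: int) -> None:
        # Make sure everything up to the given position is on disk.
        self.flushed = max(self.flushed, upto)


def commit(log: Log, txn_id: int, scn: int) -> int:
    """Build the commit redo record and force the log through it."""
    pos = log.append(("commit", txn_id, scn))
    log.force(pos)  # log-force-at-commit; no datablock is written here
    return scn
```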
1039 
10403.4  Thread Checkpoint 
1041 
1042A thread checkpoint event, executed by the instance associated  
1043with the redo thread being checkpointed, forces to disk all dirty  
1044buffers in that instance that contain changes to any online datafile  
1045before a designated SCN - the thread checkpoint SCN. Once all  
1046redo in the thread prior to the checkpoint SCN has been written to  
1047disk, the thread checkpoint structure in the thread's controlfile  
1048record is updated in a controlfile transaction. 
1049 
1050When a thread checkpoint begins, an SCN is captured and a  
1051checkpoint structure is initialized. Then all the dirty buffers in the  
1052instance's cache are marked for checkpointing. DBWR proceeds to  
1053write out the marked buffers in a staged manner. Once all the  
1054marked buffers have been written, the SCN in the checkpoint  
1055structure is set to the captured SCN, and the thread checkpoint  
1056structure in the thread's controlfile record is updated in a controlfile  
1057transaction. 
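
The sequence of steps just described can be sketched as below. The
function and its parameters are hypothetical, buffers are modeled as
dictionaries, and the staged DBWR writes are collapsed into a simple
loop.

```python
def thread_checkpoint(current_scn, cache, write_buffer, controlfile, thread):
    """Toy thread checkpoint: capture, mark, write, then record."""
    captured_scn = current_scn                 # 1. capture an SCN
    marked = [b for b in cache if b["dirty"]]  # 2. mark the dirty buffers
    for buf in marked:                         # 3. "DBWR" writes them out
        write_buffer(buf)
        buf["dirty"] = False
    # 4. Only after all marked buffers are written is the thread
    #    checkpoint recorded in the thread's controlfile record.
    controlfile[thread] = {"scn": captured_scn}
    return captured_scn
```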
1058 
A thread checkpoint might or might not advance the database
checkpoint. If only one thread is open, the new thread checkpoint
becomes the new database checkpoint. If multiple threads are open,
the database checkpoint can advance only if the local thread is the
current database checkpoint thread, i.e. the open thread with the
lowest checkpoint SCN. In that case, since the new checkpoint SCN
was allocated recently, it is most likely greater than the thread
checkpoint SCN of some other open thread, and the database
checkpoint becomes the thread checkpoint that now has the lowest
SCN among the open threads. If, on the other hand, the old
checkpoint SCN for the local thread was higher than the current
checkpoint SCN of some other open thread, then the database
checkpoint does not change.
1070 
1071If the database checkpoint is advanced, then the checkpoint counter  
1072is advanced in every online datafile header. Furthermore, for each  
1073online datafile that is not in hot backup (see Section 4), and not  
1074already checkpointed at a higher SCN (e.g. as would be the case for  
1075a recently added or recovered file), the datafile header checkpoint is  
1076advanced to the new database checkpoint, and the file header is  
1077written to disk. Also, the checkpoint SCN in the datafile's  
1078controlfile record is advanced to the new database checkpoint SCN. 
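
The per-file bookkeeping described above can be sketched as follows.
The DatafileHeader record and the function name are hypothetical, and
the matching update of the controlfile record is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatafileHeader:
    checkpoint_scn: int
    checkpoint_counter: int = 0
    in_hot_backup: bool = False


def advance_online_files(headers: List[DatafileHeader],
                         new_db_checkpoint_scn: int) -> None:
    """Apply an advanced database checkpoint to the online datafiles."""
    for h in headers:
        # The counter advances in every online datafile header.
        h.checkpoint_counter += 1
        if h.in_hot_backup:
            continue  # header checkpoint stays frozen during hot backup
        if h.checkpoint_scn >= new_db_checkpoint_scn:
            continue  # already checkpointed at this SCN or higher
        h.checkpoint_scn = new_db_checkpoint_scn
```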
1079 
10803.5  Online-Fuzzy Bit 
1081 
Note that more changes - beyond those already in the marked
buffers - may be generated after the start of checkpoint. Such
changes would be generated at SCNs higher than the SCN that will
be recorded in the file header. They could either be changes to
marked buffers that were added since checkpoint start, or else
changes to unmarked buffers. Buffers containing these changes
could be written out for a variety of reasons. Thus, the online files
are online-fuzzy; that is, they generally contain changes in the
future of (i.e. generated at higher SCNs than) their header
checkpoint SCN. A datafile is virtually always online-fuzzy while it
is online and the database is open.
1093 
1094Online-fuzzy state is indicated by setting the so-called online-fuzzy  
1095bit in the datafile header. The online-fuzzy bits of all online  
1096datafiles are set at database open time. Also, when a datafile is  
1097brought online while the database is open, its online-fuzzy bit is  
1098set. 
1099 
1100The online-fuzzy bits are cleared after the last instance does a  
1101shutdown "normal" or "immediate." Other occasions for clearing  
1102the online-fuzzy bits are: (i) the finish of crash recovery; (ii) when  
1103media recovery "checkpoints" (flushes its buffers) after  
1104encountering an end-crash-recovery redo record (see 5.5); (iii)  
1105when taking a datafile offline "temporary" or "normal" (i.e. an  
1106offline operation that is preceded by a file checkpoint); (iv) when  
1107BEGIN BACKUP is issued (see 4.1). 
1108 
1109As will be seen in 8.1, open with resetlogs will fail if any online  
1110datafile has the online-fuzzy bit (or any fuzzy bit) set. 
1111 
11123.6  Datafile Checkpoint 
1113 
1114A datafile checkpoint event, executed by all open instances (for all  
1115open threads), forces to disk all dirty buffers in any instance that  
1116contain changes to a particular datafile (or set of datafiles) before a  
1117designated SCN - the datafile checkpoint SCN. Once all datafile- 
1118related redo from all open threads prior to the checkpoint SCN has  
1119been written to disk, the datafile checkpoint structure in the file  
1120header is updated and written to disk. 
1121 
1122Datafile checkpoints occur as part of operations such as beginning  
1123hot backup (see Section 4) and offlining datafiles as part of taking a  
1124tablespace offline normal (see 2.17). 
1125 
11263.7  Log Switch 
1127 
1128When an instance needs to generate more redo but cannot allocate  
1129enough blocks in the current log, it does a log switch. The first step  
1130in a log switch is to find an online log that is a candidate for reuse. 
1131 
1132The first requirement for the candidate log is that it must not be  
1133active: i.e. it must not be needed for crash/instance recovery. In  
1134other words, it must be overwritable without losing redo data  
1135needed for instance recovery. The principle enforced is that a  
1136logfile cannot be reused until the current thread checkpoint is  
1137beyond that logfile. Since instance recovery starts at the current  
1138thread checkpoint SCN/RBA (and expects to find that RBA in an  
1139online redo log), the ability to do instance recovery using only  
1140online logs translates into the requirement that the current thread  
1141checkpoint SCN be beyond the highest SCN associated with redo  
1142in the candidate log. If this is not the case, then the thread  
1143checkpoint currently in progress - e.g. the one started when the  
1144candidate log was originally switched into (see below) - is  
1145hurried up to complete. 
1146 
1147The other requirement for the candidate log is that it does not need  
1148archiving. Of course, this requirement only applies to a database  
1149running in ARCHIVELOG mode. If archiving is required, the  
1150archiver is posted. 
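
The two reuse requirements can be summarized in a small predicate.
This is a sketch only: the function and parameter names are
hypothetical, and the side effects described above (hurrying the
checkpoint, posting the archiver) are not modeled.

```python
def can_reuse_log(log_high_scn: int, thread_checkpoint_scn: int,
                  archivelog_mode: bool, archived: bool) -> bool:
    """True if an online log may be overwritten by a log switch."""
    if thread_checkpoint_scn <= log_high_scn:
        # The log is still active: instance recovery would need it.
        return False
    if archivelog_mode and not archived:
        # The log still needs archiving.
        return False
    return True
```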
1151 
1152As soon as the log switch completes, a new thread checkpoint is  
1153started in the new log. Hopefully, the checkpoint will complete  
1154before the next log switch is needed. 
1155 
11563.8  Archiving Log Switches 
1157 
1158Each thread switches logs independently. Thus, when running  
1159Parallel Server, an SCN is almost never at the beginning of a log in  
1160all threads. However, it is desirable to have roughly the same range  
1161of SCNs in the archived logs of all enabled threads. This ensures  
1162that the last log archived in each thread is reasonably current. If an  
1163unarchived log for an enabled thread contained a very old SCN (as  
1164would occur in the case of a relatively idle instance), it would not  
1165be possible to use archived logs from a primary site to do recovery  
1166to a higher SCN at a standby site. This would be true even if the log  
1167with the low SCN contained no redo. 
1168 
1169This problem is solved by forcing log switches in other threads  
1170when their current log is significantly behind the log just archived.  
1171For the case of an open thread, a lock is used to "kick" the laggard  
1172instance into switching logs and archiving when it can. For the case  
1173of a closed thread, the archiving process in the active instance does  
1174the closed thread's log switch and archiving for it. Note that this  
1175can result in a thread that is enabled but never used having a bunch  
1176of archived logs with only a file header. A force archiving SCN is  
1177maintained in the database info controlfile record to implement this  
1178feature. The system strives to archive any log that contains that  
1179SCN or less. In general, the log with the lowest SCN is archived  
1180first. 
1181 
1182The command ALTER SYSTEM ARCHIVE LOG CURRENT can  
1183be used to manually archive the current logs of all enabled threads.  
1184It forces all threads, open and closed, to switch to a new log. It  
1185archives what is necessary to ensure all the old logs are archived. It  
1186does not return until all redo generated before the command was  
1187entered is archived. This command is useful for ensuring all redo  
1188logs necessary for the recovery of a hot backup are archived. It is  
1189also useful for ensuring the potential currency of a standby site in a  
1190configuration in which archived logs from a primary site are  
1191shipped to a standby site for application by recovery in case of  
1192disaster (i.e. "standby database"). 
1193 
11943.9  Thread Open 
1195 
1196When an instance opens the database, it needs to open a thread for  
1197redo generation. The thread is chosen at mount time. A system  
1198initialization parameter can be used to specify the thread to mount  
1199by number. Otherwise, any available publicly-enabled thread can  
1200be chosen by the instance at mount time. A thread-mounted lock is  
1201used to prevent two instances from mounting the same thread.  
1202When an instance opens a thread, it sets the thread-open flag in the  
1203thread's controlfile record. While the instance is alive, it holds a set  
1204of thread-opened locks (one held by each of LGWR, DBWR,  
1205LCK0, LCK1, ...). (These are released at instance death, enabling  
1206one instance to detect the death of another in the Parallel Server  
1207environment: see 5.1). Also at thread open time, a new checkpoint  
1208is captured and used for the thread checkpoint. If this is the first  
1209database open, this becomes the new database checkpoint, ensuring  
1210all online files have their header checkpoints advanced at open  
1211time. Note that a log switch may be forced at thread open time. 
1212 
12133.10  Thread Close 
1214 
1215When an instance closes the database, or when a thread is  
1216recovered by instance/crash recovery, the thread is closed. The first  
1217step in closing a thread is to ensure that no more redo is generated  
1218in it. The next step is to ensure that all changes described by  
1219existing redo records are in the online datafiles on disk. In the case  
1220of normal database close, this is accomplished by doing a thread  
1221checkpoint. The SCN from this final thread checkpoint is said to be  
1222the "SCN at which the thread was closed." Finally, the thread's  
1223controlfile record is updated to clear the thread-open flag. 
1224 
1225In the case of thread close by instance recovery, the presence in the  
1226online datafiles of all changes described by thread redo records is  
1227ensured by starting redo application at the most recent thread  
1228checkpoint and continuing through end-of-thread. Once all changes  
1229described by thread redo records are in the online datafiles, the  
1230thread checkpoint is advanced to the end-of-thread. Just as in the  
1231case of a normal thread checkpoint, this checkpoint may advance  
1232the database checkpoint. If this is the last thread close, the database  
1233checkpoint thread field in the database info controlfile record -  
1234which normally points to an open thread - will be left pointing at  
1235this thread, even though it is closed. 
1236 
12373.11  Thread Enable 
1238 
1239In order for a thread to be opened, it must be enabled. This ensures  
1240that its redo will be found during media recovery. A thread may be  
1241enabled in either public or private mode. A private thread can only  
1242be mounted by an instance that specifies it in the THREAD system  
1243initialization parameter. This is analogous to rollback segments. A  
1244thread must have at least two online redo log groups while it is  
1245enabled. An enabled thread always has one online log that is its  
1246current log. The next SCN of the current log is infinite, so that any  
1247new SCN allocated will be within the current log. A special thread- 
1248enable redo record is written in the thread of an instance enabling a  
1249new thread (i.e. via ALTER DATABASE ENABLE THREAD).  
1250The thread-enable redo record is used by media recovery to start  
1251applying redo from the new thread. Note that this means it takes an  
1252open thread to enable another thread. This chicken and egg  
1253problem is resolved by having thread one automatically enabled  
1254publicly at database creation. This also means that databases that  
1255do not run in Parallel Server mode do not need to enable a thread. 
1256 
12573.12  Thread Disable 
1258 
1259If a thread is not going to be used for a long while, it is best to  
1260disable it. This means that media recovery will not expect any redo  
1261to be found in the thread. Once a thread is disabled, its logs may be  
1262dropped. A thread must be closed before it can be disabled. This  
1263ensures all its changes have been written to the datafiles. A new  
1264SCN is allocated to save as the next SCN for the current log. The  
1265log header is marked with this SCN and flags saying it is the end of  
1266a disabled thread. It is important that a new current SCN is  
1267allocated. This ensures the SCN in any checkpoint with this thread  
1268enabled will appear in one of the logs from the thread. Note that  
1269this means a thread must be open in order to disable another thread.  
1270Thus, it is not possible to disable all threads. 
1271 
1272 
12804  Hot Backup 
1281 
1282A hot backup is a copy of a datafile that is taken while the file is in  
1283active use. Datafile writes (by DBWR) go on as usual during the  
1284time the backup is being copied. Thus, the backup gets a "fuzzy"  
1285copy of the datafile: 
1286 
-	Some blocks may be ahead in time versus other blocks of the
copy.

-	Some blocks of the copy may be ahead of the checkpoint SCN
in the file header of the copy.

-	Some blocks may contain updates that constitute breakage of
the redo record atomicity guarantee with respect to other
blocks in this or other datafiles.

-	Some block copies may be "fractured" (due to front and back
halves being copied at different times, with an intervening
update to the block on disk).
1300 
The "hotbackup-fuzzy" copy is unusable without the "focusing" (via
the redo log) that occurs when the backup is restored and
undergoes media recovery. Media recovery applies redo (from all
threads) from the begin-backup checkpoint SCN (see Step 2 in
Section 4.1) through the end-point of the recovery operation (either
complete or incomplete). The result is a transaction-consistent
"focused" version of the datafile.
1308 
1309There are three steps to taking a hot backup: 
1310 
-	Execute the ALTER TABLESPACE ... BEGIN BACKUP
command.

-	Use an operating system copy utility to copy the constituent
datafiles of the tablespace(s).

-	Execute the ALTER TABLESPACE ... END BACKUP
command.
1319 
13204.1  BEGIN BACKUP 
1321 
1322The BEGIN BACKUP command takes the following actions (not  
1323necessarily in the listed order) for each datafile of the tablespace: 
1324 
1.	It sets a flag in the datafile header - the hotbackup-fuzzy bit
- to indicate that the file is in hot backup. The header with
this flag set (copied by the copy utility) enables the copy to be
recognized as a hot backup. A further purpose of this flag in
the online file header is to cause the checkpoint in the file
header to be "frozen" at the begin-backup checkpoint value
that will be set in Step 4. This is the value that it must have in
the backup copy in order to ensure that, when the backup is
recovered, media recovery will start redo application at a
sufficiently early checkpoint SCN so as to cover all changes to
the file in all threads since the execution of BEGIN BACKUP (see
6.5). Since we cannot guarantee that the file header will be the
first block to be written out by the copy utility, it is important
that the file header checkpoint structure remain "frozen" until
END BACKUP time. This flag keeps the datafile checkpoint
structure "frozen" during hot backup, preventing it (and the
checkpoint SCN in the datafile's controlfile record) from being
updated during thread checkpoint events that advance the
database checkpoint. New in v7.2: While the file is in hot
backup, a new "backup" checkpoint structure in the datafile
header receives the updates that the "frozen" checkpoint
would have received.
1347 
13482.	It executes a datafile checkpoint, capturing the resultant  
1349"begin-backup" checkpoint information, including the begin- 
1350backup checkpoint SCN. When the file is checkpointed, all  
1351instances are requested to write out all dirty buffers they have  
1352for the file. If the need for instance recovery is detected at this  
1353time, the file checkpoint operation waits until it is completed  
1354before proceeding. Checkpointing the file at begin-backup  
1355time ensures that only file blocks changed after begin-backup  
1356time might have been written to disk during the course of the  
1357file copy. This guarantee is crucial to enabling block before- 
1358image logging to cope with the fractured block problem, as  
1359described in Step 3. 
1360 
13613.	[Platform-dependent option]: It starts block before-image log- 
1362ging for the file. During block before-image logging, all  
1363instances log a full block before-image to the redo log prior to  
1364the first change to each block of the file (since the backup  
1365started, or since the block was read anew into the buffer  
1366cache). This is to forestall a recovery problem that would arise  
1367if the backup were to contain a fractured block copy (mis- 
1368matched halves). This could happen if (the database block size  
1369is greater than the operating system block size, and) the front  
1370and back halves of the block were copied to the backup at dif- 
1371ferent times - with an intervening update to the block on  
1372disk. In this eventuality, recovery can reconstruct the block  
1373using the logged block before-image. 
1374 
13754.	It sets the checkpoint in the file header equal to the begin- 
1376backup checkpoint captured in Step 2. This file header check- 
1377point will be "frozen" until END BACKUP is executed. 
1378 
13795.	It clears the file's online-fuzzy bit. The online-fuzzy bit  
1380remains clear during the course of the file copy operation, thus  
1381ensuring a cleared online-fuzzy bit in the file copy. Note that  
1382the online-fuzzy bit is set again by the execution of END  
1383BACKUP. 
1384 
13854.2  File Copy 
1386 
1387The file copy is done by utilities that are not part of Oracle. The  
1388presumption is that the platform vendor will have backup facilities  
1389that are superior to any portable facility that we could develop. It is  
1390the responsibility of the administrator to ensure that copies are only  
1391taken between the BEGIN BACKUP and END BACKUP  
1392commands, or when the file is not in use. 
1393 
13944.3  END BACKUP 
1395 
1396The END BACKUP command takes the following actions for each  
1397datafile of the tablespace: 
1398 
13991.	It restores (i.e. sets) the file's online-fuzzy bit. 
1400 
14012.	It creates an end-backup redo record (end-backup "marker")  
1402for the datafile. This record, interpreted only by media recov- 
1403ery, contains the begin-backup checkpoint SCN (i.e. the SCN  
1404matching that in the "frozen" checkpoint in the backup's  
1405header). This record serves to mark the end of the redo gener- 
1406ated during the backup. The end-backup "marker" is used by  
1407media recovery to determine when all redo generated between  
1408BEGIN BACKUP and END BACKUP has been applied to the  
1409datafile. Upon encountering the end-backup "marker", media  
1410recovery can (at the next media recovery checkpoint: see  
14116.7.1) clear the hotbackup-fuzzy bit. This is only important in  
1412preventing an incomplete recovery that might erroneously  
1413attempt to end before all redo generated between BEGIN  
1414BACKUP and END BACKUP has been applied. Ending  
1415incomplete recovery at such a point may result in an inconsis- 
1416tent file, since the backup copy may already have contained  
changes beyond this endpoint. As will be seen in 8.1, open
1418with resetlogs following incomplete media recovery will fail if  
1419any online datafile has the hotbackup-fuzzy bit (or any other  
1420fuzzy bit) set. 
1421 
14223.	It clears the file's hotbackup-fuzzy bit. 
1423 
14244.	It stops block before-image logging for the file. 
1425 
14265.	It advances the file checkpoint to the current database check- 
1427point. This compensates for any file header update(s) missed  
1428during thread checkpoints that may have advanced the data- 
1429base checkpoint while the file was in hot backup state, with its  
1430checkpoint "frozen". 
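
The header changes made by BEGIN BACKUP and END BACKUP can be
sketched as a small state model. This is only an illustration of the
steps listed in 4.1 and 4.3, not the kernel's implementation: the
class DatafileHeader, its field names, and the dictionary used for
the end-backup "marker" are hypothetical, and block before-image
logging (Step 3 of 4.1, Step 4 of 4.3) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatafileHeader:
    """Hypothetical, simplified model of the header fields discussed above."""
    checkpoint_scn: int
    online_fuzzy: bool = True
    hotbackup_fuzzy: bool = False
    backup_checkpoint_scn: Optional[int] = None  # v7.2 "backup" checkpoint


def begin_backup(hdr: DatafileHeader, file_checkpoint_scn: int) -> None:
    hdr.hotbackup_fuzzy = True                # 4.1 step 1: mark file as in hot backup
    hdr.checkpoint_scn = file_checkpoint_scn  # 4.1 steps 2 and 4: freeze at begin-backup SCN
    hdr.online_fuzzy = False                  # 4.1 step 5


def thread_checkpoint(hdr: DatafileHeader, scn: int) -> None:
    if hdr.hotbackup_fuzzy:
        # The frozen checkpoint is left alone; the "backup" checkpoint
        # receives the update instead.
        hdr.backup_checkpoint_scn = scn
    else:
        hdr.checkpoint_scn = scn


def end_backup(hdr: DatafileHeader, database_checkpoint_scn: int) -> dict:
    hdr.online_fuzzy = True                   # 4.3 step 1
    marker = {"type": "end-backup",           # 4.3 step 2: carries the begin-backup SCN
              "begin_backup_scn": hdr.checkpoint_scn}
    hdr.hotbackup_fuzzy = False               # 4.3 step 3
    hdr.backup_checkpoint_scn = None          # modeling choice of this sketch only
    hdr.checkpoint_scn = database_checkpoint_scn  # 4.3 step 5
    return marker
```

A thread checkpoint taken between the two calls leaves
checkpoint_scn at the begin-backup value, which is the value the
backup copy needs in its header.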
1431
14324.4  "Crashed" Hot Backup 
1433 
1434A normal shutdown of the instance that started a backup, or the last  
1435remaining instance, is not allowed while any files are in hot  
1436backup. Nor may a file in backup be taken offline normal or  
1437temporary. This is to ensure an end-backup "marker" is generated  
1438whenever possible, and to make administrators aware that they  
1439forgot to issue the END BACKUP command, and that the backup  
1440copy is unusable. 
1441 
1442When an instance failure or shutdown abort leaves a hot backup  
1443operation incomplete (i.e. lacking termination via END BACKUP),  
1444any file that was in backup before the failure has its hotbackup- 
1445fuzzy bit set and its checkpoint "frozen" at the begin-backup  
1446checkpoint. Even though the online file's datablocks are actually  
1447current to the database checkpoint, the file's header makes it look  
1448like a restored backup that needs media recovery and is current  
1449only to the begin-backup checkpoint. Crash recovery will fail -  
1450claiming media recovery is required - if it encounters an online  
file in "crashed" hot backup state. The file does not actually need
media recovery, however; it needs only an adjustment to its file
header to take it out of "crashed" hot backup state.
1454 
1455Media recovery could be used to recover and allow normal open of  
1456a database that has files left in "crashed" hot backup state. For v7.2  
1457however, a preferable option - because it requires no archived  
1458logs - is to use the (new in v7.2) command ALTER DATABASE  
1459DATAFILE... END BACKUP on the files left in "crashed" hot  
1460backup state (identifiable using the V$BACKUP fixed-view: see  
14619.6). Following execution of this command, crash recovery will  
1462suffice to open the database. Note that the ALTER TABLESPACE  
1463... END BACKUP format of the command cannot be used when the  
1464database is not open. This is because the database must be open in  
1465order to translate (via the data dictionary) tablespace names into  
1466their constituent datafile names. 
1467 
14765  Instance Recovery 
1477 
1478Instance recovery is used to recover from both crash failures and  
1479Parallel Server instance failures. Instance recovery refers either to  
1480crash recovery or to Parallel Server instance recovery (where a  
1481surviving instance recovers when one or more other instances fail). 
1482 
1483The goal of instance recovery is to restore the datablock changes  
1484that were in the cache of the dead instance and to close the thread  
1485that was left open. Instance recovery uses only online redo logfiles  
1486and current online datafiles (not restored backups). It recovers one  
1487thread at a time, starting at the most recent thread checkpoint and  
1488continuing until end-of-thread. 
1489 
14905.1  Detection of the Need for Instance Recovery 
1491 
1492The kernel performs instance recovery automatically upon  
1493detecting that an instance died leaving its thread-open flag set in  
1494the controlfile. Instance recovery is performed automatically on  
1495two occasions:  
1496 
14971.	at the first database open after a crash (crash recovery); 
1498 
14992.	when some but not all instances of a Parallel Server fail. 
1500 
1501In the case of Parallel Server, a surviving instance detects the need  
1502to perform instance recovery for one or more failed instances by  
1503the following means: 
1504 
15051.	A foreground process in a surviving instance detects an  
1506"invalid block lock" condition when it attempts to bring a  
1507datablock into the buffer cache. This is an indication that  
1508another instance died while a block covered by that lock was  
1509in a potentially "dirty" state in its buffer cache. 
1510 
15112.	The foreground process sends a notification to its instance's  
1512SMON process, which begins a search for dead instances. 
1513 
15143.	The death of another instance is detected if the current  
1515instance is able to acquire that instance's thread-opened locks  
1516(see 3.9). 
1517 
1518SMON in the surviving instance obtains a stable list of dead  
1519instances, together with a list of "invalid" block locks. Note: After  
1520instance recovery is complete, locks in this list will undergo "lock  
1521cleanup" (i.e. they will have their "invalid" condition cleared,  
1522making the underlying blocks accessible again). 
1523 
15245.2  Thread-at-a-Time Redo Application  
1525 
1526Instance recovery operates by processing one thread at a time,  
1527thereby recovering one instance at a time. It applies all redo (from  
1528the thread checkpoint through the end-of-thread) from each thread  
1529before starting on the next thread. This algorithm depends on the  
1530fact that only one instance at a time can have a given block  
1531modified in its cache. Between changes to the block by different  
1532instances, the block is written to disk. Thus, a given block (as read  
1533from disk during instance recovery) can need redo applied from at  
1534most one thread - the thread containing the most recent  
1535modification. 
1536 
1537Instance recovery can always be accomplished using the online  
1538redo logs for the thread being recovered. Crash recovery operates  
1539on the thread with the lowest checkpoint SCN first. It proceeds to  
1540recover the threads in the order of increasing thread checkpoint  
1541SCNs. This ensures that the database checkpoint is advanced by  
1542each thread recovered. 
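
A minimal sketch of thread-at-a-time redo application follows. The
Thread class, the tuple layout of the redo records, and the rule
that a change is applied only when its SCN is newer than the version
already in the block are assumptions made for illustration; this
section does not describe how the kernel decides whether a change is
already on disk.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """Hypothetical model of one thread of redo needing instance recovery."""
    number: int
    checkpoint_scn: int
    # Post-checkpoint redo in generation order: (scn, block_id, new_value)
    redo: list = field(default_factory=list)


def crash_recover(threads, blocks):
    """Recover one thread at a time, lowest thread checkpoint SCN first.

    blocks maps block_id -> (version_scn, value) as read from disk.
    Returns the order in which the threads were recovered.
    """
    order = []
    for thread in sorted(threads, key=lambda t: t.checkpoint_scn):
        order.append(thread.number)
        for scn, block_id, value in thread.redo:
            version_scn, _ = blocks[block_id]
            if scn > version_scn:  # assumption: skip changes already on disk
                blocks[block_id] = (scn, value)
    return order
```

Because only one instance at a time can hold a modified copy of a
block, each block in this model receives redo from at most one
thread, so no merging across threads is needed.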
1543 
15445.3  Current Online Datafiles Only 
1545 
1546The checkpoint counters are used to ensure that the datafiles are the  
1547current online files rather than restored backups. If a backup copy  
1548of a datafile is restored, then media recovery is required. 
1549 
1550Media recovery is required for a restored backup even if recovery  
1551can be accomplished using the online logs. The reason is that crash  
1552recovery applies all post-thread-checkpoint redo from each thread  
1553before starting on the next thread. Crash recovery can use this  
1554thread-at-a-time redo application algorithm because a given  
1555datablock can need redo application from at most one thread. 
1556 
1557However, starting recovery from a restored backup enables no such  
1558assumption about the number of threads that have relevant redo.  
1559Thus, the thread-at-a-time algorithm would not work. Recovering a  
1560backup requires thread-merged redo application: i.e. application of  
1561all post-file-checkpoint redo, simultaneously merging redo from all  
1562threads in SCN order. This thread-merged redo application  
1563algorithm is the one used by media recovery (see Section 6). 
1564 
1565Crash recovery would not suffice - even with thread-merged redo  
1566application - to recover a backup datafile, even if it were  
1567checkpointed at the current database checkpoint. The reason is that  
1568in all but the database checkpoint thread, crash recovery would  
1569miss applying redo between the database checkpoint and the  
1570(higher) thread checkpoint. By contrast, media recovery would  
1571start redo application at the file checkpoint in all threads.  
1572Furthermore, crash recovery might fail even if it started redo  
1573application at the file checkpoint in all threads. The reason is that  
1574crash recovery assumes that it will need only online logfiles. All  
1575but the database checkpoint thread might have already archived  
1576and re-used a needed log. 
1577 
1578If the STARTUP RECOVER command is used (in place of simple  
1579STARTUP), and crash recovery fails due to datafiles needing  
1580media recovery (e.g. they are restored backups), then media  
1581recovery via RECOVER DATABASE (see 6.4.1) is automatically  
1582executed prior to database open. 
1583 
15845.4  Checkpoints 
1585 
1586Instance recovery does not attempt to apply redo that is before the  
1587checkpoint SCN of a datafile. (The datafile header checkpoint  
1588SCNs are not used to decide where to start recovery, however.) 
1589 
1590The redo from the thread checkpoint through the end-of-thread  
1591must be read to find the end-of-thread and the highest SCN  
1592allocated by the thread. These are then used to close the thread and  
advance the thread checkpoint. The end of an instance recovery
1594almost always advances the datafile checkpoints, and always  
1595advances the checkpoint counters. 
1596 
15975.5  Crash Recovery Completion 
1598 
1599At the termination of crash recovery, the "fuzzy bits" - online- 
1600fuzzy, hotbackup-fuzzy, media-recovery-fuzzy - of all online  
1601datafiles are cleared. A special redo record, the end-crash-recovery  
1602"marker," is generated. This record is interpreted by media  
1603recovery to know when it is permissible to clear the online-fuzzy  
1604and hotbackup-fuzzy bits of the datafiles undergoing recovery (see  
16056.6). 
1606 
16156  Media Recovery 
1616 
1617Media recovery is used to recover from a lost or damaged datafile,  
1618or from a lost current controlfile. It is used to transform a restored  
1619datafile backup into a "current" datafile. It is also used to restore  
1620changes that were lost when a datafile went offline without a  
1621checkpoint. Media recovery can apply archived logs as well as  
1622online logs. Unlike instance or crash recovery, media recovery is  
1623invoked only via explicit command. 
1624 
16256.1  When to Do Media Recovery 
1626 
1627As was seen in 5.3, a restored datafile backup always needs media  
1628recovery, even if its recovery can be accomplished using only  
1629online logs. The same is true of a datafile that went offline without  
1630a checkpoint. The database cannot be opened if any of the online  
1631datafiles needs media recovery. A datafile that needs media  
1632recovery cannot be brought online until media recovery has been  
executed. If the database is open in any instance, media recovery
can operate only on offline files. Media recovery may be
1635explicitly invoked to recover a database prior to open even when  
1636crash recovery would have sufficed. If so, crash recovery - though  
1637it may find nothing to do - will still be invoked automatically at  
1638database open. Note that media recovery may be run - and, in  
1639cases such as restored backups or datafiles that went offline  
1640immediate, must be run - even if recovery can be accomplished  
1641using only the online logs. Media recovery may find nothing to do  
1642- and signal the "no recovery required" error - if invoked for  
1643files that do not need recovery. 
1644 
1645If the current controlfile is lost and a backup controlfile is restored  
1646in its place, media recovery must be done. This is the case even if  
1647all of the datafiles are current. 
1648 
16496.2  Thread-Merged Redo Application 
1650 
1651Media recovery uses a thread-merged redo application algorithm:  
1652i.e. it applies redo from all threads simultaneously, merging redo  
1653records in increasing SCN order. The process of media-recovering  
1654a backup datafile differs from the process of crash-recovering a  
1655current online datafile in the following fundamental way: Crash  
1656recovery applies redo from one thread at a time because any block  
1657of a current online file can need redo from at most one thread (one  
1658instance at a time can dirty a block in cache). With a restored  
1659backup, however, no assumption can be made about the number of  
threads that have redo relevant to a particular block. In general,
1661recovering a backup requires simultaneous application of redo  
1662from all threads, with merging of redo records across threads in  
1663SCN order. Note that this algorithm depends on a redo-generation- 
1664time guarantee that changes for a given block occur in increasing  
1665SCN order across threads (case of Parallel Server).  
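
The merge can be illustrated with a short sketch. It assumes each
thread's redo is available as a sequence of (scn, thread, block_id,
value) tuples already in increasing SCN order; the tuple layout and
the function names are hypothetical. Redo before the file checkpoint
SCN is skipped, as described for datafile checkpoints in 5.4.

```python
import heapq


def merge_threads(*thread_streams):
    """Merge per-thread redo streams into one stream in increasing SCN order."""
    return heapq.merge(*thread_streams, key=lambda record: record[0])


def media_recover(blocks, thread_streams, file_checkpoint_scn):
    """Apply merged redo to blocks (block_id -> value).

    Returns the (scn, thread) pairs that were applied, in order.
    """
    applied = []
    for scn, thread, block_id, value in merge_threads(*thread_streams):
        if scn < file_checkpoint_scn:
            continue  # redo from before the file checkpoint is not needed
        blocks[block_id] = value
        applied.append((scn, thread))
    return applied
```

With thread 1 changing block A at SCNs 10 and 30 and thread 2
changing it at SCN 20, the merged order ends with the SCN 30 change.
Applying all of thread 1 and then all of thread 2 to a restored
backup would instead finish with the older SCN 20 change.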
1666 
16676.3  Restoring Backups 
1668 
1669The administrator may copy backup versions of datafiles to the  
1670current datafile while the database is shut down or the file is offline.  
1671There is a strong assumption that backups are never copied to files  
1672that are currently accessible. Every file header read verifies that this  
1673has not been done by comparing the checkpoint counter in the file  
1674header with the checkpoint counter in the datafile's controlfile  
1675record. 
1676 
16776.4  Media Recovery Commands 
1678 
There are three media recovery commands:

•	RECOVER DATABASE

•	RECOVER TABLESPACE

•	RECOVER DATAFILE
1686 
1687The only essential difference in these commands is in how the set  
1688of files to recover is determined. They all use the same criteria for  
1689determining if the files can be recovered. There is a lock per  
1690datafile that is held exclusive by a process doing media recovery on  
1691a file, and is held shared by an instance that has the database open  
1692with the file online. Media recovery signals an error if it cannot get  
1693the lock for a file it is asked to recover. This prevents two recovery  
1694sessions from recovering the same file, and prevents media  
1695recovery of a file that is in use. 
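
The per-datafile lock described above behaves like an ordinary
shared/exclusive lock. The sketch below is an illustration only:
the class and exception names are hypothetical, and nothing here
reflects how the lock is actually implemented.

```python
class LockUnavailable(Exception):
    """Raised when the per-datafile lock cannot be granted."""


class DatafileLock:
    """Hypothetical model of the per-datafile media recovery lock."""

    def __init__(self):
        self.shared_holders = set()   # instances with the file online
        self.exclusive_holder = None  # session doing media recovery

    def acquire_shared(self, instance):
        if self.exclusive_holder is not None:
            raise LockUnavailable("file is undergoing media recovery")
        self.shared_holders.add(instance)

    def acquire_exclusive(self, session):
        if self.exclusive_holder is not None:
            raise LockUnavailable("another session is recovering this file")
        if self.shared_holders:
            raise LockUnavailable("file is in use by an open instance")
        self.exclusive_holder = session
```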
1696 
16976.4.1  RECOVER DATABASE 
1698 
1699This command does media recovery on all online datafiles that  
need any redo applied. If all instances were shut down cleanly, and
1701no backups were restored, this command will signal the "no  
1702recovery required" error. It will also fail if any instances have the  
1703database open, since they will have the datafile locks. 
1704 
17056.4.2  RECOVER TABLESPACE 
1706 
1707This command does media recovery on all datafiles in the  
1708tablespaces specified. In order to translate (i.e. via the data  
1709dictionary) the tablespace names into datafile names, the database  
1710must be open. This means that the tablespaces and their constituent  
1711datafiles must be offline in order to do the recovery. An error is  
signalled if none of the tablespace's constituent files needs recovery.
1713 
17146.4.3  RECOVER DATAFILE 
1715 
1716This command specifies the datafiles to be recovered. The database  
1717may be open; or it may be closed, as long as the media recovery  
1718locks can be acquired. If the database is open in any instance, then  
1719datafile recovery can only recover offline files. 
1720 
17216.5  Starting Media Recovery 
1722 
1723Media recovery starts by finding the media-recovery-start SCN: i.e.  
1724the lowest SCN of the datafile header checkpoints of the files being  
1725recovered. Note: An exception occurs if a file's checkpoint is in its  
1726offline range (see 2.18). In that case, the file's offline-end  
1727checkpoint is used in place of its datafile header checkpoint in  
1728computing the media-recovery-start SCN. 
1729 
1730A buffer for reading redo is allocated for each thread in the enabled  
1731thread bitvec of the media-recovery-start checkpoint (i.e. the  
1732datafile checkpoint with the lowest SCN). The initial file header  
1733checkpoint SCN of every file is saved to ensure that no redo from a  
1734previous use of the file number is applied, as well as to eliminate  
1735needlessly attempting to apply redo to a file from before its  
1736checkpoint. The stop SCNs (from the datafiles' controlfile records)  
1737are also saved. If finite, the highest stop SCN can be used to allow  
1738recovery to terminate without needlessly searching for redo beyond  
1739that SCN to apply (see 6.10). At recovery completion, any datafile  
1740initially found to have a finite stop SCN will be left checkpointed at  
1741that stop SCN (rather than at the recovery end-point). This allows  
1742an offline-clean or read-only datafile to be left checkpointed at an  
1743SCN that matches the tablespace-clean-stop-SCN of its tablespace. 
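
The choice of the media-recovery-start SCN can be sketched as
follows. The FileToRecover class is hypothetical, and the exact
boundary test for "checkpoint is in its offline range" is an
assumption of this sketch (see 2.18 for the actual definition).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FileToRecover:
    """Hypothetical summary of one datafile selected for media recovery."""
    header_checkpoint_scn: int
    # (offline-begin SCN, offline-end SCN), if the file has an offline range
    offline_range: Optional[Tuple[int, int]] = None


def effective_checkpoint_scn(f: FileToRecover) -> int:
    """Header checkpoint SCN, or the offline-end SCN if the checkpoint
    falls inside the file's offline range."""
    if f.offline_range is not None:
        begin, end = f.offline_range
        if begin <= f.header_checkpoint_scn < end:  # boundary test is assumed
            return end
    return f.header_checkpoint_scn


def media_recovery_start_scn(files) -> int:
    """Lowest effective checkpoint SCN over all files being recovered."""
    return min(effective_checkpoint_scn(f) for f in files)
```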
1744 
17456.6  Applying Redo, Media Recovery Checkpoints 
1746 
1747A log is opened for each thread of redo that was enabled at the time  
1748the media-recovery-start SCN was allocated (i.e. for each thread in  
1749the enabled thread bitvec of the media-recovery-start checkpoint).  
1750If the log is online, then it is automatically opened. If the log was  
1751archived, then the user is prompted to enter the name of the log  
1752(unless automatic recovery is being used). The redo is applied from  
1753all the threads in the order it was generated, switching threads as  
1754needed. The order of application of redo records without an SCN is  
1755not precise, but it is good enough for rollback to make the database  
1756consistent. 
1757 
1758Except in the case of cancel-based incomplete recovery (see  
17596.12.1) and backup controlfile recovery (see 6.13), the next online  
1760log in sequence is accessed automatically, if it is on disk. If not, the  
1761user is prompted for the next log. 
1762 
1763At log boundaries, media recovery executes a "checkpoint." As  
1764part of media recovery checkpoint, the dirty recovery buffers are  
1765written to disk and the datafile header checkpoints of the files  
1766undergoing recovery are advanced, so that the redo does not need  
1767to be reapplied. Another type of media recovery "checkpoint"  
1768occurs when a datafile initially found to have a finite stop SCN  
1769reaches that stop SCN. At such a stop SCN boundary, all dirty  
1770recovery buffers are written to disk, and the datafiles that have been  
1771made current have their datafile header checkpoints advanced to  
1772their stop SCN values. 
1773 
17746.7  Media Recovery and Fuzzy Bits 
1775 
17766.7.1  Media-Recovery-Fuzzy 
1777 
1778The media-recovery-fuzzy bit is a flag in the datafile header that is  
1779used to indicate that - due to ongoing redo application by media  
1780recovery - the file may contain changes in the future of (at SCNs  
1781beyond) the current header checkpoint SCN. The media-recovery- 
1782fuzzy bit is set at the start of media recovery for each file  
1783undergoing recovery. Generally the media-recovery-fuzzy bits can  
1784be cleared when a media recovery checkpoint advances the  
1785checkpoints in the datafile headers. They are left clear when a  
1786media recovery session completes successfully or is cancelled. As  
will be seen in 8.1, open with resetlogs following incomplete
1788media recovery will fail if any online datafile has the media- 
1789recovery-fuzzy bit (or any fuzzy bit) set. 
1790 
17916.7.2  Online-Fuzzy 
1792 
1793Upon encountering an end-crash-recovery "marker" (or a file- 
1794specific offline-immediate "marker": generated when a datafile  
1795goes offline without a checkpoint), media recovery can (at the next  
1796media recovery checkpoint) clear (if set) the online-fuzzy and  
1797hotbackup-fuzzy bits in the appropriate datafile header(s). 
1798 
17996.7.3  Hotbackup-Fuzzy 
1800 
1801Upon encountering an end-backup "marker" (or an end-crash- 
1802recovery "marker"), media recovery can (at the next media  
1803recovery checkpoint) clear the hotbackup-fuzzy bit. Open with  
1804resetlogs following incomplete media recovery will fail if any  
1805online datafile has the hotbackup-fuzzy bit (or any fuzzy bit) set.  
1806This prevents a successful RESETLOGS open following an  
1807incomplete recovery that terminated before all redo generated  
1808between BEGIN BACKUP and END BACKUP had been applied.  
1809Ending incomplete recovery at such a point would generally result  
1810in an inconsistent file, since the backup copy may already have  
1811contained changes between this endpoint and the END BACKUP. 
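
The interaction between "markers", media recovery checkpoints, and
the fuzzy bits described in 6.7.1 through 6.7.3 can be sketched as
below. All class, method, and string names are hypothetical. The
sketch also simplifies 6.7.1 by clearing the media-recovery-fuzzy
bit at every media recovery checkpoint, whereas the text says only
that it can "generally" be cleared then.

```python
from dataclasses import dataclass


@dataclass
class FuzzyBits:
    online: bool = False
    hotbackup: bool = False
    media_recovery: bool = False

    def any_set(self) -> bool:
        return self.online or self.hotbackup or self.media_recovery


class FileRecoveryState:
    """Hypothetical per-file bookkeeping during media recovery."""

    def __init__(self, bits: FuzzyBits):
        self.bits = bits
        self.bits.media_recovery = True  # set at the start of media recovery
        self._pending_clear = set()

    def on_marker(self, kind: str) -> None:
        # A marker only makes the bit eligible for clearing; the clear
        # itself happens at the next media recovery checkpoint.
        if kind == "end-backup":
            self._pending_clear.add("hotbackup")
        elif kind in ("end-crash-recovery", "offline-immediate"):
            self._pending_clear.update({"online", "hotbackup"})

    def checkpoint(self) -> None:
        for name in self._pending_clear:
            setattr(self.bits, name, False)
        self._pending_clear.clear()
        self.bits.media_recovery = False  # simplification, see lead-in


def resetlogs_open_allowed(all_bits) -> bool:
    """Open with resetlogs fails if any online datafile has a fuzzy bit set."""
    return not any(bits.any_set() for bits in all_bits)
```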
1812 
18136.8  Thread Enables 
1814 
1815A special thread-enable redo record is written in the thread of an  
1816instance enabling a new thread. If media recovery encounters a  
1817thread-enable redo record, it allocates a new redo buffer, opens the  
1818appropriate log in the new thread, and prepares to start applying  
1819redo from the new thread. 
1820 
18216.9  Thread Disables 
1822 
1823When a thread is disabled, its current log is marked as the end of a  
1824disabled thread. After media recovery finishes applying redo from  
1825such a log, it deallocates the thread's redo buffer and stops looking  
1826for redo from the thread. 
1827 
18286.10  Ending Media Recovery (Case of Complete Media Recovery) 
1829 
1830The current (i.e. last) log in every enabled thread has the end-of- 
1831thread flag set in its header. Complete (as opposed to incomplete:  
1832see 6.12) media recovery always continues redo application  
1833through the end-of-thread in all threads. The end-of-thread log can  
1834be identified without having the current controlfile, since the end- 
1835of-thread flag is in the log header rather than in the logfile's  
1836controlfile record. 
1837 
1838Note: Backing up and later restoring copies of current online logs  
1839is dangerous, and can lead to mis-identification of the current true  
1840end-of-thread. This is because the end-of-thread flag in the backup  
1841copy will in general be out-of-date with respect to the current end- 
1842of-thread log. 
1843 
1844If the datafiles being recovered have finite stop SCNs in their  
1845controlfile records (assuming a current controlfile), then media  
1846recovery can stop prior to the end-of-threads. Redo application for  
1847a datafile with a finite stop SCN can terminate at that SCN, since it  
1848is guaranteed that no redo for that datafile beyond that SCN was  
1849generated. 
1850 
As described in 2.15, the stop SCN is set when a datafile goes
1852offline. Note that without the optimization that allows recovery of a  
1853file with a finite stop SCN to terminate at that SCN, it could not be  
1854guaranteed that recovery of an offline datafile while the database is  
1855open would terminate. 
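
The stop SCN optimization can be sketched as follows, using
math.inf as a hypothetical stand-in for "no finite stop SCN". The
function names are illustrative; the second function reflects the
rule from 6.5 that a file with a finite stop SCN is left
checkpointed at that SCN.

```python
import math

INFINITE_SCN = math.inf  # hypothetical encoding of "no finite stop SCN"


def recovery_end_point(stop_scns):
    """Return the SCN at which redo application may stop, or None if
    recovery must continue through the end-of-thread in all threads.

    stop_scns: non-empty list of stop SCNs of the files being recovered.
    """
    highest = max(stop_scns)
    return None if math.isinf(highest) else highest


def final_checkpoint_scn(stop_scn, recovery_end_scn):
    """A file with a finite stop SCN is left checkpointed at that SCN,
    not at the recovery end-point."""
    return recovery_end_scn if math.isinf(stop_scn) else stop_scn
```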
1856 
18576.11  Automatic Recovery 
1858 
1859Automatic recovery is invoked by using the AUTOMATIC option  
1860of the media recovery command. It saves the user the trouble of  
1861entering the names of archived logfiles, provided they are on disk.  
1862If the sequence number of the log can be determined, then a name  
1863can be constructed by concatenating the current values of the  
1864initialization parameters LOG_ARCHIVE_DEST and  
1865LOG_ARCHIVE_FORMAT. The current LOG_ARCHIVE_DEST  
1866is assumed, unless the user overrides it by specifying a different  
1867archiving destination for the recovery session. The media- 
1868recovery-start checkpoint (see 6.5) contains (in the RBA field) the  
1869initial log sequence number for one thread (i.e. the thread that  
1870generated the checkpoint). If multiple threads of redo are enabled,  
1871the log history section of the controlfile (if configured) can be used  
1872to map the media-recovery-start SCN to a log sequence number for  
1873each thread. Once the initial recovery log is found for a thread, all  
1874subsequent logs needed from the thread follow in order. If it is not  
1875possible to determine the initial log sequence number, the user will  
1876have to guess and try logs until the right one is accepted. The  
1877timestamp from the media-recovery-start checkpoint is reported to  
1878aid in this effort. 
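
As an illustration, the name construction and the log history lookup
might look like the sketch below. The %s (log sequence number) and
%t (thread number) placeholders, the tuple layout of the log history
entries, and all example values are assumptions of this sketch; the
text above says only that the two parameters are concatenated and
that the log history maps an SCN to a log sequence number.

```python
def archived_log_name(dest, fmt, thread, sequence):
    """Concatenate LOG_ARCHIVE_DEST and an expanded LOG_ARCHIVE_FORMAT.

    Only the %s and %t placeholders are handled (assumed here).
    """
    expanded = fmt.replace("%t", str(thread)).replace("%s", str(sequence))
    return dest + expanded


def initial_sequence(log_history, thread, start_scn):
    """Find the log sequence number in `thread` whose SCN range contains
    start_scn.

    log_history: iterable of (thread, sequence, low_scn, next_scn)
    tuples, a hypothetical layout. Returns None if no entry matches, in
    which case the user has to try logs by hand.
    """
    for entry_thread, sequence, low_scn, next_scn in log_history:
        if entry_thread == thread and low_scn <= start_scn < next_scn:
            return sequence
    return None
```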
1879 
18806.12  Incomplete Recovery 
1881 
1882A RECOVER DATABASE execution can be stopped and the  
1883database opened before all the redo has been applied. This type of  
1884recovery is termed incomplete recovery. The subsequent database  
1885open is termed a RESETLOGS open. 
1886 
1887Incomplete recovery effectively sets the entire database backwards  
1888in time to a transaction-consistent state at or near the recovery end- 
1889point. All subsequent updates to the database are lost and must be  
1890re-entered. 
1891 
1892Use of incomplete recovery is indicated in the following  
1893circumstances: 
1894 
•	Media recovery is necessary (e.g. due to datafile damage or
loss), but cannot be complete (i.e. all redo cannot be applied)
because all copies of a needed online or archived redo log
were lost.

•	All copies of an active (i.e. needed for instance recovery) log
were damaged or lost while the database was open. Since
crash recovery is precluded, this case reduces to the previous
case.

•	It is necessary to reverse the effect of an erroneous user action
(e.g. table drop or batch run); and it is acceptable to set the
entire database - not just the affected schema objects -
backwards to a point-in-time before the error.
1909 
19106.12.1  Incomplete Recovery UNTIL Options 
1911 
1912There are three types of incomplete recovery. They differ in the  
1913means used to stop the recovery: 
1914 
•	Cancel-Based (RECOVER DATABASE UNTIL CANCEL)

•	Change-Based (RECOVER DATABASE UNTIL CHANGE)

•	Time-Based (RECOVER DATABASE UNTIL TIME)
1920 
1921The UNTIL CANCEL option terminates recovery when the user  
1922enters "cancel" rather than the name of a log. Online logs are not  
1923automatically applied in this mode in case cancellation at the next  
1924log is desired. If multiple threads of redo are being recovered, there  
1925may be logs in other threads that are partially applied when the  
1926recovery is cancelled. 
1927 
1928The UNTIL CHANGE option terminates redo application just  
1929before any redo associated with the specified SCN or higher. Thus  
1930the transaction that committed at that SCN will be rolled back. If  
1931you want to recover through a transaction that committed at a  
1932specific SCN, then add one to the specified SCN. 
1933 
1934The UNTIL TIME option works similarly to the UNTIL CHANGE  
1935option, except that a time rather than an SCN is specified.  
1936Recovery uses the timestamps in the redo block headers to convert  
1937the specified time into an SCN. Then recovery is stopped when that  
1938SCN is reached. 
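
The UNTIL CHANGE boundary is exclusive, which is why one is added to
the SCN in order to recover through a given commit. A minimal sketch
(the function name and the record layout are hypothetical):

```python
def redo_to_apply(redo, until_scn):
    """Return the redo records applied by UNTIL CHANGE until_scn.

    redo: list of (scn, description) tuples in increasing SCN order.
    Redo at the specified SCN or higher is not applied.
    """
    return [record for record in redo if record[0] < until_scn]
```

For a transaction that committed at SCN 101, UNTIL CHANGE 101 stops
before the commit record, so the transaction is rolled back; UNTIL
CHANGE 102 includes it.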
1939 
19406.12.2  Incomplete Recovery and Consistency 
1941 
1942In order to avoid database corruption when running incomplete  
1943recovery, all datafiles must be recovered to the exact same point.  
Furthermore, no datafile may have any changes in the future of this
1945point. This requires that incomplete media recovery must start from  
1946datafiles restored from backups whose copies completed prior to  
1947the intended stop time. The system uses file header fuzzy bits (see  
19488.1) to ensure that the datafiles contain no changes in the future of  
1949the stop time. 
1950 
19516.12.3  Incomplete Recovery and Datafiles Known to the Controlfile 
1952 
1953If recovering to a time before a datafile was dropped, the dropped  
1954file must appear in the controlfile used for recovery. Otherwise it  
1955would not be recovered. One alternative for achieving this is to  
1956recover using a backup controlfile made before the datafile was  
1957dropped. Another alternative is to use the CREATE  
1958CONTROLFILE command to construct a controlfile that lists the  
1959dropped datafile. 
1960 
1961Recovering to a time before a file was added is not a problem. The  
1962extra datafile will be eliminated from the controlfile after the  
1963database is open. The unwanted file may be taken offline before the  
1964recovery to avoid accessing it. 
1965 
19666.12.4  Resetlogs Open after Incomplete Recovery 
1967 
The next database open after an incomplete recovery must specify
the RESETLOGS option. Amongst other effects (see Section 8),
resetlogs throws away the redo that was not applied during the
incomplete recovery, and marks the database so that the skipped
redo can never be accidentally applied by a subsequent recovery. If
the incomplete recovery was a mistake (e.g. the lost log was
found), the next open can specify the NORESETLOGS option.
However, for the open with NORESETLOGS to succeed, it must
be preceded by a successful execution of complete recovery (i.e.
one in which all redo is applied).
1978 
6.12.5  Files Offline during Incomplete Recovery
1980 
If a file is offline during incomplete recovery, it will not be
recovered. This is acceptable if the file is part of a tablespace that
was taken offline normal, and that is still offline normal at the
recovery end-point. Otherwise, if the file is still offline when the
resetlogs is done, the tablespace containing the file will have to be
dropped, because the file would need media recovery with logs from
before the resetlogs. In general, V$DATAFILE should be checked to
ensure that files are online before running an incomplete recovery.
Only files that will be dropped and files that are part of offline
normal (or read-only) tablespaces should be offline (Section 8.6).
1991 
6.13  Backup Controlfile Recovery
1993 
1994If recovery is done with a controlfile other than the current one,  
1995then backup controlfile recovery (RECOVER  
1996DATABASE...USING BACKUP CONTROLFILE) must be used.  
1997This applies both to the case of a restored controlfile backup, and to  
1998the case of a "backup" controlfile created via CREATE  
1999CONTROLFILE...RESETLOGS. 
2000 
2001Use of CREATE CONTROLFILE...RESETLOGS makes a  
2002controlfile that is a "backup." Only a backup controlfile recovery  
2003can be run after executing CREATE  
2004CONTROLFILE...RESETLOGS. Only a RESETLOGS open can  
2005be used after executing CREATE  
2006CONTROLFILE...RESETLOGS. Use of CREATE  
2007CONTROLFILE...RESETLOGS is indicated if (all copies of) an  
2008online redo log were lost in addition to (all copies of) the control  
2009file. 
2010 
2011By contrast, CREATE CONTROLFILE...NORESETLOGS makes  
2012a controlfile that is "current"; i.e. it has knowledge of the current  
2013state of the online logfiles and log sequence numbers. A backup  
2014controlfile recovery is not necessary following CREATE  
2015CONTROLFILE...NORESETLOGS. Indeed, no recovery at all is  
2016required if there was a clean shutdown, and if no datafile backups  
2017have been restored. A normal or NORESETLOGS open may  
2018follow CREATE CONTROLFILE ...NORESETLOGS. 
2019 
A backup controlfile lacks valid information about the current
online logs and datafile stop SCNs. Hence, recovery cannot look
for online logs to apply automatically. Moreover, recovery must
assume infinite stop SCNs. A RESETLOGS open corrects this
information. The backup controlfile may have a different set of
threads enabled than did the original controlfile. That set will be the
effective enabled thread set following RESETLOGS open.
2027 
2028The BACKUP CONTROLFILE option may be used either alone or  
2029in conjunction with an incomplete recovery option. Unless an  
2030incomplete recovery option is included, all threads must be applied  
2031to the end-of-thread. This is validated at open resetlogs time. 
2032 
2033It is currently required that a RESETLOGS open follow execution  
2034of backup controlfile recovery, even if no incomplete recovery  
2035option was used. The following procedure could be used to avoid a  
2036backup controlfile recovery and resetlogs in case the only problem  
2037is a lost current controlfile (and a backup controlfile exists): 
2038 
1.	Copy the backup controlfile to the current control file and do a
STARTUP MOUNT.

2.	Issue ALTER DATABASE BACKUP CONTROLFILE TO
TRACE NORESETLOGS.

3.	Issue the CREATE CONTROLFILE...NORESETLOGS command
from the SQL script output by Step 2.
2047 
It is important to ensure that the CREATE CONTROLFILE
command issued in Step 3 creates a controlfile reflecting a database
structure equivalent to that of the lost current controlfile. For
example, if a datafile was added since the backup controlfile was
saved, then the CREATE CONTROLFILE command should be
modified to declare the added datafile.
2054 
2055Failure to specify the BACKUP CONTROLFILE option on the  
2056RECOVER DATABASE command when the controlfile is indeed a  
2057backup can frequently be detected. One indication of a restored  
2058backup controlfile would be a datafile header checkpoint count that  
2059is greater than the checkpoint count in the datafile's controlfile  
2060record. However, this test may not catch the backup controlfile if  
2061the datafiles are also backups. Another test validates the online  
2062logfile headers against their corresponding controlfile records, but  
2063this too may not always catch an old controlfile. 
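
The checkpoint-count test described above can be sketched as
follows. This is only an illustration of the comparison, not Oracle
code; the class, the function, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatafileCounts:
    """Checkpoint counters for one datafile (hypothetical field names)."""
    file_no: int
    header_ckpt_count: int       # counter stored in the datafile header
    controlfile_ckpt_count: int  # counter in the controlfile's datafile record


def files_suggesting_backup_controlfile(files):
    """Return the file numbers whose header counter is ahead of the controlfile."""
    return [f.file_no for f in files
            if f.header_ckpt_count > f.controlfile_ckpt_count]


# Current controlfile: counters agree.
current = [DatafileCounts(1, 40, 40), DatafileCounts(2, 17, 17)]
# Old controlfile restored over current datafiles: file 1 is ahead.
restored_controlfile = [DatafileCounts(1, 40, 35), DatafileCounts(2, 17, 17)]
# Controlfile and datafiles restored from the same old backup: nothing is ahead.
both_restored = [DatafileCounts(1, 35, 35), DatafileCounts(2, 12, 12)]
```

The third case shows why the test is not conclusive: when the
datafiles are backups as well, no header counter is ahead of the
controlfile, so the old controlfile goes unnoticed.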
2064 
6.14  CREATE DATAFILE: Recover a Datafile Without a Backup
2066 
2067If a datafile is lost or damaged and no backup of the file is  
2068available, it can be recovered using only information in the redo  
2069logs and control file. The following conditions must be met: 
2070 
1.	All redo logs written since the datafile was originally created
must be available.

2.	A control file in which the datafile is declared (i.e. name and
size information) must be available or re-creatable.
2076 
2077The CREATE DATAFILE clause of the ALTER DATABASE  
2078command is first used to create a new, empty replacement for the  
2079lost datafile. RECOVER DATAFILE is then used to apply all redo  
2080generated for the file from the time of its original creation until the  
2081time it was lost. After all redo logs written since the datafile was  
2082originally created have been applied, the file will have been  
2083restored to its state at the time it was lost. This mechanism is useful  
2084for recovering a recently-created datafile for which no backup has  
2085yet been taken. The original datafiles of the SYSTEM tablespace  
2086cannot be recovered by this means, however, since relevant redo  
2087data is not saved at database creation time. 
2088 
6.15  Point-in-Time Recovery Using Export/Import
2090 
2091Occasionally, it may become necessary to reverse the effect of an  
2092erroneous user action (e.g. table drop or batch run). One approach  
2093would be to perform an incomplete media recovery to a point-in- 
2094time before the corruption, then open the database with the  
2095RESETLOGS option. Using this approach, the entire database -  
2096not just the affected schema objects - would be set backwards in  
2097time. 
2098 
2099This approach has an undesirable side-effect: it discards committed  
2100transactions. Any updates that occurred subsequent to the resetlogs  
2101SCN are lost and must be re-entered. Resetlogs has another  
2102undesirable side-effect: it renders all pre-existing backups unusable  
2103for future recovery. 
2104 
2105Setting a mission-critical database globally back in time is often  
2106not an acceptable solution. The following procedure is an  
2107alternative whose effect on the mission-critical database is to set  
2108just the affected schema objects - termed the recovery-objects -  
2109backwards in time. 
2110 
2111Point-in-time incomplete media recovery is run against a side-copy  
2112of the production database, called the recovery-database. The  
2113initial version of the recovery-database is created using backups of  
2114the production database that were taken before the corruption  
2115occurred. Non-relevant objects in the recovery-database can be  
2116taken offline in order to avoid unnecessarily recovering them.  
2117However, the SYSTEM tablespace and all tablespaces containing  
2118rollback segments must participate in the media recovery in order  
2119to allow a clean open. (Note that this is a good reason to place  
2120rollback segments and data segments into separate tablespaces.) 
2121 
2122After it has undergone point-in-time incomplete media recovery,  
2123the recovery-database is opened with the RESETLOGS option.  
2124The recovery-database is now set backwards to a point-in-time  
2125before the recovery-objects were corrupted. This effectively  
2126creates pre-corruption versions of the recovery-objects in the  
2127recovery-database. These objects can then be exported from the  
2128recovery-database and imported back into the production database.  
2129Prior to importing the recovery-objects, the production database is  
2130prepared as follows: 
2131 
-	In the case of recovering an erroneously updated schema
object, the copy of the object in the production database is
prepared by discarding just the data; e.g. the table is truncated.

-	In the case of recovering an erroneously dropped schema
object, the object is re-created (empty) in the production
database.
2139 
2140The import operation is then executed, using the data-only option  
2141as appropriate. Since export/import can be a lengthy process, it  
2142may be desirable to postpone it until a time when recovery-object  
2143unavailability can be tolerated. In the meantime, the recovery- 
2144objects can be made available, albeit at degraded performance, via  
2145a database link between the production database and the recovery- 
2146database. 
2147 
2148An undesirable side-effect of this approach is that transaction  
2149consistency across objects is lost. This side-effect can be avoided  
2150by widening the recovery-object set to include all objects that must  
2151be kept transaction-consistent. 
2152 
7  Block Recovery
2162 
2163Block recovery is the simplest type of recovery. It is performed  
2164automatically by the system during normal operation of the  
2165database, and is transparent to the user. 
2166 
7.1  Block Recovery Initiation and Operation
2168 
2169Block recovery is used to clean up the state of a buffer whose  
2170modification by a foreground process (in the middle of invoking a  
2171redo application callback to apply a change vector to the buffer)  
2172was interrupted by the foreground process dying or signalling an  
2173error. Recovery involves (i) reading the block from disk; (ii) using  
2174the current thread's online redo logs to reconstruct the buffer to a  
2175state consistent with the redo already generated; and (iii) writing  
2176the recovered block back to disk. If block recovery fails, then after  
2177a second attempt, the block is marked logically corrupt (by setting  
2178the block sequence number to zero) and a corrupt block error is  
2179signalled. 
2180 
Block recovery is guaranteed to be possible using only the current
thread's online redo logs, since:

1.	Block recovery cannot require redo from another thread or
from before the last thread checkpoint.

2.	Online logs are not reused until the current thread checkpoint
is beyond the log.

3.	No buffer currently in the cache can need recovery from
before the last thread checkpoint.
2192 
7.2  Buffer Header RBA Fields
2194 
2195The buffer header (an in-memory data structure) contains the  
2196following fields pertaining to block recovery: 
2197 
Low-RBA and High-RBA: Delineate the range of redo (from the
current thread) that needs to be applied to the disk version of the
block in order to make it consistent with redo already generated.

Recovery-RBA: A place marker for recording progress in case the
invoker of block recovery is PMON and complete recovery in
one invocation would take too long (see next section).
2205 
7.3  PMON vs. Foreground Invocation
2207 
2208If an error is signalled while a foreground process is in a redo  
2209application callback, then the process itself executes block  
2210recovery. If foreground process death is detected during a redo  
2211application callback, on the other hand, PMON executes block  
2212recovery. 
2213 
2214Block recovery may require an unbounded amount of time and I/O.  
2215However, PMON cannot be allowed to spend an inordinate amount  
2216of time working on the recovery of one block while neglecting  
2217other necessary time-critical tasks. Therefore, a limit is placed on  
2218the amount of redo applied by one PMON call to block recovery.  
2219(A port-specific constant specifies the maximum number of redo  
2220log blocks applied per invocation). As PMON applies redo during  
2221invocations of block recovery, it updates the recovery-RBA in the  
2222buffer header to record its progress. When a PMON call to block  
2223recovery causes the recovery-RBA to reach the high-RBA, then  
2224block recovery for that block is complete. 
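
The bounded, resumable behaviour of PMON block recovery can be
sketched as below. This is a simplified model, not the kernel code:
an RBA is reduced to a single redo block number, the high-RBA is
treated as exclusive, and MAX_REDO_BLOCKS_PER_CALL is a made-up
stand-in for the port-specific constant.

```python
from dataclasses import dataclass

MAX_REDO_BLOCKS_PER_CALL = 4  # hypothetical value of the port-specific limit


@dataclass
class BufferHeader:
    low_rba: int       # first redo block that must be applied
    high_rba: int      # end of the redo range (exclusive in this sketch)
    recovery_rba: int  # progress marker; starts at low_rba


def pmon_block_recovery_call(buf, apply_redo_block):
    """Apply at most MAX_REDO_BLOCKS_PER_CALL redo blocks; return True when done."""
    limit = min(buf.recovery_rba + MAX_REDO_BLOCKS_PER_CALL, buf.high_rba)
    while buf.recovery_rba < limit:
        apply_redo_block(buf.recovery_rba)
        buf.recovery_rba += 1  # record progress in the buffer header
    return buf.recovery_rba == buf.high_rba


applied = []
buf = BufferHeader(low_rba=100, high_rba=110, recovery_rba=100)
calls = 0
done = False
while not done:
    # Between calls PMON would attend to its other duties.
    done = pmon_block_recovery_call(buf, applied.append)
    calls += 1
```

With ten redo blocks to apply and a limit of four per call, recovery
of the block completes on the third call.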
2225 
8  Resetlogs
2237 
2238The RESETLOGS option is needed on the first database open  
2239following: 
2240 
-	Incomplete recovery

-	Backup controlfile recovery

-	CREATE CONTROLFILE...RESETLOGS.
2246 
2247The primary function of resetlogs is to discard the redo that was not  
2248applied during incomplete recovery, ensuring that the skipped redo  
2249can never be accidentally applied by a subsequent recovery. To  
2250accomplish this, resetlogs effectively invalidates all existing redo  
2251in all online and archived redo logfiles. This has the side effect of  
2252making any existing datafile backups unusable for future recovery  
2253operations. 
2254 
2255Resetlogs also reinitializes the controlfile information about online  
2256logs and redo threads, clears the contents of any existing online  
2257redo log files, creates the online redo log files if they do not  
2258currently exist, and resets the log sequence number in all threads to  
2259one. 
2260 
8.1  Fuzzy Files
2262 
2263The most important requirement when doing a RESETLOGS open  
2264is that all datafiles be validated as recovered to the same point-in- 
2265time. This is what ensures that all the changes in a single redo  
2266record are done atomically. It is also important for other  
2267consistency reasons. If all threads of redo have been applied  
2268through end-of-thread to all online datafiles, then we can be sure  
2269that the database is consistent. 
2270 
2271If incomplete recovery was done, there is the possibility that a file  
2272was not restored from a sufficiently old backup. In the general case,  
2273this is detectable if the file has a different checkpoint than the other  
2274files (exceptions: offline or read-only files). 
2275 
2276The other possibility is that the file is fuzzy - i.e. it may contain  
2277changes in the future of its checkpoint. As seen earlier, the  
2278following "fuzzy bits" are maintained in the file header to  
2279determine if a file is fuzzy: 
2280 
-	online-fuzzy bit (see 3.5, 6.7.2)

-	hotbackup-fuzzy bit (see 4, 6.7.3)

-	media-recovery-fuzzy bit (see 6.7.1)
2286 
2287Open with resetlogs following incomplete media recovery will fail  
2288if any online datafile has any of the three fuzzy bits set. 
2289 
2290Redo records are created at the end of a hot backup (the end- 
2291backup "marker") and after crash recovery (the end-crash-recovery  
2292"marker") to enable media recovery to determine when it can clear  
2293the fuzzy bits. Resetlogs signals an error if any of the datafiles has  
2294any of the fuzzy bits set. 
2295 
2296Except in the following special circumstances, resetlogs signals an  
2297error if any of the datafiles is recovered to a checkpoint SCN  
2298different from the one at which the other files are checkpointed (i.e.  
2299the resetlogs SCN: see 8.2): 
2300 
1.	A file recovered to an SCN earlier than the resetlogs SCN
would be tolerated in case there were no redo generated for the
file between its checkpoint SCN and the resetlogs SCN. For
example, such would be the case if the file were read-only, and
its offline range spanned the checkpoint SCN and resetlogs
SCN. In this case, resetlogs would allow the file but set it
offline.

2.	A file checkpointed at an SCN later than the resetlogs SCN
would be tolerated in case its creation SCN (allocated at file
creation time and stored in the file header) showed it to have
been created after the resetlogs SCN. During the data dictionary
vs. controlfile check performed by RESETLOGS open
(see 8.7), such a file would be found to be missing from the
data dictionary but present in the controlfile. As a consequence,
it would be eliminated from the controlfile.
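
The per-file checks described in this section can be summarized in a
short sketch. It restates the rules above and is not Oracle's
implementation; the Datafile class, its field names, and the returned
strings are invented for the example, and "no redo generated for the
file" is modelled simply as an SCN range.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Datafile:
    """Header information for one online datafile (hypothetical field names)."""
    name: str
    checkpoint_scn: int
    creation_scn: int
    fuzzy_bits: frozenset = frozenset()  # any of "online", "hotbackup", "media-recovery"
    # SCN range during which no redo was generated for the file, e.g. while read-only.
    no_redo_range: Optional[Tuple[int, int]] = None


def check_for_resetlogs(f, resetlogs_scn):
    """Return the action resetlogs would take for file f, or raise ValueError."""
    if f.fuzzy_bits:
        raise ValueError(f"{f.name}: fuzzy bits set: {sorted(f.fuzzy_bits)}")
    if f.checkpoint_scn == resetlogs_scn:
        return "ok"
    if f.checkpoint_scn < resetlogs_scn:
        # Exception 1: no redo exists for the file between the two SCNs.
        r = f.no_redo_range
        if r is not None and r[0] <= f.checkpoint_scn and resetlogs_scn <= r[1]:
            return "set-offline"
        raise ValueError(f"{f.name}: not recovered to the resetlogs SCN")
    # Exception 2: the file was created after the resetlogs SCN.
    if f.creation_scn > resetlogs_scn:
        return "drop-from-controlfile"
    raise ValueError(f"{f.name}: contains changes beyond the resetlogs SCN")
```

For example, with a resetlogs SCN of 500, a read-only file
checkpointed at 300 whose no-redo range is (300, 900) yields
"set-offline", while a file checkpointed at 300 with no such range is
rejected.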
2317 
8.2  Resetlogs SCN and Counter
2319 
2320A resetlogs SCN and resetlogs timestamp - known together as the  
2321resetlogs data - are kept in the database info record of the  
2322controlfile. The resetlogs data is intended to uniquely identify each  
2323execution of a RESETLOGS open. The resetlogs data is also stored  
2324in each datafile header and in each logfile header. A redo log cannot  
2325be applied by recovery if its resetlogs data does not match that in  
2326the database info record of the controlfile. Except for some very  
2327special circumstances (e.g. offline normal or read-only  
2328tablespaces), a datafile cannot be recovered or accessed if its  
2329resetlogs data does not match that of the database info record of the  
2330controlfile. This ensures that changes discarded by resetlogs do not  
2331get back into the database. It also renders previous backups  
2332unusable for future recovery operations, making it prudent to take a  
2333database backup immediately after a resetlogs. 
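
The matching rule for redo logs can be illustrated with a small
sketch. The type and function names are hypothetical, and the integer
timestamps are arbitrary example values rather than Oracle's actual
representation.

```python
from typing import NamedTuple


class ResetlogsData(NamedTuple):
    scn: int
    timestamp: int  # representation assumed for the example


def redo_log_applicable(log_header, controlfile):
    """A log may be applied only if SCN and timestamp both match the controlfile."""
    return log_header == controlfile


controlfile = ResetlogsData(scn=5000, timestamp=800_000_200)
same_resetlogs = ResetlogsData(scn=5000, timestamp=800_000_200)
# Same SCN, but stamped by a different execution of RESETLOGS open.
other_resetlogs = ResetlogsData(scn=5000, timestamp=800_000_100)
```

Keeping the timestamp alongside the SCN is what distinguishes two
resetlogs executions that happen to end at the same SCN.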
2334 
8.3  Effect of Resetlogs on Threads
2336 
2337Each thread's controlfile record is updated to clear the thread-open  
2338flag and to set the thread-checkpoint SCN to the resetlogs SCN.  
2339Thus, the thread appears to have been closed at the resetlogs SCN.  
2340The set of enabled threads from the enabled thread bitvec of the  
2341database info controlfile record is used as is. It does not matter  
2342which threads were enabled at the end of recovery, since none of  
2343the old redo can ever be applied to the database again. The log  
2344sequence numbers in all threads are also reset to one. One of the  
2345enabled threads is picked as the database checkpoint. 
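
The thread record updates can be sketched as follows. The record
layout and names are hypothetical. The text does not say which
enabled thread is picked as the database checkpoint, so taking the
first enabled thread is purely an assumption of this sketch, as is the
presence of at least one enabled thread.

```python
from dataclasses import dataclass


@dataclass
class ThreadRecord:
    thread_no: int
    enabled: bool
    open_flag: bool
    checkpoint_scn: int
    log_sequence: int


def resetlogs_threads(threads, resetlogs_scn):
    """Update all thread records in place; return the thread chosen as database checkpoint."""
    for t in threads:
        t.open_flag = False               # thread now appears closed...
        t.checkpoint_scn = resetlogs_scn  # ...at the resetlogs SCN
        t.log_sequence = 1
    enabled = [t for t in threads if t.enabled]
    return enabled[0].thread_no  # assumption: first enabled thread is picked


threads = [
    ThreadRecord(1, True, True, 4990, 87),
    ThreadRecord(2, False, False, 4700, 12),
    ThreadRecord(3, True, True, 4995, 45),
]
db_checkpoint_thread = resetlogs_threads(threads, resetlogs_scn=5000)
```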
2346 
8.4  Effect of Resetlogs on Redo Logs
2348 
2349The redo is thrown away by zeroing all the online logs. Note that  
2350this means that redo in the online logs would be lost forever - and  
2351there would be no way to undo the resetlogs in an emergency - if  
2352the online logs were not backed up prior to executing resetlogs.  
2353Note that ensuring the ability to undo an erroneous resetlogs is the  
2354only valid rationale for making backups of online logs. Undoing an  
2355erroneous resetlogs requires re-running the entire recovery  
2356operation from the beginning, after restoring backups of all  
2357datafiles, controlfile, and online logs. 
2358 
2359One log is picked to be the current log for every enabled thread.  
2360That log header is written as log sequence number one. Note that  
2361the set of logs and their thread association is picked up from the  
2362controlfile (i.e. using the thread number and log list fields of the  
2363logfile records). If it is a backup controlfile, this may be different  
2364from what was current the last time the database was open. 
2365 
8.5  Effect of Resetlogs on Online Datafiles
2367 
2368The headers of all the online datafiles are updated to be  
2369checkpointed at the new database checkpoint. The new resetlogs  
2370data is also written to the header. 
2371 
8.6  Effect of Resetlogs on Offline Datafiles
2373 
2374The controlfile record for an offline file is set to indicate the file  
2375needs media recovery. However that will not be possible because it  
2376would be necessary to apply redo from logs with the wrong  
2377resetlogs data. This means that the tablespace containing the file  
2378will have to be dropped. There is one important exception to this  
2379rule. When a tablespace is taken offline normal or set read-only, the  
2380checkpoint SCN written to the headers of the tablespace's  
2381constituent datafiles is saved in the data dictionary TS$ table as the  
2382tablespace-clean-stop SCN (see 2.17). No recovery is ever needed  
2383to bring a tablespace and its files online if the files are not fuzzy  
2384and are checkpointed at exactly the tablespace-clean-stop SCN.  
2385Even the resetlogs data in the offline file header is ignored in this  
2386case. Thus a tablespace that is offline normal is unaffected by any  
2387resetlogs that leaves the database at a time when the tablespace is  
2388offline. 
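
The exception for cleanly stopped tablespaces amounts to a simple
test, sketched below with hypothetical names. Note that the resetlogs
data carried in the file header is present in the model but
deliberately not consulted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OfflineFile:
    checkpoint_scn: int
    fuzzy: bool
    header_resetlogs_scn: int  # ignored by the test below


def needs_no_recovery(f: OfflineFile, clean_stop_scn: Optional[int]) -> bool:
    """True if the file can be brought online without any recovery.

    clean_stop_scn is the tablespace-clean-stop SCN from TS$, or None
    if the tablespace was not taken offline normal or set read-only.
    """
    return (not f.fuzzy) and f.checkpoint_scn == clean_stop_scn


# Taken offline normal at SCN 4200; the database was later reset at a higher SCN.
clean = OfflineFile(checkpoint_scn=4200, fuzzy=False, header_resetlogs_scn=1)
# Not cleanly stopped: fuzzy, with no clean-stop SCN recorded.
dirty = OfflineFile(checkpoint_scn=4100, fuzzy=True, header_resetlogs_scn=1)
```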
2389 
8.7  Checking Dictionary vs. Controlfile on Resetlogs Open
2391 
2392After the rollback phase of RESETLOGS open, the datafiles listed  
2393in the data dictionary FILE$ table are compared with the datafiles  
2394listed in the controlfile. This is also done on the first open after a  
2395CREATE CONTROLFILE. There is the possibility that incomplete  
2396recovery ended at a time when the files in the database were  
2397different from those in the controlfile used for the recovery. Using a  
2398backup controlfile or creating one can have the same problem.  
2399Checking the dictionary does not do any harm, so it could be done  
2400on every database open; however there is no point in wasting the  
2401time under normal circumstances. 
2402 
The entry in FILE$ is compared with the entry in the controlfile
for every file number. Since FILE$ reflects the space allocation
information in the database, it is correct, and the controlfile might
be wrong. If the file does not exist in FILE$ but the controlfile
record says the file exists, then the file is simply dropped from the
controlfile.
2409 
2410If a file exists in FILE$ but not in the controlfile, a placeholder  
2411entry is created in the control file under the name MISSINGnnnn  
2412(where nnnn is the file number in decimal). MISSINGnnnn is  
2413flagged in the control file as being offline and needing media  
2414recovery. The actual file corresponding (with respect to the file  
2415header contents as opposed to the file name) to MISSINGnnnn can  
2416be made accessible by renaming MISSINGnnnn to point to it. 
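
The comparison described above can be sketched as a reconciliation
of two sets of file numbers. The data layout and function name are
hypothetical, and the zero-padded four-digit form of the placeholder
name is an assumption based on the "nnnn" notation.

```python
def reconcile_controlfile(dictionary_files, controlfile):
    """Make the controlfile agree with FILE$, which is taken as correct.

    dictionary_files: set of file numbers listed in FILE$.
    controlfile: dict mapping file number -> {"name": ..., "status": ...};
                 it is modified in place.
    Returns (dropped, missing) as sorted lists of file numbers.
    """
    # In the controlfile but not in FILE$: drop the controlfile record.
    dropped = sorted(n for n in controlfile if n not in dictionary_files)
    for n in dropped:
        del controlfile[n]
    # In FILE$ but not in the controlfile: create a placeholder entry.
    missing = sorted(n for n in dictionary_files if n not in controlfile)
    for n in missing:
        controlfile[n] = {
            "name": f"MISSING{n:04d}",
            "status": "offline, needs media recovery",
        }
    return dropped, missing


controlfile = {
    1: {"name": "system01.dbf", "status": "online"},
    2: {"name": "users01.dbf", "status": "online"},
    7: {"name": "extra01.dbf", "status": "online"},
}
dropped, missing = reconcile_controlfile({1, 2, 5}, controlfile)
```

Here file 7 is dropped from the controlfile and file 5 appears as a
placeholder that the DBA can later rename to the real file.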
2417 
2418In the RESETLOGS open case however, rename can succeed in  
2419making the file usable only in case the file was read-only or offline  
2420normal. If, on the other hand, MISSINGnnnn corresponds to a file  
2421that was not read-only or offline normal, then the rename operation  
2422cannot be used to make it accessible, since bringing it online would  
2423require media recovery with redo from before the resetlogs. In this  
2424case, the tablespace containing the datafile must be dropped. 
2425 
2426When the dictionary check is due to open after CREATE  
2427CONTROLFILE...NORESETLOGS rather than to open resetlogs,  
2428media recovery may be used to make the file current. 
2429 
Another option is to repeat the entire operation that led up to the
dictionary check with a controlfile that lists the same datafiles as
the data dictionary. For incomplete recovery, this would involve
restoring all backups and repeating the recovery.
2434 
9  Recovery-Related V$ Fixed-Views
2444 
2445The V$ fixed-views contain columns that extract information from  
2446data structures dynamically maintained in memory by the kernel.  
2447These "views" make this information accessible to the DBA under  
2448SYS. The following is a summary of recovery-related information  
2449that is viewable via V$ views: 
2450 
9.1  V$LOG

Contains log group information from the controlfile:

GROUP#
THREAD#
SEQUENCE#
SIZE_IN_BYTES
MEMBERS_IN_GROUP
ARCHIVED_FLAG
STATUS_OF_GROUP (unused, current, active, inactive)
LOW_SCN
LOW_SCN_TIME
2472 
9.2  V$LOGFILE

Contains log file (i.e. group member) information from the
controlfile:

GROUP#
STATUS_OF_MEMBER (invalid, stale, deleted)
NAME_OF_MEMBER
2483 
9.3  V$LOG_HISTORY

Contains log history information from the controlfile:

THREAD#
SEQUENCE#
LOW_SCN
LOW_SCN_TIME
NEXT_SCN
2497 
9.4  V$RECOVERY_LOG

Contains information (from the controlfile log history) about
archived logs needed to complete media recovery:

THREAD#
SEQUENCE#
LOW_SCN_TIME
ARCHIVED_NAME
2510 
9.5  V$RECOVER_FILE

Contains information on the status of files needing media recovery:

FILE#
ONLINE_FLAG
REASON_MEDIA_RECOVERY_NEEDED
RECOVERY_START_SCN
RECOVERY_START_SCN_TIME
2524 
9.6  V$BACKUP

Contains status information relative to datafiles in hot backup:

FILE#
FILE_STATUS (no-backup-active, backup-active, offline-normal, error)
BEGIN_BACKUP_SCN
BEGIN_BACKUP_TIME
2537 
10  Miscellaneous Recovery Features
2547 
10.1  Parallel Recovery (v7.1)
2549 
2550The goal of the parallel recovery feature is to use compute and I/O  
2551parallelism to reduce the elapsed time required to perform crash  
2552recovery, single-instance recovery, or media recovery. Parallel  
2553recovery is most effective at reducing recovery time when several  
2554datafiles on several disks are being recovered concurrently. 
2555 
10.1.1  Parallel Recovery Architecture
2557 
2558Parallel recovery partitions recovery processing into two  
2559operations: 
2560 
1.	Reading the redo log.

2.	Applying the change vectors.

Operation #1 does not easily lend itself to parallelization. The redo
log(s) must be read in sequentially, and merged in the case of
media recovery. Thus, this task is assigned to one process: the
redo-reading-process.
2569 
2570Operation #2, on the other hand, easily lends itself to  
2571parallelization. Thus, the task of change vector application is  
2572delegated to some number of redo-application-slave-processes.  
2573The redo-reading-process sends change vectors to the redo- 
2574application-slave-processes using the same IPC (inter-process- 
2575communication) mechanism used by parallel query. The change  
2576vectors are distributed based on the hash function that takes the  
2577block address as argument (i.e. DBA modulo # redo-application- 
2578slave-processes). Thus, each redo-application-slave-process  
2579handles only change vectors for blocks whose DBAs hash to its  
2580"bucket" number. The redo-application-slave-processes are  
2581responsible for reading the datablocks into cache, checking  
2582whether or not the change vectors need to be applied, and applying  
2583the change vectors if needed. 
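
The distribution of change vectors can be sketched as follows. The
hash (DBA modulo the number of slaves) is as described above; the
function names and the representation of a change vector as a
(dba, payload) pair are invented for the example.

```python
def slave_for_block(dba, n_slaves):
    """Hash a data block address to a slave "bucket" number."""
    return dba % n_slaves


def distribute(change_vectors, n_slaves):
    """Split a redo-ordered stream of (dba, payload) pairs into per-slave queues.

    Redo order is preserved within each queue.
    """
    queues = [[] for _ in range(n_slaves)]
    for dba, payload in change_vectors:
        queues[slave_for_block(dba, n_slaves)].append((dba, payload))
    return queues


redo_stream = [(10, "cv1"), (11, "cv2"), (10, "cv3"), (14, "cv4"), (11, "cv5")]
queues = distribute(redo_stream, 4)
```

Because the bucket depends only on the DBA, every change vector for
a given block goes to the same slave, in redo order, so changes to
one block are never split across slaves.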
2584 
2585This architecture achieves parallelism in log read I/O, datablock  
2586read I/O, and change vector processing. It allows overlap of log  
2587read I/Os with datablock read I/Os. Moreover, it allows overlap of  
2588datablock read I/Os for different hash "buckets." Recovery elapsed  
2589time is reduced as long as the benefits of compute and I/O  
2590parallelism outweigh the costs of process management and inter- 
2591process-communication. 
2592 
10.1.2  Parallel Recovery System Initialization Parameters
2594 
PARALLEL_RECOVERY_MAX_THREADS

PARALLEL_RECOVERY_MIN_THREADS
These initialization parameters control the number of
redo-application-slave-processes used during crash recovery or
media recovery of all datafiles.

PARALLEL_INSTANCE_RECOVERY_THREADS
This initialization parameter controls the number of
redo-application-slave-processes used during instance recovery.
2605 
10.1.3  Media Recovery Command Syntax Changes
2607 
RECOVER DATABASE has a new optional parameter for specifying
the number of redo-application-slave-processes. If specified,
it overrides PARALLEL_RECOVERY_MAX_THREADS.

RECOVER TABLESPACE has a new optional parameter for
specifying the number of redo-application-slave-processes. If
specified, it overrides PARALLEL_RECOVERY_MIN_THREADS.

RECOVER DATAFILE has a new optional parameter for specifying
the number of redo-application-slave-processes. If specified,
it overrides PARALLEL_RECOVERY_MIN_THREADS.
2619 
10.2  Redo Log Checksums (v7.2)
2621 
2622The log checksum feature allows a potential corruption in an online  
2623redo log to be detected when the log is read for archiving. The goal  
2624is to prevent the corruption from being propagated, undetected, to  
2625the archive log copy. This feature is intended to be used in  
2626conjunction with a new command, CLEAR LOGFILE, that allows  
2627a corrupted online redo log to be discarded without having to  
2628archive it. 
2629 
2630A new initialization parameter, LOG_BLOCK_CHECKSUM,  
2631controls activation of log checksums. If it is set, a log block  
2632checksum is computed and placed in the header of each log block  
2633as it is written out of the redo log buffer. If present, checksums are  
2634validated whenever log blocks are read for archiving or recovery. If  
2635a checksum is detected as invalid, an attempt is made to read  
2636another member of the log group (if any). If an irrecoverable  
2637checksum error is detected - i.e. the checksum is invalid in all  
2638members - then the log read operation fails. 
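
The read-time validation, with fallback to other members of the log
group, can be sketched as below. The checksum algorithm actually used
is not described here, so zlib.crc32 is only a stand-in, and the
block layout (a checksum paired with a payload) and all function
names are assumptions of the sketch.

```python
import zlib


def block_checksum(payload: bytes) -> int:
    """Stand-in checksum; not the algorithm used by the server."""
    return zlib.crc32(payload)


def write_block(payload: bytes):
    """Return the (checksum, payload) pair written to each group member."""
    return (block_checksum(payload), payload)


def read_block(members, block_no):
    """Try each group member in turn; fail only if the block is bad in all."""
    for member in members:
        stored_sum, payload = member[block_no]
        if block_checksum(payload) == stored_sum:
            return payload
    raise IOError(f"log block {block_no}: checksum invalid in all members")


good = [write_block(b"redo-0"), write_block(b"redo-1")]
# Second member: block 1 was corrupted after its checksum was stored.
damaged = [good[0], (good[1][0], b"redo-X")]
```

Reading block 1 from the group [damaged, good] succeeds by falling
back to the second member; reading it from [damaged] alone fails,
which corresponds to the irrecoverable case.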
2639 
2640Note that a rudimentary mechanism for detecting log block header  
2641corruption was added, along with log group support, in v7.1. The  
2642log checksum feature extends corruption detection to the whole  
2643block. 
2644 
2645If an irrecoverable checksum error prevents a log from being read  
2646for archiving, then the log cannot be reused. Eventually log switch  
2647- and redo generation - will stall. If no action is taken, the  
2648database will hang. The CLEAR LOGFILE command provides a  
2649way to obviate the requirement that the log be archived before it  
2650can be reused. 
2651 
10.3  Clear Logfile (v7.2)
2653 
2654If all members of an online redo log group are "lost" or "corrupted"  
2655(e.g. due to checksum error, media error, etc.), redo generation may  
2656proceed normally until it becomes necessary to reuse the logfile.  
2657Once the thread checkpoints of all threads are beyond the log, it is a  
2658potential candidate for reuse. Possible scenarios preventing reuse  
2659are the following: 
2660 
1.	The log cannot be archived due to a checksum error; it cannot
be reused because it needs archiving.

2.	A log switch attempt fails because the log is inaccessible (e.g.
due to a media error). The log may or may not have been
archived.
 
The ALTER DATABASE CLEAR LOGFILE command is
provided as an aid to recovering from such scenarios involving an
inactive online redo log group (i.e. one that is not needed for crash
recovery). CLEAR LOGFILE allows an inactive online logfile to
be "cleared": i.e. discarded and reinitialized, in a manner analogous
to DROP LOGFILE followed by ADD LOGFILE. In many cases,
use of this command obviates the need for database shutdown or
resetlogs.

Note: CLEAR LOGFILE cannot be used to clear a log needed for
crash recovery (i.e. a "current" or "active" log of an open thread).
Instead, if such a log becomes lost or corrupted, shutdown abort
followed by incomplete recovery and open resetlogs will be
necessary.
 
Use of the UNARCHIVED option allows the log clear operation to
proceed even if the log needs archiving: an operation that would be
disallowed by DROP LOGFILE. Furthermore, CLEAR LOGFILE
allows the log clear operation to proceed in the following cases:

-	There are only two logfile groups in the thread.

-	All log group members have been lost through media failure.

-	The logfile being cleared is the current log of a closed thread.

All of these operations would be disallowed in the case of DROP
LOGFILE.
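 
The difference between the two commands can be summarized as a
pair of checks. The following is a minimal sketch, not the actual
implementation: LogState and its field names are hypothetical, and
the rule that DROP LOGFILE also refuses a log needed for crash
recovery is an assumption rather than something stated above.

```python
from dataclasses import dataclass


@dataclass
class LogState:
    """Hypothetical summary of the facts each command looks at."""
    needed_for_crash_recovery: bool = False
    needs_archiving: bool = False
    groups_in_thread: int = 3
    all_members_lost: bool = False
    current_log_of_closed_thread: bool = False


def drop_allowed(log: LogState) -> bool:
    """DROP LOGFILE is refused in every case listed in the text."""
    return not (
        log.needed_for_crash_recovery      # assumption, see lead-in
        or log.needs_archiving
        or log.groups_in_thread <= 2
        or log.all_members_lost
        or log.current_log_of_closed_thread
    )


def clear_allowed(log: LogState, unarchived: bool = False) -> bool:
    """CLEAR LOGFILE is refused for a log needed for crash recovery,
    or for a log that needs archiving when UNARCHIVED is not given."""
    if log.needed_for_crash_recovery:
        return False
    if log.needs_archiving and not unarchived:
        return False
    return True


# A corrupt, unarchived log in a thread with only two groups.
damaged = LogState(needs_archiving=True, groups_in_thread=2,
                   all_members_lost=True)
can_drop = drop_allowed(damaged)
can_clear = clear_allowed(damaged, unarchived=True)
```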
 
Clearing an unarchived log makes unusable any existing backup
whose recovery would require applying redo from the cleared log.
Therefore, it is recommended that the database be immediately
backed up following use of CLEAR LOGFILE with the
UNARCHIVED option. Furthermore, the UNRECOVERABLE
DATAFILE option must be used if there is a datafile that is offline,
and whose recovery prior to onlining requires application of redo
from the cleared logfile. Following use of CLEAR LOGFILE with
the UNRECOVERABLE DATAFILE option, the offline datafile,
together with its entire tablespace, will have to be dropped from the
database. This is because the redo necessary to bring it online has
been cleared, and there is no other copy of it.
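 
To show how the two options combine, the sketch below assembles
the statement text. The helper clear_logfile_statement is
hypothetical, the group number is arbitrary, and the keyword order
and GROUP clause are assumptions that should be checked against
the SQL reference for the release in use.

```python
def clear_logfile_statement(group: int,
                            unarchived: bool = False,
                            unrecoverable_datafile: bool = False) -> str:
    """Build the text of a CLEAR LOGFILE statement (illustrative only)."""
    words = ["ALTER DATABASE CLEAR"]
    if unarchived:
        # Allows the clear to proceed although the log needs archiving.
        words.append("UNARCHIVED")
    words.append(f"LOGFILE GROUP {int(group)}")
    if unrecoverable_datafile:
        # Needed when an offline datafile depends on redo in this log.
        words.append("UNRECOVERABLE DATAFILE")
    return " ".join(words)


plain = clear_logfile_statement(3)
forced = clear_logfile_statement(3, unarchived=True,
                                 unrecoverable_datafile=True)
```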
 
The foreground process executing CLEAR LOGFILE processes
the command in several steps:

-	It checks that the logfile is not needed for crash recovery and
is clearable.

-	It sets the "being cleared" and "archiving not needed" flags in
the logfile controlfile record. While the "being cleared" flag is
set, the logfile is ineligible for reuse by log switch.

-	It recreates the logfile, and performs multiple writes to clear
it to zeroes (a lengthy process).

-	It resets the "being cleared" flag.
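 
The steps above can be sketched as follows. This is a simplified
model, not the actual code: LogfileRecord, its fields and the
four-block "file" are hypothetical stand-ins for the logfile
controlfile record and the logfile itself, and checkpoint state is
left out of the reuse test.

```python
from dataclasses import dataclass, field


@dataclass
class LogfileRecord:
    """Hypothetical stand-in for the logfile record in the controlfile."""
    needed_for_crash_recovery: bool = False
    needs_archiving: bool = True
    being_cleared: bool = False
    blocks: list = field(default_factory=lambda: [b"redo"] * 4)

    def eligible_for_reuse(self) -> bool:
        # While "being cleared" is set, log switch must skip this log.
        return not self.being_cleared and not self.needs_archiving


def clear_logfile(rec: LogfileRecord, unarchived: bool = False) -> None:
    # Step 1: the log must not be needed for crash recovery and must
    # be clearable.
    if rec.needed_for_crash_recovery:
        raise ValueError("log is needed for crash recovery")
    if rec.needs_archiving and not unarchived:
        raise ValueError("log needs archiving; UNARCHIVED not given")
    # Step 2: set "being cleared" and "archiving not needed".
    rec.being_cleared = True
    rec.needs_archiving = False
    # Step 3: recreate the logfile and clear it to zeroes.
    rec.blocks = [b"\x00"] * len(rec.blocks)
    # Step 4: reset "being cleared".
    rec.being_cleared = False


rec = LogfileRecord()
reusable_before = rec.eligible_for_reuse()
clear_logfile(rec, unarchived=True)
reusable_after = rec.eligible_for_reuse()
```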
 
If the foreground process executing CLEAR LOGFILE dies while
execution is in progress, the log will not be usable as the current log.
Redo generation may stall and the database may hang, much as
would happen if log switch had to wait for checkpoint completion,
or for log archive completion. Should the process executing
CLEAR LOGFILE die, the operation should be completed by
reissuing the same command. Another option would be to drop the
partially-cleared log. CLEAR LOGFILE could also fail due to an
I/O error encountered while writing zeroes to a log group member. An
option for recovering would be to drop that member and add
another to replace it.
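 
The effect of a process death part-way through the clear, and of
reissuing the command, can be illustrated with a toy model. The
record layout, the ProcessDied exception and the die_while_zeroing
switch are all hypothetical and exist only to make the failure
reproducible in the example.

```python
class ProcessDied(Exception):
    """Stands in for the death of the foreground process."""


def clear_logfile(record: dict, die_while_zeroing: bool = False) -> None:
    record["being_cleared"] = True       # set before any zeroing starts
    record["needs_archiving"] = False
    if die_while_zeroing:
        # The flag stays set because nothing runs after this point.
        raise ProcessDied("foreground process died mid-clear")
    record["zeroed"] = True              # many writes in a real system
    record["being_cleared"] = False


def usable_as_current(record: dict) -> bool:
    return not record["being_cleared"]


record = {"being_cleared": False, "needs_archiving": True, "zeroed": False}
try:
    clear_logfile(record, die_while_zeroing=True)
except ProcessDied:
    pass
stuck = not usable_as_current(record)    # log switch would have to wait
clear_logfile(record)                    # reissue the same command
recovered = usable_as_current(record)
```

Because the command starts again from the beginning, reissuing it
completes the zeroing and resets the flag in this model.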