NLHDH K54CA
Fundamentals of Operating
Systems
Nguyen Tri Thanh
• Why do we need Operating Systems?
• What is an OS?
• What does an OS do?
• Classification of Operating Systems
• Disambiguation of some concepts

Reference
• Chapter 1, 2 of Operating System Concepts

Why do we need OSes?
• What can a computer do?
• What language does it “speak”?
• What does a computer have?
• Can we “use” the computer and its resources directly?

What is an Operating System?
• A program that acts as an intermediary
between a user of a computer and the
computer hardware.
• Operating system goals:
– Execute user programs and make solving user
problems easier.
– Make the computer system convenient to use.
• Use the computer hardware (resources) in an efficient manner.

Position of an Operating System

Typical Operating Systems

List of operating systems
• http://en.wikipedia.org/wiki/List_of_operating_systems
• http://en.wikipedia.org/wiki/List_of_Linux_distributions
• http://en.wikipedia.org/wiki/Mobile_operating_system

Main tasks of an OS
• Process Management
• Memory Management (RAM)
• Storage Management
– File/directory
– Disks
• Protection and Security
• Networking
• Main functions are usually implemented in the kernel (core)

OS classification
• Batch processing
– very early OSes
• Uniprogramming
– less powerful (weak) machines
• Multiprogramming (powerful machines)
– Timesharing/multitasking
– Multi-user system
• Parallel processing (PC cluster) for highly computational applications
– search engines, e.g., Google, Yahoo, …

OS classification (cont’d)
• Embedded (embedded into devices to do a
specific task)
– the device performs limited (few) functions
– usually shipped as firmware (NOT as software)
– Calculators, game consoles, digital cameras, mp3 players, …
• Special-purpose systems
– designed to perform a specific task
• Factory control systems
• GPS receivers

OS classification (cont’d)
• Real-time systems
– Do a task under a time constraint
• must produce output before a deadline once the input arrives
• NASA’s control systems
• missile defense systems
• earthquake detection systems
• robot control systems
• Boeing automatic flight systems
• Boundaries among OS types are not clear
– one OS can have characteristics of many types

Special purpose – Embedded Systems

Real-time Systems

OS classification (cont’d)
• You act as a perfect real-time system when you
– drive a car
– play a game
– play sports
– …

Why multiprogramming?
• Multiprogramming is needed for efficiency
– A single user cannot keep the CPU and I/O devices busy at all times
– Multiprogramming organizes jobs (code and data) so the CPU always has one to execute
– A subset of the total jobs in the system is kept in memory
• Timesharing (multitasking) is a logical extension in which the CPU switches jobs so frequently that users can interact with each job while it is running, creating interactive computing
– Each user has at least one program executing in memory ⇒ process
– If several jobs are ready to run at the same time ⇒ CPU scheduling

Timesharing system
[Figure: multiple users at workstations sharing one computer over time]

Operating System Structure

OS structure
Layered approach
– OS is divided into several levels
– A higher level can only access/use its direct lower level
• level 3 can access level 2
• level 4 cannot access level 2
– Why? Motivation?

Layered approach example
• UNIX, LINUX
• How many levels
are there?
A. 1
B. 2
C. 3
D. 4

Microkernel
• Keep minimum/essential functions in kernel
• Others are built as libraries/applications
• Examples
– Mach, Tru64 UNIX, QNX
– http://en.wikipedia.org/wiki/Mach_(kernel)

Module approach
• Considered the most effective approach
– Inherits the OOP paradigm
– Sun Solaris is an example
– http://en.wikipedia.org/wiki/Solaris_(operating_system)

Virtual Machine

Virtual machine
• A concept that refers to the abstraction of computer resources

Virtual machine
[Figure: the same hardware stack without a VM vs. with a VM]

Virtual machine - VMWare

Virtual machine
• VirtualBox
• QEMU

Virtual machine - OpenVZ

End of Chapter

Question?

Homework
Get used to the C++ programming language in the Linux environment
– Provide the account list of the class
– Use the provided account to log in to the Linux server
– Practice programming with C++: write, compile, run
– Reference: Chapter 1–2, Lập trình C/C++ trên Linux (C/C++ Programming on Linux), Nguyễn Trí Thành, Nhà xuất bản Giáo dục, 2010
Fundamentals of Operating
Systems
Nguyen Tri Thanh
[email protected]

Process and Process Scheduling

2/25/2011

Objectives
• Present what a process is
• Present 4 process scheduling approaches
• Scheduling in multi-queue systems
• Implement the scheduling algorithms
Reference
• Chapter 3, 5 of Operating System Concepts

Process statistics

Process classification
Processes are classified into 2 categories
A. System processes
Created by system account
Run essential services
B. User processes
Created by user accounts
Usually are application processes (Word, Excel, YM,…)
Process structure
• A process at least consists of
1. A program counter
(Instruction Pointer)
2. Text (code)
3. Stack + Heap
4. Data section
• The process structure is
different among OSes
Process states
• A process has many states during its life
• The number of states is different among
OSes
Process state switch
• New
– a new process is initiated
• Running
– Process instructions are being run
• Waiting
– Process is waiting for a certain resource or event
• Ready
– Process just waits for its turn to run
• Terminated
– The process completes

Process control block (PCB)
Information associated with each process
– Process state
– Program counter
– CPU registers
– CPU scheduling information
– Memory-management information
– Accounting information
– I/O status information
• PCB is different among OSes
CPU And I/O Bursts
• Burst – a time span (duration)
• Two burst types
– I/O burst
– CPU burst

Process classification
• CPU-bound process
– use CPU a lot (for computation)
• IO-bound process
– does IO a lot
• These types of processes affect schedulers
Process context switch
• Context switch
– CPU stops current process and runs another one
• Progress steps
– save the state of the current process
– put it into the READY queue
– pick the target process
– restore the state of the target process
– run the target process
Process scheduling

Problem
• You have 5 exams within a week
– How do you manage to study?
• A shop seller has many customers waiting
– How does he/she do?
• A CPU has several processes
– How does it run them?

Queue
• When there are several people waiting at
the counter (in a supermarket)
– What do they do?
Queue is the input/output
of a process scheduler
Different schedulers
• Long-term scheduler (or job scheduler)
– selects which processes should be brought into the ready
queue
• Short-term scheduler (or CPU scheduler)
– selects which process should be executed next
• Medium-term scheduler
– selects which process to temporarily swap out (of the
memory)

Dispatcher
• Dispatcher module gives control of the CPU
to the process selected by the short-term
scheduler; this involves:
– switching context
– switching to user mode
– jumping to the proper location in the user
program to restart that program
• Dispatch latency – time it takes for the
dispatcher to stop one process and start
another running
CPU scheduling

CPU scheduler types
• Non pre-emptive
– running process has privilege to use CPU until it
terminates or changes into WAITING state
– Ex: Apple Macintosh, Windows 3.1
• Pre-emptive
– running process may be forced to release CPU
– Ex: Current Windows versions, Linux, Unix
• Which type is more effective?
First come, first served (FCFS)
• Use FIFO queue
• Process at the head of the queue is run first
First come, first served (FCFS) example
Process Burst Time
P1 24
P2 3
P3 3
• Suppose that the processes arrive in the order: P1, P2, P3
• The Gantt chart for the schedule is:

| P1 | P2 | P3 |
0    24   27   30

Shortest Job First (SJF)
• Also called Shortest Job Next (SJN)
• Shortest job in the queue is selected to be
run
• There are two flavors
– Non-preemptive
– Preemptive (Shortest Remaining Time First –
SRTF)
Process Arrival Time Burst Time
P1 0 7
P2 2 4
P3 4 1
P4 5 4
Example of Non-Preemptive SJF
| P1 | P3 | P2 | P4 |
0    7    8    12   16
Example of Preemptive SJF
Process Arrival Time Burst Time
P1 0 7
P2 2 4
P3 4 1
P4 5 4
| P1 | P2 | P3 | P2 | P4 | P1 |
0    2    4    5    7    11   16
Next CPU burst estimation
• What if we don’t know the length of burst
time?
• Can only estimate
• Can be done by using the length of previous
CPU bursts, using exponential averaging
1. tn = actual length of the nth CPU burst
2. τn+1 = predicted value for the next CPU burst
3. α, 0 ≤ α ≤ 1
4. Define: τn+1 = α·tn + (1 − α)·τn
Examples
• α =0
– τn+1 = τn
– Recent history does not count
• α = 1
– τn+1 = tn
– Only the actual last CPU burst counts
• If we expand the formula, we get:
τn+1 = α·tn + (1 − α)·α·tn−1 + … + (1 − α)^j·α·tn−j + … + (1 − α)^(n+1)·τ0
• Since both α and (1 − α) are less than or equal to 1, each successive term has less weight than its predecessor

Priority Scheduling
• A priority number (integer) is associated with each
process
• The CPU is allocated to the process with the
highest priority (smallest integer ≡ highest priority)
– Preemptive
– Non-preemptive
• SJF is a priority scheduling where priority is the
predicted next CPU burst time
• Problem ≡ Starvation – low priority processes may
never execute
• Solution ≡ Aging – as time progresses increase the
priority of the process

Round Robin (RR)
• Each process gets a small unit of CPU time
– time quantum (usually 10-100 milliseconds)
– After time quantum, the process is preempted
and added to the end of the READY queue.
• Performance
– q large ⇒ behaves like FIFO
– q small ⇒ q must still be large with respect to the context-switch time, otherwise overhead is too high

Round Robin
Example of RR
Process Burst Time
P1 53
P2 17
P3 68
P4 24
• Quantum is 20
| P1 | P2 | P3 | P4 | P1 | P3 | P4 | P1 | P3 | P3 |
0    20   37   57   77   97   117  121  134  154  162

Multilevel Queue
• Ready queue is partitioned into separate queues
– foreground (interactive)
– background (batch)
• Each queue has its own scheduling algorithm
– foreground – RR
– background – FCFS

Multilevel Queue (cont’d)
• Scheduling must be done between the queues
– Fixed priority scheduling
• (i.e., serve all from foreground then from
background)
• Possibility of starvation
– Time slice
• each queue gets a certain amount of CPU time which
it can schedule amongst its processes
– i.e., 80% to foreground in RR
– 20% to background in FCFS

Multilevel Queue Scheduling

Multilevel Feedback Queue
• A process can move between the various queues
– aging can be implemented this way
• Multilevel-feedback-queue scheduler defined by
the following parameters
– number of queues
– scheduling algorithms for each queue
– method used to determine when to upgrade a process
– method used to determine when to demote a process
– method used to determine which queue a process will
enter when that process needs service

Example
• Three queues:
– Q0 – RR with time quantum 8 milliseconds
– Q1 – RR time quantum 16 milliseconds
– Q2 – FCFS
• Scheduling
– A new job enters queue Q0 which is served FCFS. When
it gains CPU, job receives 8 milliseconds. If it does not
finish in 8 milliseconds, job is moved to queue Q1.
– At Q1 job is again served FCFS and receives 16 additional
milliseconds. If it still does not complete, it is preempted
and moved to queue Q2.

Multiple-Processor Scheduling
• CPU scheduling more complex when
multiple CPUs are available
– Homogeneous processors within a
multiprocessor
– Load sharing
• Asymmetric multiprocessing
– only one processor accesses the system data
structures, alleviating the need for data
sharing

Scheduling criteria
• CPU utilization
– keep the CPU as busy as possible
• Throughput
– # of processes completed per time unit
• Turnaround time
– amount of time to execute a particular process
• Waiting time
– amount of time waiting in the ready queue
• Response time
– amount of time it takes from when a request was
submitted until the first response is produced, not
output (for time-sharing environment)

End of Chapter

Question?
Fundamentals of Operating Systems
Nguyen Tri Thanh

Inter-Process Communication and Synchronization

Objectives
Present what IPC is
Write a simple IPC program in Linux
Present why we need synchronization
Methods of synchronization
Write a simple synchronization program

Reference
Chapter 2, 6 of Operating System Concepts

Inter-process Communication (IPC)
Introduction
In some situations, processes need to
communicate with each other
To send/receive data (web browser – web server)
To control the other process
To synchronize with each other
This can be done by IPC
IPC is implemented differently among OSes
Linux: message queue, semaphore, shared
segment, …
Introduction (cont’d)
IPC can be divided into 2 categories
IPC among processes within the same system
Linux: pipe, named pipe, file mapping, …
IPC among processes not within the same system
Remote Procedure Call (RPC), Socket, Remote
Method Invocation (RMI)

Process Synchronization
Problem
Write process P:
while (true) {
while (counter==SIZE) ;
buf[in] = nextItem;
in = (in+1) % SIZE;
counter++;
}
buf: Buffer
SIZE: size of buffer
counter: global variable
Read process Q:
while (true) {
while (counter==0) ;
nextItem = buf[out];
out = (out+1) % SIZE;
counter--;
}
Bounded circular buffer
Question
A:
counter++;
How the instruction counter++ is executed
by machine?
B:
register1 = counter;
register1 = register1 + 1;
C:
register1 = counter;
register1 = register1 + 1;
counter = register1;
D:
counter = counter+1;
Problem (cont’d)
counter++
register1 = counter;
register1 = register1 + 1;
counter = register1;
counter--
register2 = counter;
register2 = register2 - 1;
counter = register2;
Low level implementation
P and Q can get different results for counter. Why?
Problem (cont’d)
Suppose counter=5
register1 = counter; // register1=5
register1 = register1 + 1; // register1=6
register2 = counter; // register2=5
register2 = register2 - 1; // register2=4
counter = register1; // counter=6 !!
counter = register2; // counter=4 !!
Problem (cont’d)
Cause: P and Q simultaneously operate on
global variable counter
Solution: Let them operate separately
register1 = counter; // register1=5
register1 = register1 + 1; // register1=6
counter = register1; // counter=6
register2 = counter; // register2=6
register2 = register2 - 1; // register2=5
counter = register2; // counter=5
Another Problem
Write process P:
while (true) {
update access set
counter=counter+1;
}
access: a table in
database
counter: a field in the
table
Write process Q:
while (true) {
update access set
counter=counter+1;
}
Race condition
Happen when many processes simultaneously
work with shared data
To avoid it, processes need synchronization
Critical section
Suppose n processes P0, P1, ..., Pn share a
global variable v
v can also be another resource, e.g., a file
Each process has a segment of code CSi
which operates on v
CSi is called a critical section
Because it is error-prone under concurrent access
Need to make the critical section safe
Critical section
Question
Process P:
while (true) {
waitForNewRequest();
if(found){
hit+=2;
val=hit;
}
Respond();
}
hit: a global variable
Which is the critical
section of the code
when multiple
processes of P run?

Solution of the Critical Section problem
Critical section must satisfy 3 conditions
1. Mutual Exclusion
o If a process is in its critical section, then no other
processes can be in their critical sections
2. Progress
o If no process is in its critical section
o other processes waiting to enter their critical section,
o then the selection of the process to enter the critical
section cannot be postponed indefinitely
3. Bounded Waiting
o No process has to wait indefinitely to enter its critical
section
Critical section (cont’d)
Each process has to
request to enter (entry section) its critical section CSi
and announce its completion (exit section) of its CSi.

Critical section (cont’d)
Common structure
do {
Enter_Section(CSi);
Run CSi;
Exit_Section(CSi);
Run(REMAINi); // Remainder section
} while (TRUE);
Critical section (cont’d)
Short description
do {
ENTRYi; // Entry section
Run CSi; // Critical section
EXITi; // Exit section
REMAINi; // Remainder section
} while (TRUE);

Peterson’s Solution
Solution for two processes
The two processes share two variables:
int turn; // with the value of 0 or 1
Boolean flag[2]
The variable turn indicates whose turn it is to enter the critical section
If turn == i then Pi has the turn to run its CSi
The flag array is used to indicate if a process is ready to enter the critical section: flag[i] = true implies that process Pi is ready!
Peterson’s solution (cont’d)
Program Pi:
do {
flag[i] = TRUE;
turn = j;
while (flag[j] && turn == j) ;
CSi
flag[i] = FALSE;
REMAINi
} while (1);
Peterson’s solution (cont’d)
The proof of this solution is provided on page
196 of the textbook
Comments
Complicated when the number of processes increases
Difficult to control

Semaphore
Reference information
The semaphore was proposed by Edsger Wybe Dijkstra (Dutch) for Computer Science in 1972
Semaphores were first used in his “THE” operating system
Edsger Wybe Dijkstra (1930-2002)
Semaphore
A semaphore is an integer that can only be accessed through two atomic operations, wait (or P) and signal (or V)
P: proberen – to test (in Dutch)
V: verhogen – to increase (in Dutch)
Processes can share a semaphore
The atomic operations guarantee consistency
wait and signal operations
wait(S) // or P(S)
{
while (S<=0) ;
S--;
}
Wait while semaphore S <= 0, then decrease S by 1

signal(S) // or V(S)
{
S++;
}
Increase S by 1
Using semaphore
Apply for critical section
do {
wait(s); // s is a semaphore initialized by 1
CSi
signal(s);
REMAINi
} while (1);
Question
Process P:
while (true) {
waitForNewRequest();
if(found){
hit+=2;
val=hit;
}
Respond();
}
hit: a global variable
Use semaphore to make
the code safe?
Using semaphore (cont’d)
P1 needs to do O1; P2 needs to do O2; O2 can
only be done after O1
Solution: use a semaphore synch = 0
P1:
...
O1;
signal(synch);
...
P2:
...
wait(synch);
O2;
...
Semaphore implementation
In the above semaphore implementation
Use busy waiting (while loop)
Resource wasting
This type of semaphore is called a spinlock
Semaphore implementation
(cont’d)
Remove the busy-waiting loop by using block
To restore a blocked process, use wakeup
Semaphore data structure
typedef struct {
int value; // value of semaphore
struct process *L; //waiting process list
} semaphore;
void wait(semaphore *S)
{
S->value--;
if (S->value < 0) {
add the requesting process P to S->L;
block(P);
}
}

void signal(semaphore *S)
{
S->value++;
if (S->value <= 0) {
remove a process P from S->L;
wakeup(P);
}
}

Semaphore implementation (cont’d)
Binary semaphore
A binary semaphore only has the value 0 or 1
The other semaphore type is the counting semaphore
Classical synchronization problems
Synchronization is everywhere

Question
What is the disadvantage of the previous
bounded-buffer problem?

Bounded-Buffer Problem
N buffers, each can hold one item
Semaphore mutex initialized to the value 1
Semaphore full initialized to the value 0
Semaphore empty initialized to the value N
Bounded-buffer problem (cont’d)
Write process P:
do {
wait(empty);
wait(mutex);
Write (item);
signal(mutex);
signal(full);
} while (TRUE);
Read process Q:
do {
wait(full);
wait(mutex);
Read(item);
signal(mutex);
signal(empty);
} while (TRUE);
Bounded-buffer problem (cont’d)
Readers-writers problem
A data set is shared among a number of
concurrent processes
Readers – only read
Writers – can both read and write
Problem
allow multiple readers to read at the same time
Only a single writer can access the shared data at a time
Readers-writers problem
Readers-writers problem (cont’d)
Shared data
Data set
Semaphore wrt initialized to 1
Used to manage write access
Integer readcount initialized to 0 to count the number of readers that are reading
Semaphore mutex initialized to 1
Used to manage access to readcount
Readers-writers problem (cont’d)
Process writer Pw:
do {
wait(wrt);
write(data_set);
signal(wrt);
}while (TRUE);
Process reader Pr:
do {
wait(mutex);
readcount++;
if (readcount ==1) wait(wrt);
signal(mutex);
read(data_set);
wait(mutex);
readcount --;
if (readcount ==0) signal(wrt);
signal(mutex);
} while (TRUE);
Dining-Philosophers Problem
Five philosophers at a table with 5 chopsticks, 5 bowls, and a rice cooker
A philosopher just eats or thinks
How to make sure philosophers correctly use the “shared data” – the chopsticks
Dining-philosophers problem (cont’d)
Use semaphores to protect chopstick access
semaphore chopstick[5];
The solution is shown in the next text box
Code of philosopher i:
do {
wait(chopstick[i]);
wait(chopstick[(i+1)%5]);
Eat(i);
signal(chopstick[i]);
signal(chopstick[(i+1)%5]);
Think(i);
} while (TRUE);
Limitations of semaphore
Semaphores need correct calls to wait and
signal
Incorrect use of semaphore may lead to
deadlock
Even correct use of semaphores may lead to
deadlock in some cases
Limitations of semaphore (cont’d)
Compare the two code snippets
Snippet 1
...
wait(mutex);
//Critical section
signal(mutex);
...
Snippet 2
...
signal(mutex);
//Critical section
wait(mutex);
...
Limitations of semaphore (cont’d)
Compare the two code snippets
Snippet 1
...
wait(mutex);
CS1;
wait(mutex);
...
Snippet 2
...
wait(mutex);
CS2;
signal(mutex);
...
Limitations of semaphore (cont’d)
If programmers forget to call wait() and/or signal() in a critical section, there may be
Deadlock
Violation of mutual exclusion
Process P1
...
wait(S);
wait(Q);
...
signal(S);
signal(Q);
Process P2
...
wait(Q);
wait(S);
...
signal(Q);
signal(S);
Limitations of semaphore (cont’d)

Monitor
Reference information
Per Brinch Hansen (Denmark) proposed the concept and implemented it in 1972
Monitors were first used in the Concurrent Pascal programming language
Per Brinch Hansen (1938-2007)
What is monitor?
Monitor means to supervise
It is a type of construct in a high level
programming language for synchronization
purpose
C# programming language
http://msdn.microsoft.com/en-us/library/hf5de04k.aspx
Java programming language
http://www.artima.com/insidejvm/ed2/threadsynch.html
http://journals.ecs.soton.ac.uk/java/tutorial/java/threads/monitors.html
Monitor was studied and developed to
overcome the limitations of semaphores
Monitor
A monitor usually has
Member variables as shared resources
A set of procedures which operate on the shared
resources
Exclusive lock
Constraints to manage race condition
This description of a monitor is like a class
A sample monitor type
monitor monitor_name {
//Shared resources
procedure P1(...) { ...
}
procedure P2(...) { ...
}
...
procedure Pn(...) { ...
}
initialization_code (..) { ...
}
}
Schematic view of a Monitor
Monitor implementation
Monitor must be implemented so that
only one process can enter the monitor at a time
(mutual exclusive)
programmers do not need to write code for this
Other monitor implementation
have more synchronization mechanism
add condition variables
Condition type
Declaration
condition x, y;
Use condition variable
there are two operators: wait and signal
x.wait():
process calls x.wait() will have to wait or suspend
x.signal():
process calls x.signal() will wakeup a waiting process
– the one that called x.wait()
Monitor with condition
x.signal() characteristics
x.signal() wakes up only one waiting process
If there is no waiting process, it does nothing
x.signal() is different from the classical semaphore's signal
signal on a classical semaphore always changes the state (value) of the semaphore

Solution to Dining Philosophers
monitor DP
{
enum { THINKING, HUNGRY, EATING } state[5];
condition self[5];
void pickup (int i) {
state[i] = HUNGRY;
test(i);
if (state[i] != EATING) self[i].wait();
}
void putdown (int i) {
state[i] = THINKING;
// test left and right neighbors
test((i + 4) % 5);
test((i + 1) % 5);
}

Solution to Dining Philosophers (cont)
void test (int i) {
if ( (state[(i + 4) % 5] != EATING) &&
(state[i] == HUNGRY) &&
(state[(i + 1) % 5] != EATING) ) {
state[i] = EATING ;
self[i].signal () ;
}
}
initialization_code() {
for (int i = 0; i < 5; i++)
state[i] = THINKING;
}
}

Solution to Dining Philosophers (cont)
Each philosopher invokes the operations
pickup() and putdown() in the following
sequence
dp.pickup (i)
EAT
dp.putdown (i)

Monitor Implementation Using Semaphores
Variables
semaphore mutex; // (initially = 1)
semaphore next; // (initially = 0)
int next-count = 0;
Each procedure F will be replaced by
wait(mutex);
…
body of F;
…
if (next-count > 0)
signal(next)
else
signal(mutex);
Mutual exclusion within a monitor is ensured.

Monitor Implementation
For each condition variable x, we have:
semaphore x-sem; // (initially = 0)
int x-count = 0;
The operation x.wait can be implemented as:
x-count++;
if (next-count > 0)
signal(next);
else
signal(mutex);
wait(x-sem);
x-count--;

Monitor Implementation
The operation x.signal can be implemented
as:
if (x-count > 0) {
next-count++;
signal(x-sem);
wait(next);
next-count--;
}

Linux Synchronization
Linux:
disables interrupts to implement short critical
sections
Linux provides:
semaphores
spin locks

Pthreads Synchronization
pthreads API is OS-independent
It provides:
mutex locks
condition variables
Non-portable extensions include:
read-write locks
spin locks

End of Chapter

Question?
Fundamentals of Operating Systems
Nguyen Tri Thanh

Deadlock

3/16/2011
Objectives
Introduce what a deadlock is
Introduce methods of handling deadlocks
Implement deadlock handling algorithms
Reference
Chapter 7 of Operating System Concepts
Definition of deadlock
A set of blocked processes each
holding a resource and
waiting to acquire a resource held by another
process in the set
There must be a circular wait in the set of
processes
Deadlock examples
Deadlock example (cont’d)
Process A:
{
…
Lock file F1;
...
Open file F2;
…
Unlock F1;
}
Process B
{
…
Lock file F2;
...
Open file F1;
…
Unlock F2;
}
System Model
Resource types R1, R2, . . ., Rm
shared variables, memory space, I/O devices,
Each resource type Ri has Wi instances.
Each process utilizes a resource as
follows:
request
use
release
Deadlock Characterization
Deadlock can arise if four conditions hold
simultaneously
C1: Mutual exclusion
C2: Hold and wait – holding one resource while waiting for other resources held by another process
C3: No preemption – only the process itself has the right to release the resources it holds
C4: Circular wait – there exists a set {P0, P1, …, Pn} of processes such that:
P0 is waiting for a resource that is held by P1,
P1 is waiting for a resource that is held by P2, …
Pn is waiting for a resource that is held by P0.

Resource-Allocation Graph
A RAG is a set of vertices V and a set of edges E
V is partitioned into two types
P = {P1, P2, …, Pn}, the set consisting of all the processes in the system
R = {R1, R2, …, Rm}, the set consisting of all resource types in the system
request edge – directed edge Pi → Rj
assignment edge – directed edge Rj → Pi

Resource-Allocation Graph (Cont'd)
Process: Pi
Resource type with 4 instances: Rj
Pi requests an instance of Rj: edge Pi → Rj
Pi is holding an instance of Rj: edge Rj → Pi

Example of a RAG

RAG With A Deadlock
When P3 asks for R2
There are two cycles:
P1→R1→P2→R2→P3→R3→P1
P2→R2→P3→R3→P2
The set {P1, P2, P3} is deadlocked

Graph With A Cycle But No Deadlock

Basic Facts
If graph contains no cycles ⇒ no deadlock.
If graph contains a cycle ⇒
if only one instance per resource type, then
deadlock.
if several instances per resource type, possibility
of deadlock.
Deadlock handling

Methods for Handling Deadlocks
Ensure that the system will never enter a
deadlock state
Deadlock prevention, deadlock avoidance
Allow the system to enter a deadlock state and
then recover
Deadlock detection and recovery
Ignore the problem and pretend that deadlocks
never occur in the system
used by most operating systems, including UNIX.
Deadlock Prevention
The method prevents at least one of the four
deadlock conditions from occurring
This method is classified as a static method
Deadlock Prevention
C1: Mutual Exclusion
In some situations, this condition is required
Not feasible to prevent this from happening

Deadlock Prevention
C2: Hold and Wait
must guarantee that whenever a process requests a
resource, it does not hold any other resources
require process to request and be allocated all its
resources before it begins execution, or
allow process to request resources only when the
process has none.
low resource utilization; starvation possible.
Deadlock Prevention (Cont'd)
C3: No Preemption
If a process holding some resources requests
another resource that cannot be immediately
allocated to it,
then all resources currently being held are released
released resources are added to the list of resources
for which the process is waiting
Process will be restarted only when it can regain its
old resources and the newly requested ones

Deadlock Prevention (Cont'd)
C4: Circular Wait
impose a total ordering of all resource types and
require that each process requests resources in an
increasing order of enumeration
Deadlock avoidance

Deadlock Avoidance
This method requires additional information to
decide resource allocation so that deadlock
will not happen
each process has to register the number of each
required resource types as additional information
The deadlock-avoidance algorithm
dynamically examines the resource-allocation
state to ensure that there can never be a
circular-wait condition
Deadlock Avoidance
Deadlock avoidance algorithms check the
state of resource-allocation to decide
allocation
Resource-allocation state is defined by the
number of available and allocated resources,
and the maximum demands of the processes
Safe State
System is in a safe state if a sequence <P1, P2, …, Pn> of ALL the processes exists such that
each Pi can be satisfied by the currently available resources + the resources held by all the Pj, with j < i
processes terminate in the above order

Basic Facts
If a system is in safe state ⇒ no deadlocks
If a system is in unsafe state ⇒ possibility of
deadlock
Avoidance ⇒ ensure that a system will never
enter an unsafe state
Safe, Unsafe, Deadlock State

Example
A system has 12 tapes, and 3 processes P0,
P1, P2 with corresponding requests:
P0 requests at most 10 tapes
P1 requests at most 4 tapes
P2 requests at most 9 tapes
At t0, P0 has 5 tapes, P1 and P2 each has 2
tapes
3 tapes available
Example (cont’d)
Process | Max need | Allocated
P0 | 10 | 5
P1 | 4 | 2
P2 | 9 | 2
At t0, the system is in safe state
The sequence <P1, P0, P2> is a safe sequence
At t1, P2 requests 1 more tape
if granted, the system enters an unsafe state
it is wrong to allocate the tape to P2

Avoidance algorithms
Single instance of a resource type
Use a resource-allocation graph
Multiple instances of a resource type
Use the banker’s algorithm
RAG Algorithm convention
Claim edge Pi → Rj
process Pi may request resource Rj
represented as a dashed line
A claim edge converts to a request edge when the process requests the resource
A request edge becomes an assignment edge when the resource is assigned
When a resource is released by a process, the assignment edge reconverts to a claim edge
Resources must be claimed a priori in the system

Resource-Allocation Graph
Unsafe State In Resource-Allocation Graph

Resource-Allocation Graph Algorithm
Suppose that process Pi requests Rj. The request can be granted only if converting the request edge to an assignment edge does not result in a cycle in the RAG.
Banker's Algorithm
Multiple instances
Each process must a priori claim maximum use
When a process requests a resource it may
have to wait
When a process gets all its resources it must
return them in a finite amount of time
Data Structures for the Banker's Algorithm
Available: vector of length m
Available[j] = k: there are k instances of resource type Rj available
Max: n × m matrix
Max[i,j] = k: Pi may request at most k instances of Rj
Allocation: n × m matrix
Allocation[i,j] = k: Pi is allocated k instances of Rj
Let n = number of processes, and m = number of resource types
Data Structures for the Banker's Algorithm (cont'd)

Need: n × m matrix
Need[i,j] = k: Pi may need k more instances of Rj
Need[i,j] = Max[i,j] – Allocation[i,j]
Let Work and Finish be vectors of length m and n, respectively
Safety Algorithm

1. Initialize:
   Work = Available
   Finish[i] = false for i = 0, 1, …, n–1
2. Find an i that satisfies both:
   (a) Finish[i] == false
   (b) Needi ≤ Work
   If no such i exists, go to step 4
3. Work = Work + Allocationi
   Finish[i] = true
   Go to step 2
4. If Finish[i] == true for all i, then the system is in a safe state

Question
Which of the following is correct about the Work vector in the algorithm?
A. It stores the available resources when each
process finishes
B. It is a redundant variable
C. It stores the state of the system
D. It stores possible resource for each process
Question
Which of the following is correct about the banker's algorithm?
A. It detects the unsafe state of the system
B. It detects the deadlock state of the system
C. It detects the safe sequence of the system
D. It detects the available resources
Resource-Request Algorithm

The resource-request algorithm is another algorithm used to avoid an unsafe state.
Additional data structure:
Requesti = request vector for process Pi
Requesti[j] = k: process Pi wants k instances of Rj
Example of Banker's Algorithm

5 processes P0–P4; 3 resource types: A (10 instances), B (5 instances), C (7 instances).
Snapshot at time T0:

     Allocation   Max     Available
     A B C        A B C   A B C
P0   0 1 0        7 5 3   3 3 2
P1   2 0 0        3 2 2
P2   3 0 2        9 0 2
P3   2 1 1        2 2 2
P4   0 0 2        4 3 3
Example (Cont'd)

Matrix Need = Max – Allocation:

     Need
     A B C
P0   7 4 3
P1   1 2 2
P2   6 0 0
P3   0 1 1
P4   4 3 1

The system is in a safe state: the sequence <P1, P3, P4, P2, P0> satisfies the safety criteria.
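The safety algorithm for multiple resource types can be sketched as below (names are illustrative). Note that the greedy lowest-index search finds a different, but equally valid, safe sequence than the one quoted on the slide:

```python
def find_safe_sequence(available, maximum, allocation):
    """Safety algorithm: return a safe sequence of process indices,
    or None if the state is unsafe."""
    n, m = len(maximum), len(available)
    need = [[maximum[i][j] - allocation[i][j] for j in range(m)]
            for i in range(n)]
    work = list(available)
    finish = [False] * n
    seq = []
    while len(seq) < n:
        for i in range(n):
            if not finish[i] and all(need[i][j] <= work[j] for j in range(m)):
                for j in range(m):            # Pi finishes, releasing resources
                    work[j] += allocation[i][j]
                finish[i] = True
                seq.append(i)
                break
        else:
            return None                       # unsafe: no process can proceed
    return seq

allocation = [[0,1,0],[2,0,0],[3,0,2],[2,1,1],[0,0,2]]
maximum    = [[7,5,3],[3,2,2],[9,0,2],[2,2,2],[4,3,3]]
print(find_safe_sequence([3,3,2], maximum, allocation))
# → [1, 3, 0, 2, 4]; <P1, P3, P4, P2, P0> from the slide is also safe
```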
Resource-Request Algorithm

1. If Requesti ≤ Needi, go to step 2. Otherwise, raise an error condition, since the process has exceeded its maximum claim
2. If Requesti ≤ Available, go to step 3. Otherwise, Pi must wait, since the resources are not available
3. Pretend to allocate the requested resources to Pi by modifying the state as follows:
   Available = Available – Requesti
   Allocationi = Allocationi + Requesti
   Needi = Needi – Requesti
   If the resulting state is safe ⇒ the resources are allocated to Pi
   If unsafe ⇒ Pi must wait, and the old resource-allocation state is restored
Example: P1 Requests (1,0,2)

Check that Request ≤ Available: (1,0,2) ≤ (3,3,2) ⇒ true. Pretending to grant the request gives:

     Allocation   Need    Available
     A B C        A B C   A B C
P0   0 1 0        7 4 3   2 3 0
P1   3 0 2        0 2 0
P2   3 0 2        6 0 0
P3   2 1 1        0 1 1
P4   0 0 2        4 3 1

<P1, P3, P4, P0, P2> is a safe sequence, so the request can be granted.
Can a request for (3,3,0) by P4 be granted?
Can a request for (0,2,0) by P0 be granted?
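The two follow-up questions can be checked mechanically with a sketch of the resource-request algorithm (self-contained; names are illustrative):

```python
def safe(available, maximum, allocation):
    """Safety check: can all processes finish in some order?"""
    n, m = len(maximum), len(available)
    need = [[maximum[i][j] - allocation[i][j] for j in range(m)]
            for i in range(n)]
    work, finish = list(available), [False] * n
    progress = True
    while progress:
        progress = False
        for i in range(n):
            if not finish[i] and all(need[i][j] <= work[j] for j in range(m)):
                for j in range(m):
                    work[j] += allocation[i][j]
                finish[i] = progress = True
    return all(finish)

def request_ok(pid, request, available, maximum, allocation):
    """Resource-request algorithm: pretend to grant, then check safety."""
    m = len(available)
    need = [maximum[pid][j] - allocation[pid][j] for j in range(m)]
    assert all(request[j] <= need[j] for j in range(m)), "exceeded max claim"
    if any(request[j] > available[j] for j in range(m)):
        return False                        # Pi must wait: not enough available
    new_avail = [available[j] - request[j] for j in range(m)]
    new_alloc = [row[:] for row in allocation]
    for j in range(m):
        new_alloc[pid][j] += request[j]
    return safe(new_avail, maximum, new_alloc)

# State after P1's (1,0,2) request has been granted:
allocation = [[0,1,0],[3,0,2],[3,0,2],[2,1,1],[0,0,2]]
maximum    = [[7,5,3],[3,2,2],[9,0,2],[2,2,2],[4,3,3]]
print(request_ok(4, [3,3,0], [2,3,0], maximum, allocation))  # False: > Available
print(request_ok(0, [0,2,0], [2,3,0], maximum, allocation))  # False: unsafe state
```

P4's request exceeds what is available, so P4 must wait; P0's request could be granted physically, but the resulting state is unsafe, so it is denied as well.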
Deadlock Detection
Allow system to enter deadlock state
Use detection algorithms
Recover from deadlock
Single Instance of Each Resource Type
Maintain a wait-for graph:
nodes are processes
Pi → Pj if Pi is waiting for Pj
Periodically invoke an algorithm that searches for a cycle in the graph. If there is a cycle, there exists a deadlock.
An algorithm to detect a cycle in a graph requires on the order of n² operations, where n is the number of vertices in the graph.
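Cycle detection in a wait-for graph can be sketched with a depth-first search (the graph representation and names below are illustrative):

```python
def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as an adjacency-list
    dict. A cycle means the processes on it are deadlocked."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in wait_for}

    def dfs(v):
        color[v] = GRAY                        # v is on the current DFS path
        for w in wait_for.get(v, []):
            if color.get(w, WHITE) == GRAY:    # back edge -> cycle
                return True
            if color.get(w, WHITE) == WHITE and dfs(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and dfs(v) for v in wait_for)

# P1 waits for P2, P2 for P3, P3 for P1: a deadlock cycle
print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))  # True
print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": []}))      # False
```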
Resource-Allocation Graph and Wait-for Graph

(Figure: a resource-allocation graph and its corresponding wait-for graph)

Several Instances of a Resource Type
Available: a vector of length m — the number of available resources of each type
Allocation: an n × m matrix — the number of resources of each type currently allocated to each process
Request: an n × m matrix — the current request of each process; if Request[i,j] = k, then process Pi is requesting k more instances of resource type Rj
Detection Algorithm
Let Work and Finish be vectors of length m and n, respectively.
1. Initialize:
   (a) Work = Available
   (b) For i = 1, 2, …, n: if Allocationi ≠ 0, then Finish[i] = false; otherwise, Finish[i] = true
2. Find an index i such that both:
   (a) Finish[i] == false
   (b) Requesti ≤ Work
   If no such i exists, go to step 4.
Detection Algorithm (Cont'd)

3. Work = Work + Allocationi
   Finish[i] = true
   Go to step 2
4. If Finish[i] == false for some i, 1 ≤ i ≤ n, then the system is in a deadlock state. Moreover, if Finish[i] == false, then Pi is deadlocked.
The algorithm requires on the order of O(m × n²) operations to detect whether the system is in a deadlocked state.
Example of Detection Algorithm

Processes P0–P4; resource types (instance counts): A (7), B (2), C (6).
Snapshot at time T0:

     Allocation   Request   Available
     A B C        A B C     A B C
P0   0 1 0        0 0 0     0 0 0
P1   2 0 0        2 0 2
P2   3 0 3        0 0 0
P3   2 1 1        1 0 0
P4   0 0 2        0 0 2

The sequence <P0, P2, P3, P1, P4> results in Finish[i] = true for all i.
Example (Cont'd)

P2 requests an additional instance of type C:

     Request
     A B C
P0   0 0 0
P1   2 0 1
P2   0 0 1
P3   1 0 0
P4   0 0 2

State of the system: we can reclaim the resources held by process P0, but there are insufficient resources to fulfill the other processes' requests. A deadlock exists, consisting of processes P1, P2, P3, and P4.
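The detection algorithm differs from the safety algorithm only in using each process's current Request instead of its maximum claim. A sketch, run on both snapshots above (names are illustrative):

```python
def detect_deadlock(available, allocation, request):
    """Deadlock detection: a process is assumed able to finish if its
    *current* request can be met. Returns the deadlocked process set."""
    n, m = len(allocation), len(available)
    work = list(available)
    finish = [all(a == 0 for a in allocation[i]) for i in range(n)]
    progress = True
    while progress:
        progress = False
        for i in range(n):
            if not finish[i] and all(request[i][j] <= work[j] for j in range(m)):
                for j in range(m):
                    work[j] += allocation[i][j]  # optimistically let Pi finish
                finish[i] = progress = True
    return [i for i in range(n) if not finish[i]]

allocation = [[0,1,0],[2,0,0],[3,0,3],[2,1,1],[0,0,2]]

# Snapshot at T0: no deadlock
print(detect_deadlock([0,0,0], allocation,
      [[0,0,0],[2,0,2],[0,0,0],[1,0,0],[0,0,2]]))  # → []

# After P2 requests one more C: P1-P4 are deadlocked
print(detect_deadlock([0,0,0], allocation,
      [[0,0,0],[2,0,1],[0,0,1],[1,0,0],[0,0,2]]))  # → [1, 2, 3, 4]
```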
Detection-Algorithm Usage
When, and how often, to invoke the algorithm depends on:
How often is a deadlock likely to occur?
How many processes will need to be rolled back? (one for each disjoint cycle)
If the detection algorithm is invoked at arbitrary points, there may be many cycles in the resource graph, and we would not be able to tell which of the many deadlocked processes "caused" the deadlock.
Recovery from Deadlock

Recovery from Deadlock: Process Termination
Abort all deadlocked processes, or
Abort one process at a time until the deadlock is eliminated
In which order should we choose to abort? Consider:
Priority of the process
How long the process has computed, and how much longer until completion
Resources the process has used
Resources the process needs to complete
How many processes will need to be terminated
Is the process interactive or batch?
Discussion
For each abort condition, discuss which process
will be selected to be cancelled
Priority of the process
How long process has computed, and how much
longer to completion
Resources the process has used
Resources process needs to complete
How many processes will need to be terminated
Is process interactive or batch?
Recovery from Deadlock: Resource Preemption
Selecting a victim: minimize cost
Rollback: return to some safe state, restart the process from that state
Starvation: the same process may always be picked as the victim; include the number of rollbacks in the cost factor
End of chapter

Question?
Fundamentals of Operating Systems
Nguyen Tri Thanh
4/20/2011
Virtual Memory

Paging on demand
Page replacement
Frame allocation
Thrashing
Objectives
Introduce paging method
Introduce segmentation method
Reference
Chapter 9 of Operating System Concepts
Paging on demand

Food order
Each person prefers different food
One more beer
One more ice cream
One more steak
In a minute
Paging on demand
Bring a page into memory only when needed
Less I/O needed
Less memory needed
Faster response
More users
When a page is needed ⇒ reference to it:
invalid reference ⇒ abort
not-in-memory ⇒ bring to memory
A lazy swapper never swaps a page into memory unless the page is needed; a swapper that deals with pages is a pager.

Transfer of a Paged Memory to Contiguous Disk Space

Instruction execution
Valid-Invalid Bit
With each page-table entry a valid–invalid bit is associated (v ⇒ in-memory, i ⇒ not-in-memory). Initially the valid–invalid bit is set to i on all entries.
Example of a page-table snapshot: (figure showing frame numbers and valid–invalid bits)
During address translation, if the valid–invalid bit in the page-table entry is i ⇒ page fault.

Page Table When Some Pages Are Not in Main Memory

Page Fault
If there is a reference to a page, the first reference to that page will trap to the operating system: a page fault.
1. The operating system looks at another table to decide:
   Invalid reference ⇒ abort
   Just not in memory ⇒ continue
2. Get an empty frame
3. Swap the page into the frame
4. Reset the tables
5. Set the validation bit = v
6. Restart the instruction that caused the page fault
Page Fault (Cont.)

Restarting the instruction is not always simple, e.g.:
a block move that was partially performed before faulting
auto increment/decrement addressing

Steps in Handling a Page Fault

Process information
If no free frame is available

Call the page-replacement procedure: swap out an unused page from memory.
Algorithms: FIFO, Optimal, LRU, LRU approximation
Performance of an algorithm: its page-fault rate — which algorithm is better?

Performance of paging on demand
Page fault rate p, 0 ≤ p ≤ 1.0:
if p = 0, no page faults
if p = 1, every reference is a page fault
Effective Access Time (EAT):
EAT = (1 – p) × memory access time
    + p × (page-fault overhead + swap page out + swap page in + restart overhead)

Paging on Demand Example
Memory access time = 200 nanoseconds
Average page-fault service time = 8 milliseconds
EAT = (1 – p) × 200 + p × (8 milliseconds)
    = (1 – p) × 200 + p × 8,000,000
    = 200 + p × 7,999,800
If one access out of 1,000 causes a page fault:
EAT = 8.2 microseconds — a slowdown by a factor of 40!
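The EAT computation above can be reproduced directly (a sketch; all times in nanoseconds):

```python
def effective_access_time(p, mem_ns=200, fault_ns=8_000_000):
    """EAT for demand paging: a (1 - p) fraction of references hit
    memory directly; a p fraction pay the full page-fault service time."""
    return (1 - p) * mem_ns + p * fault_ns

eat = effective_access_time(1 / 1000)
print(eat)         # 8199.8 ns, i.e. about 8.2 microseconds
print(eat / 200)   # slowdown factor of about 41
```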
Page Replacement

Prevent over-allocation of memory: include page replacement in the page-fault service routine.
Use a modify (dirty) bit to reduce the overhead of page transfers: only modified pages are written back to disk.
Page replacement completes the separation between logical memory and physical memory: a large virtual memory can be provided on a smaller physical memory.

Need For Page Replacement

Basic Page Replacement
1. Find the location of the desired page on disk
2. Find a free frame:
   If there is a free frame, use it
   If there is no free frame, use a page-replacement algorithm to select a victim frame
3. Bring the desired page into the (newly) free frame; update the page and frame tables
4. Resume the process

Page Replacement Algorithms
We want the lowest page-fault rate.
Evaluate an algorithm by running it on a particular string of memory references (a reference string) and computing the number of page faults on that string.
In all our examples, the reference string is 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.

Graph of Page Faults Versus the Number of Frames

First-In-First-Out (FIFO) Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
3 frames: 9 page faults
4 frames: 10 page faults
More frames, yet more page faults ⇒ Belady's anomaly: more frames ⇒ more page faults.
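Both fault counts, and the anomaly itself, can be reproduced with a short FIFO simulator (a sketch; names are illustrative):

```python
from collections import deque

def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement."""
    memory, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in memory:
            faults += 1
            if len(memory) == frames:          # evict the oldest resident page
                memory.discard(queue.popleft())
            memory.add(page)
            queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9
print(fifo_faults(refs, 4))   # 10  <- Belady's anomaly
```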
FIFO Illustrating Belady's Anomaly

FIFO Page Replacement

Optimal Algorithm
Replace the page that will not be used for the longest period of time.
4-frames example with 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults.
How do you know the future reference string? You don't — the optimal algorithm is used as a benchmark for measuring how well other algorithms perform.
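The 6-fault figure can be checked with a clairvoyant simulator (a sketch; names are illustrative):

```python
def optimal_faults(refs, frames):
    """Count page faults under the optimal policy: evict the page
    whose next use is farthest in the future (or never)."""
    memory, faults = set(), 0
    for pos, page in enumerate(refs):
        if page in memory:
            continue
        faults += 1
        if len(memory) == frames:
            def next_use(p):
                # Index of p's next reference; pages never used again sort last
                future = refs[pos + 1:]
                return future.index(p) if p in future else len(refs)
            memory.discard(max(memory, key=next_use))
        memory.add(page)
    return faults

print(optimal_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))  # 6
```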
Optimal Page Replacement

Least Recently Used (LRU) Algorithm
The least recently used page is swapped out first.
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
4 frames: 8 page faults.
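The LRU fault count can likewise be verified by simulation (a sketch; names are illustrative):

```python
def lru_faults(refs, frames):
    """Count page faults under LRU: evict the page whose last use
    is furthest in the past."""
    memory, last_used, faults = set(), {}, 0
    for time, page in enumerate(refs):
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                victim = min(memory, key=lambda p: last_used[p])
                memory.discard(victim)
            memory.add(page)
        last_used[page] = time          # every reference refreshes recency
    return faults

print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))  # 8
```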
Least Recently Used (LRU) Algorithm

Counter implementation: every page-table entry has a counter; every time the page is referenced through this entry, copy the clock into the counter. When a page needs to be replaced, look at the counters to determine which page to replace.
LRU Page Replacement

LRU Algorithm (Cont.)
Stack implementation: keep a stack of page numbers in doubly linked form.
When a page is referenced, move it to the top (requires up to 6 pointers to be changed).
No search is needed for replacement.

Use of a Stack to Record the Most Recent Page References

LRU Approximation Algorithms
Reference bit:
With each page associate a bit, initially = 0
When the page is referenced, the bit is set to 1
Replace a page whose bit is 0 (if one exists) — we do not know the order of use
Second chance (follows clock order):
Needs the reference bit; also called clock replacement
If the page to be replaced has reference bit = 1:
  set the reference bit to 0
  leave the page in memory
  consider the next page, subject to the same rules

Second-Chance (Clock) Page-Replacement Algorithm

Counting Algorithms
Keep a counter of the number of references that have been made to each page.
LFU algorithm: replaces the page with the smallest count.
MFU algorithm: replaces the page with the largest count, based on the argument that the page with the smallest count was probably just brought in and has yet to be used.

Allocation of Frames
Each process needs a minimum number of frames.
Example: IBM 370 — 6 pages to handle an SS MOVE instruction:
the instruction is 6 bytes and might span 2 pages
2 pages to handle the from operand
2 pages to handle the to operand
Two major allocation schemes: fixed allocation and priority allocation.

Fixed Allocation
Equal allocation: for example, if there are 100 frames and 5 processes, give each process 20 frames.
Proportional allocation: allocate according to the size of the process:
  si = size of process pi
  S = Σ si
  m = total number of frames
  ai = allocation for pi = (si / S) × m
Example: m = 64, s1 = 10, s2 = 127:
  a1 = (10 / 137) × 64 ≈ 5
  a2 = (127 / 137) × 64 ≈ 59
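The proportional shares can be computed with a one-liner (a sketch; a real allocator must also enforce the per-process minimum and make the shares sum to exactly m):

```python
def proportional_allocation(sizes, m):
    """Split m frames among processes in proportion to their sizes."""
    S = sum(sizes)
    return [round(s / S * m) for s in sizes]

print(proportional_allocation([10, 127], 64))   # [5, 59]
```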
Priority Allocation
Use a proportional allocation scheme based on priorities rather than size.
If process Pi generates a page fault:
select for replacement one of its own frames, or
select for replacement a frame from a process with a lower priority number

Global vs. Local Allocation
Global replacement: a process selects a replacement frame from the set of all frames; one process can take a frame from another.
Local replacement: each process selects only from its own set of allocated frames.

Thrashing
If a process does not have "enough" pages, the page-fault rate is very high. This leads to:
low CPU utilization
the operating system thinking it needs to increase the degree of multiprogramming
another process being added to the system
Thrashing: a process is busy swapping pages in and out.

Thrashing (Cont.)

Solutions to Thrashing
Use local allocation
Use priority allocation — not a good solution
Working-set model — a suitable solution

Working-Set Model
Δ ≡ working-set window: a fixed number of page references, e.g., 10,000
WSSi (working-set size of process Pi) = total number of pages referenced in the most recent Δ (varies in time)
if Δ is too small, it will not encompass the entire locality
if Δ is too large, it will encompass several localities
if Δ = ∞, it will encompass the entire program
D = Σ WSSi ≡ total demand for frames
If D > m (the total number of frames) ⇒ thrashing
Policy: if D > m, then suspend one process
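The working-set size at each instant can be sketched as a sliding window over the reference string (the string and window size below are made-up examples):

```python
def working_set_sizes(refs, delta):
    """WSS at each time t: number of distinct pages referenced in the
    last `delta` references (a sliding window over the string)."""
    return [len(set(refs[max(0, t - delta + 1): t + 1]))
            for t in range(len(refs))]

refs = [1, 2, 1, 2, 1, 7, 7, 7, 5, 6, 5, 6]
print(working_set_sizes(refs, 4))
# → [1, 2, 2, 2, 2, 3, 3, 2, 2, 3, 3, 2]
```

Notice how the size stays small while the program works within one locality and bumps up only while the locality shifts.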
Working-set model

Keeping Track of the Working Set
Approximate with an interval timer + a reference bit.
Example: Δ = 10,000:
The timer interrupts after every 5,000 time units
Keep 2 history bits in memory for each page
Whenever the timer interrupts, copy and then clear all reference bits
If one of the bits in memory = 1 ⇒ the page is in the working set
Why is this not completely accurate?
Improvement: 10 bits and an interrupt every 1,000 time units.

End of chapter

Question?
New terms

Paging on demand: phân trang theo yêu cầu
Lazy swapper: bộ tráo đổi lười
Page fault: lỗi trang
Handling a page fault: xử lý một lỗi trang
Page replacement: thay trang
Belady's anomaly: dị thường Belady
Optimal algorithm: giải thuật tối ưu
Least Recently Used (LRU) algorithm: giải thuật (thay trang) ít được sử dụng gần đây nhất
LRU approximation algorithms: giải thuật xấp xỉ LRU
Counting algorithms: giải thuật đếm
Fixed allocation: phân phối cố định
Priority allocation: phân phối ưu tiên
Global vs. local allocation: phân phối toàn cục và cục bộ
Working-set model: mô hình tập làm việc
Fundamentals of Operating Systems
Nguyen Tri Thanh
4/27/2011

File System Implementation
File-System Structure
File-System Implementation
Directory Implementation
Allocation Methods
Free-Space Management
Efficiency and Performance
Recovery
Objectives
Introduce what a file is
Introduce file system implementation
Introduce directory implementation
Introduce 3 allocation methods
Introduce free space management
Reference
Chapter 10, 11 of Operating System
Concepts
File Concept
Contiguous logical address space
Types:
Data
numeric
character
binary
Program
File Structure
None - sequence of words, bytes
Simple record structure
Lines
Fixed length
Variable length
Complex Structures
Formatted document
Relocatable load file
The last two can be simulated with the first method by inserting appropriate control characters.
Who decides the structure: the operating system or the program?

File Attributes
Name
only information kept in human-readable form
Identifier
unique (number) identifies file within file system
Type
needed for systems that support different types
Location
pointer to file location on storage device
File Attributes (cont'd)
Size
current file size
Protection
controls who can do reading, writing, executing
Time, date, and user identification
data for protection, security, and usage monitoring
Information about files is kept in the directory structure on the disk.

Inode on Linux

File System Structure
File system resides on secondary storage
disks, CD ROM, DVD, flash drive,…
File system organized into layers
File control block (FCB)
storage structure of information about a file
inode (index node) is an implementation of FCB
on Linux
Layered File System

A Typical File Control Block

In-Memory File System Structures
The following figure illustrates the necessary
file system structures provided by the
operating systems
open a file
read a file
In-Memory File System Structures

Virtual File Systems
Virtual File Systems (VFS)
provide an object-oriented way of implementing
file systems
allow the same system call interface (the API) to
be used for different types of file systems
the API is to the VFS interface, rather than any
specific type of file system.
Schematic View of Virtual File System

Linux VFS (supporting, e.g., NTFS and FAT)

Directory Structure
A collection of nodes containing information about all files.

(Figure: a directory whose entries point to files F1, F2, F3, F4, …, Fn)

Directory Implementation
Linear list: a list of file names with pointers to the data blocks
simple to program, but time-consuming to execute
used by FAT (http://en.wikipedia.org/wiki/File_Allocation_Table) and Linux ext3
Hash table: a linear list with a hash data structure
decreases directory search time
must handle collisions; the table has a fixed size
used by ReiserFS (http://en.wikipedia.org/wiki/ReiserFS)

Directory in Linux
File management API

File Operations
Create
Write
Read
Reposition within file
Delete
Truncate
Open(Fi): search the directory structure on disk for entry Fi, and move the content of the entry to memory
Close(Fi): move the content of entry Fi in memory to the directory structure on disk

Operations Performed on Directory
Search for a file
Create a directory
Delete a directory
List a directory
Rename a directory
Traverse the file system
File/directory protection

Goals of Protection
Ensure that each file/directory is
accessed correctly and only by those
processes that are allowed to do so
Protection
File owner/creator should be able to control:
what can be done
by whom
Types of access
Read
Write
Execute
Append
Delete
List
Access Lists and Groups
Mode of access: read, write, execute.
Three classes of users:
a) owner access: 7 ⇒ RWX = 111
b) group access: 6 ⇒ RWX = 110
c) public access: 1 ⇒ RWX = 001
Ask the manager to create a group (with a unique name), say G, and add some users to it.
For a particular file (say game) or subdirectory, define an appropriate access mode:
  chmod 761 game     (owner 7, group 6, public 1)
Attach a group to a file:
  chgrp G game

Windows XP Access-Control List Management

Storage Allocation Methods

Disk partitions
A disk can be split into partitions:
each is a consecutive range of cylinders
each is also called a logical disk
e.g., Windows calls them the C:, D:, and E: drives

Partition organization
The organization is specific to each OS. The region for storing data is divided into equal-sized blocks; a block is the read/write unit of the OS.
Example: ext3 (the data part of the partition is divided into blocks).

Allocation Methods
An allocation method refers to how disk
blocks are allocated for files
Contiguous allocation
Linked allocation
Indexed allocation
Each method needs an appropriate way to
access file
4/27/2011 32Contiguous Allocation
Each file occupies a set of contiguous blocks
on the disk
Simple
only starting location (block #) and length (number
of blocks) are required
Random access
Wasteful of space (dynamic storage-allocation problem)
Files cannot grow

Contiguous Allocation (cont'd)
Mapping from a logical address LA to a physical location (block size 512 bytes):
  Q = LA / 512 (quotient)
  R = LA mod 512 (remainder)
Block to be accessed = Q + starting address
Displacement/offset into the block = R

Contiguous Allocation of Disk Space

Extent-Based Systems
Many newer file systems (e.g., the Veritas File System) use a modified contiguous allocation scheme.
Extent-based file systems allocate disk blocks in extents:
an extent is a contiguous chunk of disk blocks
extents are allocated for file allocation
a file consists of one or more extents; the extents of a file are not necessarily contiguous

Linked Allocation
Each file is a linked list of disk blocks; the blocks may be scattered anywhere on the disk. Each block holds file data plus a pointer to the next block.

Linked Allocation (Cont.)
Simple: need only the starting address
Free-space management system: no waste of space
No random access
The file-allocation table (FAT) variation of disk-space allocation is used by MS-DOS and OS/2.

File-Allocation Table (DOS)
Separate the pointers from the data: the pointers are stored in the File Allocation Table (FAT), and the system stores two identical copies of the FAT. The end of a file's block chain is marked with –1.

Indexed Allocation
Brings all pointers together into the index block.
Logical view: an index table whose entries point to the data blocks.

Example of Indexed Allocation

Indexed Allocation (Cont.)
Needs an index block
Random access
Dynamic access without external fragmentation, but with the overhead of the index block

Indexed Allocation (cont'd)
Two-level index: two levels of index blocks — the file entry points to an outer index block, each of whose entries points to an index table, whose entries in turn point to data blocks. This is a method to increase the maximum file size.

Address locating (analogy): Go to USA → San Francisco → California St → 3333

Indexed Allocation (cont'd)
A system uses a two-level index with:
block size 4KB
pointer size 4 bytes
Which of the following is the correct maximum file size?
4096MB
40GB
4GB
2GB
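The maximum file size follows directly from the pointer fan-out (a sketch; the function name is illustrative):

```python
def max_file_size_two_level(block_size, pointer_size):
    """Maximum file size with a two-level index:
    (pointers per block)^2 data blocks, each block_size bytes."""
    ptrs = block_size // pointer_size
    return ptrs * ptrs * block_size

size = max_file_size_two_level(4 * 1024, 4)
print(size == 4 * 1024**3)    # True: 1024 * 1024 * 4KB = 4GB
```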
Indexed Allocation (cont'd)

A system uses a two-level index. Which of the following gives the correct steps to locate the data at file position n?
Identify block number → block number within the index block → offset
Identify block number within the index block → block number → offset
Identify offset → block number
None of the above

Indexed Allocation (cont'd)
A system uses a two-level index with:
block size 2KB
pointer size 4 bytes
file size 20MB
What is the correct address (outer index entry, inner index entry, offset) of the data at file position 15MB?
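The lookup can be sketched with two divisions (names are illustrative):

```python
def locate(position, block_size=2 * 1024, pointer_size=4):
    """Resolve a file offset through a two-level index:
    outer index entry -> index block entry -> offset within block."""
    ptrs_per_block = block_size // pointer_size      # 512 here
    block_number, offset = divmod(position, block_size)
    outer, inner = divmod(block_number, ptrs_per_block)
    return outer, inner, offset

print(locate(15 * 1024**2))   # (15, 0, 0)
```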
Indexed Allocation (Cont.)
Linked scheme: no limit on file size. Combine a linked list with index blocks: the index blocks are linked, and the last pointer of an index block is the address of the next one.

Combined Scheme: UNIX (4K bytes per block)

Linux allocation (10 direct pointers)

Free-Space Management
Bit vector of n blocks, e.g., Linux ext3:
bit[i] = 0 ⇒ block[i] free
bit[i] = 1 ⇒ block[i] occupied
Easy to get contiguous blocks, but the bit map requires extra space.
Example:
block size = 2^12 bytes
disk size = 2^30 bytes (1 gigabyte)
n = 2^30 / 2^12 = 2^18 bits (or 32K bytes)
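The bit-map overhead above is just one bit per block (a sketch):

```python
def bitmap_bytes(disk_bytes, block_bytes):
    """Size of a free-space bit map: one bit per disk block."""
    blocks = disk_bytes // block_bytes
    return blocks // 8                    # 8 bits per byte

print(bitmap_bytes(2**30, 2**12))         # 32768 bytes = 32 KB
```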
Free-Space Management (Cont.)

Linked list (free list):
cannot get contiguous space easily
no waste of space
Grouping: use linked blocks to store pointers to free blocks; the last pointer in a block is the address of the next one.
Counting: when several contiguous blocks are freed/allocated for a file, each entry holds:
the address of the first free block
the number of contiguous free blocks

Free-Space Management (Cont.)
Need to protect:
the pointer to the free list
the bit map
  must be kept on disk
  the copy in memory and the copy on disk may differ
  we cannot allow block[i] to have bit[i] = 1 in memory and bit[i] = 0 on disk
Solution:
  set bit[i] = 1 on disk
  allocate block[i]
  set bit[i] = 1 in memory

Linked Free Space List on Disk

Page Cache
A page cache caches pages rather than disk blocks, using virtual-memory techniques. Memory-mapped I/O uses the page cache, while routine I/O through the file system uses the buffer (disk) cache.

Shared file systems
A file system can be shared with other machines:
the shared file system may be large and powerful
file sharing is needed in many applications
available in many systems, e.g., Windows and Linux (NFS)

Network File System (NFS) Architecture
End of Chapter

Question?
Fundamentals of Operating Systems
Nguyen Tri Thanh
5/11/2011

Storage Systems
File-System Structure
File-System Implementation
Directory Implementation
Allocation Methods
Free-Space Management
Efficiency and Performance
Recovery
Objectives

Introduce the list of mass storage devices
Introduce the structure/organization of disks
Introduce disk scheduling algorithms
Introduce reliable storage
Introduce non-volatile storage
Implement disk scheduling algorithms
Reference

Chapter 12 of Operating System Concepts

Overview of Mass Storage Structure
Magnetic disks provide the bulk of secondary storage in modern computers; platters rotate at 60 to 300 revolutions per second.
Transfer rate: the rate of data flow between the drive and the computer.
Positioning time (random-access time): the time to move the disk arm to the desired cylinder (seek time) plus the time for the desired sector to rotate under the disk head (rotational latency).
Head crash: the disk head making contact with the disk surface — that's bad.

Overview of Mass Storage Structure (cont'd)
Disks can be removable.
A drive is attached to the computer via an I/O bus: EIDE, ATA, SATA, USB, Fibre Channel, SCSI.
Host controller: the controller at the computer end of the bus, which the computer uses to talk to the disk.
Disk controller: built into the drive or storage array.

Moving-head Disk Mechanism

Disk Structure
Disk drives are treated as large one-dimensional arrays of logical blocks, where a logical block is the smallest unit of transfer. The array of logical blocks is mapped onto the sectors of the disk sequentially:
Sector 0 is the first sector of the first track on the outermost cylinder
Mapping proceeds in order through that track, then through the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost

Sectors

A number of sectors in each cylinder may be left unnumbered (unused).

Disk Scheduling
The operating system is responsible for using the hardware efficiently; for disk drives, this means a fast access time and high disk bandwidth.
Access time has two major components:
Seek time: the time for the disk arm to move the heads to the cylinder containing the desired sector
Rotational latency: the additional time spent waiting for the disk to rotate the desired sector under the disk head

Disk Scheduling (Cont.)
Target: minimize seek time.
Seek time ≈ seek distance.
Disk bandwidth = (total number of bytes transferred) / (total time between the first request for service and the completion of the last transfer).

Disk Scheduling Algorithms
Several algorithms exist to schedule the servicing of disk I/O requests. We illustrate them with a request queue (cylinders 0–199):
98, 183, 37, 122, 14, 124, 65, 67
Current head position: 53

FCFS

Illustration shows a total head movement of 640 cylinders.
Selects the request with the minimum seek time from the current head position. SSTF scheduling is a form of SJF scheduling and may cause starvation of some requests.
Illustration shows a total head movement of 236 cylinders.

SCAN
The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues. Sometimes called the elevator algorithm.
Illustration shows a total head movement of 208 cylinders.

C-SCAN
Provides a more uniform wait time than SCAN. The head moves from one end of the disk to the other, servicing requests as it goes; when it reaches the other end, it immediately returns to the beginning of the disk without servicing any requests on the return trip. It treats the cylinders as a circular list that wraps around from the last cylinder to the first one.

C-SCAN (Cont.)

C-LOOK
A version of C-SCAN: the arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.

C-LOOK (Cont.)

Selecting a Disk-Scheduling Algorithm
SSTF is common and has a natural appeal
SCAN and C-SCAN perform better for systems that
place a heavy load on the disk.
Performance depends on the number and types of
requests.
Requests for disk service can be influenced by the file-allocation method.
The disk-scheduling algorithm should be written as a
separate module of the operating system, allowing it to
be replaced with a different algorithm if necessary.
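As a sanity check on the totals quoted earlier (FCFS 640 and SSTF 236 cylinders for the example queue), a short simulation sketch:

```python
def fcfs(head, queue):
    """Total head movement when requests are served in arrival order."""
    total = 0
    for cyl in queue:
        total += abs(cyl - head)
        head = cyl
    return total

def sstf(head, queue):
    """Total head movement when the closest pending request goes next."""
    pending, total = list(queue), 0
    while pending:
        nxt = min(pending, key=lambda c: abs(c - head))
        total += abs(nxt - head)
        head = nxt
        pending.remove(nxt)
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs(53, queue))   # 640
print(sstf(53, queue))   # 236
```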
Either SSTF or LOOK is a reasonable choice for the default algorithm.

Disk Management

Low-level formatting, or physical formatting: dividing a disk into sectors that the disk controller can read and write.
To use a disk to hold files, the operating system still needs to record its own data structures on the disk:
partition the disk into one or more groups of cylinders
logical formatting, or "making a file system"

Disk Management (cont'd)
The boot block initializes the system:
the bootstrap is stored in ROM
a bootstrap loader program loads the rest of the system
Methods such as sector sparing are used to handle bad blocks.

Booting from a Disk in Windows 2000

Booting from a Disk in Linux
(Layout: boot block | super block | free-space map | inode list | data blocks)

Swap-Space Management
Swap space: virtual memory uses disk space as an extension of main memory. Swap space can be carved out of the normal file system or, more commonly, placed in a separate disk partition.

Swap-Space Management (cont'd)
4.3BSD allocates swap space when a process starts; it holds the text segment (the program) and the data segment. The kernel uses swap maps to track swap-space use.
Solaris 2 allocates swap space only when a page is forced out of physical memory, not when the virtual-memory page is first created.

Data Structures for Swapping on Linux Systems

Reliable storage
(reliable means data is safe even if disks fail)

RAID Structure
RAID = Redundant Array of Inexpensive Disks: multiple disk drives provide reliability via redundancy. RAID is arranged into six different levels, and there are also combinations.

RAID (cont'd)

Several improvements in disk-use techniques involve the use of multiple disks working cooperatively. Disk striping uses a group of disks as one storage unit.
RAID schemes improve performance and improve the reliability of the storage system by storing redundant data:
mirroring (or shadowing) keeps a duplicate of each disk
block-interleaved parity uses much less redundancy

RAID Levels: RAID 0 (striping), RAID 1 (mirroring), RAID (0+1) and (1+0), RAID 5
RAID 5: parity blocks are used instead of mirroring. Example stripe: A1 (1110), A2 (0100), A3 (1001), AP (0011), where AP = A1 ⊕ A2 ⊕ A3.

RAID 6, RAID 50, RAID 60, RAID 100

Stable-Storage Implementation
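The parity block in the example stripe is just the XOR of the data blocks, which is also what makes any single lost block recoverable (a sketch):

```python
def parity(blocks):
    """RAID-5 style parity: XOR of all blocks in a stripe."""
    p = 0
    for b in blocks:
        p ^= b
    return p

a1, a2, a3 = 0b1110, 0b0100, 0b1001
ap = parity([a1, a2, a3])
print(format(ap, "04b"))                      # 0011, as in the example stripe
# If the disk holding A2 fails, A2 is recoverable from the rest:
print(format(parity([a1, a3, ap]), "04b"))    # 0100 == A2
```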
A write-ahead log scheme requires stable storage. To implement stable storage:
replicate information on more than one nonvolatile storage medium with independent failure modes
update information in a controlled manner to ensure that we can recover the stable data after any failure during data transfer or recovery

Stable-Storage Implementation (cont'd)
Write everything twice, to separate disks; be sure the 1st write does not invalidate the previous 2nd copy (RAID 1 is okay; RAID 4/5 is not!). Read blocks back to validate, then report completion.
Reading both copies:
if the 1st copy is okay, use it — i.e., the newest value
if the 2nd copy is different or bad, update it with the 1st copy
if the 1st copy is bad, update it with the 2nd copy — i.e., the old value
Stable Storage (continued)

Crash recovery: scan the disks, comparing corresponding blocks:
if one is bad, replace it with the good one
if both are good but different, replace the 2nd with the 1st copy
Result:
if the 1st block is good, it contains the latest value
if not, the 2nd block still contains the previous value
This gives an abstraction of an atomic disk write of a single block, uninterruptible by power failure, etc.

Stable Storage
(Analysis of the influence of crashes on stable writes)

Tertiary Storage Devices
Low cost is the defining characteristic of tertiary storage. Generally, tertiary storage is built using removable media: CD-ROMs, floppy disks, flash drives, WORM disks, and tapes.

End of chapter

Question?