Posted January 18, 2024 (author: amaze ui file upload)

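Chinese: 3,150 characters

Undergraduate Graduation Project: Translation of Foreign-Language Literature

Student name: XXX

College: College of Information Engineering

Department: Department of Computer Science

Major: Software Engineering

Class: Software 06

Supervisor: XXX, Associate Professor

June 2010

XXX University of Technology, Undergraduate Graduation Project: Translation of Foreign-Language Literature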

Process Management

The process is one of the fundamental abstractions in Unix operating systems. A process

is a program (object code stored on some media) in execution. Processes are, however, more

than just the executing program code (often called the text section in Unix). They also include

a set of resources such as open files and pending signals, internal kernel data, processor state,

an address space, one or more threads of execution, and a data section containing global

variables. Processes, in effect, are the living result of running program code.

Threads of execution, often shortened to threads, are the objects of activity within the

process. Each thread includes a unique program counter, process stack, and set of processor

registers. The kernel schedules individual threads, not processes. In traditional Unix systems,

each process consists of one thread. In modern systems, however, multithreaded

programs (those that consist of more than one thread) are common. Linux has a unique

implementation of threads: It does not differentiate between threads and processes. To Linux,

a thread is just a special kind of process.

On modern operating systems, processes provide two virtualizations: a virtualized

processor and virtual memory. The virtual processor gives the process the illusion that it alone

monopolizes the system, despite possibly sharing the processor among dozens of other

processes. discusses this virtualization. Virtual memory lets the process allocate and manage

memory as if it alone owned all the memory in the system. Interestingly, note that threads

share the virtual memory abstraction while each receives its own virtualized processor.
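
That threads share one address space can be observed from user space. The following is a minimal sketch, assuming a POSIX system with pthreads (older toolchains may need the -pthread flag); the names shared_counter, bump_counter, and counter_after_thread are illustrative and do not come from the text.

```c
#include <pthread.h>
#include <stddef.h>

/* Illustrative global variable; it lives in the process's data section. */
static int shared_counter;

/* Thread body: runs on its own stack but in the creator's address space. */
static void *bump_counter(void *unused)
{
    (void)unused;
    shared_counter++;
    return NULL;
}

/*
 * Starts one thread, waits for it, and returns the counter as seen by
 * the creating thread, or -1 if the thread could not be run.
 */
int counter_after_thread(void)
{
    pthread_t tid;

    shared_counter = 0;
    if (pthread_create(&tid, NULL, bump_counter, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return shared_counter;
}
```

The creating thread sees the increment made by the other thread because both use the same virtual memory, even though each was scheduled separately.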

A program itself is not a process; a process is an active program and related resources.

Indeed, two or more processes can exist that are executing the same program. In fact, two or

more processes can exist that share various resources, such as open files or an address space.

A process begins its life when, not surprisingly, it is created. In Linux, this occurs by

means of the fork() system call, which creates a new process by duplicating an existing one.

The process that calls fork() is the parent, whereas the new process is the child. The parent

resumes execution and the child starts execution at the same place, where the call returns. The

fork() system call returns from the kernel twice: once in the parent process and again in the

newborn child.
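
The "returns twice" behavior is easiest to see in a small user-space program. The sketch below assumes a POSIX system; the helper name fork_and_collect and its exit-code parameter are hypothetical and chosen only for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks once. The child exits immediately with child_code; the parent
 * waits for it and returns the exit code it observed, or -1 on error.
 */
int fork_and_collect(int child_code)
{
    int status;
    pid_t pid = fork();         /* one call, two returns */

    if (pid < 0)
        return -1;              /* fork failed: no child was created */
    if (pid == 0)
        _exit(child_code);      /* child: fork() returned 0 */

    /* parent: fork() returned the PID of the child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Both branches after fork() run, each in a different process; the return value is the only thing that tells them apart.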


Often, immediately after a fork it is desirable to execute a new, different program. The

exec*() family of function calls is used to create a new address space and load a new program

into it. In modern Linux kernels, fork() is actually implemented via the clone() system call,

which is discussed in a following section.
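
The common fork-then-exec pattern can be sketched as follows, assuming a POSIX system. The helper name run_command is hypothetical, and the exit code 127 for a failed exec is borrowed from shell convention rather than from the text.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs the named program (looked up on PATH, with no arguments) in a
 * child process and returns its exit code, or -1 on error.
 */
int run_command(const char *name)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* child: replace this process image with the new program */
        execlp(name, name, (char *)NULL);
        _exit(127);             /* reached only if the exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_command("true") should yield 0 on a system where the true utility is on PATH. The child keeps the PID it was given by fork(); only the program it runs changes.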

Finally, a program exits via the exit() system call. This function terminates the process

and frees all its resources. A parent process can inquire about the status of a terminated child

via the wait4() system call, which enables a process to wait for the termination of a specific

process. When a process exits, it is placed into a special zombie state that is used to represent

terminated processes until the parent calls wait() or waitpid().
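
On Linux the zombie state can be observed before the parent reaps the child. This is a sketch under two assumptions: /proc is mounted, and the third field of /proc/<pid>/stat is the state letter. The helper names are hypothetical. waitid() with WNOWAIT blocks until the child has exited but leaves it unreaped.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the state letter from /proc/<pid>/stat, or '?' on failure. */
static char proc_state_letter(pid_t pid)
{
    char path[64], line[512], *p;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '?';
    if (fgets(line, sizeof(line), f) == NULL) {
        fclose(f);
        return '?';
    }
    fclose(f);
    p = strrchr(line, ')');     /* the state follows "pid (comm) " */
    return (p != NULL && p[1] == ' ' && p[2] != '\0') ? p[2] : '?';
}

/* Forks a child that exits at once and reports its state before reaping. */
char observe_exited_child_state(void)
{
    siginfo_t info;
    char state;
    pid_t pid = fork();

    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(0);

    /* Wait until the child has exited, but do not reap it yet. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
        return '?';
    state = proc_state_letter(pid);
    waitpid(pid, NULL, 0);      /* reap: the zombie entry goes away */
    return state;
}
```

Between the child's exit and the final waitpid(), the process descriptor still exists, which is why the entry under /proc can still be read.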

Another name for a process is a task. The Linux kernel internally refers to processes as

tasks, although when I say task I am generally referring to a process from the kernel's point of

view.

1 Process Descriptor and the Task Structure

The kernel stores the list of processes in a circular doubly linked list called the task list.

Each element in the task list is a process descriptor of the type struct task_struct, which is

defined in . The process descriptor contains all the information about a

specific process.
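
The shape of a circular doubly linked list can be sketched in a few lines of user-space C. This is a toy model: struct toy_task and the toy_list_* helpers are hypothetical stand-ins, not the kernel's task_struct or its list API.

```c
#include <stddef.h>

/* Toy stand-in for a process descriptor; only the links and a PID. */
struct toy_task {
    int pid;
    struct toy_task *next;
    struct toy_task *prev;
};

/* Makes head the sole element of a ring: it points at itself. */
void toy_list_init(struct toy_task *head)
{
    head->next = head;
    head->prev = head;
}

/* Links task in just before head, that is, at the tail of the ring. */
void toy_list_add_tail(struct toy_task *head, struct toy_task *task)
{
    task->prev = head->prev;
    task->next = head;
    head->prev->next = task;
    head->prev = task;
}

/* Walks the ring once, starting at head, and counts the elements. */
int toy_list_count(const struct toy_task *head)
{
    const struct toy_task *t = head;
    int n = 0;

    do {
        n++;
        t = t->next;
    } while (t != head);
    return n;
}

/* Builds a three-task ring; returns 1 if the links are consistent. */
int toy_list_demo(void)
{
    struct toy_task init_task = { 0, NULL, NULL };
    struct toy_task a = { 1, NULL, NULL };
    struct toy_task b = { 2, NULL, NULL };

    toy_list_init(&init_task);
    toy_list_add_tail(&init_task, &a);
    toy_list_add_tail(&init_task, &b);

    return toy_list_count(&init_task) == 3 &&
           init_task.next == &a && a.next == &b &&
           b.next == &init_task && init_task.prev == &b;
}
```

Because the list is circular, a walk that starts at any element visits every task and ends where it began.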

The task_struct is a relatively large data structure, at around 1.7 kilobytes on a 32-bit

machine. This size, however, is quite small considering that the structure contains all the

information that the kernel has and needs about a process. The process descriptor contains the

data that describes the executing program: open files, the process's address space, pending

signals, the process's state, and much more.

2 Allocating the Process Descriptor

The task_struct structure is allocated via the slab allocator to provide object reuse and

cache coloring. Prior to the 2.6 kernel series, struct task_struct was stored at the end of the

kernel stack of each process. This allowed architectures with few registers, such as x86, to

calculate the location of the process descriptor via the stack pointer without using an extra

register to store the location. With the process descriptor now dynamically created via the slab


allocator, a new structure, struct thread_info, was created that again lives at the bottom of the

stack (for stacks that grow down) or at the top of the stack (for stacks that grow up). The new structure also makes it rather easy to calculate

offsets of its values for use in assembly code.

The thread_info structure is defined on x86 in as

struct thread_info {
        struct task_struct    *task;
        struct exec_domain    *exec_domain;
        unsigned long         flags;
        unsigned long         status;
        __u32                 cpu;
        __s32                 preempt_count;
        mm_segment_t          addr_limit;
        struct restart_block  restart_block;
        unsigned long         previous_esp;
        __u8                  supervisor_stack[0];
};

Each task's thread_info structure is allocated at the end of its stack. The task element of the

structure is a pointer to the task's actual task_struct.

3 Storing the Process Descriptor

The system identifies processes by a unique process identification value or PID. The PID

is a numerical value that is represented by the opaque type pid_t, which is typically an int.

Because of backward compatibility with earlier Unix and Linux versions, however, the

default maximum value is only 32,768 although the value can optionally be increased to the

full range afforded the type. The kernel stores this value as pid inside each process descriptor.

This maximum value is important because it is essentially the maximum number of

processes that may exist concurrently on the system. Although 32,768 might be sufficient for

a desktop system, large servers may require many more processes. The lower the value, the

sooner the values will wrap around, destroying the useful notion that higher values indicate

later run processes than lower values. If the system is willing to break compatibility with old


applications, the administrator may increase the maximum value via

/proc/sys/kernel/pid_max.
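
The current limit can be read back from the same file. A minimal sketch, assuming a Linux system with /proc mounted; the helper name read_pid_max is hypothetical.

```c
#include <stdio.h>

/* Returns the value in /proc/sys/kernel/pid_max, or -1 if unavailable. */
long read_pid_max(void)
{
    long value;
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;             /* file present but not a number */
    fclose(f);
    return value;
}
```

Raising the limit means writing a larger number to the same file, which normally requires administrator privileges.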

Inside the kernel, tasks are typically referenced directly by a pointer to their task_struct

structure. In fact, most kernel code that deals with processes works directly with struct

task_struct. Consequently, it is very useful to be able to quickly look up the process descriptor

of the currently executing task, which is done via the current macro. This macro must be

separately implemented by each architecture. Some architectures save a pointer to the

task_struct structure of the currently running process in a register, allowing for efficient

access. Other architectures, such as x86 (which has few registers to waste), make use of the

fact that struct thread_info is stored on the kernel stack to calculate the location of thread_info

and subsequently the task_struct.

On x86, current is calculated by masking out the 13 least significant bits of the stack

pointer to obtain the thread_info structure. This is done by the current_thread_info() function.

The assembly is shown here:

movl $-8192, %eax

andl %esp, %eax

This assumes that the stack size is 8KB. When 4KB stacks are enabled, 4096 is used in lieu of

8192.
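
The masking trick can be imitated in user space. The sketch below assumes an 8KB "stack" obtained from C11 aligned_alloc(); struct demo_task, struct demo_thread_info, and the demo_* functions are toy stand-ins, not kernel definitions.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_STACK_SIZE 8192u   /* assumed 8KB stack, as in the text */

struct demo_task {
    int pid;
};

struct demo_thread_info {
    struct demo_task *task;
};

/* Clears the 13 low bits of an address inside the stack to find its base. */
struct demo_thread_info *demo_thread_info_from_sp(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~((uintptr_t)DEMO_STACK_SIZE - 1);

    return (struct demo_thread_info *)base;
}

/*
 * Places a demo_thread_info at the base of an aligned 8KB block, then
 * recovers the task from an arbitrary address inside that block.
 * Returns the recovered PID, or -1 if the allocation failed.
 */
int demo_current(void)
{
    struct demo_task task = { 42 };
    unsigned char *stack = aligned_alloc(DEMO_STACK_SIZE, DEMO_STACK_SIZE);
    struct demo_thread_info *ti;
    int pid;

    if (stack == NULL)
        return -1;
    ti = (struct demo_thread_info *)stack;
    ti->task = &task;

    /* stack + 5000 plays the role of the stack pointer */
    pid = demo_thread_info_from_sp(stack + 5000)->task->pid;
    free(stack);
    return pid;
}
```

The scheme works only because the stack is aligned to its own size, so every address inside it shares the same high-order bits.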

Finally, current dereferences the task member of thread_info to return the

task_struct: current_thread_info()->task. Contrast this approach with that taken by PowerPC

(IBM's modern RISC-based microprocessor), which stores the current task_struct in a register.

Thus, current on PPC merely returns the value stored in the register r2. PPC can take this

approach because, unlike x86, it has plenty of registers. Because accessing the process

descriptor is a common and important job, the PPC kernel developers deem using a register

worthy for the task.

4 Process State

The state field of the process descriptor describes the current condition of the process.

Each process on the system is in exactly one of five different states. This value is represented

by one of five flags:


(1) TASK_RUNNING. The process is runnable; it is either currently running or on a runqueue

waiting to run. This is the only possible state for a process executing in user-space; it can also

apply to a process in kernel-space that is actively running.

(2) TASK_INTERRUPTIBLE. The process is sleeping (that is, it is blocked), waiting for

some condition to exist. When this condition exists, the kernel sets the process's state to

TASK_RUNNING. The process also awakes prematurely and becomes runnable if it

receives a signal.

(3) TASK_UNINTERRUPTIBLE. This state is identical to TASK_INTERRUPTIBLE except

that it does not wake up and become runnable if it receives a signal. This is used in situations

where the process must wait without interruption or when the event is expected to occur quite

quickly. Because the task does not respond to signals in this state, this state is less often used

than TASK_INTERRUPTIBLE.

(4) TASK_ZOMBIE. The task has terminated, but its parent has not yet issued a wait4()

system call. The task's process descriptor must remain in case the parent wants to access it. If

the parent calls wait4(), the process descriptor is deallocated.

(5) TASK_STOPPED. Process execution has stopped; the task is not running nor is it eligible

to run. This occurs if the task receives the SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU

signal or if it receives any signal while it is being debugged.
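
The difference between the two sleeping states can be written down as a pair of transition functions. This is a toy model of items (2) and (3) only; the enum values and function names are hypothetical and are not the kernel's constants.

```c
/* Toy state names; the numeric values are arbitrary. */
enum sketch_state {
    SKETCH_RUNNING,
    SKETCH_INTERRUPTIBLE,
    SKETCH_UNINTERRUPTIBLE,
    SKETCH_ZOMBIE,
    SKETCH_STOPPED
};

/* A signal arrives: only an interruptible sleeper becomes runnable. */
enum sketch_state sketch_on_signal(enum sketch_state s)
{
    return s == SKETCH_INTERRUPTIBLE ? SKETCH_RUNNING : s;
}

/* The awaited condition occurs: both kinds of sleeper become runnable. */
enum sketch_state sketch_on_condition(enum sketch_state s)
{
    if (s == SKETCH_INTERRUPTIBLE || s == SKETCH_UNINTERRUPTIBLE)
        return SKETCH_RUNNING;
    return s;
}
```

The model deliberately leaves out the stop signals and debugging behavior described in item (5).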

5 Manipulating the Current Process State

Kernel code often needs to change a process's state. The preferred mechanism is using

set_task_state(task, state). This function sets the given task to the given state. If applicable, it

also provides a memory barrier to force ordering on other processors (this is only needed on

SMP systems). Otherwise, it is equivalent to task->state = state. The function

set_current_state(state) is synonymous with set_task_state(current, state).
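
A user-space analogue of "store the state, then issue a memory barrier" can be written with C11 atomics. This is only an illustration of the idea under that assumption; struct barrier_task and the sketch_* functions are hypothetical, and the kernel's own implementation is not shown in the text.

```c
#include <stdatomic.h>

/* Toy task with just a state field. */
struct barrier_task {
    _Atomic long state;
};

/* Stores the new state, then issues a full fence to order it. */
void sketch_set_task_state(struct barrier_task *task, long state)
{
    atomic_store_explicit(&task->state, state, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Reads the state back. */
long sketch_get_task_state(struct barrier_task *task)
{
    return atomic_load_explicit(&task->state, memory_order_relaxed);
}

/* Sets a state and returns 1 if the same value is read back. */
int sketch_state_demo(void)
{
    struct barrier_task t;

    atomic_init(&t.state, 0);
    sketch_set_task_state(&t, 1);
    return sketch_get_task_state(&t) == 1;
}
```

On a single processor the fence adds nothing observable, which mirrors the remark that the barrier matters only on SMP systems.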

6 Process Context

One of the most important parts of a process is the executing program code. This code is

read in from an executable file and executed within the program's address space. Normal

program execution occurs in user-space. When a program executes a system call or triggers an


exception, it enters kernel-space. At this point, the kernel is said to be "executing on behalf of

the process" and is in process context. When in process context, the current macro is valid.

Upon exiting the kernel, the process resumes execution in user-space, unless a higher-priority

process has become runnable in the interim, in which case the scheduler is invoked to select

the higher priority process.

System calls and exception handlers are well-defined interfaces into the kernel. A

process can begin executing in kernel-space only through one of these interfaces; all access to the kernel is through these interfaces.
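
From user-space, the system-call interface can be exercised directly. A minimal sketch, assuming Linux with glibc, whose syscall() wrapper and SYS_getpid constant are used here; the helper name raw_getpid is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() declaration */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Enters the kernel through the generic system-call wrapper. While the
 * call is serviced, the kernel runs in process context on behalf of the
 * caller and reports the caller's own PID.
 */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The value returned matches what the ordinary getpid() library function reports, since both take the same route into the kernel.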


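Process Management

The process is one of the most fundamental abstractions in the Unix operating system. A process is a program in execution (object code stored on some storage medium). A process, however, is more than just a piece of executable program code (which Unix calls the text section). It usually also includes other resources, such as open files, pending signals, internal kernel data, processor state, an address space, one or more threads of execution, and of course a data section that holds global variables. In effect, a process is the living result of running program code.

Threads of execution, or threads for short, are the objects of activity within a process. Each thread has its own program counter, process stack, and set of processor registers. The kernel schedules threads, not processes. In traditional Unix systems a process contains only one thread, but in today's systems multithreaded programs containing several threads are commonplace. Linux implements threads in an unusual way: it makes no particular distinction between threads and processes. To Linux, a thread is merely a special kind of process.

In modern operating systems, processes provide two virtualizations: a virtualized processor and virtual memory. Although many processes may in fact be sharing one processor, the virtual processor gives each process the illusion that it has the processor to itself. Virtual memory, in turn, lets a process allocate and use memory as if it owned all the memory in the system. Interestingly, threads (here meaning the threads contained in the same process) share virtual memory, while each has its own virtual processor.

A program itself is not a process: a process is a program in execution together with the resources it holds. Indeed, two or more different processes may well be executing the same program, and two or more coexisting processes may share many resources, such as open files or an address space. Naturally, a process comes to life at the moment it is created. In Linux this is usually the result of the fork() system call, which creates a brand-new process by duplicating an existing one. The process that calls fork() is called the parent, and the newly created process is called the child. When the call returns, the parent resumes execution and the child begins execution at that same point. The fork() system call returns from the kernel twice: once to the parent and once to the newborn child.

Usually a new process is created in order to run a new, different program immediately; calling one of the exec() family of functions right afterward creates a new address space and loads the new program into it. In modern Linux kernels, fork() is actually implemented by the clone() system call, which is discussed later.

Finally, a program exits via the exit() system call. This function terminates the process and releases the resources it holds. A parent process can query whether a child has terminated via the wait() system call, which in effect gives a process the ability to wait for a specific process to finish. After a process exits, it is placed in the zombie state until its parent calls wait() or waitpid().

Another name for a process is a task. The Linux kernel generally refers to processes as tasks. Here, task means a process as seen from the kernel's point of view.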

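1 Process Descriptor and the Task Structure

The kernel keeps processes in a circular doubly linked list called the task list. Each item in the list is a structure of type task_struct, called the process descriptor, which is defined in a kernel header file. The process descriptor contains all the information about a specific process.

The task_struct is relatively large, about 1.7 KB on a 32-bit machine. Considering that it contains all the information the kernel needs to manage a process, however, that size is quite small. The data in the process descriptor fully describes an executing program: its open files, the process's address space, pending signals, the process's state, and much more.

2 Allocating the Process Descriptor

Linux allocates the task_struct structure through the slab allocator, which provides object reuse and cache coloring. In kernels before 2.6, each process's task_struct was stored at the end of its kernel stack. This let architectures with few registers, such as x86, compute its location from the stack pointer alone, without dedicating an extra register to it. Because the task_struct is now created dynamically by the slab allocator, a new structure, struct thread_info, only needs to be created at the bottom or the top of the stack. The new structure also makes it quite easy to calculate offsets of its fields in assembly code.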

On x86, the thread_info structure is defined as:

struct thread_info {
        struct task_struct    *task;
        struct exec_domain    *exec_domain;
        unsigned long         flags;
        unsigned long         status;
        __u32                 cpu;
        __s32                 preempt_count;
        mm_segment_t          addr_limit;
        struct restart_block  restart_block;
        unsigned long         previous_esp;
        __u8                  supervisor_stack[0];
};

Each task's thread_info structure is allocated at the end of its kernel stack. The task field of the structure holds a pointer to the task's actual task_struct.

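3 Storing the Process Descriptor

The kernel identifies each process by a unique process identification value, or PID. The PID is a number represented by the opaque type pid_t, which is in practice an int. For compatibility with older versions of Unix and Linux, the default maximum PID is 32768, although the value can be raised to the full range the type allows. The kernel stores each process's PID in that process's descriptor.

This value matters because it is in effect the maximum number of processes that can exist on the system at the same time. Although 32768 is enough for a typical desktop system, large servers may need many more processes. The smaller the value, the sooner PIDs wrap around, which breaks the useful rule that a process with a larger PID started later than one with a smaller PID. If really necessary, and if compatibility with old systems can be set aside, the system administrator can raise the limit by modifying /proc/sys/kernel/pid_max.

Inside the kernel, a task is usually accessed through a pointer to its task_struct. In fact, most kernel code that deals with processes works directly with task_struct. It is therefore especially important to be able to find the process descriptor of the currently running process quickly, which is what the current macro does. The implementation of this macro differs from one hardware architecture to another and must be written specifically for each. Some architectures can set aside a dedicated register to hold a pointer to the current process's task_struct, which speeds up access. Others, such as x86, can only place the thread_info structure at the end of the kernel stack and find the task_struct indirectly by computing an offset.

On x86, current masks out the 13 least significant bits of the stack pointer to compute the location of thread_info. This is done by the current_thread_info() function. The assembly code is as follows:

movl $-8192, %eax

andl %esp, %eax

This assumes that the stack size is 8KB. When 4KB stacks are enabled, 4096 is used instead of 8192.

Finally, current obtains the task_struct by dereferencing the task field: current_thread_info()->task

Compare this with the implementation on PowerPC (IBM's modern RISC-based microprocessor), where the address of the current task_struct is kept in a register. In other words, on PPC the current macro only has to return the value in register r2. Unlike x86, PPC has plenty of registers, so its implementation has that freedom. And because accessing the process descriptor is an important and frequent operation, the PPC kernel developers considered it well worth dedicating a register to it.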

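4 Process State

The state field of the process descriptor describes the current state of the process. Every process on the system is in exactly one of five process states, and the value of the field is one of the following five state flags:

(1) TASK_RUNNING (running). The process is runnable; it is either currently executing or waiting on a run queue to execute. This is the only possible state for a process executing in user space; it can also apply to a process that is executing in kernel space.

(2) TASK_INTERRUPTIBLE (interruptible). The process is sleeping (that is, blocked), waiting for some condition to be met. Once the condition is met, the kernel sets the process's state to running. A process in this state is also woken early and made runnable if it receives a signal.

(3) TASK_UNINTERRUPTIBLE (uninterruptible). This state is identical to the interruptible state except that the process is not woken and made runnable when it receives a signal. It is typically used when the process must wait without interruption or when the awaited event is expected to occur very soon. Because a task in this state does not respond to signals, it is used less often than the interruptible state.

(4) TASK_ZOMBIE (zombie). The process has terminated, but its parent has not yet called the wait() system call. The child's process descriptor is kept so that the parent can still obtain information about it. Once the parent calls wait(), the process descriptor is released.

(5) TASK_STOPPED (stopped). The process has stopped executing; it is not running and is not eligible to run. This usually happens when it receives a signal such as SIGSTOP, SIGTTIN, or SIGTTOU. In addition, receiving any signal while being debugged puts the process into this state.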

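5 Manipulating the Current Process State

The kernel often needs to change a process's state. The preferred way is the set_task_state(task, state) function, which sets the specified task to the given state. Where necessary, it also sets a memory barrier to force ordering on other processors (this is generally needed only on SMP systems); otherwise it is equivalent to task->state = state. The function set_current_state(state) has the same meaning as set_task_state(current, state).

6 Process Context

Executable program code is an important part of a process. This code is loaded from an executable file into the process's address space and executed there. Programs normally execute in user space. When a program makes a system call or triggers an exception, it enters kernel space. At that point the kernel is said to be "executing on behalf of the process" and is in process context. In this context the current macro is valid. When the kernel exits, the program resumes execution in user space, unless a higher-priority process has become runnable in the meantime and the scheduler has acted accordingly.

System calls and exception handlers are well-defined interfaces into the kernel. A process can enter the kernel only through these interfaces; all access to the kernel must go through them.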
