File types
-
regular: ordinary Unix files
-
directory: similar to ordinary files but can only be accessed
and modified through restrictive system calls
-
device files: associate device names with their drivers.
There are two types of devices:
-
block special: storage devices such as disks, which support caching
(i.e. buffering in the kernel) and block accesses.
-
character special: also called raw devices. (A disk can be
accessible as both a character and a block device.)
-
fifo: pipes
-
symbolic links
Of course, new types can be added as Unix evolves.
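As a minimal sketch of how a program can tell these types apart, the function below classifies a path using lstat() and the S_IS* macros from <sys/stat.h>. The helper name file_type_code and its one-letter return codes (borrowed from the first column of ls -l) are assumptions made for illustration; they are not part of these notes.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: classify path with the letters ls -l uses.
   Returns '?' if lstat() fails or the type is none of those listed. */
char file_type_code(const char *path)
{
    struct stat sb;

    /* lstat() does not follow symbolic links, so a link is reported
       as a link rather than as the file it points to. */
    if (lstat(path, &sb) == -1)
        return '?';
    if (S_ISREG(sb.st_mode))  return '-';   /* regular */
    if (S_ISDIR(sb.st_mode))  return 'd';   /* directory */
    if (S_ISBLK(sb.st_mode))  return 'b';   /* block special */
    if (S_ISCHR(sb.st_mode))  return 'c';   /* character special */
    if (S_ISFIFO(sb.st_mode)) return 'p';   /* fifo */
    if (S_ISLNK(sb.st_mode))  return 'l';   /* symbolic link */
    return '?';
}
```

For example, file_type_code("/") yields 'd', since the root is always a directory.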
Disk data structures
A disk is divided into partitions.
Partitions are joined together logically only after the
kernel boots, via mount. However, this is system administration,
and hence not covered by POSIX.
Partitions
The disk is divided into partitions, which are logical disks.
Each partition contains a logical file system.
Not only is each partition logically complete, but different
types of filesystems can be stored on different partitions. For example,
a Linux file system can occupy one partition and a Windows FAT16 file
system another.
It is necessary to perform a file system specific format
on the disk to lay down the data structures on which the file system will
be built. We describe this in the following sections.
A partition is divided into blocks, typically of 4-8 KB.
The blocks are of the following forms:
-
boot block: used to start the OS
-
super block: describes the partition
-
inode blocks: hold the inodes, each of which describes a file
-
file blocks: the data contained in the file
-
indirect blocks: used to construct large files
-
double indirect blocks: used to construct very large files
Boot block
Not all partitions are boot partitions. However, there must
be at least one bootable partition, which enables the kernel to be loaded.
Only after the kernel is loaded can it interpret the rest of the partitions
and access the file system.
Superblock
Each partition is described by a superblock. The super block
contains:
-
The size of the partition
-
The number of inodes
-
The number of file blocks
-
The size of file blocks
-
the set of free blocks
-
the set of free inodes
-
Shutdown status
inodes
Each file on disk is described by an inode, which
contains the following information about a file:
-
inode number
-
file type
-
hard link count
-
UID
-
GID
-
size in bytes
-
permissions: read,write,execute, set-uid, set-gid
-
last accessed time
-
last modified time
-
last change time: last time the file access permissions,
UID, GID, or hard link count changed.
-
indirect block
-
double indirect block
Each inode is kept at a fixed address, and the root inode
is at index 2 within the partition.
Directories
A directory could almost be an ordinary file, but because
of its importance structurally to the filesystem, there are special APIs
to manipulate it.
A directory consists of a number of <name, inode number>
pairs.
File operations
-
open: create a new entry in the File Descriptor table
-
creat: like open, but a new file is created
-
dup: copy a file descriptor into the lowest-numbered free file
descriptor
-
pipe: create a pipe
-
close: remove a File Descriptor entry
-
mknod: creates a special file, regular file, or named pipe
-
link: Create a directory entry for an existing file
-
unlink: remove a directory entry
-
chown: Change the owner and group of a file
-
chmod: Change access modes of the file
-
stat: Info about files
-
read: input
-
write: output
-
lseek: change the file pointer
-
chdir: Change the current directory
File Descriptor Table
There is a File Descriptor table per process in the u area.
A file descriptor entry contains:
-
A pointer to a file table entry
Three File Descriptors are normally already open when a process starts:
-
stdin
-
stdout
-
stderr
The File Descriptor table contains OPEN_MAX entries, which
must be at least as large as _POSIX_OPEN_MAX.
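A small sketch of how a process might query the size of its own File Descriptor table at run time, assuming the standard sysconf() interface. The helper name fd_table_size is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <limits.h>   /* _POSIX_OPEN_MAX */
#include <unistd.h>   /* sysconf, STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO */

/* Hypothetical helper: run-time value of OPEN_MAX for this process,
   or -1 if the system reports no definite limit (or an error). */
long fd_table_size(void)
{
    return sysconf(_SC_OPEN_MAX);
}
```

The three descriptors listed above are available under the names STDIN_FILENO (0), STDOUT_FILENO (1) and STDERR_FILENO (2) from <unistd.h>.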
File Table
There is one File Table in the kernel. File Table entries
are pointed to by File Descriptors, and in turn point to file IDs.
Each entry contains:
-
count of the number of File Descriptors pointing to this
entry
-
access mode: read, write, or both
-
file ID: (called inode in Unix parlance)
-
current offset into the file
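Because the current offset lives in the File Table entry rather than in the File Descriptor, two descriptors that point to the same entry share one offset. The sketch below tries to make that visible with dup; the function name offset_seen_by_duplicate and the scratch file it creates and removes are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: write 5 bytes through one descriptor, then ask a
   duplicate of it for the current offset. Returns that offset, or -1. */
off_t offset_seen_by_duplicate(const char *path)
{
    off_t pos = -1;
    int fd2;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd == -1)
        return -1;
    fd2 = dup(fd);                /* second descriptor, same File Table entry */
    if (fd2 != -1) {
        if (write(fd, "hello", 5) == 5)
            pos = lseek(fd2, 0, SEEK_CUR);  /* offset as seen through fd2 */
        close(fd2);
    }
    close(fd);
    unlink(path);                 /* remove the scratch file */
    return pos;
}
```

The offset reported through the duplicate is 5, even though nothing was written through it. Two separate open calls on the same file would instead create two File Table entries with independent offsets.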
Opening, Closing and Manipulating the File Descriptor table
Open
Finds the lowest numbered free file descriptor entry, and
creates a new entry which points to a new file table entry.
#include <sys/types.h>
#include <fcntl.h>
int open(const char *pathName, int accessMode, mode_t permission);
which returns -1 on failure, the file descriptor table index
on success.
The parameters are:
-
pathName is either an:
-
absolute path: if it begins with a /
-
relative to current working directory: otherwise
-
accessMode contains one of the following:
-
O_RDONLY: open the file with read only access
-
O_WRONLY: open the file with write only access
-
O_RDWR: open the file with both read and write access.
In addition, the following options may be ORed (bitwise |) with the above:
-
O_APPEND: Append data to the end of the file. (regular file
only)
-
O_CREAT: Create the file if it does not exist. (regular file
only)
-
O_EXCL: Used only with O_CREAT, to specify that the open fails
if the file already exists. (regular file only)
-
O_TRUNC: If the file exists, delete its contents setting
file size to zero. (regular file only)
-
O_NONBLOCK: Any subsequent read or write on the file is non-blocking.
(FIFO and device files only)
-
O_NOCTTY: Specifies that the named terminal device file is
not to be used as the calling process's controlling terminal. (terminal device
files only)
-
permission: used only if the file is created to set the owner/group/other
file permissions, otherwise ignored. Defined in <sys/stat.h>. The actual
file permissions set are permission & ~umask, i.e. every bit set in the
umask is cleared from permission.
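The interaction between the permission argument and the umask can be checked with a short sketch. The helper mode_after_create is hypothetical: it temporarily installs a umask, creates the file with O_CREAT|O_EXCL, reads the resulting permission bits back with fstat, and removes the file again.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create path with the given permission while mask
   is the umask, and return the permission bits the file actually got.
   Returns -1 if the file cannot be created (for example, it exists). */
long mode_after_create(const char *path, mode_t permission, mode_t mask)
{
    struct stat sb;
    long result = -1;
    mode_t old = umask(mask);     /* install mask, remember the old one */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, permission);

    umask(old);                   /* restore the caller's umask */
    if (fd == -1)
        return -1;                /* e.g. EEXIST because of O_EXCL */
    if (fstat(fd, &sb) == 0)
        result = (long)(sb.st_mode & 0777);
    close(fd);
    unlink(path);                 /* remove the scratch file */
    return result;
}
```

With permission 0666 and a umask of 022 the file ends up with mode 0644, which is 0666 with the bits of 022 cleared.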
Creat
Create a new file and open it using the first unused file descriptor.
#include <sys/types.h>
#include <fcntl.h>
int creat(const char *pathName, mode_t permission);
This is equivalent to:
open(pathName, O_WRONLY|O_CREAT|O_TRUNC, permission);
Dup
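#include <unistd.h>
int dup(int fd);
Finds the first unused file descriptor and copies the file
descriptor at fd to it. Returns the new file descriptor on success,
-1 on failure.
Example of replacing stdin with "/tmp/x": fd = open("/tmp/x", O_RDONLY);
close(0); dup(fd); close(fd);
The file must be opened before descriptor 0 is closed; otherwise open
itself would return 0 and the final close(fd) would close the new stdin.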
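The redirection can be wrapped in a function with error checks. This is a sketch only; the name redirect_stdin is an assumption, and it relies on no other thread opening files between the close and the dup.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make descriptor 0 refer to the file at path.
   Returns 0 on success, -1 on failure. */
int redirect_stdin(const char *path)
{
    int fd = open(path, O_RDONLY);   /* open first, while 0 is still in use */

    if (fd == -1)
        return -1;
    if (fd == STDIN_FILENO)
        return 0;                    /* stdin was closed: open landed on 0 */
    close(STDIN_FILENO);             /* free descriptor 0 ... */
    if (dup(fd) != STDIN_FILENO) {   /* ... so dup picks it as the lowest free */
        close(fd);
        return -1;
    }
    close(fd);                       /* the extra descriptor is not needed */
    return 0;
}
```

POSIX also provides dup2(fd, 0), which closes descriptor 0 and duplicates onto it in a single call, and is usually preferred for that reason.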
Close
Frees the file descriptor in the process.
#include <unistd.h>
int close(int fdesc);
returns -1 on failure, 0 on success.
Controlling open files
Read
Read a specified number of bytes into a buffer, given a file
descriptor.
#include <sys/types.h>
#include <unistd.h>
ssize_t read(int fdesc, void *buff, size_t size);
returns number of bytes read on success, -1 on failure. The
number of bytes read can be less than that requested, for example at
end-of-file; a return value of 0 means end-of-file. Arguments:
-
fdesc: an open file descriptor
-
buff: buffer of at least size bytes
-
size: number of bytes to be read.
Can be interrupted by a signal, in which case it may fail with errno set to EINTR.
Write
Write size bytes from buff to file specified by fdesc.
#include <sys/types.h>
#include <unistd.h>
ssize_t write(int fdesc, const void *buff, size_t size);
returns -1 on failure, number of bytes written on success
(which may be less than size).
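Since both read and write may transfer fewer bytes than requested and may be interrupted by a signal, callers commonly wrap them in loops. The sketch below shows one way to do so; read_all and write_all are hypothetical helper names, not POSIX functions.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/* Write exactly size bytes unless an error occurs.
   Returns size on success, -1 on failure. */
ssize_t write_all(int fdesc, const void *buff, size_t size)
{
    const char *p = buff;
    size_t done = 0;

    while (done < size) {
        ssize_t n = write(fdesc, p + done, size - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;         /* short write: loop for the rest */
    }
    return (ssize_t)size;
}

/* Read until size bytes have arrived or end-of-file is reached.
   Returns the number of bytes read, -1 on failure. */
ssize_t read_all(int fdesc, void *buff, size_t size)
{
    char *p = buff;
    size_t got = 0;

    while (got < size) {
        ssize_t n = read(fdesc, p + got, size - got);
        if (n == 0)
            break;                 /* end-of-file */
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

Retrying on EINTR is a design choice that suits most programs; a program that uses signals to cancel I/O would return instead.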
fsync
#include <unistd.h>
int fsync(int fildes);
The fsync() function moves all modified data and attributes
of the file descriptor fildes to a storage device. When fsync() returns,
all in-memory modified copies of buffers associated with fildes have been
written to the physical medium.
This call is useful since writes to block devices (such
as file systems) are buffered and typically not written for up to 30 seconds.
In cases where the completion or ordering of writes is important, fsync
must be called.
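A minimal sketch of a write that is forced out to the storage device before the function returns. The name write_durably is hypothetical, and for brevity a short write is simply treated as a failure.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace the contents of path with text and do not
   return until the data has been handed to the storage device.
   Returns 0 on success, -1 on failure. */
int write_durably(const char *path, const char *text)
{
    size_t len = strlen(text);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    if (write(fd, text, len) != (ssize_t)len ||  /* data reaches kernel buffers */
        fsync(fd) == -1) {                       /* buffers reach the device */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync call the function could return while the data still sits only in the kernel's buffers.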
Lseek
Change the position in the file.
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fdesc, off_t pos, int whence);
returns -1 on failure, the new offset from the beginning of the file on success.
Whence specifies what pos is relative to
-
SEEK_CUR: Current file pointer address
-
SEEK_SET: the beginning of the file
-
SEEK_END: the end of the file
If lseek seeks to a position beyond the end of the file
and data is then written there, the file is extended and the gap reads
back as zeros. File systems that support holes need not allocate blocks for
the gap. Hence, a file may have a megabyte size, but not consume a megabyte
of storage. (The seek itself succeeds even if the file was opened read-only,
but the file is not extended; a read at or past end-of-file returns 0.)
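The behaviour of seeking past end-of-file can be observed with a short sketch. The helper name size_after_seek_and_write is an assumption; it seeks gap bytes into a new empty file (gap must be greater than 0), writes one byte, checks that the first byte of the gap reads back as zero, and returns the resulting file size.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: returns the size of the file after writing one byte
   at offset gap, or -1 if any step fails or the gap is not zero-filled. */
off_t size_after_seek_and_write(const char *path, off_t gap)
{
    off_t size = -1;
    char first = 1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd == -1)
        return -1;
    if (lseek(fd, gap, SEEK_SET) == gap &&   /* move past the end of the file */
        write(fd, "x", 1) == 1 &&            /* the write is what extends it */
        lseek(fd, 0, SEEK_SET) == 0 &&       /* back to the beginning */
        read(fd, &first, 1) == 1 &&
        first == 0)                          /* the gap reads back as zero */
        size = lseek(fd, 0, SEEK_END);       /* offset of end-of-file = size */
    close(fd);
    unlink(path);                            /* remove the scratch file */
    return size;
}
```

With a gap of 1 MB the reported size is 1 MB plus one byte. Whether the file system actually avoids allocating blocks for the gap depends on the file system.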
fcntl
#include <fcntl.h>
int fcntl(int fdesc, int cmd, ...);
-
F_GETFL: returns the access mode and file status flags of the file descriptor
fdesc
-
F_SETFL: Sets the file status flags (such as O_NONBLOCK and O_APPEND)
to the values specified in the third argument of fcntl.
-
F_GETFD: returns the close-on-exec flag (0 for false, non-zero
for true).
-
F_SETFD: third argument is 0 to clear and FD_CLOEXEC to set the close-on-exec
flag.
-
F_DUPFD: duplicates the file descriptor in the first unused
file descriptor which is greater than or equal to the third parameter.
Returns the duplicated file descriptor.
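The get and set commands are normally used as a pair, so that flags which are already set are preserved. The sketch below shows that pattern; set_nonblocking, set_close_on_exec and pipe_flags_demo are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Add O_NONBLOCK to the file status flags without disturbing the others. */
int set_nonblocking(int fdesc)
{
    int flags = fcntl(fdesc, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fdesc, F_SETFL, flags | O_NONBLOCK);
}

/* Set the close-on-exec flag on the descriptor. */
int set_close_on_exec(int fdesc)
{
    int flags = fcntl(fdesc, F_GETFD);

    if (flags == -1)
        return -1;
    return fcntl(fdesc, F_SETFD, flags | FD_CLOEXEC);
}

/* Demo on the read end of an empty pipe: returns 1 if both flags took
   effect, 0 if not, -1 if the pipe could not be created. */
int pipe_flags_demo(void)
{
    int p[2], ok = 0;
    char c;

    if (pipe(p) == -1)
        return -1;
    if (set_nonblocking(p[0]) == 0 && set_close_on_exec(p[0]) == 0) {
        int fdflags = fcntl(p[0], F_GETFD);
        ssize_t n = read(p[0], &c, 1);   /* pipe is empty: would block */

        ok = fdflags != -1 && (fdflags & FD_CLOEXEC) &&
             n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK);
    }
    close(p[0]);
    close(p[1]);
    return ok;
}
```

Without O_NONBLOCK the read in the demo would block until data is written to the pipe or its write end is closed.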
Miscellaneous
Chown
Change the owner or group of the file
#include <unistd.h>
int fchown(int fdesc, uid_t uid, gid_t gid);
int chown(const char *pathName, uid_t uid, gid_t gid);
int lchown(const char *pathName, uid_t uid, gid_t gid);
fchown works on the file descriptor, while chown and lchown
work on a path. The difference between chown and lchown is that if the pathName
specifies a symbolic link, lchown changes the symbolic link's UID and GID,
while chown changes the referenced file's UID and GID.
If _POSIX_CHOWN_RESTRICTED is
-
defined: the superuser can change any UID or GID. Otherwise,
if eUID=fileUID and gid is either the effective or a supplementary group ID
of the process, only the GID can be changed.
-
undefined: if eUID=fileUID or eUID=0, then both the fileUID and the
fileGID can be changed. If not superuser, changing fileUID (fileGID)
will clear the set-UID (set-GID) bit. (Implementation dependent if superuser.)
The above restrictions are to prevent security holes.
If uid (gid) is equal to -1, then uid (gid) is unchanged.
Chmod
Change owner, group, other permissions, set-UID, set-GID,
and sticky bit. The caller must be the owner of the file or super user.
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int fchmod(int fdesc, mode_t flag);
int chmod(const char *pathName, mode_t flag);
Directory
Link
Adds another directory entry which points to the file's inode.
#include <unistd.h>
int link(const char *currPathName, const char *newPathName);
returns -1 on failure, 0 on success. Arguments
-
currPathName is an absolute or relative pathname of a file
-
newPathName is an absolute or relative pathname
In Unix, both pathnames must be on the same file system (partition),
since a directory entry refers to an inode number within one file system.
Unlink
Removes a directory entry.
#include <unistd.h>
int unlink(const char *pathName);
returns -1 on failure, 0 on success.
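The effect of link and unlink on the inode's hard link count can be observed with stat. The sketch below is illustrative only; link_count_after_link is a hypothetical name, and both paths are scratch files that the function creates and removes.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: create path, add alias as a second name for it, and
   return the hard link count if both names refer to the same inode.
   Returns -1 on any failure. */
long link_count_after_link(const char *path, const char *alias)
{
    struct stat a, b;
    long result = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd == -1)
        return -1;
    close(fd);                         /* the file persists without a descriptor */
    if (link(path, alias) == -1) {     /* second directory entry, same inode */
        unlink(path);
        return -1;
    }
    if (stat(path, &a) == 0 && stat(alias, &b) == 0 && a.st_ino == b.st_ino)
        result = (long)a.st_nlink;     /* one count per directory entry */
    unlink(alias);                     /* count drops back to 1 */
    unlink(path);                      /* count reaches 0: inode is freed */
    return result;
}
```

The file's blocks are released only when the last directory entry is removed and no process still has the file open.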
Rename
Changes the name of a file, moving its directory entry to another
directory if necessary.
#include <stdio.h>
int rename(const char *currPathName, const char *newPathName);
returns -1 on failure, 0 on success.
Mkdir
Create a new empty directory.
#include <sys/stat.h>
#include <unistd.h>
int mkdir(const char *pathName, mode_t mode);
returns 0 on success, -1 on failure.
The pathName specifies the directory to be created; the file
access permissions are set to mode & ~umask.
Rmdir
Remove an empty directory
#include <sys/stat.h>
#include <unistd.h>
int rmdir(const char *pathName);
returns 0 on success, -1 on failure.
The pathName specifies the directory to be removed, which
must be empty.
Traversing the directory
Read-only access to the directory can be provided by the
following calls:
#include <sys/types.h>
#include <dirent.h>
typedef struct dirent Dirent;
DIR *opendir(const char *pathName); // open the directory for read & point to first entry
Dirent *readdir(DIR *dirFdesc); // get the next directory entry, or NULL at the end
int closedir(DIR* dirFdesc); // close the directory file
void rewinddir(DIR *dirFdesc); // point DIR at the first entry in the directory
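A typical traversal opens the directory, calls readdir until it returns NULL, and closes the directory again. The sketch below counts the entries of a directory that way; the function name count_entries is an assumption.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: number of entries in dirPath, not counting
   "." and "..". Returns -1 if the directory cannot be opened. */
int count_entries(const char *dirPath)
{
    struct dirent *entry;
    int count = 0;
    DIR *dir = opendir(dirPath);

    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {   /* NULL marks the end */
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;                          /* skip self and parent */
        count++;
    }
    closedir(dir);
    return count;
}
```

Each struct dirent carries at least the entry's name in d_name; other attributes of the file are obtained by calling stat on the name.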
Fifo Files (named pipes)
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int mkfifo(const char *pathName, mode_t mode);
Symbolic Links
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
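int symlink(const char *existingPathName, const char *symLinkPath); // create symLinkPath pointing to existingPathName
ssize_t readlink(const char *symLinkPath, char *buf, size_t size); // copy the link's target into buf (not null-terminated)
int lstat(const char *symLinkPath, struct stat *statPtr); // like stat, but describes the link itself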
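The three calls can be combined in a short sketch that creates a symbolic link, confirms with lstat that it is a link, and reads its target back. The name symlink_target_length and the fixed 256-byte buffer are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: create linkPath as a symbolic link to target, read
   the target back, remove the link. Returns the length of the stored
   target, or -1 on failure (including a target of 255 bytes or more). */
ssize_t symlink_target_length(const char *target, const char *linkPath)
{
    char buf[256];
    struct stat sb;
    ssize_t n = -1;

    if (symlink(target, linkPath) == -1)
        return -1;                         /* e.g. linkPath already exists */
    if (lstat(linkPath, &sb) == 0 && S_ISLNK(sb.st_mode)) {
        n = readlink(linkPath, buf, sizeof buf - 1);
        if (n != -1) {
            buf[n] = '\0';                 /* readlink does not terminate */
            if (strcmp(buf, target) != 0)
                n = -1;                    /* truncated or unexpected */
        }
    }
    unlink(linkPath);                      /* removes the link, not the target */
    return n;
}
```

Note that stat on the same path would follow the link and describe the target instead.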
File locking
Unix is highly oriented towards shared access of files. File
locking can either be mandatory (enforced by the kernel on all accesses)
or advisory (correct locking requires processes to use a given locking
sequence).
POSIX supports only advisory locks. To use advisory locks,
all accesses to possibly locked files must follow this sequence:
-
set a lock on the file region desired
-
access the locked region
-
release the lock
Fcntl for file locking
#include <fcntl.h>
int fcntl(int fdesc, int cmd, ...);
where cmd is one of:
-
F_SETLK: set file lock, but don't block if it cannot succeed
immediately.
-
F_SETLKW: set file lock, blocking if it cannot succeed immediately
-
F_GETLK: finds out which process holds a lock that would block the requested one.
The third argument is a pointer to the flock structure:
struct flock {
short l_type; /* lock type */
short l_whence; /* relative to where */
off_t l_start; /* starting offset relative to whence */
off_t l_len; /* length of locked region */
pid_t l_pid; /* PID of a process which has locked the file */
};
where l_type is one of
-
F_RDLCK: sets a read (shared) lock
-
F_WRLCK: sets a write (exclusive) lock
-
F_UNLCK: unlocks a specified region
where l_len:
-
>0: size of the locked region in bytes
-
=0: lock the entire rest of the file, even if it grows.
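The set, access, release sequence described above can be sketched as follows. The names set_whole_file_lock and append_locked are hypothetical; the example locks the whole file (l_start = 0, l_len = 0) and waits for the lock with F_SETLKW.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 declarations */
#include <sys/types.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Apply a lock of the given type to the whole file using cmd
   (F_SETLK or F_SETLKW). Returns 0 on success, -1 on failure. */
static int set_whole_file_lock(int fdesc, short type, int cmd)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;          /* F_RDLCK, F_WRLCK or F_UNLCK */
    fl.l_whence = SEEK_SET;    /* l_start is relative to the file start */
    fl.l_start = 0;
    fl.l_len = 0;              /* 0: to end of file, even if it grows */
    return fcntl(fdesc, cmd, &fl);
}

/* Hypothetical helper: append text to path under an exclusive lock.
   Returns 0 on success, -1 on failure. */
int append_locked(const char *path, const char *text)
{
    size_t len = strlen(text);
    int ok;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd == -1)
        return -1;
    if (set_whole_file_lock(fd, F_WRLCK, F_SETLKW) == -1) { /* 1. set lock */
        close(fd);
        return -1;
    }
    ok = write(fd, text, len) == (ssize_t)len;              /* 2. access */
    if (set_whole_file_lock(fd, F_UNLCK, F_SETLK) == -1)    /* 3. release */
        ok = 0;
    close(fd);
    return ok ? 0 : -1;
}
```

Because the lock is advisory, it only protects against other processes that follow the same sequence; a process that writes without locking is not stopped. Closing the descriptor would also release the lock, but the explicit unlock keeps the three steps visible.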