Gender: Male
Status: In a Relationship
Age: 99
Sign: Cancer
City: Prontera City
State: Morroc
Country: MY
Signup Date: 8/26/2004
Wednesday, June 27, 2007
Swallowed, you are bitter; to throw you away would feel like a waste. You are laced with honey, yet when I would drink, you are poison.
Your venom brings utter ruin and will be the death of me. Bury me here.
You ambush the love that blossoms in my heart, and give poison to me while I am thirsty.
Your love burns me until I turn to ash. Am I of any use to you still?
I have grown up on today's journey. I will be destroyed if I am led about by your pointing finger.
Thursday, August 11, 2005
Ping
This diagnostic command verifies connections to one or more remote hosts.
Here's how to use Ping: Go to Start-->Run-->type in cmd. This will bring up a C:\> prompt. Type in (without the quotes) "ping", followed by a host name or IP address, and then hit the ENTER key. For example: ping www.trucks.com
******************************************************
C:\>ping www.trucks.com
Pinging trucks.com 4.41.177.94 with 32 bytes of data:
Reply from 4.41.177.94: bytes=32 time=110ms TTL=245
Reply from 4.41.177.94: bytes=32 time=100ms TTL=245
Reply from 4.41.177.94: bytes=32 time=100ms TTL=245
Reply from 4.41.177.94: bytes=32 time=90ms TTL=245
C:\>
*******************************************************
The ping command verifies connections to a remote host or hosts by sending Internet Control Message Protocol (ICMP) echo packets to the host and listening for echo reply packets. The ping command waits up to 1 second for each packet sent and prints the number of packets transmitted and received. Each received packet is validated against the transmitted message. By default, four echo packets containing 32 bytes of data are transmitted, as in the output above. You can use the ping utility to test both the host name and IP address of the host. If the IP address is verified but the host name is not, you may have a name resolution problem.
One good use of this is determining whether or not certain servers are operational. An example:
ping www.trucks.com
If you get a reply, the server is up and name resolution is working properly. If you do not get a reply (timeouts), then try this:
ping 4.41.177.94
If you get a reply, the server is up but name resolution is not working. You may need to reboot your machine and try again. If you do not get a reply (more timeouts), then the server is down or you need to reconfigure your network settings. More on that in a different section.
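The two-step check above is really a small decision table. The following Python sketch spells it out; the function name `diagnose` and its two boolean inputs are made up for illustration, and the function does not run ping itself, it only interprets results you have already observed.

```python
def diagnose(name_reply: bool, ip_reply: bool) -> str:
    """Interpret the two ping tests described in the text.

    name_reply: did pinging the host name produce replies?
    ip_reply:   did pinging the IP address produce replies?
    (Both parameters are hypothetical names for this sketch.)
    """
    if name_reply:
        # A reply by name means the name was resolved and the host answered.
        return "server up, name resolution working"
    if ip_reply:
        # The host answers by address, so only the name lookup is failing.
        return "server up, name resolution failing"
    # No reply either way: the server is down or the local setup is wrong.
    return "server down or local network misconfigured"
```

For example, `diagnose(False, True)` corresponds to the case where pinging the name times out but pinging the address succeeds.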
If at the C:\> prompt you just type ping and press ENTER, you will get a list of options or switches that you can use with ping. Play around with these and see what happens.
Echo
Turns the command-echoing feature on or off, or displays a message.
echo [on | off] [message]
Parameters
on | off
Specifies whether to turn the command-echoing feature on or off. To display the current echo setting, use the echo command without a parameter.
message
Specifies text you want Windows 2000 to display on the screen.
Nbtstat
This diagnostic command displays protocol statistics and current TCP/IP connections using NBT (NetBIOS over TCP/IP). This command is available only if the TCP/IP protocol has been installed.
nbtstat [-a remotename] [-A IP address] [-c] [-n] [-R] [-r] [-S] [-s] [interval]
Parameters
-a remotename
Lists the remote computer's name table using its name.
-A IP address
Lists the remote computer's name table using its IP address.
-c
Lists the contents of the NetBIOS name cache giving the IP address of each name.
-n
Lists local NetBIOS names. Registered indicates that the name is registered by broadcast (Bnode) or WINS (other node types).
-R
Reloads the Lmhosts file after purging all names from the NetBIOS name cache.
-r
Lists name resolution statistics for Windows networking name resolution. On a Windows 2000 computer configured to use WINS, this option returns the number of names resolved and registered via broadcast or via WINS.
-S
Displays both client and server sessions, listing the remote computers by IP address only.
-s
Displays both client and server sessions. It attempts to convert the remote computer IP address to a name using the Hosts file.
interval
Redisplays selected statistics, pausing interval seconds between each display. Press CTRL+C to stop redisplaying statistics. If this parameter is omitted, nbtstat prints the current configuration information once.
Tracert
This diagnostic utility determines the route taken to a destination by sending Internet Control Message Protocol (ICMP) echo packets with varying Time-To-Live (TTL) values to the destination. Each router along the path is required to decrement the TTL on a packet by at least 1 before forwarding it, so the TTL is effectively a hop count. When the TTL on a packet reaches 0, the router is supposed to send back an ICMP Time Exceeded message to the source system. Tracert determines the route by sending the first echo packet with a TTL of 1 and incrementing the TTL by 1 on each subsequent transmission until the target responds or the maximum TTL is reached. The route is determined by examining the ICMP Time Exceeded messages sent back by intermediate routers. However, some routers silently drop packets with expired TTL values and are invisible to tracert.
tracert [-d] [-h maximum_hops] [-j computer-list] [-w timeout] target_name
Parameters
-d
Specifies not to resolve addresses to computer names.
-h maximum_hops
Specifies maximum number of hops to search for target.
-j computer-list
Specifies loose source route along computer-list.
-w timeout
Waits the number of milliseconds specified by timeout for each reply.
target_name
Name of the target computer.
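To make the TTL mechanism concrete, here is a small Python simulation of the probing loop described above. It sends nothing over the network: the route is a plain list supplied by the caller, `None` stands for a router that silently drops expired packets, and the function name and the addresses in the usage note are illustrative assumptions rather than anything taken from tracert itself.

```python
from typing import List, Optional


def trace_route(path: List[Optional[str]], max_hops: int = 30) -> List[str]:
    """Simulate the tracert probing loop over a known path.

    `path` lists the hops in order and ends with the target. An entry of
    None models a router that drops expired packets without replying.
    """
    if not path:
        return []
    target_index = len(path) - 1
    report = []
    for ttl in range(1, max_hops + 1):
        # Every router decrements the TTL by 1, so a probe sent with this
        # TTL expires at hop number `ttl` unless the target comes first.
        hop_index = min(ttl - 1, target_index)
        hop = path[hop_index]
        if hop_index == target_index:
            report.append(f"{ttl} {hop} (echo reply)")
            break  # the target answered, so the trace is complete
        if hop is None:
            report.append(f"{ttl} * (request timed out)")
        else:
            report.append(f"{ttl} {hop} (time exceeded)")
    return report
```

Calling `trace_route(["10.0.0.1", None, "192.0.2.7", "203.0.113.9"])` yields four lines, with the second hop shown as a timeout because that router never sends a Time Exceeded message. Passing a smaller `max_hops` plays the role of the -h switch.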
Ipconfig
This diagnostic command displays all current TCP/IP network configuration values. This command is of particular use on systems running DHCP, allowing users to determine which TCP/IP configuration values have been configured by DHCP.
ipconfig [/all | /renew [adapter] | /release [adapter]]
Parameters
/all
Produces a full display. Without this switch, ipconfig displays only the IP address, subnet mask, and default gateway values for each network card.
/renew [adapter]
Renews DHCP configuration parameters. This option is available only on systems running the DHCP Client service. To specify an adapter name, type the adapter name that appears when you use ipconfig without parameters.
/release [adapter]
Releases the current DHCP configuration. This option disables TCP/IP on the local system and is available only on DHCP clients. To specify an adapter name, type the adapter name that appears when you use ipconfig without parameters.
With no parameters, the ipconfig utility presents all of the current TCP/IP configuration values to the user, including IP address and subnet mask. This utility is especially useful on systems running DHCP, allowing users to determine which values have been configured by DHCP.
Finger
Displays information about a user on a specified system running the Finger service. Output varies based on the remote system. This command is available only if the TCP/IP protocol has been installed.
finger [-l] [user]@computer [...]
Parameters
-l
Displays information in long list format.
user
Specifies the user you want information about. Omit the user parameter to display information about all users on the specified computer:
@computer
Arp
Displays and modifies the IP-to-Ethernet or token ring physical address translation tables used by the Address Resolution Protocol (ARP). This command is available only if the TCP/IP protocol has been installed.
arp -a [inet_addr] [-N [if_addr]]
arp -d inet_addr [if_addr]
arp -s inet_addr ether_addr [if_addr]
Parameters
-a
Displays current ARP entries by querying TCP/IP. If inet_addr is specified, only the IP and physical addresses for the specified computer are displayed.
-g
Identical to -a.
inet_addr
Specifies an IP address in dotted decimal notation.
-N
Displays the ARP entries for the network interface specified by if_addr.
if_addr
Specifies, if present, the IP address of the interface whose address translation table should be modified. If not present, the first applicable interface is used.
-d
Deletes the entry specified by inet_addr.
-s
Adds an entry in the ARP cache to associate the IP address inet_addr with the physical address ether_addr. The physical address is given as 6 hexadecimal bytes separated by hyphens. The IP address is specified using dotted decimal notation. The entry is permanent, that is, it is not automatically removed from the cache after the time-out expires.
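The ether_addr format used by -s (6 hexadecimal bytes separated by hyphens) is easy to get wrong, so a quick check before typing the command can help. This is a minimal Python sketch; `parse_ether_addr` is a made-up helper name, and the address in the usage note is only a sample value.

```python
import re

# Exactly six two-digit hexadecimal groups joined by hyphens.
_ETHER_PATTERN = re.compile(r"[0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5}")


def parse_ether_addr(text: str) -> bytes:
    """Return the six address bytes, or raise ValueError if malformed."""
    if _ETHER_PATTERN.fullmatch(text) is None:
        raise ValueError(f"not 6 hyphen-separated hex bytes: {text!r}")
    return bytes(int(part, 16) for part in text.split("-"))
```

`parse_ether_addr("00-aa-00-62-c6-09")` returns six bytes, while a colon-separated address such as "00:aa:00:62:c6:09" is rejected because it does not match the hyphenated form described above.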
Monday, July 04, 2005
Recipient of the infected attachment: CFS - Club Credit
Inbox
Subject of the message: SERVER REPORT
One or more attachments were deleted
Attachment readme.bat was Deleted for the following reasons: Virus W32.Lovgate.X@mm was found.
Friday, July 01, 2005
Find and Replace text within file(s)
SYNTAX
awk <options> 'PROGRAM'
awk <options> 'PROGRAM' Input-File1 Input-File2 ...
If no Input-File is specified then `awk' applies the PROGRAM to the "standard input", this can either be the piped output of some other command or whatever you type on the terminal. Typed input will continue until you indicate end-of-file by typing `Control-d'.
KEY
`-F FS'
`--field-separator FS'
Use FS for the input field separator (the value of the `FS' predefined variable).
`-f PROGRAM-FILE'
`--file PROGRAM-FILE'
Read the `awk' program source from the file PROGRAM-FILE, instead of from the first command line argument.
`-mf NNN'
`-mr NNN'
The `f' flag sets the maximum number of fields, and the `r' flag sets the maximum record size. These options are ignored by `gawk', since `gawk' has no predefined limits; they are only for compatibility with the Bell Labs research version of Unix `awk'.
`-v VAR=VAL'
`--assign VAR=VAL'
Assign the variable VAR the value VAL before program execution begins.
`-W traditional'
`-W compat'
`--traditional'
`--compat'
Use compatibility mode, in which `gawk' extensions are turned off.
`-W lint'
`--lint'
Give warnings about dubious or non-portable `awk' constructs.
`-W lint-old'
`--lint-old'
Warn about constructs that are not available in the original Version 7 Unix version of `awk'.
`-W posix'
`--posix'
Use POSIX compatibility mode, in which `gawk' extensions are turned off and additional restrictions apply.
`-W re-interval'
`--re-interval'
Allow interval expressions, in regexps.
`-W source=PROGRAM-TEXT'
`--source PROGRAM-TEXT'
Use PROGRAM-TEXT as `awk' program source code. This option allows mixing command line source code with source code from files, and is particularly useful for mixing command line programs with library functions.
`--'
Signal the end of options. This is useful to allow further arguments to the `awk' program itself to start with a `-'. This is mainly for consistency with POSIX argument parsing conventions.
'PROGRAM'
A series of patterns and actions:
PROGRAM patterns and actions
The PROGRAM statement that tells `awk' what to do consists of a series of "rules". Each rule specifies one pattern to search for, and one action to perform when that pattern is found.
For ease of reading, each line in an `awk' program is normally a separate PROGRAM statement, like this:
PATTERN { ACTION }
PATTERN { ACTION }
...
However, `gawk' will ignore newlines after any of the following:
, { ? : || && do else
e.g. 2 patterns each followed by an action:
awk '/15/ { print $0 } /40/ { print $0 }' BBS-list
A regular expression enclosed in slashes (`/') is an `awk' pattern that matches every input record whose text belongs to that set. e.g. the pattern /foo/ matches any input record containing the three characters `foo', *anywhere* in the record.
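For readers who know Python better than awk, the pattern-and-action model can be mimicked in a few lines. This is only a sketch of the idea: `run_rules` is a made-up name, Python's `re` syntax differs from awk's in places, and the records are sample data rather than the contents of BBS-list.

```python
import re
from typing import Callable, Iterable, List, Tuple

# A rule pairs a regular expression with an action applied to matching records.
Rule = Tuple[str, Callable[[str], str]]


def run_rules(rules: List[Rule], records: Iterable[str]) -> List[str]:
    """Apply every rule, in order, to every input record."""
    output = []
    for record in records:
        for pattern, action in rules:
            # Like /foo/, the pattern may match anywhere in the record.
            if re.search(pattern, record):
                output.append(action(record))
    return output


# Two rules, each printing the whole record (the role $0 plays in awk).
rules = [(r"15", lambda rec: rec), (r"40", lambda rec: rec)]
records = ["alpha 15", "beta 40", "gamma 15 40", "delta 7"]
result = run_rules(rules, records)
```

Note that "gamma 15 40" appears twice in `result`: each rule is tested independently, so a record that matches both patterns triggers both actions.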
Comments - start with a `#', and continue to the end of the line:
# This program prints a nice friendly message.
`awk' patterns may be one of the following:
/REGULAR EXPRESSION/ - Match
PATTERN && PATTERN - AND
PATTERN || PATTERN - OR
! PATTERN - NOT
PATTERN ? PATTERN : PATTERN - If, Then, Else
PATTERN1, PATTERN2 - Range Start - end
BEGIN - Perform action BEFORE input file is read
END - Perform action AFTER input file is read
In addition to simple pattern matching `awk' has a huge range of text and arithmetic Functions, Variables and Operators.
For full details see the info documentation.
A few examples...
This program prints the length of the longest input line:
awk '{ if (length($0) > max) max = length($0) } END { print max }' data
This program prints every line that has at least one field. This is an easy way to delete blank lines from a file (or rather, to create a new file similar to the old file but from which the blank lines have been deleted):
awk 'NF > 0' data
This program prints seven random numbers from zero to 100, inclusive:
awk 'BEGIN { for (i = 1; i <= 7; i++) print int(101 * rand()) }'
This program prints the total number of bytes used by FILES:
ls -l FILES | awk '{ x += $5 } ; END { print "total bytes: " x }'
This program prints a sorted list of the login names of all users:
awk -F: '{ print $1 }' /etc/passwd | sort
This program counts lines in a file:
awk 'END { print NR }' data
This program prints the even numbered lines in the data file. If you were to use the expression `NR % 2 == 1' instead, it would print the odd numbered lines:
awk 'NR % 2 == 0' data
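As a cross-check on what three of these one-liners are meant to compute (longest line, non-blank lines, even-numbered lines), here are rough Python equivalents. The function names and the sample `lines` list are invented for this sketch, and the edge cases differ slightly from awk (for instance, `longest_length` returns 0 for empty input).

```python
from typing import List


def longest_length(lines: List[str]) -> int:
    """Length of the longest input line (0 when there is no input)."""
    return max((len(line) for line in lines), default=0)


def non_blank(lines: List[str]) -> List[str]:
    """Keep lines with at least one whitespace-separated field."""
    return [line for line in lines if line.split()]


def even_numbered(lines: List[str]) -> List[str]:
    """Keep lines whose 1-based line number is even."""
    return [line for number, line in enumerate(lines, start=1) if number % 2 == 0]


lines = ["first line", "", "the longest line here", "x"]
```

With the sample data, `non_blank(lines)` drops only the empty second line, and `even_numbered(lines)` keeps the second and fourth lines.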
"Justice is such a fine thing that we cannot pay too dearly for it - Alain-Rene Lesage
Related Linux Bash commands:
`awk', `oawk', and `nawk' - Alternative, older and newer versions of awk
egrep - egrep foo FILES ... is essentially the same as awk '/foo/' FILES ...
expr - Evaluate expressions
eval - Evaluate several commands/arguments
for - Expand words, and execute commands
grep - search file(s) for lines that match a given pattern
m4 - Macro processor
tr - Translate, squeeze, and/or delete characters
Equivalent Windows XP commands:
FOR - Conditionally perform a command several times
SET - Display, set, or remove Windows NT environment variables
Friday, June 24, 2005
Create an alias. Aliases allow a string to be substituted for a word when it is used as the first word of a simple command.
SYNTAX
alias [-p] [name[=value] ...]
unalias [-a] [name ... ]
If arguments are supplied, an alias is defined for each name whose value is given.
If no value is given, `alias' will print the current value of the alias.
Without arguments or with the `-p' option, alias prints the list of aliases on the standard output in a form that allows them to be reused as input.
`unalias' will remove each name from the list of aliases. If `-a' is supplied, all aliases are removed.
`alias' and `unalias' are BASH built-ins.
The first word of each simple command, if unquoted, is checked to see if it has an alias. If so, that word is replaced by the text of the alias. The alias name and the replacement text may contain any valid shell input, including shell metacharacters, with the exception that the alias name may not contain `='.
The first word of the replacement text is tested for aliases, but a word that is identical to an alias being expanded is not expanded a second time. This means that one may alias ls to "ls -F", for instance, and Bash does not try to recursively expand the replacement text.
If the last character of the alias value is a space or tab character, then the next command word following the alias is also checked for alias expansion.
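The expansion rules in the last three paragraphs can be modelled in a few lines of Python. This is a simplified sketch, not Bash's implementation: `expand_alias` is a made-up name, commands are passed in as pre-split word lists, and quoting and shell metacharacters are ignored entirely.

```python
from typing import Dict, FrozenSet, List


def expand_alias(words: List[str], aliases: Dict[str, str],
                 active: FrozenSet[str] = frozenset()) -> List[str]:
    """Expand the first word of a simple command according to `aliases`.

    `active` holds the alias names already being expanded, so that an alias
    such as ls='ls -F' is not expanded a second time.
    """
    if not words:
        return words
    first, rest = words[0], words[1:]
    if first not in aliases or first in active:
        return words
    value = aliases[first]
    # The first word of the replacement text is itself tested for aliases.
    head = expand_alias(value.split(), aliases, active | {first})
    # A trailing space or tab means the next command word is checked too.
    if value.endswith((" ", "\t")):
        rest = expand_alias(rest, aliases)
    return head + rest
```

For example, with `{"ls": "ls -F"}` the words `["ls", "/tmp"]` become `["ls", "-F", "/tmp"]`, and with `{"sudo": "sudo ", "ll": "ls -l"}` the words `["sudo", "ll"]` become `["sudo", "ls", "-l"]` because the value of the first alias ends in a space.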
There is no mechanism for using arguments in the replacement text, as in csh. If arguments are needed, a shell function should be used. Aliases are not expanded when the shell is not interactive, unless the expand_aliases shell option is set using shopt.
The rules concerning the definition and use of aliases are somewhat confusing. Bash always reads at least one complete line of input before executing any of the commands on that line. Aliases are expanded when a command is read, not when it is executed. Therefore, an alias definition appearing on the same line as another command does not take effect until the next line of input is read. The commands following the alias definition on that line are not affected by the new alias. This behavior is also an issue when functions are executed. Aliases are expanded when a function definition is read, not when the function is executed, because a function definition is itself a compound command. As a consequence, aliases defined in a function are not available until after that function is executed. To be safe, always put alias definitions on a separate line, and do not use alias in compound commands.
For almost every purpose, shell functions are preferred over aliases.
Examples
alias ls='ls -F'
Now issuing the command 'ls' will actually run 'ls -F'
Making an alias permanent: Use your favorite text editor to create a .bash_aliases file, and type the alias commands into the file. .bash_aliases will run at login, provided your startup files source it (or you can just execute it with . .bash_aliases )
Related commands:
export - Set an environment variable
env - Display, set, or remove environment variables
echo - Display message on screen
readonly - Mark variables/functions as readonly
shift - Shift positional parameters
Equivalent Windows NT commands:
SET - Display, set, or remove Windows NT environment variables
SETX - Set environment variables permanently
SETLOCAL - Begin localisation of environment changes in a batch file
Saturday, June 18, 2005
Current mood:  busy
What is it?
Devfs is an alternative to "real" character and block special devices on your root filesystem. Kernel device drivers can register devices by name rather than major and minor numbers. These devices will appear in devfs automatically, with whatever default ownership and protection the driver specified. A daemon (devfsd) can be used to override these defaults. Devfs has been in the kernel since 2.3.46.
NOTE that devfs is entirely optional. If you prefer the old disc-based device nodes, then simply leave CONFIG_DEVFS_FS=n (the default). In this case, nothing will change. ALSO NOTE that if you do enable devfs, the defaults are such that full compatibility is maintained with the old devices names.
There are two aspects to devfs: one is the underlying device namespace, which is a namespace just like any mounted filesystem. The other aspect is the filesystem code which provides a view of the device namespace. The reason I make a distinction is because devfs can be mounted many times, with each mount showing the same device namespace. Changes made are global to all mounted devfs filesystems. Also, because the devfs namespace exists without any devfs mounts, you can easily mount the root filesystem by referring to an entry in the devfs namespace.
The cost of devfs is a small increase in kernel code size and memory usage. About 7 pages of code (some of that in __init sections) and 72 bytes for each entry in the namespace. A modest system has only a couple of hundred device entries, so this costs a few more pages. Compare this with the suggestion to put /dev on a ramdisc. On a typical machine, the cost is under 0.2 percent. On a modest system with 64 MBytes of RAM, the cost is under 0.1 percent. The accusations of "bloatware" levelled at devfs are not justified.
Why do it?
There are several problems that devfs addresses. Some of these problems are more serious than others (depending on your point of view), and some can be solved without devfs. However, the totality of these problems really calls out for devfs.
The choice is a patchwork of inefficient user space solutions, which are complex and likely to be fragile, or to use a simple and efficient devfs which is robust.
There have been many counter-proposals to devfs, all seeking to provide some of the benefits without actually implementing devfs. So far there has been an absence of code and no proposed alternative has been able to provide all the features that devfs does. Further, alternative proposals require far more complexity in user-space (and still deliver less functionality than devfs). Some people have the mantra of reducing "kernel bloat", but don't consider the effects on user-space.
A good solution limits the total complexity of kernel-space and user-space.
Major&minor allocation
The existing scheme requires the allocation of major and minor device numbers for each and every device. This means that a central co-ordinating authority is required to issue these device numbers (unless you're developing a "private" device driver), in order to preserve uniqueness. Devfs shifts the burden to a namespace. This may not seem like a huge benefit, but actually it is. Since driver authors will naturally choose a device name which reflects the functionality of the device, there is far less potential for namespace conflict. Solving this requires a kernel change.
/dev management
Because you currently access devices through device nodes, these must be created by the system administrator. For standard devices you can usually find a MAKEDEV programme which creates all these (hundreds!) of nodes. This means that changes in the kernel must be reflected by changes in the MAKEDEV programme, or else the system administrator creates device nodes by hand.
The basic problem is that there are two separate databases of major and minor numbers. One is in the kernel and one is in /dev (or in a MAKEDEV programme, if you want to look at it that way). This is duplication of information, which is not good practice. Solving this requires a kernel change.
/dev growth
A typical /dev has over 1200 nodes! Most of these devices simply don't exist because the hardware is not available. A huge /dev increases the time to access devices (I'm just referring to the dentry lookup times and the time taken to read inodes off disc: the next subsection shows some more horrors). An example of how big /dev can grow is if we consider SCSI devices:
host       6 bits  (say up to 64 hosts on a really big machine)
channel    4 bits  (say up to 16 SCSI buses per host)
id         4 bits
lun        3 bits
partition  6 bits
TOTAL     23 bits
This requires 8 Mega (1024*1024) inodes if we want to store all possible device nodes. Even if we scrap everything but id and partition, and assume a single host adapter with a single SCSI bus and only one logical unit per SCSI target (id), that's still 10 bits or 1024 inodes. Each VFS inode takes around 256 bytes (kernel 2.1.78), so that's 256 kBytes of inode storage on disc (assuming real inodes take a similar amount of space as VFS inodes). This is actually not so bad, because disc is cheap these days. Embedded systems would care about 256 kBytes of /dev inodes, but you could argue that embedded systems would have hand-tuned /dev directories. I've had to do just that on my embedded systems, but I would rather just leave it to devfs.
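The arithmetic in this example is easy to verify. The short Python sketch below recomputes the figures from the bit widths quoted above; the variable names are illustrative, and the 256-byte inode size is the approximate value given in the text.

```python
# Bit widths for each component of a SCSI device address, as listed above.
FIELD_BITS = {"host": 6, "channel": 4, "id": 4, "lun": 3, "partition": 6}

total_bits = sum(FIELD_BITS.values())
all_nodes = 2 ** total_bits            # every conceivable device node

# Keep only id and partition: one host, one bus, one logical unit per target.
reduced_bits = FIELD_BITS["id"] + FIELD_BITS["partition"]
reduced_nodes = 2 ** reduced_bits

INODE_BYTES = 256                      # approximate VFS inode size from the text
reduced_storage_bytes = reduced_nodes * INODE_BYTES
```

Here `all_nodes` works out to 8 * 1024 * 1024, and `reduced_storage_bytes` to 256 * 1024, matching the "8 Mega" and "256 kBytes" figures in the paragraph.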
Another issue is the time taken to lookup an inode when first referenced. Not only does this take time in scanning through a list in memory, but also the seek times to read the inodes off disc. This could be solved in user-space using a clever programme which scanned the kernel logs and deleted /dev entries which are not available and created them when they were available. This programme would need to be run every time a new module was loaded, which would slow things down a lot.
There is an existing programme called scsidev which will automatically create device nodes for SCSI devices. It can do this by scanning files in /proc/scsi. Unfortunately, to extend this idea to other device nodes would require significant modifications to existing drivers (so they too would provide information in /proc). This is a non-trivial change (I should know: devfs has had to do something similar). Once you go to this much effort, you may as well use devfs itself (which also provides this information). Furthermore, such a system would likely be implemented in an ad-hoc fashion, as different drivers will provide their information in different ways.
Devfs is much cleaner, because it (naturally) has a uniform mechanism to provide this information: the device nodes themselves!
Node to driver file_operations translation
There is an important difference between the way disc-based character and block nodes and devfs entries make the connection between an entry in /dev and the actual device driver.
With the current 8 bit major and minor numbers the connection between disc-based c&b nodes and per-major drivers is done through a fixed-length table of 128 entries. The various filesystem types set the inode operations for c&b nodes to {chr,blk}dev_inode_operations, so when a device is opened a few quick levels of indirection bring us to the driver file_operations.
For miscellaneous character devices a second step is required: there is a scan for the driver entry with the same minor number as the file that was opened, and the appropriate minor open method is called. This scanning is done *every time* you open a device node. Potentially, you may be searching through dozens of misc. entries before you find your open method. While not an enormous performance overhead, this does seem pointless.
Linux *must* move beyond the 8 bit major and minor barrier, somehow. If we simply increase each to 16 bits, then the indexing scheme used for major driver lookup becomes untenable, because the major tables (one each for character and block devices) would need to be 64 k entries long (512 kBytes on x86, 1 MByte for 64 bit systems). So we would have to use a scheme like that used for miscellaneous character devices, which means the search time goes up linearly with the average number of major device drivers on your system. Not all "devices" are hardware, some are higher-level drivers like KGI, so you can get more "devices" without adding hardware. You can improve this by creating an ordered (balanced:-) binary tree, in which case your search time becomes log(N). Alternatively, you can use hashing to speed up the search. But why do that search at all if you don't have to? Once again, it seems pointless.
Note that devfs doesn't use the major&minor system. For devfs entries, the connection is done when you lookup the /dev entry. When devfs_register() is called, an internal table is appended which has the entry name and the file_operations. If the dentry cache doesn't have the /dev entry already, this internal table is scanned to get the file_operations, and an inode is created. If the dentry cache already has the entry, there is *no lookup time* (other than the dentry scan itself, but we can't avoid that anyway, and besides Linux dentries cream other OS's which don't have them:-). Furthermore, the number of node entries in a devfs is only the number of available device entries, not the number of *conceivable* entries. Even if you remove unnecessary entries in a disc-based /dev, the number of conceivable entries remains the same: you just limit yourself in order to save space.
Devfs provides a fast connection between a VFS node and the device driver, in a scalable way.
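The lookup path described above (append to a table on registration, scan the table only on a dentry cache miss) can be modelled in user space. This Python sketch is purely conceptual: `DevfsSketch`, its methods and the `table_scans` counter are invented names, a dict stands in for the dentry cache, and none of it reflects the kernel's actual data structures.

```python
from typing import Dict, List, Tuple


class DevfsSketch:
    """Toy model of name-to-file_operations lookup with a cache in front."""

    def __init__(self) -> None:
        self._table: List[Tuple[str, object]] = []  # grows on each register()
        self._dcache: Dict[str, object] = {}        # stand-in for the dentry cache
        self.table_scans = 0                        # how often the table was scanned

    def register(self, name: str, file_operations: object) -> None:
        # Models devfs_register(): record the entry name and its operations.
        self._table.append((name, file_operations))

    def lookup(self, name: str) -> object:
        # Cache hit: no table scan at all.
        if name in self._dcache:
            return self._dcache[name]
        # Cache miss: scan the internal table once, then remember the result
        # (the point at which the real system would create an inode).
        self.table_scans += 1
        for entry_name, file_operations in self._table:
            if entry_name == name:
                self._dcache[name] = file_operations
                return file_operations
        raise FileNotFoundError(name)
```

Registering one entry and looking it up twice leaves `table_scans` at 1, which is the point of the paragraph: only the first lookup pays for the scan, and only registered (available) devices are ever in the table.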
/dev as a system administration tool
Right now /dev contains a list of conceivable devices, most of which I don't have. Devfs only shows those devices available on my system. This means that listing /dev is a handy way of checking what devices are available.
Major&minor size
Existing major and minor numbers are limited to 8 bits each. This is now a limiting factor for some drivers, particularly the SCSI disc driver, which consumes a single major number. Only 16 discs are supported, and each disc may have only 15 partitions. Maybe this isn't a problem for you, but some of us are building huge Linux systems with disc arrays. With devfs an arbitrary pointer can be associated with each device entry, which can be used to give an effective 32 bit device identifier (i.e. that's like having a 32 bit minor number). Since this is private to the kernel, there are no C library compatibility issues which you would have with increasing major and minor number sizes. See the section on "Allocation of Device Numbers" for details on maintaining compatibility with userspace.
Solving this requires a kernel change.
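The limits of 16 discs and 15 partitions follow from how an 8 bit minor number is divided. The Python sketch below assumes the minor is split into 4 bits for the disc and 4 bits for the partition, with partition 0 standing for the whole disc; that split is consistent with the figures above but is stated here as an assumption, and the function names are made up.

```python
MINOR_BITS = 8       # size of a minor number, as stated in the text
PARTITION_BITS = 4   # assumed split: low 4 bits select the partition

MAX_DISCS = 2 ** (MINOR_BITS - PARTITION_BITS)
MAX_PARTITIONS = 2 ** PARTITION_BITS - 1   # partition 0 means the whole disc


def sd_minor(disc: int, partition: int) -> int:
    """Pack a disc index and partition number into one 8 bit minor."""
    if not 0 <= disc < MAX_DISCS:
        raise ValueError("disc index does not fit in the minor number")
    if not 0 <= partition <= MAX_PARTITIONS:
        raise ValueError("partition number does not fit in the minor number")
    return (disc << PARTITION_BITS) | partition


def split_minor(minor: int) -> tuple:
    """Recover (disc, partition) from a minor number."""
    return minor >> PARTITION_BITS, minor & (2 ** PARTITION_BITS - 1)
```

Under this assumption the second partition of the second disc gets minor 18, and asking for a seventeenth disc (`sd_minor(16, 0)`) fails because there are no bits left to represent it.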
Since writing this, the kernel has been modified so that the SCSI disc driver has more major numbers allocated to it and now supports up to 128 discs. Since these major numbers are non-contiguous (a result of unplanned expansion), the implementation is a little more cumbersome than originally.
Just like the changes to IPv4 to fix impending limitations in the address space, people find ways around the limitations. In the long run, however, solutions like IPv6 or devfs can't be put off forever.
Read-only root filesystem
Having your device nodes on the root filesystem means that you can't operate properly with a read-only root filesystem. This is because you want to change ownerships and protections of tty devices. Existing practice prevents you using a CD-ROM as your root filesystem for a *real* system. Sure, you can boot off a CD-ROM, but you can't change tty ownerships, so it's only good for installing.
Also, you can't use a shared NFS root filesystem for a cluster of discless Linux machines (having tty ownerships changed on a common /dev is not good). Nor can you embed your root filesystem in a ROM-FS.
You can get around this by creating a RAMDISC at boot time, making an ext2 filesystem in it, mounting it somewhere and copying the contents of /dev into it, then unmounting it and mounting it over /dev.
A devfs is a cleaner way of solving this.
Non-Unix root filesystem
Non-Unix filesystems (such as NTFS) can't be used for a root filesystem because they variously don't support character and block special files or symbolic links. You can't have a separate disc-based or RAMDISC-based filesystem mounted on /dev because you need device nodes before you can mount these. Devfs can be mounted without any device nodes. Devlinks won't work because symlinks aren't supported. An alternative solution is to use initrd to mount a RAMDISC initial root filesystem (which is populated with a minimal set of device nodes), and then construct a new /dev in another RAMDISC, and finally switch to your non-Unix root filesystem. This requires clever boot scripts and a fragile and conceptually complex boot procedure.
Devfs solves this in a robust and conceptually simple way.
PTY security
Current pseudo-tty (pty) devices are owned by root and read-writable by everyone. The user of a pty-pair cannot change ownership/protections without being suid-root. This could be solved with a secure user-space daemon which runs as root and does the actual creation of pty-pairs. Such a daemon would require modification to *every* programme that wants to use this new mechanism. It also slows down creation of pty-pairs.
An alternative is to create a new open_pty() syscall which does much the same thing as the user-space daemon. Once again, this requires modifications to pty-handling programmes.
The devfs solution allows a device driver to "tag" certain device files so that when an unopened device is opened, the ownerships are changed to the current euid and egid of the opening process, and the protections are changed to the default registered by the driver. When the device is closed ownership is set back to root and protections are set back to read-write for everybody. No programme need be changed. The devpts filesystem provides this auto-ownership feature for Unix98 ptys. It doesn't support old-style pty devices, nor does it have all the other features of devfs.
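The auto-ownership rule described above can be modelled in a few lines. This is only a toy model, not kernel code: the class name TaggedDevice and its fields are hypothetical, and it assumes that the reset to root happens once the last opener has closed the device.

```python
from dataclasses import dataclass

ROOT_UID = 0
ROOT_GID = 0
WORLD_RW = 0o666  # "read-write for everybody"


@dataclass
class TaggedDevice:
    """Toy model of a device file tagged for auto-ownership (hypothetical)."""
    default_mode: int      # protections registered by the driver
    uid: int = ROOT_UID
    gid: int = ROOT_GID
    mode: int = WORLD_RW
    open_count: int = 0

    def open(self, euid: int, egid: int) -> None:
        # Only the first open of an unopened device changes ownership.
        if self.open_count == 0:
            self.uid, self.gid = euid, egid
            self.mode = self.default_mode
        self.open_count += 1

    def close(self) -> None:
        if self.open_count == 0:
            raise ValueError("device is not open")
        self.open_count -= 1
        # Assumption: the reset happens when the last opener closes.
        if self.open_count == 0:
            self.uid, self.gid = ROOT_UID, ROOT_GID
            self.mode = WORLD_RW
```

A second open() while the device is already open leaves the ownership alone, which matches the "when an unopened device is opened" wording.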
Intelligent device management
Devfs implements a simple yet powerful protocol for communication with a device management daemon (devfsd) which runs in user space. It is possible to send a message (either synchronously or asynchronously) to devfsd on any event, such as registration/unregistration of device entries, opening and closing devices, looking up inodes, scanning directories and more. This has many possibilities. Some of these are already implemented. See: http://www.atnf.csiro.au/~rgooch/linux/
Device entry registration events can be used by devfsd to change permissions of newly-created device nodes. This is one mechanism to control device permissions.
Device entry registration/unregistration events can be used to run programmes or scripts. This can be used to provide automatic mounting of filesystems when a new block device media is inserted into the drive.
Asynchronous device open and close events can be used to implement clever permissions management. For example, the default permissions on /dev/dsp do not allow everybody to read from the device. This is sensible, as you don't want some remote user recording what you say at your console. However, the console user is also prevented from recording. This behaviour is not desirable. With asynchronous device open and close events, you can have devfsd run a programme or script when console devices are opened to change the ownerships for *other* device nodes (such as /dev/dsp). On closure, you can run a different script to restore permissions. An advantage of this scheme over modifying the C library tty handling is that this works even if your programme crashes (how many times have you seen the utmp database with lingering entries for non-existent logins?).
Synchronous device open events can be used to perform intelligent device access protections. Before the device driver open() method is called, the daemon must first validate the open attempt, by running an external programme or script. This is far more flexible than access control lists, as access can be determined on the basis of other system conditions instead of just the UID and GID.
Inode lookup events can be used to authenticate module autoload requests. Instead of using kmod directly, the event is sent to devfsd which can implement an arbitrary authentication before loading the module itself.
Inode lookup events can also be used to construct arbitrary namespaces, without having to resort to populating devfs with symlinks to devices that don't exist.
Speculative Device Scanning
Consider an application (like cdparanoia) that wants to find all CD-ROM devices on the system (SCSI, IDE and other types), whether or not their respective modules are loaded. The application must speculatively open certain device nodes (such as /dev/sr0 for the SCSI CD-ROMs) in order to make sure the module is loaded. This requires that all Linux distributions follow the standard device naming scheme (last time I looked RedHat did things differently). Devfs solves the naming problem.
The same application also wants to see which devices are actually available on the system. With the existing system it needs to read the /dev directory and speculatively open each /dev/sr* device to determine if the device exists or not. With a large /dev this is an inefficient operation, especially if there are many /dev/sr* nodes. A solution like scsidev could reduce the number of /dev/sr* entries (but of course that also requires all that inefficient directory scanning).
With devfs, the application can open the /dev/sr directory (which triggers the module autoloading if required), and proceed to read /dev/sr. Since only the available devices will have entries, there are no inefficiencies in directory scanning or device openings.
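As a rough illustration of the devfs approach, an application only needs to read one directory. The sketch below is an assumption-laden example rather than anything taken from cdparanoia: the function name available_devices is hypothetical, and on a system without devfs the directory simply will not exist.

```python
import os


def available_devices(directory="/dev/sr"):
    """Return the sorted entry names found in a devfs device directory.

    On a devfs system, opening the directory is what would trigger module
    autoloading; here it is just an ordinary directory read.
    """
    try:
        with os.scandir(directory) as entries:
            return sorted(entry.name for entry in entries)
    except FileNotFoundError:
        # No such directory: treat it as "no devices of this class".
        return []
```

No device node is opened at any point, which is the saving the text describes.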
Who else does it?
FreeBSD has a devfs implementation. Solaris and AIX each have a pseudo-devfs (something akin to scsidev but for all devices, with some unspecified kernel support). BeOS, Plan9 and QNX also have it. SGI's IRIX 6.4 and above also have a device filesystem.
While we shouldn't just automatically do something because others do it, we should not ignore the work of others either. FreeBSD has a lot of competent people working on it, so their opinion should not be blithely ignored.
How it works
Registering device entries
For every entry (device node) in a devfs-based /dev a driver must call devfs_register(). This adds the name of the device entry, the file_operations structure pointer and a few other things to an internal table. Device entries may be added and removed at any time. When a device entry is registered, it automagically appears in every mounted devfs.
Inode lookup
When a lookup operation is performed on an entry for which there is no driver information, devfs will attempt to call devfsd. If driver information still cannot be found, a negative dentry is yielded and the next stage operation will be called by the VFS (such as the create() or mknod() inode methods). If driver information can be found, an inode is created (if one does not exist already) and all is well.
Manually creating device nodes
The mknod() method allows you to create an ordinary named pipe in the devfs, or you can create a character or block special inode if one does not already exist. You may wish to create a character or block special inode so that you can set permissions and ownership. Later, if a device driver registers an entry with the same name, the permissions, ownership and times are retained. This is how you can set the protections on a device even before the driver is loaded. Once you create an inode it appears in the directory listing.
Unregistering device entries
A device driver calls devfs_unregister() to unregister an entry.
Chroot() gaols
2.2.x kernels
The semantics of inode creation are different when devfs is mounted with the "explicit" option. Now, when a device entry is registered, it will not appear until you use mknod() to create the device. It doesn't matter if you mknod() before or after the device is registered with devfs_register(). The purpose of this behaviour is to support chroot(2) gaols, where you want to mount a minimal devfs inside the gaol. Only the devices you specifically want to be available (through your mknod() setup) will be accessible.
2.4.x kernels
As of kernel 2.3.99, the VFS has had the ability to rebind parts of the global filesystem namespace into another part of the namespace. This now works even at the leaf-node level, which means that individual files and device nodes may be bound into other parts of the namespace. This is like making links, but better, because it works across filesystems (unlike hard links) and works through chroot() gaols (unlike symbolic links).
Because of these improvements to the VFS, the multi-mount capability in devfs is no longer needed. The administrator may create a minimal device tree inside a chroot(2) gaol by using VFS bindings. As this provides most of the features of the devfs multi-mount capability, I removed the multi-mount support code (after issuing an RFC). This yielded code size reductions and simplifications.
If you want to construct a minimal chroot() gaol, the following command should suffice:
    mount --bind /dev/null /gaol/dev/null
Repeat for other device nodes you want to expose. Simple!
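Repeating the bind mount for several nodes is easy to script. The sketch below only builds the command lines and does not run them; the function name gaol_bind_commands and the example device list are hypothetical. Actually running the commands would need root, and the target paths inside the gaol generally have to exist beforehand.

```python
def gaol_bind_commands(gaol, devices):
    """Build one 'mount --bind' argument list per device node to expose."""
    gaol = gaol.rstrip("/")
    return [
        ["mount", "--bind", f"/dev/{name}", f"{gaol}/dev/{name}"]
        for name in devices
    ]


if __name__ == "__main__":
    # Hypothetical selection of nodes for a minimal gaol.
    for argv in gaol_bind_commands("/gaol", ["null", "zero"]):
        print(" ".join(argv))
```

Each returned list could be handed to subprocess.run() by a boot script that is already running as root.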
Operational issues
Instructions for the impatient
Nobody likes reading documentation. People just want to get in there and play. So this section tells you quickly the steps you need to take to run with devfs mounted over /dev. Skip these steps and you will end up with a nearly unbootable system. Subsequent sections describe the issues in more detail, and discuss non-essential configuration options.
Devfsd
OK, if you're reading this, I assume you want to play with devfs. First you should ensure that /usr/src/linux contains a recent kernel source tree. Then you need to compile devfsd, the device management daemon, available at http://www.atnf.csiro.au/~rgooch/linux/. Because the kernel has a naming scheme which is quite different from the old naming scheme, you need to install devfsd so that software and configuration files that use the old naming scheme will not break.
Compile and install devfsd. You will be provided with a default configuration file /etc/devfsd.conf which will provide compatibility symlinks for the old naming scheme. Don't change this config file unless you know what you're doing. Even if you think you do know what you're doing, don't change it until you've followed all the steps below and booted a devfs-enabled system and verified that it works.
Now edit your main system boot script so that devfsd is started at the very beginning (before any filesystem checks). /etc/rc.d/rc.sysinit is often the main boot script on systems with SysV-style boot scripts. On systems with BSD-style boot scripts it is often /etc/rc. Also check /sbin/rc.
NOTE that the line you put into the boot script should be exactly:
    /sbin/devfsd /dev
DO NOT use some special daemon-launching programme, otherwise the boot script may not wait for devfsd to finish initialising.
System Libraries
There may still be some problems because of broken software making assumptions about device names. In particular, some software does not handle devices which are symbolic links. If you are running a libc 5 based system, install libc 5.4.44 (if you have libc 5.4.46, go back to libc 5.4.44, which is actually correct). If you are running a glibc based system, make sure you have glibc 2.1.3 or later.
/etc/securetty
PAM (Pluggable Authentication Modules) is supposed to be a flexible mechanism for providing better user authentication and access to services. Unfortunately, it's also fragile, complex and undocumented (check out RedHat 6.1, and probably other distributions as well). PAM has problems with symbolic links. Append the following lines to your /etc/securetty file:
    vc/1
    vc/2
    vc/3
    vc/4
    vc/5
    vc/6
    vc/7
    vc/8
This will not weaken security. If you have a version of util-linux earlier than 2.10.h, please upgrade to 2.10.h or later. If you absolutely cannot upgrade, then also append the following lines to your /etc/securetty file:
    1
    2
    3
    4
    5
    6
    7
    8
This may potentially weaken security by allowing root logins over the network (a password is still required, though). However, since there are problems with dealing with symlinks, I'm suspicious of the level of security offered in any case.
XFree86
While not essential, it's probably a good idea to upgrade to XFree86 4.0, as patches went in to make it more devfs-friendly. If you don't, you'll probably need to apply the following patch to /etc/security/console.perms so that ordinary users can run startx. Note that not all distributions have this file (e.g. Debian), so if it's not present, don't worry about it.
    --- /etc/security/console.perms.orig  Sat Apr 17 16:26:47 1999
    +++ /etc/security/console.perms  Fri Feb 25 23:53:55 2000
    @@ -14,7 +14,7 @@
     # man 5 console.perms

     # file classes -- these are regular expressions
    -<console>=tty[0-9][0-9]* :[0-9].[0-9] :[0-9]
    +<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9].[0-9] :[0-9]

     # device classes -- these are shell-style globs
     <floppy>=/dev/fd[0-1]*
If the patch does not apply, then change the line:
    <console>=tty[0-9][0-9]* :[0-9].[0-9] :[0-9]
with:
    <console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9].[0-9] :[0-9]
Disable devpts
I've had a report of devpts mounted on /dev/pts not working correctly. Since devfs will also manage /dev/pts, there is no need to mount devpts as well. You should either edit your /etc/fstab so devpts is not mounted, or disable devpts from your kernel configuration.
Unsupported drivers
Not all drivers have devfs support. If you depend on one of these drivers, you will need to create a script or tarfile that you can use at boot time to create device nodes as appropriate. There is a section which describes this. Another section lists the drivers which have devfs support.
/dev/mouse
Many distributions configure /dev/mouse to be the mouse device for XFree86 and GPM. I actually think this is a bad idea, because it adds another level of indirection. When looking at a config file, if you see /dev/mouse you're left wondering which mouse is being referred to. Hence I recommend putting the actual mouse device (for example /dev/psaux) into your /etc/X11/XF86Config file (and similarly for the GPM configuration file).
Alternatively, use the same technique used for unsupported drivers described above.
The Kernel
Finally, you need to make sure devfs is compiled into your kernel. Set CONFIG_EXPERIMENTAL=y, CONFIG_DEVFS_FS=y and CONFIG_DEVFS_MOUNT=y using your favourite configuration tool (e.g. make config or make xconfig), then run make dep; make clean, and then recompile your kernel and modules. At boot, devfs will be mounted onto /dev.
If you encounter problems booting (for example if you forgot a configuration step), you can pass devfs=nomount at the kernel boot command line. This will prevent the kernel from mounting devfs at boot time onto /dev.
In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting devfs onto /dev is completely safe, and requires no configuration changes. One exception to take note of is when LABEL= directives are used in /etc/fstab. In this case you will be unable to boot properly. This is because the mount(8) programme uses /proc/partitions as part of the volume label search process, and the device names it finds are not available, because setting CONFIG_DEVFS_FS=y changes the names in /proc/partitions, irrespective of whether devfs is mounted.
Now you've finished all the steps required. You're now ready to boot your shiny new kernel. Enjoy.
Changing the configuration
OK, you've now booted a devfs-enabled system, and everything works. Now you may feel like changing the configuration (common targets are /etc/fstab and /etc/devfsd.conf). Since you have a system that works, if you make any changes and it doesn't work, you now know that you only have to restore your configuration files to the default and it will work again.
Permissions persistence across reboots
If you don't use mknod(2) to create a device file, nor use chmod(2) or chown(2) to change the ownerships/permissions, the inode ctime will remain at 0 (the epoch, 12 am, 1-JAN-1970, GMT). Anything with a ctime later than this has had its ownership/permissions changed. Hence, a simple script or programme may be used to tar up all changed inodes, prior to shutdown. Although effective, many consider this approach a kludge.
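The selection step of such a script could look like the sketch below. It only finds the changed inodes and leaves the archiving to whatever tool you prefer. The function name changed_nodes and the epoch parameter are hypothetical, and the ctime test is only meaningful on a devfs mount, because ordinary filesystems give every inode a non-zero ctime.

```python
import os


def changed_nodes(root, epoch=0):
    """Return paths under root whose inode ctime is later than epoch."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat() so that symlinks are examined, not followed.
            if os.lstat(path).st_ctime > epoch:
                changed.append(path)
    return sorted(changed)
```

The resulting list could then be passed to tar just before shutdown.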
A much better approach is to use devfsd to save and restore permissions. It may be configured to record changes in permissions and will save them in a database (in fact a directory tree), and restore these upon boot. This is an efficient method and results in immediate saving of current permissions (unlike the tar approach, which saves permissions at some unspecified future time).
The default configuration file supplied with devfsd has config entries which you may uncomment to enable persistence management.
If you decide to use the tar approach anyway, be aware that tar will first unlink(2) an inode before creating a new device node. The unlink(2) has the effect of breaking the connection between a devfs entry and the device driver. If you use the "devfs=only" boot option, you lose access to the device driver, requiring you to reload the module. I consider this a bug in tar (there is no real need to unlink(2) the inode first).
Alternatively, you can use devfsd to provide more sophisticated management of device permissions. You can use devfsd to store permissions for whole groups of devices with a single configuration entry, rather than the conventional single entry per device entry.
Permissions database stored in mounted-over /dev
If you wish to save and restore your device permissions into the disc-based /dev while still mounting devfs onto /dev you may do so. This requires a 2.4.x kernel (in fact, 2.3.99 or later), which has the VFS binding facility. You need to do the following to set this up:
- make sure the kernel does not mount devfs at boot time
- make sure you have a correct /dev/console entry in your root file-system (where your disc-based /dev lives)
- create the /dev-state directory
- add the following lines near the very beginning of your boot scripts:
    mount --bind /dev /dev-state
    mount -t devfs none /dev
    devfsd /dev
- add the following lines to your /etc/devfsd.conf file:
    REGISTER    ^pt[sy]    IGNORE
    CREATE      ^pt[sy]    IGNORE
    CHANGE      ^pt[sy]    IGNORE
    DELETE      ^pt[sy]    IGNORE
    REGISTER    .*         COPY    /dev-state/$devname $devpath
    CREATE      .*         COPY    $devpath /dev-state/$devname
    CHANGE      .*         COPY    $devpath /dev-state/$devname
    DELETE      .*         CFUNCTION GLOBAL unlink /dev-state/$devname
    RESTORE     /dev-state
  Note that the sample devfsd.conf file contains these lines, as well as other sample configurations you may find useful. See the devfsd distribution.
- reboot.
Permissions database stored in normal directory
If you are using an older kernel which doesn't support VFS binding, then you won't be able to have the permissions database in a mounted-over /dev. However, you can still use a regular directory to store the database. The sample /etc/devfsd.conf file above may still be used. You will need to create the /dev-state directory prior to installing devfsd. If you have old permissions in /dev, then just copy (or move) the device nodes over to the new directory.
Which method is better?
The best method is to have the permissions database stored in the mounted-over /dev. This is because you will not need to copy device nodes over to /dev-state, and because it allows you to switch between devfs and non-devfs kernels, without requiring you to copy permissions between /dev-state (for devfs) and /dev (for non-devfs).
Dealing with drivers without devfs support
Currently, not all device drivers in the kernel have been modified to use devfs. Device drivers which do not yet have devfs support will not automagically appear in devfs. The simplest way to create device nodes for these drivers is to unpack a tarfile containing the required device nodes. You can do this in your boot scripts. All your drivers will now work as before.
Hopefully for most people devfs will have enough support so that they can mount devfs directly over /dev without losing most functionality (i.e. losing access to various devices). As of 22-JAN-1998 (devfs patch version 10) I am now running this way. All the devices I have are available in devfs, so I don't lose anything.
WARNING: if your configuration requires the old-style device names (e.g. /dev/hda1 or /dev/sda1), you must install devfsd and configure it to maintain compatibility entries. It is almost certain that you will require this. Note that the kernel creates a compatibility entry for the root device, so you don't need initrd.
Note that you no longer need to mount devpts if you use Unix98 PTYs, as devfs can manage /dev/pts itself. This saves you some RAM, as you don't need to compile and install devpts. Note that some versions of glibc have a bug with Unix98 pty handling on devfs systems. Contact the glibc maintainers for a fix. Glibc 2.1.3 has the fix.
Note also that apart from editing /etc/fstab, other things will need to be changed if you *don't* install devfsd. Some software (like the X server) hard-wire device names in their source. It really is much easier to install devfsd so that compatibility entries are created. You can then slowly migrate your system to using the new device names (for example, by starting with /etc/fstab), and then limiting the compatibility entries that devfsd creates.
IF YOU CONFIGURE TO MOUNT DEVFS AT BOOT, MAKE SURE YOU INSTALL DEVFSD BEFORE YOU BOOT A DEVFS-ENABLED KERNEL!
Now that devfs has gone into the 2.3.46 kernel, I'm getting a lot of reports back. Many of these are because people are trying to run without devfsd, and hence some things break. Please just run devfsd if things break. I want to concentrate on real bugs rather than misconfiguration problems at the moment. If people are willing to fix bugs/false assumptions in other code (i.e. glibc, X server) and submit that to the respective maintainers, that would be great.
All the way with Devfs
The devfs kernel patch creates a rationalised device tree. As stated above, if you want to keep using the old /dev naming scheme, you just need to configure devfsd appropriately (see the man page). People who prefer the old names can ignore this section. For those of us who like the rationalised names and an uncluttered /dev, read on.
If you don't run devfsd, or don't enable compatibility entry management, then you will have to configure your system to use the new names. For example, you will then need to edit your /etc/fstab to use the new disc naming scheme. If you want to be able to boot non-devfs kernels, you will need compatibility symlinks in the underlying disc-based /dev pointing back to the old-style names for when you boot a kernel without devfs.
You can selectively decide which devices you want compatibility entries for. For example, you may only want compatibility entries for BSD pseudo-terminal devices (otherwise you'll have to patch your C library or use Unix98 ptys instead). It's just a matter of putting the correct regular expression into /etc/devfsd.conf.
There are other choices of naming schemes that you may prefer. For example, I don't use the kernel-supplied names, because they are too verbose. A common misconception is that the kernel-supplied names are meant to be used directly in configuration files. This is not the case. They are designed to reflect the layout of the devices attached and to provide easy classification.
If you like the kernel-supplied names, that's fine. If you don't then you should be using devfsd to construct a namespace more to your liking. Devfsd has built-in code to construct a namespace that is both logical and easy to manage. In essence, it creates a convenient abbreviation of the kernel-supplied namespace.
You are of course free to build your own namespace. Devfsd has all the infrastructure required to make this easy for you. All you need do is write a script. You can even write some C code and devfsd can load the shared object as a callable extension.
Other Issues
The init programme
Another thing to take note of is whether your init programme creates a Unix socket /dev/telinit. Some versions of init create /dev/telinit so that the telinit programme can communicate with the init process. If you have such a system you need to make sure that devfs is mounted over /dev *before* init starts. In other words, you can't leave the mounting of devfs to /etc/rc, since this is executed after init. Other versions of init require a named pipe /dev/initctl which must exist *before* init starts. Once again, you need to mount devfs and then create the named pipe *before* init starts.
The default behaviour now is not to mount devfs onto /dev at boot time for 2.3.x and later kernels. You can correct this with the "devfs=mount" boot option. This solves any problems with init, and also prevents the dreaded "Cannot open initial console" message. For 2.2.x kernels where you need to apply the devfs patch, the default is to mount.
If you have automatic mounting of devfs onto /dev then you may need to create /dev/initctl in your boot scripts. The following lines should suffice:
    mknod /dev/initctl p
    kill -SIGUSR1 1       # tell init that /dev/initctl now exists
Alternatively, if you don't want the kernel to mount devfs onto /dev then you could use the following procedure as a guideline for how to get around /dev/initctl problems:
    # cd /sbin
    # mv init init.real
    # cat > init
    #! /bin/sh
    mount -n -t devfs none /dev
    mknod /dev/initctl p
    exec /sbin/init.real $*
    [control-D]
    # chmod a+x init
Note that newer versions of init create /dev/initctl automatically, so you don't have to worry about this.
Module autoloading
You will need to configure devfsd to enable module autoloading. The following lines should be placed in your /etc/devfsd.conf file:
    LOOKUP    .*    MODLOAD
As of devfsd-v1.3.10, a generic /etc/modules.devfs configuration file is installed, which is used by the MODLOAD action. This should be sufficient for most configurations. If you require further configuration, edit your /etc/modules.conf file. The way module autoloading works with devfs is:
- a process attempts to lookup a device node (e.g. /dev/fred)
- if that device node does not exist, the full pathname is passed to devfsd as a string
- devfsd will pass the string to the modprobe programme (provided the configuration line shown above is present), and specifies that /etc/modules.devfs is the configuration file
- /etc/modules.devfs includes /etc/modules.conf to access local configurations
- modprobe will search its configuration files, looking for an alias that translates the pathname into a module name
- the translated pathname is then used to load the module.
If you wanted a lookup of /dev/fred to load the mymod module, you would require the following configuration line in /etc/modules.conf:
    alias    /dev/fred    mymod
The /etc/modules.devfs configuration file provides many such aliases for standard device names. If you look closely at this file, you will note that some modules require multiple alias configuration lines. This is required to support module autoloading for old and new device names.
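The pathname-to-module translation can be illustrated with a small lookup table. This is a simplified sketch and not how modprobe is implemented: the function names parse_aliases and module_for are hypothetical, only exact-match alias lines are handled, and treating "#" as a comment marker is an assumption about the configuration file syntax.

```python
def parse_aliases(text):
    """Collect 'alias <name> <module>' lines into a dict."""
    aliases = {}
    for line in text.splitlines():
        # Assumption: everything after '#' is a comment.
        fields = line.split("#", 1)[0].split()
        if len(fields) == 3 and fields[0] == "alias":
            aliases[fields[1]] = fields[2]
    return aliases


def module_for(pathname, aliases):
    """Translate a looked-up pathname into a module name, or None."""
    return aliases.get(pathname)


if __name__ == "__main__":
    config = "alias /dev/fred mymod\n"
    print(module_for("/dev/fred", parse_aliases(config)))
```

With the single alias line from the text, a lookup of /dev/fred resolves to mymod and any other pathname resolves to nothing.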
Mounting root off a devfs device
If you wish to mount root off a devfs device when you pass the "devfs=only" boot option, then you need to pass in the "root=<device>" option to the kernel when booting. If you use LILO, then you must have this in lilo.conf:
    append = "root=<device>"
Surprised? Yep, so was I. It turns out if you have (as most people do):
    root = <device>
then LILO will determine the device number of <device> and will write that device number into a special place in the kernel image before starting the kernel, and the kernel will use that device number to mount the root filesystem. So, using the "append" variety ensures that LILO passes the root filesystem device as a string, which devfs can then use.
Note that this isn't an issue if you don't pass "devfs=only".
TTY issues
The ttyname(3) function in some versions of the C library makes false assumptions about device entries which are symbolic links. The tty(1) programme is one that depends on this function. I've written a patch to libc 5.4.43 which fixes this. This has been included in libc 5.4.44 and a similar fix is in glibc 2.1.3.
Kernel Naming Scheme
The kernel provides a default naming scheme. This scheme is designed to make it easy to search for specific devices or device types, and to view the available devices. Some device types (such as hard discs), have a directory of entries, making it easy to see what devices of that class are available. Often, the entries are symbolic links into a directory tree that reflects the topology of available devices. The topological tree is useful for finding how your devices are arranged.
Below is a list of the naming schemes for the most common drivers. A list of reserved device names is available for reference. Please send email to rgooch at atnf.csiro.au to obtain an allocation. Please be patient (the maintainer is busy). An alternative name may be allocated instead of the requested name, at the discretion of the maintainer.
Disc Devices
All discs, whether SCSI, IDE or whatever, are placed under the /dev/discs hierarchy:
    /dev/discs/disc0    first disc
    /dev/discs/disc1    second disc
Each of these entries is a symbolic link to the directory for that device. The device directory contains:
    disc     for the whole disc
    part*    for individual partitions
CD-ROM Devices
All CD-ROMs, whether SCSI, IDE or whatever, are placed under the /dev/cdroms hierarchy:
    /dev/cdroms/cdrom0    first CD-ROM
    /dev/cdroms/cdrom1    second CD-ROM
Each of these entries is a symbolic link to the real device entry for that device.
Tape Devices
All tapes, whether SCSI, IDE or whatever, are placed under the /dev/tapes hierarchy:
    /dev/tapes/tape0    first tape
    /dev/tapes/tape1    second tape
Each of these entries is a symbolic link to the directory for that device. The device directory contains:
    mt      for mode 0
    mtl     for mode 1
    mtm     for mode 2
    mta     for mode 3
    mtn     for mode 0, no rewind
    mtln    for mode 1, no rewind
    mtmn    for mode 2, no rewind
    mtan    for mode 3, no rewind
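The tape entry names follow a regular pattern: a per-mode base name, plus an "n" suffix for the no-rewind variant. A minimal sketch of that pattern, with a hypothetical function name tape_entry:

```python
# Base entry name for each tape mode, as listed in the table above.
MODE_NAMES = {0: "mt", 1: "mtl", 2: "mtm", 3: "mta"}


def tape_entry(mode, rewind=True):
    """Return the entry name inside a tape device directory."""
    name = MODE_NAMES[mode]  # raises KeyError for modes other than 0..3
    return name if rewind else name + "n"
```

For example, tape_entry(1, rewind=False) gives the mode 1, no rewind entry.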
SCSI Devices
To uniquely identify any SCSI device requires the following information:
    controller    (host adapter)
    bus           (SCSI channel)
    target        (SCSI ID)
    unit          (Logical Unit Number)
All SCSI devices are placed under /dev/scsi (assuming devfs is mounted on /dev). Hence, a SCSI device with the following parameters: c=1,b=2,t=3,u=4 would appear as:
    /dev/scsi/host1/bus2/target3/lun4    device directory
Inside this directory, a number of device entries may be created, depending on which SCSI device-type drivers were installed.
See the section on the disc naming scheme to see what entries the SCSI disc driver creates.
See the section on the tape naming scheme to see what entries the SCSI tape driver creates.
The SCSI CD-ROM driver creates:
    cd
The SCSI generic driver creates:
    generic
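The mapping from the four SCSI parameters to the device directory is purely mechanical. A small sketch, where the function name scsi_device_dir and the devfs_root parameter are hypothetical and devfs is assumed to be mounted on /dev:

```python
def scsi_device_dir(controller, bus, target, unit, devfs_root="/dev"):
    """Build the kernel-supplied directory name for one SCSI device."""
    return (
        f"{devfs_root}/scsi/host{controller}/bus{bus}"
        f"/target{target}/lun{unit}"
    )


if __name__ == "__main__":
    # The c=1,b=2,t=3,u=4 example from the text.
    print(scsi_device_dir(1, 2, 3, 4))
```

The entries inside that directory (disc, part*, cd, generic and so on) depend on which device-type drivers are present, so they are not generated here.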
IDE Devices
To uniquely identify any IDE device requires the following information:
    controller
    bus       (aka. primary/secondary)
    target    (aka. master/slave)
    unit
All IDE devices are placed under /dev/ide, and use a similar naming scheme to the SCSI subsystem.
XT Hard Discs
All XT discs are placed under /dev/xd. The first XT disc has the directory /dev/xd/disc0.
TTY devices
The tty devices now appear as:
    New name               Old-name              Device Type
    --------               --------              -----------
    /dev/tts/{0,1,...}     /dev/ttyS{0,1,...}    Serial ports
    /dev/cua/{0,1,...}     /dev/cua{0,1,...}     Call out devices
    /dev/vc/0              /dev/tty              Current virtual console
    /dev/vc/{1,2,...}      /dev/tty{1...63}      Virtual consoles
    /dev/vcc/{0,1,...}     /dev/vcs{1...63}      Virtual consoles
    /dev/pty/m{0,1,...}    /dev/ptyp??           PTY masters
    /dev/pty/s{0,1,...}    /dev/ttyp??           PTY slaves
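Part of this table can be expressed as a simple renaming rule. The sketch below covers only the rows whose numbering carries over unchanged (serial ports, call out devices and virtual consoles); the vcs and pty rows are left out because the table does not spell out how their indices map. The function name new_tty_name is hypothetical.

```python
import re

# Old name prefix -> new directory, for rows that keep their number.
_DIRECTORIES = {"ttyS": "tts", "cua": "cua", "tty": "vc"}


def new_tty_name(old):
    """Translate an old-style tty name to its devfs name, or None."""
    if old == "/dev/tty":
        return "/dev/vc/0"  # current virtual console, per the table
    match = re.fullmatch(r"/dev/(ttyS|cua|tty)(\d+)", old)
    if not match:
        return None
    kind, number = match.group(1), int(match.group(2))
    # The table lists virtual consoles as tty1 through tty63 only.
    if kind == "tty" and not 1 <= number <= 63:
        return None
    return f"/dev/{_DIRECTORIES[kind]}/{number}"
```

Names the sketch does not recognise, such as /dev/ttyp0, come back as None rather than being guessed.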
RAMDISCS
The RAMDISCS are placed in their own directory, and are named thus:
    /dev/rd/{0,1,2,...}
Meta Devices
The meta devices are placed in their own directory, and are named thus:
    /dev/md/{0,1,2,...}
Floppy discs
Floppy discs are placed in the /dev/floppy directory.
Loop devices
Loop devices are placed in the /dev/loop directory.
Sound devices
Sound devices are placed in the /dev/sound directory (audio, sequencer, ...).
Devfsd Naming Scheme
Devfsd provides a naming scheme which is a convenient abbreviation of the kernel-supplied namespace. In some cases, the kernel-supplied naming scheme is quite convenient, so devfsd does not provide another naming scheme. The convenience names that devfsd creates are in fact the same names as the original devfs kernel patch created (before Linus mandated the Big Name Change). These are referred to as "new compatibility entries".
In order to configure devfsd to create these convenience names, the following lines should be placed in your /etc/devfsd.conf:
    REGISTER      .*    MKNEWCOMPAT
    UNREGISTER    .*    RMNEWCOMPAT
This will cause devfsd to create (and destroy) symbolic links which point to the kernel-supplied names.
SCSI Hard Discs
All SCSI discs are placed under /dev/sd (assuming devfs is mounted on /dev). Hence, a SCSI disc with the following parameters: c=1,b=2,t=3,u=4 would appear as:
/dev/sd/c1b2t3u4        for the whole disc
/dev/sd/c1b2t3u4p5      for the 5th partition
/dev/sd/c1b2t3u4p5s6    for the 6th slice in the 5th partition
SCSI Tapes
All SCSI tapes are placed under /dev/st. A similar naming scheme is used as for SCSI discs. A SCSI tape with the parameters: c=1,b=2,t=3,u=4 would appear as:
/dev/st/c1b2t3u4m0      for mode 0
/dev/st/c1b2t3u4m1      for mode 1
/dev/st/c1b2t3u4m2      for mode 2
/dev/st/c1b2t3u4m3      for mode 3
/dev/st/c1b2t3u4m0n     for mode 0, no rewind
/dev/st/c1b2t3u4m1n     for mode 1, no rewind
/dev/st/c1b2t3u4m2n     for mode 2, no rewind
/dev/st/c1b2t3u4m3n     for mode 3, no rewind
SCSI CD-ROMs
All SCSI CD-ROMs are placed under /dev/sr. A similar naming scheme is used as for SCSI discs. A SCSI CD-ROM with the parameters: c=1,b=2,t=3,u=4 would appear as: /dev/sr/c1b2t3u4
SCSI Generic Devices
The generic (aka. raw) interface for all SCSI devices is placed under /dev/sg. A similar naming scheme is used as for SCSI discs. A SCSI generic device with the parameters: c=1,b=2,t=3,u=4 would appear as: /dev/sg/c1b2t3u4
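The SCSI convenience names above all follow one pattern, which can be sketched in a few lines of Python. This is only an illustration: the function name devfsd_scsi_name and its parameters are made up here and are not part of devfsd.

```python
def devfsd_scsi_name(subdir, c, b, t, u, suffix=""):
    """Build a devfsd-style SCSI name such as /dev/sd/c1b2t3u4p5.

    subdir is one of the directories named above (sd, st, sr, sg).
    suffix carries the optional tail: "p5", "p5s6", "m0", "m0n", ...
    This helper is hypothetical; it only mirrors the naming pattern.
    """
    return f"/dev/{subdir}/c{c}b{b}t{t}u{u}{suffix}"


whole_disc = devfsd_scsi_name("sd", 1, 2, 3, 4)
slice_name = devfsd_scsi_name("sd", 1, 2, 3, 4, "p5s6")
tape_no_rewind = devfsd_scsi_name("st", 1, 2, 3, 4, "m0n")
print(whole_disc, slice_name, tape_no_rewind)
```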
IDE Hard Discs
All IDE discs are placed under /dev/ide/hd, using a similar convention to SCSI discs. The following mappings exist between the new and the old names:
/dev/hda        /dev/ide/hd/c0b0t0u0
/dev/hdb        /dev/ide/hd/c0b0t1u0
/dev/hdc        /dev/ide/hd/c0b1t0u0
/dev/hdd        /dev/ide/hd/c0b1t1u0
IDE Tapes
A similar naming scheme is used as for IDE discs. The entries will appear in the /dev/ide/mt directory.
IDE CD-ROM
A similar naming scheme is used as for IDE discs. The entries will appear in the /dev/ide/cd directory.
IDE Floppies
A similar naming scheme is used as for IDE discs. The entries will appear in the /dev/ide/fd directory.
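The four IDE disc mappings listed above (/dev/hda to /dev/hdd) follow a regular pattern: the drive letter selects the bus (primary/secondary) and the target (master/slave). A small Python sketch of that pattern follows. The function name is hypothetical, and it deliberately refuses anything beyond /dev/hdd, because only those four mappings are given here.

```python
OLD_PREFIX = "/dev/hd"


def ide_devfsd_name(old_name):
    """Translate /dev/hda../dev/hdd into the devfsd-style name.

    Hypothetical helper: only the four mappings listed in the text
    are supported, all on controller 0, unit 0.
    """
    if not old_name.startswith(OLD_PREFIX) or len(old_name) != len(OLD_PREFIX) + 1:
        raise ValueError(f"not an old-style IDE disc name: {old_name}")
    index = ord(old_name[-1]) - ord("a")
    if not 0 <= index <= 3:
        raise ValueError(f"no mapping listed for {old_name}")
    bus, target = divmod(index, 2)  # a,b -> bus 0; c,d -> bus 1
    return f"/dev/ide/hd/c0b{bus}t{target}u0"


for old in ("/dev/hda", "/dev/hdb", "/dev/hdc", "/dev/hdd"):
    print(old, "->", ide_devfsd_name(old))
```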
XT Hard Discs
All XT discs are placed under /dev/xd. The first XT disc would appear as /dev/xd/c0t0.
Old Compatibility Names
The old compatibility names are the legacy device names, such as /dev/hda, /dev/sda, /dev/rtc and so on. Devfsd can be configured to create compatibility symlinks so that you may continue to use the old names in your configuration files and so that old applications will continue to function correctly.
In order to configure devfsd to create these legacy names, the following lines should be placed in your /etc/devfsd.conf:
REGISTER        .*      MKOLDCOMPAT
UNREGISTER      .*      RMOLDCOMPAT
This will cause devfsd to create (and destroy) symbolic links which point to the kernel-supplied names.
SCSI Host Probing Issues
Devfs allows you to identify SCSI discs based in part on SCSI host numbers. If you have only one SCSI host (card) in your computer, then clearly it will be given host number 0. Life is not always that easy if you have multiple SCSI hosts. Unfortunately, it can sometimes be difficult to guess what the probing order of SCSI hosts is. You need to know the probe order before you can use device names. To make this easy, there is a kernel boot parameter called "scsihosts". This allows you to specify the probe order for different types of SCSI hosts. The syntax of this parameter is:
scsihosts=<name_1>:<name_2>:<name_3>:...:<name_n>
where <name_1>,<name_2>,...,<name_n> are the names of drivers used in the /proc filesystem. For example:
scsihosts=aha1542:ppa:aha1542::ncr53c7xx
means that devices connected to
- first aha1542 controller - will be /dev/scsi/host0/bus#/target#/lun#
- first parallel port ZIP - will be /dev/scsi/host1/bus#/target#/lun#
- second aha1542 controller - will be /dev/scsi/host2/bus#/target#/lun#
- first NCR53C7xx controller - will be /dev/scsi/host4/bus#/target#/lun#
- any extra controller - will be /dev/scsi/host5/bus#/target#/lun#, /dev/scsi/host6/bus#/target#/lun#, etc
- if any of the above controllers is not found - the reserved names will not be used by any other device
- /dev/scsi/host3/bus#/target#/lun# names will never be used
You can use ',' instead of ':' as the separator character if you wish. I have used the devfsd naming scheme here.
Note that this scheme does not address the SCSI host order if you have multiple cards of the same type (such as NCR53c8xx). In this case you need to use the driver-specific boot parameters to control this.
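To make the scsihosts example above concrete, here is a rough Python sketch of how the position of each driver name in the parameter corresponds to a host number. It is not the kernel's parser; parse_scsihosts is an invented name, and the only behaviour it models is what the text states: names are separated by ':' or ',', and an empty slot reserves a host number that is never used.

```python
import re


def parse_scsihosts(value):
    """Map host numbers to driver names for a scsihosts= value.

    Hypothetical illustration only. An empty slot maps to None,
    meaning that host number is reserved and never used.
    """
    names = re.split(r"[:,]", value)
    return {number: (name or None) for number, name in enumerate(names)}


order = parse_scsihosts("aha1542:ppa:aha1542::ncr53c7xx")
for number, driver in order.items():
    label = driver if driver else "(reserved, never used)"
    print(f"/dev/scsi/host{number}: {label}")
```

Any controller not named in the parameter would then take the next free number after the last listed slot (host5 onwards in this example).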
Device drivers currently ported
- All miscellaneous character devices support devfs (this is done transparently through misc_register())
- SCSI discs and generic hard discs
- Character memory devices (null, zero, full and so on) Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
- Loop devices (/dev/loop?)
- TTY devices (console, serial ports, terminals and pseudo-terminals) Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
- SCSI tapes (/dev/scsi and /dev/tapes)
- SCSI CD-ROMs (/dev/scsi and /dev/cdroms)
- SCSI generic devices (/dev/scsi)
- RAMDISCS (/dev/ram?)
- Meta Devices (/dev/md*)
- Floppy discs (/dev/floppy)
- Parallel port printers (/dev/printers)
- Sound devices (/dev/sound) Thanks to Eric Dumas <dumas@linux.eu.org> and C. Scott Ananian <cananian@alumni.princeton.edu>
- Joysticks (/dev/joysticks)
- Sparc keyboard (/dev/kbd)
- DSP56001 digital signal processor (/dev/dsp56k)
- Apple Desktop Bus (/dev/adb)
- Coda network file system (/dev/cfs*)
- Virtual console capture devices (/dev/vcc) Thanks to Dennis Hou <smilax@mindmeld.yi.org>
- Frame buffer devices (/dev/fb)
- Video capture devices (/dev/v4l)
Allocation of Device Numbers
Devfs allows you to write a driver which doesn't need to allocate a device number (major&minor numbers) for the internal operation of the kernel. However, there are a number of userspace programmes that use the device number as a unique handle for a device. An example is the find programme, which uses device numbers to determine whether an inode is on a different filesystem than another inode. The device number used is the one for the block device which a filesystem is using. To preserve compatibility with userspace programmes, block devices using devfs need to have unique device numbers allocated to them. Furthermore, POSIX specifies device numbers, so some kind of device number needs to be presented to userspace.
The simplest option (especially when porting drivers to devfs) is to keep using the old major and minor numbers. Devfs will take whatever values are given for major&minor and pass them onto userspace.
Alternatively, you can have devfs choose unique device numbers for you. When you register a character or block device using devfs_register you can provide the optional DEVFS_FL_AUTO_DEVNUM flag, which will then automatically allocate a unique device number (the allocation is separated for the character and block devices). This device number is a 16 bit number, so this leaves plenty of space for large numbers of discs and partitions. This scheme can also be used for character devices, in particular the tty devices, which are currently limited to 256 pseudo-ttys (this limits the total number of simultaneous xterms and remote logins). Note that the device number is limited to the range 36864-61439 (majors 144-239), in order to avoid any possible conflicts with existing official allocations.
Please note that using dynamically allocated block device numbers may break the NFS daemons (both user and kernel mode), which expect dev_t for a given device to be constant over the lifetime of remote mounts.
A final note on this scheme: since it doesn't increase the size of device numbers, there are no compatibility issues with userspace.
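The range quoted above (36864-61439 for majors 144-239) follows directly from packing an 8-bit major and an 8-bit minor into one 16-bit number. The short Python sketch below just reproduces that arithmetic; the helper names are made up for illustration and are not kernel functions.

```python
MINOR_BITS = 8  # 8 bits each for major and minor, 16 bits in total


def make_devnum(major, minor):
    """Pack an 8-bit major and 8-bit minor into a 16-bit device number."""
    if not (0 <= major <= 255 and 0 <= minor <= 255):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << MINOR_BITS) | minor


def split_devnum(devnum):
    """Unpack a 16-bit device number into (major, minor)."""
    return devnum >> MINOR_BITS, devnum & 0xFF


lowest = make_devnum(144, 0)     # first number in the automatic range
highest = make_devnum(239, 255)  # last number in the automatic range
print(lowest, highest, highest - lowest + 1)
```

The last value printed is the count of device numbers in the range, i.e. 96 majors times 256 minors.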
Questions and Answers
Making things work
Here are some common questions and answers.
Devfsd doesn't start
- Make sure you have compiled and installed devfsd
- Make sure devfsd is being started from your boot scripts
- Make sure you have configured your kernel to enable devfs (see below)
- Make sure devfs is mounted (see below)
Devfsd is not managing all my permissions
Make sure you are capturing the appropriate events. For example, device entries created by the kernel generate REGISTER events, but those created by devfsd generate CREATE events.
Devfsd is not capturing all REGISTER events
See the previous entry: you may need to capture CREATE events.
X will not start
Why don't my network devices appear in devfs?
This is not a bug. Network devices have their own, completely separate namespace. They are accessed via socket(2) and setsockopt(2) calls, and thus require no device nodes. I have raised the possibility of moving network devices into the device namespace, but have had no response.
How can I test if I have devfs compiled into my kernel?
All filesystems built-in or currently loaded are listed in /proc/filesystems. If you see a devfs entry, then you know that devfs was compiled into your kernel. If you have correctly configured and rebuilt your kernel, then devfs will be built-in. If you think you've configured it in, but /proc/filesystems doesn't show it, you've made a mistake. Common mistakes include:
- Using a 2.2.x kernel without applying the devfs patch (if you don't know how to patch your kernel, use 2.4.x instead, don't bother asking me how to patch)
- Forgetting to set CONFIG_EXPERIMENTAL=y
- Forgetting to set CONFIG_DEVFS_FS=y
- Forgetting to set CONFIG_DEVFS_MOUNT=y (if you want devfs to be automatically mounted at boot)
- Editing your .config manually, instead of using make config or make xconfig
- Forgetting to run make dep; make clean after changing the configuration and before compiling
- Forgetting to compile your kernel and modules
- Forgetting to install your kernel
- Forgetting to install your modules
Please check twice that you've done all these steps before sending in a bug report.
How can I test if devfs is mounted on /dev?
The device filesystem will always create an entry called ".devfsd", which is used to communicate with the daemon. Even if the daemon is not running, this entry will exist. Testing for the existence of this entry is the approved method of determining if devfs is mounted or not. Note that the type of entry (i.e. regular file, character device, named pipe, etc.) may change without notice. Only the existence of the entry should be relied upon.
When I start devfsd, I see the error: Error opening file: ".devfsd" No such file or directory?
This means that devfs is not mounted. Make sure you have devfs mounted.
How do I mount devfs?
First make sure you have devfs compiled into your kernel (see above). Then you will either need to:
- set CONFIG_DEVFS_MOUNT=y in your kernel config
- pass devfs=mount to your boot loader
- mount devfs manually in your boot scripts with: mount -t devfs none /dev
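The two checks described in the answers above (is devfs listed in /proc/filesystems, and does the ".devfsd" entry exist under the mount point) can be sketched in Python as follows. The function names are invented for this example. The first function takes the text of /proc/filesystems as an argument so it can be tried on any machine; the second tests only for existence, not the entry type, as the text recommends.

```python
import os


def devfs_in_kernel(proc_filesystems_text):
    """True if 'devfs' appears as a filesystem name in the given text.

    Each line of /proc/filesystems is either "<name>" or
    "nodev <name>", so the name is always the last field.
    """
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "devfs":
            return True
    return False


def devfs_mounted(dev_dir="/dev"):
    """True if the .devfsd entry exists under dev_dir (any entry type)."""
    return os.path.lexists(os.path.join(dev_dir, ".devfsd"))


sample = "nodev\tproc\nnodev\tdevfs\n\text2\n"
print(devfs_in_kernel(sample), devfs_mounted("/dev"))
```

On a real system the first function would be fed the contents of /proc/filesystems, for example via open("/proc/filesystems").read().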
Mount by volume LABEL=<label> doesn't work with devfs
Most probably you are not mounting devfs onto /dev. What happens is that if your kernel config has CONFIG_DEVFS_FS=y then the contents of /proc/partitions will have the devfs names (such as scsi/host0/bus0/target0/lun0/part1). The contents of /proc/partitions are used by mount(8) when mounting by volume label. If devfs is not mounted on /dev, then mount(8) will fail to find devices. The solution is to make sure that devfs is mounted on /dev. See above for how to do that.
I have extra or incorrect entries in /dev
You may have stale entries in your dev-state area. Check for a RESTORE configuration line in your devfsd configuration (typically /etc/devfsd.conf). If you have this line, check the contents of the specified directory for stale entries. Remove any entries which are incorrect, then reboot.
I get "Unable to open initial console" messages at boot
This usually happens when you don't have devfs automounted onto /dev at boot time, and there is no valid /dev/console entry on your root file-system. Create a valid /dev/console device node.
Alternatives to devfs
I've attempted to collate all the anti-devfs proposals and explain their limitations. Under construction.
Why not just pass device create/remove events to a daemon?
Here the suggestion is to develop an API in the kernel so that devices can register create and remove events, and a daemon listens for those events. The daemon would then populate/depopulate /dev (which resides on disc).
This has several limitations:
- it only works for modules loaded and unloaded (or devices inserted and removed) after the kernel has finished booting. Without a database of events, there is no way the daemon could fully populate /dev
- if you add a database to this scheme, the question is then how to present that database to user-space. If you make it a list of strings with embedded event codes which are passed through a pipe to the daemon, then this is only of use to the daemon. I would argue that the natural way to present this data is via a filesystem (since many of the events will be of a hierarchical nature), such as devfs. Presenting the data as a filesystem makes it easy for the user to see what is available and also makes it easy to write scripts to scan the "database"
- the tight binding between device nodes and drivers is no longer possible (requiring the otherwise perfectly avoidable table lookups)
- you cannot catch inode lookup events on /dev which means that module autoloading requires device nodes to be created. This is a problem, particularly for drivers where only a few inodes are created from a potentially large set
- this technique can't be used when the root FS is mounted read-only
Just implement a better scsidev
This suggestion involves taking the scsidev programme and extending it to scan for all devices, not just SCSI devices. The scsidev programme works by scanning /proc/scsi.
Problems:
- the kernel does not currently provide a list of all devices available. Not all drivers register entries in /proc or generate kernel messages
- there is no uniform mechanism to register devices other than the devfs API
- implementing such an API is then the same as the proposal above
Put /dev on a ramdisc
This suggestion involves creating a ramdisc and populating it with device nodes and then mounting it over /dev.
Problems:
- this doesn't help when mounting the root filesystem, since you still need a device node to do that
- if you want to use this technique for the root device node as well, you need to use initrd. This complicates the booting sequence and makes it significantly harder to administer and configure. The initrd is essentially opaque, robbing the system administrator of easy configuration
- insufficient information is available to correctly populate the ramdisc. So we come back to the proposal above to "solve" this
- a ramdisc-based solution would take more kernel memory, since the backing store would be (at best) normal VFS inodes and dentries, which take 284 bytes and 112 bytes, respectively, for each entry. Compare that to 72 bytes for devfs
Do nothing: there's no problem
Sometimes people can be heard to claim that the existing scheme is fine. This is what they're ignoring:
- device number size (8 bits each for major and minor) is a real limitation, and must be fixed somehow. Systems with large numbers of SCSI devices, for example, will continue to consume the remaining unallocated major numbers. USB will also need to push beyond the 8 bit minor limitation
- simply increasing the device number size is insufficient. Apart from causing a lot of pain, it doesn't solve the management issues of a /dev with thousands or more device nodes
- ignoring the problem of a huge /dev will not make it go away, and dismisses the legitimacy of a large number of people who want a dynamic /dev
- the standard response then becomes: "write a device management daemon", which brings us back to the proposal above
What I don't like about devfs
Here are some common complaints about devfs, and some suggestions and solutions that may make it more palatable for you. I can't please everybody, but I do try :-)
I hate the naming scheme
First, remember that no naming scheme will please everybody. You hate the scheme, others love it. Who's to say who's right and who's wrong? Ultimately, the person who writes the code gets to choose, and what exists now is a combination of the choices made by the devfs author and the kernel maintainer (Linus).
However, not all is lost. If you want to create your own naming scheme, it is a simple matter to write a standalone script, hack devfsd, or write a script called by devfsd. You can create whatever naming scheme you like.
Further, if you want to remove all traces of the devfs naming scheme from /dev, you can mount devfs elsewhere (say /devfs) and populate /dev with links into /devfs. This population can be automated using devfsd if you wish. You can even use the VFS binding facility to make the links, rather than using symbolic links. This way, you don't even have to see the "destination" of these symbolic links.
Devfs puts policy into the kernel
There's already policy in the kernel. Device numbers are in fact policy (why should the kernel dictate what device numbers I use?). Face it, some policy has to be in the kernel. The real difference between device names as policy and device numbers as policy is that no one will use device numbers directly, because device numbers are devoid of meaning to humans and are ugly. At least with the devfs device names, (even though you can add your own naming scheme) some people will use the devfs-supplied names directly. This offends some people :-)
Devfs is bloatware
This is not even remotely true. As shown above, both code and data size are quite modest.
How to report bugs
If you have (or think you have) a bug with devfs, please follow the steps below:
- make sure you have enabled debugging output when configuring your kernel. You will need to set (at least) the following config options:
- CONFIG_DEVFS_DEBUG=y
- CONFIG_DEBUG_KERNEL=y
- CONFIG_DEBUG_SLAB=y
- please make sure you have the latest devfs patches applied. The latest kernel version might not have the latest devfs patches applied yet (Linus is very busy)
- save a copy of your complete kernel logs (preferably by using the dmesg programme) for later inclusion in your bug report. You may need to use the -s switch to increase the internal buffer size so you can capture all the boot messages. Don't edit or trim the dmesg output
- try booting with devfs=dall passed to the kernel boot command line (read the documentation on your bootloader on how to do this), and save the result to a file. This may be quite verbose, and it may overflow the messages buffer, but try to get as much of it as you can
- if you get an Oops, run ksymoops to decode it so that the names of the offending functions are provided. A non-decoded Oops is pretty useless
- send a copy of your devfsd configuration file(s)
- send the bug report to me first. Don't expect that I will see it if you post it to the linux-kernel mailing list. Include all the information listed above, plus anything else that you think might be relevant. Put the string devfs somewhere in the subject line, so my mail filters mark it as urgent
Here is a general guide on how to ask questions in a way that greatly improves your chances of getting a reply: http://www.tuxedo.org/~esr/faqs/smart-questions.html. If you have a bug to report, you should also read http://www.chiark.greenend.org.uk/~sgtatham/bugs.html.
Strange kernel messages
You may see devfs-related messages in your kernel logs. Below are some messages and what they mean (and what you should do about them, if anything).
- devfs_register(fred): could not append to parent, err: -17
You need to check what the error code means, but usually 17 means EEXIST. This means that a driver attempted to create an entry fred in a directory, but there already was an entry with that name. This is often caused by flawed boot scripts which untar a bunch of inodes into /dev, as a way to restore permissions. This message is harmless, as the device nodes will still provide access to the driver (unless you use the devfs=only boot option, which is only for dedicated souls:-). If you want to get rid of these annoying messages, upgrade to devfsd-v1.3.20 and use the recommended RESTORE directive to restore permissions.
- devfs_mk_dir(bill): using old entry in dir: c1808724 ""
This is similar to the message above, except that a driver attempted to create a directory named bill, and the parent directory has an entry with the same name. In this case, to ensure that drivers continue to work properly, the old entry is re-used and given to the driver. In 2.5 kernels, the driver is given a NULL entry, and thus, under rare circumstances, may not create the required device nodes. The solution is the same as above.
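For the first message above, "check what the error code means" can be done with a quick lookup. The sketch below uses Python's standard errno module, which reports the error names of the platform it runs on, so it is a fair guide to kernel codes only when run on Linux. The function name is invented for this example.

```python
import errno
import os


def describe_kernel_error(code):
    """Return (symbolic name, description) for a kernel-style error code.

    Kernel messages print errors as negative numbers (e.g. -17), so the
    sign is dropped before the lookup. Hypothetical helper.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "unknown")
    return name, os.strerror(number)


name, text = describe_kernel_error(-17)
print(name, "-", text)
```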
Compilation problems with devfsd
Usually, you can compile devfsd just by typing in make in the source directory, followed by a make install (as root). Sometimes, you may have problems, particularly on broken configurations.
- error messages relating to DEVFSD_NOTIFY_DELETE
This happened because you have an ancient set of kernel headers installed in /usr/include/linux or /usr/src/linux. Install kernel 2.4.10 or later. You may need to pass the KERNEL_DIR variable to make (if you did not install the new kernel sources as /usr/src/linux), or you may copy the devfs_fs.h file in the kernel source tree into /usr/include/linux.
Other resources
Saturday, August 28, 2004
Network Working Group
Request for Comments: 2396
Updates: 1808, 1738
Category: Standards Track
Uniform Resource Identifiers (URI): Generic Syntax
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
IESG Note
This paper describes a "superset" of operations that can be applied
to URI. It consists of both a grammar and a description of basic
functionality for URI. To understand what is a valid URI, both the
grammar and the associated description have to be studied. Some of
the functionality described is not applicable to all URI schemes, and
some operations are only possible when certain media types are
retrieved using the URI, regardless of the scheme used.
Abstract
A Uniform Resource Identifier (URI) is a compact string of characters
for identifying an abstract or physical resource. This document
defines the generic syntax of URI, including both absolute and
relative forms, and guidelines for their use; it revises and replaces
the generic definitions in RFC 1738 and RFC 1808.
This document defines a grammar that is a superset of all valid URI,
such that an implementation can parse the common components of a URI
reference without knowing the scheme-specific requirements of every
possible identifier type. This document does not define a generative
grammar for URI; that task will be performed by the individual
specifications of each URI scheme.
1. Introduction
Uniform Resource Identifiers (URI) provide a simple and extensible
means for identifying a resource. This specification of URI syntax
and semantics is derived from concepts introduced by the World Wide
Web global information initiative, whose use of such objects dates
from 1990 and is described in "Universal Resource Identifiers in WWW"
[RFC1630]. The specification of URI is designed to meet the
recommendations laid out in "Functional Recommendations for Internet
Resource Locators" [RFC1736] and "Functional Requirements for Uniform
Resource Names" [RFC1737].
This document updates and merges "Uniform Resource Locators"
[RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in order
to define a single, generic syntax for all URI. It excludes those
portions of RFC 1738 that defined the specific syntax of individual
URL schemes; those portions will be updated as separate documents, as
will the process for registration of new URI schemes. This document
does not discuss the issues and recommendation for dealing with
characters outside of the US-ASCII character set [ASCII]; those
recommendations are discussed in a separate document.
All significant changes from the prior RFCs are noted in Appendix G.
1.1 Overview of URI
URI are characterized by the following definitions:
Uniform
Uniformity provides several benefits: it allows different types
of resource identifiers to be used in the same context, even
when the mechanisms used to access those resources may differ;
it allows uniform semantic interpretation of common syntactic
conventions across different types of resource identifiers; it
allows introduction of new types of resource identifiers
without interfering with the way that existing identifiers are
used; and, it allows the identifiers to be reused in many
different contexts, thus permitting new applications or
protocols to leverage a pre-existing, large, and widely-used
set of resource identifiers.
Resource
A resource can be anything that has identity. Familiar
examples include an electronic document, an image, a service
(e.g., "today's weather report for Los Angeles"), and a
collection of other resources. Not all resources are network
"retrievable"; e.g., human beings, corporations, and bound
books in a library can also be considered resources.
The resource is the conceptual mapping to an entity or set of
entities, not necessarily the entity which corresponds to that
mapping at any particular instance in time. Thus, a resource
can remain constant even when its content---the entities to
which it currently corresponds---changes over time, provided
that the conceptual mapping is not changed in the process.
Identifier
An identifier is an object that can act as a reference to
something that has identity. In the case of URI, the object is
a sequence of characters with a restricted syntax.
Having identified a resource, a system may perform a variety of
operations on the resource, as might be characterized by such words
as `access', `update', `replace', or `find attributes'.
1.2. URI, URL, and URN
A URI can be further classified as a locator, a name, or both. The
term "Uniform Resource Locator" (URL) refers to the subset of URI
that identify resources via a representation of their primary access
mechanism (e.g., their network "location"), rather than identifying
the resource by name or by some other attribute(s) of that resource.
The term "Uniform Resource Name" (URN) refers to the subset of URI
that are required to remain globally unique and persistent even when
the resource ceases to exist or becomes unavailable.
The URI scheme (Section 3.1) defines the namespace of the URI, and
thus may further restrict the syntax and semantics of identifiers
using that scheme. This specification defines those elements of the
URI syntax that are either required of all URI schemes or are common
to many URI schemes. It thus defines the syntax and semantics that
are needed to implement a scheme-independent parsing mechanism for
URI references, such that the scheme-dependent handling of a URI can
be postponed until the scheme-dependent semantics are needed. We use
the term URL below when describing syntax or semantics that only
apply to locators.
Although many URL schemes are named after protocols, this does not
imply that the only way to access the URL's resource is via the named
protocol. Gateways, proxies, caches, and name resolution services
might be used to access some resources, independent of the protocol
of their origin, and the resolution of some URL may require the use
of more than one protocol (e.g., both DNS and HTTP are typically used
to access an "http" URL's resource when it can't be found in a local
cache).
A URN differs from a URL in that its primary purpose is persistent
labeling of a resource with an identifier. That identifier is drawn
from one of a set of defined namespaces, each of which has its own
set name structure and assignment procedures. The "urn" scheme has
been reserved to establish the requirements for a standardized URN
namespace, as defined in "URN Syntax" [RFC2141] and its related
specifications.
Most of the examples in this specification demonstrate URL, since
they allow the most varied use of the syntax and often have a
hierarchical namespace. A parser of the URI syntax is capable of
parsing both URL and URN references as a generic URI; once the scheme
is determined, the scheme-specific parsing can be performed on the
generic URI components. In other words, the URI syntax is a superset
of the syntax of all URI schemes.
1.3. Example URI
The following examples illustrate URI that are in common use.
ftp://ftp.is.co.za/rfc/rfc1808.txt
-- ftp scheme for File Transfer Protocol services
gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
-- gopher scheme for Gopher and Gopher Protocol services
http://www.math.uio.no/faq/compression-faq/part1.html
-- http scheme for Hypertext Transfer Protocol services
mailto:mduerst@ifi.unizh.ch
-- mailto scheme for electronic mail addresses
news:comp.infosystems.www.servers.unix
-- news scheme for USENET news groups and articles
telnet://melvyl.ucop.edu/
-- telnet scheme for interactive services via the TELNET Protocol
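As an informal illustration of scheme-independent parsing, the example URI above can be split into generic components with Python's standard urllib.parse module. Note the assumption: that module follows later revisions of the URI syntax rather than this document exactly, so treat the output as a sketch of the idea, not as a reference implementation of this grammar.

```python
from urllib.parse import urlsplit

EXAMPLES = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles",
    "http://www.math.uio.no/faq/compression-faq/part1.html",
    "mailto:mduerst@ifi.unizh.ch",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
]


def generic_parts(uri):
    """Return (scheme, authority, path) without scheme-specific knowledge."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in EXAMPLES:
    print(generic_parts(uri))
```

The mailto and news examples have no "//" authority part, so everything after the scheme lands in the path component.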
1.4. Hierarchical URI and Relative Forms
An absolute identifier refers to a resource independent of the
context in which the identifier is used. In contrast, a relative
identifier refers to a resource by describing the difference within a
hierarchical namespace between the current context and an absolute
identifier of the resource.
Some URI schemes support a hierarchical naming system, where the
hierarchy of the name is denoted by a "/" delimiter separating the
components in the scheme. This document defines a scheme-independent
`relative' form of URI reference that can be used in conjunction with
a `base' URI (of a hierarchical scheme) to produce another URI. The
syntax of hierarchical URI is described in Section 3; the relative
URI calculation is described in Section 5.
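A quick way to see a relative reference combined with a base URI is Python's standard urllib.parse.urljoin. The base below is the http example from Section 1.3; the relative references (part2.html, ../index.html, /rfc/) are made up for illustration. As above, the Python library implements a later revision of the resolution rules, so this is a sketch rather than a definitive rendering of Section 5.

```python
from urllib.parse import urljoin

BASE = "http://www.math.uio.no/faq/compression-faq/part1.html"

# Hypothetical relative references, resolved against BASE.
sibling = urljoin(BASE, "part2.html")
parent = urljoin(BASE, "../index.html")
from_root = urljoin(BASE, "/rfc/")

print(sibling)
print(parent)
print(from_root)
```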
1.5. URI Transcribability
The URI syntax was designed with global transcribability as one of
its main concerns. A URI is a sequence of characters from a very
limited set, i.e. the letters of the basic Latin alphabet, digits,
and a few special characters. A URI may be represented in a variety
of ways: e.g., ink on paper, pixels on a screen, or a sequence of
octets in a coded character set. The interpretation of a URI depends
only on the characters used and not how those characters are
represented in a network protocol.
The goal of transcribability can be described by a simple scenario.
Imagine two colleagues, Sam and Kim, sitting in a pub at an
international conference and exchanging research ideas. Sam asks Kim
for a location to get more information, so Kim writes the URI for the
research site on a napkin. Upon returning home, Sam takes out the
napkin and types the URI into a computer, which then retrieves the
information to which Kim referred.
There are several design concerns revealed by the scenario:
o A URI is a sequence of characters, which is not always
represented as a sequence of octets.
o A URI may be transcribed from a non-network source, and thus
should consist of characters that are most likely to be able to
be typed into a computer, within the constraints imposed by
keyboards (and related input devices) across languages and
locales.
o A URI often needs to be remembered by people, and it is easier
for people to remember a URI when it consists of meaningful
components.
These design concerns are not always in alignment. For example, it
is often the case that the most meaningful name for a URI component
would require characters that cannot be typed into some systems. The
ability to transcribe the resource identifier from one medium to
another was considered more important than having its URI consist of
the most meaningful of components. In local and regional contexts
and with improving technology, users might benefit from being able to
use a wider range of characters; such use is not defined in this
document.
1.6. Syntax Notation and Common Elements
This document uses two conventions to describe and define the syntax
for URI. The first, called the layout form, is a general description
of the order of components and component separators, as in
<first>/<second>;<third>?<fourth>
The component names are enclosed in angle-brackets and any characters
outside angle-brackets are literal separators. Whitespace should be
ignored. These descriptions are used informally and do not define
the syntax requirements.
The second convention is a BNF-like grammar, used to define the
formal URI syntax. The grammar is that of [RFC822], except that "|"
is used to designate alternatives. Briefly, rules are separated from
definitions by an equal "=", indentation is used to continue a rule
definition over more than one line, literals are quoted with "",
parentheses "(" and ")" are used to group elements, optional elements
are enclosed in "[" and "]" brackets, and elements may be preceded
with <n>* to designate n or more repetitions of the following
element; n defaults to 0.
Unlike many specifications that use a BNF-like grammar to define the
bytes (octets) allowed by a protocol, the URI grammar is defined in
terms of characters. Each literal in the grammar corresponds to the
character it represents, rather than to the octet encoding of that
character in any particular coded character set. How a URI is
represented in terms of bits and bytes on the wire is dependent upon
the character encoding of the protocol used to transport it, or the
charset of the document which contains it.
The following definitions are common to many elements:
alpha = lowalpha | upalpha
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
"j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
"s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
"J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
"S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
"8" | "9"
alphanum = alpha | digit
The complete URI syntax is collected in Appendix A.
2. URI Characters and Escape Sequences
URI consist of a restricted set of characters, primarily chosen to
aid transcribability and usability both in computer systems and in
non-computer communications. Characters used conventionally as
delimiters around URI were excluded. The restricted set of
characters consists of digits, letters, and a few graphic symbols
chosen from those common to most of the character encodings and
input facilities available to Internet users.
uric = reserved | unreserved | escaped
Within a URI, characters are either used as delimiters, or to
represent strings of data (octets) within the delimited portions.
Octets are either represented directly by a character (using the US-
ASCII character for that octet [ASCII]) or by an escape encoding.
This representation is elaborated below.
2.1. URI and Non-ASCII Characters
The relationship between URI and characters has been a source of
confusion for characters that are not part of US-ASCII. To describe
the relationship, it is useful to distinguish between a "character"
(as a distinguishable semantic entity) and an "octet" (an 8-bit
byte). There are two mappings, one from URI characters to octets, and
a second from octets to original characters:
URI character sequence->octet sequence->original character sequence
A URI is represented as a sequence of characters, not as a sequence
of octets. That is because URI might be "transported" by means that
are not through a computer network, e.g., printed on paper, read over
the radio, etc.
A URI scheme may define a mapping from URI characters to octets;
whether this is done depends on the scheme. Commonly, within a
delimited component of a URI, a sequence of characters may be used to
represent a sequence of octets. For example, the character "a"
represents the octet 97 (decimal), while the character sequence "%",
"0", "a" represents the octet 10 (decimal).
There is a second translation for some resources: the sequence of
octets defined by a component of the URI is subsequently used to
represent a sequence of characters. A 'charset' defines this mapping.
There are many charsets in use in Internet protocols. For example,
UTF-8 [UTF-8] defines a mapping from sequences of octets to sequences
of characters in the repertoire of ISO 10646.
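The two mappings can be sketched in Python. This is a minimal
illustration, not part of the specification: the helper names
uri_chars_to_octets and octets_to_chars are hypothetical, UTF-8 is
assumed as the charset for the second mapping, and the standard
library function urllib.parse.unquote_to_bytes is used to decode
escape triplets.

```python
from urllib.parse import unquote_to_bytes

def uri_chars_to_octets(component: str) -> bytes:
    """First mapping: URI characters -> octets (escape triplets are decoded)."""
    return unquote_to_bytes(component)

def octets_to_chars(octets: bytes, charset: str = "utf-8") -> str:
    """Second mapping: octets -> original characters under the given charset."""
    return octets.decode(charset)

# The character "a" stands for octet 97; the triplet "%0a" stands for octet 10.
print(list(uri_chars_to_octets("a")))
print(list(uri_chars_to_octets("%0a")))

# Three escaped octets that UTF-8 maps to one character (U+20AC).
print(octets_to_chars(uri_chars_to_octets("%E2%82%AC")))
```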
In the simplest case, the original character sequence contains only
characters that are defined in US-ASCII, and the two levels of
mapping are simple and easily invertible: each 'original character'
is represented as the octet for the US-ASCII code for it, which is,
in turn, represented as either the US-ASCII character, or else the
"%HH" escape sequence for that octet.
For original character sequences that contain non-ASCII characters,
however, the situation is more difficult. Internet protocols that
transmit octet sequences intended to represent character sequences
are expected to provide some way of identifying the charset used, if
there might be more than one [RFC2277]. However, there is currently
no provision within the generic URI syntax to accomplish this
identification. An individual URI scheme may require a single
charset, define a default charset, or provide a way to indicate the
charset used.
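The difficulty can be seen in a short Python sketch. The component
value below is a made-up example, and the two charsets are chosen
only for illustration; the point is that the same escaped octets
yield different character sequences depending on which charset the
recipient assumes.

```python
from urllib.parse import unquote_to_bytes

# URI characters -> octets. Nothing in the URI says which charset applies.
octets = unquote_to_bytes("caf%C3%A9")

# Octets -> characters, under two different assumptions.
as_utf8 = octets.decode("utf-8")         # one character for the two octets C3 A9
as_latin1 = octets.decode("iso-8859-1")  # two characters, one per octet

print(len(as_utf8), len(as_latin1))
```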
It is expected that a systematic treatment of character encoding
within URI will be developed as a future modification of this
specification.
2.2. Reserved Characters
Many URI include components consisting of or delimited by, certain
special characters. These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose. If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
The "reserved" syntax class above refers to those characters that are
allowed within a URI, but which may not be allowed within a
particular component of the generic URI syntax; they are used as
delimiters of the components described in Section 3.
Characters in the "reserved" set are not reserved in all contexts.
The set of characters actually reserved within any given URI
component is defined by that component. In general, a character is
reserved if the semantics of the URI changes if the character is
replaced with its escaped US-ASCII encoding.
2.3. Unreserved Characters
Data characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include upper and lower case
letters, decimal digits, and a limited set of punctuation marks and
symbols.
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
Unreserved characters can be escaped without changing the semantics
of the URI, but this should not be done unless the URI is being used
in a context that does not allow the unescaped character to appear.
2.4. Escape Sequences
Data must be escaped if it does not have a representation using an
unreserved character; this includes data that does not correspond to
a printable character of the US-ASCII coded character set, or that
corresponds to any US-ASCII character that is disallowed, as
explained below.
2.4.1. Escaped Encoding
An escaped octet is encoded as a character triplet, consisting of the
percent character "%" followed by the two hexadecimal digits
representing the octet code. For example, "%20" is the escaped
encoding for the US-ASCII space character.
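As a minimal sketch of the triplet form in Python (the function names
escape_octet and unescape_triplet are illustrative only, and upper
case hexadecimal digits are an arbitrary choice on output, since both
cases are accepted on input):

```python
HEX_DIGITS = "0123456789ABCDEFabcdef"

def escape_octet(octet: int) -> str:
    """Encode one octet as a percent character followed by two hex digits."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("an octet must be in the range 0..255")
    return "%{:02X}".format(octet)

def unescape_triplet(triplet: str) -> int:
    """Decode one escape triplet back to the octet it represents."""
    if (len(triplet) != 3 or triplet[0] != "%"
            or any(c not in HEX_DIGITS for c in triplet[1:])):
        raise ValueError("expected '%' followed by two hexadecimal digits")
    return int(triplet[1:], 16)

print(escape_octet(0x20))        # the US-ASCII space character
print(unescape_triplet("%7e"))   # lower case hex digits are accepted
```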
escaped = "%" hex hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
"a" | "b" | "c" | "d" | "e" | "f"
2.4.2. When to Escape and Unescape
A URI is always in an "escaped" form, since escaping or unescaping a
completed URI might change its semantics. Normally, the only time
escape encodings can safely be made is when the URI is being created
from its component parts; each component may have its own set of
characters that are reserved, so only the mechanism responsible for
generating or interpreting that component can determine whether or
not escaping a character will change its semantics. Likewise, a URI
must be separated into its components before the escaped characters
within those components can be safely decoded.
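The ordering constraint can be illustrated with a Python sketch. The
segment values are invented for the example; urllib.parse.quote and
urllib.parse.unquote from the standard library stand in for whatever
escaping mechanism an implementation uses.

```python
from urllib.parse import quote, unquote

# Two path segments; the second contains a "/" that is data, not a delimiter.
segments = ["docs", "a/b"]

# Creation: escape each segment on its own, then join with the real delimiter.
path = "/" + "/".join(quote(s, safe="") for s in segments)

# Safe: separate into components first, then unescape each component.
decoded_after_split = [unquote(s) for s in path.split("/")[1:]]

# Unsafe: unescaping the completed path first turns the data into a delimiter.
decoded_before_split = unquote(path).split("/")[1:]

print(path)
print(decoded_after_split)
print(decoded_before_split)
```

Only the first decoding order recovers the two original segments; the
second yields three.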
In some cases, data that could be represented by an unreserved
character may appear escaped; for example, some of the unreserved
"mark" characters are automatically escaped by some systems. If the
given URI scheme defines a canonicalization algorithm, then
unreserved characters may be unescaped according to that algorithm.
For example, "%7e" is sometimes used instead of "~" in an http URL
path, but the two are equivalent for an http URL.
Because the percent "%" character always has the reserved purpose of
being the escape indicator, it must be escaped as "%25" in order to
be used as data within a URI. Implementers should be careful not to
escape or unescape the same string more than once, since unescaping
an already unescaped string might lead to misinterpreting a percent
data character as another escaped character, or vice versa in the
case of escaping an already escaped string.
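Both failure modes are easy to reproduce. The following Python sketch
uses the standard library's urllib.parse.quote and unquote with
invented data values; it is an illustration of the hazard, not a
recommended interface.

```python
from urllib.parse import quote, unquote

# Escaping twice: the "%" introduced by the first pass is escaped again.
data = "100%"
once = quote(data, safe="")
twice = quote(once, safe="")

# Unescaping twice: the data was the three characters "%41", escaped correctly
# as "%2541". A second unescape misreads that data as an escaped "A".
escaped = "%2541"
unescaped_once = unquote(escaped)
unescaped_twice = unquote(unescaped_once)

print(once, twice)
print(unescaped_once, unescaped_twice)
```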
2.4.3. Excluded US-ASCII Characters
Although they are disallowed within the URI syntax, we include here a
description of those US-ASCII characters that have been excluded and
the reasons for their exclusion.
The control characters in the US-ASCII coded character set are not
used within a URI, both because they are non-printable and because
they are likely to be misinterpreted by some control mechanisms.
control = <US-ASCII coded characters 00-1F and 7F hexadecimal>
The space character is excluded because significant spaces may
disappear and insignificant spaces may be introduced when URI are
transcribed or typeset or subjected to the treatment of word-
processing programs. Whitespace is also used to delimit URI in many
contexts.
space = <US-ASCII coded character 20 hexadecimal>
The angle-bracket "<" and ">" and double-quote (") characters are
excluded because they are often used as the delimiters around URI in
text documents and protocol fields. The character "#" is excluded
because it is used to delimit a URI from a fragment identifier in URI
references (Section 4). The percent character "%" is excluded because
it is used for the encoding of escaped characters.
delims = "<" | ">" | "#" | "%" | <">
Other characters are excluded because gateways and other transport
agents are known to sometimes modify such characters, or they are
used as delimiters.
unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
Data corresponding to excluded characters must be escaped in order to
be properly represented within a URI.
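A minimal Python sketch of this rule follows. It assumes the excluded
sets are the US-ASCII control characters (00-1F and 7F hexadecimal),
the space, the delimiters "<", ">", "#", "%" and the double-quote,
and the "unwise" characters listed above. The function name
escape_excluded is hypothetical. The input is treated as raw data, so
a "%" in it is escaped rather than read as an escape indicator, and
non-ASCII characters are left untouched because their handling
depends on a charset.

```python
CONTROL = {chr(c) for c in range(0x00, 0x20)} | {chr(0x7F)}
SPACE = {" "}
DELIMS = set('<>#%"')
UNWISE = set("{}|\\^[]`")
EXCLUDED = CONTROL | SPACE | DELIMS | UNWISE

def escape_excluded(data: str) -> str:
    """Replace each excluded US-ASCII character in raw data by its escape triplet."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in EXCLUDED else ch
        for ch in data
    )

print(escape_excluded("a b<c>"))
print(escape_excluded("{x}"))
```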
3. URI Syntactic Components
The URI syntax is dependent upon the scheme. In general, absolute
URI are written as follows:
<scheme>:<scheme-specific-part>
An absolute URI contains the name of the scheme being used (<scheme>)
followed by a colon (":") and then a string (the
<scheme-specific-part>) whose interpretation depends on the scheme.
The URI syntax does not require that the scheme-specific-part have
any general structure or set of semantics which is common among all
URI. However, a subset of URI do share a common syntax for
representing hierarchical relationships within the namespace. This
"generic URI" syntax consists of a sequence of four main components:
<scheme>://<authority><path>?<query>
each of which, except <scheme>, may be absent from a particular URI.
For example, some URI schemes do not allow an <authority> component,
and others do not use a <query> component.
absoluteURI = scheme ":" ( hier_part | opaque_part )
URI that are hierarchical in nature use the slash "/" character for
separating hierarchical components. For some file systems, a "/"
character (used to denote the hierarchical structure of a URI) is the
delimiter used to construct a file name hierarchy, and thus the URI
path will look similar to a file pathname. This does NOT imply that
the resource is a file or that the URI maps to an actual filesystem
pathname.
hier_part = ( net_path | abs_path ) [ "?" query ]
net_path = "//" authority [ abs_path ]
abs_path = "/" path_segments
URI that do not make use of the slash "/" character for separating
hierarchical components are considered opaque by the generic URI
parser.
opaque_part = uric_no_slash *uric
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
"&" | "=" | "+" | "$" | ","
We use the term <path> to refer to both the <abs_path> and
<opaque_part> constructs, since they are mutually exclusive for any
given URI and can be parsed as a single component.
3.1. Scheme Component
Just as there are many different methods of access to resources,
there are a variety of schemes for identifying such resources. The
URI syntax consists of a sequence of components separated by reserved
characters, with the first component defining the semantics for the
remainder of the URI string.
Scheme names consist of a sequence of characters beginning with a
lower case letter and followed by any combination of lower case
letters, digits, plus ("+"), period ("."), or hyphen ("-"). For
resiliency, programs interpreting URI should treat upper case letters
as equivalent to lower case in scheme names (e.g., allow "HTTP" as
well as "http").
scheme = alpha *( alpha | digit | "+" | "-" | "." )
Relative URI references are distinguished from absolute URI in that
they do not begin with a scheme name. Instead, the scheme is
inherited from the base URI, as described in Section 5.2.
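The scheme rule and the case-folding advice can be sketched in Python
as follows. The function name split_scheme is hypothetical, and
returning None for a relative reference is a convention of this
sketch only.

```python
import re

# scheme = alpha *( alpha | digit | "+" | "-" | "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def split_scheme(reference: str):
    """Return (scheme, rest). The scheme is None for a relative reference."""
    head, colon, rest = reference.partition(":")
    if colon and SCHEME_RE.fullmatch(head):
        # Upper case is treated as equivalent to lower case in scheme names.
        return head.lower(), rest
    return None, reference

print(split_scheme("HTTP://a/b"))
print(split_scheme("../g"))
```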
3.2. Authority Component
Many URI schemes include a top hierarchical element for a naming
authority, such that the namespace defined by the remainder of the
URI is governed by that authority. This authority component is
typically defined by an Internet-based server or a scheme-specific
registry of naming authorities.
authority = server | reg_name
The authority component is preceded by a double slash "//" and is
terminated by the next slash "/", question-mark "?", or by the end of
the URI. Within the authority component, the characters ";", ":",
"@", "?", and "/" are reserved.
An authority component is not required for a URI scheme to make use
of relative references. A base URI without an authority component
implies that any relative reference will also be without an authority
component.
3.2.1. Registry-based Naming Authority
The structure of a registry-based naming authority is specific to the
URI scheme, but constrained to the allowed characters for an
authority component.
reg_name = 1*( unreserved | escaped | "$" | "," |
";" | ":" | "@" | "&" | "=" | "+" )
3.2.2. Server-based Naming Authority
URL schemes that involve the direct use of an IP-based protocol to a
specified server on the Internet use a common syntax for the server
component of the URI's scheme-specific data:
<userinfo>@<host>:<port>
where <userinfo> may consist of a user name and, optionally, scheme-
specific information about how to gain authorization to access the
server. The parts "<userinfo>@" and ":<port>" may be omitted.
server = [ [ userinfo "@" ] hostport ]
The user information, if present, is followed by a commercial at-sign
"@".
userinfo = *( unreserved | escaped |
";" | ":" | "&" | "=" | "+" | "$" | "," )
Some URL schemes use the format "user:password" in the userinfo
field. This practice is NOT RECOMMENDED, because the passing of
authentication information in clear text (such as URI) has proven to
be a security risk in almost every case where it has been used.
The host is a domain name of a network host, or its IPv4 address as a
set of four decimal digit groups separated by ".". Literal IPv6
addresses are not supported.
hostport = host [ ":" port ]
host = hostname | IPv4address
hostname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
port = *digit
Hostnames take the form described in Section 3 of [RFC1034] and
Section 2.1 of [RFC1123]: a sequence of domain labels separated by
".", each domain label starting and ending with an alphanumeric
character and possibly also containing "-" characters. The rightmost
domain label of a fully qualified domain name will never start with a
digit, thus syntactically distinguishing domain names from IPv4
addresses, and may be followed by a single "." if it is necessary to
distinguish between the complete domain name and any local domain.
To actually be "Uniform" as a resource locator, a URL hostname should
be a fully qualified domain name. In practice, however, the host
component may be a local domain literal.
Note: A suitable representation for including a literal IPv6
address as the host part of a URL is desired, but has not yet been
determined or implemented in practice.
The port is the network port number for the server. Most schemes
designate protocols that have a default port number. Another port
number may optionally be supplied, in decimal, separated from the
host by a colon. If the port is omitted, the default port number is
assumed.
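A server-based authority can be taken apart along these lines. The
following Python sketch is illustrative only: parse_server is a
hypothetical name, the host names in the usage lines are invented,
the userinfo characters are not validated, and an absent part is
reported as None while a present but empty port is reported as an
empty string.

```python
import re

DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOSTNAME_RE = re.compile(r"(?:%s\.)*%s\.?" % (DOMAINLABEL, TOPLABEL))
IPV4_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")

def parse_server(authority: str):
    """Split a server-based authority into (userinfo, host, port)."""
    if authority == "":
        return None, None, None          # the server rule may be empty
    userinfo, at, hostport = authority.rpartition("@")
    host, colon, port = hostport.partition(":")
    if not (HOSTNAME_RE.fullmatch(host) or IPV4_RE.fullmatch(host)):
        raise ValueError("not a hostname or IPv4 address: %r" % host)
    if any(c not in "0123456789" for c in port):
        raise ValueError("port must consist of decimal digits: %r" % port)
    return (userinfo if at else None, host, port if colon else None)

print(parse_server("user@www.example.com:8080"))
print(parse_server("192.168.0.1"))
```

Because the rightmost label of a hostname never starts with a digit, a
string such as "www.123" matches neither pattern and is rejected.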
3.3. Path Component
The path component contains data, specific to the authority (or the
scheme if there is no authority component), identifying the resource
within the scope of that scheme and authority.
path = [ abs_path | opaque_part ]
path_segments = segment *( "/" segment )
segment = *pchar *( ";" param )
param = *pchar
pchar = unreserved | escaped |
":" | "@" | "&" | "=" | "+" | "$" | ","
The path may consist of a sequence of path segments separated by a
single slash "/" character. Within a path segment, the characters
"/", ";", "=", and "?" are reserved. Each path segment may include a
sequence of parameters, indicated by the semicolon ";" character.
The parameters are not significant to the parsing of relative
references.
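The segment and parameter structure can be sketched in Python as
follows. The name split_path is hypothetical. Escaped characters are
deliberately left as they are, since unescaping is only safe after
the path has been separated into its parts.

```python
def split_path(path: str):
    """Split an abs_path into a list of (segment, [param, ...]) pairs."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path beginning with '/'")
    result = []
    for segment in path[1:].split("/"):
        # Within a segment, ";" separates the segment proper from its parameters.
        name, *params = segment.split(";")
        result.append((name, params))
    return result

print(split_path("/b/c/d;p"))
```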
3.4. Query Component
The query component is a string of information to be interpreted by
the resource.
query = *uric
Within a query component, the characters ";", "/", "?", ":", "@",
"&", "=", "+", ",", and "$" are reserved.
4. URI References
The term "URI-reference" is used here to denote the common usage of a
resource identifier. A URI reference may be absolute or relative,
and may have additional information attached in the form of a
fragment identifier. However, "the URI" that results from such a
reference includes only the absolute URI after the fragment
identifier (if any) is removed and after any relative URI is resolved
to its absolute form. Although it is possible to limit the
discussion of URI syntax and semantics to that of the absolute
result, most usage of URI is within general URI references, and it is
impossible to obtain the URI from such a reference without also
parsing the fragment and resolving the relative form.
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
The syntax for relative URI is a shortened form of that for absolute
URI, where some prefix of the URI is missing and certain path
components ("." and "..") have a special meaning when, and only when,
interpreting a relative path. The relative URI syntax is defined in
Section 5.
4.1. Fragment Identifier
When a URI reference is used to perform a retrieval action on the
identified resource, the optional fragment identifier, separated from
the URI by a crosshatch ("#") character, consists of additional
reference information to be interpreted by the user agent after the
retrieval action has been successfully completed. As such, it is not
part of a URI, but is often used in conjunction with a URI.
fragment = *uric
The semantics of a fragment identifier is a property of the data
resulting from a retrieval action, regardless of the type of URI used
in the reference. Therefore, the format and interpretation of
fragment identifiers is dependent on the media type [RFC2046] of the
retrieval result. The character restrictions described in Section 2
for URI also apply to the fragment in a URI-reference. Individual
media types may define additional restrictions or structure within
the fragment for specifying different types of "partial views" that
can be identified within that media type.
A fragment identifier is only meaningful when a URI reference is
intended for retrieval and the result of that retrieval is a document
for which the identified fragment is consistently defined.
4.2. Same-document References
A URI reference that does not contain a URI is a reference to the
current document. In other words, an empty URI reference within a
document is interpreted as a reference to the start of that document,
and a reference containing only a fragment identifier is a reference
to the identified fragment of that document. Traversal of such a
reference should not result in an additional retrieval action.
However, if the URI reference occurs in a context that is always
intended to result in a new request, as in the case of HTML's FORM
element, then an empty URI reference represents the base URI of the
current document and should be replaced by that URI when transformed
into a request.
4.3. Parsing a URI Reference
A URI reference is typically parsed according to the four main
components and fragment identifier in order to determine what
components are present and whether the reference is relative or
absolute. The individual components are then parsed for their
subparts and, if not opaque, to verify their validity.
Although the BNF defines what is allowed in each component, it is
ambiguous in terms of differentiating between an authority component
and a path component that begins with two slash characters. The
greedy algorithm is used for disambiguation: the left-most matching
rule soaks up as much of the URI reference string as it is capable of
matching. In other words, the authority component wins.
Readers familiar with regular expressions should see Appendix B for a
concrete parsing example and test oracle.
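As a rough Python sketch of such a parse, the expression below is a
regular expression of the kind Appendix B describes, written out here
as an assumption that should be checked against that appendix. The
function name parse_reference and the dictionary layout are
illustrative. A component whose separator is absent is reported as
None, which keeps it distinct from a component that is present but
empty.

```python
import re

# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_REFERENCE_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def parse_reference(reference: str) -> dict:
    """Break a URI reference into its potential components."""
    m = URI_REFERENCE_RE.match(reference)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(parse_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
# Two leading slashes are read as an authority, not as the start of a path.
print(parse_reference("//g/h"))
```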
5. Relative URI References
It is often the case that a group or "tree" of documents has been
constructed to serve a common purpose; the vast majority of URI in
these documents point to resources within the tree rather than
outside of it. Similarly, documents located at a particular site are
much more likely to refer to other resources at that site than to
resources at remote sites.
Relative addressing of URI allows document trees to be partially
independent of their location and access scheme. For instance, it is
possible for a single set of hypertext documents to be simultaneously
accessible and traversable via each of the "file", "http", and "ftp"
schemes if the documents refer to each other using relative URI.
Furthermore, such document trees can be moved, as a whole, without
changing any of the relative references. Experience within the WWW
has demonstrated that the ability to perform relative referencing is
necessary for the long-term usability of embedded URI.
The syntax for relative URI takes advantage of the <hier_part> syntax
of <absoluteURI> (Section 3) in order to express a reference that is
relative to the namespace of another hierarchical URI.
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
A relative reference beginning with two slash characters is termed a
network-path reference, as defined by <net_path> in Section 3. Such
references are rarely used.
A relative reference beginning with a single slash character is
termed an absolute-path reference, as defined by <abs_path> in
Section 3.
A relative reference that does not begin with a scheme name or a
slash character is termed a relative-path reference.
rel_path = rel_segment [ abs_path ]
rel_segment = 1*( unreserved | escaped |
";" | "@" | "&" | "=" | "+" | "$" | "," )
Within a relative-path reference, the complete path segments "." and
".." have special meanings: "the current hierarchy level" and "the
level above this hierarchy level", respectively. Although this is
very similar to their use within Unix-based filesystems to indicate
directory levels, these path components are only considered special
when resolving a relative-path reference to its absolute form
(Section 5.2).
Authors should be aware that a path segment which contains a colon
character cannot be used as the first segment of a relative URI path
(e.g., "this:that"), because it would be mistaken for a scheme name.
It is therefore necessary to precede such segments with other
segments (e.g., "./this:that") in order for them to be referenced as
a relative path.
It is not necessary for all URI within a given scheme to be
restricted to the <hier_part> syntax, since the hierarchical
properties of that syntax are only necessary when relative URI are
used within a particular document. Documents can only make use of
relative URI when their base URI fits within the <hier_part> syntax.
It is assumed that any document which contains a relative reference
will also have a base URI that obeys the <hier_part> syntax. In other
words, relative URI cannot be used within a document that has an
unsuitable base URI.
Some URI schemes do not allow a hierarchical syntax matching the
<hier_part> syntax, and thus cannot use relative references.
5.1. Establishing a Base URI
The term "relative URI" implies that there exists some absolute "base
URI" against which the relative reference is applied. Indeed, the
base URI is necessary to define the semantics of any relative URI
reference; without it, a relative reference is meaningless. In order
for relative URI to be usable within a document, the base URI of that
document must be known to the parser.
The base URI of a document can be established in one of four ways,
listed below in order of precedence. The order of precedence can be
thought of in terms of layers, where the innermost defined base URI
has the highest precedence. This can be visualized graphically as:
.----------------------------------------------------------.
| .----------------------------------------------------. |
| | .----------------------------------------------. | |
| | | .----------------------------------------. | | |
| | | | .----------------------------------. | | | |
| | | | | <relative_reference> | | | | |
| | | | `----------------------------------' | | | |
| | | | (5.1.1) Base URI embedded in the | | | |
| | | | document's content | | | |
| | | `----------------------------------------' | | |
| | | (5.1.2) Base URI of the encapsulating entity | | |
| | | (message, document, or none). | | |
| | `----------------------------------------------' | |
| | (5.1.3) URI used to retrieve the entity | |
| `----------------------------------------------------' |
| (5.1.4) Default Base URI is application-dependent |
`----------------------------------------------------------'
5.1.1. Base URI within Document Content
Within certain document media types, the base URI of the document can
be embedded within the content itself such that it can be readily
obtained by a parser. This can be useful for descriptive documents,
such as tables of content, which may be transmitted to others through
protocols other than their usual retrieval context (e.g., E-Mail or
USENET news).
It is beyond the scope of this document to specify how, for each
media type, the base URI can be embedded. It is assumed that user
agents manipulating such media types will be able to obtain the
appropriate syntax from that media type's specification. An example
of how the base URI can be embedded in the Hypertext Markup Language
(HTML) [RFC1866] is provided in Appendix D.
A mechanism for embedding the base URI within MIME container types
(e.g., the message and multipart types) is defined by MHTML
[RFC2110]. Protocols that do not use the MIME message header syntax,
but which do allow some form of tagged metainformation to be included
within messages, may define their own syntax for defining the base
URI as part of a message.
5.1.2. Base URI from the Encapsulating Entity
If no base URI is embedded, the base URI of a document is defined by
the document's retrieval context. For a document that is enclosed
within another entity (such as a message or another document), the
retrieval context is that entity; thus, the default base URI of the
document is the base URI of the entity in which the document is
encapsulated.
5.1.3. Base URI from the Retrieval URI
If no base URI is embedded and the document is not encapsulated
within some other entity (e.g., the top level of a composite entity),
then, if a URI was used to retrieve the base document, that URI shall
be considered the base URI. Note that if the retrieval was the
result of a redirected request, the last URI used (i.e., that which
resulted in the actual retrieval of the document) is the base URI.
5.1.4. Default Base URI
If none of the conditions described in Sections 5.1.1--5.1.3 apply,
then the base URI is defined by the context of the application.
Since this definition is necessarily application-dependent, failing
to define the base URI using one of the other methods may result in
the same content being interpreted differently by different types of
application.
It is the responsibility of the distributor(s) of a document
containing relative URI to ensure that the base URI for that document
can be established. It must be emphasized that relative URI cannot
be used reliably in situations where the document's base URI is not
well-defined.
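The order of precedence in Sections 5.1.1 through 5.1.4 amounts to
taking the first source that is available. A minimal Python sketch,
in which the function and parameter names are hypothetical and an
unavailable source is passed as None:

```python
from typing import Optional

def establish_base_uri(embedded: Optional[str] = None,
                       encapsulating: Optional[str] = None,
                       retrieval: Optional[str] = None,
                       application_default: Optional[str] = None) -> Optional[str]:
    """Return the base URI from the innermost source that defines one."""
    candidates = (
        embedded,             # 5.1.1: embedded in the document's content
        encapsulating,        # 5.1.2: base URI of the encapsulating entity
        retrieval,            # 5.1.3: URI used to retrieve the entity
        application_default,  # 5.1.4: application-dependent default
    )
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None  # no base URI; relative references cannot be resolved

print(establish_base_uri(retrieval="http://a/b/c/d"))
```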
5.2. Resolving Relative References to Absolute Form
This section describes an example algorithm for resolving URI
references that might be relative to a given base URI.
The base URI is established according to the rules of Section 5.1 and
parsed into the four main components as described in Section 3. Note
that only the scheme component is required to be present in the base
URI; the other components may be empty or undefined. A component is
undefined if its preceding separator does not appear in the URI
reference; the path component is never undefined, though it may be
empty. The base URI's query component is not used by the resolution
algorithm and may be discarded.
For each URI reference, the following steps are performed in order:
1) The URI reference is parsed into the potential four components and
fragment identifier, as described in Section 4.3.
2) If the path component is empty and the scheme, authority, and
query components are undefined, then it is a reference to the
current document and we are done. Otherwise, the reference URI's
query and fragment components are defined as found (or not found)
within the URI reference and not inherited from the base URI.
3) If the scheme component is defined, indicating that the reference
starts with a scheme name, then the reference is interpreted as an
absolute URI and we are done. Otherwise, the reference URI's
scheme is inherited from the base URI's scheme component.
Due to a loophole in prior specifications [RFC1630], some parsers
allow the scheme name to be present in a relative URI if it is the
same as the base URI scheme. Unfortunately, this can conflict
with the correct parsing of non-hierarchical URI. For backwards
compatibility, an implementation may work around such references
by removing the scheme if it matches that of the base URI and the
scheme is known to always use the <hier_part> syntax. The parser
can then continue with the steps below for the remainder of the
reference components. Validating parsers should mark such a
misformed relative reference as an error.
4) If the authority component is defined, then the reference is a
network-path and we skip to step 7. Otherwise, the reference
URI's authority is inherited from the base URI's authority
component, which will also be undefined if the URI scheme does not
use an authority component.
5) If the path component begins with a slash character ("/"), then
the reference is an absolute-path and we skip to step 7.
6) If this step is reached, then we are resolving a relative-path
reference. The relative path needs to be merged with the base
URI's path. Although there are many ways to do this, we will
describe a simple method using a separate string buffer.
a) All but the last segment of the base URI's path component is
copied to the buffer. In other words, any characters after the
last (right-most) slash character, if any, are excluded.
b) The reference's path component is appended to the buffer
string.
c) All occurrences of "./", where "." is a complete path segment,
are removed from the buffer string.
d) If the buffer string ends with "." as a complete path segment,
that "." is removed.
e) All occurrences of "<segment>/../", where <segment> is a
complete path segment not equal to "..", are removed from the
buffer string. Removal of these path segments is performed
iteratively, removing the leftmost matching pattern on each
iteration, until no matching pattern remains.
f) If the buffer string ends with "<segment>/..", where <segment>
is a complete path segment not equal to "..", that
"<segment>/.." is removed.
g) If the resulting buffer string still begins with one or more
complete path segments of "..", then the reference is
considered to be in error. Implementations may handle this
error by retaining these components in the resolved path (i.e.,
treating them as part of the final URI), by removing them from
the resolved path (i.e., discarding relative levels above the
root), or by avoiding traversal of the reference.
h) The remaining buffer string is the reference URI's new path
component.
7) The resulting URI components, including any inherited from the
base URI, are recombined to give the absolute form of the URI
reference. Using pseudocode, this would be
result = ""
if scheme is defined then
    append scheme to result
    append ":" to result
if authority is defined then
    append "//" to result
    append authority to result
append path to result
if query is defined then
    append "?" to result
    append query to result
if fragment is defined then
    append "#" to result
    append fragment to result
return result
Note that we must be careful to preserve the distinction between a
component that is undefined, meaning that its separator was not
present in the reference, and a component that is empty, meaning
that the separator was present and was immediately followed by the
next component separator or the end of the reference.
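As an illustration of that distinction, the following Python sketch represents an undefined component as None and an empty component as "". The function names and the splitting pattern are assumptions made for this example, not part of the algorithm above.

```python
import re

# Splitting pattern (an assumption for this sketch): an optional group that
# does not participate in the match yields None ("undefined"), while a group
# that matches nothing after its separator yields "" ("empty").
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_components(ref):
    """Return (scheme, authority, path, query, fragment) for a URI reference."""
    m = _URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def recombine(scheme, authority, path, query, fragment):
    """Recombine components, emitting a separator only for defined components."""
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

With this representation, "g?" splits into a path of "g" and an empty (but defined) query, so recombining it gives back "g?" rather than "g".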
The above algorithm is intended to provide an example by which the
output of implementations can be tested -- implementation of the
algorithm itself is not required. For example, some systems may find
it more efficient to implement step 6 as a pair of segment stacks
being merged, rather than as a series of string pattern replacements.
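A segment-stack variant of step 6 might look like the following Python sketch. The function name merge_paths is hypothetical, and for the error case in step 6g the sketch picks just one of the permitted behaviours: it retains the excess ".." segments in the result.

```python
def merge_paths(base_path, ref_path):
    """Merge a relative-path reference with a base path using a segment stack."""
    # Steps a and b: keep the base path up to its last slash, append the reference.
    buffer = base_path[: base_path.rfind("/") + 1] + ref_path
    segments = buffer.split("/")
    rooted = buffer.startswith("/")  # segments[0] is then the empty root marker
    last = len(segments) - 1
    out = []
    for i, seg in enumerate(segments):
        if seg == ".":
            # Steps c and d: drop "." segments; a trailing "." leaves a trailing slash.
            if i == last:
                out.append("")
            continue
        if seg == "..":
            at_root = rooted and len(out) == 1
            if out and out[-1] != ".." and not at_root:
                # Steps e and f: ".." cancels the preceding complete segment.
                out.pop()
                if i == last:
                    out.append("")
                continue
            # Step g (one permitted option): retain ".." that cannot be resolved.
        out.append(seg)
    # Step h: the joined segments form the new path component.
    return "/".join(out)
```

For a base path of "/b/c/d;p", this gives "/b/g" for the reference "../g" and "/../g" for "../../../g".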
Note: Some WWW client applications will fail to separate the
reference's query component from its path component before merging
the base and reference paths in step 6 above. This may result in
a loss of information if the query component contains the strings
"/../" or "/./".
Resolution examples are provided in Appendix C.
6. URI Normalization and Equivalence
In many cases, different URI strings may actually identify the
identical resource. For example, the host names used in URL are
actually case insensitive, and the URL <http://www.XEROX.com> is
equivalent to <http://www.xerox.com>. In general, the rules for
equivalence and definition of a normal form, if any, are scheme
dependent. When a scheme uses elements of the common syntax, it will
also use the common syntax equivalence rules, namely that the scheme
and hostname are case insensitive and a URL with an explicit ":port",
where the port is the default for the scheme, is equivalent to one
where the port is elided.
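The common-syntax equivalence rules can be sketched in Python as follows. This is a minimal illustration only: the default-port table is a small assumed subset, the example.com host names are placeholders, and scheme-specific rules are not considered.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative subset of default ports (an assumption of this sketch).
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def normalize(url):
    """Lowercase the scheme and host name, and drop an explicit default port."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    scheme = scheme.lower()
    userinfo, at, hostport = netloc.rpartition("@")
    host, colon, port = hostport.partition(":")
    host = host.lower()
    if port.isdigit() and int(port) == DEFAULT_PORTS.get(scheme):
        hostport = host  # explicit default port is elided
    else:
        hostport = host + colon + port
    return urlunsplit((scheme, userinfo + at + hostport, path, query, fragment))


def equivalent(a, b):
    """Compare two URLs under the common-syntax rules only."""
    return normalize(a) == normalize(b)
```

Under these rules, "HTTP://Example.COM:80/a" and "http://example.com/a" compare as equivalent, while URLs that differ in the case of the path do not.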
7. Security Considerations
A URI does not in itself pose a security threat. Users should beware
that there is no general guarantee that a URL, which at one time
located a given resource, will continue to do so. Nor is there any
guarantee that a URL will not locate a different resource at some
later point in time, due to the lack of any constraint on how a given
authority apportions its namespace. Such a guarantee can only be
obtained from the person(s) controlling that namespace and the
resource in question. A specific URI scheme may include additional
semantics, such as name persistence, if those semantics are required
of all naming authorities for that scheme.
It is sometimes possible to construct a URL such that an attempt to
perform a seemingly harmless, idempotent operation, such as the
retrieval of an entity associated with the resource, will in fact
cause a possibly damaging remote operation to occur. The unsafe URL
is typically constructed by specifying a port number other than that
reserved for the network protocol in question. The client
unwittingly contacts a site that is in fact running a different
protocol. The content of the URL contains instructions that, when
interpreted according to this other protocol, cause an unexpected
operation. An example has been the use of a gopher URL to cause an
unintended or impersonating message to be sent via a SMTP server.
Caution should be used when using any URL that specifies a port
number other than the default for the protocol, especially when it is
a number within the reserved space.
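A client that wants to apply this caution mechanically could flag such URLs before use. The Python sketch below is one possible check; the port table is an assumed subset, the function name is hypothetical, and the sketch treats ports below 1024 as the reserved space.

```python
from urllib.parse import urlsplit

# Illustrative subset of default ports (an assumption of this sketch).
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70, "telnet": 23}


def port_warning(url):
    """Return a warning string for a non-default port, or None if unremarkable."""
    parts = urlsplit(url)
    port = parts.port  # None when no explicit port is given
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return None
    if port < 1024:
        return "non-default port in reserved range"
    return "non-default port"
```

For instance, a gopher URL that names port 25, the port normally used by SMTP, is flagged as a non-default port in the reserved range.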
Care should be taken when a URL contains escaped delimiters for a
given protocol (for example, CR and LF characters for telnet
protocols) that these are not unescaped before transmission. This
might violate the protocol, but avoids the potential for such
characters to be used to simulate an extra operation or parameter in
that protocol, which might lead to an unexpected and possibly harmful
remote operation to be performed.
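One way to act on this is to detect escaped CR and LF octets while the URL is still in its escaped form, so that the caller can refuse to unescape them. The Python sketch below is an illustration only; the function name is hypothetical and only the two line-break octets are checked.

```python
import re

# Matches the escaped forms of LF (%0A) and CR (%0D), in either hex case.
_ESCAPED_LINE_BREAK = re.compile(r"%0[AaDd]")


def contains_escaped_line_break(url):
    """Report whether the still-escaped URL carries an escaped CR or LF."""
    return _ESCAPED_LINE_BREAK.search(url) is not None
```

A URL for which this returns True should be passed on with those octets left escaped, or rejected, rather than being unescaped before transmission.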
It is clearly unwise to use a URL that contains a password which is
intended to be secret. In particular, the use of a password within
the 'userinfo' component of a URL is strongly disrecommended except
in those rare cases where the 'password' parameter is intended to be
public.
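Software that logs or displays URLs may want to drop any password first. The following Python sketch shows one way to do that; the function name and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_password(url):
    """Return the URL with any password removed from the userinfo component."""
    parts = urlsplit(url)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    user, colon, _password = userinfo.partition(":")
    if not colon:
        return url  # no password present; leave the URL untouched
    return urlunsplit(parts._replace(netloc=user + at + hostport))
```

For example, "ftp://user:secret@example.com/pub" becomes "ftp://user@example.com/pub".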
8. Acknowledgements
This document was derived from RFC 1738 [RFC1738] and RFC 1808
[RFC1808]; the acknowledgements in those specifications still apply.
In addition, contributions by Gisle Aas, Martin Beet, Martin Duerst,
Jim Gettys, Martijn Koster, Dave Kristol, Daniel LaLiberte, Foteos
Macrides, James Marshall, Ryan Moats, Keith Moore, and Lauren Wood
are gratefully acknowledged.
9. References
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
Languages", BCP 18, RFC 2277, January 1998.
[RFC1630] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
Unifying Syntax for the Expression of Names and Addresses
of Objects on the Network as used in the World-Wide Web",
RFC 1630, June 1994.
[RFC1738] Berners-Lee, T., Masinter, L., and M. McCahill, Editors,
"Uniform Resource Locators (URL)", RFC 1738, December 1994.
[RFC1866] Berners-Lee T., and D. Connolly, "HyperText Markup Language
Specification -- 2.0", RFC 1866, November 1995.
[RFC1123] Braden, R., Editor, "Requirements for Internet Hosts --
Application and Support", STD 3, RFC 1123, October 1989.
[RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text
Messages", STD 11, RFC 822, August 1982.
[RFC1808] Fielding, R., "Relative Uniform Resource Locators", RFC
1808, June 1995.
[RFC2046] Freed, N., and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046,
November 1996.
[RFC1736] Kunze, J., "Functional Recommendations for Internet
Resource Locators", RFC 1736, February 1995.
[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.
[RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities",
STD 13, RFC 1034, November 1987.
[RFC2110] Palme, J., and A. Hopmann, "MIME E-mail Encapsulation of
Aggregate Documents, such as HTML (MHTML)", RFC 2110, March
1997.
[RFC1737] Sollins, K., and L. Masinter, "Functional Requirements for
Uniform Resource Names", RFC 1737, December 1994.
[ASCII] US-ASCII. "Coded Character Set -- 7-bit American Standard
Code for Information Interchange", ANSI X3.4-1986.
[UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
RFC 2279, January 1998.
10. Authors' Addresses
Tim Berners-Lee
World Wide Web Consortium
MIT Laboratory for Computer Science, NE43-356
545 Technology Square
Cambridge, MA 02139
Fax: 1(617)258-8682
EMail: timbl@w3.org
Roy T. Fielding
Department of Information and Computer Science
University of California, Irvine
Irvine, CA 92697-3425
Fax: 1(949)824-1715
EMail: fielding@ics.uci.edu
Larry Masinter
Xerox PARC
3333 Coyote Hill Road
Palo Alto, CA 94034
Fax: 1(415)812-4333
EMail: masinter@parc.xerox.com
A. Collected BNF for URI
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
absoluteURI = scheme ":" ( hier_part | opaque_part )
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
hier_part = ( net_path | abs_path ) [ "?" query ]
opaque_part = uric_no_slash *uric
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
"&" | "=" | "+" | "$" | ","
net_path = "//" authority [ abs_path ]
abs_path = "/" path_segments
rel_path = rel_segment [ abs_path ]
rel_segment = 1*( unreserved | escaped |
";" | "@" | "&" | "=" | "+" | "$" | "," )
scheme = alpha *( alpha | digit | "+" | "-" | "." )
authority = server | reg_name
reg_name = 1*( unreserved | escaped | "$" | "," |
";" | ":" | "@" | "&" | "=" | "+" )
server = [ [ userinfo "@" ] hostport ]
userinfo = *( unreserved | escaped |
";" | ":" | "&" | "=" | "+" | "$" | "," )
hostport = host [ ":" port ]
host = hostname | IPv4address
hostname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
port = *digit
path = [ abs_path | opaque_part ]
path_segments = segment *( "/" segment )
segment = *pchar *( ";" param )
param = *pchar
pchar = unreserved | escaped |
":" | "@" | "&" | "=" | "+" | "$" | ","
query = *uric
fragment = *uric
uric = reserved | unreserved | escaped
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" |
"(" | ")"
escaped = "%" hex hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
"a" | "b" | "c" | "d" | "e" | "f"
alphanum = alpha | digit
alpha = lowalpha | upalpha
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
"j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
"s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
"J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
"S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
"8" | "9"