The Common Gateway Interface

The Common Gateway Interface (CGI) is a standard for interfacing external (gateway) applications with information servers (primarily HTTP servers). CGI has been in use with the World Wide Web since 1993, and the current version is CGI/1.1. Whereas many of the requests sent to a web server simply retrieve the contents of a file stored on the server, those directed to a gateway program (a CGI script) cause the program to be executed. The resulting output varies depending on the parameters passed to the server-side executable, but normally takes the form of a dynamically generated response that is sent to the client application that initiated the request. CGI allows an HTTP server and a CGI script to share responsibility for responding to client requests, and defines a standard way of handling such transactions, including how information is passed to the script and how the server uses the script's output to generate a response.

CGI programs can be written in a variety of scripting or programming languages, but are most often created as executable scripts using a language such as Perl, and stored in a specific directory on the server, conventionally named "cgi-bin". The script itself usually consists of a set of program statements, stored in an ASCII text file, that are interpreted at run-time. The client request consists of a Uniform Resource Identifier (URI), a request method, a set of headers that convey information about the client request, and an optional message-body that contains user data.

The client request is received by the HTTP server application, which carries out any necessary decoding and invokes the CGI script identified by the request's URI. The server software converts the client request into a CGI request before passing it to the script. Relevant information about both the request and the HTTP server is passed to the script as a set of named parameters known as meta-variables (these are usually, though not always, operating system environment variables), while the contents of the message-body, if there is one, are supplied via the standard input file handle (stdin). Once the CGI script has executed, the response it generates is forwarded to the client after any necessary encoding has been applied. The server application is responsible for any client authentication required, and for implementing security.
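The hand-off described above can be sketched from the server's side. The following is a simplified illustration rather than how any particular HTTP server works: the function name `invoke_cgi_script` and its parameters are hypothetical, and a real server would also handle timeouts, errors and a restricted environment.

```python
import os
import subprocess


def invoke_cgi_script(command, meta_variables, message_body=b""):
    """Run a gateway program roughly the way an HTTP server might (hypothetical helper)."""
    # Meta-variables are handed over as environment variables. The server's
    # own environment is copied here only to keep the sketch portable.
    env = {**os.environ, **meta_variables}
    if message_body:
        env["CONTENT_LENGTH"] = str(len(message_body))
    completed = subprocess.run(
        command,
        input=message_body,      # the message-body arrives on the script's stdin
        env=env,
        stdout=subprocess.PIPE,  # the CGI response is collected from stdout
        check=True,
    )
    return completed.stdout
```

The return value is the raw CGI response, which the server would still need to turn into a protocol response before sending it to the client.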

Request methods

The request method is supplied to the script using the REQUEST_METHOD meta-variable, and identifies the processing method to be employed by the script when creating a response. The methods commonly supported include GET, POST and HEAD.

The script may also support protocol-specific methods, such as PUT and DELETE (HTTP/1.1). Some systems support a method for supplying an array of strings to the CGI script as arguments. This is only used in the case of an ISINDEX HTTP query, which is identified by a GET or HEAD request accompanied by a URI query string that does not contain any unencoded "=" characters. The query string will be parsed into words, which are then URL-decoded, optionally encoded in a system-defined manner, and added to the command line argument list.
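The ISINDEX rule above can be sketched as a small function. This is an illustration only: the name `isindex_arguments` is hypothetical, the words are assumed to be separated by "+" (as in the CGI/1.1 specification), and the optional system-defined re-encoding step is left out.

```python
from urllib.parse import unquote


def isindex_arguments(method, query_string):
    """Return the command-line words for an ISINDEX-style query, or an empty list."""
    if method not in ("GET", "HEAD"):
        return []
    # An encoded "=" appears as %3D, so this only detects unencoded ones.
    if not query_string or "=" in query_string:
        return []
    # Split into words first, then URL-decode each word on its own.
    return [unquote(word) for word in query_string.split("+")]
```

For instance, a GET request with the query string `red+green%20apples` would yield the two arguments `red` and `green apples`, whereas `name=value` would yield none.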

The meta-variables that may be included in a CGI request are described in the table below.


CGI Request Meta-variables

Meta-variable      Description
AUTH_TYPE          Identifies a mechanism used by the server to authenticate the user.
CONTENT_LENGTH     Contains the size in bytes of the message-body attached to the request, if it exists.
CONTENT_TYPE       Specifies the Internet Media Type of the message-body, if it exists.
GATEWAY_INTERFACE  Identifies the version of CGI implemented by the server (e.g. 1.1).
PATH_INFO          Specifies a path to be interpreted by the CGI script. It identifies the resource or sub-resource to be returned by the CGI script, and is derived from the portion of the URI path hierarchy following the part that identifies the script itself.
PATH_TRANSLATED    Derived by taking the PATH_INFO value, parsing it as a local URI in its own right, and performing any virtual-to-physical translation appropriate to map it onto the server's directory structure.
QUERY_STRING       A URL-encoded search or parameter string that provides information to the CGI script about the client request.
REMOTE_ADDR        The IP address of the client sending the request to the server.
REMOTE_HOST        The fully qualified domain name of the client sending the request to the server, if available.
REMOTE_IDENT       May be used to provide identity information reported about the connection by an RFC 1413 request to the remote agent.
REMOTE_USER        A user identification string supplied by client as part of user authentication.
REQUEST_METHOD     The method that should be used by the script to process the request (e.g. GET, POST, HEAD etc).
SCRIPT_NAME        A URI path that could identify the CGI script.
SERVER_NAME        The name of the server to which the client request is directed (may be either a hostname or IP address).
SERVER_PORT        The port number to which the request was sent.
SERVER_PROTOCOL    The name and version of the application protocol used for this CGI request (e.g. HTTP/1.1).
SERVER_SOFTWARE    The name and version of the information server software making the CGI request.
HTTP_ACCEPT        The Internet Media Types that the client will accept.
HTTP_USER_AGENT    The browser the client is using to send the request.


N.B. Meta-variables with names beginning with "HTTP_" contain values read from the client request header fields.
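The relationship between header fields and "HTTP_" meta-variables can be sketched as follows. The naming rule used here (upper-case the field name, replace "-" with "_", prepend "HTTP_") comes from the CGI/1.1 specification rather than from the text above, and both function names are hypothetical.

```python
def header_to_meta_variable(header_name):
    """Map a request header field name to its CGI meta-variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def client_header_variables(environ):
    """Pick out the meta-variables that originated from client header fields."""
    return {name: value for name, value in environ.items()
            if name.startswith("HTTP_")}
```

Under this rule the `User-Agent` header field becomes HTTP_USER_AGENT and `Accept` becomes HTTP_ACCEPT, matching the last two rows of the table.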

The CGI Response

The CGI response is passed to the server via the standard output file handle (stdout), and will consist of a message-header and a message-body, separated by a blank line. The message-header contains one or more header fields. The body may be empty. The script will return a document response, a local redirect response, or a client redirect (with optional document) response. The response types are described below.

The CGI response message-body follows the CGI response headers, and is a document to be returned to the client by the server. The server reads all of the data provided by the script until it encounters the end of the message-body (indicated by an end-of-file condition). The message-body should be sent to the client without modification apart from any necessary encoding (unless the request used the HEAD method, in which case it will not be sent).
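A script-side sketch of the simplest case, a document response, might look like this. The function name `write_document_response` is hypothetical, and Content-Type is used as the header field on the assumption, taken from the CGI/1.1 specification, that it is the field a document response carries.

```python
import sys


def write_document_response(body, content_type="text/html", stdout=None):
    """Write a minimal CGI document response: header, blank line, body."""
    if stdout is None:
        stdout = sys.stdout
    stdout.write(f"Content-Type: {content_type}\n")  # message-header
    stdout.write("\n")                               # blank line ends the header
    stdout.write(body)                               # message-body (may be empty)
```

When the script exits, its stdout is closed, which gives the server the end-of-file condition that marks the end of the message-body.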

CGI response header fields

The response header fields are either CGI or extension header fields (which will be interpreted by the server), or protocol-specific header fields (which will be included in the response returned to the client). At least one CGI field must be supplied. The response header fields are described below.

The script may also return other header fields relating to the response message that are specific to the server protocol (i.e. HTTP/1.0 or HTTP/1.1).
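How a server might separate the two kinds of header field can be sketched as below. This is a simplified illustration: the set of CGI field names (Content-Type, Location, Status) is taken from the CGI/1.1 specification rather than from the text above, extension fields are not treated specially, lines are assumed to end with "\n", and the function name `split_cgi_response` is hypothetical.

```python
# CGI header fields as named in the CGI/1.1 specification (an assumption here).
CGI_FIELDS = {"content-type", "location", "status"}


def split_cgi_response(raw):
    """Split raw script output into CGI fields, protocol-specific fields and body."""
    # The first blank line separates the message-header from the message-body.
    header_text, _, body = raw.partition("\n\n")
    cgi_fields, protocol_fields = {}, {}
    for line in header_text.split("\n"):
        name, _, value = line.partition(":")
        name = name.strip()
        target = cgi_fields if name.lower() in CGI_FIELDS else protocol_fields
        target[name] = value.strip()
    if not cgi_fields:
        raise ValueError("a CGI response needs at least one CGI header field")
    return cgi_fields, protocol_fields, body
```

The server would interpret the first dictionary itself and copy the second into the response it returns to the client.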

Processing HTML forms

Forms in web pages allow users to enter data. Once all of the required information has been entered into the form, the user can submit the contents of the form to the server by clicking on the form’s Submit button. How the server passes the data to the gateway program (usually a CGI script) depends on the HTTP method specified for the form, which will be either GET or POST. If the form has METHOD=GET in its FORM tag, the form input is passed to the CGI script in an environment variable called QUERY_STRING. If METHOD=POST is used, form input is passed to the CGI script via stdin (the standard input). The environment variable CONTENT_LENGTH tells the script how many bytes of data to read from stdin.
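The two cases can be sketched in a single function. The name `read_form_input` is hypothetical, and the environment and input stream are passed in as parameters so that the logic can be exercised outside a real server.

```python
def read_form_input(environ, stdin):
    """Return the raw URL-encoded form string for a GET or POST submission."""
    method = environ.get("REQUEST_METHOD", "").upper()
    if method == "GET":
        return environ.get("QUERY_STRING", "")
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        # Read exactly CONTENT_LENGTH bytes rather than waiting for end-of-file.
        return stdin.read(length).decode("ascii")
    raise ValueError(f"unsupported form method: {method!r}")
```

In a running script the call would typically be `read_form_input(os.environ, sys.stdin.buffer)`, after importing `os` and `sys`.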

When a form is created in an HTML document, each input box in the form is given a unique name using the NAME attribute (e.g. NAME="lastname"). The information typed into the input box by the user becomes the value associated with that input box. When the user clicks on the Submit button, the form data is sent to the server as a URL-encoded string consisting of name=value pairs separated by the ampersand character (&). The URL-encoding replaces any reserved characters that form part of the user data (e.g. "@", "&", "$") with an escape sequence consisting of the percent character ("%") followed by two hexadecimal digits. The two digits give the code of the replaced character in the ASCII character set.
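The encoding can be tried out with Python's standard `urllib.parse` module. The field names and values below are made up for illustration, and the two wrapper functions are hypothetical conveniences around `urlencode` and `parse_qs`.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields):
    """Build the URL-encoded name=value string a browser would submit."""
    # Form encoding also turns each space into "+".
    return urlencode(fields)


def decode_form(encoded):
    """Recover the original values from a URL-encoded form string."""
    # parse_qs returns a list per name because a name may appear more than once;
    # this sketch keeps only the first value.
    return {name: values[0] for name, values in parse_qs(encoded).items()}


encoded = encode_form({"lastname": "Smith & Jones", "email": "user@example.com"})
print(encoded)  # lastname=Smith+%26+Jones&email=user%40example.com
```

Here "&" (ASCII 0x26) inside the data becomes `%26`, so it cannot be mistaken for the separator between pairs, and "@" (ASCII 0x40) becomes `%40`.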