Update: Hello new viewers, this post has recently started to receive 100's of page views a day, shooting it up my post rankings... And I don't know why, after all these years, why is it suddenly so popular... Let me know in the comments below! (And don't forget the tip jar is just on the top right there! hehe)
So, today I'm going to go through a really long winded explanation of how to set up C++ based CGI under Apache server on both Debian based Linux (Ubuntu) and Windows.
Before I go too far though, let us just sort something out. CGI stands for "Common Gateway Interface". It is simply a set of definitions for how the web server passes request data to your application and redirects the standard input and standard output streams to it. Your CGI application can be written in ANY language you want. It can be PHP, Perl, C, C++, Visual Basic, C#, Python... Literally any programming language which is able to create a console application can be used to make a CGI application. Many sources on the Internet (like this one http://www.javascriptkit.com/howto/phpcgi.shtml) simply muddle up what CGI is, treating it as a specific language. CGI is not a language; it is actually an Application Programming Interface (API), and it might even be thought of as a protocol rather than an API. It defines how we use our cin and cout streams to render dynamic web page content.
So, moving on, we need a web server to host our applications. What we will do is code our applications in one location, build them and then copy them to the CGI folder of our web server. Then we use a browser pointing to our server to make it run our application for us.
If you didn't spot it, I just mentioned the largest drawback of CGI. Every time someone visits your CGI application page, a new copy of the application is loaded from disk and executed as a new process. If your CGI application is slow, or relies on a shared file, or some other item delays it, then you will noticeably slow your handling of web pages. There are ways around this, such as FastCGI. Alternatively, with languages like Perl and PHP the scripts are not stand-alone applications; they are interpreted (making them slower), but the interpreter can be kept loaded in the server and the scripts cached once they have been compiled, negating the need to load and reload each time someone accesses them. However, keeping lots of such scripts in server RAM increases the server resource footprint (it needs more memory to run effectively) quite dramatically.
Our target platform will be Apache2, on a very low power machine: single CPU, single core, 512 MB of RAM. Our operating systems will be Ubuntu Server, Kubuntu Desktop and Windows Vista 32-bit Home Premium. (Any flavour of Windows will work the exact same way - we're going to configure Apache, not Windows.) Our goal is to run in as small a memory footprint as possible, with as few interpreters and sub-libraries working on our behalf.
Set up HTTP Apache Server under Windows
So, let's get on with setting up Apache... First let's get hold of the Apache HTTP server from http://httpd.apache.org/. We're going to concentrate on Apache 2.2.19, which is the current (at time of writing) stable release.
You can set the server up however you wish. For this example I chose to install my server on "localhost:8080" only, so that I have to start the server manually and can browse to "http://localhost:8080" to see the "It Works!" screen. So go ahead and install, then start the server.
If you want to throw your own HTML pages onto your server, you now simply browse to the folder Apache installed to (by default this is under C:\Program Files\Apache Software Foundation\Apache2.2) and drop HTML or HTM files into the "htdocs" subfolder.
Note: You can copy files into the Windows Apache Subfolder and have them be served up without needing to restart the server.
Setup HTTP Apache Server under Linux
You should have your Linux machine installed with Ubuntu (server or desktop) or Kubuntu; it makes no difference. You simply need to be able to open a command prompt (terminal window) and use the apt-get command.
So, first things first, let's install Apache. In your prompt type:
sudo apt-get install apache2
Allow this command to complete by providing your password when sudo asks for it, and you should then be able to browse to your server IP, or the local host IP, and see the "It Works!" page, just as we did under Windows.
Again, you can copy or create files in the folder /var/www to have them hosted by the server, so you could create /var/www/hello.html with a big "HELLO" text in the center of the page. Apache normally picks up new static files without a restart, but a restart does no harm, is required after any configuration change, and is worth trying if a new file does not show up.
To do this type:
sudo /etc/init.d/apache2 restart
Provide your password and the server will restart, and now you'll be able to browse to "http://127.0.0.1/hello.html" and see your big centered HELLO (swap 127.0.0.1 with your server/Linux box IP if you're viewing from a different machine).
Adding CGI
By default both platforms have configuration pointing to a cgi-bin directory, and it is from this directory we will be running our example code (you can change the folder later). Under Windows you'll find the cgi-bin in the location you installed Apache to, so by default this is "C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin". You can copy your CGI programs into there any time you wish. Once you have copied the file across, make sure you rename the extension to "cgi"; so you might build "cgidemo.exe", and when you copy it you rename it to "cgidemo.cgi".
On Linux this default location is /usr/lib/cgi-bin. To copy files into there you will need to run the copy command as root using sudo. For example, to copy a file called "cgidemo" from your home directory (where your username is ralph) to the cgi-bin you would use the command line:
sudo cp /home/ralph/cgidemo /usr/lib/cgi-bin/cgidemo.cgi
This copies the file, but also renames the file just as we would under windows.
Script Alias and Handlers
So, why do we rename our exe or file to .cgi? Well, this is to facilitate our adding a handler type to the Apache configuration. A handler basically tells the Apache software to associate a given file extension with an internal module. So we need to associate the .cgi extension with the cgi-script type.
Under Windows you do this by using the start menu, seeking out where Apache installed in your programs list and using "Configure Apache" and the "Edit the Apache httpd.conf Configuration File" option. This will open Notepad, editing the server configuration.
You need to look for the "ScriptAlias" for /cgi-bin/, so use find to get to the line which starts "ScriptAlias /cgi-bin/". It will include another folder, which is the actual folder on your hard drive where the cgi files will be stored. You can change that path later, but for now simply remove the # from the front if there is one; this will enable the Script Alias against that directory.
So mine reads:
ScriptAlias /cgi-bin/ "C:/Program Files/Apache Software Foundation/Apache2.2/cgi-bin/"
The next thing we need to do is find the actual configuration for this directory. The directory configuration tells the server what can and cannot be accessed or executed from this folder, so now find the Directory section for the alias given:
<Directory "C:/Program Files/Apache Software Foundation/Apache2.2/cgi-bin/">
You need to set some options and add a handler mapping the .cgi extension to cgi-script, which looks like this:
<Directory "C:/Program Files/Apache Software Foundation/Apache2.2/cgi-bin/">
AllowOverride None
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
Order allow,deny
Allow from all
AddHandler cgi-script cgi
</Directory>
So, I have enabled cgi to execute from this directory, and told the handler to use any file ending "cgi" as a cgi-script.
Under Linux this is a lot simpler. You just use the command line to edit the equivalent file, which is in a slightly different location. So from the command prompt you will need to run the command:
sudo nano /etc/apache2/sites-available/default
This plays the same role as the configuration file under Windows; it looks different, but it takes the same directives. If you look now for the same <Directory section as under Windows, you should find the /usr/lib/cgi-bin folder, thus:
<Directory "/usr/lib/cgi-bin">
Now set the directory to be the same:
<Directory "/usr/lib/cgi-bin">
AllowOverride None
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
Order allow,deny
Allow from all
AddHandler cgi-script cgi
</Directory>
However, unlike windows you need to add the script alias yourself, so I have added mine just like this:
ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
<Directory "/usr/lib/cgi-bin">
AllowOverride None
Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
Order allow,deny
Allow from all
AddHandler cgi-script cgi
</Directory>
If you save this file and then restart the apache server any cgi files you have in cgi-bin can now be accessed.
Alias Explained
So, I hope it's obvious now that the alias you set is used to map a local folder on your hard drive to a URL on your server. If you have an alias of /cgi-bin/ pointing to a folder, you access it from a browser as "http://YOURIP/cgi-bin/".
If you added an alias of "/nothing/" then you could get to the contents of that folder as "http://YOURIP/nothing/" in your browser.
Now to Code
So, our next task is to create some code which we can actually run as our CGI. In Visual Studio I create a new C++ application which is an empty Console Application. In Code::Blocks on Linux I create an empty project.
To both I add a "main.cpp" file and I put the exact same code in both:
#include <iostream>
#include <string>
using namespace std;
const string c_ContentHeader = "Content-type: text/html\n\n";
int main (int argc, char* argv[])
{
    cout << c_ContentHeader;
    cout << "Hello From CGI" << endl;
    return 0;
}
Pretty simple application. Both Windows (Visual Studio) and Linux (gcc) will build this without even blinking, so go ahead and build it, and copy the output executable to your cgi folder. My output is called "hellocgi.exe" on Windows and just "hellocgi" on Linux.
I copy and rename it to our cgi folders "C:\apachepath\cgi-bin\hellocgi.cgi" on windows and "sudo cp /home/ralph/cgidemo/bin/Debug/hellocgi /usr/lib/cgi-bin/hellocgi.cgi"
On Linux I restart apache "sudo /etc/init.d/apache2 restart".
And now I browse to the server as "http://127.0.0.1:8080/cgi-bin/hellocgi.cgi" on windows, and I browse to "http://127.0.0.1/cgi-bin/hellocgi.cgi" on Linux. And voila I see my "Hello From CGI".
Now, let's take this code apart. If you miss out sending the "Content-type" string first, you will get a server-side error in your browser: a "500 Internal Server Error" page.
To spot that we missed the content header, we need to view the error.log within our Apache logs folder, where we can see the line:
[Mon Jul 04 11:32:16 2011] [error] [client 127.0.0.1] malformed header from script. Bad header=Hello From CGI: CGI.cgi
Oh, "bad header" and yes.. I had the code looking like this:
#include <iostream>
#include <string>
using namespace std;
const string c_ContentHeader = "Content-type: text/html\n\n";
int main (int argc, char* argv[])
{
    //cout << c_ContentHeader; -- Commented out
    cout << "Hello From CGI" << endl;
    return 0;
}
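One way to make this mistake harder to repeat is to build every response through a small helper that always puts the header block first. This is only a sketch of mine, not part of the original program: `BuildResponse` is a made-up name, and the default content type is simply the one used in this post.

```cpp
#include <string>

// Hypothetical helper (not from the original post): returns the CGI
// header, the blank line that ends the header block, and then the body.
std::string BuildResponse(const std::string& body,
                          const std::string& contentType = "text/html")
{
    // The header line must come first, followed by an empty line.
    return "Content-type: " + contentType + "\n\n" + body;
}
```

In main you would then write something like `cout << BuildResponse("Hello From CGI\n");` so the header can never be left out by accident.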
So, what have we learned from all this?
- To point an Apache alias at a directory.
- To configure this directory to allow CGI execution.
- To add a handler for the cgi-script type to a defined file extension.
- To copy our C++ compiled/built output to the cgi-bin folder we've aliased.
- To restart the server.
- To run the executable.
Rendering HTML
From our C++ application we can now render any HTML we want. We could load this from templates on disk, or simply stream it out from the code:
cout << c_ContentHeader;
cout << "<html>";
cout << "<body>";
cout << "<center>Hello World from the CGI app, but with HTML</center>";
cout << "</body>";
cout << "</html>";
This is using the C++ iostream library; however, we can equally use the old C style output of "printf" (declared in <cstdio>).
printf ("%s", c_ContentHeader.c_str());
printf ("<html>");
printf ("<body>");
printf ("<center>Hello World from the CGI app, but with HTML</center>");
printf ("</body>");
printf ("</html>");
You could therefore load a template from disk, for example a file containing:
<html>
<body>
<center><%=String%></center>
</body>
</html>
And in our program we can replace "<%=String%>" with some other value... This kind of script usage might ring bells with ASP developers, as it is exactly the kind of syntax used to output a value from an ASP application... Maybe you could write your own dynamic content server...
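As a rough sketch of that replacement step, something like the following would do. `ReplaceToken` is a name I have made up for illustration; it simply swaps every occurrence of a marker such as "<%=String%>" for a value, using nothing but std::string.

```cpp
#include <string>

// Hypothetical helper: replaces every occurrence of `token` in `text`
// with `value` and returns the result.
std::string ReplaceToken(std::string text,
                         const std::string& token,
                         const std::string& value)
{
    if (token.empty())
    {
        return text;                  // nothing sensible to search for
    }
    std::string::size_type pos = 0;
    while ((pos = text.find(token, pos)) != std::string::npos)
    {
        text.replace(pos, token.length(), value);
        pos += value.length();        // continue after the inserted value
    }
    return text;
}
```

Calling `ReplaceToken(page, "<%=String%>", "HELLO WORLD")` on the template above would give you the page with HELLO WORLD inside the center tags. Skipping past the inserted value means a value that happens to contain the marker cannot send the loop round forever.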
We could simplify processing of templates loaded with the use of the C style calls, for example, this template:
<html>
<body>
<center>%s</center>
</body>
</html>
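Loaded as a single string (called l_template here, because "template" is a reserved word in C++ and cannot be used as a variable name), this could then be rendered with a value in place of %s with a single call to printf:
printf (l_template.c_str(), "HELLO WORLD");
This will result in the HELLO WORLD string being presented in the center of the page from the template. Be aware that printf treats every % in the template as the start of a format specifier, so any literal % in your HTML must be written as %%.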
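For the "loaded as a single string" part, a minimal sketch might look like this. `LoadTemplate` is a hypothetical helper of my own and returning an empty string on failure is just one possible choice; a real program would probably want to report the error.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a whole file into one string.
// Returns an empty string if the file cannot be opened.
std::string LoadTemplate(const std::string& path)
{
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file)
    {
        return std::string();
    }
    std::ostringstream contents;
    contents << file.rdbuf();         // copy the entire file in one go
    return contents.str();
}
```

You could then render it with `printf(LoadTemplate("page.html").c_str(), "HELLO WORLD");` where "page.html" is just an example file name, and the template is assumed to contain exactly one %s and no other % characters.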
You can get as complex as you wish here.
Receiving Input
So, if we now are happy with outputting HTML from our CGI application we will need to work out how to receive input. For example, if we had our browser come to our CGI page from a form post, then that page might look like this:
<html>
<body>
<form name="TestForm" action="http://127.0.0.1:8080/cgi-bin/contentLength.cgi" method="POST">
<input type=text name="TextField1" value="Hello World">
<input type=submit name="submit" value="Click To Submit">
</form>
</body>
</html>
So, we see the form has an input field and a submit field, and it points to our CGI page... this is a new CGI page, the code for which looks like this:
#include <cstdlib>   // getenv, atoi
#include <iostream>
#include <string>
using namespace std;
const string c_ContentHeader = "Content-type: text/html\n\n";
// ---- CONTENT LENGTH ----
///<Summary>
/// The content length environment variable
/// name
///</Summary>
const string c_ContentLengthVariableName = "CONTENT_LENGTH";
///<Summary>
/// Function to return the current requests
/// content length
///</Summary>
int GetContentLength()
{
    int l_result = 0;
    // getenv returns NULL when the server did not set the variable
    const char* l_ContentLengthVariable = getenv(c_ContentLengthVariableName.c_str());
    if ( l_ContentLengthVariable != NULL )
    {
        l_result = atoi(l_ContentLengthVariable);
    }
    return l_result;
}
// ---- END CONTENT LENGTH ----
int main (int argc, char* argv[])
{
    cout << c_ContentHeader;
    cout << "The content Length is: " << GetContentLength() << endl;
    return 0;
}
This code shows you a new function to get the content length. It uses the "getenv" call, which returns the value of a single named environment variable (or NULL if that variable is not set). For a CGI program like ours, the environment variables are all items set up by the Apache server before it started our application.
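The same getenv approach works for the other variables the server sets up. The sketch below wraps it in a small helper so a missing variable never hands you a NULL pointer; `GetEnvOrDefault` is a name I have invented for this example.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of an environment variable,
// or `fallback` when the variable is not set.
std::string GetEnvOrDefault(const char* name, const std::string& fallback = "")
{
    const char* value = std::getenv(name);
    return (value != NULL) ? std::string(value) : fallback;
}
```

With this in place, `GetEnvOrDefault("REQUEST_METHOD")`, `GetEnvOrDefault("QUERY_STRING")` and `GetEnvOrDefault("CONTENT_TYPE")` would give you three more of the standard CGI variables as strings, each one empty if the server did not provide it.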
So, if you build this code and copy it into place as "contentLength.cgi" save the HTML shown to your /htdocs or /var/www folder and restart your server. Then you can browse your server to the HTML page, click submit and send that data to the CGI.
The "<input" items in the form are then sent to the CGI script, becoming its input.
You should see by default the number of bytes sent to your CGI script. If you go back and change the text in the input field, you will see a different content length being shown by your CGI program.
So where does this input come from? Well, it comes from the standard input stream. If you perform a cin, or read console, or Console.ReadLine or whatever you want, then you will receive your input. We already know how many bytes to read, so let's see how to do this...
// ---- GET CONTENT ----
// Needs <cstdio>, <list>, <sstream> and <vector> as well as the earlier includes
///<Summary>
/// Function to return the content
///</Summary>
list<string> GetContent()
{
    list<string> l_result;
    // Now seek the content
    int l_ContentLength = GetContentLength();
    if ( l_ContentLength > 0 )
    {
        try
        {
            // Allocate a buffer for the information; a vector frees the array
            // correctly (auto_ptr would call delete rather than delete[])
            vector<char> l_Buffer (l_ContentLength);
            // Read the content sent into the buffer
            size_t l_bytesRead = fread (&l_Buffer[0], sizeof(char), l_Buffer.size(), stdin);
            // Check the data length
            if ( l_bytesRead == l_Buffer.size() )
            {
                // Convert the buffer to a string; the explicit range is needed
                // because the data read is not null terminated
                string l_content (l_Buffer.begin(), l_Buffer.end());
                stringstream l_stream (l_content);
                // Push each whitespace separated item into the result
                string l_item;
                while ( l_stream >> l_item )
                {
                    l_result.push_back(l_item);
                }
            }
        }
        catch (const bad_alloc&)
        {
            // TODO handle bad alloc
        }
    }
    return l_result;
}
// ---- END GET CONTENT ----
So, what is this doing? Well, it's allocating a buffer of the content length we know we have, and then reading into this buffer from the stdin stream. I'm then choosing to squirt it into a string stream and pushing each whitespace-separated item into the result list.
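If you prefer the iostream side of things to the C style fread, the same read can be sketched against any input stream. This is an alternative of mine rather than the code used in this post; `ReadBody` is a made-up name, and taking the stream as a parameter just makes it easy to try out with a string stream instead of a live server.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads up to `length` bytes from `in` and returns
// whatever actually arrived (which may be shorter than requested).
std::string ReadBody(std::istream& in, std::size_t length)
{
    std::string body(length, '\0');
    if (length > 0)
    {
        in.read(&body[0], static_cast<std::streamsize>(length));
        // gcount() reports how many bytes the last read really delivered
        body.resize(static_cast<std::size_t>(in.gcount()));
    }
    return body;
}
```

In the CGI program this would be called as `ReadBody(cin, GetContentLength())` once you have checked that the content length is greater than zero.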
Putting it all together my whole code base now looks like this:
#include <cstdio>    // fread, stdin
#include <cstdlib>   // getenv, atoi
#include <iostream>
#include <list>
#include <new>       // bad_alloc
#include <sstream>
#include <string>
#include <vector>
using namespace std;
const string c_ContentHeader = "Content-type: text/html\n\n";
// ---- CONTENT LENGTH ----
///<Summary>
/// The content length environment variable
/// name
///</Summary>
const string c_ContentLengthVariableName = "CONTENT_LENGTH";
///<Summary>
/// Function to return the current requests
/// content length
///</Summary>
int GetContentLength()
{
    int l_result = 0;
    // getenv returns NULL when the server did not set the variable
    const char* l_ContentLengthVariable = getenv(c_ContentLengthVariableName.c_str());
    if ( l_ContentLengthVariable != NULL )
    {
        l_result = atoi(l_ContentLengthVariable);
    }
    return l_result;
}
// ---- END CONTENT LENGTH ----
// ---- GET CONTENT ----
///<Summary>
/// Function to return the content
///</Summary>
list<string> GetContent()
{
    list<string> l_result;
    // Now seek the content
    int l_ContentLength = GetContentLength();
    if ( l_ContentLength > 0 )
    {
        try
        {
            // Allocate a buffer for the information; a vector frees the array
            // correctly (auto_ptr would call delete rather than delete[])
            vector<char> l_Buffer (l_ContentLength);
            // Read the content sent into the buffer
            size_t l_bytesRead = fread (&l_Buffer[0], sizeof(char), l_Buffer.size(), stdin);
            // Check the data length
            if ( l_bytesRead == l_Buffer.size() )
            {
                // Convert the buffer to a string; the explicit range is needed
                // because the data read is not null terminated
                string l_content (l_Buffer.begin(), l_Buffer.end());
                stringstream l_stream (l_content);
                // Push each whitespace separated item into the result
                string l_item;
                while ( l_stream >> l_item )
                {
                    l_result.push_back(l_item);
                }
            }
        }
        catch (const bad_alloc&)
        {
            // TODO handle bad alloc
        }
    }
    return l_result;
}
// ---- END GET CONTENT ----
int main (int argc, char* argv[])
{
    cout << c_ContentHeader;
    cout << "<html><body>" << endl;
    cout << "The content Length is: " << GetContentLength() << "<br>" << endl;
    cout << "The Content is: <br><pre>" << endl;
    list<string> theContent = GetContent();
    for (list<string>::const_iterator itr = theContent.begin();
         itr != theContent.end();
         ++itr)
    {
        cout << (*itr) << endl;
    }
    cout << "</pre><hr></body></html>" << endl;
    return 0;
}
And I've built this and named it "contentExample.cgi" in my cgi-bin.
I've also edited the HTML form to submit to this new CGI application:
<html>
<body>
<form name="TestForm" action="http://127.0.0.1:8080/cgi-bin/contentExample.cgi" method="POST">
<input type=text name="TextField1" value="Hello World">
<input type=submit name="submit" value="Click To Submit">
</form>
</body>
</html>
Now when I perform the post we get an output from our CGI for the form being submitted. And voila, there is our input.
Conclusion
I'm not going to go into processing the input, nor am I going to point you in the direction of libraries to do this for you. It is a much better learning experience to process all the input yourself, possibly writing your own library, so you understand the underlying CGI protocol, its environment variables and its pitfalls.
This becomes invaluable in understanding the API.
All the code given is completely interchangeable between windows and linux.
If you have any problems with this (very basic) introduction, drop me a line.