Monday, 4 July 2011

C++ based CGI with Apache on Win32 & Linux

Update: Hello new viewers, this post has recently started to receive hundreds of page views a day, shooting it up my post rankings, and after all these years I don't know why it is suddenly so popular. Let me know in the comments below! (And don't forget the tip jar is just on the top right there! hehe)

So, today I'm going to go through a really long-winded explanation of how to set up C++ based CGI under the Apache server on both Debian-based Linux (Ubuntu) and Windows.

Before I go too far, though, let us sort something out. CGI stands for "Common Gateway Interface"; it is simply a set of conventions for how a web server hands request data to your application and collects its response, by setting environment variables and redirecting the standard input and standard output streams. Your CGI application can be written in ANY language you want. It can be PHP, Perl, C, C++, Visual Basic, C#, Python... Literally any programming language which is able to create a console application can be used to make a CGI application. Many sources on the Internet (like this one http://www.javascriptkit.com/howto/phpcgi.shtml) muddle up what CGI is by treating it as a specific language. CGI is not a language; it is an Application Programming Interface (API), and might even be better thought of as a protocol, which defines how we use our cin and cout streams to render dynamic web page content.

So, moving on, we need a web server to host our applications. What we will do is code our applications in one location, build them and then copy them to the CGI folder of our web server. Then we use a browser pointing at our server to make it run our application for us.

If you didn't spot it, I just mentioned the largest drawback of CGI: every time someone visits your CGI application page, a new copy of the application is loaded from disk and executed as a new process. If your CGI application is slow, or relies on a shared file, or some other item delays it, then you will noticeably slow your handling of web pages. There are ways around this, such as FastCGI. With languages like Perl and PHP the scripts are not applications; they are interpreted (making them slower), but once they have been compiled (or precompiled) they can be kept in the server's memory, negating the need to load and reload them each time someone accesses them. However, having lots of such scripts in server RAM increases the server's resource footprint (it needs more memory to run effectively) quite dramatically.

Our target platform will be Apache2 on a very low power machine: single CPU, single core, 512 MB of RAM. Our operating systems will be Ubuntu Server, Kubuntu Desktop and Windows Vista 32-bit Home Premium (any flavour of Windows will work the exact same way, as we're going to configure Apache, not Windows). Our goal is to run with as small a memory footprint as possible, with as few interpreters and sub-libraries working on our behalf as we can manage.

Set up HTTP Apache Server under Windows

So, let's get on with setting up Apache... First let's get hold of the Apache HTTP server from http://httpd.apache.org/. We're going to concentrate on Apache 2.2.19, which is the current (at time of writing) stable release.

Under Windows you'll need to download httpd-2.2.19-win32-x86-no_ssl.msi from http://httpd.apache.org/download.cgi#apache22, then install the server by double clicking the msi wherever you have saved it.

You can set the server up however you wish; for this example I chose to install my server as "localhost:8080" only, so that I had to run the server manually and could browse to "http://localhost:8080" to see the "It Works!" screen. So go ahead and install, then start the server.



If you want to throw your own HTML pages onto your server, you now simply browse to the folder Apache installed to (by default this is under C:\Program Files\Apache Software Foundation\Apache2.2) and drop HTML or HTM files into the "htdocs" subfolder.

Note: You can copy files into the Windows Apache Subfolder and have them be served up without needing to restart the server.

Setup HTTP Apache Server under Linux

You should have your Linux machine installed with Ubuntu (server or desktop) or Kubuntu; it makes no difference. You simply need to be able to open a command prompt (terminal window) and use the apt-get command.

So, first things first, let's install Apache. In your prompt type:

sudo apt-get install apache2

Allow this command to complete by providing your password when sudo asks for it, and you should then be able to browse to your server IP, or the local host IP, and see the "It Works!" page, just as we did under Windows.

Again you can copy, or create, files in the folder /var/www to have them hosted by the server, so you could create /var/www/hello.html with a big "HELLO" text in the center of the page. Apache normally serves new files from this folder without a restart, just as on Windows; if a new page does not show up, or after you change the server configuration later on, restart the server.

To do this type:

sudo /etc/init.d/apache2 restart

Provide your password and the server will restart, and now you'll be able to browse to "http://127.0.0.1/hello.html" and see your big centered HELLO (swap 127.0.0.1 with your server/Linux box IP if you're viewing from a different machine).

Adding CGI

By default both platforms have configuration pointing to a cgi-bin directory, and it is from this directory that we will be running our example code (you can change the folder later). Under Windows you'll find cgi-bin in the location where you installed Apache, so by default this is "C:\Program Files\Apache Software Foundation\Apache2.2\cgi-bin". You can copy your CGI programs into there any time you wish. Once you have copied a file across, make sure you change its extension to "cgi"; so you might build "cgidemo.exe", and when you copy it, rename it to "cgidemo.cgi".

On Linux this default location is /usr/lib/cgi-bin. To copy files into there you will need root privileges, so use sudo with the copy command. For example, to copy a file called "cgidemo" from your home directory (where your username is ralph) to the cgi-bin you would use the command line:

sudo cp /home/ralph/cgidemo /usr/lib/cgi-bin/cgidemo.cgi

This copies the file, but also renames the file just as we would under windows.

Script Alias and Handlers

So, why do we rename our exe or file to .cgi? Well, this is to facilitate our adding a handler type to the Apache configuration. A handler basically tells the Apache software to associate a given file extension with an internal module. So we need to associate the .cgi extension with the cgi-script type.

Under Windows you do this by using the start menu, seeking out where Apache installed in your programs list and using "Configure Apache" and the "Edit the Apache httpd.conf Configuration File" option. This will open Notepad, editing the server configuration.

You need to look for the "ScriptAlias" for /cgi-bin/, so use find to get to the line which starts "ScriptAlias /cgi-bin/". It will include another folder, which is the actual folder on your hard drive where the cgi files will be stored. You can change that path later, but for now, if the line starts with a #, simply remove the # from the front; this will enable the Script Alias against that directory.

So mine reads:

ScriptAlias /cgi-bin/ "C:/Program Files/Apache Software Foundation/Apache2.2/cgi-bin/"

The next thing we need to do is find the actual configuration for this directory. The directory configuration tells the server what can and cannot be accessed or executed from this folder, so now find the Directory section for the alias given:

<Directory "C:/Program Files/Apache Software Foundation/Apache2.2/cgi-bin/">

You need to set some options and add a handler mapping the .cgi extension to cgi-script, which looks just like this:

<Directory "C:/Program Files/Apache Software Foundation/Apache2.2/cgi-bin/">
    AllowOverride None
    Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
    Order allow,deny
    Allow from all
    AddHandler cgi-script cgi
</Directory>

So, I have enabled cgi to execute from this directory, and told the handler to use any file ending "cgi" as a cgi-script.

Under Linux this is a lot simpler: you use the command line to edit the equivalent file, which is in a slightly different location. From the command prompt, run the command:

sudo nano /etc/apache2/sites-available/default

This file takes the same directives as httpd.conf under Windows; it looks different, but it accepts the same input. If you look now for the same <Directory section as under Windows, you should find the /usr/lib/cgi-bin folder thus:

<Directory "/usr/lib/cgi-bin">

Now set the directory to be the same:

<Directory "/usr/lib/cgi-bin">
    AllowOverride None
    Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
    Order allow,deny
    Allow from all
    AddHandler cgi-script cgi
</Directory>

However, unlike windows you need to add the script alias yourself, so I have added mine just like this:

ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
<Directory "/usr/lib/cgi-bin">
    AllowOverride None
    Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
    Order allow,deny
    Allow from all
    AddHandler cgi-script cgi
</Directory>

If you save this file and then restart the Apache server, any cgi files you have in cgi-bin can now be accessed.

Alias Explained

So, I hope it's obvious now that the alias you set is used to map a local folder on your hard drive to a URL on your server, so if you have an alias of /cgi-bin/ pointing to a folder, you access it from a browser as "http://YOURIP/cgi-bin/".

If you added an alias of "/nothing/" then you could get to the contents of that folder as "http://YOURIP/nothing/" in your browser.

Now to Code

So, our next task is to create some code which we can actually run as our CGI. In Visual Studio I create a new C++ project which is an empty Console Application. In Code Blocks on Linux I create an empty project.

To both I add a "main.cpp" file and I put the exact same code in both:

#include <iostream>
#include <string>

using namespace std;

const string c_ContentHeader = "Content-type: text/html\n\n";

int main (int argc, char* argv[])
{
 cout << c_ContentHeader;
 cout << "Hello From CGI" << endl;

 return 0;
}

Pretty simple application. Both Windows (Visual Studio) and Linux (gcc) will build this without even blinking, so go ahead and build it, and copy the output executable to your cgi folder. My output is called "hellocgi.exe" on Windows and just "hellocgi" on Linux.
I copy and rename it into our cgi folders: "C:\apachepath\cgi-bin\hellocgi.cgi" on Windows, and on Linux with "sudo cp /home/ralph/cgidemo/bin/Debug/hellocgi /usr/lib/cgi-bin/hellocgi.cgi".
On Linux I restart Apache with "sudo /etc/init.d/apache2 restart".
And now I browse to the server as "http://127.0.0.1:8080/cgi-bin/hellocgi.cgi" on Windows, and I browse to "http://127.0.0.1/cgi-bin/hellocgi.cgi" on Linux. And voila, I see my "Hello From CGI".
Now, let's take this code apart. If you miss out sending the "Content-type" string first, you will get a server-side error (a 500 Internal Server Error) in your browser.
To spot that we missed the content header, we need to view the error.log within our Apache logs folder, where we can see the line:
[Mon Jul 04 11:32:16 2011] [error] [client 127.0.0.1] malformed header from script. Bad header=Hello From CGI: CGI.cgi

Oh, "bad header" and yes.. I had the code looking like this:

#include <iostream>
#include <string>

using namespace std;

const string c_ContentHeader = "Content-type: text/html\n\n";

int main (int argc, char* argv[])
{
 //cout << c_ContentHeader;  -- Commented out
 cout << "Hello From CGI" << endl;

 return 0;
}
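
One way to make the header hard to forget is to build the whole response in one place. This is only a minimal sketch: BuildCgiResponse is a made-up helper name for illustration, not part of Apache or of the code above, and it assumes the response is small enough to assemble in memory first.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not from the original listings): builds a complete
// CGI response with the header block first, so it cannot be left out.
std::string BuildCgiResponse(const std::string& p_body,
                             const std::string& p_contentType = "text/html")
{
    std::ostringstream l_out;

    // The header line, followed by an empty line which marks the end of
    // the headers; everything after that is the page itself.
    l_out << "Content-type: " << p_contentType << "\n\n";
    l_out << p_body;

    return l_out.str();
}
```

In main you would then write something like cout << BuildCgiResponse("Hello From CGI\n"); and the header always goes out before the body.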

So, what have we learned from all this?
  • To point an Apache alias at a directory.
  • To configure this directory to allow CGI execution.
  • To add a handler for the cgi-script type to a defined file extension.
  • To copy our C++ compiled/built output to the cgi-bin folder we've aliased.
  • To restart the server.
  • To run the executable.

Rendering HTML

From our C++ application we can now render any HTML we want. We could load this from templates on disk, or simply stream it out from the code:
 cout << c_ContentHeader;
 cout << "<html>";
 cout << "<body>";
 cout << "<center>Hello World from the CGI app, but with HTML</center>";
 cout << "</body>";
 cout << "</html>";

This is using the C++ IOStream library; however, we can also use the old C style output of "printf" (which needs #include <cstdio>).

 printf ("%s", c_ContentHeader.c_str());
 printf ("<html>");
 printf ("<body>");
 printf ("<center>Hello World from the CGI app, but with HTML</center>");
 printf ("</body>");
 printf ("</html>");

You could therefore load a template from disk, for example a file containing:
<html>
<body>
<center><%=String%></center>
</body>
</html>

And in our program we can replace "<%=String%>" with some other value... This kind of script usage might ring bells with ASP developers, as it is exactly the kind of syntax used to output a value from an ASP application... Maybe you could write your own dynamic content server...
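
The replacement itself can be sketched with plain std::string calls. ReplacePlaceholder below is a hypothetical helper named purely for illustration, and it assumes the whole template is already held in memory as one string:

```cpp
#include <string>

// Hypothetical helper: replaces every occurrence of p_placeholder in
// p_template with p_value and returns the resulting text.
std::string ReplacePlaceholder(std::string p_template,
                               const std::string& p_placeholder,
                               const std::string& p_value)
{
    if (p_placeholder.empty())
    {
        return p_template; // nothing sensible to search for
    }

    std::string::size_type l_pos = 0;
    while ((l_pos = p_template.find(p_placeholder, l_pos)) != std::string::npos)
    {
        p_template.replace(l_pos, p_placeholder.length(), p_value);

        // Continue after the inserted text, so a value which itself
        // contains the placeholder cannot cause an endless loop.
        l_pos += p_value.length();
    }

    return p_template;
}
```

Called as ReplacePlaceholder(page, "<%=String%>", "HELLO WORLD"), this returns the page with the marker swapped for the value. If the value comes from a user, it would also need HTML escaping before being dropped into the page, which this sketch does not do.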
We could simplify processing of templates loaded with the use of the C style calls, for example, this template:
<html>
<body>
<center>%s</center>
</body>
</html>

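Loaded as a single string (called l_template here, because "template" itself is a reserved word in C++), this could then be rendered with a value in place of %s with a single call to printf:
printf (l_template.c_str(), "HELLO WORLD");
This will result in the "HELLO WORLD" string being presented in the center of the page from the template. Bear in mind that printf treats every % in the template as the start of a format specifier, so any literal percent sign in the HTML must be written as %%.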
You can get as complex as you wish here.
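
Loading a template from disk, as mentioned above, might look something like this. It is a sketch only: LoadTemplate is a hypothetical helper, and the file name in the usage note is an assumption for illustration.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a whole file into p_contents.
// Returns false, leaving p_contents untouched, if the file cannot be opened.
bool LoadTemplate(const std::string& p_path, std::string& p_contents)
{
    std::ifstream l_file(p_path.c_str(), std::ios::in | std::ios::binary);
    if (!l_file)
    {
        return false;
    }

    // Copy the entire file through the stream buffer in one go
    std::ostringstream l_buffer;
    l_buffer << l_file.rdbuf();

    p_contents = l_buffer.str();
    return true;
}
```

You might call it as LoadTemplate("C:/templates/page.html", l_template) and send a plain error page when it returns false. An absolute path avoids depending on whichever working directory the server starts your program in.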

Receiving Input

So, if we now are happy with outputting HTML from our CGI application we will need to work out how to receive input. For example, if we had our browser come to our CGI page from a form post, then that page might look like this:
<html>
<body>

<form name="TestForm" action="http://127.0.0.1:8080/cgi-bin/contentLength.cgi" method="POST">

 <input type=text name="TextField1" value="Hello World">

 <input type=submit name="submit" value="Click To Submit">

</form>

</body>
</html>

So, we see the form has an input field and a submit field, and it points to our CGI page... this is a new CGI page, the code for which looks like this:
#include <iostream>
#include <string>
#include <cstdlib>

using namespace std;


const string c_ContentHeader = "Content-type: text/html\n\n";


    // ---- CONTENT LENGTH ----

    ///<Summary>
    /// The content length environment variable
    /// name
    ///</Summary>
    const string c_ContentLengthVariableName = "CONTENT_LENGTH";

    ///<Summary>
    /// Function to return the current requests
    /// content length
    ///</Summary>
    const int GetContentLength()
    {
        int l_result = 0;
        char* l_ContentLengthVariable = getenv(c_ContentLengthVariableName.c_str());
        if ( l_ContentLengthVariable != NULL )
        {
            l_result = atoi(l_ContentLengthVariable);
        }
        return l_result;

    }

    // ---- END CONTENT LENGTH ----


int main (int argc, char* argv[])
{
 cout <<c_ContentHeader;
 
 cout <<"The content Length is: " << GetContentLength() << endl;


 return 0;
}

This code shows you a new function to get the content length, which uses the "getenv" call. getenv returns the value of a single named environment variable (or NULL if it is not set); for a CGI program like ours, the environment variables are all items set up by the Apache server before it started our application.
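
Because getenv hands back NULL for a variable that is not set, it can be handy to wrap it. A minimal sketch, where GetEnvOrDefault is a hypothetical helper name and not part of any library:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of the named environment
// variable, or p_default when the variable is not set.
std::string GetEnvOrDefault(const char* p_name,
                            const std::string& p_default = "")
{
    const char* l_value = std::getenv(p_name);
    return (l_value != NULL) ? std::string(l_value) : p_default;
}
```

With this, GetEnvOrDefault("REQUEST_METHOD", "GET") or GetEnvOrDefault("QUERY_STRING") can be used without checking for NULL each time; both names are standard CGI variables, alongside the CONTENT_LENGTH used above.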
So, build this code and copy it into place as "contentLength.cgi", then save the HTML shown to your /htdocs or /var/www folder and restart your server. Then you can browse your server to the HTML page, click submit and send that data to the CGI.
The "<input" items in the form are then sent to the CGI script, becoming its input.
You should see the number of bytes sent to your CGI script. If you go back and change the text in the input field, you will see a different content length being shown by your CGI program.
So where does this input come from? Well, it comes from the standard input stream; if you perform a cin, or read console, or Console.ReadLine or whatever you want, then you will receive your input. We already know how many bytes to read, so let's see how to do this...

    // ---- GET CONTENT ----

    ///<Summary>
    /// Function to return the content
    ///</Summary>
    // Needs <cstdio>, <list>, <new>, <sstream> and <string>
    const list<string> GetContent()
    {
        list<string> l_result;

        // Now seek the content
        int l_ContentLength = GetContentLength();
        if ( l_ContentLength > 0 )
        {
            try
            {
                // Allocate a buffer for the information; a string owns and
                // frees the memory for us (auto_ptr must never hold new[])
                string l_Buffer (l_ContentLength, '\0');

                // Read the content sent into the buffer
                size_t l_bytesRead = fread (&l_Buffer[0], sizeof(char), l_Buffer.size(), stdin);

                // Check the data length
                if ( l_bytesRead == l_Buffer.size() )
                {
                    // Wrap the buffer in a stream; its length is known, so
                    // no null terminator is needed on the data read
                    stringstream l_stream (l_Buffer);

                    // Push each whitespace separated item into the result
                    string l_item;
                    while ( l_stream >> l_item )
                    {
                        l_result.push_back(l_item);
                    }
                }
            }
            catch (const bad_alloc&)
            {
                // TODO handle bad alloc
            }
        }

        return l_result;
    }

    // ---- END GET CONTENT ----
 
So, what is this doing? Well, it's allocating a buffer of the content length we know we have, and it's then filling this buffer from the stdin stream. I'm then choosing to squirt it into a string stream and pushing each whitespace-separated item into the result list.
Putting it all together my whole code base now looks like this:
#include <iostream>
#include <string>
#include <list>
#include <sstream>
#include <cstdio>
#include <cstdlib>
#include <new>

using namespace std;


const string c_ContentHeader = "Content-type: text/html\n\n";

    // ---- CONTENT LENGTH ----

    ///<Summary>
    /// The content length environment variable
    /// name
    ///</Summary>
    const string c_ContentLengthVariableName = "CONTENT_LENGTH";

    ///<Summary>
    /// Function to return the current requests
    /// content length
    ///</Summary>
    const int GetContentLength()
    {
        int l_result = 0;
        char* l_ContentLengthVariable = getenv(c_ContentLengthVariableName.c_str());
        if ( l_ContentLengthVariable != NULL )
        {
            l_result = atoi(l_ContentLengthVariable);
        }
        return l_result;

    }

    // ---- END CONTENT LENGTH ----


    // ---- GET CONTENT ----

    ///<Summary>
    /// Function to return the content
    ///</Summary>
    const list<string> GetContent()
    {
        list<string> l_result;

        // Now seek the content
        int l_ContentLength = GetContentLength();
        if ( l_ContentLength > 0 )
        {
            try
            {
                // Allocate a buffer for the information; a string owns and
                // frees the memory for us (auto_ptr must never hold new[])
                string l_Buffer (l_ContentLength, '\0');

                // Read the content sent into the buffer
                size_t l_bytesRead = fread (&l_Buffer[0], sizeof(char), l_Buffer.size(), stdin);

                // Check the data length
                if ( l_bytesRead == l_Buffer.size() )
                {
                    // Wrap the buffer in a stream; its length is known, so
                    // no null terminator is needed on the data read
                    stringstream l_stream (l_Buffer);

                    // Push each whitespace separated item into the result
                    string l_item;
                    while ( l_stream >> l_item )
                    {
                        l_result.push_back(l_item);
                    }
                }
            }
            catch (const bad_alloc&)
            {
                // TODO handle bad alloc
            }
        }

        return l_result;
    }

    // ---- END GET CONTENT ----


int main (int argc, char* argv[])
{
 cout << c_ContentHeader;
 cout << "<html><body>" << endl;
 cout << "The content Length is: " << GetContentLength() << "<br>" << endl;

 cout << "The Content is: <br><pre>" << endl;

 list<string> theContent = GetContent();
 for (list<string>::const_iterator itr = theContent.begin();
  itr != theContent.end();
  itr++)
 {
  cout << (*itr) << endl;
 }

 cout << "</pre><hr></body></html>" << endl;

 return 0;
}

And I've built this and named it "contentExample.cgi" in my cgi-bin.
I've also edited the HTML form to submit to this new CGI application:
<html>
<body>

<form name="TestForm" action="http://127.0.0.1:8080/cgi-bin/contentExample.cgi" method="POST">

 <input type=text name="TextField1" value="Hello World">

 <input type=submit name="submit" value="Click To Submit">

</form>

</body>
</html>

Now when I perform the post we get an output from our CGI for the form being submitted. And voila, there is our input.

Conclusion

I'm not going to go into processing the input, nor am I going to point you in the direction of libraries to do this for you. It is a much better learning experience to process all the input yourself, possibly writing your own library, so you understand the underlying CGI protocol, its environment variables and its pitfalls.
This becomes invaluable in understanding the API.
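
If you do take that route, a first step might look something like the sketch below. It assumes the form was posted with the browser's default encoding (application/x-www-form-urlencoded), so the body arrives as name=value pairs joined by "&", with spaces sent as "+" and other special bytes as %XX. UrlDecode and ParseFormBody are hypothetical names; treat this as a starting point for the exercise rather than a finished parser.

```cpp
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper: turns "+" into a space and "%XX" into the byte it encodes.
std::string UrlDecode(const std::string& p_text)
{
    std::string l_result;
    for (std::string::size_type i = 0; i < p_text.length(); ++i)
    {
        const char c = p_text[i];
        if (c == '+')
        {
            l_result += ' ';
        }
        else if (c == '%' && i + 2 < p_text.length()
                 && std::isxdigit(static_cast<unsigned char>(p_text[i + 1]))
                 && std::isxdigit(static_cast<unsigned char>(p_text[i + 2])))
        {
            // Two hex digits follow the percent sign: convert them to one byte
            const std::string l_hex = p_text.substr(i + 1, 2);
            l_result += static_cast<char>(std::strtol(l_hex.c_str(), NULL, 16));
            i += 2;
        }
        else
        {
            l_result += c; // ordinary character, or a stray '%'
        }
    }
    return l_result;
}

// Hypothetical helper: splits "a=1&b=2" into a name/value map, decoding both parts.
std::map<std::string, std::string> ParseFormBody(const std::string& p_body)
{
    std::map<std::string, std::string> l_fields;
    std::string::size_type l_start = 0;
    while (l_start < p_body.length())
    {
        std::string::size_type l_end = p_body.find('&', l_start);
        if (l_end == std::string::npos)
        {
            l_end = p_body.length();
        }

        const std::string l_pair = p_body.substr(l_start, l_end - l_start);
        if (!l_pair.empty())
        {
            const std::string::size_type l_equals = l_pair.find('=');
            const std::string l_name = UrlDecode(l_pair.substr(0, l_equals));
            const std::string l_value = (l_equals == std::string::npos)
                ? std::string()
                : UrlDecode(l_pair.substr(l_equals + 1));
            l_fields[l_name] = l_value;
        }
        l_start = l_end + 1;
    }
    return l_fields;
}
```

For the example form above, the posted body would be "TextField1=Hello+World&submit=Click+To+Submit", which this sketch splits into two fields. Repeated field names simply overwrite each other here, which a real library would need to handle.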
Here you will find a list of all the CGI environment variables you can ask for via "getenv": http://www.perlfect.com/articles/cgi_env.shtml
All the code given is completely interchangeable between windows and linux.
If you have any problems with this (very basic) introduction, drop me a line.

9 comments:

  1. Addendum: Visual C++ flags the use of getenv as deprecated and suggests _dupenv_s instead (this is Microsoft-specific; on Linux, getenv remains the call to use). So here's my function to help you get an environment variable as a std::string

    ///
    /// Get the environment variable
    ///
    string Helpers::GetEnvironmentVariable (const string p_VariableName)
    {
    // The result
    string l_result = c_UnknownVariable;

    // Get the value
    char* l_buffer;
    size_t l_length;
    errno_t l_error = _dupenv_s(&l_buffer, &l_length, p_VariableName.c_str());

    // If we had no error & there is data
    if ( !l_error && l_buffer != NULL )
    {
    // Set the result
    l_result = string(l_buffer);
    }

    // Free the original buffer
    // Note: Its fine to call free with NULL
    free (l_buffer);

    // Return the result
    return l_result;
    }

  2. How do we handle Unicode input as postdata? With my tests after URL-Decode I am seeing the value is coming as UTF-16. Is this always true or dependent on some setting?

    The ReadConsoleW gives error "Invalid handle". Can't I use this function with CGI?

    please let me know your thoughts.

    Thanks.

    Replies
    1. My programs are compiled in "multi-byte" mode, meaning when I call "ReadConsole" it actually calls down to "ReadConsoleA", and I can not therefore call into "ReadConsoleW".

      There is however a whole pantheon of other sources on using WCHAR and the standard template library wstring classes to pass Unicode in and out of places.

      Just remember to change your build target to be Unicode, not multi-byte, and check your calls to *W functions are correct.

      Quite why you're getting an error of invalid handle I can't comment, I don't know anything about your system, nor software, nor code, nor compiler. But hey ho, hope this helps.

      P.S. Sorry for the late reply, didn't spot I had a comment until just now.

    2. Just a heads up to dipti - look up std::wstring to handle your unicode needs - you can see I've updated my C++ knowledge a little with the latest book from Bjarne, I highly recommend it.

  3. This is the message from my error log, please help me!!

    [Sat Nov 01 22:07:19 2014] [error] [client 127.0.0.1] (OS 216)This version of %1 is not compatible with the version of Windows you're running. Check your computer's system information and then contact the software publisher. : couldn't spawn child process: C:/Apache Software Foundation/Apache2.2/cgi-bin/HelloWorld.cgi

    Replies
    1. %1 is obviously some application name... And it could not start your application, you may need to add a Mime type for "*.cgi" as application/octet-stream... But other than that you simply could not run your CGI... Re-read the post, and check against your code/settings, because my info here works I used it again today.

  4. How to view the same page from a remote system within the same network?

    Replies
    1. I'm not sure I follow, but you need to bind the apache to listen on an interface or IP which you have allowed through your security measures, and then visit that new URL....

      How you do that totally depends on your security measures and set up, and will be pretty unique.

      I for example at home, would make the server host on say 192.168.0.100, and make apache "listen" on 80 rather than 8080.

      I then visit my LinkSys router, go into security, into the firewall and add a pass through from the external internet to the internal IP 192.168.0.100 on port 80.

      All external requests coming into my cable connection from the outside to my linksys router, will now direct inside my house to the server on that 192.168.0.100 address.

      At work, this would be far more complex, I would need to assign the Mac address to the DHCP to assign the IP address automatically, reserving it, then tunnel the connection through both the external router, to the internal firewall, to the managed switch, to the right internal IP...

      So, in short, over to you...

  5. Where are all the new viewers to this old post coming from? Has this page been linked somewhere else?
