SharpTools: HTTP, GET, POST, uploading files and cookie/session authentication in C#
SharpTools is a general tool library for C#. This release simplifies using HTTP GET, POST, uploading files and cookie persistence when writing .NET applications in C#.
Downloads for the software described here are available on the downloads page.
IMPORTANT REMINDER: You must provide a credit to myself when re-using this code in your own projects or applications, or when posting it on other sites. I have seen a number of modified clones of the source code for this project – in particular MetaWrap, FishEye @ MahApps / MahTweets and most inappropriately S. Ali Tokmen @ Google Code who took the entire codebase and simply re-marked it as his own work. You are welcome to re-use the code but you must provide a credit in the source code and a link back to this site on your web page (if re-publishing the code). Thank you.
Visual Basic Version
Espend.de have re-written SharpTools in Visual Basic, thanks for your efforts! You can find the VB version here:
Introduction
Blimey. You’d think in the 21st century retrieving web pages and POSTing forms and file uploads in the .NET framework would be a quick and painless exercise. But apparently, this isn’t the case, so here’s a quick and dirty class library to do it for you.
(This code is part of a general tool library I developed called SharpTools, with lots of other features, but those haven’t been released yet)
My basic problem was:
- Login to a web site by POSTing a username and password to a form
- Upload some files via POST to a second form after logging in
First I’ll take you through how Microsoft wants you to do it, then I’ll explain how my class library works.
How HTTP requests work in .NET: A Crash Course
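Include the namespace System.Net in your code to access the request and response classes below. The stream and encoding classes used alongside them live in System.IO and System.Text.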
First you create an HttpWebRequest object with the URL you want like this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myUrl);
Then set the method, which for our purposes will either be GET or POST:
request.Method = "GET";
Submitting forms usually calls for using POST, fetching pages (with or without query parameters) usually calls for using GET.
If you’re POSTing a form (without uploading files) you need to set the Content-Type HTTP header to let the web server know you’re sending POST data in the request body, like this:
request.ContentType = "application/x-www-form-urlencoded";
For submitting forms via POST, throw the request body into a string, which takes the same format as a query string in a GET request, then encode it into a byte array:
    string args = @"param1=value1&param2=value2&param3=value3";
    byte[] dataToSend = Encoding.ASCII.GetBytes(args);
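One thing the string above glosses over is that names and values containing spaces, ampersands or equals signs have to be escaped before they go into the request body. Here is a minimal sketch of one way to do that; the class and method names (FormEncoding, BuildFormBody, ToRequestBytes) are made up for illustration and are not part of SharpTools.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of SharpTools.
public static class FormEncoding
{
    // Builds "name1=value1&name2=value2" with every name and value percent-encoded.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value));
        }
        return sb.ToString();
    }

    // After escaping, the body only contains ASCII characters.
    public static byte[] ToRequestBytes(string body)
    {
        return Encoding.ASCII.GetBytes(body);
    }
}
```

Uri.EscapeDataString encodes a space as %20 rather than the + that browsers traditionally send in form posts. Servers should decode both, but that is an assumption worth checking against the site you are talking to.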
Sending the request is done automatically when you ask your request object to return a response. If you’re sending a GET request, there is no request body so you can skip the following step. If you’re sending a POST request, you need to transmit the request body as follows:
    request.ContentLength = dataToSend.Length;
    Stream st = request.GetRequestStream();
    st.Write(dataToSend, 0, dataToSend.Length);
    st.Close();
Note this will throw a WebException if there’s a problem.
Now you can retrieve the response body as follows:
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader sr = new StreamReader(response.GetResponseStream());
    string responseBodyText = sr.ReadToEnd();
    sr.Close();
At this point, if all has gone well, responseBodyText will now contain the web page or response body of the request. Again, you’ll get a WebException if there’s a problem.
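Putting the steps so far together, a form POST might be wrapped up roughly like this. It is only a sketch: the SimpleHttp class and its method names are invented for illustration, and using blocks replace the explicit Close() calls so the streams are released even when an exception is thrown.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical wrapper, not part of SharpTools.
public static class SimpleHttp
{
    // Creates and configures the request. Nothing is sent at this point.
    public static HttpWebRequest CreateFormPost(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        return request;
    }

    // Reads a stream to the end as text and disposes it afterwards.
    public static string ReadAll(Stream stream)
    {
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    // Sends an already-encoded form body and returns the response body.
    // Throws WebException if the request fails.
    public static string PostForm(string url, string encodedBody)
    {
        byte[] data = Encoding.ASCII.GetBytes(encodedBody);
        HttpWebRequest request = CreateFormPost(url);
        request.ContentLength = data.Length;

        using (Stream body = request.GetRequestStream())
        {
            body.Write(data, 0, data.Length);
        }

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            return ReadAll(response.GetResponseStream());
        }
    }
}
```

A call would then look something like SimpleHttp.PostForm("http://yoursite.com/login.php", "username=katy&password=secret"), with the URL and field names being placeholders.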
Getting the response code
Response codes like 200 (successful request or “OK”), 302 (redirection to another page or “Found”), 404 (page not found or “Not Found”) indicate what happened when you made the request. You can access the response code by querying the property response.StatusCode.
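One catch is that for error codes such as 404 or 500, GetResponse() throws a WebException instead of returning a response, and the response (with its status code) is attached to the exception's Response property. A small sketch of how you might dig it out follows; StatusHelper and its methods are hypothetical names of my own choosing.

```csharp
using System.Net;

// Hypothetical helper, not part of SharpTools.
public static class StatusHelper
{
    // Returns the HTTP status carried by a WebException, or null when the
    // failure happened before any HTTP response arrived (DNS failure, timeout...).
    public static HttpStatusCode? GetStatus(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        if (response == null)
        {
            return null;
        }
        return response.StatusCode;
    }

    // True for any 2xx status code.
    public static bool IsSuccess(HttpStatusCode code)
    {
        int n = (int)code;
        return n >= 200 && n < 300;
    }
}
```

Inside a catch (WebException ex) block you could then call StatusHelper.GetStatus(ex) and decide whether the failure was an HTTP error or a connection problem.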
Getting response headers
If you’re looking for a particular header, for example the Location header of a 302 redirection, you can do so like this:
string locationHeader = response.GetResponseHeader("Location");
and so on for any other headers you’d like to access.
Preventing auto-redirection
By default, .NET will follow 302 redirections automatically. Sometimes you might not want this, for example if you POST to a login form whereby you’ll be redirected to different pages depending on whether or not your login was successful. You might want to find out which page you’re being sent to first, in which case you can disable auto-redirection like this:
request.AllowAutoRedirect = false;
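With auto-redirection disabled, you have to check the status code yourself and work out where the Location header points. The header may contain a relative URL, so it needs resolving against the URL you requested. The following is a sketch under that assumption; RedirectHelper is a made-up name.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of SharpTools.
public static class RedirectHelper
{
    // True for the status codes that normally carry a Location header.
    public static bool IsRedirect(HttpStatusCode code)
    {
        return code == HttpStatusCode.MovedPermanently    // 301
            || code == HttpStatusCode.Found               // 302
            || code == HttpStatusCode.SeeOther            // 303
            || code == HttpStatusCode.TemporaryRedirect;  // 307
    }

    // Resolves a Location header value (absolute or relative)
    // against the URL that was originally requested.
    public static Uri ResolveLocation(Uri requestUrl, string location)
    {
        return new Uri(requestUrl, location);
    }
}
```

For a login form you could compare the resolved URL against the page you expect after a successful login, and treat anything else as a failed attempt.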
Adding custom headers
Sometimes you’ll want to add your own headers, for example to prevent caching. Here’s how you can do that:
request.Headers.Add("Pragma", "no-cache");
where the first argument is the header name and the second argument is the header value.
HTTP 417 Expectation failed
Whoever wrote the HTTP library code at Microsoft had the genius (ahem) idea of not POSTing form data directly with the request, but instead telling the web server that a request body is coming (with an Expect: 100-continue header), waiting for a 100 Continue status back, and only then sending the actual data. When a server or proxy doesn’t support this, it replies with 417 Expectation Failed. This breaks Apache, which is used by the majority of web sites. To fix this, you need to add this code before you make the request:
request.ServicePoint.Expect100Continue = false;
This property was added in .NET 2.0.
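If you would rather not set the flag on every request, the static ServicePointManager class has a property of the same name that acts as the default for connections created afterwards. A minimal sketch, with HttpDefaults being a hypothetical class name:

```csharp
using System.Net;

// Hypothetical start-up helper, not part of SharpTools.
public static class HttpDefaults
{
    // Call once at application start-up, before any requests are created.
    // Only ServicePoint objects created after this call pick up the new default.
    public static void DisableExpect100Continue()
    {
        ServicePointManager.Expect100Continue = false;
    }
}
```

Because this changes a process-wide default, it also affects any other code in the same application that makes HTTP requests, which may or may not be what you want.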
How to handle cookie/session-based authentication
In general, when you login to a web site via a login form, upon successful login the browser will be sent a cookie containing a unique login ID, which is then sent back to the site on all subsequent page requests to identify yourself as the logged in user. If you don’t send the cookie back, you’ll be treated as an anonymous (non-logged in) guest.
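After a successful login therefore, you need to store the cookies you receive back from the site. Note that response.Cookies is only filled in if you assigned a CookieContainer to the request before sending it. The cookies can then be read with the following code: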
CookieCollection cookies = response.Cookies;
Then you need to supply these cookies back to the web server in all subsequent requests like this:
    request.CookieContainer = new CookieContainer();
    foreach (Cookie c in cookies)
        request.CookieContainer.Add(c);
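An alternative to copying cookies by hand is to keep a single CookieContainer and assign it to every request before sending. The container then collects cookies from each response and supplies the matching ones on later requests. This is a sketch of that idea only; CookieSession is an invented name and is not how HTTPWorker is necessarily implemented.

```csharp
using System;
using System.Net;

// Hypothetical session wrapper, not part of SharpTools.
public class CookieSession
{
    private readonly CookieContainer cookies = new CookieContainer();

    // The shared cookie store used by every request in this session.
    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    // Creates a request with the shared container attached before sending,
    // so received cookies are stored and sent back automatically.
    public HttpWebRequest CreateRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }
}
```

You would create one CookieSession, use CreateRequest for the login POST, and then use it again for the upload request so the login cookie travels with it.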
How do I upload files?
This is where it gets rather tricky. First of all, POSTed data which includes files has to be sent with a Content-Type of multipart/form-data, together with a so-called boundary parameter which separates each argument and file supplied in the request body into separate parts. After each part, the boundary parameter is given to signal the end of the part. The boundary parameter is also used at the start and (slightly modified) end of the entire request body.
Unlike with a regular POST request without files, each and every non-file argument must be supplied in its own part, not as a GET-style encoded string.
An example will illustrate the principle. Let’s suppose our boundary parameter is “MyBoundary” (a very bad choice; the boundary should be something that won’t appear in any of the arguments or file content). Let’s also suppose we are sending a description of the file in a separate argument called desc and a title for it in title. The file argument itself will be called file.
First you’ll set the Content-Type as follows:
request.ContentType = "multipart/form-data; boundary=MyBoundary";
Now you will need to manually create the request body, which will look exactly like this:
    --MyBoundary
    Content-Disposition: form-data; name="title"

    Katy in Oslo
    --MyBoundary
    Content-Disposition: form-data; name="desc"

    This is a picture of me in Oslo last summer.
    --MyBoundary
    Content-Disposition: form-data; name="file"; filename="C:\Documents and Settings\Katy\Desktop\SomeFile.jpg"
    Content-Type: application/octet-stream

    [contents of SomeFile.jpg]
    --MyBoundary--
The request body format is very strict and will fail unless you format it exactly as above. Specifically:
- Each boundary must start on a new line and be preceded by two hyphens (--).
- There must be a Content-Disposition header for each argument.
- There must be a blank line between the headers for each part and the body.
- File arguments must have a Content-Type header, and the Content-Disposition header must include a filename parameter, otherwise the web server will consider the file argument to have been left blank.
- At the end of the request body, you must append two hyphens to the end of the last boundary.
Now you just encode the request body string and send it in the same manner as above for POSTing without files.
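To make the rules above concrete, here is a rough sketch of building such a body in memory for some text arguments plus one file. It is not the MIMEPayload class from the library; MultipartSketch and its method names are invented, the boundary format is an arbitrary choice, and it assumes field names and file names contain no quotes or line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical illustration, not the library's MIMEPayload class.
public static class MultipartSketch
{
    // A random boundary that is very unlikely to appear in the data.
    public static string NewBoundary()
    {
        return "----SketchBoundary" + Guid.NewGuid().ToString("N");
    }

    // Builds the complete request body as bytes. Lines end with CRLF.
    public static byte[] Build(string boundary,
        IEnumerable<KeyValuePair<string, string>> fields,
        string fileField, string fileName, byte[] fileBytes)
    {
        using (MemoryStream body = new MemoryStream())
        {
            // One part per non-file argument.
            foreach (var field in fields)
            {
                WriteText(body, "--" + boundary + "\r\n");
                WriteText(body, "Content-Disposition: form-data; name=\"" + field.Key + "\"\r\n\r\n");
                WriteText(body, field.Value + "\r\n");
            }

            // The file part: headers, blank line, then the raw bytes.
            WriteText(body, "--" + boundary + "\r\n");
            WriteText(body, "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + fileName + "\"\r\n");
            WriteText(body, "Content-Type: application/octet-stream\r\n\r\n");
            body.Write(fileBytes, 0, fileBytes.Length);

            // Closing boundary with the two trailing hyphens.
            WriteText(body, "\r\n--" + boundary + "--\r\n");
            return body.ToArray();
        }
    }

    private static void WriteText(Stream stream, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

The file bytes are written directly rather than being appended to a string, since binary data such as a JPEG would be corrupted by a text encoding. The same boundary string has to be used in the Content-Type header of the request.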
Summary
What a hassle eh? In summary:
- You have to create a new request object and start from scratch for every request.
- You have to encode the arguments or request body for every request.
- You have to remember to store and resend cookies before and after every request if you need to stay logged in.
- You have to remember to set the Expect100Continue property for every request.
- There is no standardised way of creating all three kinds of request; you have to manually produce the request body yourself depending on the type of request you want to make.
- You can’t just send the request as text or get the response as text, you have to get the request and response streams and write/read from them, then close them afterwards, as appropriate.
- There’s a lot of error checking for exceptions you have to do which I haven’t included above at all.
Pretty weak. Let’s see if we can make it any easier.
My HTTP Library
I’ve created two classes: HTTPWorker to make requests simpler, and MIMEPayload to automatically create the request bodies when POSTing with files. You don’t need to use MIMEPayload directly in your code though; HTTPWorker will create one automatically if you want to send files.
I won’t document all the properties and methods here, because the source code includes XML documentation which you can compile if you like. There are also examples of logging in and POSTing files included, but I will illustrate how it works here.
Things to note about my class:
- You can re-use the object for many requests. In each case, you must set the Url property first (this sets up the Expect100Continue and AllowAutoRedirect properties of the HttpWebRequest object used internally), then the Type property, which determines how the request body will be created.
- Cookies will be auto-persisted unless you turn this off by setting PersistCookies to false, so logging in is now a fire-and-forget experience.
- POST and POST-with-files requests are now handled the same way. You can add arguments with AddValue and files with AddFile. The request will be created according to the Type property.
- You can fetch the request and response objects at any time with the RequestObject and ResponseObject properties if you need to query or set additional headers, change the auto-redirect flag etc.
- You don’t need to mess with streams. You can get the response text from the ResponseText property. The first time you use this after a request, it will open the response stream, read the response and cache it. On subsequent uses, it will just retrieve the response text from the cache.
Setting up
Do this once before any series of requests in your application. The object can persist for the lifetime of the application if you so wish.
    HTTPWorker http = new HTTPWorker();
    HttpWebResponse rsp = null;
GET a web page
    http.Url = "http://yoursite.com/pageToFetch.html";
    http.Type = HTTPRequestType.Get;
    http.RequestObject.AllowAutoRedirect = false; // if required

    try
    {
        rsp = http.SendRequest();
    }
    catch (WebException ex)
    {
        Console.WriteLine(ex.Message);
        return;
    }

    // ResponseText is a property of HTTPWorker, not of the response object
    string webPage = http.ResponseText;
POSTing a form
    http.Url = "http://yoursite.com/login.php";
    http.Type = HTTPRequestType.Post;
    http.AddValue("username", username);
    http.AddValue("password", password);

    try
    {
        rsp = http.SendRequest();
    }
    catch (WebException ex)
    {
        Console.WriteLine(ex.Message);
        return;
    }

    // You can now check for the response code with rsp.StatusCode to see what happened
If you posted a login form and it was successful, the cookie is now stored and will be used automatically on your next request.
POSTing a form with file uploads
This uses the example I gave above and creates the exact request body shown above, just with a different boundary (which is set automatically).
    http.Url = "http://yoursite.com/uploadpage.php";
    http.Type = HTTPRequestType.MultipartPost;
    http.RequestObject.KeepAlive = true;
    http.RequestObject.Headers.Add("Pragma", "no-cache");
    http.AddValue("title", "Katy in Oslo");
    http.AddValue("desc", "This is a picture of me in Oslo last summer.");
    http.AddFile("file", @"C:\Documents and Settings\Katy\Desktop\SomeFile.jpg");

    try
    {
        rsp = http.SendRequest();
    }
    catch (WebException ex)
    {
        Console.WriteLine(ex.Message);
        return;
    }
Checking if the request was successful
In most cases you’ll probably need to parse the page to check that what you intended to do was actually successful. You can do this pretty easily with regular expressions in the System.Text.RegularExpressions namespace, or with basic status code and string functions like this:
    if (rsp == null)
    {
        // The web server didn't return anything
    }
    else if (rsp.StatusCode != HttpStatusCode.OK)
    {
        // There was an error
    }
    else if (http.ResponseText.Contains("Login failed!"))
    {
        // The text 'Login failed!' was found on the form
        errorText = Regex.Replace(http.ResponseText,
            "^.*<h2>Login failed!</h2>.*<p>(?<error>.*)</p>.*$",
            "${error}", RegexOptions.Singleline);
    }
The principle of this is fairly simple. The regex parses the entire web page as a single line (meaning that .* can span across newlines), looking for the error message, and captures it into a group called error. The entire web page content is then replaced with just the error message, which is stored in errorText. To make sure the entire web page is replaced, we must parse it all, which is done by ensuring ^.* appears at the beginning and .*$ appears at the end. These symbols match the start of the page plus any number of characters, and any number of characters plus the end of the page respectively.
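One thing to watch with the pattern above is that .* is greedy, so if several paragraphs follow the heading, the error group ends up holding the last one on the page. If you want the first paragraph after the heading instead, a variation using lazy quantifiers and Regex.Match is sketched below. The LoginErrorParser name and the sample markup are assumptions for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of SharpTools.
public static class LoginErrorParser
{
    // Lazy quantifiers (.*?) stop at the first <p>...</p> after the heading.
    private static readonly Regex ErrorPattern = new Regex(
        "<h2>Login failed!</h2>.*?<p>(?<error>.*?)</p>",
        RegexOptions.Singleline);

    // Returns the text of the first paragraph after the "Login failed!"
    // heading, or null if the heading is not on the page.
    public static string ExtractError(string html)
    {
        Match match = ErrorPattern.Match(html);
        return match.Success ? match.Groups["error"].Value : null;
    }
}
```

Using Match rather than Replace also gives you a clear signal (null here) when the heading is missing, instead of handing back the whole page unchanged.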
Final words
I’ve now used this library in a couple of projects and it’s made life a lot simpler. I hope you find it useful too!
Thank you for your code. I tried it with a page, but it does not work until I set AllowAutoRedirect to true. Unfortunately requesting another page via get-request fails anyhow.
Do you have an idea why a) I need to set AllowAutoRedirect and b) why the get-request fails?
It might depend on the web server you are contacting. If a page request redirects (with a 302 status code), you will just get a header back with the target URL, so it also depends how you define ‘but it does not work’ 🙂
If you post a source code example we might be able to help you better 🙂
Katy.