Converting HTML Emails to PDF in C# .NET

A customer asked us to turn the emails they received into PDFs so that each one could be stored in the system alongside the work item created from it. After some research I found a few components to help with that. The final solution uses a mix of HtmlAgilityPack, HtmlRenderer, and PDFSharp, all of which are free, open-source libraries. HtmlRenderer was the key to the solution: it takes HTML and converts it to PDF, including its CSS styling (HtmlRenderer targets HTML 4.01 and CSS level 2, so newer CSS features may not render). HtmlAgilityPack is used to manipulate and traverse the HTML, while PDFSharp is used by HtmlRenderer to create the PDF documents.

We’ll start by loading an email. For this example I’m just going to populate a System.Net.Mail.MailMessage object with some values, but in an actual solution you would probably read from something like Microsoft Exchange or another email provider service. The actual email looks something like this:

Here’s the code to load it:

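using System.IO;
using System.Net.Mail;
public static MailMessage LoadEmail(string file)
{
    var html = File.ReadAllText(file); // e.g. "exchange_email.html"
    // MailMessage(from, to) throws on an empty "to", so set From directly instead
    return new MailMessage
    {
        From = new MailAddress("joe.customer@xyzcorp.xom"),
        Body = html,
        IsBodyHtml = true,
        Subject = "New Order"
    };
}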

Once we retrieve the email, in addition to storing its body we’ll also want the subject, sender, recipients, and any other relevant information. We’ll create an HTML document with HtmlAgilityPack to hold all of it.

We’ll start with a skeleton HTML document structure loaded from an HTML string, as this is far easier than creating each node of the document by hand.
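A minimal sketch of the skeleton step, assuming the HtmlAgilityPack NuGet package. The post does not show its exact skeleton string, so the markup and the EmailHtmlSkeleton class name below are hypothetical.

```csharp
using HtmlAgilityPack;

public static class EmailHtmlSkeleton
{
    // Hypothetical skeleton: the post does not show the exact markup it used.
    private const string SkeletonHtml =
        "<!DOCTYPE html><html><head><meta charset=\"utf-8\" /></head><body></body></html>";

    // Parses the skeleton string into an HtmlDocument whose nodes can then be edited.
    public static HtmlDocument Create()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SkeletonHtml);
        return doc;
    }
}
```

Parsing one string gives us the html, head, and body nodes in a single call, instead of creating and wiring each node with CreateElement.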

Once that is created, we’ll grab references to the head and body nodes so we can add our content. I use an AppendField helper method, which just creates a small HTML structure containing a label and its text, like so:
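The original AppendField code isn’t shown, so here is a hypothetical version, assuming HtmlAgilityPack. The `<div><b>Label: </b>text</div>` layout, the AppendHeaderFields method, and the EmailFields class name are illustrative choices rather than the author’s code. The values are HTML-encoded with System.Net.WebUtility because HtmlAgilityPack writes text nodes out verbatim.

```csharp
using System.Net;
using System.Net.Mail;
using HtmlAgilityPack;

public static class EmailFields
{
    // Hypothetical AppendField: appends <div><b>Label: </b>text</div> to parent.
    public static void AppendField(HtmlNode parent, string label, string text)
    {
        var doc = parent.OwnerDocument;
        var field = doc.CreateElement("div");
        var labelNode = doc.CreateElement("b");
        // Text nodes are emitted as-is, so encode values to keep "<" or "&" from breaking the markup.
        labelNode.AppendChild(doc.CreateTextNode(WebUtility.HtmlEncode(label + ": ")));
        field.AppendChild(labelNode);
        field.AppendChild(doc.CreateTextNode(WebUtility.HtmlEncode(text ?? string.Empty)));
        parent.AppendChild(field);
    }

    // Grabs the <body> node and appends the header fields we want in the PDF.
    public static void AppendHeaderFields(HtmlDocument doc, MailMessage msg)
    {
        var body = doc.DocumentNode.SelectSingleNode("//body");
        AppendField(body, "From", msg.From?.ToString());
        AppendField(body, "To", msg.To.ToString());
        AppendField(body, "Subject", msg.Subject);
    }
}
```

Other fields (CC, a received date taken from your mail source, and so on) can be added with further AppendField calls.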

Next we’ll load the email’s body HTML (or possibly plain text) into its own document to prepare for copying it into the document we’re creating. We’ll also add a Body label to mark where the contents of the email body go in the document.
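A sketch of this step, assuming HtmlAgilityPack. The post doesn’t say how plain-text bodies are handled; HTML-encoding them and wrapping them in `<pre>` so line breaks survive is an assumption, as are the EmailBody class and its method names.

```csharp
using System.Net;
using System.Net.Mail;
using HtmlAgilityPack;

public static class EmailBody
{
    // Loads the email body into its own HtmlDocument. Plain-text bodies are
    // HTML-encoded and wrapped in <pre> (an assumption) so line breaks survive.
    public static HtmlDocument LoadBodyDocument(MailMessage msg)
    {
        var body = msg.Body ?? string.Empty;
        var html = msg.IsBodyHtml
            ? body
            : "<html><body><pre>" + WebUtility.HtmlEncode(body) + "</pre></body></html>";
        var bodyDoc = new HtmlDocument();
        bodyDoc.LoadHtml(html);
        return bodyDoc;
    }

    // Adds a "Body:" label to the target document, above where the email content will go.
    public static void AppendBodyLabel(HtmlDocument target)
    {
        var targetBody = target.DocumentNode.SelectSingleNode("//body");
        var label = target.CreateElement("b");
        label.AppendChild(target.CreateTextNode("Body:"));
        targetBody.AppendChild(label);
    }
}
```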

To make the document look as much like the original email as possible, we copy the styles from the head tag of the email body into the head tag of our document.
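One way to copy the styles, sketched with HtmlAgilityPack. This version copies only inline `<style>` elements from the email’s head; linked stylesheets are left out, and the EmailStyles class name is hypothetical.

```csharp
using HtmlAgilityPack;

public static class EmailStyles
{
    // Copies every <style> element from the email's <head> into the target document's <head>.
    public static void CopyStyles(HtmlDocument emailDoc, HtmlDocument target)
    {
        var targetHead = target.DocumentNode.SelectSingleNode("//head");
        // By default, HtmlAgilityPack's SelectNodes returns null (not an empty list) when nothing matches.
        var styles = emailDoc.DocumentNode.SelectNodes("//head/style");
        if (targetHead == null || styles == null)
            return;
        foreach (var style in styles)
            targetHead.AppendChild(style.CloneNode(true)); // deep clone keeps the CSS text
    }
}
```

Cloning, rather than moving the nodes, leaves the email’s own document untouched in case it is needed again.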

Once that’s done, we create a div tag to hold the email’s HTML body and copy the body into it.
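A sketch of the copy, assuming HtmlAgilityPack. The `email-body` class attribute, the fallback for emails that have no `<body>` element, and the EmailBodyCopier class name are illustrative assumptions, not details from the post.

```csharp
using HtmlAgilityPack;

public static class EmailBodyCopier
{
    // Wraps the email's body content in a <div> and appends it to the target <body>.
    public static void CopyBody(HtmlDocument emailDoc, HtmlDocument target)
    {
        var targetBody = target.DocumentNode.SelectSingleNode("//body");
        // Some emails are HTML fragments with no <body>; fall back to the whole document then.
        var emailBody = emailDoc.DocumentNode.SelectSingleNode("//body") ?? emailDoc.DocumentNode;
        var container = target.CreateElement("div");
        container.SetAttributeValue("class", "email-body"); // hypothetical class name
        container.InnerHtml = emailBody.InnerHtml; // re-parses the markup into new child nodes
        targetBody.AppendChild(container);
    }
}
```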

With the email loaded and an HTML document generated from it, we save the HtmlDocument to a string to prepare for converting it to PDF. We then pass the HTML string to HtmlRenderer’s PdfGenerator, and the resulting document is saved as a PDF.
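A minimal sketch of the final step, assuming the HtmlRenderer.PdfSharp NuGet package at version 1.5 or later (where the namespace is `TheArtOfDev.HtmlRenderer.PdfSharp`) together with its PDFSharp dependency. The A4 page size, the EmailPdfWriter class name, and the output path are assumptions.

```csharp
using System.IO;
using HtmlAgilityPack;
using PdfSharp;
using TheArtOfDev.HtmlRenderer.PdfSharp;

public static class EmailPdfWriter
{
    // Serializes the HtmlDocument to a string, renders it with HtmlRenderer, and saves the PDF.
    public static void SavePdf(HtmlDocument doc, string pdfPath)
    {
        string html;
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            html = writer.ToString();
        }

        // GeneratePdf lays the HTML out across as many A4 pages as needed (default margin).
        var pdf = PdfGenerator.GeneratePdf(html, PageSize.A4);
        pdf.Save(pdfPath);
    }
}
```

If an email references external stylesheets or images, GeneratePdf also accepts optional stylesheet-load and image-load event handlers for resolving them.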

The resulting PDF file looks like this: