Need to load an HTML string or webpage into Excel for processing, visualization, or storage? This guide explains how to convert HTML content directly to an Excel workbook using Aspose.Cells for .NET.

Introduction

Converting HTML data to Excel can be a challenging task due to the complexity of HTML structures and the need for precise formatting in Excel. However, with Aspose.Cells for .NET, this process becomes straightforward and efficient.

This article will walk you through the steps required to convert an HTML string or file into an Excel workbook using C#. We’ll cover everything from installing the necessary package to saving the final output as a .xlsx file.

Use Cases for HTML to Excel Conversion

  • Convert email or CMS data into Excel
  • Process HTML reports or exports from third-party platforms
  • Import web tables into structured spreadsheets

Step-by-Step Guide

Step 1: Install Aspose.Cells for .NET

$ dotnet add package Aspose.Cells

Step 2: Prepare HTML as a String

Prepare your HTML content either from an external source or directly within the code.

string htmlString = "<html><body><table><tr><td>Item</td><td>Price</td></tr><tr><td>Book</td><td>20</td></tr></table></body></html>";

Step 3: Convert String to Stream

Convert the HTML string into a MemoryStream object, which is required for loading HTML content.

using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(htmlString)))
{
    // Proceed with loading the stream into an Excel workbook
}

Step 4: Load HTML Stream with HtmlLoadOptions

Use HtmlLoadOptions to load the HTML content from the MemoryStream and create a new Workbook object.

using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(htmlString)))
{
    Workbook workbook = new Workbook(htmlStream, new HtmlLoadOptions());
}

Step 5: Work With the Workbook (Optional)

Once the HTML content is loaded into a Workbook, you can manipulate it as needed. For example, you might want to add formulas or styles.

using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(htmlString)))
{
    Workbook workbook = new Workbook(htmlStream, new HtmlLoadOptions());
    Worksheet sheet = workbook.Worksheets[0];
    // Add formulas, styles, or modify data
}

Step 6: Save the Resulting Excel File

Finally, save the modified Workbook to a file on disk.

using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(htmlString)))
{
    Workbook workbook = new Workbook(htmlStream, new HtmlLoadOptions());
    // Optional: Modify the data or format
    Worksheet sheet = workbook.Worksheets[0];
    sheet.AutoFitColumns();
    workbook.Save("converted.xlsx", SaveFormat.Xlsx);
}

Complete Code Example

Here is a complete example that demonstrates how to convert an HTML string into an Excel file using Aspose.Cells for .NET.

using System;
using System.IO;
using System.Text;
using Aspose.Cells;

public class HtmlToExcelConverter
{
    public static void Main()
    {
        string html = "<html><body><table><tr><td>Name</td><td>Score</td></tr><tr><td>Alice</td><td>92</td></tr></table></body></html>";

        using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            Workbook workbook = new Workbook(htmlStream, new HtmlLoadOptions());
            Worksheet sheet = workbook.Worksheets[0];
            sheet.AutoFitColumns();
            workbook.Save("html_to_excel.xlsx", SaveFormat.Xlsx);
        }

        Console.WriteLine("HTML converted to Excel.");
    }
}

Best Practices

PracticeBenefit
Use streams for web integrationEasier to handle in APIs
Use HtmlLoadOptionsCustomize parsing or handle advanced HTML content
Auto-fit columnsImprove readability of output

Subsection: Handling Complex HTML Structures

When dealing with complex HTML structures, it’s important to use the HtmlLoadOptions class to customize how Aspose.Cells parses the HTML. This can include setting options such as ignoring certain elements or handling specific tags differently.

using System;
using System.IO;
using System.Text;
using Aspose.Cells;

public class ComplexHtmlToExcelConverter
{
    public static void Main()
    {
        string complexHtml = @"
            <html>
                <body>
                    <table border='1'>
                        <tr>
                            <th>Product</th>
                            <th>Price</th>
                            <th>Quantity</th>
                        </tr>
                        <tr>
                            <td>Laptop</td>
                            <td>$999.99</td>
                            <td>5</td>
                        </tr>
                        <tr>
                            <td>Smartphone</td>
                            <td>$699.99</td>
                            <td>10</td>
                        </tr>
                    </table>
                </body>
            </html>";

        // Create a MemoryStream from the HTML string
        using (MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(complexHtml)))
        {
            // Use HtmlLoadOptions to customize parsing if needed
            HtmlLoadOptions loadOptions = new HtmlLoadOptions();
            loadOptions.AutoFilterColumns = true;  // Enable auto-filter for columns

            // Load the HTML content into a Workbook
            Workbook workbook = new Workbook(htmlStream, loadOptions);

            // Access the first worksheet
            Worksheet sheet = workbook.Worksheets[0];

            // Auto-fit all columns to improve readability
            sheet.AutoFitColumns();

            // Save the resulting Excel file
            workbook.Save("complex_html_to_excel.xlsx", SaveFormat.Xlsx);
        }

        Console.WriteLine("Complex HTML converted to Excel.");
    }
}

Conclusion

Converting HTML content to Excel using Aspose.Cells for .NET is a powerful way to integrate web data into your Excel workflows. By following this guide, you’ll be able to handle both simple and complex HTML structures with ease.

More in this category