Introduction

Converting HTML content into structured JSON makes it easier to feed web data into backend services or applications. Aspose.Cells for .NET can load an HTML file as a workbook and export its cells as JSON, which suits developers who want to automate the extraction of tabular data from web pages.

Why Convert HTML to JSON?

  1. Data Portability: Transfer tabular HTML data into backend services or APIs as JSON.
  2. Web-to-App Integration: Extract table or structured web content for further processing in apps.
  3. Automation Ready: Ideal for automating web scraping or content extraction processes.

Step-by-Step Guide to Convert HTML to JSON

Step 1: Install Aspose.Cells via NuGet

Install Aspose.Cells for .NET from the NuGet Package Manager Console:

Install-Package Aspose.Cells

Step 2: Set Up License

Apply a metered license to enable full functionality, replacing the placeholder keys with your own:

Metered metered = new Metered();
metered.SetMeteredKey("PublicKey", "PrivateKey");

Step 3: Load HTML File

Create a new workbook by loading the HTML input:

Workbook workbook = new Workbook("Sample.html");

Step 4: Access the Last Cell

Identify the last cell in the worksheet to define export boundaries:

Cell lastCell = workbook.Worksheets[0].Cells.LastCell;

Step 5: Define Range for Export

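Create a range that spans the worksheet data. Row and Column are zero-based indexes, so adding 1 turns them into the row and column counts that CreateRange expects: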

Range range = workbook.Worksheets[0].Cells.CreateRange(0, 0, lastCell.Row + 1, lastCell.Column + 1);

Step 6: Configure JsonSaveOptions

Create a JsonSaveOptions instance. The defaults are used here; set properties on it if you need to customize the output:

JsonSaveOptions options = new JsonSaveOptions();

Step 7: Export to JSON

Serialize the defined range to JSON:

string jsonData = Aspose.Cells.Utility.JsonUtility.ExportRangeToJson(range, options);

Step 8: Save JSON to File

Write the output to disk:

System.IO.File.WriteAllText("htmltojson.json", jsonData);
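
Steps 3 through 8 can be combined into one small console program. The following is a minimal sketch rather than a definitive implementation: it reuses only the calls shown in the steps, keeps the Sample.html and htmltojson.json file names, and leaves out the licensing from Step 2. The class name HtmlToJsonExample and the null check on LastCell are additions for illustration, on the assumption that LastCell is null when the worksheet holds no data.

```csharp
using System;
using System.IO;
using Aspose.Cells;
using Aspose.Cells.Utility;

// Hypothetical class name, used only for this sketch.
class HtmlToJsonExample
{
    static void Main()
    {
        // Step 3: load the HTML file into a workbook.
        Workbook workbook = new Workbook("Sample.html");
        Cells cells = workbook.Worksheets[0].Cells;

        // Step 4: find the last cell. Assumption: LastCell is null when
        // no data was recognized, so guard before dereferencing it.
        Cell lastCell = cells.LastCell;
        if (lastCell == null)
        {
            Console.WriteLine("No tabular data found in Sample.html.");
            return;
        }

        // Step 5: Row and Column are zero-based, so +1 gives the counts.
        // The type is fully qualified because System.Range also exists
        // on recent .NET versions and would make plain "Range" ambiguous.
        Aspose.Cells.Range range =
            cells.CreateRange(0, 0, lastCell.Row + 1, lastCell.Column + 1);

        // Steps 6 and 7: export the range with default options.
        JsonSaveOptions options = new JsonSaveOptions();
        string jsonData = JsonUtility.ExportRangeToJson(range, options);

        // Step 8: write the JSON text to disk.
        File.WriteAllText("htmltojson.json", jsonData);
    }
}
```

Relative paths resolve against the working directory, so run the program from the folder that contains Sample.html, or switch to absolute paths.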

Common Issues and Fixes

1. Empty Output

  • Solution: Ensure the HTML file contains table-based structured content for valid data recognition.

2. Incorrect Range

  • Solution: Double-check that the range includes all relevant cells from the worksheet.

3. Export Formatting

  • Solution: Use JsonSaveOptions to control sheet indexing, skip empty rows, or customize hyperlinks.
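
The export-formatting fix above can be sketched as a helper that builds a configured JsonSaveOptions. The property and enum names below (SkipEmptyRows, ExportHyperlinkType, SheetIndexes, JsonExportHyperlinkType.Address) are recalled from the Aspose.Cells API reference and should be checked against the version you have installed. The class and method names are hypothetical.

```csharp
using Aspose.Cells;

// Hypothetical helper, used only for this sketch.
class JsonOptionsExample
{
    public static JsonSaveOptions BuildOptions()
    {
        JsonSaveOptions options = new JsonSaveOptions();

        // Leave rows that contain no data out of the output.
        options.SkipEmptyRows = true;

        // Export each hyperlink as its target address.
        options.ExportHyperlinkType = JsonExportHyperlinkType.Address;

        // Limit the export to the first worksheet. Assumption: this is
        // honored when saving a whole workbook and may have no effect
        // on a single-range export.
        options.SheetIndexes = new int[] { 0 };

        return options;
    }
}
```

Passing the result of BuildOptions() to ExportRangeToJson in place of the default options from Step 6 is one way to try these settings on the same range.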
