Extract Images from Word in C#

Introduction

Word documents often contain embedded images that need to be extracted for reuse, processing, or storage. This guide explains how to extract images from DOCX and DOC files programmatically in C# using Aspose.Words for .NET.

Why Extract Images from Word?

  • Automate bulk image extraction from multiple Word documents.
  • Retrieve high-resolution images without quality loss.
  • Save extracted images in a specific format (JPEG, PNG, BMP, etc.).

Table of Contents

  1. Setting Up Word Image Extraction in C#
  2. Step-by-Step Guide to Extracting Images
  3. Saving Extracted Images in Different Formats
  4. Batch Extract Images from Multiple Word Documents
  5. Getting a Free API License
  6. Conclusion and Additional Resources

1. Setting Up Word Image Extraction in C#

To extract images from Word documents, we use Aspose.Words for .NET. This library provides:

  • Automated image extraction from DOCX, DOC, and other formats.
  • Support for multiple image formats (PNG, JPG, BMP, etc.).
  • Efficient processing of large Word documents.

Installation

Install via NuGet:

PM> Install-Package Aspose.Words

Alternatively, download the DLL from the Aspose Downloads Page.


2. Step-by-Step Guide to Extracting Images

Follow these steps to extract images from a Word document programmatically:

  1. Load the Word file using the Document class.
  2. Retrieve all shapes containing images.
  3. Extract and save each image to a specified location.

Code Example

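using Aspose.Words;
using Aspose.Words.Drawing;

// Load the document
Document doc = new Document("input.docx");

// Get every shape in the document, including shapes nested in tables, headers, and footers
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);

// Save each embedded image in its original format
int imageIndex = 1;
foreach (Shape shape in shapes)
{
    if (shape.HasImage)
    {
        // ImageData.Save writes the stored bytes unchanged, so take the extension from the image type
        string extension = FileFormatUtil.ImageTypeToExtension(shape.ImageData.ImageType);
        string imagePath = $"Image_{imageIndex}{extension}";
        shape.ImageData.Save(imagePath);
        imageIndex++;
    }
}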

ImageData.Save writes each image's stored bytes unchanged, so extraction itself involves no re-encoding or quality loss.
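When the images are going to be processed further rather than stored, writing files may be unnecessary. The following is a minimal sketch, assuming the documented ImageData.ToByteArray method; the helper name ExtractImageBytes is hypothetical and not part of Aspose.Words.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Drawing;

// Hypothetical helper: collects the raw bytes of every image in a document.
static List<byte[]> ExtractImageBytes(Document doc)
{
    var images = new List<byte[]>();
    foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
    {
        if (shape.HasImage)
        {
            // The bytes are returned in the image's original format
            images.Add(shape.ImageData.ToByteArray());
        }
    }
    return images;
}
```

Calling ExtractImageBytes(new Document("input.docx")) would return one byte array per image, ready to pass to an imaging library or to upload to storage.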


3. Saving Extracted Images in Different Formats

Aspose.Words allows saving extracted images in various formats:

Format   Benefit
JPEG     Compressed format for web use.
PNG      Lossless format for high-quality images.
BMP      Uncompressed format for maximum detail.

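Changing the file extension passed to ImageData.Save does not convert the image: the method writes the original bytes regardless of the file name. To obtain a specific format, render the shape with ShapeRenderer and an ImageSaveOptions object set to the target SaveFormat, or re-encode the extracted file with an imaging library.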
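One way to produce a chosen format is to render each shape instead of copying its stored bytes. The sketch below assumes the ShapeBase.GetShapeRenderer, ImageSaveOptions, and FileFormatUtil.SaveFormatToExtension members of Aspose.Words; the helper name RenderImagesAs and the filePrefix parameter are hypothetical. Rendering draws the shape as it appears in the document, at the resolution configured on ImageSaveOptions, so the pixel dimensions may differ from those of the embedded original.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing;
using Aspose.Words.Saving;

// Hypothetical helper: renders every image-bearing shape to the requested
// raster format (for example SaveFormat.Png or SaveFormat.Jpeg).
// Returns the number of files written.
static int RenderImagesAs(Document doc, SaveFormat format, string filePrefix)
{
    var options = new ImageSaveOptions(format);
    string extension = FileFormatUtil.SaveFormatToExtension(format);

    int count = 0;
    foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
    {
        if (!shape.HasImage)
        {
            continue;
        }

        count++;
        // Re-encodes the rendered shape in the target format
        shape.GetShapeRenderer().Save($"{filePrefix}{count}{extension}", options);
    }
    return count;
}
```

For example, RenderImagesAs(new Document("input.docx"), SaveFormat.Png, "Image_") would write one PNG file per image. If the original pixels matter more than the format, saving with ImageData.Save and converting afterwards is the safer route.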


4. Batch Extract Images from Multiple Word Documents

To extract images from multiple Word files, loop through a folder:

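using System.IO;
using Aspose.Words;
using Aspose.Words.Drawing;

string[] files = Directory.GetFiles("input_docs", "*.docx");
foreach (string file in files)
{
    Document doc = new Document(file);
    NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);

    // Restart the numbering for each document
    int index = 1;
    foreach (Shape shape in shapes)
    {
        if (shape.HasImage)
        {
            // Use the image's real type for the extension; Save does not convert formats
            string extension = FileFormatUtil.ImageTypeToExtension(shape.ImageData.ImageType);
            string imagePath = $"Extracted_{Path.GetFileNameWithoutExtension(file)}_{index}{extension}";
            shape.ImageData.Save(imagePath);
            index++;
        }
    }
}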

Each file name includes the source document's name, so images from different documents do not overwrite one another.
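The loop above picks up only .docx files and writes everything to the working directory. As a rough sketch of one way to extend it, the two helpers below use only System.IO and LINQ to find both DOC and DOCX files and to build a separate output folder per document. The names FindWordFiles and BuildImagePath, and the folder layout, are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: returns the .doc and .docx files in a folder,
// matching the extension case-insensitively.
static List<string> FindWordFiles(string folder)
{
    return Directory.EnumerateFiles(folder)
        .Where(path =>
        {
            string ext = Path.GetExtension(path);
            return ext.Equals(".doc", StringComparison.OrdinalIgnoreCase)
                || ext.Equals(".docx", StringComparison.OrdinalIgnoreCase);
        })
        .OrderBy(path => path, StringComparer.OrdinalIgnoreCase)
        .ToList();
}

// Hypothetical helper: builds "<outputDir>/<document file name>/Image_<index><extension>"
// and creates the folder if it does not exist yet. The full file name is used
// for the folder so that report.doc and report.docx do not collide.
static string BuildImagePath(string outputDir, string documentPath, int index, string extension)
{
    string folder = Path.Combine(outputDir, Path.GetFileName(documentPath));
    Directory.CreateDirectory(folder);
    return Path.Combine(folder, $"Image_{index}{extension}");
}
```

In the batch loop, FindWordFiles("input_docs") would replace the Directory.GetFiles call, and BuildImagePath would supply the path passed to ImageData.Save.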


5. Getting a Free API License

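Without a license, Aspose.Words runs in evaluation mode with limitations. To unlock full features, request a free temporary license and apply it before loading any documents.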
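A license is typically applied once at application startup through the License class. The following is a minimal sketch; the helper name TryApplyLicense and the file name Aspose.Words.lic are placeholders, and the check for a missing file is an illustrative choice rather than something the library requires.

```csharp
using System;
using System.IO;
using Aspose.Words;

// Hypothetical helper: applies the license file if it exists.
// Returns false when the file is missing, in which case the
// library keeps running in evaluation mode.
static bool TryApplyLicense(string licensePath)
{
    if (!File.Exists(licensePath))
    {
        return false;
    }

    var license = new License();
    license.SetLicense(licensePath);
    return true;
}
```

Calling TryApplyLicense("Aspose.Words.lic") before the first Document is created lets the rest of the code run unchanged whether or not a license is available.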

For documentation, visit the Aspose.Words Guide, or ask questions on the Aspose forum.


6. Conclusion and Additional Resources

Summary

This guide covered:

  • How to extract images from Word documents in C#.
  • Saving images in different formats (JPEG, PNG, BMP).
  • Batch processing multiple Word files.


With Aspose.Words for .NET, you can efficiently extract and manage images in Word documents. Start automating Word image extraction today!