Unlock the value of unstructured content
With an estimated 80 percent of content living outside structured data environments, information-driven software solutions need a way to unlock unstructured files such as Word, PowerPoint and PDF, pull out the hidden and visible content inside, and incorporate that data into their systems in a variety of ways and formats.
Perceptive Document Filters fills that need with a single product you can embed into your own products and solutions. It performs a deep inspection of unstructured documents, filters and extracts all content, transforms it into usable data, and outputs the data in whichever format is required.
With 25 years of experience, Document Filters is the engine that powers Perceptive Search and some of the leading Big Data, eDiscovery, DLP, email archival, content management, business intelligence and intelligent capture products on the market — offering the most advanced and proven alternative to other OEM and open-source solutions.
- Identify, extract and transform every document, email, legacy, archive and container format you need — Word, Excel, PowerPoint, PDF, AutoCAD, ZIPs, MSGs, Visio and hundreds more
- Analyze all text and metadata in a file with deep-inspection capability that even uncovers previously hidden information, such as tracked changes, comments, notes, annotations and embedded web links
- Determine the true nature of content, ensuring that source information is accurately identified for filtering without relying on file-name extensions
- Seamlessly render, manipulate and view content in high definition (HD) without the need for additional components like ActiveX
- Easily export content for use elsewhere by converting files into text, HTML, structured XML, paginated HTML, multipage TIFFs, images (JPG, BMP, PNG), searchable PDFs and custom formats
- Replicate original files through a Layout Engine that maps out exact, pixel-by-pixel coordinates of text, images and objects (instead of relying on simple character positioning)
- Eliminate the need for a third-party image manipulation package, applying precise redaction marks, annotations, Bates stamps and watermarks to content during output
- Render files at the page level and control the size of output and other variables, making it easy to create thumbnails or convert files with or without headers and footers
- Deploy across 20 platforms including Windows, Mac OS X, Linux, Solaris, FreeBSD, HP-UX and AIX, with full support for character sets and encodings such as Unicode
- Benefit from industry-leading extraction and throughput speeds, processing content faster with greater stability
- Embed Document Filters quickly and cost-effectively into your product with our flexible APIs for C, C++, COM, .NET and Java
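
The idea of identifying a file by its content rather than its file-name extension, mentioned in the list above, can be illustrated with a generic signature check. This is a minimal sketch and not the Document Filters API or its detection method: the names `SIGNATURES`, `sniff_format` and `sniff_file` are hypothetical, and the table covers only the well-known leading bytes of a handful of formats.

```python
# Illustrative only: identify a few formats by their leading "magic" bytes
# instead of trusting the file-name extension. Not the Document Filters API.

# (leading bytes, label) pairs; labels are arbitrary names for this sketch.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # ZIP; also the container for DOCX/XLSX/PPTX
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy DOC/XLS/PPT/MSG
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"BM", "bmp"),
]


def sniff_format(data: bytes) -> str:
    """Return a label for the first signature that matches the start of data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

With a check like this, a PDF that has been renamed to `report.txt` is still reported as `pdf`, because only the bytes are inspected. A real detector needs to look much deeper than the first few bytes, for example to tell a DOCX from a plain ZIP archive.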