Jigsaw Development Team

Jigsaw Analytic Platform - Document Engine


Jigsaw Security's new document engine can extract and process millions of documents per day. This capability is being used to perform forensics in investigations, to make cloud computing platforms more flexible by processing large volumes of documents for investigative journalists, and to find correlations between documents regardless of format.

Document Formats

This new capability can process over 190 different document types and make sense of data stored in Google Drive, Microsoft Azure or AWS document stores. In addition, the software works natively on the HDFS file system (Hadoop), on Windows and Linux workstations and servers, and with other cloud storage such as Dropbox.

Regardless of file format, Jigsaw Document Engine can parse the content and make it available to Hadoop or Elasticsearch. From Word documents, PDFs, video, images and WAV files to many other formats, we process files so you can find what you're looking for.

Use in Security Applications

This new document engine can read millions of documents per day, allowing the operator of the system to find links in data easily. At the heart of the system, OCR (optical character recognition) is used to create reference tables that are then searched for items of interest. Alerting on keywords lets users of the system quickly focus on areas of interest. In security applications, this technology allows homeland security or private security operators to track information on adversaries by automatically looking for references to the items being tracked. When a topic of interest is seen after processing, an email, SMS or in-application alert tells the operator that new information is available. This makes finding information related to terrorism quick and easy.
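The keyword-alerting step can be illustrated with a minimal Python sketch. It assumes the OCR or text-extraction stage has already produced plain text for each document. The `Alert` record and `find_keyword_alerts` function are hypothetical names chosen for illustration, not part of the Jigsaw engine, and delivering the alert by email, SMS or in-application notification is left out.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    """One keyword hit in one document (hypothetical record type)."""
    doc_id: str
    keyword: str
    snippet: str


def find_keyword_alerts(doc_id: str, text: str, keywords: List[str],
                        context: int = 30) -> List[Alert]:
    """Return an Alert for every whole-word, case-insensitive keyword match.

    `text` is assumed to be the output of an earlier OCR/extraction step.
    Word-boundary matching suits plain alphanumeric keywords; terms that
    start or end with punctuation would need a different pattern.
    """
    alerts: List[Alert] = []
    for keyword in keywords:
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            # Keep a little surrounding text so the operator sees the context.
            start = max(0, match.start() - context)
            end = min(len(text), match.end() + context)
            alerts.append(Alert(doc_id, keyword, text[start:end]))
    return alerts


if __name__ == "__main__":
    sample = "Shipment of precursor chemicals noted near the port."
    for alert in find_keyword_alerts("doc-001", sample, ["precursor", "port"]):
        print(alert.keyword, "->", alert.snippet)
```

Whole-word matching is a deliberate choice here: a tracked term such as "port" should not fire on "passport".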

A Sample Connection

To test this new capability, Jigsaw Security ingested all the documents published on data.gov. Our processing engine looks for information of interest using keywords or pattern recognition. As new documents are posted to data.gov, our dashboard surfaces the ones that are of interest to our customers.

Locating Classified Data Leaks

As part of our testing, Jigsaw Security started by looking at public documents for classification markings. By crawling news websites, we can alert our government partners when leaked information matches patterns added by the operator.
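A minimal sketch of this kind of operator-defined pattern matching, assuming the operator supplies regular expressions for the markings of interest. The two patterns below (banner lines such as `TOP SECRET//NOFORN` and portion markings such as `(S//NF)`) are simplified illustrations, not a complete implementation of marking rules and not Jigsaw's actual patterns. `MARKING_PATTERNS` and `scan_for_markings` are hypothetical names.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified operator-supplied patterns. Real classification
# marking rules have many more levels, caveats and formats than these.
MARKING_PATTERNS = {
    # Banner lines such as "TOP SECRET//NOFORN" or "SECRET//SI//NOFORN".
    "banner": re.compile(
        r"\b(?:TOP SECRET|SECRET|CONFIDENTIAL)//[A-Z]+(?:/{1,2}[A-Z]+)*"
    ),
    # Portion markings with a caveat, such as "(S//NF)" or "(TS//SI)".
    "portion": re.compile(r"\((?:TS|S|C)//[A-Z]+\)"),
}


def scan_for_markings(text: str) -> List[Tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs for every marking found.

    Matching is case-sensitive and requires a "//" caveat, so ordinary
    prose such as "the Secret Service" or "(C) 2020" is not flagged.
    """
    hits: List[Tuple[str, str]] = []
    for name, pattern in MARKING_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

Requiring a caveat after `//` trades some recall for far fewer false positives on ordinary news text; an operator could add looser patterns if missed leaks matter more than noise.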

Tracking Individuals and News Sentiment

A list of individual names was added to our system. When documents contain those terms (in this case, names), operators see the names on a timeline, allowing the investigator to immediately gauge things such as each individual's reputation and the sentiment of news articles, blog posts and more. This could be valuable during an election or while legislation is being passed, to see how the media is covering stories of interest and to allow inaccurate information published by the media to be corrected.
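The name-tracking timeline can be sketched as grouping document hits by name and sorting them by publication date. This is a minimal illustration, assuming each document arrives as an `(doc_id, published_date, extracted_text)` tuple; `build_mention_timeline` is a hypothetical name, and sentiment scoring, which would need a separate model, is out of scope.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def build_mention_timeline(
    documents: Iterable[Tuple[str, date, str]],
    names: List[str],
) -> Dict[str, List[Tuple[date, str]]]:
    """Map each tracked name to a date-sorted list of (published, doc_id).

    Matching is a plain case-insensitive substring test, a deliberate
    simplification; names absent from every document are omitted.
    """
    timeline: Dict[str, List[Tuple[date, str]]] = defaultdict(list)
    for doc_id, published, text in documents:
        lowered = text.lower()
        for name in names:
            if name.lower() in lowered:
                timeline[name].append((published, doc_id))
    # Sort each name's mentions chronologically (ties broken by doc_id).
    for mentions in timeline.values():
        mentions.sort()
    return dict(timeline)
```

A production system would likely need name normalization (nicknames, initials, transliterations), which a substring test does not handle.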

As you can see in the demo, you can track documents with certain keywords and terms that may be of interest.

When ingesting millions of documents per day, this allows you to quickly find items of interest to your case or research.

A look at some keywords in documents

Instead of having to read every document being ingested, you see new documents matching your search terms automatically presented on the timeline, where they can be read or the original files downloaded directly from the web interface.

To get a demo of this new capability, feel free to reach out to your sales representative.
