Docs Detective

How does it work?

Docs Detective uses sophisticated algorithms to process and compare documents, and presents the results in a way that makes comparison quick and easy.  This document gives a high-level overview of the plagiarism detection process.  It is simplified, and we have excluded much of the secret sauce.

The plagiarism detection process

Search

Portions of the document are automatically sent to the Google Search API, which returns a list of documents that contain matching text.  The results of multiple searches are combined into one list of documents.
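
As a rough illustration of the search step, the sketch below picks short quoted phrases from a document and merges several result lists into one. It is not Docs Detective's actual code: the function names, the phrase length, and the spacing between phrases are all assumptions made for the example, and the call to the search API itself is left out.

```python
def pick_queries(text, words_per_query=8, stride=40):
    """Pick evenly spaced word runs from the text to use as exact-phrase queries.

    words_per_query and stride are illustrative values, not product settings.
    """
    words = text.split()
    queries = []
    last_start = max(len(words) - words_per_query + 1, 1)
    for start in range(0, last_start, stride):
        chunk = words[start:start + words_per_query]
        if chunk:
            # Quote the phrase so a search engine treats it as an exact match.
            queries.append('"' + " ".join(chunk) + '"')
    return queries


def merge_results(result_lists):
    """Combine several lists of result URLs into one list without duplicates,
    keeping the order in which each URL was first seen."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

Each query would be sent to the search service separately, and the URL lists that come back would be passed to merge_results together.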

Document Retrieval and Text Extraction

The documents are then downloaded to the App Engine server, where the text is extracted.
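
Text extraction from a downloaded web page could look something like the following sketch, which uses Python's standard html.parser module. The TextExtractor class and extract_text function are hypothetical names for this example; the real extraction step is not described here and likely handles more formats than HTML.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping script and style blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Join the pieces with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._parts).split())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Joining the pieces with spaces keeps words from neighbouring elements apart, at the cost of occasionally splitting a word that has inline markup in the middle of it.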

Comparison

Using algorithms and data structures optimized for comparing text, a document can be compared to several hundred web documents in just a few seconds.
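
The actual algorithms and data structures are not published. One common way to compare a document against many others quickly is to index every run of n consecutive words (a "shingle") from the web documents, then look up the submitted document's shingles in that index. The sketch below shows that general technique only; the function names and the default of n=5 are assumptions for illustration.

```python
from collections import defaultdict


def shingles(text, n=5):
    """Return every run of n consecutive lowercased words in the text."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_index(web_docs, n=5):
    """Map each shingle to the set of web document ids that contain it.

    web_docs is a dict of {doc_id: extracted_text}.
    """
    index = defaultdict(set)
    for doc_id, text in web_docs.items():
        for sh in shingles(text, n):
            index[sh].add(doc_id)
    return index


def count_matches(document, index, n=5):
    """Count, per web document, how many distinct shingles it shares
    with the submitted document."""
    counts = defaultdict(int)
    for sh in set(shingles(document, n)):
        for doc_id in index.get(sh, ()):
            counts[doc_id] += 1
    return dict(counts)
```

Because each lookup is a single dictionary access, the cost of checking the submitted document grows with its own length rather than with the combined length of all the web documents.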

Finalizing the Results

To make comparing simple and quick, web documents that match a section of plagiarized text are grouped together.  Statistics about each section of text are calculated.
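
To give a feel for the grouping step, here is a minimal sketch that merges overlapping matches into sections and calculates a few statistics for each one. It assumes each match is a (start, end, url) tuple of word positions in the submitted document, with end exclusive. That representation, the function names, and the particular statistics are all assumptions for this example, not a description of the real output.

```python
def group_sections(matches):
    """Merge overlapping or touching matched spans into sections.

    Each section records its span and the set of source URLs that matched it.
    """
    sections = []
    for start, end, url in sorted(matches):
        if sections and start <= sections[-1]["end"]:
            # This span overlaps or touches the previous section, so extend it.
            last = sections[-1]
            last["end"] = max(last["end"], end)
            last["sources"].add(url)
        else:
            sections.append({"start": start, "end": end, "sources": {url}})
    return sections


def section_stats(section, total_words):
    """Compute simple statistics for one section of matched text."""
    length = section["end"] - section["start"]
    return {
        "words": length,
        "percent_of_document": round(100.0 * length / total_words, 1),
        "source_count": len(section["sources"]),
    }
```

With this layout, a reviewer sees one entry per suspicious passage, along with every web document that matched it, instead of one entry per web document.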