public class LucenePDFDocumentFactory
extends java.lang.Object
This class enables easy Lucene indexing of PDF text and metadata via integration with PDFxStream. A supported lucene-core library jar must be on the classpath of any application that uses this class.
Typical usage would be to create a new LucenePDFConfiguration object, configure it as desired, and pass that object into buildPDFDocument(com.snowtide.pdf.Document, LucenePDFConfiguration) along with an open Document. A Lucene Document will be returned containing Fields corresponding to the source PDF document's text and metadata, as dictated by the provided configuration object.
buildPDFDocument(com.snowtide.pdf.Document) is also provided; this does not require a configuration object, but results in Lucene Documents that contain a direct dump of the PDF's text content and metadata attributes according to a default configuration. This makes little sense in most environments, where the default names of PDF metadata attributes are unlikely to match the names of the corresponding Lucene Fields for those metadata attributes. See LucenePDFConfiguration for details of the default configuration of instances of that class.
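The typical usage described above can be sketched roughly as follows. This is an illustration under stated assumptions, not an authoritative recipe: the package name com.snowtide.pdf.lucene, the use of com.snowtide.PDF.open(File) to obtain an open PDFxStream Document, and the close() call on that Document are assumptions not stated on this page, and the class name PdfToLuceneExample is hypothetical. Compiling it requires the PDFxStream and lucene-core jars on the classpath.

```java
import java.io.File;
import java.io.IOException;

import com.snowtide.PDF;
// Assumption: the Lucene integration classes live in this package.
import com.snowtide.pdf.lucene.LucenePDFConfiguration;
import com.snowtide.pdf.lucene.LucenePDFDocumentFactory;

// Hypothetical helper class, used only for illustration.
public class PdfToLuceneExample {

    /** Builds a Lucene Document from the text and metadata of the given PDF file. */
    public static org.apache.lucene.document.Document toLuceneDocument(File pdfFile)
            throws IOException {
        // Assumption: PDF.open(File) returns an open PDFxStream Document.
        com.snowtide.pdf.Document pdf = PDF.open(pdfFile);
        try {
            // Start from the default configuration; adjust field names and
            // metadata handling here as described in LucenePDFConfiguration.
            LucenePDFConfiguration config = new LucenePDFConfiguration();

            // The returned Lucene Document contains Fields as dictated by config.
            return LucenePDFDocumentFactory.buildPDFDocument(pdf, config);
        } finally {
            // Release the PDF's resources whether or not the build succeeded.
            pdf.close();
        }
    }
}
```

The returned Lucene Document can then be passed to an IndexWriter in the usual way. Calling buildPDFDocument(pdf) without a configuration object would follow the same pattern but apply the default configuration.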
Constructor and Description |
---|
LucenePDFDocumentFactory() |
Modifier and Type | Method and Description |
---|---|
static org.apache.lucene.document.Document | buildPDFDocument(com.snowtide.pdf.Document pdf) Creates a new Lucene Document instance using the PDF text and metadata provided by the PDFxStream Document using a default LucenePDFConfiguration.LucenePDFConfiguration() to control Lucene field names, etc. |
static org.apache.lucene.document.Document | buildPDFDocument(com.snowtide.pdf.Document pdf, LucenePDFConfiguration config) Creates a new Lucene Document instance using the PDF text and metadata provided by the PDFxStream Document using the provided LucenePDFConfiguration to control Lucene field names, etc. |
public static org.apache.lucene.document.Document buildPDFDocument(com.snowtide.pdf.Document pdf) throws java.io.IOException
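Creates a new Lucene Document instance using the PDF text and metadata provided by the PDFxStream Document using a default LucenePDFConfiguration.LucenePDFConfiguration() to control Lucene field names, etc.
Throws: java.io.IOException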
public static org.apache.lucene.document.Document buildPDFDocument(com.snowtide.pdf.Document pdf, LucenePDFConfiguration config) throws java.io.IOException
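Creates a new Lucene Document instance using the PDF text and metadata provided by the PDFxStream Document using the provided LucenePDFConfiguration to control Lucene field names, etc.
Throws: java.io.IOException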