public class LucenePDFConfiguration
extends java.lang.Object
Configuration object for use with the LucenePDFDocumentFactory class.
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DEFAULT_MAIN_TEXT_FIELD_NAME: The default name ("text") assigned to the Lucene Field containing the main body of text extracted from a PDF file. |
Constructor and Description |
---|
LucenePDFConfiguration(): Creates a new config object that uses the default main text field name. |
LucenePDFConfiguration(java.lang.String mainTextFieldName): Creates a new config object that uses the given main text field name. |
Modifier and Type | Method and Description |
---|---|
boolean | copyAllPDFMetadata(): Returns true if any PDF metadata attributes not explicitly mapped will be added to generated Lucene Documents using their names as specified in the source PDFs. |
java.lang.String | getBodyTextFieldName(): Returns the name that will be assigned to Lucene Fields containing PDF body text content. |
java.util.Map<java.lang.String,java.lang.String> | getMetadataFieldMapping(): Returns a copy of the mapping between PDF metadata attributes and the names given to Lucene fields created for them. |
java.lang.String | getMetadataFieldMapping(java.lang.String pdfMetadataAttr): Returns the name that should be given to Lucene Fields created from the value of the named PDF metadata attribute. |
boolean | indexBodyText(): Returns true if the main body text of PDFs added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be indexed. |
boolean | indexMetadata(): Returns true if the PDF metadata attributes added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be indexed. |
void | setBodyTextFieldName(java.lang.String bodyTextFieldName): Sets the name that will be assigned to Lucene Fields containing PDF body text content. |
void | setBodyTextSettings(boolean store, boolean index, boolean token): Sets Field attributes that will be used when creating the Field object for the main text content of a PDF document. |
void | setCopyAllPDFMetadata(boolean b): Sets the value returned by copyAllPDFMetadata(). |
void | setMetadataFieldMapping(java.lang.String pdfMetadataAttr, java.lang.String fieldName): Sets the name that will be assigned to Lucene Fields corresponding to the provided PDF metadata attribute name (e.g. Document.ATTR_AUTHOR, etc). |
void | setMetadataSettings(boolean store, boolean index, boolean token): Sets Field attributes that will be used when creating Field objects for the document attributes found in a PDF document. |
boolean | storeBodyText(): Returns true if the main body text of PDFs added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be stored. |
boolean | storeMetadata(): Returns true if the PDF metadata attributes added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be stored. |
boolean | tokenizeBodyText(): Returns true if the main body text of PDFs added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be tokenized. |
boolean | tokenizeMetadata(): Returns true if the PDF metadata attributes added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be tokenized. |
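public static final java.lang.String DEFAULT_MAIN_TEXT_FIELD_NAME
The default name assigned to the Lucene Field containing the main body of text extracted from a PDF file: "text".

public LucenePDFConfiguration(java.lang.String mainTextFieldName)
Creates a new config object.
mainTextFieldName - the name that should be assigned to Fields containing the main PDF text content.

public LucenePDFConfiguration()
Creates a new config object. Fields containing the main text content of PDF documents converted into Lucene Documents will be assigned a default name (DEFAULT_MAIN_TEXT_FIELD_NAME). Other configuration options also take default values.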
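
public void setBodyTextFieldName(java.lang.String bodyTextFieldName)
Sets the name that will be assigned to Lucene Fields containing PDF body text content.

public java.lang.String getBodyTextFieldName()
Returns the name that will be assigned to Lucene Fields containing PDF body text content.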
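
As a rough illustration of how the two constructors and the body text field name accessors relate, the sketch below uses a small stand-in class instead of the real LucenePDFConfiguration, so that it compiles without the library on the classpath. The class name BodyFieldNameSketch, its internals, and the field names "contents" and "body" are assumptions made for the example; only the constant value "text" and the accessor names come from this page.

```java
// Stand-in that mirrors only the documented constant, constructors, and
// body text field name accessors. It is NOT the library class; the
// internals are assumed for illustration.
class BodyFieldNameSketch {
    static final String DEFAULT_MAIN_TEXT_FIELD_NAME = "text";

    private String bodyTextFieldName;

    // No-arg constructor: falls back to the default field name.
    BodyFieldNameSketch() {
        this(DEFAULT_MAIN_TEXT_FIELD_NAME);
    }

    // One-arg constructor: the caller chooses the field name up front.
    BodyFieldNameSketch(String mainTextFieldName) {
        this.bodyTextFieldName = mainTextFieldName;
    }

    void setBodyTextFieldName(String bodyTextFieldName) {
        this.bodyTextFieldName = bodyTextFieldName;
    }

    String getBodyTextFieldName() {
        return bodyTextFieldName;
    }

    public static void main(String[] args) {
        BodyFieldNameSketch defaults = new BodyFieldNameSketch();
        System.out.println(defaults.getBodyTextFieldName());

        // "contents" and "body" are hypothetical field names.
        BodyFieldNameSketch custom = new BodyFieldNameSketch("contents");
        custom.setBodyTextFieldName("body");
        System.out.println(custom.getBodyTextFieldName());
    }
}
```

With the real class, the same calls would be made on a LucenePDFConfiguration instance before handing it to LucenePDFDocumentFactory.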
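
public java.util.Map<java.lang.String,java.lang.String> getMetadataFieldMapping()
Returns a copy of the mapping between PDF metadata attributes and the names given to Lucene fields created for them.

public java.lang.String getMetadataFieldMapping(java.lang.String pdfMetadataAttr)
Returns the name that should be given to Lucene Fields created from the value of the named PDF metadata attribute.

public void setMetadataFieldMapping(java.lang.String pdfMetadataAttr, java.lang.String fieldName)
Sets the name that will be assigned to Lucene Fields corresponding to the provided PDF metadata attribute name (e.g. Document.ATTR_AUTHOR, etc).

public boolean copyAllPDFMetadata()
Returns true if any PDF metadata attributes not explicitly mapped will be added to generated Lucene Documents using their names as specified in the source PDFs.

public void setCopyAllPDFMetadata(boolean b)
See also: copyAllPDFMetadata()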
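
The interaction between explicit mappings and the copy-all flag can be sketched as follows. This is a minimal stand-in, not the library's implementation: the class MetadataMappingSketch, the helper resolveFieldName, and the attribute names "Author" and "Keywords" are hypothetical. In particular, returning an empty Optional for an unmapped attribute when the flag is off is an assumption about behavior that this page does not specify.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Stand-in illustrating the documented mapping rules; internals are assumed.
class MetadataMappingSketch {
    private final Map<String, String> mapping = new HashMap<>();
    private boolean copyAll;

    void setMetadataFieldMapping(String pdfMetadataAttr, String fieldName) {
        mapping.put(pdfMetadataAttr, fieldName);
    }

    // Returns a copy, so callers cannot modify the internal mapping.
    Map<String, String> getMetadataFieldMapping() {
        return new HashMap<>(mapping);
    }

    void setCopyAllPDFMetadata(boolean b) {
        copyAll = b;
    }

    // Hypothetical helper: an explicit mapping wins; otherwise the attribute
    // keeps its own name, but only when the copy-all flag is set.
    Optional<String> resolveFieldName(String pdfMetadataAttr) {
        String mapped = mapping.get(pdfMetadataAttr);
        if (mapped != null) {
            return Optional.of(mapped);
        }
        return copyAll ? Optional.of(pdfMetadataAttr) : Optional.empty();
    }

    public static void main(String[] args) {
        MetadataMappingSketch cfg = new MetadataMappingSketch();
        cfg.setMetadataFieldMapping("Author", "author_name");

        System.out.println(cfg.resolveFieldName("Author"));
        System.out.println(cfg.resolveFieldName("Keywords"));

        cfg.setCopyAllPDFMetadata(true);
        System.out.println(cfg.resolveFieldName("Keywords"));
    }
}
```

Because getMetadataFieldMapping() is documented as returning a copy, changes made to the returned map should not affect the configuration; the sketch mirrors that by copying into a new HashMap.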
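
public void setBodyTextSettings(boolean store, boolean index, boolean token)
Sets Field attributes that will be used when creating the Field object for the main text content of a PDF document. The arguments correspond to the store, index, and token parameters of the Field constructor before Lucene v4.x and the same-named attributes of FieldType afterwards.

public void setMetadataSettings(boolean store, boolean index, boolean token)
Sets Field attributes that will be used when creating Field objects for the document attributes found in a PDF document. The arguments correspond to the store, index, and token parameters of the Field constructor before Lucene v4.x and the same-named attributes of FieldType afterwards.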
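
To make the three flags more concrete, here is one plausible way they could translate into the pre-4.x Lucene settings. The local enums Store and Index are stand-ins named after Lucene 3.x's Field.Store and Field.Index so the example compiles without Lucene; the class FieldFlagsSketch and the translation rule are assumptions, not a description of what the library actually does.

```java
// Sketch of a possible translation from (store, index, token) flags to
// pre-4.x Lucene field settings. The enums are local stand-ins.
class FieldFlagsSketch {
    enum Store { YES, NO }
    enum Index { NO, ANALYZED, NOT_ANALYZED }

    static Store toStore(boolean store) {
        return store ? Store.YES : Store.NO;
    }

    // Assumed rule: token only matters when the field is indexed at all.
    static Index toIndex(boolean index, boolean token) {
        if (!index) {
            return Index.NO;
        }
        return token ? Index.ANALYZED : Index.NOT_ANALYZED;
    }

    public static void main(String[] args) {
        // Body text: not stored, indexed, tokenized.
        System.out.println(toStore(false) + " / " + toIndex(true, true));
        // Metadata: stored, indexed, kept as a single untokenized term.
        System.out.println(toStore(true) + " / " + toIndex(true, false));
    }
}
```

Under this reading, a call such as setMetadataSettings(true, true, false) would produce stored fields that are searchable only by their exact value.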
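
public boolean indexBodyText()
Returns true if the main body text of PDFs added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be indexed.

public boolean storeBodyText()
Returns true if the main body text of PDFs added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be stored.

public boolean tokenizeBodyText()
Returns true if the main body text of PDFs added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be tokenized.

public boolean indexMetadata()
Returns true if the PDF metadata attributes added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be indexed.

public boolean storeMetadata()
Returns true if the PDF metadata attributes added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be stored.

public boolean tokenizeMetadata()
Returns true if the PDF metadata attributes added to Lucene Documents created through LucenePDFDocumentFactory using this config object will be tokenized.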