Specify a Language for Text Index
This tutorial describes how to specify the default language for a text index and how to create text indexes for collections whose documents are in different languages.
Specify the Default Language for a text Index
The default language associated with the indexed data determines the
list of stop words and the rules for the stemmer and tokenizer. The
default language for the indexed data is english.
To specify a different language, use the default_language option
when creating the text index. See Text Search Languages for
the languages available for default_language.
The following example creates a text index on the
content field and sets the default_language to
spanish:
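A minimal sketch of such a call in the mongo shell. The collection name articles is a hypothetical placeholder, and the typeof db guard is only there so the snippet also loads outside the shell:

```javascript
// Index specification: build a text index over the `content` field.
var keys = { content: "text" };

// Index options: use Spanish stop words, stemming, and tokenization.
var options = { default_language: "spanish" };

// `articles` is a hypothetical collection name used for illustration.
// Outside the mongo shell `db` is undefined, so the call is skipped.
if (typeof db !== "undefined") {
  db.articles.ensureIndex(keys, options);
}
```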
Create a text Index for a Collection in Multiple Languages
Specify the Index Language within the Document
If a collection contains documents that are in different languages, include a field in the documents that contains the language to use:
- If you include a field named language in the document, by default, the ensureIndex() method will use the value of this field to override the default language.
- To use a field with a name other than language, you must specify the name of this field to the ensureIndex() method with the language_override option.
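The two rules above can be modeled as a small function. This is an illustrative sketch only: effectiveLanguage is a hypothetical helper, not part of MongoDB, and it assumes the language value is stored as a string.

```javascript
// Returns the language a text index would apply to `doc` under the rules
// described above. Illustrative model only; this is not MongoDB code.
function effectiveLanguage(doc, indexOptions) {
  var opts = indexOptions || {};
  // Field that may carry a per-document language; "language" unless overridden.
  var overrideField = opts.language_override || "language";
  // Index-wide default; "english" unless default_language is set.
  var defaultLanguage = opts.default_language || "english";
  var value = doc[overrideField];
  return typeof value === "string" ? value : defaultLanguage;
}

// A document with a language field overrides the default.
var a = effectiveLanguage({ quote: "...", language: "spanish" }, {});
// A document without one falls back to the index default.
var b = effectiveLanguage({ quote: "..." }, {});
// With language_override, the named field is consulted instead.
var c = effectiveLanguage({ quote: "...", idioma: "portuguese" },
                          { language_override: "idioma" });
```

In this model, a is "spanish", b is "english", and c is "portuguese".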
See Text Search Languages for a list of supported languages.
Include the language Field
Include a field language that specifies the language to use for the
individual documents.
For example, the documents of a multi-language collection quotes
contain the field language:
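A sketch of what such documents might look like. The _id values and quote texts are illustrative assumptions; only the collection name quotes and the language and quote field names come from the text:

```javascript
// Illustrative documents; the _id values and quote texts are assumptions.
var quotes = [
  { _id: 1, language: "portuguese", quote: "A sorte protege os audazes" },
  { _id: 2, language: "spanish", quote: "Nada hay más surrealista que la realidad." },
  { _id: 3, language: "english", quote: "is this a dagger which I see before me" }
];

// Outside the mongo shell `db` is undefined, so the inserts are skipped.
if (typeof db !== "undefined") {
  quotes.forEach(function (doc) {
    db.quotes.insert(doc);
  });
}
```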
Create a text index on the field quote:
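A minimal sketch of this step in the mongo shell, with no options passed so that the per-document language field and the English default both apply:

```javascript
// Index specification: build a text index over the `quote` field.
var keys = { quote: "text" };

// Outside the mongo shell `db` is undefined, so the call is skipped.
if (typeof db !== "undefined") {
  db.quotes.ensureIndex(keys);
}
```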
- For the documents that contain the language field, the text index uses that language to determine the stop words and the rules for the stemmer and the tokenizer.
- For documents that do not contain the language field, the index uses the default language, which is English, to determine the stop words and rules for the stemmer and the tokenizer.
For example, the Spanish word que is a stop word. So the
following text command would not match any document:
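A sketch of such a command in the mongo shell, assuming the search is run with Spanish as the query language:

```javascript
// Arguments for the `text` command: search for "que" using Spanish rules.
var textArgs = { search: "que", language: "spanish" };

// Outside the mongo shell `db` is undefined, so the command is skipped.
if (typeof db !== "undefined") {
  printjson(db.quotes.runCommand("text", textArgs));
}
```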
Use any Field to Specify the Language for a Document
Include a field that specifies the language to use for the individual
documents. To use a field with a name other than language, include
the language_override option when creating the index.
For example, the documents of a multi-language collection quotes
contain the field idioma:
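A sketch of what such documents might look like. The _id values and quote texts are illustrative assumptions; only the collection name quotes and the idioma and quote field names come from the text:

```javascript
// Illustrative documents; the _id values and quote texts are assumptions.
var quotes = [
  { _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" },
  { _id: 2, idioma: "spanish", quote: "Nada hay más surrealista que la realidad." },
  { _id: 3, idioma: "english", quote: "is this a dagger which I see before me" }
];

// Outside the mongo shell `db` is undefined, so the inserts are skipped.
if (typeof db !== "undefined") {
  quotes.forEach(function (doc) {
    db.quotes.insert(doc);
  });
}
```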
Create a text index on the field quote with the
language_override option:
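A minimal sketch of this step in the mongo shell, naming idioma as the field that carries each document's language:

```javascript
// Index specification: build a text index over the `quote` field.
var keys = { quote: "text" };

// Index options: read each document's language from `idioma`
// instead of the default `language` field.
var options = { language_override: "idioma" };

// Outside the mongo shell `db` is undefined, so the call is skipped.
if (typeof db !== "undefined") {
  db.quotes.ensureIndex(keys, options);
}
```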
- For the documents that contain the idioma field, the text index uses that language to determine the stop words and the rules for the stemmer and the tokenizer.
- For documents that do not contain the idioma field, the index uses the default language, which is English, to determine the stop words and rules for the stemmer and the tokenizer.
For example, the Spanish word que is a stop word. So the
following text command would not match any document:
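A sketch of such a command against the index that uses idioma, again assuming Spanish as the query language:

```javascript
// Arguments for the `text` command: "que" is dropped as a Spanish stop word.
var textArgs = { search: "que", language: "spanish" };

// Outside the mongo shell `db` is undefined, so the command is skipped.
if (typeof db !== "undefined") {
  printjson(db.quotes.runCommand("text", textArgs));
}
```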