Syntax:
CREATE FREETEXT INDEX index_name (column_definition [, ...]);
column_definition syntax :
column_name type_name [COMMENT {'constant' | ('constant')}] [STORE|UNSTORE] [UNINDEX] [KEYWORD]
type_name syntax :
TINYINT | SMALLINT | INT | INTEGER | BIGINT | DOUBLE | FLOAT | DATETIME | CHAR | VARCHAR | TEXT
Creates a new Lucene full-text index. index_name specify the name of the index in database. To specify the database use ‘USE db’ command before CREATE statement or set DB name in index name: database.index_name.
Column Definition
Each column must have at least a name and a type listed above.
STORE
If specified and a column is not TEXT type, the value of the column will be indexed and stored in full-text index. We not recommend to store all columns in index. Usually, to store unique fields, used uniquely identify table’s row, for example primary key field(s).
UNSTORE
If specified, index the column values and allow search, sort. However, the full-text search results contain NULL values for the field, since they was not stored.
UNINDEX
The field will be stored in full-text index, but not indexed, therefore not searchable. Usually, we store unique table’s fields, for instance primary key fields. The fields have no interest in full-text search query, but they are needed to join the free text results with the linked DB table. Read more in <a>Querying full-text</a> section.
KEYWORD
When present, the value will not be tokenized / split. All not character fields is auto set to use KEYWORD. The string ‘http://www.server.com’ will only be searchable by entire string, www, http result in nothing.
Sorting and range queries can only be applied on fields that has been specified with KEYWORD.
Type conversion
All data in Lucene internally stored as text. Therefore, the data types: TINYINT | SMALLINT | INT | INTEGER | BIGINT | DOUBLE | FLOAT | DATETIME will be converted into a text representation that can be ordered alphabetically. Numbers will be padded with the leading zeros, so that the numbers has the same width and sort order is the same as the converted number’s order. DATETIME are converted using: YYYYMMDD format.
Setting up “stop words”
Stop words allows skipping from indexing special words that is meaningless for the search output. Such words are: “and, or, is, the,” or just a single character. By default, full-text index uses English stop words. Developers can customize stop words. Custom stop words have to be enlisted in stopwords.txt file, where each word must be separated by newline. Copy the file to full-text index folder. The folder location can be obtained from “system.systables” table’s column “path”.
You must to re-index Lucene index after changing stop words. Although, it is not necessary to do so, if full-text index is empty. How to re-index the index, please read in <a>Rebuilding Index</a> section.
An example:
use debate create freetext index debate_search ( postid int unindex, threadid int unindex unstore, dateline varchar unstore, title varchar unstore, pagetext text unstore )