Where does my data reside?
ScimoreDB splits data by rows to distribute it amongst instances. There is no data redundancy. The PARTITION column determines on which instance a particular row resides: a number, i, is derived from the partition column, and the row is stored on instance i % n, where n is the number of database instances. How i is derived is detailed in the following table:
| Partition Column Type | i |
|---|---|
| TINYINT, SMALLINT, INT, BIGINT, UNIQUEIDENTIFIER | The value of the column is used. |
| DATETIME, FLOAT, DOUBLE | The field data is interpreted as an unsigned integer of appropriate size. The values are generally unpredictable. |
| VARCHAR, GUID | A hash of the data is used. |
| TEXT, BLOB | Not allowed as a partition column. |
In general, if the column is guaranteed to be less than or equal to 8 bytes, its data is interpreted as an unsigned integer value for i; if it is greater than 8 bytes, a hash of the data is used for i.
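The rule can be illustrated with a small Python sketch. This is not ScimoreDB code: the function names `partition_number` and `instance_for` are hypothetical, CRC-32 is only a stand-in because the hash ScimoreDB actually uses is not documented here, and the sketch handles only non-negative integers, doubles and strings.

```python
import struct
import zlib


def partition_number(value):
    """Derive i from a partition column value (illustrative sketch only)."""
    if isinstance(value, bool):
        raise TypeError("unsupported partition value type")
    if isinstance(value, int):
        # Integer columns: the value itself is used.
        if value < 0:
            raise ValueError("this sketch handles non-negative integers only")
        return value
    if isinstance(value, float):
        # DOUBLE: reinterpret the 8 bytes of the value as an unsigned integer.
        (bits,) = struct.unpack("<Q", struct.pack("<d", value))
        return bits
    if isinstance(value, str):
        # Columns wider than 8 bytes: a hash of the data is used.
        # CRC-32 is an assumption standing in for the real hash function.
        return zlib.crc32(value.encode("utf-8"))
    raise TypeError("unsupported partition value type")


def instance_for(value, instance_count):
    """Return the zero-based instance index i % n for a row."""
    return partition_number(value) % instance_count
```

For example, `instance_for(7, 3)` places a row whose partition column holds 7 on instance 1. A DOUBLE of 1.0 is reinterpreted as the integer 0x3FF0000000000000 before the modulo is applied, which shows why placement by floating-point columns is hard to predict.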
Example of Partitioning
Imagine we have the following simple table structure and data:
If we then run ALTER CLUSTER to use the three instances, using c1 as the PARTITION column, the data would be redistributed like this.
If we had used c2 as the PARTITION column, it would have been spread like this:
So, the partition column can be used to precisely control which rows are stored on which server instances. This is useful in your database design for spreading the processing and bandwidth loads more evenly over all instances in the cluster.
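To see how the choice of partition column changes the spread, the following Python sketch groups rows by i % n for three instances. The rows and the `distribute` helper are hypothetical and are not the data from the example above; the sketch assumes integer columns, where the column value is used directly as i.

```python
from collections import defaultdict


def distribute(rows, partition_column, instance_count):
    """Group rows by the instance index derived from the partition column."""
    placement = defaultdict(list)
    for row in rows:
        # Integer column: the value is used directly as i.
        placement[row[partition_column] % instance_count].append(row)
    return dict(placement)


# Hypothetical rows, made up for illustration.
rows = [
    {"c1": 1, "c2": 10},
    {"c1": 2, "c2": 10},
    {"c1": 3, "c2": 10},
    {"c1": 4, "c2": 11},
]

by_c1 = distribute(rows, "c1", 3)  # rows land on all three instances
by_c2 = distribute(rows, "c2", 3)  # repeated c2 values pile up on one instance
```

With these rows, partitioning on c1 uses all three instances, while partitioning on c2 puts three of the four rows on the same instance and leaves one instance empty. A column with many distinct, evenly spread values therefore tends to balance the load better.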