ScimoreDB Distributed Scale Out

To illustrate the benefits of ScimoreDB Distributed, the following three examples show how distribution can make a difference for regular database tasks.

Example 1: Constant query time with growing data set

This example illustrates how the capacity of a database cluster can grow linearly. Suppose, as a symbolic figure, that a regular database engine takes 1 second to scan 1,000,000 rows and therefore approximately 3 seconds to scan 3,000,000 rows. Using ScimoreDB Distributed, you can set up 3 database servers acting as a single server, where each database operates on a separate set of 1,000,000 rows. Together, the 3 databases contain the initial 3 million rows. Because each server scans only its own rows and all servers scan at the same time, scanning the 3,000,000 rows takes about 1 second, so the response time stays constant while the database grows.
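The idea can be sketched in a few lines of Python. This is only an illustration of partitioned scanning, not ScimoreDB code: the function names, the round-robin partitioning, and the idealised timing model (no network or merge overhead) are assumptions made for the example, and the scan rate is the symbolic figure used above.

```python
from concurrent.futures import ThreadPoolExecutor

ROWS_PER_SECOND = 1_000_000  # symbolic per-server scan rate from the text


def partition(rows, servers):
    """Split rows round-robin so each server holds a roughly equal share."""
    return [rows[i::servers] for i in range(servers)]


def scan(rows, predicate):
    """Full scan of one server's local partition."""
    return [r for r in rows if predicate(r)]


def distributed_scan(partitions, predicate):
    """Scan every partition concurrently and concatenate the matches."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        parts = list(pool.map(scan, partitions, [predicate] * len(partitions)))
    return [row for part in parts for row in part]


def estimated_scan_seconds(total_rows, servers):
    """Idealised model: elapsed time is set by the largest partition."""
    largest = -(-total_rows // servers)  # ceiling division
    return largest / ROWS_PER_SECOND
```

Under this model, `estimated_scan_seconds(3_000_000, 1)` gives 3.0 while `estimated_scan_seconds(3_000_000, 3)` gives 1.0, which matches the figures in the example.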

Example 2: Solve disk bottleneck, and achieve constant update times

Usually, the speed of update queries is limited not by the CPU but by the access speed of the computer's hard disk. To ensure that no data is lost, all transaction information must be logged to disk before the commit status is returned to the client. A disk can perform roughly 150 to 300 write operations per second, so no database engine can update data faster than the disk can record it. (Note that ScimoreDB offers the option of disabling per-transaction logging to disk. In that case the log is written on a schedule rather than with every transaction or, if the database is not under load, immediately.) We can overcome this limit by adding more servers, so that the available disk write capacity scales linearly with the number of servers.
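The arithmetic behind this claim can be written down as a small model. It is a rough sketch under stated assumptions, not a description of ScimoreDB internals: it assumes each commit costs one log write, each transaction touches a single server, and updates are spread evenly by hashing the row key. The function names and the CRC32 routing are hypothetical.

```python
import zlib


def max_commits_per_second(servers, disk_writes_per_second, log_writes_per_commit=1):
    """Upper bound on commit rate when every commit must reach disk first."""
    return servers * disk_writes_per_second / log_writes_per_commit


def server_for_key(key, servers):
    """Route a row key to one server with a stable hash (assumed scheme)."""
    return zlib.crc32(str(key).encode("utf-8")) % servers
```

With the figures from the text, one server with a disk capable of 150 writes per second is capped at 150 commits per second, while four such servers are capped at 600 in this model.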

Example 3: OLAP scale-out

When dealing with OLAP queries, sorting and grouping large datasets efficiently is essential for fast performance. Usually, an OLAP query groups the FACT table (a control table) by dimension tables (time, product ID, etc.). Using distributed servers, you can distribute the FACT table over many machines and replicate the dimension tables on all DB servers. This way, the grouping/aggregation operation is done locally and in parallel: all servers work at the same time and, on completion, merge their partial results into the final result set. Merging the results is also done in parallel by the machines in the cluster. Therefore, OLAP queries scale out linearly with the number of servers in the distributed environment.
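The local-aggregate-then-merge pattern described here can be sketched as follows. The tables, column layout, and function names are invented for illustration and do not reflect ScimoreDB's actual API. For brevity, the sketch runs the per-server step and the merge sequentially; since sums can be combined in any order, both steps could be run in parallel as the text describes.

```python
from collections import Counter

# Dimension table, replicated in full on every server (hypothetical data).
PRODUCTS = {1: "books", 2: "games", 3: "books"}

# Fact table split across three servers: (product_id, amount) rows.
FACT_PARTITIONS = [
    [(1, 10), (2, 5)],
    [(3, 7), (1, 3)],
    [(2, 8), (3, 2)],
]


def local_aggregate(fact_rows, products):
    """Join local fact rows with the replicated dimension and sum per category."""
    totals = Counter()
    for product_id, amount in fact_rows:
        totals[products[product_id]] += amount
    return totals


def merge(partials):
    """Combine the per-server partial sums into the final result set."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds values per key
    return dict(result)


# Each server aggregates only its own slice of the fact table.
partials = [local_aggregate(rows, PRODUCTS) for rows in FACT_PARTITIONS]
totals = merge(partials)
```

Because every server holds a full copy of the dimension table, the join in `local_aggregate` never needs data from another machine; only the small partial results travel across the network.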