We ran a set of YCSB performance tests on Oracle NoSQL Database using SSD cards and Intel Xeon E5-2690 CPUs, with the goal of achieving 1M mixed ops/sec on a 95% read / 5% update workload. We used the standard YCSB parameters: 13-byte keys and a 1 KB data size (1,102 bytes after serialization). The maximum database size was 2 billion records, or approximately 2 TB of data. We sized the shards to ensure that this was not an "in-memory" test (i.e., the data portion of the B-Trees did not fit into memory). All updates were durable and used the "simple majority" replica ack policy, effectively "committing to the network". All read operations used the Consistency.NONE_REQUIRED parameter, allowing reads to be performed on any replica.
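For readers unfamiliar with these settings, the sketch below shows roughly how the store handle, durability, and consistency described above are configured through the Oracle NoSQL Database Java driver. The store name and helper hosts are placeholders, and the NO_SYNC sync policies reflect one reading of "committing to the network" rather than the exact settings used in the test.

import java.util.concurrent.TimeUnit;

import oracle.kv.Consistency;
import oracle.kv.Durability;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class StoreSettings {
    public static void main(String[] args) {
        // Placeholder store name and helper hosts.
        KVStoreConfig config =
            new KVStoreConfig("kvstore", "node01:5000", "node02:5000");
        KVStore store = KVStoreFactory.getStore(config);

        // "Committing to the network": a simple majority of replicas must
        // acknowledge each update, but no node is required to sync to disk.
        Durability durability = new Durability(
            Durability.SyncPolicy.NO_SYNC,        // master sync policy
            Durability.SyncPolicy.NO_SYNC,        // replica sync policy
            Durability.ReplicaAckPolicy.SIMPLE_MAJORITY);

        Key key = Key.createKey("user000000001");
        store.put(key, Value.createValue(new byte[1102]),
                  null, durability, 5, TimeUnit.SECONDS);

        // Reads may be served by any replica, with no consistency
        // guarantee relative to the master.
        ValueVersion vv =
            store.get(key, Consistency.NONE_REQUIRED, 5, TimeUnit.SECONDS);
        store.close();
    }
}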
In the past we achieved 100K ops/sec using SSD cards on a single-shard cluster (replication factor 3), so for this test we used 10 shards on 15 Storage Nodes, with each SN hosting 2 Replication Nodes and each RN assigned to its own SSD card.
After correcting a scaling problem in YCSB, we blew past the 1M ops/sec mark with 8 shards and proceeded to hit 1.2M ops/sec with 10 shards.
Hardware Configuration
We used 15 servers, each configured with two 335 GB SSD cards. Homogeneous CPUs were not available across all 15 servers: 12 of the 15 had Xeon E5-2690 CPUs (2.9 GHz, 2 sockets, 32 threads, 193 GB RAM) and the other 3 had Xeon E5-2680 CPUs (2.7 GHz, 2 sockets, 32 threads, 193 GB RAM). There might have been some upside to having all 15 machines configured with the faster CPU, but since CPU was not the limiting factor we don't believe the improvement would have been significant.
The client machines were Xeon X5670 (2.93 GHz, 2 sockets, 24 threads, 96 GB RAM). Although the clients had 96 GB of RAM, neither the NoSQL Database driver nor the YCSB clients require anywhere near that amount of memory, and the test could just as easily have been run with much less.
Networking was all 10GigE.
YCSB Scaling Problem
We made three modifications to the YCSB benchmark. The first was to allow the test to accommodate more than 2 billion records (effectively changing the record identifiers from ints to longs). To keep the key size constant, we changed the code to use base 32 for the user ids.
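The exact encoding in our modified YCSB is not reproduced here, but the sketch below shows one way to keep a 13-byte key constant while using base 32; the "user" prefix and the 9-digit width are illustrative assumptions (9 base-32 digits cover 32^9, roughly 3.5e13 ids, comfortably above 2 billion).

public class KeyEncoding {
    // "user" prefix (4 bytes) + 9 base-32 digits = a constant 13-byte key.
    private static final int DIGITS = 9;

    static String userKey(long id) {
        String encoded = Long.toString(id, 32);   // base-32 digits
        StringBuilder sb = new StringBuilder("user");
        for (int i = encoded.length(); i < DIGITS; i++) {
            sb.append('0');                       // zero-pad to fixed width
        }
        return sb.append(encoded).toString();
    }

    public static void main(String[] args) {
        System.out.println(userKey(0L));               // user000000000
        System.out.println(userKey(2_000_000_000L));   // user001rjb500
    }
}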
The second change involved the way we run the YCSB client, in order to make the test itself horizontally scalable. The basic problem has to do with the way the YCSB test creates its Zipfian distribution of keys, which is intended to model "real" loads by generating clusters of key collisions. Unfortunately, the percentage of collisions on the most contentious keys remains the same even as the number of keys in the database increases. As we scale up the load, the number of collisions on those keys increases as well, eventually exceeding the capacity of the single server responsible for a given key. For example, if the hottest key attracts a fixed 1% of operations, quadrupling the aggregate load quadruples the traffic to that one key, all of which lands on a single master. This is not a workload that is realistic or amenable to horizontal scaling. YCSB does provide alternate key distribution algorithms, so this is not a shortcoming of YCSB in general.
We decided that a better model would be for the key collisions to be limited to a given YCSB client process. That way, as additional YCSB client processes (i.e. additional load) are added, each process sees the same number of collisions within its own keys, but does not increase the number of collisions on any single key across the entire store. We added client processes proportionally to the number of records in the database (and therefore to the number of shards).
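As a sketch of the key-scoping arithmetic (not our actual YCSB patch), each client process can be given a disjoint slice of the key space and draw its skewed keys only from that slice. The skewed sampler below is a simple stand-in for YCSB's Zipfian generator; the point is the offset, not the exact distribution.

import java.util.Random;

public class ScopedKeyChooser {
    private final long insertStart;   // first record id owned by this client
    private final long insertCount;   // number of records owned by this client
    private final Random rng = new Random();

    ScopedKeyChooser(long insertStart, long insertCount) {
        this.insertStart = insertStart;
        this.insertCount = insertCount;
    }

    // Stand-in for YCSB's ZipfianGenerator: squaring a uniform draw skews
    // the distribution toward low indices, creating hot keys. The hot
    // indices are relative to this client's slice, so adding clients adds
    // new hot keys instead of piling more load onto the same ones.
    long nextKeyId() {
        double u = rng.nextDouble();
        long skewed = (long) (u * u * insertCount);
        return insertStart + Math.min(skewed, insertCount - 1);
    }

    public static void main(String[] args) {
        long recordsPerClient = 100_000_000L;
        // Client i of N works only against its own range, e.g. client 3:
        ScopedKeyChooser client3 =
            new ScopedKeyChooser(3 * recordsPerClient, recordsPerClient);
        System.out.println("sample key id: " + client3.nextKeyId());
    }
}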
This change to the use of YCSB better models a use case where new groups
of users are likely to access either just their own entries, or entries
within their own subgroups, rather than all users showing the same
interest in a single global collection of keys. If an application finds
every user having the same likelihood of wanting to modify a single
global key, that application has no real hope of getting horizontal
scaling.
Finally, we used read/modify/write (also known as "Compare And Set") style updates during the mixed phase. This uses versioned operations to ensure that no updates are lost. This mode of operation provides better application behavior than the way we have typically run YCSB in the past, and is only practical at scale because we eliminated the shared-key collision hotspots. It is also a more realistic testing scenario. To reiterate, all updates used a simple majority replica ack policy, making them durable.
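The read/modify/write cycle maps onto the versioned put in the Oracle NoSQL Database Java API. A minimal sketch, assuming the store handle and key encoding from the earlier snippets and that the record already exists:

import oracle.kv.KVStore;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;
import oracle.kv.Version;

public class VersionedUpdate {
    // Retries until the update applies against an unchanged version,
    // so no concurrent update is silently overwritten.
    static void readModifyWrite(KVStore store, Key key) {
        while (true) {
            ValueVersion current = store.get(key);   // read
            byte[] bytes = current.getValue().getValue();
            bytes[0] ^= 1;                           // modify (illustrative)
            Version result = store.putIfVersion(     // write iff unchanged
                key, Value.createValue(bytes), current.getVersion());
            if (result != null) {
                return;                              // success
            }
            // Version mismatch: another client updated the record; retry.
        }
    }
}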
Scalability Results
In the table below, the "KVS Size" column gives the total number of records, with the number of shards and the replication factor in parentheses. Hence, the first row indicates 400m total records in the NoSQL Database (KV Store), 2 shards, and a replication factor of 3. The "Clients" column indicates the number of YCSB client processes. "Mixed Threads" is the number of threads per process, with the total number of threads in parentheses; hence, 90 threads per YCSB process for a total of 360 threads in the first row. The client processes were distributed across 10 client machines.
Shards | KVS Size (records) | Clients | Mixed Threads | Overall Throughput (ops/sec) | Read Latency av/95%/99% (ms) | Write Latency av/95%/99% (ms)
2      | 400m (2x3)         | 4       | 90 (360)      | 302,152                      | 0.76/1/3                     | 3.08/8/35
4      | 800m (4x3)         | 8       | 90 (720)      | 558,569                      | 0.79/1/4                     | 3.82/16/45
8      | 1600m (8x3)        | 16      | 90 (1440)     | 1,028,868                    | 0.85/2/5                     | 4.29/21/51
10     | 2000m (10x3)       | 20      | 90 (1800)     | 1,244,550                    | 0.88/2/6                     | 4.47/23/53