VK Reimagines its Storage Architecture

Learn how this social networking service company engineered an ingenious data storage solution and optimized the total cost of ownership (TCO) with a little help from Intel.

VK’s Modernization Approach

  • VK, the largest social network in Russia and the Commonwealth of Independent States (CIS), is home to 97 million monthly active users.

  • Modernized its tiered storage systems by utilizing Intel® Optane™ SSDs, Intel® SSDs with NVMe, Intel® Optane™ persistent memory and Intel® FPGA Programmable Acceleration Cards (Intel® FPGA PAC).

Social networking is immensely data-intensive. VK is Russia’s largest social network, so it’s no surprise that data storage consumes a sizeable part of its operating budget. To reduce cost and improve performance, VK transformed its tiered storage using Intel® Optane™ persistent memory, Intel® Optane™ SSDs, Intel® SSDs with non-volatile memory express (NVMe), and Intel® FPGA Programmable Acceleration Cards (PACs).

Key Challenges to Overcome
 

  • Reduce the cost of storing data that grows at a rate of hundreds of petabytes per year.1
  • Eliminate the need to store the same image in multiple formats to serve different user devices.

Innovative Solutions Powered by Intel
 

  • VK reworked its storage architecture to meet more intensive performance requirements and also to lower storage cost.
  • Migrated data away from more expensive DRAM by introducing Intel Optane persistent memory for the rating counter servers that support the newsfeed.
  • Upgraded storage for frequently accessed data to Intel® SSDs with 3D NAND technology, and migrated most frequently used data to Intel® Optane™ SSDs within its content delivery network (CDN).
  • To reduce the need to store multiple image sizes and formats, VK used Intel® FPGAs to convert images on-the-fly from a single high-resolution master copy to the resolution needed for each user.

Real-World Results According to VK
 

  • Reduced cost significantly by diverting data from dynamic random-access memory (DRAM) to SSDs and Intel Optane persistent memory running in memory mode.1
  • Upgrading the processor from the Intel® Xeon® Gold 6230 processor to the Intel® Xeon® Gold 6238R processor decreased compute cost by 40 percent and improved performance per watt by 72 percent.1
  • Using the improved storage solution, VK reported it was able to consolidate servers at a ratio of 2:1; support continued data growth with storage of up to 0.408PB in 1U; and lower power and cooling costs.1

Reducing Storage Costs for Social Networking
Social networking has radically transformed how we connect with loved ones and colleagues. In Russia and the CIS, VK is the largest social network, and it’s growing fast. On a typical day, 10 billion messages are exchanged through the platform as friends and families keep in touch and share their stories.

Given all that data flowing through the network, it’s little wonder that VK spends a significant amount on data storage infrastructure, which accounts for a large part of its annual budget. To optimize TCO for storage, the company needs to find the right balance between cost and performance.

Fast storage media are expensive, but deliver a smoother user experience for the most frequently accessed data. In all, 1.1 exabytes of data are distributed across the storage estate, and data is stored close to where it is uploaded. “Russia is a big country with large distances between cities. We need to have a good CDN cache infrastructure to store data close to users, so that they have a good experience using our social network,” says Roman Podpriatov, Deputy COO, VK. The company’s IT infrastructure is based on 19,000 servers spread across three main data centers, supported by 30 CDN facilities to improve and accelerate access to the hottest data.


Looking Back: VK’s Original Storage Architecture
On its CDN servers, VK uses three tiers for caching data, and data drops down the tiers as it cools. Hot data is frequently accessed: for example, a holiday photo that was recently uploaded to the network. Warm data is accessed less often, typically images up to a month old, while cold data is rarely accessed. In the original architecture, hot data was kept in DRAM, warm data was stored on SATA SSDs, and cold data was stored on hard drives.
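The tiering rule described above can be sketched in a few lines. This is a simplified illustration, not VK's actual policy: it classifies by object age alone, the function name `cache_tier` is hypothetical, and the one-day hot threshold is an assumption. Only the roughly 30-day warm boundary comes from the article.

```python
from datetime import timedelta

# Assumption: the article gives no cut-off for hot data; one day is a placeholder.
HOT_MAX_AGE = timedelta(days=1)
# The article describes warm data as typically up to about a month old.
WARM_MAX_AGE = timedelta(days=30)


def cache_tier(age: timedelta) -> str:
    """Return the tier name for an object of the given age (hypothetical helper)."""
    if age <= HOT_MAX_AGE:
        return "hot"
    if age <= WARM_MAX_AGE:
        return "warm"
    return "cold"


for age in (timedelta(hours=2), timedelta(days=10), timedelta(days=90)):
    print(age, "->", cache_tier(age))
```

A production system would more likely track access frequency than age, since the article defines the tiers by how often data is requested.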

Technical Components of Intel’s Digital Transformation Solution

Intel® SSD D5-P4320. Providing affordable performance for warm data, these SSDs play a critical role in VK’s data hierarchy, which migrates data from faster to slower storage as it becomes less frequently used. As data becomes less sought after, VK migrates it from Intel Optane SSDs to the more cost-effective Intel SSD D5-P4320, which includes Intel® QLC 3D NAND Technology to help lower costs and boost performance.

Intel® Optane™ SSD DC P4800X. This SSD enables breakthrough application performance with significantly high throughput, fast service and ultra-low latency. VK migrated data from DRAM to Intel Optane SSDs on selected CDN servers to reduce the amount of DRAM required.1 The P4800X SSDs were deployed to deliver frequently accessed data more rapidly to users, and hence provide a smoother user experience.

Intel® Optane™ persistent memory. VK uses persistent memory as an alternative to DRAM at a lower cost per bit.1 It supports the ratings counters, for instance, which are used for various real-time processes that are central to the functioning of the social network, including optimizing the newsfeed.

Intel® Ethernet Adapter XXV710-DA2. Capable of speeds up to 25 Gb/s, these networking cards provide the strong interconnect bandwidth critically needed for efficient data transfer.

Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA). To meet users’ demands on the fly, the CTAccel Image Processor is used to convert images from a high-resolution master, thus avoiding the need for VK to store multiple versions of the same image at different resolutions. In a nutshell, these PACs reduce storage requirements and provide faster image conversion.

Figure 1. VK’s new architecture uses a combination of storage media to meet the performance demands of a modern social network, while lowering the cost of storing infrequently accessed data. This diagram shows the architecture today. Nodes are still being upgraded to the latest processors and networking is being upgraded to 25 Gb/s.

Besides storing data, the rating counter servers are also used to analyze it: rating media, counting likes, and managing content demand. VK runs a customized version of Memcached on these servers. The workloads did not require any persistent storage but were DRAM intensive. To cater for a wide range of user devices, VK was also storing multiple copies of each image and was keen to explore a more efficient approach.

“Our aim was to reduce the number of servers we were using,” Podpriatov adds. “If we can reduce the number of rigs we need for our server infrastructure, we can save on our other infrastructure costs too. DRAM was particularly expensive, so we were keen to explore more cost-effective storage options.”

Adopting a New Approach 
VK was on a mission to modernize its storage architecture. Figure 1 shows the storage architecture today, which continues to evolve as storage, processors and network adapters are upgraded throughout the network.

On CDN cache servers, warm data was moved from SATA SSDs to Intel® SSD D5-P4320 NVMe drives, and hot data was moved from expensive DRAM to Intel® Optane™ SSD DC P4800X SSDs (see Figure 2).

As depicted in Figure 1, the CDN cache servers are configured with both Intel SSD D5-P4320 drives and Intel Optane SSDs. At the software level, containers run separate NGINX instances for each drive type. NGINX is an open-source web server that delivers web content across a network, and VK has optimized its code for better performance on the different storage configurations. Based on Intel Optane SSD DC P4800X SSDs, the “hot data” container is used for caching frequently accessed data, such as new music, videos or other popular content. The P4800X SSDs help eliminate data center storage bottlenecks and allow bigger, more affordable data sets. The “warm data” container is used for less frequently accessed data, such as older video streaming content and images which are approximately 30 days old. CDN servers, however, do not have access to cold data on hard drives in the data center, which is accessed instead through the front nodes.

Figure 2. VK’s new storage solution for CDN servers adds more performant SSDs for warm data, and lower cost fast storage for hot data.

“Now we can store both hot and warm data on SSDs, and reduce the amount of DRAM we use,” Podpriatov explains. “Previously, our SSDs weren’t fast enough to offer a good user experience for hot data, so we had to keep some data in DRAM. Now, we can put it all on SSDs, which are much cheaper than DRAM.”

In the data center, the front nodes use Intel SSD D5-P4320 drives for media content, and Intel Optane SSD DC P4800X SSDs for small files, such as avatar images. VK is upgrading the processors in these nodes to the Intel Xeon Gold 6238R. Unlike on the CDN servers, the software stack here is not containerized: the P4800X-based and P4320-based front servers are physically different systems.

SSD endurance is critical on servers with a high write load. For that reason, Intel Optane SSD DC P4800X SSDs are also used by the video transcoding servers. Intel Optane SSDs can provide 60 drive writes per day (DWPD), which is equivalent to overwriting the entire drive 60 times per day.
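As a rough worked example of what a DWPD rating means, the sketch below multiplies drive capacity by the rating to get a daily write budget. The 375 GB capacity is an illustrative assumption (the article does not state which capacities VK deployed), and `daily_write_budget_gb` is a hypothetical helper name.

```python
def daily_write_budget_gb(capacity_gb: float, dwpd: float) -> float:
    """Data volume (in GB) that can be written per day at the rated endurance."""
    return capacity_gb * dwpd


ASSUMED_CAPACITY_GB = 375  # illustrative only; not a figure from the article
RATED_DWPD = 60            # drive writes per day, as quoted in the article

budget_gb = daily_write_budget_gb(ASSUMED_CAPACITY_GB, RATED_DWPD)
print(f"{budget_gb / 1000:.1f} TB of writes per day")  # prints "22.5 TB of writes per day"
```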

Counting “likes” and ranking newsfeed items are performed by the rating servers. These have been upgraded to Intel Optane persistent memory because it offers the performance real-time processes require, besides reducing the cost of high-performance storage as compared to DRAM. The upgrading process involves testing persistent memory to see how it performs under VK’s workloads and server configurations, before deploying it in production.

Figure 3. VK’s projected savings using the Intel® Xeon® Gold 6238R processor, compared to the Intel® Xeon® Gold 6230 processor, based on queries per second (qps) per dollar and per Watt, as reported by VK based on VK’s internal performance testing.1 Forecast based on initial engineering analysis and testing, going into production in 2020.

The storage nodes and fast processors require strong interconnect bandwidth to enable more data to be sent or received than before. To achieve this, VK uses two 25 Gb/s Intel® Ethernet Adapter XXV710-DA2 networking cards for each server of disaggregated storage.

The new servers were originally based on the Intel® Xeon® Gold 6230 processor. VK has since upgraded them to the Intel Xeon Gold 6238R processor, which helped VK enhance the performance of storage and compute, optimize TCO and obtain more performance per watt from the compute capacity.1 Based on VK’s 2020 forecast, upgrading the processor improved performance per watt by 72 percent (see Figure 3) and reduced compute cost by 40 percent.

“We saw a significant performance boost when we upgraded,” Podpriatov says. VK is currently upgrading older processors within its storage architecture to the Intel Xeon Gold 6238R processor, prioritizing old CPUs with a high clock speed and core count. Moreover, upgrading from two 10 Gb/s Ethernet cards to two 25 Gb/s cards has enabled a 2.5x increase in data throughput for each new compute node.1
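The 2.5x figure is consistent with simple link-rate arithmetic: two 10 Gb/s cards before, two 25 Gb/s cards after. The sketch below reproduces that calculation. It compares nominal line rates only and says nothing about measured throughput; `aggregate_gbps` is a hypothetical helper name.

```python
def aggregate_gbps(cards: int, gbps_per_card: float) -> float:
    """Nominal aggregate bandwidth of a server's Ethernet cards, in Gb/s."""
    return cards * gbps_per_card


old_gbps = aggregate_gbps(cards=2, gbps_per_card=10)  # previous configuration
new_gbps = aggregate_gbps(cards=2, gbps_per_card=25)  # two XXV710-DA2 cards per server
speedup = new_gbps / old_gbps

print(f"{old_gbps} Gb/s -> {new_gbps} Gb/s ({speedup}x)")  # prints "20 Gb/s -> 50 Gb/s (2.5x)"
```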

Besides storage optimization, processing image transcoding algorithms efficiently is a challenge for VK. To optimize its storage and power efficiency, VK is deploying the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA) (see Figure 4) and running the CTAccel Image Processor workload. The low-power, single-slot PCIe Intel PAC makes it easy to deploy FPGAs in various VK servers. FPGAs can run application functions much faster than software running on a general-purpose processor. In VK’s case, the FPGA is used to convert high-resolution images to the desired size and format. This high-bandwidth and low-latency solution helps to reduce storage requirements and improves power efficiency, because only high-resolution images need to be stored, not multiple copies of the image at various resolutions.
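To make the "single master, many sizes" idea concrete, the sketch below computes the output dimensions a resizer would target for a given device while preserving the master's aspect ratio. It is not the CTAccel Image Processor API and does no pixel processing. The function name `fit_within` and the example dimensions are hypothetical.

```python
def fit_within(master_w: int, master_h: int, max_w: int, max_h: int) -> tuple[int, int]:
    """Scale master dimensions down to fit a bounding box, keeping aspect ratio.

    Never upscales: a master smaller than the box is returned unchanged.
    """
    scale = min(max_w / master_w, max_h / master_h, 1.0)
    return max(1, round(master_w * scale)), max(1, round(master_h * scale))


# One stored master, several on-demand target sizes (example values only).
master = (4000, 3000)
for box in ((800, 800), (320, 240), (8000, 8000)):
    print(box, "->", fit_within(*master, *box))
```

Because each size is derived on request, only the master needs to be stored. That storage saving is the one the article attributes to on-the-fly conversion.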

Figure 4. VK dataflow solution with and without using Intel® Programmable Acceleration Card (Intel® PAC). The top image illustrates the need for multiple servers to perform image processing algorithms, and the need for storage post-processing. The bottom image illustrates increased efficiency with workload functions offloaded to the field programmable gate array (FPGA), providing the ability to generate images on-the-fly, reducing storage requirements.

Positive Business Results Reported by VK
With the new tiered storage solution, VK estimates it will achieve significant savings in space, power and cooling costs because it needs fewer racks to store the same volume of data, with storage of up to 0.4PB in 1U.1 “We can replace two of our old servers with one of our new servers, while improving our performance,” comments Podpriatov.

Diverting data from DRAM to SSDs and to Intel Optane persistent memory, and resizing images quickly using Intel FPGAs, have enabled VK to reduce the cost of its hot tier storage while still delivering the performance users need. “We have more performance now at a lower cost than our previous storage solution,” he adds.

Working Together to Provide Innovative Solutions
VK and Intel have been working closely together for five years. “During this time, we worked on a lot of projects and fixed a lot of issues together,” Podpriatov says. “We have a good relationship between our companies, and, at VK, we know we can call on Intel if we experience any difficulties during our testing or implementation processes.”

Intel helps with some of the validation processes, while VK carries out the implementation. “We can only test storage solutions like this in production,” he continues. “It could take two months to populate an SSD with real data, and to check how data moves from hot to cold storage. It’s impossible to test this in lab conditions.”
 

“The Intel team helps us all the way from the beginning of a new product to implementation and production,” Podpriatov goes on. “Intel shares their roadmap and new technologies with us, and we get an opportunity to implement new technologies in our production environment. It gives us a chance to understand whether they’re right for us, and what savings we might achieve as a result of implementing them.”

Social Network Transactions on VK
In 2019, VK had 97 million monthly active users.2 Users view 9 billion posts and 650 million videos, and exchange 10 billion messages daily.1 They tap the like button a billion times a day.2 Over the course of a year, users upload hundreds of petabytes of new data, including photos and videos.1
