Google Borg Infrastructure

Arun Battepati
May 4, 2023


Borg is Google’s highly scalable, reliable cluster management system: it admits, schedules, starts, restarts, and monitors the full range of applications that Google runs. Its users are the Google developers and system administrators who run Google’s applications and services, and Google uses it to run and manage workloads across its massive data centers.

Many of Google’s services run on top of the Borg infrastructure, including:

Google Search — Google’s flagship search engine is powered by Borg clusters, which index and serve billions of web pages to users around the world.

Google Maps — The backend infrastructure that powers Google Maps, including data storage, indexing, and processing, is built on top of Borg clusters.

Gmail — Google’s email service relies on Borg to manage the large-scale storage and processing of email messages, attachments, and user data.

Google Drive — Borg is used to store and manage the massive amounts of data stored in Google Drive, including files, documents, and media.

YouTube — Google’s video-sharing platform uses Borg clusters to manage the massive amounts of video data, including encoding, storage, and serving to users.

Google Cloud — Many of Google’s cloud computing services, including Google Compute Engine and Google Kubernetes Engine, run on top of Borg infrastructure.

Google Ads — The backend infrastructure that powers Google’s advertising platform, including serving ads and processing payments, is built on top of Borg clusters.

Google Photos — Google’s photo storage and sharing service relies on Borg to manage the large-scale storage and processing of photos and videos.

Borg enables Google to efficiently manage and scale its services to meet the demands of billions of users around the world.

Cell — a collection of machines managed as a unit. A typical cell contains about 10K machines, although some are much larger, and the machines are heterogeneous in terms of CPU, memory, disk capacity, and so on.

Cluster — usually hosts one large cell and sometimes a few small special-purpose cells, some of which are used for testing. A cluster always lives inside a single data center building, and all machines in a cluster are connected by high-performance networking. A site can have multiple buildings and clusters.

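Job — a unit of work that runs within the boundaries of a single cell. A job can carry constraints that restrict it to machines with particular attributes, such as processor architecture, OS version, or an external IP address. Users operate on jobs by issuing RPCs to Borg, most commonly from a command-line tool, from other Borg jobs, or from monitoring systems.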

Task — a job consists of one or more tasks that all run the same executable. Most tasks run directly on the machine rather than inside a virtual machine, to avoid the cost of virtualization. Borg programs are statically linked to reduce their dependencies on the runtime environment.
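The relationship between jobs, tasks, and constraints can be sketched in a few lines of Python. This is only an illustration of the concepts above, not Borg's real data model or API: the class names, field names, and attribute keys such as `arch` and `external_ip` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    job_name: str
    index: int    # position of this task within its job
    binary: str   # every task in a job runs the same executable


@dataclass
class Job:
    name: str
    cell: str          # a job runs within exactly one cell
    binary: str
    task_count: int
    # Hypothetical attribute constraints, e.g. {"arch": "x86_64"}
    constraints: dict[str, str] = field(default_factory=dict)

    def tasks(self) -> list[Task]:
        """Expand the job into its identical tasks."""
        return [Task(self.name, i, self.binary) for i in range(self.task_count)]


def machine_satisfies(attrs: dict[str, str], job: Job) -> bool:
    """True if the machine has every attribute value the job requires."""
    return all(attrs.get(key) == value for key, value in job.constraints.items())


job = Job("web-frontend", cell="cell-a", binary="frontend_server", task_count=3,
          constraints={"arch": "x86_64", "external_ip": "yes"})
print(len(job.tasks()))
print(machine_satisfies({"arch": "x86_64", "external_ip": "yes"}, job))
print(machine_satisfies({"arch": "arm64", "external_ip": "yes"}, job))
```

Modeling constraints as exact-match key/value pairs is the simplest possible choice; a real system would likely support richer predicates.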

Alloc — a set of resources reserved on a machine in which one or more tasks can run. If an alloc has to move to a different machine, the tasks running in it are rescheduled along with it. An alloc set is a group of allocs that reserve resources on multiple machines; once it exists, one or more jobs can be submitted to run in it.

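Borglet — an agent running on each machine in a cell. It starts and stops tasks, restarts them if they fail, and reports the machine’s state to the Borgmaster.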

Borgmaster — a controller process running at cell level and holding state data for all Borglets. The Borgmaster adds submitted jobs to a pending queue for the scheduler. The Borgmaster and its data are replicated five times, with the data persisted in a Paxos-based store. One of the replicas is elected leader.
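To see what five replicas buy, it helps to work through the quorum arithmetic. The article only states that the Borgmaster is replicated five times on top of Paxos; the majority rule below is the standard property of Paxos-style consensus, and the function names are made up for this sketch.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority quorum."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be down while a majority is still reachable."""
    return replicas - majority(replicas)


REPLICAS = 5  # replica count given in the text above
print(majority(REPLICAS))            # 3
print(tolerated_failures(REPLICAS))  # 2
```

With five replicas, a cell's control plane can keep making progress with up to two replicas unavailable, which is one common reason for choosing five over three.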

Scheduler — scans the pending queue and assigns each task to a machine, taking into account the resources available on individual machines.

Priority: every job has a priority, a small positive integer. Borg defines non-overlapping priority bands, in decreasing order of priority: monitoring, production, batch, and best effort (also known as testing or free). Prod jobs are the ones in the monitoring and production bands.
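The scheduler and priority descriptions above can be combined into a toy example. This is a minimal sketch under loud assumptions, not Borg's algorithm: it uses plain first-fit placement as a stand-in for whatever placement policy Borg actually applies, it tracks only CPU and RAM, the priority numbers are invented, and it assumes a larger number means a higher priority (consistent with priority zero being the lowest level mentioned later in this post).

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    free_cpu: float   # cores still available
    free_ram: float   # GiB still available


@dataclass
class PendingTask:
    name: str
    priority: int     # assumption: larger number = higher priority
    cpu: float
    ram: float


def schedule(pending: list[PendingTask], machines: list[Machine]) -> dict[str, str]:
    """Place tasks from high to low priority on the first machine that fits."""
    placements: dict[str, str] = {}
    for task in sorted(pending, key=lambda t: t.priority, reverse=True):
        for machine in machines:
            if machine.free_cpu >= task.cpu and machine.free_ram >= task.ram:
                machine.free_cpu -= task.cpu
                machine.free_ram -= task.ram
                placements[task.name] = machine.name
                break  # placed; tasks that fit nowhere simply stay pending
    return placements


machines = [Machine("m1", free_cpu=4, free_ram=16), Machine("m2", free_cpu=8, free_ram=32)]
pending = [
    PendingTask("batch-1", priority=100, cpu=4, ram=8),
    PendingTask("prod-1", priority=200, cpu=4, ram=8),
    PendingTask("batch-2", priority=100, cpu=8, ram=8),
]
print(schedule(pending, machines))
# {'prod-1': 'm1', 'batch-1': 'm2'}  (batch-2 does not fit and stays pending)
```

Because the production task is considered first, it gets a machine even though it was submitted after `batch-1`; the last batch task is left waiting once the remaining capacity is too small.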

Quota is used to decide which jobs to admit for scheduling. Quota is expressed as a vector of resource quantities (CPU, RAM, disk, etc.) at a given priority, for a period of time (typically months).

• Quota-checking is part of admission control, not scheduling: a job with insufficient quota is rejected at submission, before the scheduler ever sees it.

• Even though Google encourages users to purchase no more quota than they need, many users overbuy because it insulates them against future shortages when their application’s user base grows. Borg responds to this by over-selling quota at lower-priority levels: every user has infinite quota at priority zero, although this is frequently hard to exercise because resources are oversubscribed.

• The use of quota reduces the need for policies like Dominant Resource Fairness (DRF).

• Borg has a capability system that gives special privileges to some users.
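The quota rules above (a vector of resource quantities per priority, checked at admission time, with unlimited quota at priority zero) can be illustrated with a small check. This is a hedged sketch, not Borg's implementation: the function names, the resource keys `cpu` and `ram_gib`, and the priority value 200 are all hypothetical, and the time period attached to a quota is left out for brevity.

```python
Resources = dict[str, float]  # resource name -> quantity, e.g. {"cpu": 8.0}


def fits_quota(request: Resources, used: Resources, quota: Resources) -> bool:
    """True if the request fits within the remaining quota on every dimension."""
    return all(
        used.get(name, 0.0) + amount <= quota.get(name, 0.0)
        for name, amount in request.items()
    )


def admit(request: Resources, priority: int,
          used_by_priority: dict[int, Resources],
          quota_by_priority: dict[int, Resources]) -> bool:
    """Admission control only: an admitted job may still wait to be scheduled."""
    if priority == 0:
        return True  # every user has infinite quota at priority zero
    used = used_by_priority.get(priority, {})
    quota = quota_by_priority.get(priority, {})
    return fits_quota(request, used, quota)


quota = {200: {"cpu": 100.0, "ram_gib": 400.0}}
used = {200: {"cpu": 90.0, "ram_gib": 100.0}}
print(admit({"cpu": 8.0, "ram_gib": 16.0}, 200, used, quota))   # True
print(admit({"cpu": 16.0, "ram_gib": 16.0}, 200, used, quota))  # False
print(admit({"cpu": 1000.0}, 0, used, quota))                   # True
```

The last call shows why priority-zero quota is hard to exercise in practice: admission succeeds, but nothing here guarantees that machines with that much free capacity exist.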
