Prudence Pitch

13:33 27/11/2019


Facebook for the first time peels back the curtain on Explore's inner workings

(Tech) Explore's three-part ranking funnel, which the company says was architected with a custom query language and bespoke modeling techniques, extracts 65 billion features and makes 90 million model predictions every second. And that's just the tip of the iceberg.



Before the team behind Explore embarked on building a content recommendation system, they developed tools to conduct large-scale experiments and obtain strong signals on the breadth of users’ interests. The first of these was IGQL, a meta language that provided the level of abstraction needed to assemble candidate algorithms in one place.

IGQL is optimized in C++, which helps minimize latency and compute resources without sacrificing extensibility. It's both statically validated and high-level, enabling engineers to write recommendation algorithms in a "Python-like" fashion. And it complements an account embeddings component that helps identify topically similar profiles as part of a retrieval pipeline that focuses on account-level information.
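IGQL itself is proprietary, but its "Python-like" chained style can be sketched with a small fluent query builder. Everything below — the class name, the step names, the sample data — is illustrative, not Instagram's actual API.

```python
# Hypothetical sketch of an IGQL-style fluent query builder.
# All names and data are illustrative only.

class CandidateQuery:
    """Chains candidate-generation steps in a 'Python-like' DSL style."""

    def __init__(self, candidates):
        self.candidates = list(candidates)

    def filter(self, predicate):
        # Keep only candidates passing an eligibility check.
        return CandidateQuery(c for c in self.candidates if predicate(c))

    def rank(self, score_fn):
        # Order candidates by a model score, best first.
        return CandidateQuery(sorted(self.candidates, key=score_fn, reverse=True))

    def take(self, n):
        # Truncate to the top-n candidates.
        return CandidateQuery(self.candidates[:n])

# Usage: compose steps the way distinct subqueries would be merged.
posts = [{"id": 1, "spam": False, "score": 0.9},
         {"id": 2, "spam": True,  "score": 0.8},
         {"id": 3, "spam": False, "score": 0.7}]
result = (CandidateQuery(posts)
          .filter(lambda p: not p["spam"])
          .rank(lambda p: p["score"])
          .take(2))
print([p["id"] for p in result.candidates])  # → [1, 3]
```

Each step returns a new query object, so algorithms read as a single pipeline from retrieval to ranking.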

A framework, ig2vec, treats the Instagram accounts a user interacts with as word sequences in a sentence, which informs a model's predictions about which accounts the user might interact with next. Concurrently, FAISS (Facebook AI Similarity Search), a nearest-neighbor retrieval library, queries millions of accounts based on the same metric used in embedding training.
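At Instagram's scale this lookup runs through FAISS; a minimal NumPy stand-in makes the underlying operation concrete. The embeddings here are random synthetic vectors, and the cosine-similarity metric is one plausible choice, not necessarily the one used in training.

```python
import numpy as np

# Minimal stand-in for embedding-based nearest-neighbor retrieval.
# Synthetic data; FAISS would serve this same query at scale.

rng = np.random.default_rng(0)
account_embeddings = rng.normal(size=(1000, 64))   # 1,000 accounts, 64-dim
# Normalize so a dot product equals cosine similarity.
account_embeddings /= np.linalg.norm(account_embeddings, axis=1, keepdims=True)

def similar_accounts(query_idx, k=5):
    """Return the k account indices most similar to a query account."""
    sims = account_embeddings @ account_embeddings[query_idx]
    sims[query_idx] = -np.inf          # exclude the query account itself
    return np.argsort(sims)[::-1][:k]

neighbors = similar_accounts(42, k=5)
print(neighbors)
```

A brute-force matrix product like this is exact but linear in the number of accounts; FAISS exists precisely to make such searches tractable over millions of vectors.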

A classifier system is trained to predict a topic for a set of accounts based solely on the embedding, which, when compared with human-labeled topics, shows how well the embeddings capture topical similarity. It's an important step, because retrieving accounts similar to those a user has expressed interest in helps narrow down a per-profile ranking inventory.
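The actual classifier Instagram uses is not public; as a sketch of the validation idea, a nearest-centroid classifier on synthetic clustered embeddings shows how predicted topics can be checked against human labels.

```python
import numpy as np

# Sketch: validate embeddings against human topic labels with a
# simple nearest-centroid classifier. Topics and data are synthetic.

rng = np.random.default_rng(1)
food   = rng.normal(loc=+1.0, size=(50, 16))   # embeddings near one center
travel = rng.normal(loc=-1.0, size=(50, 16))   # embeddings near another
X = np.vstack([food, travel])
y = np.array([0] * 50 + [1] * 50)              # 0 = food, 1 = travel

# "Train": one centroid per human-labeled topic.
centroids = np.stack([X[y == t].mean(axis=0) for t in (0, 1)])

def predict_topic(embedding):
    """Assign the topic whose centroid is closest to the embedding."""
    dists = np.linalg.norm(centroids - embedding, axis=1)
    return int(np.argmin(dists))

accuracy = np.mean([predict_topic(x) == t for x, t in zip(X, y)])
print(f"agreement with labels: {accuracy:.2f}")
```

High agreement between embedding-only predictions and human labels is evidence that topically similar accounts really do land near each other in the embedding space.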

During the candidate generation stage, Explore taps accounts that users have interacted with previously to identify seed accounts of interest. These seeds are only a fraction of the accounts sharing a given interest, but combined with the embeddings described above, they help identify topically similar accounts.
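Seed expansion can be sketched as a union of nearest-neighbor lookups, one per seed account. The embeddings and seed IDs below are synthetic, and the function name is illustrative.

```python
import numpy as np

# Sketch of candidate generation from seed accounts: accounts a user
# recently interacted with seed a lookup of topically similar accounts
# via embeddings. All data here is synthetic.

rng = np.random.default_rng(2)
emb = rng.normal(size=(200, 32))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def expand_seeds(seed_ids, k=10):
    """Union of the k nearest accounts to each seed, seeds excluded."""
    candidates = set()
    for s in seed_ids:
        sims = emb @ emb[s]
        sims[list(seed_ids)] = -np.inf    # never re-surface the seeds
        candidates.update(np.argsort(sims)[::-1][:k].tolist())
    return candidates

seeds = {3, 17, 58}
pool = expand_seeds(seeds, k=10)
print(len(pool))   # up to 30 candidates, fewer if neighbor sets overlap
```

Because the neighbor sets of related seeds overlap, the expanded pool stays focused on the user's interests rather than growing linearly with the number of seeds.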

Knowing the accounts that might appeal to a user is the first step toward sussing out which content might float their boat. IGQL allows different candidate sources to be represented as distinct subqueries, and this enables Explore to find tens of thousands of eligible candidates for the average person across many types of sources.
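The subquery-per-source idea can be sketched as independent candidate generators merged into one de-duplicated pool. The source names and media IDs below are made up for illustration; they are not Instagram's actual sources.

```python
# Sketch of distinct candidate sources as separate subqueries, merged
# into one eligible-candidate inventory. Names and IDs are illustrative.

def from_followed_hashtags(user_id):
    return [{"media_id": m, "source": "hashtags"} for m in (101, 102)]

def from_similar_accounts(user_id):
    return [{"media_id": m, "source": "similar_accounts"} for m in (102, 103)]

def from_liked_media_authors(user_id):
    return [{"media_id": m, "source": "liked_authors"} for m in (104,)]

SOURCES = [from_followed_hashtags, from_similar_accounts, from_liked_media_authors]

def build_inventory(user_id):
    """Merge all subquery results, de-duplicating by media id."""
    seen, inventory = set(), []
    for source in SOURCES:
        for candidate in source(user_id):
            if candidate["media_id"] not in seen:
                seen.add(candidate["media_id"])
                inventory.append(candidate)
    return inventory

inv = build_inventory(user_id=7)
print([c["media_id"] for c in inv])  # → [101, 102, 103, 104]
```

Keeping each source as its own subquery means new retrieval strategies can be added or removed without touching the merge logic.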

To ensure the recommended content remains safe and appropriate for users of all ages, signals are used to filter out anything that might not be eligible. Algorithms detect and filter spam and other content, typically before an inventory is built for each user.
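Signal-based filtering amounts to a conjunction of per-signal eligibility checks applied before ranking. The signal names and thresholds below are invented for illustration; the real detection models and policies are not public.

```python
# Sketch of signal-based filtering before an inventory is built: each
# candidate carries model-produced signals, and any failing signal
# removes it. Signal names and thresholds are made up.

FILTERS = [
    ("spam_score",       lambda v: v < 0.5),   # drop likely spam
    ("policy_violation", lambda v: not v),     # drop flagged content
    ("age_appropriate",  lambda v: v),         # keep all-ages content only
]

def is_eligible(candidate):
    return all(ok(candidate[signal]) for signal, ok in FILTERS)

candidates = [
    {"id": 1, "spam_score": 0.1, "policy_violation": False, "age_appropriate": True},
    {"id": 2, "spam_score": 0.9, "policy_violation": False, "age_appropriate": True},
    {"id": 3, "spam_score": 0.2, "policy_violation": True,  "age_appropriate": True},
]
eligible = [c for c in candidates if is_eligible(c)]
print([c["id"] for c in eligible])  # → [1]
```

Running the filters before inventory construction keeps ineligible content from ever consuming ranking compute.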

For every Explore ranking request, 500 candidates are selected from the thousands sampled and are passed along to the ranking stage. It's there that they encounter a three-part infrastructure intended to balance relevance with computational efficiency.

In the first pass of the ranking stage, a distillation model mimics the combination of the other stages with a minimal number of features. It picks the 150 highest-quality and most relevant candidates out of the 500, after which a model with a full dense set of features selects the top 50 candidates. Lastly, another model with a full set of features chooses the best 25 candidates, which populate the Explore grid.
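The 500 → 150 → 50 → 25 funnel can be sketched as three successive score-and-truncate passes. Random scores stand in for the real models here, which is an assumption; only the stage counts come from the article.

```python
import random

# Sketch of the three-pass ranking funnel: a cheap first-pass model
# prunes 500 -> 150, a heavier model picks the top 50, and a
# full-featured model selects the final 25. Random scores stand in
# for the real model predictions.

random.seed(0)
candidates = list(range(500))

def rank_and_keep(pool, score_fn, n):
    """Score every candidate with score_fn and keep the n best."""
    return sorted(pool, key=score_fn, reverse=True)[:n]

first_pass  = rank_and_keep(candidates, lambda c: random.random(), 150)   # distillation model
second_pass = rank_and_keep(first_pass, lambda c: random.random(), 50)    # dense-feature model
final       = rank_and_keep(second_pass, lambda c: random.random(), 25)   # full-feature model

print(len(first_pass), len(second_pass), len(final))  # → 150 50 25
```

The design trades compute for quality in stages: the expensive model only ever scores candidates the cheap models have already vouched for.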

