Dataset at a glance
Recipes
Cuisines
Unique Ingredients
Flavor Edges
Data sources:
Recipe data from Kaggle "What's Cooking" (Yummly).
Flavor compound backbone from Ahn et al. (2011), "Flavor network and the principles of food pairing."
Ingredient categories from FlavorDB.
Recipes per cuisine
Italian and Mexican recipes appear most often in the dataset. Limitation: these are raw counts, so they depend on how many recipes were sampled for each cuisine.
Average ingredients per recipe
Mean number of distinct ingredients per recipe
Signature Ingredients Per Cuisine
The ingredients that best define each cuisine.
Ingredients ranked left to right by TF-IDF score.
Cuisine ingredient similarity heatmap
Jaccard similarity. Gray cells fall below the threshold selected by the user.
Key insight:
Cuisines on the same branch share the most ingredients, but geographic neighbors aren't always culinary neighbors.
Method:
Each cuisine is a binary vector (ingredient present/absent). Jaccard distance = 1 - |A ∩ B| / |A ∪ B|. The tree is built with UPGMA clustering.
Surprise score = culinary_similarity × (geo_distance / max_geo_distance).
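As a rough illustration of the Jaccard distance and UPGMA steps named in the method above, here is a minimal pure-Python sketch. The function names and the input shape (a dict mapping each cuisine to its set of ingredients) are assumptions made for the example, not the dashboard's actual code.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two ingredient sets."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)

def upgma_merges(cuisines):
    """List the merges (cluster_a, cluster_b, distance) under average linkage.

    `cuisines` maps a cuisine name to its set of ingredients (hypothetical shape).
    The distance between two clusters is the mean pairwise Jaccard distance
    between their members, which is the UPGMA definition.
    """
    def avg_dist(c1, c2):
        total = sum(jaccard_distance(cuisines[x], cuisines[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in sorted(cuisines)]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters, then repeat.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges
```

The merge list, read top to bottom, gives the order in which branches of the tree join. For real data a library routine such as average-linkage clustering in SciPy would be the more practical choice; this version recomputes distances at each step for clarity.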
Culinary family tree
UPGMA dendrogram: branching by ingredient overlap, not geography
Culinary similarity vs geographic distance
Green = surprisingly similar (>5,000 km apart, >35% Jaccard)
Most surprising culinary cousins
Ranked by surprise score. Click a row, then the button below, to explore that pair.
Main cuisine:
This selection controls both the profile and comparison views
How to read this page:
The top section profiles one cuisine in detail. The bottom section compares that same main cuisine with another cuisine.
TF-IDF:
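TF = mentions of the ingredient in this cuisine / total ingredient mentions in this cuisine. IDF = log(20 / number of cuisines using the ingredient), where 20 is the number of cuisines in the dataset. High values mean an ingredient is both frequent here and rare elsewhere.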
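The TF-IDF definition above can be sketched as follows. This is an illustrative example only: the function name and the input shape (a list of `(cuisine, ingredients)` pairs) are assumptions, and the number of cuisines is taken from the input rather than fixed at 20.

```python
import math
from collections import Counter

def tfidf_by_cuisine(recipes):
    """Score each ingredient per cuisine as TF * IDF.

    `recipes` is a list of (cuisine, [ingredients]) pairs (hypothetical shape).
    TF  = mentions of the ingredient in the cuisine / all mentions in the cuisine
    IDF = log(number of cuisines / number of cuisines using the ingredient)
    """
    counts = {}
    for cuisine, ingredients in recipes:
        counts.setdefault(cuisine, Counter()).update(ingredients)

    n_cuisines = len(counts)
    cuisines_using = Counter()
    for cuisine_counts in counts.values():
        cuisines_using.update(cuisine_counts.keys())

    scores = {}
    for cuisine, cuisine_counts in counts.items():
        total_mentions = sum(cuisine_counts.values())
        scores[cuisine] = {
            ing: (n / total_mentions) * math.log(n_cuisines / cuisines_using[ing])
            for ing, n in cuisine_counts.items()
        }
    return scores
```

An ingredient used by every cuisine gets IDF = log(1) = 0, so it scores zero however often it appears.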
Cuisine profile
Use the main cuisine dropdown to see what ingredients and ingredient categories define one cuisine.
Flavor profile radar
Ingredient category shape — this cuisine (colored) vs global average (gray)
Ingredient exclusivity
Donut chart: how many ingredients are exclusive to this cuisine vs. widely shared
Compare cuisines
Compare the main cuisine against another cuisine using signature and shared ingredients.
Main cuisine
Comparison cuisine
Ingredient overlap
Shared ingredients (excl. universals)
How to use:
Each dot is an ingredient. Lines show ingredients that often share flavor compounds.
Use the controls to show more/fewer ingredient connections or focus only on more common pairings.
Network:
Two ingredients are connected when they share flavor compounds.
Bigger dots are ingredients with more connections.
Line strength is based on shared compounds, and the pairing filter removes rare ingredient pairs.
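One way to picture the network described above is the sketch below. It assumes each ingredient comes with a set of flavor compound identifiers and that recipe pairing counts are available as a dict. All names, and the exact filtering rule, are hypothetical and only illustrate the idea.

```python
from itertools import combinations

def build_flavor_edges(compounds, min_shared=1):
    """Connect two ingredients when they share at least `min_shared` compounds.

    `compounds` maps an ingredient to a set of compound identifiers (assumed shape).
    The edge weight is the number of shared compounds (the "line strength").
    """
    edges = {}
    for a, b in combinations(sorted(compounds), 2):
        shared = len(compounds[a] & compounds[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges

def filter_rare_pairs(edges, pair_counts, min_pair_count):
    """Drop edges whose ingredients are rarely paired in recipes (assumed rule)."""
    return {
        pair: weight
        for pair, weight in edges.items()
        if pair_counts.get(pair, 0) >= min_pair_count
    }

def degrees(edges):
    """Count connections per ingredient; this would drive the dot size."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree
```

Raising `min_shared` or `min_pair_count` thins the graph, which is roughly what the two controls on the page are for.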
Number of Ingredient Connections
Ingredient Pairing Frequency
Flavor compound network
Node size = degree. Color = ingredient category. Hover for details.
Search or select an ingredient
Filter to cuisine (optional)
Quick picks:
garlic
ginger
butter
soy sauce
cumin
olive oil
coconut milk
fish sauce
Percentage of Recipes Featuring Ingredient
% of that cuisine's recipes using this ingredient
Top co-occurring ingredients
Most frequent recipe partners
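The two ingredient statistics above could be computed along these lines. This is a minimal sketch, assuming recipes are available as `(cuisine, ingredients)` pairs; the function names are invented for the example.

```python
from collections import Counter

def percent_of_recipes(recipes, ingredient):
    """Per cuisine, the percentage of its recipes that use `ingredient`."""
    totals, hits = Counter(), Counter()
    for cuisine, ingredients in recipes:
        totals[cuisine] += 1
        if ingredient in ingredients:
            hits[cuisine] += 1
    return {cuisine: 100.0 * hits[cuisine] / totals[cuisine] for cuisine in totals}

def top_cooccurring(recipes, ingredient, k=5):
    """The k ingredients that most often appear in the same recipe as `ingredient`."""
    partners = Counter()
    for _, ingredients in recipes:
        unique = set(ingredients)  # count each partner once per recipe
        if ingredient in unique:
            partners.update(unique - {ingredient})
    return partners.most_common(k)
```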
Surprise metric:
surprise = Jaccard_similarity × (haversine_km / max_haversine_km). Upweights pairs that are culinarily close despite geographic distance.
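The surprise metric above can be sketched as below. The haversine formula is standard; the Earth radius constant, the function names, and the choice of one representative (latitude, longitude) per cuisine are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def surprise_scores(similarity, coords):
    """Return Jaccard similarity * (distance / max distance) for each cuisine pair.

    `similarity` maps (cuisine_a, cuisine_b) to a Jaccard similarity and
    `coords` maps a cuisine to (lat, lon); both shapes are hypothetical.
    """
    dist = {
        (a, b): haversine_km(*coords[a], *coords[b])
        for (a, b) in similarity
    }
    max_dist = max(dist.values())
    if max_dist == 0:
        return {pair: 0.0 for pair in similarity}
    return {pair: similarity[pair] * dist[pair] / max_dist for pair in similarity}
```

Because the distance term is scaled to the range 0 to 1, a pair can only score highly if it is both similar and far apart.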
Culinary similarity world map
Click a country to anchor it — all others shade by Jaccard similarity
Ranked by Jaccard similarity