{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Load the dataset\n", "\n", "The dataset used in this example is [fine-food reviews](https://www.kaggle.com/snap/amazon-fine-food-reviews) from Amazon. The dataset contains a total of 568,454 food reviews that Amazon users left up to October 2012. We will use a subset of this dataset, consisting of the 1,000 most recent reviews, for illustration purposes. The reviews are in English and tend to be positive or negative. Each review has a ProductId, UserId, Score, review title (Summary), and review body (Text).\n", "\n", "We will combine the review summary and review text into a single `combined` text field. The model will encode this combined text and output a single vector embedding." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| Id | Time | ProductId | UserId | Score | Summary | Text | combined |
|---|---|---|---|---|---|---|---|
| 1 | 1303862400 | B001E4KFG0 | A3SGXH7AUHU8GW | 5 | Good Quality Dog Food | I have bought several of the Vitality canned d... | Title: Good Quality Dog Food; Content: I have ... |
| 2 | 1346976000 | B00813GRG4 | A1D87F6ZCVE5NK | 1 | Not as Advertised | Product arrived labeled as Jumbo Salted Peanut... | Title: Not as Advertised; Content: Product arr... |