trivago RecSys Challenge 2019 Dataset

Problem definition

The data provided for this challenge consists of a training and test set, and metadata for accommodations (items). The training set contains user actions up to a specified time (split date). It can be used to build models of user interactions and specifies the type of action that has been performed (filter usage, search refinements, item interactions, item searches, item click-outs) as well as information about impressed items and prices at the time of a click-out.

Recommendations should be provided for a test set that contains sessions after the split date but omits the accommodations that were clicked in the last part of each session. The required output is a list of at most 25 items for each click-out, ordered by preference for the specific user. The higher the actually clicked item appears in the list, the higher the score.
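As a rough illustration of the required output, the sketch below builds a recommendation list for a single click-out. It moves items the user already interacted with in the session to the front of the impression list and cuts the result at 25 entries. The function name `rank_impressions`, its parameters, and the heuristic itself are assumptions made for this example; they are not part of the challenge definition or of the official baseline.

```python
MAX_ITEMS = 25  # upper bound on the list length stated in the problem definition


def rank_impressions(impressions, interacted, max_items=MAX_ITEMS):
    """Order impressed items with a simple, hypothetical preference heuristic.

    Items the user interacted with earlier in the session come first
    (most recent interaction first); all other items keep their displayed order.
    """
    impressed = set(impressions)
    seen = set()
    ranked = []
    # Most recent interactions first, restricted to items that were actually shown.
    for item in reversed(interacted):
        if item in impressed and item not in seen:
            ranked.append(item)
            seen.add(item)
    # Remaining impressions in their original display order.
    for item in impressions:
        if item not in seen:
            ranked.append(item)
            seen.add(item)
    return ranked[:max_items]


example = rank_impressions([100, 101, 102, 103], interacted=[102, 101, 102])
print(example)  # [102, 101, 100, 103]
```

Any other scoring model can replace the heuristic, as long as the final list stays within the 25-item limit.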

The following schematic illustrates the problem setting and the separation of the data into training and test sets.

[Figure: schematic of the problem setting]

Evaluation

We use the mean reciprocal rank (MRR) as the metric to evaluate submissions. We will provide a leaderboard that displays the metric calculated for a subset of users (validation group). The final scores will be calculated on a different set of users (confirmation group) at the end of the challenge.

Example:
    query 1:
  • impressions = [100, 101, 102, 103, 104, 105]
  • clicked_item_id = 102
  • submission = [101, 103, 104, 102, 105, 100]
  • reciprocal rank = 0.25

    query 2:
  • impressions = [101, 103, 104, 100, 105]
  • clicked_item_id = 105
  • submission = [103, 105, 101, 100, 104]
  • reciprocal rank = 0.5

    mrr = (0.25 + 0.5) / 2 = 0.375
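The calculation in the example can be reproduced with a few lines of Python. This is a minimal sketch, not the official scoring code: the function names are made up for illustration, and scoring a click-out as 0 when the clicked item is missing from the submission is an assumption.

```python
def reciprocal_rank(submission, clicked_item_id):
    """Return 1 / rank of the clicked item in the submitted list.

    Assumption: a clicked item that is absent from the list scores 0.0.
    """
    for rank, item in enumerate(submission, start=1):
        if item == clicked_item_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average the reciprocal ranks over (submission, clicked_item_id) pairs."""
    ranks = [reciprocal_rank(sub, clicked) for sub, clicked in queries]
    return sum(ranks) / len(ranks)


# The two queries from the example above.
queries = [
    ([101, 103, 104, 102, 105, 100], 102),  # clicked item at rank 4
    ([103, 105, 101, 100, 104], 105),       # clicked item at rank 2
]
mrr = mean_reciprocal_rank(queries)
print(mrr)  # 0.375
```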

Data

UPDATE

The current version of the data is version2. It was updated on March 26th, 2019 (UTC) with minor adjustments that affect the order of impressions and prices and the frequency of context descriptions. All descriptions below remain valid.

Please log in or register to access and download the dataset.

Session actions (train.csv and test.csv)

Item metadata (item_metadata.csv)

Submission

The submission should consist of a list of recommended hotels for each click-out that is missing in the test set. The format of the submission should allow the click-out in question to be identified unambiguously.

Therefore the submission file should be structured in the following way:

We provide a sample submission with the correct format inside the dataset download file above.

Explanation of user actions in a sample session

[Figure: session actions]

In this session, a user from the US platform has used trivago on a desktop device. The actions in this session are the following:

  1. (action type: search for destination, reference: Barcelona, Spain): User searches for Barcelona, Spain.
  2. (action type: filter selection, reference: Focus on Distance): The ‘focus on distance’ filter is activated. At this point the current_filters column indicates that this is the only filter that is active.
  3. (action type: search for poi, reference: Port de Barcelona): User searches for a point-of-interest (POI), the Port de Barcelona.
  4. (action type: interaction item deals, reference: 40255): User views the ‘More Deals’ button on item 40255. The ‘focus on distance’ filter is no longer activated.
  5. (action type: clickout item, reference: 40225): The user clicks out on item 40225. The full list of displayed items and their associated prices can be seen in the ‘impressions’ and ‘prices’ columns.
  6. (action type: search for item, reference: 81770): User searches for item 81770.
  7. (action type: interaction item info, reference: 81770): User interacts with the item information of item 81770.
  8. (action type: clickout item, reference: 81770): User clicks out on item 81770. The full list of items and their associated prices can be seen in the ‘impressions’ and ‘prices’ columns.
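To make the walkthrough concrete, the sketch below encodes the eight actions as plain Python records and extracts two things a model might need: the click-outs, and the items the user touched before a given click-out. The `Action` record, the helper names, and the `ITEM_ACTIONS` set are assumptions for illustration. The set lists only the item-related action types that occur in this sample session, and the step numbers follow the list above.

```python
from collections import namedtuple

# Hypothetical in-memory record; the fields mirror the walkthrough above.
Action = namedtuple("Action", ["step", "action_type", "reference"])

session = [
    Action(1, "search for destination", "Barcelona, Spain"),
    Action(2, "filter selection", "Focus on Distance"),
    Action(3, "search for poi", "Port de Barcelona"),
    Action(4, "interaction item deals", "40255"),
    Action(5, "clickout item", "40225"),
    Action(6, "search for item", "81770"),
    Action(7, "interaction item info", "81770"),
    Action(8, "clickout item", "81770"),
]

# Action types in this sample whose reference is an item ID (not exhaustive).
ITEM_ACTIONS = {
    "interaction item deals",
    "interaction item info",
    "search for item",
    "clickout item",
}


def clickouts(actions):
    """Return all click-out actions of a session, in order."""
    return [a for a in actions if a.action_type == "clickout item"]


def items_before(actions, step):
    """Return item references of item-related actions before the given step."""
    return [
        a.reference
        for a in actions
        if a.step < step and a.action_type in ITEM_ACTIONS
    ]


print([a.step for a in clickouts(session)])  # [5, 8]
print(items_before(session, 8))  # ['40255', '40225', '81770', '81770']
```

In the test set, the reference of the final click-out would be hidden, so the output of a helper like `items_before` is the kind of session context a model has available for that prediction.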

Getting started

To get started, we have provided additional resources in this GitHub repository that let you run a simple baseline algorithm and verify the submission format. The resulting solution can be submitted to the submission system and should produce a score on the leaderboard.

Need help?

Check out the RecSys Challenge Google Forum or get in touch by mail via recsyschallenge2019@trivago.com.