Welcome to the ChessAiThon project

Chess data: formats, structures, dataset files, and version control

Table of Contents

Introduction
Chess Representation by humans with computers
Part 1
Part 2
Part 3
Quiz
Chess Datasets
Part 1
Part 2
Part 3
Part 4
Quiz
Conversion between formats
Part 1
Part 2
Part 3
Quiz
Chess board in Parquet for AI training
Part 1
Part 2
Part 3
Quiz
Git and version control
Part 1
Part 2
Quiz
Share datasets and use them in a Notebook
Part 1
Part 2
Quiz
Teaching Tips
Explore datasets
Read the AlphaZero paper
Use Chessboard2 or python-chess to represent boards
Use Notebooks
LLMs and Chess representation
Build a simple chess webpage
Required Readings

LLMs and Chess representation

Introduce students to the challenge of how Large Language Models (LLMs) interact with chess data. Explain that LLMs are designed to process human language as sequences of tokens (words and word pieces); they have no built-in notion of structured game data such as boards or move tables.

The Translation Layer

The key teaching point is that before an LLM can analyze a game or suggest a move, the chess position, whether it is stored as FEN or as our custom 77x8x8 array, must be presented as sequential text that the LLM can read. This is usually done by translating the board state into a string of tokens.
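
As a minimal sketch of this translation step, the Python snippet below expands the piece-placement field of a FEN string into one token per square. The function name `fen_to_tokens` and the `<white_to_move>` / `<black_to_move>` markers are illustrative assumptions, not part of the course material; a real LLM applies its own tokenizer to whatever text it receives.

```python
# Sketch: turn a FEN string into a flat, space-separated token sequence.
# Token names are hypothetical and chosen only for readability.

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def fen_to_tokens(fen: str) -> list[str]:
    """Return 64 square tokens (rank 8 first) plus a side-to-move token."""
    fields = fen.split()
    placement, side_to_move = fields[0], fields[1]
    tokens = []
    for rank in placement.split("/"):
        for ch in rank:
            if ch.isdigit():
                # A digit in FEN means that many consecutive empty squares.
                tokens.extend(["."] * int(ch))
            else:
                # Letters are pieces: uppercase = White, lowercase = Black.
                tokens.append(ch)
    if len(tokens) != 64:
        raise ValueError(f"expected 64 squares, got {len(tokens)}")
    tokens.append("<white_to_move>" if side_to_move == "w" else "<black_to_move>")
    return tokens


tokens = fen_to_tokens(START_FEN)
prompt_text = " ".join(tokens)
print(prompt_text)
```

Students can compare `prompt_text` with the original FEN string and discuss which form is easier for a person, and which for a language model, to read.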

Data Diversity in AI Architectures

For example, a FEN string is already a concise text input, but even our numerical 77x8x8 representation can be linearized and fed to a model. Students learn that their initial data engineering work (converting moves to 0-4096 indices and positions to a 77-layer tensor) is crucial for training the chess model, while the separate LLM may only need a simpler text prompt to function.
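
To make "linearized" concrete, here is a small sketch that flattens a 77x8x8 structure into a single sequence and renders it as text. Only the shape comes from the text above. The meaning of each plane is not assumed, and the helper names (`make_empty_tensor`, `linearize`, `to_text`) are hypothetical.

```python
# Sketch: flatten a planes x rows x columns structure into one text sequence.
# Plain nested lists are used so the example runs without extra libraries.

NUM_PLANES, SIZE = 77, 8  # shape taken from the text


def make_empty_tensor() -> list[list[list[int]]]:
    """Build a NUM_PLANES x SIZE x SIZE structure filled with zeros."""
    return [[[0] * SIZE for _ in range(SIZE)] for _ in range(NUM_PLANES)]


def linearize(tensor: list[list[list[int]]]) -> list[int]:
    """Flatten in plane -> row -> column order."""
    return [value for plane in tensor for row in plane for value in row]


def to_text(flat: list[int]) -> str:
    """Render the flat sequence as space-separated numbers."""
    return " ".join(str(v) for v in flat)


tensor = make_empty_tensor()
tensor[0][7][4] = 1  # hypothetical: mark a single square in plane 0
flat = linearize(tensor)
text = to_text(flat)
print(len(flat))  # 4928 values (77 * 8 * 8)
```

The resulting text has 4928 numbers, far longer than a FEN string for the same position, which is a useful talking point about why a compact text format is often the more practical LLM input.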

This illustrates the diverse data needs of different AI architectures and shows how the output of one AI component (the trained chess model) might become the input for another (the LLM used for analysis or chat).
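
The hand-off between the two components can be sketched as follows. This example assumes a move encoding of `from_square * 64 + to_square` with squares numbered a1 = 0 to h8 = 63, which gives indices 0 to 4095. That encoding, the function names, and the prompt wording are assumptions for illustration and may differ from the encoding used in the course datasets.

```python
# Sketch: decode a chess model's move index and wrap it in an LLM prompt.
# ASSUMPTION: move_index = from_square * 64 + to_square, with a1 = 0, h8 = 63.

FILES = "abcdefgh"


def square_name(square: int) -> str:
    """Map 0..63 to 'a1'..'h8' under the assumed numbering."""
    return FILES[square % 8] + str(square // 8 + 1)


def index_to_uci(move_index: int) -> str:
    """Decode an assumed from*64+to index into UCI-style text such as 'e2e4'."""
    if not 0 <= move_index < 64 * 64:
        raise ValueError("move index out of range for this assumed encoding")
    from_sq, to_sq = divmod(move_index, 64)
    return square_name(from_sq) + square_name(to_sq)


def build_prompt(fen: str, move_index: int) -> str:
    """Combine the position and the decoded move into a plain-text prompt."""
    move = index_to_uci(move_index)
    return (
        f"Position (FEN): {fen}\n"
        f"The chess model suggests the move {move}.\n"
        "Explain in plain language why this move might be good."
    )


start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
prompt = build_prompt(start_fen, 12 * 64 + 28)  # e2 = 12, e4 = 28
print(prompt)
```

The string returned by `build_prompt` is what would be sent to the LLM; the numeric index never reaches it directly.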