the full recipe for fine-tuning FLUX.1-dev on anky. written from memory after run 001.
Go to /training and swipe through generated Ankys. Keep the ones that feel true to the character — correct proportions, good lighting, recognizable. Reject blurry, distorted, or off-model images.
Aim for at least 80–150 images. More isn't always better — quality matters more than quantity. Each approved image gets copied to data/training-images/ with a caption .txt file alongside it.
The caption files are the image prompts that were used to generate each image. They teach the model the connection between words and visual features.
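A sketch of what one caption file might contain — the wording here is hypothetical (real captions are whatever prompts generated each image), but every caption must include the trigger word anky:

```shell
# hypothetical caption for one approved image; real captions come
# from the generation prompts, but the trigger word is mandatory
mkdir -p /tmp/caption-demo
cat > /tmp/caption-demo/anky_0042.txt <<'EOF'
anky, a small blue creature with large curious eyes, standing in a
glowing forest at dusk, soft rim lighting, full body, 3d render
EOF

# confirm the trigger word is present
grep -q 'anky' /tmp/caption-demo/anky_0042.txt && echo "trigger word present"
```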
Go to runpod.io and rent an A100 80GB SXM pod. Use the RunPod PyTorch 2.4.0 template (not the base image — you need CUDA pre-installed).
Set the volume disk to at least 50GB (the FLUX model alone is ~23GB). Container disk can be 20GB.
Set export HF_HOME=/workspace/hf_cache before downloading models — the root disk fills up fast and kills the run.

Once the pod is running, open a terminal and run:
curl -s https://anky.app/static/train_anky_setup.sh | bash
This installs ai-toolkit, all dependencies (including torchaudio — don't skip it), clones the repo, and sets up the environment. Takes about 5–10 minutes.
Run huggingface-cli login with a token that has access to the gated FLUX.1-dev repo.

Copy the training images from anky.app to the RunPod pod. The images are at data/training-images/ on the server. You can tar them and scp, or use rsync. On the pod, put them at /workspace/dataset/.
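A local sketch of the tar round-trip (the directory names are placeholders; in practice the scp line in the comment moves the archive from the server to the pod):

```shell
# stand-in directories for the server dataset and the pod's /workspace/dataset/
mkdir -p /tmp/ds-src /tmp/ds-dst
printf 'anky, test caption\n' > /tmp/ds-src/img_001.txt
: > /tmp/ds-src/img_001.png

# on the server: pack images + captions together
tar -czf /tmp/dataset.tar.gz -C /tmp/ds-src .

# scp /tmp/dataset.tar.gz root@<pod-ip>:/workspace/   # the real transfer step

# on the pod: unpack into the dataset folder
tar -xzf /tmp/dataset.tar.gz -C /tmp/ds-dst
ls /tmp/ds-dst
```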
Each image needs a matching .txt caption file in the same folder. The caption should describe the image and include the trigger word anky.
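A quick sanity check before starting training — this sketch (assuming the dataset lives at /workspace/dataset/ with .png/.jpg images) flags any image that lacks a caption file or whose caption is missing the trigger word:

```shell
# verify every image has a matching .txt caption containing "anky"
DATASET=${DATASET:-/workspace/dataset}
missing=0
for img in "$DATASET"/*.png "$DATASET"/*.jpg; do
  [ -e "$img" ] || continue          # skip unmatched globs
  cap="${img%.*}.txt"                # img_001.png -> img_001.txt
  if [ ! -f "$cap" ]; then
    echo "missing caption: $img"; missing=1
  elif ! grep -qi 'anky' "$cap"; then
    echo "no trigger word: $cap"; missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "dataset ok"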
Always use tmux so training survives if your SSH connection drops:
tmux new -s train
source /workspace/venv/bin/activate
export HF_HOME=/workspace/hf_cache
export HF_HUB_DISABLE_XET=1
cd /workspace/ai-toolkit
python -u run.py /workspace/train_anky.yaml
Detach with Ctrl+B, D. Reattach later with tmux attach -t train.
Training 3000 steps on an A100 80GB takes roughly 1–2 hours. Sample images are generated every 500 steps at /workspace/output/anky_flux_lora/samples/.
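To peek at progress without reattaching to the tmux session, list the newest samples (the path matches the output dir above):

```shell
# newest sample images first; harmless if the dir doesn't exist yet
SAMPLES=/workspace/output/anky_flux_lora/samples
ls -t "$SAMPLES" 2>/dev/null | head -5
```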
Once training finishes, upload the weights and everything else using these one-liners from the pod terminal:
Upload the LoRA weights:
python3 -c "from huggingface_hub import HfApi; api=HfApi(); api.upload_file(path_or_fileobj='/workspace/output/anky_flux_lora/anky_flux_lora.safetensors', path_in_repo='anky_flux_lora.safetensors', repo_id='jpfraneto/anky-flux-lora-v1', repo_type='model'); print('done')"
Upload the training config:
python3 -c "from huggingface_hub import HfApi; api=HfApi(); api.upload_file(path_or_fileobj='/workspace/train_anky.yaml', path_in_repo='train_anky.yaml', repo_id='jpfraneto/anky-flux-lora-v1', repo_type='model'); print('done')"
Upload the README (managed on anky.app):
curl -s https://anky.app/static/hf/anky-flux-lora-v1-readme.md -o /tmp/readme.md && python3 -c "from huggingface_hub import HfApi; api=HfApi(); api.upload_file(path_or_fileobj='/tmp/readme.md', path_in_repo='README.md', repo_id='jpfraneto/anky-flux-lora-v1', repo_type='model'); print('done')"
Upload checkpoints + samples:
curl -s https://anky.app/static/hf/upload-checkpoints.py | python3
Once everything is on HuggingFace, delete the RunPod instance. The weights are safe. The training config is safe. The dataset lives on anky.app.
Cost for run 001: ~$3–5 for ~2 hours on an A100 80GB.
ModuleNotFoundError: No module named 'torchaudio'
ai-toolkit imports torchaudio at startup even if you don't use audio. The setup script installs it — don't skip that step.
No space left on device
FLUX.1-dev is 23GB. If it downloads to /root/.cache (the small container disk), you run out of space. Always set HF_HOME=/workspace/hf_cache before starting.
xet download error
HuggingFace's xet protocol is broken on some pods. Set HF_HUB_DISABLE_XET=1 to fall back to normal HTTPS downloads.
GatedRepoError: 401
You need to accept the FLUX.1-dev license on the HuggingFace website and re-login with huggingface-cli login.
torch install corrupted after Ctrl+C
If you interrupt a pip install mid-way, torch can be in a broken state. Fix with a forced reinstall as a single line (no backslash continuations):
pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124 --force-reinstall --no-deps
Python output not appearing in terminal
Python buffers output when piped. Use python -u (unbuffered) to see logs in real time.