Models are downloaded at runtime rather than at build time.
Raw bindings to llama.cpp with CUDA support.
See llama-cpp-2 for a safe API.
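
Because models are fetched at runtime rather than at build time, the calling program has to decide where the model file lives before it loads anything. The sketch below shows one possible way to do that using only the standard library. It does not use or describe this crate's API: `resolve_model_path`, the `LLAMA_MODEL_PATH` variable, the `models` directory, and the `model.gguf` file name are all hypothetical and chosen for illustration.

```rust
use std::path::{Path, PathBuf};

/// Pick the location of the model file at runtime.
/// Precedence: explicit path, then environment override,
/// then `cache_dir/file_name` as the fallback.
/// (Hypothetical helper; not part of any llama.cpp binding.)
fn resolve_model_path(
    explicit: Option<&str>,
    env_override: Option<&str>,
    cache_dir: &Path,
    file_name: &str,
) -> PathBuf {
    explicit
        .or(env_override)
        .map(PathBuf::from)
        .unwrap_or_else(|| cache_dir.join(file_name))
}

fn main() {
    // `LLAMA_MODEL_PATH` is an assumed variable name, not one the crate reads.
    let env_override = std::env::var("LLAMA_MODEL_PATH").ok();
    let path = resolve_model_path(
        None,
        env_override.as_deref(),
        Path::new("models"),
        "model.gguf",
    );

    if path.exists() {
        println!("using model at {}", path.display());
    } else {
        // The download step itself is left to the application.
        println!("model not found at {}; download it before loading", path.display());
    }
}
```

Keeping the path decision in a pure function makes the precedence easy to test without touching the file system or the network.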