Visual options:

-- OD - object detection with fine-tuning:
   - YOLOv5: https://learnopencv.com/custom-object-detection-training-using-yolov5/

-- V-aware - visual LLM:
   - LLaVA: https://llava.hliu.cc/

-- BOTH detection and comprehension:
   - Phi-3-vision:
       https://huggingface.co/microsoft/Phi-3-vision-128k-instruct
       https://github.com/microsoft/Phi-3CookBook
   - LLaVA chat (LLaVA-Interactive demo):
       https://github.com/LLaVA-VL/LLaVA-Interactive-Demo?tab=readme-ov-file
     Setup:
       git clone https://github.com/LLaVA-VL/LLaVA-Interactive-Demo.git
       conda create -n llava_int -c conda-forge -c pytorch python=3.10.8 pytorch=2.0.1 -y
       conda activate llava_int
       cd LLaVA-Interactive-Demo
       pip install -r requirements.txt
       source setup.sh
   - decision making based on ENV (RL): https://github.com/OpenGenerativeAI/llm-colosseum
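
Note on the YOLOv5 fine-tuning option: custom training reads one label file per image, with one line per object in the form `class x_center y_center width height`, all coordinates normalized to 0..1. A minimal sketch of converting a pixel bounding box to that format follows. The function name `to_yolo_label` and the `(x_min, y_min, x_max, y_max)` input layout are assumptions for illustration and do not come from the linked tutorial.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line.

    Hypothetical helper: the input layout is an assumption; only the output
    format (class x_center y_center width height, normalized) is YOLO's.
    """
    x_min, y_min, x_max, y_max = box
    # Reject boxes that are empty or fall outside the image.
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} is invalid for a {img_w}x{img_h} image")

    # Box center and size, each normalized by the image dimension.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # One object of class 0 in a 400x500 image.
    print(to_yolo_label(0, (100, 50, 300, 250), 400, 500))
```

Each returned line would go into a `.txt` file named after its image; the validation step is there because out-of-range coordinates are an easy mistake to make when exporting annotations from another tool.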