

Download the desired Hugging Face converted model for LLaMA here. Copy the entire model folder, for example llama-13b-hf, into text-generation-webui\models. Then run the following command in your conda environment: python server.py --model llama-13b-hf --load-in-8bit. I found this link while trying to avoid the error ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds, but it still won't work for me.
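The setup steps above can be sketched as a short shell session. This is a sketch under assumptions: it uses forward-slash paths (swap for backslashes on Windows), and the download step and repo ID are placeholders for whatever converted-model source you use.

```shell
# Illustrative setup for text-generation-webui; the model folder name is
# taken from the post, the download step is left as a placeholder.
MODEL=llama-13b-hf

# 1. Download the converted LLaMA model into a folder named "$MODEL"
#    (source/repo is up to you; not shown here).
# 2. Copy the whole folder into the webui's models directory:
#    cp -r "$MODEL" text-generation-webui/models/
# 3. From your conda environment, start the server with 8-bit loading:
#    python server.py --model "$MODEL" --load-in-8bit

echo "model folder: $MODEL"
```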


I used run_glue.py to check the performance of my model on the GLUE benchmark. I'm currently building a new transformer-based model with huggingface-transformers, in which the attention layer differs from the original one. Currently, --load_best_model_at_end silently turns off the --save_steps setting when --do_eval is off (or --evaluation_strategy is …). Save only the best weights with huggingface transformers. Note: when set to True, the … Splitting off from #12477 (comment). load_best_model_at_end (bool, optional, defaults to False) – Whether or not to load the best model found during training at the end of training.
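The interaction described above comes down to a documented constraint: load_best_model_at_end only makes sense if evaluation actually runs, and checkpoints must coincide with evaluations so a "best" checkpoint exists. Here is a minimal sketch of that validation logic, assuming a simplified Args dataclass; it mimics the check (recent transformers versions raise a ValueError from TrainingArguments itself), it is not the library's actual code.

```python
from dataclasses import dataclass


@dataclass
class Args:
    """Simplified stand-in for TrainingArguments (illustrative only)."""
    load_best_model_at_end: bool = False
    evaluation_strategy: str = "no"   # "no", "steps", or "epoch"
    save_strategy: str = "steps"


def validate(args: Args) -> None:
    """Reject combinations where no 'best' checkpoint could ever be found."""
    if args.load_best_model_at_end:
        if args.evaluation_strategy == "no":
            raise ValueError(
                "load_best_model_at_end requires evaluation_strategy != 'no'"
            )
        if args.evaluation_strategy != args.save_strategy:
            raise ValueError(
                "save_strategy must match evaluation_strategy "
                "so saves line up with evaluations"
            )
```

With this kind of check in place, the failure mode is an explicit error instead of --save_steps being silently ignored.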
