
Argilla meets AutoTrain

πŸ” Argilla meets AutoTrain

March 6, 2023


David Berenstein, Daniel Vila Suero

We're excited to announce a cool new integration for Argilla! Starting today, you can use Argilla Datasets and Hugging Face AutoTrain together with just a few clicks, empowering you to train NLP models easily without writing a single line of code.

AutoTrain makes it possible to train custom NLP models with minimal configuration, allowing users to focus on their data and business problems instead of the technical details of model training.

Now, with the integration of Argilla and AutoTrain, data labelling and NLP model training become seamlessly connected, making it easier than ever to build and deploy NLP solutions. Whether you're working on sentiment analysis, named entity recognition, or text summarization, Argilla + Hugging Face AutoTrain can help you get to production faster, with less code and less hassle.

We're proud to offer this integration to our users and can't wait to see the amazing NLP applications they'll build with it.

It only takes a few minutes to train a high-quality model. Let's see how!

🚀 Deploy Argilla

You can self-host Argilla using one of the many deployment options, sign up for our upcoming Argilla Cloud version, or launch an Argilla Space on the Hugging Face Hub with this one-click deployment button:

🏷️ Label Data

Argilla supports Text Classification, Token Classification, and Text Generation. There are many ways to accelerate your labeling process with Argilla. You will find numerous tutorials in the documentation, including a recent one that combines SetFit zero-shot learning, few-shot learning, and vector search.

Once you have labelled some examples, you are only a few clicks away from your first transformers model without writing any code!
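As a rough illustration of what "labelled examples" means here, each record pairs a text with an annotation, and only annotated records are useful for training. The dict fields and helper below are hypothetical stand-ins for illustration, not the actual Argilla client objects:

```python
# Illustrative only: plain-dict stand-ins for labelled records,
# not the actual Argilla client API.

def keep_annotated(records):
    """Return only the records that carry a label (annotation)."""
    return [r for r in records if r.get("annotation") is not None]

records = [
    {"text": "Great product, would buy again!", "annotation": "positive"},
    {"text": "Arrived late and broken.", "annotation": "negative"},
    {"text": "Not labelled yet.", "annotation": None},
]

labelled = keep_annotated(records)
print(len(labelled))  # 2 of the 3 example records are labelled
```

In practice the Argilla UI tracks this state for you: as soon as a dataset contains annotated records, it becomes a candidate for training.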

🚙 🚋 Launch AutoTrain

  1. Go to the Argilla Streamlit Customs Space and select auto-train from the left sidebar.
  2. Add your Argilla API URL, API key, and Hugging Face Token.
  3. Select a dataset from the dropdown of available datasets on your Argilla instance.
  4. Click schedule AutoTrain and follow the steps.
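Conceptually, the Streamlit page just collects these settings and submits them as an AutoTrain job. Here is a minimal sketch of that idea; the `AutoTrainJob` class, its field names, and the example values are hypothetical, not the app's actual code:

```python
from dataclasses import dataclass

# Hypothetical container for the settings the Streamlit page asks for.
@dataclass
class AutoTrainJob:
    argilla_api_url: str
    argilla_api_key: str
    hf_token: str
    dataset: str

    def validate(self) -> bool:
        """Basic sanity checks before scheduling the job."""
        if not self.argilla_api_url.startswith(("http://", "https://")):
            raise ValueError("Argilla API URL must be an http(s) URL")
        for name in ("argilla_api_key", "hf_token", "dataset"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")
        return True

job = AutoTrainJob(
    argilla_api_url="https://my-argilla.example.com",  # placeholder URL
    argilla_api_key="argilla.apikey",                  # placeholder key
    hf_token="hf_xxx",                                 # placeholder token
    dataset="my-ner-dataset",
)
print(job.validate())  # True: the job is ready to be scheduled
```

The real app performs this validation for you and lists the datasets it finds on your Argilla instance in the dropdown.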

Watch the video to get an idea of the steps with an NER model:

📦 Get a model!

After scheduling the AutoTrain job, you can launch it by clicking on the job link. That's it! For small datasets it will take just a few minutes to get a good-quality model.

You now have a trained model that you can download or serve via Hugging Face Inference Endpoints, just like that!
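Once deployed, an Inference Endpoint is queried over HTTPS with a bearer token and a JSON body of the form `{"inputs": ...}`. Here is a sketch that only assembles such a request rather than sending it; the endpoint URL and token are placeholders:

```python
import json

def build_inference_request(endpoint_url, hf_token, text):
    """Assemble the URL, headers, and JSON body for an Inference Endpoint call."""
    headers = {
        "Authorization": f"Bearer {hf_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"inputs": text})
    return endpoint_url, headers, body

url, headers, body = build_inference_request(
    "https://my-endpoint.endpoints.huggingface.cloud",  # placeholder URL
    "hf_xxx",                                           # placeholder token
    "Argilla meets AutoTrain.",
)
print(json.loads(body)["inputs"])
```

Sending this request with any HTTP client returns the model's predictions as JSON, which you can then log back into Argilla for review.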

Watch the video to get an idea of the remaining steps:

For the curious minds out there, here are the resulting model and training dataset.

If you want to integrate the AutoTrain process within your own workflows without using the Streamlit app, you can find the code in this repository.

Next steps

What if we could use Webhooks to make this a continuous retraining process?

What if we close the data loop and log production data into Argilla from your Inference Endpoints to be used for active learning and continuous evaluation?

Well, stay tuned! And let us know if you want to contribute to making this happen faster.

Code

If you are interested in the code used to launch AutoTrain jobs from the Streamlit application, check out this repo.