Some Practical Advice
Learning Objectives
This section will help you understand:
- Programming for implementing AI
- Managing experimentation
- Ways to share your work
Overview
As AI has progressed, an increasing number of AI tools have become available to researchers. General-purpose tools like ChatGPT are widely available, while specialised research tools for tasks such as segmenting images and searching scientific papers are being integrated into research workflows. Many researchers are moving beyond pre-packaged tools and experimenting with applying AI to their own research field.
Computer Programming
If you choose to apply AI in your own research, you are going to need to learn some programming. Even if you don't intend to train your own models and instead use models that have already been built by other researchers, you may find that programming helps to speed up your work.
Most AI and ML work is done in Python, although R is used as an alternative in some fields. Python has strong support for machine learning, with many open-source software libraries available, such as scikit-learn, PyTorch and Hugging Face's Transformers. Software libraries are collections of pre-written code that you can use in your own work.
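To give a flavour of what using a library looks like, here is a minimal sketch with scikit-learn. It assumes scikit-learn is installed, and the choice of dataset (the small iris dataset bundled with the library) and model (logistic regression) is purely illustrative rather than a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to check the model on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple classifier and measure its accuracy on the held-out data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The library supplies the dataset loader, the train/test split and the model itself, so the code you write is mostly about connecting these pieces together.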
Many researchers begin exploring ML code using Notebooks such as Jupyter or Google Colab. These can be a great way to get started exploring, but you might find that once you've made some good progress your code becomes unwieldy and you look for other ways to structure your work.
Managing the code for your project is best done using a version control tool like Git, typically together with a hosting platform such as GitHub. Tools like these allow you to track the changes you make to your code, move back and forth between versions, and collaborate easily with colleagues.
Remember that software comes with licences, and you must check the licence of any software you use to be sure you have the correct permissions to use it. Familiarise yourself with different software licences and their implications for using, modifying & distributing your own software.
Open Source
There is a large open-source AI community, with a practice of sharing models, data and code. These communities not only share resources, but also have community forums for support. It is highly recommended to explore the open-source landscape before writing your own code and building your own models. Building on existing work can save a lot of time and effort compared with starting from scratch. However, open-source software varies widely in quality. Projects that are actively maintained are generally more reliable.
Hugging Face are an organisation that host a large repository of open-source datasets and models. It's a great place to start looking for data and models that are relevant to your task.
Good Experimental Practice
Once you've got started with programming, the next challenge is usually managing experiments and tracking work.
AI and machine learning projects are iterative, and you typically end up trying many different models, hyperparameters, datasets and pre-processing pipelines. Modern AI libraries (like PyTorch and scikit-learn) have easy-to-use implementations of many techniques, which makes it easy to try multiple ideas. The downside is that you can rapidly create lots of files and find your experiments hard to navigate.
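As a rough illustration of how quickly experiments multiply, the sketch below compares a few model configurations using cross-validation in scikit-learn. The dataset, the models and the hyperparameter values are all assumptions chosen for the example. Giving each configuration a readable name is one simple way to keep results traceable.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Each candidate has a readable name so a score can be traced back to its settings.
candidates = {
    "logreg_C=0.1": LogisticRegression(C=0.1, max_iter=1000),
    "logreg_C=1.0": LogisticRegression(C=1.0, max_iter=1000),
    "forest_50_trees": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Evaluate every candidate with 5-fold cross-validation and keep the mean score.
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()

# Print the candidates from best to worst mean accuracy.
for name, score in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Even this small example produces three sets of results. Add different datasets and pre-processing steps and the number of combinations grows quickly.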
Experiment tracking tools like Weights & Biases or MLflow record the settings and results of each run in one place, which makes experiments easier to compare and revisit.
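The core idea behind experiment tracking can be sketched with the Python standard library alone. The example below is a hand-rolled stand-in, not the API of Weights & Biases or MLflow: the helper names log_run and load_runs are hypothetical, and the parameter and accuracy values are placeholders rather than real results.

```python
import json
import tempfile
import time
from pathlib import Path


def log_run(log_path, params, metrics):
    """Append one experiment run (settings and results) as a JSON line."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_runs(log_path):
    """Read every logged run back into a list of dictionaries."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Example usage with placeholder values, written to a temporary directory.
log_path = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(log_path, {"model": "logreg", "C": 0.1}, {"accuracy": 0.91})
log_run(log_path, {"model": "logreg", "C": 1.0}, {"accuracy": 0.94})

# Find the run with the highest recorded accuracy.
runs = load_runs(log_path)
best = max(runs, key=lambda run: run["metrics"]["accuracy"])
print(best["params"])
```

Dedicated tracking tools build on the same principle and typically add features such as dashboards and storage of model files, so they are usually a better choice once a project grows.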
For recording your work, Model Cards and Data Sheets provide a structured way to log what you've built so that others can understand and build on your work.
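A model card can start as a very small structured record. The sketch below is one possible layout, assuming a handful of commonly used headings (intended use, training data, evaluation, limitations). The ModelCard class and its field names are hypothetical, and every value shown is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    """A minimal, hypothetical model card holding a few common fields."""

    name: str
    intended_use: str
    training_data: str
    metrics: dict = field(default_factory=dict)
    limitations: list = field(default_factory=list)

    def to_text(self):
        """Render the card as plain text that can sit alongside the model."""
        lines = [
            f"Model: {self.name}",
            f"Intended use: {self.intended_use}",
            f"Training data: {self.training_data}",
            "Evaluation:",
        ]
        lines += [f"  {key}: {value}" for key, value in self.metrics.items()]
        lines.append("Limitations:")
        lines += [f"  - {item}" for item in self.limitations]
        return "\n".join(lines)


# Placeholder content only; replace with details of your own model.
card = ModelCard(
    name="example-classifier",
    intended_use="Illustrative example only",
    training_data="Placeholder description of the dataset",
    metrics={"accuracy": "not yet evaluated"},
    limitations=["Placeholder: list known failure cases here"],
)
print(card.to_text())
```

Keeping the card in code or in a text file next to the model means it can be version controlled and updated as the model changes.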
Sharing Your Work
At the end of your project, you may want to share your model, data and code with the open-source community. Openness is key to scientific rigour, and published software is a recognised research output which can help with your career progression.
Accelerate Science have a course on publishing and packaging software. You can sign up for the course, or follow the materials online at your own pace.
Theory in Depth
If you decide to go further and want to understand the theory of AI in depth, you'll want to explore mathematical topics such as linear algebra, optimisation, probability and calculus. With some understanding of these topics, you'll be able to dive into the theory of AI and ML, and you'll have a foundation for designing your own models and algorithms.
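To see how these topics fit together, here is a small sketch that fits a straight line by gradient descent using NumPy. The data is synthetic (generated from y = 2x + 1 with no noise), and the learning rate and number of steps are assumptions chosen so that this particular example converges.

```python
import numpy as np

# Synthetic data generated from y = 2x + 1 (no noise), chosen for illustration.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0

# Linear algebra: stack the inputs with a column of ones so the model is X @ w.
X = np.column_stack([x, np.ones_like(x)])
w = np.zeros(2)

learning_rate = 0.5
for _ in range(2000):
    residual = X @ w - y
    # Calculus: gradient of the mean squared error with respect to w.
    gradient = 2.0 * X.T @ residual / len(y)
    # Optimisation: take a step downhill along the negative gradient.
    w -= learning_rate * gradient

print(w)  # estimated slope and intercept
```

The fitted values should come out very close to a slope of 2 and an intercept of 1. Training a neural network follows the same pattern at a much larger scale: the model is expressed with matrices, calculus provides the gradients, and an optimisation method updates the parameters.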
Contact
If you can't find what you need