Meet your new personal assistant, Artificial Intelligence¶
"Have you tried ChatGPT?" has quickly become the most frequently mentioned topic of conversation at faculty meetings and informal gatherings during the early months of 2023. ChatGPT, now just five months old has already amassed over 100 million users within two months of its public release, setting an unprecedented record for the fastest user base growth in internet history.
Figure: the Google Trends for the search term "ChatGPT".
Artificial Intelligence (AI) driven by Large Language Models (LLMs) (), such as ChatGPT, is causing disruptions in classrooms (), acing all major standardized tests (OpenAI ), and casting a looming presence over the workplace (Eloundou et al. ).
Despite the incredible achievements of ChatGPT and its relatives, they have faced skepticism due to their factual inaccuracies () and criticism for their potential to premote misogyny and racism when misused (). Regardless, these technologies will impact our lives for better and for worse, in the months and years ahead.
It is worth noting that we are likely in the early stages of a Gartner Hype Cycle (). However, current indications suggest that a new era is on the horizon, in which AI becomes an integral part of our classrooms, workplaces, and homes, whether we embrace it or not.
Should you adapt or resist AI in your life?¶
Despite recent calls from prominent tech entrepreneurs and scholars asking for a six month 'pause' on the development of large AI models (), it seems the genie is already out of the bottle. The allure of profits and benefits of network effects () that come from being the first to market with the most advanced AI is fueling fierce competition amongst tech giants, resulting in hundreds of billions of dollars invested in the coming year alone.
Popular culture is rife with science-fiction tales of AI spiraling out of control, enslaving humanity, or bringing about global catastrophe. Ironically, these narratives often serve as allegories or social commentary on humanity's very real history of colonialism, genocide, environmental destruction & unchecked capitalism that gave rise to western civilization, rather than purely speculative fictions. The question then arises: will AI become 'evil' independently or will it be people who create and train it to kill? The latter seems more likely.
The Ethics of Artificial Intelligence
Developing machine learning techniques can indeed have unintended consequences, even when well intentioned. Recent examples in otherwise benevolent research include: ML-powered drug discovery being reverse-engineered to create novel deadly toxins and advanced AI technology being repurposed for warfare or terrorism . Yet imposing absolute prohibitions on the development of AI could lead to worse outcomes for humanity, considering the potential benefits and solutions AI can offer to various social and environmental challenges.
Choosing to delay learning about, or engaging with this technology may also have negative consequences for you, your students, and your research.
What kind of AI do you need to know about?¶
Machine Learning models have been in use for many years, but recently, a specific family of AI known as Large Language Models (LLMs) has gained significant attention online and in the news.
LLMs utilize pre-trained neural networks built from massive datasets (e.g., the internet, Wikipedia, digital libraries, scientific publications, and print news) encompassing billions of pages of text and hundreds of terabytes of data. LLMs can also be trained on images, videos, or audio using self-supervised learning to generate artificial imagery, videos, music, and voices.
OpenAI released its first LLM in 2018, which now powers ChatGPT, DALL·E, and Microsoft's improved Bing search engine. Google's LLM, LaMDA drives BARD, its own AI chat system. Meta's LLaMa is publicly available on GitHub and is utilized in numerous open source chat projects. Meta also released a practical computer vision platform called Segment Anything Model, capable of isolating objects in any image.
Access to libraries of pre-trained models and training data for further model 'tuning' is crucial for ongoing AI development. Platforms like GitHub play a vital role in the AI ecosystem by providing version-controlled code and a space for idea sharing.
Another essential component of this AI revolution is the American start-up HuggingFace. AI developers use HuggingFace to publish their Apps and to share their pre-trained models. As of April 2023, HuggingFace hosts over 173,000 free AI models, the most of any platform.
Table: Dominant LLM models currently in public use
|facebookresearch/llama, (Touvron et al. )
|facebookresearch/segment-anything, (Kirillov et al. )
|(Thoppilan et al. )
|Computer Vision, Chat
|openai/DALL-E, (Ramesh et al.)
|NVIDIA/Megatron-LM, (Shoeybi et al. )
BARD - Google's general purpose LLM
Bi-directional Encoder Representations from Transformers (BERT) - is a family of masked-language models introduced in 2018 by researchers at Google , (Devlin et al. )
ChatGPT - OpenAI's general purpose LLM
CoPilot - GitHub (Microsoft/OpenAI) AI co-programmer, natively integrated as an extension in VS Code or GitHub CodeSpaces
Generative Pretrained Transformer (GPT) - are a family of large language models, which was introduced in 2018 by the American artificial intelligence organization OpenAI . (Radford et al. )
GitHub - the most widely used Version Control infrastructure, owned by Microsoft and natively integrated with OpenAI
DALL·E - OpenAI stable diffusion image generation model
HuggingFace - library for open source AI models and apps
Large Language Models (LLMs) - is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabelled text using self-supervised learning ()
Language Models for Dialog Applications (LaMDA) - Google's general purpose LLM
Latent Diffusion Model (LDM) () - machine learning models designed to learn the underlying structure of a dataset by mapping it to a lower-dimensional latent space.
Large Language Model Meta AI (LLAMA) - Meta's general purpose LLM
MidJourney - popular image generation platform (proprietary), which is accessed via Discord
Neural networks - () () - are similar to their biological counter parts, in the sense they have nodes which are interconnected. Rather than string-like neurons and synapses in biology, artificial networks are made of nodes connected by networks of 'weights' which can have positive or negative values.
OpenAI - private company responsible for the first LLMs and ChatGPT
Parameter - () is a value that the model can independently modify as it is trained. Parameters are derived from the training data upon which the model is trained. The number of parameters in the newest LLMs are typically counted in the billions to the trillions.
Segment-Anything (Meta) - is a recently released image and video segmentation technology that allows you to 'clip' a feature from an image with a single click.
Stable Diffusion - computer vision models for creating images from text
Tuning - the process of refining models to become more accurate
Weights - are the value by which a model multiplies another value. Weights are typically determined by the proportional value of the importance of the parameters. Weights signify the value of a specific set of parameters after self-training.
How will AI be used in the workplace?¶
The impact of ChatGPT and other LLMs on the workforce is significant, with early findings suggesting that they will affect more than 80% of workers. A notable 20% of the workforce will experience over 50% penetration of AI technology in their daily tasks (Eloundou et al. ).
LLMs and GPTs can be incorporated into productivity tasks such as writing, editing, and outlining, potentially saving advanced users over 50% of their effort (Noy and Zhang ).
In the next 5 years, AI won't steal people's job. People using AI will.— Bruno Sánchez-Andrade Nuño (@brunosan) April 3, 2023
As AI technologies continue to advance, they will become increasingly integrated into various industries, transforming the way work is conducted. Companies will need to adapt their workflows, train their employees to harness AI's potential, and develop new strategies to remain competitive in an AI-driven world.
Should you be worried an AI is going to steal your job or make that diploma worthless?
Humanity is still decades away from an Artificial General Intelligence or Artificial Super Intelligence which can learn as humans and animals do.
Your future career will most likely leverage AI as a digital assistant. What will be critical is staying grounded by bedrock foundations around the ethical applications of AI.
Enhancing Productivity with AI
ChatGPT and its counterparts are integrated into popular productivity software. Microsoft announced integration of OpenAI and CoPilot into :simple-microsoftoffice: Microsoft Office 365, along with the new Bing.
Similarly, Google announced the integration of LaMDA into GMail, Drive Docs and Sheets.
How can you use AI in the classroom?¶
GPT-4 models can compose essays and pass advanced knowledge assessments (OpenAI ). Online education, a recent and lucrative innovation in academia, now faces challenges regarding effective remote student assessment (Susnjak ).
Attempting to modify coursework to avoid assessment techniques where ChatGPT excels or using bots to detect ChatGPT generated content may prove to be futile. Instead of engaging in a cheating arms race, why not embrace ChatGPT and other AI frameworks?
Proponents of integrating ChatGPT into educational curricula () argue that by adapting and integrating ChatGPT into the curriculum, we can develop a modern workforce empowered by AI assistants. I find myself aligned with this perspective (as does my AI text editor, ChatGPT-4).
Teaching with ChatGPT
Guiding Graduate Students and Postdoctoral Researchers in AI Usage¶
Training the next generation of researchers to use AI effectively and ethically is a crucial aspect of graduate mentorship. As an advisor, it is important to ensure that students have appropriate access to these platforms and a comprehensive understanding of the ethical implications for their education, research, and software engineering.
Platforms like ChatGPT could potentially become the primary mentor for graduate students and postdoctoral researchers. Unlike human advisors, these AI systems are available 24/7 to address virtually any question or problem. However, it is essential to strike a balance between AI assistance and independent learning.
To achieve this balance, advisors should:
Encourage AI literacy: Provide students with resources and opportunities to learn about AI technologies, their applications, and limitations.
Teach responsible AI usage: Emphasize the importance of using AI as a tool to support research, not replace critical thinking and problem-solving skills.
Discuss ethical considerations: Foster open discussions about the ethical implications of AI in research, including issues of bias, fairness, transparency, and accountability.
Promote collaboration: Encourage students to collaborate with AI, leveraging its strengths to overcome their weaknesses and vice versa.
Stay updated: As AI technologies continue to evolve, ensure that both advisors and students stay informed about the latest developments, best practices, and potential pitfalls.
By incorporating AI into graduate and postdoctoral training while maintaining a focus on ethics and responsibility, the next generation of researchers can harness the power of AI to advance their fields while upholding the highest standards of academic integrity.
I will no longer approve graduate student dissertation proposals or dissertations unless they used ChatGPT or a similar AI to help them write part it! (With appropriate acknowledgement). Yes I am serious!— Seth (@DrSethMurray) April 4, 2023
We're training PhDs to think, not to be robots.
Integrating LLMs into Research and Education¶
I strongly encourage faculty and research teams to explore how they can incorporate LLMs like GPT-4 into their daily work in the context of developing their own Research Objects.
ChatGPT Awesome Lists
There is an ever changing meta-list of Awesome lists curated around ChatGPT plugins and extensions.
Check out the lists around:
- ChatGPT Prompts
- API plugins, extensions, & applications
Learn to code with ChatGPT
Using progressively refined prompts or providing similar code examples can help ChatGPT better understand the coding task. This approach enables the AI to explain what the code does, why it may be dysfunctional, and how to correct it. ChatGPT can even act as a Linux Terminal.
Form an AI-powered paired-programming team
Leverage version control systems like
git (GitHub, GitLab) to track changes in your code. With GitHub's free Education accounts for students and researchers, you get access to OpenAI-powered CoPilot which integrates seamlessly with GitHub's virtual machine CodeSpaces environments. CoPilot can assist in developing code in various programming languages.
Literature review and meta-analyses
ChatGPT, despite its potential for generating inaccurate information, is just one among various AI tools available for research purposes. Other tools, like Paper-QA provide a more reliable approach, relying solely on inputted textual information (PDFs) to generate contextual answers with citations. Researchers can use platforms like Paper-QA to perform meta-analyses of numerous papers in just a few seconds. These tools allow users to quickly verify the results by directly navigating to the pages where the context was extracted from, ensuring a higher degree of confidence in the generated information. By harnessing the capabilities of such tools, researchers can streamline their literature review processes and gain valuable insights more efficiently.
Image Generation & Segmentation Models
Stable Diffusion models are available via HuggingFace
Diffusion models have two modes, forward and reverse. Forward diffusion adds random noise until the image is lost. Reverse diffusion uses Markov Chains to recover data from a Gaussian distribution, thereby gradually removing noise.
Stable Dffusion relies upon Latent Diffusion Model (LDM) ()
Example Image Generation Models
DALL·E uses GPT to create imagery from natural language descriptions
MidJourney uses a proprietary machine learning technology, believed to be stable diffusion, along with natural langauge descriptions in the same way as DALL·E and Stable Diffusion models. MidJourney is only available via Discord, and requires a subscription for premier access after a 30-day free trial.
Example Video Generation and Segmentation Models
Meta's Segment Anything can instantly identify objects in complex images and videos. Built on the SA-1B dataset, one of the largest image segmentation datasets ever publicly released, it saves technicians time and helps generate new training datasets for more refined computer vision model development.
By incorporating LLMs and AI tools into research and education, faculty and students can enhance their work, improve efficiency, and foster a deeper understanding of AI's potential in various fields.