Solving NLP Tasks using LLMs

Data Umbrella Webinar, 2023-05-17

Pablo Duboue, PhD

Presenter Notes

Land Acknowledgement

This talk is being held on the unceded traditional territories of the xʷməθkʷəy̓əm (Musqueam), Sḵwx̱wú7mesh (Squamish), and səlilwətaɬ (Tsleil-Waututh) Nations.

Presenter Notes

This talk

  • NLP, now and then
  • NLP Tasks
  • Prompting 101
  • NLP with OpenAI API
  • Open Source LLMs

Available at https://textualization.com/talks/20230517dataumbrella.html

Presenter Notes

NLP, now and then

  • Rule-based
  • Statistical
  • Deep learning
  • LLMs

Presenter Notes

Rule-based

  • IF-THEN-ELSE
  • Regular Expressions
  • Wouldn't be considered NLP these days

  • Very much alive, full end-to-end problems solved this way
  • We just don't talk about those things in this house, we have LLMs!

Presenter Notes

Statistical

  • Let's assume the meanings of words are independent of their order and see how far we can take that
    • It turns out, very far
    • Gets you 60% of the way for most difficult problems
  • With massive human effort, it can take you all the way
    • That human effort is feature engineering, one of my favourite topics

  • The Watson Jeopardy! system was its highest point

Presenter Notes

Deep Learning

  • Language is clearly non-linear
  • Use large amounts of data to infer intermediate representations
    • One or two orders of magnitude more data than statistical methods
  • Can get you 80% of the way
    • With massive datasets and computing power it might get you all the way

Presenter Notes

What is a Language Model

  • A language model tells you the probability of new words given already seen words.
    • The most famous example is the predictive-text (autocomplete) functionality of your cellphone
      • "How does ChatGPT know XYZ?"... the same way your phone knows that after your first name your last name most probably follows.
    • With better models and much more training data, they work much better
  • Something happens when the language model gets really big

Presenter Notes

LLMs (aka "ChatGPT")

  • More Is Different, by P.W. Anderson, Science (1972)
  • A large language model all of a sudden is more than just a language model

  • Emergent behaviour
  • Reasoning capabilities

  • Are they a mirage?
    • Are Emergent Abilities of Large Language Models a Mirage? by Schaeffer, Miranda, Koyejo at arXiv (2023)

Presenter Notes

More than just language

  • Besides naturally occurring text, LLMs use two other sources:
  1. Instructions
    • Meta-text, text about tasks that the model needs to do
    • "Summarize this text", "Make it sound less obnoxious", etc
  2. Human feedback (RLHF, reinforcement learning from human feedback)
    • Humans are asked whether the behaviour of the system (the text produced) is good or bad

Presenter Notes

About the speaker

  • Been around doing research in NLP/ML for the last 25 years

    I've seen things you people wouldn't believe... Attack ships on fire off the shoulder of Orion...

  • Corporate research scientist for 6 years

    • Helped build the IBM Jeopardy! Watson system
  • About 50 peer-reviewed papers and patents
  • Have a one person company in Vancouver, Canada
    • Consulting mostly for startups
  • I wrote a book on Feature Engineering, published in 2019 by Cambridge University Press

Presenter Notes

About this talk

Presenter Notes

Section 1: NLP tasks

Presenter Notes

Part-of-Speech (POS) tagging

  • The most basic task in traditional NLP
    • Telling apart nouns, verbs, and function words such as articles and prepositions.
  • While it sounds like a very simple task, many words are ambiguous and their POS depends on their role in the sentence.
    • As such, it is a pre-parsing task.
  • Extracting lists of nouns or verbs can be useful on its own.
    • Finding the most common verb between two nouns can give an idea of the type of link between the nouns.

Presenter Notes

POS tagging example

<p><s n="1"><w type="PPS">It</w> <w type="BEZ">is</w>
  <w type="NEG">not</w> <w type="NN">news</w> <w type="CS">that</w>
  <w type="NP">Nathan</w> <w type="NP">Milstein</w>
  <w type="BEZ">is</w> <w type="AT">a</w> <w type="NN">wizard</w>
  <w type="IN">of</w> <w type="AT">the</w> <w type="NN">violin</w>
  <c type="pct">.</c> </s>
<s n="2"><w type="RB">Certainly</w> <w type="NEG">not</w>
  <w type="IN">in</w> <w type="NN" subtype="TL">Orchestra</w>
  <w type="NN" subtype="TL">Hall</w> <w type="WRB">where</w>
  <w type="PPS">he</w> <w type="HVZ">has</w> <w type="VBN">played</w>
  <w type="JJ">countless</w> <w type="NNS">recitals</w>
  <c type="pct">,</c> <w type="CC">and</w> <w type="WRB">where</w>
  <w type="NR">Thursday</w> <w type="NN">night</w>
  <w type="PPS">he</w> <w type="VBD">celebrated</w>
  <w type="PPg">his</w> <w type="OD">20th</w> <w type="NN">season</w>
  <w type="IN">with</w> <w type="AT">the</w>
  <w type="NP" subtype="TL">Chicago</w>
  <w type="NN" subtype="TL">Symphony</w>
  <w type="NN" subtype="TL">Orchestra</w> <c type="pct">,</c>

Presenter Notes

Named Entity Recognition (NER)

  • In NLP, a named entity is an object, person, or location in the real world that can be denoted by a proper name
    • Usually found in long lists of names, called gazetteers
    • Example types: Person, Organization, Location, Vehicle, Geo-Political Entity
  • The NER task involves identifying such named entities in running text.

  • If the entity is unambiguous, finding it is as simple as a dictionary lookup.
    • It gets tricky when there are confounding entities ("does Obama refer to Barack or Michelle?")
    • Or when the type of the entity is ambiguous ("does IBM refer to the company or a particular location?")

Presenter Notes

NER example

[Fred Flintstone]person was named [CTO]position of [Time Bank Inc.]organization in [2031]date . The [next year] [he] got married and became [CEO]position of [Dinosaur Savings & Loan]organization .

Presenter Notes

Information Extraction (IE)

  • A series of techniques to extract structured data from text
    • Think of it as "Word files into Excel spreadsheets"
  • Expand on NER by incorporating relations between entities.

  • The ultimate task is slot filling
    • Finding all the available data for a particular event in a text description

Presenter Notes

IE example

Presenter Notes

Sentiment Analysis

  • Determine whether a piece of text voices a positive or negative opinion
    • Or no opinion at all
  • At its core, it is a text classification problem

  • It gets difficult when we want to know the opinion with respect to a particular entity
    • People thanking the Red Cross on Twitter while talking about the devastation of a hurricane

Presenter Notes

Sentiment Analysis example

  • Positive (Large Movie Review Dataset v1.0)

    i was greatly moved when i watched the movie.how jonny could keep such hope and faith was amazing. so many people only care about what they want , and fuss about all the things they don't have . ...

  • Negative

    My roommate got the No, No, Nanette soundtrack as a dub on a tape and she proceeded to listen to it non-stop. After it finally totally brainwashed me into submission, I found the songs to be irresistible, especially the famous, I want to be happy, but I can't be happy... But of coarse from the soundtrack I had no idea what the film was about. So the other day I saw a copy of it at the video store and I rented what was supposed to be a long lost version of the film. I was thinking that it was going to be amazing, because the soundtrack is so cute. Unfortunately most of the songs that I loved were nowhere to be found in the video I saw. Now I've never seen the 1930 version of the musical but this version was sadly disappointing because there was very little singing and practically no dancing and beside that the sound was really bad through out and you couldn't really understand what people were saying a lot of the time

Presenter Notes

Section 2: Prompting 101

Presenter Notes

OpenAI API

  • Won't be talking about internals of LLMs
  • At https://platform.openai.com/signup get an API key
    • You will need to provide a phone number
    • Most probably a credit card number, too
    • Might get some free credits for 30 days

Presenter Notes

OpenAI API CLI

  • With the API key, the completions endpoint can be called from the command line using curl:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'

Presenter Notes

OpenAI API from Python

  • With the API key, the completions endpoint can be called from Python as follows:

import openai
openai.api_key = "YOUR KEY"
completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[{"role": "user", "content": "Hello!"}])
print(completion.choices[0].message)

Presenter Notes

Zero shot

  • The first task is to write down what we want the system to do (the "prompt") and see how well it does it.
    • "Zero shot" refers to the fact that we are not giving the LLM anything else
    • No examples
    • No fine-tuning of the weights

Example:

Given the following tweet, identify any infrastructure damage and 
place it in a line that starts with "Infrastructure damage:". 
If no damage is present, say "Infrastructure damage: -".

Tweet: RT @TIME: NYC building had numerous construction 
complaints before crane collapse  #Sandy
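
A minimal sketch of sending this zero-shot prompt through the Python client shown earlier (the model name and temperature match the deck's other examples; the prompt string is the one above):

 import openai

 openai.api_key = "YOUR KEY"

 PROMPT = """Given the following tweet, identify any infrastructure damage and
 place it in a line that starts with "Infrastructure damage:".
 If no damage is present, say "Infrastructure damage: -".

 Tweet: RT @TIME: NYC building had numerous construction
 complaints before crane collapse  #Sandy"""

 completion = openai.ChatCompletion.create(
     model="gpt-3.5-turbo",
     messages=[{"role": "user", "content": PROMPT}],
     temperature=0.7)
 print(completion.choices[0].message["content"])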

Presenter Notes

Temperature

  • The most likely next word might not be the best word
    • Other words might result in a more likely overall text
  • To help with this search process, LLMs expose a temperature parameter
    • A temperature of zero always produces the same result
    • A higher temperature produces more variation
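
A minimal sketch of the effect (the prompt here is a throwaway example): call the same prompt repeatedly at T=0 and at T=0.7 and compare the outputs.

 import openai

 def complete(prompt, temperature):
     """One single-turn chat completion at the given temperature."""
     response = openai.ChatCompletion.create(
         model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": prompt}],
         temperature=temperature)
     return response.choices[0].message["content"]

 # At T=0 repeated calls return (essentially) the same text;
 # at higher temperatures the outputs vary from run to run.
 for temperature in (0.0, 0.7):
     for run in range(3):
         print(temperature, run,
               complete("Describe rain in one sentence.", temperature))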

Presenter Notes

Output

Input:

Tweet: RT @TIME: NYC building had numerous construction 
complaints before crane collapse  #Sandy

Output 1 (T=0.7):

Infrastructure damage: Crane collapse on NYC building.

Output 2 (T=0.7):

Infrastructure damage: Crane collapse at NYC building 
due to construction complaints.

The complaints were the cause of the collapse???
That is not stated in the tweet. Made-up details
like this are called hallucinations.

Presenter Notes

Few shot

  • If the task is unusual or we want more control, we can incorporate training data into the prompt.
  • The examples in this very small amount of training data ("few shot") are usually called "exemplars" (see the prompt-building sketch below)
    • Picking quality exemplars is key
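
A minimal prompt-building sketch (the helper function and the task wording are ours, not a library API): assemble the exemplars and the new input into a single few-shot prompt.

 # Hypothetical (input, label) exemplars; in practice, pick them from
 # your annotated data.
 EXEMPLARS = [
     ("i was greatly moved when i watched the movie. ...", "positive"),
     ("this version was sadly disappointing ...", "negative"),
 ]

 def few_shot_prompt(task, exemplars, new_input):
     """Concatenate task description, exemplars, and the new input."""
     parts = [task]
     for text, label in exemplars:
         parts.append(f"Review: {text}\nLabel: {label}")
     parts.append(f"Review: {new_input}\nLabel:")
     return "\n\n".join(parts)

 prompt = few_shot_prompt("Label each review as positive or negative.",
                          EXEMPLARS, "An instant classic, I loved it.")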

Presenter Notes

Few shot exemplars

Example:

Annotate nouns and verbs in the sentences below using the 
examples as reference:

Example A. The General Assembly/n , which adjourns/v today/n , 
has/v performed/v in an atmosphere/n of crisis/n and struggle/n 
from the day/n it convened/v .

Example B. It was/v faced/v immediately with a showdown/n on the 
schools/n , an issue/n which was/v met/v squarely in conjunction/n 
with the governor/n with a decision/n not to risk/v abandoning/v 
public/n education/n .

Annotate:

1. There followed the historic appropriations and budget fight , 
in which the General Assembly decided to tackle executive powers .
2. The final decision went to the executive but a way has been 
opened for strengthening budgeting procedures and to provide 
legislators information they need .

Presenter Notes

Output

1. historic appropriations/n, budget fight/n, General Assembly/n, 
decided/v, tackle/v, executive powers/n

2. final decision/n, executive/n, way/n, opened/v, strengthening/v, 
budgeting procedures/n, provide/v, legislators/n, information/n

Will a better prompt get the format right?
Let's discuss that in the next section.

Presenter Notes

Output, GPT-4

1. There/n followed/v the historic/n appropriations/n and budget/n 
fight/n , in which the General/n Assembly/n decided/v to tackle/v 
executive/n powers/n .

2. The final/n decision/n went/v to the executive/n but a way/n 
has/v been/v opened/v for strengthening/v budgeting/n procedures/n 
and to provide/v legislators/n information/n they/n need/v .

Presenter Notes

Handling priors in exemplars

  • Quality exemplars should have class counts representative of the task at hand (see the sampling sketch below).
    • If 20% of the reviews are positive and 80% are negative:
      • Provide 10 exemplars
      • 2 positive
      • 8 negative
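
A minimal sampling sketch (the pool contents are placeholders): draw exemplars so the class ratio in the prompt matches the 20%/80% prior of the task.

 import random

 # Placeholder pools of labelled reviews from your annotated data.
 positive_pool = ["positive review 1", "positive review 2", "positive review 3"]
 negative_pool = ["negative review 1", "negative review 2", "negative review 3"]

 def sample_exemplars(n_total=10, positive_ratio=0.2):
     """Return exemplars whose class counts follow the task prior."""
     n_pos = round(n_total * positive_ratio)
     n_neg = n_total - n_pos
     pos = [(t, "positive") for t in random.choices(positive_pool, k=n_pos)]
     neg = [(t, "negative") for t in random.choices(negative_pool, k=n_neg)]
     exemplars = pos + neg
     random.shuffle(exemplars)  # avoid long single-class runs in the prompt
     return exemplars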

Presenter Notes

Chain-of-Thought (CoT)

  • A great technique to improve accuracy is to induce the model to work piecewise
    • Finding subproblems and solving them on its own
    • Using the text completion to guide the process
    • This is called "chain-of-thought" or CoT
  • This behaviour is easily activated by asking the system to "think step-by-step"
    • Add to the prompt the expression "Let's think step-by-step." (see the sketch below)
    • It is also possible to give CoT exemplars
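
A minimal zero-shot CoT sketch: the only change from a plain prompt is the appended trigger phrase (the task text is just an example).

 import openai

 task = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
         "more than the ball. How much does the ball cost?")
 # Appending the trigger phrase induces step-by-step reasoning.
 cot_prompt = task + "\n\nLet's think step-by-step."

 completion = openai.ChatCompletion.create(
     model="gpt-3.5-turbo",
     messages=[{"role": "user", "content": cot_prompt}])
 print(completion.choices[0].message["content"])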

Presenter Notes

LLM role

  • As part of the prompt, we can specify the attitude the LLM should take to solve the problem.

  • We can ask the LLM to be a

    • "helpful customer service representative" or say
    • "you are a lawyer for an employee about to sign a permanent employment contract"
  • For NLP tasks, we can ask the LLM to be a copyeditor or a linguist analysing text (see the sketch below)
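
In the chat API the role/persona goes into the "system" message; a minimal sketch (both message texts are just examples):

 import openai

 completion = openai.ChatCompletion.create(
     model="gpt-3.5-turbo",
     messages=[
         {"role": "system",
          "content": "You are a linguist analysing text."},
         {"role": "user",
          "content": "Annotate the nouns in: The cat sat on the mat."},
     ])
 print(completion.choices[0].message["content"])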

Presenter Notes

Recursing

  • Take a different prompt and send the output of the original prompt through the API again
    • By changing the roles and tasks we ask the LLM to perform, it is possible to greatly improve the output
      1. Dev role
      2. QA role
      3. Security engineer role
  • There is support for doing this programmatically (e.g., in frameworks such as LangChain); a plain-Python sketch follows:
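
A minimal sketch of the dev -> QA -> security-engineer chain (the helper function and the prompts are ours):

 import openai

 def ask(persona, content):
     """One chat call under a given persona; returns the reply text."""
     response = openai.ChatCompletion.create(
         model="gpt-3.5-turbo",
         messages=[{"role": "system", "content": persona},
                   {"role": "user", "content": content}])
     return response.choices[0].message["content"]

 # Each pass feeds on the output of the previous one.
 task = "Write a Python function that validates email addresses."
 draft = ask("You are a software developer.", task)
 review = ask("You are a QA engineer.",
              "Review this code and list any problems:\n\n" + draft)
 final = ask("You are a security engineer.",
             "Given this review:\n\n" + review +
             "\n\nfix any remaining issues in this code:\n\n" + draft)
 print(final)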

Presenter Notes

Section 3: Solving NLP tasks with OpenAI API

Presenter Notes

POS tagging

  • Revisiting POS tagging with the help of Google and Stack Overflow

How is ChatGPT able to perform part-of-speech tagging?

Prompt:

POS-tag: <sentence>

Example:

POS-tag: There followed the historic appropriations and budget 
fight , in which the General Assembly decided to tackle executive 
powers .

Output (T=0.7):

There/EX followed/VBD the/DT historic/JJ appropriations/NNS and/CC 
budget/NN fight/NN ,/, in/IN which/WDT the/DT General/NNP Assembly/NNP 
decided/VBD to/TO tackle/VB executive/JJ powers/NNS ./.
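
A minimal wrapper sketch around the prompt above (the function name is ours):

 import openai

 def pos_tag(sentence, temperature=0.7):
     """Send the 'POS-tag:' prompt from above and return the raw reply."""
     response = openai.ChatCompletion.create(
         model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "POS-tag: " + sentence}],
         temperature=temperature)
     return response.choices[0].message["content"]

 print(pos_tag("There followed the historic appropriations and budget "
               "fight , in which the General Assembly decided to tackle "
               "executive powers ."))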

Presenter Notes

Output is unstable

Same prompt, new output (T=0.7):

 Verb - past tense (followed)
 Determiner (the)
 Adjective (historic)
 Noun (appropriations)
 Conjunction (and)
 Noun (budget)
 Noun (fight)
 Preposition (in)
 Pronoun (which)
 Determiner (the)
 Noun (General Assembly)
 Verb (decided)
 Infinitive marker (to)
 Verb (tackle)
 Noun (executive powers)

Presenter Notes

Better prompt

Prompt:

 POS-tag Brown Corpus style: There followed the historic appropriations 
 and budget fight , in which the General Assembly decided to tackle 
 executive powers .

Output (T=0.7):

 There/EX followed/VBD the/DT historic/JJ appropriations/NNS and/CC 
 budget/NN fight/NN ,/, in/IN which/WDT the/DT General/NNP Assembly/NNP 
 decided/VBD to/TO tackle/VB executive/JJ powers/NNS ./.

Three repetitions produced the same output.

Note: those are not Brown Corpus tags but Penn Treebank tags!

Presenter Notes

Annotation Manual

It turns out annotating POS reliably is hard.

The Brown Corpus has a 20-page annotation manual (published 1964, revised 1979).

Here is an example:

5. Words occurring as constituents of titles, e.g. of books, plays, 
corporations, government agencies, etc., are given their normal tag 
with the addition of the hyphenated tag -TL. In most cases these words 
are capitalized, except for function-words such as prepositions, 
conjunctions, and sometimes pronouns. Some examples:

      the   United   States   of      America
            VBN-TL   NNS-TL   IN-TL   NP-TL

I suspect quality tagging for a new tagset will need at the very least a 20-page prompt.

Presenter Notes

NER prompt

Let's see whether an extension of the POS prompt works:

 NER-tag using MUC-6 tagset:  China is the largest Bitcoin market 
 in the world, and prices fell as low as 2,551 yuan (£257) for one 
 Bitcoin following the news. It had been trading as high as 4,136 
 yuan (£416) just hours before the announcement.

 The ban was issued by the central bank, the People’s Bank of China, 
 after it had moved to crack down on use of the currency this month.

MUC-6 only has named entities, temporal expressions, and number expressions.

Presenter Notes

NER Output is unstable

Output (T=0.7):

 ORG: People's Bank of China
 LOC: China
 MONEY: 2,551 yuan, £257, 4,136 yuan, £416
 MISC: Bitcoin

Output (T=0.7):

 ORG: People’s Bank of China
 MISC: Bitcoin, yuan
 LOC: China
 O: central bank

Output (T=0.7):

 B-LOC China O Bitcoin O \n
 O market O 
 O in O 
 O the O 
 O world O , O 
 O and O 
 O prices O 
 O fell O 
 O as O 
 O low O 
 O as O 
 B-MON 2,551 O 
 O yuan O

Presenter Notes

NER MUC-6 Locations

Using the MUC-6 guidelines for locations from 1995, we end up with a 51-line, 544-word prompt:

 You are a Linguistic Data Consortium annotator and need to annotate 
 text (marked as "TEXT TO ANNOTATE:") following the MUC-6 locations 
 guidelines defined as follows:

 2.1 Markup Description
 The output of the systems to be evaluated will be in the form of SGML 
 text markup. The only insertions allowed during tagging are tags 
 enclosed in angled brackets. No extra whitespace or carriage returns 
 are to be inserted; otherwise, the offset count would change, which 
 would adversely affect scoring.

 The markup will have the following form:

 <ELEMENT-NAME ATTR-NAME="ATTR-VALUE" ...>text-string</ELEMENT-NAME>

 Example:

 <ENAMEX TYPE="ORGANIZATION">Taga Co.</ENAMEX>

  The markup is defined in SGML Document Type Descriptions (DTDs), 
  written for MUC-6 use by personnel at MITRE and maintained by 
  personnel at NRaD. The DTDs enable...

Presenter Notes

ChatGPT Output

Output (T=0.7):

<LOCATION>China</LOCATION> is the largest Bitcoin market in the world,
and prices fell as low as 2,551 yuan (£257) for one Bitcoin following
the news. It had been trading as high as 4,136 yuan (£416) just hours
before the announcement.

The ban was issued by the <ORGANIZATION>central bank</ORGANIZATION>,
the <ORGANIZATION>People’s Bank of China</ORGANIZATION>, after it had
moved to crack down on use of the currency this month.

Output (T=0.7):

<LOCATION>China</LOCATION> is the largest
<ENAMEX TYPE="ORGANIZATION">Bitcoin</ENAMEX> market in the world, and
prices fell as low as 2,551 yuan (£257) for one
<ENAMEX TYPE="ORGANIZATION">Bitcoin</ENAMEX> following the news.
It had been trading as high as 4,136 yuan (£416) just hours before the
announcement.

The ban was issued by the
<ORGANIZATION><ENAMEX TYPE="ORGANIZATION">People’s Bank of
China</ENAMEX></ORGANIZATION>, after it had moved to crack down
on use of the currency this month.

Presenter Notes

GPT4 Output

Output (T=0.7):

<ENAMEX TYPE="LOCATION">China</ENAMEX> is the largest Bitcoin market
in the world, and prices fell as low as 2,551 yuan (£257) for one
Bitcoin following the news. It had been trading as high as 4,136 yuan
(£416) just hours before the announcement.

The ban was issued by the central bank, the
<ENAMEX TYPE="ORGANIZATION">People’s Bank of China</ENAMEX>, after it
had moved to crack down on use of the currency this month.

Same output over three different executions

Presenter Notes

IE prompt

  • Following CoNLL'04 we will look at relation extraction for relations such as
    • located in, work for, organization based in, live in

 You are a business analyst that needs to fill a table from news 
 sentences. The table contains people and their employers.
 sentences. The table contains people and their employers.

 Given the sentences:

 * W. Dale Nelson covers the White House for The Associated Press .
 * James G. Blight and David A. Welch of Harvard University 's John F. Kennedy School of Government say that ` ` if this order had held , war between the superpowers would probably have commenced at sea , shortly after 10 o 'clock on Wednesday morning , Oct. 24 , 1962 , several hundred miles off the coast of Cuba. ' '

 The table looks like

 | Person | Employer |
 |--------|----------|
 |W. Dale Nelson|The Associated Press|
 |James G. Blight|Harvard University|
 |David A. Welch|Harvard University|

Presenter Notes

IE Prompt (cont.)

 Now complete the table given the following sentences:

 * ` ` If it does not snow , and a lot , within this month we 
 will have no water to submerge 150 , 000 hectares ( 370 , 500 acres ) 
 of rice , ' ' said Bruno Pusterla , a top official of the Italian 
 Agricultural Confederation . 
 * Treiki disclosed the planned missile tests during Security Council 
 debate Tuesday night after the U.S. 6th Fleet informed Libyan air 
 traffic controllers so they could divert civilian air traffic .
 * Boston University 's Michael D. Papagiannis said he believes the 
 crater was created 100 million years ago when a 50-mile-wide meteorite 
 slammed into the Earth .

Presenter Notes

IE Output

  | Person | Employer |
  |--------|----------|
  |W. Dale Nelson|The Associated Press|
  |James G. Blight|Harvard University|
  |David A. Welch|Harvard University|
  |Bruno Pusterla|Italian Agricultural Confederation|
  |Treiki|Unknown|
  |Michael D. Papagiannis|Boston University|

Stable over three runs.
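
Since the output format is stable, it can be post-processed; a minimal parsing sketch (the helper is ours) that turns the returned markdown table into (person, employer) tuples:

 def parse_table(output):
     """Extract data rows from a markdown table in the model output."""
     rows = []
     for line in output.splitlines():
         line = line.strip()
         # Skip non-table lines and the |----|----| separator row.
         if not line.startswith("|") or set(line) <= {"|", "-", " "}:
             continue
         cells = [c.strip() for c in line.strip("|").split("|")]
         if cells and cells[0] != "Person":  # skip the header row
             rows.append(tuple(cells))
     return rows

 # e.g. parse_table(completion.choices[0].message["content"])
 # -> [('W. Dale Nelson', 'The Associated Press'), ...]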

Presenter Notes

Sentiment Analysis prompt

  Given this review, say whether the person liked or hated the movie.

  Bromwell High is a cartoon comedy. It ran at the same time as some 
  other programs about school life, such as "Teachers". My 35 years in 
  the teaching profession lead me to believe that Bromwell High's satire 
  is much closer to reality than is "Teachers". The scramble to survive 
  financially, the insightful students who can see right through their 
  ...

Tried one positive and one negative review; the output was correct and stable over three runs of ChatGPT (a wrapper sketch follows)
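
A minimal wrapper sketch (the function name and the brittle string matching are ours): send the prompt and map the free-text answer to a label.

 import openai

 def review_sentiment(review):
     """Return 'positive' or 'negative' for a movie review."""
     prompt = ("Given this review, say whether the person liked or hated "
               "the movie.\n\n" + review)
     response = openai.ChatCompletion.create(
         model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": prompt}],
         temperature=0)
     text = response.choices[0].message["content"].lower()
     # Crude mapping; a production system would constrain the output format.
     return "positive" if "liked" in text else "negative"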

Presenter Notes

Section 4: Using open source LLMs

Presenter Notes

Why Open Source LLMs

  • Lower cost
  • More control
    • Fine-tuning weights
    • Fine-tuning prompt
    • Training a classifier on top of results
  • Privacy
  • The right thing to do

Presenter Notes

Issues with Open Source models

  • Context window size
  • Dealing with numbers
  • Poor programming capabilities

  • But they are getting so much better that Google fears them.

Presenter Notes

Some models

  • EleutherAI GPT-NeoX (48G GPU)
  • OpenChatKit (instructional training)
    • OIG by LAION of Stable Diffusion fame
  • EleutherAI Pythia models
  • BLOOM models
    • Not open source (use restrictions): https://huggingface.co/spaces/bigscience/license
  • Dolly models, by Databricks
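
A minimal sketch of running one of these locally with Hugging Face transformers (the Pythia checkpoint is just an example; any open-weights model above works the same way, hardware permitting):

 from transformers import AutoModelForCausalLM, AutoTokenizer

 name = "EleutherAI/pythia-410m"  # example checkpoint; pick one that fits your GPU
 tokenizer = AutoTokenizer.from_pretrained(name)
 model = AutoModelForCausalLM.from_pretrained(name)

 inputs = tokenizer("POS-tag: There followed the historic fight .",
                    return_tensors="pt")
 outputs = model.generate(**inputs, max_new_tokens=50)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))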

Presenter Notes

Conclusions

  • This is a great time to enter NLP
    • LLMs are offering a much more level playing field
  • OpenAI API now, Open Source next week
    • LLMs running locally on your cellphone
  • Not all NLP tasks need to be solved by LLMs

Presenter Notes