Solving NLP Tasks using LLMs

Data Umbrella Webinar, 2023-05-17

Pablo Duboue, PhD

Presenter Notes

Land Acknowledgement

This talk is being held on the unceded traditional territories of the xʷməθkʷəy̓əm (Musqueam), Sḵwx̱wú7mesh (Squamish), and səlilwətaɬ (Tsleil-Waututh) Nations.

Presenter Notes

This talk

  • NLP, now and then
  • NLP Tasks
  • Prompting 101
  • NLP with OpenAI API
  • Open Source LLMs

Available at https://textualization.com/talks/20230517dataumbrella.html

Presenter Notes

NLP, now and then

  • Rule-based
  • Statistical
  • Deep learning
  • LLMs

Presenter Notes

Rule-based

  • IF-THEN-ELSE
  • Regular Expressions
  • Wouldn't be considered NLP these days

  • Very much alive, full end-to-end problems solved this way
  • We just don't talk about those things in this house, we have LLMs!

Presenter Notes

Statistical

  • Let's assume the meanings of words are independent of their order and see how far we can take that
    • It turns out, very far
    • Gets you 60% of the way for most difficult problems
  • With massive human effort, it can take you all the way
    • That human effort is feature engineering, one of my favourite topics

  • The Watson Jeopardy! system was its highest point

Presenter Notes

Deep Learning

  • Language is clearly non-linear
  • Use large amounts of data to infer intermediate representations
    • One or two orders of magnitude more data than statistical methods
  • Can get you 80% of the way
    • With massive datasets and computing power it might get you all the way

Presenter Notes

What is a Language Model

  • A language model tells you the probability of new words given already seen words.
    • The most famous example is the predictive-text (autocomplete) functionality of your cellphone
      • "How does ChatGPT know XYZ?"... the same way your phone knows that after your first name your last name most probably follows.
    • With better models and much more training data, they work much better
  • Something happens when the language model gets really big

Presenter Notes

LLMs (aka "ChatGPT")

  • More Is Different, by P.W. Anderson, Science (1972)
  • A large language model all of a sudden is more than just a language model

  • Emergent behaviour
  • Reasoning capabilities

  • Are they a mirage?
    • Are Emergent Abilities of Large Language Models a Mirage? by Schaeffer, Miranda, Koyejo at arXiv (2023)

Presenter Notes

More than just language

  • Besides naturally occurring text, LLMs use two other sources:
  1. Instructions
    • Meta-text, text about tasks that the model needs to do
    • "Summarize this text", "Make it sound less obnoxious", etc
  2. Human feedback (RLHF, reinforcement learning from human feedback)
    • Humans are asked whether the behaviour of the system (the text produced) is good or bad

Presenter Notes

About the speaker

  • Been around doing research in NLP/ML for the last 25 years

    I've seen things you people wouldn't believe... Attack ships on fire off the shoulder of Orion...

  • Corporate research scientist for 6 years

    • Helped build the IBM Jeopardy! Watson system
  • About 50 peer-reviewed papers and patents
  • Have a one person company in Vancouver, Canada
    • Consulting mostly for startups
  • I wrote a book on Feature Engineering, published in 2019 by Cambridge University Press

Presenter Notes

About this talk

Presenter Notes

Section 1: NLP tasks

Presenter Notes

Part-of-Speech (POS) tagging

  • The most basic task in traditional NLP
    • Telling apart nouns, verbs, and function words such as articles and prepositions.
  • While it sounds like a very simple task, many words are ambiguous and their POS depends on their role in the sentence.
    • As such, it is a pre-parsing task.
  • Extracting lists of nouns or verbs can be useful on its own.
    • Finding the most common verb between two nouns can give an idea of the type of link between the nouns.

Presenter Notes

POS tagging example

<p><s n="1"><w type="PPS">It</w> <w type="BEZ">is</w>
  <w type="NEG">not</w> <w type="NN">news</w> <w type="CS">that</w>
  <w type="NP">Nathan</w> <w type="NP">Milstein</w>
  <w type="BEZ">is</w> <w type="AT">a</w> <w type="NN">wizard</w>
  <w type="IN">of</w> <w type="AT">the</w> <w type="NN">violin</w>
  <c type="pct">.</c> </s>
<s n="2"><w type="RB">Certainly</w> <w type="NEG">not</w>
  <w type="IN">in</w> <w type="NN" subtype="TL">Orchestra</w>
  <w type="NN" subtype="TL">Hall</w> <w type="WRB">where</w>
  <w type="PPS">he</w> <w type="HVZ">has</w> <w type="VBN">played</w>
  <w type="JJ">countless</w> <w type="NNS">recitals</w>
  <c type="pct">,</c> <w type="CC">and</w> <w type="WRB">where</w>
  <w type="NR">Thursday</w> <w type="NN">night</w>
  <w type="PPS">he</w> <w type="VBD">celebrated</w>
  <w type="PPg">his</w> <w type="OD">20th</w> <w type="NN">season</w>
  <w type="IN">with</w> <w type="AT">the</w>
  <w type="NP" subtype="TL">Chicago</w>
  <w type="NN" subtype="TL">Symphony</w>
  <w type="NN" subtype="TL">Orchestra</w> <c type="pct">,</c>

Presenter Notes

Named Entity Recognition (NER)

  • In NLP, a named entity is an object, person, or location in the real world that can be denoted by a proper name
    • Usually found in long lists of names, called gazetteers
    • Example types: Person, Organization, Location, Vehicle, Geo-Political Entity
  • The NER task involves identifying such named entities in running text.

  • If the entity is unambiguous, finding it is as simple as a dictionary lookup.
    • It gets tricky when there are confounding entities ("does Obama refer to Barack or Michelle?")
    • Or when the type of the entity is ambiguous ("does IBM refer to the company or a particular location?")

Presenter Notes

NER example

[Fred Flintstone]person was named [CTO]position of [Time Bank Inc.]organization in [2031]date . The [next year] [he] got married and became [CEO]position of [Dinosaur Savings & Loan]organization .

Presenter Notes

Information Extraction (IE)

  • A series of techniques to extract structured data from text
    • Think of it as "Word files into Excel spreadsheets"
  • Expand on NER by incorporating relations between entities.

  • The ultimate task is slot filling
    • Finding all the available data for a particular event in a text description

Presenter Notes

IE example

Presenter Notes

Sentiment Analysis

  • Determine whether a piece of text voices a positive or negative opinion
    • Or no opinion at all
  • At its core, it is a text classification problem

  • It gets difficult when we want to know the opinion with respect to a particular entity
    • People thanking the Red Cross on Twitter while talking about the devastation of a hurricane

Presenter Notes

Sentiment Analysis example

  • Positive (Large Movie Review Dataset v1.0)

    i was greatly moved when i watched the movie.how jonny could keep such hope and faith was amazing. so many people only care about what they want , and fuss about all the things they don't have . ...

  • Negative

    My roommate got the No, No, Nanette soundtrack as a dub on a tape and she proceeded to listen to it non-stop. After it finally totally brainwashed me into submission, I found the songs to be irresistible, especially the famous, I want to be happy, but I can't be happy... But of coarse from the soundtrack I had no idea what the film was about. So the other day I saw a copy of it at the video store and I rented what was supposed to be a long lost version of the film. I was thinking that it was going to be amazing, because the soundtrack is so cute. Unfortunately most of the songs that I loved were nowhere to be found in the video I saw. Now I've never seen the 1930 version of the musical but this version was sadly disappointing because there was very little singing and practically no dancing and beside that the sound was really bad through out and you couldn't really understand what people were saying a lot of the time

Presenter Notes

Section 2: Prompting 101

Presenter Notes

OpenAI API

  • Won't be talking about internals of LLMs
  • At https://platform.openai.com/signup get an API key
    • You will need to provide a phone number
    • Most probably a credit card number, too
    • Might get some free credits for 30 days

Presenter Notes

OpenAI API CLI

  • With the API key, the completions endpoint can be called from the command line using curl:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'

Presenter Notes

OpenAI API from Python

  • With the API key, the completions endpoint can be called from Python as follows:

import openai
openai.api_key = "YOUR KEY"
completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[{"role": "user", "content": "Hello!"}])
print(completion.choices[0].message)

Presenter Notes

Zero shot

  • The first task is to write down what we want the system to do (the "prompt") and see how well it does it.
    • "Zero shot" refers to the fact that we are not giving the LLM anything else
    • No examples
    • No fine-tuning of the weights

Example:

Given the following tweet, identify any infrastructure damage and 
place it in a line that starts with "Infrastructure damage:". 
If no damage is present, say "Infrastructure damage: -".

Tweet: RT @TIME: NYC building had numerous construction 
complaints before crane collapse  #Sandy
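
A minimal sketch of sending this zero-shot prompt through the Python client shown earlier (the model name and temperature match the deck's other examples; the prompt string is the one above):

 import openai

 openai.api_key = "YOUR KEY"

 PROMPT = """Given the following tweet, identify any infrastructure damage and
 place it in a line that starts with "Infrastructure damage:".
 If no damage is present, say "Infrastructure damage: -".

 Tweet: RT @TIME: NYC building had numerous construction
 complaints before crane collapse  #Sandy"""

 completion = openai.ChatCompletion.create(
     model="gpt-3.5-turbo",
     messages=[{"role": "user", "content": PROMPT}],
     temperature=0.7)
 print(completion.choices[0].message["content"])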

Presenter Notes

Temperature

  • The most likely next word might not be the best word
    • Other words might result in a more likely overall text
  • To help with this search process, LLMs expose a temperature parameter
    • A temperature of zero always produces the same result
    • A higher temperature produces more variation
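
A minimal sketch of the effect (the prompt here is a throwaway example): call the same prompt repeatedly at T=0 and at T=0.7 and compare the outputs.

 import openai

 def complete(prompt, temperature):
     """One single-turn chat completion at the given temperature."""
     response = openai.ChatCompletion.create(
         model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": prompt}],
         temperature=temperature)
     return response.choices[0].message["content"]

 # At T=0 repeated calls return (essentially) the same text;
 # at higher temperatures the outputs vary from run to run.
 for temperature in (0.0, 0.7):
     for run in range(3):
         print(temperature, run,
               complete("Describe rain in one sentence.", temperature))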

Presenter Notes

Output

Input:

Tweet: RT @TIME: NYC building had numerous construction 
complaints before crane collapse  #Sandy

Output 1 (T=0.7):

Infrastructure damage: Crane collapse on NYC building.

Output 2 (T=0.7):

Infrastructure damage: Crane collapse at NYC building 
due to construction complaints.

The complaints were the cause of the collapse???
That is not stated in the tweet. Made-up details
like this are called hallucinations.

Presenter Notes

Few shot

  • If the task is unusual or we want more control, we can incorporate training data into the prompt.
  • The examples in this very small amount of training data ("few shot") are usually called "exemplars" (see the prompt-building sketch below)
    • Picking quality exemplars is key
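
A minimal prompt-building sketch (the helper function and the task wording are ours, not a library API): assemble the exemplars and the new input into a single few-shot prompt.

 # Hypothetical (input, label) exemplars; in practice, pick them from
 # your annotated data.
 EXEMPLARS = [
     ("i was greatly moved when i watched the movie. ...", "positive"),
     ("this version was sadly disappointing ...", "negative"),
 ]

 def few_shot_prompt(task, exemplars, new_input):
     """Concatenate task description, exemplars, and the new input."""
     parts = [task]
     for text, label in exemplars:
         parts.append(f"Review: {text}\nLabel: {label}")
     parts.append(f"Review: {new_input}\nLabel:")
     return "\n\n".join(parts)

 prompt = few_shot_prompt("Label each review as positive or negative.",
                          EXEMPLARS, "An instant classic, I loved it.")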

Presenter Notes

Few shot exemplars

Example:

Annotate nouns and verbs in the sentences below using the 
examples as reference:

Example A. The General Assembly/n , which adjourns/v today/n , 
has/v performed/v in an atmosphere/n of crisis/n and struggle/n 
from the day/n it convened/v .

Example B. It was/v faced/v immediately with a showdown/n on the 
schools/n , an issue/n which was/v met/v squarely in conjunction/n 
with the governor/n with a decision/n not to risk/v abandoning/v 
public/n education/n .

Annotate:

1. There followed the historic appropriations and budget fight , 
in which the General Assembly decided to tackle executive powers .
2. The final decision went to the executive but a way has been 
opened for strengthening budgeting procedures and to provide 
legislators information they need .

Presenter Notes

Output

1. historic appropriations/n, budget fight/n, General Assembly/n, 
decided/v, tackle/v, executive powers/n

2. final decision/n, executive/n, way/n, opened/v, strengthening/v, 
budgeting procedures/n, provide/v, legislators/n, information/n

Will a better prompt get the format right?
Let's discuss that in the next section.

Presenter Notes

Output, GPT-4

1. There/n followed/v the historic/n appropriations/n and budget/n 
fight/n , in which the General/n Assembly/n decided/v to tackle/v 
executive/n powers/n .

2. The final/n decision/n went/v to the executive/n but a way/n 
has/v been/v opened/v for strengthening/v budgeting/n procedures/n 
and to provide/v legislators/n information/n they/n need/v .

Presenter Notes

Handling priors in exemplars

  • Quality exemplars should have class counts representative of the task at hand (see the sampling sketch below).
    • If 20% of the reviews are positive and 80% are negative:
      • Provide 10 exemplars
      • 2 positive
      • 8 negative
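
A minimal sampling sketch (the pool contents are placeholders): draw exemplars so the class ratio in the prompt matches the 20%/80% prior of the task.

 import random

 # Placeholder pools of labelled reviews from your annotated data.
 positive_pool = ["positive review 1", "positive review 2", "positive review 3"]
 negative_pool = ["negative review 1", "negative review 2", "negative review 3"]

 def sample_exemplars(n_total=10, positive_ratio=0.2):
     """Return exemplars whose class counts follow the task prior."""
     n_pos = round(n_total * positive_ratio)
     n_neg = n_total - n_pos
     pos = [(t, "positive") for t in random.choices(positive_pool, k=n_pos)]
     neg = [(t, "negative") for t in random.choices(negative_pool, k=n_neg)]
     exemplars = pos + neg
     random.shuffle(exemplars)  # avoid long single-class runs in the prompt
     return exemplars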

Presenter Notes

Chain-of-Thought (CoT)

  • A great technique to improve accuracy is to induce the model to work piecewise
    • Finding subproblems and solving them on its own
    • Using the text completion to guide the process
    • This is called "chain-of-thought" or CoT
  • This behaviour is easily activated by asking the system to "think step-by-step"
    • Add to the prompt the expression "Let's think step-by-step." (see the sketch below)
    • It is also possible to give CoT exemplars
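
A minimal zero-shot CoT sketch: the only change from a plain prompt is the appended trigger phrase (the task text is just an example).

 import openai

 task = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
         "more than the ball. How much does the ball cost?")
 # Appending the trigger phrase induces step-by-step reasoning.
 cot_prompt = task + "\n\nLet's think step-by-step."

 completion = openai.ChatCompletion.create(
     model="gpt-3.5-turbo",
     messages=[{"role": "user", "content": cot_prompt}])
 print(completion.choices[0].message["content"])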

Presenter Notes

LLM role

  • As part of the prompt, we can specify the attitude the LLM should take to solve the problem.

  • We can ask the LLM to be a

    • "helpful customer service representative" or say
    • "you are a lawyer for an employee about to sign a permanent employment contract"
  • For NLP tasks, we can ask the LLM to be a copyeditor or a linguist analysing text (see the sketch below)
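
In the chat API the role/persona goes into the "system" message; a minimal sketch (both message texts are just examples):

 import openai

 completion = openai.ChatCompletion.create(
     model="gpt-3.5-turbo",
     messages=[
         {"role": "system",
          "content": "You are a linguist analysing text."},
         {"role": "user",
          "content": "Annotate the nouns in: The cat sat on the mat."},
     ])
 print(completion.choices[0].message["content"])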

Presenter Notes

Recursing

  • Take a different prompt and send the output of the original prompt through the API again
    • By changing the roles and tasks we ask the LLM to perform, it is possible to greatly improve the output
      1. Dev role
      2. QA role
      3. Security engineer role
  • There is support for doing this programmatically (e.g., in frameworks such as LangChain); a plain-Python sketch follows:
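
A minimal sketch of the dev -> QA -> security-engineer chain (the helper function and the prompts are ours):

 import openai

 def ask(persona, content):
     """One chat call under a given persona; returns the reply text."""
     response = openai.ChatCompletion.create(
         model="gpt-3.5-turbo",
         messages=[{"role": "system", "content": persona},
                   {"role": "user", "content": content}])
     return response.choices[0].message["content"]

 # Each pass feeds on the output of the previous one.
 task = "Write a Python function that validates email addresses."
 draft = ask("You are a software developer.", task)
 review = ask("You are a QA engineer.",
              "Review this code and list any problems:\n\n" + draft)
 final = ask("You are a security engineer.",
             "Given this review:\n\n" + review +
             "\n\nfix any remaining issues in this code:\n\n" + draft)
 print(final)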

Presenter Notes

Section 3: Solving NLP tasks with OpenAI API

Presenter Notes

POS tagging

  • Revisiting POS tagging with the help of Google and Stack Overflow

How is ChatGPT able to perform part-of-speech tagging?

Prompt:

POS-tag: <sentence>

Example:

POS-tag: There followed the historic appropriations and budget 
fight , in which the General Assembly decided to tackle executive 
powers .

Output (T=0.7):

There/EX followed/VBD the/DT historic/JJ appropriations/NNS and/CC 
budget/NN fight/NN ,/, in/IN which/WDT the/DT General/NNP Assembly/NNP 
decided/VBD to/TO tackle/VB executive/JJ powers/NNS ./.
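
A minimal wrapper sketch around the prompt above (the function name is ours):

 import openai

 def pos_tag(sentence, temperature=0.7):
     """Send the 'POS-tag:' prompt from above and return the raw reply."""
     response = openai.ChatCompletion.create(
         model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": "POS-tag: " + sentence}],
         temperature=temperature)
     return response.choices[0].message["content"]

 print(pos_tag("There followed the historic appropriations and budget "
               "fight , in which the General Assembly decided to tackle "
               "executive powers ."))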

Presenter Notes

Output is unstable

Same prompt, new output (T=0.7):

 Verb - past tense (followed)
 Determiner (the)
 Adjective (historic)
 Noun (appropriations)
 Conjunction (and)
 Noun (budget)
 Noun (fight)
 Preposition (in)
 Pronoun (which)
 Determiner (the)
 Noun (General Assembly)
 Verb (decided)
 Infinitive marker (to)
 Verb (tackle)
 Noun (executive powers)

Presenter Notes

Better prompt

Prompt:

 POS-tag Brown Corpus style: There followed the historic appropriations 
 and budget fight , in which the General Assembly decided to tackle 
 executive powers .

Output (T=0.7):

 There/EX followed/VBD the/DT historic/JJ appropriations/NNS and/CC 
 budget/NN fight/NN ,/, in/IN which/WDT the/DT General/NNP Assembly/NNP 
 decided/VBD to/TO tackle/VB executive/JJ powers/NNS ./.

Three repetitions produced the same output.

Note: those are not Brown Corpus tags but Penn Treebank tags!

Presenter Notes

Annotation Manual

It turns out annotating POS reliably is hard.

The Brown Corpus has a 20-page annotation manual (published 1964, revised 1979).

Here is an example:

5. Words occurring as constituents of titles, e.g. of books, plays, 
corporations, government agencies, etc., are given their normal tag 
with the addition of the hyphenated tag -TL. In most cases these words 
are capitalized, except for function-words such as prepositions, 
conjunctions, and sometimes pronouns. Some examples:

      the   United   States   of      America
            VBN-TL   NNS-TL   IN-TL   NP-TL

I suspect quality tagging for a new tagset will need at the very least a 20-page prompt.

Presenter Notes

NER prompt

Let's see whether an extension of the POS prompt works:

 NER-tag using MUC-6 tagset:  China is the largest Bitcoin market 
 in the world, and prices fell as low as 2,551 yuan (£257) for one 
 Bitcoin following the news. It had been trading as high as 4,136 
 yuan (£416) just hours before the announcement.

 The ban was issued by the central bank, the People’s Bank of China, 
 after it had moved to crack down on use of the currency this month.

MUC-6 only has named entities, temporal expressions, and number expressions.

Presenter Notes

NER Output is unstable

Output (T=0.7):

 ORG: People's Bank of China
 LOC: China
 MONEY: 2,551 yuan, £257, 4,136 yuan, £416
 MISC: Bitcoin

Output (T=0.7):

 ORG: People’s Bank of China
 MISC: Bitcoin, yuan
 LOC: China
 O: central bank

Output (T=0.7):

 B-LOC China O Bitcoin O \n
 O market O 
 O in O 
 O the O 
 O world O , O 
 O and O 
 O prices O 
 O fell O 
 O as O 
 O low O 
 O as O 
 B-MON 2,551 O 
 O yuan O

Presenter Notes

NER MUC-6 Locations

Using the MUC-6 guidelines for locations from 1995, we end up with a 51-line, 544-word prompt:

 You are a Linguistic Data Consortium annotator and need to annotate 
 text (marked as "TEXT TO ANNOTATE:") following the MUC-6 locations 
 guidelines defined as follows:

 2.1 Markup Description
 The output of the systems to be evaluated will be in the form of SGML 
 text markup. The only insertions allowed during tagging are tags 
 enclosed in angled brackets. No extra whitespace or carriage returns 
 are to be inserted; otherwise, the offset count would change, which 
 would adversely affect scoring.

 The markup will have the following form:

 <ELEMENT-NAME ATTR-NAME="ATTR-VALUE" ...>text-string</ELEMENT-NAME>

 Example:

 <ENAMEX TYPE="ORGANIZATION">Taga Co.</ENAMEX>

  The markup is defined in SGML Document Type Descriptions (DTDs), 
  written for MUC-6 use by personnel at MITRE and maintained by 
  personnel at NRaD. The DTDs enable...

Presenter Notes

ChatGPT Output

Output (T=0.7):

<LOCATION>China</LOCATION> is the largest Bitcoin market in the world,
and prices fell as low as 2,551 yuan (£257) for one Bitcoin following
the news. It had been trading as high as 4,136 yuan (£416) just hours
before the announcement.

The ban was issued by the <ORGANIZATION>central bank</ORGANIZATION>,
the <ORGANIZATION>People’s Bank of China</ORGANIZATION>, after it had
moved to crack down on use of the currency this month.

Output (T=0.7):

<LOCATION>China</LOCATION> is the largest
<ENAMEX TYPE="ORGANIZATION">Bitcoin</ENAMEX> market in the world, and
prices fell as low as 2,551 yuan (£257) for one
<ENAMEX TYPE="ORGANIZATION">Bitcoin</ENAMEX> following the news.
It had been trading as high as 4,136 yuan (£416) just hours before the
announcement.

The ban was issued by the
<ORGANIZATION><ENAMEX TYPE="ORGANIZATION">People’s Bank of
China</ENAMEX></ORGANIZATION>, after it had moved to crack down
on use of the currency this month.

Presenter Notes

GPT4 Output

Output (T=0.7):

<ENAMEX TYPE="LOCATION">China</ENAMEX> is the largest Bitcoin market
in the world, and prices fell as low as 2,551 yuan (£257) for one
Bitcoin following the news. It had been trading as high as 4,136 yuan
(£416) just hours before the announcement.

The ban was issued by the central bank, the
<ENAMEX TYPE="ORGANIZATION">People’s Bank of China</ENAMEX>, after it
had moved to crack down on use of the currency this month.

Same output over three different executions

Presenter Notes

IE prompt

  • Following CoNLL'04 we will look at relation extraction for relations such as
    • located in, work for, organization based in, live in

 You are a business analyst that needs to fill a table from news 
 sentences. The table contains people and their employers.
 sentences. The table contains people and their employers.

 Given the sentences:

 * W. Dale Nelson covers the White House for The Associated Press .
 * James G. Blight and David A. Welch of Harvard University 's John F. Kennedy School of Government say that ` ` if this order had held , war between the superpowers would probably have commenced at sea , shortly after 10 o 'clock on Wednesday morning , Oct. 24 , 1962 , several hundred miles off the coast of Cuba. ' '

 The table looks like

 | Person | Employer |
 |--------|----------|
 |W. Dale Nelson|The Associated Press|
 |James G. Blight|Harvard University|
 |David A. Welch|Harvard University|

Presenter Notes

IE Prompt (cont.)

 Now complete the table given the following sentences:

 * ` ` If it does not snow , and a lot , within this month we 
 will have no water to submerge 150 , 000 hectares ( 370 , 500 acres ) 
 of rice , ' ' said Bruno Pusterla , a top official of the Italian 
 Agricultural Confederation . 
 * Treiki disclosed the planned missile tests during Security Council 
 debate Tuesday night after the U.S. 6th Fleet informed Libyan air 
 traffic controllers so they could divert civilian air traffic .
 * Boston University 's Michael D. Papagiannis said he believes the 
 crater was created 100 million years ago when a 50-mile-wide meteorite 
 slammed into the Earth .

Presenter Notes

IE Output

  | Person | Employer |
  |--------|----------|
  |W. Dale Nelson|The Associated Press|
  |James G. Blight|Harvard University|
  |David A. Welch|Harvard University|
  |Bruno Pusterla|Italian Agricultural Confederation|
  |Treiki|Unknown|
  |Michael D. Papagiannis|Boston University|

Stable over three runs.
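
Since the output format is stable, it can be post-processed; a minimal parsing sketch (the helper is ours) that turns the returned markdown table into (person, employer) tuples:

 def parse_table(output):
     """Extract data rows from a markdown table in the model output."""
     rows = []
     for line in output.splitlines():
         line = line.strip()
         # Skip non-table lines and the |----|----| separator row.
         if not line.startswith("|") or set(line) <= {"|", "-", " "}:
             continue
         cells = [c.strip() for c in line.strip("|").split("|")]
         if cells and cells[0] != "Person":  # skip the header row
             rows.append(tuple(cells))
     return rows

 # e.g. parse_table(completion.choices[0].message["content"])
 # -> [('W. Dale Nelson', 'The Associated Press'), ...]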

Presenter Notes

Sentiment Analysis prompt

  Given this review, say whether the person liked or hated the movie.

  Bromwell High is a cartoon comedy. It ran at the same time as some 
  other programs about school life, such as "Teachers". My 35 years in 
  the teaching profession lead me to believe that Bromwell High's satire 
  is much closer to reality than is "Teachers". The scramble to survive 
  financially, the insightful students who can see right through their 
  ...

Tried one positive and one negative review; the output was correct and stable over three runs of ChatGPT (a wrapper sketch follows)
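
A minimal wrapper sketch (the function name and the brittle string matching are ours): send the prompt and map the free-text answer to a label.

 import openai

 def review_sentiment(review):
     """Return 'positive' or 'negative' for a movie review."""
     prompt = ("Given this review, say whether the person liked or hated "
               "the movie.\n\n" + review)
     response = openai.ChatCompletion.create(
         model="gpt-3.5-turbo",
         messages=[{"role": "user", "content": prompt}],
         temperature=0)
     text = response.choices[0].message["content"].lower()
     # Crude mapping; a production system would constrain the output format.
     return "positive" if "liked" in text else "negative"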

Presenter Notes

Section 4: Using open source LLMs

Presenter Notes

Why Open Source LLMs

  • Lower cost
  • More control
    • Fine-tuning weights
    • Fine-tuning prompt
    • Training a classifier on top of results
  • Privacy
  • The right thing to do

Presenter Notes

Issues with Open Source models

  • Context window size
  • Dealing with numbers
  • Poor programming capabilities

  • But they are getting so much better that Google fears them.

Presenter Notes

Some models

  • EleutherAI GPT-NeoX (48G GPU)
  • OpenChatKit (instructional training)
    • OIG by LAION of Stable Diffusion fame
  • EleutherAI Pythia models
  • BLOOM models
    • Not open source (use restrictions): https://huggingface.co/spaces/bigscience/license
  • Dolly models, by Databricks
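
A minimal sketch of running one of these locally with Hugging Face transformers (the Pythia checkpoint is just an example; any open-weights model above works the same way, hardware permitting):

 from transformers import AutoModelForCausalLM, AutoTokenizer

 name = "EleutherAI/pythia-410m"  # example checkpoint; pick one that fits your GPU
 tokenizer = AutoTokenizer.from_pretrained(name)
 model = AutoModelForCausalLM.from_pretrained(name)

 inputs = tokenizer("POS-tag: There followed the historic fight .",
                    return_tensors="pt")
 outputs = model.generate(**inputs, max_new_tokens=50)
 print(tokenizer.decode(outputs[0], skip_special_tokens=True))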

Presenter Notes

Conclusions

  • This is a great time to enter NLP
    • LLMs are offering a much more level playing field
  • OpenAI API now, Open Source next week
    • LLMs running locally on your cellphone
  • Not all NLP tasks need to be solved by LLMs

Presenter Notes