
On the Use of Large Language Models for Table Tasks
- Introduction

Haochen Zhang, Chuan Xiao, Yuyang Dong

Tabular data is everywhere

(a) Relational databases
(b) Rich documents, PDF
(c) Web pages
(d) Spreadsheets

Growing research focus

• Growing quickly in the DB, AI, and NLP communities (result from [4])
• Recent tutorials
  • [1] Web table extraction, retrieval and augmentation, SIGIR 19
  • [2] From Tables to Knowledge: Recent Advances in Table Understanding, KDD 21
  • [3] Transformers for Tabular Data Representation: A Tutorial on Models and Applications, VLDB 22, SIGMOD 23
  • [4] Large Language Models for Tabular Data: Progresses and Future Directions, SIGIR 24
• A-Paper-List-of-Awesome-Tabular-LLMs, /SpursGoZmy/Awesome-Tabular-LLMs

Table tasks & benchmarks

Table preprocessing ("Prepare tables")
• Table matching
  • Entity matching
  • Schema matching
• Table cleaning
  • Error detection
  • Data imputation
• Table augmentation
  • Row population
  • Schema augmentation
• Table search
• Table transformation

Table understanding ("Understand tables")
• Table interpretation
  • Entity linking
  • Column type annotation
  • Relation extraction
• Table detection

Table analysis ("Get answers from tables")
• Table QA
• Table fact verification
• Table-to-text
• Text-to-SQL

Table preprocessing: Table matching

• Entity matching: "Matching two rows"
• Schema matching: "Matching two columns"

Example tables to be matched:

id | name      | rev
1  | IBM Corp  | $57B
2  | Apple Inc | $366B
3  | GE        | $74B

id | name  | loc | # of employees
1  | Apple | CA  | 154,000
2  | IBM   | NY  | 282,000

• Dataset and benchmark
  1. Can Foundation Models Wrangle Your Data? [VLDB 23]. /abs/2205.09911
  2. Jellyfish: Instruction-Tuning Local Large Language Models for Data Preprocessing [EMNLP 24]. /abs/2312.01678

Table preprocessing: Table cleaning

• Error detection: "Detect the error cells" (inconsistency, missing value, typo, format, outlier)
• Data imputation: "Impute values into cells"

Input table (with errors):

ID | Name | Date of Birth | Prefecture | Postal Code | Height
1  | Yuka | 2003/02/26    | Hokkaido   | 540-8570    | 165
2  | Nana |               | Aichi      | 464-0804    | 157
3  | Miho | 2001/06/25    | Kangawa    | 2208799     | 1.60

Cleaned table:

ID | Name | Date of Birth | Prefecture | Postal Code | Height
1  | Yuka | 2003/02/26    | Osaka      | 540-8570    | 165
2  | Nana | 2003/03/30    | Aichi      | 464-0804    | 157
3  | Miho | 2001/06/25    | Kanagawa   | 220-8799    | 160

• Dataset and benchmark
  1. Can Foundation Models Wrangle Your Data? [VLDB 23]. /abs/2205.09911
  2. Jellyfish: Instruction-Tuning Local Large Language Models for Data Preprocessing [EMNLP 24]. /abs/2312.01678

Table preprocessing: Table augmentation

"Add columns/rows to the table"
• Column population
• Row population

id | name  | loc | # of employees
1  | Apple | CA  | 154,000
2  | IBM   | NY  | 282,000

• Dataset and benchmark
  1. TURL: Table Understanding through Representation Learning [VLDB 20]. /abs/2006.14806
  2. TableLlama: Towards Open Large Generalist Models for Tables [NAACL 24]. https://osu-nlp-group.github.io/TableLlama/

Table preprocessing: Table search

"Retrieve tables with an NL query", e.g., "Tables containing information about Apple Inc."
Search over a table collection (tables, data lakes, documents) and return matching tables such as:

id | name  | loc | # of employees
1  | Apple | CA  | 154,000
2  | IBM   | NY  | 282,000

• Dataset and benchmark
  1. Open-Domain Table Retrieval for Natural Questions. /zorazrw/nqt-retrieval
  2. Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table [EMNLP 23]. /sean0042/Open_WikiTable

Table preprocessing: Table transformation

"Manipulate tables into wanted styles"

• Dataset and benchmark
  1. SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [NeurIPS 24]. https://spreadsheetbench.github.io/

Table understanding: Table interpretation

• Column type annotation: "Classify columns into defined types" (e.g., the name column is of type Company)
• Entity linking: "Match an entity to a knowledge base" (e.g., Apple -> /wiki/Apple_Inc)
• Relation extraction: "Extract and predict the relation between two columns" (e.g., name and loc: organization.headquarters-location)

Example table:

id | name  | loc | # of employees
1  | Apple | CA  | 154,000
2  | IBM   | NY  | 282,000

• Dataset and benchmark
  1. TURL: Table Understanding through Representation Learning [VLDB 20]. /abs/2006.14806
  2. TableLlama: Towards Open Large Generalist Models for Tables [NAACL 24]. https://osu-nlp-group.github.io/TableLlama/
  3. Column Type Annotation using ChatGPT [arXiv 24]. /abs/2306.0071405

Table understanding: Table detection

"Detect table region, structure, and content"

• Dataset and benchmark
  1. PubTables-1M: Towards comprehensive table extraction from unstructured documents [CVPR 22]. /microsoft/table-transformer
  2. TableFormer: Table Structure Understanding with Transformers [CVPR 22]. /IBM/TabFormer
  3. Docling Technical Report [arXiv 24]. /DS4SD/docling-ibm-models

Table analysis: Table QA

"Question answering on tabular contents"

id | name  | loc | # of employees
1  | Apple | CA  | 154,000
2  | IBM   | NY  | 282,000

Question 1: Where is the location of IBM?
Answer 1: New York
Question 2: What is the sum of employees in Apple and IBM?
Answer 2: 436,000

• Dataset and benchmark
  1. TableLlama: Towards Open Large Generalist Models for Tables [NAACL 24]. https://osu-nlp-group.github.io/TableLlama/
  2. https://huggingface.co/datasets/SpursgoZmy/IFT-Data-For-Tabular-Tasks

Table analysis: Table fact verification

"Verify a given sentence according to the table"

id | name  | loc | # of employees
1  | Apple | CA  | 154,000
2  | IBM   | NY  | 282,000

Entailed: "IBM and Apple are U.S. companies."
Refuted: "Apple has more employees than IBM."

• Dataset and benchmark
  1. TabFact: A Large-scale Dataset for Table-based Fact Verification [ICLR 20]. https://tabfact.github.io/
  2. FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information [ACL 21 workshop]. https://fever.ai/dataset/feverous.html

Table analysis: Table-to-text

"Generate a description of the table"

id | name  | loc | # of employees
1  | Apple | CA  | 154,000
2  | IBM   | NY  | 282,000

Example output: "The table presents information about two major companies, Apple and IBM, along with their locations and employee counts. Apple, headquartered in California (CA), employs 154,000 people. On the other hand, IBM, based in New York (NY), has a significantly larger workforce, with 282,000 employees."

• Dataset and benchmark
  1. Neural Text Generation from Structured Data with Application to the Biography Domain [EMNLP 16]. /DavidGrangier/wikipedia-biography-dataset
  2. ToTTo: A Controlled Table-To-Text Generation Dataset [EMNLP 20]. https://huggingface.co/datasets/google-research-datasets/totto
  3. Table-to-text: Describing table region with natural language [AAAI 18]. /msra-nlc/Table2Text

Table analysis: Text-to-SQL

"Convert natural language to an SQL query"

Text: How many employees in Apple?
SQL: SELECT `employee_num` FROM table_name WHERE name = 'Apple';

id | name  | loc | employee_num
1  | Apple | CA  | 154,000
2  | IBM   | NY  | 282,000

• Dataset and benchmark
  1. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning [ICLR 18]. /salesforce/WikiSQL
  2. Spider [EMNLP 18]. https://yale-lily.github.io/spider
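To make the flow concrete, here is a minimal sketch of a text-to-SQL pipeline: the prompt carries the schema and the question, and the returned SQL is executed locally with sqlite3. The `ask_llm` helper, the table name `companies`, and the stubbed query are illustrative placeholders, not part of the benchmarks above.

```python
import sqlite3

def ask_llm(prompt: str) -> str:
    # Placeholder: replace with a real LLM call.
    return "SELECT employee_num FROM companies WHERE name = 'Apple';"

SCHEMA = "companies(id INTEGER, name TEXT, loc TEXT, employee_num INTEGER)"
QUESTION = "How many employees in Apple?"

prompt = (
    "Convert the question into a single SQLite query.\n"
    f"Schema: {SCHEMA}\n"
    f"Question: {QUESTION}\n"
    "SQL:"
)
sql = ask_llm(prompt)

# Execute the generated SQL against a small in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies(id INTEGER, name TEXT, loc TEXT, employee_num INTEGER)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?, ?, ?)",
    [(1, "Apple", "CA", 154000), (2, "IBM", "NY", 282000)],
)
print(conn.execute(sql).fetchall())  # [(154000,)]
```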

Methods before LLMs

• Rule-, ML-, and NN-based methods -> skipped here
• Transformer-based (2018-)
  • Encoder ("BERT")
  • Encoder-decoder ("T5", the original Transformer)
  • Decoder ("GPT")

Motivation of encoders for tables

• Pretrain-and-finetune (the "BERT way")
  • Learn good table representations (embeddings) with table pretraining tasks on large unlabelled data
  • Fine-tune on downstream tasks with small labelled data, adding a task-specific layer per downstream task
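As a small illustration of the fine-tuning step (not any specific system from this deck), a sketch using the Hugging Face transformers library: a pretrained BERT encoder receives an additional classification layer and is applied to a serialized column, as in column type annotation. The model name and the three-label set are assumptions; a real setup would train the new head on labelled downstream data first.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["company", "location", "number"]  # illustrative label set
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)  # the additional task-specific layer
)

# Serialize one column (header plus a few cell values) into text.
column_text = "name: Apple, IBM, GE"
inputs = tokenizer(column_text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
# The head is untrained here, so the prediction is arbitrary until fine-tuned.
print(LABELS[int(logits.argmax(dim=-1))])
```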

Table-only pretraining

• Pretrain with table contents
• TURL [VLDB 20]: Masked Language Model (MLM), Masked Entity Recovery. /pdf/2006.14806
• TABBIE [NAACL 21]: detect corrupted cells. /abs/2105.02584
• TUTA [KDD 21]: MLM, cell filling, context selection. /pdf/2010.12537

Table-and-query pretraining

• Pretrain with table contents & a query
• TAPAS [ACL 20]: query and the whole table; aggregation prediction and cell selection tasks. /abs/2004.02349
• TaBERT [NAACL 21]: query and related rows; Masked Language Model. /abs/2005.08314

Motivation of encoder-decoders for tables

• Flexible input and output (image, text, HTML, Markdown on either side)
  • Table-to-text
  • Text-to-SQL
  • Table summarization
  • Table to Markdown, HTML
• Multimodal ability
  • Image encoder -> text decoder: table OCR, table VQA
• Generalized and good generation ability

Text-to-text encoder-decoders

• Generalized and good generative ability by fine-tuning a pretrained encoder-decoder model
• UnifiedSKG [EMNLP 22]: fine-tunes T5. /abs/2201.05966
• TaPEx [ICLR 22]: fine-tunes BART. /abs/2107.07653

Vision encoder-decoders

• Multimodality
• TableFormer [CVPR 22]: table detection & OCR; bounding box detection and structure generation. /abs/2203.01017
• TATR [CVPR 22]: table detection, based on the object detection transformer (DETR). /microsoft/table-transformer

LLMs (decoders) for tables

• Autoregressive generation
  • Good for generative tasks
  • Easy for self-supervised training
• Simple architecture (decoder only, vs. the "BERT" encoder and the "T5" encoder-decoder Transformer)
  • Efficient for training on large-scale data
  • Easy to scale the model

Motivation of LLMs for table tasks

• Strengths
  • Large knowledge
  • In-context learning / zero-shot ability
  • Generation tasks, interactive use
• Limitations
  • Slow
  • Cost
  • Not good performance on
    • Mathematical tasks
    • Large tables

Uses of LLMs in table tasks: prompting, fine-tuning, RAG, LLM agents, VLMs

On the Use of Large Language Models for Table Tasks
- Prompting

Prompt Engineering

Bad prompt: "I'm hungry."
Response: "Certainly, here's a whole bunch of useless text because I don't know what you want, because I can't read your mind, because all I do is generate text from prompts. If you are unsatisfied with my response, please provide a more useful prompt."

Good prompt: "I'm an adult man weighing 200 pounds with a daily caloric requirement of 2500. I need to hit my protein goal for the day, which is 120 grams. I'm hungry for something hot and a little salty."
Response: "Scramble and cook 4 eggs."

• Zero-shot: instruct the LLM to follow prompts.
• Few-shot: demonstrate with one or more examples.
• Chain-of-thought: elicit complex reasoning through a series of intermediate reasoning steps. Variants: tree-of-thought, graph-of-thought, etc.

source: /the-augmented-engineer/new-to-ai-prompt-writing-learn-how-to-get-summaries-that-cover-all-the-key-points-you-care-about-46615fab135b

Zero-Shot Prompting

• Instruct the model to follow specific prompts.

Prompt: I went to the market and bought 10 apples. I gave two to my neighbor and two to the repairman. Then I bought 5 apples and ate one. How many apples are left? Let's think about this step by step.

Response:
First, I bought 10 apples.
1. I gave two apples to my neighbor; 10 - 2 = 8 apples left.
2. I gave two apples to the repairman; 8 - 2 = 6 apples left.
3. Then I bought 5 apples; 6 + 5 = 11 apples left.
4. One apple eaten; 11 - 1 = 10 apples left. As a result, there are 10 apples left.

Kojima et al. Large language models are zero-shot reasoners. NeurIPS 2022.
source: /Jazuma/items/9de7c56fd9fa8b9c1648

Few-Shot Prompting

• Demonstrate task-solving with examples provided in prompts.

Prompt:
Fantastic! // Positive
Terrible! // Negative
The movie was amazing! // Positive
What a horrible show! //

Response: Negative

Brown et al. Language models are few-shot learners. NeurIPS 2020.
source: /Jazuma/items/9de7c56fd9fa8b9c1648

Chain-of-Thought

• Elicit complex reasoning by providing inference processes.

Prompt:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all odd numbers gives 9 + 15 + 1 = 25. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:

Response: Adding all the odd numbers gives 15 + 5 + 13 + 7 + 1 = 41. The answer is False.

Wei et al. Chain of thought prompting elicits reasoning in large language models. NeurIPS 2022.
Wang et al. Self-consistency improves chain of thought reasoning in language models. ICLR 2023.
source: /Jazuma/items/9de7c56fd9fa8b9c1648

Issues of Prompting for Table Tasks

• Task Variety (TV)
• Data Format (DF), e.g., tables embedded in PDF
• Data Volume (DV)
• Task Complexity (TC)

Prompting Techniques for Table Tasks

☞ Task Variety: zero-shot prompting, few-shot prompting, chain-of-thought
☞ Task Complexity: task decomposition, table decomposition
☞ Data Format: table encoding, table reconstruction
☞ Data Volume: instance prompting, batch prompting, prefix caching

☞ Task Variety

• Task types: table preprocessing, table understanding, table analysis
• Techniques range from task-invariant to task-variant: zero-shot prompting, few-shot prompting, chain-of-thought (and variants)

☞ Task Variety: Zero-Shot Prompting

Input row (Table Preprocessing):

name     | city | addr                | phone        | type
Langer's | ?    | 704 S. Alvarado St. | 213-483-8050 | delis

Task Type: Data Imputation
Task Description (prompt): You are a database engineer. You are requested to infer the value of the "city" attribute based on the values of other attributes.
Data Instance (prompt): [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
Answer: The city is "Los Angeles".

Zhang et al. Large language models as data preprocessors. TaDA 2024.
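A minimal sketch of assembling the zero-shot prompt above in Python; `ask_llm` is a placeholder for whatever chat-completion client is used, and the stubbed reply simply mirrors the answer on the slide.

```python
def ask_llm(prompt: str) -> str:
    # Placeholder: replace with a real LLM call.
    return 'The city is "Los Angeles".'

record = {"name": "langer's", "addr": "704 s. alvarado st.",
          "phone": "213-483-8050", "type": "delis"}

task_description = (
    "You are a database engineer.\n"
    'You are requested to infer the value of the "city" attribute '
    "based on the values of other attributes.\n"
)
# Serialize the row into the bracketed key-value format used on the slide.
data_instance = "[" + ", ".join(f'{k}: "{v}"' for k, v in record.items()) + "]"

print(ask_llm(task_description + data_instance))
```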

☞ Task Variety: Few-Shot Prompting

Task Type: Data Imputation
Task Description (prompt): You are a database engineer. You are requested to infer the value of the "city" attribute based on the values of other attributes.
Data Instance (prompt): [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
Few-shot Examples (prompt): Some examples are given below.
```
User:
Question 1: Record is [name: "carey's corner", addr: "1215 powers ferry rd.", phone: "770-933-0909", type: "hamburgers"]. What is the city?
Assistant:
Answer 1: Marietta
…
```
Answer: Los Angeles

Zhang et al. Large language models as data preprocessors. TaDA 2024. (Table Preprocessing)

☞ Task Variety: Chain-of-Thought

Task Type: Data Imputation
Task Description (prompt): You MUST answer each question in two lines. In the first line, you give the reason for the inference. In the second line, you ONLY give the value of the "city" attribute.
Data Instance (prompt): [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
Few-shot Examples (prompt): Some examples are given below.
```
User:
Question 1: Record is [name: "carey's corner", addr: "1215 powers ferry rd.", phone: "770-933-0909", type: "hamburgers"]. What is the city?
Assistant:
Answer 1: The phone area code 770 corresponds to the Atlanta metropolitan area in Georgia, and "1215 Powers Ferry Rd." is an address located in Marietta, Georgia; therefore, the city is Marietta.
Marietta
```
Answer:
The address and phone number point to a location in Los Angeles, CA, known for its distinct area code (213) and local businesses.
Los Angeles

Zhang et al. Large language models as data preprocessors. TaDA 2024. (Table Preprocessing)

☞ Task Variety: Chain-of-Thought (+ Self-Consistency Decoding)

• Generate multiple reasoning answers and aggregate them (weighted sum / majority voting).

Task Type: Data Imputation
Task Description, Few-shot Examples, and Data Instance [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"] as before.

Sampled answer 1: The name refers to a restaurant in the city of Santa Monica, CA. The phone number also corresponds to California. -> Santa Monica
Sampled answer 2: The phone number corresponds to a location in Los Angeles, California, recognized for its unique 213 area code. -> Los Angeles
Sampled answer 3: The address and phone number point to a location in Los Angeles, CA, known for its distinct area code (213) and local businesses. -> Los Angeles
Aggregated answer: Los Angeles

Chen. Large language models are few(1)-shot table reasoners. EACL 2023. (Table QA)
Chen et al. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. TMLR 2023. (Table QA)
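A minimal sketch of self-consistency decoding for this example: draw several chain-of-thought samples and keep the majority answer. `sample_cot_answer` is a placeholder for one temperature-sampled LLM completion and here just simulates the three answers shown above.

```python
import random
from collections import Counter

def sample_cot_answer(record: dict) -> str:
    # Placeholder: replace with an LLM call sampled at temperature > 0,
    # keeping only the final line (the predicted "city" value).
    return random.choice(["Santa Monica", "Los Angeles", "Los Angeles"])

record = {"name": "langer's", "addr": "704 s. alvarado st.",
          "phone": "213-483-8050", "type": "delis"}

# Draw several reasoning paths and aggregate by majority voting.
samples = [sample_cot_answer(record) for _ in range(5)]
answer, votes = Counter(samples).most_common(1)[0]
print(answer, votes)
```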


☞ Task Complexity

• Sources of complexity: the task, the table
• Strategies: task decomposition, table decomposition

☞ Task Complexity: Task Decomposition

Before decomposition

Task Type: Column Type Annotation
Task Description (prompt): Classify the columns of a given table with only one of the following classes that are separated with comma: description of event, description of restaurant, postal code, region of address…
Data Instance (prompt): Column 1 || Column 2 || Column 3 || Column 4 \n Friends Pizza || 2525 || Cash Visa MasterCard || 7:30AM \n
Answer: name, number, payment, time

Korini and Bizer. Column type annotation using ChatGPT. TaDA 2023. (Table Interpretation)
Zhao et al. Large language models are complex table parsers. EMNLP 2023. (Table QA)
Wang et al. Chain-of-table: Evolving tables in the reasoning chain for table understanding. ICLR 2024. (Table QA)
Dong et al. OpenTE: Open-structure table extraction from text. ICASSP 2024. (Table Transformation)

After decomposition

Sub-Task 1
Sub-Task Type: Table Classification
Task Description (prompt): Your task is to classify if a table describes Restaurants, Events, Music Recordings, or Hotels.
Data Instance (prompt): Column 1 || Column 2 || Column 3 || Column 4 \n Friends Pizza || 2525 || Cash Visa MasterCard || 7:30AM \n
Answer: Restaurant

Sub-Task 2
Sub-Task Type: Column Classification
Task Description (prompt): Your task is to classify the columns of a given table with only one of the following classes that are separated with comma: name of restaurant, description of restaurant…
Data Instance (prompt): Column 1 || Column 2 || Column 3 || Column 4 \n Friends Pizza || 2525 || Cash Visa MasterCard || 7:30AM \n
Answer: name of restaurant, postal code, payment accepted, time
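A minimal sketch of chaining the two sub-tasks in code: the first call classifies the table's domain, the second restricts the label set to that domain before classifying columns. `ask_llm`, the domain-to-labels mapping, and the stubbed replies are illustrative placeholders, not the exact prompts of the cited papers.

```python
def ask_llm(prompt: str) -> str:
    # Placeholder: replace with real LLM calls.
    if "classify if a table" in prompt:
        return "Restaurant"
    return "name of restaurant, postal code, payment accepted, time"

table = ("Column 1 || Column 2 || Column 3 || Column 4\n"
         "Friends Pizza || 2525 || Cash Visa MasterCard || 7:30AM")

# Sub-task 1: classify the table's domain first.
domain = ask_llm(
    "Your task is to classify if a table describes Restaurants, Events, "
    f"Music Recordings, or Hotels.\n{table}"
)

# Sub-task 2: restrict the label set to the predicted domain, then classify columns.
labels_by_domain = {
    "Restaurant": "name of restaurant, description of restaurant, postal code, payment accepted, time"
}
columns = ask_llm(
    "Your task is to classify the columns of a given table with only one of the "
    f"following classes: {labels_by_domain[domain]}.\n{table}"
)
print(domain, "->", columns)
```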

☞ Task Complexity: Table Decomposition

Table: Figure skating at the Asian Winter Games

Rank  | Nation      | Gold | Silver | Bronze | Total
1     | China       | 13   | 9      | 13     | 35
2     | Japan       | 7    | 10     | 7      | 24
3     | Uzbekistan  | 1    | 2      | 3      | 6
4     | Kazakhstan  | 2    | 2      | 0      | 4
5     | North Korea | 1    | 0      | 1      | 2
6     | South Korea | 0    | 0      | 2      | 2
Total |             | 24   | 23     | 26     | 73

Q: Who received more bronze medals: Japan or South Korea?
A: Japan

(1) Sub-table selection (LLM)
The LLM sees the table title ("Figure skating at the Asian Winter Games"), the columns (['rank', 'nation', 'gold', 'silver', 'bronze', 'total']), and the question, and generates an SQL query:
SELECT nation, bronze FROM T WHERE nation = 'japan' OR nation = 'south korea'
Executing it yields the sub-table:

Nation      | Bronze
Japan       | 7
South Korea | 2

(2) Reasoning and answer generation (LLM)
The LLM receives the table title, the sub-table, and the question.
Response: Based on the table, Japan received 7 bronze medals and South Korea received 2 bronze medals. Therefore, Japan received more bronze medals than South Korea.
Answer: Japan

Nahid and Rafiei. TabSQLify: Enhancing reasoning capabilities of LLMs through table decomposition. NAACL 2024. (Table QA, Text-to-SQL)
Patnaik et al. Cabinet: Content relevance based noise reduction for table question answering. ICLR 2024. (Table QA)
Jiang et al. StructGPT: A general framework for large language model to reason over structured data. EMNLP 2023. (Table QA, Text-to-SQL)
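A minimal sketch in the spirit of this two-step decomposition: one LLM call writes SQL to select a small sub-table, the SQL is executed locally, and a second call answers over the result. `ask_llm` and its stubbed replies are placeholders, not the exact prompts of TabSQLify or StructGPT.

```python
import sqlite3

def ask_llm(prompt: str) -> str:
    # Placeholder: replace with real LLM calls.
    if "Write an SQLite query" in prompt:
        return "SELECT nation, bronze FROM T WHERE nation IN ('Japan', 'South Korea')"
    return "Japan"

rows = [("China", 13, 9, 13), ("Japan", 7, 10, 7), ("Uzbekistan", 1, 2, 3),
        ("Kazakhstan", 2, 2, 0), ("North Korea", 1, 0, 1), ("South Korea", 0, 0, 2)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T(nation TEXT, gold INT, silver INT, bronze INT)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", rows)

question = "Who received more bronze medals: Japan or South Korea?"

# (1) Sub-table selection: the LLM only sees the schema and the question.
sql = ask_llm(f"Write an SQLite query over T(nation, gold, silver, bronze) for: {question}")
sub_table = conn.execute(sql).fetchall()  # [('Japan', 7), ('South Korea', 2)]

# (2) Reasoning and answer generation over the much smaller sub-table.
answer = ask_llm(f"Sub-table: {sub_table}\nQuestion: {question}\nAnswer:")
print(answer)
```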

☞ Task Complexity: Table Decomposition (Progressive Prompting)

Question: Report the number of wins in Grand Slam tournaments.

Original table:

tournament      | attn | career_w/l
Australian Open | 18   | 22-18
Roland Garros   | 14   | 11-14
Wimbledon       | 18   | 13-18
US Open         | 13   | 16-13
Indian Wells    | 15   | 20-15

A text transformation splits career_w/l into a new n_win column (22, 11, 13, 16, 20), so the wins can be selected and aggregated over the Grand Slam rows.

Progressive prompting:
• Focus on column selection: sets the groundwork for understanding how to fetch specific data from a database.
• Incorporate both column and row selection: extract particular columns and filter rows based on specified criteria, enhancing precision in data gathering.
• Apply additional operations (e.g., aggregation functions and text operations): aggregation functions empower data summarization; text operations facilitate the manipulation and transformation of string data.

Kong et al. OpenTab: Advancing large language models as open-domain table reasoners. ICLR 2024. (Text-to-SQL)
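A minimal sketch of the text-transformation and aggregation steps above, assuming pandas; the DataFrame mirrors the slide's table and the Grand Slam list is part of the example.

```python
import pandas as pd

df = pd.DataFrame({
    "tournament": ["Australian Open", "Roland Garros", "Wimbledon", "US Open", "Indian Wells"],
    "attn": [18, 14, 18, 13, 15],
    "career_w/l": ["22-18", "11-14", "13-18", "16-13", "20-15"],
})

# Text operation: "22-18" -> 22 wins.
df["n_win"] = df["career_w/l"].str.split("-").str[0].astype(int)

# Row selection + aggregation: keep only Grand Slam tournaments, then sum the wins.
grand_slams = ["Australian Open", "Roland Garros", "Wimbledon", "US Open"]
print(df[df["tournament"].isin(grand_slams)]["n_win"].sum())  # 62
```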

☞ Data Format: Table Encoding

Encodings, roughly ordered from easy to hard:
• text (serialized)
• spreadsheet
• markup
• key-value
• program
• image
• embedded
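For illustration, a minimal sketch of two of these encodings for the same toy table: a serialized text/markup layout and a key-value (JSON) layout. The table values are the running Apple/IBM example.

```python
import json

header = ["id", "name", "loc", "employee_num"]
rows = [[1, "Apple", "CA", 154000], [2, "IBM", "NY", 282000]]

# Serialized text / markup encoding (markdown-style).
markdown = "| " + " | ".join(header) + " |\n"
markdown += "| " + " | ".join("---" for _ in header) + " |\n"
for row in rows:
    markdown += "| " + " | ".join(str(v) for v in row) + " |\n"
print(markdown)

# Key-value encoding (one JSON object per row).
key_value = [dict(zip(header, row)) for row in rows]
print(json.dumps(key_value, indent=2))
```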

☞ Data Format: Table Reconstruction

Zhao et al. Large language models are complex table parsers. EMNLP 2023. (Table QA)

☞ Data Volume: Instance Prompting

Input table (Table Preprocessing):

name       | city | addr                | phone        | type
Langer's   | ?    | 704 S. Alvarado St. | 213-483-8050 | delis
Valentino  | ?    | 3115 Pico Blvd.     | 310-829-4313 | Italian
Cafe Bizou | ?    | 14016 Ventura Blvd. | 818/788-3536 | French

One prompt per data instance; each prompt repeats the same task description:

Task Type: Data Imputation
Task Description: You are a database engineer. You are requested to infer the value of the "city" attribute based on the values of other attributes.
Data Instance (prompt 1): [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
Data Instance (prompt 2): [name: "valentino", addr: "3115 pico blvd.", phone: "310-829-4313", type: "italian"]
Data Instance (prompt 3): [name: "cafe bizou", addr: "14016 ventura blvd.", phone: "818/788-3536", type: "french"]

Zhang et al. Large language models as data preprocessors. TaDA 2024.

☞ Data Volume: Batch Prompting

Standard prompting
# K-shot in-context exemplars
Q: {question}
A: {answer}
…
Q: {question}
A: {answer}
# one sample to inference
Q: Ali had $21. Leila gave him half of her $100. How much does Ali have now?
# Response
A: Leila gave 100/2 = 50 to Ali. Ali now has $21 + $50 = $71. The answer is 71.

Batch prompting
# K-shot in-context exemplars in K/b batches
Q[1]: {question}
Q[2]: {question}
A[1]: {answer}
A[2]: {answer}
# b (= 2) samples in a batch to inference
Q[1]: Ali had $21. Leila gave him half of her $100. How much does Ali have now?
Q[2]: A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total?
# Responses to a batch
A[1]: Leila gave 100/2 = 50 to Ali. Ali now has $21 + $50 = $71. The answer is 71.
A[2]: It takes 2/2 = 1 bolt of white fiber. The total amount is 2 + 1 = 3. The answer is 3.

source: Cheng et al. Batch prompting: Efficient inference with large language model APIs. EMNLP 2023.

Batch prompting applied to data imputation (Table Preprocessing): one prompt carries several data instances.

name       | city | addr                | phone        | type
Langer's   | ?    | 704 S. Alvarado St. | 213-483-8050 | delis
Valentino  | ?    | 3115 Pico Blvd.     | 310-829-4313 | Italian
Cafe Bizou | ?    | 14016 Ventura Blvd. | 818/788-3536 | French

Task Type: Data Imputation
Task Description (prompt): You are a database engineer. You are requested to infer the value of the "city" attribute based on the values of other attributes.
Data Instances (prompt):
[name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
[name: "valentino", addr: "3115 pico blvd.", phone: "310-829-4313", type: "italian"]
[name: "cafe bizou", addr: "14016 ventura blvd.", phone: "818/788-3536", type: "french"]

Zhang et al. Large language models as data preprocessors. TaDA 2024.
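A minimal sketch of batch prompting for this example: the records share one task description in a single prompt, and the indexed answers are parsed back out. `ask_llm` and its stubbed reply are placeholders for a real LLM call and its output.

```python
def ask_llm(prompt: str) -> str:
    # Placeholder: replace with a real LLM call.
    return "A[1]: Los Angeles\nA[2]: Santa Monica\nA[3]: Sherman Oaks"

records = [
    '[name: "langer\'s", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]',
    '[name: "valentino", addr: "3115 pico blvd.", phone: "310-829-4313", type: "italian"]',
    '[name: "cafe bizou", addr: "14016 ventura blvd.", phone: "818/788-3536", type: "french"]',
]

prompt = (
    "You are a database engineer. Infer the value of the \"city\" attribute for "
    "each record below. Answer as A[i]: <city>, one line per record.\n"
    + "\n".join(f"Q[{i + 1}]: {r}" for i, r in enumerate(records))
)

response = ask_llm(prompt)
# Parse one answer per line, keyed by its index label.
answers = dict(line.split(": ", 1) for line in response.splitlines())
print(answers)
```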

☞ Data Volume: Prefix Caching (+ Instance Prompting)

• The instance prompts share the same prefix; only the data instance differs:
  Task Description (shared prefix): You are a database engineer. You are requested to infer the value of the "city" attribute based on the values of other attributes.
  Data Instance (prompt 1): [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
  Data Instance (prompt 2): [name: "valentino", addr: "3115 pico blvd.", phone: "310-829-4313", type: "italian"]
  Data Instance (prompt 3): [name: "cafe bizou", addr: "14016 ventura blvd.", phone: "818/788-3536", type: "french"]
• KV cache: cache the key/value vectors to speed up QKV attention in Transformers.
• Prompts that share a prefix can share the cached KV entries.
• Use Automatic Prefix Caching (APC) in the vLLM library.
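A minimal sketch of instance prompting with vLLM's Automatic Prefix Caching, assuming vLLM is installed and a local GPU can serve the (illustrative) model name below. Because every prompt starts with the same task description, its KV cache is computed once and reused across instances.

```python
from vllm import LLM, SamplingParams

PREFIX = (
    "You are a database engineer. You are requested to infer the value of the "
    '"city" attribute based on the values of other attributes.\n'
)
instances = [
    '[name: "langer\'s", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]',
    '[name: "valentino", addr: "3115 pico blvd.", phone: "310-829-4313", type: "italian"]',
    '[name: "cafe bizou", addr: "14016 ventura blvd.", phone: "818/788-3536", type: "french"]',
]
prompts = [PREFIX + inst + "\nThe city is:" for inst in instances]

# enable_prefix_caching turns on Automatic Prefix Caching (APC);
# the shared task-description prefix is encoded once and its KV entries reused.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_prefix_caching=True)
for output in llm.generate(prompts, SamplingParams(temperature=0, max_tokens=16)):
    print(output.outputs[0].text.strip())
```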
