大语言模型在表格任务中的应用 On the Use of Large Language Models for Table Tasks

上传人：策*** IP属地：山西上传时间：2024-11-02 格式：DOCX 页数：252 大小：8.30MB 积分：19.9 举报 版权申诉

大语言模型在表格任务中的应用 On the Use of Large Language Models for Table Tasks_第2页

大语言模型在表格任务中的应用 On the Use of Large Language Models for Table Tasks_第3页

大语言模型在表格任务中的应用 On the Use of Large Language Models for Table Tasks_第4页

大语言模型在表格任务中的应用 On the Use of Large Language Models for Table Tasks_第5页

已阅读5页，还剩247页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

OntheUseofLargeLanguageModels

forTableTasks

-Introduction

HaochenZhangChuanXiaoYuyangDong

Tabulardataiseverywhere

(a)Relationaldatabases

(b)Richdocuments,PDF

(c)webpages(d)spreadsheet2

Growingresearchfocus

•GrowquicklyinDB,AIandNLPcommunities

resultfrom[4]

•Recenttutorials

•[1]Webtableextraction,retrievalandaugmentation,SIGIR19

•[2]FromTablestoKnowledge:RecentAdvancesinTableUnderstanding,KDD21

•[3]TransformersforTabularDataRepresentation:AtutorialonModelsandApplicationsVLDB22,SIGMOD23

•[4]LargeLanguageModelsforTabularData:ProgressesandFutureDirections,SIGIR24

•A-Paper-List-of-Awesome-Tabular-LLMs，

/SpursGoZmy/Awesome-Tabular-LLMs

Tabletasks&benchmarks

•TableInterpretation

•EntityLinking

•ColumnTypeAnnotation

•RelationExtraction

•Tabledetection

“Preparetables”

Tablereprocessing

•Tablematching

•Entitymatching

•Schemamatching

•Tablecleaning

•Errordetection

•Dataimputation

•Tableaugmentation

•Rowpopulation

•Schemaaugmentation

•Tablesearch

•Tabletransformation

Tableunderstanding

“Understandtables”“Getanswerfromtables”

Tableanalysis

•TableQA

•Tablefactverification

•Table-to-text

•Text-to-SQL

Table

preprocessing

“Matchingtworows”

Schemamatching

“Matchingtwocolumns”

Tablematching

matching

Entity

AAA

BBB

CCC

EEE

AAA’

name

rev

IBMCorp

$57B

AppleInc

$366B

$74B

DDD

name

loc

#of

employee

Apple

154,000

IBM

282,000

•Datasetandbenchmark

1.CanFoundationModelsWrangleYourData?[VLDB23].

/abs/2205.09911

2.Jellyfish:Instruction-TuningLocalLargeLanguageModelsforDataPreprocessing[EMNLP24]

/abs/2312.01678

Name

DateofBirth

Prefecture

PostalCode

Height

Yuka

2003/02/26

Hokkaido

540-8570

165

Nana

Aichi

464-0804

157

Miho

2001/06/25

Kangawa

2208799

1.60

Table

preprocessing

“Detecttheerrorcells”

Errordetection

inconsistency

missing

typoformatoutlier

“Imputevaluesintocells”

Dataimputation

Name

DateofBirth

Prefecture

PostalCode

Height

Yuka

2003/02/26

Osaka

540-8570

165

Nana

2003/03/30

Aichi

464-0804

157

Miho

2001/06/25

Kanagawa

220-8799

160

Tablecleaning

•Datasetandbenchmark

1.CanFoundationModelsWrangleYourData?[VLDB23].

/abs/2205.09911

2.Jellyfish:Instruction-TuningLocalLargeLanguageModelsforDataPreprocessing[EMNLP24]

/abs/2312.01678

Table

preprocessing

Tableaugmentation

“Addcolumns/rowstotable”

Column

population

name

loc

#of

employee

Apple

154,000

IBM

282,000

Rowpopulation

•Datasetandbenchmark

1.TURL:TableUnderstandingthroughRepresentationLearning.[VLDB20]

/abs/2006.14806

2.TableLlama:TowardsOpenLargeGeneralistModelsforTables.[NAACL24].

https://osu-nlp-group.github.io/TableLlama/

Tablesearch

Table

preprocessing

“RetrievetableswithanNLquery”

“Tablescontains

informationofAppleInc.”

Tables,

Datalakes,Documents

Tablecollection

name

loc

#of

employee

Apple

154,000

IBM

282,000

•Datasetandbenchmark

1.Open-DomainTableRetrievalforNaturalQuestions.

/zorazrw/nqt-retrieval

2.Open-WikiTable:DatasetforOpenDomainQuestionAnsweringwithComplexReasoningoverTable[EMNLP23]

/sean0042/Open_WikiTable

Table

preprocessing

Tabletransformation

“Manipulatetableintowantedstyles”

•Datasetandbenchmark

1.SpreadsheetBench:TowardsChallengingRealWorldSpreadsheetManipulation[NeurIPS24]

https://spreadsheetbench.github.io/

Table

understanding

TableInterpretation“classifycolumnsinto

Companydefinedtypes”

loc

#of

employee

154,000

282,000

/wiki/Apple_Inc

Entitylinking

“Extractandpredictrelation

betweentwocolumns”

RelationExtraction

“Matchentitytoknowledgebase”

!Columntypeannotation

name

AppleIBM

anization.headquarters-location

•Datasetandbenchmark

1.TURL:TableUnderstandingthroughRepresentationLearning.[VLDB20]

/abs/2006.14806

2.TableLlama:TowardsOpenLargeGeneralistModelsforTables.[NAACL24].

https://osu-nlp-group.github.io/TableLlama/

3.ColumnTypeAnnotationusingChatGPT[arixv24]

/abs/2306.0071405

Tabledetection

“detecttableregion,structureandcontent”

•Datasetandbenchmark

1.PubTables-1M:Towardscomprehensivetableextractionfromunstructured.[CVPR22]

/microsoft/table-transformer

2.TableFormer:TableStructureUnderstandingwithTransformers.[CVPR22].

/IBM/TabFormer

3.DoclingTechnicalReport.[arxiv24]

/DS4SD/docling-ibm-models

Table

understanding

TableQA

Tableanalysis

“Question-answeringontabularcontents”

name

loc

#of

employee

Apple

154,000

IBM

282,000

Question1:WhereisthelocationofIBM?

Answer1:NewYork

Question2:WhatisthesumofemployeeinAppleandIBM?

Answer2:436,000

•Datasetandbenchmark

1.TableLlama:TowardsOpenLargeGeneralistModelsforTables.[NAACL24].

https://osu-nlp-

group.github.io/TableLlama/

2.https://huggingface.co/datasets/SpursgoZmy/IFT-Data-For-Tabular-Tasks

Tableanalysis

“Verifyagivensentencesaccordingtotable”

Tablefactverification

name

loc

#of

employee

Apple

154,000

IBM

282,000

Entailed

“IBMandAppleareU.S.companies.”

Refuted

“ApplehasmoreemployeethanIBM.”

•Datasetandbenchmark

1.TabFact:ALarge-scaleDatasetforTable-basedFactVerification.[ICLR20]

https://tabfact.github.io/

2.FEVEROUS:FactExtractionandVERificationOverUnstructuredandStructuredinformation.[ACL21workshop].

https://fever.ai/dataset/feverous.html

Table-to-text

Tableanalysis

“Generatedescriptionoftable”

“Thetablepresentsinformationabouttwomajor

name

loc

#of

employee

Apple

154,000

IBM

282,000

companies,AppleandIBM,alongwiththeirlocationsandemployeecounts.Apple,headquarteredinCalifornia(CA),employs154,000people.Ontheotherhand,IBM,based

inNewYork(NY),hasasignificantlylargerworkforce,with282,000employees.”

•Datasetandbenchmark

1.NeuralTextGenerationfromStructuredDatawithApplicationtotheBiographyDomain[EMNLP16]

/DavidGrangier/wikipedia-biography-dataset

2.ToTTo:AControlledTable-To-TextGenerationDataset[EMNLP20]

https://huggingface.co/datasets/google-research-datasets/totto

3.Table-to-text:Describingtableregionwithnaturallanguage.[AAAI18]

/msra-

nlc/Table2Text

Text-to-SQL

Tableanalysis

Text:HowmanyemployeesinApple?

SELECT`employee_num`FROMtable_name

WHEREname='Apple';

“ConvertnaturallanguagetoSQLquery”

name

loc

employee_num

Apple

154,000

IBM

282,000

•Datasetandbenchmark

1.Seq2SQL:GeneratingStructuredQueriesfromNaturalLanguageusingReinforcementLearning.[ICLR18]

/salesforce/WikiSQL

2.Spider[EMNLP18]

https://yale-lily.github.io/spider

MethodsbeforeLLM

•Rule,ML,NN–based->skip

•Transformer-based(2018-)

•Encoder

•Encoder-Decoder

“GPT”

decoder

“BERT”

encoder

“T5”

Transformer(encoder-decoder)

MotivationofEncoderfortables

•Pretrain-and-finetune(“BERT-way”)

•Learninggoodtablerepresentation(embedding)withtablepretrainingtasks

•Finetuneondownstreamtasks

Largeunlabelleddata

Pre-training(Encoder)

Smalllabelled

downstreamtaskdata

task1

task2

Fine-tuning

(Additional

layer)

Fine-tuning

(Additional

layer)

Table-onlyPretraining

•Pretrainwithtablecontents

TURL

•TURL[VLDB20]

•MaskedLanguageModel(MLM)

•MaskedEntityRecovery

•TABBIE[NAACL21]

•Detectcorruptedcells

•TUTA[KDD21]

•MLM

•Cellfilling

TABBIE

•Contextselection

TURL:

/pdf/2006.14806

TABBIE:

/abs/2105.02584

TUTA:

/pdf/2010.12537

Table-and-queryPretraining

•Pretrainwithtablecontents&query

•TAPAS[ACL20]

•Queryandwholetable

•Aggregationprediction

•Cellselectiontask

TAPAS

•TaBERT[NAACL21]

•Queryandrelatedrows

•MaskedLanguageModel

TAPAS:

/abs/2004.02349

TaBERT:

/abs/2005.08314

TaBERT

MotivationofEncoder-decoderfortables

•Flexibleinputandoutput

•Tabletotext

•Texttosql

•Tablesummarize

•Tabletomarkdown,html

•Mulitmodalability

•Imageencoder->textdecoder:tableOCRtask,tableVQA

•Generalizedandgoodgenerationability

Image

Text

HTML

Markdown

Encoder

Decoder

ImageTextHTML

Markdown

Text-to-textEncoderDecoder

•GeneralizedandGoodgenerativeabilitybyfine-tuningonpretrainedencoder

decodermodel

•UnifiedSKG[EMNLP22]

•FinetuneT5

UnifiedSKG

•TaPEx[ICLR22]

•FinetuneBART

UnifiedSKG:

/abs/2201.05966

TaPEx:

/abs/2107.07653

TaPEx

VisionEncoderDecoder

•Multimodality

•Tableformer[CVPR22]:Tabledetection&OCR

Tableformer

•Boundingboxdetection,structuregeneration

•TATR[CVPR22]

•Tabledetection

•Basedonobjectdetectiontransformer(DETR)

Tableformer:

/abs/2203.01017

TATR:

/microsoft/table-transformer

TATR(DETR)

LLM(decodersfortable)

•AutoregressiveGeneration

•Goodforgenerativetask

•Easyforself-supervisedtraining

•Simplearchitecture

•Efficientontraininglarge-scaledata

•Easytoscalethemodel

“BERT”

encoder

“GPT”

“T5”

Transformer(encoderdecoder)

decoder

MotivationofLLMfortabletask

•Goodpoints

•Largeknowledge

•Incontextlearning/zeroshotability

•Generationtask,interactive

•Limitation

•Slow

•Cost

•Notgoodperformancefor

•Mathematicaltask

•Largetable

Prompt

Fine-tuning

VLM

UseofLLMintabletasks

RAG

LLMagents

OntheUseofLargeLanguageModels

forTableTasks

-Prompting

PromptEngineering

BadPromptGoodPrompt

I'mhungry.

I'manadultman

weighing200poundswithadailycaloric

requirementof2500.Ineedtohitmyproteingoalforthedaywhichis120grams.I'm

hungryforsomethinghotandalittlesalty.

Zero-shot

•InstructtheLLMtofollowprompts.

Few-shot

•Demonstratewithoneormoreexamples.

Scrambleandcook4eggs.

Chain-of-thought

•Elicitcomplexreasoningthrougha

seriesofintermediatereasoningsteps.

•Variants:tree-of-thought,graph-of-thought,etc.

Certainly,here'sawholebunchof

uselesstextbecauseI

don'tknowwhatyouwantbecauseIcan'treadyourmind

becauseallIdois

generatetextfrom

prompts.Ifyouare

unsatisfiedwithmy

responseplease

provideamoreusefulprompt.

source:

/the-augmented-engineer/new-to-ai-prompt

-writing-learn-how-to-get-summaries-that-cover-all-the-key-points-you-care-about-46615fab135b

Zero-ShotPrompting

•Instructthemodeltofollowspecificprompts.

Iwenttothemarketandbought10apples.Igavetwotomy

neighborandtwototherepairman.ThenIbought5applesandateone.Howmanyapplesareleft?

Let'sthinkaboutthisstepbystep.

First,Ibought10apples.

1.Igavetwoapplestomyneighbor;10-2=8applesleft.

2.Igavetwoapplestotherepairman;8-2=6applesleft.

3.ThenIbought5apples.6+5=11applesleft.

4.Oneappleeaten;11-1=10applesleft.Asaresult,thereare10applesleft.

Kojimaetal.Largelanguagemodelsarezero-shotreasoners.NeurIPS2022.

source:

/Jazuma/items/9de7c56fd9fa8b9c1648

Few-ShotPrompting

•Demonstratetask-solvingwithexamplesprovidedin

Fantastic!//PositiveTerrible!//Negative

Themoviewasamazing!//PositiveWhatahorribleshow!//

prompts.

Negative

source:

/Jazuma/items/9de7c56fd9fa8b9c1648

Brownetal.Languagemodelsarefew-shotlearners.NeurIPS2020.

Chain-of-Thought

•Elicitcomplexreasoningbyprovidinginferenceprocesses.

Theoddnumbersinthisgroupadduptoanevennumber.:4,8,

9,15,12,2,1.

A:Addingalloddnumbersgives9+15+1=25.Theansweris

False.

Theoddnumbersinthisgroupadduptoanevennumber.:15,32,5,13,82,7,1.

Addingalltheoddnumbersgives15+5+13+7+1=41.TheanswerisFalse.

source:

/Jazuma/items/9de7c56fd9fa8b9c1648

Weietal.Chainofthoughtpromptingelicitsreasoninginlargelanguagemodels.NeurIPS2022.Wangetal.Self-consistencyimproveschainofthoughtreasoninginlanguagemodels.ICLR2023.

IssuesofPromptingforTableTasks

Task

Variety

(TV)

吁

............

Data

Format

(DF)

............

PDF

............

Data

Volume

(DV)

Task

Complexity

(TC)

PromptingTechniquesforTableTasks

☞TaskComplexity

☞DataFormat

☞DataVolume

☞TaskVariety

Task

decomposition

Instanceprompting

Tableencoding

Zero-shotprompting

Table

decomposition

Table

reconstruction

Batch

prompting

Few-shotprompting

Prefixcaching

Chain-of-thought

☞TaskVariety

Invariant

Zero-shot

TablePreprocessing

TableUnderstanding

Few-shot

TableAnalysis

Variant

Chain-of-thought(and

variants)

☞TaskVariety:Zero-ShotPrompting

name

city

addr

phone

Type

Langer's

704S.Alvarado

St.

213-483-

8050

delis

TaskType

DataImputation

TaskDescription

Youareadatabaseengineer.

Youarerequestedtoinferthevalueofthe"city"attributebasedonthevaluesofotherattributes.

DataInstance

[name:"langer's",addr:"704s.alvaradost.",phone:"213-483-8050",type:

"delis"]

prom

Answer

Thecityis"LosAngeles".

Zhangetal.Largelanguagemodelsasdatapreprocessors.TaDA2024.

Table

Preprocessing

☞TaskVariety:Few-ShotPrompting

TaskType

DataImputation

TaskDescription

Youareadatabaseengineer.

Youarerequestedtoinferthevalueofthe"city"attributebasedonthevaluesofotherattributes.

DataInstance

[name:"langer's",addr:"704s.alvaradost.",phone:"213-483-8050",type:"delis"]

Few-shotExamples

Someexamplesaregivenbelow.

```

User:

Question1:Recordis[name:"carey'scorner",addr:"1215powersferryrd.",phone:"770-933-0909",type:"hamburgers"].Whatisthecity?

Assistant:

Answer1:Marietta

…prom

```

Answer

LosAngeles

Zhangetal.Largelanguagemodelsasdatapreprocessors.TaDA2024.

Table

Preprocessing

TaskTypeDataImputation

TheaddressandphonenumberpointtoalocationinLosAngeles,CA,

knownforitsdis

tinctareacode(213)and

localbusinesses.

Answer

…

LosAngeles

☞TaskVariety:Chain-of-Thought

TaskDescription

…

MUSTanswereachquestionintwolines.Inthefirstline,yougivethe

reasonfortheinference.Inthesecondline,youONLYgivethevalueofthe"city"attribute.

DataInstance

[name:"langer's",addr:"704s.alvaradost.",phone:"213-483-8050",type:"delis"]

Few-shotExamples

Answer1:Thephoneareacode770correspondstotheAtlanta

metropolitanareainGeorgia,and"1215PowersFerryRd."isanaddresslocatedinMarietta,Georgia;therefore,thecityisMarietta.

Mariettaprom

Someexamplesaregivenbelow.

```

User:

Question1:Recordis[name:"carey'scorner",addr:"1215powersferryrd.",phone:"770-933-0909",type:"hamburgers"].Whatisthecity?

Assistant:

Zhangetal.Largelanguagemodelsasdatapreprocessors.TaDA2024.

Table

Preprocessing

☞TaskVariety:

Chain-of-Thought(+Self-ConsistencyDecoding)

TaskTypeDataImputation

Answer

Thenamereferstoa

restaurantinthecityofSantaMonica,CA.The

phonenumberalsocorrespondsto

California.

SantaMonica

Answer

Thephonenumber

correspondtoalocationinLosAngeles,

California,recognizedforitsunique213areacode.

LosAngeles

Weightedsum/majority

voting

•Generatemultiplereasoninganswersandaggregatethem.

TaskDescription

…

DataInstance

[name:"langer's",addr:"704s.alvaradost.",phone:"213-483-8050",type:"delis"]

Few-shotExamples

…prom

Answer

Theaddressandphone

numberpointtoa

locationinLosAngeles,CA,knownforitsdistinctareacode(213)andlocalbusinesses.

LosAngeles

Answer

LosAngeles

Chen.Largelanguagemodelsarefew(1)-shottablereasoners.EACL2023.

TATable

Chenetal.Programofthoughtsprompting:Disentanglingcomputationfrom

reasoningfornumericalreasoningtasks.TMLR2023.

•自Task

•Table

☞TaskComplexity

•Taskdecomposition

•Tabledecomposition

Strategies

Sourcesofcomplexity

☞TaskComplexity:TaskDecomposition

Beforedecomposition

FriendsPizza

2525

CashVisaMasterCard

7:30AM

TaskType

ColumnTypeAnnotation

TaskDescription

Classifythecolumnsofagiventablewithonlyoneofthefollowingclassesthatareseparatedwithcomma:

descriptionofevent,descriptionofrestaurant,postalcode,regionofaddress…

DataInstance

Column1||Column2||Column3||Column4¥n

FriendsPizza||2525||CashVisaMasterCard||7:30AM¥n

Answer

name,number,payment,time

KoriniandBizer.ColumntypeannotationusingChatGPT.TaDA2023.

TUTableInterpretation

Zhaoetal.Largelanguagemodelsarecomplextableparsers.EMNLP2023.

TATableQA

TPTableTransformation

TATableQA

Wangetal.Chain-of-table:Evolvingtablesinthereasoningchainfortableunderstanding.ICLR2024.

TUTableInterpretation

Dongetal.OpenTE:Open-structuretableextractionfromtext.ICCASP2024.

Sub-Task1

Sub-Task2

Yourtaskistoclassifyifatable

describesRestaurants,Events,MusicRecordings,orHotels.

Sub-TaskType

TableClassification

Task

Description

Data

Instance

Column1||Column2||Column3||Column4¥n

FriendsPizza||2525||CashVisaMasterCard||7:30AM¥n

Answer

Restaurant

KoriniandBizer.ColumntypeannotationusingChatGPT.TaDA2023.

Zhaoetal.Largelanguagemodelsarecomplextableparsers.EMNLP2023.

Wangetal.Chain-of-table:Evolvingtablesinthereasoningchainfortableunderstanding.ICLR2024.

Dongetal.OpenTE:Open-structuretableextractionfromtext.ICCASP2024.

TATableQA

☞TaskComplexity:TaskDecomposition

Afterdecomposition

Sub-TaskType

ColumnClassification

Task

Description

Yourtaskistoclassifythecolumnsofagiventablewithonlyoneofthe

followingclassesthatareseparatedwithcomma:nameofrestaurant,

descriptionofrestaurant…

Data

Instance

Column1||Column2||Column3||Column4¥n

FriendsPizza||2525||CashVisa

Answer

MasterCard||7:30AM¥

nameofrestaurant,postalcode,

paymentaccepted,time

TUTableInterpretation

TATableQA

TPTableTransformation

TUTableInterpretation

☞TaskComplexity:TableDecomposition

TabIe:FigureskatingattheAsianwinterGames

Rank

Nation

Gold

silver

Bronze

Total

China

Japan

uzbekistan

kazakhstan

Northkorea

southkorea

TotaI

Q:whoreceivedmorebronzemedaIs:japanorsouthkorea?

A:Japan

TabIetitIe:FigureskatingattheAsianwinterGames

CoIumns:['rank','nation','goId','siIver','bronze','totaI']

Q:whoreceivedmorebronzemedaIs:japanorsouthkorea?

sub-tabIe

Nation

Bronze

Japan

southkorea

sub-tabIeseIection

(LLM)

Execute

sQL:seIectnation,bronzefromT

wherenation='japan'ornation='southkorea'

(1)subtabIeseIection

sub-tabIe

TabIetitIe:FigureskatingattheAsianwinterGames

Nation

Bronze

Japan

southkorea

seIectnation,bronzefromT

wherenation='japan'ornation='southkorea'

Q:whoreceivedmorebronzemedaIs:japanorsouthkorea?

AnswerGeneration

(LLM)

Response:BasedonthetabIe,Japanreceived7bronzemedaIsandsouthkoreareceived2bronzemedaIs.Therefore,Japanreceived

morebronzemedaIsthansouthkorea.

Answer:Japan

(2)ReasoningandAnswerGeneration

TATableQA

TAText-to-SQL

NahidandRafiei.TabSQLify:EnhancingreasoningcapabilitiesofLLMsthroughtabledecomposition.NAACL2024.

TATableQA

Patnaiketal.Cabinet:Contentrelevancebasednoisereductionfortablequestion

answering.ICLR2024.

TATableQA

TAText-to-SQL

Jiangetal.StructGPT:Ageneralframeworkforlargelanguagemodeltoreasonoverstructureddata.EMNLP2023.

☞TaskComplexity:

Text

transformation

Question

ReportthenumberofwinsinGrandSlam

tournaments.

tournament

…

attn

career_w/l

AustralianOpen

…

RolandGarros

…

Wimbledon

…

USOpen

…

IndianWells

…

tournament

…

attn

AustralianOpen

…

RolandGarros

…

Wimbledon

…

USOpen

…

IndianWells

…

career_w/l

22-18

11-14

13-18

16-13

20-15

n_win

…………

13-18

11-14

22-18

16-13

20-15

tournament

…

attn

career_w/l

AustralianOpen

…

22-18

RolandGarros

…

11-14

Wimbledon

…

13-18

USOpen

…

16-13

IndianWells

…

20-15

TableDecomposition(ProgressivePrompting)

Focusoncolumnselection.

Incorporatebothcolumnandrowselection.

Applyadditionaloperations(e.g.,aggregationfunctionsandtext

operations).

Setthegroundworkfor

understandinghowtofetch

specificdatafromadatabase.

Extractparticularcolumnsandfilteringrowsbasedonspecifiedcriteria,enhancingprecisionindatagathering.

Aggregationfunctionsempowerdatasummarization.

Textoperationsfacilitatethe

manipulationandtransformationofstringdata.

reasoners.ICLR2024.

Kongetal.OpenTab:Advancinglargelanguagemodelsasopen-domaintable

TAText-to-SQL

☞DataFormat:TableEncoding

text

(serialized)

spreadsheet

markup

key-value

program

image

embedded

easyhard

☞DataFormat:TableReconstruction

Zhaoetal.Largelanguagemodelsarecomplextableparsers.EMNLP2023.

TATableQA

☞DataVolume:InstancePrompting

name

city

addr

phone

Type

Langer's

704S.AlvaradoSt.

213-483-8050

delis

Valetino

3115PicoBlvd.

310-829-4313

Italian

CafeBizou

14016VenturaBlvd.

818/788-3536

French

Youareadatabaseengineer.

Youarerequestedtoinferthevalueofthe"city"

attributebasedonthe

valuesofotherattributes.

[name:"langer's",addr:

"704s.alvaradost.",phone:"213-483-8050",type:

"delis"]

Task

Description

Data

Instance

TaskType

DataImputation

TaskType

DataImputation

TaskType

DataImputation

prompt

Youareadatabaseengineer.

Youarerequestedtoinferthevalueofthe"city"

attributebasedonthe

valuesofotherattributes.

[name:"cafebizou",addr:"14016venturablvd.",

phone:"818/788-3536",

Task

Description

Data

Instance

type:"french"]prompt

Youareadatabaseengineer.

Youarerequestedtoinferthevalueofthe"city"

attributebasedonthe

valuesofotherattributes.

[name:"valentino",addr:"3115picoblvd.",phone:"310-829-4313",type:

"italian"]

Task

Description

Data

Instance

Zhangetal.Large

languagemodelsasdata

TablePreprocessing

preprocessors.TaDA2024.

standardprompting

#K-shotin-conte×te×empars

Q:{question}

A:{answer}

Q:{question}

A:{answer}

#onesampetoinference

Q:A1ihad$21.Lei1agavehimha1fofher$1.HowmuchdoesA1ihavenow?

#Response

A:Lei1agave1/2=5toA1i.A1inowhas $21+$5=$71.Theansweris71.

Batchprompting

#K-shotin-conte×te×emparsinK/bbatches

Q[1]:{question}Q[2]:{question}A[1]:{answer}

A[2]:{answer}

#bsampesinabatchtoinference

Q[1]:A1ihad$21.Lei1agavehimha1fofher

$1.HowmuchdoesA1ihavenow?

Q[2]:Arobetakes2bo1tsofb1uefiberandha1fthatwhitefiber.Howmanybo1ts?

b(=2)samplesinonebatch

#Responsestoabatch

A[1]:Lei1agave1/2=5toA1i.A1inowhas

$21+$5=$71.Theansweris71.

A[2]:Ittakes2/2=1bo1tofwhitefiber.The tota1amountis2+1=3.Theansweris3.

source:Chengetal.Batchprompting:EfficientinferencewithlargelanguagemodelAPIs.EMNLP2023.

☞DataVolume:BatchPrompting

name

city

addr

phone

Type

Langer's

704S.AlvaradoSt.

213-483-8050

delis

Valetino

3115PicoBlvd.

310-829-4313

Italian

CafeBizou

14016VenturaBlvd.

818/788-3536

French

TaskType

DataImputation

Task

Description

Youareadatabaseengineer.

Youarerequestedtoinferthevalueofthe"city"attributebasedonthevaluesofotherattributes.

Data

Instance

[name:"langer's",addr:"704s.alvaradost.",phone:"213-483-8050",type:"delis"]

[name:"valentino",addr:"3115picoblvd.",phone:"310-829-4313",type:"italian"]

[name:"cafebizou",addr:"14016venturablvd.",phone:"818/788-3536",type:"french"]prompt

Table46

Preprocessing

Zhangetal.Largelanguagemodelsasdatapreprocessors.TaDA2024.

☞DataVolume:

PrefixCaching(+InstancePrompting)

Youarerequestedtoinferthevalueofthe"city"

attributebasedonthe

valuesofotherattributes.

Youarerequestedtoinferthevalueofthe"city"

attributebasedonthe

valuesofotherattributes.

Youareadatabase

engineer.

[name:"cafebizou",addr:

"14016venturablvd.",

phone:"818/788-35o",mpt

Youarerequestedtoinferthevalueofthe"city"

attributebasedonthe

valuesofotherattributes.

Youareadatabase

engineer.

[name:"langer's",addr:

"704s.alvaradost.",phone:"213-483-8050",typ:rompt

Task

Description

Data

Instance

"delis"]

Task

Description

Data

Instance

type:french"]

•UseAutomaticPrefixCaching(APC)inthevLLMlibrary.

Task

Description

Data

Instance

Youareadatabase

engineer.

[name:"valentino",addr:

"3115picoblvd.",phone:

"310-829-4313",typ:rompt

italian"]

KVcache:

cachekeyvectorstospeedupQKV

attentionin

Transformers

Shareprefix

ShareKV

人人文库> 全部分类> 应用文书 > 研究报告

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

大语言模型在表格任务中的应用 On the Use of Large Language Models for Table Tasks

文档简介

温馨提示

最新文档

评论

大语言模型在表格任务中的应用 On the Use of Large Language Models for Table Tasks

文档简介

温馨提示

最新文档

评论

相关文档