On the Use of Large Language Models for Table Tasks
- Introduction
Haochen Zhang, Chuan Xiao, Yuyang Dong
Tabular data is everywhere
(a) Relational databases
(b) Rich documents, PDF
(c) Web pages
(d) Spreadsheets
Growing research focus
• Growing quickly in the DB, AI, and NLP communities (result from [4])
• Recent tutorials
  • [1] Web table extraction, retrieval and augmentation, SIGIR 19
  • [2] From Tables to Knowledge: Recent Advances in Table Understanding, KDD 21
  • [3] Transformers for Tabular Data Representation: A Tutorial on Models and Applications, VLDB 22, SIGMOD 23
  • [4] Large Language Models for Tabular Data: Progresses and Future Directions, SIGIR 24
• A-Paper-List-of-Awesome-Tabular-LLMs, /SpursGoZmy/Awesome-Tabular-LLMs
Table tasks & benchmarks
Table preprocessing: "Prepare tables"
• Table matching
  • Entity matching
  • Schema matching
• Table cleaning
  • Error detection
  • Data imputation
• Table augmentation
  • Row population
  • Schema augmentation
• Table search
• Table transformation
Table understanding: "Understand tables"
• Table interpretation
  • Entity linking
  • Column type annotation
  • Relation extraction
• Table detection
Table analysis: "Get answer from tables"
• Table QA
• Table fact verification
• Table-to-text
• Text-to-SQL
Table preprocessing: Table matching
• Entity matching: "Matching two rows"
• Schema matching: "Matching two columns"

(Figure: schema-matching illustration with column labels AAA, BBB, CCC, EEE and AAA', DDD.)

| id | name | rev |
| --- | --- | --- |
| 1 | IBM Corp | $57B |
| 2 | Apple Inc | $366B |
| 3 | GE | $74B |

| id | name | loc | # of employee |
| --- | --- | --- | --- |
| 1 | Apple | CA | 154,000 |
| 2 | IBM | NY | 282,000 |

• Dataset and benchmark
  1. Can Foundation Models Wrangle Your Data? [VLDB23]. /abs/2205.09911
  2. Jellyfish: Instruction-Tuning Local Large Language Models for Data Preprocessing [EMNLP24]. /abs/2312.01678
Table preprocessing: Table cleaning
• Error detection: "Detect the error cells" (inconsistency, missing, typo, format, outlier)
• Data imputation: "Impute values into cells"

Before cleaning:

| ID | Name | Date of Birth | Prefecture | Postal Code | Height |
| --- | --- | --- | --- | --- | --- |
| 1 | Yuka | 2003/02/26 | Hokkaido | 540-8570 | 165 |
| 2 | Nana | | Aichi | 464-0804 | 157 |
| 3 | Miho | 2001/06/25 | Kangawa | 2208799 | 1.60 |

After cleaning:

| ID | Name | Date of Birth | Prefecture | Postal Code | Height |
| --- | --- | --- | --- | --- | --- |
| 1 | Yuka | 2003/02/26 | Osaka | 540-8570 | 165 |
| 2 | Nana | 2003/03/30 | Aichi | 464-0804 | 157 |
| 3 | Miho | 2001/06/25 | Kanagawa | 220-8799 | 160 |

• Dataset and benchmark
  1. Can Foundation Models Wrangle Your Data? [VLDB23]. /abs/2205.09911
  2. Jellyfish: Instruction-Tuning Local Large Language Models for Data Preprocessing [EMNLP24]. /abs/2312.01678
Table preprocessing: Table augmentation
"Add columns/rows to table"
• Column population
• Row population

| id | name | loc | # of employee |
| --- | --- | --- | --- |
| 1 | Apple | CA | 154,000 |
| 2 | IBM | NY | 282,000 |

• Dataset and benchmark
  1. TURL: Table Understanding through Representation Learning. [VLDB20]. /abs/2006.14806
  2. TableLlama: Towards Open Large Generalist Models for Tables. [NAACL24]. https://osu-nlp-group.github.io/TableLlama/
Table preprocessing: Table search
"Retrieve tables with an NL query"
Query: "Tables that contain information about Apple Inc."
Search over: tables, data lakes, documents
Table collection (example):

| id | name | loc | # of employee |
| --- | --- | --- | --- |
| 1 | Apple | CA | 154,000 |
| 2 | IBM | NY | 282,000 |

• Dataset and benchmark
  1. Open-Domain Table Retrieval for Natural Questions. /zorazrw/nqt-retrieval
  2. Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table [EMNLP23]. /sean0042/Open_WikiTable
Table preprocessing: Table transformation
"Manipulate table into wanted styles"
• Dataset and benchmark
  1. SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [NeurIPS24]. https://spreadsheetbench.github.io/
Table understanding: Table interpretation
• Column type annotation: "Classify columns into defined types" (e.g., the name column → Company)
• Entity linking: "Match entity to knowledge base" (e.g., Apple → /wiki/Apple_Inc)
• Relation extraction: "Extract and predict relation between two columns" (e.g., name and loc → anization.headquarters-location)

| id | name | loc | # of employee |
| --- | --- | --- | --- |
| 1 | Apple | CA | 154,000 |
| 2 | IBM | NY | 282,000 |

• Dataset and benchmark
  1. TURL: Table Understanding through Representation Learning. [VLDB20]. /abs/2006.14806
  2. TableLlama: Towards Open Large Generalist Models for Tables. [NAACL24]. https://osu-nlp-group.github.io/TableLlama/
  3. Column Type Annotation using ChatGPT [arXiv24]. /abs/2306.00745
Table understanding: Table detection
"Detect table region, structure and content"
• Dataset and benchmark
  1. PubTables-1M: Towards comprehensive table extraction from unstructured documents. [CVPR22]. /microsoft/table-transformer
  2. TableFormer: Table Structure Understanding with Transformers. [CVPR22]. /IBM/TabFormer
  3. Docling Technical Report. [arXiv24]. /DS4SD/docling-ibm-models
Table analysis: Table QA
"Question answering on tabular contents"

| id | name | loc | # of employee |
| --- | --- | --- | --- |
| 1 | Apple | CA | 154,000 |
| 2 | IBM | NY | 282,000 |

Question 1: Where is IBM located?
Answer 1: New York
Question 2: What is the total number of employees at Apple and IBM?
Answer 2: 436,000
• Dataset and benchmark
  1. TableLlama: Towards Open Large Generalist Models for Tables. [NAACL24]. https://osu-nlp-group.github.io/TableLlama/
  2. https://huggingface.co/datasets/SpursgoZmy/IFT-Data-For-Tabular-Tasks
Table analysis: Table fact verification
"Verify a given sentence according to the table"

| id | name | loc | # of employee |
| --- | --- | --- | --- |
| 1 | Apple | CA | 154,000 |
| 2 | IBM | NY | 282,000 |

Entailed: "IBM and Apple are U.S. companies."
Refuted: "Apple has more employees than IBM."
• Dataset and benchmark
  1. TabFact: A Large-scale Dataset for Table-based Fact Verification. [ICLR20]. https://tabfact.github.io/
  2. FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information. [ACL21 workshop]. https://fever.ai/dataset/feverous.html
Table analysis: Table-to-text
"Generate description of table"

| id | name | loc | # of employee |
| --- | --- | --- | --- |
| 1 | Apple | CA | 154,000 |
| 2 | IBM | NY | 282,000 |

"The table presents information about two major companies, Apple and IBM, along with their locations and employee counts. Apple, headquartered in California (CA), employs 154,000 people. On the other hand, IBM, based in New York (NY), has a significantly larger workforce, with 282,000 employees."
• Dataset and benchmark
  1. Neural Text Generation from Structured Data with Application to the Biography Domain [EMNLP16]. /DavidGrangier/wikipedia-biography-dataset
  2. ToTTo: A Controlled Table-To-Text Generation Dataset [EMNLP20]. https://huggingface.co/datasets/google-research-datasets/totto
  3. Table-to-text: Describing table region with natural language. [AAAI18]. /msra-nlc/Table2Text
Table analysis: Text-to-SQL
"Convert natural language to SQL query"

| id | name | loc | employee_num |
| --- | --- | --- | --- |
| 1 | Apple | CA | 154,000 |
| 2 | IBM | NY | 282,000 |

Text: How many employees in Apple?
SQL: SELECT `employee_num` FROM table_name WHERE name = 'Apple';
• Dataset and benchmark
  1. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. [ICLR18]. /salesforce/WikiSQL
  2. Spider [EMNLP18]. https://yale-lily.github.io/spider
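The Text-to-SQL example can be checked by running the generated query against the example table. The following is a minimal sketch using Python's built-in `sqlite3` module; the helper name `run_query` is hypothetical, and the query string is copied from the slide rather than produced by a model.

```python
import sqlite3


def run_query(sql: str) -> list[tuple]:
    """Create the slide's example table in memory and run one SQL query on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name "
        "(id INTEGER, name TEXT, loc TEXT, employee_num INTEGER)"
    )
    conn.executemany(
        "INSERT INTO table_name VALUES (?, ?, ?, ?)",
        [(1, "Apple", "CA", 154000), (2, "IBM", "NY", 282000)],
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


# The query shown on the slide; in a real pipeline an LLM would generate it.
generated_sql = "SELECT `employee_num` FROM table_name WHERE name = 'Apple';"
print(run_query(generated_sql))
```

Executing the generated SQL is also a common way to evaluate Text-to-SQL output, since two differently worded queries can return the same result.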
Methods before LLM
• Rule, ML, NN-based -> skip
• Transformer-based (2018-)
  • Encoder
  • Encoder-Decoder

(Figure: the Transformer (encoder-decoder) and its variants: "BERT" uses the encoder, "GPT" uses the decoder, "T5" uses both.)
Motivation of Encoder for tables
• Pretrain-and-finetune ("BERT-way")
  • Learn a good table representation (embedding) with table pretraining tasks
  • Fine-tune on downstream tasks

(Figure: large unlabelled data → pre-training (encoder) → fine-tuning with an additional layer on small labelled downstream task data, once per task (task 1, task 2).)
Table-only Pretraining
• Pretrain with table contents
• TURL [VLDB20]
  • Masked Language Model (MLM)
  • Masked Entity Recovery
• TABBIE [NAACL21]
  • Detect corrupted cells
• TUTA [KDD21]
  • MLM
  • Cell filling
  • Context selection
(Figures: TURL, TABBIE.)
TURL: /pdf/2006.14806
TABBIE: /abs/2105.02584
TUTA: /pdf/2010.12537
Table-and-query Pretraining
• Pretrain with table contents & query
• TAPAS [ACL20]
  • Query and whole table
  • Aggregation prediction
  • Cell selection task
• TaBERT [ACL20]
  • Query and related rows
  • Masked Language Model
(Figures: TAPAS, TaBERT.)
TAPAS: /abs/2004.02349
TaBERT: /abs/2005.08314
Motivation of Encoder-decoder for tables
• Flexible input and output
  • Table to text
  • Text to SQL
  • Table summarization
  • Table to Markdown, HTML
• Multimodal ability
  • Image encoder -> text decoder: table OCR task, table VQA
• Generalized and good generation ability

(Figure: Image / Text / HTML / Markdown → Encoder → Decoder → Image / Text / HTML / Markdown.)
Text-to-text Encoder Decoder
• Generalized and good generative ability by fine-tuning a pretrained encoder-decoder model
• UnifiedSKG [EMNLP22]
  • Fine-tune T5
• TaPEx [ICLR22]
  • Fine-tune BART
(Figures: UnifiedSKG, TaPEx.)
UnifiedSKG: /abs/2201.05966
TaPEx: /abs/2107.07653
Vision Encoder Decoder
• Multimodality
• TableFormer [CVPR22]: Table detection & OCR
  • Bounding box detection, structure generation
• TATR [CVPR22]
  • Table detection
  • Based on the object detection transformer (DETR)
(Figures: TableFormer, TATR (DETR).)
TableFormer: /abs/2203.01017
TATR: /microsoft/table-transformer
LLM (decoders for table)
• Autoregressive generation
  • Good for generative tasks
  • Easy for self-supervised training
• Simple architecture
  • Efficient on training with large-scale data
  • Easy to scale the model

(Figure: the Transformer (encoder-decoder) and its variants: "BERT" uses the encoder, "GPT" uses the decoder, "T5" uses both.)
Motivation of LLM for table tasks
• Strengths
  • Large knowledge
  • In-context learning / zero-shot ability
  • Generation tasks, interactive
• Limitations
  • Slow
  • Costly
  • Weak performance on
    • Mathematical tasks
    • Large tables
Use of LLM in table tasks
• Prompt
• Fine-tuning
• VLM
• RAG
• LLM agents
On the Use of Large Language Models for Table Tasks
- Prompting
Prompt Engineering
Bad prompt: "I'm hungry."
Response: "Certainly, here's a whole bunch of useless text because I don't know what you want because I can't read your mind because all I do is generate text from prompts. If you are unsatisfied with my response please provide a more useful prompt."
Good prompt: "I'm an adult man weighing 200 pounds with a daily caloric requirement of 2500. I need to hit my protein goal for the day which is 120 grams. I'm hungry for something hot and a little salty."
Response: "Scramble and cook 4 eggs."
source: /the-augmented-engineer/new-to-ai-prompt-writing-learn-how-to-get-summaries-that-cover-all-the-key-points-you-care-about-46615fab135b

Zero-shot
• Instruct the LLM to follow prompts.
Few-shot
• Demonstrate with one or more examples.
Chain-of-thought
• Elicit complex reasoning through a series of intermediate reasoning steps.
• Variants: tree-of-thought, graph-of-thought, etc.
Zero-Shot Prompting
• Instruct the model to follow specific prompts.
Prompt:
I went to the market and bought 10 apples. I gave two to my neighbor and two to the repairman. Then I bought 5 apples and ate one. How many apples are left?
Let's think about this step by step.
Response:
First, I bought 10 apples.
1. I gave two apples to my neighbor; 10 - 2 = 8 apples left.
2. I gave two apples to the repairman; 8 - 2 = 6 apples left.
3. Then I bought 5 apples. 6 + 5 = 11 apples left.
4. One apple eaten; 11 - 1 = 10 apples left.
As a result, there are 10 apples left.
Kojima et al. Large language models are zero-shot reasoners. NeurIPS 2022.
source: /Jazuma/items/9de7c56fd9fa8b9c1648
Few-Shot Prompting
• Demonstrate task-solving with examples provided in prompts.
Prompt:
Fantastic! // Positive
Terrible! // Negative
The movie was amazing! // Positive
What a horrible show! //
Response:
Negative
Brown et al. Language models are few-shot learners. NeurIPS 2020.
source: /Jazuma/items/9de7c56fd9fa8b9c1648
Chain-of-Thought
• Elicit complex reasoning by providing inference processes.
Prompt:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all odd numbers gives 9 + 15 + 1 = 25. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:
Response:
Adding all the odd numbers gives 15 + 5 + 13 + 7 + 1 = 41. The answer is False.
Wei et al. Chain of thought prompting elicits reasoning in large language models. NeurIPS 2022.
Wang et al. Self-consistency improves chain of thought reasoning in language models. ICLR 2023.
source: /Jazuma/items/9de7c56fd9fa8b9c1648
Issues of Prompting for Table Tasks
• Task Variety (TV)
• Data Format (DF)
• Data Volume (DV)
• Task Complexity (TC)
Prompting Techniques for Table Tasks
☞ Task Complexity
• Task decomposition
• Table decomposition
☞ Data Format
• Table encoding
• Table reconstruction
☞ Data Volume
• Instance prompting
• Batch prompting
• Prefix caching
☞ Task Variety
• Zero-shot prompting
• Few-shot prompting
• Chain-of-thought
☞ Task Variety
• Prompting methods: Zero-shot, Few-shot, Chain-of-thought (and variants)
• Task categories: Table Preprocessing, Table Understanding, Table Analysis
• Figure axis labels: Invariant, Variant
☞ Task Variety: Zero-Shot Prompting
Task type: Data Imputation [Table Preprocessing]

| name | city | addr | phone | type |
| --- | --- | --- | --- | --- |
| Langer's | ? | 704 S. Alvarado St. | 213-483-8050 | delis |

Prompt
• Task description: You are a database engineer. You are requested to infer the value of the "city" attribute based on the values of other attributes.
• Data instance: [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
Answer
The city is "Los Angeles".
Zhang et al. Large language models as data preprocessors. TaDA 2024.
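The zero-shot prompt above is a task description followed by one serialized record. Below is a minimal sketch of how such a prompt could be assembled; the function names `serialize_record` and `build_zero_shot_prompt` are hypothetical, and the call to an actual LLM is left out.

```python
def serialize_record(record: dict[str, str]) -> str:
    """Render a record as [attr: "value", ...], the layout used on the slide."""
    fields = ", ".join(f'{key}: "{value}"' for key, value in record.items())
    return f"[{fields}]"


def build_zero_shot_prompt(record: dict[str, str], target: str) -> str:
    """Task description first, then the data instance; no examples are given."""
    task_description = (
        "You are a database engineer.\n"
        f'You are requested to infer the value of the "{target}" attribute '
        "based on the values of other attributes."
    )
    return f"{task_description}\n{serialize_record(record)}"


record = {
    "name": "langer's",
    "addr": "704 s. alvarado st.",
    "phone": "213-483-8050",
    "type": "delis",
}
prompt = build_zero_shot_prompt(record, target="city")
print(prompt)
```

The target attribute is left out of the record on purpose, since it is the value the model is asked to infer.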
☞ Task Variety: Few-Shot Prompting
Task type: Data Imputation [Table Preprocessing]
Prompt
• Task description: You are a database engineer. You are requested to infer the value of the "city" attribute based on the values of other attributes.
• Data instance: [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
• Few-shot examples:
Some examples are given below.
```
User:
Question 1: Record is [name: "carey's corner", addr: "1215 powers ferry rd.", phone: "770-933-0909", type: "hamburgers"]. What is the city?
Assistant:
Answer 1: Marietta
…
```
Answer
Los Angeles
Zhang et al. Large language models as data preprocessors. TaDA 2024.
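A few-shot prompt adds solved examples in the User/Assistant layout before the record to be imputed. The sketch below shows one way to assemble it; `build_few_shot_prompt` and its argument layout are assumptions made for illustration, not the authors' implementation.

```python
def serialize_record(record: dict[str, str]) -> str:
    """Render a record as [attr: "value", ...]."""
    fields = ", ".join(f'{key}: "{value}"' for key, value in record.items())
    return f"[{fields}]"


def build_few_shot_prompt(
    examples: list[tuple[dict[str, str], str]],
    query: dict[str, str],
    target: str,
) -> str:
    """Task description, then solved examples, then the unanswered question."""
    lines = [
        "You are a database engineer.",
        f'You are requested to infer the value of the "{target}" attribute '
        "based on the values of other attributes.",
        "Some examples are given below.",
    ]
    for i, (record, answer) in enumerate(examples, start=1):
        lines += [
            "User:",
            f"Question {i}: Record is {serialize_record(record)}. What is the {target}?",
            "Assistant:",
            f"Answer {i}: {answer}",
        ]
    next_index = len(examples) + 1
    lines += [
        "User:",
        f"Question {next_index}: Record is {serialize_record(query)}. What is the {target}?",
        "Assistant:",
    ]
    return "\n".join(lines)


examples = [
    ({"name": "carey's corner", "addr": "1215 powers ferry rd.",
      "phone": "770-933-0909", "type": "hamburgers"}, "Marietta"),
]
query = {"name": "langer's", "addr": "704 s. alvarado st.",
         "phone": "213-483-8050", "type": "delis"}
prompt = build_few_shot_prompt(examples, query, target="city")
print(prompt)
```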
☞ Task Variety: Chain-of-Thought
Task type: Data Imputation [Table Preprocessing]
Prompt
• Task description: … MUST answer each question in two lines. In the first line, you give the reason for the inference. In the second line, you ONLY give the value of the "city" attribute.
• Data instance: [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
• Few-shot examples:
Some examples are given below.
```
User:
Question 1: Record is [name: "carey's corner", addr: "1215 powers ferry rd.", phone: "770-933-0909", type: "hamburgers"]. What is the city?
Assistant:
Answer 1: The phone area code 770 corresponds to the Atlanta metropolitan area in Georgia, and "1215 Powers Ferry Rd." is an address located in Marietta, Georgia; therefore, the city is Marietta.
Marietta
…
```
Answer
The address and phone number point to a location in Los Angeles, CA, known for its distinct area code (213) and local businesses.
Los Angeles
Zhang et al. Large language models as data preprocessors. TaDA 2024.
☞ Task Variety: Chain-of-Thought (+ Self-Consistency Decoding)
• Generate multiple reasoning answers and aggregate them.
Task type: Data Imputation
Prompt
• Task description: …
• Data instance: [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
• Few-shot examples: …
Sampled answers
1. The name refers to a restaurant in the city of Santa Monica, CA. The phone number also corresponds to California.
   Santa Monica
2. The phone number corresponds to a location in Los Angeles, California, recognized for its unique 213 area code.
   Los Angeles
3. The address and phone number point to a location in Los Angeles, CA, known for its distinct area code (213) and local businesses.
   Los Angeles
Aggregation: weighted sum / majority voting
Answer
Los Angeles
Chen. Large language models are few(1)-shot table reasoners. EACL 2023. [TA: Table QA]
Chen et al. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. TMLR 2023.
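The aggregation step of self-consistency decoding can be sketched as plain majority voting over the sampled answers. This example assumes each response follows the two-line format (reason, then value); the function names are hypothetical, and the weighted-sum variant mentioned on the slide is not shown.

```python
from collections import Counter


def parse_two_line_answer(response: str) -> str:
    """Return the last non-empty line, which holds the value in the two-line format."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    return lines[-1]


def majority_vote(responses: list[str]) -> str:
    """Pick the most frequent final answer among the sampled responses."""
    answers = [parse_two_line_answer(response) for response in responses]
    return Counter(answers).most_common(1)[0][0]


# Three sampled responses, taken from the slide.
sampled = [
    "The name refers to a restaurant in the city of Santa Monica, CA. "
    "The phone number also corresponds to California.\nSanta Monica",
    "The phone number corresponds to a location in Los Angeles, California, "
    "recognized for its unique 213 area code.\nLos Angeles",
    "The address and phone number point to a location in Los Angeles, CA, "
    "known for its distinct area code (213) and local businesses.\nLos Angeles",
]
final_answer = majority_vote(sampled)
print(final_answer)
```

With an even number of samples a tie is possible; `Counter.most_common` then returns the answer that was seen first, so an odd sample count is the simpler choice.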
☞ Task Complexity
Sources of complexity
• Task
• Table
Strategies
• Task decomposition
• Table decomposition
☞ Task Complexity: Task Decomposition
Before decomposition

| ? | ? | ? | ? |
| --- | --- | --- | --- |
| Friends Pizza | 2525 | Cash Visa MasterCard | 7:30 AM |

Task type: Column Type Annotation
• Task description: Classify the columns of a given table with only one of the following classes that are separated with comma: description of event, description of restaurant, postal code, region of address…
• Data instance:
  Column 1 || Column 2 || Column 3 || Column 4 \n
  Friends Pizza || 2525 || Cash Visa MasterCard || 7:30 AM \n
• Answer: name, number, payment, time

After decomposition
Sub-task 1: Table Classification
• Task description: Your task is to classify if a table describes Restaurants, Events, Music Recordings, or Hotels.
• Data instance:
  Column 1 || Column 2 || Column 3 || Column 4 \n
  Friends Pizza || 2525 || Cash Visa MasterCard || 7:30 AM \n
• Answer: Restaurant
Sub-task 2: Column Classification
• Task description: Your task is to classify the columns of a given table with only one of the following classes that are separated with comma: name of restaurant, description of restaurant…
• Data instance:
  Column 1 || Column 2 || Column 3 || Column 4 \n
  Friends Pizza || 2525 || Cash Visa MasterCard || 7:30 AM \n
• Answer: name of restaurant, postal code, payment accepted, time

Korini and Bizer. Column type annotation using ChatGPT. TaDA 2023.
Zhao et al. Large language models are complex table parsers. EMNLP 2023.
Wang et al. Chain-of-table: Evolving tables in the reasoning chain for table understanding. ICLR 2024.
Dong et al. OpenTE: Open-structure table extraction from text. ICASSP 2024.
Tasks covered (slide tags): Table Interpretation (TU), Table QA (TA), Table Transformation (TP).
☞ Task Complexity: Table Decomposition

Table: Figure skating at the Asian Winter Games

| Rank | Nation | Gold | Silver | Bronze | Total |
| --- | --- | --- | --- | --- | --- |
| 1 | China | 13 | 9 | 13 | 35 |
| 2 | Japan | 7 | 10 | 7 | 24 |
| 3 | Uzbekistan | 1 | 2 | 3 | 6 |
| 4 | Kazakhstan | 2 | 2 | 0 | 4 |
| 5 | North Korea | 1 | 0 | 1 | 2 |
| 6 | South Korea | 0 | 0 | 2 | 2 |
| Total | | 24 | 23 | 26 | 73 |

Q: Who received more bronze medals: Japan or South Korea?
A: Japan

(1) Sub-table selection (LLM)
Input:
Table title: Figure skating at the Asian Winter Games
Columns: ['rank', 'nation', 'gold', 'silver', 'bronze', 'total']
Q: Who received more bronze medals: Japan or South Korea?
SQL: select nation, bronze from T where nation = 'japan' or nation = 'south korea'
Executing the SQL yields the sub-table:

| Nation | Bronze |
| --- | --- |
| Japan | 7 |
| South Korea | 2 |

(2) Reasoning and answer generation (LLM)
Input: the table title, the sub-table, the SQL, and the question.
Response: Based on the table, Japan received 7 bronze medals and South Korea received 2 bronze medals. Therefore, Japan received more bronze medals than South Korea.
Answer: Japan

Nahid and Rafiei. TabSQLify: Enhancing reasoning capabilities of LLMs through table decomposition. NAACL 2024.
Patnaik et al. Cabinet: Content relevance based noise reduction for table question answering. ICLR 2024.
Jiang et al. StructGPT: A general framework for large language model to reason over structured data. EMNLP 2023.
Tasks covered (slide tags): Table QA (TA), Text-to-SQL (TA).
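The two-stage flow above (select a sub-table with SQL, then answer over the smaller table) can be sketched with Python's built-in `sqlite3`. The SQL is written by hand here in place of the first LLM call, the helper names are hypothetical, and the nation names are stored capitalized, so the string literals differ in case from the slide.

```python
import sqlite3

MEDAL_ROWS = [
    (1, "China", 13, 9, 13, 35),
    (2, "Japan", 7, 10, 7, 24),
    (3, "Uzbekistan", 1, 2, 3, 6),
    (4, "Kazakhstan", 2, 2, 0, 4),
    (5, "North Korea", 1, 0, 1, 2),
    (6, "South Korea", 0, 0, 2, 2),
]


def select_subtable(sql: str) -> list[tuple]:
    """Stage 1: run the sub-table selection SQL over the full table T."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE T ("rank" INTEGER, nation TEXT, gold INTEGER, '
        "silver INTEGER, bronze INTEGER, total INTEGER)"
    )
    conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?, ?, ?)", MEDAL_ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


def build_answer_prompt(title: str, sub_table: list[tuple], question: str) -> str:
    """Stage 2: put only the sub-table into the prompt for answer generation."""
    body = "\n".join(f"{nation} | {bronze}" for nation, bronze in sub_table)
    return f"Table title: {title}\nnation | bronze\n{body}\nQ: {question}\nA:"


sql = "SELECT nation, bronze FROM T WHERE nation = 'Japan' OR nation = 'South Korea'"
sub_table = select_subtable(sql)
prompt = build_answer_prompt(
    "Figure skating at the Asian Winter Games",
    sub_table,
    "Who received more bronze medals: Japan or South Korea?",
)
print(prompt)
```

The second prompt carries 2 rows and 2 columns instead of the full 7-row, 6-column table, which is the point of decomposing the table before reasoning.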
☞ Task Complexity: Table Decomposition (Progressive Prompting)
Question: Report the number of wins in Grand Slam tournaments.

| tournament | … | attn | career_w/l |
| --- | --- | --- | --- |
| Australian Open | … | 18 | 22-18 |
| Roland Garros | … | 14 | 11-14 |
| Wimbledon | … | 18 | 13-18 |
| US Open | … | 13 | 16-13 |
| Indian Wells | … | 15 | 20-15 |

Text transformation: a new column n_win (22, 11, 13, 16) is derived from career_w/l.

Progressive prompting steps:
1. Focus on column selection. This sets the groundwork for understanding how to fetch specific data from a database.
2. Incorporate both column and row selection. This extracts particular columns and filters rows based on specified criteria, enhancing precision in data gathering.
3. Apply additional operations (e.g., aggregation functions and text operations). Aggregation functions empower data summarization; text operations facilitate the manipulation and transformation of string data.

Kong et al. OpenTab: Advancing large language models as open-domain table reasoners. ICLR 2024. [TA: Text-to-SQL]
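The three progressive steps can be illustrated as three SQL queries of increasing complexity over the example table. This is a sketch under assumptions: the column `career_w/l` is renamed `career_wl` to keep it a plain identifier, the elided "…" columns are omitted, the Grand Slam filter is written by hand, and the queries are hand-written rather than generated by an LLM.

```python
import sqlite3

GRAND_SLAMS = ("Australian Open", "Roland Garros", "Wimbledon", "US Open")


def load_table() -> sqlite3.Connection:
    """Load the example rows into an in-memory table T."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (tournament TEXT, attn INTEGER, career_wl TEXT)")
    conn.executemany(
        "INSERT INTO T VALUES (?, ?, ?)",
        [
            ("Australian Open", 18, "22-18"),
            ("Roland Garros", 14, "11-14"),
            ("Wimbledon", 18, "13-18"),
            ("US Open", 13, "16-13"),
            ("Indian Wells", 15, "20-15"),
        ],
    )
    return conn


placeholders = ", ".join("?" for _ in GRAND_SLAMS)

# Step 1: column selection only.
STEP_1 = "SELECT tournament, career_wl FROM T"
# Step 2: column selection plus row selection.
STEP_2 = f"{STEP_1} WHERE tournament IN ({placeholders})"
# Step 3: add a text operation that extracts the wins before the '-'.
STEP_3 = (
    "SELECT tournament, "
    "CAST(substr(career_wl, 1, instr(career_wl, '-') - 1) AS INTEGER) AS n_win "
    f"FROM T WHERE tournament IN ({placeholders})"
)

conn = load_table()
step_1_rows = conn.execute(STEP_1).fetchall()
step_2_rows = conn.execute(STEP_2, GRAND_SLAMS).fetchall()
step_3_rows = conn.execute(STEP_3, GRAND_SLAMS).fetchall()
conn.close()

for tournament, n_win in step_3_rows:
    print(tournament, n_win)
```

If a simpler step fails to parse or returns nothing useful, a pipeline built this way can fall back to the result of the previous step instead of failing outright.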
☞ Data Format: Table Encoding
• text (serialized)
• spreadsheet
• markup
• key-value
• program
• image
• embedded
(Figure axis labels: easy, hard.)
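Three of the listed encodings (serialized text, markup, key-value) can be illustrated on the two-row company table used earlier in the deck. This is a sketch only: the exact separators, the choice of Markdown for markup, and JSON for key-value are assumptions, and the function names are hypothetical.

```python
import json

HEADER = ["id", "name", "loc", "employee_num"]
ROWS = [[1, "Apple", "CA", 154000], [2, "IBM", "NY", 282000]]


def to_serialized_text(header: list[str], rows: list[list]) -> str:
    """Text encoding: one line per row, written as 'column: value' pairs."""
    return "\n".join(
        "; ".join(f"{column}: {value}" for column, value in zip(header, row))
        for row in rows
    )


def to_markdown(header: list[str], rows: list[list]) -> str:
    """Markup encoding: a Markdown pipe table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(str(value) for value in row) + " |" for row in rows]
    return "\n".join(lines)


def to_key_value(header: list[str], rows: list[list]) -> str:
    """Key-value encoding: a JSON list with one object per row."""
    return json.dumps([dict(zip(header, row)) for row in rows])


print(to_serialized_text(HEADER, ROWS))
print(to_markdown(HEADER, ROWS))
print(to_key_value(HEADER, ROWS))
```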
☞ Data Format: Table Reconstruction
Zhao et al. Large language models are complex table parsers. EMNLP 2023. [TA: Table QA]
☞ Data Volume: Instance Prompting
Task type: Data Imputation [Table Preprocessing]

| name | city | addr | phone | type |
| --- | --- | --- | --- | --- |
| Langer's | ? | 704 S. Alvarado St. | 213-483-8050 | delis |
| Valentino | ? | 3115 Pico Blvd. | 310-829-4313 | Italian |
| Cafe Bizou | ? | 14016 Ventura Blvd. | 818/788-3536 | French |

One prompt is issued per record; every prompt repeats the same task description.
• Task description (in each prompt): You are a database engineer. You are requested to infer the value of the "city" attribute based on the values of other attributes.
• Data instance, prompt 1: [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
• Data instance, prompt 2: [name: "valentino", addr: "3115 pico blvd.", phone: "310-829-4313", type: "italian"]
• Data instance, prompt 3: [name: "cafe bizou", addr: "14016 ventura blvd.", phone: "818/788-3536", type: "french"]
Zhang et al. Large language models as data preprocessors. TaDA 2024.
Standard prompting
# K-shot in-context exemplars
Q: {question}
A: {answer}
Q: {question}
A: {answer}
# One sample to inference
Q: Ali had $21. Leila gave him half of her $100. How much does Ali have now?
# Response
A: Leila gave 100/2 = 50 to Ali. Ali now has $21 + $50 = $71. The answer is 71.

Batch prompting
# K-shot in-context exemplars in K/b batches
Q[1]: {question}
Q[2]: {question}
A[1]: {answer}
A[2]: {answer}
# b samples in a batch to inference (b = 2 samples in one batch)
Q[1]: Ali had $21. Leila gave him half of her $100. How much does Ali have now?
Q[2]: A robe takes 2 bolts of blue fiber and half that white fiber. How many bolts?
# Responses to a batch
A[1]: Leila gave 100/2 = 50 to Ali. Ali now has $21 + $50 = $71. The answer is 71.
A[2]: It takes 2/2 = 1 bolt of white fiber. The total amount is 2 + 1 = 3. The answer is 3.
source: Cheng et al. Batch prompting: Efficient inference with large language model APIs. EMNLP 2023.
☞ Data Volume: Batch Prompting
Task type: Data Imputation [Table Preprocessing]

| name | city | addr | phone | type |
| --- | --- | --- | --- | --- |
| Langer's | ? | 704 S. Alvarado St. | 213-483-8050 | delis |
| Valentino | ? | 3115 Pico Blvd. | 310-829-4313 | Italian |
| Cafe Bizou | ? | 14016 Ventura Blvd. | 818/788-3536 | French |

A single prompt carries the task description once, followed by all data instances in the batch.
• Task description: You are a database engineer. You are requested to infer the value of the "city" attribute based on the values of other attributes.
• Data instances:
  [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
  [name: "valentino", addr: "3115 pico blvd.", phone: "310-829-4313", type: "italian"]
  [name: "cafe bizou", addr: "14016 ventura blvd.", phone: "818/788-3536", type: "french"]
Zhang et al. Large language models as data preprocessors. TaDA 2024.
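Batch prompting groups several records under one task description, so the description is sent once per batch rather than once per record. The sketch below is illustrative: the function names and the "Record i:" numbering are assumptions added so that answers can be matched back to records.

```python
def serialize_record(record: dict[str, str]) -> str:
    """Render a record as [attr: "value", ...]."""
    fields = ", ".join(f'{key}: "{value}"' for key, value in record.items())
    return f"[{fields}]"


def build_batch_prompts(
    records: list[dict[str, str]], target: str, batch_size: int
) -> list[str]:
    """Split records into batches and emit one prompt per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    task_description = (
        "You are a database engineer.\n"
        f'You are requested to infer the value of the "{target}" attribute '
        "based on the values of other attributes."
    )
    prompts = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        lines = [task_description]
        lines += [
            f"Record {i}: {serialize_record(record)}"
            for i, record in enumerate(batch, start=1)
        ]
        prompts.append("\n".join(lines))
    return prompts


records = [
    {"name": "langer's", "addr": "704 s. alvarado st.",
     "phone": "213-483-8050", "type": "delis"},
    {"name": "valentino", "addr": "3115 pico blvd.",
     "phone": "310-829-4313", "type": "italian"},
    {"name": "cafe bizou", "addr": "14016 ventura blvd.",
     "phone": "818/788-3536", "type": "french"},
]
prompts = build_batch_prompts(records, target="city", batch_size=2)
print(len(prompts))
print(prompts[0])
```

Setting `batch_size=1` reduces this to instance prompting, with one prompt per record.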
☞ Data Volume: Prefix Caching (+ Instance Prompting)
• Use Automatic Prefix Caching (APC) in the vLLM library.
• KV cache: cache key and value vectors to speed up QKV attention in Transformers.
• Share prefix → share KV: all instance prompts start with the same task description, so its KV cache can be reused across prompts.

Shared prefix (task description, identical in every prompt):
You are a database engineer. You are requested to infer the value of the "city" attribute based on the values of other attributes.
Per-prompt suffix (data instance):
• Prompt 1: [name: "langer's", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]
• Prompt 2: [name: "valentino", addr: "3115 pico blvd.", phone: "310-829-4313", type: "italian"]
• Prompt 3: [name: "cafe bizou", addr: "14016 ventura blvd.", phone: "818/788-3536", type: "french"]
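Prefix caching only helps when prompts share a long common prefix, which depends on where the task description sits. The sketch below measures the shared prefix for two prompt layouts using only the standard library; it does not call vLLM, and the variable names are hypothetical. In vLLM itself, APC is, to the best of my knowledge, switched on through the `enable_prefix_caching` engine option, which is not exercised here.

```python
import os

TASK_DESCRIPTION = (
    "You are a database engineer.\n"
    'You are requested to infer the value of the "city" attribute '
    "based on the values of other attributes.\n"
)

INSTANCES = [
    '[name: "langer\'s", addr: "704 s. alvarado st.", phone: "213-483-8050", type: "delis"]',
    '[name: "valentino", addr: "3115 pico blvd.", phone: "310-829-4313", type: "italian"]',
    '[name: "cafe bizou", addr: "14016 ventura blvd.", phone: "818/788-3536", type: "french"]',
]

# Layout A: task description first, so every prompt starts with the same text.
description_first = [TASK_DESCRIPTION + instance for instance in INSTANCES]
# Layout B: data instance first, so the prompts diverge almost immediately.
instance_first = [instance + "\n" + TASK_DESCRIPTION for instance in INSTANCES]

# os.path.commonprefix compares plain strings character by character.
shared_a = os.path.commonprefix(description_first)
shared_b = os.path.commonprefix(instance_first)

print(len(shared_a), len(shared_b))
```

Layout A shares the whole task description, while layout B shares only the opening `[name: "`, so only layout A gives a prefix cache something substantial to reuse.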