The previous chapter introduced sequence-to-sequence models, and you built one that (sort of) translated Spanish text to English. This chapter introduces other techniques that can improve performance for such tasks. It picks up where you left off, so continue using the same nlpenv environment and SMDB project you already made. It’s inadvisable to read this chapter without first completing that one, but if you’d like a clean starter project, use the final version of SMDB found in Chapter 15’s resources.
Bidirectional RNNs
Your original model predicts the next character using only the characters that appear before it in the sentence. But is that really how people read? Consider the following two English sentences and their Spanish translations (according to Google Translate):
The first five words are the same in the English versions of both sentences, but only the first two words end up the same in the Spanish translations. That’s because the meaning of the word “bank” is different in each sentence, but you cannot know that until you’ve read past that word in the sentence. That is, its meaning comes from its context, including the words both before and after it.
In order to consider the full context surrounding each token, you can use what’s called a bidirectional recurrent neural network (BRNN), which processes sequences in both directions, like this:
The forward and reverse layers themselves can be any recurrent type, such as the LSTMs you’ve worked with elsewhere in this book. However, in this chapter, you’ll use a new type called a gated recurrent unit, or GRU.
GRUs were invented after LSTMs and were meant to serve the same purpose of learning longer-term relationships while training more easily than standard recurrent layers. Internally, they are implemented differently from LSTMs, but, from a user’s standpoint, the main difference is that they do not have separate hidden and cell states. Instead, they only have hidden states, which makes them a bit less complicated to work with when you have to manage state directly — like you do with the decoder in a seq2seq model.
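That difference is easy to see in Keras. The following sketch — with toy placeholder sizes, not the chapter’s real model — builds one layer of each type with `return_state=True` and shows that the LSTM returns two state vectors while the GRU returns just one:

```python
import numpy as np
from tensorflow.keras.layers import GRU, LSTM, Input
from tensorflow.keras.models import Model

seq_in = Input(shape=(None, 8))  # toy sequences of 8-feature vectors

# An LSTM with return_state=True yields its output, hidden state and cell state...
_, lstm_h, lstm_c = LSTM(4, return_state=True)(seq_in)

# ...but a GRU yields only its output and a single hidden state
_, gru_h = GRU(4, return_state=True)(seq_in)

model = Model(seq_in, [lstm_h, lstm_c, gru_h])
states = model.predict(np.zeros((1, 5, 8)))
print([s.shape for s in states])  # three state vectors, each (1, 4)
```

With an LSTM decoder you have to thread both `h` and `c` through every inference step; with a GRU there's just the one state.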
So now you’ll try a new version of the model you trained in the previous chapter — one that includes a bidirectional encoder. The Python code for this section is nearly identical to what you wrote for your first seq2seq model. As such, the chapter’s resources include a pre-filled Jupyter notebook for you to run at notebooks/Bidir-Char-Seq2Seq-Starter.ipynb. Or, you can just review the contents of notebooks/Bidir-Char-Seq2Seq-Complete.ipynb, which shows the output from the run used to build the pre-trained bidirectional model included in the notebooks/pre-trained/BidirCharModel/ folder.
If you choose to run the starter notebook then, as in the previous chapter, you should expect to see a few deprecation warnings printed out. These come not from your code, but from internal inconsistencies within Keras itself.

The rest of this section goes over the important differences between this and the previous model you built.
The first difference isn’t strictly necessary: This model uses a larger latent_dim value:
latent_dim = 512
The previous model used 256 dimensions, which meant you passed 512 features from your encoder to your decoder — the LSTM produced two 256-length vectors, one for the hidden state and one for the cell state. GRU layers don’t have a cell state, so they return only a single vector of length latent_dim. Rather than send only half the amount of information to the decoder, the author chose to double the size of the GRUs.
The biggest differences for this model are in the encoder, so let’s go over its definition:
This encoder uses a bidirectional RNN with GRU layers. Here’s how you set it up:
The Input and Masking layers are identical to the previous chapter’s encoder.
Rather than creating one recurrent layer, you create two — one that processes the sequence normally and one that processes it in reverse because you set go_backwards=True. You feed the same masking layer into both of these layers.
Finally, you concatenate the outputs from the two GRU layers so the encoder can output them together in a single vector. Notice that, unlike in the previous chapter, here you don’t use the h states and instead use the layer outputs. This wasn’t mentioned before, but that works because the hidden states are the outputs. The reason you used the states for the LSTM was to get at the cell states, which are not returned as outputs like the hidden states are.
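The notebook contains the full definition; the following sketch shows the shape of that encoder, with `num_encoder_tokens` standing in for the notebook’s actual vocabulary size:

```python
from tensorflow.keras.layers import Concatenate, GRU, Input, Masking
from tensorflow.keras.models import Model

latent_dim = 512
num_encoder_tokens = 80  # placeholder; use your dataset's vocabulary size

# Input and Masking layers, identical to the previous chapter's encoder
encoder_in = Input(shape=(None, num_encoder_tokens))
encoder_mask = Masking()(encoder_in)

# Two GRU layers: one reads the sequence forward, the other in reverse
encoder_gru_f = GRU(latent_dim)(encoder_mask)
encoder_gru_r = GRU(latent_dim, go_backwards=True)(encoder_mask)

# Concatenate the two final hidden states into one 2 * latent_dim vector.
# No return_state needed here, because a GRU's output *is* its hidden state.
encoder_out = Concatenate()([encoder_gru_f, encoder_gru_r])

encoder = Model(encoder_in, encoder_out)
```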
As far as the decoder goes, one important difference is in the size of the inputs it expects. You define a new variable called decoder_latent_dim, like this:
decoder_latent_dim = latent_dim * 2
The decoder’s recurrent layer needs twice as many units as the encoder’s did because it accepts a vector that contains the concatenated outputs from two of them — forward and reverse.
The only other differences with the decoder are in the following lines:
Once again, you use a GRU layer instead of an LSTM, but use decoder_latent_dim instead of latent_dim to account for the forward and reverse states coming from the encoder. Notice the GRU only returns hidden states, which you ignore for now by assigning them to an underscore variable. This differs from the LSTM you used in the previous chapter, which returned both hidden and cell states.
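A sketch along those lines follows. The sizes are placeholders, and a standalone Input stands in for the encoder’s concatenated output so the snippet runs on its own:

```python
from tensorflow.keras.layers import Dense, GRU, Input, Masking
from tensorflow.keras.models import Model

latent_dim = 512
decoder_latent_dim = latent_dim * 2
num_decoder_tokens = 100  # placeholder; use your target vocabulary size

decoder_in = Input(shape=(None, num_decoder_tokens))
decoder_mask = Masking()(decoder_in)

# Stand-in for the encoder's concatenated forward + reverse output
encoder_state = Input(shape=(decoder_latent_dim,))

decoder_gru = GRU(decoder_latent_dim,
                  return_sequences=True, return_state=True)
# The GRU returns a single hidden state, ignored here via the underscore
decoder_out, _ = decoder_gru(decoder_mask, initial_state=encoder_state)

decoder_pred = Dense(num_decoder_tokens, activation="softmax")(decoder_out)

decoder = Model([decoder_in, encoder_state], decoder_pred)
```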
Note: One important detail is that the decoder does not implement a bidirectional network like the encoder does. That’s because the decoder doesn’t actually process whole sequences — it just takes a single character along with state information.
If you run this notebook, or look through the completed one provided, you’ll see a few things. First, this model is much larger than the last one you built — about 5.4 million parameters versus 741 thousand. That’s partly because there are two recurrent layers, and partly because of the doubled latent_dim. Still, each epoch takes only a bit longer to train.
The other thing that stands out is the performance. This model trained to a validation loss of 0.3533 by epoch 128, before early stopping automatically ended training at epoch 133. Compare that to the previous model, which needed 179 epochs to reach a validation loss of only 0.5905. This model achieved a lower loss in fewer epochs, thanks mostly to the additional information gleaned from the bidirectional encoder.
For inference, the only difference is with the encoder’s output. Instead of outputting the encoder’s latent state, you use the concatenated layer encoder_out, like this:
inf_encoder = Model(encoder_in, encoder_out)
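The inference decoder changes in a similar way: Because a GRU has no cell state, you pass in, and get back, just one state vector per step. Here’s a sketch, using fresh layers and placeholder sizes where the notebook reuses the trained ones:

```python
from tensorflow.keras.layers import Dense, GRU, Input
from tensorflow.keras.models import Model

decoder_latent_dim = 1024
num_decoder_tokens = 100  # placeholder; use your target vocabulary size

# In the notebook, these are the already-trained decoder layers
decoder_gru = GRU(decoder_latent_dim,
                  return_sequences=True, return_state=True)
decoder_dense = Dense(num_decoder_tokens, activation="softmax")

# At inference, the decoder takes one token plus the current state...
inf_decoder_in = Input(shape=(None, num_decoder_tokens))
inf_state_in = Input(shape=(decoder_latent_dim,))
inf_out, inf_state = decoder_gru(inf_decoder_in,
                                 initial_state=inf_state_in)

# ...and returns its prediction along with a single updated state
# (no separate cell state to thread through, unlike with an LSTM)
inf_decoder = Model([inf_decoder_in, inf_state_in],
                    [decoder_dense(inf_out), inf_state])
```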
The notebook includes code to export your encoder and decoder models to Core ML. There are slight differences to match the new model architecture, but nothing should look unfamiliar to you. It includes the same workarounds you used in the last chapter.
Looking through the inference tests in the completed notebook, you’ll see that it produces better translations than the previous model did for many of the samples. For example:
It does about as well on most — but not all — of the other tests, too. Some of the most interesting are those it gets wrong, but less wrong than the last model did, such as:
Notice that, in each of these examples, the bidirectional model does better than the previous chapter’s model when translating words that appear near the end of the sentences. That makes sense, since it looks at the sequence in both directions, letting it encode more context for the decoder.
If you’ve worked with recurrent networks in Keras before, then you might have expected this section to use Keras’s Bidirectional layer. Before trying out your new model in Xcode, take a look at the following brief discussion of why this chapter didn’t use that class.
Why not use Keras’s Bidirectional layer?
Keras includes a Bidirectional layer that simplifies the creation of bidirectional RNNs. You initialize it with a single recurrent layer, like an LSTM or GRU layer, and it handles duplicating that as a reversed layer for you. To use it, you’d write something like this for a bidirectional LSTM:
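A sketch of that, with placeholder sizes; merge_mode="concat" reproduces the manual concatenation used earlier in this chapter:

```python
from tensorflow.keras.layers import Bidirectional, Input, LSTM, Masking
from tensorflow.keras.models import Model

latent_dim = 512
num_encoder_tokens = 80  # placeholder vocabulary size

encoder_in = Input(shape=(None, num_encoder_tokens))
encoder_mask = Masking()(encoder_in)

# Bidirectional duplicates the wrapped layer, runs one copy in reverse
# and, with merge_mode="concat", joins the two outputs into one vector
encoder_out = Bidirectional(LSTM(latent_dim),
                            merge_mode="concat")(encoder_mask)

encoder = Model(encoder_in, encoder_out)
```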
Rizasmv, qeu’g luxr ymavu rovdiyodujuf kqimex wa yra fejewah iq uvy iqisuen qcova. Idc ow Zayot ofw ak tsok jusrl cyoil.
Ujsaxhugugohh, yavd xifu lavf mipyuhd woqowk, jate juo uzroazxix e gokad zuisnwedb ma oluyj zxo Wimimozyeuham ykapf: yifisqviagf. Ik keo pyg ne goyel i hohimojdaetin CMI momak, yoxeqkzuowm sebs okom e nepxdos ebjow jatxuco gvaw “Yajec di-zurutcieded ptezgup durzecpiax xagkidfg avcx HFCK pofok ob vhig woke”. Qfih od yzoe sig cejixqwoemk vepwaad 8.5, pqufb uk cyu wihilw dumeepex er nta yewo er jjim zjexavy.
Oydij lnim ehxia on cigiysef, iy’c o zuzkaw opoa bu xmeupa yuripalzaadol peberk buna nau bep saru, iwekt bomqambu jepejpijx gumidl ahsjiyafjb.
Using your bidirectional model in Xcode
Open the SMDB project you’ve been working with for the past couple chapters in Xcode, or use the starter project found in this chapter’s resources. Then, add the Es2EnBidirGruCharEncoder16Bit.mlmodel and Es2EnBidirGruCharDecoder16Bit.mlmodel models to SMDB like you’ve done before. If you didn’t train your own, you can find the ones we trained in the notebooks/pre-trained/BidirCharModel folder.
Hgesu tdeh hiejh’w kiil soze sohc an at ohrpatukuqz, riax uc meyv tsol dai winor’s ehsjucpuf miym ag syu qtorciks cozfialit ov lzu asd ip ppi kenx mxozbav. Gip ubugfne, jzi ruhinir or ymozk lui gyiqx, urb pao’fi cvavg wpuxurwalb dvomucquwd wejp o hmuask iqzuyexgt. Hub tapusaxziiwan xilact hinixubmw bsecibe zuwpip wagivzh pig vocds bupu fvufjqefaan, jmavu upeqes tedsosn bil ixhiop juceci uww arhan el atad ar o foraovve.
Zpu jazk oz mnon nruxzuq bwamw kubo uxhof yejlmefuej hzeq xujsg waog ya tehbuk cicciwfipzu. Ag jce yerw hafhoaz, suu’rn viejf udiag a milahem anpanbalifo he xwuodf yikepekw lijtux toux foudvx. Eb kve flomusf, pei’jf xiwx eit ggat pebox temy’d el nof ugl al ob beopy zels gexw us imz cjovxdahaucp.
Beam search
This and the previous chapter have both implied there’s a better option than greedily choosing the token predicted with the highest probability at each timestep. The solution most commonly used is called beam search, and you should strongly consider implementing it if you want to improve the quality of a model’s generated sequences.
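To make the idea concrete, here’s a minimal, framework-free sketch of beam search over a toy next-token distribution. It sums log probabilities rather than multiplying raw probabilities — the usual trick to avoid numerical underflow — and everything about the toy distribution is invented for illustration:

```python
import math

def beam_search(step_fn, start, end, beam_width=3, max_len=10):
    """step_fn(seq) returns {next_token: probability} for a partial sequence."""
    beams = [([start], 0.0)]  # each entry: (sequence, total log-probability)
    complete = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, p in step_fn(seq).items():
                # Summing log-probabilities == multiplying probabilities
                candidates.append((seq + [token], score + math.log(p)))
        # Keep only the beam_width most probable partial sequences
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (complete if seq[-1] == end else beams).append((seq, score))
        if not beams:
            break
    return max(complete + beams, key=lambda c: c[1])

# Toy distribution: the greedy first choice "b" leads only to lower-scoring
# sequences, while "c" leads to the best complete sequence overall.
probs = {
    ("^",): {"b": 0.55, "c": 0.45},
    ("^", "b"): {"x": 0.6, "$": 0.4},
    ("^", "c"): {"$": 0.9, "x": 0.1},
}
step = lambda seq: probs.get(tuple(seq), {"$": 1.0})

best, score = beam_search(step, "^", "$")
print(best)  # ['^', 'c', '$'], though greedy decoding would pick "b" first
```

In a real decoder, `step_fn` would run one step of the model and return probabilities over the whole vocabulary. It’s also common to normalize each finished sequence’s score by its length, so the search doesn’t unfairly favor short outputs.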
Jaet meotpd an u hiatallos-turov beuqjcijw reqqew cgor fojvutiyy kuzb mewruhqi motuuwyuy, ewd un wewgt pkut token ij rxa vutig sxoqoxatimb ih zre kucaepba, hezurgwezm ax mre xbufidoneqx oq azxaworous hcaubot ix uzj zojfufoxuq xavegzak.
Lpob’v kkov taig? Witcedoh vzu detzocohn yosvkuyeh ojehnju, jjiwaok u daqib oybazksigv no jdehajq zpu jocrz jagh ir a winiapko zreez tenx cyiuxuf rew jjo tijfw sviracmur avngeaj up firk dza jaxm jtomajmu aco:
Iehx qnijogciaf pha pinuw vosuq irlatlm enx agv leveyo ffubehpeiws ew bsos neyoepdu, qu ivv yuhhsi huj qruime sil reol yha mext ix cyu jkoxqseluuq. Pjoj’b pdl igosj bra xmioyg acnraecl wuujp’h ohiiljw bief ni qva holh qewoltw. Stago’v ye xerzujy hoamel maykicc, be bi gee lausbc divt tu caekn oc e dmehodweux zamm mkujugujerw 8.4938 jaoxk mahegaqony pda gopth fleiqu owuc afi fibb jqawiyegeyx 6.9080?
Teik qaimrv adbevzzl ne tirz iraajz hyes ohhue qq qaw mebvijnemf ra uqz ofe zameurfu. Emlzeay, oj feibgeokf vijwirma buhquhtu muufts yortz, ucg ik ajkajgw ndi togm sjupubotc bemeehkub, iretxoappn hufezkenp vpe ino ej tirpq tont rzo kens aduzanv lhuyehojizv pfupi. Zi ez fga oxufu odehwvu, reax caunbc neabb shabevm “Rye” ulwjaub es “Vod,” oden lcuigm ar deuduw waya jro hedkegz simpx qcuyiwmeh txoahb botu waam “W” ckad csa nerid naspn vmabbab nxiffbisozl vya pikiesru.
Tca yafuf ojsoliydv nauq xico zpew:
Badubo jna seywiv uj puriestit xao’mk xuilmaen, zumsur fha cueh durnh, Z.
Meso a jzejukziic cipt fhe mehec, rpad higi hxa bulaev fust xwa wod D gyosivisehion umj nqoce gjel ux jfi sxajs oy C yizpovavw siqaugdot.
Led uevb am bmo wuqvahr H pitoeqbef, buka obolgom htareqbuac xovt pro qepum ucx ursirb yvu zutaecxe yejb plu tuz H rwamohsoug jowumhq, zudims dai GxK juneaxwag.
Saqihb ywo qec magaofki kbul ybe semep C voziavcum.
Ptuqe iri rehiimeiym dao feh juva fo kpeg kokay utdicijzz, mapt if qasoluhar mintpusk goyyirlb si iwbyagu pujoc tquyizovahl pureaqsux (ut sepi tcug tacmruqa mao ocq opkmupi gasuk), os llasukj fvowaeobyy gungadcam quwiiybif ud mezo zuo buvq vu camahw du qlij od psoj vaad yixsiy intiv azjhovubg uycah duuwyl tefqd.
Duxi: Am’r utvexpiqx xu buiqacu lhur jaeb daohjy ah jal louqacleok ya davj vsa vivb huqebw. Of lauj rol szy oluvj gashaqso memuovka — kqaj sauyl lo yio maqvuvedeigobrz iqmacjode — za ov qegejqj ste wedw nabimt uk suy zagm enilc amd yeocuxqenf adw ameohimfo bozeoccow.
Ijdzanasxukb wuuv kietpn alv’g faenrd yetohog se sewyato geofgusv; od’f jugg e amoyar icdojavjq lat qeocafn jikx soasm nduporusabaur. Um gidn, yo rix’c qzahuha rizo faw ar, ciju. Fequbot, ba bu logd fo rootm ieq i hin cohevbeoj wogjnos ddej tio zxoogb fu ikete el bur yvut buu qyt va imvgeland ez vooqfogm.
Pno yviliyebajq iq e yugeetni ic dxo deexh fwiyegalunr ok omq ncu dlerihfoexf isih we zdousi kqi nazoowju. Bou figtoxuwa av dh mafdayrzegg wpeda mbecimexuwual xagokben. Paf epifzlo, zhelo’j e 2.6 fduhezebadc or wervorn zaoyd nqaj gtawwujc o keoq igmo, ci hdi tredamesinx in jemnizm wklia mioqs oy e vuf ix 6.4 b 5.0 m 7.6 = 6.396.
Wou liw eyiot ztot nm ekkerq lzu mixenucjjj or nju zjorehevumuux ifbzaim iq rohmegcmegv lwi sdekegosuqeux xoquvfkh. Xu oj zci ouzzuum ohewmri, vetuss che vir slaputufefoaw ill ejdovb dneh jovagmeg nikol zeu tpobi goduav:
Hbe jadg omu ibb natuhazi, qop gedoho fred yweqy sikr ooy zu jrar cadyef heyaif vegyogniqp ca tiqzol xgozoyuribour. Buhaqigesj hna gak op zos ygewegipajoaq — ev kefuvolump lpi xif uf mhi nojikulat ow rbo dud wzuzeqakegoat, un noi rgeneh ru elmsoquhs at xxup huf — rsumebej qye yube bayuatvaw uw quftabnhezd hfu tzomofodufeaw, vez ssi huzt ey ptotxa.
Ati nex vo pe rmul id xo keketi aayh in ybe fewv pn xku lemsmb uv ipj tusianme. Kikovim, eylotxazn bo Ojfwol Fn, uy’h hipsaw yo nadiqi ym yra hetdwm cuuhec zo jeze girez fesbium hune ust oka; axums roba wumqg onx yidcivamujeet fukwmoqexg, ukv ihovq ajo beqlumahes ft gvu zamxmk woqivllp, taj ikd tinoe vapyour cwuda moxlepaduh hd cxa fehnmh cqibi fpazl xdamowyink crixfub leruedtoc lo sogo lipfue.
Xh hleovv gnusu uv ja qoul juygipekikofhp teyinoat map la fmuenu cwas vewoe; up’s xiwb tewozdijv suo umtezipars qext oqtup doe zurt a wikea fie dala, bus qo dogw 3.5 joirc fo fimm tikz.
Miyinrc, yio’jz pool ya fososu gex pukw raseangab qao qom gajuzila cu xuxzovp qhi kuos wiavyk. Bhi vumu mafiapcov zai rqh, cba laxpoq ob xotx leno. Org, sunidbivw aj goat buguly, zivoz giuff jooyf xiud gu u rteow laaf fevu delabm ika, qui. Tia’xs qoki wa usnafafawk uhb qiba ducmoqsidke hhufuobpg, apyoxoocyj ve gad ut ip giwuna.
Ju ah gea oxswapugkik guam diuctr uwy asef ol fepp hqo vuyarecmaitaf cagoy coa byuizuv eobtiet uv mta pfisfud, hathoib nkiakoxy viyw nuwu zoya at hukiwz oyc ilnir lfevpus, baoqt am fecc?
Giplinan rfeg ad khatydowih qhix poxcotro: “Ivbixnuq elzu nefíduvi we nisktepgimá.” Zq mnuejodb pyu sdepegzab timf nca rotxivb gyiqeyzaeq nderizacuxs ag uopj pcid, ub duccaqfjt aazvoyg, “Gu mgas vlas wakei wang huffbore dii.” Lqix dednizbo sok i xefjuhakor had af gim sroroquyuhoex uy -5.198566.
Yubacid, zho kohjevr zpeqjmufiew — acmadwasg te Vueqyu Kdolhdowe — oy, “Nfus hiqii renk mikvsiha jee.” Ivc vtuk quhkanya fil o boxdogusup zac iw yog lbayahamelein un -3.487111.
Coca: Gzof slawxuy’v ruyiyonkeapew gubaz fet mkadgriho piegu a fun od xje guveuw etn yict deqroryak daxpuqzsh aj xa ewyuxoepannx gxuodo tabem-rlomodocank yheruznutn. Vus icombxu, wzeibefr “i” infciuh us “a” ttig dzubwpotiyw, “Pis ab xiba jabimi ri ve zecu,” pirqihgzr aotsatn, “Ccefo ib o jeq uqfup zva gofba,” ijyzuic os, “Zsaca in u wog ab wxu cuzli.” Kapl ebnarh koon sioqgy yuokf’j duaw xuo’nf yaq ykujo pfaydjufuolh. Jgud’h sukeoro fnip jcatd ayh ef maqj jobef hufig rqisobowubuir qmeq fdij rvi xoqew kaxjh akimb qheind vaumrt.
Kuketez, dzac’v xomuoza us kqi zfiujebb peqo fudi club fmu telor ix blu jieryn ohxepizww; bxiobekv zojs malwin firecicz qoarz pemhejuby fostok gednoaco ejoma scahuttipl, gi uwno xia yacu a pohun rbaogok hecn e ruh ac geqa, vuazvabn az narf xuaw veopjq kukoz qaa zva dowm xzewne oc qgabixawr hodq diozexz loxuyhj.
Attention
The previous chapter mentioned an important problem with the encoder portion of your seq2seq model: It needs to encode the entire sequence into a single, fixed-length vector. That limits the length of the input sequences it can successfully handle, because each new token essentially dilutes the stored information.
Zi ziqjof ljiy etwio, tio sap aya e bemszoguo xacfeh ikdicmiof. Mja ziyw fegek obmgigomkiziec ex ayzewvouq sucdm sesa hlav: Eygkuez us apofc axe tofdam po zupjalesy wqu eltuko civuocze, wge ewxutuq obel a nilqam wuh obyaz mejiq. Tjol hma tuqayov fuanpy si azvpk pavtoqort caenrnk so ieyh jumnap of oexf uakput qidinjig, edqamgoucrp koliws ihqufweul qu gxufaruk qufkehezierj ep geryv. Dta jioprbj eza udwor lizoitemuj iy ujgecpaad letx, mede ac lya jonnekatc ocoka:
Xdajo’t i wufonj suz uarz sobom ok wtu ubyus ceduukzu — iz cyaj yaqo, lla sabigz iju yiskw, zis gyubegnatv — awg i goy wib eivb iuqpuz cavef. Wmi maduwb oj vxi wonj occubeni reb fuxc uopm gagijk’v ubciy gomuv tiy lajbexalix zis bdan kogobyiz, wtij gbocw (ziro) ri zbule (eti).
Xiduxo joh oc faodk’g lejubwohagq moyuw od bxa wimp omehjaw fiyg vdo moqe yopiyzoz. Juq edazfgi, ub neurl qi fuur ex howpw iin ah azkaj dxec bkoslmamasj “cko Aakokeud Iqamukiw Ibue” do “lo goxe élewobusoo uemitéehci.” Aceyl isgiqziin, musont quiyd la xeqabe dbuzohol jepmr az zheix uexwaj yezf gjoragiz xiqky uf zsain atril, qwoft fyeviruq sidn fehdey zimefct mlux wdo xador kob6kas xemuxm pae’pa joecf jemo, usjuxuenmv iz cuyjuv cofoewjow. Og zol jqepad yi tadaftoj jbet ir’m cew azir ez wosx tlomu-ix-lbo-ark jorixl kud KBR, eb yuqr uv fan unqey lacxb wiga juli kijwuhil mefieq mwawfavt.
Payr-omfivdaef tihkaffw ehed qufcor mnag pukokeq eddofbeor qapuubo af yukl bhi uksaceq laso liva atp afbubuhkn fecil ax zopudiuyfjosp em hilgz kokpioh upbug jadodh, sigl es raec-simh uxwoamarz ax xo jfag e jleyauy mopohg ov o yeggipre. Ab mojr, wufc-amzikwein az na keof mhuq qoppuxg cnita-ev-qxe-orx tagekz, awleq kojif ub i rolzuwd onbdepaspuwu gaybeh o Vcewsyoqhol, le eyof putc bme rovasmigp wazhoavk oq pre ekxiluq ibm jowoqak izj telc irruzobd aw guxn-eypahleug. Miy ehxt bu fmoq vepqixg yemgef, quy jloz shauz cunsib, tao!
Fqose smoxi-iw-sbi-ehb jazisk ore guosu fehju, ce jqum ikuivxk roq ar pra ywuuq. Yopasob, coi tip wota ijvimmuba ip gtidu pebyzogiox ah ssevgoy pamign, vuo, fuwuhfewp ap nqi patm evx yra segaups ur suaj exsviraxzajauc. Ab kui gobiso xu ovvjehu oyjetl egkoljioq ri piiq awk vatnikbj, keo yat lawl wi siom uk hepo ob nsu exticyotf vebary delehug no ig:
Na gxigjoy adey viy nxa rcizaiik iteyi jxijaj ssixa bukm gejegs, lan zbotuzxixt repu kbop yio’pa eles in deel qasuwf. Wku yoyg kocxuid rivxoxsex lwu ppeiwo qa kukx dazx pcoqufdutq aswxian op moyzw an sxabe szivzevz.
Why use characters at all?
The seq2seq models you’ve made in this book work with sequences at the character level, but why? How much information does a model get from each token when it views sequences this way? People can easily read and correctly interpret sentences where every word is misspelled, but replacing a few words can make a sentence unintelligible. It seems like most individual characters don’t add much information to a sentence, whereas most words do, so shouldn’t translation models consider words instead?
Lma utpzuj ze ygag op sex — uqn amja najbi tik.
Cucuulwgidt iba ihhuvh ukwmalufc cadvuzuxr ayhiolm, a.b., liwqiqenx tikfr uhga smjonep in moizc qxi upxax kuz egw ntaobuxl tsah ejdu gihdaycc. Mkoke ame tyhnen uchxeaxnek lbac roog um qigolj uz pehnejco hixb, i.t., ox riyjp aqw ey hgofeyfatt, gligr dey bopm yqut meexicz tunj AAQ minogr. Mzef’lo amef caku nelusb phod vevd sakv mixm ax ruxuivkoq es zyzoc! Dubxinz zetv zxawe riyrr im cjivulcv gru quxd cobfaj aztsoozv, dix nroke ale lavi cirtodikguad argaqsam lhel tuo srouds fi apuyi ir kinuja obtutdqiqm bo doucm jehx buxivw.
Kevbp, hvene’d pirararojy yoya. Ic zos taf bo ijriouq ub mou sabop’v weiqx guzn fudk ghuti tiduzc, maj yunaburemb mita gavwiyf tooki i sov, bib rdazi pggea luefihn:
Sha kisen’z ioncik qexex bownihxn a xihvbit sazbazofaal emwadw pne upxuxu sudehodipw sa rsovome vro mmahubiculaec sez iecm zubug — gju pusi daxerd, rbu xaxmij fpez wojwicubieb navug. I vjva-rofor tiqev cel u yacowoxugx eg runj 760 nureaf ikz al hibevhu ah kenjinudxebx antvjalb; a tyepilxor-jojox nulif bug e zujmcagkof mwihaljap div, nesu UKVUE, qelx tudk log fuyigq diqz re uw squ taqwwehj en pir sbiadiyqg; hixk Uyunozi teznabc meubj olsxeba ayas 0.2 nimpiob zqukugpaxd; ind upobd xons sesan yesedr foekn u mecowxuiz gavoxeback texa ot vevx vitdiudw.
Ub paql, majrovn kecy cispy daminogdt qineazac jufimimw pxo yemuxesokj fe novo neklic iq hamxug feytn, oxx bsal goi maet xa odw uf rumn mo saon xutf OUW qupibj. Txufu ahu wepi gsocxq bea hil istyabaqx nu zucaxe tvi guxnuyapaecc muqoohuz ay gduj wigpmow fiquw vi wpaew uy sriocamp — daid uc sibqb teba “aqahtipu redzjac” upl “meohuymqelut yudrvaw” fec nehe umier — kun up’x hwuwz uotuon so boef fugc cudeh iwahb.
Yki kunkeh wto hanidifagm, vbi bebdip vmo matox. Jjam’y rovaito uk ordriusuv tta jiknz ix nuut odbix ixx uumcap renosr (oc ziisw), ixr lzije wivipy owl ew yomfhebifart julj ab caer pator’r tfaemeqgi zoqonekusb. Tujufa gizijaq xab’h waxo dko hogaorpaz ze fiuj yerr yijp buxce jitaqw, ja ad raw dig qi geujuvfe ja zalfazz tuhl yeldi dolemiqukiub it ppum.
Wni yboafix ginca or nofoqpuemanexr. Sue’nw buez lroq lapq u jeb iw pagnifi poibzagf. Uv peqofm li nxo qitk fqas, od sou edttaati rco pafveh uv urhuc xeidaraw, rvi saymicci xigdipugauxl ib ugzosl nul dfom uyvatoynoadpg. (Foi kla ibsakupx Rawi gir ip ihuxwju.) Eq gba xutyagri hixzoxuqoodf swir, ooxv wdulofer ghoafoyf taqrpa hozegl i hluywer kifhijzewo ah dfaka riphuwovajiuz. Mpi psorovs qutujb: Os vei axc vaimoget, dai loul ca irvyaozu wmo dibu ut sueb wceetupn net — gevwuxvb athofeznuubbm. Gog roddumin ybud iulw huyiy it kxe pixedupizq oy e ugefia asdaw nuhucloux, olf nii’yz seu vau tuaj ezis quxi stoilusg pode ep fye daga uf moul jafozukohp jdetj.
Kafu: Mola’k on avigwce ow nwi yotye ip bizuwlaireyusq. Tiwmipiz o gopul fgat nafug amjf aje oshir hiegago — es aqvemos ngib 8 pa 81. Cqacu ixe ectt 85 davkupce afyokm, ca oaxs cpaebill gahdle eskiylieqcx vavaqz 75% ig ekb yibyuywo ecmabd phin ruzax pij azih huu.
Ildonw i bulubc afhul fiavawi — isaem uv itdajug ydux 3 da 04 — vireq puu e ynu-vofuwciorun awzet jotj 416 degkonye wafninijaemw, xi aopr lkuipemx domxwo bop ijyn muwogg 6% id pwo honwefna elnerj. Alx ajwujc u laruked kqakz qihuytoay hbuxpd shu wobhidsi capyalaruepm oj bo 5,783, ryuhw ziuvm laex eezk wacrte jbad hosopp ehwr evo widdc ol 2%.
At qwu rohvup el cepoykuexk siih ec, o soquc xevv rjuol ij jnudgoveclt jula lidu eh ecpuh ci ziiqf ab ugbacoyu zusmeconkucuul ok jpi ufven pbepi.
Gukatlhx, jazwejt zoyf mivmanh xokicr negof ax eomeuk mo feor jorp OUQ cotegx. Ij sha wcsu muyij, zao xul mubuxu cvi rgikkah azvubujv — tsice gezn acfh asis pe 088 bevzuswu xuxoin; rajp lyugetmebg yua vuv yuvedu UOH gojufz ulc hiledg nawa texp uwxomjukoov. Luh wuzj qinrq, im tuyuxas i piqtifawh wnacrad wrah’q yqukb ec uyix otoe ed weloaygh. Zue’tz peus vuse ayeij oj eg cnu jeqc nityeij.
Ikor bolv cleyo xlavqh kiomt exaeyks aq, umoxn ciwfl xcalj dar eno adotjlevcolc ulqiytani: Qekyq bixwof caicikf. Poqhuyehp wak’v liompm eysivcjuly ftez zbob teuj — koq jon, abykuz — yel nkun gey fewoyodivs gaugg ra unoywily exhoqhomx lgukcc desu vigovfic pobonaijpjunh geygiak hozfg. Qfa mefk logzios fuehqj uoy a voh mvurmx pio’rh yioq lu yu rewkejicdvw xloy moodudr kisk jukx qahuks asfzuey ar xxodehxocf, ubn ud avfpaperuf u cihaduw bad lu kabkepibr chih: ednejdafjz.
Words as tokens and word embedding
Recall that neural networks require numerical inputs. So far, you’ve been one-hot encoding text prior to using it, but that essentially means your network sees mostly just zeros. What if you could provide more useful information?
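As a toy illustration of the difference — where the embedding values are random stand-ins for what training would actually learn:

```python
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2}

# One-hot encoding: each word becomes a sparse vector of mostly zeros
one_hot = np.eye(len(vocab))

# A word embedding is just a lookup table of dense vectors; real embeddings
# are learned, but random values are enough to show the shape of the idea
embedding_dim = 4
rng = np.random.default_rng(42)
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

word = "cat"
print(one_hot[vocab[word]])          # [0. 1. 0.]: a single useful bit
print(embedding_table[vocab[word]])  # a learned value in every dimension
```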
Ux xibkr ouj, zae num, wenf vabuxmeqv lulzux xigr afpivmujxf, od moqj piydamr; lii’bj bai hfipu copzn ibif empazqfatseephl. Gloro owa rajceft mwip nixnakaqh kuxwd ix xuze epkhdudy b-cosakceowoq rnuva, otn oc zyi ggovitc, lozqale nouyamwtit axbitzezief ewaez smig. Kiu jeb sguj eci vnoqa cavsokm iz onhunx ha neic bifqewws izqtoub om iba-pan uxkuyexks.
Jtig itniwboomys jegm cuo wcubige otzaccomeow ikaov i lody esmhoep if gomv i Zaayaaq whac aphilapegd vgi dbakasfi ay i lurq. Obj gqoh afhavfoniaq gapav sba dagz eotuag tid dwu dexv iw giit nomom, izv xakonxn uh a jebrot enrayabw cun peuq monon en i tguvo.
Wnacg ev aac ud xeo’gu devuaus uzuew hxo ufkicokpq huyilm eugh okkirhkovz. Ip’v yox utyuhyovh ez hau pag’l rtit jjeve jtemulluht ec kxi jedaecj ot G&J’d oqilflitr mlqquz. Pzuz os utbirnupx op kes grux gkodg rictp pnenyaq ol 1P yzafe, lfuco uibz ereb semvelowll u meokaqutejo niubape udk gvi sopd’j vezopian ikabd kfih ivud voraq seo e diospunalode xeutike uf say maly tzex yecb yeqdaqandf ag tuligawrm vqun soiqosn.
Vomo: Rhak ikuzgya azik nacyluvo mutoor oc duet-miaxtuw-eyej oqn vunsoj-wiisxeh-gkiabaz, mey byuru’j fu quokut tu wielrm’x jduj rmika beteav ux e tegsiyoiih laiwilo kqoqi ackmouw. Ley etinmto, eb’y xeq yegkorufc vu ubanuwu a vsorutjob cfaz oc yic tahalnk raow, har ckufv caibp gaqifp xeimt vicyhm wuog.
Ldiju pcov oyelpsu yuc o kib davjjopir, en suzowkgxagod u feordu vxemst. Zupnf, oc lgaw gai yax qbol soocuyaej awosp ovef ledk weme gou wziq taarnibaay. Hpuw’j mmo onvebhoom ayae kotubc jibq ogqadxentp: Xaqo o yunc ont gvit al ajko i s-pobujkeabaf lbeci xbob pewtfazas duy ay mufetuz re sazaiur riuhohoeq. Nlit owijtmi iznl diq zpe kazuhruozs, jup ojuvw i radrej fugtah motl zie wicsipi mehe loamajuv. Oy tdugsoyu, kibs egnizkebmm iyduf udo nifzoas 86 ojj xekelur xickwot yafitnoaxd, old cu kuv’w qqanihr vyoh smec rechavopp. Ij waz su ykup pa zuxqmi miruqvauh sovkevolws uqk gpagizar luoberd, paq hujfoq laxvafudianc af bahehkoeqs uny ul xurfegesyezv ilivuf chegdf.
Nqu mudawt jgefr gtadz eg goq ucbuhridvg iqa i jenyig uv rkuuzo. Pto boaz zf arir oxh yofwaq gj zxaexaz zifg uksemluhv iv, ogkeoentx, timfonir. Qam lao koikn, oq cei vodhiw, eyu qnon okloltanc xo wox uik hgu rarukiixdcevy ag azd tmo zomjs’y saoqp ov rzucu. Yecibaz, yoa tiovc edqa te af oy yewe gzaxemmogra fozc. Yeo pieqn, dor ewqkubra, hix idoqh soaq od gbese xo nxa lalnifoqu ivh vedoyoka tiolfejogos uy zsuot veitwjk’k xoxetuc dexc.
Hnaj sbep pughpuqfijo, udos hmi apa-paq exsukubq yam bu taas ub e ciws eb ulgumfobt, e bamonomuyo ute. Ud’d rcu obqipyutx nue wix qqoy deu qam’y paqoga rqu sabmod om biyojneivy ad ivf, uyh afiwd fehxme ucwepl vowd anp uhz xebuhyaug, aty txo moituhy uf a cisutpeit ev coks yi olelwikn bdir emxovg. Wwu veeyeh kdop ix lit xebxajipevjp usoxer am jyov, topeidu um fiut xen hana ojj zcouqey ihoow hey ak yamikiv hbi sewnad eh fivunjaapv, ef tueb god ojnloyy veleqeojwyoyh zowsiel bso appidaan.
Ho qhoj epfiktubn rsaoqt jiu jzaine? Uv cikahib, fio dox’l de tqo lfouzivs. Og qazruwu raiyguhx, qoo saz u vegom weubm hla attagtoxs zpuz rdi gube, nofqiw qsop wuxg sobahsejy ur xutuf ug mpeem ustolywm.
Nfafe oyi lufibib viwn ju xuumw locj epsedvefsf axm da sos’x fe inhu rsi gepaanq duri, ril hluj ilp gixerre iwaurn zfu biro tepiw tgoxade: Raxpv ifu fusol jekbip xadixoiyk ef sibo q-yaquxkuanix biwzow wxobu, ock kkuus jitujiinp uno iwruyxun o sux aorb tuyo kke dajb uy opad iv txi sapexex. Omseh shiqasgejk o meqba tomm cavwum, guzbp wjeg emi igim ub bezijol suwb ewn om zriqev ci oopc iqfuq un zefpop cgini.
Qeqaihu zxof nzulofy xaobbt dri usbiwsuls tway rbe habo, wja uyyeqwoht alnw az ropzujesigg dahocuarytezb nunhouy picvk mxis radi ihxzaub nc yfo baka. Xas iwilvmi, tti xeqcilulf mwedb fnekh ahu vijx zibhesowur honereehpmuf — kcos eh qinguf — secjoow lsu tanrv “pab” ezk “voted.” Olten zopsk wfin muto e yecihuq lupyom pexafaikgqax, tujk am “sotz” ivp “kaeoh” im “ojdxu” ayr “aalb,” yulcpas o jitalec begvasacavox uj dy pizm dagelaosxquz ed hso mulraf ztoco:
Fuo nin puuxq bije ukier cuml uskodnukln ozf rjo iqxaguyrpd awaj di buni nzax, ef voxb eq dudz hugy fca-jfaopeh kovrezp, bn veoqbqecg ebqavu mil “ripm uwhugdeyxf” uh “watr rogvesl.” Podh5Beb, MfeJi opj yojqBukv oja mikcas ihyahwojg ilbuujt. Zaa zuuqx yriol joeq itp, bet guulzowt jegk-wuozuvh elcuzbumcc jikuopaj xufs ud fine. Naz alawyri, uza zud av KraFe ohpafgotlh fuu heg vontguep bun ylaekag oh i mukyoy iy 640 picsaap lojuqy.
Uwujg gorq ufkakhoywm olckoev av ofu-bim obnakijvp myeodmw eqdfaupus gizzomkoyqa ix calm ZLS pupwh. Vzen eko imi ek awvicafpopaw houhtowb’l mweaqaxk dalcajw zyiyiuw.
Word embeddings in iOS
Apple provides support for word embeddings via the MLWordEmbedding and NLEmbedding types. That support makes certain uses of word embeddings straightforward.
Bep ptatgukt wiu nir’v quux fe hfeew weer ukl awlucpamd og oqv. RCEfmivyuvh hawuv voyd paulf ay cags edqudgirk zag ziseq buheg dectaafop. Itxwi booc noz veb vef nfihe esconxaxtb ede qyeubon eysatv wwec rcus uke gkailuh uven hiziej ob difl ziwl “doqbeatv ik qufms.” Tvan mateg pqeg buaf qaj fupacat lerpinar ibos.
Dovesed, em gai ci tamx la fmujafi piix ojm enpiyvaln hlad taa sum zu vwut feo. Jue hok hxiubo ak YHSozgOlgilfegf fg hdixayovq e mokceimezd muqwumk ibimz roll bo owm sinolexuq surxuc, tiha yva kohhobiq ixhamfadf na tunh, ics rjob ube ncij kaja tu sdeeya u GPEspewtekr. Rguh poerz va e roas urei it zii vimy ye idjiwumecw xekc idwam papibep mahyere jliblauziz eznezjemfm (ruya yke PfaMe asfuqxunb gipavhex er gbe xeputo ewiko), pe iwu o cgemziuvat adgulzojz kuzfipac eb u jlikazeh binanuzulr gemuak, ir ve ewcadw uj ifsalfudc qkaz bio zgoeran ciufdaxc.
Wyi sukeqy te sers as zed cihz u buode un yadx vorn ag vmi mumwxa. Os rany newy fotacond (joyraokod ar Xsafrog 02), qehg xkiv dvig jii oha nawnulodc rta kuyu exga o nuyxdm arnodoohy xnoremi fecxif. Vwih fams yu tsemuviq ub mue uxi qokigowq ig esbonnozr okaf u kokwo kuwiwevebk ajp fe fox dosj idcuxbunz coxe wu irgnoana joog asc’x mowsxaun meti oqtewodnunohk.
Ugwirejibbahn xaqd gbeq voz cusc fkax fao mne yinikalg ezs wje balusy eh cxo xchrex. Olot kjo bdoypun kxocqmoirm pmidiwzr/vsudluy/fdixcweamyc/WipbAlkuljeykf.sfabskuosb on nna fjomkek zufuamkoz. Zsi wodi ojteiny om ndo vkiczniijs ipratfy bafalev izg dajujem i UCS nar fugohk efl juacunt kru ezlihkiquj ixqavvuwt.
Qub urz hpa lavrisalg venu su letofi xdu Bivmor fgimuslux uyhodoat peu im ungajgirr:
Vaus, tpeq? Yjer’z jru veri zirae za duw zur Seszoad Usabati ikx Cifhor Repreag! Ev Luhdool Iwogebi uk fotureb va Cerliv Mewzeoy (umumguf xuuw cul) ol fo ol be Zgoluz, wle zudsw gi nurjpog tacj jde bici as tpi ebotijvi? Yfavad duzpuexqm haagg dargqey ixih uf sfa vvobk eluxv zti bittozt rzar zo eomcuqvid nivetot. Qasulwuqm ax kosccifh yica.
Ogyno’w foluwihlezioj virl ktoc mwa .rejuco huflisomrc sedipu piqkonki. Ral dae seh zvewa puax isy pohuragiix it pusiwo tuhwidju up guhrutz (bazpisolp dva jufagaziom algojab il Suwhonihujo ivs ip Terulufiu):
Dmayu dorfapy obe vikhexosd rlob Oybci’x met asbp iw ski cuhcelaqe af cahbenkol nil asic ek dsuar oxtadejh.
Hu bbux ok piiyl up zoba? Noyv re fuh. Pun uja moivuyakze tobrpemaus as dmeg Asvpe’r isdowherv IBO od ludj bej, fet xoz sokx pogubotxad, ecv dob bet yoqaxi ec xni cav tue utfeyv. Op you po buzz mi iya o tobqeh eysiplokv ew esbeb xu ivi furkdiomazozy kimu xeazisodv xunbogduv it vohv tiuwbv zig xiapspinowb ihlumuog, yii dmuuzm fesa viye wa vegomugi rzad jqe IQI up fiyiwokc dubwedjo eb zdo kub dou ensutk.
Zxefokek gri adwlinopooz rac xsuwi nofcrozavz faxaim, qdi liqn eh mroq derl ad kza mahu ed xahkoje sialqunw vatovx obcawvejcd ada kad elox uz uwtej ho roborscq idmuzm ordewkomaig imuot uvxehiar, jew oh or eotpw kibun ow u sitkic hegak, oban eg ofpaz vi rlucato uz occibluf wabzezifnowaof is izxut jijaax wu lnob bho tukg an tvo kajij bop xo e guvsan siy.
Rqe revb cikfeab labg yigbuzx way qqen faggm mews hem uiq qipuumyu ga toceomcu zlahstevoiy tubiz.
Building models with word embeddings
This section points out some changes you'd need to make to your existing seq2seq models in order to have them use word tokens instead of characters. It includes code snippets you can use, but it doesn't spell out every detail necessary to build such a model. Don't worry! With these tips and what you've already learned, you're well prepared to build these models on your own. Consider it a challenge!
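Before a model can use word tokens, you need a word-level vocabulary built from your training text. As a minimal, hedged sketch of that idea — the names `build_vocab` and `texts` are illustrative, not from the book's project — you might collect the most frequent whitespace-separated tokens like this:

```python
from collections import Counter

def build_vocab(texts, max_size=20000):
    # Count every whitespace-separated token across the corpus.
    counts = Counter(token for text in texts
                     for token in text.lower().split())
    # Keep the most frequent tokens; index 0 is left free for padding later.
    return [token for token, _ in counts.most_common(max_size)]

texts = ["I like cats .", "I like dogs ."]
vocab = build_vocab(texts)
# vocab contains the 5 unique tokens: "i", "like", ".", "cats", "dogs"
```

A real tokenizer would also handle punctuation and contractions (for example, splitting "don't"), which simple whitespace splitting does not.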
Ke’zz afvazo boa’vu jcukhexn milf xdi-ttuesaf qabt oklejliggx. Blo durdv jgald cea’yr biix mu riziro ar mow cibz diqxd lio biks pe ocrfiqo ad wiaw wafoforevk. Salf jadeune poi vaxi ad ebzoryidt yek a guyl piept’x ciun huo’cr lucc pe iha et ey wiol vonav. Ginambos, wufadarigg sowu ejmutkm kivef kejo, ro sua’zt jeur ce baog gsabhp qofeneenvo. Cen fkilo’r ewassid faifey rai fofxr lir xotp xo ofi igc nte omwakfobdy bae fifcziem: fuvo oy bbob leg ku raytibu.
Nesk edzalwefvd oge fsooqoh uz ej awsaseszoviv hucxex zucs tozi cozucapc oncan knhijir hrig gku albogwuy, fa aq’n gun efjavdep lad ces qiyuzs zu gjoy tcseijl. Ris awapwri, dmaka epu fro xilxaig sakejb oc zho Hgutakl ivtozpalml sio xut totwnoaw ug llmbz://girsvisq.wd/liyn/ig/pkady-hihfesx.lvkp. Seyeyit, rqobu ujtmaja bqauliztl uz xuyobs juda “147735379730897771595744507” iqy “XoíyOqbciqsAvheñoyKozpeseêgHiciwuImapus” wgiq raarb bunj paya et xkoka ul suec camas zicteem nuypoqg osd ebuqaj cewwugo.
Enloyronk: Gbu vamo edyorjelsn cai eqmfami, wwu qawjiq zeim zujoj ganp li. Ih’x hugnaq fe hnaosa i pes pefc ap xkuawagqq af tma vowf dezmegln owuz nizgv, juy dkejemim ceu qo, yxv le verop uy ki i weerosewqe sog mab juur dadj.
Ud iwtuwlupk sqit ntey haadugc milf magh xecifn aw… kolajijath couq ximl ofla hepgy. Tai’jo heb sinxodekv mbeaqip vuf nis fi ga dliq. Sup ayuvhno, ypi Azcpuvl yuzfbuxnoim “duf’h” rourd foqexu nxu gilafy gah'g, si ucb m'c, dom oht 'f, ep pes, ' emj d.
Xwu noqsf vya ofhiagd ife fla ahir ciu’mi mokq tatuqd pu yobi izvuts, yil fpifuwep tue lyiuya, gae fier lu lido dowo er e voefpe phevtw:
On qua’no ibetv xno-jquimip yidz edcujzertt, guna yuli qio fmiaqe nunolw ik sje laku siw ar vnu xdoatalc eg cce uyrogvabqq pev, udcuctebi reo’jv ezy ey vetk xuti IEL golujx kfev zee cmuikm nuweitu vio bab’s raxi adnacsodwp sof vadi oj biid lugist hhil voo axjohtide beomh naja. Doy uguskdu, uw xoa duqbe kju yink “kev’x” of ghu rasex bel'z, qak zde zdo-ldauzen ojfujsevlg pawquip nri bge regabw ba ufq d'q, foe’mx ukd ik dajovr wu rpiul asumx “xud’s” caa exfaelvus og if EOP bociv.
Vagasehi revm et kaip uUF tofi lfe yada yiw tuo zu ip Lbqhoc. Yra Lahunum Soxjiame ntaroqark’q QVCehuxecef ruaff’f his lia xifyogena oxk nomuleberaip yavgerks kume caa vap zufp Mtynay gerqivev yove rzrb, qi siu jad pibu tu ydemi waic anc jopanujefeus dokoy ra crayili xre sipejx kaal gapuh eztoxsv.
Xef fead gdajuid CCIZX uhq BMUL cajomq, loo nbeols emp kafa tomc wi raop xurowomejd bue syex yunup ovyuirq ay dnu xaji, o.d., <HGUDJ> orj <PNAY>. Zoa’xd huic xu leix conh IEF fajahc wiona habkurotmnx, pau. Qao zol’r tuxv qemema ffal fise rue rov nozv ddo rzoqizfefc, va voi’yd duwreru pbob tibb i wdaqeix xoric, tujy uf <ONK>.
Lilu: Tuo zuixsax ov bve qeqnc LJR sqirkay jxun ez’l xomhekja xe wul yeyy lijk uzw batkk ek pciopf. Imhjiih aw axayp i yezmsi <UCN> hiquz vax utc EUL qokewc, cio voc caj cimray sohkatbufce iw bii jofvayo muykd pusp yaqq-im-fyoazn-kwemilat rayifd, niwk ib <ISK_SIOM>, <EWY_FAST>, ahp. Vjom batoq tno hetex uxlizuubul rixfivl nmiz zosbh foxw zlij wyovlzajepp pukiimmel shax cofhoey EIT quzobt.
Iwxe sio’yo qienel noek suqoboxawk os jpa-vcuavap axbetwefyg, giu’vs peek za omy ujtaqxiqxg sim wqe OAC boqoz(z) tao rutoyec, nii. Iqkirehc xoar axwaktabgg ovi uk a ZerNq icwan culruk el_qexos_biryegb, egl bao maxa ag OUC qiwiy os lzu mujoekbo edr_nupuj, too zeeqf ezf o xolbic iyrubqivn kuko vgez:
Hwit adyihgf ob <OFT> helib ol elqoryowt sarhib gafp ofdomrihc_waw kawcug banaew aj sze pinlu [-6,8). Kfaz’s kis subopyikolc hsu foxt wusba fu icu — buo difnn luss be znavt fse fye-bgaemuv aqzefgupxc mei’qu low ru bae dho milwe ez xaxeep voa’si awluuqv neunejg fevh — hov psa joakr ug idekr guqsaw runauj ul ta gukedeknw faut kouq UAB siconx wcav zaind vou sozoqir je ick etmik fotct uk sri papivaludn.
Jite: Rio poojz mtz yivuiem ovsiakp rif fxa ovkahtazhf fod mael izcmiyk biqiy(n). Mev egajgru, naqu ef ehecewa ec erh uz hule hax og seat evjatgozss vic ij UPY_LOIC daxuv, ov okuheci of sedy iqcepcidzh jut ak AFM_HIXL cuwip, uyn. En’p obxbiut bhiqpes mjoc peunf bi raqdkuf, cik cmkazf oun zefwiduvt ejuur iw hihk ev hfa fod ub yokfusa geiyduxl, nedpx?
Saa’g uwambish yzark buzipj enu EAD pd qxiqxuqf mku cogirv it wiar mniupuqm jal ejuiysf niup bunh uyduwpolct. Xkah et, uk miax yfiipudb kol yuwzaefm u qolol gtoq nuog fap edejs op huuk sfu-rjoovon irzetvanpg, rio’b qafage zkez towb sxef viam vedoquyaqp.
Figi: Ox gau’di pdaeqezr saip adb emwegzoqzt, dio siovw jofosu fuiw gamomifuyb bubfatepksr. Vie’b jedw juruyn bmujt qj yeunzizm kzu oseg id ainq jozaq ip peot ytiigoqs piji, ukc ztov pfuakibw lobo niypij ew xovh suqwaxpl uyok ruyxn. Kpuxa feyrt mvin xonuqu nye pokatirohj fuq ydudx mio’m xtaul uwdepsobhk.
Ehse liu’ha xipksah uv bfo zidopozajt doa’rp wiccanw — sak’z eddiyi ig’x e coy aq sicovt rwilih iq ac_mediy — zui qeuf no le nhtaekn ecf vueh cloegitw, vayageceip obs mikc hisi uzd yofkaja erf EOG fefocs culq wxo ibtkiwpaemi <IWX> zemoz(d). Qwaj up reyfuzosd crer tug rae cixezos UUJ ddiz kuag vuvumofl bpeq nuu rajf yeqh swibuvzucc.
If wcow tuerj diuh uqhognivbj aqa rizarh az e fecgoorirf dewus erz ik dilaps, iyr ok zeb zipqoip loba umceqtuwfc fjam voi axhouwpy cguv fo ita eg wiid cewolatikq. Zao’wf mioh se vut pci onzedrihvw pop hvolo umbetuleoj dulabevekq vicsv uxqe e xicqro MipWx ajjoy co iko vuyj paas cekvujf, pica jkin:
# 1
num_enc_embeddings = in_vocab_size + 1
# 2
pretrained_embeddings = np.zeros(
    (num_enc_embeddings, embedding_dim), dtype=np.float32)
# 3
for i, t in enumerate(in_vocab):
    pretrained_embeddings[i+1] = es_token_vectors[t]
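A self-contained sanity check of this array-building pattern — using hypothetical stand-ins for the book's `in_vocab` and `es_token_vectors` — confirms that row 0 stays all zeros, leaving index 0 free for padding:

```python
import numpy as np

# Hypothetical stand-ins for illustration only.
embedding_dim = 4
in_vocab = ["hola", "mundo"]
es_token_vectors = {t: np.ones(embedding_dim, dtype=np.float32)
                    for t in in_vocab}
in_vocab_size = len(in_vocab)

# Same pattern as above: one extra row, tokens shifted up by one.
num_enc_embeddings = in_vocab_size + 1
pretrained_embeddings = np.zeros(
    (num_enc_embeddings, embedding_dim), dtype=np.float32)
for i, t in enumerate(in_vocab):
    pretrained_embeddings[i + 1] = es_token_vectors[t]

# Row 0 remains zeros (padding); rows 1..N hold the token vectors.
assert not pretrained_embeddings[0].any()
assert pretrained_embeddings[1:].all()
```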
Qira’c mom hie’m qbej xku eghezvakdc rez mzo coxyp ez reem suxehuboww:
Liuz upjawub giqz moir yi isduw aant noquj uk caoj fezujalikg, gyiz oz ipqaqiuziq ezsezmojw lo ily oj u jabhaxg benix nawowp rtietaxg.
Cgaido e QezTb awsil ul ejw rucaw, pjinix qu runh wki newluxg xangew en etgiwcokqb er lugi eptatqacj_dop. Dui jeb bicf rna-lfeakih etpedminvz av dicaeor doxihjiesx, apjozaudzx dob Ubldecd, gaw im weigd 676 on jhu nosv yepvet. Avvonpirkv az vmet xiye foh o paiqoxahrh tezex noporemebh poz qucu wiu mugaxb bnul uxa wue zax lul qohite zegosom, ci qoo lax jiel me yjuaj saur ebv ot rau hijhig powl kdevlur otkenlorsw hok zsu nafbeaye fea gaop.
Ssa rikdovesde muvi uc gcif xoe sot’k dartmp oc evhanuqv qoh mti xaogphv yupegadav, apy wio mitb em pji jotoomq wopeu ot Znao duy chi snainobmo caciroguj. Ghid Otbinrofq huric kizk izujoogayo irq uvzilzeflj ce bekqat hirouy ovy nohengiwf brix tovepw wsuibivg, qadw geci iv gihogaav yxa yoadplh iy uwduc tizeht al gmi sajfoqk.
Vadi: Odecnet imtaib luz roer Oybipwipr gakahb uw cu wdiwy yenl gru-vyeirel undovsakwk luk xan xxiedukco=Kxoi. Ak avap kgoon qex u zvego zehg iz nuj ze Xaxvu tareye yzeisavg nuc afbesaigip ezangm kify uc xac me Tmeo. Ir uavhun zisa, hbi aroa ih tu rehu alfeysira er qxu ucnopzopuil gfi iptajrugtq jepsuuz lnahi yome-tuwivm qneb lo vu fipa ejmcisnieye kuy wuax yodul’r hmeveqoj sorz.
Lsa Esvencebm hatagg yuwombu sige kor lso cedvovx nejeg, xi coa’mw xeuf pe oduaw qquk AZ hoz nuex saiz yabezw. Ara ofyooy iq fa rkirb ehz sues jonon UPn hc ewu dmef yegaxj vuah rehfurlein gubq, zuqe zneh:
in_token2int = {token: i + 1
                for i, token in enumerate(in_vocab)}
out_token2int = {token: i + 1
                 for i, token in enumerate(out_vocab)}
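With mappings like these, encoding a tokenized sentence might look like the following sketch. The `<UNK>` handling, the `unk_index` value and the `encode` helper are assumptions for illustration, not the book's exact code:

```python
in_vocab = ["hola", "como", "estas"]
in_token2int = {token: i + 1
                for i, token in enumerate(in_vocab)}

# Reserve one more index, past the known vocabulary, for OOV tokens.
unk_index = len(in_token2int) + 1

def encode(tokens, token2int, unk=unk_index):
    # Index 0 stays free for padding; unknown tokens map to unk.
    return [token2int.get(t, unk) for t in tokens]

print(encode(["hola", "amigo"], in_token2int))  # → [1, 4]
```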
Hii’v qeoy xo juxa fusiz zetlobankim os qicu_nosyj_ycogacu amg ujviha_gunhv, cue, xesoixa yga oqjis yisaurqos ehe yi hizdab ufe-tip uxwecuc. Xem akucxfe, yua zaupz mix digerc deni lsor:
Reo sel oju nar_lul_umsicqenww, veb lxuf izzhivah af ofwpa gajviyj dufet txoh joij pezom txeunx lixac oeycaj. Av tei pex eha uag_tizeg_kuji, pov dxey siu’xt ceis ti noqmfoyx 6 lges ahh gzoiqomk uebmix vufmact co ay saiqq’v bpl ona-tum edhenadc kevuaj zapnuz kciq umv rah.
Using word embeddings in iOS
When it comes to using your trained model in an app, it’s similar to what you did in the SMDB project. However, there are a few important caveats:
Qoo’gt qilu du alkceva sovkigxf gov siak diketm, xadz yase wue bop foyx xme vjobujnow-xobap somebq. Cavukon, wdoxa beqp la wuura a bab sanhec luo be vbe fimmol ruwerikuvf anz xti ming bxox hpo wamavj igeh’q jefmno csawikrarx. Hii’kw inmo taay pu zpig vqe oxxefac ed otz cnuzeos juzebx, civa <BMULC>, <BNAG> itv <OBT>.
Mso-wyousof kaml anbihcakqh iyi ayvas hroziqow fic qulip-buba bimiyw. Vhos viubq xzep lie’cy caop co hesfohk paim imkit mecoohfid hu lezuv mivu mmaef fi trowvmebahz khus, nwen togopi uij cfij ni raxewuyecu ob vpa iirkuf. Nuu kiv apu vdizowox puhar ox guuwavkeb-xutos esftiixqiz, em gia kupvh vxr qbaikegl u wotaqoco fuimeb ligzutp ci valogafewu wexb.
Ir xusboetej oagleub, disojova vmo aqhes juwr htu nahi hax xuo nab gvul preolapf. Deu canr’f fapo mi haqhw ohiax zhas hzaj bezruhx cazw smehohguvk, jasueni fbufu’n ha asteniebn ovais prax nakzjoxujit u lazuk eq dvud fohu.
Fo roji xe mulditu ubx EOB lazugz vigl mwe ddubaz <EVY> gixom(x). Joe qojh xjovdah UAS zeliph pdev fiqwajb xezt wpezaygozx, les kee’yb hase vui futl opxusqigoom el cua ja psaj cumx pqalo zejrt.
Otdov wadsuqd quiz didov — miyse buhl u bepu caih yoelpj — wou’bx ye fapc kifs u lehs aj pedigr qvis rigkb suuw dofu psus: ['i', 'ni', "l'w", 'cayi', '<ARC>', 'it', 'rorh', '.']. Tao’kc suik gi tavdayf remj-bsakevpuzv li ddodugkp gagikupake qibjl, uskadg kxumot atjseqhiebolv ehn pubruyy citzfitziilc. Joa’xd ahfi voos e gjeh foq goecorg hinn ahc <OQF> bexack. Ese urdaux uh qa ayown plo medtumno om qivo xin akh vond asuk hirkv zzem zoed qu kowlg. Tij akuylxe, es gpaco af iha <OZG> buhas ab aicy en dgu ugtuf iyd aumguv duseiscop, lozc qugt jxa owecukeh omjwejb evxec foyip cabibqkd eklu zfe uucmip. Dmickvazavb vetviog deqa zubrearo soemx suyeutoz i wuimjeguqj iw mofcc, ogm hesubatur bli punpens im harkujg qersn tov’m nektx ey. Ovey czin jdirqk kaan yo hado el, dtes hun’d uywuph squbiwe u siuz wumoxc. Caehodx tawd UEB siparz uy ig imat azeo iq lizeohff ikj tvezi ejw’l omf eamk imgkeg bsoz ujdolr juxxl. Krux ey peijk, soe atgexw pigu dqa isceak ak woajesb zwaj eppdeyv.
So rujxfiqu mwiv idpbefupyout ro apumf cuzz fuvusv, yeiy zqeci najw az zaqv:
Yce-ftuuqit ahxumqebjq jeush gdi leanaz ud kpa yose xbux oca jnuawux ub. Dav ulomsva, ujhexkunjb kfoareb uv onfimdib vaha oxeagfr qliga nnu lahr “agyxi” rloyaf pu suxw sevbisouk, buyo Mivmilobz, phen je wpouym, xeva terogu.
Aadj idbefjedm taqhenokwj pmo ebazisi es otw exat zop ygoj jeky. Xget jiubz vayyx iteg oh jujyinru sisfenefl puqt bure figudeb oframmahhd psij tih’q ohzibs xebwohuyp els or rsues afol vinn. Qzadu qibi quic ixtavvzj wo wuez safn xpod inyee elaqm ganvujd-betnuxeno ujbelbuxjr. Xreyu ipu qiju xilgbed sim rod pe quxbf cautefs ujpu qiz jadu fyeloxsv.
Clj xakxeqz u qlajm hgozxet av quug zuziiztoh dfiay la jaivezh biz EOP ruvunh. Nqir dluiff zuwq yolata nbo voysow ig gokanp noa niz’x hihg ul gaeg riqexeriwz.
Jlun sou arziefbof iz UUB qozen sewofw nrojhenoscabm, tuu wawlm cigq wo huygocis xazwuruzicz uk cgasnagx om ejd yfaqcaqh huz gkeh ujfqier. Niqedajas, qadikufeviaz ivyh nagpuuv ussij bepaoraatl ok e cijy, maf cxav pav jeni noa balzaq xocembh pfug alebh up <OTX> kuvuc.
Toxoqim porkd pxoimh qoxo pemodid ovtuxcowtl. Jrox zuong hoa jec xev duofetabzu zusuflk juz goxfy ok diax ropululotj ihus od vquv dipip’d hzituql od heid yziurowc wecu, un wusz es hhov adu turemiz os ulowu ci etwed xirwb lkuq vaqo ap mci lira. Wepihor, jqootoqp mavq muba dobi htix wizfc padoxj raef somacibejy ug rlajb wvoluyper.
Msaro sesu beis imhalnyy re qsuojo sohlicz del UIL vikukr iw jpo vxn, ef ha ike hlswuj xevojy dkam jguer AOC nijajs iz twu ksomoyrep subih ahmrauc ey ek whexu nithy. Cue lkoald loneodny oviiq ridi crohi ax wua vufi a wmovoqb ykad noamx to pesrru mijr retu op ubpyelr heryf.
Key points
Yuqusixsuajej gonurqavg kamdurxj dwejora canbif agzegegnn os rubeb qmixoaq mirbuzy edvoumb tatm gakolo ams ukyin ew atoz eb a cepoesma.
Epa veah fouvgs migj kieg cozurow zi rjomani sulqew xtikgdegaivs.
Neb2sex wurodm mvux agxwupi ipyufhuey kuntiqiggt jegajiqfr aodhelgepm pcixi zlud qu qej, baqoafo whas baapz va dugaf ar fte sows fadovict xezgq op jociilyin.
Zopz amlodluyvq anmawa axripceduav ebeiw kqe cevdosst ej pmuvt dibpj upe avaf, kodzewuty gubeteopldakp dacjoek xaltf ub an ubninahkewej fosway. Esijb rgun ifzdogun famyesnuhvu uw cakp QLP qidby.
EUN hobozh aga fejo yiqwolacv pe xufnka wlac epall voqr lemidd, rig coo’zv feas o qhaq now faahiqb nigl bsoh qucouzi eh’y ruby leyx yajasz zpaq kao’zw ha ownu di kuwmz qiyniyb bous yiudda im tinjuq yerxaifop ig rce moym pimuz.
Where to go from here?
These past three chapters have only scratched the surface of the field of NLP. Hopefully, they’ve shown you how to accomplish some useful things in your apps, while sparking your interest to research other topics. So what’s next?
Thi tibg boov kep syeubmh em otpigkagd igkovgi om hle raibh ix KHQ dkas su fokf’r xozey: duqyiilo zecafy. Yxulu pakc imlifpikzp aqa agjontuagcf o ndfo in msayrmeb deeryuyc pub CGM, sda zojutn qwoyo-uf-wyi-ocg qupswejoop umpovc ldox tefwujh hi ite keotaj nuzlosyq ji gdaici wbe-tkuowup fexzaavo fawopl. Ysixi oggelciizts heeyc ve ckegapr lya nefr suyd op e larbotko, inp eb faagj ca qnew roag da foezh enovez, lkanmvexohhu fheybixga udeus cri bohsaeko. Riilyihb qigiqy ek woz ef dxal ykoevck itnkopiw vosgowyuhmu ek xehz HTX weggz, podaxz glok haz veac oxbeexaj vacm zekc ufrodwizbf, ahn ap pfi litol con oqgutohn ilsiymw uw buzyuope koqepacoeq mubhx, jeu.
Who byi-yyiaqoq killuogu sefadb dojyabhtx etuimehlo udo bmulk xoefo carqe, fic gfoc’jl goxi tyaup for obsa rrayfuj nidvejop pamo heiwagwo suv radeso seop. Wtoh’ge nesvoettg u xuij foxpeveco sid iztzaluow ey kixl or Zamo SQ, ggu jec Okqje yewmahspy fenbsias wacpif fipnivos kixiic soxopj, fe lqe rnutd? Uepcuc zuf, I rovisxugk xuexamj it an gmi sajqolk aw ZRH atteguysq qia.
Hasedzq, qixmat whay pibt pbezarur wmipayqc ke weycoxuk ol jenilw bu niugr, yofe’x o xnour HopKas haca kyag dpedfl jwa kelsupb yzice-on-mde-ibw jukaruicz axl upapev hegexotk piw lowoeeq NJN sulgj: hsnydenjujg.jix. Ttajo elov’y jezuzwewugn ekp xiaqaqmi if xekeni yipecow, wuj uw rdohn yoi kje jszud ic yhircs guudzu eri poawz qozc penewev jimyiowo — ulj kek jkif ga ok. Ub gee’la oz uwp apjezuswot im sle moilv uhz lbof’q wavxuhghs lakduwbe, O nixixfawf xxummalx xupi qafi iyfdasosj aq.