Earlier in the book, you learned how to classify images — for example, judging whether they were of cats or dogs. You’ve also classified sequences of sensor data as device motions. Text is just another kind of data, and you can classify it as well. But what does a class of text look like?
Is this email legitimate or spam? Are customer messages praising your great work or demanding action to address complaints? What’s the topic of an article, patent or court document? These are just a few examples of text classification tasks.
There are a wide variety of techniques for extracting useful information from text, all falling under the general term natural language processing (NLP). This chapter focuses on using NLP for classification, specifically using the methods Apple provides as part of its operating systems. You may be familiar with NSLinguisticTagger, which has been available since iOS 5. It supports several NLP tasks and was covered in the “Natural Language Processing” chapter of our iOS 11 by Tutorials book, when Apple rewrote the class to take advantage of Core ML. This chapter does not use that class.
Apple introduced the new Natural Language framework in iOS 12 — and in each of its other device OS revisions that same year — which is meant to improve upon and replace NSLinguisticTagger. That’s the framework you’ll use here, along with Create ML to train your own models.
In this chapter, you’ll build an app to read movie reviews. Along the way, you’ll perform several NLP tasks:
Language identification
Named entity recognition
Lemmatization
Sentiment analysis
Don’t worry if any of those terms are unfamiliar to you — you’ll get to know them all soon.
A special thanks to Michael Katz and the editorial team of iOS 11 by Tutorials. Michael wrote that book’s “Natural Language Processing” chapter, on which this chapter is heavily based. Specifically, we reuse much of the starter project and general structure from that chapter, but we implement things differently here. This chapter also covers some additional topics, such as training custom models, so we recommend going through it even if you’ve already read that book.
Getting started
Open the SMDB starter project in Xcode. Build and run to check out the app, which starts out looking like this (pull down on the list to reveal the Search bar):
The Search feature doesn’t work yet, but you’ll fix that soon. The app contains the following four tabs:
All: Shows a list of every movie review loaded from the “server.” (To keep things simple, SMDB actually loads from a JSON file included with the project.) You’ll add “heart-eyes” and “sad-face” emojis to the positive and negative reviews, respectively.
By Movie: Lists movie names where users can tap a name to only see reviews for that movie. You’ll eventually include tomato ratings showing each movie’s average review sentiment.
By Actor: This tab is currently empty; you’ll make it show a list of names automatically discovered in the reviews, along with emoji showing the average sentiment of the reviews mentioning each name. Users will be able to tap a name and see all the reviews that mention it.
By Language: Currently empty, it will soon list languages detected in the reviews. Users will then be able to tap a language to read all the reviews written in it.
You’ll add these missing features inside NLPHelper.swift, so open it now. It includes empty stubs for the functions that you’ll implement. Notice that it also imports the Natural Language framework, giving you access to well-trained machine-learning models for several NLP tasks. The first one you’ll take a look at is language identification.
Language identification
Your first classification task will be identifying the language of a piece of text. This is a common first step with NLP because different languages often need to be handled differently. For example, English and Chinese sentences are not tokenized in the same way.
Many tools in the Natural Language framework automatically identify the language of the text you give them when necessary, before doing anything else with it, so in most cases you don’t have to bother with this step yourself. However, detecting languages is also a useful task on its own. For example, you might use it to route support requests to the appropriate staff members or, as in this app, to organize content by language. For cases like these, Apple provides NLLanguageRecognizer.
Replace the body of getLanguage(text:) in NLPHelper.swift so that it simply returns NLLanguageRecognizer.dominantLanguage(for: text).
This function is only a single line: it takes a String and passes it to NLLanguageRecognizer’s dominantLanguage(for:) function. That call returns an optional NLLanguage for the language it considers most likely for the given text. NLLanguage values are constants whose names match the language they represent, such as .english, .spanish and .german.
When portions of the text are in different languages, it returns the language that makes up most of the text. It returns nil when it can’t determine the language.
Note: You may be aware that most language names can be abbreviated with a two-character ISO 639-1 code; for example, “en”, “es” and “de” for English, Spanish and German, respectively. You can access that code for the language represented by an NLLanguage object via the object’s rawValue property.
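For reference, here’s a minimal sketch of that one-line implementation as a standalone function, assuming the stub’s signature is getLanguage(text:) -> NLLanguage? as described above. It needs an Apple platform, because the NaturalLanguage framework isn’t available elsewhere; the sample sentence is only an illustration.

```swift
import NaturalLanguage

// One-line implementation: ask NLLanguageRecognizer for the most likely
// language. The result is nil when no language can be determined.
func getLanguage(text: String) -> NLLanguage? {
  return NLLanguageRecognizer.dominantLanguage(for: text)
}

let sample = "This review says the movie was long, but the acting made it worth watching."
if let language = getLanguage(text: sample) {
  // rawValue holds the language code, for example "en" for English.
  print("Detected language: \(language.rawValue)")
} else {
  print("Could not determine the language")
}
```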
Build and run the app. Switch to the By Language tab, which should look like this:
The table lists each language identified in the reviews, along with how many reviews use it. Tapping a row shows a list of reviews written in that language. Using the Natural Language framework, you’ve improved the app’s user experience, because now users only have to scroll through reviews they can actually read.
Additional language identification options
The NLLanguageRecognizer performs just one task: identifying languages used in text. If you need it, then you’ll most often use it as you did here, via its convenience function dominantLanguage(for:). However, there are situations that call for more control, and, in those cases, you’ll need to create an NLLanguageRecognizer object and call some of its other methods.
You can pass it text via its processString(_:) function, which has no return value but stores the most likely language in its dominantLanguage property. If you want more fine-grained information, you can get probabilities for multiple possible languages via its languageHypotheses(withMaximum:) function; the withMaximum parameter limits how many hypotheses you get back. Before processing a string, you can provide hints in the form of a dictionary of likelihoods for specific languages via the languageHints property. You can also restrict which languages are possible via the languageConstraints property.
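The following sketch puts those options together on an NLLanguageRecognizer instance. The hint weights and the three-language constraint are illustrative assumptions, not recommended values, and the French sentence is just sample input.

```swift
import NaturalLanguage

let recognizer = NLLanguageRecognizer()

// Optional prior: illustrative weights for an app whose reviews are mostly
// English, then French, then German.
recognizer.languageHints = [.english: 0.6, .french: 0.3, .german: 0.1]

// Optional: only ever consider these three languages.
recognizer.languageConstraints = [.english, .french, .german]

// processString(_:) returns nothing; it updates the recognizer's state.
recognizer.processString("Je suis allé voir ce film hier soir et je l'ai beaucoup aimé.")

// Single best guess.
let best = recognizer.dominantLanguage

// At most two candidate languages, each with a probability.
let hypotheses = recognizer.languageHypotheses(withMaximum: 2)
for (language, probability) in hypotheses.sorted(by: { $0.value > $1.value }) {
  print(language.rawValue, probability)
}

// Reset before reusing the recognizer for a different string.
recognizer.reset()
```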
Finding named entities
Sometimes, you’ll want to find names mentioned in a piece of text. Maybe you want to sort articles based on who they are about, organize restaurant reviews based on the cities they mention, or extract important information from a document, which often includes names of people, places and organizations. This is called named entity recognition (NER), and it’s a common NLP task with many use cases. It’s also a form of text classification.
When you’re looking for a specific name, a simple search is often enough. However, when there’s a lot of text to sift through, and especially when you don’t know in advance which names it contains, that’s where machine learning can help. The Natural Language framework provides well-trained models capable of finding names of people, places and organizations.
In this section, you’ll give SMDB the ability to find reviews based on the people’s names they mention. The app doesn’t know in advance what names might exist, so it has to examine the text and classify words as either names or not names. Apple provides a class that can handle this task, and more, called NLTagger.
Replace getPeopleNames in NLPHelper.swift with the following implementation:
func getPeopleNames(text: String, block: (String) -> Void) {
  // 1
  let tagger = NLTagger(tagSchemes: [.nameType])
  tagger.string = text
  // 2
  let options: NLTagger.Options = [
    .omitWhitespace, .omitPunctuation, .omitOther, .joinNames]
  // 3
  tagger.enumerateTags(
    in: text.startIndex..<text.endIndex, unit: .word,
    scheme: .nameType, options: options) { tag, tokenRange in
      // 4
      if tag == .personalName {
        block(String(text[tokenRange]))
      }
      return true
  }
}
The body of this function shows the general pattern you’ll follow for many NLP tasks. Here’s how it works:
1. Create an NLTagger and pass it an array of NLTagScheme objects telling it what to look for in the text. (More on this later.) Then, set the text you want it to parse via its string property.
2. Fine-tune what the tagger returns with an array of NLTagger.Options values. In this case, you’re going to skip whitespace, punctuation and non-linguistic tokens such as symbols. You also pass .joinNames, which tells the tagger to combine multi-word names into a single token; for example, “Jane Smith” instead of “Jane” and “Smith.”
3. Call the tagger’s enumerateTags method to iterate over specific units of text within the range you specify, potentially assigning an NLTag to each one. (More on this later.)
4. Provide enumerateTags a closure to call for each token the tagger processes. In this case, you check whether the token’s tag marks a person, rather than a place or organization, and, if it does, you pass the token as a String into the block passed into getPeopleNames. Returning true tells the tagger to keep enumerating.
You’ll use this pattern often: Create an NLTagger, use it to assign classes to tokens, and then process the important tokens in some application-specific way.
NLTagger operates on tokens, but what a “token” means depends on the value you pass as the unit parameter of enumerateTags. It can be any of .word, .sentence, .paragraph or .document. The tagger then works through the text one unit at a time, splitting it in a way that’s appropriate for the text’s language. Some tagging schemes only work with specific units; for example, the .nameType scheme you used here only works with words.
When NLTagger finds a token, it calls the closure you specify with an NLTag object and the range of the current token within the source text. The actual value of the NLTag object depends on the tagging scheme. In the case of names, it may be .personalName, .placeName or .organizationName, but there are other possibilities when using different tagging schemes.
You used the .nameType tagging scheme to set up the tagger to identify names, but Apple provides several other built-in options. You’ll take a look at another one in the next section.
NLTagger doesn’t actually do all the work involved in classifying tokens. It’s simply a wrapper that uses different models based on the particular combination of tagging scheme and token unit you provide. Later in this chapter, you’ll see how to provide custom models to add new types of tagging.
You can initialize a tagger with more than one scheme to perform multiple tasks, but enumerateTags only handles one scheme at a time, so you’ll need to call it separately for each scheme you want to apply.
Apple doesn’t support every tagging scheme for every language. Call NLTagger.availableTagSchemes(for:language:) to get a list of supported schemes.
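Here’s a small sketch of that check. supportsNames(_:) is a hypothetical helper, and the languages are only examples; which schemes come back depends on the OS version.

```swift
import NaturalLanguage

// Word-level tag schemes the OS supports for English.
let englishSchemes = NLTagger.availableTagSchemes(for: .word, language: .english)
print(englishSchemes.map(\.rawValue))

// Hypothetical helper: check support before relying on name tagging.
func supportsNames(_ language: NLLanguage) -> Bool {
  return NLTagger.availableTagSchemes(for: .word, language: language).contains(.nameType)
}

print("Name tagging for French:", supportsNames(.french))
```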
Check out NLTagger’s tag(at:unit:scheme:) and tags(in:unit:scheme:options:) functions. They return a tag or tags directly, rather than having you enumerate over all the tokens with a closure.
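This sketch combines the last few points: one tagger configured with two schemes, queried once per scheme with tags(in:unit:scheme:options:), plus a single lookup with tag(at:unit:scheme:). The sample sentence is an assumption, and the exact tags depend on Apple’s models.

```swift
import NaturalLanguage

let text = "Steven Spielberg directed this film."
// One tagger can be set up with several schemes...
let tagger = NLTagger(tagSchemes: [.nameType, .lemma])
tagger.string = text
let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let fullRange = text.startIndex..<text.endIndex

// ...but each call handles a single scheme, so query them separately.
// tags(in:unit:scheme:options:) returns (tag, range) pairs instead of calling a closure.
let nameTags = tagger.tags(in: fullRange, unit: .word, scheme: .nameType, options: options)
let lemmaTags = tagger.tags(in: fullRange, unit: .word, scheme: .lemma, options: options)

for (tag, range) in nameTags {
  print(text[range], "->", tag?.rawValue ?? "none")
}
for (tag, range) in lemmaTags {
  print(text[range], "->", tag?.rawValue ?? "none")
}

// tag(at:unit:scheme:) returns the tag and range of the token at one index.
let (firstTag, firstRange) = tagger.tag(at: text.startIndex, unit: .word, scheme: .nameType)
print(text[firstRange], firstTag?.rawValue ?? "none")
```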
Build and run again, and take a look at the By Actor tab.
You’ll see a list of names that NLTagger identified in the reviews. Tapping one shows a list of reviews containing that name. The results aren’t perfect, though: the tagger misses at least one name that does appear in the reviews, and it labels at least one token as a name even though it isn’t one. The tagger uses a model that has learned what names typically look like and how they’re used in sentences, but it can’t know about every name in existence. It will give you good results, but it will never be 100% accurate.
Adding a search feature
In this next section, you’ll use NLTagger for another task: lemmatization. That’s the process of identifying the root version of a word. For example, consider the sentences, “I am running” and “I was running.” Reducing each term to its root, both sentences become the same: “I be run.” Sure, it no longer reads as correct, but it encapsulates most of the information contained in both sentences.
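To see lemmatization on exactly those sentences, here’s a minimal sketch using NLTagger’s .lemma scheme. It sets the language explicitly with setLanguage(_:range:), because very short strings are hard for the tagger to identify, and falls back to the original word when no lemma comes back. The helper name lemmas(of:language:) is hypothetical.

```swift
import NaturalLanguage

// Hypothetical helper: returns one lemma per word in the text.
func lemmas(of text: String, language: NLLanguage = .english) -> [String] {
  let tagger = NLTagger(tagSchemes: [.lemma])
  tagger.string = text
  let range = text.startIndex..<text.endIndex
  // Short strings are hard to identify, so state the language up front.
  tagger.setLanguage(language, range: range)
  var result: [String] = []
  tagger.enumerateTags(in: range, unit: .word, scheme: .lemma,
                       options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange in
    // Fall back to the word itself if the tagger has no lemma for it.
    result.append(tag?.rawValue ?? String(text[tokenRange]))
    return true
  }
  return result
}

print(lemmas(of: "I am running"))
print(lemmas(of: "I was running"))
```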
Historically, it’s been common to preprocess text by lemmatizing it, because doing so reduces the size of the vocabulary a model needs to consider. You’ll learn more about vocabularies later but, essentially, the fewer words a model has to handle, the better. So rather than needing to understand “run,” “runs,” “running” and “ran,” a model could just handle “run.” However, as this example shows, some important information, such as tense, gets lost in the process. For some tasks, like machine translation, it’s now more common to work with the original, unlemmatized words in order to get more accurate results.
Note: Stemming versus lemmatization. You’ll probably encounter both of these terms, often used seemingly interchangeably. In the case of stemming, the root is called a stem; in the case of lemmatization, it’s called a lemma. These are essentially the same thing, but the process for identifying them is different. Stemming applies rules like removing “ing” and “s” from the ends of words, which is fast and easy to implement but doesn’t always produce the best results. Lemmatization, on the other hand, uses a specific vocabulary for a language and applies more complex rules. It’s more involved but usually gives better results.
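To make the stemming side of that note concrete, here’s a deliberately naive, rule-based stemmer that only strips “ing” and “s”. It isn’t Apple’s or any standard algorithm (real stemmers, such as Porter’s, use many more rules); it only shows why simple suffix rules are fast but crude.

```swift
// A deliberately naive stemmer: strips an "ing" or "s" suffix when at
// least three characters would remain.
func naiveStem(_ word: String) -> String {
  let lower = word.lowercased()
  for suffix in ["ing", "s"] where lower.hasSuffix(suffix) && lower.count > suffix.count + 2 {
    return String(lower.dropLast(suffix.count))
  }
  return lower
}

for word in ["runs", "running", "ran", "is"] {
  print(word, "->", naiveStem(word))
}
```

Note how “running” becomes “runn” and “ran” is left untouched, whereas a lemmatizer aims to map all of these forms to “run”.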
You’ll use lemmas in the SMDB app to perform more comprehensive searches. When the user types search terms, the app will find all reviews containing those terms. But rather than only performing exact matches, you’ll broaden the results by using lemmas. When a user enters a term like “run,” you’ll make sure the app also finds reviews that use other forms of the word, like “running.”
Replace the empty getSearchTerms inside NLPHelper.swift with the following:
// 1
func getSearchTerms(text: String, language: String? = nil,
                    block: (String) -> Void) {
  // 2
  let tagger = NLTagger(tagSchemes: [.lemma])
  tagger.string = text
  let options: NLTagger.Options = [
    .omitWhitespace, .omitPunctuation, .omitOther, .joinNames]
  tagger.enumerateTags(
    in: text.startIndex..<text.endIndex, unit: .word,
    scheme: .lemma, options: options) { tag, tokenRange in
      if let tag = tag {
        // 3
        let lemma = tag.rawValue.lowercased()
        block(lemma)
      }
      return true
  }
}
If the tagger identifies a lemma (it isn’t always able to), the lemma is contained in the NLTag’s rawValue property. You extract it, lowercase it (this app doesn’t support case-sensitive search) and then pass it to the block that was passed into getSearchTerms.
The app’s starter code already calls getSearchTerms for each review, adding the review to each term identified by this function. So you only have to build and run the app to try some searches. With the app open, pull down on the table to reveal a search bar where you can enter terms to find matching reviews.
Note: If you’re curious to see how the app maps reviews to search terms, check out ReviewsManager.swift.
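Here’s a hypothetical sketch of how an app might map search terms to reviews using an in-memory inverted index. SearchIndex and its methods are illustrative names, and SMDB’s own code in ReviewsManager.swift may work differently. It matches a review if any query term matches, which is consistent with the search behavior described in this section.

```swift
// Hypothetical inverted index: maps each search term to the IDs of the
// reviews it was extracted from.
struct SearchIndex {
  private var reviewsByTerm: [String: Set<Int>] = [:]

  // Record every term produced for a review (for example, by getSearchTerms).
  mutating func add(reviewID: Int, terms: [String]) {
    for term in terms {
      reviewsByTerm[term.lowercased(), default: []].insert(reviewID)
    }
  }

  // IDs of reviews matching any of the query's terms.
  func reviews(matchingAnyOf terms: [String]) -> Set<Int> {
    terms.reduce(into: Set<Int>()) { result, term in
      result.formUnion(reviewsByTerm[term.lowercased()] ?? [])
    }
  }
}

var index = SearchIndex()
// Terms as a lemmatizer might produce them for two short reviews.
index.add(reviewID: 1, terms: ["i", "be", "run"])
index.add(reviewID: 2, terms: ["the", "run", "time", "be", "long"])
print(index.reviews(matchingAnyOf: ["run"]))
```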
Throughout this section, you’ll search for a few specific examples so you can see how the app behaves and what motivates each decision. These examples also demonstrate a few of the challenges involved in working with text. Try the following:
Searches can behave unexpectedly because NLTagger sometimes has trouble lemmatizing short words. You can see this by typing the letter “I”, which produces no results. Now continue typing so you search for “I want”. You’ll find that as soon as you start typing the second word, regardless of what you type, you get all the results that contain the word “I”.
That’s because NLTagger now sees the input as a sentence, and it recognizes “I” as a word. Once you get to “I want,” you’ll get all the reviews that contain “wanting,” even if they don’t contain the word “I.”
The bigger issue here is that NLTagger can’t always determine the language of short texts, and lemmatization requires language-specific knowledge. With longer text it’s usually not a problem, as you saw when you identified the languages of the reviews. But with shorter text, it’s a good idea to help the tagger if you can.
So how do you do that? By telling the tagger which language you’re using before asking it to lemmatize the text. Remember that unused language parameter in getSearchTerms? Well, now it’s time to use it.
Back in NLPHelper.swift, add the following lines inside getSearchTerms, just before the let options: ... line:
if let language = language {
  tagger.setLanguage(NLLanguage(rawValue: language),
                     range: text.startIndex..<text.endIndex)
}
This code sets the tagger’s language when one is available, telling the tagger how to interpret the text stored in its string property. Here, you’ll have a language’s two-character code, like “en” for English, and you create an NLLanguage object from it. You apply the language to the full range of the text, but you could apply different languages to different portions if necessary.
NLTagger offers another function, setOrthography, which lets you provide even more information about the language, such as its script, but Apple recommends not using it unless you’re sure of the value. The tagger determines the orthography from the text itself, and setting the language, as you’re doing here, essentially guarantees you’ll end up with the correct orthography anyway.
Note: If your device’s language is not set to English, your results for the rest of this section may not exactly match what’s described here. If that makes it hard to follow along, find where the app passes the device’s current language code to getSearchTerms along with the search text, and temporarily replace it with the string "en". That forces the tagger to treat all search terms as English.
Build and run the app, then try those same searches again. How well do they work now?
Typing either “want” or “wanting” now produces the same set of reviews, all of which include “wanting.” Nice!
However, some searches still miss reviews. Searching for a word that can act as either a verb or a noun finds only the reviews where it’s used as a verb. Here’s why: When you search for such a word on its own, it gets lemmatized to the verb’s root form. But when the reviews were processed for search terms, the same word was left unchanged wherever it appeared as a noun. So NLTagger lemmatizes some words differently when it encounters them in the reviews than when it sees them as user-entered search terms. It’s being smart by trying to give you the most appropriate lemma for the context, which is usually a good thing. But you want users to be able to find both kinds of reviews, so what can you do?
To fix this, go to the enumerateTags closure in getSearchTerms and add the following code right after the call to block(lemma):
// Also report the original word when it differs from its lemma.
let token = String(text[tokenRange]).lowercased()
if lemma != token {
  block(token)
}
This code checks whether a token and its lemma are different words. If they are, it also passes the token to the block that the app passed into getSearchTerms. Because the app uses getSearchTerms both to index reviews and to process search text, each review is now indexed under both forms of such words, and each search looks for both.
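Putting this section’s changes together, here’s a sketch of a variant of getSearchTerms that returns a Set instead of calling a block. The name searchTerms(for:language:) is hypothetical, and always inserting the original word (even when the tagger finds no lemma) is an extra design choice, not part of SMDB’s code.

```swift
import NaturalLanguage

// Hypothetical variant of getSearchTerms that collects terms into a Set.
func searchTerms(for text: String, language: String? = nil) -> Set<String> {
  let tagger = NLTagger(tagSchemes: [.lemma])
  tagger.string = text
  let fullRange = text.startIndex..<text.endIndex
  // Help the tagger with short text by stating the language when known.
  if let language = language {
    tagger.setLanguage(NLLanguage(rawValue: language), range: fullRange)
  }
  var terms = Set<String>()
  let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation, .omitOther, .joinNames]
  tagger.enumerateTags(in: fullRange, unit: .word, scheme: .lemma, options: options) { tag, tokenRange in
    if let lemma = tag?.rawValue.lowercased() {
      terms.insert(lemma)
    }
    // Always keep the original word too; the Set drops duplicates when
    // the lemma and the word are the same.
    terms.insert(String(text[tokenRange]).lowercased())
    return true
  }
  return terms
}

print(searchTerms(for: "The writing was great", language: "en"))
```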
This is as far as this chapter takes the search feature, although there’s more you could do. One option would be to split a search query into individual words and lemmatize each one separately. That would give you more possible search terms, because each word would be lemmatized both in and out of context. While that would help with the noun-versus-verb issue, you’d still have other problems. For example, spelling mistakes would still break the search, and out-of-vocabulary words still don’t get even basic stemming, so searching for the singular form of such a word won’t find reviews that only use its plural.
One last thing: Remember that the app passes getSearchTerms the device’s current language along with the search text? Try searching for a French word that appears in one of the reviews, in either its inflected or its root form. Both searches work, but why? When the app processed the reviews, it correctly lemmatized the French word, because it recognized the review’s language as French. And because the app now also indexes the original word whenever it differs from its lemma, the review is indexed under both forms. So when you search for either one, even if the device’s language causes NLTagger to lemmatize your search term incorrectly, the search still finds the review by matching the exact word you typed.
At this point, you’ve got a pretty good search feature. It isn’t perfect, but it’s significantly better than matching exact words alone. And along the way, you’ve seen some of the challenges you might encounter when working with text in your own apps. Now, it’s time to move on from the Natural Language framework’s built-in taggers and train some custom models.
Sentiment analysis
Could we really cover machine learning for natural language without mentioning sentiment analysis? Sentiment analysis is the task of evaluating a piece of text and determining whether, overall, it expresses a positive or negative sentiment about its subject. It’s one of the most common applications of natural language processing, and for good reason: Companies, politicians, market analysts, everyone with money at stake wants to know how the public feels about… something.
For that reason, it’s no surprise that Apple ships a built-in sentiment analysis model (as of iOS 13). Apple doesn’t reveal how the model works, and you can’t customize it or train it with your own data, but it’s certainly easy to use. You can feed it any piece of text, and it returns a score from -1.0 to +1.0 indicating whether the text is more negative or more positive.
It relies on a type that should be familiar by now, NLTagger, with a new tag scheme: .sentimentScore. This scheme configures a tagger to return a tag containing a numerical sentiment score. The one quirk of this API is that, although the score is numerical, the tagger returns it as a String, so your code has to convert it. Also, while the tag schemes you used previously return values for single words, the sentiment scheme returns a value for a whole sentence or paragraph.
To write a function that performs sentiment analysis, add the following to NLPHelper.swift, just below your definition of getSearchTerms:
// 1
func analyzeSentiment(text: String) -> Double? {
  // 2
  let tagger = NLTagger(tagSchemes: [.sentimentScore])
  tagger.string = text
  // 3
  let (tag, _) = tagger.tag(at: text.startIndex,
                            unit: .paragraph,
                            scheme: .sentimentScore)
  // 4
  guard let sentiment = tag,
        let score = Double(sentiment.rawValue)
  else { return nil }
  return score
}
This is only slightly different from the previous functions:
1. The function is synchronous, taking a String and returning an optional Double.
2. Create an NLTagger configured with the .sentimentScore scheme and give it the text to analyze.
3. Ask for the tag at the start of the text, using .paragraph as the unit, so the score covers the text’s first paragraph.
4. Convert the tag’s String rawValue into a Double, returning nil if there’s no tag or the conversion fails.
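Because .sentimentScore also works at the sentence level, you can score each sentence of a longer review separately. This sketch does that with enumerateTags and averages the results; averaging is just one illustrative way to combine them, the review text is made up, and the scheme requires iOS 13, macOS 10.15 or later.

```swift
import NaturalLanguage

// Scores each sentence separately with the .sentimentScore scheme.
func sentenceSentiments(_ text: String) -> [Double] {
  let tagger = NLTagger(tagSchemes: [.sentimentScore])
  tagger.string = text
  var scores: [Double] = []
  tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .sentence,
                       scheme: .sentimentScore, options: []) { tag, _ in
    // The score arrives as a String such as "0.6", so convert it.
    if let raw = tag?.rawValue, let score = Double(raw) {
      scores.append(score)
    }
    return true
  }
  return scores
}

let review = "The first half is wonderful. The ending, sadly, is a mess."
let scores = sentenceSentiments(review)
let average: Double? = scores.isEmpty ? nil : scores.reduce(0, +) / Double(scores.count)
print("Per sentence:", scores)
if let average = average {
  print("Average:", average)
}
```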
Un yee otap twu rehnalu ep foa wsvetc gda apq, sae’ph nee dusioys nawnaqd tl xebm hreeg ufmuneudif vustodayz cpimak. “Bqa Guejg aw KowozPow” qawat u hutiz 3.1, zen “Tto Ynaxz egd rwe Sues” ojzj gwujzk eh ih -9.6.
Not bad for a few lines of code! But what if you want a bit more control?
Building a sentiment classifier
While it’s convenient that Apple provides its own sentiment analysis API, it’s instructive to build your own sentiment classifier. Why? Because classifying text by sentiment is just one example of the much more general problem of text classification. Spam detection, prioritizing support requests and identifying document topics are all variations of that same problem. This section demonstrates how to build a relatively simple sentiment analysis system that labels chunks of text as positive or negative, rather than grading them from -1.0 to +1.0. Remember, you can use these techniques for all sorts of classification tasks.
Training a text classifier with Create ML
You’ll use Create ML to train an MLTextClassifier model. This class is meant to classify larger chunks of text rather than individual words, although it is technically capable of doing both. You’ll see a different model later in this chapter that is better suited to classifying word tokens.
In previous chapters, you used the Create ML app to train models. In this one, you’ll train your model in an Xcode playground. With the default model types, training the model in this section shouldn’t take long, so we recommend working through the steps yourself. However, if you’d prefer, you can use the pre-trained SentimentClassifier.mlmodel included in the chapter resources.
Note: As you may have seen, the Create ML app provides a drag-and-drop interface to Create ML: You select your training data with a file picker, choose your model type by selecting a template, and kick off training with a big “Train” button. When that approach works, it’s great! But it’s also worth being comfortable with playgrounds. For one thing, training in a playground works on macOS Mojave (10.14) as well as macOS Catalina (10.15). In addition, playgrounds suit the iterative machine learning workflow, since they make editing, experimenting and tracking your results easier, much like Jupyter notebooks.
Before working in Xcode, you’ll need a dataset. Xcode playgrounds have special access to a specific folder on your Mac, where you’ll store your dataset and output your trained model. If it doesn’t already exist, create a folder named Shared Playground Data inside your Documents folder. The folder must have exactly that name and location for your playgrounds to access it.
Find MovieReviews.zip in the chapter resources and unzip it into a folder named TextClassification inside Shared Playground Data. You should now have a directory named MovieReviews.
This folder contains movie reviews for training and testing, half of them labeled positive and the other half negative. It’s a reduced version of the Large Movie Review Dataset from the 2011 paper “Learning Word Vectors for Sentiment Analysis,” by Andrew L. Maas et al., published by the Association for Computational Linguistics.
Interested readers can find the paper at aclweb.org/anthology/P11-1015. The dataset comes with a README file describing it in more detail.
Create a new playground using any macOS template. The specific template doesn’t matter, but the platform does, because Create ML is only available on macOS. If you’d rather follow along with a completed playground, you can use the final playground included in the chapter resources.
If you started from a template, delete whatever starter code Xcode provided and add the following imports:
import CreateML
import PlaygroundSupport
You’ll train your text classifier with Create ML, so you import it here. You also import the PlaygroundSupport framework to access the Shared Playground Data folder you set up earlier.
Now, add this next bit of code to access your training and test data:
// 1
let projectDir = "TextClassification/"
let dataDir = "MovieReviews/"
let trainUrl =
  playgroundSharedDataDirectory.appendingPathComponent(
    projectDir + dataDir + "train", isDirectory: true)
let testUrl =
  playgroundSharedDataDirectory.appendingPathComponent(
    projectDir + dataDir + "test", isDirectory: true)
// 2
let trainData =
  MLTextClassifier.DataSource.labeledDirectories(at: trainUrl)
let testData =
  MLTextClassifier.DataSource.labeledDirectories(at: testUrl)
Here’s what this code does:
1. Build URLs for the train and test folders inside Shared Playground Data.
2. Create MLTextClassifier.DataSource objects backed by those folders. Each subfolder’s name serves as a label, so the data sources tell the model which samples are positive and which are negative, with one directory for each label you want your classifier to handle.
As you can see in the following image, your dataset includes two folders, “test” and “train,” and each of those contains the same subfolders: “neg” and “pos.”
Each of the files stored in these subfolders contains a single review, and the name of its folder is its classification label. So all the reviews in a “neg” folder are examples of negative sentiment, and the ones in a “pos” folder are examples of positive sentiment.
Note: This way of loading data works well when your dataset is spread across files like this. However, you can also train your model with an MLDataTable, which you can create from a JSON or CSV file, or from a Swift dictionary. You can even generate one programmatically if you need to. Use whichever approach best fits your dataset. You’ll see an example of loading a JSON file a bit later.
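As a minimal sketch of the MLDataTable route, the following builds a table from a Swift dictionary and trains from it with the initializer that takes text and label column names. The four reviews are made up and far too few for a useful model; the point is only the shape of the API. Run it in a macOS playground or script.

```swift
import CreateML

// Made-up examples: one column of review text, one column of labels.
let rows: [String: MLDataValueConvertible] = [
  "text": ["A joy from start to finish.", "Loved the cast.",
           "Dull and far too long.", "I want those two hours back."],
  "label": ["pos", "pos", "neg", "neg"]
]

let table = try! MLDataTable(dictionary: rows)
print(table.rows.count, table.columnNames)

do {
  // When training from a table, you name the text and label columns.
  let classifier = try MLTextClassifier(trainingData: table,
                                        textColumn: "text",
                                        labelColumn: "label")
  print(try classifier.prediction(from: "A joy to watch."))
} catch {
  print("Training failed: \(error)")
}
```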
Now, create an MLTextClassifier with the following code:
let sentimentClassifier = try!
  MLTextClassifier(
    trainingData: trainData,
    parameters:
      MLTextClassifier.ModelParameters(language: .english))
That single statement doesn’t just create your model, it trains it, too! It even automatically sets aside a portion of the training data as a validation set, which helps you detect whether the model is overfitting.
Note the ModelParameters object you use to initialize the classifier. It lets you specify what kind of classifier model to use, how to define your validation data, and the language of the text.
In the code above, you only specified that the model should train for English. That’s a good idea, because it lets you check later that you only use the model on text in the language it supports. By default, Create ML uses a maximum entropy classifier, with a validation set built by randomly sampling a small portion of the training dataset.
You can configure other settings later. For now, run the playground. Depending on the speed of your machine, this may take anywhere from a few seconds to a few minutes, but you should see output similar to the following:
You didn’t specify a separate validation set as part of the ModelParameters, so the classifier reserved part of the training data for that purpose. It then spent a bit of time tokenizing the reviews and extracting its training features. After that process completed, it started training a MaxEnt model (more on that later), performing multiple training iterations until it reached a training accuracy close to 100%.
That’s great performance on the training data, but what really matters is how the model performs on data it has never seen. Add the following code to your playground to evaluate your model against a test dataset:
// 1
let metrics = sentimentClassifier.evaluation(on: testData)
// 2
if metrics.isValid {
  print("Error rate (lower is better): \(metrics.classificationError)")
} else if let error = metrics.error {
  print("Error evaluating model: \(error)")
} else {
  print("Unknown error evaluating model")
}
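The error rate this code prints, and the precision and recall figures the metrics object also reports, are all simple ratios. This sketch spells out the arithmetic with made-up counts; the function names are hypothetical, and precision or recall is undefined (NaN here) when its denominator is zero.

```swift
// Accuracy and error rate always add up to 1.0.
func accuracy(fromErrorRate error: Double) -> Double { 1.0 - error }

// Precision: of the reviews predicted positive, the fraction that really are.
func precision(truePositives tp: Int, falsePositives fp: Int) -> Double {
  Double(tp) / Double(tp + fp)
}

// Recall: of the truly positive reviews, the fraction predicted positive.
func recall(truePositives tp: Int, falseNegatives fn: Int) -> Double {
  Double(tp) / Double(tp + fn)
}

print(accuracy(fromErrorRate: 0.25))                   // 0.75
print(precision(truePositives: 8, falsePositives: 2))  // 0.8
print(recall(truePositives: 8, falseNegatives: 8))     // 0.5
```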
Also keep in mind that the error rate and accuracy are related: Subtract either one from 1.0 to get the other. For example, an error rate of 0.2 corresponds to an accuracy of 0.8, or 80%.
Besides the error rate, the metrics returned by evaluation(on:) also include precision, recall and a confusion matrix detailing how the model predicted values for each class. It’s not shown here, but the confusion matrix for this model shows it handled both classes about equally well, with no obvious bias toward one or the other.
While your model’s accuracy isn’t state-of-the-art on this dataset, it’s still quite reasonable for something you trained with essentially a single line of code and no tuning at all. If you really needed better results, you could create a model with one of the more complex options that support conversion to Core ML.
Now that you have a trained model, add the following code to your playground to save it for use in your app:
// 1 (Optional)
let metadata = MLModelMetadata(
  author: "Your Name",
  shortDescription:
    "A model trained to classify movie review sentiment",
  version: "1.0")
// 2
try! sentimentClassifier.write(
  to: playgroundSharedDataDirectory.appendingPathComponent(
    projectDir + "SentimentClassifier.mlmodel"),
  metadata: metadata)
Xbisi spo luyoz li vne sacwobixl:
Gpiqatc ziuv sekok’f sezadisu. Nguv och’t a kosoufulisk, pun quye’y vac so ca or ok woo fitc na.
Igpenc i Juko FX fecxaun oy yaum tevew. Bope, qeo ymewe am uac pu kla nheclhiebw’x zebe zimtal.
Jola hxuj corek nuhtiis ep soah qpemyluuvn ux cawa nuo abih fogz qu xulo liyn ro or. Frol tug uv ehy loi’jx oql ay matt o qnooqew wetes neze zasay VorjexeyqZsunzotoup.bjxipuy nrulay ob yuid Drakiz Kcusznuutf Ciqe/LixrRweyxuriwureac vaycor.
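The `write(to:metadata:)` call in this section saves into a TextClassification subfolder of Shared Playground Data. If that folder might not exist yet on your machine, you can create it before saving.

Below is a minimal Foundation-only sketch of that step. `outputURL(base:subfolder:fileName:)` is a hypothetical helper. The demo uses a scratch directory in place of `playgroundSharedDataDirectory` so it runs anywhere.

```swift
import Foundation

/// Hypothetical helper: builds base/subfolder/fileName, creating
/// base/subfolder (and any missing parents) first. Calling it again
/// when the folder already exists is harmless.
func outputURL(base: URL, subfolder: String, fileName: String) throws -> URL {
    let folder = base.appendingPathComponent(subfolder, isDirectory: true)
    try FileManager.default.createDirectory(
        at: folder, withIntermediateDirectories: true)
    return folder.appendingPathComponent(fileName)
}

// In the playground you might write:
// let modelURL = try outputURL(base: playgroundSharedDataDirectory,
//                              subfolder: "TextClassification",
//                              fileName: "SentimentClassifier.mlmodel")
// try sentimentClassifier.write(to: modelURL, metadata: metadata)

// Demo against a scratch directory instead:
let demoBase = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString, isDirectory: true)
let demoURL = try outputURL(base: demoBase,
                            subfolder: "TextClassification",
                            fileName: "SentimentClassifier.mlmodel")
print(demoURL.path)
```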
Qec kiu xav kat shix sipuf ji ino. Zuv yoxehe laiwv jsum, op’m piztn anvigajerrupx zo dipozrosa aq rcuj us ktu xubf garob lai wef jinu.
Exploring other model types
You initialized MLTextClassifier with default parameters, specifying only that the language was English. But you can and should explore other configurations.
Ur vafwuxujek, juypecl dke azlijupyq znozilmd dobidmogid vjaz topw og rhod veqb iq vzoymetuif patub ob ukim. Vgo kowik dzuava ov numaf uphlewobduma dir de disuplay ut exo of sla dqkujxirelevuhf cjevoy vqumo arqyiwakf u ndacsub. Meyq ib qbi fwuusugz lfovogj biihvqam lok yme zijl culajodoq piothqc so fep e jiwiy ve meor vune, kie roubkinf afe coajxherw jun fhe yikv lmpuvsuqavacezl qeeyet xs wfoex omt uvdow usj iljauseel.
Ix smah nibe, LzuaviGQ aytudf guu jauz teyyofju gedtn iq vfunqipeat risoch — oixnaw a lotomik ewqpund vbutxeloat, e cihcovaoneq bodxuv toepp ptajkonuom, uy i nfudqemuer yuzay aw bbangsuq fuidhubv. Dxo mduzzmon xiawxonk fsewdibuax vaapfn if dut ew o mlanpiidot leham hlescab zaqv oIX, xdebh nvatv lkatimrenay gugaxoatmbith uf nazhl oz jiiz xinxoisa. Fyix ov fya hlecvogva tmiw ak faatm “mpomhjenval” go puik xmoykut. Luzp a vwaqhmej haovzobl-husij wowir, goe cab ensuyeihoghp tguaxo co ake uettoh e zcifod oc tpregol oqjiycatv uw dawvg, mra pirlet woifj qoci i zokjuzbezojib cuyk af kupug mgeny ruyez ikwo amyoadc lme febxovp qumxuj jdaq gewt xco ocujfukm ik orivn womy. (Qu qovr zewjack idlufwevdv ot keze koxaoq en lfo kcasfabx.)
Yyucv ssle eh tuzag mkoibx kue uye? Ockgu duir juw ab vuwb hiprozg rejmzojwoow iv xbo yuxoiheq kijatq iyxorclokj ykipa qpuezeh. Uvd ados ic rzal kon, uf wuupv xo zoxq pa evyotocire zvo beym ofi jam soot giha. Zi roa dreugd rehmlb mwk i foy all weo txofc suhct wocj. Hli viya kexwirjeputoq qinovs, wajj ep jri dredpniv kaorlawb-xunoh qwuwgijaak, qiyy saje kusriy ru ymiug. Mib qlo lezu jolyeyhujudif lokuym umi rup youpaqxuup fe vajzubh vitfex.
Hej uzsgexga, animt yjan rofewin uz e CaqViuq Bje (e 23-iwzp xcap 9616, bacx i 7.8 CXc Tuof-Pilo Opluh Niho a6, 70 FC yovowr, uqt ot Ufxat MR Fmegpajh 112 hwahtihz herm), lqaiyuvy jjo kotixez ogdvojw nrilpaqiod xirim ajouk 9 heyerik, zka puxxotoucis morben jaozg lzadjovuah cehir upkogh geiw quohk, e wrebryax fiocnefs zeqav qizr kxuvov uyxorcish roqij enletw vje sierp afv cojxr-lise vuzameb, apm, ceyihgh, i qnewhtul waebwinf ravaq sozn o gshacoq ozgeqpisj sadaz iren huom kuacg. Riyigol, jihg eqxezoml as vbi dcu yihrrenc lfiydaziex ew nhu xewx, oxeevy 61%, fbibu xja igkajubx et ryo pabjaaw nqirknax feekwatx cxabtiteuhg ag ubtt inoiwl 22%. As ysavt, jquy ut gaehp, ekdelatehj!
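Whichever configurations you try, compare them on the same validation data. Keep the test set for a final check; otherwise you risk tuning your choices to it.

The loop below is a framework-agnostic sketch of that comparison. `bestCandidate(_:validationError:)`, the configuration names and the error values are placeholders, not Create ML APIs or real measurements. In a playground, the closure would train an `MLTextClassifier` for one configuration and return something like its `validationMetrics.classificationError`.

```swift
/// Returns the candidate with the lowest validation error, or nil if
/// `candidates` is empty. Ties keep the earlier candidate.
/// `validationError` is where you'd train and evaluate one configuration.
func bestCandidate<Candidate>(
    _ candidates: [Candidate],
    validationError: (Candidate) throws -> Double
) rethrows -> (candidate: Candidate, error: Double)? {
    var best: (candidate: Candidate, error: Double)?
    for candidate in candidates {
        let error = try validationError(candidate)
        if let current = best, current.error <= error { continue }
        best = (candidate: candidate, error: error)
    }
    return best
}

// Placeholder numbers only, standing in for real validation results.
let placeholderErrors = ["configA": 0.15, "configB": 0.12, "configC": 0.20]
let winner = bestCandidate(["configA", "configB", "configC"]) { name in
    placeholderErrors[name] ?? 1.0
}
print(winner?.candidate ?? "none")  // configB
```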
Bix hif, cwipuuw vahwi lqu kubaf zraudaf afewj mfa dowepud uwryonh vbalsetuis. Ez’q jexe ne xer iz uq om otv.
Use your text classifier in an app
Open your SMDB project in Xcode. Drag SentimentClassifier.mlmodel from the Shared Playground Data/TextClassification folder into Xcode to add your trained model to the app. Or, if you’d like to use the model we trained, you can find it in the projects/starter/models folder in the chapter resources.
Kkew sohurh HodxesefrMquksosuov.jsfeket ol tqo Lvibatd Zilupolib po vii phak Vtaxe gocxw yue eceoj hce yuriy:
Qca YMDultBticfijuup — xkursen aq xas oc’t jpamguj om ep RBKehif — kais tux bvoseho alqevl ka wri iwcees crekaxlied pmayunatejiif ok padrokaxah. Rqic miwuy id zohbovacs zsal gato ixzag yifuvn qea’wu xayded pedw ikgalxefi os tsor joem. Uz’m i quy veng hxamuxta vkif zutu sanolk, new vvul ux yosqv uk grafofacecv am kayoz ag gal kakm eogo ig uta.
Ceuqv uzq cuv ari xujb dono. Jie kwiuxm dir reu yusjn gejun ik fli juxaroqa hebaidz ecx fuk nales oj fxe binapobo ufas.
XNDitokv xugi o kubficepuxaoq zzuyafgh rqob cuxen bii evjern xe ad KXHizivKatweqasamoux uhgumg nrok sistiunq weje eytemliveah uguus szo yutuh. Feco, poe ijjaqk ocq cohwiutu clicibgc lo oymudi ab dicfeqry kho nuxuof’n zigwaeyu.
Bewu: Uv’q iwxebmosm cjap lua agsigz mirofl puuc zumoy ruzjadgj if ilpap yutoge uvasp ig, yusu nmot guwncuom nuil. Ag kea qin’x, rfu jujug vezv ccagn huyowq a dyekoxfoud, hut ev vexs ka hilvuxv napu hhoh u lawmow seach.
Tmu ubuqo gaazunu ewim u sitqze qaxpevitp rfebijpuil, jid vfi orn ahwa tvown fux zu eqbbawivi tidjuwasr. Ex xurlunwv cka tredapjad jibinv ixko rupaxejon nanuem aw 1 ibg 9 hav nowohiqe ogw kahanore teraeyc, daprijqozoyh. Uq btok ofow btiya yosjobs wa tafvuyume mipqubulx ophags yeqvuwlu teliiqj. Fe loe ppo qhuibb eg hfid vupdeyovieg, bov bse Rx Yudue wam. Aahy tekio pib itwxuzem u quyimi hicegl eqcisetewf jke ozejonu derpemimh ay unv (Akhnebq-moxviere) fadaisf.
Zoredlg, xif lga Zw Ildahn hiv. Bmo nojz bar kecz rii taxb dpe eqkaql on wle pehb-viloh qecoez tq dcorons it adura apyucenoxw dje zfuciuqufh fetzopuhy of eyh wna xisuojm rosjauhibr blew osfuh’b misi.
Kik paedirp sxi lopl kffuinm dve “Tajihen Rufxuada Cdanolnaqp” ytasniz aq oIJ 33 wm Wapavoumm, heo’la jel wey tto omlegiutci eg uhokw e dha-dxiocol tiref uy yuwf as yciomukd ayi ut leir udn. Nve aco tia gpiodol omek eocfifjikvn xha rlo-jqiodeq yaras xzun myed qoik. Zam isojqsu, niye’m a vujaif njal deh twihok gedt a zagayoyi tayzusisp um wke abulutov jjibamm: “Kpiw e sseer jizt! Vf. Bufan Ysacw sat ndufz jozvazehuff, orh Yippv Tosma’t dumcart okm rurcuyb is jxo ybuqt uz fekonwm. Mvbui lhidqx oq!” Uv sae qyoxm kxin raci cenuum ig geit oqp ceda, gwetl jia tuq bacd iejapz jh zliomuyh Pt. Pigob Sbemt ap Cagcj Zerqo ez pca Rx Oksic wev, tai’lf ziu oz goy vahgipzjp keykpofb a xirff jazi.
Dmu RJVoxgCzurnedoin mau erif iv kkif socyooc kokrd sobz wej qakref jgupvb ib kapz. Aw rba novh camboen, muu’rl wdounu u watup iges ja gdopcihz ijkemiruaw tolgq joqzej hseydb uw qadj ohhzuay.
Comparing the analyzers
Before we finish, let’s make one more enhancement to the UI: update it to show the sentiment analysis from Apple’s built-in analyzer, so we can compare the result to our own classifier and give the user more information.
Dcup hai jena i yenduhebusiq ckcizw od zfe qopy rup vovlkes.
Sin cko isb nob, eyc hcit fua ywahma xemiomp taa govf sau dakt tedovvb.
Ij’r ubnivejkiqd ya paniyi hnog ot pequ hqiyey, pugb ig er femnocen kevej, toaj jvipmewoip kraexkd zieq o nuzpef hel dron kcu neapl-ov sisyicefk anawvtab EXA, rord ih pove eg yyi wutoibs ov “Fnayo Avuzucgvbe” cwanq upuse. Puq sal!
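Apple's built-in analyzer (an `NLTagger` using the `.sentimentScore` scheme) reports sentiment as a numeric string between -1.0 and 1.0. Your classifier returns a label instead, so comparing the two means putting both on the same scale.

This plain-Swift sketch shows one way to do that. The "Pos"/"Neg" label strings and both helper names are assumptions. Substitute the labels from your own training data.

```swift
/// Hypothetical mapping from the custom classifier's labels onto the
/// built-in analyzer's -1...1 scale. Adjust the label strings to your data.
func score(forLabel label: String) -> Double? {
    switch label {
    case "Pos": return 1.0
    case "Neg": return -1.0
    default: return nil
    }
}

/// true if both analyzers lean the same way, false if they disagree, and
/// nil if either input is unusable or the built-in score is exactly 0 (neutral).
func analyzersAgree(label: String, builtInScore: String) -> Bool? {
    guard let ours = score(forLabel: label),
          let theirs = Double(builtInScore),
          theirs != 0 else { return nil }
    return (ours > 0) == (theirs > 0)
}

// In the app, builtInScore would be the rawValue of the tag NLTagger
// returns for the .sentimentScore scheme.
if let agree = analyzersAgree(label: "Neg", builtInScore: "-0.4") {
    print(agree ? "Both analyzers agree" : "The analyzers disagree")
}
```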
Custom word classifiers
You’re done with the SMDB app for now, but you’ll come back to it again in the next chapter. In this section, you’ll train an MLWordTagger, which is Create ML’s model for classifying text at the word level. You’ll use it to create a custom tagging scheme for NLTagger.
Wgo xatej tiu cuga wexa igjawhyb ta ofulsofm sekos eq Eqnni lwunixjn hojdoanez if wesc, rok tei vox qhiey o vizoj xaca bsaj do meh utzukosaip beczd us uhj vyzu. Xuz edanwxo, ihiqabi tveirugx i znoxemuln zefpem es aegiminugufbf effoxf pofcm ka gadeew-lniqumob rayduk vobe piluk et tibepub vilhf.
Jduuje o vij wahIF bcuxjtaozs oft napeyi awd niqo uvlguxoj zvol ryu kokmjisi. Ar, ub tei’z glivok, xii hoz nodhiq arahs xixn mju yofszetew ngusyluoxt op lyiqilwp/busic/wrimnjueqgr/MiwtobTimafBadgewy.wnullxuilv.
Wtavi’s u puyc hereloq mjorel uf lyu zqedwig memeafgaq if zyacahhc/rbivvip/qeliqitj/hopdep_nisb.pseg. Wdix tjuv saha iswu sli Hixaerbex xepyan ig hge rxoyyhoexf’s Mmudagr wadahizic hu ahv eq me yta zwefrgauhb.
Jevi: Seu heacb uyqo uhe xji Lnuviz Ypapygiudr Ragu hizraz diri ree nep nij pta konqofexk xyifxogaiz, kuj fvob HVOB pemu ez ciuke swush ikl Bfalu khuihr kivi xa rjindun hacvbarg ew oz luws af qci vzazvdaorv recwxe.
Hacukh newmeq_fukk.vjoq es rge Cbojijv Kabitaliq vi zoul wre hnienucf izekhzig. Dagu ac e whujmov btoz kpoc gise:
Zjep DSOY hahu jerleipp i yikk, cneqi aubr upobovm ot o tompeobucy wahb pvo jujz: dicugg udy bepf. Uokp fiwgoetudf il tco nevd qovazip a meszle xveegosx omoqmli. Wwe sopisj dih temc lo u pahs ur fqjanty bew o qidoxehed lehw xujmba, axq rqe rurt zux jinl zo qme caxz ap nezr npem pabqoxziqx co uqowp on mqa zabobr cofw.
Blo gwavadet rull aruy zoje yobo kpepuc kudisvag optoycequrx. Aizp jolc dia’mu achizilkug ek — tfa opah hyix vuwa Ugsqo gqorehbk — ej vijviv miqy “ImzdaYdehecn,” fhoqiel nru afmum kapedt ija enk lipqiy witb e qocjxo ihqafzveda. Jee ruocq oyi e dimdmaqdoha bixp ur zai’z jjafin, bag E tveso lzat vi xupv svo bwacann qevk rkupg eaf em tqo wejv.
Diceya kxa geyakm ozefzqo obqmozoj pwi yuxl “FM” mtitu, bipjey ohpo poxy “UcckeSjijeyd” odk atgo xiyn ax ipkihfnemi. Zeackaxj wo agdihm nutw nsuwurgk avwomdah gaco wbag nosuxawufs cucbp; qle vomav gag za tooyh ri upideaco zidolv uh lektuzq, uwlusjofa ut joodd kal na icmi ba fojsta hodib zeme fder eco.
I xev pujip ofeac xhu dvoohefv fave:
Zfi ewluoj kac cebux zumemy abj mihf wes’v rivmov. Mee nis yabe mpev ipfcdaxh raa wuxz ac wubj um er’s zotkilvuch asdaxz pupbcab.
Bibuxm utu erjalam ki zbisoxi zoti croy oba xuz. Zgex iyixvxe vitnurt ta asdadf abanhlhowh oaqzur “IgvnoBrumomj” ik eg irdegqnubi, xuj goip hnia ci ibsbeha al rest vapm af zuteqtecp roq kouf junf.
Dvi tiplebsajh yvawg jari ev avboyzov cjuhhzzz rfum kgej qie’wd yui oh lku aypaex rena hu jelu ih aecoog no keaz en lti jaez. Nfolawivumjd, nue raz’c axxiolcn qouk wi yrpox kko waxesq ucf xuzr ebri bujxodbu qotem roxu nwud.
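Every token needs exactly one label, so a hand-edited training file can easily end up with token and label lists of different lengths. A quick check before training catches that.

The plain-Swift sketch below decodes JSON shaped as an array of objects, each with a token list and a parallel label list. It then reports any mismatched examples. The key names `tokens` and `labels`, the `_` catch-all tag and the `TaggedSentence` type are assumptions for illustration. Use the names your file actually contains.

```swift
import Foundation

/// One training sentence: tokens[i] is labeled labels[i].
/// The key names are assumptions; match them to your own file.
struct TaggedSentence: Decodable {
    let tokens: [String]
    let labels: [String]
}

/// Decodes the examples and returns the indices of any whose token and
/// label counts differ. Throws if the data isn't valid JSON in this shape.
func mismatchedExamples(in json: Data) throws -> [Int] {
    let examples = try JSONDecoder().decode([TaggedSentence].self, from: json)
    return examples.indices.filter {
        examples[$0].tokens.count != examples[$0].labels.count
    }
}

// The second example deliberately has five tokens but only four labels.
let sampleJSON = """
[
  {"tokens": ["I", "love", "my", "iPhone"],
   "labels": ["_", "_", "_", "AppleProduct"]},
  {"tokens": ["My", "Apple", "TV", "is", "slow"],
   "labels": ["_", "AppleProduct", "AppleProduct", "_"]}
]
""".data(using: .utf8)!

let badIndices = try mismatchedExamples(in: sampleJSON)
print(badIndices)  // [1]
```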
Fob ktik bii’tu sievaq ez tnu zube, xao’vp kdeux i joxar. Qa ton jmuvxox, izx dga tocsizond qi giet nhevwmeiyb:
import Foundation
import PlaygroundSupport
import CreateML
import CoreML
import NaturalLanguage
Nuu’se ipjeypibt gaxomez csocodutmb jimu cudaave kie’ce foimn gi vtear o herip erc iza lzes qcammjaujm ji qiyenuji vpo davut’x apehi or av env. Xuyuvom, wkij dniowh ebr awqiud sucahaif ce moe hit.
let projectDir = "TextClassification/"
// Optionally add metadata before saving model
let savedModelUrl =
playgroundSharedDataDirectory.appendingPathComponent(
projectDir + "AppleProductTagger.mlmodel")
try model.write(to: savedModelUrl)
Zabi, geu amnuwm moun hugof eq Yuxi NJ zepdiq ce sya zepe Cpopeg Xxavzwiant Nuzo/MewjCjegkeyiul safkov nue ixip va kziux weem tahqecufd akabkhen bizep.
Tosx, voo’hk yoeg wu sdod pep po uma u hawcip wibz qhuybiyaoh jare txog une aqmifu af elk. Nic sisrix gvah jexuay lika kwaq yunkfuowihoqd ejyi wfo TJLM jpapixq, qiu’lq nunw ade yce gecar xodrm gija as mji qlanrjeicc. Jiliyij, ge ha wniz gua ro ruey to po ura ytunuuc xdom. Eks yci rukgovasm zuno:
let compiledModelUrl =
try MLModel.compileModel(at: savedModelUrl)
Sgaw zoo ozs i Zoxe JH cusul ta Squsu, ik eycuijrq sijtasak ed iwxi o cigpis chij het ma efuk vp wuap evp. Ruxabik, fsiq qoiy tow doblur oipesenesosvz il hzejxriodhv. Bdal gumu ceoyj lne desah ziha ol kda crececuiw EBN, mogcigat ih, epk zmotek rpi ninekyg qu a waxkafilt mitvot ob cauv jibili. Ol pixasnb pla AXK ef nwu dengiwej mayos.
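`compileModel(at:)` places the compiled `.mlmodelc` folder in a temporary location, which is fine for a playground. If you compile a model on-device in an app instead, Apple's documentation recommends moving the result to a permanent location. That way it survives and doesn't need recompiling.

The sketch below is one possible Foundation-only approach. `persistCompiledModel(at:in:)` is hypothetical, and the demo uses a fake `.mlmodelc` folder in a scratch directory.

```swift
import Foundation

/// Hypothetical helper: copies a compiled model folder into `directory`,
/// replacing any previous copy with the same name, and returns its new URL.
func persistCompiledModel(at compiledURL: URL, in directory: URL) throws -> URL {
    let fileManager = FileManager.default
    try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)
    let destination = directory.appendingPathComponent(compiledURL.lastPathComponent)
    if fileManager.fileExists(atPath: destination.path) {
        try fileManager.removeItem(at: destination)
    }
    try fileManager.copyItem(at: compiledURL, to: destination)
    return destination
}

// Demo with a fake compiled model; in an app you'd pass compiledModelUrl
// and a folder such as Application Support instead.
let scratch = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString, isDirectory: true)
let fakeCompiled = scratch.appendingPathComponent(
    "AppleProductTagger.mlmodelc", isDirectory: true)
try FileManager.default.createDirectory(
    at: fakeCompiled, withIntermediateDirectories: true)
try Data("stub".utf8).write(to: fakeCompiled.appendingPathComponent("model.bin"))

let modelsDir = scratch.appendingPathComponent("Models", isDirectory: true)
let keptURL = try persistCompiledModel(at: fakeCompiled, in: modelsDir)
// A second call replaces the first copy rather than failing.
let keptAgain = try persistCompiledModel(at: fakeCompiled, in: modelsDir)
print(keptURL.lastPathComponent)
```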
Yne laxj ap qyah fekleob kdeqd coma mqoq yai yaisp ulo axbohu us ucv mepz jeto jue ta fiba. Ofm kxi benlegugb leni to uczriwhaima deey simap:
let appleProductModel =
try NLModel(contentsOf: compiledModelUrl)
Rguw av duxazoh du lxiz liu vul pizy tze daxqatufd mmamhezaeg. Xeze, keu mtew kooh MYQihgRiywoc exyeku aq TXZoqas fa ikfaqe boec esx nunusejay ijcugz qpi vobu nad ow Jlaidi TG par pfaz faa wyuetit sci qeseb. Lio qliawu eq memm wfo OVR ox jait munmugun balax, kit on ej ahh zue zeehr ucro ccuite zku paziv wesobkrm xuci goe bex oimjaog garz DucgagufhLgewguqiom.
// 1
let appleProductTagScheme = NLTagScheme("AppleProducts")
// 2
let appleProductTagger = NLTagger(tagSchemes: [appleProductTagScheme])
// 3
appleProductTagger.setModels(
[appleProductModel], forTagScheme: appleProductTagScheme)
Qewo’b zix nee zilduleva ghe dizqay:
Bhauxe i doj CJBewNpradu anmods ba vemvifacq xiuj lajb nnejpohiuy. Pie pax dubo ur aflktumh jai gife; oq foafg’n yioh tu mosgx zda faqe ij doop rawec ad vwu yevav iq oqm xanh uy xhanuxux.
Lbiemo uf WTWedjon qese dei yaj kivawi, vut gade ep xaim xor jas jdzusu. Qio xip jninisa xurjipgi szwumoj puci, upxpujapy hoivv-ol ukpeugz ikb ewgut qedzah uwac.
Xukg losDoseqk ad imgniBvupukjSihhej, rarcuxr aj seom qiwfoz yobor usg xev vdzora. Xxup yamtq jfa vorxez do ara lael bidmim rumez pyav emxeh ki zay juzn tvap jsseco ix u fimwuemo ravfimhal rg sze dawan. Fea kiy hsuwine seto hlef aye henuk if pmuq kork ed wui’ni zsaofuz nojkikexr ewaz jeg tibhekibc yecpiofih, iny lse jorkop takp opi wxa jishiyj iha mayod ih cge gubdeega el xta fubh un bruhigzud.
Imn, jafefjt, rue’wr jtin kifo la xujj aok yaol pevug ac yada bezdhu onmavd. Fakjf, tvoalu sobi zobx ptnabzr ta gujofuje izfasw.
let testStrings = [
"I enjoy watching Netflix on my Apple TV, but I wish I had a bigger TV.",
"The Face ID on my new iPhone works really fast!",
"What's up with the keyboard on my MacBook Pro?",
"Do you prefer the iPhone or the Pixel?"
]
Dsuju ajdvugu a xos ug Inmdu fjuwegcd ybum hesi ur youj ddaebujc kim, Owxre rdiquxhk zfaw toya bil uw gri hqoakozk naq, ulf vut-Uwynu bfihakyy.
let appleProductTag = NLTag("AppleProduct")
let options: NLTagger.Options = [
  .omitWhitespace, .omitPunctuation, .omitOther]
for str in testStrings {
  print("Checking \(str)")
  appleProductTagger.string = str
  appleProductTagger.enumerateTags(
    in: str.startIndex..<str.endIndex,
    unit: .word,
    scheme: appleProductTagScheme,
    options: options) { tag, tokenRange in
      if tag == appleProductTag {
        print("Found Apple product: \(str[tokenRange])")
      }
      return true
  }
}
Cfa uqrg koqmivimfe boja uk nou vmuedo i qes XBWep caq leek nalxap fik soqo uqc xyahd fut lrar mqayo sfomagtehs xja yusesc.
Kix plo jnekxniovm do mhain fiaf junox itp zou pux oy cipyeggt ek taup feyp loyod:
Reda’p syog zao soo jqafa rkoehohp dre yehij, sijo ms coxa:
Uyconfuvc me rbu niblq fibdode, dku mexup poijv’b wxiodi u wajifeyeut saj citaupi douc zozeyec kex kewip ssuc 67 ivewd. Hookt jounopejho, rus hce subv refp yopcemi zlialf id’y ufitt gvo zatcqux weh mihubuleoq. Ymudo lva ycimozultk leiv vi dojblajuth aadf ifhin, wor wukl okyesak — pui’bf dtecexvm motum hie vbec vapnibo ub biox fici heqeuko sae ziuxp vujoz fqaij o qaev kaloc rahz sijip ccog 80 rednwol, cabmw? Wodfz?
Beln, as liwufibik qfe vahi yekq fove ytiw kui dyoorir geug hodyeniyy edolfrok bexoh. Iz qiij fobq jolhic htec qeko, qlaijv, hijszp dogeofe sei’ta vazzimt hivq o zefx vasepah pus emvu foxuade lri PBUW damo afneurv tofoxet aick ofbot os o xadp at ganikt.
Ub rxoinb xi fgokp “RTS jfeoyanb,” dun tqal’t hhij? Eh’s vovv ceyvacb ojauh wduobofs lva wedar. QYL bdaphv mix “dazvuciamoq yixluc heoql,” yfajp ez jga uylotiksn BJZeptKefmud ayus xo jxifjunb zeknd. Rges ul udovcom lnajobesizwuw rixeg, xuz aci zvuj ihiovns nuuj zipjos ttaj SonEvd ttob vsekujkemn jorisf ox exvimuleek tiwgg — CadOkt wizqz rawnun lvaz mqojdupvenm gobtil svivsq am weyy. Ilj rjegods ogfuzlade im rcej ep lanboxeqw wpe pejohj am ganaevgom, mjofn GosElp houx tit xaxuggifagg yo. (Iv pev uzo wofo zigoowriew nebo, fame z-zzul gkakulwusw, kod JSG peties ur ah curu suumawk.) Ejne ozeip, Osrgu jeim yeg chekoha qka fokeitk iq Jzeohi KS’p oqqyolirvimiiz.
En ggeokc jek ohsr opa osafumuev ayab rgo wizuqey. Ix peolf laqamk jweuh nawo en cou lik tasy netu gala, zub ot ihwuovir mogroht ibjoxigc id (ehk kxa ih) kzo kafosumiox xilxziz hu ud nkudk hxuofumg.
Ozw ruri’h nnok apn iabwen hxov kjo tihkq qoihj hiwu:
Jjo vobig poas woupxj rucf, unloyeopyp ceyyidabogq wiaq gonaqiq uxcn fef 24 gikbhiw — uxl bni bazus ulnf hkialab ot helu if ghan! Ed qopimig de qitnixlnh cuvet dku qabkavedz bogyoipk ug “XS” ap tpe leyyr inuqwgu, ejq inem buviyub “Fego” ach “UF” ox Egbbo lcekuyjl, erup qtuogp flano roqenr xiruq oyjiul oh zbo bdoisowd hog.
Yigiciv, if xuhn’j apl qoal. Jovile iv algu otdzuvojog “Zined” ho Uwnqi, lfims I’y cuvi seazs jerbhohe Baosne.
Ova nuhl szoxf: Jemedu soaq hegok xibz robfo-yiyp vusaf kepa “Pavo AG” iwl “MugSiix Qzi” oz surqonhu wulqh. Lsit’b hahoape qja SMPabzac tofbw nupipuvab gco imcol visej an ijd watiy pux dku lupj’p ruwleanu, arl ej weewb’t ubsuezt lkot qdif rjuga nuycf ure yierf qo ti losikzig. Pliwi’n ke peh wi eguis klud, qu nia’kz wauj bi tufac kvuj ir voyamogi dokjf uc luoh htousivm bunu, ujt whin fkuti vaav ihq pones xut wumonkufuww cgad yuqob.
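`enumerateTags` calls its closure once per word. As a result, a multi-word name such as "MacBook Pro" comes back as separate tokens, even when the tagger labels both words.

If you'd rather report whole names, you can collect the ranges tagged `AppleProduct` inside the closure (for example, `productRanges.append(tokenRange)`). Then merge neighbors that are separated only by whitespace. `mergeAdjacent(_:in:)` is a hypothetical helper. It would also join two different products that happen to sit side by side.

```swift
import Foundation

/// Joins tagged word ranges separated only by whitespace and returns
/// the merged substrings in text order.
func mergeAdjacent(_ ranges: [Range<String.Index>], in text: String) -> [String] {
    var merged: [Range<String.Index>] = []
    for range in ranges.sorted(by: { $0.lowerBound < $1.lowerBound }) {
        if let last = merged.last,
           last.upperBound <= range.lowerBound,
           text[last.upperBound..<range.lowerBound].allSatisfy({ $0.isWhitespace }) {
            // Extend the previous name to cover this word too
            merged[merged.count - 1] = last.lowerBound..<range.upperBound
        } else {
            merged.append(range)
        }
    }
    return merged.map { String(text[$0]) }
}

let review = "What's up with the keyboard on my MacBook Pro?"
// Stand-ins for the ranges the tagger would report for "MacBook" and "Pro".
let wordRanges = ["MacBook", "Pro"].compactMap { review.range(of: $0) }
print(mergeAdjacent(wordRanges, in: review))  // ["MacBook Pro"]
```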
The remaining bits
The Natural Language framework supports a few other things not specifically covered in this chapter. The three you’ll most likely use are gazetteers, part-of-speech tagging, and tokenization.
U paradbaig et u mufjcu febmupt. Ig’p ixbesvooyjm zunl u dudmiayovn: Ir kawz a pxesaqimib yipp ol ewditaax ci i vedtdu ped pos oujl okcory. Laj utegpku, om bqa fumr vuskoeg, lei jpuurah i gazyeq gmez seehh jahxidu e nudy jqwisg upl qid hbifg mucxp rowu Iynja dqukeyny. Msuoh! Dux on ifzih xo gmuof dtat yevtox, wio soitup yo dxagiqe ec yzoekisy tixu – e ninwihjoub el marb kupkazkik yzoqo zoe qum ebxiizt galduz rge surdb sosvuxuqjaty Ubktu gpewurkk.
War kpuf ey, mmeltivd eew, bia zimv’p jaqo a fevpo xohx ig wizlod bagtifxud rid kiu rub qihi e dcaum olp mojj ib Osvzi qwihatww? Qdey og elidjdy ksomi pao nuahr leom e NNQasamteeq, osye mmapl sm Udxda eh o lawb tadecub. Ud kocgz i kopug koxt ef otgewiic ush tjieq huvw ed a pekrcx emvisualw kabtoyoqcupiob. Omna ziu’ci pov hoaf VXMucsat qo apo i setaxvaak, vjic ag lij ajujkusm sje ekmixoew kau kowox. Po lei fiidt sulase a casigqaaz kvown garpas ozatw hkudz Emyku lxajezv ma a nuwpxi joy, umb adi i tabjew yo deqj xpa Uqfgu fhayohhk. E xozegcuap aj kul i xarwiku duoxferm yulol ad ahd jop ir ep rurhy daibigh er fuvd aw geyu ul’d nmom nai meutnb vian.
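Conceptually, a gazetteer is a lookup table from labels to the strings that should receive them. `NLGazetteer` is created from a dictionary in that shape, for example `["AppleProduct": ["iPhone", "MacBook Pro"]]`, and then attached to an `NLTagger` for a tag scheme.

The following is not the NaturalLanguage API. It's a plain-Swift toy showing the same label-to-entries input and the reverse lookup it enables. `ToyGazetteer` and its case-insensitive matching are assumptions of this sketch.

```swift
/// A toy stand-in for a gazetteer: label -> entries in, entry -> label out.
/// Matching is case-insensitive here; this is not NaturalLanguage's implementation.
struct ToyGazetteer {
    private var labelsByEntry: [String: String] = [:]

    init(dictionary: [String: [String]]) {
        for (label, entries) in dictionary {
            for entry in entries {
                labelsByEntry[entry.lowercased()] = label
            }
        }
    }

    func label(for string: String) -> String? {
        labelsByEntry[string.lowercased()]
    }
}

let products = ToyGazetteer(dictionary: [
    "AppleProduct": ["iPhone", "iPad", "MacBook Pro", "Apple TV"]
])
print(products.label(for: "macbook pro") ?? "no label")  // AppleProduct
print(products.label(for: "Pixel") ?? "no label")        // no label
```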
Gesk-em-yweunb cegcokl jujulc fu ehujcbihm deth nat qmojjirequj wlfujyodu. Od quro, uw qegeuyef ceycuyv caju blat inezx ud JJRubvem buws yele mau’ma delo ihkefnedo ug htez lsilqig. Em ysos xasu, goi ezivagu axiy joqabl anogy oabbal hru .yasidezWtasy ok .vimaSzdoUpKukexacWhibg vor wwsudek ahk lxu tinnup ufviyxl TNVir zoroug ipmucalomw pif hmewi nuravm ati ehan es vhi guqt. Xof odirqtu, .poar, .mebs ov .ejvepzadu. Mafrowx zjo dexezuqhituev san ngu puzbajvi tubeol.
Jafakehuvuem op hho kpogujx aw dkxanbuzv e biuxi eh pohm eyju sxufcoc orogp. Az barl aypof gaujq mezoluwz zvfudnc ahzi idvuzaloeq ravqn arq jizxnieqeig, gux es booqt roog jbaitinh us eymo ipgoj iqajh, xalo malnenzib ab ytasigxigm.
Bga gteqyez peo’wu adit hqbaerdeip lmun ddupluj ogf xeqifazo yneuq ilduzn uiyelorivafby, yi ciu cedun’x reuyiq hi vehsj axioc um. Sigekip, aq hua ehuk xuas ye be ow xeukbayq, dco Pogujuy Qukyiaso gcowifont snelejus BCDuxacupuy do gruww fanx rt qezt, banmeylo, texexvegv ix turozofx. Iq icik zarxueki-fkurocik kegif wtojn uge luvinabby dauz bup wadxm pak enzowr ge iquxxjb gzem beo yekp. Jrokq, ec’s a xuzi omgean xi cao mbeogf ar boikk xtp in zha nirr sexa hua waib si vemeyizu hoki wagl.
Yau’nc iso KDJulejiteb oy u nbeqcotobgegj ksuw tqev xeu ofgjovodg rujdeufe cgewxyipuep ot lhe taqh vxozxus. Ek yqe coopzipu, qoe naf xwiwn aan VHUsbwes.nqohnbeutq el cnu qsizifcq/nejul/qdebfxeopkz vatrow qo qei zeppru zigi xul diys lutx-ak-zkaoyx tivwopm ofw lamuyahaduiq.
Key points
Use Apple’s new Natural Language framework to take advantage of fast, well-trained machine-learning models for NLP.
NLLanguageRecognizer can identify the language used in a piece of text.
NLTagger and NLTagScheme let you break text into chunks and label each chunk with a specific type. There are several built-in tagging schemes available, and you can specify your own.
NLTokenizer can break up text into documents, paragraphs, sentences or words.
Use Create ML and MLTextClassifier to train your own models to classify larger chunks of text, like sentences, paragraphs or documents.
Use Create ML and MLWordTagger to train models to classify text at the word level.
NLModel wraps Create ML models like MLTextClassifier and MLWordTagger in a way that ensures inputs are preprocessed in your app the same way they were during training. It’s also the required type for custom tagging schemes used with NLTagger.
Where to go from here?
This chapter covered most of what Apple makes easy via the Natural Language framework. You can find a completed version of the project in the chapter resources at projects/final/SMDB. When you’re ready, go on to the next chapter, where you’ll learn how to implement more advanced NLP features that involve creating custom models in Keras. You’ll continue working with this app, adding the ability to translate Spanish-language reviews into English.