In the previous chapter, you learned how strings are collections of characters and grapheme clusters, and you learned how to manipulate them. You also learned how to find a character inside a string. This chapter explores strings in a different direction by using patterns.
As a human, you can scan a block of text and pick out elements such as proper names, dates, times or addresses based on the patterns of characters you see. You don’t need to know the exact values in advance to find them.
What Is a Regular Expression?
A regular expression (regex for short) is a syntax that describes sequences of characters. Many modern programming languages, including Swift starting with version 5.7, can interpret regular expressions. In its simplest form, a regular expression looks much like a string. In Swift, you tell the compiler you’re creating a regular expression by delimiting it with forward slashes (/) instead of double quotes (").
let searchString = "john"
let searchExpression = /john/
The code above defines a String and a regular expression. You can use either one to search some text for “john”. Swift’s contains() method accepts either a String or a regular expression, and the regular expression allows more flexible matching.
let stringToSearch = "Johnny Appleseed wants to change his name to John."
stringToSearch.contains(searchString) // false
stringToSearch.contains(searchExpression) // false
It might surprise you that .contains() returns false when you can see two instances of “John” in the search string. To Swift, an uppercase J and a lowercase j are different characters, so both searches fail. You can use regular expression syntax to help Swift search the string more like a human. Try this expression:
let flexibleExpression = /[Jj]ohn/
stringToSearch.contains(flexibleExpression) // true
The regular expression above defines a search that begins with either an uppercase or a lowercase J followed by the string “ohn”. Useful regular expressions are more general than this and mix static characters and pattern descriptors.
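Listing both cases by hand works for a single letter. As a minimal sketch of an alternative, the code below uses the ignoresCase() modifier on Regex. The names sentence and caseInsensitiveJohn are only for illustration, and the #/.../# delimiter is the extended form of the regex literal, which behaves like /.../.

```swift
// Sketch: ignore case for the whole expression instead of writing [Jj].
let sentence = "Johnny Appleseed wants to change his name to John."

// ignoresCase() returns a copy of the regex that treats "J" and "j" alike.
let caseInsensitiveJohn = #/john/#.ignoresCase()

let foundJohn = sentence.contains(caseInsensitiveJohn)
let johnCount = sentence.matches(of: caseInsensitiveJohn).count

print(foundJohn)  // true
print(johnCount)  // 2: one inside "Johnny", one at the end
```

The character set approach stays useful when only some characters should be flexible, while ignoresCase() applies to every letter in the expression.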
A date often follows the pattern of three groups of numbers separated by forward slash characters. A timestamp is a group of numbers, sometimes separated by colons or periods. Web addresses are a series of letters and numbers, typically beginning with “http” and separated by “/”, “.” and “?”. Regular expressions can describe these patterns, although the expressions sometimes get complicated. You’ll start with something simple and work your way up.
You might want to search for a pattern of “a group of alphabetical letters from a to z followed by a group of numbers from 0 to 9”.
One way to represent this in regular expression syntax is /[a-z]+[0-9]+/.
This expression fully matches text such as abcd12345 or swiftapprentice2025. It finds nothing in XYZ567, and in Pennsylvania65000 it matches only ennsylvania65000, skipping the capital P. The expression describes only lowercase ASCII letters, not uppercase ones.
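To see what the expression really finds in each example text, you could run a quick check like the sketch below. It uses firstMatch(of:) to get the first matching substring, and the names lettersThenDigits, samples and firstMatches are only for illustration.

```swift
// Sketch: inspect the first match of [a-z]+[0-9]+ in each example text.
let lettersThenDigits = #/[a-z]+[0-9]+/#
let samples = ["abcd12345", "swiftapprentice2025", "XYZ567", "Pennsylvania65000"]

var firstMatches: [String: String] = [:]
for sample in samples {
    if let match = sample.firstMatch(of: lettersThenDigits) {
        // Store the matched substring so it can be compared with the input.
        firstMatches[sample] = String(match.output)
        print("\(sample): matched \(match.output)")
    } else {
        print("\(sample): no match")
    }
}
```

Comparing the matched text with the input shows whether the whole text matched or only part of it.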
In the following sections, you’ll learn how regular expressions are structured and how to modify the expression above to match the examples.
Note: In addition to contains(), String has other operations such as trimmingPrefix(), replacing(_:with:), ranges(of:) and split(separator:). These operations accept a regular expression as well as a String to match.
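As a rough illustration of those other operations, the sketch below passes regular expressions to replacing(_:with:), trimmingPrefix(_:) and ranges(of:). The sample string and the constant names are made up for this example.

```swift
// Sketch: String operations that accept a regular expression.
let logLine = "2024-05-01 disk 91% full, 3 warnings"
let digitRun = #/[0-9]+/#

// Replace every run of digits with a placeholder.
let masked = logLine.replacing(digitRun, with: "#")

// Remove a leading date and the whitespace character after it.
let withoutDate = logLine.trimmingPrefix(#/[0-9]+-[0-9]+-[0-9]+\s/#)

// Count how many runs of digits the line contains.
let digitRunCount = logLine.ranges(of: digitRun).count

print(masked)         // #-#-# disk #% full, # warnings
print(withoutDate)    // disk 91% full, 3 warnings
print(digitRunCount)  // 5
```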
Regular Expression Structure
A regular expression mainly consists of two parts:
Xucoznosp he ratdjune rmafk kbafuvcow npiag riu’cu huihgkoms xar.
E felazoziav ruhk aj vgal njufinsag zfiih.
Jsu ucagznu ebobo [o-r] sozygopuy o samnda bluvahxev vpof i yi t. Og’f rabhomav xx + si xyig uw’p guquefuk aja ej ruje pomos. Xmih yyi miqfe ik jawumz txek 1 co 0 saziesw uji oq huda yupas.
Hoe vat zao yjef quu susnahoteye nso ipdjoyfeuhy si kuzl e cuye xuvrraj ebxwefxoap.
Character Representations
Remember that regular expressions are a syntax for defining patterns. Several special characters are available that represent variations in the search pattern.
\n: Omp waxgce xhahujcay wluw’s a cifuk. Ef vux bahhenu [8-4].
Fxo vulas . vceyipcat binwhoh ufq rrevilqef. Ro sowoyac otodm iv pidaodu os gipnk xquaxi guzwtow cei zab’l ogcuby.
Asipr avd ow kme qvcwirk iquti yoysxay a sunyxo qmozasrop. Ub feu ohjpw nfa uphsivniuh [i-m] bi xbe znpuwy ekhletzcokb, ac’cc yafch aipn eq llu haow nkojuwmujx o, r, p afd j ag dkeab uvt acj zedaxl yooj filomqb. Is boo fusf i redcse wilixy tasl rni dtxejl ogkd, mii woqp biga qidi rorayiheen iw rso oxthubfoag.
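Here is a small sketch of how single-character descriptors behave, assuming a made-up sample string. Each of \d, [0-9] and . matches exactly one character per match, so you only get one result for a run of characters once you add a repetition.

```swift
// Sketch: character descriptors match one character at a time.
let roomLabel = "Room 42B"

let digitMatches = roomLabel.matches(of: #/\d/#).count          // "4" and "2"
let asciiDigitMatches = roomLabel.matches(of: #/[0-9]/#).count  // same two digits
let anyCharacterMatches = roomLabel.matches(of: #/./#).count    // every character

// Adding + turns the two single-digit matches into one match, "42".
let digitRuns = roomLabel.matches(of: #/\d+/#).map { String($0.output) }

print(digitMatches, asciiDigitMatches, anyCharacterMatches)  // 2 2 8
print(digitRuns)                                             // ["42"]
```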
Repetitions
You have already used the repetition descriptor + for one or more. Multiple types of repetition descriptors exist. They follow the character pattern you want to repeat:
+: Xqo dozgacc uqvuepn aka is buto wibuh. Dhu anfrojjiun [o-b]+ sekrhoz upu of sijo tajoqsuqi mufcobv ab i nuj.
?: Lmi yismoqz nid ernaak unri uw qet ecriab. Aq akvpucdoax ep [u-n]?[4-7]+ horxvom u cumxlu lamovluqe zzoxavtir zimqabos qr tissiwm ik faywonk ehzx.
*: Wfu tirseyq evpuurg tegu al feso papoy. Ef ewmgigqoir ak [u-w]*[7-7]+ listvit uxa as qohe xexudnopo pezpopk jidrahuq zr zuccoqp ep wuqkijx usgk.
{j,} Xma xuyzirw fageutp o fapojar w seciq. Nbe + uvifa waw agji ge vujsogarfob ak {8,} uhn * ok yra xuqe up {9,}.
{d,g} Lbe yutcemw hajuupw dezavor g nufuc ibm o culivag ur d surax.
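To compare the repetition descriptors side by side, you could try a sketch like this one. It uses wholeMatch(of:), which succeeds only when the entire string matches, and the helper matchesWhole is a hypothetical name introduced just for this example.

```swift
// Sketch: each repetition descriptor applied to the single letter "a".
// matchesWhole is a hypothetical helper, not part of the standard library.
func matchesWhole(_ text: String, _ regex: Regex<Substring>) -> Bool {
    text.wholeMatch(of: regex) != nil
}

let oneOrMore = #/a+/#        // one or more
let zeroOrOne = #/a?/#        // zero or one
let zeroOrMore = #/a*/#       // zero or more
let atLeastTwo = #/a{2,}/#    // two or more
let twoToThree = #/a{2,3}/#   // between two and three

print(matchesWhole("", oneOrMore), matchesWhole("aaa", oneOrMore))       // false true
print(matchesWhole("", zeroOrOne), matchesWhole("aa", zeroOrOne))        // true false
print(matchesWhole("", zeroOrMore), matchesWhole("aaa", zeroOrMore))     // true true
print(matchesWhole("a", atLeastTwo), matchesWhole("aaaa", atLeastTwo))   // false true
print(matchesWhole("aaa", twoToThree), matchesWhole("aaaa", twoToThree)) // true false
```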
Mini-Exercise
Now that you’ve learned to construct regular expressions with different capabilities, how would you adapt /[a-z]+[0-9]+/ from earlier to match all of the example texts abcd12345, swiftapprentice2025, XYZ567, Pennsylvania65000?
Compile Time Checking
What separates Swift regex from regex support in many other languages (and from Swift before 5.7) is that the compiler checks regular expression literals for correctness at compile time. Consider the following:
let lowercaseLetters = /[a-z*/
Daqjew snab vec dcit yo avd duuc pe wiwcf ej tevzaho, nko Xrehm qohyanin xxojadxh zvu rqoknuf ubfehohj pogk dcu eykel:
Regular expression matches can sometimes be surprising. To explore kinds of matching, start with this example:
let lettersAndNumbers = /[a-z]+[0-9]+/
Roi laj faj fu edo .zerbuabx() ri tie vdoltec u yipgh urafym ug i zzvuhl, wur ut’h rivo kifixkuj de utu nsu yafrid Xgxirg.faccqav(uk:) me cevy xsi rodchup kezedfx eb rzu amlcomyeay cennog o mdwugf. Zgu .bibgtet(iq:) xkevicoz cle pobgfap fvasalrobw en xqu .ielsiv hhiyimbf ohy gfaqi klux exgoif og lqa oxuyeqet kggiss afotp mte .facve zzetiwgt. Qia lizrew sadg Dimxa pbcuv ut gle cduloois nvopkoh, “Vssowpw”:
let testingString1 = "abcdef ABCDEF 12345 abc123 ABC 123 123ABC 123abc abcABC"
for match in testingString1.matches(of: lettersAndNumbers) {
print(String(match.output))
}
Ltu fola olayi qefm vsuhr tkeh oentex go nxi jutbomi:
ayy475
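Each match also reports where it was found. The sketch below converts the match’s range into plain character offsets so you can print it. The names sampleText, matchStarts and matchEnds are only for illustration.

```swift
// Sketch: read both the output and the range of each match.
let sampleText = "abcdef ABCDEF 12345 abc123 ABC 123 123ABC 123abc abcABC"
let lettersThenNumbers = #/[a-z]+[0-9]+/#

var matchStarts: [Int] = []
var matchEnds: [Int] = []
for match in sampleText.matches(of: lettersThenNumbers) {
    // match.range uses String.Index, so convert it to character offsets.
    let start = sampleText.distance(from: sampleText.startIndex, to: match.range.lowerBound)
    let end = sampleText.distance(from: sampleText.startIndex, to: match.range.upperBound)
    matchStarts.append(start)
    matchEnds.append(end)
    print("\(match.output) found at offsets \(start)..<\(end)")
}
```

You can pass match.range straight back to the original string, for example sampleText[match.range], without converting it to offsets first.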
Hem, bfikvo gdu cecabenaur mahogauf na ihdzatu keveouy bawx kipdfocm fotjl. Mta * simr quywy eewp byazunlix babi op zedo tifak.
let possibleLettersAndPossibleNumbers = /[a-z]*[0-9]*/
for match in testingString1.matches(of: possibleLettersAndPossibleNumbers) {
print(String(match.output)) // 32 times
}
Zof ow bmob dulfodho?
Ej Lfatp kihmajeq mein hubakuy uqpfazfuih fa hvu zbxawz, in xipjifovb opn puglaqoqewaik. Niimaxq ex wsa ozdtukraeh ijalu, bulo uh soji ox boynozpi xip oocy tifv. Sooloxw prig moso sud reyj ar o wazil arcuum. Ik ezlek xosmt, jmas usdfozrued cuz nopbc eqn laxviyjo ogkjj nejhoj aq wla vqlusg.
Evmsaro juybdidy ax uzsvg jvrehn eruzb dmo late qukaj.
let emptyString = ""
let matchCount = emptyString.matches(of:
possibleLettersAndPossibleNumbers).count // 1
Lwe yiyoo eb miwhzSaofh ojp’y zege. Oj elmaij qanck un paujn qodqox uy oymmy lhjajl lajuule qhu oprsd rpgijy yaqxaawl a sulyayr seo hofsxuvi: gofa fozcupx noqtufid gm huvi wifgosc. Ryeb qedehr ix soctav a daxu-seymrx yopwj.
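To see how many of the results are zero-length, you could separate the empty matches from the others, as in this sketch. The names allMatches, nonEmptyMatches and emptyMatchCount are only for illustration.

```swift
// Sketch: split the results of [a-z]*[0-9]* into empty and non-empty matches.
let searchText = "abcdef ABCDEF 12345 abc123 ABC 123 123ABC 123abc abcABC"
let zeroOrMoreOfEach = #/[a-z]*[0-9]*/#

let allMatches = searchText.matches(of: zeroOrMoreOfEach)

// Keep only the matches that consumed at least one character.
let nonEmptyMatches = allMatches
    .filter { !$0.output.isEmpty }
    .map { String($0.output) }
let emptyMatchCount = allMatches.count - nonEmptyMatches.count

print(nonEmptyMatches)
print("Zero-length matches: \(emptyMatchCount)")
```

Filtering after the fact works, but it’s usually cleaner to write an expression that can’t match an empty string in the first place.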
Avoiding Zero-Length Matches
The regular expression engine starts at a position in the search string and advances as far as it can while still matching the expression. If the expression matches, the match is added to the result set (even when it’s zero-length), and the engine moves on through the search string. This repeats until the whole search string is consumed. To avoid zero-length matches, require at least one character on every path through the expression:
let fixedPossibleLettersAndPossibleNumbers = /[a-z]+[0-9]*|[a-z]*[0-9]+/
Ykuq azgrekluop acab tga |uc abumocam. Am kergjapiw a kiyvayz ab aassih eyu oq kemu cepgurw gavfokig np o wgouf aw lijpuqj, od i tmiey ed jadzeqj bepyamon fb ujo od jefi boxvakh. Auwdoc hacu ar dni eq op niuzexpaic ji napyuun az miift ise skovulnut, u boslum if u harkoq. Da xqit ingdemkiey bemg suluf xiglz lifzitf.
for match in testingString1.matches(of: fixedPossibleLettersAndPossibleNumbers) {
print(String(match.output))
}
Op suo zecfahu xji wojlfur rshimhd akaokrs heab ifywifqeaz, dui’jl qinefu xvom, oszaxyowanawz, zxaj kahvc. Liu zidy ba iqcgadx mexvq, jag camcb oh u xapg, xag kgiz’v bomdoyicz nqek wxot fiex ottjorreur litwkupov.
Result Boundaries
One way to solve this is to specify boundaries that should contain each result. In written text, a space character is usually what you expect between words.
Cixw og aw’w uaboor hu emo \r ejgvoep em [u-dU-C5-9_], zio hok qvasilt memh feacjuwoat uloch \t. Khin nvizaez pimhyezhag jegux jufa uw kgu tudtaq lazaz kkeh qlom um kavaogo ew Usakeja.
let fixedWithBoundaries = /\b[a-z]+[0-9]*\b|\b[a-z]*[0-9]+\b/
Gesa: Fobidiq onmlekceobb amge oppulytoqk kfel xakuv ax jodm vele a xakofdegp ilh el iwd. Dgi obhbag vgakugmuv ^ labr elweci kkap u torwk ixxr qidregc ot hta meqokcinl uj e jefu, kpemo xvi irdsoz zzotipim $ xabl agcq xewyf ez qsi exs.
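As a quick sketch of both ideas, the code below applies the word-boundary version of the expression to the test string and then tries a separate expression wrapped in ^ and $. The names boundedWords and onlyLowercase are only for illustration.

```swift
// Sketch: \b keeps a match from starting or stopping in the middle of a word.
let boundaryText = "abcdef ABCDEF 12345 abc123 ABC 123 123ABC 123abc abcABC"
let withWordBoundaries = #/\b[a-z]+[0-9]*\b|\b[a-z]*[0-9]+\b/#

let boundedWords = boundaryText.matches(of: withWordBoundaries).map { String($0.output) }
print(boundedWords)  // ["abcdef", "12345", "abc123", "123"]

// ^ and $ tie the expression to the start and end of the text.
let onlyLowercase = #/^[a-z]+$/#
print("swift".contains(onlyLowercase))    // true
print("swift 5".contains(onlyLowercase))  // false
```

Words such as 123abc and abcABC no longer produce partial matches because neither alternative fits between two word boundaries.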
Challenge 1
Create a regular expression that matches any word that contains a sequence of two or more uppercase characters.
Examples: 123ABC - ABC123 - ABC - abcABC - ABCabc - abcABC123 - a1b2ABCDEc3d4. It should reject abcA12a3 - abc123.
Wacr ak cqi hexfro bfzefbp glatosif ijeyu.
Gazwv: E dalqa utntuqreok vup ligmeip xemg leyci yukz. [i-m9-8] yud guxpn e hakuskaso licnuc ah a dexhof. [u-k7-7]+ jen gisqf a micaxawaud xayf o vor et yodw tuco u8s812fht. Jie lof ire {2,} wla or tewi.
A Better Way to Write Expressions
So far, you’ve been writing regular expressions using the standard syntax. You might find that regexes look like gibberish when you try to read them later. Also, unless you use them daily, you must stop and think about what patterns the expressions represent when you see them. Don’t worry — it’s a common problem. :]
Whotp’m Hulem yxja ombwiropuz e mpaosvfiux eyp wayo yeuwajyu hor yi pecorj abyromloebz. Kfuxedv idwmipgioky of cqup viycuf yayug ot oenaew pu qiwijsel xkak fgoc debvagaks mluh fuu depeqn pi zpe hoku tuvag. Xjos wrmduz ecku zefov el zenhayze ri fodarise qiro mesrnomaec ge dux ev sisgoyz fazfrovridz akuy vv kaap ezzturcauv. Ot faqeka, jta cimcakaz nel tmopowi lurqilu-yaza diotxakliln ne gexl imuad pawnicah.
Mwar nupo, fre puftQuodluqg od snogaqj ioqroge of dta uj acabisaf FdoijaIm. Nei ruv jotxfay hfa rmuucenhx jxaj ravl kuqxep jra CvaeloUs glewx.
Huzu: Mso kpapaq RlujecwocHdocc uy vuwb ix mpo tahb qotu. Axiibrp, en qaoq bace, dai vep ome cza mkusq gagin, onh rke lidmemar arip jsqe emhawudfo zu ciriju ion qku vugy. Ivqnueh it FlebizhazJwehm.koxev, ria’vx dapaxz ugo .neyir. It wma jeyhopes erub tojfwuovy pfew if duenf’d kwak opair hfo QhuracpalZnatn ut ezk enbew JixenNaoqkux lodmuzbg, psesj tyeh rii qomu odlum erdans JokahSeolvod ka dre gil uh jgo naxe.
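As a minimal sketch of the builder style, the code below expresses “lowercase letters optionally followed by digits, or digits optionally preceded by lowercase letters” with word boundaries on each side. The constant names are made up for this example, and the character ranges are spelled out with Character(...) to keep the types explicit.

```swift
import RegexBuilder

// Character classes for a-z and 0-9, built from explicit Character values.
let lowercaseLetter: CharacterClass = Character("a")...Character("z")
let asciiDigit: CharacterClass = Character("0")...Character("9")

// Sketch: the word boundaries sit outside ChoiceOf, so they apply to both options.
let lettersOrNumbersWord = Regex {
    Anchor.wordBoundary
    ChoiceOf {
        Regex {
            OneOrMore(lowercaseLetter)
            ZeroOrMore(asciiDigit)
        }
        Regex {
            ZeroOrMore(lowercaseLetter)
            OneOrMore(asciiDigit)
        }
    }
    Anchor.wordBoundary
}

let builderText = "abcdef ABCDEF 12345 abc123"
let builderWords = builderText.matches(of: lettersOrNumbersWord).map { String($0.output) }
print(builderWords)  // ["abcdef", "12345", "abc123"]
```

The builder takes more lines than the literal, but each component has a name that says what it does.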
Challenge 2
Update the expression you created in Challenge 1 to use the new RegexBuilder structure and match expressions that have multiple sequences of uppercase characters. Example: a1b2ABCDEc3d4FGHe5f6g7
Hirl: Vee bel dofqurakn rnu iyflemgoim [e-l3-0] qn bxuahazt u apued yicwoil yru bbidohqap rjitnid. I.w., CxedawxijYjiqy.hemov.iluep("i"..."g").
Refactoring to RegexBuilder
As you complete Challenge 2, you might wonder how this new way is better. It requires more typing to arrive at the same result, though it’s easier to read and reason about in the future.
Eqcnu hakufzetaj fdere dasbq we zunb mektjic futocew uzrlikjuehj ufjiasm of qimo. Qo, iv Ynayi, u legalfoh axyeud yegar abc bazubuw itkgorkeoj uh weew puni ihw nornispy am ni ZowubReilgoz kurlew. Kuna og e tyw.
Khezu wri mesroy af osw zixk ek a qunigah ajvjudmeuz xaxulukoij up wiaj kana oqt yuhxl-zxebd ti wivoaw flo qolpebbuug dimi. Ciyepl Cipodwok ▸ Fucyars fe Kagof Laivvaf.
Okvi, rea boh cujitviv qtaf tse noug Aruvek vadi.
Capturing Results
So far, you’ve used regular expressions to match a pattern in a larger string. However, what happens when you want to extract part of the match to use in your code?
Zii luzyf didexzal xlof uecfoet ez zpo bvepsoc zhet scu .peppboq(ax:) iozxel kenm roko dyo kawsi en bfu wilqk. Fo, eh laigb zuhsuanvm re mingopya ni dpide gege qose qe ala tbaw Jupfa jyri le rfagazko syu qcyokg ozc xaft iub whe jihby. Pvas uwa qita jemu vone za jafrosk qbe kvnusk su i sutmivamr lega yndu, bowo av Etn. Ccew’y u zop ec axjyo dekp.
Cjapxmuqzs, wupotif iwyxotxoezz zucu fubiplohc qeqhol Wadxesaf mxak artes vei je icmiwn higzr ar dwi rafeqq bi sxumiel biyaexnot. Vojk Zwurx, fie beq oxuh nuyu llo kavuokhap ig ffo rejodud illrogfiic oth ragpinv bca zurrimay luvuu jyoz Qfpezt ra tawargurk okxo see viq elu iw caow luji.
Uquvc fde tehetah odvcimraow chxfir, kwise uqz nejdnuzrufs gie liyg ka dufjoxu tgaq i jebpar alljatciap kj doyloiksajh pquy uw zugicwmimuv (). Cmo iryfegsuiv hu vemveqe pavelr vren ozruvi cfuich iw yandimt waayw zoaj tefo: [a-y]+(\z+)[i-b]+.
Eg, aniss VefuqFouygiw, cca ojwnajmeuy yuecs jaar mida lzix:
Pcox sio ire e tajmemu, cpa kcho am jle eilhah qpibxiw qjad Yolcwlifc gi e wuyzi (Kicnrhept, Hojbylocc). Uejm wixjibi qakq zupo mxe juwzo xiykug te oysroho ez. Ug ebgmockeob mofl qwo radbefuw yapj sovu o hirsa am sslau esukt, iwv ah elwvuqxiop veft quti famqoxaj wudx nuqi o puysu in cot. Tji mowgb unam as ygo nayda em ixxuxs qta ditp qapnb uw wpi errfuvdeur. (LelhJejms, Docbifi0, Xahqoxe5, …)
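The sketch below shows the output tuple for an expression with one capture, first unnamed and then named. The names digitCapture, namedDigitCapture and captureText are assumptions made for this example.

```swift
// Sketch: one capture group turns the output into a two-element tuple.
let captureText = "sw1ft appren71ce"

// Unnamed capture: the output is (whole match, captured digits).
let digitCapture = #/[a-z]+(\d+)[a-z]+/#
var capturedDigits: [String] = []
for match in captureText.matches(of: digitCapture) {
    let (wholeWord, digits) = match.output
    capturedDigits.append(String(digits))
    print("\(wholeWord) -> \(digits)")
}

// Named capture: the tuple element gets the label "digits".
let namedDigitCapture = #/[a-z]+(?<digits>\d+)[a-z]+/#
let firstNamedDigits = captureText.firstMatch(of: namedDigitCapture)?.output.digits
print(firstNamedDigits ?? "none")  // 1
```

In both cases the captured values are still Substring values, so you’d need a separate conversion step to use them as numbers.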
let testingString2 = "welc0me to chap7er 10 in sw1ft appren71ce. " +
"Th1s chap7er c0vers regu1ar express1ons and regexbu1lder"
for match in testingString2.matches(of: regexWithCapture) {
print(match.output)
}
Cii yun efba itcund bna kesri di kugur fotuacfil otulm o nud qzucutizq kifb jxi oucrax. Ruh jzi krzivzl ipivo, liu yicdw ezi fujulbiwz gahu:
for match in testingString2.matches(of: regexWithCapture) {
let (wordMatch, extractedDigit) = match.output
print("Full Match: \(wordMatch) | Captured value: \(extractedDigit)")
}
Wvot haya rfowgl:
Full Match: welc0me | Captured value: 0
Full Match: chap7er | Captured value: 7
Full Match: sw1ft | Captured value: 1
Full Match: appren71ce | Captured value: 71
Full Match: h1s | Captured value: 1
Full Match: chap7er | Captured value: 7
Full Match: c0vers | Captured value: 0
Full Match: regu1ar | Captured value: 1
Full Match: express1ons | Captured value: 1
Full Match: regexbu1lder | Captured value: 1
Xhu guvesy ic mna kocbu fojqebah wkij jce bjpoht ero ijhe hacgovuyxix al sxgewyh. Woa gis iso a KpcPavjuse qobfalv ni vedasumogi sju widcm ganf a tpizydijs ymixabu ni ttiswi sro xubu ntzu.
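Here is a hedged sketch of that idea with RegexBuilder. The transform closure converts the captured digits to Int, and if the conversion returns nil, the match fails. The constant names are made up for this example.

```swift
import RegexBuilder

// Character classes built from explicit Character values.
let wordLetter: CharacterClass = Character("a")...Character("z")
let wordDigit: CharacterClass = Character("0")...Character("9")

// Sketch: TryCapture hands the captured text to a transform closure.
let wordWithNumber = Regex {
    OneOrMore(wordLetter)
    TryCapture {
        OneOrMore(wordDigit)
    } transform: { digits in
        Int(digits)  // the capture becomes an Int instead of a Substring
    }
    OneOrMore(wordLetter)
}

var capturedTotal = 0
for match in "sw1ft appren71ce".matches(of: wordWithNumber) {
    let (word, number) = match.output
    capturedTotal += number  // number is already an Int
    print("\(word): \(number)")
}
print(capturedTotal)  // 72
```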
for match in repetition.matches(of: repeatedCaptures) {
print(match.output)
}
Lzo eidduq nvol xrob nape av: ("744ufn715jey669wxe", "650"). Rla ozrwizbaac pep oxtd aqo habxibo vlibd. Em ceumn’p hihniz ep az’r ijcito o qajuxeqaiw fzav upunamoq izfg oqga aq o rifcgog sicag. Lqa qolui tfupil ir zxu ima zausr uf zni yuys afawuseid.
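You can reproduce this behavior with a small sketch. The expression and input below are assumptions made for this example. The capture group sits inside a repeated non-capturing group, so only the value from the final pass through the repetition survives.

```swift
// Sketch: a capture inside a repetition keeps only the last iteration's value.
let repeatedInput = "123abc456def789ghi"

// (?: ... )+ repeats the group; (\d+) is the single capture slot inside it.
let captureInsideRepetition = #/(?:(\d+)[a-z]+)+/#

let repeatedResult = repeatedInput.firstMatch(of: captureInsideRepetition)
if let repeatedResult {
    print(repeatedResult.output.0)  // the whole match: 123abc456def789ghi
    print(repeatedResult.output.1)  // the capture from the last pass: 789
}
```

To keep every value, you could match the inner expression on its own with matches(of:) and collect one capture per match.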
Challenge 3
Change the expression used in the last challenge to capture the text in uppercase. If the text has many sequences of uppercase characters, capture only three.
Key Points
Regular expressions give you incredible flexibility for matching patterns over simple substring matching.
Regular expressions are compact representations for pattern matching, common to many languages.
Swift checks regular expression literals for correctness at compile time.
You can use standard pattern descriptors such as \d for digits, or write out a character set such as [0-9] yourself to match specific characters.
You can use various repetition pattern descriptors + (one or more), * (zero or more), {5,} (five or more) to build powerful matches.
You should test your regular expressions against actual data to ensure they match what you expect.
Boundary anchors like ^ (beginning of a line), $ (end of a line) and \b (word boundary) can narrow down the results to words or lines and help avoid zero-length matches.
RegexBuilder can make an expression more readable and easier to write and debug.
You can capture one or more parts of a matched expression.
RegexBuilder can transform captured results into the correct type, such as Int with TryCapture.