Instruction 2


While working with the models in this lesson, you’ve likely noticed that they can be quite large. Even so, they’re tiny compared to some of the largest modern models: Stable Diffusion can run as large as 8 GB, and recent Llama models can reach tens of gigabytes.

These large sizes are a poor fit for mobile devices, where storage and RAM are at a premium. For many apps incorporating local ML models, the model makes up most of the app’s download size. Deferring the model download until later only pushes the problem into the future without solving it.

Shrinking a model provides advantages beyond reducing your app’s download size. A smaller model can also run faster, because less data needs to move between the device’s memory and CPU.

The first approach to addressing this problem is to reduce the model size during training. You’ll see that many models come trained with different numbers of parameters: the Meta Llama 3 model, for example, comes in versions with 8 billion and 70 billion parameters.

The ResNet101 model you worked with earlier in the lesson is about 117 MB at full size, with each weight stored as a Float16, which takes two bytes. Effectively reducing model size requires balancing the smaller size against the model’s performance and quality of results.
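
As a rough back-of-the-envelope check, you can estimate a model’s on-disk weight size from its parameter count and the bytes used per weight. The parameter count below is an illustrative assumption, not an exact figure for any specific model:

def estimated_size_mb(num_weights, bytes_per_weight):
    # Size in megabytes: parameter count times storage per parameter.
    return num_weights * bytes_per_weight / 1_000_000

# Hypothetical example: about 44 million weights stored as Float16 (2 bytes)
# versus the same weights stored as Int8 (1 byte).
print(estimated_size_mb(44_000_000, 2))  # ~88 MB
print(estimated_size_mb(44_000_000, 1))  # ~44 MB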

Reduction Techniques

There are three primary techniques used in Core ML Tools to reduce model size: weight pruning, quantization, and palettization. First, weight pruning takes advantage of the fact that most models contain many weights that are zero, or near enough to zero that they can effectively be treated as zero. If you store only the non-zero values, you save two bytes for each value you drop; for the ResNet101 model, that can save about half the size. You can tune the amount of compression by adjusting the threshold below which a weight is treated as zero, as the sketch below shows. Second, quantization stores each weight at lower precision, such as Int8 instead of Float16, halving the storage per weight; you’ll use it in the next section. Third, palettization clusters similar weights and stores each one as an index into a small lookup table of shared values.
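
As a sketch of how pruning looks in practice, the snippet below uses Core ML Tools’ threshold-based pruner on an .mlpackage model. The file names and the threshold value are assumptions for illustration:

import coremltools as ct
import coremltools.optimize as cto

# Load an existing Core ML package (file name is a placeholder).
model = ct.models.MLModel("resnet101.mlpackage")

# Treat any weight whose magnitude falls below the threshold as zero.
prune_config = cto.coreml.OpThresholdPrunerConfig(threshold=1e-3)
config = cto.coreml.OptimizationConfig(global_config=prune_config)

pruned_model = cto.coreml.prune_weights(model, config=config)
pruned_model.save("resnet101-pruned.mlpackage")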

Converting in Practice

Core ML Tools supports applying compression to existing Core ML models. Unfortunately, as with many things related to Core ML Tools, it’s a bit complicated: one set of APIs works on the older .mlmodel files, while a separate set works on the newer .mlpackage files. In this section, you’ll work with the latter.

import coremltools as ct
import coremltools.optimize as cto

# Load the existing full-size Core ML package.
orig_model = ct.models.MLModel("resnet101.mlpackage")

# Quantize weights symmetrically to 8 bits; only ops with at least
# 512 weight elements are compressed.
op_config = cto.coreml.OpLinearQuantizerConfig(
    mode="linear_symmetric", weight_threshold=512
)
config = cto.coreml.OptimizationConfig(global_config=op_config)

compressed_8_bit_model = cto.coreml.linear_quantize_weights(orig_model, config=config)
compressed_8_bit_model.save("resnet101-8.mlpackage")
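
To confirm the savings, you can compare the two packages on disk. This is a minimal sketch assuming both packages sit in the current directory:

from pathlib import Path

def package_size_mb(path):
    # An .mlpackage is a directory, so sum the sizes of all files inside it.
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file()) / 1_000_000

print(package_size_mb("resnet101.mlpackage"))
print(package_size_mb("resnet101-8.mlpackage"))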

Reducing an Ultralytics Model Size

Again, the Ultralytics package wraps this complexity for you. Enter the following code:

from ultralytics import YOLO

# Load the pretrained YOLOv8x Open Images V7 model.
model = YOLO("yolov8x-oiv7.pt")
# Export to Core ML with built-in NMS and 8-bit quantized weights.
model.export(format="coreml", nms=True, int8=True)
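
The export should produce a yolov8x-oiv7.mlpackage alongside the original weights file, which you can add to your Xcode project the same way as before.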