[pdftex] Merging duplicate embedded fonts

Discussion:

Maarten Bezemer

2013-10-07 08:53:35 UTC

Hello,

I have a problem with pdflatex asked about at
http://tex.stackexchange.com/questions/136574/merging-duplicate-embedded-fonts

In the end it was suggested that I ask here.

I have a LaTeX project that includes 2 PDF images. Both contain text in the
same font, which is fully embedded (not subset, to keep things simple).
The PDF created from the LaTeX sources contains the font twice,
once for each included image. I would expect pdflatex to keep only one
copy, especially since both fonts are fully embedded, so it is easy to
determine that they are duplicates.

My LaTeX file is as follows:

\documentclass{article}
\usepackage{graphicx}

\begin{document}
\includegraphics{image1}
\includegraphics{image2}
\end{document}

pdffonts shows:
$ pdffonts mydoc.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
SDXKYB+CMR10                         Type 1            Builtin          yes yes no       6  0
DejaVuSans                           TrueType          WinAnsi          yes no  yes     11  0
DejaVuSans                           TrueType          WinAnsi          yes no  yes     17  0

In my original document I have lots of images containing text, resulting in
lots of duplicate fonts. Normally I use subset fonts to reduce the size
of the final document, but I am not able to merge the duplicate
subsets either... So I tried fully embedded fonts, but those do not
merge properly either...

Am I doing something wrong resulting in the duplicate fonts? Or did I
encounter a bug in pdf(la)tex?

The resulting PDF file is online [1], as are all the source files [2].

Best regards,
Maarten

[1]: https://dl.dropboxusercontent.com/u/9671810/mydoc.pdf
[2]: https://dl.dropboxusercontent.com/u/9671810/mydoc.zip

Reinhard Kotucha

2013-10-07 23:21:44 UTC

Post by Maarten Bezemer

<snip>

What happens if you push your file through Ghostscript?

ps2pdf mydoc.pdf mydoc-gs.pdf

Regards,
Reinhard

--
----------------------------------------------------------------------------
Reinhard Kotucha Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover mailto:reinhard.kotucha at web.de
----------------------------------------------------------------------------
Microsoft isn't the answer. Microsoft is the question, and the answer is NO.
----------------------------------------------------------------------------

Maarten Bezemer

2013-10-08 11:04:46 UTC

<snip>

Post by Reinhard Kotucha

<snip>

Post by Reinhard Kotucha
What happens if you push your file through Ghostscript?
ps2pdf mydoc.pdf mydoc-gs.pdf
Regards,
Reinhard

Ghostscript converts the fully embedded fonts into subset fonts, but it
does not merge the duplicate fonts:

$ pdffonts mydoc-gs.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
CPKUFF+DejaVuSans                    TrueType          WinAnsi          yes yes yes      8  0
KUMHSN+DejaVuSans                    TrueType          WinAnsi          yes yes yes     10  0
LYKUVW+CMR10                         Type 1C           WinAnsi          yes yes no      12  0
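As a rough aid for spotting this kind of duplication in a larger document, a pdffonts listing can be scanned for base font names that occur more than once after the six-letter subset tag (e.g. CPKUFF+) is stripped. This is a minimal Python sketch, assuming pdffonts' usual layout (a title line and a dashed rule, then one row per font with the name in the first column); the function names are hypothetical. A repeated base name only flags a candidate: it does not prove the embedded font data is identical.

```python
import re
from collections import Counter

# A subset font name starts with a tag of six uppercase letters and "+"
# (e.g. "CPKUFF+DejaVuSans"); removing it leaves the base font name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def base_font_names(pdffonts_output: str) -> list[str]:
    """Return the base name of every font row in pdffonts output.

    Assumes the usual layout: a column-title line and a dashed rule,
    followed by one row per font with the name in the first field.
    """
    names = []
    for line in pdffonts_output.splitlines()[2:]:
        fields = line.split()
        if fields:
            names.append(SUBSET_TAG.sub("", fields[0]))
    return names


def duplicated_fonts(pdffonts_output: str) -> dict[str, int]:
    """Map each base font name that appears more than once to its count."""
    counts = Counter(base_font_names(pdffonts_output))
    return {name: n for name, n in counts.items() if n > 1}


# Usage with the Ghostscript listing above (whitespace condensed).
gs_listing = """name type encoding emb sub uni object ID
--- --- --- --- --- --- ---
CPKUFF+DejaVuSans TrueType WinAnsi yes yes yes 8 0
KUMHSN+DejaVuSans TrueType WinAnsi yes yes yes 10 0
LYKUVW+CMR10 Type 1C WinAnsi yes yes no 12 0"""
print(duplicated_fonts(gs_listing))  # {'DejaVuSans': 2}
```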

Regards,
Maarten

Ross Moore

2013-10-08 19:58:47 UTC

Hello Maarten,

My understanding is that pdfTeX does not try to parse the internal structure of embedded files, whether they be in PDF or other format. It just assumes that they will work within the context in which you are embedding them, and takes no further responsibility apart from including them straight, within an appropriate PDF XObject wrapper.

How can the software know that the font included within each image is indeed the same?
Even if named similarly, and occupying the same number of bits, this doesn't preclude the internal structure being different. This could well be the case for two different subsets of the same base font, used in images each having just small amounts of text. But pdfTeX doesn't even look for what fonts are in the image, so no comparison will ever be made.
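To make this point concrete: a safe merge would have to compare the font programs themselves, not just their names. The sketch below groups font objects by a SHA-256 digest of their decoded font-file bytes. It assumes the streams have already been extracted and decoded by some other tool; the labels and byte strings are hypothetical placeholders. Identical font programs are only a necessary condition: the font dictionaries (encoding, widths) would also have to match before two objects could share one reference.

```python
import hashlib
from collections import defaultdict


def group_identical_fonts(fonts: dict[str, tuple[str, bytes]]) -> list[list[str]]:
    """Group font objects whose decoded font programs are byte-identical.

    `fonts` maps a label for each font object to (font name, decoded
    font-file bytes). Extracting and decoding the streams from a real
    PDF is assumed to happen elsewhere.
    """
    by_digest = defaultdict(list)
    for label, (_name, data) in fonts.items():
        # The name is deliberately ignored: equal names alone prove nothing.
        by_digest[hashlib.sha256(data).hexdigest()].append(label)
    return [labels for labels in by_digest.values() if len(labels) > 1]


# Hypothetical labels and placeholder bytes, for illustration only.
fonts = {
    "image1/F1": ("DejaVuSans", b"<full font program>"),
    "image2/F1": ("DejaVuSans", b"<full font program>"),
    "image3/F1": ("ABCDEF+DejaVuSans", b"<subset A>"),
    "image4/F1": ("GHIJKL+DejaVuSans", b"<subset B>"),
}
print(group_identical_fonts(fonts))  # [['image1/F1', 'image2/F1']]
```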

Post by Maarten Bezemer

<snip>

Am I doing something wrong resulting in the duplicate fonts? Or did I
encounter a bug in pdf(la)tex?

pdfTeX is doing nothing wrong.
If you want to combine the fonts for each image, then those images cannot be considered as independent objects within the PDF. Suppose you want to isolate and extract an image from the final PDF? The reader software will have to be smarter than just extracting a simple XObject. It will need to build a new PDF on the fly, containing all the required fonts, appropriately referenced. Some PDF readers may be able to do this, but others will not.

Presumably you want a size-reduction in your final PDF. This can only come as a compromise in the functionality according to the browser used by your audience, and/or at the expense of extra processing when the full document is created.

Ghostscript has been suggested already.
Or try using Acrobat Pro to save a "reduced size" PDF, as an extra step after pdfTeX.
Whether this latter will work may depend upon the characteristics of the font; in particular, whether it is already known to the software installation, and its licensing conditions.

Post by Maarten Bezemer
The resulting PDF file is set online [1],

I'll give APro a try and get back to you with the results.

Cheers,

Ross

Ross Moore

2013-10-09 20:26:28 UTC

Post by Ross Moore
Presumably you want a size-reduction in your final PDF. This can only come as a compromise in the functionality according to the browser used by your audience, and/or at the expense of extra processing when the full document is created.
Ghostscript has been suggested already.
Or try using Acrobat Pro to save a "reduced size" PDF, as an extra step after pdfTeX.
Whether this latter will work may depend upon the characteristics of the font; in particular whether it is known already to the software installation, and it's licensing conditions.

Post by Maarten Bezemer
The resulting PDF file is set online [1],

I'll give APro a try and get back to you with the results.

When you resave your example file using Acrobat Pro, the two font
instances are indeed combined and also subsetted.
Other metadata is also added, with the perhaps surprising result
of an overall *gain* in size: 16 kB has become 34 kB.

Due to the different way compression is handled within the two files,
it is hard to identify just where all the size difference lies.

The attached image shows a schematic view of the internal structures
(without metadata) of both the original and the APro-produced file.

[Attachment: Screen shot 2013-10-10 at 7.06.34 AM.png (image/png, 206981 bytes); scrubbed from the archive]

Now presumably you have in mind embedding not just two images,
but perhaps hundreds or thousands?

Using Acrobat Pro as a post-processor of the pdfTeX-built
original PDF would then most likely lead to a saving in size.
If you have such a larger example, we could test this hypothesis.
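This hypothesis can be framed as a simple break-even estimate. In a deliberately simplified model (an assumption that ignores compression and subsetting effects), each of n images carries its own copy of a font of F bytes, and optimizing adds a fixed overhead O. Sharing one copy then saves (n - 1)F - O bytes, which is non-negative once n - 1 >= O/F. The function names are hypothetical, and the sizes in the usage lines are illustrative, not measurements from this thread.

```python
def net_saving(n_images: int, font_bytes: int, fixed_overhead: int) -> int:
    """Bytes saved by sharing one font copy instead of one copy per image.

    Simplified model (assumption): every image embeds an identical copy
    of a font of `font_bytes` bytes, and optimizing adds a fixed
    `fixed_overhead` regardless of the number of images.
    """
    return (n_images - 1) * font_bytes - fixed_overhead


def break_even(font_bytes: int, fixed_overhead: int) -> int:
    """Smallest number of images for which merging no longer costs space."""
    # Smallest integer n with (n - 1) * font_bytes >= fixed_overhead,
    # using -(-a // b) as integer ceiling division.
    return 1 + -(-fixed_overhead // font_bytes)


# Hypothetical sizes, not measured values.
print(break_even(8_000, 20_000))       # 4
print(net_saving(100, 8_000, 20_000))  # 772000
```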

Acrobat Pro gives a lot of control over just which things are discarded
when you save as "Optimized PDF".
I specified only the removal of one instance of the DejaVu font; but in
another test, the combination and subsetting happened anyway when just
accepting all the defaults.

Hope this helps,

Ross

------------------------------------------------------------------------
Ross Moore ross.moore at mq.edu.au
Mathematics Department office: E7A-206
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia 2109 fax: +61 (0)2 9850 8114
------------------------------------------------------------------------

Maarten Bezemer

2013-10-10 08:29:39 UTC

Post by Ross Moore

<snip>

When you resave your example file using Acrobat Pro, the two font
instances are indeed combined and also subsetted.
Other Metadata is also added, with the perhaps surprising result
in an overall *gain* in size: 16kb has become 34kb.
Due to the different way compression is handled within the two files,
it is hard to identify just where all the size-difference lies.
The attached image shows a schematic view of the internal structures
(without MetaData) of both the original and APro-produced file.

Thanks for trying it in Acrobat Pro.

When I look at your screenshot, I do not see that the font instances are
combined: both sides show the same number of fonts. It does nicely show that
each instance is part of an embedded PDF file (XObject), though.
I had assumed the PDF images would be merged more thoroughly. Now I see why it
is more complex to merge the duplicate fonts: they are not really duplicates,
but parts of the embedded PDF files...

Too bad. I hoped to solve this 'problem', but I now see that it is fairly
complex. I could try embedding the images as LaTeX files, so that the fonts are
properly merged, but I am afraid I do not have enough time to do so before my
deadline.

Thanks for your help and the details you provided!
Regards,
Maarten

Peter Rolf

2013-10-10 12:12:03 UTC

Post by Ross Moore

<snip>

When you resave your example file using Acrobat Pro, the two font
instances are indeed combined and also subsetted.
Other Metadata is also added, with the perhaps surprising result
in an overall *gain* in size: 16kb has become 34kb.
Due to the different way compression is handled within the two files,
it is hard to identify just where all the size-difference lies.
The attached image shows a schematic view of the internal structures
(without MetaData) of both the original and APro-produced file.

Both instances refer to the same object (28 0 R), so the fonts are merged.
The size increase is probably document overhead (XMP, ...) inserted by
Acrobat. Nothing to worry about, as the optimization will save a lot
more bytes for bigger documents.
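This check can be expressed mechanically: fonts are merged when the /Font resources of different form XObjects point to the same indirect object. The sketch below uses a simplified, hypothetical model in which each XObject's font resources are given as {resource name: reference}; the labels Im1, Im2 and F1 are invented for illustration, while 11 0 R, 17 0 R and 28 0 R only echo object numbers mentioned in the thread.

```python
from collections import defaultdict


def shared_font_refs(xobject_fonts: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return font references that are used by more than one form XObject.

    `xobject_fonts` maps an XObject label to its /Font resources,
    simplified to {resource name: indirect reference}.
    """
    users = defaultdict(list)
    for xobject, font_resources in xobject_fonts.items():
        # dict.fromkeys drops repeated refs within one XObject, keeping order,
        # so an XObject is never counted twice for the same font object.
        for ref in dict.fromkeys(font_resources.values()):
            users[ref].append(xobject)
    return {ref: xobjects for ref, xobjects in users.items() if len(xobjects) > 1}


# Hypothetical structures: separate copies vs. one shared font object.
before = {"Im1": {"F1": "11 0 R"}, "Im2": {"F1": "17 0 R"}}
after = {"Im1": {"F1": "28 0 R"}, "Im2": {"F1": "28 0 R"}}
print(shared_font_refs(before))  # {}
print(shared_font_refs(after))   # {'28 0 R': ['Im1', 'Im2']}
```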

BTW: you get a better overview by using the [Audit space usage...]
button in the 'PDF Optimizer' dialogue of Acrobat (here Acrobat Pro 9;
menu 'Advanced->PDF Optimizer'; the button is located at the top right). It
shows how many bytes of the document are used for the fonts.

Hope that helps,

Peter

Thierry Bouche

2013-10-09 16:24:45 UTC

M> Am I doing something wrong resulting in the duplicate fonts? Or did I
M> encounter a bug in pdf(la)tex?

In principle, pdflatex merges embedded fonts if and only if they
appear with a full line in the map file (including the exact PostScript
name). I don't know exactly whether this is done for font formats
other than Type 1, though. (I suspect this aspect of font subsetting was an
add-on by Thanh quite some time ago, and the current team is not very
familiar with it, but I hope I'll be proven wrong!)

Best wishes,
Thierry

The Thanh Han

2013-10-09 18:17:22 UTC

Post by Thierry Bouche
M> Am I doing something wrong resulting in the duplicate fonts? Or did I
M> encounter a bug in pdf(la)tex?
In principle, pdflatex merges embedded fonts if and only if they
appear with a full line in the map file (including exact Postscript
name). I don't know exactly if this is done for other font formats
than Type 1, though. (I suspect this aspect of font subsetting was an
add-on by Thanh quite some time ago, and the current team is not so
much familiar with it, but I hope I'll be proven wrong!)

The fonts are TrueType. pdfTeX can only "merge" Type 1 (and Type 1C) fonts.

Regards,
Thanh

Maarten Bezemer

2013-10-09 18:56:13 UTC

Post by The Thanh Han

the fonts are TrueType. pdftex can only "merge" Type1 (and Type1C) fonts.

Is it not even possible to just replace one font with an identical one (and
drop the replaced font)?
My fonts are fully embedded, so they should be exactly the same..?

Best regards,
Maarten
