[pdftex] pdftex core dump when including certain pdf files

Discussion:

Karl Berry

2016-07-11 22:26:41 UTC

https://sourceforge.net/p/pyx/mailman/message/35118842/

Thanks for the report. I (or perhaps someone else, if we're lucky :)
will fix it for the next release. -k

KAKUTO, Akira

2016-07-11 23:33:58 UTC

Hi Karl,

> https://sourceforge.net/p/pyx/mailman/message/35118842/

Can you reproduce the error?
I cannot reproduce the crash on Windows.
Please see the attached test.tar.gz.

Karl Berry

2016-07-14 21:28:41 UTC

Hi Andre - as Akira mentioned (thanks Akira), though maybe you didn't
see his reply, including your PDF with the floating-point StemV values
doesn't crash with the pdftex in TeX Live 2016, which is version
3.14159265-2.6-1.40.17, or any version back through 2010,
3.1415926-1.40.11. I stopped looking there.

This is with the original pdftex distributed with TeX Live, the
i386-linux binary. Therefore I surmise this is probably a bug in some
way that Ubuntu (or more likely Debian) has reconfigured/recompiled the
program. E.g., something in poppler vs TL's "libxpdf".

I suppose it is possible that it is something about i386- vs. x86_64-,
assuming you are on 64-bit, but that seems less likely.

So ... I cc Norbert. Norbert, does the attached tryplot.tex (which
merely includes the attached plot.pdf) crash with Debian pdftex? The
crucial thing is the non-integer value of StemV in plot.pdf. (The
original report leading to the thread here is at
https://sourceforge.net/p/pyx/mailman/message/35118842/, FWIW. I
simplified the report to use plain TeX instead of LaTeX.) --karl

Norbert Preining

2016-07-15 00:13:30 UTC

Hi Karl, hi all,

> So ... I cc Norbert. Norbert, does the attached tryplot.tex (which
> merely includes the attached plot.pdf) crash with Debian pdftex? The

Yes, it does.

The core dump backtrace is:
#0 0x00007f7b5a7611c8 in __GI_raise (sig=***@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f7b5a76264a in __GI_abort () at abort.c:89
#2 0x0000000000483a42 in Object::getDict (this=<optimized out>,
    this=<optimized out>) at /usr/include/poppler/Object.h:217
#3 write_epdf () at ../../../texk/web2c/pdftexdir/pdftoepdf.cc:868
#4 0x0000000000471d13 in writeimage (img=<optimized out>)
    at ../../../texk/web2c/pdftexdir/writeimg.c:382
#5 0x0000000000453d59 in zpdfwriteimage (n=<optimized out>) at pdftex0.c:22285
#6 0x00000000004673f5 in maincontrol () at pdftex0.c:38916
#7 0x00000000004183d9 in mainbody () at pdftexini.c:5656
#8 0x000000000040473e in main (ac=<optimized out>, av=<optimized out>)
    at ../../../texk/web2c/lib/texmfmp.c:1013

So I guess that should be forwarded to the poppler people.

> crucial thing is the non-integer value of StemV in plot.pdf. (The

Do you think this information is enough for the poppler devs to
provide a fix?

All the best

Norbert

--
PREINING Norbert + TeX Live & Debian Developer + http://www.preining.info
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13

Norbert Preining

2016-07-15 02:20:10 UTC

Hi all,

a few things I have dug up and tried:
* compiling --without-system-xpdf (thus using the TL xpdf lib for pdftex
  and xetex) makes the error go away
* I can also confirm that manually changing StemV to an integer
  value fixes the problem (uncompress the PDF, change, load), also
  for poppler.

There is one thing that makes me wonder ...

The error message here is:
Call to Object where the object was type 2, not the expected type 1

> #2 0x0000000000483a42 in Object::getDict (this=<optimized out>,
> this=<optimized out>) at /usr/include/poppler/Object.h:217

That line is
    Dict *getDict() { OBJECT_TYPE_CHECK(objDict); return dict; }
where the error comes from OBJECT_TYPE_CHECK:
    #define OBJECT_TYPE_CHECK(wanted_type) \
        if (unlikely(type != wanted_type)) { \
            error(errInternal, 0, "Call to Object where the object was type {0:d}, " \
                  "not the expected type {1:d}", type, wanted_type); \
            abort(); \
        }

The types look related to what Karl said about StemV:
    enum ObjType {
        // simple objects
        objBool, // boolean
        objInt, // integer
        objReal, // real
        ..

So a 2 (real) was passed in where a 1 (int) was expected.

I just don't see how this happens to come from that line, as
objDict
is far below, 7 in the enum.
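
The type numbers in the message (2 = real, 1 = int) fit a strict integer accessor being called on a real-valued object. A minimal sketch of that pattern follows. MiniObject and its members are hypothetical stand-ins, not poppler's actual Object class, and the sketch throws an exception where the real macro calls abort():

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a typed PDF object; NOT poppler's Object class.
// The numeric values mirror the start of the enum quoted above.
enum class ObjType { Bool = 0, Int = 1, Real = 2 };

struct MiniObject {
    ObjType type;
    int intValue;
    double realValue;

    static MiniObject makeInt(int v) { return MiniObject{ObjType::Int, v, 0.0}; }
    static MiniObject makeReal(double v) { return MiniObject{ObjType::Real, 0, v}; }

    // Strict accessor: valid only for integer objects.
    // (Throws here; the quoted macro reports an error and calls abort().)
    int getInt() const {
        if (type != ObjType::Int)
            throw std::logic_error("object is not an integer");
        return intValue;
    }

    // Lenient accessor: accepts integer and real objects alike.
    double getNum() const {
        if (type == ObjType::Int) return intValue;
        if (type == ObjType::Real) return realValue;
        throw std::logic_error("object is not a number");
    }
};
```

With a StemV of 69.5 stored as a real, getInt() fails the type check while getNum() returns 69.5. This is only an illustration of the check itself and does not explain why the backtrace names getDict().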

OK, that's it for now

Norbert

KAKUTO, Akira

2016-07-15 05:04:03 UTC

Hi Norbert,

> That line is
> Dict *getDict() { OBJECT_TYPE_CHECK(objDict); return dict; }
> where the error comes from OBJECT_TYPE_CHECK:

Many thanks.
If we remove
OBJECT_TYPE_CHECK(objInt);
in getInt(), as in xpdf, the crash disappears.
But this may not be a desirable fix.
See the attached Object.h.diff.

Note that Object.h.r40068 is
texlive/trunk/Build/source/libs/poppler/poppler-src/poppler/Object.h
and it is slightly modified by Peter compared with the
original one in poppler 0.45.0.

Best,
Akira

KAKUTO, Akira

2016-07-15 05:51:20 UTC

Hi Norbert,

> But this may not be a desirable fix.

If we use getNum() instead of getInt() in pdftoepdf.cc,
tryplot.pdf is created successfully even if we use poppler.
But I don't know whether the change is harmless or not.
See the attached pdftoepdf.cc.diff.

Best,
Akira

Ross Moore

2016-07-15 07:34:25 UTC

Hi all,

On Jul 15, 2016, at 3:51 PM, KAKUTO, Akira <***@fuk.kindai.ac.jp> wrote:

> Hi Norbert,
>
>> But this may not be a desirable fix.
>
> If we use getNum() instead of getInt() in pdftoepdf.cc,
> tryplot.pdf is created successfully even if we use poppler.
> But I don't know the change is harmless or not.
> See an attached pdftoepdf.cc.diff.

Surely it is the right thing to be doing.
The PDF spec says StemV is a number, so it doesn't require an integer.

Key        Type    Value

Leading    number  (Optional) The spacing between baselines of
                   consecutive lines of text. Default value: 0.

CapHeight  number  (Required for fonts that have Latin characters,
                   except for Type 3 fonts) The vertical coordinate of
                   the top of flat capital letters, measured from the
                   baseline.

XHeight    number  (Optional) The font's x height: the vertical
                   coordinate of the top of flat nonascending lowercase
                   letters (like the letter x), measured from the
                   baseline, in fonts that have Latin characters.
                   Default value: 0.

StemV      number  (Required, except for Type 3 fonts) The thickness,
                   measured horizontally, of the dominant vertical
                   stems of glyphs in the font.

StemH      number  (Optional) The thickness, measured vertically, of
                   the dominant horizontal stems of glyphs in the font.
                   Default value: 0.

(from the bottom of page 282 of
"Document management - Portable document format - Part 1: PDF 1.7")

> Best,
> Akira
> <pdftoepdf.cc.diff>

Cheers

Ross

Dr Ross Moore

Mathematics Dept | Level 2, S2.638 AHH
Macquarie University, NSW 2109, Australia

T: +61 2 9850 8955 | F: +61 2 9850 8114
M: +61 407 288 255 | E: ***@mq.edu.au

http://www.maths.mq.edu.au

CRICOS Provider Number 00002J. Think before you print.
Please consider the environment before printing this email.

This message is intended for the addressee named and may
contain confidential information. If you are not the intended
recipient, please delete it and notify the sender. Views expressed
in this message are those of the individual sender, and are not
necessarily the views of Macquarie University.

Norbert Preining

2016-07-15 11:52:44 UTC

On Fri, 15 Jul 2016, KAKUTO, Akira wrote:
> - fd = epdf_create_fontdescriptor(fontmap, stemV->getInt());
> + fd = epdf_create_fontdescriptor(fontmap, stemV->getNum());

Hmm, but
fd_entry *epdf_create_fontdescriptor(fm_entry * fm, int stemV)
expects an int, which is then transferred into
fd->font_dim[STEMV_CODE].val = stemV;
the font dim, all of which are integers.

So unfortunately, this is a no go, too.

According to the specification Ross sent, all these numbers are allowed
to be floats, but are treated as ints here.

Norbert

KAKUTO, Akira

2016-07-15 21:53:26 UTC

Hi Norbert,

> Hmm, but
> fd_entry *epdf_create_fontdescriptor(fm_entry * fm, int stemV)
> expects and int, and is then transfered into
> fd->font_dim[STEMV_CODE].val = stemV;
> the font dim, all of which are integers.

Changing struct fd_entry_ may be dangerous.
So I think a relatively safe fix for poppler is
fd = epdf_create_fontdescriptor(fontmap, (int)stemV->getNum());
if non-integer stemV is allowed.

Best,
Akira

Norbert Preining

2016-07-16 01:56:05 UTC

Hi Akira,

> fd = epdf_create_fontdescriptor(fontmap, (int)stemV->getNum());
> if non-integer stemV is allowed.

What about (int)(stemV->getNum() + 0.5) (for positive values of stemV)
to get the nearest integer?
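
The difference between the plain cast, the "+ 0.5" variant and a library rounding call can be checked with a small sketch. The function names below are made up for illustration and are not part of pdftex or web2c; std::lround is used only as a standard-library reference point for rounding to nearest:

```cpp
#include <cassert>
#include <cmath>

// Plain cast: truncates toward zero, so 69.7 becomes 69.
int truncate_to_int(double x) { return static_cast<int>(x); }

// Add-half-then-cast: rounds to nearest, but only for non-negative x,
// because the cast still truncates toward zero.
int round_add_half(double x) { return static_cast<int>(x + 0.5); }

// Standard-library rounding: nearest integer, halfway cases away from zero,
// correct for negative values as well.
int round_nearest(double x) { return static_cast<int>(std::lround(x)); }
```

For a positive StemV such as 69.7, the last two functions agree on 70. They differ only for negative inputs, which is the caveat stated in the message.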

Norbert

KAKUTO, Akira

2016-07-16 07:29:50 UTC

Hi Norbert,

> What about (int)(stemV->getNum() + 0.5) (for positive values of stemV)
> to get closest round integer?

I think that is better than mine. Thank you.

Akira

The Thanh Han

2016-07-16 10:49:03 UTC

Hi,

On 16 July 2016 at 09:29, KAKUTO, Akira <***@fuk.kindai.ac.jp> wrote:

> Hi Norbert,
>
> > What about (int)(stemV->getNum() + 0.5) (for positive values of stemV)
> > to get closest round integer?
>
> I think that is better than mine. Thank you.
>
> Akira
>

how about

fd = epdf_create_fontdescriptor(fontmap, round(stemV->getInt()));

round() is a macro defined by web2c which does what it says.

Regards,
Thanh

The Thanh Han

2016-07-16 10:52:26 UTC

On 16 July 2016 at 12:49, The Thanh Han <***@gmail.com> wrote:

> Hi,
>
> On 16 July 2016 at 09:29, KAKUTO, Akira <***@fuk.kindai.ac.jp> wrote:
>
>> Hi Norbert,
>>
>> > What about (int)(stemV->getNum() + 0.5) (for positive values of stemV)
>> > to get closest round integer?
>>
>> I think that is better than mine. Thank you.
>>
>> Akira
>>
>
> how about
>
> fd = epdf_create_fontdescriptor(fontmap, round(stemV->getInt()));
>
> round() is a macro defined by web2c which does what it says.
>
>
>

sorry, I meant:

fd = epdf_create_fontdescriptor(fontmap, round(stemV->getNum()));

Regards,
Thanh

Norbert Preining

2016-07-16 10:57:23 UTC

Hi all

> fd = epdf_create_fontdescriptor(fontmap, round(stemV->getNum()));

Sounds excellent.

What about the other values? As far as I understood Ross's email, many of the other values can be real, too.

Norbert

Ross Moore

2016-07-16 22:46:35 UTC

Hi all,

I was writing this before Karl's message arrived.
But I'll send it anyway.

On Jul 16, 2016, at 8:57 PM, Norbert Preining <***@logic.at> wrote:

> Hi all
>
>> fd = epdf_create_fontdescriptor(fontmap, round(stemV->getNum()));
>
> Sounds excellent.
>
> What about the other values? As far I understood Ross's email, many of the other values can be real, too.
>
> Norbert

With all due respect, why do anything that changes the number
that has been read from a font file?

Surely all (modulo a case discussed below) pdfTeX does with stemV is to build
the font-descriptor dictionary, which is written into the PDF file as a string.

So by all means read the stemV value from a font file as a number (integer or float)
but then immediately convert it into its string representation, which is how it should
be written out again, preserving the accuracy that was originally supplied.

The problem case is surely when there is no stemV supplied in the font,
as with TTF fonts, but it must be included in the font-descriptor to get valid PDF.
See the following conversation:

https://stackoverflow.com/questions/35485179/stemv-value-of-the-truetype-font

If no stemV value is supplied, then one must be created;
e.g. estimate it using the width of a non-serifed `I' in the font family.
If the font is italic, multiply by the cosine of the italic angle.
Or produce some other approximate heuristic, as a truly exact value is rarely required.
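
The heuristic described above can be written down in a few lines. This is only a sketch of the idea as stated in the message: estimate_stemv and its parameters are hypothetical names, and the stroke width is assumed to be supplied by the caller in glyph-space units.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: estimate StemV from the measured width of a
// non-serifed 'I' stroke, corrected by the cosine of the italic angle
// (given in degrees, as in the font descriptor's ItalicAngle entry).
double estimate_stemv(double stroke_width, double italic_angle_deg) {
    const double pi = std::acos(-1.0);
    return stroke_width * std::cos(italic_angle_deg * pi / 180.0);
}
```

For an upright font (angle 0) the estimate is the stroke width itself. The sign of the angle does not matter, since the cosine is an even function.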

Something of this kind is going on here, isn't it?

writefont.c- fd->font_dim[DESCENT_CODE].val = i < 0 ? i : 0;
writefont.c: fd->font_dim[STEMV_CODE].val =
writefont.c- dividescaled(getcharwidth(f, '.') / 3, pdffontsize[f], 3);
writefont.c- fd->font_dim[XHEIGHT_CODE].val =
writefont.c- dividescaled(getxheight(f), pdffontsize[f], 3);

but the result is an integer.

If such a calculation were to produce a float, then approximate/truncate it in some fashion
and convert to a string for inclusion in the font-descriptor.
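
A sketch of the "keep it as a string" idea follows. The helper name format_pdf_number and the choice of four decimal places are assumptions made for illustration, not pdftex code. PDF numbers are written in plain decimal notation without an exponent, so a fixed-point format with trailing zeros trimmed is one straightforward option:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format a font-descriptor value as a PDF number.
// Uses fixed notation (no exponent); the 4-digit precision is an assumption.
std::string format_pdf_number(double value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.4f", value);
    std::string s(buf);
    // Trim trailing zeros, then a dangling decimal point ("70.0000" -> "70").
    s.erase(s.find_last_not_of('0') + 1);
    if (!s.empty() && s.back() == '.') s.pop_back();
    if (s == "-0") s = "0";  // tiny negative values that round to zero
    return s;
}
```

An integer StemV such as 70 is written unchanged as "70", while 69.5 stays "69.5" instead of being rounded.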

It is clear from the following link that other software treats stemV as non-integer:

https://documentation.devexpress.com/#CoreLibraries/DevExpressPdfPdfFontDescriptor_StemVtopic

Indeed here it is declared as a double !!

Hope this helps,

Ross

Akira Kakuto

2016-07-16 13:50:02 UTC

Dear Thanh,

> fd = epdf_create_fontdescriptor(fontmap, round(stemV->getNum()));

I have checked in the changed pdftoepdf.cc (r41713), using
zround(stemV->getNum()), since it seems simpler to use the
original function in web2c/lib than to include cpascal.h.

Best,
Akira

Norbert Preining

2016-07-16 15:35:47 UTC

> zround(stemV->getNum()) since it seems simpler to use the

Thanks.

Norbert

Ross Moore

2016-07-15 02:21:00 UTC

Hi Karl, Norbert, all.

On Jul 15, 2016, at 10:13 AM, Norbert Preining <***@logic.at> wrote:

> Hi Karl, hi all,
>
>> So ... I cc Norbert. Norbert, does the attached tryplot.tex (which
>> merely includes the attached plot.pdf) crash with Debian pdftex? The
>
> Yes, it does.
>
> The core dump backtrace is:
> #0 0x00007f7b5a7611c8 in __GI_raise (sig=***@entry=6)
>     at ../sysdeps/unix/sysv/linux/raise.c:54
> #1 0x00007f7b5a76264a in __GI_abort () at abort.c:89
> #2 0x0000000000483a42 in Object::getDict (this=<optimized out>,
>     this=<optimized out>) at /usr/include/poppler/Object.h:217
> #3 write_epdf () at ../../../texk/web2c/pdftexdir/pdftoepdf.cc:868
> #4 0x0000000000471d13 in writeimage (img=<optimized out>)
>     at ../../../texk/web2c/pdftexdir/writeimg.c:382
> #5 0x0000000000453d59 in zpdfwriteimage (n=<optimized out>) at pdftex0.c:22285
> #6 0x00000000004673f5 in maincontrol () at pdftex0.c:38916
> #7 0x00000000004183d9 in mainbody () at pdftexini.c:5656
> #8 0x000000000040473e in main (ac=<optimized out>, av=<optimized out>)
>     at ../../../texk/web2c/lib/texmfmp.c:1013
>
> SO I guess that should be forwarded to the poppler people.
>
>> crucial thing is the non-integer value of StemV in plot.pdf. (The

Not according to Acrobat/Preflight.
There is (also ?) an /ItalicAngle key missing in the font. (see image)

[attached image: Screen Shot 2016-07-15 at 12.16.32 PM.png]

I don't know whether that is enough to trigger a core dump;
but it certainly is something that should be fixed.

> Do you guess that this information enough for poppler devs to
> provide a fix?
>
> All the best
>
> Norbert

Cheers

Ross

Ross Moore

2016-07-15 03:06:01 UTC

Hi again.

On Jul 15, 2016, at 12:21 PM, Ross Moore <***@mq.edu.au> wrote:

> Not according to Acrobat/Preflight.
> There is (also ?) an /ItalicAngle key missing in the font. (see image)
>
> <Screen Shot 2016-07-15 at 12.16.32 PM.png>
>
> I don't know whether that is enough to trigger a core dump;
> but it certainly is something that should be fixed.

More precise info:

Andre Wobst

2016-07-15 15:53:35 UTC

Hi Ross,

On Fri, Jul 15, 2016 at 03:06:01AM +0000, Ross Moore wrote:
> Not according to Acrobat/Preflight.
> There is (also ?) an /ItalicAngle key missing in the font. (see image)

thanks for the report. Turns out that the ItalicAngle*s* typo has nothing to do
with pdftex, but originates from PyX, where the (unrelated) bug of non-integer
StemV values in pdftex discussed in this thread was observed.

Thanks for the details shown in your analysis. I wasn't aware of this nice
Acrobat feature identifying such issues in PDF files. Good to know, really!
While I'm constantly using Acrobat Preflight to analyse and sometimes fix
various PDF files I collect elsewhere, I just didn't know this report type.

I've just fixed it in PyX (https://sourceforge.net/p/pyx/code/3683/).

Best,

André

--
by _ _ _ Dr. André Wobst, Amselweg 22, 85716 Unterschleißheim
/ \ \ / ) ***@wobsta.de, http://www.wobsta.de/
/ _ \ \/\/ / PyX - High quality PostScript and PDF figures
(_/ \_)_/\_/ with Python & TeX: visit http://pyx.sourceforge.net/

Norbert Preining

2016-07-16 01:59:41 UTC

Hi Andre,

> with pdftex, but originates from PyX, where the (unrelated) bug of non-integer
> StemV values in pdftex discussed in this thread was observed.

Was the plot.pdf also generated by PyX?

> I've just fixed it in PyX (https://sourceforge.net/p/pyx/code/3683/).

I checked the file and see
    if self.dfWeight >= 600:
        stemv = 120
    else:
        stemv = 70
    file.write("/StemV %d\n" % stemv)
so PyX should create only integer-valued StemV. How was the original
plot.pdf created, then?

Norbert

Norbert Preining

2016-07-15 07:56:05 UTC

Hi Akira,

On Fri, 15 Jul 2016, KAKUTO, Akira wrote:
> If we remove
> OBJECT_TYPE_CHECK(objInt);
> in getInt(), as in xpdf, the crash disappears.

Hmmm, not good ...

> - fd = epdf_create_fontdescriptor(fontmap, stemV->getInt());
> + fd = epdf_create_fontdescriptor(fontmap, stemV->getNum());

This is the correct fix! If stemV might be a non-integer, this
is the correct one!

Norbert

Karl Berry

2016-07-16 22:33:29 UTC

- fd = epdf_create_fontdescriptor(fontmap, stemV->getInt());
+ fd = epdf_create_fontdescriptor(fontmap, zround(stemV->getNum()));

Ok. StemV is the only parameter that is treated specially in this way.
(Thanh: why? E.g., why nothing similar for StemH? Just trying to
understand ...)

But then there are the other PDF font descriptor parameters to consider.
As Ross said, FontWeight, ItalicAngle, Ascent, Descent, Leading,
CapHeight, XHeight, StemH, AvgWidth, MaxWidth, MissingWidth are also all
"numbers" (float).

In ptexlib.h, struct fd_entry_ represents them (the ones it cares
about) with intparm font_dim[FONT_KEYS_NUM]; if I'm reading the code
correctly. It is not clear to me if that merely means they get
truncated at some point, or something worse happens.

Norbert, could you easily remake your modified font with floating-point
values for all the above and see if pdftex (ideally with or without
poppler) crashes, and/or if the result is reasonable? And post the
modified font here.

If you're tied up for now, that's fine, of course. One of us should get
to it eventually, though.

In principle, we should clearly represent them in floating-point or
fixed-point, not integers, but I can't say I'm enthused about spending
time on that.

--thanks, karl.

Ross Moore

2016-07-16 23:20:02 UTC

Hi Karl, and others,

On Jul 17, 2016, at 8:33 AM, Karl Berry <***@freefriends.org> wrote:

> - fd = epdf_create_fontdescriptor(fontmap, stemV->getInt());
> + fd = epdf_create_fontdescriptor(fontmap, zround(stemV->getNum()));
>
> Ok. StemV is the only parameter that is treated specially in this way.
> (Thanh: why? E.g., why nothing similar for StemH? Just trying to
> understand ...)

I just looked up stemV in PDFSpec V1.3, dated 1999.

> But then there are the other PDF font descriptor parameters to consider.
> As Ross said, FontWeight, ItalicAngle, Ascent, Descent, Leading,
> CapHeight, XHeight, StemH, AvgWidth, MaxWidth, MissingWidth are also all
> "numbers" (float).

All these parameters *were* integers back then.

Subsequently this has changed, but pdfTeX has not changed with it.
I'll check further to determine in which version of PDF it actually changed.

> In principle, we should clearly represent them in floating-point or
> fixed-point, not integers, but I can't say I'm enthused about spending
> time on that.

As Adobe and the ISO Technical Committee are finalising the specs for PDF 2.0,
and other PDF standards, it would be really great if it could be stated that pdfTeX
produced fully-conforming PDF files, for at least some of such specifications.

Finding and fixing all such variable declarations is surely absolutely necessary
to be able to achieve anything along these lines.

> --thanks, karl.

Cheers,

Ross

Norbert Preining

2016-07-16 23:55:26 UTC

On Sat, 16 Jul 2016, Karl Berry wrote:
> As Ross said, FontWeight, ItalicAngle, Ascent, Descent, Leading,
> CapHeight, XHeight, StemH, AvgWidth, MaxWidth, MissingWidth are also all
> "numbers" (float).
>
> Norbert, could you easily remake your modified font with floating-point
> values for all the above and see if pdftex (ideally with or without

With poppler, only StemV fails; none of the other values cause any problem.

I simply uncompressed the plot.pdf and edited the 5 0 obj:
<</Type /FontDescriptor/FontName /TeX-mathx10/Flags 4/FontBBox [ 0 0 611 31 ]/ItalicAngle 86.246673/Ascent 31.5/Descent 0.78/CapHeight 0.44/XHeight 12.77/AvgWidth 24.55/MaxWidth 26.77/MissingWidth 21.44/StemH 45.44/StemV 322/FontFile 4 0 R>>

(random values) and compilation with pdftex/poppler (Debian variant)
worked as long as StemV was integer.
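
To see at a glance which entries of such a hand-edited descriptor carry non-integer values, a small scanner can help. The following is a rough sketch only: keys_with_real_values is a hypothetical helper that handles a flat, uncompressed dictionary string like the one above and is in no way a general PDF parser.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: return the keys of a flat PDF dictionary string whose
// value is a single numeric token containing a decimal point (a "real").
std::vector<std::string> keys_with_real_values(const std::string& dict) {
    std::vector<std::string> result;
    const std::size_t n = dict.size();
    auto is_space = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };
    auto is_delim = [&](char c) {
        return is_space(c) || c == '/' || c == '[' || c == ']' ||
               c == '<' || c == '>';
    };
    std::size_t i = 0;
    while (i < n) {
        if (dict[i] != '/') { ++i; continue; }
        // Read the name that follows the slash.
        std::size_t start = ++i;
        while (i < n && !is_delim(dict[i])) ++i;
        std::string key = dict.substr(start, i - start);
        while (i < n && is_space(dict[i])) ++i;
        // Try to read one numeric token directly after the name.
        bool has_digit = false, has_dot = false;
        if (i < n && (dict[i] == '+' || dict[i] == '-')) ++i;
        while (i < n && (std::isdigit(static_cast<unsigned char>(dict[i])) ||
                         dict[i] == '.')) {
            if (dict[i] == '.') has_dot = true; else has_digit = true;
            ++i;
        }
        bool token_ended = (i == n) || is_delim(dict[i]);
        if (has_digit && has_dot && token_ended) result.push_back(key);
    }
    return result;
}
```

Applied to the descriptor quoted above, it would list ItalicAngle through StemH but not StemV, matching the observation that the file compiles as long as StemV is an integer.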

Norbert

Karl Berry

2016-07-16 23:25:35 UTC

    All these parameters *were* integers back then.

Fine, but irrelevant to my question to Thanh. There are special cases
throughout the pdftex source code for StemV, but no other font
parameter. That was my question; not about real vs. int.

    produced fully-conforming PDF files

The PDF we write is conforming if it writes integers, as far as I can
see. The crucial thing from my point of view is to do something
reasonable when reading floating point, i.e., not crash.

As for properly preserving floating point values in all of these values,
I'm sure we all agree that is desirable. Patches welcome. -k

The Thanh Han

2016-07-17 09:41:31 UTC

On 17 July 2016 at 01:25, Karl Berry <***@freefriends.org> wrote:

> All these parameters *were* integers back then.
>
> Fine, but irrelevant to my question to Thanh. There are special cases
> throughout the pdftex source code for StemV, but no other font
> parameter. That was my question; not about real vs. int.
>
> produced fully-conforming PDF files
>
> The PDF we write is conforming if it writes integers, as far as I can
> see. The crucial thing from my point of view is to do something
> reasonable when reading floating point, i.e., not crash.
>
> As for properly preserving floating point values in all of these vluaes,
> I'm sure we all agree that is desirable. Patches welcome. -k
>

the real question one should ask is why StemV is required, since it's
already included in the font itself.

Thanh

Ross Moore

2016-07-18 01:01:25 UTC

Hi Thanh,

On Jul 17, 2016, at 7:41 PM, The Thanh Han <***@gmail.com> wrote:

> the real question one should ask is why StemV is required, since it's already included in the font itself.

I think I can explain that.

For a Type 1 font, StemV is the hinting parameter StdVW.
pdfTeX already knows this:
ptexlib.h:363: , {"StemV", "StdVW"}

http://partners.adobe.com/public/developer/en/font/T1_SPEC.PDF

PDF uses the Font Descriptor dictionary in several ways.

#1. Firstly, if the font program itself is *not* present, then it can substitute
something that looks similar, having similar widths in vertical strokes.

#2. When the font *is* available, there are still issues; especially at
low resolution. Then the "hinting" is used in choosing which pixels to
include when following the outline description.

#3. Even at high resolution, there is a possible use, related to #1.
The time required to produce the full hi-res image of a portion of
a page may be quite large (a few seconds, say).
Instead of presenting a blank page until it's ready, a quick mock-up
is first shown, perhaps using a substitute font with cached letter forms.
This is later replaced with the full image, when ready.

I cannot find any documentation to justify that #3 is used in any specific renderer,
but I've certainly seen rough images appear prior to the full hi-res. version.
Note that the /Flags parameter conveys information about characteristics
of the script that the font implements, so is used in the selection of substitute fonts.

Here's an extract from the PDF spec, which certainly refers to usage #1:

9.8 Font Descriptors

9.8.1 General

A font descriptor specifies metrics and other attributes of a simple font or a CIDFont as a whole, as distinct from the metrics of individual glyphs. These font metrics provide information that enables a conforming reader to synthesize a substitute font or select a similar font when the font program is unavailable. The font descriptor may also be used to embed the font program in the PDF file.

Hope this helps.

Ross

The Thanh Han

2016-07-18 19:52:42 UTC

Hi Ross,

On 18 July 2016 at 03:01, Ross Moore <***@mq.edu.au> wrote:

> Hi Thanh,
>
> On Jul 17, 2016, at 7:41 PM, The Thanh Han <***@gmail.com> wrote:
>
>
> the real question one should ask is why StemV is required, since it's
> already included in the font itself.
>
>
> I think I can explain that.
>
> For a Type 1 font, StemV is the Hinting parameter StdVW .
> pdfTeX already knows this:
> ptexlib.h:363: , {"StemV", "StdVW"}
>
> http://partners.adobe.com/public/developer/en/font/T1_SPEC.PDF
>
>
> PDF uses the Font Descriptor dictionary in several ways.
>
> #1. Firstly, if the font program itself is *not* present, then it can
> substitute
> something that looks similar, having similar widths in vertical
> strokes.
>
> #2. When the font *is* available, there are still issues; especially at
> low resolution. Then the "hinting" is used in choosing which pixels
> to
> include when following the outline description.
>
> #3. Even at high resolution, there is a possible use, related to #1.
> The time required to produce the full hi-res image of a portion of
> a page may be quite large (a few seconds, say).
> Instead of presenting a blank page until it's ready, a quick mock-up
> is first shown, perhaps using a substitute font with cached letter
> forms.
> This is later replaced with the full image, when ready.
>
> I cannot find any documentation to justify that #3 is used in any specific
> renderer,
> but I've certainly seen rough images appear prior to the full hi-res.
> version.
> Note that the /Flags parameter conveys information about characteristics
> of the script that the font implements, so is used in the selection of
> substitute fonts.
>
>
> Here's an extract from the PDF Spec. which certainly refers to usage #1;
>
>
> 9.8 Font Descriptors
>
> 9.8.1 General
>
> A font descriptor specifies metrics and other attributes of a simple font
> or a CIDFont as a whole, as distinct from the metrics of individual glyphs.
> These font metrics provide information that enables a conforming reader to
> synthesize a substitute font or select a similar font when the font program
> is unavailable. The font descriptor may also be used to embed the font
> program in the PDF file.
>
>
>
>

thanks for the excellent explanation.

I was aware of #1, i.e. PDF browsers might make use of the info in the font
descriptor to mimic the font. pdftex supports creating PDFs without
embedding the fonts, too. However, I think those PDFs without embedded fonts
are rather useless.

Back to the original issue: I agree it makes more sense to copy StemV as
a real number, not an integer.

Regards,
Thanh

Werner LEMBERG

2016-07-18 20:25:37 UTC

> However I think those pdfs without embedded fonts are rather
> useless.

Not at all. The lilypond documentation has a case where PDFs without
fonts would be extremely valuable (we haven't implemented this yet,
however).

. For the documentation (in texinfo format), lilypond creates
thousands of small PDFs that get included into the main PDF. All
those files use the same small set of music fonts again and again.

. If the small PDFs don't contain fonts it would be possible that
pdftex includes the necessary fonts only once while building the
master PDF.

. If the small PDFs contain subsetted fonts, pdftex must include
them as-is, enormously increasing the output file size.

. We currently suppress creation of subfonts in the small PDFs to
get a small output file size. However, this leads to an
incredible waste of disk space while building the documentation
(we speak of a few gigabytes).

Werner

The Thanh Han

2016-07-19 08:40:50 UTC

On 18 July 2016 at 22:25, Werner LEMBERG <***@gnu.org> wrote:

>
> > However I think those pdfs without embedded fonts are rather
> > useless.
>
> Not at all. The lilypond documentation has a case where PDFs without
> fonts would be extremely valuable (we haven't implemented this yet,
> however).
>
> . For the documentation (in texinfo format), lilypond creates
> thousands of small PDFs that get included into the main PDF. All
> those files use the same small set of music fonts again and again.
>
> . If the small PDFs don't contain fonts it would be possible that
> pdftex includes the necessary fonts only once while building the
> master PDF.
>
> . If the small PDFs contain subsetted fonts, pdftex must include
> them as-is, enormously increasing the output file size.
>
> . We currently suppress creation of subfonts in the small PDFs to
> get a small output file size. However, this leads to an
> incredible waste of disk space while building the documentation
> (we speak of a few gigabytes).
>

can you provide a minimal example demonstrating the problem?

pdftex can merge Type1 subfonts in included pdfs.

Thanh

Werner LEMBERG

2016-07-19 16:48:54 UTC

> can you provide a minimal example demonstrating the problem?

I can't. We don't have something working yet.

> pdftex can merge Type1 subfonts in included pdfs.

Well, yes, but lilypond's music fonts are OpenType fonts...

For building the documentation, we use ghostscript's `.loadfont'
operator in the EPS files generated by lilypond. Unfortunately,
ghostscript doesn't support OTCs (OpenType Collections) yet – support
is planned according to the developers but it is not an important
issue for them currently.[*]

http://bugs.ghostscript.com/show_bug.cgi?id=696808

Regardless of this problem, the subsetted fonts generated by gs while
converting the EPS files to PDF can't be merged by pdftex (or xetex,
BTW). For example, lilypond's central documentation file, the
notation reference, contains around 3500 such subsetted fonts (coming
from a few hundred included PDF snippets), based on 48 `normal' fonts.

For the new method, instead of using `.loadfont', we want to convert
the used OpenType fonts to bare CFF resources, extracting them from
OTCs where necessary. TTFs are converted to Type 42 resources, and all
fonts are collected in a `resource directory'. The intermediate PDFs
would refer to the files in that directory, and the final output file
would eventually embed subsetted fonts derived from those resources.

Werner

[*] There is no official support of OTFs and OTCs in PostScript.

Reinhard Kotucha

2016-07-18 22:29:14 UTC

On 2016-07-18 at 21:52:42 +0200, The Thanh Han wrote:

> I was aware about #1, i.e. pdf browsers might make use of info in
> font descriptor to mimic the font.

I've never seen a PDF file where substitution of non-existing fonts
worked properly. Not even with Adobe software.

Using multiple master fonts in order to mimic non-existing fonts was
certainly a good idea but unfortunately Adobe decided to drop
development of multiple master fonts many years ago.

> pdftex supports creating pdfs without embedding the fonts,
> too. However I think those pdfs without embedded fonts are rather
> useless.

This is a different issue. If we can assume that a particular font is
ubiquitous on any system, it doesn't have to be embedded. In general
it's a bad idea not to embed all fonts. I encountered two problems in
the past:

1. TeX Live ships manual pages in PDF in order to make them available
to Windows users. They use only Times and the fonts are not
embedded. Well, we can assume that Times is ubiquitous. But one
day I wondered why the fonts were rendered extremely ugly.

The reason was that I bought a printer a few days before and
installed the fonts from the accompanying CD. If the fonts are
not embedded, anything can happen.

2. Japanese often don't embed fonts because all Japanese fonts have
the same metrics anyway. But all Japanese fonts support ASCII, and
Japanese are happily providing documents in English using
Japanese fonts but don't embed them. These (proprietary) fonts
are available on Windows but not on Unix, hence these documents
aren't as portable as the "Portable Document Format" implies.

I can only recommend to always embed all fonts in order to create
portable documents and, of course, all documents should be portable.

Werner, if file size matters it's certainly worthwhile to investigate
what ghostscript can do for you. This program is amazing. But whom
do I tell it?

Regards,
Reinhard

--
------------------------------------------------------------------
Reinhard Kotucha Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover mailto:***@web.de
------------------------------------------------------------------

Werner LEMBERG

2016-07-19 05:45:15 UTC

> Werner, if file size matters

It's not only file size! Accessing the disc for writing unnecessarily
large files is also time consuming.

Just in case I was unclear: The PDFs without fonts we would like to
generate (hopefully soon) are an intermediate step only; such PDFs are
not intended for actually being viewed but to speed up the creation of
a master PDF (with embedded fonts) that includes the font-less PDFs.

> it's certainly worthwhile to investigate what ghostscript can do for
> you.

I have no idea what you mean here, please elaborate. Note that
ghostscript is *not* capable to merge various subsetted fonts back to
a single one: too much information is already lost during the
subsetting process.

> This program is amazing. But whom do I tell it?

:-)

Werner

Ross Moore

2016-07-19 12:23:18 UTC

Hi all,

I'm fascinated by this application, and the various comments.

On 19/07/2016, at 21:38, "Werner LEMBERG" <***@gnu.org> wrote:

>
>> Werner, if file size matters
>
> It's not only file size! Accessing the disc for writing unnecessarily
> large files is also time consuming.

How often does this need to be done?
If only once to produce the final PDF, what is the problem?

>
> Just in case I was unclear: The PDFs without fonts we would like to
> generate (hopefully soon) are an intermediate step only;

Sure.
If only intermediate, why do they need to be valid PDFs?

> such PDFs are
> not intended for actually being viewed but to speed up the creation of
> a master PDF (with embedded fonts) that includes the font-less PDFs.

Same question.

>> it's certainly worthwhile to investigate what ghostscript can do for
>> you.
>
> I have no idea what you mean here, please elaborate.

Ghostscript can read individual files that add to the construction of a PostScript or PDF file.
Those files do not need to be complete documents. They just need to contain streams of graphic commands or other valid PostScript source, which add to what you want to show on a page.

PostScript is an incredibly powerful programming language, indeed it is the basis of PDF; but PDF excludes some of the programming aspects that make PostScript so powerful.
It is a perfect match for use with TeX, provided you avoid the potential for abuse.
Adobe's distiller deliberately excludes some of these features of PostScript, which are still usable with Ghostscript.

> Note that
> ghostscript is *not* capable to merge various subsetted fonts back to
> a single one: too much information is already lost during the
> subsetting process.

Do you believe this is a deliberate choice, or just accidental?
I've found the current maintainer to be very responsive to bug reports and feature requests.
So if this is a mistake, it's likely able to be fixed.

>
>> This program is amazing. But whom do I tell it?
>
> :-)

I agree that it is amazing.
Well, actually it is a very good implementation of an extremely good programming language.
What I find amazing is that, whereas other implementors have removed access to some of the potentially dangerous features of PostScript, these are retained in Ghostscript if you provide appropriate command-line options with your job.

>
>
> Werner

Hope this helps.

Ross

Werner LEMBERG

2016-07-19 17:13:21 UTC

>> It's not only file size! Accessing the disc for writing
>> unnecessarily large files is also time consuming.
>
> How often does this need to be done? If only once to produce the
> final PDF, what is the problem?

The intermediate files become much larger if they contain fonts that
are not subsetted. A lilypond developer has disabled subsetting for
testing purposes; doing so increases the necessary disk space by a
factor of around three, IIRC, which means that you then need more than
7 GByte for building the whole documentation, which is excessively
huge.

> If only intermediate, why do they need to be valid PDFs?

Well, all PDF snippets created by lilypond (and gs) are read by pdftex
(or xetex) to create the final texinfo file, so you need valid PDFs, I
guess.

>>> it's certainly worthwhile to investigate what ghostscript can do
>>> for you.
>
> Ghostscript can read individual files that add to the construction
> of a PostScript or PDF file. Those files do not need to be complete
> documents. They just need to contain streams of graphic commands or
> other valid PostScript source, which add to what you want to show on
> a page.

OK, but I don't see how you want me to do that.

> PostScript is an incredibly powerful programming language, [...]

Indeed, but the standard is no longer evolving, and support for
current font formats (OTF, OTC) is missing.

>> Note that ghostscript is *not* capable to merge various subsetted
>> fonts back to a single one: too much information is already lost
>> during the subsetting process.
>
> Do you believe this is a deliberate choice, or just accidental?

It's not a mistake, as far as I know, but indeed not possible.

>>> This program is amazing. But whom do I tell it?
>
> Well, actually it is a very good implementation of an extremely good
> programming language.

I think you refer to guile, which lilypond uses as a programmable
interface for almost everything. While the programming language might
be `extremely good', the support isn't. For a few years now we have been
unable to upgrade to the latest guile 2.x series because essential features have
changed in an incompatible way, and the new `replacements' don't work
as expected and/or as needed. This really threatens lilypond, because
virtually no application is still using guile 1.8, and GNU/Linux
distributions are going to throw out lilypond for this reason.

Werner

Reinhard Kotucha

2016-07-19 22:46:52 UTC

On 2016-07-19 at 07:45:15 +0200, Werner LEMBERG wrote:

> > Werner, if file size matters
>
> It's not only file size! Accessing the disc for writing unnecessarily
> large files is also time consuming.
>
> Just in case I was unclear: The PDFs without fonts we would like to
> generate (hopefully soon) are an intermediate step only; such PDFs are
> not intended for actually being viewed but to speed up the creation of
> a master PDF (with embedded fonts) that includes the font-less PDFs.
>
> > it's certainly worthwhile to investigate what ghostscript can do for
> > you.
>
> I have no idea what you mean here, please elaborate. Note that
> ghostscript is *not* capable to merge various subsetted fonts back to
> a single one: too much information is already lost during the
> subsetting process.

I must admit that I never tried what you intend to do. Thus I
suggested to investigate.

Did you try to merge different font subsets into one (subsetted fonts
already exist in the files) or did you try to convince Ghostscript to
insert missing fonts (PDF files contain only references to external
fonts e.g., /FontName, but no physical font, subsetted or not)?

In the latter case Ghostscript must be able to find the fonts. So you
have to create your own Fontmap or to steal it (and the dedicated
fonts) from TeX Live and add the LilyPond fonts yourself.

http://tug.org/svn/texlive/trunk/Master/tlpkg/tlgs

Please note that TL ships the original URW fonts maintained by Walter
Schmidt. The fonts shipped with Ghostscript are different and thus
not appropriate for substitution.
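
As a rough illustration of "creating your own Fontmap": a minimal Python sketch that writes entries in the `/Name (file) ;` and `/Alias /Name ;` forms used by Ghostscript's Fontmap files. The helper name fontmap_entries is made up for this example, the font and file names are only sample values, and the sketch assumes file names without parentheses. Whether a given LilyPond font can be loaded this way is not something the sketch checks.

```python
from typing import Dict, Optional

def fontmap_entries(fonts: Dict[str, str],
                    aliases: Optional[Dict[str, str]] = None) -> str:
    """Build Fontmap text: one line per font file, one line per alias."""
    lines = []
    for name, path in sorted(fonts.items()):
        # A font name mapped to the file that contains it.
        lines.append(f"/{name} ({path}) ;")
    for alias, target in sorted((aliases or {}).items()):
        # An alias mapped to another font name.
        lines.append(f"/{alias} /{target} ;")
    return "\n".join(lines) + "\n"

print(fontmap_entries({"NimbusRomNo9L-Regu": "n021003l.pfb"},
                      {"Times-Roman": "NimbusRomNo9L-Regu"}), end="")
```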

The actual inclusion of fonts is done with

ps2pdf -dPDFSETTINGS=/prepress in.pdf out.pdf

Did you try this? If you say

> too much information is already lost during the subsetting process

I assume that your PDF files already contain font subsets. But what I
have in mind is that you create PDF files which don't contain any
fonts at all but only the information which font (/FontName) should be
used.

Then Ghostscript can insert the fonts. And it certainly creates
subsets by default. AFAIK it even converts Type 1 fonts to CFF in
order to save space. I don't know ATM what Ghostscript does if
multiple files are merged which all have references to one and the
same font but I'm confident that it does something very useful.

Isn't this exactly what you want to achieve? Create zillions of files
which don't contain any fonts at all, merge them, and finally insert
the fonts in order to make the document portable?
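
The two stages (font-less intermediate files, fonts inserted at the end) can be sketched as a small wrapper around ps2pdf. Only options that appear in this thread are used: -dPDFSETTINGS=/prepress from this message and -dEmbedAllFonts=false from a later one. The function name and file names are hypothetical, and whether Ghostscript really finds and embeds the fonts depends on its Fontmap setup.

```python
import subprocess
from typing import List

def ps2pdf_command(src: str, dst: str, embed_fonts: bool = True) -> List[str]:
    """Build the ps2pdf argument list for one conversion step."""
    cmd = ["ps2pdf"]
    if embed_fonts:
        # Final step: ask Ghostscript to embed the fonts it can find.
        cmd.append("-dPDFSETTINGS=/prepress")
    else:
        # Intermediate step: leave the fonts out of the snippet.
        cmd.append("-dEmbedAllFonts=false")
    cmd += [src, dst]
    return cmd

if __name__ == "__main__":
    # Only prints the commands; uncomment the run() call to execute them.
    for step in (ps2pdf_command("snippet.pdf", "snippet-nofonts.pdf", embed_fonts=False),
                 ps2pdf_command("merged.pdf", "portable.pdf")):
        print(" ".join(step))
        # subprocess.run(step, check=True)
```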

I must admit that I don't know anything about LilyPond except that
musicians like it. Presumably it creates PostScript code and converts
it to PDF. Right?

Regards,
Reinhard

Werner LEMBERG

2016-07-20 06:05:14 UTC

> Did you try to merge different font subsets into one (subsetted fonts
> already exist in the files)

Yes, by including PDF snippets into a larger texinfo document created
by pdftex (or xetex). I tried then to execute

ps2pdf pdftex-output.pdf out.pdf

which indeed reduces the file size and the number of subsetted fonts
by 30%, but there is a problem that makes all links disappear... We
have to further investigate.

> or did you try to convince Ghostscript to insert missing fonts (PDF
> files contain only references to external fonts e.g., /FontName, but
> no physical font, subsetted or not)?

I'm not there yet, but this is the ultimate goal.

> In the latter case Ghostscript must be able to find the fonts.

Yes, I really hope that!

> [...] If you say
>
> > too much information is already lost during the subsetting process
>
> I assume that your PDF files already contain font subsets.

Yes, this is the current setup.

> But what I have in mind is that you create PDF files which don't
> contain any fonts at all but only the information which font
> (/FontName) should be used.

Exactly this is my plan also.

> Then Ghostscript can insert the fonts. And it certainly creates
> subsets by default. AFAIK it even converts Type 1 fonts to CFF in
> order to save space. I don't know ATM what Ghostscript does if
> multiple files are merged which all have references to one and the
> same font but I'm confident that it does something very useful.
>
> Isn't this exactly what you want to achieve? Create zillions of
> files which don't contain any fonts at all, merge them, and finally
> insert the fonts in order to make the document portable?

Yes.

> I must admit that I don't know anything about LilyPond except that
> musicians like it. Presumably it creates PostScript code and
> converts it to PDF. Right?

Yes. We have to modify lilypond to not embed the font resources into
the PS file but to collect them in a directory.

Werner

Reinhard Kotucha

2016-07-20 22:52:28 UTC

On 2016-07-20 at 08:05:14 +0200, Werner LEMBERG wrote:

> ps2pdf pdftex-output.pdf out.pdf
>
> which indeed reduces the file size and the number of subsetted fonts
> by 30%, but there is a problem that makes all links disappear... We
> have to further investigate.

Yes, the problem is that in a PDF file the contents of pages and links
are stored at different places. It's not straightforward to keep the
links. What are you doing in order to retain the links? I suppose
that LuaTeX's epdf library

> > In the latter case Ghostscript must be able to find the fonts.
>
> Yes, I really hope that!

It actually works. Some companies in Japan offer PDF datasheets which
have only references to Windows fonts but the fonts are not included.
Thus, when I view them on Unix, I only see the graphics but no text.

A few years ago some guys from Japan asked me to add a cidfmap file to
our Ghostscript distribution for Windows.

http://tug.org/svn/texlive/trunk/Master/tlpkg/tlgs/lib/cidfmap.TeXLive

Now I can make these files portable

ps2pdf -dPDFSETTINGS=/prepress ....

If this works we can expect that Ghostscript embeds fonts mentioned in
Fontmap[.TeXLive] too.

> Yes. We have to modify lilypond to not embed the font resources
> into the PS file but to collect them in a directory.

Sure, the fonts have to be copied to a directory where Ghostscript can
find them.

But for testing I wouldn't change the LilyPond sources. IMO it's
easier to write a tiny script which removes fonts from the PostScript
files.

Regards,
Reinhard

Norbert Preining

2016-07-21 01:15:22 UTC

Hi all,

> http://tug.org/svn/texlive/trunk/Master/tlpkg/tlgs/lib/cidfmap.TeXLive

This file is BTW quite obsolete; please see
https://www.preining.info/blog/software-projects/cjk-fonts-ghostscript/
for a script that is included in TeX Live (cjk-gs-integrate) for auto-generating
these files including the necessary links.

In the second half of the above page there are also lots of technical
details about how the fonts need to be set up, encodings etc.

The problem is that with newer MacOS, e.g., lots of fonts have moved into
TTC (TrueType Collections) and Ghostscript does not properly support them.
There are lots of problems with newer fonts, unfortunately.

All the best

Norbert

------------------------------------------------------------------------
PREINING, Norbert http://www.preining.info
JAIST, Japan TeX Live & Debian Developer
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13
------------------------------------------------------------------------

Reinhard Kotucha

2016-07-21 21:03:36 UTC

On 2016-07-21 at 10:15:22 +0900, Norbert Preining wrote:

> Hi all,
>
> > http://tug.org/svn/texlive/trunk/Master/tlpkg/tlgs/lib/cidfmap.TeXLive
>
> This file is BTW quite obsolete; please see
> https://www.preining.info/blog/software-projects/cjk-fonts-ghostscript/
> for a script that is included in TeX Live (cjk-gs-integrate) for auto-generating
> these files including the necessary links.
>
> On the above page there is in the second half also lots of technical
> details about how the fonts need to be set up, encodings etc.

Thanks for the link. I looked at it briefly and it indeed provides a
lot of information. I'll read it calmly next weekend.

> The problem is that with newer MacOS, e.g., lots of fonts have moved
> into TTC (TrueType Collections) and Ghostscript does not properly
> support them. There are lots of problems with newer fonts,
> unfortunately.

Are Ghostscript developers aware of these problems?

Regards,
Reinhard

Norbert Preining

2016-07-21 22:15:21 UTC

> Are Ghostscript developers aware of these problems?

I guess so, but they consider it low to very low severity; same with vertical writing using OTF fonts, as I mention on the web page.

I had some chat sessions with the devs, but it was not clear whether these things will ever be fixed.

Norbert

--
PREINING Norbert + TeX Live & Debian Developer + http://www.preining.info
GPG: 0x860CDC13 fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13

Werner LEMBERG

2016-07-21 05:08:27 UTC

> > ps2pdf pdftex-output.pdf out.pdf
> >
> > which indeed reduces the file size and the number of subsetted fonts
> > by 30%, but there is a problem that makes all links disappear... We
> > have to further investigate.
>
> Yes, the problem is that in a PDF file the contents of pages and
> links are stored at different places.

Well, it's not clear to me why normal PDF viewers display the original
input file correctly without complaints, but ghostscript creates a
crappy output file.

> It's not straightforward to keep the links. What are you doing in
> order to retain the links? I suppose that LuaTeX's epdf library

Perhaps a misunderstanding. The links in the input PDF file *are*
valid, and after the ps2pdf run they are not. I don't do anything
special with the links.

> Some companies in Japan offer PDF datasheets which have only
> references to Windows fonts but the fonts are not included. Thus,
> when I view them on Unix, I only see the graphics but no text.

Indeed, I have also seen such PDF files – many years ago.

So there's much testing ahead...

Werner

Reinhard Kotucha

2016-07-21 21:47:44 UTC

On 2016-07-21 at 07:08:27 +0200, Werner LEMBERG wrote:

>
> > > ps2pdf pdftex-output.pdf out.pdf
> > >
> > > which indeed reduces the file size and the number of subsetted
> > > fonts by 30%, but there is a problem that makes all links
> > > disappear... We have to further investigate.
> >
> > Yes, the problem is that in a PDF file the contents of pages and
> > links are stored at different places.
>
> Well, it's not clear to me why normal PDF viewers display the
> original input file correctly without complaints, but ghostscript
> creates a crappy output file.

Please send me an input file and tell me what you've done in order to
produce a (crappy) output file. I never encountered such problems.

> > It's not straightforward to keep the links. What are you doing in
> > order to retain the links? I suppose that LuaTeX's epdf library
>
> Perhaps a misunderstanding. The links in the input PDF file *are*
> valid, and after the ps2pdf run they are not. I don't do anything
> special with the links.

Certainly a misunderstanding. I expected this behavior and thought
that you have a solution. What we actually need is a way to extract
links from external files. Inserting them into the output file is
possible with pdftex and luatex, and the information can probably be
passed to Ghostscript via the pdfmark operator.
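
To make the pdfmark idea concrete: a minimal Python sketch that formats one URI link annotation as a pdfmark line, assuming the rectangle and target have already been extracted somehow (that extraction step is the open problem and is not shown). The helper names and the sample values are hypothetical.

```python
from typing import Tuple

def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PostScript/PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def link_pdfmark(rect: Tuple[float, float, float, float], uri: str) -> str:
    """Format a link annotation with a URI action as a single pdfmark line."""
    x1, y1, x2, y2 = rect
    return (f"[ /Rect [{x1} {y1} {x2} {y2}] /Border [0 0 0] "
            f"/Action << /Subtype /URI /URI ({escape_pdf_string(uri)}) >> "
            "/Subtype /Link /ANN pdfmark")

print(link_pdfmark((70, 550, 210, 575), "http://tug.org/"))
```

Such lines would go into the PostScript input of the page that carries the link; this has not been tested against the LilyPond documentation build.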

There is a program accompanied by a Perl script which can extract
links but I don't remember its name. And for the sake of portability
I prefer a luatex solution anyway.

Regards,
Reinhard

Werner LEMBERG

2016-07-22 04:34:47 UTC

> > Well, it's not clear to me why normal PDF viewers display the
> > original input file correctly without complaints, but ghostscript
> > creates a crappy output file.
>
> Please send me an input file and tell me what you've done in order
> to produce a (crappy) output file. I never encountered such
> problems.

Meanwhile, we've identified the cause: it was a bug in XeTeX support
of texinfo.tex, producing incomplete `pdf:dest' specials; it is fixed
now. So the complaint of ghostscript was correct – however, I still
consider it rude to remove all links. :-)

Werner

Reinhard Kotucha

2016-07-22 22:03:45 UTC

On 2016-07-22 at 06:34:47 +0200, Werner LEMBERG wrote:

> however, I still consider it rude to remove all links. :-)

Ghostscript doesn't remove links deliberately. If you combine
documents, preserving links is a non-trivial task.

Regards,
Reinhard

Ross Moore

2016-07-22 02:56:15 UTC

Hi Werner, Reinhard, Norbert

On Jul 20, 2016, at 4:05 PM, Werner LEMBERG <***@gnu.org> wrote:

>> Did you try to merge different font subsets into one (subsetted fonts
>> already exist in the files)
>
> Yes, by including PDF snippets into a larger texinfo document created
> by pdftex (or xetex). I tried then to execute
>
> ps2pdf pdftex-output.pdf out.pdf
>
> which indeed reduces the file size and the number of subsetted fonts
> by 30%, but there is a problem that makes all links disappear... We
> have to further investigate.

I’ve been thinking along the same lines, first trying to find a way
to generate PDFs using pdfTeX, having no fonts included.
I wasn’t able to achieve that.

However, I just ran a successful test with pdfTeX and GS as follows.

1. main document has some text in some font (I used CMR12).
It imports several copies of the same image which is a PDF
using the same font.

\includegraphics[… options…]{images/datadoc.pdf}

has a local file pdftex.map in the working directory.
This contains a line to include the whole font unsubsetted:

cmr12 CMR12 <<cmr12.pfb

2. image source named datadoc.tex -> datadoc.pdf
processed within a subdirectory images/
(so finds the standard pdftex.map )

3. after generating the image,
run:
ps2pdf -dEmbedAllFonts=false datadoc.pdf datadoc-nofonts.pdf

Note the significant reduction in file size:

-rw-r--r-- 1 ross staff 10692 Jul 22 11:33 datadoc.pdf
-rw-r--r-- 1 ross staff 3191 Jul 22 11:40 datadoc-nofonts.pdf

4. go back to the source document, change the names of images:

\includegraphics[… options…]{images/datadoc-nofonts.pdf}

Process it successfully with pdfTeX.
The Preview window now shows the images' text in a default system font.
(or a small filesize font supplied by GS.)

5. Edit in maindoc.pdf as follows:

locate the XObject where the image file is included; viz.

1 0 obj
<<
/Type /XObject
/Subtype /Form
/FormType 1
/PTEX.FileName (./images/datadoc-nofonts.pdf)
/PTEX.PageNumber 1
/PTEX.InfoDict 12 0 R
/BBox [0 0 595.28 841.89]
/Resources <<
/ProcSet [ /PDF /Text ]
/ExtGState <<
/R7 13 0 R
>>/Font << /R8 14 0 R>>
>>
/Length 146
/Filter /FlateDecode
>>
stream
...

note the line: /Font << /R8 14 0 R>>
We are going to change that object reference number.

Find the object corresponding to this font being used in the main document.
viz.

5 0 obj
<<
/Font << /F16 8 0 R /F15 9 0 R >>
/XObject << /Fm1 2 0 R /Fm2 3 0 R /Fm3 4 0 R >>
/ProcSet [ /PDF /Text ]
>>
endobj

and

8 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /CMR12
/FontDescriptor 24 0 R
/FirstChar 11
/LastChar 119
/Widths 22 0 R
>>
endobj

Go back and make the change: 2 characters

/Font << /R8 8  0 R>>
that is “14” -> “8 ” (the digit 8 followed by a space), preserving byte lengths.
Save the edited PDF.

6. close the Preview window of maindoc.pdf
then open it again.

Perfect!! Images are now showing using the correct font.

Since I used the same image 3 times, a single edit coped with
all the XObject instances from \includegraphics .

With different images, there would need to be a single edit for each.
If different fonts are used in the images, you’d need an edit for each font.

>> or did you try to convince Ghostscript to insert missing fonts (PDF
>> files contain only references to external fonts e.g., /FontName, but
>> no physical font, subsetted or not)?
>
> I'm not there yet, but this is the ultimate goal.

Can you build a workflow using the trick described above?
In particular, automating the edits of the font object references.
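
One way such an automated edit might look, as a minimal Python sketch rather than a tested workflow: it rewrites a font resource reference in the raw bytes of the PDF and pads the new object number with spaces, so the file length and all xref offsets stay unchanged. It assumes the PDF was written without object compression, as in the test described here, and the function name retarget_font_ref is made up for this example.

```python
import re

def retarget_font_ref(pdf: bytes, resource: bytes, old_obj: int, new_obj: int) -> bytes:
    """Point '/<resource> <old_obj> 0 R' at <new_obj> without changing the length."""
    old = str(old_obj).encode("ascii")
    new = str(new_obj).encode("ascii")
    if len(new) > len(old):
        raise ValueError("new object number is longer; byte offsets would shift")
    pattern = re.compile(rb"(/" + re.escape(resource) + rb" )" + re.escape(old) + rb"( 0 R)")
    # Pad with trailing spaces so every byte offset in the xref table stays valid.
    padded = new.ljust(len(old))
    return pattern.sub(lambda m: m.group(1) + padded + m.group(2), pdf)

sample = b"/Font << /R8 14 0 R>>"
print(retarget_font_ref(sample, b"R8", 14, 8))
```

For a real file the bytes would be read from and written back to maindoc.pdf in binary mode. The sketch changes every matching reference, which suits the case of one image included several times; different images or fonts would each need their own call.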

Some further notes.

I kept the content streams uncompressed, to be able to
search in the PDF, when needed:
\pdfcompresslevel 0
\pdfobjcompresslevel 0
Not sure how necessary this will always be.
But compression can be applied later anyway.

The main document had a 2nd font for the page numbering.

The image had a 2nd /ExtGState resource dictionary: /R7 13 0 R

13 0 obj
<<
/Type /ExtGState
/BM /Normal
/OPM 1
/TK true
>>
endobj

This is to ensure the image completely overprints what is on the page
beneath it, I think. In my test there was nothing. But you might have
a background image or pattern, or somesuch.

I used a Type1 font here.
Not sure whether there will be any differences with other kinds of fonts.
Can pdfTeX directly use OTF fonts?
Doesn’t it have to break them up into < 256-character subsets?

As for hyperlinks. I didn’t have any in my image PDF.
In your case, is the URL associated to the image as a whole?
Or can you have multiple links in an image PDF?
Presumably the latter, as the former is easily coped with
in the main document.

>> In the latter case Ghostscript must be able to find the fonts.
>
> Yes, I really hope that!

This method doesn’t require Ghostscript to find fonts at all.
All font-handling is done by pdfTeX and the manual edits.
But maybe you can develop a way to automate those edits?

>> [...] If you say
>>
>>> too much information is already lost during the subsetting process
>>
>> I assume that your PDF files already contain font subsets.
>
> Yes, this is the current setup.
>
>> But what I have in mind is that you create PDF files which don't
>> contain any fonts at all but only the information which font
>> (/FontName) should be used.
>
> Exactly this is my plan also.

The above technique does exactly that, I’d say.

What would be nice is a way to create the images w/o any fonts,
directly with pdfTeX, so not requiring GS at all.

>> Isn't this exactly what you want to achieve? Create zillions of
>> files which don't contain any fonts at all, merge them, and finally
>> insert the fonts in order to make the document portable?
>
> Yes.
>
>> I must admit that I don't know anything about LilyPond except that
>> musicians like it. Presumably it creates PostScript code and
>> converts it to PDF. Right?
>
> Yes. We have to modify lilypond to not embed the font resources into
> the PS file but to collect them in a directory.
>
> Werner

Hope this helps,

Ross

Dr Ross Moore
Mathematics Dept | Level 2, S2.638 AHH
Macquarie University, NSW 2109, Australia

T: +61 2 9850 8955 | F: +61 2 9850 8114
M:+61 407 288 255 | E: ***@mq.edu.au

http://www.maths.mq.edu.au

Werner LEMBERG

2016-07-22 04:55:36 UTC

> I’ve been thinking along the same lines, first trying to find a way
> to generate PDFs using pdfTeX, having no fonts included. I wasn’t
> able to achieve that.

IMHO, this would be a nice addition to luatex, pdftex, and xetex.

> However, I just ran a successful test with pdfTeX and GS as
> follows. [...]

Very interesting, thanks! Note, however, that we don't generate the
small snippets with pdftex/xetex/luatex but directly with ps2pdf from
a PS file generated by lilypond, so we fortunately don't have to jump
all the hoops you are demonstrating here. In particular, we access
all glyphs with the `glyphshow' operator, thus avoiding the need of
map files completely.

> ps2pdf -dEmbedAllFonts=false datadoc.pdf datadoc-nofonts.pdf

This we will certainly need.

> As for hyperlinks. I didn’t have any in my image PDF.

As mentioned in another mail, this was a problem with the XeTeX
support in texinfo.tex, now fixed.

> What would be nice is a way to create the images w/o any fonts,
> directly with pdfTeX, so not requiring GS at all.

Yes. Maybe someone has time to add this to the various TeX flavours!

Again, thanks a lot for your tests!

Werner