Discussion:
[pdftex] pdftex - Encoding for metafont PK fonts
Pali Rohár
2016-06-10 18:47:07 UTC
Permalink
Hello,

Is there any way to specify an encoding for Metafont bitmap PK fonts when
running pdftex with PDF output, to make the PDF file selectable and
searchable?

For Type 1 PFB fonts the encoding can be specified in the pdftex.map file
("<8r.enc"). How can I achieve the same effect for Type 3 PK fonts?
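
For example, such a re-encoding map entry typically has this shape (an
illustrative line; the font and file names are just an example):

uncb8r CenturySchL-Bold " TeXBase1Encoding ReEncodeFont " <8r.enc <uncb8a.pfb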
--
Pali Rohár
***@gmail.com
The Thanh Han
2016-06-11 10:41:06 UTC
Permalink
Post by Pali Rohár
Hello,
Is there any way to specify an encoding for Metafont bitmap PK fonts when
running pdftex with PDF output, to make the PDF file selectable and
searchable?
For Type 1 PFB fonts the encoding can be specified in the pdftex.map file
("<8r.enc"). How can I achieve the same effect for Type 3 PK fonts?
I would convert the MF font to Type 1 using mftrace.
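
For example, something along these lines (a sketch; the exact options
differ between mftrace versions):

$ mftrace --formats=pfb csr10

The resulting csr10.pfb can then be declared in pdftex.map together with an
encoding file, like any other Type 1 font.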

Thanh
Pali Rohár
2016-06-17 20:24:51 UTC
Permalink
Post by Pali Rohár
Hello,
Is there any way to specify an encoding for Metafont bitmap PK fonts when
running pdftex with PDF output, to make the PDF file selectable and
searchable?
For Type 1 PFB fonts the encoding can be specified in the pdftex.map file
("<8r.enc"). How can I achieve the same effect for Type 3 PK fonts?
I would convert the MF font to Type 1 using mftrace.
Thanh

I do not want any MF --> Type 1 conversion... I just want to use the PK font
as is.

(Btw, that conversion cannot be done fully automatically and needs manual work.)
--
Pali Rohár
***@gmail.com
Pali Rohár
2016-06-23 11:27:49 UTC
Permalink
Post by Pali Rohár
Hello,
Is there any way to specify an encoding for Metafont bitmap PK fonts when
running pdftex with PDF output, to make the PDF file selectable and
searchable?
For Type 1 PFB fonts the encoding can be specified in the pdftex.map file
("<8r.enc"). How can I achieve the same effect for Type 3 PK fonts?
Hi! Does anybody have an idea how to do that?
--
Pali Rohár
***@gmail.com
Paul Vojta
2016-06-23 19:01:52 UTC
Permalink
Post by Pali Rohár
Post by Pali Rohár
Hello,
Is there any way to specify an encoding for Metafont bitmap PK fonts when
running pdftex with PDF output, to make the PDF file selectable and
searchable?
For Type 1 PFB fonts the encoding can be specified in the pdftex.map file
("<8r.enc"). How can I achieve the same effect for Type 3 PK fonts?
Hi! Does anybody have an idea how to do that?
--
Pali Rohár
Same way.

Paul Vojta, ***@math.berkeley.edu
Pali Rohár
2016-06-24 11:42:28 UTC
Permalink
Post by Paul Vojta
Post by Pali Rohár
Post by Pali Rohár
Hello,
Is there any way to specify an encoding for Metafont bitmap PK fonts
when running pdftex with PDF output, to make the PDF file selectable
and searchable?
For Type 1 PFB fonts the encoding can be specified in the pdftex.map file
("<8r.enc"). How can I achieve the same effect for Type 3 PK fonts?
Hi! Does anybody have an idea how to do that?
Same way.
How is it the same way? That is not possible: specifying a font in pdftex.map
means that the font *has* a Type 1 variant, which pdftex then tries to use. So
PK fonts cannot be specified in pdftex.map, and therefore an enc file cannot be
specified either...

Or is there some way to do it? At least I do not know of one, and I would be
happy if somebody pointed me to it.
--
Pali Rohár
***@gmail.com
Paul Vojta
2016-06-24 21:10:04 UTC
Permalink
Post by Pali Rohár
Post by Paul Vojta
Post by Pali Rohár
Post by Pali Rohár
Hello,
Is there any way to specify an encoding for Metafont bitmap PK fonts
when running pdftex with PDF output, to make the PDF file selectable
and searchable?
For Type 1 PFB fonts the encoding can be specified in the pdftex.map file
("<8r.enc"). How can I achieve the same effect for Type 3 PK fonts?
Hi! Does anybody have an idea how to do that?
Same way.
How is it the same way? That is not possible: specifying a font in pdftex.map
means that the font *has* a Type 1 variant, which pdftex then tries to use. So
PK fonts cannot be specified in pdftex.map, and therefore an enc file cannot be
specified either...
Or is there some way to do it? At least I do not know of one, and I would be
happy if somebody pointed me to it.
I'm sorry, I didn't read your email carefully. That is what I would do
for PostScript Type 3 fonts.

For PK fonts as produced by MetaFont (which are not PS Type 3), probably
the way to go would be to create a virtual font to achieve the reencoding.
That is the standard way to go in the dvi driver world.
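
Roughly, the standard toolchain for that is tftopl plus vptovf (a sketch;
the file names here are only placeholders):

$ tftopl csr10.tfm csr10.pl
(edit the property list into a .vpl that maps each input slot to the
desired slot of the base font)
$ vptovf csr10-remap.vpl csr10-remap.vf csr10-remap.tfm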

For information on virtual fonts: use Google.

Paul Vojta, ***@math.berkeley.edu
Reinhard Kotucha
2016-06-24 22:21:01 UTC
Permalink
Post by Paul Vojta
For information on virtual fonts: use Google.
IMO a good starting point is

https://tug.org/TUGboat/tb11-1/tb27knut.pdf


Regards,
Reinhard
--
------------------------------------------------------------------
Reinhard Kotucha Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover mailto:***@web.de
------------------------------------------------------------------
Pali Rohár
2016-06-25 10:46:06 UTC
Permalink
Post by Paul Vojta
For PK fonts as produced by MetaFont (which are not PS Type 3),
probably the way to go would be to create a virtual font to achieve
the reencoding. That is the standard way to go in the dvi driver
world.
OK, to make it clear: I have a PK font file (automatically generated from
Metafont when running pdftex) and this PK font is in IL2 encoding.

pdftex sees the input file in IL2 encoding, so it correctly handles font
rendering and uses the metrics, kerns, ligatures, ...

When I print the PDF document generated by pdftex, everything is OK; it
looks perfect.

The problem is when I open that PDF document in a PDF reader and want to
select and copy text from the document, or when I want to search it.

All PDF readers which I tested think that the text in that PDF document is
in Latin-1 encoding, not the Latin-2 (IL2) encoding of the PK font.

So e.g. when I select the character 'č', the PDF reader copies 'è': 'č' is at
position 0xE8 in Latin-2, and at position 0xE8 in Latin-1 there is 'è'.

Maybe the PDF readers assume that the font is not in Latin-1 but in Unicode,
as IIRC Unicode positions 128-255 contain the same characters as the Latin-1
encoding.

Unicode character U+00E8 is definitely 'è'. So I bet this is the reason why the
PDF reader thinks that I selected the character 'è' and not 'č'.

For Type 1 PFB fonts (even in IL2 encoding) this is not a problem, because a
standard glyph name is stored for each character, and there is a standard
conversion table from glyph names to Unicode characters.
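
For example, the two pieces look roughly like this (illustrative excerpts
with placeholder names, not literal file contents):

% in the encoding vector (.enc) file: slot 0xE8 gets a glyph name
/SomeEncoding [ ... /ccaron ... ] def

% in glyphtounicode.tex: the glyph name maps to a Unicode code point
\pdfglyphtounicode{ccaron}{010D}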

So probably in PK fonts there is no conversion table from 8-bit character codes
to Unicode characters, and so something (pdftex? the PDF reader?) assumes
either Latin-1 or Unicode.

In detail, my question is: how do I tell pdftex the encoding of a PK font
(generated from Metafont)?
Post by Paul Vojta
For information on virtual fonts: use Google.
Given the detailed description above, are you sure that virtual fonts could
do this Unicode mapping?

Aren't virtual fonts again only 8-bit (as opposed to glyph names and
Unicode)?
--
Pali Rohár
***@gmail.com
Paul Vojta
2016-06-25 20:52:23 UTC
Permalink
Post by Paul Vojta
For PK fonts as produced by MetaFont (which are not PS Type 3),
probably the way to go would be to create a virtual font to achieve
the reencoding. That is the standard way to go in the dvi driver
world.
[snip]
So e.g. when I select the character 'č', the PDF reader copies 'è': 'č' is at
position 0xE8 in Latin-2, and at position 0xE8 in Latin-1 there is 'è'.
I presume that your character 'č' is U+010D (LATIN SMALL LETTER C WITH
CARON), which is not in the range 0-255. So, given your comments below,
virtual fonts would not be able to support this character.
Maybe the PDF readers assume that the font is not in Latin-1 but in Unicode,
as IIRC Unicode positions 128-255 contain the same characters as the Latin-1
encoding.
Unicode character U+00E8 is definitely 'è'. So I bet this is the reason why the
PDF reader thinks that I selected the character 'è' and not 'č'.
For Type 1 PFB fonts (even in IL2 encoding) this is not a problem, because a
standard glyph name is stored for each character, and there is a standard
conversion table from glyph names to Unicode characters.
So probably in PK fonts there is no conversion table from 8-bit character codes
to Unicode characters, and so something (pdftex? the PDF reader?) assumes
either Latin-1 or Unicode.
Yes, I agree that this is likely the case. I do know that in PK fonts,
there is only a character (or no character) for each of the positions 0-255,
with no character names or additional coding information.
In detail, my question is: how do I tell pdftex the encoding of a PK font
(generated from Metafont)?
Post by Paul Vojta
For information on virtual fonts: use Google.
Given the detailed description above, are you sure that virtual fonts could
do this Unicode mapping?
Aren't virtual fonts again only 8-bit (as opposed to glyph names and
Unicode)?
Yes, virtual fonts are only 8 bits. There are things called "omega
virtual fonts" which I think allow for larger-numbered characters,
but I don't know whether pdftex supports them. I think that luatex does.

Paul Vojta, ***@math.berkeley.edu
Pali Rohár
2016-06-25 21:21:16 UTC
Permalink
Post by Paul Vojta
Post by Pali Rohár
Post by Paul Vojta
For PK fonts as produced by MetaFont (which are not PS Type 3),
probably the way to go would be to create a virtual font to
achieve the reencoding. That is the standard way to go in the
dvi driver world.
[snip]
Post by Pali Rohár
So e.g. when I select the character 'č', the PDF reader copies 'è': 'č' is at
position 0xE8 in Latin-2, and at position 0xE8 in Latin-1 there is 'è'.
I presume that your character 'č' is U+010D (LATIN SMALL LETTER C
WITH CARON), which is not in the range 0-255.
Yes.
Post by Paul Vojta
So, given your
comments below, virtual fonts would not be able to support this
character.
That is what I thought.
Post by Paul Vojta
Post by Pali Rohár
Maybe the PDF readers assume that the font is not in Latin-1 but in
Unicode, as IIRC Unicode positions 128-255 contain the same characters
as the Latin-1 encoding.
Unicode character U+00E8 is definitely 'è'. So I bet this is the reason
why the PDF reader thinks that I selected the character 'è' and not 'č'.
For Type 1 PFB fonts (even in IL2 encoding) this is not a problem,
because a standard glyph name is stored for each character, and there
is a standard conversion table from glyph names to Unicode
characters.
So probably in PK fonts there is no conversion table from 8-bit
character codes to Unicode characters, and so something (pdftex? the PDF
reader?) assumes either Latin-1 or Unicode.
Yes, I agree that this is likely the case. I do know that in PK
fonts, there is only a character (or no character) for each of the
positions 0-255, with no character names or additional coding
information.
When using a Type 1 PFB font, it has to be added to the pdftex font map file
(primitive \pdfmapfile), and the map line allows specifying an encoding vector
file. That file contains a glyph name for each character position 0-255, and
the pdftex primitive \pdfglyphtounicode then maps glyph names to Unicode
characters. So for PFB fonts it is possible to do that position-to-Unicode
mapping.

But it is a pity that it is not possible to specify that enc file also for
PK fonts generated by Metafont. Or is it somehow possible?
Post by Paul Vojta
Post by Pali Rohár
In detail, my question is: how do I tell pdftex the encoding of a PK font
(generated from Metafont)?
Post by Paul Vojta
For information on virtual fonts: use Google.
Given the detailed description above, are you sure that virtual fonts
could do this Unicode mapping?
Aren't virtual fonts again only 8-bit (as opposed to glyph names
and Unicode)?
Yes, virtual fonts are only 8 bits. There are things called "omega
virtual fonts" which I think allow for larger-numbered characters,
but I don't know whether pdftex supports them. I think that luatex does.
LuaTeX is Unicode-based, and it is possible to create a virtual font which
remaps Latin-2 to Unicode (yesterday I tried that). But my question is about
pdftex right now.
--
Pali Rohár
***@gmail.com
Ross Moore
2016-06-25 22:56:48 UTC
Permalink
Hello Pali, and Paul

On Jun 26, 2016, at 7:21 AM, Pali Rohár <***@gmail.com> wrote:

Maybe the PDF readers assume that the font is not in Latin-1 but in
Unicode, as IIRC Unicode positions 128-255 contain the same characters
as the Latin-1 encoding.

Unicode character U+00E8 is definitely 'è'. So I bet this is the reason
why the PDF reader thinks that I selected the character 'è' and not 'č'.

For Type 1 PFB fonts (even in IL2 encoding) this is not a problem,
because a standard glyph name is stored for each character, and there
is a standard conversion table from glyph names to Unicode characters.

So probably in PK fonts there is no conversion table from 8-bit
character codes to Unicode characters, and so something (pdftex? the PDF
reader?) assumes either Latin-1 or Unicode.

What you need is a CMap resource, which gets attached to the font dictionary
in the PDF, not to the font program itself.

This is what is done with PFA and PFB fonts, and others.
So I don’t see why you cannot also do this with PK fonts.

The question then becomes “who or what creates the CMAP?”.

pdfTeX has a primitive \pdfgentounicode which if set to 1 (or higher)
causes an attempt to create the CMAP internally, based upon glyph names
and using a standard list of font character names.
Extra info can be provided by the \pdfglyphtounicode primitive,
as you have encountered in an earlier posting.
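
In plain TeX that amounts to something like this (the same pattern is used
in the test files later in this thread):

\pdfgentounicode=1
\input glyphtounicode.tex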

But Metafont-produced pk-fonts tend to use lazy generic names for
characters, such as /a1, /a2, /a3, etc.
This can imply non-uniqueness across several fonts, so is likely to be
unsuitable if you need to provide CMAP resources for several fonts
using the same character names.

The alternative is to construct the full CMap resource externally, as a text
file. The contents of this file are then loaded into the PDF using pdfTeX’s
\pdffontattr primitive, which reads the file in as a stream and creates the
correct dictionary entry.

For details on how this can be done in TeX coding, consult the package
cmap.sty .
Look at files such as ot1.cmap, t1.cmap, t2.cmap etc. for the structure
of the kind of data file that is needed. These encode the unicode
mapping of the numbered character slots in a font.
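
For orientation, the working part of such a file looks something like this
(a minimal, deliberately incomplete sketch; the Registry/Ordering/CMapName
values are made up, and a real file lists every slot that is used):

/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo << /Registry (TeX) /Ordering (Example) /Supplement 0 >> def
/CMapName /TeX-Example-0 def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
1 beginbfchar
<E8> <010D>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end

Here the single bfchar entry says that byte 0xE8 in the font should be
extracted as U+010D ('č').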

Also it may be helpful to examine the coding that I’ve included below,
for attaching a CMAP resource to Xy-pic directional fonts,
e.g., for arrow-heads.
This assumes that LaTeX can find a private file called: xyd.cmap .
Another primitive \pdfnobuiltintounicode disables the attempt to create
the CMAP internally.


Yes, I agree that this is likely the case. I do know that in PK
fonts, there is only a character (or no character) for each of the
positions 0-255, with no character names or additional coding
information.

When using a Type 1 PFB font, it has to be added to the pdftex font map
file (primitive \pdfmapfile), and the map line allows specifying an encoding
vector file. That file contains a glyph name for each character position
0-255, and the pdftex primitive \pdfglyphtounicode then maps glyph names to
Unicode characters. So for PFB fonts it is possible to do that
position-to-Unicode mapping.

As outlined above.

But it is a pity that it is not possible to specify that enc file also for
PK fonts generated by Metafont. Or is it somehow possible?

I see no reason why not, but could easily be wrong.
But I must admit that I’ve not tried it with a PK font.

Font outlines have been the preferred technology for ~20 years
or more, so I’ve not had the need with bit-mapped fonts.


In detail, my question is: how do I tell pdftex the encoding of a PK font
(generated from Metafont)?

For information on virtual fonts: use Google.

Given the detailed description above, are you sure that virtual fonts
could do this Unicode mapping?

Aren't virtual fonts again only 8-bit (as opposed to glyph names
and Unicode)?

Yes, virtual fonts are only 8 bits. There are things called "omega
virtual fonts" which I think allow for larger-numbered characters,
but I don't know whether pdftex supports them. I think that luatex
does.

LuaTeX is Unicode-based, and it is possible to create a virtual font which
remaps Latin-2 to Unicode (yesterday I tried that). But my question is about
pdftex right now.

Try what I suggest above.



% Supply CMAP files for Xy-pic's arrowhead fonts
% otherwise an Accessibility check fails for encoding of arrow tips.
%
\def\***@xyd@encoding{xyd}
\def\***@support@xyarrows{%
\IfFileExists{\***@xyd@encoding.cmap}%
{\***@load@xyd}%
{\***@inhibitload@xyd}%
}

\def\***@load@xyd{%
\immediate\pdfobj stream file {\***@xyd@encoding.cmap}\relax
\xdef\***@set@***@xyd##1{%
\noexpand\expandafter\pdffontattr\noexpand##1 {/ToUnicode \the\pdflastobj\space 0 R}}%
}
\def\***@inhibitload@xyd{\gdef\***@set@***@xyd##1{}}

% standard Xy tips
\def\***@xyd@***@xy{%
\pdfnobuiltintounicode\xyatipfont
\***@set@***@xyd{\xyatipfont}%
\pdfnobuiltintounicode\xybtipfont
\***@set@***@xyd{\xybtipfont}%
}
% CM-style Xy tips
\def\***@xyd@***@cm{%
\pdfnobuiltintounicode\xy@@atfont
\***@set@***@xyd{\xy@@atfont}%
\pdfnobuiltintounicode\xy@@btfont
\***@set@***@xyd{\xy@@btfont}%
}
% rebind the \UseTips macro
\def\***@UseTips{%
\***@UseTips
\***@xyd@***@cm
}

\AtBeginDocument{%
\@ifpackageloaded{xy}{% activate CMaps for Xy-pic arrows
\***@support@xyarrows
\***@xyd@***@xy
\let\***@UseTips\UseTips
\let\UseTips\***@UseTips
}{}%
}
--
Pali Rohár
***@gmail.com


Hope this helps,

Ross


Dr Ross Moore

Mathematics Dept | Level 2, S2.638 AHH
Macquarie University, NSW 2109, Australia

T: +61 2 9850 8955 | F: +61 2 9850 8114
M: +61 407 288 255 | E: ***@mq.edu.au

http://www.maths.mq.edu.au

Pali Rohár
2016-06-25 23:39:47 UTC
Permalink
Post by Pali Rohár
But it is a pity that it is not possible to specify that enc file also
for PK fonts generated by Metafont. Or is it somehow possible?
I see no reason why not, but could easily be wrong.
But I must admit that I’ve not tried it with a PK font.
OK, take the font csb12 (part of the csfonts package), which has only
Metafont source code (and a generated TFM metric file); there is no Type 1
PFB file. The encoding vector for csfonts is stored in the csr.enc file.

If I try this (\char232 is 'č')

\nopagenumbers
\pdfmapline{+csb12 <csr.enc}
\font\csb=csb12
\csb \char232
\bye

then pdftex shows me the warning:

pdfTeX warning: pdftex: invalid entry for `csb12': both ps_name and
font file missing

and the font csb12.600pk is included. But still, if I select that character
in a PDF viewer, I see 'è'.

If I change the map line to "\pdfmapline{+csb12 csb12 <csr.enc}" then pdftex
shows me the warnings:

pdfTeX warning: pdftex: No flags specified for non-embedded font
`csb12' (csb12) (I'm using 34): fix your map entry.

pdfTeX warning: pdftex: font `csb12' is not a standard font; I suppose
it is available to your PDF viewer then

And now *no* font is included in the PDF file, which is wrong.

If I change the map line to "\pdfmapline{+csb12 csb12 <csb12 <csr.enc}" then
pdftex shows me the error:

!pdfTeX error: pdftex (file csb12): cannot open Type 1 font file for
reading
==> Fatal error occurred, no output PDF file produced!

And no PDF file is generated.

So I really have no idea how to specify the csr.enc file for a bitmap PK font.
Is it somehow possible? If not, could pdftex be extended and patched to
support it?

====

About CMaps: I have already created and am using CMap files for PK fonts. For
the currently selected font, this simple TeX code loads "file.cmap":

\pdfnobuiltintounicode\font%
\immediate\pdfobj stream file {file.cmap}%
\pdffontattr\font{/ToUnicode \the\pdflastobj\space 0 R}%

(I have macro around it, for easy use)

But I think that creating the CMap file by hand and manually inserting the
"/ToUnicode" entry into the PDF file is a big hack, and that there could be a
"normal" solution that lets pdftex generate the CMap file (as it does for PFB
fonts).

After all, everything needed for generating the CMap file is already stored in
the "csr.enc" file (which is specified in the map line for PFB fonts) and in
glyphtounicode.tex.
--
Pali Rohár
***@gmail.com
Akira Kakuto
2016-06-26 02:08:56 UTC
Permalink
Post by Pali Rohár
and font csb12.600pk is included. But still if I select in PDF viewer
that character I see ...
What happens with the following:

\nopagenumbers
\pdfmapline{+csb12 <csr.enc}
\pdfglyphtounicode{xxxxx}{010D}
\font\csb=csb12
\csb \char232
\bye

Here xxxxx is the glyph name of \char232 in csr.enc.

Best,
Akira
Pali Rohár
2016-06-26 10:48:02 UTC
Permalink
Post by Pali Rohár
Post by Pali Rohár
and font csb12.600pk is included. But still if I select in PDF
viewer that character I see ...
\nopagenumbers
\pdfmapline{+csb12 <csr.enc}
\pdfglyphtounicode{xxxxx}{010D}
\font\csb=csb12
\csb \char232
\bye
Here xxxxx is the glyph name of \char232 in csr.enc.
Best,
Akira
See full output:

$ cat test.tex
\nopagenumbers
\pdfmapline{+csb12 <csr.enc}
\pdfglyphtounicode{ccaron}{010D}
\font\csb=csb12
\csb \char232
\bye

$ pdftex test.tex
This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012/Debian)
restricted \write18 enabled.
entering extended mode
(./test.tex{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}

pdfTeX warning: pdftex: invalid entry for `csb12': both ps_name and font file m
issing
[1] ) </home/pali/.texmf-var/fonts/pk/ljfour/public/cs/csb12.600pk>
Output written on test.pdf (1 page, 1487 bytes).
Transcript written on test.log.

$ cat test.log
This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012/Debian) (format=pdftex 2016.6.12) 26 JUN 2016 12:43
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
**test.tex
(./test.tex{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}

pdfTeX warning: pdftex: invalid entry for `csb12': both ps_name and font file m
issing
[1] ) </home/pali/.texmf-var/fonts/pk/ljfour/public/cs/csb12.600pk>
Output written on test.pdf (1 page, 1487 bytes).
PDF statistics:
13 PDF objects out of 1000 (max. 8388607)
8 compressed objects within 1 object stream
0 named destinations out of 1000 (max. 500000)
1 words of extra memory for PDF output out of 10000 (max. 10000000)

$ pdftotext test.pdf -
è

So I get a warning about an invalid entry in the map line, and still 'è' is
selected.
--
Pali Rohár
***@gmail.com
Reinhard Kotucha
2016-06-26 05:54:19 UTC
Permalink
Post by Pali Rohár
But it is a pity that it is not possible to specify that enc file
also for PK fonts generated by Metafont. Or is it somehow possible?
I see no reason why not, but could easily be wrong.
But I must admit that I’ve not tried it with a PK font.
Dear Ross,
I assume that the main problem is that Metafont is using numerical
glyph indices but in order to re-encode a font, it's required that
each glyph has a name.

In order to use PK fonts with dvips or pdftex, they have to be
converted to Type 3 at least. Converting them to Type 1 is preferred,
of course. In Metafont glyphs have numerical indices, try

tftopl $(kpsewhich csr10.tfm) | less

I suppose that by default the encoding of Knuth's Computer Modern
fonts is used. I don't know whether it's sufficient to define glyphs
in ISO-8859-2 order in Metafont.
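
For illustration, the property-list output identifies characters only by
their code (an abridged excerpt with made-up dimensions):

(CHARACTER O 350
   (CHARWD R 0.513)
   (CHARHT R 0.617)
   )

Octal 350 is 0xE8; there is no glyph name anywhere in the TFM.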

In order to apply an encoding vector and use the facilities provided
by dvips and pdftex, it's necessary to replace numeric glyph indices
with names. If the initial encoding is arbitrary, manual work cannot
be avoided.

The best solution is to convert fonts created by Metafont to Type 1,
one way or another.
Post by Pali Rohár
Font outlines have been the preferred technology for ~20 years or
more, so I’ve not had the need with bit-mapped fonts.
Of course. But the main reason you don't need PK fonts anymore is
that at least the most important fonts are available as Type 1
nowadays.

For instance, 20 years ago the output of musictex was brilliant on
paper but ugly on screen. A lot happened since then.

There are several tools nowadays which are supposed to convert fonts
created with Metafont to Type 1. They either try to understand the
Metafont source code (MetaType1, by Boguslaw Jackowski) or they create
a high-resolution PK file and scan the outlines (textrace, by Peter
Szabo).

The former is preferred because programs which trace outlines of
bitmaps cannot reliably distinguish between an angle and a curve, even
at high resolutions.

On the other hand, when Tigran Aivazian was typesetting "The Hebrew
Old Testament" in TeX, he used textrace in order to convert the Hebrew
fonts created by Yannis Haralambous with Metafont to Type 1 and the
result is amazingly good.

Regards,
Reinhard
--
------------------------------------------------------------------
Reinhard Kotucha Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover mailto:***@web.de
------------------------------------------------------------------
Werner LEMBERG
2016-06-26 06:24:09 UTC
Permalink
Post by Reinhard Kotucha
There are several tools nowadays which are supposed to convert fonts
created with Metafont to Type 1. They either try to understand the
Metafont source code (MetaType1, by Boguslaw Jackowski) or they
create a high-resolution PK file and scan the outlines (textrace, by
Peter Szabo).
There's also Scott Pakin's `mf2pt1' tool (which we use for creating
the LilyPond OpenType fonts today) and Han-Wen Nienhuys's `mftrace'
(which was used for LilyPond fonts before we massaged the MetaFont
source files for good results with `mf2pt1'), which internally uses
either `autotrace' or `potrace'.
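
The basic mf2pt1 invocation is along these lines (a sketch; see the mf2pt1
documentation for encoding-related options):

$ mf2pt1 csr10.mf

which, if all goes well, produces an outline font csr10.pfb.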


Werner
Pali Rohár
2016-06-26 11:35:22 UTC
Permalink
Post by Reinhard Kotucha
Post by Pali Rohár
But it is a pity that it is not possible to specify that enc file
also for PK fonts generated by Metafont. Or is it somehow
possible?
I see no reason why not, but could easily be wrong.
But I must admit that I’ve not tried it with a PK font.
Dear Ross,
I assume that the main problem is that Metafont is using numerical
glyph indices but in order to re-encode a font, it's required that
each glyph has a name.
Yes! And the glyph name for each numerical value (index) is available in the
encoding file (csr.enc). That is why I'm trying to tell pdftex:

"hey pdftex, please use csr.enc for my csb12 PK font; it contains the
index --> glyph name mapping which you need to build the CMap file"
Post by Reinhard Kotucha
In order to use PK fonts with dvips or pdftex, they have to be
converted to Type 3 at least.
IIRC pdftex internally converts all PK fonts to Type 3 when building the PDF
document, because the PDF format does not support PK fonts directly. Type 3
glyph procedures may use general graphics operators (which support bitmap
images), so the PK glyphs are simply inserted as bitmaps into a Type 3
font.
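
In the generated PDF this ends up as a font dictionary of roughly this shape
(a hand-written, abbreviated sketch; the glyph-procedure names and object
numbers are placeholders):

<< /Type /Font
   /Subtype /Type3
   /FontBBox [0 0 0 0]
   /FontMatrix [0.001 0 0 0.001 0 0]
   /FirstChar 232 /LastChar 232
   /Widths [ ... ]
   /Encoding << /Type /Encoding /Differences [232 /a232] >>
   /CharProcs << /a232 12 0 R >>   % stream that paints the PK bitmap
>>

and the question in this thread is essentially how to also get a /ToUnicode
entry into this dictionary.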
Post by Reinhard Kotucha
Converting them to Type 1 is
preferred, of course. In Metafont glyphs have numerical indices,
try
tftopl $(kpsewhich csr10.tfm) | less
I suppose that by default the encoding of Knuth's Computer Modern
fonts is used. I don't know whether it's sufficient to define glyphs
in ISO-8859-2 order in Metafont.
In order to apply an encoding vector and use the facilities provided
by dvips and pdftex, it's necessary to replace numeric glyph indices
with names. If the initial encoding is arbitrary, manual work cannot
be avoided.
The file csr.enc contains a vector of glyph names, so for each character
number from the metric file csr10.tfm there is a glyph name. No manual work
should be needed; all the information is available.

Or am I missing something?
Post by Reinhard Kotucha
The best solution is to convert fonts created by Metafont to Type 1,
one way or another.
That conversion cannot be done without manual work and can sometimes be
hard... Anyway, I'm not interested in conversion; I just want to use the
existing PK (Metafont) font :-)
--
Pali Rohár
***@gmail.com
Pali Rohár
2016-06-26 20:56:40 UTC
Permalink
Post by Pali Rohár
Post by Reinhard Kotucha
Post by Pali Rohár
But it is pity that it is not possible to specify that enc file
also for PK fonts generated by MetaFont. Or it is somehow
possible?
I see no reason why not, but could easily be wrong.
But I must admit that I’ve not tried it with a PK font.
Dear Ross,
I assume that the main problem is that Metafont is using numerical
glyph indices but in order to re-encode a font, it's required that
each glyph has a name.
Yes! And the glyph name for each numerical value (index) is
available in the encoding file (csr.enc). That is why I'm trying to tell pdftex:
"hey pdftex, please use csr.enc for my csb12 PK font; it contains the
index --> glyph name mapping which you need to build the CMap file"
OK, it is really not possible with pdftex. I looked into the pdftex source
code, and the code that writes PK & Type 3 fonts in pdftex does not use
encoding vectors...

Also, the code for loading the map file cannot be used for PK fonts (the
checker refuses it).

I looked closely at how pdftex handles Type 3 and PK fonts and created a small
patch which adds support for generating the /ToUnicode object (from the
\pdfglyphtounicode table) and which allows loading an enc file for PK fonts
as well.

The patch was created against the pdftex in TeX Live 2012 (which is on my
system), but it is really small and should be very easy to rebase onto newer
versions.

Patch pdftex-pkfonts-encfile-tounicode.patch is attached.

Let me know what you think about it and whether it can be added to the pdftex
project.
--
Pali Rohár
***@gmail.com
Pali Rohár
2016-06-26 21:11:57 UTC
Permalink
Post by Pali Rohár
Post by Pali Rohár
Post by Reinhard Kotucha
Post by Pali Rohár
But it is a pity that it is not possible to specify that enc
file also for PK fonts generated by Metafont. Or is it
somehow possible?
I see no reason why not, but could easily be wrong.
But I must admit that I’ve not tried it with a PK font.
Dear Ross,
I assume that the main problem is that Metafont is using
numerical glyph indices but in order to re-encode a font, it's
required that each glyph has a name.
Yes! And the glyph name for each numerical value (index) is
available in the encoding file (csr.enc). That is why I'm trying to tell pdftex:
"hey pdftex, please use csr.enc for my csb12 PK font; it contains the
index --> glyph name mapping which you need to build the CMap file"
OK, it is really not possible with pdftex. I looked into the pdftex
source code, and the code that writes PK & Type 3 fonts in pdftex does
not use encoding vectors...
Also, the code for loading the map file cannot be used for PK fonts (the
checker refuses it).
I looked closely at how pdftex handles Type 3 and PK fonts and
created a small patch which adds support for generating the /ToUnicode
object (from the \pdfglyphtounicode table) and which allows loading an enc
file for PK fonts as well.
The patch was created against the pdftex in TeX Live 2012 (which is on my
system), but it is really small and should be very easy to rebase onto newer
versions.
Patch pdftex-pkfonts-encfile-tounicode.patch is attached.
Let me know what you think about it and whether it can be added to the
pdftex project.
Oops, the patch from the previous email has a small problem with indexes and
.notdef. That is fixed in the newly attached
pdftex-pkfonts-encfile-tounicode-v2.patch.
--
Pali Rohár
***@gmail.com
Ross Moore
2016-06-26 23:25:30 UTC
Permalink
Hello Pali,

On Jun 27, 2016, at 7:11 AM, Pali Rohár <***@gmail.com> wrote:

Yes! And the glyph name for each numerical value (index) is
available in the encoding file (csr.enc). That is why I'm trying
to tell pdftex:

"hey pdftex, please use csr.enc for my csb12 PK font; it contains the
index --> glyph name mapping which you need to build the CMap file"

OK. But why do you need csb12 ?
That is, why not csbx12 which *is* available in PFB format,
and is properly mapped.
viz.
SCI:vector ross$ grep csb `kpsewhich pdftex.map` | grep -v fcsb | grep -v TeXGyre | grep -v Roman
csb10 <csb10.pfb
csbx10 <csbx10.pfb
csbx12 <csbx12.pfb
csbx5 <csbx5.pfb
csbx6 <csbx6.pfb
csbx7 <csbx7.pfb
csbx8 <csbx8.pfb
csbx9 <csbx9.pfb
csbxsl10 <csbxsl10.pfb
csbxti10 <csbxti10.pfb

These fonts (and corresponding medium/regular weights)
are handled in LaTeX using csfonts.sty .

The encoding seems to be a subset of XL2. viz.

% @psencodingfile{
% author = "Petr Olsak, Zdenek Wagner",
% date = "19oct12",
% filename = "xl2.enc",
% license = "public domain",
% email = "tex-***@tug.org<mailto:tex-***@tug.org>",
% codetable = "ISO/ASCII",
% docstring = "
% some of our (CSTUG- czech TeX Users Group) users want to support
% 8bit font coding such that:
% -- lower 7bit is exactly OT1 (but with differences imposed
% by DEK -- e.g. layout of cmr is different from cmtt)
% -- upper part is taken from ISO-Latin 2 (iso 8859-2),
% but some of empty positions are filled with useful characters
% usually available in type-1 font (permill sign etc.)
% "
% }


Changing your test as follows works fine, at least for some characters, in Plain TeX.

\pdfcompresslevel 0
\pdfobjcompresslevel 0
\nopagenumbers
\pdfgentounicode 1
\input glyphtounicode.tex
%\pdfglyphtounicode{ccaron}{010D}
\font\csb=csbx12
\csb
\char232 \char233 %\char234 %\char235 %
\char236 \char237 %\char238
\char239
\bye

čéěíď

(I hope 5 characters are showing for you here.
If not, my mail client could be at fault.)
Just extract them yourself from the PDF attached here.
Pali Rohár
2016-06-26 23:37:09 UTC
Permalink
Post by Ross Moore
OK. But why do you need csb12 ?
That is, why not csbx12 which *is* available in PFB format,
and is properly mapped.
Of course, fonts which have a Type 1 variant (PFB format) work fine.

In some cases I want csb12, which is "thinner" than csbx12. The reason is
exactly the same as why Knuth created multiple font faces of CM.
--
Pali Rohár
***@gmail.com
Ross Moore
2016-06-27 08:11:01 UTC
Permalink
On Jun 27, 2016, at 9:25 AM, Ross Moore <***@mq.edu.au> wrote:

Certainly one can attach a /ToUnicode map to the /Type3 font.

And if you build it correctly, it actually works!

However there is something else missing in the way the /Font dictionary
is built in that case, which prevents Acrobat from using that CMap.
The lack of encoding vector and /CharSet may well be the problem,
or it may be some other piece of font metadata that is needed.

Try this example, using the attached .cmap file (which is not fully
complete — I leave that to you to finish :-).
It produces the attached PDF, from which you can Copy/Paste correctly.


%%%%%% ——— start of example TeX source ——— %%%%%
\pdfcompresslevel 0
\pdfobjcompresslevel 0
\nopagenumbers
\pdfgentounicode 1
\input glyphtounicode.tex

\font\csb=csb12
\immediate\pdfobj stream file {cs1.cmap} % 1 obj
\pdffontattr \csb{/ToUnicode \the\pdflastobj\space 0 R}
%
\csb
\char232 \char233 %\char234 %\char235 %
\char236 \char237 %\char238
\char239 \char"F2 \char"F3 \char"F4 \char"F6 \char"F8 \char"F9 \char"FA \char"FC \char"FD \char"FE \char"FF

\font\csr=csr12
\csr
\char232 \char233 %\char234 %\char235 %
\char236 \char237 %\char238
\char239

\bye
%%%%%% ——— end of example TeX source ——— %%%%%
Pali Rohár
2016-06-27 09:27:50 UTC
Permalink
Post by Ross Moore
On Jun 27, 2016, at 9:25 AM, Ross Moore
Certainly one can attach a /ToUnicode map to the /Type3 font.
And if you build it correctly, it actually works!
Yes, manually including the CMap works.
Post by Ross Moore
However there is something else missing in the way the /Font
dictionary is built in that case, which prevents Acrobat from using
that CMap. The lack of encoding vector and /CharSet may well be the
problem, or it may be some other piece of font metadata that is
needed.
Try this example, using the attached .cmap file (which is not fully
complete — I leave that to you to finish :-).
It produces the attached PDF, from which you can Copy/Paste
correctly.
I have fully complete CMap files (which I generated from the PFB fonts) and
that works. I wrote about it in the email I sent yesterday.
--
Pali Rohár
***@gmail.com
Reinhard Kotucha
2016-06-28 00:22:58 UTC
Permalink
Post by Ross Moore
$ grep csb `kpsewhich pdftex.map` | grep -v fcsb | grep -v TeXGyre
| grep -v Roman
Dear Ross,
the many invocations of grep -v are not necessary anymore.

It always bothered me a lot in the past that it was nearly impossible
to find out where all these entries in psfonts.map and pdftex.map come
from.

When I adapted the updmap Perl script which Fabrice wrote for Windows
in order to make it work on all platforms, I couldn't resist providing
a log file which hopefully makes life easier.

In order to locate the log file, just run updmap[-sys] and the
location of the log file is printed to screen.

It's usually

TEXMFSYSVAR/web2c/updmap.log (updmap-sys)

or

TEXMFVAR/web2c/updmap.log (updmap)


The log file contains lines like

/path/to/csfonts.map:

followed by TFM names, like

csb10
csbx10
csbx12
...

I deliberately removed everything from the map entries except the TFM
names in order to keep the file small. TFM names have to be unique
anyway.

In order to determine where the mapfile entry "csr10" comes from,
search for "csr10" in the log file and then search backwards for a
line ending with a colon.
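
For example, a one-liner can do that backwards search automatically (a
sketch; adjust the path to your installation):

$ awk '/:$/ { src = $0 } /^csr10$/ { print src; exit }' \
    "$(kpsewhich -var-value TEXMFSYSVAR)/web2c/updmap.log"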

And since files like

texmf-dist/fonts/map/dvips/cs/csfonts.map

are copied into the map files used by dvips and pdftex, it's much easier
to consult these files instead of scanning the derived files and making
heavy use of grep -v.

It's a pity that only very few people are aware of the updmap log file.

Regards,
Reinhard
--
------------------------------------------------------------------
Reinhard Kotucha Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover mailto:***@web.de
------------------------------------------------------------------
Karl Berry
2016-06-27 21:47:09 UTC
Permalink
pdftex-pkfonts-encfile-tounicode-v2.patch
...
Post by Pali Rohár
Let me know what you think about it and whether it can be added to
the pdftex project.
At first blush, it seems plausible, but it needs study.

Simple question: are you ok with your code being released under
GPLv2-or-later?

More complex: can you please provide a small self-contained example that
shows the new feature? So I can see what happens in action.

Ideally: can you provide a patch (starting point) for the documentation?
Don't worry about English or style or whatever; the actual content of
what users should know is what's important.

Thanks,
Karl
Pali Rohár
2016-06-27 22:04:45 UTC
Permalink
Post by Pali Rohár
pdftex-pkfonts-encfile-tounicode-v2.patch
...
Post by Pali Rohár
Let me know what you think about it and whether it can be added
to the pdftex project.
At first blush, it seems plausible, but it needs study.
Simple question: are you ok with your code being released under
GPLv2-or-later?
No problem.
Post by Pali Rohár
More complex: can you please provide a small self-contained example
that shows the new feature? So I can see what happens in action.
\nopagenumbers
\pdfmapline{+csb12 <csr.enc}
\pdfglyphtounicode{ccaron}{010D}
\pdfgentounicode=1
\font\csb=csb12
\csb \char232
\bye

Char 232 in the bitmap PK font csb12 is 'č' (LATIN SMALL LETTER C WITH
CARON). If you use pdftex without my patch to generate the PDF and then
select that character in a PDF viewer, you will see 'è', because 232 is
0xE8 in hex and Unicode U+00E8 is 'è'.

My patch adds support for specifying an ENC file for PK fonts. Before my
patch, the \pdfmapline primitive was used only for Type 1, OpenType and
TrueType fonts. Now it is possible to use \pdfmapline to specify a PK font
as well, with the <file.enc syntax for re-encoding.

Here the file csr.enc contains the mapping for the font csb12 and specifies
that at position 232 there is the glyph named /ccaron.
Post by Pali Rohár
Ideally: can you provide a patch (starting point) for the
documentation?
What do you mean by this? I do not understand what a starting point for the
documentation would be...
Post by Pali Rohár
Don't worry about English or style or whatever; the
actual content of what users should know is what's important.
--
Pali Rohár
***@gmail.com
Karl Berry
2016-06-28 21:36:41 UTC
Permalink
find out where all these entries in psfonts.map and pdftex.map come
from.

When Norbert again revised updmap, he made psfonts.map and pdftex.map
themselves include the map file names, as in:

..
% unc.map
uncb8r CenturySchL-Bold " TeXBase1Encoding ReEncodeFont " <8r.enc <uncb8a.pfb
uncbi8r CenturySchL-BoldItal " TeXBase1Encoding ReEncodeFont " <8r.enc <uncbi8a.pfb
..
% universalis.map
UniversalisADFStd-Bold-lf-ly1--base UniversalisADFStd-Bold " AutoEnc_xtabpflmit75u4zfhhmqrkrh7a ReEncodeFont " <[unvsl_xtabpf.enc <UniversalisADFStd-Bold.pfb
UniversalisADFStd-Bold-lf-ly1--lcdfj UniversalisADFStd-BoldLCDFJ " " <UniversalisADFStd-BoldLCDFJ.pfb
..
etc. etc. -k
Reinhard Kotucha
2016-06-28 22:52:10 UTC
Permalink
Post by Reinhard Kotucha
find out where all these entries in psfonts.map and pdftex.map come
from.
When Norbert again revised updmap, he made psfonts.map and pdftex.map
..
% unc.map
uncb8r CenturySchL-Bold " TeXBase1Encoding ReEncodeFont " <8r.enc <uncb8a.pfb
uncbi8r CenturySchL-BoldItal " TeXBase1Encoding ReEncodeFont " <8r.enc <uncbi8a.pfb
..
% universalis.map
UniversalisADFStd-Bold-lf-ly1--base UniversalisADFStd-Bold " AutoEnc_xtabpflmit75u4zfhhmqrkrh7a ReEncodeFont " <[unvsl_xtabpf.enc <UniversalisADFStd-Bold.pfb
UniversalisADFStd-Bold-lf-ly1--lcdfj UniversalisADFStd-BoldLCDFJ " " <UniversalisADFStd-BoldLCDFJ.pfb
..
etc. etc. -k
Sounds good. I had this idea too, but when I worked on Fabrice's
script, lines in the output files were sorted alphabetically -- not a
good premise for adding comments. When I adapted the Perl script I
tried to avoid drastic changes. The most important thing at that time
was to make sure that the Perl script provided the same output as
Thomas' shell script.

But I like Norbert's approach.

Regards,
Reinhard
--
------------------------------------------------------------------
Reinhard Kotucha Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover mailto:***@web.de
------------------------------------------------------------------