blog about text processing; in three languages

Index Processing – the Hard Way

17 July 2009, 12:17

We had to typeset a book translated from another language. The index entries were also taken from the original book, translated, and collected in a list. So, for the first preview of the new book, we typeset this list as a numbered list (an enumerate environment) and asked the editor to mark the text with the entry numbers for index generation.

We finally received the heavily marked-up text as a commented PDF, with an index number in each comment. Almost every page was densely covered with such comments.

The index had to be generated very quickly, and there was obviously no way to do that by marking each entry by hand in the TeX source file. Fortunately, it was confirmed that the book preview was approved and that no page would change.

We used Acrobat's feature for summarizing comments to collect all comments, together with their page numbers, into a single PDF file.

We converted this PDF file to plain text and used sed and similar tools to turn it into LaTeX markup:

  \sachindex{196}{451,478,-454,-386,284-,35,197-,-197}
  \sachindex{197}{14,345-,435,382-,386-,315-,-345,-284,379-,60-,-315,-382,500}
  \sachindex{198}{303,-60,-386,519-,439-,268-,404-,446,58,74,43,95-,409-,83,-439,-409,-404,449,358,-268,98,67}
  \sachindex{199}{147,115,16-,401-,74,187-,139-,429,119,435,123,-519,-187,-379,120-,458,269-,176,96,548,187,-139,-401,299-,385-,522,-269}
  \sachindex{200}{94,187-,195-,531,552-,549,429,197,527,222,454,356,215,-195,-187,-385,429,341,439,526,170}
  \sachindex{201}{524,244,-16,-120,-552,-299,94,115-,429,315,78,99,435,-115,97-,409,269,67,-95,-97,519,604,394,603,81,425}
  \sachindex{202}{279,557-,404-,553-,404,432,563,215,572-,569-,513,370,228-,508,-553,-228,-569,-572,244-,358,354}
  \sachindex{203}{104,43,14,155,541,-557,-404,378,581,40,203,-244,284}

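The first argument of the \sachindex macro is the page number; the second argument holds all index numbers for that page as a comma-separated list. A number with a trailing hyphen (e.g. 284-) marks the beginning of a page range, and a number with a leading hyphen (e.g. -454) marks its end.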
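The post does not show the sed commands themselves. As a rough illustration only, here is a Python stand-in for that step (not the original sed script). It assumes a hypothetical plain-text layout in which a `Page: <n>` line precedes each page's comments and each comment line holds one index number; the real Acrobat export will look different, and the helper name `to_sachindex` is made up for this sketch.

```python
import re

# Hypothetical input layout (the post does not show the real export):
# a "Page: <n>" line starts each page, followed by one index number per
# comment, e.g. "451", "284-" (range begins) or "-454" (range ends).
PAGE_RE = re.compile(r"^Page:?\s*(\d+)")
NUMBER_RE = re.compile(r"^\s*(-?\d+-?)\s*$")


def to_sachindex(lines):
    """Group index numbers by page and emit one \\sachindex line per page."""
    pages = {}       # page number -> index tokens in the order they appear
    current = None   # page whose comments are currently being read
    for line in lines:
        m = PAGE_RE.match(line)
        if m:
            current = int(m.group(1))
            pages.setdefault(current, [])
            continue
        m = NUMBER_RE.match(line)
        if m and current is not None:
            pages[current].append(m.group(1))
    return [
        "\\sachindex{%d}{%s}" % (page, ",".join(tokens))
        for page, tokens in sorted(pages.items())
        if tokens  # skip pages without index comments
    ]


if __name__ == "__main__":
    summary = ["Page: 196", "451", "478", "-454", "Page: 197", "14", "345-"]
    for line in to_sachindex(summary):
        print(line)
    # \sachindex{196}{451,478,-454}
    # \sachindex{197}{14,345-}
```

The order of the numbers within a page is kept as found, since the TeX side only needs the page and the tokens.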

The numbered list of index entries was likewise converted into macros keyed by the entry number:

  \sachidx{594}{Work}
  \sachsubidx{595}{Work!ability to work}
  \sachsubidx{596}{Work!mental work}
  \sachsubidx{597}{Work!political work}
  \sachsubidx{598}{Work!psychotherapeutic work}
  \sachsubidx{599}{Work!social therapeutic work}
  \sachsubidx{600}{Work!therapeutic work}
  \sachsubidx{601}{Work!work collectives}
  \sachsubidx{602}{Work!work organization}
  \sachidx{603}{World government}
  \sachsubidx{604}{World government!Utopian world government}
  \sachidx{605}{World Trade Center}
  \sachidx{606}{Xenophobia}
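Subentries use MakeIndex's `!` separator between index levels (Work!mental work). The following Python sketch, with a hypothetical helper `read_entries`, reads such a list into a table keyed by entry number, roughly the lookup that the TeX macros build with \@namedef. It assumes entry texts contain no nested braces.

```python
import re

# Matches \sachidx{594}{Work} and \sachsubidx{596}{Work!mental work}.
# Assumes entry texts contain no nested braces.
ENTRY_RE = re.compile(r"\\sach(?:sub)?idx\{(\d+)\}\{([^}]*)\}")


def read_entries(lines):
    """Map entry numbers to their index levels, split at MakeIndex's '!'."""
    table = {}
    for line in lines:
        m = ENTRY_RE.search(line)
        if m:
            table[int(m.group(1))] = tuple(m.group(2).split("!"))
    return table


if __name__ == "__main__":
    entries = read_entries([
        r"\sachidx{594}{Work}",
        r"\sachsubidx{596}{Work!mental work}",
    ])
    print(entries[596])  # ('Work', 'mental work')
```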

The last step was to define both macros so that they write the appropriate \indexentry lines to the .idx file.

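  \makeatletter % needed in the document preamble, not inside a .sty file
  % \sachidx{<number>}{<entry>}: store the three variants of one entry
  \newcommand*{\sachidx}[2]{%
    \@namedef{sr@#1}{#2}%      normal entry,  csname sr@594
    \@namedef{sr@#1-}{#2|(}%   range begins, csname sr@594-
    \@namedef{sr@-#1}{#2|)}%   range ends,   csname sr@-594
  }
  \let\sachsubidx\sachidx
  % \sachindex{<page>}{<n1>,<n2>,...}: write one \indexentry per number
  % to the .idx file (\@indexfile is opened by \makeindex in the preamble)
  \newcommand*{\sachindex}[2]{\@for \eintrag:=#2\do{%
      \immediate\protected@write\@indexfile{}%
      {\string\indexentry{\@nameuse{sr@\eintrag}}{#1}}}}
  \makeatother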

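The beginnings and endings of index ranges required special attention. For this purpose, \sachidx generates three macros from each index number: a normal entry, a range beginning and a range end. The two range variants append MakeIndex's |( and |) encapsulators, which open and close a page range.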
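To make the three-macro scheme concrete, here is a Python model of what \sachidx stores and what \sachindex writes for one page. The helpers `variants` and `index_entries` are hypothetical and not part of the original macros; the dictionary keys play the role of the sr@... control sequence names.

```python
def variants(number, entry):
    """Mirror \\sachidx: a plain key and two range keys per entry number."""
    return {
        f"{number}": entry,          # like sr@594:  normal entry
        f"{number}-": entry + "|(",  # like sr@594-: range begins
        f"-{number}": entry + "|)",  # like sr@-594: range ends
    }


def index_entries(page, tokens, table):
    """Mirror \\sachindex: one \\indexentry line per token on the page.

    Raises KeyError for a number that is missing from the table.
    """
    return ["\\indexentry{%s}{%d}" % (table[t], page) for t in tokens]


if __name__ == "__main__":
    table = {}
    table.update(variants(594, "Work"))
    table.update(variants(596, "Work!mental work"))
    for line in index_entries(201, ["594-", "596"], table):
        print(line)
    # \indexentry{Work|(}{201}
    # \indexentry{Work!mental work}{201}
```

Because the tokens from the comments (451, 284-, -454) are used directly as lookup keys, no separate parsing of the hyphens is needed; the same idea makes the TeX version so short.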

Finally, we obtained an .idx file with 3087 index markings. After processing with MakeIndex, they resulted in 749 index entries.
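The drop from 3087 markings to 749 entries comes from MakeIndex merging all markings with the same key into a single entry. Below is a toy Python model of that merging, with a hypothetical `merge` helper; it is not MakeIndex's actual algorithm and ignores sorting, style files and most of its rules.

```python
import re
from collections import defaultdict

# \indexentry{<key>}{<page>}, optionally with |( or |) after the key
ENTRY_RE = re.compile(r"\\indexentry\{([^}|]*)(\|[()])?\}\{(\d+)\}")


def merge(idx_lines):
    """Toy model of MakeIndex merging: one entry per key with its pages.

    Repeated single pages are kept once, and a |( ... |) pair becomes a
    "start--end" range. Sorting, style files and the many other MakeIndex
    rules are not modelled; an unclosed range is silently dropped.
    """
    refs = defaultdict(list)  # key -> page references in input order
    open_ranges = {}          # key -> start page of a range not yet closed
    for line in idx_lines:
        m = ENTRY_RE.match(line.strip())
        if not m:
            continue
        key, flag, page = m.group(1), m.group(2), int(m.group(3))
        if flag == "|(":
            open_ranges[key] = page
        elif flag == "|)":
            start = open_ranges.pop(key, page)
            refs[key].append(f"{start}--{page}" if start != page else str(page))
        elif str(page) not in refs[key]:
            refs[key].append(str(page))
    return dict(refs)


if __name__ == "__main__":
    lines = [
        r"\indexentry{Work}{196}",
        r"\indexentry{Work}{196}",
        r"\indexentry{Work!mental work|(}{197}",
        r"\indexentry{Work!mental work|)}{199}",
        r"\indexentry{Xenophobia}{203}",
    ]
    print(merge(lines))
    # {'Work': ['196'], 'Work!mental work': ['197--199'], 'Xenophobia': ['203']}
```

Five markings collapse into three entries here; on the real book, the same kind of merging turned 3087 markings into 749 entries.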

This way, a job that would normally take days was completed within a few hours.

Tags: comments, index, makeindex, sed, shell

TANOVSKI & PARTNERS 2.0