Unicode issue found (Icelandic)

A bug in the current build of PDFTextStream (v1.2) can cause some Icelandic characters to be output incorrectly. The issue appears only if:

  • PDFTextStream is configured with strictEncoding set to true (via PDFTextStreamOptions.setUseStrictEncoding(boolean))
  • PDFTextStream is used to extract text and metadata from a PDF containing certain Icelandic characters, including Ð (Eth), ð (eth), Þ (Thorn), and þ (thorn)
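
If you want a quick way to tell whether your own extracted text even contains the characters listed above, a check along these lines may help. This is only a minimal sketch: the class and method names (IcelandicCharCheck, containsAffectedChars) are hypothetical and not part of the PDFTextStream API, and it looks only for the four characters named here, although the list above says "including", so other characters may be involved as well.

```java
// Hypothetical helper, not part of PDFTextStream: reports whether a string
// contains any of the four Icelandic characters named in this post.
final class IcelandicCharCheck {

    // Eth, eth, Thorn, thorn, written as Unicode escapes (U+00D0, U+00F0,
    // U+00DE, U+00FE) so the source file's own encoding cannot alter them.
    private static final String AFFECTED = "\u00D0\u00F0\u00DE\u00FE";

    // Returns true if at least one character of text is in AFFECTED.
    static boolean containsAffectedChars(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (AFFECTED.indexOf(text.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

Running text from a suspect document through containsAffectedChars only tells you whether it is worth comparing the output against the original PDF; it does not detect characters that were already replaced with something else during extraction.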

We have found the root cause of the problem and are developing a fix, which will ship in a bugfix release by the end of this week.

Update: This issue has been resolved, but not by a bugfix release — the issue originally arose because of a malformed PDF document. See this post for the gory details…
