Importing and Exporting Word Documents
From version 4.10 onwards the ‘Pro’ versions of TW and EW have extended their ability to import Word files to cover the ‘Word 8’ format. They can now import Word 6, 7, and 8 documents, including ‘quicksaved’ files. The Word 8 file format is saved by Office 97 (IBM) and Office 98 (Mac). Some versions of Office 97 will save an RTF file when asked to ‘save as Word 7’. However, this is OK so far as EW/TW users are concerned as they both import RTF quite happily!
So far as the user is concerned, exporting and importing is normally the simple ‘drag and drop’ process, and all the conversions take place ‘behind the scenes’. However, it can be useful to understand what is going on, either because – like me – you are inquisitive, or to ensure that the process works as well as possible. The information on this page should help you understand just what is going on when you drag and drop Word files...
Note: to save space I’ll refer to ‘TW’ in what follows although I normally mean ‘EW and TW Pro’.
The ability to import embedded graphics has been extended. Some graphics types are now handled by TW itself, others require !ImageFS to be active and then use this. The format of Word 8 is considerably more complex than Word 6/7 and – naturally! – completely different! Graphics in Word 8 can now be encapsulated in a new ‘Office Art’ format. This is roughly equivalent to the way a Sprite can be placed inside a DrawFile. Earlier forms of Word also support an OLE embedding system to permit graphics files which Word itself does not understand. As a result of this, the topic of graphics handling on import is a nightmare! It is inevitable, given the mess of competing formats (some of which have no available public specification, or specs which are incorrect) that some graphics in some Word files cannot be imported correctly. That said, Icon Tech are continuously monitoring this area and add new formats as and when they can understand them.
With the exceptions of JPEGs, PNGs, and GIFs, all graphics import requires !ImageFS to be running.
Importing Vector formats
The only vector format which can currently be imported is the WMF. This is translated into a DrawFile. However, the resulting DrawFile will also contain a new object-type which ‘hides’ the original WMF data. This is not shown by !Draw or other DrawFile editors, although some, like !Vector, may warn the user about an unknown type of object if you try loading one of these DrawFiles. The hidden WMF data is stored inside the DrawFile so that it is available if the EW/TW document is re-exported as a new Word file. DrawFiles can be saved out of EW/TW and will keep this ‘hidden’ data for re-use if loaded again into another EW/TW document. The disadvantage of this approach is that the DrawFiles contain an ‘undocumented/hidden’ object and are larger than otherwise would be the case. The advantage is that the resulting DrawFile can be used to create an equivalent WMF in an exported Word file. If you wish, you can get rid of the hidden WMF data by saving the DrawFile, loading it into a suitable editor, selecting the normal contents and saving this selection as a new file which no longer contains the ‘hidden’ WMF.
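To make the idea of a ‘hidden’ object a little more concrete, here is a minimal Python sketch that walks the top-level objects of a DrawFile and reports any whose type is not one of the standard ones. It assumes the usual DrawFile layout (a 40-byte header beginning with ‘Draw’, followed by objects which each start with a type word and a size word). The type number which EW/TW actually uses for the WMF object is not given on this page, so the value used in the usage note is purely hypothetical, as is the function name.

```python
import struct

# Object types defined for standard DrawFiles (font table, text, path,
# sprite, group, tagged, text area, text column, options, transformed
# text, transformed sprite, JPEG).
STANDARD_TYPES = {0, 1, 2, 5, 6, 7, 9, 10, 11, 12, 13, 16}

HEADER_SIZE = 40  # 'Draw', major, minor, 12-char creator, 4-word bbox


def unknown_top_level_objects(data: bytes) -> list:
    """Return (type, size) for each top-level object of a non-standard type.

    Only top-level objects are visited; objects nested inside groups are
    skipped along with their parent.
    """
    if data[:4] != b"Draw":
        raise ValueError("not a DrawFile (missing 'Draw' signature)")
    found = []
    pos = HEADER_SIZE
    while pos + 8 <= len(data):
        obj_type, size = struct.unpack_from("<II", data, pos)
        if size < 8:
            raise ValueError("corrupt DrawFile: object size too small")
        if obj_type not in STANDARD_TYPES:
            found.append((obj_type, size))
        pos += size  # size includes the 8-byte object header
    return found
```

Run on a DrawFile saved out of EW/TW after a WMF import, a check of this kind would be expected to list one extra object. For example, a file holding one path object and one object of the made-up type 0x999 would give a single entry for the 0x999 object.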
You may find that some imported WMFs do not show the expected characters. For example, in electronics circuit diagrams Ω (Ohms) symbols may have become capital ‘W’s. It is sometimes possible to correct this using the font map of !ImageFS to match RiscOS and WMF font names. This map is held inside the file !ImageFS.Fontmap and is in a fairly easy-to-edit text format.
In general, when ‘alien’ bitmap formats are detected in the imported Word file they are offered to !ImageFS for conversion into sprites. Hence the range of embedded graphic types which can be handled is essentially defined by what !ImageFS can accept. However, there are three important exceptions to the rule which are ‘special cases’: JPEGs, PNGs, and GIFs. In practice, GIFs are rarely encountered in Word files, so their conversion process is described in detail on the page devoted to html (webpage) import and export.
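The routing rule just described can be summarised in a few lines of Python. This is only a sketch of the rule as stated on this page, not the code EW/TW actually uses; the names `Route` and `choose_route`, and the format strings, are my own inventions for illustration.

```python
from enum import Enum


class Route(Enum):
    JPEG_ROUTE = "JPEG route (RiscOS or !ChangeFSI)"
    PNG_NATIVE = "PNG displayed by EW/TW Pro itself"
    GIF_ROUTE = "GIF route (see the html import/export page)"
    IMAGEFS = "offered to !ImageFS for conversion"
    NOT_IMPORTED = "cannot be imported without !ImageFS running"


# The three 'special cases' which do not depend on !ImageFS.
SPECIAL_CASES = {
    "jpeg": Route.JPEG_ROUTE,
    "png": Route.PNG_NATIVE,
    "gif": Route.GIF_ROUTE,
}


def choose_route(fmt: str, imagefs_running: bool) -> Route:
    """Pick a route for an embedded graphic of the given format."""
    special = SPECIAL_CASES.get(fmt.lower())
    if special is not None:
        return special
    # Everything else (including WMF) depends on !ImageFS being active.
    return Route.IMAGEFS if imagefs_running else Route.NOT_IMPORTED
```

So a ‘tiff’ embedded in a Word file would only be imported with !ImageFS running, whereas a ‘png’ takes the same route either way.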
If possible, TW will seek to find a way to display (render) the JPEG whilst preserving the internal data in JPEG format. This is useful if you want to save the JPEG picture as it minimises any loss of detail caused by repeated conversions. It is also efficient as the TW document can store the compact JPEG format instead of the larger sprite. From Version 3.60 of RiscOS onwards the operating system provides support for displaying JPEGs just as if they were Sprites. Hence this is normally the preferred route for viewing JPEGs in imported Word files (and webpages).
TW therefore initially asks RiscOS if it is able to display the JPEG it has found embedded in an imported document. If it can, display is handled by the operating system. This is the ideal situation.
Unfortunately, there are two reasons why this does not always work.
- The JPEG is actually a ‘family’ of file formats. One of the newer members of this family is the ‘progressive’ JPEG. This has the advantage on webpages that – like the interlaced GIF – it appears as a rough picture quite quickly. Unfortunately, RiscOS does not currently understand progressive JPEGs (or some other newer forms of JPEG).
- Versions of RiscOS older than 3.60 can’t handle JPEGs as if they were sprites.
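For the inquisitive, the difference between a ‘progressive’ JPEG and an older ‘baseline’ one is visible in the file itself: the frame header carries a different marker code. The Python sketch below scans the marker segments of a JPEG and reports which kind it has found. This is purely an illustration of the file format. EW/TW does not need to do this, since it simply asks RiscOS whether it can display the picture, and the function name is my own.

```python
import struct

# Start-of-frame markers (0xC4, 0xC8 and 0xCC are not frame headers).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
# The subset of those which announce a progressive scan sequence.
PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}


def is_progressive_jpeg(data: bytes) -> bool:
    """Return True if the first frame header in `data` is progressive."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("corrupt JPEG: expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        if marker in SOF_MARKERS:
            return marker in PROGRESSIVE_SOF
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2  # stand-alone markers carry no length field
            continue
        if marker == 0xDA:  # image data reached without a frame header
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length  # the length counts its own two bytes
    raise ValueError("no frame header found")
```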
As a result EW/TW may discover that RiscOS refuses to display the JPEG. When this happens it tries to send the JPEG to !ChangeFSI for conversion.
- If the failure was due to the JPEG being ‘progressive’ and the OS is 3.60+, EW/TW asks !ChangeFSI to convert the data into an older form of JPEG that the OS can handle, saves the returned result and gets the OS to display it. This will only work if the available version of !ChangeFSI is new enough (Version 1.15RC or newer). The result is an ‘old format’ JPEG stored in the TW document and displayed by RiscOS.
- If the OS can’t cope with JPEGs, TW asks !ChangeFSI to return a sprite which it then stores and displays in the usual way.
This underlying process can be quite complex, but is hidden from the user, and it does mean that TW will show the JPEG picture if at all possible. It will also attempt to conserve memory usage by preserving JPEG compression if it can. In order to work, a suitable version of !ChangeFSI must have been ‘seen’ by the filer or Filer_Boot’ed at Boot. (This will usually be true as a result of the default Boot sequence.)
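The sequence of decisions described above can be sketched in Python as follows. Please treat this as my own summary of the behaviour, not as Icon Tech’s code: the names `Environment` and `plan_jpeg_import` are hypothetical, and the outcome when RiscOS 3.60+ refuses the JPEG but only an old !ChangeFSI is available is an assumption (this page only says that the conversion ‘will only work’ with a new enough version).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Environment:
    os_version: Tuple[int, int]    # e.g. (3, 60) for RiscOS 3.60
    changefsi_seen: bool           # 'seen' by the filer or Filer_Boot'ed
    changefsi_rewrites_jpeg: bool  # Version 1.15RC or newer


def plan_jpeg_import(os_can_display: bool,
                     env: Environment) -> Optional[Tuple[str, str]]:
    """Return (stored format, displayed by), or None if display fails."""
    if os_can_display:
        # The ideal case: data stays as JPEG and the OS renders it.
        return ("jpeg", "os")
    if not env.changefsi_seen:
        return None
    if env.os_version >= (3, 60):
        # The OS supports JPEGs but refused this one (e.g. progressive).
        if env.changefsi_rewrites_jpeg:
            return ("jpeg (older form)", "os")
        return None  # assumption: nothing else is tried in this case
    # The OS cannot cope with JPEGs at all: fall back to a sprite.
    return ("sprite", "tw")
```

For example, a progressive JPEG on RiscOS 3.60 with a new !ChangeFSI ends up stored as an older form of JPEG and displayed by the OS, whilst any JPEG on RiscOS 3.10 ends up as a sprite.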
As TW initially tries to use the OS or !ChangeFSI when JPEGs are involved you can sometimes export/import Word files without any need for other ‘helper’ applications. In general, though, it is advisable to have !ImageFS running when attempting to import or export Word files, so that it can convert other types of file into RiscOS formats. A complication is that !ImageFS may be set to ‘auto’ intercept any attempts to load/save JPEGs. This may mean that when TW tries sending a JPEG to !ChangeFSI the file is ‘intercepted’ by !ImageFS. This can have two unwanted effects.
- Firstly, current versions of !ImageFS can’t handle ‘progressive’ JPEGs. Hence this may cause conversion to fail even when !ChangeFSI can translate the JPEG.
- Secondly, the JPEG will become a sprite rather than a conventional JPEG. Hence the filesize is increased unnecessarily.
For these reasons it is advisable to switch off !ImageFS’s ‘auto’ handling of JPEGs when a new version of !ChangeFSI is available and you are using OS3.60+.
(A similar rule applies to GIF import/export where !InterGIF is to be preferred to !ImageFS, but this is mostly relevant to html import/export and is therefore discussed on the page on html import/export.)
EW/TW Pro now include code to display PNGs in the EW/TW document. Hence PNGs can be imported embedded in Word files. These will then be displayed and treated by EW/TW Pro just as if they were sprites. Hence they can also be saved as selections in these efficient, platform independent, formats. In effect, this means that EW/TW Pro now treat PNGs as if they are a RiscOS ‘native’ bitmap format!
This is a particularly useful new feature as PNGs – in addition to many of the best properties of JPEGs and GIFs – provide graded transparency. This feature has been implemented in EW/TW. As a consequence PNGs provide the ability to obtain better results in some circumstances – notably against coloured backgrounds in exported webpages – than using the more usual Sprites and JPEGs. In addition, TW/EW Pro can now be used in conjunction with the ‘screen grab’ facilities of applications like !Paint or !Snapper to convert the displayed PNG against its background into a Sprite!
Exporting Word files
When exporting a TW document as a Word file, all bitmaps are converted to BMPs. (This is to pander to the limitations of many older versions of Word!) TW equations are also converted into BMPs. This does mean that the resulting equations cannot be edited by Word, but they are anti-aliased – hence their appearance gives Windows users a chance to see what properly displayed text should look like! The exported Word file is nominally in Word 7 format, but should not upset earlier wordprocessors that expect Word 6, although some rare minor newer features may ‘go missing’.
TW is able to generate the BMPs itself, so !ImageFS is not required for graphics export to a Word document. However, if !ImageFS is running, TW will send the graphics to it for conversion on the assumption that the user has chosen to prefer that !ImageFS handle the conversions.
DrawFiles are also converted to BMPs unless they have been previously imported from a WMF file embedded in a Word document and contain the ‘hidden’ object described earlier. If they do contain such a hidden object, this is used to place an equivalent WMF in the exported Word file. The advantage of this is that the vector format of the Word original is preserved. The disadvantage is that you can’t easily edit the drawfile and have the changes exported to Word again.
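The export rules for graphics can be summed up in a short Python sketch. Again, this is just my reading of the behaviour described above, with hypothetical names (`plan_export` and the ‘kind’ strings). In particular, I have assumed that equations and DrawFiles follow the same ‘use !ImageFS if it is running’ rule as bitmaps, which this page does not state explicitly.

```python
from typing import Tuple


def plan_export(kind: str,
                has_hidden_wmf: bool = False,
                imagefs_running: bool = False) -> Tuple[str, str]:
    """Return (format placed in the Word file, what produces it)."""
    if kind == "drawfile" and has_hidden_wmf:
        # The vector original is preserved by reusing the hidden data.
        return ("WMF", "EW/TW, reusing the hidden WMF data")
    if kind in {"bitmap", "equation", "drawfile"}:
        # EW/TW can make BMPs itself, but defers to !ImageFS if running.
        converter = "!ImageFS" if imagefs_running else "EW/TW"
        return ("BMP", converter)
    raise ValueError("unknown kind of graphic: " + kind)
```

Note that a DrawFile which has been edited and re-saved without its hidden object falls into the second branch, and so reaches Word as a BMP.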
In general, features which Word and EW/TW have in common (lists, footnotes, etc) will be translated to their equivalent and other features which are not common will be ignored.
Pages written using TW Pro and HTMLEdit
Content and pages maintained by: Jim Lesurf (firstname.lastname@example.org)