
What are text documents? Text file extensions: types and the basics of identifying the associated programs


Why do you need text?

Today the three most common text formats are TXT, RTF and DOC. What sets them apart, and what do they share? They have one thing in common: they all store text. They differ in the formatting and word processing capabilities they provide, and in how accessible the stored information is across different programs.

The simplest text format

TXT is the oldest format and the most modest in features. All you can do with text in this format is type it in and preserve the paragraph breaks. In certain situations this simplicity becomes a strength, bringing versatility and transparency: TXT is easily read by different applications and on different platforms. In addition, many programs whose main purpose is not working with text at all can still save text in TXT format.

TXT processors

Many remember the Lexicon word processor from DOS times, which handled the TXT format at quite a high level. Today the main tool for working with TXT is the standard Windows Notepad. Anyone who finds its functions insufficient can always find an editor to suit their taste and needs on the World Wide Web, including free ones. For example, with the freeware program Vega by Konstantin Sheremetyev you are unlikely to see a message that the opened text file is too large: according to the author, Vega version 2.04 opens files of up to 2 GB, while the program itself takes up only 9.5 KB (for comparison, Notepad in Windows XP "weighs" about 65 KB). At the same time Vega is more convenient than Notepad and does not require installation. Here is another example of what can be done with "plain text": the text you are reading was typed in UltraEdit from IDM Computer Solutions. Its strong point is syntax-aware display and processing of programming languages, but it can work wonders even with the most ordinary text. Those who appreciate convenient, ergonomic Russian-language programs that, most importantly, "know a lot" about the specifics of Cyrillic encodings should get acquainted with the Patriot program.

Formatting and versatility

RTF stands for Rich Text Format, a format created by Microsoft. RTF is text marked up with special "control words", which make it possible to apply and save quite complex formatting and to insert footnotes, headers and footers, figures, tables and formulas, although RTF is inferior to the DOC format in handling these additional objects. It also loses to DOC on file size: formatting text with "control words" instead of a style sheet does not make for compact files. However, RTF wins the security argument with DOC, because its internal structure does not provide for storing macro code and it is therefore immune to macro viruses.
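To make the idea of "control words" more concrete, here is a minimal sketch in Python that builds a tiny RTF document by hand. The helper names make_rtf and escape_rtf are made up for this illustration, the input is assumed to be plain ASCII text, and only a handful of basic control words (\rtf1, \ansi, \fonttbl, \b, \par) are used; real RTF files produced by word processors contain far more.

```python
def escape_rtf(text):
    """Escape the three characters that have special meaning in RTF."""
    return "".join("\\" + ch if ch in "\\{}" else ch for ch in text)


def make_rtf(paragraphs):
    """Build a minimal RTF document from (text, is_bold) pairs.

    Assumption: the text is plain ASCII; other characters would need
    extra escaping that this sketch does not implement.
    """
    body = []
    for text, is_bold in paragraphs:
        run = escape_rtf(text)
        if is_bold:
            # A group in braces limits the effect of the \b control word.
            run = r"{\b " + run + "}"
        # \par marks the end of a paragraph.
        body.append(run + r"\par")
    # Header: RTF version, character set, default font and a font table.
    header = r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}"
    return header + "\n" + "\n".join(body) + "\n}"


doc = make_rtf([("Heading", True), ("Plain {text} here", False)])
print(doc)
```

Because the result is ordinary text, it can be inspected in any plain text editor, which is exactly what distinguishes RTF from a binary format.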

RTF Processors

RTF is the primary or a supported format in many, if not most, word processing programs. A good tool is, for example, Mikhail Morozov's Hieroglyph. This program implements not only a Russian spelling checker but also automatic switching of the keyboard language layout. The Atlantis word processor from Rising Sun Solutions, which exists in both commercial and free versions, will surely suit many users with its well-thought-out interface, large number of keyboard shortcuts, customizable toolbar and other features. The already mentioned Patriot editor can also work with RTF.

The "largest" text format

The DOC format offers the most extensive text processing and formatting capabilities, including footnotes and comments, as well as the ability to create, place and edit tables, charts, images and other elements. Admittedly, all these features are implemented fully and most correctly only in MS Word, a consequence of Microsoft's policy of not disclosing the current specifications of the popular format. Although DOC is also "understood" by other programs, their developers cannot always ensure that it is recognized correctly. Unlike TXT and RTF, DOC is a binary format, which makes it unreadable in simple text editors; moreover, it is not fully compatible across its own versions.
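The difference between a binary and a text format can be checked from the first bytes of a file. The sketch below assumes a Word 97-2003 style .doc, which is stored in an OLE2 compound file container that begins with a fixed 8-byte signature; the same container is used by other old Office formats, so a match only means "OLE2 container", not necessarily "Word document". The function names are invented for this example.

```python
# Signature at the start of every OLE2 compound file
# (used by Word 97-2003 .doc, but also by old .xls and .ppt files).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the data starts with the OLE2 container signature."""
    return data.startswith(OLE2_MAGIC)


def looks_like_plain_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Rough heuristic: no NUL bytes and decodable in the given encoding."""
    if b"\x00" in data:
        return False
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


fake_doc_header = OLE2_MAGIC + b"\x00" * 8   # stand-in for a real .doc file
plain = "Just a note.\n".encode("utf-8")

print(looks_like_ole2(fake_doc_header), looks_like_plain_text(fake_doc_header))
print(looks_like_ole2(plain), looks_like_plain_text(plain))
```

In practice one would read the first few bytes of the file from disk; the in-memory byte strings here only keep the example self-contained.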

DOC processors

The main and, for the reasons given above, "irreplaceable" word processor for working with DOC is MS Word, which implements the capabilities of this format most fully. Third-party developments add a great deal of productivity and functionality to Word: all kinds of add-ons, macros and programs are available in large numbers on the Internet. Competition comes from, for example, WordPerfect from Corel, StarOffice from Sun Microsystems and the free OpenOffice.org. When working in Word and in other programs, keep format compatibility in mind and save a document in DOC only if you are sure that no incompatibility will arise.

Applicability of formats

It is groundless to claim that one of these formats is worse than the others without considering the tasks they are to be used for. Since we do not set out to do page layout in a word processor, the choice is almost unambiguous. To prepare text of medium to very large volume and to be sure that any layout program will fully "understand" what was typed, it seems most convenient to use the simplest, most compact and most versatile means of typing and storing text - the TXT format. As for using other text formats in layout, a lot depends on how well a specific layout program supports them.

OpenOffice.org is an international open source project aimed at creating a universal office suite that runs on different platforms, has an open API and uses an XML-based file format. OpenOffice.org is also the name of the suite of programs developed within this project. It includes a word processor, spreadsheets, a graphics editor, a presentation system and a data access system. In terms of its capabilities, it is comparable to similar commercial programs and may well be considered an alternative to them. OpenOffice.org is currently dual licensed under the GPL and SISSL. Despite the differences between these licenses, OpenOffice.org is free for the end user.

OpenOffice.org traces its origins to the StarOffice office suite, developed by the German firm StarDivision in the mid-1990s. In the fall of 1999, Sun acquired StarDivision. In June 2000, already under the Sun trademark, StarOffice 5.2 was released for MS Windows, Linux and Solaris. On October 13, 2000, the StarOffice source code was opened (excluding some third-party modules); this date is officially considered the birthday of OpenOffice.org. Today, both volunteers from around the world and Sun programmers work on the OpenOffice.org code.

Currently two products are released from a single code base developed by the OpenOffice.org community: StarOffice, which adds components under a proprietary license, and the free OpenOffice.org. In OpenOffice.org, most of the proprietary components found in StarOffice have been replaced by free counterparts.

(According to cnews.ru information.)

The set of rules by which data is stored in a file is called the file format. Different types of files, such as text files, bitmap graphics and so on, use different formats. In general, several different formats can be defined for the same file type, although file type and format are often treated as the same thing. The file format is indicated by the filename extension appended to the file name when it is saved in a specific format, such as DOC, GIF, and so on.

Typically, file formats are created for use in a specific application program. For example, graphic objects created in the well-known vector graphics package CorelDRAW are saved as CDR files, while images produced by another graphics package, CorelXara, are written to disk as XAR files. Some formats are not tied to specific applications, that is, they are universal. One of the best-known universal formats is TXT (the DOS text file format).

Computer files are often compressed to save storage space. There are many ways to compress files, and the choice of method depends on the original file format. Generally, the higher the compression ratio, the slower the read and write operations.

As for compression algorithms, there are both lossless compression algorithms and algorithms that can cause data loss.



Lossless compression ensures that all the data that was in the file before compression is present after the file is decompressed. Lossless compression mechanisms are used when storing text or numeric data such as spreadsheets or document files. Examples of lossless compression algorithms are the well-known ZIP, ARJ, and others.
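The defining property of lossless compression, that decompression gives back exactly the original bytes, is easy to check. The sketch below uses Python's standard zlib module, which implements the DEFLATE method commonly used inside ZIP archives; the sample text is invented for the illustration, and actual compression ratios depend heavily on the data.

```python
import zlib

# Repetitive text, made up for this example, compresses particularly well.
text = ("Plain text compresses well because it is repetitive. " * 50).encode("utf-8")

packed = zlib.compress(text, 9)      # 9 = slowest, most thorough compression level
restored = zlib.decompress(packed)

# Lossless: every byte of the original is recovered.
print(restored == text)
print(len(text), "bytes before,", len(packed), "bytes after")
```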

Here is a short description of the main formats in use:

§ American Standard Code for Information Interchange ASCII (TXT). A text file format developed by the American National Standards Institute. Supported by all operating systems and all programs. It is a text file in DOS encoding: pictures cannot be inserted, there is no formatting, it works on all machines, and only small files can be created.

§ ANSI (TXT). A text file format in ANSI encoding (the Microsoft Windows code page).

§ MS Word for DOS, Windows (.DOC). A document format developed by Microsoft Corporation and supported by MS-DOS programs and most word processors. It preserves the original document formatting as well as character styles. Besides text, files of this format can contain graphic images with various parameters. Supports 256 colors. Doesn't support compression. It is mainly used to exchange formatted text data between different platforms and applications.

§ Hypertext Markup Language HTML (HTM, HTML). A markup language for hypertext documents. Pages on the Internet are created using this special language. HTML documents are ASCII files that can be viewed and edited with any text editor. The difference from a regular text file is that HTML documents contain special commands, called tags, that define how the document is to be formatted. Once you have mastered HTML, you can create pages for the Internet. By adding tags (labels) to plain text, you make the browser display that text in a specific way and place images on the page. If you have learned Java and JavaScript, you know how to extend the power of HTML by putting scripting commands inside tags.

§ Portable Document Format PDF (.PDF). This document storage format, developed by Adobe, claims to be an open typographic standard for the Web. It is seen as an alternative to HTML. The disadvantage of HTML is that documents converted to HTML usually do not retain their original appearance, and HTML offers a very limited choice of typefaces for viewing. In contrast, those who use Acrobat and other PDF tools to create, distribute and view documents in their original form know that readers will see the publication exactly as it was made. The PDF format is indispensable if you need an exact copy of a document. As an example of the successful use of PDF for documents in Russian, consider the "Moscow News" server on the Internet: the materials presented there in electronic form exactly reproduce the printed paper original.

§ Standard Generalized Markup Language (SGML). The name translates as "standard generalized markup language"; HTML itself was originally defined as an application of SGML. It is a set of mechanisms for creating structured documents marked up with descriptors (tags). Compared to HTML, it provides more flexible and versatile formatting options on the Web. However, SGML is also more complex, so PDF is used as a simpler tool. The power of SGML lies in its cross-platform, structured approach to describing the content of documents. SGML is actually a metalanguage, i.e. it is intended to describe the markup languages used when creating documents.
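The HTML entry in the list above notes that an HTML document is an ordinary text file with tags added to it. As a rough illustration, the following Python sketch uses the standard html.parser module to separate the tags from the visible text of a small page. The class name TagAndTextCollector and the sample page are invented for this example.

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Collect opening tag names and non-empty text chunks from HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Called once for every opening tag such as <p> or <b>.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the text between tags; skip whitespace-only pieces.
        if data.strip():
            self.chunks.append(data.strip())


page = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
parser = TagAndTextCollector()
parser.feed(page)
parser.close()

print(parser.tags)
print(" ".join(parser.chunks))
```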

Every PC user is constantly faced with various formats of text files, but hardly thinks about how rich the history of these formats and programs is, which gave a person the ability to read books, work with text and create all the necessary documentation right on a computer.

The history of text files is not much shorter than that of personal computers themselves: masterpieces were already being written in the first analogs of the modern Notepad. So what text file formats exist, and what programs work with them? First you need to understand what text files are for, how they differ and what they have in common. What unites absolutely all text formats is their main task: to preserve text information. They differ in processing capabilities and in how accessible the stored information is in terms of compatibility with other programs.

Traditionally, the simplest text format is the TXT format. It is also the most modest in features and the oldest text format. Due to its simplicity (TXT's capabilities are limited to typing and breaking it into paragraphs), this format is often used by a huge number of applications and programs on a variety of platforms.

As personal computers spread and their sales grew, Microsoft created another popular format called the Rich Text Format (or simply RTF). It is text marked up with certain "control words" that make it possible not only to apply but also to save complex formatting and to insert formulas, tables, figures, headers and footers, and footnotes into the text.

However, RTF is considerably inferior in capabilities to the DOC format, also created by Microsoft, specifically for the software package called Microsoft Office. Created more than fifteen years ago, DOC includes a huge number of options for formatting and processing text and for creating, editing and placing images, charts, tables and other elements. It should be noted that these functions work most correctly only in MS Word. This is primarily because Microsoft does not disclose the current specifications for the DOC format, which prevents competitors and independent developers from using the format's capabilities to the fullest. This is one of the main reasons why other text file formats besides DOC are still widely used today.

The main difference between DOC and the RTF and TXT formats is its binary nature, which makes it unreadable in simple editors such as WordPad, Lexicon and Atlantis. Moreover, in some cases DOC files created in different versions of MS Word turn out to be incompatible with one another.

Text files can be opened and edited in a huge number of programs. Besides the previously mentioned MS Word, the most common are StarOffice from Sun Microsystems, WordPerfect from Corel and the free OpenOffice.org package.

With the proliferation of electronic reading devices, other text file formats are gaining popularity, for example, FB2 and LRF.

In order to be able to use different text formats on different platforms, a large number of programs have been created, called converters. Text file converters allow you to save the original text from one format to another and use it later on different devices and platforms.

Converters are used not only to save text from one format to another, but also to create files that, unlike their sources, can be used on devices that are not able to "read" the original files. For example, some e-book readers that do not support popular text file formats can easily recognize LRF or FB2 files produced from the originals by conversion programs.

We come across text files (documents) almost every day. However, the extension of a text file and the text format of the data should not be confused: they are different things. Let's try to work out what files of this type are and what kinds exist.

Text file extension: what is it?

Let's start with the fact that most files of this type have a three-letter extension after the separator (a period). The simplest and most common type is a file with the .txt extension, which Windows systems open in the standard Notepad.

However, despite the generally accepted rules, the extensions of text documents may contain more than three letters (up to twelve, but no more), for example e-book files with the .djvu extension. In addition, an extension may contain digits.

What do we gain by classifying text files (documents) by their extension? A single glance instantly answers an important question: the extension tells you which program is associated with the file for opening or editing. In many cases you can also identify the application in which the file was originally created.


However, do not forget that today you can find quite a lot of files with the same extension that were created in other applications or are associated with different programs. It would seem that an ordinary file with the .doc (.docx) extension belongs to the Microsoft Word text editor. But you can open it, or save a file in this form, in another program, even on "apple" computers. This also includes a mixed type, so to speak: .pdf files containing not only text but also graphics. Then again, Word documents may contain inserted images too.

This is precisely what indicates the universality of the format itself: a text file remains the most "readable" kind of file regardless of the operating system used. The same goes for any text file type.

File extension types: text

In general, today such a huge number of text formats and their extensions are known that, probably, practically no specialist will undertake to count their full number.


Yes, of course, the universality of such documents is beyond doubt, especially for the simplest ones. But sometimes the problem is that a particular system or program does not support every encoding. That is why a jumble of symbols sometimes appears on the screen instead of the usual letters.
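The "jumble of symbols" effect is easy to reproduce. The sketch below assumes a Russian phrase saved in the Windows code page cp1251 and then read back as if it were in the DOS code page cp866; both are standard codecs in Python. The phrase itself is just an example.

```python
original = "Привет, мир"

# Bytes as a Windows program using code page 1251 would store them.
raw = original.encode("cp1251")

# Reading the same bytes with the wrong code page produces garbage.
garbled = raw.decode("cp866")

# Reversing the wrong step and decoding correctly recovers the text.
fixed = garbled.encode("cp866").decode("cp1251")

print(garbled)
print(fixed)
```

The bytes in the file never changed; only the table used to interpret them did, which is why choosing the right encoding in the editor is usually enough to fix the display.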

As for the varieties of text files, you can't list them all. The most common are .txt, .doc, .tex, .text, .pdf, .log, .apt, .ttf, .err, .sub, .djvu, .odt, .rtf and many others. The list is endless.

Most interestingly, many of these file types can play different roles in the system. For example, in addition to the usual .sub subtitle file, an ordinary .txt text document can supply subtitles when a video is opened, and in this respect many formats are interchangeable.


Note that even executable files can have text as their content. The simplest example is a .bat file created in ordinary Notepad and containing text in the form of a set of commands. When the file is launched, the commands are executed; to edit it, use the "Open with ..." menu, unless opening the file is already associated with another action.


A similar situation is observed with documents that use markup or programming languages, for example, .html, .htm, .xml files, etc. Even web pages can be opened natively in many editors as text files containing third-party elements.

Changing the extensions of text files

As for changing the extension, sometimes it can be done, for example .txt to .doc and vice versa: the Word editor will open either type. The same applies to the .txt - .bat pair when opened in Notepad. But in other cases it is better not to attempt this: it will achieve nothing, and another application will not be able to open the renamed file. To change the format itself you will have to use special converter programs.
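Why renaming is not conversion can be shown in a few lines: the extension is part of the file name, while the format is determined by the bytes inside. The sketch below works in a temporary directory; the file names notes.txt and notes.doc and the function name rename_keeps_bytes are arbitrary choices for this example.

```python
import os
import tempfile


def rename_keeps_bytes(content: bytes) -> bytes:
    """Save content as notes.txt, rename it to notes.doc, return what is read back."""
    with tempfile.TemporaryDirectory() as tmp:
        txt_path = os.path.join(tmp, "notes.txt")
        doc_path = os.path.join(tmp, "notes.doc")
        with open(txt_path, "wb") as f:
            f.write(content)
        os.rename(txt_path, doc_path)    # only the name changes
        with open(doc_path, "rb") as f:
            return f.read()


original = b"Just plain text.\r\n"
after_rename = rename_keeps_bytes(original)

# The renamed file still holds exactly the same plain text bytes.
print(after_rename == original)
```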

Instead of an afterword

As should already be clear, the extension of a text file can vary widely, depending on the program in which the document was created. But, as in other cases, the extension almost always lets you determine the originally associated application or, failing that, open the file with any other program that supports this type of data, even if the original application is not installed on the computer. And as is probably also clear by now, text files are in fact the most widespread and universal files in the computer world, regardless of the software packages and operating systems used.

The most commonly used type of data in the computer world and on the Internet is text. Video and graphics are much more colorful, and in general it is better to see once than to hear a hundred times. Hearing is good too, and for that there are audio formats. However, it is unpretentious and modest letters and numbers that rule the computer world. Without them you cannot even give a name to a file of any other kind. Text data is important and varied: books, documents, program code. And there are different format options for each purpose. These formats are the subject of this article. One reservation should be made right away: this review will not touch on e-book formats, which deserve a separate discussion. Here we will talk about document formats.

Text format - TXT (PlainText)

So, the simplest possible format is TXT. This is text in its pure, uncomplicated form. A file contains only the content of the text and the absolute minimum of service data: characters marking the beginning and end of the text, carriage returns, and the like.

Despite its almost Spartan simplicity, the format is not devoid of variations. First, there are differences between the Windows, Unix and Mac OS versions, which use different line terminators. Differences may also arise from the use of 8-bit code pages (extended ASCII) or 16-bit Unicode encodings.
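The line terminator differences come down to one or two bytes at the end of each line: CR LF on Windows and DOS, LF on Unix, and CR on classic Mac OS. A minimal sketch for detecting and normalizing them might look like this; the function names are invented, and a file with mixed terminators is simply reported by the first kind found.

```python
def detect_newline(data: bytes) -> str:
    """Report which line terminator convention the data appears to use."""
    if b"\r\n" in data:
        return "CRLF (Windows/DOS)"
    if b"\r" in data:
        return "CR (classic Mac OS)"
    if b"\n" in data:
        return "LF (Unix)"
    return "none"


def to_unix(data: bytes) -> bytes:
    """Convert all line terminators to LF. CRLF is replaced first so that
    the lone-CR step does not turn one Windows line break into two."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


windows_text = b"first line\r\nsecond line\r\n"
print(detect_newline(windows_text))
print(to_unix(windows_text))
```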

However, despite this, the TXT format is extremely versatile, for which it is very popular among programmers and system administrators.

MS Office document formats and analogues - DOC, DOCX, RTF, ODT

For all its versatility and simplicity, TXT is absolutely unsuitable for creating actual documents, that is, texts intended for printing in compliance with certain rules and regulations. Such documents must contain, in addition to the text itself, a lot of information about its design and formatting, as well as about the format and size of the sheet of paper on which it is to be placed.

For these purposes, quite a number of formats have been created by various office suites. The most popular, and in practice close to universal, are the MS Word formats, doc and docx. The first is a closed format created by Microsoft for its text editor (more precisely, a whole line of formats, since it was revised several times during its existence). Along with it, in the company's early days, the RTF (Rich Text Format) format was created in cooperation with the Adobe corporation. Unlike DOC, the structure of this format is public, and it is successfully supported by almost all existing text editors, although it is somewhat inferior to DOC in the set of available functions.

The closed nature of Microsoft's formats led to the creation of an open office suite, OpenOffice, for which its own format was developed: ODT (OpenDocument Text). The format is not well supported by commercial editors, including MS Word, and may open with errors.

Finally, in 2007, Microsoft decided to abandon the bet on the DOC format and developed the Office Open XML format family, which includes DOCX, which has become the main format for new versions of MS Word.
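In contrast to the binary DOC, a DOCX file is a ZIP archive containing XML parts, so its structure can be inspected with ordinary tools. The sketch below checks for two parts that a Word document package normally contains, [Content_Types].xml and word/document.xml. The function name looks_like_docx is invented, and the tiny in-memory archive is only a stand-in for a real document, not a valid DOCX.

```python
import io
import zipfile


def looks_like_docx(fileobj) -> bool:
    """Heuristic check: a ZIP archive with the parts a Word package normally has."""
    if not zipfile.is_zipfile(fileobj):
        return False
    fileobj.seek(0)
    with zipfile.ZipFile(fileobj) as archive:
        names = archive.namelist()
    return "[Content_Types].xml" in names and "word/document.xml" in names


# Build a fake package in memory (placeholder XML, not a valid document).
fake_docx = io.BytesIO()
with zipfile.ZipFile(fake_docx, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document/>")
fake_docx.seek(0)

print(looks_like_docx(fake_docx))
print(looks_like_docx(io.BytesIO(b"plain text, not an archive")))
```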

PDF format

Having declined to cooperate with Microsoft, Adobe went its own way. It developed the PDF format, which is intended not so much for authoring documents as for viewing and printing them. Unlike the formats in the previous group, which hold formatted text whose appearance may still change depending on the machine it is displayed or printed on, a PDF document is fundamentally fixed and preserves its appearance and layout under any conditions. It also supports a fairly wide range of both printing elements and additional services (for example, password protection of a document against editing or printing). All this makes PDF more of a format for distributing complex, professionally produced documents and even books.