What is dlp. Data Leak Prevention - DLP. Consider an incident response algorithm

Even the most fashionable IT terms should be used appropriately and as correctly as possible. At least in order not to mislead consumers. It has definitely become fashionable to classify yourself as a DLP solution maker. For example, at the recent CeBIT-2008 exhibition, the inscription “DLP solution” could often be seen on the stands of manufacturers of not only little-known antiviruses and proxy servers in the world, but even firewalls. Sometimes there was a feeling that around the next corner you could see some kind of CD ejector (a program that controls the opening of a CD drive) with the proud slogan of a corporate DLP solution. And, oddly enough, each of these manufacturers, as a rule, had a more or less logical explanation for such positioning of their product (naturally, in addition to the desire to get a “gesheft” from a fashionable term).

Before considering the market of DLP system manufacturers and its main players, it is necessary to define what we mean by a DLP system. There have been many attempts to define this class of information systems: ILD&P - Information Leakage Detection & Prevention (“identifying and preventing information leaks”, the term was proposed by IDC in 2007), ILP - Information Leakage Protection (“protection against information leaks”, Forrester , 2006), ALS - Anti-Leakage Software (“anti-leakage software”, E&Y), Content Monitoring and Filtering (CMF, Gartner), Extrusion Prevention System (similar to Intrusion-prevention system).

But the name DLP - Data Loss Prevention (or Data Leak Prevention, protection against data leaks), proposed in 2005, was nevertheless established as a commonly used term. The phrase “systems for protecting confidential data from insider threats”. At the same time, internal threats are understood as abuses (intentional or accidental) by employees of the organization who have legal rights to access the relevant data, their powers.

The most coherent and consistent criteria for belonging to DLP systems were put forward by the research agency Forrester Research during their annual study of this market. They proposed four criteria according to which a system can be classified as a DLP. one.

Multichannel. The system must be able to monitor several possible channels of data leakage. In a network environment, this is at least e-mail, Web and IM (instant messengers), and not just scanning mail traffic or database activity. On the workstation - monitoring of file operations, work with the clipboard, as well as control of e-mail, Web and IM. 2.

unified management. The system must have unified means of managing the information security policy, analysis and reporting of events across all monitoring channels. 3.

Active protection. The system should not only detect violations of the security policy, but also, if necessary, enforce it. For example, block suspicious messages. 4.

Based on these criteria, Forrester selected 12 software vendors for review and evaluation in 2008 (they are listed below in alphabetical order, with the name of the company acquired by this vendor to enter the DLP market in parentheses) :

code green;
InfoWatch;
McAfee (Onigma);
Orchestria;
reconnex;
RSA/EMC (Tablus);
Symantec (Vontu);
Trend Micro (Provilla);
Verdasys;
Vericept;
Websense(PortAuthority);
workshare.

To date, of the above 12 vendors, only InfoWatch and Websense are represented on the Russian market to one degree or another. The rest either do not work in Russia at all, or have just announced their intentions to start selling DLP solutions (Trend Micro).

Considering the functionality of DLP systems, analysts (Forrester, Gartner, IDC) introduce a categorization of protection objects - types of information objects to be monitored. Such categorization allows, as a first approximation, to assess the scope of a particular system. There are three categories of monitoring objects.

1. Data-in-motion (data in motion) - e-mail messages, Internet pagers, peer-to-peer networks, file transfer, Web traffic, and other types of messages that can be transmitted over communication channels. 2. Data-at-rest (stored data) - information on workstations, laptops, file servers, specialized storages, USB devices and other types of data storage devices.

3. Data-in-use (data in use) - information currently being processed.

At the moment, about two dozen domestic and foreign products with some properties of DLP systems are represented on our market. Brief information about them in the spirit of the above classification is listed in Table. 1 and 2. Also in the table. 1 introduced such a parameter as “centralized data storage and audit”, which implies the ability of the system to store data in a single depository (for all monitoring channels) for their further analysis and audit. This functionality has recently become of particular importance, not only due to the requirements of various legislative acts, but also due to popularity among customers (according to the experience of implemented projects). All information contained in these tables is taken from public sources and marketing materials of the respective companies.

Based on the data in tables 1 and 2, we can conclude that today there are only three DLP systems in Russia (from InfoWatch, Perimetrix and WebSence). They also include the recently announced integrated product from Jet Infosystems (SKVT + SMAP), as it will cover several channels and have a unified management of security policies.

It is rather difficult to talk about the market shares of these products in Russia, since most of the mentioned manufacturers do not disclose sales volumes, the number of clients and protected workstations, limiting themselves only to marketing information. We can only say for sure that the main suppliers at the moment are:

Dozor systems, present on the market since 2001;
InfoWatch products sold since 2004;
WebSense CPS (began to be sold in Russia and around the world in 2007);
Perimetrix (a young company whose first version of its products was announced on its website at the end of 2008).

In conclusion, I would like to add that belonging or not to the class of DLP systems does not make products worse or better - it's just a matter of classification and nothing more.

Table 1. Products on the Russian market with certain properties of DLP systems

Company	Product	Product features
Company	Product	Protection of "data in motion" (data-in-motion)	Data-in-use protection	Data-at-rest protection	Centralized storage and auditing
InfoWatch	I.W. Traffic Monitor	Yes	Yes	Not	Yes
InfoWatch	I.W. CryptoStorage	Not	Not	Yes	Not
perimeter	SafeSpace	Yes	Yes	Yes	Yes
Jet Infosystems	Dozor Jet (SKVT)	Yes	Not	Not	Yes
Jet Infosystems	Dozor Jet (SMAP)	Yes	Not	Not	Yes
Smart Line Inc.	DeviceLock	Not	Yes	Not	Yes
SecurIT	Zlock	Not	Yes	Not	Not
	SecrecyKeeper	Not	Yes	Not	Not
SpectorSoft	Spector 360	Yes	Not	Not	Not
Lumension Security	Sanctuary Device Control	Not	Yes	Not	Not
websense	Websense Content Protection	Yes	Yes	Yes	Not
Informzaschita	security studio	Not	Yes	Yes	Not
Primetek	insider	Not	Yes	Not	Not
AtomPark Software	staffcop	Not	Yes	Not	Not
SoftInform	SearchInformServer	Yes	Yes	Not	Not

Table 2. Compliance of products presented on the Russian market with the criteria for belonging to the class of DLP systems

Company	Product	Criterion of belonging to DLP systems
Company	Product	Multichannel	Unified Management	Active protection	Accounting for both content and context
InfoWatch	I.W. Traffic Monitor	Yes	Yes	Yes	Yes
perimeter	SafeSpace	Yes	Yes	Yes	Yes
“Jet Infosystems”	“Dozor Jet” (SKVT)	Not	Not	Yes	Yes
“Jet Infosystems”	"Dozor Jet" (SMAP)	Not	Not	Yes	Yes
Smart Line Inc.	DeviceLock	Not	Not	Not	Not
SecurIT	Zlock	Not	Not	Not	Not
Smart Protection Labs Software	SecrecyKeeper	Yes	Yes	Yes	Not
SpectorSoft	Spector 360	Yes	Yes	Yes	Not
Lumension Security	Sanctuary Device Control	Not	Not	Not	Not
websense	Websense Content Protection	Yes	Yes	Yes	Yes
"Informzaschita"	security studio	Yes	Yes	Yes	Not
Primetek	insider	Yes	Yes	Yes	Not
"AtomPark Software"	staffcop	Yes	Yes	Yes	Not
"SoftInform"	SearchInformServer	Yes	Yes	Not	Not
"Infodefense"	"Infoperimeter"	Yes	Yes	Not	Not

We offer a range of markers to help you get the most out of any DLP system.

DLPsystems: what is it

Recall that DLP-systems (Data Loss / Leak Prevention) allow you to control all channels of the company's network communication (mail, Internet, instant messaging systems, flash drives, printers, etc.). Protection against information leakage is achieved due to the fact that agents are installed on all computers of employees that collect information and transfer it to the server. Sometimes information is collected through a gateway using SPAN technologies. The information is analyzed, after which the system or the security officer makes decisions on the incident.

So, your company has implemented a DLP system. What steps need to be taken to make the system work effectively?

1. Correctly configure security rules

Let's imagine that in a system serving 100 computers, a rule was created "Fix all correspondence with the word" contract "". Such a rule will provoke a huge number of incidents in which the real leak can be lost.

In addition, not every company can afford to maintain an entire staff of employees who track incidents.

To increase the utility rate of rules, you can use tools to create effective rules and track the results of their work. Every DLP system has a feature that allows you to do this.

In general, the methodology involves analyzing the accumulated database of incidents and creating various combinations of rules, which ideally lead to the appearance of 5-6 really urgent incidents per day.

2. Update safety rules at regular intervals

A sharp decrease or increase in the number of incidents is an indicator that the rules need to be adjusted. The reasons may be that the rule has lost its relevance (users have stopped accessing certain files) or employees have learned the rule and no longer perform actions prohibited by the system (DLP - training system). However, practice shows that if one rule is learned, then in a neighboring place the potential risks of leakage have increased.

You should also pay attention to seasonality in the work of the enterprise. During the year, key parameters related to the specifics of the company's work may change. For example, for a wholesale supplier of small equipment, bicycles will be relevant in spring, and snow scooters in autumn.

3. Consider an incident response algorithm

There are several approaches to incident response. When testing and running DLP systems, most often people are not notified about changes. The participants in the incidents are only observed. When a critical mass is accumulated, a representative of the security department or the personnel department communicates with them. In the future, work with users is often left at the mercy of representatives of the security department. There are mini-conflicts, negative accumulates in the team. It can spill out in the deliberate sabotage of employees in relation to the company. It is important to strike a balance between the requirement of discipline and maintaining a healthy atmosphere in the team.

4. Check the operation of the blocking mode

There are two modes of response to an incident in the system - fixing and blocking. If every fact of forwarding a letter or attaching an attached file to a flash drive is blocked, this creates problems for the user. Often, employees attack the system administrator with requests to unlock some of the functions, the management may also be dissatisfied with such settings. As a result, the DLP system and the company receive a negative, the system is discredited and unmasked.

5. Check if a trade secret regime has been introduced

Allows you to make certain information confidential, and also obliges any person who knows about it to bear full legal responsibility for its disclosure. In the event of a serious leakage of information under the trade secret regime operating at the enterprise, the violator can recover the amount of actual and moral damage through the court in accordance with 98-FZ "On Trade Secrets".

We hope that these tips will help reduce the number of unintentional leaks in companies, because DLP systems are designed to successfully deal with them. However, one should not forget about the complex information security system and the fact that intentional information leaks require special attention. There are modern solutions that allow you to complement the functionality of DLP systems and significantly reduce the risk of intentional leaks. For example, one of the developers offers an interesting technology - when confidential files are accessed suspiciously often, the webcam automatically turns on and starts recording. It was this system that made it possible to record how the unlucky kidnapper actively took screenshots using a mobile camera.

Oleg Necheukhin, Information Systems Security Expert, Kontur.Security

28.01.2014 Sergei Korablev

The choice of any enterprise-level product is not a trivial task for technical specialists and decision makers. Choosing a Data Leak Protection (DLP) data loss prevention system is even more difficult. The lack of a unified conceptual system, regular independent comparative studies and the complexity of the products themselves force consumers to order pilot projects from manufacturers and independently conduct numerous tests, determining the range of their own needs and correlating them with the capabilities of the systems being tested

Such an approach is certainly correct. A balanced, and in some cases even hard-won decision simplifies further implementation and avoids disappointment in the operation of a particular product. However, the decision-making process in this case can be delayed, if not for years, then for many months. In addition, the constant expansion of the market, the emergence of new solutions and manufacturers further complicate the task of not only choosing a product for implementation, but also creating a preliminary shortlist of suitable DLP systems. Under such conditions, up-to-date reviews of DLP systems are of undoubted practical value for technical specialists. Should a particular solution be included in the test list, or would it be too complex to implement in a small organization? Can the solution be scaled to a company of 10,000 employees? Can a DLP system control business-critical CAD files? An open comparison will not replace thorough testing, but will help answer basic questions that arise at the initial stage of the DLP selection process.

Members

The most popular (according to the Anti-Malware.ru analytical center as of mid-2013) DLP systems of InfoWatch, McAfee, Symantec, Websense, Zecurion and Jet Infosystem companies in the Russian information security market were selected as participants.

For the analysis, commercially available versions of DLP systems were used at the time of preparation of the review, as well as documentation and open reviews of products.

Criteria for comparing DLP systems were selected based on the needs of companies of various sizes and industries. The main task of DLP systems is to prevent leaks of confidential information through various channels.

Examples of products from these companies are shown in Figures 1-6.

Figure 3 Symantec product

Figure 4. InfoWatch product

Figure 5. Websense product

Figure 6. McAfee product

Operating modes

Two main operating modes of DLP systems are active and passive. Active - usually the main mode of operation, which blocks actions that violate security policies, such as sending sensitive information to an external mailbox. Passive mode is most often used at the stage of system configuration to check and adjust settings when the proportion of false positives is high. In this case, policy violations are recorded, but restrictions on the movement of information are not imposed (Table 1).

In this aspect, all the considered systems turned out to be equivalent. Each of the DLPs can work both in active and passive modes, which gives the customer a certain freedom. Not all companies are ready to start operating DLP immediately in blocking mode - this is fraught with disruption of business processes, dissatisfaction on the part of employees of controlled departments and claims (including justified ones) from management.

Technology

Detection technologies make it possible to classify information that is transmitted via electronic channels and identify confidential information. Today, there are several basic technologies and their varieties, similar in essence, but different in implementation. Each technology has both advantages and disadvantages. In addition, different types of technologies are suitable for analyzing information of different classes. Therefore, manufacturers of DLP solutions try to integrate the maximum number of technologies into their products (see Table 2).

In general, the products provide a large number of technologies that, if properly configured, provide a high percentage of recognition of confidential information. DLP McAfee, Symantec and Websense are rather poorly adapted for the Russian market and cannot offer users support for "language" technologies - morphology, transliteration analysis and masked text.

Controlled channels

Each data transmission channel is a potential channel for leaks. Even one open channel can negate all the efforts of the information security service that controls information flows. That is why it is so important to block the channels that are not used by employees for work, and control the rest with the help of leak prevention systems.

Despite the fact that the best modern DLP systems are capable of monitoring a large number of network channels (see Table 3), it is advisable to block unnecessary channels. For example, if an employee works on a computer only with an internal database, it makes sense to disable his access to the Internet altogether.

Similar conclusions are also valid for local leakage channels. True, in this case it can be more difficult to block individual channels, since ports are often used to connect peripherals, I / O devices, etc.

Encryption plays a special role in preventing leaks through local ports, mobile drives and devices. Encryption tools are quite easy to use, their use can be transparent to the user. But at the same time, encryption allows you to exclude a whole class of leaks associated with unauthorized access to information and the loss of mobile drives.

The situation with the control of local agents is generally worse than with network channels (see Table 4). Only USB devices and local printers are successfully controlled by all products. Also, despite the importance of encryption noted above, such a possibility is present only in certain products, and the forced encryption function based on content analysis is present only in Zecurion DLP.

To prevent leaks, it is important not only to recognize sensitive data during transmission, but also to limit the distribution of information in a corporate environment. To do this, manufacturers include tools in DLP systems that can identify and classify information stored on servers and workstations in the network (see Table 5). Data that violates information security policies must be deleted or moved to secure storage.

To detect confidential information on the corporate network nodes, the same technologies are used as to control leaks through electronic channels. The main difference is architectural. If network traffic or file operations are analyzed to prevent leakage, then stored information - the contents of workstations and network servers - is examined to detect unauthorized copies of confidential data.

Of the considered DLP systems, only InfoWatch and Dozor-Jet ignore the use of means for identifying information storage locations. This is not a critical feature for electronic leak prevention, but it greatly limits the ability of DLP systems to proactively prevent leaks. For example, when a confidential document is located within a corporate network, this is not an information leak. However, if the location of this document is not regulated, if the information owners and security officers do not know about the location of this document, this can lead to a leak. Unauthorized access to information is possible or the appropriate security rules will not be applied to the document.

Ease of management

Characteristics such as ease of use and control can be as important as the technical capabilities of solutions. After all, a really complex product will be difficult to implement, the project will take more time, effort and, accordingly, finances. An already implemented DLP system requires attention from technical specialists. Without proper maintenance, regular auditing and adjustment of settings, the quality of recognition of confidential information will drop dramatically over time.

The control interface in the native language of the security officer is the first step to simplify the work with the DLP system. It will not only make it easier to understand what this or that setting is responsible for, but will also significantly speed up the process of configuring a large number of parameters that need to be configured for the system to work correctly. English can be useful even for Russian-speaking administrators for an unambiguous interpretation of specific technical concepts (see Table 6).

Most solutions provide quite convenient management from a single (for all components) console with a web interface (see Table 7). The exceptions are the Russian InfoWatch (there is no single console) and Zecurion (there is no web interface). At the same time, both manufacturers have already announced the appearance of a web console in their future products. The lack of a single console in InfoWatch is due to the different technological basis of the products. The development of its own agency solution was discontinued for several years, and the current EndPoint Security is the successor to a third-party product, EgoSecure (formerly known as cynapspro), acquired by the company in 2012.

Another point that can be attributed to the disadvantages of the InfoWatch solution is that to configure and manage the flagship DLP product InfoWatch TrafficMonitor, you need to know a special scripting language LUA, which complicates the operation of the system. Nevertheless, for most technical specialists, the prospect of improving their own professional level and learning an additional, albeit not very common, language should be perceived positively.

The separation of system administrator roles is necessary to minimize the risks of preventing the appearance of a superuser with unlimited rights and other machinations using DLP.

Logging and reporting

The DLP archive is a database that accumulates and stores events and objects (files, letters, http requests, etc.) recorded by the system's sensors during its operation. The information collected in the database can be used for various purposes, including for analyzing user actions, for saving copies of critical documents, as a basis for investigating information security incidents. In addition, the base of all events is extremely useful at the stage of implementing a DLP system, since it helps to analyze the behavior of the DLP system components (for example, to find out why certain operations are blocked) and to adjust security settings (see Table 8).

In this case, we see a fundamental architectural difference between Russian and Western DLPs. The latter do not archive at all. In this case, DLP itself becomes easier to maintain (there is no need to maintain, store, backup and study a huge amount of data), but not to operate. After all, the archive of events helps to configure the system. The archive helps to understand why the transmission of information was blocked, to check whether the rule worked correctly, and to make the necessary corrections to the system settings. It should also be noted that DLP systems need not only initial configuration during implementation, but also regular “tuning” during operation. A system that is not properly maintained, not brought up by technical specialists, will lose a lot in the quality of information recognition. As a result, both the number of incidents and the number of false positives will increase.

Reporting is an important part of any activity. Information security is no exception. Reports in DLP systems perform several functions at once. First, concise and understandable reports allow heads of information security services to quickly monitor the state of information security without going into details. Second, detailed reports help security officers adjust security policies and system settings. Thirdly, visual reports can always be shown to top managers of the company to demonstrate the results of the DLP system and the information security specialists themselves (see Table 9).

Almost all competing solutions discussed in the review offer both graphical, convenient for top managers and heads of information security services, and tabular reports, more suitable for technical specialists. Graphical reports are missing only in DLP InfoWatch, for which they were lowered.

Certification

The question of the need for certification for information security tools and DLP in particular is open, and experts often argue on this topic within professional communities. Summarizing the opinions of the parties, it should be recognized that certification itself does not provide serious competitive advantages. At the same time, there are a number of customers, primarily government organizations, for which the presence of a particular certificate is mandatory.

In addition, the existing certification procedure does not correlate well with the software development cycle. As a result, consumers are faced with a choice: to buy an already outdated, but certified version of the product or an up-to-date, but not certified version. The standard way out in this situation is to purchase a certified product "on the shelf" and use the new product in a real environment (see Table 10).

Comparison results

Let's summarize the impressions of the considered DLP solutions. In general, all participants made a favorable impression and can be used to prevent information leaks. Differences in products allow you to specify the scope of their application.

The InfoWatch DLP system can be recommended to organizations for which it is fundamentally important to have a FSTEC certificate. However, the latest certified version of InfoWatch Traffic Monitor was tested at the end of 2010, and the certificate expires at the end of 2013. Agent-based solutions based on InfoWatch EndPoint Security (also known as EgoSecure) are more suitable for small businesses and can be used separately from Traffic Monitor. The combined use of Traffic Monitor and EndPoint Security can cause scaling issues in large companies.

Products of Western manufacturers (McAfee, Symantec, Websense), according to independent analytical agencies, are much less popular than Russian ones. The reason is the low level of localization. And it's not even the complexity of the interface or the lack of documentation in Russian. Features of confidential information recognition technologies, pre-configured templates and rules are "sharpened" for the use of DLP in Western countries and are aimed at fulfilling Western regulatory requirements. As a result, the quality of information recognition in Russia turns out to be noticeably worse, and compliance with the requirements of foreign standards is often irrelevant. At the same time, the products themselves are not bad at all, but the specifics of using DLP systems on the Russian market are unlikely to allow them to become more popular than domestic developments in the foreseeable future.

Zecurion DLP is notable for good scalability (the only Russian DLP system with confirmed implementation for more than 10,000 workplaces) and high technological maturity. What is surprising, however, is the lack of a web console that would help simplify the management of an enterprise solution aimed at various market segments. Zecurion DLP's strengths include high-quality confidential information recognition and a full line of leak prevention products, including protection at the gateway, workstations and servers, location detection and data encryption tools.

The Dozor-Jet DLP system, one of the pioneers of the domestic DLP market, is widely distributed among Russian companies and continues to grow its client base due to extensive connections of the Jet Infosystems system integrator, part-time and DLP developer. Although technologically DLP is somewhat behind its more powerful counterparts, its use can be justified in many companies. In addition, unlike foreign solutions, Dozor Jet allows you to archive all events and files.

Leakage channels leading to information being taken outside the company's information system can be network leaks (for example, e-mail or ICQ), local leaks (use of external USB drives), stored data (databases). Separately, you can highlight the loss of media (flash memory, laptop). A system can be attributed to the DLP class if it meets the following criteria: multi-channel (monitoring of several possible channels of data leakage); unified management (unified management tools for all monitoring channels); active protection (compliance with security policy); considering both content and context.

The competitive advantage of most systems is the analysis module. Manufacturers emphasize this module so much that they often name their products after it, for example, “Label-Based DLP Solution”. Therefore, the user often chooses solutions not based on performance, scalability, or other criteria traditional for the corporate information security market, but on the basis of the type of document analysis used.

Obviously, since each method has its advantages and disadvantages, the use of only one document analysis method makes the solution technologically dependent on it. Most manufacturers use several methods, although one of them is usually the "flagship" method. This article is an attempt to classify the methods used in the analysis of documents. Their strengths and weaknesses are assessed based on the experience of practical application of several types of products. The article fundamentally does not consider specific products, because. the main task of the user when choosing them is to eliminate marketing slogans like “we will protect everything from everything”, “unique patented technology” and realize what he will be left with when the sellers leave.

Container Analysis

This method analyzes the properties of a file or other container (archive, cryptodisk, etc.) that contains information. The colloquial name for such methods is “solutions on labels”, which quite fully reflects their essence. Each container contains a label that uniquely identifies the type of content contained within the container. The mentioned methods practically do not require computing resources to analyze the information being moved, since the label fully describes the user's rights to move content along any route. In a simplified form, such an algorithm sounds like this: “there is a label - we forbid it, there is no label - we skip it.”

The advantages of this approach are obvious: the speed of analysis and the complete absence of errors of the second kind (when the system mistakenly detects an open document as confidential). Such methods are called "deterministic" in some sources.

The disadvantages are also obvious - the system only cares about tagged information: if the tag is not set, the content is not protected. It is necessary to develop a procedure for labeling new and incoming documents, as well as a system to counter the transfer of information from a labeled container to an unlabeled one through buffer operations, file operations, copying information from temporary files, etc.

The weakness of such systems is also manifested in the organization of labeling. If they are placed by the author of the document, then by malicious intent he has the opportunity not to mark the information that he is going to steal. In the absence of malicious intent, sooner or later negligence or carelessness will appear. If you oblige to label a certain employee, for example, an information security officer or a system administrator, then he will not always be able to distinguish confidential content from open content, since he does not thoroughly know all the processes in the company. So, the "white" balance should be posted on the company's website, and the "gray" or "black" balance cannot be taken out of the information system. But only the chief accountant can distinguish one from the other, i.e. one of the authors.

Labels are usually divided into attribute, format and external. As the name suggests, the former are placed in file attributes, the latter are placed in the fields of the file itself, and the third are attached to (associated with) the file by external programs.

Container structures in IB

Sometimes the advantages of solutions based on tags are also low performance requirements for interceptors, because they only check tags, i.e. act like turnstiles in the subway: "if you have a ticket - go through." However, do not forget that miracles do not happen - in this case, the computational load is shifted to workstations.

The place of decisions on labels, whatever they may be, is the protection of document storages. When a company has a document storage, which, on the one hand, is replenished quite rarely, and on the other hand, the category and confidentiality level of each document are precisely known, then it is easiest to organize its protection using labels. You can organize the placement of labels on documents entering the repository using an organizational procedure. For example, before sending a document to the repository, the employee responsible for its functioning can contact the author and specialist with the question of what level of confidentiality to set for the document. This task is especially successfully solved with the help of format marks, i.e. each incoming document is stored in a secure format and then issued at the request of the employee, indicating it as allowed to read. Modern solutions allow you to assign access rights for a limited time, and after the expiration of the key, the document simply stops being read. It is according to this scheme that, for example, the issuance of documentation for public procurement tenders in the United States is organized: the procurement management system generates a document that can be read without the possibility of changing or copying the contents only of the tender participants listed in this document. The access key is valid only until the deadline for submitting documents to the competition, after which the document ceases to be read.

Also, with the help of solutions based on tags, companies organize document circulation in closed segments of the network in which intellectual property and state secrets are circulating. Probably, now, according to the requirements of the Federal Law “On Personal Data”, the document flow in the personnel departments of large companies will also be organized.

Content analysis

When implementing the technologies described in this section, unlike those described earlier, on the contrary, it is completely indifferent in which container the content is stored. The purpose of these technologies is to extract meaningful content from a container or intercept a transmission over a communication channel and analyze the information for prohibited content.

The main technologies in detecting prohibited content in containers are signature control, hash-based control, and linguistic methods.

Signatures

The simplest control method is to search the data stream for some sequence of characters. Sometimes a forbidden sequence of characters is called a "stop word", but in a more general case it can be represented not by a word, but by an arbitrary set of characters, for example, by the same label. In general, this method can not be attributed to content analysis in all its implementations. For example, in most devices of the UTM class, the search for forbidden signatures in the data stream occurs without extracting the text from the container, when analyzing the stream "as is". Or, if the system is configured for only one word, then the result of its work is the determination of a 100% match, i.e. method can be classified as deterministic.

However, more often the search for a specific sequence of characters is still used in text analysis. In the vast majority of cases, signature systems are configured to search for several words and the frequency of occurrence of terms, i.e. we will still refer this system to content analysis systems.

The advantages of this method include independence from the language and ease of replenishing the dictionary of prohibited terms: if you want to use this method to search for a word in Pashto in the data stream, you do not need to know this language, you just need to know how it is written. It is also easy to add, for example, transliterated Russian text or "Albanian" language, which is important, for example, when analyzing SMS texts, ICQ messages or blog posts.

The disadvantages become apparent when using a non-English language. Unfortunately, most manufacturers of text analysis systems work for the American market, and the English language is very "signature" - word forms are most often formed using prepositions without changing the word itself. In Russian, everything is much more complicated. Take, for example, the word "secret" dear to the heart of an information security officer. In English, it means both the noun "secret", the adjective "secret", and the verb "to keep secret". In Russian, several dozen different words can be formed from the root “secret”. Those. if in an English-speaking organization it is enough for an information security officer to enter one word, in a Russian-speaking organization they will have to enter a couple of dozen words and then change them in six different encodings.

In addition, such methods are unstable to primitive coding. Almost all of them give in to the favorite trick of novice spammers - replacing characters with similar ones. The author repeatedly demonstrated to security officers an elementary trick - the passage of confidential text through signature filters. A text containing, for example, the phrase "top secret" is taken, and a mail interceptor configured for this phrase. If the text is opened in MS Word, then a two-second operation: Ctrl + F, "find "o" (Russian layout)", "replace with "o" (English layout)", "replace all", "send document" - makes the document absolutely invisible to this filter. It is all the more disappointing that such a replacement is carried out by regular means of MS Word or any other text editor, i.e. they are available to the user, even if he does not have local administrator rights and the ability to run encryption programs.

Most often, signature-based flow control is included in the functionality of UTM devices, i.e. solutions that clean traffic from viruses, spam, intrusions and any other threats that are detected by signatures. Since this feature is "free", users often feel that this is enough. Such solutions really protect against accidental leaks, i.e. in cases where the outgoing text is not changed by the sender in order to bypass the filter, but they are powerless against malicious users.

masks

An extension of the functionality of the search for stopword signatures is the search for their masks. It is a search for such content that cannot be accurately specified in the base of "stop words", but its element or structure can be specified. Such information should include any codes characterizing a person or enterprise: TIN, account numbers, documents, etc. It is impossible to search for them using signatures.

It is unreasonable to set the number of a specific bank card as a search object, but you want to find any credit card number, no matter how it is written - with spaces or together. This is not just a desire, but a requirement of the PCI DSS standard: it is forbidden to send unencrypted plastic card numbers by e-mail, i.e. it is the user's responsibility to find such numbers in the e-mail and discard prohibited messages.

Here, for example, is a mask that specifies a stop word such as the name of a confidential or secret order, the number of which starts from zero. The mask takes into account not only an arbitrary number, but also any case and even the substitution of Russian letters for Latin ones. The mask is written in the standard "REGEXP" notation, although different DLP systems may have their own, more flexible notation. The situation is even worse with phone numbers. This information is classified as personal data, and you can write it in a dozen ways - using different combinations of spaces, different types of brackets, plus and minus, etc. Here, perhaps, a single mask is indispensable. For example, in anti-spam systems, where a similar task has to be solved, several dozen masks are used simultaneously to detect a telephone number.

Many different codes inscribed in the activities of companies and their employees are protected by many laws and represent commercial secrets, banking secrets, personal data and other legally protected information, so the problem of detecting them in traffic is a prerequisite for any solution.

Hash functions

Various types of hash functions for samples of confidential documents were at one time considered a new word in the leak protection market, although the technology itself has been around since the 1970s. In the West, this method is sometimes called "digital fingerprints", i.e. "digital fingerprints", or "shindles" in scientific slang.

The essence of all methods is the same, although specific algorithms for each manufacturer may differ significantly. Some algorithms are even patented, which confirms the uniqueness of the implementation. The general scenario of action is as follows: a database of samples of confidential documents is being collected. An “imprint” is taken from each of them, i.e. meaningful content is extracted from the document, which is reduced to some normal, for example (but not necessarily) text form, then the hashes of all content and its parts, such as paragraphs, sentences, fives of words, etc., are removed, the detail depends on the specific implementation. These fingerprints are stored in a special database.

The intercepted document is cleared of service information in the same way and brought to a normal form, then fingerprints-shindles are removed from it using the same algorithm. Received prints are searched in the database of prints of confidential documents, and if found, the document is considered confidential. Since this method is used to find direct quotes from a sample document, the technology is sometimes called "anti-plagiarism".

Most of the advantages of this method are also its disadvantages. First of all, this is the requirement to use sample documents. On the one hand, the user does not have to worry about stop words, significant terms and other information that is completely non-specific for security officers. On the other hand, "no pattern, no protection" poses the same problems with new and incoming documents as with label-based technologies. A very important advantage of this technology is its focus on working with arbitrary character sequences. From this follows, first of all, independence from the language of the text - even hieroglyphs, even Pashto. Further, one of the main consequences of this property is the possibility of taking fingerprints from non-textual information - databases, drawings, media files. It is these technologies that Hollywood studios and world recording studios use to protect media content in their digital storages.

Unfortunately, low-level hash functions are not robust to the primitive encoding discussed in the signature example. They easily cope with changing the order of words, rearranging paragraphs and other tricks of "plagiarists", but, for example, changing letters throughout the document destroys the hash pattern and such a document becomes invisible to the interceptor.

Using only this method complicates the work with forms. So, an empty loan application form is a freely distributed document, and a completed one is confidential, since it contains personal data. If you simply take a fingerprint from an empty form, then the intercepted completed document will contain all the information from the empty form, i.e. The prints will largely match. Thus, the system will either let confidential information through or prevent empty forms from being freely distributed.

Despite the shortcomings mentioned, this method is widely used, especially in a business that cannot afford qualified employees, but operates on the principle of "put all confidential information in this folder and sleep well." In this sense, requiring specific documents to protect them is somewhat similar to solutions based on labels, only stored separately from samples and preserved when changing the file format, copying part of the file, etc. However, a large business that has hundreds of thousands of documents in circulation is often simply not able to provide samples of confidential documents, because. the company's business processes do not require it. The only thing that is (or, more honestly, should be) in every enterprise is the "List of information constituting a trade secret." Making patterns out of it is not a trivial task.

The ease of adding samples to a controlled content database often plays tricks on users. This leads to a gradual increase in the fingerprint base, which significantly affects the performance of the system: the more samples, the more comparisons of each intercepted message. Since each print takes up from 5 to 20% of the original, the base of prints gradually grows. Users notice a sharp drop in performance when the database begins to exceed the filtering server's RAM. Usually the problem is solved by regularly auditing sample documents and removing outdated or duplicate samples, i.e. saving on implementation, users lose on operation.

Linguistic Methods

The most common method of analysis today is the linguistic analysis of the text. It is so popular that it is often referred to colloquially as "content filtering". carries the characteristics of the entire class of content analysis methods. From the point of view of classification, both hash analysis, and signature analysis, and mask analysis are “content filtering”, i.e. traffic filtering based on content analysis.

As the name implies, the method works only with texts. You will not use it to protect a database consisting only of numbers and dates, especially drawings, drawings and a collection of favorite songs. But with texts, this method works wonders.

Linguistics as a science consists of many disciplines - from morphology to semantics. Therefore, linguistic methods of analysis also differ from each other. There are methods that use only stop words, only entered at the root level, and the system itself already compiles a complete dictionary; there are terms based on the distribution of weights encountered in the text. There are linguistic methods and their imprints based on statistics; for example, a document is taken, the fifty most used words are counted, then the 10 most used words in each paragraph are selected. Such a "dictionary" is an almost unique characteristic of the text and allows you to find meaningful quotes in "clones".

Analysis of all the subtleties of linguistic analysis is not within the scope of this article, so we will focus on the advantages and disadvantages.

The advantage of the method is complete insensitivity to the number of documents, i.e. rare for corporate information security scalability. The content filtering base (a set of key vocabulary classes and rules) does not change in size depending on the appearance of new documents or processes in the company.

In addition, users note in this method the similarity with "stop words" in that if the document is delayed, then it is immediately clear why this happened. If a fingerprint-based system reports that a document is similar to another, then the security officer will have to compare the two documents himself, and in linguistic analysis he will receive already marked up content. Linguistic systems along with signature filtering are so common because they allow you to start working without changes in the company immediately after installation. There is no need to bother with tagging and fingerprinting, inventorying documents and doing other non-specific work for a security officer.

The disadvantages are just as obvious, and the first one is language dependence. In each country whose language is supported by the manufacturer, this is not a disadvantage, but from the point of view of global companies that have, in addition to a single language of corporate communication (for example, English), many more documents in local languages in each country, this is a clear disadvantage.

Another drawback is a high percentage of type II errors, which requires a qualification in the field of linguistics (to fine-tune the filtering base) to reduce it. Standard industry databases typically give 80-85% filtering accuracy. This means that every fifth or sixth letter is intercepted by mistake. Setting the base to an acceptable 95-97% accuracy is usually associated with the intervention of a specially trained linguist. And although to learn how to adjust the filtering base, it is enough to have two days of free time and speak the language at the level of a high school graduate, there is no one to do this work, except for a security officer, and he usually considers such work to be non-core. Attracting a person from outside is always risky - after all, he will have to work with confidential information. The way out of this situation is usually to buy an additional module - a self-learning "autolinguist", which is "fed" with false positives, and it automatically adapts the standard industry base.

Linguistic methods are chosen when they want to minimize interference in the business, when the information security service does not have the administrative resource to change the existing processes for creating and storing documents. They work always and everywhere, albeit with the disadvantages mentioned.

Popular channels of accidental leaks mobile storage media

InfoWatch analysts believe that mobile media (laptops, flash drives, mobile communicators, etc.) remain the most popular channel for accidental leaks, since users of such devices often neglect data encryption tools.

Another common cause of accidental leaks is paper media: it is more difficult to control than electronic media, since, for example, after a sheet has left the printer, it can only be monitored “manually”: control over paper media is weaker than control over computer information. Many leak protection tools (you can’t call them full-fledged DLP systems) do not control the output channel of information to the printer, so confidential data easily leaks out of the organization.

This problem can be solved by multifunctional DLP systems that block the sending of unauthorized information for printing and check the correspondence of the postal address and the addressee.

In addition, the growing popularity of mobile devices makes it much more difficult to protect against leaks, because there are no corresponding DLP clients yet. In addition, it is very difficult to detect a leak in the case of cryptography or steganography. An insider, in order to bypass some kind of filter, can always turn to the Internet for “best practices”. That is, DLP-means protect quite poorly from an organized intentional leak.

The effectiveness of DLP tools can be hampered by their obvious flaws: modern leak protection solutions do not allow you to control and block all available information channels. DLP systems will monitor corporate email, web usage, instant messaging, external media, document printing, and hard drive content. But Skype remains out of control for DLP systems. Only Trend Micro has managed to declare that it can control the operation of this communication program. The remaining developers promise that the corresponding functionality will be provided in the next version of their security software.

But if Skype promises to open its protocols to DLP developers, other solutions, such as Microsoft Collaboration Tools for organizing collaboration, remain closed to third-party programmers. How to control the transmission of information over this channel? Meanwhile, in the modern world, the practice is developing when specialists remotely unite into teams to work on a common project and disintegrate after its completion.

The main sources of leaks of confidential information in the first half of 2010 are still commercial (73.8%) and government (16%) organizations. About 8% of leaks come from educational institutions. The nature of the leaking confidential information is personal data (almost 90% of all information leaks).

The leaders in leaks in the world are traditionally the United States and Great Britain (also Canada, Russia and Germany with significantly lower figures are in the top five countries by the largest number of leaks), which is due to the peculiarity of the legislation of these countries, which requires reporting all incidents of leakage of confidential data. Analysts at Infowatch predict a decrease in the share of accidental leaks and an increase in the share of intentional leaks next year.

Implementation difficulties

In addition to the obvious difficulties, the implementation of DLP is also hampered by the difficulty of choosing the right solution, since various vendors of DLP systems profess their own approaches to the organization of protection. Some have patented algorithms for analyzing content by keywords, while others offer a method of digital fingerprints. How to choose the best product in these conditions? What is more efficient? It is very difficult to answer these questions, since there are very few implementations of DLP systems today, and there are even fewer real practices for their use (which one could rely on). But those projects that were nevertheless implemented showed that consulting accounts for more than half of the scope of work and budget, and this usually causes great skepticism among management. In addition, as a rule, existing business processes of the enterprise have to be restructured to meet the requirements of DLP, and companies are having difficulty doing this.

To what extent does the introduction of DLP help to comply with the current requirements of regulators? In the West, the introduction of DLP systems is motivated by laws, standards, industry requirements and other regulations. According to experts, clear legal requirements available abroad, guidelines for ensuring requirements are the real engine of the DLP market, since the introduction of special solutions eliminates claims from regulators. We have a completely different situation in this area, and the introduction of DLP systems does not help to comply with the law.

Some incentive for the introduction and use of DLP in a corporate environment may be the need to protect the trade secrets of companies and comply with the requirements of the federal law "On Trade Secrets".

Almost every enterprise has adopted such documents as the “Regulations on trade secrets” and “List of information constituting a trade secret”, and their requirements should be followed. There is an opinion that the law "On Trade Secrets" (98-FZ) does not work, however, company executives are well aware that it is important and necessary for them to protect their trade secrets. Moreover, this awareness is much higher than the understanding of the importance of the law “On Personal Data” (152-FZ), and it is much easier for any manager to explain the need to introduce confidential document management than to talk about the protection of personal data.

What prevents the use of DLP in the process of automating the protection of trade secrets? According to the Civil Code of the Russian Federation, in order to introduce a trade secret protection regime, it is only necessary that the information has some value and be included in the appropriate list. In this case, the owner of such information is required by law to take measures to protect confidential information.

At the same time, it is obvious that DLP will not be able to solve all issues. In particular, cover access to confidential information to third parties. But there are other technologies for this. Many modern DLP solutions can integrate with them. Then, when building this technological chain, a working system for protecting trade secrets can be obtained. Such a system will be more understandable for the business, and it is the business that will be able to act as the customer of the leak protection system.

Russia and the West

According to analysts, Russia has a different attitude towards security and a different level of maturity of companies supplying DLP solutions. The Russian market focuses on security specialists and highly specialized problems. Data breach prevention people don't always understand what data is valuable. In Russia, a "militarist" approach to the organization of security systems: a strong perimeter with firewalls and every effort is made to prevent penetration inside.

But what if a company employee has access to an amount of information that is not required to perform his duties? On the other hand, if we look at the approach that has been formed in the West in the last 10-15 years, we can say that more attention is paid to the value of information. Resources are directed to where valuable information is located, and not to all the information in a row. Perhaps this is the biggest cultural difference between the West and Russia. However, analysts say the situation is changing. Information is beginning to be perceived as a business asset, and it will take some time to evolve.

There is no comprehensive solution

100% leak protection has not yet been developed by any manufacturer. The problems with using DLP products are formulated by some experts as follows: effective use of the experience of dealing with leaks used in DLP systems requires the understanding that a lot of work on providing leak protection must be done on the customer's side, since no one knows better than him information flows.

Others believe that it is impossible to protect against leaks: it is impossible to prevent information leakage. Since the information is of value to someone, it will be received sooner or later. Software tools can make obtaining this information more costly and time consuming. This can significantly reduce the benefit of possessing information, its relevance. This means that the efficiency of DLP systems should be monitored.

Today, the DLP systems market is one of the fastest growing among all information security tools. However, the domestic information security sphere has not quite kept pace with global trends, and therefore the DLP systems market in our country has its own characteristics.

What is DLP and how do they work?

Before talking about the market of DLP-systems, it is necessary to decide what, in fact, is meant when it comes to such solutions. DLP systems are commonly understood as software products that protect organizations from leaks of confidential information. The abbreviation DLP itself stands for Data Leak Prevention, that is, the prevention of data leaks.

Such systems create a secure digital "perimeter" around the organization, analyzing all outgoing, and in some cases, incoming information. Controlled information should be not only Internet traffic, but also a number of other information flows: documents that are taken outside the protected security loop on external media, printed on a printer, sent to mobile media via Bluetooth, etc.

Since the DLP system must prevent leakage of confidential information, it necessarily has built-in mechanisms for determining the degree of confidentiality of a document detected in intercepted traffic. As a rule, two methods are most common: by parsing special document markers and by parsing the contents of the document. Currently, the second option is more common because it is resistant to modifications made to the document before it is sent, and also allows you to easily expand the number of confidential documents that the system can work with.

"Side" DLP tasks

In addition to their main task related to the prevention of information leaks, DLP systems are also well suited for solving a number of other tasks related to the control of personnel actions.

Most often, DLP systems are used to solve the following non-core tasks for themselves:

monitoring the use of working time and working resources by employees;
monitoring the communication of employees in order to identify "undercover" struggles that can harm the organization;
control over the legitimacy of employees' actions (prevention of printing counterfeit documents, etc.);
identification of employees who send out resumes for the rapid search for specialists for a vacant position.

Due to the fact that many organizations consider a number of these tasks (especially control of the use of working time) to be a higher priority than protection against information leaks, a number of programs have arisen that are designed specifically for this, but in some cases can also work as a means of protecting the organization from leaks. . What distinguishes such programs from full-fledged DLP systems is the lack of advanced tools for analyzing intercepted data, which must be done manually by an information security specialist, which is convenient only for very small organizations (up to ten controlled employees).