EPOQUE-BNS:
From on-line search to document delivery

During the last decade the world's major patent offices have been deeply involved in ambitious programmes to automate their activities. They spend an average of ten to fifteen percent of their annual budget on this automation, a surprisingly low ratio given that the principal activity of a patent office is to process information.
A primary target for automation is the field of patent documentation and prior art searching. It is widely acknowledged that traditional search methods based on classified paper collections have reached the limits of their capacity. Approximately one million patent documents are added to the existing collection each year. This inevitably results in longer search times and higher costs, since examiners have more and more information and documents to contend with. Subdividing the patent collection into even smaller groups by means of a more refined classification system is only a temporary solution. The EPO's internal classification system already contains 160 000 sub-divisions.
Moreover, an increasing number of "cross-disciplinary" applications extend over several technical fields at once. Electronic games are an example of this trend. Inventions of this kind mean that more groups of documents have to be consulted, and make classification a much more haphazard business than in the past.
Finally, applicants now have sophisticated search tools at their disposal in a wide range of on-line systems which they use with great skill. They thus have the ability and the technical wherewithal to make an accurate assessment of the state of the art before filing their applications. The credibility of the patent offices would be directly called into question if their attachment to traditional methods rendered them unable to obtain results as good as, if not better than, those achieved by the applicants themselves.
Characteristics of patent literature. Patent documents include large numbers of figures, diagrams or drawings which are essential to the understanding of their content. The associated text is frequently written in a highly esoteric technical language which makes it difficult to read. Both these aspects have to be taken into account when designing an automated search system. A search system based solely on full-text searching will be of very limited use. Text has to be supplemented by graphics. The user must be able to switch between text and images easily and intuitively. Navigating within the document, moving from one figure to the text passage describing it, and vice versa, jumping to another part of the text referring to the same figure, displaying the next figure, etc. are mandatory features of any electronic search tool.
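The link between a figure and the text passages describing it can be pictured as a small cross-reference index. The following is a minimal sketch in Python, not a description of how any EPO tool works: the function names, the sample paragraphs and the pattern used to spot figure references are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed pattern: matches "Fig. 3", "Figure 3", "FIG 3" and similar forms.
# Plural forms such as "Figs. 1 and 2" are deliberately not handled here.
FIGURE_REF = re.compile(r"\bfig(?:ure|\.)?\s*(\d+)", re.IGNORECASE)


def build_figure_index(paragraphs):
    """Map each figure number to the positions of the paragraphs citing it."""
    index = defaultdict(list)
    for position, text in enumerate(paragraphs):
        # A set avoids listing a paragraph twice if it cites a figure twice.
        for number in sorted({int(n) for n in FIGURE_REF.findall(text)}):
            index[number].append(position)
    return dict(index)


def next_passage(index, figure, current):
    """Return the next paragraph after `current` citing `figure`, or None."""
    for position in index.get(figure, []):
        if position > current:
            return position
    return None


# Invented sample text, for illustration only.
paragraphs = [
    "The housing shown in Fig. 1 encloses the drive unit.",
    "As illustrated in Figure 2, the lever engages the cam.",
    "Returning to FIG. 1, the cover is fixed by screws.",
]
figure_index = build_figure_index(paragraphs)
print(figure_index)                      # {1: [0, 2], 2: [1]}
print(next_passage(figure_index, 1, 0))  # 2
```

With such an index, "jump to the next passage referring to the same figure" becomes a simple look-up, which is the kind of non-sequential movement between text and drawings that the paragraph above calls for.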
Electronic searching also has to be interactive, and, taking paper searches as a model, to follow an iterative rather than a sequential logic. Automated systems based on a sequential consultation of a large number of documents do not and cannot meet the needs of examiners.
Finally, it has to be remembered that search methods vary in subtle and significant ways from one technical field to another. This is already the case with traditional tools, and it is all the more so with their electronic equivalents. The user-friendliness and performance of automated information systems will therefore depend to some extent on the technical field involved.
Creating the EPO's electronic documentation. With these considerations in mind, the first stage in the automation process consists of capturing the data and storing it in electronic form - creating very large databases. In co-operation with the member states and with its trilateral partners in the USA and Japan, the European Patent Office has invested heavily in this area. We now have the following resources at our disposal:
The information stored in this way is available to all users via:
Non-patent literature is searched by accessing:

During 1997 the EPO intends to approach other publishers with a view to enlarging its existing in-house collection to include these publishers' full-text and facsimile data. The EPO's aim is a simple one - to build up an electronic information system equal in size to, or more extensive than, traditional paper documentation.

Retrieving documents. The second stage in the automation process involves the development of systems capable of providing on-line access to all this stored data. Here, the first task is to sift through the vast stock of data and pinpoint the relatively small number of documents - a kind of "virtual" search group, totalling between 100 and 200 documents - which are likely to be relevant to the subject-matter for which prior art is being sought. This material then has to be analysed quickly to eliminate the useless or redundant items. Finally, the examiner has to scrutinise the remaining 10 to 20 documents that may eventually be cited in the final search report.
The EPOQUE - or EPO QUEry - system covers the first two phases of the search process, ie the creation of virtual groups and analysis of retrieved documents. BNS - the BACON Numerical Service - covers the third phase, ie detailed study and identification of the most relevant documents.
EPOQUE was developed by the Questel-Bertelsmann consortium according to specifications laid down by the Office. Using a powerful standard query language, it allows the examiner to consult any record, document or part of a document stored in the Office's internal databases. Its advanced interface offers maximum flexibility in searching - and also enables the user to access the databases of the major commercial hosts using the same query language. This means that the world's entire stock of scientific and technical knowledge is accessible from a single workstation in each examiner's office. Using appropriate search strategies, involving bibliographic information, classification codes, keywords, free text words, indexing codes, etc., it is therefore possible to make an initial selection of documents for further study. EPOQUE RETRIEVAL provides this search tool.
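The idea of combining classification codes and keywords to form an initial selection can be sketched as follows. This is an illustration in Python only and says nothing about the actual EPOQUE query language: the `Record` structure, the `select` function, the document numbers and the keyword sets are all hypothetical. The classification symbols are used purely as sample values.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical bibliographic record; the field names are illustrative."""
    number: str
    classes: set = field(default_factory=set)
    keywords: set = field(default_factory=set)


def select(records, any_class=(), all_keywords=(), none_keywords=()):
    """Return the numbers of the records that satisfy every criterion given."""
    hits = []
    for record in records:
        if any_class and not record.classes & set(any_class):
            continue  # no classification code in common (OR over classes)
        if not set(all_keywords) <= record.keywords:
            continue  # a required keyword is missing (AND over keywords)
        if record.keywords & set(none_keywords):
            continue  # an excluded keyword is present (NOT)
        hits.append(record.number)
    return hits


# Invented sample records, for illustration only.
records = [
    Record("DOC-1", {"A63F"}, {"game", "display", "controller"}),
    Record("DOC-2", {"G06F"}, {"game", "network"}),
    Record("DOC-3", {"A63F", "G06F"}, {"game", "display", "network"}),
    Record("DOC-4", {"H04N"}, {"display", "television"}),
]

virtual_group = select(
    records,
    any_class={"A63F", "G06F"},
    all_keywords={"game", "display"},
    none_keywords={"controller"},
)
print(virtual_group)  # ['DOC-3']
```

Allowing several classification codes at once reflects the point made earlier about cross-disciplinary applications: a document classified in either field can enter the selection, and the keywords then narrow it down.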

Navigating and positioning within a document. EPOQUE goes much further than this, however. Having marshalled a set of documents, the examiner has to be able to link them up on-screen and consult them in a fast, non-sequential manner. This involves navigating from one document or portion of a document to another, moving to and fro between text and drawings, or between particular passages or paragraphs, while any graphics are displayed simultaneously. The user can highlight strings and search passages for a specific word or expression, making it easier to position the display at the relevant parts of the document. The interaction between examiner and system is total, and the content is continually enhanced as the user structures the documents and marks individual sections - drawings, tables, examples or whole paragraphs - for further reference. These aspects of the system's functionality, offering better visualisation, navigation and positioning in a document, make it quicker and easier to pick out and analyse a batch of relevant documents. EPOQUE VIEWER provides this display tool.
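Searching passages for an expression and highlighting it can be pictured with a few lines of Python. This is a minimal sketch under stated assumptions, not the viewer's implementation: the `locate` function, the `**` marker and the sample passages are invented for the purpose.

```python
import re


def locate(passages, term, marker="**"):
    """Return (position, highlighted text) for each passage containing `term`.

    Matching is case-insensitive and works on plain substrings, so "cam"
    would also match inside "camera"; a real tool would need word handling.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for position, text in enumerate(passages):
        if pattern.search(text):
            highlighted = pattern.sub(
                lambda match: f"{marker}{match.group(0)}{marker}", text
            )
            hits.append((position, highlighted))
    return hits


# Invented sample passages, for illustration only.
passages = [
    "A cam drives the lever.",
    "The spring returns the lever.",
    "The Cam is made of steel.",
]
for position, text in locate(passages, "cam"):
    print(position, text)
# 0 A **cam** drives the lever.
# 2 The **Cam** is made of steel.
```

The positions returned tell the display where to jump, while the markers show the examiner why each passage was selected.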

Accessing the full document. BNS was developed by the Storagetek-Infotel consortium and Bertelsmann. A huge document server, it gives near-line access to the entire numerical collection of 25 million facsimile documents. This collection ideally complements the character-coded collection used by the EPOQUE VIEWER, since it covers more than the PCT minimum collection and stores documents published since 1920. The examiner can now navigate through "predefined" sub-parts, such as claims, description, figures, etc. Additionally, BNS is used to print documents for updating the classified paper collections (which are still kept) and for the search reports sent to the applicant and other patent offices.

Learning curve and system use. EPOQUE I came into service in 1990 and since then has been progressively improved to include, among other things, access to first-page images. EPOQUE VIEWER - as described above - came on-line in early 1995, and BNS followed in early 1996. This "suite" of electronic search tools constitutes one of the most powerful automated documentation systems in the world.
The massive response to the EPOQUE-BNS system is an indication of the success of the EPO's automation plan and its acceptance by users.

As it stands at present, EPOQUE-BNS is the only system in the world offering a paperless search that includes the PCT minimum documentation and hence complies with the quality standards of the EPO's search guidelines. Other search systems do exist - in other patent offices and in the commercial sector - but these are limited either in the number of users, or in the coverage their databases provide (full-text, facsimile images or specific countries).
Seen in terms of the number of users able to access the system at any given time and the enormous amount of data available, the real benefit of the EPO's electronic search tool lies in its price. The cost of development, maintenance and upgrading of EPOQUE-BNS since 1988 totals DEM 40 million, most of which has already been depreciated. The combined running cost of EPOQUE and BNS is DEM 20 million a year. DEM 25 million a year is saved through savings in manpower, reduced access to external commercial databases and increased search productivity.
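The figures quoted above can be set against each other in a short calculation. The sketch below rests on an assumption the text does not confirm: that the DEM 25 million of yearly savings is a gross figure from which the DEM 20 million running cost still has to be deducted. It also ignores depreciation and discounting, so the result is a rough orientation rather than an accounting statement. The function names are illustrative.

```python
# Figures quoted in the text, in millions of Deutsche Mark (DEM).
DEVELOPMENT_COST = 40  # development, maintenance and upgrading since 1988
RUNNING_COST = 20      # combined yearly running cost of EPOQUE and BNS
YEARLY_SAVINGS = 25    # yearly savings (ASSUMED to be gross of running cost)


def net_yearly_benefit(savings, running_cost):
    """Yearly savings left over once the running cost has been paid."""
    return savings - running_cost


def simple_payback_years(investment, yearly_benefit):
    """Years needed for the yearly benefit to cover the investment.

    Returns None when the benefit is zero or negative, i.e. no payback.
    """
    if yearly_benefit <= 0:
        return None
    return investment / yearly_benefit


net = net_yearly_benefit(YEARLY_SAVINGS, RUNNING_COST)
payback = simple_payback_years(DEVELOPMENT_COST, net)
print(net)      # 5
print(payback)  # 8.0
```

Under that reading the system yields a net DEM 5 million a year. If the savings figure is already net of running costs, the payback period would be correspondingly shorter.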
Search productivity has increased because of a radical change in search methods. Examiners adjust their working methods to the new search tools as part of a "learning curve". What we must do is ensure that they exploit the system's potential to the full. The results obtained with EPOQUE and BNS are both satisfactory and encouraging.
The entire system is no more than the first generation of modern electronic search tools, however. Monitoring developments in documentation technology and upgrading the system regularly are on-going tasks. A study is already being conducted to look into the potential for integrating Internet technology into the EPOQUE system by applying the Intranet concept. Equally vital is that we continue to monitor the progress made in the storage technology which has already enabled the Office to load an immense amount of data at a very reasonable cost.
The national patent offices of the EPO's member states have - and will continue to have - access to these tools, not only as instruments of patent information policy, but also as adjuncts to their own patent procedures. The databases created and administered by the EPO offer tremendous potential for improving the dissemination of patent information among all actors on the economic stage - notably the small and medium-sized sector.