EPOQUE-BNS:
From on-line search to document delivery
During the last decade the world's major patent offices have been
deeply involved in ambitious programmes to automate their activities. They
spend an average of ten to fifteen percent of their annual budget on this
automation, a surprisingly low ratio given that the principal activity of a
patent office is to process information.
A primary target for automation is the field of patent documentation and
prior art searching. It is widely acknowledged that traditional search methods
based on classified paper collections have reached the limits of their capacity.
Approximately one million patent documents are added to the existing collection
each year. This inevitably results in longer search times and higher costs,
since examiners have more and more information and documents to contend with.
Subdividing the patent collection into even smaller groups by means of a more
refined classification system is only a temporary solution. The EPO's internal
classification system already contains 160 000 sub-divisions.
Moreover, an increasing number of "cross-disciplinary"
applications extend over several technical fields at once. Electronic games are
an example of this trend. Inventions of this kind mean that more groups of
documents have to be consulted, and make classification a much more haphazard
business than in the past.
Finally, applicants now have sophisticated search tools at their disposal
in a wide range of on-line systems which they use with great skill. They thus
have the ability and the technical wherewithal to make an accurate assessment of
the state of the art before filing their applications. The credibility of the
patent offices would be directly called into question if their attachment to
traditional methods rendered them unable to obtain results as good as, if not
better than, those achieved by the applicants themselves.
Characteristics of patent literature. Patent documents include
large numbers of figures, diagrams or drawings which are essential to the
understanding of their content. The associated text is frequently written in a
highly esoteric technical language which makes it difficult to read. Both these
aspects have to be taken into account when designing an automated search system.
A search system based solely on full-text searching will be of very limited use.
Text has to be supplemented by graphics. The user must be able to switch between
text and images easily and intuitively. Navigating within the document, moving
from one figure to the text passage describing it, and vice versa, jumping to
another part of the text referring to the same figure, displaying the next
figure, etc. are mandatory features of any electronic search tool.
Electronic searching also has to be interactive, and, taking paper searches
as a model, to follow an iterative rather than a sequential logic. Automated
systems based on a sequential consultation of a large number of documents do not
and cannot meet the needs of examiners.
Finally, it has to be remembered that search methods vary in subtle and
significant ways from one technical field to another. This is already the case
with traditional tools, and it is even more so with their electronic equivalents.
The user-friendliness and performance of automated information systems will
therefore depend to some extent on the technical field involved.
Creating the EPO's electronic documentation. With these
considerations in mind, the first stage in the automation process consists of
capturing the data and storing it in electronic form - creating very large
databases. In co-operation with the member states and with its trilateral
partners in the USA and Japan, the European Patent Office has invested heavily
in this area. We now have the following resources at our disposal:
- bibliographic data, including patent family references, in respect of
patent documents published since 1968 - the possibility of adding the data of
documents published since 1920 is being looked into
- classification codes, keywords and other indexed data in the classified
collections of the PCT minimum documents
- abstract data, for at least one member per family, published since 1970
- texts of one member per family of all PCT documents published since 1970,
captured in character-coded form, together with their corresponding drawings in
facsimile form - the possibility of adding documents published prior to 1970 in
certain technical fields is being looked into
- facsimile images of all documents published since 1920 in the USA and
Japan, and in the member states of WIPO and the EPO.
The information
stored in this way is available to all users via:
- 51 bibliographic databases
- our in-house, licensed copy of the DERWENT database
- 13 full-text patent databases totalling some 60 million searchable records
(early 1997)
- a further 21 image databases for more than six million documents (early
1997), arranged by country of publication and by type of image (first-page
clipped images, embedded images and drawings)
- the full BNS (BACON Numerical Service) collection of facsimile documents
with its 25 million documents (early 1997).
Non-patent literature is searched by accessing:
- in-house, licensed copies of abstract databases such as INSPEC, FSTA,
COMPENDEX
- four non-patent-literature, full-text databases (under development by the
publishers) - each containing an electronic copy of the full text of the
journals previously supplied in paper form by specific publishers (IEE, IEEE,
Elsevier Science Publishers, AIP) together with their facsimile images
- the full text and clipped drawings of the IBM Technical Disclosure Bulletin
- external commercial hosts offering hundreds of other databases, by far the
most used being the Chemical Abstracts database
- full facsimile copies of all articles classified by EPO examiners since
1989, with backfiles extending back to 1980 in selected technical areas
During 1997 the EPO intends to approach other publishers with a view to
enlarging its existing in-house collection to include these publishers'
full-text and facsimile data. The EPO's aim is a simple one - to build up an
electronic information system equal in size to, or more extensive than,
traditional paper documentation.
Retrieving documents. The second stage in the automation process
involves the development of systems capable of providing on-line access to all
this stored data. Here, the first task is to sift through the vast stock of data
and pinpoint the relatively small number of documents - a kind of "virtual"
search group, totalling between 100 and 200 documents - which are likely to be
relevant prior art for the subject-matter being searched. This
material then has to be analysed quickly to eliminate the useless or redundant
items. Finally, the examiner has to scrutinise the remaining 10 to 20 documents
that may eventually be cited in the final search report.
The EPOQUE - or EPO QUEry - system covers the first two phases of the
search process, ie the creation of virtual groups and analysis of retrieved
documents. BNS - the BACON Numerical Service - moves forward into stage three,
ie detailed study and identification of the most relevant documents.
EPOQUE was developed by the Questel-Bertelsmann consortium according to
specifications laid down by the Office. Using a powerful standard query
language, it allows the examiner to consult any record, document or part of a
document stored in the Office's internal databases. Its advanced interface
offers maximum flexibility in searching - and also enables the user to access
the databases of the major commercial hosts using the same query language. This
means that the world's entire stock of scientific and technical knowledge is
accessible from a single workstation in each examiner's office. Using
appropriate search strategies, involving bibliographic information,
classification codes, keywords, free text words, indexing codes, etc., it is
therefore possible to make an initial selection of documents for further study.
EPOQUE RETRIEVAL provides this search tool.
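The idea of combining classification codes and free-text words to form a small "virtual" search group can be illustrated with a toy inverted index. This is only a minimal sketch in Python: the record layout, the "class:" and "word:" prefixes and the function names are assumptions made for illustration, and it does not reproduce the EPOQUE query language or its databases.

```python
from collections import defaultdict

# Hypothetical toy records; identifiers, codes and texts are invented for illustration.
RECORDS = {
    "D1": {"classes": {"A63F"}, "text": "electronic game with display controller"},
    "D2": {"classes": {"G06F"}, "text": "display controller for a workstation"},
    "D3": {"classes": {"A63F", "G06F"}, "text": "board game with dice"},
}


def build_index(records):
    """Map each search term (class code or lower-cased word) to a set of document ids."""
    index = defaultdict(set)
    for doc_id, rec in records.items():
        for code in rec["classes"]:
            index["class:" + code].add(doc_id)
        for word in rec["text"].lower().split():
            index["word:" + word].add(doc_id)
    return index


def search_all(index, terms):
    """AND query: documents that match every term."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_any(index, terms):
    """OR query: documents that match at least one term."""
    result = set()
    for term in terms:
        result |= index.get(term, set())
    return result


INDEX = build_index(RECORDS)

# A "virtual group": documents classified in A63F that also mention "game".
virtual_group = search_all(INDEX, ["class:A63F", "word:game"])
```

In this sketch each additional term in an AND query can only narrow the group, which mirrors the way a search strategy is refined step by step before the retrieved documents are studied in detail.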
Navigating and positioning within a document. EPOQUE goes much
further than this, however. Having marshalled a set of documents, the examiner
has to be able to link them up on-screen and consult them in a fast,
non-sequential manner. This involves navigating from one document or portion of
a document to another, moving to and fro between text and drawings, or between
particular passages or paragraphs, while any graphics are displayed
simultaneously. The user can highlight strings and search passages for a
specific word or expression, thus improving the positioning at the relevant
parts of the document. The interaction between examiner and system is total, and
the content is continually enhanced as the user structures the documents and
marks individual sections - drawings, tables, examples or whole paragraphs - for
further reference. These aspects of the system's functionality, offering better
visualisation, navigation and positioning in a document, make it quicker and
easier to pick out and analyse a batch of relevant documents. EPOQUE VIEWER
provides this display tool.
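Moving from a figure to the passages that describe it presupposes some link between figure numbers and positions in the text. The Python sketch below shows one simple way such a link could be built, by scanning paragraphs for references of the form "Fig. 1" or "Figure 2". The regular expression, the function names and the sample paragraphs are assumptions for illustration only; the source does not describe how EPOQUE VIEWER implements this.

```python
import re
from collections import defaultdict

# Matches "Fig. 1", "Fig 1" or "Figure 1" (case-insensitive); an illustrative pattern only.
FIG_REF = re.compile(r"\bFig(?:ure|\.)?\s*(\d+)", re.IGNORECASE)


def index_figure_refs(paragraphs):
    """Return {figure number: [indices of paragraphs that mention it]}."""
    refs = defaultdict(list)
    for pos, para in enumerate(paragraphs):
        # Use a set so a paragraph is listed once per figure, even if it cites it twice.
        for num in sorted({int(n) for n in FIG_REF.findall(para)}):
            refs[num].append(pos)
    return dict(refs)


def next_passage(refs, figure, current):
    """Next paragraph after `current` that mentions `figure`, wrapping to the first one.

    Returns None if the figure is never mentioned.
    """
    positions = refs.get(figure, [])
    if not positions:
        return None
    for pos in positions:
        if pos > current:
            return pos
    return positions[0]


# Invented sample text.
paragraphs = [
    "The housing is shown in Fig. 1.",
    "Figure 2 shows the lever.",
    "As in Fig. 1, the lever of Fig. 2 pivots.",
]
figure_refs = index_figure_refs(paragraphs)
```

With such an index, "jump to another part of the text referring to the same figure" becomes a lookup followed by a step to the next position, rather than a sequential read through the document.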
Accessing the full document. BNS was developed by the
Storagetek-Infotel consortium and Bertelsmann, and as a huge document server it
gives near-line access to the entire numerical collection of 25 million
facsimile documents. This collection ideally complements the character-coded
collection used by the EPOQUE VIEWER, both because it covers more than the PCT
minimum collection and because it stores documents published since 1920. The examiner
can now navigate through "predefined" sub-parts, such as claims,
description, figures, etc. Additionally, BNS is used to print documents for
updating the classified paper collections (which are still kept) and for the
search reports sent to the applicant and other patent offices.
Learning curve and system use. EPOQUE I came into service in
1990 and since then has been progressively improved to include, among other
things, access to first-page images. EPOQUE VIEWER - as described above - came
on-line in early 1995, and BNS followed in early 1996. This "suite" of
electronic search tools constitutes one of the most powerful automated
documentation systems in the world.
The massive response to the EPOQUE-BNS system is an indication of the
success of the EPO's automation plan and its acceptance by users.
More than 2 000 different users access the system every month. The tables shown here
reveal a growth in the number of users as the system is increasingly accessed
by the member states.
- The number of connect hours to the EPOQUE RETRIEVAL tool continues to grow
and totalled 116 000 active on-line search hours in 1996.
- More than 45 million documents or parts of documents were displayed by one
of the three parts of the EPOQUE-BNS system in 1996.
- 16 million documents were displayed using EPOQUE VIEWER alone, with
examiners using EPOQUE VIEWER to navigate through text and drawings of
approximately 80 documents per user per day.
- 76 000 BNS documents are consulted for search purposes each month, the
equivalent of at least 4.5 documents per examiner per day.
As it stands at present, EPOQUE-BNS is the only system in the world
offering a paperless search that includes the PCT minimum documentation and
hence complies with the quality standards of the EPO's search guidelines. Other
search systems do exist - in other patent offices and in the commercial sector -
but these are limited either in the number of users, or in the coverage their
databases provide (full-text, facsimile images or specific countries).
Seen in terms of the number of users able to access the system at any one
given time and the enormous amount of data available, the real benefit of the
EPO's electronic search tool lies in its price. The cost of development,
maintenance and upgrading of EPOQUE-BNS since 1988 totals DEM 40 million, most
of which has already been depreciated. The combined running cost of EPOQUE and
BNS is DEM 20 million a year. Savings in manpower, reduced access to external
commercial databases and increased search productivity amount to DEM 25 million
a year.
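The figures quoted above can be put side by side in a few lines of arithmetic. The Python sketch below assumes that the DEM 25 million in annual savings is stated before running costs are deducted, and it ignores depreciation, inflation and discounting; the resulting net figure and payback period are illustrative calculations, not numbers given by the Office.

```python
# All amounts in DEM million, taken from the text above.
DEVELOPMENT_COST = 40   # development, maintenance and upgrading since 1988
RUNNING_COST = 20       # combined annual running cost of EPOQUE and BNS
ANNUAL_SAVINGS = 25     # annual savings (assumed here to be gross of running costs)

# Net annual benefit under the assumption stated above.
net_annual_benefit = ANNUAL_SAVINGS - RUNNING_COST

# Simple, undiscounted payback period for the cumulative development cost.
payback_years = DEVELOPMENT_COST / net_annual_benefit

print(f"Net annual benefit: DEM {net_annual_benefit} million")
print(f"Simple payback period: {payback_years:.0f} years")
```

On these assumptions the system yields a net DEM 5 million a year. If the savings were instead stated net of running costs, the payback period would be correspondingly shorter.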
Search productivity has increased because of a radical change in search
methods. Examiners adjust their working methods to the new search tools as part
of a "learning curve". What we must do is ensure that they exploit the
system's potential to the full. The results obtained with EPOQUE and BNS are
both satisfactory and encouraging.
The entire system is, however, no more than the first generation of modern
electronic search tools. Permanent monitoring of developments in documentation
technology and regular system upgrades are an on-going process. A study is
already being conducted to look into the potential for integrating Internet
technology into the EPOQUE system by applying the Intranet concept. Equally vital
is that we continue to monitor the progress made in the storage technology which
has already enabled the Office to load an immense amount of data at a very
reasonable cost.
The national patent offices of the EPO's member states have - and will
continue to have - access to these tools, not only as instruments of patent
information policy, but also as adjuncts to their own patent procedures. The
databases created and administered by the EPO offer tremendous potential for
improving the dissemination of patent information among all actors on the
economic stage - notably the small and medium-sized sector.