Digital libraries: advanced methods and technologies, digital collections


Russian Academy of Sciences

Russian Foundation for Basic Research

HP Labs

RCDL 2007 Program

October 15

9.00 (Hall C ) Arrival and registration

14.00 (Hall A ) Conference Opening

14.00 – 14.30. S. Abramov, I. Nekrestyanov, S. Znamenskij. Welcome (30 m.).

Welcome to the conference!

14.30 – 15.00. P. Mehra, I. Belousov. HP Labs in Russia: Overview of Research and Collaboration Agenda (30 m.).

Information Management, Digital Archives & Digital Libraries activity of HP Labs

15.00 (Hall C ) Coffee-Break

15.15 (Hall A ) RCDL Tutorials.

15.15 – 17.15. Th. Risse. Approaches for large scale digital library infrastructures (2 h. // Vol. 1, p. ).

Abstract. Current plans for next-generation DL architectures aim for a transition from the DL as an integrated, centrally controlled system to a large-scale federation of DL services and information collections. The transition is driven by DL "market" needs and inspired by new technology trends that promise to meet at least part of these needs. With the uptake of DLs in a wider community, there is a need for better and adaptive tailoring of the content and service offer of a DL to the needs of the respective community as well as to the current service and content offer. Furthermore, there is a need for more systematic exploitation of existing resources such as information collections, metadata collections, and services to make DLs more cost-effective, as well as a need to open up DL technology to a wider community. New technologies and paradigms like Peer-to-Peer networking and Service-oriented Architectures (SOA) suggest digital libraries that operate on more demand-oriented and flexible distributed or decentralized infrastructures. The tutorial introduces the audience to central aspects of bringing digital libraries to large-scale infrastructures by discussing core ideas and related architectural options. Furthermore, it introduces the underlying technologies as a foundation for understanding the concrete solutions. The main part of the tutorial revolves around the following selected DL topics:

  • Content and Metadata Management
  • Navigation through the information space

For each of the topics the key challenges are discussed together with possible solutions for the challenges and the lessons learned in implementing these solutions. The solutions are illustrated with concrete examples and small system demos from the BRICKS project.

17.15 (Hall C ) Coffee-Break

17.30 (Hall A ) RCDL Tutorials.

17.30 – 19.30. G. Amato, P. Bolettieri, F. Debole, F. Falchi, C. Gennaro, F. Rabitti, P. Savino. A Tutorial on the MILOS Multimedia Content Management System (2 h. // Vol. 1, p. 16-34).

Abstract. In this paper we present the MILOS Multimedia Content Management System. MILOS supports the storage and content based retrieval of any multimedia documents whose descriptions are provided by using arbitrary metadata models represented in XML. It provides developers of digital library applications with functionalities for dealing with heterogeneous digital documents, heterogeneous metadata, and metadata schema mapping. This paper shows how to configure and use all MILOS components.

October 16

9.30 (Hall A ) Scientific Digital Libraries.
Scientometrics and Evaluation of Innovation Potential

9.30 – 10.00. I. Zatsman, S. Shubnikov. Processing Principles of Information Resources for Evaluation of Innovative Potential of Science Fields (30 m. // Vol. 1, p. 35-44).

Abstract. The paper addresses the problem of evaluating the innovative potential of science fields using information from the patent electronic libraries to which Rospatent provides access. The structure and content of patent documents are analyzed. It is shown that modern studies of S&T interaction (which science fields are most cited in patents?) are based on processing patent documents that contain references to scientific papers. Such studies clarify a number of requirements for the methodology of processing patent documents and for the level of detail of the schemas of patent electronic libraries. One of these requirements is additional structuring of the references to scientific papers in patent documents.

10.00 – 10.30. M. Kogalovsky, S. Parinov. Socionet Information Resources, Scientometrics and Metadata Quality Indicators (30 m. // Vol. 1, p. 45-54).

Abstract. The Socionet system is the first Russian professional social network for education and research. It is also a platform for building an e-Social Sciences infrastructure at the national level. Socionet information resources are integrated into the national Common Scientific Information Space of the Russian Academy of Sciences and are linked to the international e-Science online infrastructure. The Socionet system uses some ideas, methods and tools of the RePEc (Research Papers in Economics) project and the OAI (Open Archives Initiative). The added value of Socionet is an advanced architecture of relations among different types of information objects, which enables better navigation, research performance indexes, scientometric analysis and metadata quality indicators. The article describes general features of Socionet and its methods of integrating information resources. It also discusses possible implementations of scientometrics based on Socionet statistics and the problem of measuring metadata quality.

10.30 – 11.00. B. Cruz, P. Blesa, T. Krichel, J. Osca-Lluch, E. Velasco. Evaluation of INCISO: A system for automatic elaboration of a Citation Index in Social Science Spanish Journal (30 m. // Vol. 1, p. 55-58).

Abstract. We have developed a system that can compile a citation index automatically. It has been tested with Spanish journals. We need to evaluate our system, mainly the effectiveness of citation retrieval. Criteria for evaluating the system are presented and discussed, and the results of applying them to our system are shown and analyzed.

11.00 (Hall C ) Posters with Coffee

S. Tarasov. The automated system of construction the thesaurus (poster // Vol. 2, p. 63-66).

Abstract. The article considers practical aspects of creating an automated system for thesaurus construction. The major problems in organizing modern thesauri and the methods of constructing them are identified. The problem of automatically generating thesauri from linguistic sources of different types is examined: results of text corpus analysis, definitions from explanatory dictionaries, data from associative dictionaries, etc. The proposed architecture of an automated system that generates thesauri for an arbitrary subject domain from linguistic sources, with the participation of independent assessors, is described.

E. Rabchevsky. Automatic construction of ontologies (poster // Vol. 2, p. 37-40).

Abstract. This paper is devoted to automating the ontology construction process on the basis of linguistic analysis of knowledge contained in Web resources. The author proposes a method for the automatic construction of formal semantic models and outlines a plan for binding the formal models to a particular domain.

Yu. Molorodov, A. Fedotov. Database and electronic libraries for the ecologies problems (poster // Vol. 2, p. 14-17).

Abstract. Technologies for developing electronic libraries make it possible to formulate recommendations and approaches to the optimal control of ecological systems.

D. Kulikov, L. Sukina, S. Nikolaev. E-catalogue of the Pereslavl University Library (poster // Vol. 2, p. 20-22).

Abstract. The article provides information about the Pereslavl University Library and discusses the purposes of, and the need for, the library's e-catalogue. It describes requirements for the structure, functionality and interface of the catalogue. The article draws attention to the particular features, advantages and mission of the system within the university library.

11.25 (Hall A ) Scientific Digital Libraries.
Digital Libraries for Earth Sciences

11.25 (Top floor ) Information Retrieval.
News Summarization

11.25 – 11.45. V. Safroshkin, A. Ivanov. Problems of creation and functioning of a thematic information-communicative resource on geomagnetism (20 m. // Vol. 1, p. 59-61).

Abstract. Problems of creating a thematic resource with an information and communication focus on its target audience are considered. Technical details, available services, user traffic and user requirements are described.

11.45 – 12.15. K. Firsov, A. Fazliev, S. Sakerin, T. Zuravleva, B. Fomin, V. Zakharov. Information-computational system "Atmospheric radiation". State of the art. (30 m. // Vol. 1, p. 62-66).

Abstract. The information-computational system "Atmospheric radiation" is described. Access to the data and programs is provided through a web interface (http://atrad.atmos.iao.ru). The system not only provides access to data but also makes it possible to calculate radiative characteristics of the Earth's atmosphere. The aim of our team is to create a distributed information-computational system accessible over the Internet.

12.15 – 12.35. A. Osin, E. Trushkina, V. Kuznetsov. Virtual Archive as a prototype distributed data system for scientific knowledge base (20 m. // Vol. 1, p. 67-71).

Abstract. This document outlines an attempt to develop guidelines for a low-barrier unified distributed data system ("Virtual Archive") based on modern standards. The prototype data system uses an approach close to that of IVOA [1] but aims at a more general application area. Existing trends and standards in building distributed data systems are briefly discussed, and the Virtual Archive approach to specific issues is laid out.

11.25 – 11.55. N. Abramova, V. Abramov. Automatic compilation of news stories reviews (30 m. // Vol. 1, p. 131-141).

Abstract. This work deals with one of the topical problems of automatic summarization: multi-document summarization of news stories. This line of research is widely developed abroad, but in Russia no attention is paid to the subject. The authors propose a method for compiling reviews of news stories, on the basis of which a summarization system has been developed. We present sample summaries and describe summarization evaluation experiments. The experiments showed that on average (with coverage of 80% across the three document collections provided for the research) the survey summaries reflect the content of the original texts.

11.55 – 12.25. P. Braslavski, V. Gustelev. News Summarization System Based On Machine Learning Approach (30 m. // Vol. 1, p. 142-147).

Abstract. The paper describes an experimental automatic summarization system for news stories based on machine learning approach. As a main dataset we use a corpus of 1183 news stories from Gazeta.ru, a popular Russian online news service. The news stories have highlighted sentences that are used as summary. For classifier building we use LibSVM - an implementation of support vector machine. We use a set of easily computable features for classification. Additionally, we performed evaluation on a smaller manually tagged Kommersant corpus. Evaluation shows acceptable quality of the results.

12.35 (Hall A ) Scientific Digital Libraries.
Technologies for Social-Economic Monitoring

12.35 (Top floor ) Information Retrieval.
Document Stream Analysis

12.35 – 12.55. A. Bogomolova, O. Karasev, R. Sennov, T. Yudina. University Information System RUSSIA: Database and Services to Monitor Economic and Social Development at Regional and Local Levels. Applications for Public Administration and University Education (20 m. // Vol. 1, p. 72-76).

Abstract. The article describes applications based on the University Information System RUSSIA (UIS RUSSIA) that may support public administration and decision-making at regional and local levels. UIS RUSSIA is also used in university courses. Work is underway on a special language for classifying economic data and knowledge products.

12.55 – 13.15. M. Ageev, B. Dobrov, A. Sidorov. Geographic Information System for Monitoring of Strongly Interrelated Data (20 m. // Vol. 1, p. 77-83).

Abstract. We describe an Automated System for Monitoring of Urban-planning Data. The System includes a data warehouse, a data visualization module, a system for computing derivative parameters using geospatial data, and a system for modelling legislative constraints. The System is flexible and extensible, so it can be applied to other domains.

12.35 – 12.55. D. Lande, A. Grigorjev, S. Brajchevskiy, A. Darmokhval, A. Snarskii. Object visualization of thematic informational arrays (20 m. // Vol. 1, p. 148-150).

Abstract. An approach to the visualization of thematic informational arrays of electronic publications is described.

We propose the use of so-called Wordlet-diagrams, which are built from the distribution of the volumes of publications corresponding to the chosen informational objects.

Wordlet-diagrams are presented as an important visual complement to systems for integrating informational resources.

12.55 – 13.15. A. Snarskii, D. Lande, S. Brajchevskiy, A. Darmokhval. The properties of relevance distribution in documentary arrays (20 m. // Vol. 1, p. 151-155).

Abstract. The distributions of two kinds of relevance measure for documents in documentary streams are investigated.

Stable correlations in their mutual dependence are revealed. The Hurst index of the corresponding series is determined. It is shown that the series have a fractal nature.

13.15 (Hall B ) LUNCH

14.15 (Hall A ) Scientific Digital Libraries.
Historical Digital Libraries

14.15 (Top floor ) Information Retrieval.
Clustering and Near-Duplicate Detection

14.15 – 14.45. V. Barakhnin, A. Fedotov. Methodological approach for developing informational-reference systems on history of science (30 m. // Vol. 1, p. 84-88).

Abstract. This article describes a methodology for developing informational-reference systems on the history of science. The main principles of this methodology are the following:

  • Information is grouped around persons; detailed biographical data is classified chronologically, geographically, etc.
  • The bibliography of a person includes both publications by the scientist and publications about the scientist.
  • The connection between a researcher's scientific activity and the formal description of the subject domain in which the researcher worked is clearly shown.

The description includes the informational model of the reference book, implementation features of the subsystems of the informational system, and the main types of user information requests required for full-featured work with the system.

14.45 – 15.15. A. Marchuk, P. Marchuk. Digital archives integration platform (30 m. // Vol. 1, p. 89-94).

Abstract. The article examines the problem of integrating factographic information systems. Integration means both merging information resources and migrating information systems to a uniform solution while preserving the functionality and interfaces of each specific system. The proposed approach was implemented in the project "Digital photo-archive of SB RAS".

15.15 – 15.35. Yu. Leonova, A. Fedotov. Information model of the account of the temporary factor in information-reference systems (20 m. // Vol. 1, p. 95-102).

Abstract. This article considers an information model that accounts for the time factor in information-reference systems (IRS). An IRS must be able to execute a query as of some moment in the past, that is, to produce a snapshot of the facts valid on an arbitrary date. We propose to account for the time factor using two kinds of dependencies:

  1. document versions, associated with changes to a document's attributes over a chosen time interval;
  2. parent-descendant relations between new and old objects.
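
The two dependencies listed above can be illustrated with a minimal sketch. It is an illustration only, not the authors' model: the `Version` record, the `as_of` and `ancestors` helpers, and the sample data are all hypothetical names introduced here. It assumes each version records the date from which it is valid, and that a new object may point to the older object it replaces.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Version:
    object_id: str
    valid_from: date                 # date from which these attributes hold
    attributes: dict
    parent_id: Optional[str] = None  # older object this one descends from


def as_of(versions, object_id, when):
    """Return the attributes of the latest version valid on `when`, or None."""
    candidates = [
        v for v in versions
        if v.object_id == object_id and v.valid_from <= when
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.valid_from).attributes


def ancestors(versions, object_id):
    """Follow parent links from an object back to its oldest predecessor."""
    parent_of = {v.object_id: v.parent_id for v in versions if v.parent_id}
    chain = []
    current = parent_of.get(object_id)
    while current is not None and current not in chain:
        chain.append(current)
        current = parent_of.get(current)
    return chain


# Hypothetical sample data: one object with two versions and one successor.
history = [
    Version("inst-1", date(1957, 1, 1), {"name": "Institute A"}),
    Version("inst-1", date(1975, 6, 1), {"name": "Institute A (renamed)"}),
    Version("inst-2", date(1990, 3, 1), {"name": "Institute B"}, parent_id="inst-1"),
]
```

Calling `as_of(history, "inst-1", date(1960, 1, 1))` selects the first version, while a query for `"inst-2"` on a date before 1990 finds nothing and the caller can fall back to `ancestors(history, "inst-2")` to locate the predecessor.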

14.15 – 14.45. A. Sytchev, M. Bazhenov. The Problem of The Seeds Selection for an Automatic Web-directory Resource Discovery Based on Strongly Connected Components Identification Followed by Content Filtering (30 m. // Vol. 1, p. 156-165).

Abstract. The paper presents the results of an experimental study of an approach proposed by the authors for automatic web-directory resource discovery. Using data sets gathered from the Yandex web directory (http://yaca.yandex.ru), two approaches to selecting seeds from web-directory rubrics were examined. It was demonstrated that seeds selected from the web directory differ considerably in their importance for automatic resource discovery. The number of inlinks of a seed may be considered an approximate, indirect indicator of its importance.

14.45 – 15.15. Yu.G. Zelenkov, I.V. Segalovich. Comparative analysis of near-duplicate detection methods of Web documents (30 m. // Vol. 1, p. 166-174).

Abstract. The work presents a comparative experimental investigation of the most popular modern methods of near-duplicate detection for textual documents. A quantitative evaluation in terms of recall, precision and F-measure is given.

The test data used in the experiments include the ROMIP collection.

A new algorithm with higher quality scores than the existing approaches is proposed.
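
For readers unfamiliar with the task, the sketch below shows one classic baseline for near-duplicate detection, word shingles compared by Jaccard similarity, together with the precision, recall and F-measure computation used for evaluation. It is not the new algorithm proposed in the paper, which the abstract does not describe; the function names and the shingle length `k=3` are assumptions made for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def precision_recall_f1(predicted, relevant):
    """Evaluate a set of predicted duplicate pairs against the true pairs."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over the lazy cat"
similarity = jaccard(shingles(doc_a), shingles(doc_b))
```

Two documents would be reported as near-duplicates when `similarity` exceeds a threshold chosen on held-out data; the threshold itself is a tuning decision and is not taken from the paper.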

15.15 – 15.35. N. Vinogradova, O. Mitrofanova, P. Panicheva. Automatic Term Clustering in the Corpus of Russian Texts on Corpus Linguistics (20 m. // Vol. 2, p. 23-28).

Abstract. An attempt at automatic term clustering in a corpus of Russian texts on corpus linguistics is described.

15.35 (Hall C ) Posters with Coffee

N. Luneva. Multilingual linguistic knowledge base: architecture and metadata (poster // Vol. 2, p. 67-70).

Abstract. The paper describes some principal architectural decisions and types of metadata in the multilingual linguistic knowledge base founded on the new linguistic resource. The linguistic knowledge base is aimed at debugging semantic-syntactical representations in language processors of machine translation and text knowledge processing systems. The new knowledge base is being designed as a major test bed for the research community in the field of computational linguistics and intellectual technologies as well as for educational purposes, for comparative analysis of language structures and creating language training environments. The knowledge base features the component of the multilingual translation memory.

S. Volkov. From digital library to an information system "The complete M.V. Lomonosov" (poster // Vol. 2, p. 18-19).

Abstract. The report is devoted to the creation of the philological digital library "M.V. Lomonosov". The library has scientific, cultural and educational goals. The need for such a library is substantiated, and the requirements for preparing the material and the system for organizing it are described. The paper sets out the main principles behind the formation of the philological digital library "M.V. Lomonosov" and gives a brief account of its search mechanism. A model of the information system "The complete M.V. Lomonosov" is proposed.

T. Kachaeva, V.S. Yuzhikov. CV recognition and grading automated system (MS Word document) (poster // Vol. 2, p. 33-36).

Abstract. The article describes an automated system for CV recognition and grading. Methods and algorithms for building a CV database from automated analysis of incoming resumes are considered.

N. Markova, O. Obuhova, I. Soloviev, A. Chochia. Web-technology of dynamic classification in quasi-homogeneous digital collections (poster // Vol. 2, p. 29-32).

Abstract. Merging the benefits of attributive search and navigation would allow the user to navigate a digital collection by progressively selecting desired facet values of information objects.

The paper presents facet navigation based on dynamic classification: fast and easy drill-down in collections by selecting attributes. Formal representations of "facet formulas", "facet table" and "facet request" are proposed. Several design decisions that can improve the efficiency of facet navigation are discussed. A sample snapshot of the visual interface illustrates the main ideas of the paper.
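
The drill-down idea can be sketched in a few lines. This is a generic illustration of facet filtering and value counting, not the paper's formal "facet formulas", "facet table" or "facet request"; the helper names and the sample collection are hypothetical.

```python
from collections import Counter


def drill_down(items, selected):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many remaining items carry each value of a facet."""
    return Counter(item[facet] for item in items if facet in item)


# Hypothetical collection of information objects with two facets.
collection = [
    {"title": "Map of Siberia", "type": "map", "century": 18},
    {"title": "Letter to Euler", "type": "letter", "century": 18},
    {"title": "Field notes", "type": "letter", "century": 19},
]

letters = drill_down(collection, {"type": "letter"})
remaining_centuries = facet_counts(letters, "century")
```

After each selection the counts are recomputed on the narrowed result, so the interface can offer only those facet values that still lead to a non-empty set.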

16.00 (Hall A ) Scientific Digital Libraries.
Virtual Observatories

16.00 (Top floor ) Information Retrieval.
Dictionaries and Thesaurus for Digital Libraries

16.00 – 16.30. A. Avramenko. Toward a Consensual Virtual and Real Time Clock in the Collection of Pulsar Timing Data Sets (30 m. // Vol. 1, p. 103-111).

Abstract. The problem of correspondence between observed values and their formal representations is considered. The structure and components of a collection suited to that correspondence are determined. By integrating observed and modeled data, the pulsar timing data sets are transformed into a parametric form defined by the observed rotation parameters of the pulsar. Methods for collating the virtual and real features of the data sets in the time domain are defined. Examples of applying the collection to specific problems are considered.

16.30 – 17.00. N. Mamardashvili, A. Vovchenko, L. Kalinichenko, O. Malkov, M. Patrakova. Integration of Data Mining Tools in the Infrastructure of Virtual Observatory (30 m. // Vol. 1, p. 112-122).

Abstract. Data Mining methods are used in different fields of science, including astronomy, to help obtain new knowledge and make scientific discoveries. The importance of incorporating into virtual observatories the means for solving astronomical problems by Data Mining methods is discussed. Existing approaches are examined, with preference given to the application of algorithmic ensembles. An architecture (Ensembled Weka) for incorporating the Weka Data Mining system into the infrastructure of a virtual observatory is proposed.

17.00 – 17.30. O. Zhelenkova, A. Kopylov, V. Chernenkov. Application of IVOA software tools for radio sources investigation. (30 m. // Vol. 1, p. 123-130).

Abstract. The interrelationship between objects of astronomical catalogs in different ranges of the electromagnetic spectrum, and their association with a real astrophysical source, is of obvious scientific interest. The astronomical community actively uses the Internet to access scientific information, but the heterogeneity of the data and their constantly growing volume pose a certain difficulty. Gathering information about even one celestial object is time-consuming because of the large number of resources, data access methods, formats of the obtained results and input formats of the applications used for further analysis. The International Virtual Observatory Alliance (IVOA) coordinates the community's work on creating an architecture of information interaction, standards, format specifications, data models and services that increase the efficiency of work with the data. Within the framework of this activity, systems are being created that make distributed computing and data access possible. We decided to analyze the use of existing software tools for investigating a list of radio sources.

16.00 – 16.30. A. Vasiliev, D. Kozlov, S. Samusev, O. Shamina. Automatic document metadata extraction from Russian scientific articles (30 m. // Vol. 1, p. 175-184).

Abstract. Automatic document metadata extraction provides useful search mechanisms for digital libraries. In this paper three metadata extraction techniques are experimentally compared for metadata and bibliography extraction from Russian scientific articles.

16.30 – 16.50. N. Buzikashvili. Dmitry Samoilov's Iskalka (20 m. // Vol. 2, p. 41-48).

Abstract. The paper describes elegant, highly effective and easy-to-use text aligning and information retrieval solutions of the machine-aided translation completely developed by Dmitry Samoilov (1958-2005).

16.50 – 17.10. A. Andreev, D. Berezkin, A. Nechkin, K. Simakov, Yu. Sharov. The method for unsupervized detection and correction of misprints in geographical names for the system of semantic checking and validation of documents (20 m. // Vol. 2, p. 49-56).

Abstract. This article describes a method for detecting and correcting misprints in such special data as geographical names. We give our classification of typical errors and misprints. We pay special attention to the method itself and the proposed algorithm. Some experiments were carried out and their results are presented.
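
The abstract does not say how the correction is performed, so the following is only a generic baseline for the same task: fuzzy lookup of a possibly misprinted name in a gazetteer, using string similarity from Python's standard `difflib` module. The gazetteer contents, the `correct_name` helper and the `cutoff` value are assumptions made for illustration.

```python
import difflib

# Hypothetical gazetteer of known correct names.
GAZETTEER = ["Novosibirsk", "Novgorod", "Omsk", "Tomsk", "Pereslavl-Zalessky"]


def correct_name(name, gazetteer=GAZETTEER, cutoff=0.8):
    """Return the closest known name, or the input unchanged if none is close.

    Similarity is difflib's ratio in [0, 1]; `cutoff` is an assumed threshold
    below which the input is treated as an unknown name rather than a misprint.
    """
    if name in gazetteer:
        return name
    matches = difflib.get_close_matches(name, gazetteer, n=1, cutoff=cutoff)
    return matches[0] if matches else name


fixed = correct_name("Novosibrisk")   # transposed letters
unknown = correct_name("Springfield")  # not in the gazetteer at all
```

A production system would also need to handle names that are close to several entries (here "Omsk" and "Tomsk"), which is where a classification of typical error types, as mentioned in the abstract, becomes useful.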

17.10 – 17.30. O. Lavrenova. The multilingual Access to the Data on the Base of the Geographic Names Thesaurus (20 m. // Vol. 2, p. 57-62).

Abstract. The paper deals with the RSL project of the geographic names thesaurus in the form of a national authority file.

19.00 ("Navigator" ) Conference Dinner

October 17

9.30 (Hall A ) Ontologies, Data Representation.
Access Techniques to Digital Collections

9.30 – 10.00. E. Myasnikov. Digital image collection navigation based on automatic classification methods (30 m. // Vol. 1, p. 185-194).

Abstract. The central question of image navigation system construction is to build the projection of the collection into the two-dimensional space of navigation. The kernel of the method proposed is to build the projection in two steps. At the first step the hierarchical system of clusters is constructed. At the second step the initial space of image descriptions is projected into two-dimensional navigation space.

A survey of methods used for navigation system construction is given. The results of experimental analysis are presented. The proposed method is compared with known methods. The results of this work lead to the conclusion that the developed method can be applied successfully.

10.00 – 10.20. M. Prokhorov, O. Bartunov. Navigation into Full-text Data Bases and Portals (20 m. // Vol. 2, p. 71-80).

Abstract. The large amount of information in present-day portals and information systems requires effective methods of navigation. Navigation tools are present even in "classical" paper publications (naturally, in "classical" form). However, modern electronic navigation mechanisms are considerably more varied and effective.

10.20 – 10.50. I. Markov, N. Vassilieva, A. Yaremchuk. Image retrieval. Optimal weights for color and texture fusion based on query object. (30 m. // Vol. 1, p. 195-200).

Abstract. In CBIR it is common to process different image features independently when estimating image similarity. Color and texture are common features used for searching natural images. This paper proposes the hypothesis that optimal weights for fusing color- and texture-based estimates can be determined from the features of the query image. A linear combination of color and texture metrics is considered as a mixed metric. Clusters of images with common features, and the optimal weights for them, are presented on the basis of experimental results. The results of the paper can be used to determine the best weights for a particular query and thus improve image retrieval.
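
A linear combination of two distances, as mentioned in the abstract, can be sketched as follows. The weight parameterization (a single `w_color` in [0, 1] with the texture weight as its complement), the function names and the sample distances are assumptions for illustration; the paper's actual metrics and weight values are not given in the abstract.

```python
def fused_distance(color_dist, texture_dist, w_color):
    """Mixed metric: w_color * color + (1 - w_color) * texture."""
    if not 0.0 <= w_color <= 1.0:
        raise ValueError("w_color must lie in [0, 1]")
    return w_color * color_dist + (1.0 - w_color) * texture_dist


def rank(candidates, w_color):
    """Sort candidate image names by fused distance to the query (closest first).

    `candidates` maps an image name to a (color_distance, texture_distance) pair.
    """
    return sorted(
        candidates,
        key=lambda name: fused_distance(*candidates[name], w_color),
    )


# Hypothetical precomputed distances from one query image to three candidates.
candidates = {
    "img_a": (0.1, 0.9),  # similar color, different texture
    "img_b": (0.6, 0.2),  # different color, similar texture
    "img_c": (0.4, 0.4),
}

color_heavy = rank(candidates, w_color=0.9)
texture_heavy = rank(candidates, w_color=0.1)
```

The two rankings come out in opposite order, which is the effect the hypothesis relies on: choosing the weight per query changes which images are retrieved first.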

10.50 (Hall C ) Coffee-Break

11.05 (Hall A ) Ontologies, Data Representation.
Ontologies in Digital Libraries

11.05 (Top floor ) Russian seminar "Internet-mathematics".

11.05 – 11.35. A. Privezetsev, A. Fazliev. Application task ontology for systematization of information resources in molecular spectroscopy (30 m. // Vol. 1, p. 201-210).

Abstract. An application task ontology for the systematization of information resources in molecular spectroscopy is described. End-user interfaces demonstrate the use of this approach for the task of determining the energy levels of water.

11.35 – 12.05. E. Birialtcev, A. Gusenkov, A. Elizarov. About access to electronic collections presented as relational databases on the basis of ontologies (30 m. // Vol. 1, p. 211-216).

Abstract. This paper is devoted to the application of ontological descriptions of different levels for providing access to electronic collections presented as relational databases. The proposed approaches have been tested on information resources of the Oil&Gas industry.

12.05 – 12.35. Yu. Zagorulko, O. Borovikova, G. Zagorulko. Ontology-Based Approach to Getting Content-Based Access to Humanitarian Information Resources (30 m. // Vol. 1, p. 217-224).

Abstract. The paper presents an approach to content-based access to humanitarian information resources using an ontology. The ontology forms the information basis of an Internet knowledge portal that must provide both integration and systematization of humanitarian scientific knowledge and of information resources relevant to the subject domain of the portal, as well as content-based access to them from any point on the Internet.

The ontology is used for automatic generation of the schema of the portal's internal database and of the forms for filling it, for formulating user queries in terms of the portal's subject domain, and for navigation through the portal's information space.

Structuring the portal ontology into domain-independent and subject-domain ontologies makes the knowledge portal easily adjustable to any area of knowledge.

12.35 (Hall B ) LUNCH

13.35 (Hall A ) Ontologies, Data Representation.
Concepts Descriptions and Refinements in Ontologies

13.35 (Top floor ) Russian seminar "Internet-mathematics".

13.35 – 14.05. N. Skvortsov. Application of concept refinement in solution of ontology manipulation tasks (30 m. // Vol. 1, p. 225-229).

Abstract. The paper continues a line of investigation into the application of type refinement to heterogeneous ontological descriptions of a subject area. The most typical tasks of manipulating ontological specifications are considered: verification of ontological definitions for internal consistency, mapping and integration of ontological contexts, ontology development, information contextualization and personalization, conceptual model development on the basis of an ontology, and querying and messaging in terms of an ontology. The aim of the paper is to show that the refinement relation can be applied in tasks that are usual in ontology modeling.

14.05 – 14.25. N. Loukachevitch. Description of Role Concepts in Linguistic and Ontological Resources (20 m. // Vol. 2, p. 81-89).

Abstract. In the paper we consider the ontological characteristics of role concepts and show how they differ from type concepts. We argue that the difference between types and roles must be taken into account in ontological and linguistic resources intended for automatic text processing if automatic inference is planned.

We show that information about concepts and entities obtained from text definitions often leads to incorrect descriptions of type-role relations, and it is necessary to make special efforts to describe this information appropriate for the logical inference.

We discuss possible means for description of the type-role relations used in the Thesaurus of Russian Language RuThes.

14.25 (Hall A ) Ontologies, Data Representation.
Manuscripts Digital Libraries

14.25 (Top floor ) Russian seminar "Internet-mathematics".

14.25 – 14.55. A. Varfolomeyev, I. Kravtsov, V. Filatov. SVG-visualization for digital libraries of hand-written documents (30 m. // Vol. 1, p. 230-235).

Abstract. Our article covers various ways of using Scalable Vector Graphics (SVG) in digital libraries of handwritten historical documents, together with the related development technologies.

The full-text nature of the source information and the natural hierarchical structure of the documents make XML technology the choice and basis for storing the texts. If we use XML to mark up logical elements in the texts of our library, it is logical to apply the same technology to other purposes as well, for example, to build queries to a collection of documents or to visualize information at different stages of our work with the texts.

In this article, the main attention is given to four uses of SVG. We discuss linking XML markup to images of the source documents. We describe the definition of special vector fonts for adequate representation of the original texts. We demonstrate different forms of visualization of the results of analytical queries to a collection of documents. We also propose using an SVG-based editor for creating graph models.

The approach is used in practice in the development of a special toolkit for the information system "Istochnik" ("Source"), intended for a network community of researchers.

14.55 – 15.15. V.S. Yuzhikov. Segmentation of the image of ancient manuscript page (20 m. // Vol. 1, p. 236-240).

Abstract. The article describes an algorithm for segmenting the image of a text page. The segmentation problem consists in assigning each element of the page to one of two classes: text or figure. The algorithm begins by splitting the image into small areas. The following criteria are used to classify each area:

  1. The share of black pixels in the whole area.
  2. The spread of element widths within the area.
  3. The presence of alternating lines and line spacing.
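
The three criteria above can be illustrated with a minimal sketch. It is not the author's implementation: the abstract gives no formulas, so the run-length definition of "element width", the thresholds in `classify`, and all function names are assumptions made for illustration. An area is represented as a list of rows of 0/1 pixels, where 1 is black.

```python
from statistics import pvariance

def black_share(block):
    """Criterion 1: fraction of black pixels (1 = black) in the area."""
    total = sum(len(row) for row in block)
    return sum(sum(row) for row in block) / total

def run_widths(block):
    """Widths of horizontal runs of black pixels (assumed stand-in for element width)."""
    widths = []
    for row in block:
        run = 0
        for px in row:
            if px:
                run += 1
            elif run:
                widths.append(run)
                run = 0
        if run:
            widths.append(run)
    return widths

def width_variance(block):
    """Criterion 2: spread of element widths within the area."""
    widths = run_widths(block)
    return pvariance(widths) if len(widths) > 1 else 0.0

def line_alternations(block):
    """Criterion 3: number of switches between inked rows and empty rows."""
    inked = [any(row) for row in block]
    return sum(1 for a, b in zip(inked, inked[1:]) if a != b)

def classify(block, max_share=0.5, max_variance=4.0, min_alternations=2):
    """Label an area 'text' or 'figure'; the thresholds are hypothetical."""
    is_text = (black_share(block) <= max_share
               and width_variance(block) <= max_variance
               and line_alternations(block) >= min_alternations)
    return "text" if is_text else "figure"

# Toy areas: thin strokes separated by blank rows, and a solid black patch.
text_block = [
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
]
figure_block = [[1] * 6 for _ in range(5)]

print(classify(text_block), classify(figure_block))
```

In practice the thresholds would have to be tuned on real manuscript scans; the sketch only shows how the three measurements could be combined into one decision.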

15.15 (Hall C ) Coffee-Break

15.30 (Hall A ) Ontologies, Data Representation.
Tools for Digital Libraries

15.30 (Top floor ) Russian seminar "Internet-mathematics".

15.30 – 16.00. K. Kudim, G. Proskudina, V. Reznichenko. Comparison of repository systems EPrints 3.0 and DSpace 1.4.1 (30 m. // Vol. 1, p. 241-252).

Abstract. This work considers the basic facilities and features of DSpace and EPrints, the most popular open-source systems for building scientific digital libraries. Experience in creating multilingual digital libraries on their basis is also described. A comparative analysis of DSpace 1.4.1 and EPrints 3.1 is presented. Special attention is given to localization, compatibility with external formats, and usability.

16.00 – 16.20. O. Bartunov, T. Sigaev. Specialized data types for digital libraries (20 m. // Vol. 2, p. 90-96).

Abstract. Complex modern information systems require specialized data types optimized for fast access and information retrieval tasks. Rapid changes in patterns of access to information require an extensible database engine that allows domain experts to develop custom data types optimized for their domain. We describe several data types developed for the open-source ORDBMS PostgreSQL that facilitate operations on sets, hierarchical data, semistructured data, and full-text search. We also describe the PostgreSQL infrastructure for developing extensions.

16.20 – 16.40. C. Becker, S. Strodl, R. Neumayer, A. Rauber, E. Nicchiarelli, M. Kaiser. Long-Term Preservation of Electronic Theses and Dissertations: A Case Study in Preservation (20 m. // Vol. 2, p. 97-103).

Abstract. An increasing number of institutions throughout the world face legal obligations to collect and preserve digital objects over years. A range of tools exist today to support the variety of preservation strategies such as migration or emulation. Yet, different preservation requirements across institutions and settings make the decision on which solution to implement very difficult. The Austrian National Library will have to preserve electronic theses and dissertations provided in PDF. It is thus investigating potential preservation solutions. The preservation planning approach taken in the PLANETS project is used to evaluate various alternatives with respect to specific requirements. It provides an approach to make informed and accountable decisions on which solution to implement in order to preserve digital objects for a given purpose. We analyse the performance of various preservation strategies with respect to the specified requirements for the preservation of master's theses and dissertations and present the results.

16.40 – 17.00. M. Bogatyrev, V. Latov, I. Stolbovskaya. Application of Conceptual Graphs in Digital Libraries (20 m. // Vol. 2, p. 104-110).

Abstract. Some results of applying conceptual graphs as the content of digital libraries are presented. The problem of clustering conceptual graphs and its solution are considered.

17.00 (Hall C ) Excursion

17.00 (Hall B ) RCDL Steering Committee Meeting - invitation only

October, 18

9.30 (Hall A ) Integration problems.
Technologies for Information Resources Integration

9.30 (Top floor ) Russian Information Retrieval Evaluation Seminar.

9.30 – 10.00. D. Briukhov, L. Kalinichenko, D. Martynov. Source Registration and Query Rewriting Applying LAV/GLAV Techniques in a Typed Subject Mediator (30 m. // Vol. 1, p. 253-262).

Abstract. New methods and tools are required for application development in collaborative scientific enterprises (such as Virtual Observatories (VO)) over multiple distributed sources of data and programs. In this paper we focus on the results of research and experimental work on problem-driven subject mediation, emphasizing aspects of LAV/GLAV integration of information sources in the mediator. The approach has the following distinguishing features: a typed, object canonical model is used instead of the usually applied relational one; a technique for refining mappings of source information models into an extensible canonical one is provided; a relevant source is registered in the mediator so that a mediator type is provably refined by a relevant source type or by a composition of such types (conflict-resolving functions are to be specified if required); rewriting of non-recursive logic programs containing strongly typed rules is applied. These features provide the methodological context for the current paper, which focuses on the role the LAV/GLAV approach plays in the mediator. Using an astronomical example taken from the Russian VO context, we show the technique of information source registration at the mediator and the query rewriting technique in a typed specification environment applying the LAV/GLAV approach.

10.00 – 10.30. A. Zhuchkov, A. Kravchenko, N. Tverdokhlebov. Service-oriented GRID-approach to maintain data spaces of virtual organizations (30 m. // Vol. 1, p. 263-272).

Abstract. This article covers the adaptation of some Grid technology facilities to form a Data Spaces support platform. It also shows an example of constructing a high-level service that operates in the subject-oriented Data Space of a medical Virtual Organization.

10.30 – 11.00. E. Kudashev, A. Filonov. Technologies and standards for services, catalogues and databases integration in Earth Observation programs (30 m. // Vol. 1, p. 273-279).

Abstract. The paper surveys emerging standards designed for building distributed service-oriented environments. The project of integrating a Russian satellite data archive into the international SSE system, supported by ESA, is considered as an example.

11.00 (Hall C ) Coffee-Break

11.15 (Hall A ) Integration problems.
Heterogeneous Collections Integration

11.15 (Top floor ) Russian Information Retrieval Evaluation Seminar.

11.15 – 11.35. A. Fedotov, V. Barakhnin, A. Guskov, J. Leonova. Developing information system for scientific society based on the integration of heterogeneous collections of resources (20 m. // Vol. 2, p. 111-117).

Abstract. This paper describes the technology used to create the information system "Organizations and employees of SB RAS directory". The technology is based on the principle of decentralized information storage with an integrated resource catalogue. This approach makes it possible to update information automatically and provides interoperability: heterogeneous resources can be integrated within the system and with external systems.

11.35 – 11.55. O. Klimenko, V. Philippov, M. Philippova. Digital Library of Mathematical Resources MathTree (20 m. // Vol. 2, p. 118-121).

Abstract. MathTree library represents a set of integrated links to Internet resources. The links are stored with some metadata in a tree catalogue where different branches correspond to various aspects of mathematics. Moving along the branches allows one to get information related to a specific aspect of mathematics: research laboratories, scientific schools, departments, specialists in the given field, theses and other digital resources, links to magazines where articles on the subject are published, and conferences on related topics.

11.55 – 12.15. I. Smirnov, O. Pugachev, A. Lobanov, A. Alimov, E. Voronina. Digital marine animals collections of the Zoological Institute RAS and their metadata (20 m. // Vol. 2, p. 122-127).

Abstract. One way to simplify and speed up the search for information on the Internet is to integrate and standardize biodiversity databases and their metadata. Information retrieval systems currently use several standards for data input, description, and presentation, among them Darwin Core, RDF, and Dublin Core Metadata Elements.

12.15 – 12.35. S. Chernov, E. Minack, P. Serdyukov. Converting Desktop into a Personal Activity Dataset (20 m. // Vol. 1, p. 280-283).

Abstract. The current experiments on personalization in information retrieval are limited to the available collections of the real world data. While a number of publications exploited user interaction with Desktop, often these experiments are neither repeatable nor comparable. In this paper we elaborate on the need for logging the Desktop activity data and creating a common collection for Desktop search evaluation. We describe the design of such a dataset and necessary logging tools. We also outline the current state of our Personal Activity Track initiative towards creation of the Desktop search dataset. While this effort is currently targeting English-speaking users, it is also applicable to Russian and other languages.

12.35 (Hall B ) LUNCH

13.35 (Hall A ) Integration problems.
Access Control and Exceptions

13.35 (Top floor ) Russian Information Retrieval Evaluation Seminar.

13.35 – 14.05. A. Berztiss, B. Thalheim. Exceptions in Information Systems (30 m. // Vol. 1, p. 284-295).

Abstract. The concept of exception has been defined in diverse ways. We relate exceptions to computational transactions and to control constructs. Our view of a transaction is very broad, and we consider transactional exceptions to be instances of undefined function values. By giving different interpretations to "undefined" we arrive at a classification of transactional exceptions. Our primary interest is in information systems, i.e., in database transactions, and in processes that consist of such transactions. In the database context we show that liberal treatment of exceptions is simpler than total quality management for consistency based on a set of constraints. We refer to control operations that link transactions into processes as actions. Actions tend to be time-related, and time Petri nets provide actions with semantics. The time Petri net representation indicates where exceptions can arise. We also consider high-level monitors for the detection of exceptions. Although our emphasis is on detection of exceptions, their handling is also discussed.

14.05 – 14.35. O. Zhizhimov, A. Fedotov. Models of management of access to the distributed information resources (30 m. // Vol. 1, p. 296-299).

Abstract. Based on an analysis of typical usage scenarios of information servers (WWW, FTP, Z39.50, etc.), we formulate the problems that must be solved when organizing a system for controlling access to distributed information resources. The capabilities of LDAP, as the technology most suitable for building such a system, are considered. Within this technology, three models of access management are discussed that differ in the degree to which the functions of information servers are integrated with LDAP.

14.35 – 14.55. E. Ivashko. The defensive system against unauthorized documents-copying of the digital libraries development (20 m. // Vol. 1, p. 300-306).

Abstract. In this article we consider the problems of applying a statistical anomaly detection algorithm to protect digital libraries from unauthorized large-scale document copying. We propose a modified intrusion detection technique based on Markov chains for creating classifiers and patterns of normal behavior.
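
As a rough illustration of the general idea of Markov-chain anomaly detection (not the author's specific technique, which the abstract does not detail), the sketch below learns first-order transition probabilities from sessions assumed to be normal and flags a session whose average log-likelihood falls below a threshold. The action names, the add-one smoothing, and the threshold value are all hypothetical.

```python
import math
from collections import defaultdict

def train(sessions, smoothing=1.0):
    """Estimate smoothed first-order transition probabilities from normal sessions."""
    states = sorted({s for session in sessions for s in session})
    counts = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for a, b in zip(session, session[1:]):
            counts[a][b] += 1
    model = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        model[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return model

def avg_log_likelihood(model, session, floor=1e-6):
    """Mean log-probability of the transitions in a session; unseen states get `floor`."""
    pairs = list(zip(session, session[1:]))
    if not pairs:
        return 0.0
    total = sum(math.log(model.get(a, {}).get(b, floor)) for a, b in pairs)
    return total / len(pairs)

def is_anomalous(model, session, threshold):
    """A session is flagged when it is less likely than the chosen threshold."""
    return avg_log_likelihood(model, session) < threshold

# Hypothetical action logs of ordinary readers.
normal_sessions = [
    ["search", "view", "search", "view", "download"],
    ["search", "view", "view", "search"],
    ["search", "view", "download", "search"],
]
model = train(normal_sessions)

typical = ["search", "view", "search", "view"]
bulk_copy = ["download"] * 6  # repeated downloads with no browsing in between

print(is_anomalous(model, typical, threshold=-1.0))
print(is_anomalous(model, bulk_copy, threshold=-1.0))
```

The threshold trades false alarms against missed copying attempts and would need to be calibrated on real access logs.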

14.55 (Hall C ) Closing of the Conference RCDL2007

15.10 ("Pereslavl" hotel ) Buses to Moscow
