Rev. Esp. Doc. Cient. 24(4), 2001
REFERENCE ANALYSIS BASED ON A VECTOR SPACE MODEL: CONTEMPORARY HISTORY RESEARCH IN JAÉN, 1990-1995
José Luis Ortega Priego
Library and Information Science Degree
Avda. Andalucía, 88 7-D. 23006 Jaén (SPAIN)
jose_ortega@gmx.net
The objective of this work is to represent spatially the relationships among researchers in the Contemporary History of Jaén during 1990-1995, based on their citing behaviour. A reference analysis based on the Vector Space Model (VSM), displayed graphically by means of Multidimensional Scaling (MDS), yields results on the research fronts, on who leads them and who makes them up, and on the "disciple/master" relationships among researchers.
Bibliometry; Citation analysis; Vector Space Model (VSM); Multidimensional Scaling (MDS); Mapping of Science; Contemporary History.
Citation analysis has developed considerably in a relatively short time within Bibliometry and Scientometrics. In the mid-1960s Solla Price (1965) began to study the relations between scientific works through their citations, and in the 1970s Small (1973) presented his first studies on the capacity of co-citation to reveal the content affinity of scientific works. However, it was White and Griffith (1981) who, in the 1980s, gave concrete form to research on Author Co-citation Analysis (ACA). The most interesting aspect of citation studies is their capacity for graphical representation: similarities between works or authors can be transformed into distances, and these can be displayed in a lower-dimensional space by means of models such as clustering, PCA or MDS, producing what have come to be called maps of science. Noyons (1999) defines the mapping of science as landscapes of fields of scientific research created by the quantitative analysis of bibliographic data. In this area the works of Garfield (1994; 2001a; 2001b) at the ISI stand out: through his Scientography he develops graphical representations, starting from an article or a journal, to identify cutting-edge research fronts or the emergence of new disciplines. Also noteworthy are the projects that Noyons and Van Raan (1996; 1998a; 1998b) (Noyons et al., 1999) are developing at the CWTS of Leiden University on the construction of maps from co-word and co-citation analysis. Finally, we cannot forget the works that Leydesdorff (1987; 1988; 1998; 2000) is developing in Amsterdam on Triple Helix models (university-industry-government) in the processes of scientific research.
In Spain, the only work worth highlighting is that of Moya and Jiménez (1998) on the identification of research fronts in Library and Information Science through author co-citation analysis, although there are works in Spanish that describe these techniques (Lopez-Martinez, 2000; Callon, 1995). Thus the citation, defined as "an intellectual transaction, an explicit acknowledgement of an 'intellectual debt' towards a previous information source" (Carrascal, 1997), has become a means of knowing and studying the relations or links that can exist between authors, titles, journals and research fields.
Nevertheless, the value of the citation as the best means of studying an author's productivity, a publication's impact factor or the map of a scientific community is the subject of various controversies. Whereas for Garfield (1998) a citation is valid to such an extent that citation studies become an independent science, Citationology, for others (Hauffe, 1994; Carrascal, 1997; Gilbert, 1977) a citation can be motivated by personal and arbitrary interests that weaken its objective value.
The appearance of the Internet and its hypertext character have multiplied the value of the citation, which has become a powerful means not only of understanding the organization and configuration of the network (Rousseau, 1997; Boudourides, 1999; Almind and Ingwersen, 1997), but also a further tool in information retrieval research (Figuerola, 1998) and in the evaluation of electronic resources on the Internet, such as Google and CiteSeer (Lawrence, 1999; Brin, 1998).
Finally, the use of these techniques in the humanities is nonexistent in Spain, although they have been applied in the English-speaking world (Graham, 2000; Finkenstaedt, 1990). In Spain we can only point to the quantitative bibliometric analyses of Rodríguez Alcalde et al. (1993) on Prehistory research in Spain, Rubio (1994) on the period of the Franco dictatorship and Ruiz Franco (1999) on the Spanish Civil War. As for the state of historical research in Jaén, we have found no previous literature.
The main aim of this work is the spatial representation of the relations among researchers in the Contemporary History of Jaén during 1990-1995, based on their citing behavior. We seek to project the groups of authors according to whom, and how often, they cite other authors in the same research front, thereby revealing the school or general tendency of certain groups when citing. In this way the clusters identified are grouped not only by thematic affinity but also by affinity in their behavior when citing one author or another.
The population under study consists of 22 authors who worked on Contemporary History in Jaén during 1990-1995. This field of study was chosen for the following reasons:
1º: Its more subjective citing behavior, the ideological aspects of citation (Ruiz Franco, 1999) and the value of the citation ad hominem in History allow us to represent the schools and the "disciple/master" relations that exist in the community.
2º: Its low level in the hierarchy of specialties, which means that citations are not driven by thematic affinity. Had we chosen a more general level (History of Jaén), the groups would be clearly identified by specialty rather than by author.
3º: Contemporary History accounts for a larger number of publications in Jaén than other specialties, so the specialty is not confined to a single setting; this favors different approaches to historical reality, and thus different schools.
4º: The authors confine themselves to a particular research subject, which additionally gives us a view of the research fronts (Moya, 1998).
The work was carried out with a table built in Microsoft Excel 2000 (1999), in which the authors' references were counted and the similarities calculated. The multidimensional scaling model was developed with ViSta: The Visual Statistics System, version 5.0.5 in Spanish, a program for statistical calculation and spatial models developed by Forrest W. Young (1998). Finally, Ulead PhotoImpact 3.02 Special Edition (1997) was used to adjust the image generated by the model.
The authors were selected in two phases:
1º: All publications on historical research, or containing at least one article on historical research even if that is not the publication's main subject, published in the province of Jaén during 1990-1995 were identified and selected.
2º: Once the relevant articles (Contemporary History in Jaén) had been extracted, the citations they contain were followed up in order to retrieve, as exhaustively as possible, the works scattered in publications from outside the province or from other subject areas. These publications are mainly conference proceedings.
Table I. List of journals.
The journals containing the largest number of works are the Bulletin of the Institute of Giennenses Studies and the Bulletin of the Chamber of Commerce, which account for 34% and 28% of the articles respectively.
Once the publications had been selected, the articles dealing with the Contemporary History of Jaén and its province were extracted. From these, about 46 in all, a total of 218 citations were extracted, and from them the authors, 22 in total.
With the author list, we recorded the number of references each author makes to the authors in the list, extracting each author's references one by one. Once recorded in a small template, the references were transferred to a table in which each row corresponds to a citing author and each column to one of the authors in the list, each cell indicating whether that author is referenced and, if so, how many times.
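The counting step can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used (the counts were tallied in an Excel table); it assumes the extracted references are available as (citing author, cited author) pairs, and the function name `reference_matrix` and the sample authors "A", "B", "C" and "X" are hypothetical. Self-citations land on the diagonal.

```python
from collections import Counter


def reference_matrix(authors, references):
    """Build an author-by-author matrix of reference counts.

    authors: ordered list of names; rows and columns use this order.
    references: iterable of (citing, cited) pairs, one per reference.
    Cell [i][j] counts how often author i cites author j; pairs that
    involve authors outside the list are ignored.
    """
    known = set(authors)
    counts = Counter(
        (citing, cited)
        for citing, cited in references
        if citing in known and cited in known
    )
    # Counter returns 0 for pairs that never occur.
    return [[counts[(a, b)] for b in authors] for a in authors]


# Hypothetical example: "X" is not in the author list, so it is dropped.
authors = ["A", "B", "C"]
refs = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A"), ("C", "C"), ("C", "X")]
matrix = reference_matrix(authors, refs)
print(matrix)  # [[0, 2, 1], [1, 0, 0], [0, 0, 1]]
```

Each row of the result is one author's citing profile, which is the vector described in the next paragraph.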
Table II. Number of references by author.
As can be seen, the table gives us the set of characteristics that each author presents, through his references, with respect to the other authors. A profile of each author is thus built, shaped by his behavior when citing the other researchers. This allows each author's row to be turned into a vector, where A represents the author, the subscript i the author's position in the list, the superscript n the number of dimensions and R the references to each author, so that it is represented as:
$$A_i^n = (R_{i1}, R_{i2}, \ldots, R_{in})$$
Before the analysis, all the references are brought to a common scale, since an author who has published more, or whose works contain more references, cannot be treated in the same way as an author who has published a single work with very few references. Each reference count is therefore divided by the total number of references made by the author, giving the weight of each reference within that author's total. The formula is:
$$F_{ij} = \frac{R_{ij}}{\sum_{k=1}^{n} R_{ik}}$$
The resulting table is as follows.
Table III. Author reference frequencies.
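The normalization step can be sketched as follows, assuming the count matrix is a list of rows in the same author order as the columns; `row_frequencies` is a hypothetical helper. Rows whose author makes no references are left at zero rather than divided by zero, which is an assumption the text does not discuss.

```python
def row_frequencies(matrix):
    """Divide every row by its total so each author's references sum to 1."""
    result = []
    for row in matrix:
        total = sum(row)
        # Assumption: an author with no references keeps an all-zero row.
        result.append([value / total if total else 0.0 for value in row])
    return result


print(row_frequencies([[0, 2, 2], [1, 0, 0], [0, 0, 0]]))
# [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```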
Once the n-dimensional vectors have been obtained (22 dimensions in this case), a similarity matrix is built to measure the degree of similarity between each pair of vectors. We use the cosine formula (Salton, 1981), since it produces less dispersion and more compact results in a multidimensional scaling (MDS) model (Rorvig, 1999), whereas the Dice (Griffiths et al., 1986) and Jaccard (Van Rijsbergen, 1989) formulas, which give very similar results to each other, show large discrepancies at the extreme similarity values. The cosine formula is therefore:
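$$S(A_i, A_j) = \frac{A_i \cdot A_j}{\lVert A_i \rVert \, \lVert A_j \rVert} = \frac{\sum_{k=1}^{n} R_{ik} R_{jk}}{\sqrt{\sum_{k=1}^{n} R_{ik}^{2}} \, \sqrt{\sum_{k=1}^{n} R_{jk}^{2}}}$$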
In this way the similarity matrix is obtained. The similarity values between authors lie on a scale from 0 to 1, in which values close to 0 express increasing dissimilarity between authors, whereas values close to 1 indicate a greater degree of similarity, up to total similarity, as in the similarity of an author with himself, $S(A_i, A_i) = 1$. It is also worth noting that, since the cosine is based on a scalar product, that is, a multiplication of two vectors, the order of the factors does not alter the result, so $S(A_i, A_j) = S(A_j, A_i)$.
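The similarity computation can be sketched in plain Python as follows. The helper names `cosine` and `similarity_matrix` are hypothetical, and returning 0 for an author with no references is an assumption. Because the cosine ignores the length of the vectors, it gives the same result on the raw counts of Table II and on the frequencies of Table III.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two reference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an empty profile is treated as dissimilar to all
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Full matrix S with S[i][j] = cosine(vectors[i], vectors[j])."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


s = similarity_matrix([[0, 2, 1], [1, 0, 0], [0, 0, 1]])
print(round(s[0][2], 4))  # 0.4472, i.e. 1/sqrt(5)
```

The diagonal is 1 (up to rounding) and the matrix is symmetric, matching the two properties stated above.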
Once the similarities have been calculated, we represent the distances between the vectors in this n-dimensional space. To do so, the similarities are transformed into distances by subtracting them from 1, $D(A_i, A_j) = 1 - S(A_i, A_j)$. This gives a measure of how far apart each author is from the others.
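Converting similarities into distances is a one-line transformation. The sketch below (hypothetical helper `distance_matrix`) also forces the diagonal to exactly 0 and clips tiny negative values caused by floating-point rounding; both are implementation assumptions, not steps described in the text.

```python
def distance_matrix(similarities):
    """Convert a cosine similarity matrix into distances D = 1 - S."""
    n = len(similarities)
    return [
        [
            # Diagonal forced to 0; max() clips tiny negatives from rounding.
            0.0 if i == j else max(0.0, 1.0 - similarities[i][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


print(distance_matrix([[1.0, 0.25], [0.25, 1.0]]))  # [[0.0, 0.75], [0.75, 0.0]]
```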
Once the similarities and their distances have been calculated, we proceed to the graphical representation of the data. As stated, our objective was to represent the relations among the authors and the research fronts that emerge in these publications. Many multivariate techniques allow objects to be grouped and displayed in space, such as clustering, Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) (Moya et al., 1998).
In this case we chose Classical or Metric Multidimensional Scaling (Conchillo and Ruiz, 1993), not only because it fits very effectively with Vector Space Models (Rorvig, 1999), but also because it allows maps of an abstract reality to be built.
MDS is a set of data analysis techniques whose purpose is to display the distances between data through a geometric representation (Young, 1985). It originates in Torgerson's work on metric MDS (1952) and Kruskal's on non-metric MDS (1964). By means of this algorithm an n-dimensional vector space is reduced to one of 2 or 3 dimensions, which allows these vectors to be represented graphically and their positions in space to be seen.
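The classical (Torgerson) procedure can be sketched with the standard library only: the squared distances are double-centred, and the leading eigenvectors of the resulting matrix, scaled by the square roots of their eigenvalues, give the map coordinates. This is an illustrative sketch, not ViSta's implementation; finding eigenvectors by power iteration with deflation is a choice made here to avoid external dependencies, and a linear-algebra library would normally be used instead. The result is unique only up to rotation and reflection, and dimensions with non-positive eigenvalues (possible with 1 - cosine distances) are left at zero.

```python
import math
import random


def classical_mds(distances, dims=2, iterations=500, seed=0):
    """Classical (Torgerson) MDS of a symmetric distance matrix.

    Returns one list of `dims` coordinates per object. Eigenvectors are
    found by power iteration with deflation (an illustrative choice).
    """
    n = len(distances)
    sq = [[d * d for d in row] for row in distances]
    means = [sum(row) / n for row in sq]  # row means equal column means (symmetry)
    grand = sum(means) / n
    # Double centring: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - means[i] - means[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Shift by a Gershgorin bound so the largest (not the largest-magnitude)
        # eigenvalue of B dominates the power iteration.
        shift = max(sum(abs(x) for x in row) for row in b)
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i]
                 for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if eigenvalue <= 1e-12:
            break  # no further dimension with positive variance
        scale = math.sqrt(eigenvalue)
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflation: remove the extracted component before the next dimension.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


# Check on points with known planar positions: distances should be reproduced.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (3.0, 1.0), (1.0, 2.0)]
d = [[math.dist(p, q) for q in points] for p in points]
x = classical_mds(d)
print(round(math.dist(x[0], x[3]), 6))  # 3.162278, i.e. sqrt(10)
```

Applied to the 22 x 22 author distance matrix, `classical_mds(d, dims=2)` would return the x, y positions from which a map like Graph 1 can be drawn.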

Graph 1. Map of author references.
Table IV. Identified groups and research fronts.
The multidimensional scaling (MDS) model yields a stress value of 0.409, higher than that obtained by Moya (1998), although, as Young (1998) notes, "the Stress value is affected by the number of stimuli and dimensions, being practically impossible to say if a particular value is good or bad".
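For reference, Kruskal's stress-1 compares the distances in the map with the input dissimilarities. The sketch below assumes the metric case, in which the disparities are the input distances themselves; ViSta may compute a different stress variant, so this hypothetical `kruskal_stress` helper is not guaranteed to reproduce the 0.409 reported above.

```python
import math


def kruskal_stress(distances, coords):
    """Kruskal's stress-1 between input distances and map distances.

    Assumption: metric case, so the disparities are the input distances
    themselves (non-metric MDS would first fit a monotone regression).
    """
    n = len(distances)
    residual = total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            fitted = math.dist(coords[i], coords[j])
            residual += (distances[i][j] - fitted) ** 2
            total += fitted ** 2
    return math.sqrt(residual / total) if total else 0.0


print(kruskal_stress([[0, 1], [1, 0]], [(0.0, 0.0), (2.0, 0.0)]))  # 0.5
```

A value of 0 means the map reproduces the input distances exactly; larger values mean more distortion.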
As noted above, the authors are compared on the basis of their citing behavior, so the clusters that appear are groups of authors who cite particular authors with high frequency. As Table IV shows, each group has been named after the author or authors who concentrate most of the group's citations. Thus cluster 1 is characterized by the fact that all its authors cite Julio Artillo, and with high frequency. This is because Artillo is a pioneer of research on the second half of the 19th century, which is also the main subject of the authors who cite him.
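Naming each group after the author who concentrates its citations can be expressed as a simple aggregation. In this hypothetical sketch the group members are given as input (in the study the groups were read off the map), the reference rows of the members are summed, and the most-referenced author is returned; the helper name and sample data are illustrative, and ties go to the first author in list order.

```python
def dominant_cited_author(authors, matrix, members):
    """Return (name, total) of the author most referenced by a group.

    authors: names in matrix order; matrix[i][j] = references from i to j.
    members: names of the authors assigned to the group.
    """
    position = {name: k for k, name in enumerate(authors)}
    totals = [0] * len(authors)
    for name in members:
        for j, value in enumerate(matrix[position[name]]):
            totals[j] += value
    best = max(range(len(authors)), key=lambda j: totals[j])  # first wins ties
    return authors[best], totals[best]


authors = ["A", "B", "C"]
counts = [[0, 2, 1], [1, 0, 0], [0, 0, 1]]
print(dominant_cited_author(authors, counts, ["A", "B"]))  # ('B', 2)
```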
Cluster 2, on the other hand, is formed by authors who mainly cite Eduardo Araque, in this case because of thematic affinity in geographical and forestry research, which is reflected in the markedly peripheral position of the group.
Cluster 3, more diffuse, is formed by authors most closely linked to Cobo Romero, on subjects relating to the first half of the 20th century (Spanish Civil War, Education). Within this group a sub-group formed by only two authors, A. Tarifa and M. Ortiz, deserves mention: it represents the most prominent front in demographic studies, a line previously initiated by Cobo Romero.
Finally, the central cluster 4 directs its citations most clearly towards Gay Armenteros, whose subjects (the Spanish Civil War and the second half of the 19th century) its authors also treat. It is significant that two of the authors who give their names to other groups (Artillo and Cobo Romero) belong to this last group. This is because these authors also draw on the works of Gay Armenteros, the dean of Contemporary History research in Jaén, and act as bridges between their own research fronts and Gay Armenteros.
Apart from the definition of the research groups, other results can be drawn. As we move away from the center, the groups become more defined and clear-cut, so the main groups are found on the periphery. Conversely, as we approach the center the groups become more diffuse and show no defined tendency in their references, which are instead spread over all the groups or authors around them. Thus the central group can be said to lie halfway between groups 1, 2 and 3.
We also observe authors who are not clearly assigned to any group but act as links between groups, as in the case of Lopez Cordero, the most productive author of all, who is located between groups 3, 4 and 1. The same happens with the authors Coronas and Higueras. Finally, the authors who give their names to the groups (Artillo, Araque, Cobo Romero and Gay) have a long research record in this field, going back 15 or 20 years. This confirms the "disciple/master" relationship within these groups, since these authors are the points of reference in the processes of historical research in Jaén. The fact that some of these authors lie within their own groups is due to a high self-citation index (Araque, 0.8333).
As we have seen, reference analysis offers a new perspective from which to study scientific communities. Whereas citation analysis lets us observe a community from "above", that is, through its most recognized, most veteran and most established authors, reference analysis offers a low-angle view: it shows us the beginning authors, less established and less known, who are nonetheless undoubtedly an important part of the research process and who will be responsible for consolidating and improving research fields or for creating new fronts.
The "disciple/master" relations identified in this work only confirm this dynamic aspect of science, in which generations are renewed in the research process, and with them the subjects and the methodologies. For that reason, the dynamic application of these techniques in models or maps could give us very valuable information on these generational relations.
Clearly this work can only be considered an example, a model still in an embryonic state, but its application to other areas that are not strictly bibliometric could provide significant data on other realities. Its use in information retrieval, where other bibliometric techniques have already been applied, would allow us to offer qualitative information retrieval systems of high informative value for scientific information. In addition, its adaptation to indicators for evaluating scientific and technical activity would be an important contribution to decision making in science policy.
Finally, the invaluable aid that multivariate models now provide to Bibliometry is unquestionable. In particular, the Vector Space Model (VSM) and its graphical representation through Multidimensional Scaling (MDS) have proved very useful for analyzing populations with a large number of variables. The amount and quality of information these models provide should be taken into account, not only in Bibliometry but in any other area of Information Science.
We thank the Institute of Giennenses Studies for its support in locating and identifying the sources of this work, especially Miguel Valero for his support and Jose Juan Valenzuela for his help.
1. ALMIND, T. C., INGWERSEN, P. (1997). Informetric analyses on the world wide web: Methodological Approaches to 'Webometrics'. Journal of Documentation, 53(4): 404-426.
2. BOUDOURIDES, M. A., SIGRIST, B., ALEVIZOS, P. D. (1999). Webometrics and the Self-Organization of the European Information Society. Draft Report, Task 2.1 of the SOEIS project, Rome meeting, June 17-19.
3. BRIN, S., PAGE, L. (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the 7th International World Wide Web Conference, April 1998. [on line]. [cited 01-07-07] http://google.standford.edu/long321.htm
4. CALLON, M., COURTIAL, J. P. and PENAN, H. (1995). Cienciometría: la medición de la actividad científica: de la bibliometría a la vigilancia tecnológica. Gijón: Trea
5. CARRASCAL, L. M. (1997). La referencia bibliográfica como medida de 'utilidad científica'. EtoloGuía, 15: 17-30.
6. CONCHILLO JIMENEZ, A., RUIZ GALLEGO-LARGO, T. (1993). Escalamiento Multidimensional: Una metodología de análisis en el campo de los factores humanos. Boletín Digital FH, 2 [on line]. [cited 01-05-24] http://www.tid.es/presencia/boletin/boletin2/art003.htm
7. FIGUEROLA, C. G., ALONSO, J. L., ZAZO, A. F. (1998). Nuevos puntos de vista en la Recuperación de Información en el Web. VI Jornadas Españolas de Documentación FESABID 98 [on line]. [cited 01-05-24] http://www.florida-uni.es/~fesabid98/Comunicaciones/c_g_figuerola/c_g_figuerola.htm
8. FINKENSTAEDT, T. (1990). Measuring Research Performance in the Humanities. Scientometrics, 19(5-6): 409-417.
9. GARFIELD, E. (1994). Research fronts. Current Contents, 41: 3-7.
10. GARFIELD, E. (1998). Random Thoughts on Citationology, Its Theory and Practice. Scientometrics, 43(1): 69-76.
11. GARFIELD, E. (2001a). Mapping the Precursors of Modern Structural Biology. Institute for Scientific Information [on line]. [cited 01-05-25] http://www.isinet.com/isi/hot/essays/13.html
12. GARFIELD, E. (2001b). Scientography: Mapping the Tracks of Science. Institute for Scientific Information [on line]. [cited 01-06-14] http://www.isinet.com/isi/hot/essays/citationanalysis/12.html
13. GILBERT, G. N. (1977). Referencing as Persuasion. Social Studies of Science, 7: 113-122.
14. GRAHAM, S. R. (2000). Historians and Electronic Resources: A Citation Analysis. JAHC [on line] 3(3) [cited 01-07-15] http://www.mcel.pacificu.edu/JAHC/JAHCIII3/WORKS/Graham.html
15. GRIFFITHS, A., LUCKHURST, H., WILLETT, P. (1986). Using interdocument similarity information in document retrieval systems. Journal of the American Society for Information Science, 37(1): 3-11.
16. HAUFFE, H. (1994). Is Citation Analysis a Tool for Evaluation of Scientific Contribution? In: 13th Winterworkshop on Biochemical and Clinical Aspects of Pteridines. 1994, Feb. 25, St. Christoph/Arlberg.
17. KRUSKAL, J. B. (1964). Nonmetric multidimensional scaling. Psychometrika, 29: 1-27, 115-129.
18. LAWRENCE, S., GILES, C. L., BOLLACKER, K. (1999). Digital Libraries and Autonomous Citation Indexing. IEEE Computer, 32(6): 67-71.
19. LEYDESDORFF, L. (1987). Various Methods for the Mapping of Science. Scientometrics, 11: 295-324.
20. LEYDESDORFF, L., CURRAN, M. (2000). Mapping University-Industry-Government Relations on the Internet: the Construction of Indicators for a Knowledge-based Economy. Cybermetrics [on line] v. 4, issue 1, paper 2 [cited 01-04-27] http://www.cindoc.csic.es/cybermetrics/articles/v4i1p2.html
21. LEYDESDORFF, L., ETZKOWITZ, H. (1998). The Triple Helix as a model for innovation studies. Science and Public Policy, 25(3): 195-203.
22. LEYDESDORFF, L., ZAAL, R. (1988). Co-Words and Citations: Relations between Document Sets and Environments. In: EGGHE, L. and ROUSSEAU, R. (eds.). Informetrics 87/88. Amsterdam: Elsevier, 105-119.
23. LOPEZ-MARTINEZ, R. E. (2000). Mapas tecnológicos como indicadores de la estructura cognoscitiva de la investigación. In: ALMADA DE ASCENCIO, M., et al. Contribución al desarrollo de la sociedad del conocimiento. México: Universidad Nacional Autónoma de México, 150-160.
24. Microsoft Excel 2000 [cd-rom]. Ver. [United States]: Microsoft Corporation, c 1999. Software
25. MOYA, F. De, JIMENEZ, E., MONEDA, M. De la. (1998). Research Fronts in Library and Information Science in Spain (1985-1994). Scientometrics, 42(2): 229-246
26. NOYONS, E. C. M., VAN RAAN, A. F. J. (1996). Bibliometric Mapping of Agricultural Research. CWTS Working Papers [on line]. [cited 01-03-09] http://sahara.fsw.leidenuniv.nl/ed/nrlo/nrlo00.html
27. NOYONS, E. C. M., VAN RAAN, A. F. J. (1998a). Mapping Scientometrics, Informetrics, and Bibliometrics. CWTS Working Papers [on line]. [cited 01-03-09] http://sahara.fsw.leidenuniv.nl/ed/sib/home.html
28. NOYONS, E. C. M., VAN RAAN, A. F. J. (1998b). Monitoring Scientific Developments from a Dynamic Perspective: Self-Organized Structuring to Map Neural Network Research. Journal of the American Society for Information Science, 49: 69-81.
29. NOYONS, E. C. M., MOED, H. F. and LUWEL, M. (1999). Combining Mapping and Citation Analysis for Evaluative Bibliometric Purposes: A Bibliometric Study. Journal of the American Society for Information Science, 50(2): 115-132.
30. NOYONS, E. C. M. (1999). Bibliometrics Mapping as a Science Policy and Research Management Tool. Leiden: DSWO, Leiden University
31. PRICE, D. J. de S. (1965). Networks of scientific papers. Science, 149: 510-515.
32. RODRIGUEZ ALCALDE, A. et al. (1993). Análisis bibliométrico de Trabajos de Prehistoria: un chequeo a la prehistoria española de las tres últimas décadas. Trabajos de Prehistoria, 50: 10-37
33. RORVIG, M. (1999). Images of Similarity: A visual exploration of Optimal Similarity Metrics and Scaling Properties of TREC Topic-Document Sets. Journal of the American Society for Information Science, 50(8): 639-651.
34. ROUSSEAU, R. (1997). Sitations: an exploratory study. Cybermetrics [on line] v.1, issue 1, p. 1 [cited 01-04-18] http://www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html
35. RUBIO LINIERS, M. C., RUIZ FRANCO, M. R. (1994). La investigación histórica sobre el franquismo: un análisis bibliométrico de las revistas españolas (1976-1992). Revista Española de Documentación Científica, 17(4): 413-426
36. RUIZ FRANCO, M. R., RIESCO, S. (1999). Veinte años de producción histórica sobre la Guerra Civil Española (1975-1995): una aproximación bibliométrica. Revista Española de Documentación Científica, 22(2): 174-197
37. SALTON, G., MCGILL, M. (1981). Introduction to Modern Information Retrieval. New York: McGraw-Hill.
38. SMALL, H. (1973). Co-citation in the scientific literature: a new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4): 265-269.
39. TORGERSON, W. S. (1952). Multidimensional scaling: Theory and method. Psychometrika, 17: 401-419
40. Ulead PhotoImpact: Special Edition [cd-rom]. Ver. 3.02. [United States]: Ulead Systems, Inc., c1992-1997. Software
41. VAN RIJSBERGEN, C. (1989). Towards an Information Logic. Glasgow: University of Glasgow, Dept. of Computing Science. (Research Report CSC/89/R8)
42. WHITE, H. D., GRIFFITH, B. C. (1981). Author cocitation: a literature measure of intellectual structure. Journal of the American Society for Information Science, 32(3): 163-171.
43. YOUNG, F. W. (1985). Multidimensional Scaling. In: KOTZ, S., JOHNSON, N. L. and READ, C. B. (eds.). Encyclopedia of Statistical Sciences. New York: John Wiley & Sons, vol. V
44. YOUNG, F. W. (1998). ViSta: The Visual Statistics System. [on line] Ver. 5.0.5EW [North Carolina]: UNC, c1991-1998. 3.87 Mb. Software http://forrest.psych.unc.edu/research