AI News and Guides

Explore the best AI News and Guides — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Texture artist

    A texture artist is an individual who develops textures for digital media, usually for video games, movies, web sites and television shows or things like 3D posters. These textures can be in the form of 2D or (rarely) 3D art that may be overlaid onto a polygon mesh to create a realistic 3D model. Texture artists often take advantage of web sites for the purposes of marketing their art and self-promotion of their skills with the goal of gaining employment from a professional game studio or to join a team working on a "mod" (modification) of an existing game in hopes of establishing industry or trade credentials.

    Read more →
  • BRS/Search

    BRS/Search is a full-text database and information retrieval system. BRS/Search uses a fully inverted indexing system to store, locate, and retrieve unstructured data. It was the search engine that in 1977 powered Bibliographic Retrieval Services (BRS) commercial operations with 20 databases (including the first national commercial availability of MEDLINE); it has changed ownership several times during its development and is currently sold as Livelink ECM Discovery Server by Open Text Corporation. == Early development == Development on what was to become BRS began as Biomedical Communications Network (BCN) at the State University of New York at Albany (SUNY). BCN, which went online in 1968, provided on-line access to nine databases, including MEDLINE and BIOSIS Previews, to large universities and medical schools primarily in the Northeast of the USA. State funding for the project was withdrawn in 1975, and Bibliographic Retrieval Services (BRS) was formed as a non-profit concern the following year. It was incorporated in May 1976 as a for-profit corporation with Ron Quake as president, Jan Egeland as vice president in charge of marketing and training, and Lloyd Palmer as vice president of systems. == BRS commercial operations == In December 1976, the First BRS User Meeting was held in Syracuse, New York, and by January 1977 BRS started commercial operations with 20 databases (including the first national commercial availability of MEDLINE) and 9 million records, using modified IBM STAIRS (STorage And Information Retrieval System) software, Telenet for telecommunications, and timesharing mainframe computers of Carrier Corporation. In October 1980 BRS was sold by Egeland and Quake to Indian Head, Inc., a subsidiary of the Dutch company Thyssen-Bornemisza Group. 
== 1989–1993 == In 1989 Robert Maxwell acquired BRS and the BRS/Search software; he announced the planned incorporation of the ORBIT Search Service and BRS Information Technologies and renamed the whole group Maxwell Online, Inc. At that time BRS Information Technologies was serving the medical and academic library marketplace with over 150 databases. Maxwell later bought the publishing company Macmillan and put Maxwell Online under Macmillan. In the same year BRS/LINK (hypertext connection of databases; first application delivering full text) was announced. The initial BRS/LINK application "relates the citation in a bibliographic database to its full-text article in a second database," and "eliminates the need to re-execute a search strategy in the second database in order to find the corresponding full-text article." Initially BRS/LINK supported linking only selected bibliographic databases: MEDLINE, Health Planning and Administration, and MEDLINE References on AIDS to the full-text Comprehensive Core Medical Library. At the time of Robert Maxwell’s death in 1991, Macmillan brought in Andrew Gregory to represent the company during the 2 years that Maxwell’s affairs were being settled and to prepare Maxwell Online to be able to sell the components. Maxwell Online shortly thereafter underwent yet another name change, this time to InfoPro Technologies. == Dataware Technologies ownership of BRS/SEARCH == Early in 1994, InfoPro Technologies, a subsidiary of MHC Inc. (holding company for Macmillan Inc.), the former Maxwell Online service, sold off all its subsidiaries. ORBIT Search Services went to the French-owned Questel, the dial-up BRS Search Services to CD Plus Technologies (later to become OVID), and BRS Software Products (including BRS/SEARCH) to Dataware Technologies. Almost up to the end of InfoPro Technologies, BRS Software had been the fastest growing segment of the company. 
At the 14th BRS North American Users Group Conference in 1999, Dave Schubmehl of Dataware Technologies presented a paper in which he stated: "The purpose of this presentation is to update BRS users on upcoming releases of BRS/Search, NetAnswer, and other Dataware products. BRS/Search 7.0 will include features specifically requested by customers, as well as other enhancements. Earlier this year, Dataware acquired Sovereign Hill Software, makers of InQuery. In light of that acquisition, and Dataware's other development projects, we'll look at Dataware's plans for all products, including BRS/Search and NetAnswer." == Open Text acquisition of BRS/Search == In 2001 BRS/Search was acquired by Open Text and became LiveLink ECM Discovery Server. It is now referred to as Open Text Discovery Server. Open Text still supports both BRS/Search and NetAnswer. The core BRS/Search technology in the Open Text portfolio was augmented with other capabilities through various acquisitions. For example, Dataware's acquisition of Sovereign Hill brought InQuery, “a probabilistic information retrieval system using an inference network” developed by the University of Massachusetts Amherst Center for Intelligent Information Retrieval (CIIR), out of the UMass CIIR and into the marketplace. A product re-branding table shows the range of products, their old names and their new names. InQuery is a concept search engine that uses noun phrases, parts of speech and other co-occurrence relationships in overlapping passages of text rather than single-term inverted indexes of individual words in documents. Open Text's portfolio has grown to include Hummingbird Content Management, and has always included BASIS. == 2003 == A BRS/Search North America User's Group (BRSNAUG) website dated June 8, 2003 listed the following features for BRS/Search. The BRSNAUG also disincorporated in 2003. Cross-references to BRS/Search on the World Wide Web point to Open Text Livelink. 
Engine features include:
      - Rapid query response time
      - Numerical data handling and elementary statistical processing (sum, avg, min, max)
      - Search results weighting and relevancy ranking
      - Left- and right-truncation and expansion of search terms
      - Superior data compression: loaded databases typically use only about 1.5 times the input stream size in disk space
      - Large capacity databases: up to 100 million documents, each with up to 65,000 paragraphs
      - Fine control of indexing and searching, right down to the word, sentence, and paragraph level
      - Fine control over data security: document access can be controlled at the database, document, and paragraph level
      - International language support for all 7/8-bit character sets and customizable language tables
      - Flexible and customizable stop word lists
      - ANSI-compatible thesauri
      - Hypertext links within and between documents and databases (R6.x)
      - Support for natural language parsing of queries
      - Automatic document summarization tools
      - Client/Server development
      - Programming interfaces for World-Wide Web (HTTP, HTML) access to databases

    Read more →
  • Semantic translation

    Semantic translation is the process of using semantic information to aid in the translation of data in one representation or data model to another representation or data model. Semantic translation takes advantage of semantics that associate meaning with individual data elements in one dictionary to create an equivalent meaning in a second system. An example of semantic translation is the conversion of XML data from one data model to a second data model using a formal ontology for each system, written in a language such as the Web Ontology Language (OWL). This is frequently required by intelligent agents that wish to perform searches on remote computer systems that use different data models to store their data elements. The process of allowing a single user to search multiple systems with a single search request is also known as federated search. Semantic translation should be differentiated from data mapping tools that do simple one-to-one translation of data from one system to another without actually associating meaning with each data element. Semantic translation requires that data elements in the source and destination systems have "semantic mappings" to a central registry or registries of data elements. The simplest mapping is where there is equivalence. There are three types of semantic equivalence:
      - Class equivalence: indicates that classes or "concepts" are equivalent. For example, "Person" is the same as "Individual".
      - Property equivalence: indicates that two properties are equivalent. For example, "PersonGivenName" is the same as "FirstName".
      - Instance equivalence: indicates that two individual instances of objects are equivalent. For example, "Dan Smith" is the same person as "Daniel Smith".
    Semantic translation is very difficult if the terms in a particular data model do not have direct one-to-one mappings to data elements in a foreign data model. 
In that situation, an alternative approach must be used to find mappings from the original data to the foreign data elements. This problem can be alleviated by centralized metadata registries that use the ISO/IEC 11179 standard, such as the National Information Exchange Model (NIEM).
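
The registry-based mapping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a real OWL or ISO/IEC 11179 implementation: the registry concept names (`given_name`, `family_name`), the property names `PersonSurName` and `LastName`, and the function `translate` are hypothetical; only `PersonGivenName` and `FirstName` come from the text.

```python
# Each system maps its own property names to concepts in a shared registry.
# The concept identifiers below are hypothetical placeholders.
SOURCE_TO_REGISTRY = {
    "PersonGivenName": "given_name",
    "PersonSurName": "family_name",   # hypothetical property
}
TARGET_TO_REGISTRY = {
    "FirstName": "given_name",
    "LastName": "family_name",        # hypothetical property
}


def translate(record, source_map, target_map):
    """Translate a record by property equivalence via the shared registry.

    Returns the translated record plus the list of source properties
    that have no equivalent in the target model.
    """
    # Invert the target mapping: registry concept -> target property name.
    registry_to_target = {concept: name for name, concept in target_map.items()}
    translated, unmapped = {}, []
    for name, value in record.items():
        concept = source_map.get(name)             # None if not registered
        target_name = registry_to_target.get(concept)
        if target_name is None:
            unmapped.append(name)                  # no one-to-one mapping exists
        else:
            translated[target_name] = value
    return translated, unmapped
```

Properties that end up in the `unmapped` list correspond to the difficult case in the text, where no direct one-to-one mapping exists and some alternative approach is needed.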

    Read more →
  • International Philosophical Bibliography

    The International Philosophical Bibliography (IPB), also known in French as Répertoire bibliographique de la philosophie (RBP), is a bibliographic database covering publications on the history of philosophy and continental philosophy. The database comprises records of publications in over 30 languages. Annually, about 12,000 records are added. The indexes include, among other elements, over 84,000 names of authors, editors, translators, reviewers, and collaborators, as well as more than 3,000 commentaries on philosophical works, making it the world's most complete index in Philosophy. Since 1934, the IPB has been developed by the Higher Institute of Philosophy at the University of Louvain (UCLouvain), first in Leuven and since 1978 in Louvain-la-Neuve. The online version was launched by Peeters Publishers in 1997 and continues to be updated quarterly.

    Read more →
  • Ampere Computing

    Ampere Computing LLC is an American fabless semiconductor company that designs ARM-based central processing units (CPUs) with high core counts for use in cloud computing and data center environments. Founded in 2017 by former Intel president Renée James, the company is headquartered in Santa Clara, California, and has operated as an independent subsidiary of SoftBank Group since November 2025. == History == Ampere Computing was founded in fall 2017 by Renée James, former president of Intel, with funding from The Carlyle Group. James acquired a team from MACOM Technology Solutions (formerly AppliedMicro) in addition to several industry hires to start the company. Ampere Computing is an ARM architecture licensee and develops its own server microprocessors. Ampere fabricates its products at TSMC. In April 2019, Ampere announced its second major investment round, including investment from Arm Holdings and Oracle Corporation. In June 2019, Nvidia announced a partnership with Ampere to bring support for Compute Unified Device Architecture (CUDA). In November 2019, Nvidia announced a reference design platform for graphics processing unit (GPU)-accelerated ARM-based servers including Ampere. In the first half of 2020, Ampere announced Ampere Altra, an 80-core processor, and Ampere Altra Max, a 128-core processor, neither of which uses simultaneous multithreading. In March 2020, the company announced a partnership with Oracle. In September 2020, Oracle said it would launch bare-metal and virtual machine instances in early 2021 based on Ampere Altra. In November 2020, Ampere was named one of the top 10 hottest semiconductor startups by CRN. In May 2021, the company announced a partnership with Microsoft. In April 2022, Ampere said that it had filed a confidential prospectus with the U.S. Securities and Exchange Commission, signaling its intent to go public. In June 2022, HPE announced that its Gen11 ProLiant system would use Ampere Altra and Ampere Altra Max Cloud Native Processors. 
In July 2022, Google announced T2A instances using Ampere Altra in Google Cloud, and in August 2022 Microsoft announced Azure instances running on Ampere processors. On March 19, 2025, investment holding company SoftBank Group announced it would acquire Ampere Computing for $6.5 billion. The deal was finalized in November 2025, with Ampere remaining an independent subsidiary with its headquarters in Santa Clara, California. == Products == Ampere develops ARM-based computer processors and CPU cores under its Altra brands. These are used in databases, media encoding, web services, network acceleration, mobile gaming, AI inference processing, and other applications and programs that need to scale. On February 5, 2018, Ampere announced the eMAG 8180, featuring 32 Skylark cores fabricated on TSMC's 16FF+ process. It supports a turbo frequency of up to 3.3 GHz with a TDP of 125 W, eight channels of 64-bit DDR4, up to 1 TB of DDR4 per socket, and 42 PCIe 3.0 lanes. The Skylark cores were based on AppliedMicro's X-Gene 3. Packet offers servers with the eMAG 8180 and 128 GB DRAM, 480 GB SSD, and 2x 10 Gbit/s networking. On September 19, 2018, Ampere announced the availability of a version featuring 16 Skylark cores. === 2020 === On March 3, 2020, Ampere announced the Ampere Altra, featuring 80 cores fabricated on TSMC's N7 process, for hyperscale computing. It was the first server-grade processor to include 80 cores, and the Q80-30 model conserves power by running at 161 W in use. The cores are semi-custom Arm Neoverse N1 cores with Ampere modifications. It supports a frequency of up to 3.3 GHz with a TDP of 250 W, eight channels of 72-bit DDR4, up to 4 TB of DDR4-3200 per socket, 128 PCIe 4.0 lanes, 1 MB of L2 cache per core and a 32 MB SLC. Ampere also announced its roadmap, with Ampere Altra Max (2021) in development and AmpereOne (2022) defined. === 2021 === The 128-core Altra Max was released in 2021 and targeted hyperscale cloud providers. 
It uses the same server socket and platforms as Ampere Altra, and both products have one thread per core. The Altra Max CPUs provide 128 Arm v8.2+ cores per chip and run at up to 3.0 GHz. They also support eight channels of DDR4-3200 memory and 128 lanes of PCIe Gen4. Also in 2021, Oracle launched Oracle Cloud Infrastructure (OCI) instances using Ampere Altra processors. === 2022 === In February 2022, Ampere and Rigetti Computing announced a strategic partnership to create hybrid quantum-classical computers. The companies will combine Ampere's Altra Max CPUs with Rigetti's Quantum Processing Units (QPU) in cloud-based High-Performance Computing (HPC) environments. In April, Microsoft previewed its Azure Virtual Machines running on the Ampere Altra. The VMs run scale-out workloads, web servers, application servers, open source databases, cloud native .NET applications, Java applications, gaming servers, media servers, and other processes. In May, Ampere announced the sampling of AmpereOne CPUs, 5-nanometer chips based on a core developed in-house. AmpereOne will add support for DDR5 main memory and PCIe Gen5 peripherals. On June 28, 2022, HPE became the first tier-one server provider to offer cloud-native silicon to service providers and enterprises embracing cloud-native development, with a new line of HPE ProLiant RL Gen11 servers that use Ampere Altra and Ampere Altra Max processors and are aimed at high performance and power efficiency. === 2023 === In April 2023, Ampere released the Altra developer's kit, an IoT Prototype Kit based on Ampere Altra, aimed at cloud developers and available in 32-core, 64-core, and 80-core formats. === 2024 === In May 2024, Ampere updated its AmpereOne roadmap to 256 cores and announced a joint effort with Qualcomm on CPUs and accelerators. 
== Customers == Ampere's customers include Microsoft Azure, Tencent Cloud, Oracle, ByteDance, Hewlett Packard Enterprise (HPE), Cloudflare, Equinix, Kingsoft Cloud, Meituan, Scaleway, UCloud, Foxconn Industrial Internet, Gigabyte, Inspur, Cruise, Hetzner, Project Ronin, Wiwynn and Google Cloud Platform. Cruise uses an Ampere Altra variant for its autonomous driving unit. The CPU was selected because of its throughput and low power consumption. In 2021, Oracle, Microsoft, Tencent, and ByteDance committed to using Ampere's customized chips, first announced in May. In April 2022, Microsoft previewed Ampere Altra processors in its new Azure D- and E-series virtual machines. The Dpsv5 series is built for Linux enterprise application types, and the Epsv5 series is for memory-intensive Linux workloads. They provide up to 64 vCPUs, include VM sizes with 2 GiB, 4 GiB, and 8 GiB of memory per vCPU, up to 40 Gbit/s networking, and high-performance local SSD storage. In 2022, Microsoft's Ampere Altra-based Azure servers became the first cloud solution provider servers to be Arm SystemReady SR certified. The Azure VMs, powered by Altra processors, were also the first to be SystemReady Virtual Environment standard certified. SystemReady defines a set of firmware and hardware standards as a baseline for system development for software developers, original equipment vendors, and chipmakers.

    Read more →
  • Vinberg's algorithm

    In mathematics, Vinberg's algorithm is an algorithm, introduced by Ernest Borisovich Vinberg, for finding a fundamental domain of a hyperbolic reflection group. Conway (1983) used Vinberg's algorithm to describe the automorphism group of the 26-dimensional even unimodular Lorentzian lattice $II_{25,1}$ in terms of the Leech lattice. == Description of the algorithm == Let $\Gamma < \mathrm{Isom}(\mathbb{H}^{n})$ be a hyperbolic reflection group. Choose any point $v_{0} \in \mathbb{H}^{n}$; we shall call it the basic (or initial) point. The fundamental domain $P_{0}$ of its stabilizer $\Gamma_{v_{0}}$ is a polyhedral cone in $\mathbb{H}^{n}$. Let $H_{1}, \dots, H_{m}$ be the faces of this cone, and let $a_{1}, \dots, a_{m}$ be outer normal vectors to it. Consider the half-spaces $H_{k}^{-} = \{x \in \mathbb{R}^{n,1} \mid (x, a_{k}) \leq 0\}$. There exists a unique fundamental polyhedron $P$ of $\Gamma$ contained in $P_{0}$ and containing the point $v_{0}$. Its faces containing $v_{0}$ are formed by the faces $H_{1}, \dots, H_{m}$ of the cone $P_{0}$. The other faces $H_{m+1}, \dots$ and the corresponding outward normals $a_{m+1}, \dots$ are constructed by induction. Namely, for $H_{j}$ we take a mirror such that the root $a_{j}$ orthogonal to it satisfies the conditions (1) $(v_{0}, a_{j}) < 0$; (2) $(a_{i}, a_{j}) \leq 0$ for all $i < j$ …

    Read more →

  • Information scientist

    The term information scientist was developed in the latter part of the twentieth century by Wm. Hovey Smith to describe an individual, usually with a relevant subject degree (such as one in Information and Computer Science - CIS) or a high level of subject knowledge, providing focused information to scientific and technical research staff in industry. It is a role quite distinct from and complementary to that of a librarian. Developments in end-user searching, together with some convergence between the roles of librarian and information scientist, have led to a diminution in its use in this context, and the terms information officer and information professional (information specialist) are also now used. The term was, and is, also used for an individual carrying out research in information science. Brian C. Vickery mentions that the Institute of Information Scientists (IIS) was established in London during 1958 and lists the criteria put forward by this institute, "Criteria for Information Science" (appendix 1), as well as his own "Areas of study in information science" (appendix 2). The IIS merged with the Library Association in 2002 to form the Chartered Institute of Library and Information Professionals (CILIP). == Notable Information Scientists == See also Award of Merit - Association for Information Science and Technology
      - Marcia Bates
      - David Blair (information technologist)
      - Samuel C. Bradford
      - Michael Buckland
      - John M. Carroll
      - Blaise Cronin
      - Emilia Currás
      - Brenda Dervin
      - Eugene Garfield
      - Paul B. Kantor
      - Frederick Wilfrid Lancaster
      - Calvin Mooers
      - Tefko Saracevic
      - Linda C. Smith
      - Robert Saxton Taylor
      - Brian Campbell Vickery
      - Thomas D. Wilson
    == Additional reading ==
      - Ellis, David; Haugan, Merete (1997). "Modelling the information seeking patterns of engineers and research scientists in an industrial environment". Journal of Documentation. 53 (4): 384–403.
      - Poole, Alex H. (2024). "'There's a big difference between going through life with the wind at your back, and going through life leaning into the wind': Feminism in Post-World War II Information Science". Proceedings of the Association for Information Science and Technology. 61: 300–313. doi:10.1002/pra2.1029.
      - Vickery, Brian Campbell (1988). "Essays presented to B. C. Vickery". Journal of Documentation. 44: 199–283.
      - Vickery, B.; Vickery, A. (1987). Information Science in theory and practice. London: Bowker-Saur. pp. 361–369.

    Read more →
  • In-place algorithm

    In computer science, an in-place algorithm is an algorithm that operates directly on the input data structure without requiring extra space proportional to the input size. In other words, it modifies the input in place, without creating a separate copy of the data structure. An algorithm which is not in-place is sometimes called not-in-place or out-of-place. In-place can have slightly different meanings. In its strictest form, the algorithm can only have a constant amount of extra space, counting everything including function calls and pointers. However, this form is very limited as simply having an index to a length n array requires O(log n) bits. More broadly, in-place means that the algorithm does not use extra space for manipulating the input but may require a small though non-constant extra space for its operation. Usually, this space is O(log n), though sometimes anything in o(n) is allowed. Note that space complexity also has varied choices in whether or not to count the index lengths as part of the space used. Often, the space complexity is given in terms of the number of indices or pointers needed, ignoring their length. In this article, we refer to total space complexity (DSPACE), counting pointer lengths. Therefore, the space requirements here have an extra log n factor compared to an analysis that ignores the lengths of indices and pointers. An algorithm may or may not count the output as part of its space usage. Since in-place algorithms usually overwrite their input with output, no additional space is needed. When writing the output to write-only memory or a stream, it may be more appropriate to only consider the working space of the algorithm. In theoretical applications such as log-space reductions, it is more typical to always ignore output space (in these cases it is more essential that the output is write-only). 
== Examples == Given an array a of n items, suppose we want an array that holds the same elements in reversed order and want to dispose of the original. One seemingly simple way to do this is to create a new array of equal size, fill it with copies from a in the appropriate order and then delete a.

    function reverse(a[0..n - 1])
        allocate b[0..n - 1]
        for i from 0 to n - 1
            b[n - 1 - i] := a[i]
        return b

Unfortunately, this requires O(n) extra space for having the arrays a and b available simultaneously. Also, allocation and deallocation are often slow operations. Since we no longer need a, we can instead overwrite it with its own reversal using this in-place algorithm, which needs only a constant number (2) of auxiliary variables, i and tmp, no matter how large the array is.

    function reverse_in_place(a[0..n - 1])
        for i from 0 to floor((n - 2) / 2)
            tmp := a[i]
            a[i] := a[n - 1 - i]
            a[n - 1 - i] := tmp

As another example, many sorting algorithms rearrange arrays into sorted order in-place, including: bubble sort, comb sort, selection sort, insertion sort, heapsort, and Shell sort. These algorithms require only a few pointers, so their space complexity is O(log n). Quicksort operates in-place on the data to be sorted. However, quicksort requires O(log n) stack space pointers to keep track of the subarrays in its divide and conquer strategy. Consequently, quicksort needs O(log² n) additional space. Although this non-constant space technically takes quicksort out of the in-place category, quicksort and other algorithms needing only O(log n) additional pointers are usually considered in-place algorithms. Most selection algorithms are also in-place, although some considerably rearrange the input array in the process of finding the final, constant-sized result. Some text manipulation algorithms such as trim and reverse may be done in-place. 
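
For readers who prefer runnable code, the two reversal routines from the examples above might look like this in Python. It is a sketch only: the function names mirror the pseudocode, and Python lists stand in for the fixed-size arrays assumed there.

```python
def reverse(a):
    """Out-of-place reversal: allocates a second list, O(n) extra space."""
    n = len(a)
    b = [None] * n
    for i in range(n):
        b[n - 1 - i] = a[i]
    return b


def reverse_in_place(a):
    """In-place reversal: swaps mirrored pairs, using only i, j and tmp."""
    n = len(a)
    # range(n // 2) covers i = 0 .. floor((n - 2) / 2), as in the pseudocode.
    for i in range(n // 2):
        j = n - 1 - i
        tmp = a[i]
        a[i] = a[j]
        a[j] = tmp
    return a  # the same list object, now reversed
```

`reverse_in_place` returns the list it was given, so callers that hold another reference to that list will see the reversed order too. That is exactly the side effect an out-of-place version avoids.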
== In computational complexity == In computational complexity theory, the strict definition of in-place algorithms includes all algorithms with O(1) space complexity, the class DSPACE(1). This class is very limited; it equals the class of regular languages. In fact, it does not even include any of the examples listed above. Algorithms in L, the class of problems requiring O(log n) additional space, are usually considered to be in-place. This class is more in line with the practical definition, as it allows numbers of size n as pointers or indices. This expanded definition still excludes quicksort, however, because of its recursive calls. Identifying the in-place algorithms with L has some interesting implications; for example, it means that there is a (rather complex) in-place algorithm to determine whether a path exists between two nodes in an undirected graph, a problem that requires O(n) extra space using typical algorithms such as depth-first search (a visited bit for each node). This in turn yields in-place algorithms for problems such as determining if a graph is bipartite or testing whether two graphs have the same number of connected components. == Role of randomness == In many cases, the space requirements of an algorithm can be drastically cut by using a randomized algorithm. For example, if one wishes to know if two vertices in a graph of n vertices are in the same connected component of the graph, there is no known simple, deterministic, in-place algorithm to determine this. However, if we simply start at one vertex and perform a random walk of about 20n³ steps, the chance that we will stumble across the other vertex, provided that it is in the same component, is very high. Similarly, there are simple randomized in-place algorithms for primality testing such as the Miller–Rabin primality test, and there are also simple in-place randomized factoring algorithms such as Pollard's rho algorithm. 
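
The random-walk connectivity test mentioned under "Role of randomness" can be sketched as below. The function name `probably_connected`, the adjacency-list dictionary used for the graph, and the optional `rng` parameter are assumptions made for illustration. The step budget of 20n³ is taken from the text. The graph itself is treated as read-only input, so the working space is just the current vertex and a step counter.

```python
import random


def probably_connected(adjacency, start, target, rng=None):
    """Random-walk test: True if `target` was reached from `start`.

    `adjacency` maps each vertex to a list of its neighbours (undirected).
    A True answer is always correct; a False answer may be wrong with
    small probability when the vertices are in the same component.
    """
    rng = rng or random.Random()
    n = len(adjacency)
    current = start
    for _ in range(20 * n ** 3):        # step budget from the text
        if current == target:
            return True
        neighbours = adjacency[current]
        if not neighbours:              # isolated vertex: the walk is stuck
            break
        current = rng.choice(neighbours)
    return current == target
```

Unlike depth-first search, this never records which vertices were visited, which is where the space saving comes from. The price is a one-sided chance of error.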
== In functional programming == Functional programming languages often discourage or do not support explicit in-place algorithms that overwrite data, since this is a type of side effect; instead, they only allow new data to be constructed. However, good functional language compilers will often recognize when an object very similar to an existing one is created and then the old one is thrown away, and will optimize this into a simple mutation "under the hood". Note that it is possible in principle to carefully construct in-place algorithms that do not modify data (unless the data is no longer being used), but this is rarely done in practice.

    Read more →
  • Automated essay scoring

    Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. It is a form of educational assessment and an application of natural language processing. Its objective is to classify a large set of textual entities into a small number of discrete categories, corresponding to the possible grades, for example, the numbers 1 to 6. Therefore, it can be considered a problem of statistical classification. Several factors have contributed to a growing interest in AES. Among them are cost, accountability, standards, and technology. Rising education costs have led to pressure to hold the educational system accountable for results by imposing standards. The advance of information technology promises to measure educational achievement at reduced cost. The use of AES for high-stakes testing in education has generated significant backlash, with opponents pointing to research that computers cannot yet grade writing accurately and arguing that their use for such purposes promotes teaching writing in reductive ways (i.e. teaching to the test). == History == Most historical summaries of AES trace the origins of the field to the work of Ellis Batten Page. In 1966, he argued for the possibility of scoring essays by computer, and in 1968 he published his successful work with a program called Project Essay Grade (PEG). Using the technology of that time, computerized essay scoring would not have been cost-effective, so Page abated his efforts for about two decades. Eventually, Page sold PEG to Measurement Incorporated. By 1990, desktop computers had become so powerful and so widespread that AES was a practical possibility. As early as 1982, a UNIX program called Writer's Workbench was able to offer punctuation, spelling and grammar advice. In collaboration with several companies (notably Educational Testing Service), Page updated PEG and ran some successful trials in the early 1990s. 
Peter Foltz and Thomas Landauer developed a system using a scoring engine called the Intelligent Essay Assessor (IEA). IEA was first used to score essays in 1997 for their undergraduate courses. It is now a product from Pearson Educational Technologies and used for scoring within a number of commercial products and state and national exams. IntelliMetric is Vantage Learning's AES engine. Its development began in 1996. It was first used commercially to score essays in 1998. Educational Testing Service offers "e-rater", an automated essay scoring program. It was first used commercially in February 1999. Jill Burstein was the team leader in its development. ETS's Criterion Online Writing Evaluation Service uses the e-rater engine to provide both scores and targeted feedback. Lawrence Rudner has done some work with Bayesian scoring, and developed a system called BETSY (Bayesian Essay Test Scoring sYstem). Some of his results have been published in print or online, but no commercial system incorporates BETSY as yet. Under the leadership of Howard Mitzel and Sue Lottridge, Pacific Metrics developed a constructed response automated scoring engine, CRASE. Currently utilized by several state departments of education and in a U.S. Department of Education-funded Enhanced Assessment Grant, Pacific Metrics’ technology has been used in large-scale formative and summative assessment environments since 2007. Measurement Inc. acquired the rights to PEG in 2002 and has continued to develop it. In 2012, the Hewlett Foundation sponsored a competition on Kaggle called the Automated Student Assessment Prize (ASAP). 201 challenge participants attempted to predict, using AES, the scores that human raters would give to thousands of essays written to eight different prompts. The intent was to demonstrate that AES can be as reliable as human raters, or more so. The competition also hosted a separate demonstration among nine AES vendors on a subset of the ASAP data. 
Although the investigators reported that the automated essay scoring was as reliable as human scoring, this claim was not substantiated by any statistical tests because some of the vendors required that no such tests be performed as a precondition for their participation. Moreover, the claim that the Hewlett Study demonstrated that AES can be as reliable as human raters has since been strongly contested, including by Randy E. Bennett, the Norman O. Frederiksen Chair in Assessment Innovation at the Educational Testing Service. The major criticisms of the study are that five of the eight datasets consisted of paragraphs rather than essays; that four of the eight datasets were graded by human readers for content only rather than for writing ability; and that, rather than measuring human readers and the AES machines against the "true score" (the average of the two readers' scores), the study employed an artificial construct, the "resolved score", which in four datasets consisted of the higher of the two human scores if there was a disagreement. This last practice, in particular, gave the machines an unfair advantage by allowing them to round up for these datasets. In 1966, Page hypothesized that, in the future, the computer-based judge would be better correlated with each human judge than the other human judges are. Although the applicability of this approach to essay marking in general has been criticized, the hypothesis was supported for marking free-text answers to short questions, such as those typical of the British GCSE system. Results of supervised learning demonstrate that the automatic systems perform well when marking by different human teachers is in good agreement. 
Unsupervised clustering of answers showed that excellent papers and weak papers formed well-defined clusters, and the automated marking rule for these clusters worked well, whereas marks given by human teachers for the third cluster ('mixed') can be controversial, and the reliability of any assessment (human or computer-based) of works from the 'mixed' cluster can often be questioned. == Different dimensions of essay quality == According to a recent survey, modern AES systems try to score different dimensions of an essay's quality in order to provide feedback to users. These dimensions include the following items: Grammaticality: following grammar rules; Usage: correct use of prepositions and words; Mechanics: following rules for spelling, punctuation, capitalization; Style: word choice, sentence structure variety; Relevance: how relevant the content is to the prompt; Organization: how well the essay is structured; Development: development of ideas with examples; Cohesion: appropriate use of transition phrases; Coherence: appropriate transitions between ideas; Thesis Clarity: clarity of the thesis; Persuasiveness: convincingness of the major argument. == Procedure == From the beginning, the basic procedure for AES has been to start with a training set of essays that have been carefully hand-scored. The program evaluates surface features of the text of each essay, such as the total number of words, the number of subordinate clauses, or the ratio of uppercase to lowercase letters, all quantities that can be measured without any human insight. It then constructs a mathematical model that relates these quantities to the scores that the essays received. The same model is then applied to calculate scores of new essays. Recently, one such mathematical model was created by Isaac Persing and Vincent Ng, which evaluates essays not only on the above features but also on their argument strength. 
It evaluates various features of the essay, such as the agreement level of the author and the reasons for it, adherence to the prompt's topic, locations of argument components (major claim, claim, premise), errors in the arguments, and cohesion in the arguments, among various other features. In contrast to the other models mentioned above, this model comes closer to duplicating human insight while grading essays. Due to the growing popularity of deep neural networks, deep learning approaches have been adopted for automated essay scoring, generally obtaining superior results, often surpassing inter-human agreement levels. The various AES programs differ in what specific surface features they measure, how many essays are required in the training set, and most significantly in the mathematical modeling technique. Early attempts used linear regression. Modern systems may use linear regression or other machine learning techniques, often in combination with other statistical techniques such as latent semantic analysis and Bayesian inference. The automated essay scoring task has also been studied in the cross-domain setting using machine learning models, where the models are trained on essays written for one prompt (topic) and tested on essays written for another prompt. Successful approaches in the cross-domain scenario are based on deep neural networks or models that combine deep and shallow features. == Criteria for success == Any method of a

  • Microsoft SQL Server Master Data Services


    Microsoft SQL Server Master Data Services (MDS) is a Master Data Management (MDM) product from Microsoft that ships as a part of the Microsoft SQL Server relational database management system. Master data management (MDM) allows an organization to discover and define non-transactional lists of data, and compile maintainable, reliable master lists. Master Data Services first shipped with Microsoft SQL Server 2008 R2. Microsoft SQL Server 2016 introduced enhancements to Master Data Services, such as improved performance and security, and the ability to clear transaction logs, create custom indexes, share entity data between different models, and support for many-to-many relationships. == Overview == In Master Data Services, the model is the highest level container in the structure of your master data. You create a model to manage groups of similar data. A model contains one or more entities, and entities contain members that are the data records. An entity is similar to a table. Like other MDM products, Master Data Services aims to create a centralized data source and keep it synchronized, and thus reduce redundancies, across the applications which process the data. Sharing the architectural core with Stratature +EDM, Master Data Services uses a Microsoft SQL Server database as the physical data store. It is a part of the Master Data Hub, which uses the database to store and manage data entities. It is a database with the software to validate and manage the data, and keep it synchronized with the systems that use the data. The master data hub has to extract the data from the source system, validate, sanitize and shape the data, remove duplicates, and update the hub repositories, as well as synchronize the external sources. The entity schemas, attributes, data hierarchies, validation rules and access control information are specified as metadata to the Master Data Services runtime. Master Data Services does not impose any limitation on the data model. 
Master Data Services also allows custom business rules, used for validating and sanitizing the data entering the data hub, to be defined; these are then run against the data matching the specified criteria. All changes made to the data are validated against the rules, and a log of the transaction is stored persistently. Violations are logged separately, and optionally the owner is notified automatically. All the data entities can be versioned. Master Data Services allows the master data to be categorized by hierarchical relationships; for example, employee data are a subtype of organization data. Hierarchies are generated by relating data attributes. Data can be automatically categorized using rules, and the categories are introspected programmatically. Master Data Services can also expose the data as Microsoft SQL Server views, which can be pulled by any SQL-compatible client. It uses a role-based access control system to restrict access to the data. The views are generated dynamically, so they contain the latest data entities in the master hub. It can also push out the data by writing to some external journals. Master Data Services also includes a web-based UI for viewing and managing the data. It uses ASP.NET in the back-end. The Silverlight front-end was replaced with HTML5 in SQL Server 2019. Master Data Services provides a Web service interface to expose the data, as well as an API that internally uses the exposed web services and exposes the feature set programmatically for accessing and manipulating the data. It also integrates with Active Directory for authentication purposes. Unlike +EDM, Master Data Services supports Unicode characters as well as multilingual user interfaces. SQL Server 2016 introduced a significant performance increase in Master Data Services over previous versions. == Terminology == Model is the highest level of an MDS instance. It is the primary container for specific groupings of master data. 
In many ways it is very similar to the idea of a database. Entities are containers created within a model. Entities provide a home for members, and are in many ways analogous to database tables (e.g. Customer). Members are analogous to the records in a database table (Entity), e.g. Will Smith. Members are contained within entities. Each member is made up of two or more attributes. Attributes are analogous to the columns within a database table (Entity), e.g. Surname. Attributes exist within entities and help describe members (the records within the table). Name and Code attributes are created by default for each entity and serve to describe and uniquely identify leaf members. Attributes can be related to attributes from other entities; these are called 'domain-based' attributes. This is similar to the concept of a foreign key. Other attributes, however, are of type 'free-form' (most common) or 'file'. Attribute Groups are explicitly defined collections of particular attributes. Say you have an entity "customer" that has 50 attributes, too much information for many of your users. Attribute groups enable the creation of custom sets of hand-picked attributes that are relevant for specific audiences (e.g. "customer - delivery details", which would include just the name and last known delivery address). This is very similar to a database view. Hierarchies organize members into either Derived or Explicit hierarchical structures. Derived hierarchies, as the name suggests, are derived by the MDS engine based on the relationships that exist between attributes. Explicit hierarchies are created by hand using both leaf and consolidated members. Business Rules can be created and applied against model data to ensure that custom business logic is adhered to. In order to be committed into the system, data must pass all business rule validations applied to them. For example: 
Within the Customer Entity you may want to create a business rule that ensures all members of the 'Country' Attribute contain either the text "USA" or "Canada". The Business Rule, once created and run, will then verify that all the data is correct before it is accepted into the approved model. Versions provide system owners / administrators with the ability to Open, Lock or Commit a particular version of a model and the data contained within it at a particular point in time. As the content within a model varies, grows or shrinks over time, versions provide a way of managing metadata so that subscribing systems can access the correct content.
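The containment described in the Terminology section (a model holds entities, an entity holds members, members carry attribute values) and the "Country" business rule can be pictured with a small sketch. This is only an illustration in plain Python: the class and function names (`Model`, `Entity`, `country_rule`, `validate`) are hypothetical and are not part of any Master Data Services API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Analogous to a table: named attributes (columns) and members (records)."""
    name: str
    attributes: list[str]
    members: list[dict] = field(default_factory=list)


@dataclass
class Model:
    """Top-level container that groups related entities."""
    name: str
    entities: dict[str, Entity] = field(default_factory=dict)


def country_rule(member: dict) -> bool:
    """Hypothetical business rule: Country must be "USA" or "Canada"."""
    return member.get("Country") in {"USA", "Canada"}


def validate(entity: Entity, rule) -> tuple[list[dict], list[dict]]:
    """Split an entity's members into those that pass and those that violate the rule."""
    passed = [m for m in entity.members if rule(m)]
    failed = [m for m in entity.members if not rule(m)]
    return passed, failed


# Name and Code are listed first because the text says every entity gets them by default.
customer = Entity("Customer", ["Name", "Code", "Country"])
customer.members.append({"Name": "Will Smith", "Code": "C001", "Country": "USA"})
customer.members.append({"Name": "Jane Doe", "Code": "C002", "Country": "France"})
model = Model("Sales", {"Customer": customer})

passed, failed = validate(model.entities["Customer"], country_rule)
```

In this sketch only the first member would be accepted; the second would be reported as a violation, mirroring the separate logging of violations described earlier.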

  • Birkhoff algorithm


    Birkhoff's algorithm (also called Birkhoff-von-Neumann algorithm) is an algorithm for decomposing a bistochastic matrix into a convex combination of permutation matrices. It was published by Garrett Birkhoff in 1946. It has many applications. One such application is for the problem of fair random assignment: given a randomized allocation of items, Birkhoff's algorithm can decompose it into a lottery on deterministic allocations. == Terminology == A bistochastic matrix (also called: doubly-stochastic) is a matrix in which all elements are greater than or equal to 0 and the sum of the elements in each row and column equals 1. An example is the following 3-by-3 matrix: $\begin{pmatrix}0.2&0.3&0.5\\0.6&0.2&0.2\\0.2&0.5&0.3\end{pmatrix}$ A permutation matrix is a special case of a bistochastic matrix, in which each element is either 0 or 1 (so there is exactly one "1" in each row and each column). An example is the following 3-by-3 matrix: $\begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix}$ A Birkhoff decomposition (also called: Birkhoff-von-Neumann decomposition) of a bistochastic matrix is a presentation of it as a sum of permutation matrices with non-negative weights. For example, the bistochastic matrix above can be presented as the following sum: $0.2\begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix}+0.2\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}+0.1\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}+0.5\begin{pmatrix}0&0&1\\1&0&0\\0&1&0\end{pmatrix}$ Birkhoff's algorithm receives as input a bistochastic matrix and returns as output a Birkhoff decomposition. == Tools == A permutation set of an n-by-n matrix X is a set of n entries of X containing exactly one entry from each row and from each column. 
A theorem by Dénes Kőnig says that: Every bistochastic matrix has a permutation-set in which all entries are positive. The positivity graph of an n-by-n matrix X is a bipartite graph with 2n vertices, in which the vertices on one side are the n rows and the vertices on the other side are the n columns, and there is an edge between a row and a column if the entry at that row and column is positive. A permutation set with positive entries is equivalent to a perfect matching in the positivity graph. A perfect matching in a bipartite graph can be found in polynomial time, e.g. using any algorithm for maximum cardinality matching. Kőnig's theorem is equivalent to the following: The positivity graph of any bistochastic matrix admits a perfect matching. A matrix is called scaled-bistochastic if all elements are non-negative, and the sum of each row and column equals c, where c is some positive constant. In other words, it is c times a bistochastic matrix. Since the positivity graph is not affected by scaling: The positivity graph of any scaled-bistochastic matrix admits a perfect matching. == Algorithm == Birkhoff's algorithm is a greedy algorithm: it greedily finds perfect matchings and removes them from the fractional matching. It works as follows.
1. Let i = 1.
2. Construct the positivity graph G_X of X.
3. Find a perfect matching in G_X, corresponding to a positive permutation set in X.
4. Let z[i] > 0 be the smallest entry in the permutation set.
5. Let P[i] be a permutation matrix with 1 in the positive permutation set.
6. Let X := X − z[i] P[i].
7. If X contains nonzero elements, let i = i + 1 and go back to step 2.
8. Otherwise, return the sum: z[1] P[1] + z[2] P[2] + ... + z[i] P[i].
The algorithm is correct because, after step 6, the sum in each row and each column drops by z[i]. Therefore, the matrix X remains scaled-bistochastic. Therefore, in step 3, a perfect matching always exists. 
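The greedy procedure described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the perfect matching is found with a simple augmenting-path search (any maximum cardinality matching algorithm would do), and entries are converted to exact fractions on the assumption that the input is given as integers, strings or `Fraction` values, so that row and column sums are exact.

```python
from fractions import Fraction


def find_perfect_matching(X):
    """Return match[row] = column using only positive entries of X, or None."""
    n = len(X)
    col_owner = [None] * n  # col_owner[c] = row currently matched to column c

    def try_row(r, seen):
        # Augmenting-path search over the positivity graph (edges where X[r][c] > 0).
        for c in range(n):
            if X[r][c] > 0 and c not in seen:
                seen.add(c)
                if col_owner[c] is None or try_row(col_owner[c], seen):
                    col_owner[c] = r
                    return True
        return False

    for r in range(n):
        if not try_row(r, set()):
            return None
    match = [None] * n
    for c, r in enumerate(col_owner):
        match[r] = c
    return match


def birkhoff_decomposition(X):
    """Decompose a bistochastic matrix into a list of (weight, permutation) terms."""
    X = [[Fraction(v) for v in row] for row in X]
    terms = []
    while any(v != 0 for row in X for v in row):
        match = find_perfect_matching(X)
        if match is None:
            raise ValueError("input is not (scaled-)bistochastic")
        # Smallest entry of the positive permutation set becomes the weight z[i].
        z = min(X[r][c] for r, c in enumerate(match))
        for r, c in enumerate(match):
            X[r][c] -= z  # X := X - z[i] P[i]
        terms.append((z, match))
    return terms


# The 3-by-3 example from the Terminology section, written as exact decimals.
example = [["0.2", "0.3", "0.5"], ["0.6", "0.2", "0.2"], ["0.2", "0.5", "0.3"]]
terms = birkhoff_decomposition(example)
```

Each returned permutation is stored as a list mapping row index to column index. The decomposition found this way need not coincide with the one shown in the Terminology section, since a bistochastic matrix can have several Birkhoff decompositions.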
== Run-time complexity == By the selection of z[i] in step 4, in each iteration at least one element of X becomes 0. Therefore, the algorithm must end after at most $n^2$ steps. However, the last step must simultaneously make n elements 0, so the algorithm ends after at most $n^2 - n + 1$ steps, which implies $O(n^2)$. In 1960, Johnson, Dulmage and Mendelsohn showed that Birkhoff's algorithm actually ends after at most $n^2 - 2n + 2$ steps, which is tight in general (that is, in some cases $n^2 - 2n + 2$ permutation matrices may be required). == Application in fair division == In the fair random assignment problem, there are n objects and n people with different preferences over the objects. It is required to give an object to each person. To attain fairness, the allocation is randomized: for each (person, object) pair, a probability is calculated, such that the sum of probabilities for each person and for each object is 1. The probabilistic-serial procedure can compute the probabilities such that each agent, looking at the matrix of probabilities, prefers his row of probabilities over the rows of all other people (this property is called envy-freeness). This raises the question of how to implement this randomized allocation in practice. One cannot just randomize for each object separately, since this may result in allocations in which some people get many objects while other people get no objects. Here, Birkhoff's algorithm is useful. The matrix of probabilities, calculated by the probabilistic-serial algorithm, is bistochastic. Birkhoff's algorithm can decompose it into a convex combination of permutation matrices. Each permutation matrix represents a deterministic assignment, in which every agent receives exactly one object. The coefficient of each such matrix is interpreted as a probability; based on the calculated probabilities, it is possible to pick one assignment at random and implement it. 
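The final step, picking one deterministic assignment at random according to the coefficients, can be sketched as below. The sketch hard-codes the four-term decomposition of the 3-by-3 example from the Terminology section; the names `DECOMPOSITION`, `draw_assignment` and `marginals` are illustrative assumptions, and each permutation is written as a list mapping person (row) to object (column).

```python
import random
from fractions import Fraction

# (weight, permutation) terms of the example decomposition; perm[person] = object.
DECOMPOSITION = [
    (Fraction(2, 10), [1, 2, 0]),
    (Fraction(2, 10), [0, 1, 2]),
    (Fraction(1, 10), [1, 0, 2]),
    (Fraction(5, 10), [2, 0, 1]),
]


def draw_assignment(decomposition, rng=random):
    """Pick one deterministic assignment with probability equal to its weight."""
    perms = [perm for _, perm in decomposition]
    weights = [float(w) for w, _ in decomposition]
    return rng.choices(perms, weights=weights, k=1)[0]


def marginals(decomposition, n):
    """Probability that each person receives each object under the lottery."""
    probs = [[Fraction(0)] * n for _ in range(n)]
    for weight, perm in decomposition:
        for person, obj in enumerate(perm):
            probs[person][obj] += weight
    return probs


assignment = draw_assignment(DECOMPOSITION, random.Random(0))
```

Whatever assignment is drawn, every person receives exactly one object, while `marginals` recovers the original matrix of probabilities, which is why the lottery implements the randomized allocation.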
== Extensions == The problem of computing the Birkhoff decomposition with the minimum number of terms has been shown to be NP-hard, but some heuristics for computing it are known. This theorem can be extended for the general stochastic matrix with deterministic transition matrices. Budish, Che, Kojima and Milgrom generalize Birkhoff's algorithm to non-square matrices, with some constraints on the feasible assignments. They also present a decomposition algorithm that minimizes the variance in the expected values. Vazirani generalizes Birkhoff's algorithm to non-bipartite graphs. Valls et al. showed that it is possible to obtain an $\epsilon$-approximate decomposition with $O(\log(1/\epsilon^{2}))$ permutations.

  • How to Solve it by Computer


    How to Solve it by Computer is a computer science book by R. G. Dromey, first published by Prentice-Hall in 1982. It is occasionally used as a textbook, especially in India. It is an introduction to the whys of algorithms and data structures. Features of the book: the design factors associated with problems; the creative process behind coming up with innovative solutions for algorithms and data structures; and the line of reasoning behind the constraints, factors and design choices made. The fundamental algorithms presented in the book are mostly given in pseudocode and/or Pascal notation.

  • Inferential theory of learning


    Inferential Theory of Learning (ITL) is an area of machine learning which describes inferential processes performed by learning agents. ITL has been continuously developed by Ryszard S. Michalski, starting in the 1980s. The first known publication of ITL was in 1983. In ITL, the learning process is viewed as a search (inference) through a hypothesis space guided by a specific goal. The results of learning need to be stored; the stored information is later used by the learner for future inferences. Inferences are split into multiple categories, including conclusive inference, deduction, and induction. For an inference to be considered complete, all categories must be taken into account. This is how ITL differs from other machine learning theories such as Computational Learning Theory and Statistical Learning Theory, which both use singular forms of inference. == Usage == The most relevant published usage of ITL was in a scientific journal in 2012, where ITL was used as a way to describe how agent-based learning works. According to the journal, "The Inferential Theory of Learning (ITL) provides an elegant way of describing learning processes by agents".

  • Information scientist


    The term information scientist was developed in the latter part of the twentieth century by Wm. Hovey Smith to describe an individual, usually with a relevant subject degree (such as one in Information and Computer Science - CIS) or high level of subject knowledge, providing focused information to scientific and technical research staff in industry. It is a role quite distinct from and complementary to that of a librarian. Developments in end-user searching, together with some convergence between the roles of librarian and information scientist, have led to a diminution in its use in this context, and the terms information officer and information professional (information specialist) are also now used. The term was, and is, also used for an individual carrying out research in information science. Brian C. Vickery mentions that the Institute of Information Scientists (IIS) was established in London during 1958 and lists the criteria put forward by this institute, "Criteria for Information Science" (appendix 1), as well as his own "Areas of study in information science" (appendix 2). The IIS merged with the Library Association in 2002 to form the Chartered Institute of Library and Information Professionals (CILIP). == Notable Information Scientists == See also: Award of Merit - Association for Information Science and Technology. Marcia Bates; David Blair (information technologist); Samuel C. Bradford; Michael Buckland; John M. Carroll; Blaise Cronin; Emilia Currás; Brenda Dervin; Eugene Garfield; Paul B. Kantor; Frederick Wilfrid Lancaster; Calvin Mooers; Tefko Saracevic; Linda C. Smith; Robert Saxton Taylor; Brian Campbell Vickery; Thomas D. Wilson. == Additional reading == Ellis, David and Merete Haugan. (1997) "Modelling the information seeking patterns of engineers and research scientists in an industrial environment" (Journal of Documentation, Volume 53(4): pp. 384–403) Poole, Alex H. (2024). 
"'There's a big difference between going through life with the wind at your back, and going through life leaning into the wind': Feminism in Post-World War II Information Science". Proceedings of the Association for Information Science and Technology. 61: 300–313. doi:10.1002/pra2.1029. Vickery, Brian Campbell (1988) "Essays presented to B. C. Vickery" (Journal of Documentation, Volume 44, pp. 199–283). Vickery, B. & Vickery, A. (1987) Information Science in theory and practice (London: Bowker-Saur, pp. 361–369)

  • Knowledge organization


    Knowledge organization (KO), organization of knowledge, organization of information, or information organization is an intellectual discipline concerned with activities such as document description, indexing, and classification that serve to provide systems of representation and order for knowledge and information objects. According to The Organization of Information by Joudrey and Taylor, information organization: examines the activities carried out and tools used by people who work in places that accumulate information resources (e.g., books, maps, documents, datasets, images) for the use of humankind, both immediately and for posterity. It discusses the processes that are in place to make resources findable, whether someone is searching for a single known item or is browsing through hundreds of resources just hoping to discover something useful. Information organization supports a myriad of information-seeking scenarios. Issues related to knowledge sharing can be said to have been an important part of knowledge management for a long time. Knowledge sharing has received a lot of attention in research and business practice both within and outside organizations and its different levels. Sharing knowledge is not only about giving it to others, but it also includes searching, locating, and absorbing knowledge. Unawareness of the employees' work and duties tends to provoke the repetition of mistakes, the waste of resources, and duplication of the same projects. Motivating co-workers to share their knowledge is called knowledge enabling. It leads to trust among individuals and encourages a more open and proactive relationship that grants the exchange of information easily. Knowledge sharing is part of the three-phase knowledge management process which is a continuous process model. The three parts are knowledge creation, knowledge implementation, and knowledge sharing. The process is continuous, which is why the parts cannot be fully separated. 
Knowledge creation is the consequence of individuals' minds, interactions, and activities. Developing new ideas and arrangements alludes to the process of knowledge creation. Using the knowledge which is present at the company in the most effective manner stands for the implementation of knowledge. Knowledge sharing, the most essential part of the process for our topic, takes place when two or more people benefit by learning from each other. Traditional human-based approaches performed by librarians, archivists, and subject specialists are increasingly challenged by computational (big data) algorithmic techniques. KO as a field of study is concerned with the nature and quality of such knowledge-organizing processes (KOP) (such as taxonomy and ontology) as well as the resulting knowledge organizing systems (KOS). == Theoretical approaches == === Traditional approaches === Among the major figures in the history of KO are Melvil Dewey (1851–1931) and Henry Bliss (1870–1955). Dewey's goal was an efficient way to manage library collections; not an optimal system to support users of libraries. His system was meant to be used in many libraries as a standardized way to manage collections. The first version of this system was created in 1876. An important characteristic in Henry Bliss' (and many contemporary thinkers of KO) was that the sciences tend to reflect the order of Nature and that library classification should reflect the order of knowledge as uncovered by science: The implication is that librarians, in order to classify books, should know about scientific developments. This should also be reflected in their education: Again from the standpoint of the higher education of librarians, the teaching of systems of classification ... 
would be perhaps better conducted by including courses in the systematic encyclopedia and methodology of all the sciences, that is to say, outlines which try to summarize the most recent results in the relation to one another in which they are now studied together. ... (Ernest Cushing Richardson, quoted from Bliss, 1935, p. 2) Among the other principles, which may be attributed to the traditional approach to KO are: Principle of controlled vocabulary Cutter's rule about specificity Hulme's principle of literary warrant (1911) Principle of organizing from the general to the specific Today, after more than 100 years of research and development in LIS, the "traditional" approach still has a strong position in KO and in many ways its principles still dominate. === Facet analytic approaches === The date of the foundation of this approach may be chosen as the publication of S. R. Ranganathan's colon classification in 1933. The approach has been further developed by, in particular, the British Classification Research Group. The best way to explain this approach is probably to explain its analytico-synthetic methodology. The meaning of the term "analysis" is: breaking down each subject into its basic concepts. The meaning of the term synthesis is: combining the relevant units and concepts to describe the subject matter of the information package in hand. Given subjects (as they appear in, for example, book titles) are first analyzed into a few common categories, which are termed "facets". Ranganathan proposed his PMEST formula: Personality, Matter, Energy, Space and Time: Personality is the distinguishing characteristic of a subject. Matter is the physical material of which a subject may be composed. Energy is any action that occurs with respect to the subject. Space is the geographic component of the location of a subject. Time is the period associated with a subject. 
=== The information retrieval tradition (IR) === Important in the IR-tradition have been, among others, the Cranfield experiments, which were founded in the 1950s, and the TREC experiments (Text Retrieval Conferences) starting in 1992. It was the Cranfield experiments, which introduced the measures "recall" and "precision" as evaluation criteria for systems efficiency. The Cranfield experiments found that classification systems like UDC and facet-analytic systems were less efficient compared to free-text searches or low level indexing systems ("UNITERM"). The Cranfield I test found, according to Ellis (1996, 3–6) the following results: Although these results have been criticized and questioned, the IR-tradition became much more influential while library classification research lost influence. The dominant trend has been to regard only statistical averages. What has largely been neglected is to ask: Are there certain kinds of questions in relation to which other kinds of representation, for example, controlled vocabularies, may improve recall and precision? === User-oriented and cognitive views === The best way to define this approach is probably by method: Systems based upon user-oriented approaches must specify how the design of a system is made on the basis of empirical studies of users. User studies demonstrated very early that users prefer verbal search systems as opposed to systems based on classification notations. This is one example of a principle derived from empirical studies of users. Adherents of classification notations may, of course, still have an argument: That notations are well-defined and that users may miss important information by not considering them. Folksonomies is a recent kind of KO based on users' rather than on librarians' or subject specialists' indexing. 
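The "recall" and "precision" measures mentioned in connection with the Cranfield experiments have standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch, with a hypothetical function name and made-up document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, three relevant overall, two of them found.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall 2/3). Averaging such figures over many queries gives the kind of statistical average that the text notes has dominated the IR tradition.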
=== Bibliometric approaches === These approaches are primarily based on using bibliographical references to organize networks of papers, mainly by bibliographic coupling (introduced by Kessler 1963) or co-citation analysis (independently suggested by Marshakova 1973 and Small 1973). In recent years it has become a popular activity to construe bibliometric maps as structures of research fields. Two considerations are important in considering bibliometric approaches to KO: (1) The level of indexing depth is partly determined by the number of terms assigned to each document. In citation indexing this corresponds to the number of references in a given paper. On average, scientific papers contain 10–15 references, which provide quite a high level of depth. (2) The references, which function as access points, are provided by the highest subject-expertise: the experts writing in the leading journals. This expertise is much higher than that which library catalogs or bibliographical databases typically are able to draw on. === The domain analytic approach === Domain analysis is a sociological-epistemological standpoint that advocates that the indexing of a given document should reflect the needs of a given group of users or a given ideal purpose. In other words, any description or representation of a given document is more or less suited to the fulfillment of certain tasks. A description is never objective or neutral, and the goal is not to standardize descriptions or make one description once and for all for different target groups. The develo
