by Peter Millington

[13th May 1994 - First drafted]
[26th August 2000 - Converted to HTML]

Introduction

These notes relate to the computerised inventory of Alex Helm's collection of folk plays, folk dances and related customs. This was compiled by Ervin Beck in 1982.

To make the inventory easier to read, I have converted the text from all capital letters to mixed upper and lower case, using a computer program. I have also ironed out many of the minor formatting inconsistencies. Sample pages from the newly formatted inventory are attached. In printing it out, I have used layout and differing type styles to distinguish between the various fields rather than use the field labels.

I also used programs to automatically compile indexes for each of the fields. However, I came to the conclusion that the only indexes worth printing out were for Short Title Keywords, Authors, Journal Titles and Years. Sample pages are attached.

Indexes for the other fields either contained too many trivial terms (e.g. in the comments), or had too many entries under each term to be usable (e.g. in the References field). Nonetheless, indexes for these fields in purely electronic form have been useful in identifying inconsistencies and clerical errors.

I have made a small number of corrections to the data on items relating to Nottinghamshire, and likewise added a few comments. However, the project has expanded greatly since then. Automatic indexing revealed numerous minor inconsistencies and situations where standardisation is needed. I have also received a long list of amendments and suggestions from Cawte which should be considered side-by-side with these notes.

It makes sense to merge all these amendments and make the changes at the same time. I also feel it is important that interested parties should agree standards and a plan of action before continuing. I have therefore made no further changes.

The Main Inventory

  • My case conversion program was fairly rudimentary, just retaining the first character of a string of letters as a capital and converting the rest of the string to lower case. This of course led to some strange situations.

    Minor words such as "And", "Of" and "To" started with capital letters, which looked unusual. I have converted the more common of these words to all lower case, but some still remain. Conversely, some initials and abbreviations, which one would expect to see in capitals, came out in mixed case - e.g. "Cr", "Efdss", "Fls", "Vwl", etc. Again, I have corrected some of these, but they are more difficult to spot. The remaining instances will have to be corrected individually.

  • At the same time, I made sure that each record had a full complement of field labels, even if these fields were empty. This standardisation of the field labels makes it simpler to produce automatic indexes and nicely laid out listings. It will also make it easier to make corrections and add supplementary data.

  • I corrected any format errors that I spotted, but the automatically compiled indexes show that some still remain.

  • Each field in the original computerised list ended with a full stop. This caused problems when it came to indexing (where the full stop was significant for identifying abbreviations), and also when it came to concatenating fields in printed output.

    In nearly all cases, the full stop was unnecessary, so terminal full stops were removed from most fields. This in turn created a few new problems. County abbreviations can occur with or without the full stop, and so may appear twice in the indexes. These will have to be made consistent. A similar problem occurs occasionally with the initials of authors' names (e.g. see entries under "Cawte...").

  • Since I printed off the main inventory, I have condensed the record numbering system. Producing automatic indexes highlighted the fact that some of the record identifiers were unduly long. My revised number format retains all the original information, but with fewer characters - thus "18(Pt.2):441-442[sic]" becomes "18/2:441-442*".

    The specific changes are as follows:

                   Original Form                 Revised Form
    Sub-volumes    (Pt.1)                        /1
                   e.g. 16(Pt.2):60              e.g. 16/2:60
    Insertions     [between 115-116]             {115-116}  (n.b. curly brackets)
    "sic"          [sic]                         *
                   e.g. 18(Pt.2):441-442[sic]    e.g. 18/2:441-442*

    Unfortunately, I received Cawte's comments on record numbering after I had made these changes and printed the indexes.
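
The three rewrites in the table of original and revised forms can be expressed as simple pattern substitutions. The following is a minimal sketch only, not the program actually used: the function name condense_record_number is hypothetical, and it assumes the identifiers are written exactly as in the examples above (mixed case, no stray spaces).

```python
import re

def condense_record_number(ref: str) -> str:
    """Apply the three rewrites from the table of revised forms.

    Assumes the input uses exactly the original forms shown in the
    notes: "(Pt.N)", "[between X-Y]" and "[sic]".
    """
    # Sub-volumes: "(Pt.2)" becomes "/2"
    ref = re.sub(r"\(Pt\.(\d+)\)", r"/\1", ref)
    # Insertions: "[between 115-116]" becomes "{115-116}"
    ref = re.sub(r"\[between ([^\]]+)\]", r"{\1}", ref)
    # "sic": "[sic]" becomes "*"
    ref = ref.replace("[sic]", "*")
    return ref

print(condense_record_number("16(Pt.2):60"))            # 16/2:60
print(condense_record_number("18(Pt.2):441-442[sic]"))  # 18/2:441-442*
```

Because each rule is a one-to-one rewrite, no information is lost, and the original form could be restored by running the substitutions in reverse.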

The Indexes

In all the indexes, the lists of reference numbers under the terms are simply sorted alphanumerically. As a result, for instance, page "200" might appear before page "60" in the list. However, except for the Short Title Keyword Index, I made sure that at least the volume numbers came out in the correct order.
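
The "200" before "60" effect arises because the reference numbers are compared character by character. One possible remedy, sketched below, is to sort on a key that compares runs of digits by numeric value. This is an illustration under assumptions rather than a description of the existing programs: the name natural_key is hypothetical, and the sample references assume the revised "volume/part:page" form.

```python
import re

def natural_key(ref: str):
    """Build a sort key in which runs of digits compare as numbers.

    Each run becomes a tuple: (0, value, "") for digits and
    (1, 0, text) for anything else, so the tuples always compare cleanly.
    """
    parts = re.findall(r"[0-9]+|[^0-9]+", ref)
    return [(0, int(p), "") if "0" <= p[0] <= "9" else (1, 0, p) for p in parts]

refs = ["16/2:200", "16/2:60", "2:15", "16/1:7"]

print(sorted(refs))
# ['16/1:7', '16/2:200', '16/2:60', '2:15']   (plain alphanumeric order)

print(sorted(refs, key=natural_key))
# ['2:15', '16/1:7', '16/2:60', '16/2:200']   (volumes and pages in numeric order)
```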

  • Author Index

    Ervin Beck was fairly consistent in the way he formatted data for this field. Consequently the author index is quite good. The main problem is caused by names occurring in more than one form - e.g. initials and/or full forenames - compounded by occasional errors. Some of the errors may in fact have been Helm's, and Helm's handwriting will not have helped Beck.

    I would recommend the correction of errors and/or the addition of missing initials and the like. Whether we should standardise the form of each person's name - e.g. "N. Peacock" or "Norman Peacock" - is a matter for discussion.

    There are a few cases where inconsistent punctuation has resulted in multiple authors being joined in error as one entry in the index.

  • Journal Index

    Some journal titles begin with the word "The". In the index, these words have been rotated to the end of the field to provide a more meaningful list. Moreover, some titles appear in the inventory several times, sometimes with, and sometimes without the "The" - e.g. "(The) Essex Review". The result is two entries in the index.

    Either we should drop all these initial "The"s, or standardise the form of each journal title on a case-by-case basis. The latter approach may be more appropriate for titles such as "The Field".

    Use of "&" rather than "and", and certain standard abbreviations might make some of the entries less verbose.

  • Year Index

    Although superficially merely duplicating information in the Publication Details field, the Year field has proved ideal for producing a date index. It is much simpler to use this field than to try to find dates by parsing the Publication Details field.

    Currently, this field only includes publication or collection dates. There are other dates in titles and comments which it would be useful to include too.

  • Short Title Keyword Index

    A separate program was used to compile this index. This is the reason why the lists of reference numbers under the terms are more out of order than in the other indexes.

    The indexing program looked for terms rather than individual words. Rotated index entries were made for each word in multi-word terms.

    Words from the Short Title were kept together as one term provided they were not interrupted by certain punctuation marks (e.g. commas and double quotes) or by stopwords. In technical terms, these punctuation marks and stopwords are called separators.

    The stopword list consisted of "trivial" words such as "an", "from", "or" and "to", as well as some terms which would have had too many entries to be usable - e.g. "Helm", "Letter", "Notes", etc. I was tempted to include "Dance" and "Play" in the stoplist but did not. A full list of stopwords is given in the appendix.

    In addition to stopwords, terms beginning with numbers were excluded.

    Stopwords caused problems with song and dance titles, and with some locations - e.g. "Isle of Wight" is indexed unhelpfully under "Isle" and "Wight". In special cases such as these, we could tie the stopwords to preceding words to avoid their loss - e.g. "Isle_of Wight" or "Old Woman Tossed_up_in_a Blanket". In this case, the underscore character provides a temporary link which would be removed when it came to printing out.

    The index currently includes single-letter abbreviations because "E.", "W.", etc., are used in some place names (e.g. "E. Kent"). The result is that numerous personal names appear in the index under their initials as well as their surnames. Expanding these abbreviations to "East", "West", etc., would remove the need to index single letter abbreviations, thus eliminating most of the unnecessary entries. (Of course entries for full forenames would still remain - e.g. "Leslie Haworth".)

    County names and abbreviations are sometimes inconsistent. For instance, there are instances of "Hampshire", "Hamps." and "Hants." This is in addition to the previously mentioned problem of the inconsistent use of full stops in abbreviated forms.
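
The term extraction described for the Short Title Keyword Index can be sketched roughly as follows. This is not the original indexing program. It assumes that commas and double quotes are the only separator punctuation, uses only a small subset of the appendix stopword list, and guesses at the layout of rotated entries (the leading words moved to the end after a comma). All function names are hypothetical.

```python
import re

# A small subset of the appendix stopword list, for illustration only.
STOPWORDS = {"AN", "FROM", "OF", "OR", "TO", "HELM", "LETTER", "NOTES"}

def extract_terms(short_title):
    """Split a Short Title into terms, breaking at separators
    (commas, double quotes and stopwords)."""
    terms = []
    for chunk in re.split(r'[,"]', short_title):
        current = []
        for word in chunk.split():
            if word.upper() in STOPWORDS:
                # A stopword ends the current term and is itself dropped.
                if current:
                    terms.append(current)
                current = []
            else:
                current.append(word)
        if current:
            terms.append(current)
    # Terms beginning with a number are excluded.
    return [t for t in terms if not t[0][0].isdigit()]

def rotations(term):
    """One entry per word of the term, with that word brought to the front."""
    entries = []
    for i in range(len(term)):
        head = " ".join(term[i:])
        tail = " ".join(term[:i])
        entries.append(f"{head}, {tail}" if tail else head)
    return entries

def index_entries(short_title):
    """Rotated index entries, with the temporary underscores removed for printing."""
    entries = []
    for term in extract_terms(short_title):
        entries.extend(rotations(term))
    return [e.replace("_", " ") for e in entries]

print(index_entries("Isle of Wight"))   # ['Isle', 'Wight']
print(index_entries("Isle_of Wight"))   # ['Isle of Wight', 'Wight, Isle of']
```

The two calls at the end show the effect of tying a stopword to the preceding word: without the underscore the term is split at "of", whereas with it the place name survives as a single term and the underscore disappears on output.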

Summary of Possible Improvements

  1. A matter of principle needs to be decided. Should the inventory faithfully record what is actually written in the Helm Notebooks, or should we correct, complete or enhance the information when necessary? For instance, where Helm has an incorrect or incomplete bibliographic reference, should we correct the error and/or add the missing information if it is known?

    My own view is that we should amend details when appropriate, but retain the original form as much as possible. With significant changes, an explanation should be added to the comments, together with the initials of the person making the change.

  2. Cawte has suggested several improvements to the numbering system - e.g. to distinguish between seemingly identical items on the same page. It should be possible to accommodate most of these, provided that they are concise.

    For retrieval purposes, there is probably no need to distinguish between similar items on the same page, but it could be important when it comes to citing items from the collection.

    Using lower case letters rather than capitals to indicate "recto" & "verso" page references would be easier on the eye, and occasionally save space in indexes. The same could apply to "A", "B", "C", etc.

  3. The remaining mixed upper and lower case errors should be identified and corrected.

  4. I agree with Cawte's suggestion that occurrences of "Helm Notes on..." should be reduced to read "Notes on..."

  5. In the Short Title field (or any other field in which stopwords are omitted from indexes) certain stopwords should be tied to preceding words using underscore characters to preserve meaning in index terms - e.g. "Isle_of Man".

  6. In the Short Title and Comments fields, the abbreviations "N.", "S.", etc., should be expanded to "North", "South", etc.

  7. In the Short Title and Comments fields, county abbreviations and/or names should be standardised, as should the use of full stops in abbreviated forms.

  8. It is unfortunate that Ervin Beck chose not to include initials with the personal names that he quoted in Short Titles. Out of context in the indexes, this often makes it impossible to distinguish between personal names and place names - e.g. "Fenton". It would be useful if initials were added when available, as well as missing full stops.

  9. Cases where the full stops have been lost from initials in the Author field need to be rectified.

  10. Possibly, the form of authors' names should be standardised for each person.

  11. We need to introduce consistency in the use of initial "the"s in journal titles - either case-by-case for individual journals or for the whole list.

  12. Dates occurring in the Short Title, Title and Comments fields should be added to the Year field where they are not already present.

  13. The Comments field contains all sorts of information, which can be divided into two groups: (a) information which it is not worth indexing - e.g. cross references, paginations, notes on bibliographic accuracy, etc., and (b) items which it would be useful to index - e.g. lists of dance tunes, names of informants, etc.

    Some of the indexable items could be included in the Short Title or added to Publication Details. However, we should consider having a separate field for added indexing terms. This would be a more appropriate way of handling, say, the longer lists of songs and dance tunes.
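
As a starting point for item 3, the rudimentary case conversion described under The Main Inventory, followed by a correction pass, might look something like the sketch below. It is an illustration under assumptions, not the program actually used: the function names are hypothetical, and the two word lists contain only the examples quoted in these notes, so they would need extending in practice.

```python
import re

# Only the examples quoted in the notes; both lists would need extending.
MINOR_WORDS = {"And", "Of", "To"}
ABBREVIATIONS = {"Cr", "Efdss", "Fls", "Vwl"}

def naive_mixed_case(text):
    """Keep the first character of each string of letters as a capital
    and convert the rest of the string to lower case."""
    return re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), text)

def tidy_case(text):
    """Naive conversion, then restore known abbreviations to capitals and
    put known minor words into lower case (except at the very start)."""
    def fix(match):
        word = match.group(0)
        if word in ABBREVIATIONS:
            return word.upper()
        if word in MINOR_WORDS and match.start() > 0:
            return word.lower()
        return word
    return re.sub(r"[A-Za-z]+", fix, naive_mixed_case(text))

print(naive_mixed_case("LETTER TO EFDSS AND FLS"))  # Letter To Efdss And Fls
print(tidy_case("LETTER TO EFDSS AND FLS"))         # Letter to EFDSS and FLS
```

Any word not in either list is left as the naive conversion produced it, which is why the remaining instances still need to be found and corrected individually.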

Appendix - Indexing Stopword List

&
+
-
--
---
----
A
ABOUT
ABOVE
ACTED
ADDED
AGO
ALEX
ALL
ALSO
AM
AN
AND
ANYONE
APP
APP.
APPEAR
APPENDIX
ARE
AS
AT
BACK
BE
BEFORE
BETWEEN
BY
C
CF
CF.
CIRCA
CITES
CO
CO.
COLLECTED
COME
COMMUNICATED
COMPARE
COMPARED
CONT
COPIED
COPY
CORRECTLY
CR
CR.
DATED
DATES
EARLIER
ED
ED.
END
ENDING
ETC
ETC.
EXTRACT
EXTRACTED
EXTRACTS
FOOTNOTES
FOR
FORMERLY
FORWARDED
FROM
FURTHER
GIVEN
HELM
HERE
HIM
I
IDENTICAL
ILLEGIBLE
IN
INCLUDED
INCLUDES
INDECIPHERABLE
INSERT
INTENDED
IS
ITEM
ITEMS
ITS
LABELLED
LAST
LETTER
LETTERS
LIKELY
MANY
MISC
MISC.
MOST
MOSTLY
MS
MS.
MSS
MSS.
NEAR
NO
NO.
NOT
NOTE
NOTES
NR
NR.
OF
ON
ONLY
OR
OTHER
PAGE
PAGES
PER
PERFORMED
PERHAPS
PLUS
PP
PP.
PT
PT.
QUOTED
QUOTES
QUOTING
RECEIVED
RE
REF
REF.
REFERENCE
REMAINS
REPEATED
REPLY
SAME
SEE
SEEN
SENT
SIC
SIMILAR
SOME
SOURCE
STIT
SUPPLIED
TAKEN
TH
THE
THEIR
THIS
THOUGHT
TO
TWO
TYPED
TYPESCRIPT
UNDECIPHERABLE
UNIDENTIFIED
UNREADABLE
UNSIGNED
UNSPECIFIED
VARIANT
VARIOUS
VERBATIM
VERSION
VERSIONS
VIA
VOL
VOL.
WAS
WHEN
WITH
WRITTEN