Write up on tech specifications of grand Database Comics

274738704_374401941331566_8704061855509051261_n

https://www.comics.org/

Case Study: Comic Book Inventory (CBI)
CBI came up with a solution that provides their mobile app users with GCD cover images without scraping the whole GCD site or relying on bulk image distribution. When a CBI user looks up a specific comic, the CBI app checks to see if the cover image is available locally, first on the user’s device, and then on the app’s back-end servers. If the image is not found there, the CBI app server will download the requested image from the GCD web site. CBI is simply archiving the downloads preventing the app from downloading the same covers repeatedly. For series, the name as appears in the indicia takes precedence and in general series are split when their title changes. This might mean that users might not find all the books they’re looking for in the same series or under the title they first try to search for

  • The GCD advanced search interface is somewhat inflexible and, for example, punctuation marks might cause a search to fail. We recommend starting from our “Everything” full-text search and drilling down to the desired data by applying relevant filters. We encourage you to create a better search system inside your app if appropriate.
  • Publishers are a complicated topic and our model, while comprehensive, might not match exactly what you have in mind – please see the User’s Guide to Publishers

Public and Private Tables

The GCD schema consists of public and private tables. The public tables contain the visible data records (except for cover information) and are available to anyone in the public dumps. A normal website account is required to access the data dumps. The GCD requires that any use of our data from the dumps be credited to us. On a web site, this should be done on each page where the data is used, and a link should be provided to our site. Commercial use is welcome as long as credit and (on the web) links are provided. For questions about using the dump please contact the Board.

The private tables consist of three types of tables:

  • account data (private for the obvious reasons)
  • data editing tables (private to avoid sending out unapproved data- we’d like to make the change history public at some point if folks want it)
  • the covers table (private for legal reasons involving the distributions of cover scans and copyright law).

Note: Please contact the Board if you have concerns about the policy regarding covers. Please do not contact the tech team as this is not a technical policy and complaining to them only wastes both your time and theirs.

A Bit of History

The original organization of the GCD data was into two files–or, more accurately, into a Series file (later two and then four Series files) and several dozen detail files. The two types of files were normalized by Lionel English in the mid-1990s in order to put them in to a real database, and then the two normalized files were further normalized in order to create the Publishers table from the Series table, and an Issues table from the Details table. During the server migration and site rewrite that took place during August-December 2009, Henry Andrews, Jochen Garcke and the rest of the tech team changed the database to a system supporting true foreign keys and transactions, and imposed a consistent naming scheme on the tables and columns. Several new public tables, as well as an entirely new set of data editing tables, were added at this time. The documentation for the pre-2009-migration schema is still available for those curious about it.

Future Plans

We’re progressing towards our real “1.0” release of this generation of the site, code-named “New Fun”. Much discussion on the lists references the “New Fun” schema which supports a much more complex and realistic view of the data. Please join the gcd-tech group if you would like to participate in that discussion. For the curious, the December 2009 release was 0.2 “Vieux Bois”.

Supporting Tables

There are several supporting tables necessary to use the data which lay out fixed sets of options for certain fields. They are:

  • gcd_country
    • Has columns idcode and name, and provides the set of countries used by the various publisher and series tables. The code column is the ISO 3166 code for the country, and should arguably be used as the foreign key, but due to the very rushed development timeline of the December 2009 release (as the prior server was no longer viable), we went with the standard numeric id column for reasons which seemed like a good idea at the time, but involved a great deal of sleep deprivation 🙂 We’ll probably revisit this eventually.
  • gcd_language
    • Has columns idcode and name, and provides the set of languages used by the series table. The code column is the ISO 639 code for the language. The two-letter ISO 639-1 code is used if available, three letter code from 639-2 or 639-3 otherwise. This might get standardized into 3-letter codes or 4-letter codes if we need to take ISO 639-6 into account once its code list is published. This table has the same foreign key oddness about code vs id as the gcd_country table, although the possibility of changing the codes makes it more sensible to stick with id for now.
  • gcd_story_type
    • Has columns id and name, and provides the set of types used by the story table. The type field is somewhat controversial in that its values actually cover various different attributes of story sequences. However, until late 2009 there was no way for the GCD to change the field (and it was in fact a plain text field within the story table) so discussions on changing it were set aside. This will be one of many things revisited along the path to the 1.0 New Fun release.

Publishers

The publisher table held two sorts of records: “master publishers” and “imprints”. Both of these concepts are less precise than one might expect.

“Master publishers” are the common names of publishers as typically grouped by comic book researchers. As this vague definition suggests, there is considerable room for disagreement over what constitutes a master publisher and how such entities should be named. This field is often referred to as the “argue ’till we’re blue in the face” field. Part of the current definition of a series in the GCD is that all issues within the series are published by the same master publisher. This has led to the grouping of some companies together under a single master publisher because one took over several series from the other. Debate on the proper arrangement and definition of master publishers is ongoing. A master publisher is a record that has the is_master field set to 1. The name of the publisher is chosen primarily based on what is most likely to be expected by a user running a search.

As of late 2009, “imprints” were considered a deprecated concept and as of July 2013 they were replaced by indicia publishers and brands (covered in their own tables below). The concept of an imprint may return in a more well-defined form in the future for formal imprints of publishing houses. The data records in this table that were considered “imprints” served as a catch-all for several different publishing concepts, often requiring long semi-colon separated lists as the field is set per-series but the data often changes per-issue. An imprint was a record that had a non-NULL parent_id field. There should be no more of these.

Note that there are a very small number of records that were both master publishers and imprints. Explaining those records is beyond the scope of this documentation.

gcd_publisher [apps.gcd.models.Publisher class in the code]
Column NameDescriptionUsage Notes
Core publisher data fields:
idDB-generated unique, primary key.
nameName of the master publishing “family”See the discussion above for how this name is chosen
year_beganYear (by cover date) of first publication.
year_endedYear (by cover date) of last publication, if any.
notesSpace for arbitrary notes.
urlThe URL for the publisher’s website.Includes the protocol and leading slashes (e.g. ‘http://’).
Relations to other tables and counts of related objects:
imprint_count (deprecated)Count of how many imprints a given master publisher has.Recalculated by the code on data updates. Should all be 0 as of July 2013.
brand_countCount of how many brands a given publisher has.Recalculated by the code on data updates.
indicia_publisher_countCount of how many indicia publishers a given publisher has.Recalculated by the code on data updates.
series_countCount of how many series this publisher publishes.Recalculated by the code on data updates.
issue_countTotal number of issues in all series published by the publisher.Recalculated by the code on data updates.
is_masterBoolean field (1 or 0 in MySQL).See discussion of master publishers above.
country_idThe country from which the publisher operates.Foreign key into the gcd_country table (id column).
parent_idSelf-join key for parent/child relationships between master publishers and imprints.NULL for non-imprints.
Change tracking fields:
reservedIf true (1 in MySQL, as opposed to 0 for false), there is an active change being made in the data editing tables.
created
modified

Brands

Brands represent emblems (graphic logos or text) that appear on the front or back covers or on the spine of an issue / book and identify the publisher in some way. While intended to be a fairly objective field, there is still considerable controversy over how to define brands and when to consider two similar emblems separate. The current working definition is that changes in the text of the emblem mean that the two emblems are different brands. Changing only graphic elements generally does not create a new brand. But there are exceptions to these rules depending on the nature of the text or whether there was a change in the common name of the brand / publisher / imprint accompanying a purely graphical change.

Brand effectively identifies publisher imprints as well as publisher branding other than imprints. However, it is not always a perfect match for imprints so a more formal imprint field / table may be introduced in the future.

Brands are only intended to be used for the publisher’s brand. Distributor marks are not tracked here (they will be given their own field/table in a future release of the site), nor are brand emblems from partners, sponsors, etc. [for instance, a TV network’s name or logo if the issue ties in to a TV show]).

The brand concept may change over the next few months, or may be given a new name in the UI. It is a new field, and therefore not filled out on most issues at this time.

gcd_brand [apps.gcd.models.Brand class in the code]
Column NameDescriptionUsage Notes
Core brand data fields:
idDB-generated unique, primary key.
nameName of the brand, generally the text of the emblem. If there is no text or it does not clearly identify the emblem, the common name (if any) may be used, or a brief description of the emblem may be used.
year_beganYear (by cover date) of first publication.May be NULL If unknown.
year_endedYear (by cover date) of last appearance, if any.May be NULL if unknown or still in use.
notesSpace for arbitrary notes. Should include any description of the emblem or emblems involved in the brand to aid identification.
urlThe URL for the brand’s website, if and only if it is distinct from the parent master publisher (this is common when the brand is a publisher’s imprint).Includes the protocol and leading slashes (e.g. ‘http://’).
Relations to other tables and counts of related objects:
issue_countTotal number of issues in all series published by the publisher.Recalculated by the code on data updates.
parent_idForeign key to the gcd_publisher table (id column)..An issue can only be linked to a brand that is linked to the same master publisher to which the issue’s series is linked.
Change tracking fields:
reservedIf true (1 in MySQL, as opposed to 0 for false), there is an active change being made in the data editing tables.
created
modified

Indicia Publishers

Indicia publishers are the companies, identified by their legal names, that are actually recorded within the comic (in the indicia if the comic has such a thing) on an issue-by-issue basis. This is pretty much the only publishing field that is objectively clear, as master publisher relies on grouping these companies by somewhat variable rules and sometimes tenuous speculation, while brand is controversial in a number of ways as described above. Unlike master publisher, it is linked at the issue level and therefore a series can (and in certain countries and time periods, often does) span many indicia publishers.

This is a new table so it is not filled out for many issues yet.

gcd_indicia_publisher [apps.gcd.models.IndiciaPublisher class in the code]
Column NameDescriptionUsage Notes
Core indicia publisher data fields:
idDB-generated unique, primary key.
nameName of the company exactly as it appears in the indicia, including abbreviations, punctuation, etc. (e.g. “Co.” and “Inc.”).On rare occasions minor variations in the name may be grouped as one indicia publisher. This should be explained in the notes field.
year_beganYear (by cover date) of first publication.May be NULL if not known.
year_endedYear (by cover date) of last publication, if any.May be NULL if not known or if the company is still active.
notesSpace for arbitrary notes.
urlThe URL for the company’s website, if and only if distinct from the master publisher website.Includes the protocol and leading slashes (e.g. ‘http://’).
Relations to other tables and counts of related objects:
issue_countTotal number of issues in all series published by the company.Recalculated by the code on data updates.
country_idThe country in which this company is incorporated.Foreign key into the gcd_country table (id column).
parent_idForeign key to the gcd_publisher table (id column).An issue can only be linked to an indicia publisher that is linked to the same master publisher to which the issue’s series is linked.
is_surrogateIf true (1 in MySQL, as opposed to 0 for false), this company is not properly part of its linked master publisher (exactly what “part of” means is somewhat variable). Instead, this company is an otherwise unrelated company that published material on behalf of the parent master publisher. This was common in the United States during World War II and explains a number of otherwise puzzling issues.
Change tracking fields:
reservedIf true (1 in MySQL, as opposed to 0 for false), there is an active change being made in the data editing tables.
created
modified

Series

A series in the GCD is officially defined by the indicia title and publisher – when either of these changes a new series is created. Minor changes in the indicia title (such as the presence or absence of an article, i.e. “The”) may be allowed within one series, especially if the article comes and goes. Some exceptions exist, such as Dell’s Four Color (2nd series) which is commonly grouped in guides but actually uses a different indicia title for nearly every issue.

A series may be split due to a particularly long lapse in its publication schedule (on the order of years) or due to certain kinds of renumbering, but the exact rules in this area are still being debated.

gcd_series [apps.gcd.models.Series class in the code]
Column NameDescriptionUsage Notes
Core series data fields:
idDB-generated unique, primary key.
nameSeries title, as it appears in the indicia (if any, as on the cover if there is no indicia), with any leading article moved to the end after a comma.The article-moving rule applies only if the title is in the language in which the series is written.
formatPhysical format of the series (size, color presence or absence, paper stock, etc.).Since format is really an attribute of the issue rather than the whole series, this field may contain a fairly long description of how the format changed over the life of the series. In the “New Fun” schema, this is being replace by a number of distinct fields, which individually describe the constituent attributes commonly listed here.
year_beganFirst year (by cover date) of publication.
year_endedLast year (by cover date) of publication, if any.
is_currentSet to true (1 in MySQL, as opposed to 0 for false) if new issues are still being produced for this series.This is a new field and may not be properly set for all series yet.
publication_datesFirst and last full cover publication dates of the series, separated by a hyphen (i.e. ‘-‘). No last date if still ongoing.This field is being replaced by the first_issue_id and last_issue_id fields. It is not currently possible to edit it from the OI, so errors are not being actively corrected. Once all data is migrated to the new field, this one will be dropped.
tracking_notesNotes about how a given series is continued in or continued from another series.See the [Formatting Documentation] for how these notes should be structured. This will eventually be replaced with foreign key fields.
publication_notesFormerly the place where arbitrary publication details could be recorded. Now considered deprecated.This predates the imprint field, which predates the issue-level indicia publisher and brand fields. Notes here that are not migrated to those fields will most likely be merged into the general notes field and this field will be eventually dropped.
notesSpace for arbitrary notes.
has_galleryIf true (1 in MySQL, as opposed to 0 for false), the series has at least one cover image.Updated by the code as covers are added/changed/removed.
Relations to other tables and counts of related objects:
issue_countNumber of issues associated with the series.Recalculated by the code when the data is updated.
first_issue_idForeign key to the first issue of this series in the gcd_issue table (id column).Updated by the code when issues are added, deleted or moved. This is a new field and is not always properly set, nor is it yet used in the UI.
last_issue_idForeign key to the last issue of this series in the gcd_issue table (id column).Updated by the code when issues are added, deleted or moved. This is a new field and is not always properly set, nor is it yet used in the UI.
publisher_idForeign key into the gcd_publisher table (id column).Should only ever reference a master publisher record (enforced at the code level).
imprint_idForeign key into the publishers table (id column) for the deprecated “imprint” concept.This is being set to NULL once its information has been migrated to the brand_id and indicia_publisher_id on the issues for the series.
country_idForeign key into the gcd_country table (id column).This is properly an attribute of the indicia publisher (linked to the issue table) rather than the series, and will eventually be dropped once that field is sufficiently populated.
language_idForeign key into the gcd_language table (id column).The language in which the comic is written. For comics that are purely visual, there is an ISO code for “no linguistic content”.
Change tracking fields:
open_reserveObsolete field that we forgot to drop during schema migration.This will be dropped next time we change the schema, and should be ignored.
reservedIf true (1 in MySQL,as opposed to 0 for false), there is an active change being made in the data editing tables.
created
modified

Issues

Issues are what you would expect. They include both periodical issue (pamphlets) and bound volumes / collections, currently with no distinctions between the two. The “New Fun” schema will add classifications so that we can treat these items differently. As it is, books are sometimes awkwardly wedged into fields designed for periodicals, but it mostly works.

One notable exception to the sensibility of this table is that issue variants do *not* each get separate records at this time. A future schema (possibly “New Fun” but possibly a release some time after that) will address this inconsistency. For now, variant covers are recorded by adding multiple cover sequences to the story table for the issue. There are some exceptions to this arrangement when more than just the cover varies.

gcd_issue [apps.gcd.models.Issue class in the code]
Column NameDescriptionUsage Notes
Core issue data fields:
idDB-generated unique, primary key.
numberIssue number or other identifier from the indicia (if there is one) or cover (if there is no indicia) or in some cases both (if there are both and they disagree).See Issue Numbers for numerous special cases of missing / dual / annual / per-volume or other sorts of “numbers”. Note that this is a text field rather than a numeric field because many issue “numbers” are partially or entirely non-numeric, i.e. “Summer Special”.
volumeVolume number from the indicia (if there is one) or cover (if there is no indicia).See Volume While this field has been present for a long time, it has only recently been displayed on the site. Therefore the data is not present as often as for other fields.
display_volume_with_numberIf true (1 in MySQL, as opposed to 0 for false), combine the volume number with the issue number in most places that the issue number would be displayed. This produces numbers like “v2#1” (although in many indexes that notation is still used directly in the issue number field – this will eventually be corrected). See Volume and Issue Numbers.
no_volumeIf true (1 in MySQL, as opposed to 0 for false), there is no volume number for the issue. This is used to confirm the meaning of an empty volume field since that field was ignored for many years and therefore is often empty when it should not be. See Volume and Issue Numbers.
sort_codeNumeric field that determines how issues sort within each series. There is a uniqueness constraint on the combination of this column and series_id.This replaces the old cover sort code and is now the sole source of ordering for issues within a series. key_date (which is not always known) is now only used for chronological search results.
key_dateSpecially formatted date string that (in theory) determines the ordering of issues in search results that are organized by date.In practice, many issues are missing this field, meaning that they will just sort to the beginning of any chronological list. This is a data problem, not a coding error. For reliably sorting issues within a series, see sort_code. Also see Keydate.
publication_datePublication date of the issue as given in the indicia, but fully spelled out.See Publication Date.
indicia_frequencyIf the issue has an indicia and it specified a publication frequency (monthly, quarterly, one-shot, etc.) that is recorded here.This is a text field rather than a foreign key into a set of options because of frequencies that are totally non-standard (common in alternative/underground publications) or irregular in such ways as “monthly except in September and December”. This is a new field so it is sparsely filled out at this time.
priceA decimal price followed by the ISO code, i.e. 0.10 USDSee Cover Price for details, including multiple prices and non-decimal currencies.
page_countOverall page count for the issue, counting covers but not inserts or dust jackets.A single sheet with one fold (the smallest possible “comic book”) would count as 4 pages. See Page Count.
page_count_uncertainIf true (1 in MySQL, as opposed to 0 for false), the page count is unknown (if page_count is NULL) or could not be reliably determined due to second-hand information or missing pages.This field allows the page_count field to be of numeric type. Previously, it was text so that a “?” could be appended for this purpose.
editingEditor for the whole issue, or for the overall anthology if there are different editors for each story in a collection. Also used for other issue-level credits such as assistant/associate editor, publisher, production, special thanks, etc.See Credits and Editing for how credits are structured.
no_editingIf there is no editor (or similar credit) at the issue level, this field should be set to true (1 in MySQL, as opposed to 0 for false). Distinguishes from an unknown editor or from the field simply not having been filled out.
notesSpace for arbitrary notes about the entire issue.Because the issue-level notes were originally (by convention) placed in the notes field of the first cover sequence (in the story table), the only reasonable option was to duplicate those notes into the issue table during migration. We are working through these duplicates, but for now it is very common to see identical notes in the issue record and in the sequence number 0 story record.
Fields relating to the other tables:
indicia_publisher_idForeign key into the gcd_indicia_publisher table (id column, can be NULL for no indicia publisher or unknown indicia publisher [data not entered yet]).Indicates the actual company that published the issue as shown in the indicia or other formal publication data. This is a new field and therefore is sparsely filled out at this time.
brand_idForeign key into the gcd_brand table (id column, can be NULL for no brand or unknown brand [data not entered yet]).Indicates the publisher’s logo / name / tagline on the exterior (front cover, back cover or spine) of the issue. There is still a fair amount of debate about what constitutes a separate brand (how much change in the logo appearance and/or text). This is a new field and therefore it is sparsely filled out at this time.
no_brandIf true (1 in MySQL as opposed to 0 for false), there are no brand markings on the issue.If this field is false and brand is NULL, the UI will display a question mark indicating that the branding is unknown.
series_idForeign key into the gcd_series table (id column).
story_type_countNumber of stories of type “story” linked to this issue.Used to determine whether an issue is “indexed” or not. Currently, an issue is indexed if there is at least one entry in the story table of type “story” (to prevent the existence of a story of type “cover” or other type from making the issue look complete). In the future we may also implement tracking of the expected number of stories to make this measurement more precise. Calculated whenever an issue’s stories are updated.
Change tracking fields:
index_statusObsolete field kept to help debug potential migration problems (so far, so good…).This field will be dropped the next time we update the schema, and should be ignored.
reserve_statusObsolete field kept to help debug potential migration problems (so far, so good…).This field will be dropped the next time we update the schema, and should be ignored.
reserve_checkObsolete field kept to help debug potential migration problems (so far, so good…).This field will be dropped the next time we update the schema, and should be ignored.
reservedIf true (1 in MySQL, as opposed to 0 for false), there is an active change being made in the data editing tables.
created
modified

Stories

The story table would more accurately be called the sequence table, and we plan to eventually rename it. It records the contents of an issue, whether they are actual sequential art stories, text stories, covers, advertisements or any number of other sequence types. Only sequential art stories, covers and text stories are technically required to “complete” an index, but cover-to-cover indexes are appreciated.

gcd_story [apps.gcd.models.Story class in the code]
Column NameDescriptionUsage Notes
Core story data fields:
idDB-generated unique, primary key.
sequence_numberOrder of the story within the issue.See Sequence Number. Sequence zero is no longer treated specially (although it still represents the front cover, or at least one variant of it).
titleThe title of the story, or a quote from the beginning of the script if there is no title.See Title.
title_inferredTrue (1 in MySQL, as opposed to 0 for false) if the indexer chose the title either directly or by using the first line of the script.This causes the UI to display the title with [square brackets] around it.
featureThe name of the feature, if any. Typically as from the splash or title page, but with some exceptions.Often but not always the name of the primary character. Some stories do not have a feature. See Feature.
page_countNumber of pages. One side counts as one page.See Page Count.
page_count_uncertainTrue (1 in MySQL, as opposed to 0 for false) if the page count is unknown or uncertain.This is a separate field so that page_count could be a numeric field instead of varchar.
notesArbitrary notes from the indexer.See Notes.
reprint_notesReprint information (both reprint from and reprinted in).See Reprints for the extremely detailed formatting requirements for these notes. This field will be replaced with database links in release 0.5 “Kin-der-kids”.
type_idForeign key to the data_story_type table (id column). Type of “story”, or more accurately sequence.Actual comic story, text story, advertisement, or letters page, etc.
job_numberJob Number
Fields for creator credits:
scriptWriter, scripter and/or (co-)plotter credits.See Credits and Script.
no_scriptTrue (1 in MySQL, as opposed to 0 for false) if there is no writing, such as for a single panel illustration with no text (pin-up) or a simple promo that shows the next issue’s cover with no significant text (lines like “Continued next issue” do not count as significant text).Note that the absence of dialog or narration does not mean there is no script. If there is a sequence of events depicted in the artwork, that constitutes a “script”.
pencilsPencils or other primary artwork (painting, photography, etc.) or layouts / breakdowns and finishing pencils.See Credits and Pencils.
no_pencilsTrue (1 in MySQL, as opposed to 0 for false) if there is no artwork, for instance a text story with no illustrations or a plain text recap page with limited visual elements (such as a solid color background, which is more of a production choice than the sort of artwork credited in these fields).
inksInks or other artwork that completes the primary artwork. Painting / photography should be credited both here and in pencils an possibly colors.See Credits and Inks.
no_inksTrue (1 in MySQL, as opposed to 0 for false) if there is no inking or similar finishing task that applies to the artwork, or there is no artwork.
colorsColor added to non-colored artwork, or similar image processing. Painting / color photography should be credited both here and in pencils, and possibly inks.See Credits and Colors.
no_colorsTrue (1 in MySQL, as opposed to 0 for false) if there is no colorist to credit, for instance because the sequence is printed in black and white, or if there is no artwork.
lettersLettering. For text not designed/produced by a letterer, use “typeset” (for instance typewritten text stories word processor output not done in word balloons or captions).See Credits and Letters.
no_lettersTrue (1 in MySQL, as opposed to 0 for false) if there is no letterer (or “typeset”) for the sequence. Covers with no significant text beyond the title and standard promotional taglines should set this to true.
editingDetails regarding editing for this individual story, as opposed to the editing notes for the whole issue.Rarely used except for sequence 0, as the editor for the book is typically assumed to have edited the whole book. However, this field can be used if individual stories have separate editors such as certain original anthologies or collections covering diverse material.
no_editingTrue (1 in MySQL, as opposed to 0 for false) if there is no editor to credit for this story (as opposed to the single editor that commonly edits the whole issue).
Fields related to content:
charactersCharacters appearing should be populated in this field. Sometimes only recurring characters, but sometimes including things like “unnamed boy” or “Nazis”.See Character Appearances. The formatting rules for this field are extensive, but have been changed several times and multiple sets of rules are currently in active use.
synopsisbrief synopsis of the content. The purpose of the field is to aid in identifying the story, not to provide a plot summary to replace reading the whole thing. Do not use text from other sites or sources as it may be assumed to be under copyright.See Synopsis.
genreOne or more genres, separated by semicolons.Free-form, although there is an Official Genres List. See also Genre.
Fields relating to the other tables:
issue_idForeign key into the issues table (id column).
type_idForeign key to the data_story_type table (id column). Type of “story”, or more accurately sequence.Actual comic story, text story, advertisement, or letters page, etc.
Change tracking fields:
reservedIf true (1 in MySQL, as opposed to 0 for false), there is an active change being made in the data editing tables.
created
modified

Category

Introduction to Indexing

What is “Indexing”? Indexing is the primary way of contributing data to the GCD. For each comic, an indexer records at minimum the issue’s number, date, price and page count as well as the types, titles and credits for each story and cover. A wealth of additional information, including character appearances, genres and plot synopses, may optionally be entered.

Indexing is done through our Online Indexing system (OI) of web forms by logging in through the front page or the login link on the search bar on each page. An alternative method, known as Indexing Offline, involves uploading text files (a.k.a. “Flat Files”) in place of using the web forms.

Important Note: Contributors cannot copy existing, published indexes without the permission of the copyright holder. Copyright laws and good manners prohibit such actions. If you have obtained permission, you must note the source of the material in your index. Cover scans from other sources (web sites or compilation CD’s) are not allowed unless you have obtained their permission. Cover scans with watermarks are never allowed. All efforts should be made to scan your own covers. See the Covers_FAQ or check with gcd-main if in doubt.

What information can I add or change?

Indexers are encouraged to contribute whatever information they can, as even incomplete indexes are valuable. Indexers may also fill in missing information on existing indexes or change information they know to be in error. The more information, the better! While we have a definition for “complete” indexes, that really only covers the basics. Indexes are rarely truly complete, as a fully-detailed cover-to-cover account of a comic is a lot of work! A “complete” index has, at minimum: Titles, page counts, credits (even if unknown) and features for each of the stories (either text or comic) and for the cover of the comic. It does not need to have letter pages, promos or ads, though these are definitely desirable. It also doesn’t have to have characters, synopses or notes about reprints, though these are definitely desirable too. Note: The exact definition of a minimally “complete” index is currently under discussion on gcd-policy.
Sometimes an object cannot be edited or deleted because other objects in the database depend on it existing, or because other changes are being made that might conflict with a new edit or invalidate the reason for deleting it. Please refer to Cannot Edit or Delete for more detail.

How do I learn how to index?

The OI Tutorial will guide you through the basics. For more details (and the official rules on what should and shouldn’t be entered in each field) the most up-to-date documentation is the Formatting Documentation on this wiki. There are some older documents outside this wiki that you might find if we haven’t hunted down all the old links yet. In the event of a disagreement, the wiki wins.  🙂

What happens after I submit my index?

After you submit the index, an editor will check the data for errors and proper formatting, and then approve the index (usually within a week). If there is a problem, the index may be put back in your queue with comments and you may be contacted by e-mail for comment. Your editing queue is accessible by clicking on the “Editing” link on the top and bottom search bars after you are logged in. Do not be discouraged if you get back a list of stuff to be corrected. The editors can be a picky bunch, and they have to be. We need the data in the database to be consistent for future uses and for searching. After the index has been approved, you may use the GCD search form at the top of every page to check your index to see if any errors got missed. If there are any errors, please re-reserve the comic and correct it yourself, or file an error report using the link that’s on every page.

How do I contact an Editor with questions about my index?

The easiest way to contact an Editor is to put a Comment in the issue you’re working on, if it applies to that specific issue. If you have a more general question, a good method is to post the question on the gcd-main list. Most of the Editors take part in the discussions there and someone is very likely to answer your question.

How do I get started as an indexer?

First you need an Online Indexing account. There is a link on the front page to [register] through the website. It is in the Login box on the left side of the page.

The information you will be asked to provide is:

  • Your email address. It will be used as your user name.
  • Your password
  • Your name that you would like to be credited by. Preferably your real name.
  • Your country
  • The languages you read (multiple languages can be selected)
  • Your comic book interests (not necessary, but we like to know a bit about you so we can pair you up with an editor with similar interests if possible)
  • Agree to our Privacy Policy
  • Whether or not you wish to see all content for an issue, with the full view accessible via a link. There are three options, all contents (including ads), editorial content (all sequences except promos and ads), and core content (covers and stories).
  • You can create a custom display by selecting all content, then selecting which sequences you would like to hide (No show sequences). These sequences still show in the Table of Contents and can be accessed by click on their entry there.
  • You can also hide the Cover Letterer credit display if no creator is credited.
  • Where you heard about us (not necessary, but information that may be helpful to our Public Relations Committee)
  • Whether or not you wish to receive e-mails for changes you submit that are approved without a comment. You will always receive an email if the approver comments.
  • Whether or not you wish the change comparison page for an issue to show a shortened version of the unchanged values, with the full view accessible via a link.
  • Whether or not you wish to see links to this wiki while you are indexing. The “Wiki Links in OI” box near the bottom of your Profile page (not shown in the image below) provides a link to the appropriate topic as a small [?] at the top of each data entry field. You can set this either on or off in your Profile.
  • Report – Comic Strip Committee
  • Jump to navigationJump to search
  • We would like to build a newspaper comic strip database.
  • Any and all strips and cartoons from any and all local or national newspapers, including college/university newspapers and alternative newspapers, from any country, are eligible for inclusion.
  • Data
  • We would like to be able to search and index from any of the following perspectives, and would like to be able to document the following facts at each level. Note that the organization below is not intended to place any restrictions on tech implementations–they may decide to organize things differently if it makes sense behind the scenes.
  • Strip meta data — general information about the strip. Name of the strip. Alternate names. Dates begun and concluded. Creator(s). Premise. Major characters. Genre. Links to preceding and/or succeeding strips, if any. Frequency. Format(s). General Notes. Cartoon question: How do you determine Strip Name?
  • Strip owners — syndicate, newspaper, or other entities who control distribution. Names. Address(es). Dates. Key personnel (and tenures and positions). Links to previous and/or next incarnations. General Notes.
  • Ownership — links between strips and strip owners indicating duration of relationship. Strip. Owner. Begin and end dates. Notes.
  • Newspapers — papers in which a strip appears. Publication name. Publisher. Location/distribution area (i.e. San Diego State University; Chicago, IL; U.K.) Links to previous and/or next incarnations. Notes.
  • Issues — issues of above. Date; volume; issue number.
  • Installments or episodes — individual strips. Date and/or serial number. Installment title. Credits. Characters. Synopsis or identifying caption or dialog. Keywords. Notes.
  • Publication — occurrences of installments in issues. Strip Installment. Issue of Newspaper. Page No. Notes. Strip name? (Individual papers have the discretion of running a strip under an alternate name–this is probably the best place to record that?)
  • Story/Continuity — aggregates of installments that tell extended stories. Strip. Installments. Story name(s). Synopsis. Keywords. Characters. Notes. (NOTE: Probably makes more sense for Installments to link to Stories than vice versa, but we’ll leave such things up to tech–point is that we know *which* strips belong to a given continuity. Will have to decide what to do about transitional/borderline/bridging strips.)
  • Characters — people who appear in a strip. Name(s). First appearance. Brief description. Strip(s) and story(ies) associated with. Notes.
  • Creators — people who produce strips. Name(s). Birth, death dates. Brief bio. Links to work. Notes.
  • Reprints — links between reprints of strips (presumably in comic book database) and individual installments and/or continuities.
  • Images — can’t get away with scans of each strip. But can presumably get away with scans of isolated panels to illustrate characters, and possibly isolated strips to illustrate strip meta data and story/continuity. This area needs further review by the board of directors due to possible legal issues.
  • Functionality
  • Must be able to enter/edit/search for all of the above.
  • Internationalized/localized UI as much as possible.
  • Must be able to work on data in bulk–create installments in bulk by specifying beginning and ending dates and frequency. Select installments by data range to add/edit credits in bulk. Ability to add in bulk a newspaper run (e.g. paper “foo” carries strip “bar” from date 1 till date 2, matching same date range in strip history (or matching alternate date range or serial number range in strip history)). Add stories by selecting date ranges of strips. Add reprints by date or serial number ranges.
  • We should also be able to set up a template for ongoing strips and/or allow a new strip installment to be copied from the previous days’. Should also be able to set up a template for a given newspaper’s strips and/or allow a new days worth of strips to be copied from the previous days strips (i.e. copy the same strips and page numbers to the next days issue, updating installment of each strip to next days installments).
  • An important issue is search display–many/most strips will have thousands of installments. So when returning search results, we want to see filtered/collated results that can be repeatedly drilled down to lowest details.
  • Example: Searching on creator or syndicate or newspaper might produce list of strips, possibly with associated date ranges. Clicking through a given strip might produce a list of years during which the strip ran (possibly limited by the original primary search–e.g. if we’re looking at a strip via a creator search, we’d only want to see the years associated with that creator’s tenure on the strip). Clicking through a given year might produce a list of months or weeks during that year, and clicking through a given month or week might produce a listing of the individual strips for that month or week, with full details. For a non-daily, there might be fewer click throughs to arrive at the details. If the strip makes use of story arcs, then clicking through the strip name might produce a list of story arcs instead of a list of years, and those could be drilled down to weeks or individual dates.
  • Main idea here is we don’t want to see a list of thousands of strip entries at a time, need to filter into manageable levels of detail.
  • (end of report)

Leave a Comment

Your email address will not be published. Required fields are marked *