MIARE-Tab Guidance Notes v0.8.0 – May 2011
PubChem BioAssay Description CSV Tags
1. General Description Items
Definitions of the individual items in the bioassay description.
PUBCHEM_EXT_DATASOURCE_REGID
Required. The depositor's own unique identifier for the
deposited bioassay. It must be unique across all data deposited by you. If you
reuse an external ID that already exists among your depositions, the submission
is treated as an update request, and the existing bioassay record in PubChem is
replaced with the data you provide.
PUBCHEM_ASSAY_NAME
Required. Name of the assay.
PUBCHEM_GRANT_NUMBER
PUBCHEM_PROJECT_CATEGORY
Enter RNAI_GLOBAL_INITIATIVE for projects submitted by RNAi Global Initiative members.
PUBCHEM_ACTIVITY_OUTCOME_METHOD
Accepted values for this tag: PRIMARY_SCREENING, CONFIRMATORY, SUMMARY, OTHER
PUBCHEM_SUBSTANCE_TYPE
Required. Enter 'NUCLEOTIDE' for an RNAi screen.
PUBCHEM_HOLD_UNTIL_DATE
Date the deposited bioassay is to be made available through
the public PubChem database. The maximum permitted on-hold time is 12 months.
This is an NIH/NLM policy. The expected format is: YYYY-MM-DD (e.g.
2009-07-16). If not provided, the deposition will be made available immediately
to the public.
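The date rule above can be checked before deposition. The following is a minimal sketch, not part of any PubChem tool: the function name check_hold_until is hypothetical, and "12 months" is assumed to mean the same calendar day one year after the deposition date.

```python
import re
from datetime import date, datetime
from typing import Optional


def check_hold_until(value: str, today: Optional[date] = None) -> date:
    """Parse a YYYY-MM-DD hold date and reject dates more than 12 months ahead."""
    today = today or date.today()
    # Insist on zero-padded YYYY-MM-DD; strptime alone would accept "2009-7-16".
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    hold = datetime.strptime(value, "%Y-%m-%d").date()  # rejects impossible dates
    # Assumption: the 12-month limit is the same calendar day one year later.
    try:
        limit = today.replace(year=today.year + 1)
    except ValueError:  # today is 29 February and next year has no such day
        limit = today.replace(year=today.year + 1, day=28)
    if hold > limit:
        raise ValueError(f"{value} is more than 12 months after {today.isoformat()}")
    return hold
```

Leaving the tag out altogether needs no check, since the deposition is then released immediately.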
PUBCHEM_ASSAY_DESCRIPTION
Enter PubChem bioassay description as tag/value pairs in the MIARE sheet.
PUBCHEM_ASSAY_PROTOCOL
Enter PubChem protocol description as tag/value pairs in the MIARE sheet.
PUBCHEM_ASSAY_COMMENTS
Enter PubChem comments description as tag/value pairs in the MIARE sheet.
2. Result Definitions
Definitions of the column headers present on the data CSV file.
RESULT_ID
Required. A sequentially increasing integer ID starting from one.
RESULT_NAME
Required. The names, and the order in which they are listed, must exactly match
the column headers that follow column 5 (PUBCHEM_ASSAYDATA_COMMENT) in the data CSV file.
RESULT_TYPE
Required. The data type of each column header in the data
CSV file. Accepted values for this tag:
FLOAT, INTEGER, BOOLEAN, STRING, PUBCHEM_NCBI_PUBMED_ID, PUBCHEM_EXT_URL,
PUBCHEM_NCBI_NUCLEOTIDE_GI, PUBCHEM_NCBI_GENE_ID, PUBCHEM_NCBI_PROBE_ID,
PUBCHEM_SID, TARGET_NCBI_TAXONOMY_ID, TARGET_NCBI_GENE_ID
RESULT_DESCR
Description of the data column header e.g. ‘Confirmed by deconvoluted siRNA pools’ for ‘Hit Confirmation’.
RESULT_UNIT
Required. The data units for each column header in the data
CSV file. Accepted values for this tag:
PPT, PPM, PPB, MILLIMOLAR, MICROMOLAR,
NANOMOLAR, PICOMOLAR, FEMTOMOLAR, MILLIGR_PER_ML, MICROGR_PER_ML,
NANOGR_PER_ML, PICOGR_PER_ML, FEMTOGR_PER_ML, MOLAR, PERCENT, RATIO, SECONDS,
RECIPROCAL_SECONDS, MINUTES, RECIPROCAL_MINUTES, DAYS, RECIPROCAL_DAYS, OTHER,
NONE, UNSPECIFIED
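The result definition rules in this section (sequential RESULT_ID, a non-empty RESULT_NAME, and RESULT_TYPE and RESULT_UNIT drawn from the accepted values) can be checked with a short script. This is a sketch only: the function name and the representation of each definition as a dictionary are assumptions made for illustration, not a PubChem format.

```python
RESULT_TYPES = set("""
FLOAT INTEGER BOOLEAN STRING PUBCHEM_NCBI_PUBMED_ID PUBCHEM_EXT_URL
PUBCHEM_NCBI_NUCLEOTIDE_GI PUBCHEM_NCBI_GENE_ID PUBCHEM_NCBI_PROBE_ID
PUBCHEM_SID TARGET_NCBI_TAXONOMY_ID TARGET_NCBI_GENE_ID
""".split())

RESULT_UNITS = set("""
PPT PPM PPB MILLIMOLAR MICROMOLAR NANOMOLAR PICOMOLAR FEMTOMOLAR
MILLIGR_PER_ML MICROGR_PER_ML NANOGR_PER_ML PICOGR_PER_ML FEMTOGR_PER_ML
MOLAR PERCENT RATIO SECONDS RECIPROCAL_SECONDS MINUTES RECIPROCAL_MINUTES
DAYS RECIPROCAL_DAYS OTHER NONE UNSPECIFIED
""".split())


def check_result_definitions(definitions):
    """Return a list of problems found in a list of result definitions.

    Each definition is assumed to be a dict keyed by tag name
    (RESULT_ID, RESULT_NAME, RESULT_TYPE, RESULT_UNIT).
    """
    errors = []
    for position, definition in enumerate(definitions, start=1):
        # RESULT_ID must count up from one; accept either int or string input.
        if str(definition.get("RESULT_ID", "")).strip() != str(position):
            errors.append(f"definition {position}: RESULT_ID should be {position}")
        if not str(definition.get("RESULT_NAME", "")).strip():
            errors.append(f"definition {position}: RESULT_NAME is missing")
        if definition.get("RESULT_TYPE") not in RESULT_TYPES:
            errors.append(f"definition {position}: RESULT_TYPE is not an accepted value")
        if definition.get("RESULT_UNIT") not in RESULT_UNITS:
            errors.append(f"definition {position}: RESULT_UNIT is not an accepted value")
    return errors
```

An empty list means no problems were found by these checks; it does not guarantee that PubChem will accept the description.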
3. XRefs
Cross references to relevant information in other databases. For example, PubChem AIDs of other related BioAssays.
XREF_TYPE
Required. Accepted values for this tag (only the relevant ones are listed):
PUBCHEM_NCBI_TAXONOMY_ID, PUBCHEM_NCBI_PUBMED_ID, PUBCHEM_NCBI_OMIM_ID, PUBCHEM_AID
XREF_VALUE
Required.
XREF_ANNOTATION
Specific description of the tag e.g. PUBCHEM_NCBI_TAXONOMY_ID / human sequences
4. MIARE (Categorised Comments)
MIARE-specific tag/value pairs (entered under the CAT_COMMENT_TAG and CAT_COMMENT_VALUE columns respectively) that are stored in
the assay record as comments. All such comments are searchable in PubChem.
See the MIARE checklist for the full list of tag/value pairs.
PubChem Substance Description CSV Tags
5. Substance
Column 1: PUBCHEM_EXT_DATASOURCE_REGID
Required. The depositor's own unique identifier for
Substance descriptions. It must be unique across all data deposited by you. If
you reuse an external ID that already exists among your depositions, the
submission is treated as an update request, and the existing substance record in
PubChem is replaced with the data you provide. This is the only
required field in the substance CSV file.
PUBCHEM_NCBI_GENE_ID
NCBI Entrez Gene ID for a specific RNAi substance.
PUBCHEM_NCBI_PROBE_ID
NCBI Entrez Probe ID for a specific RNAi substance.
PUBCHEM_SUBSTANCE_COMMENT
Textual annotations, such as comments on the source of the
reagent sample or the name of the gene target of the siRNA, may optionally be
provided for substance data. This can be found through exact or keyword text
searches.
PUBCHEM_NCBI_TAXONOMY_ID
If the list of substances is not derived from a single
organism, an NCBI Taxonomy ID may optionally be provided for a specific
substance to indicate the source organism.
PUBCHEM_HOLD_UNTIL_DATE
Date deposited substance data is to be made available
through the public PubChem database. The maximum permitted on-hold time is 12
months. This is an NIH/NLM policy. The expected format is: YYYY-MM-DD (e.g.
2009-07-16). If not provided, the deposition will be made available immediately
to the public.
PubChem Assay Data Description CSV Tags
6. Data
The CSV column ordering for the first five columns is fixed
and must be exactly as documented below. Beyond that, there must be a column
for each result defined in the description.
Click the "CSV Template" link (available in the Add Data
view only) to download a CSV template file generated from the Assay Description
you have entered. Use it as a guide: paste your data into this CSV file while
strictly maintaining the correct number of columns. Fields without data appear
as consecutive commas. An example CSV file with data is also available. The CSV
data file must have either no column headers or exactly these automatically
generated headers; any deviation will cause errors.
The following fixed columns are expected in your CSV file.
Optional fields for which there is no data available should be left empty.
Column headers and their order in the data file(s) should exactly match the
names and order of the result definitions.
Note: Substance descriptions must be deposited in PubChem
prior to depositing assay descriptions and data.
Column 1: PUBCHEM_SID
Required unless the substance is identified by PUBCHEM_EXT_DATASOURCE_REGID
(Column 2). The Substance identifier (SID) is a whole number
generated by PubChem after the substance list has been deposited. If substances
are identified by their PubChem SID, leave PUBCHEM_EXT_DATASOURCE_REGID blank.
Column 2: PUBCHEM_EXT_DATASOURCE_REGID
Required unless the substance is identified by PUBCHEM_SID (Column 1). The
depositor's own unique identifier for
Substance descriptions previously loaded into either PubChem or the PubChem
deposition system. If you provide a value in this column, you must set the
value in Column 1 to '0' (zero) or leave it blank.
Column 3: PUBCHEM_ACTIVITY_OUTCOME
Required. The outcome for each Substance is represented by one of five values:
1 - Substance is considered inactive.
2 - Substance is considered active.
3 - Substance activity outcome is inconclusive.
4 - Substance activity outcome is unspecified.
5 - Substance identified as a probe (only allowed in summary assays).
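The rules for Columns 1 to 3 can be checked row by row. The sketch below is illustrative only: the function name and the summary_assay flag are hypothetical, and it assumes each row must carry exactly one of the two substance identifiers.

```python
OUTCOME_CODES = {
    "1": "inactive",
    "2": "active",
    "3": "inconclusive",
    "4": "unspecified",
    "5": "probe",
}


def check_substance_and_outcome(sid, regid, outcome, summary_assay=False):
    """Return a list of problems with Columns 1-3 of one data row (as strings)."""
    errors = []
    sid, regid, outcome = sid.strip(), regid.strip(), outcome.strip()
    has_sid = sid not in ("", "0")  # '0' or blank means "no SID given"
    has_regid = regid != ""
    if has_sid and has_regid:
        errors.append("give PUBCHEM_SID or PUBCHEM_EXT_DATASOURCE_REGID, not both")
    if not has_sid and not has_regid:
        errors.append("no substance identifier given")
    if has_sid and not (sid.isascii() and sid.isdigit()):
        errors.append("PUBCHEM_SID must be a whole number")
    if outcome not in OUTCOME_CODES:
        errors.append("PUBCHEM_ACTIVITY_OUTCOME must be 1, 2, 3, 4 or 5")
    elif outcome == "5" and not summary_assay:
        errors.append("outcome 5 (probe) is only allowed in summary assays")
    return errors
```

For example, check_substance_and_outcome("", "siRNA-001", "2") reports no problems, where "siRNA-001" stands in for a depositor's own identifier.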
Column 4: PUBCHEM_ACTIVITY_SCORE
The score for each Substance is a whole number, where larger
values indicate greater activity. Scores are expected to be on a linear scale,
so raw values should be transformed accordingly. Although not an absolute
requirement, the range should preferably be adjusted to 0-100; larger and
smaller values are allowed. The score values are used to allow PubChem users to
partition, sort, and profile Assay Data results within and between biological assays.
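One way to bring raw values into the preferred 0-100 range is a linear min-max rescale followed by rounding to whole numbers. This is a sketch under stated assumptions, not a method prescribed by PubChem: the function name is hypothetical, the input values are assumed to be on a linear scale already, and larger input values are assumed to mean greater activity.

```python
def rescale_scores(values, low=None, high=None):
    """Linearly map values onto whole-number scores from 0 to 100.

    low and high default to the minimum and maximum of the input; pass them
    explicitly to use the same scale across several plates or assays.
    """
    if not values:
        return []
    low = min(values) if low is None else low
    high = max(values) if high is None else high
    if high == low:
        # All values identical: no spread to rescale, so report zero throughout.
        return [0 for _ in values]
    return [round(100 * (v - low) / (high - low)) for v in values]
```

If a readout decreases with activity (for example, cell viability in a lethality screen), negate or otherwise invert the values before rescaling so that the most active substance still receives the largest score.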
Column 5: PUBCHEM_ASSAYDATA_COMMENT
Textual annotations and comments on the Assay Data reported for this
Substance may optionally be provided in this column. These comments can be found
through exact or keyword text searches.
Column 6: Target Gene ID
Required. NCBI Entrez Gene ID
Column 7: Nucleotide GI
Recommended. Sequence identifier of type PUBCHEM_NCBI_NUCLEOTIDE_GI
Columns 8 and higher (one column per result definition):
All remaining columns correspond one-to-one, and in the same order, to the
result definitions in the associated Assay
Description. All defined columns must be present; however, values are optional
in individual fields. Consult the auto-generated CSV template file with your
description information to see the layout.
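The column layout described in this section can be reproduced with a short script, which helps keep the number of columns constant when some fields are empty. This is a sketch only and does not replace the auto-generated template: the function names are hypothetical, and it assumes the result names passed in are the RESULT_NAME values in RESULT_ID order, covering every column after column 5.

```python
import csv
import io

FIXED_COLUMNS = [
    "PUBCHEM_SID",
    "PUBCHEM_EXT_DATASOURCE_REGID",
    "PUBCHEM_ACTIVITY_OUTCOME",
    "PUBCHEM_ACTIVITY_SCORE",
    "PUBCHEM_ASSAYDATA_COMMENT",
]


def build_header(result_names):
    """Five fixed columns followed by one column per result definition."""
    return FIXED_COLUMNS + list(result_names)


def write_data_csv(result_names, rows):
    """Return CSV text; each row is a dict keyed by column header.

    Missing values are written as empty fields, so every line keeps
    the same number of columns.
    """
    header = build_header(result_names)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([row.get(name, "") for name in header])
    return buffer.getvalue()


# Illustrative values only: "siRNA-001" and "Z-score" are made-up names.
example = write_data_csv(
    ["Target Gene ID", "Nucleotide GI", "Z-score"],
    [{"PUBCHEM_EXT_DATASOURCE_REGID": "siRNA-001",
      "PUBCHEM_ACTIVITY_OUTCOME": "2",
      "Target Gene ID": "7157"}],
)
```

In the example, the single data line is written as ",siRNA-001,2,,,7157,,": the empty SID, score, comment, Nucleotide GI and Z-score fields show up as consecutive commas.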