Indexed reports

Revision as of 13:57, 13 May 2010

An indexed report is intended to put data into context by indicating whether individual items of information are over- or under-represented compared to an expected norm.

MAST indexed reports achieve this by displaying an index bar as well as a measure for each data point in chart view.

Mosaic Profiles with indexing

At present, the only type of indexing implemented in MAST compares the backgrounds of people involved in crashes (represented by the Mosaic classification of their home postcodes) to an underlying norm referred to as the base value.

Reports only show indexing if all of the following circumstances apply:

Understanding Indices

Indices are expressed with a base value of 100: that is, an index value of 100 indicates that the corresponding data point is exactly representative of the underlying base, neither larger nor smaller than would be expected. Values over 100 indicate that a data point is over-represented, while values under 100 indicate relative under-representation.

When interpreting index values, the following techniques should always be used:

  • Take no notice of Z in the Mosaic category because Mosaic Group Z represents postcodes which were missing in STATS19 returns. No index is calculated for Z, and the label Unknown appears instead. You could consider applying a Filter to remove it from the report altogether.
  • Look carefully at the chart value axis because sample size makes a big difference to the significance of indices - an index of 115 on a sample of 1,000 would probably be a reliable indication of over-representation, but an index of 180 on a sample of only 40 may not.
  • If you apply a Home filter, avoid large multiple selections and always clear the NK and Unknown boxes because basing the index calculation on eccentric collections of areas and missing postcodes will probably not generate meaningful results.
  • Don't use a Crash Location filter unless you have first applied a Home filter for a similar or a smaller area because the geographic extent of crashes should be consistent with or larger than the base - if this were not the case, the index calculation would probably be distorted.

Index calculation

Significance

MAST does not display indices based on samples too small to give reliable results. Reports which meet the criteria listed above calculate an index for each data point, apart from those where:

  • the value is less than 30
  • the value is less than 1% of the total

If either or both cases apply, no index bar is displayed for the data point in question, and the label N/A appears instead.
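
The two suppression rules above can be sketched as a single check. This is only an illustration, not MAST's actual code: the function name and parameters are hypothetical, and it assumes that "the total" means the sum of all data points in the report.

```python
def index_is_displayed(value, total):
    """Return True if an index bar would be shown for a data point.

    Hypothetical helper: mirrors the two rules above, assuming `total`
    is the sum of all data point values in the report.
    """
    too_small = value < 30                # rule 1: value is less than 30
    too_rare = value * 100 < total        # rule 2: value is less than 1% of the total
    return not (too_small or too_rare)


# A data point of 40 in a report totalling 5,000 fails the 1% rule,
# so the label N/A would appear instead of an index bar.
print(index_is_displayed(40, 5000))   # False
print(index_is_displayed(223, 1328))  # True
```

Integer arithmetic is used for the 1% comparison so that a value of exactly 1% of the total is not rejected by floating-point rounding.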

Base

All indices require a base value to compare each data point against. MAST currently offers two different index bases, which are explained in greater detail below. Both bases use raw data which is derived at postcode level, so the granularity of the base is comparable to that of STATS19.

Population index base

The population index is based on the total number of people resident at each postcode, according to recent estimates. The base values used for this calculation are visible to power users in Population reports.

This index is usually most appropriate for use in casualty reports which include pedestrians or vehicle passengers.

Average annual mileage index base

The average annual mileage index is based on the mean of driver responses when asked how many miles they drive in a year, according to representative surveys. This information is provided at postcode level by Experian. The value used to calculate the base is made proportional to the size of the community at each postcode by multiplying it by the corresponding population value.

This index is usually most appropriate for use in driver reports, or casualty reports which exclude pedestrians and vehicle passengers. Because the underlying data is by its nature more approximate, particular care should be taken to ensure that reports are based on an adequate sample of drivers.
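
The population weighting described above might look like the following minimal sketch. The postcodes, mileage figures and population figures are invented for illustration, and the function name is hypothetical; the only step taken from the text is multiplying the mean annual mileage at a postcode by that postcode's population.

```python
def mileage_base(mean_annual_miles, population):
    """Weight a postcode's mean annual mileage by its resident population."""
    return mean_annual_miles * population


# Hypothetical postcode-level inputs (not real Experian or population data).
postcodes = {
    "AA1 1AA": {"mean_annual_miles": 8000, "population": 40},
    "AA1 1AB": {"mean_annual_miles": 12000, "population": 25},
}

bases = {
    code: mileage_base(row["mean_annual_miles"], row["population"])
    for code, row in postcodes.items()
}
total_base = sum(bases.values())

print(bases["AA1 1AA"])  # 320000
print(total_base)        # 620000
```

The weighting means that a small postcode with high average mileage does not outweigh a large postcode with moderate mileage when the base is totalled.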

Algorithm

The method of calculation is as follows:

( {Value of data point} / {Value of all data points} ) / ( {Base corresponding to data point} / {Total Base} ) * 100

Note that {Value of all data points} excludes the value of any point representing unknown postcodes, in order to prevent them from distorting the index calculation. No index is calculated for such points, and the label Unknown appears instead.

Example

The Drivers Mosaic Profile public report has been filtered to a single area with the Driver Home dimension.

  • The report shows that out of a total of 1,328 drivers involved in crashes and known to reside in that area, 223 came from communities classified as Mosaic Group H (16.8%)
  • From a total area population of 76,589, communities classified as Mosaic Group H account for 11,278 (14.7%)

The index value for the H data point is 114 ( 16.8% / 14.7% * 100 )
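
As a rough check, the calculation can be reproduced in a few lines of Python. This is a sketch under stated assumptions, not MAST's implementation: the function name and the "Unknown" key are hypothetical, the "Other known groups" row simply aggregates the remaining drivers (1,328 - 223) and population (76,589 - 11,278) from the example, and the Unknown count of 57 is invented to show that such points are left out of the total.

```python
def mosaic_indices(values, bases, unknown_key="Unknown"):
    """Apply the index formula to each data point.

    The unknown group is excluded from the total of all data points
    and receives no index (None).
    """
    total_value = sum(v for k, v in values.items() if k != unknown_key)
    total_base = sum(bases.values())
    indices = {}
    for group, value in values.items():
        if group == unknown_key:
            indices[group] = None  # no index is calculated for unknown postcodes
            continue
        value_share = value / total_value
        base_share = bases[group] / total_base
        indices[group] = value_share / base_share * 100
    return indices


# Figures for H come from the example; the Unknown count is hypothetical.
values = {"H": 223, "Other known groups": 1105, "Unknown": 57}
bases = {"H": 11278, "Other known groups": 65311}

indices = mosaic_indices(values, bases)
print(round(indices["H"]))  # 114
print(indices["Unknown"])   # None
```

Because the Unknown count is excluded before the shares are calculated, changing it has no effect on the index for H.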
