Indexed reports
Current revision as of 17:07, 21 February 2022
An indexed report is intended to put data into context by indicating whether individual items of information are over- or under-represented compared to an expected norm.
MAST indexed reports achieve this by displaying an index bar as well as a measure for each data point in chart view.
Mosaic Profiles with indexing
At present, the only type of indexing implemented in MAST compares the backgrounds of people involved in crashes (represented by the Mosaic classification of their home postcodes) to an underlying norm referred to as the base value.
Reports only show indexing if all of the following circumstances apply:
- The report is based on either Vehicles or Casualties
- The only dimension used as a category is either Driver Mosaic or Casualty Mosaic
- The report contains no series other than the measure
- The report filters include either Driver Home or Casualty Home
- The report filters do not include either Driver Home Small Area or Casualty Home Small Area
- NOTE: The required 'Home' filter only has to be present - it does not necessarily have to be applied (so it could be set to All)
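The conditions in the list above can be read as a single all-or-nothing test. The following is a minimal sketch of that test, not MAST's own code: the `Report` class, its field names and the `shows_indexing` function are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical description of a report; these field names are illustrative
# assumptions and are not part of MAST itself.
@dataclass
class Report:
    based_on: str                                   # e.g. "Vehicles" or "Casualties"
    categories: list = field(default_factory=list)  # dimensions used as categories
    series: list = field(default_factory=list)      # series other than the measure
    filters: set = field(default_factory=set)       # filters present, applied or not

def shows_indexing(report: Report) -> bool:
    """Return True only if every condition in the list above holds."""
    home = {"Driver Home", "Casualty Home"}
    small_area = {"Driver Home Small Area", "Casualty Home Small Area"}
    return (
        report.based_on in {"Vehicles", "Casualties"}
        and report.categories in (["Driver Mosaic"], ["Casualty Mosaic"])
        and not report.series
        and bool(report.filters & home)          # present is enough, even if set to All
        and not (report.filters & small_area)
    )
```

Because the Home filter only needs to be present, the sketch tests for membership in `filters` and ignores whatever selection the filter holds.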
MAST also includes historical access to the previous version of Experian's dataset, Mosaic 6 Public Sector. Instructions on how to convert from Mosaic to the previous version can be found at Using Mosaic in MAST reports.
How Indices are displayed
Index values are shown as a superimposed secondary data series on column charts in Chart view.
Indices are not displayed as values in Grid view, but they do appear as values in Excel spreadsheets downloaded using the Download Excel link.
Understanding Indices
Indices are expressed with a base value of 100: that is, an index value of 100 indicates that the corresponding data point is exactly representative of the underlying base, neither larger nor smaller than would be expected. Values over 100 indicate that a data point is over-represented, while values under 100 indicate relative under-representation.
When interpreting index values, the following techniques should always be used:
- Take no notice of Z in the Mosaic category because Mosaic Group Z represents postcodes which were missing in STATS19 returns. No index is calculated for Z, and the label Unknown appears instead. You could consider applying a Filter to remove it from the report altogether.
- Look carefully at the chart value axis because sample size makes a big difference to the significance of indices - an index of 115 on a sample of 1,000 would probably be a reliable indication of over-representation, but an index of 150 on a sample of only 40 may not be.
- If you apply a Home filter, avoid large multiple selections and always clear the NK and Unknown boxes because basing the index calculation on eccentric collections of areas and missing postcodes will probably not generate meaningful results.
- Don't use a Crash Location filter unless you have first applied a Home filter for a similar or a smaller area because the geographic extent of crashes should be consistent with or larger than the base - if this were not the case, the index calculation would probably be distorted.
Index Bases
All indices require a base value to compare each data point against. MAST currently offers two different index bases, which are explained in greater detail below. Both bases use raw data which is derived at postcode level, so the granularity of the base is comparable to that of STATS19.
Population index base
The population index is based on the total number of people resident at each postcode, according to recent estimates. The base values used for this calculation are visible to power users in Population reports.
This index is usually most appropriate for use in casualty reports which include pedestrians or vehicle passengers.
Average annual mileage index base
The average annual mileage index is based on the mean annual mileage that drivers report in representative surveys, weighted by regional trends. This information is provided at postcode level by Experian, the creators of Mosaic. The value used to calculate the base is made proportional to the size of the community at each postcode by factoring in the corresponding population value.
This index is usually most appropriate for use in driver reports, or casualty reports which exclude pedestrians and vehicle passengers. Because the underlying data is by its nature more approximate, particular care should be taken to ensure that reports are based on an adequate sample of drivers.
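The exact way in which population is factored into the mileage base is not stated here. The sketch below assumes the simplest reading, in which each postcode contributes its average annual mileage multiplied by its population, and these contributions are summed per Mosaic group. The function name, the record layout and all figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical postcode records: (Mosaic group, average annual miles, population).
# The figures are invented for illustration only.
postcodes = [
    ("A", 9000.0, 40),
    ("A", 11000.0, 60),
    ("B", 6000.0, 100),
]

def mileage_base_by_group(records):
    """Sum population-weighted mileage per Mosaic group (assumed formula)."""
    base = defaultdict(float)
    for group, avg_miles, population in records:
        # Assumption: "factoring in" population means multiplying by it.
        base[group] += avg_miles * population
    return dict(base)

bases = mileage_base_by_group(postcodes)
```

Under this assumption a large community of low-mileage drivers can carry as much weight in the base as a small community of high-mileage drivers.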
Index calculation
Significance
MAST does not display indices which are based on such a small sample that the results are not reliable. Reports which meet the criteria listed above calculate an index for each data point, apart from those where:
- the value is less than 30, or
- the value is less than 1% of the total
If either or both cases apply, no index bar is displayed for the data point in question, and the label N/A appears instead.
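The two suppression rules above can be sketched as a single check. This is an illustration under stated assumptions rather than MAST's implementation: the function name is hypothetical, and `total` is assumed to be the sum of the values of all data points in the report.

```python
def index_is_displayed(value: int, total: int) -> bool:
    """Return False when either suppression rule applies (label N/A is shown)."""
    if value < 30:
        return False          # rule 1: the value is less than 30
    if value * 100 < total:
        return False          # rule 2: the value is less than 1% of the total
    return True
```

For instance, a data point of 40 in a report totalling 5,000 passes the first rule but fails the second, because 40 is below 1% of 5,000.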
Algorithm
The method of calculation is as follows:
( {Value of data point} / {Value of all data points} ) / ( {Base corresponding to data point} / {Total Base} ) * 100
Note that {Value of all data points} excludes the value of any point representing unknown postcodes, in order to prevent them from distorting the index calculation.
No index is calculated for a data point which contains only unknown values. The label Unknown appears instead.
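The calculation above, including the exclusion of unknown postcodes, can be sketched as follows. The function name and dictionary inputs are hypothetical, the unknown group is assumed to be keyed as "Z", and the total base is assumed to be the sum of the bases of all known groups. The figures in the usage lines are invented.

```python
def mosaic_indices(values: dict, bases: dict, unknown: str = "Z") -> dict:
    """Index each Mosaic group against its base, on a scale where 100 is the norm."""
    # Unknown postcodes are left out of the total so they cannot distort the result.
    total_value = sum(v for g, v in values.items() if g != unknown)
    total_base = sum(b for g, b in bases.items() if g != unknown)
    result = {}
    for group, value in values.items():
        if group == unknown:
            result[group] = "Unknown"   # no index is calculated for this point
            continue
        share_of_value = value / total_value
        share_of_base = bases[group] / total_base
        result[group] = share_of_value / share_of_base * 100
    return result

# Invented figures: group A has a quarter of the known values but half the base.
example = mosaic_indices({"A": 50, "B": 150, "Z": 20}, {"A": 1000, "B": 1000})
```

In this invented case group A comes out at 50 and group B at 150, while Z is labelled Unknown and plays no part in either total.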
Example
The Drivers Mosaic Profile public report has been filtered to a single area with the Driver Home dimension.
- The report shows that out of a total of 1,328 drivers involved in crashes and known to reside in that area, 223 came from communities classified as Mosaic Group H (16.8%)
- From a total area population of 76,589, communities classified as Mosaic Group H account for 11,278 (14.7%)
The index value for the H data point is 114 ( 16.8% / 14.7% * 100 )
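The arithmetic of this example can be checked with a few lines of Python. The variable names are illustrative; the figures are those quoted above.

```python
drivers_h, drivers_total = 223, 1328            # drivers involved in crashes
population_h, population_total = 11278, 76589   # resident population (the base)

driver_share = drivers_h / drivers_total              # share of drivers in Group H
population_share = population_h / population_total    # share of population in Group H
index = driver_share / population_share * 100

# Percentages to one decimal place, and the index to the nearest whole number.
print(round(driver_share * 100, 1), round(population_share * 100, 1), round(index))
# prints: 16.8 14.7 114
```

Working from the unrounded shares gives an index of about 114.0, while dividing the rounded percentages gives about 114.3; both round to 114.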