
Matchcode FAQs

General Questions

Data Formats

Creating A New Job File

Running Batches

Typical Scenarios

Troubleshooting

Matchcode Fields

Matchcode Flags

Glossary


General Questions

What is the purpose of Matchcode software?

Matchcode is an address management package, developed to identify UK addresses using the Royal Mail PAF file as a reference source. It can also use other data sets such as Commercial PAF. Processing can be in either batch or interactive (Manual) mode.

Once Matchcode has successfully identified a UK address it can do a number of things including:

  • Return a clean version of the address as stored on PAF.
  • Supply a postcode for the address.
  • Supply a number of other address related codes and information associated with the address.

Can my address file be damaged or changed in any way, when using Matchcode?

No. Matchcode never writes to your original file at all. Each time you process a file you create a new output file only. This allows you the freedom to experiment with different settings until you are satisfied with the results.

Does Matchcode produce any kind of report to show how successful a batch has been?

Yes. To enable the log file option, open both the Batch and Job Control programs, then select Log File from the Options menu of each. This opens a dialog box in which you can opt to save processing results in a log file. Once this option has been set, Matchcode will always create a log file containing processing statistics. The log file is created in the same location as the Job File and uses the Job File name with a .LOG extension. It is a simple text file that can be viewed with Notepad or a similar application.

The log file stores information about:

  • Filenames
  • PAF files used
  • Batch success criteria
  • Input postcode quality
  • Input address quality
  • Processing speed
  • Success level achieved

Data Formats

What types of file can Matchcode use?

Address information is stored on company systems in many different forms, from simple spreadsheets to sophisticated CRM systems. In order to process your address list with Matchcode you need to produce a text file. Most systems provide a facility to ‘Export’ or ‘Save As’ a text file. Usually this is called a CSV (Comma Separated Values) or character-delimited file. This type of file is the most suitable for Matchcode to use, as it requires the least setting-up time. Alternatively, a fixed length field/record file can be used. In either case, it needs to be an ASCII text file.

My file comes from a Unix system and has only Linefeed record terminators. Can Matchcode handle these?

Yes. Matchcode can handle all common variants of line termination, including no terminator at all.

Why do I need to export my data into a text file to use Matchcode?

It would be possible in some cases to provide access to various file types directly or through ODBC interfaces. This would make life easier in the short term because it would eliminate the need to export files as text. However, consider the consequences of working on a live database file directly. Apart from record locking issues, there would always be the possibility that you could make a mistake in the output format of your job. This would effectively destroy your contact information, leaving you to rely on backups to restore your system. Also, you may wish to add new fields to your file that were not previously part of the layout.

I have had problems processing my CSV file, are there any known pitfalls with this type of file?

There is one particularly common problem associated with CSV type files. This comes about because these files typically use the Comma character as a field delimiter. In other words, a Comma separates the individual fields within a CSV type record from each other. Other characters can be used, such as the Pipe (‘|’) or Hash (‘#’) but this is less common.

A problem sometimes occurs when fields within the record contain Commas of their own, as part of the text. Normally this would not be a problem because a well-constructed CSV record that uses Comma delimiters would have Quote characters around the fields. These are called text qualifiers.

Example:

"000001","MR JOE SMITH","1 THE CRESCENT, OFF THE HIGH STREET","THE VILLAGE","THE TOWN","AB1 1AB"

As you can see from the example above, a Comma within the third field separates ‘1 THE CRESCENT’ from ‘OFF THE HIGH STREET’. The problem arises when the fields do not have Quotes around them. When this happens, Commas such as in the above example are interpreted as additional field delimiters. Obviously this can be a problem because the file suddenly has an inconsistent format.

To overcome this we should ensure that exported CSV type Comma delimited files have text fields enclosed in double-quotes. Alternatively some other delimiter could be used, such as the Pipe character as mentioned above. If neither of these options is available try outputting the file as fixed length text.
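The effect of text qualifiers can be reproduced with any standard CSV parser. The sketch below uses Python's csv module, which is not part of Matchcode, to read the example record with and without quotes. The helper names are illustrative only.

```python
import csv
import io

# The example record from above, with and without text qualifiers.
quoted = '"000001","MR JOE SMITH","1 THE CRESCENT, OFF THE HIGH STREET","THE VILLAGE","THE TOWN","AB1 1AB"'
unquoted = '000001,MR JOE SMITH,1 THE CRESCENT, OFF THE HIGH STREET,THE VILLAGE,THE TOWN,AB1 1AB'


def parse_record(line):
    """Split one comma-delimited record, honouring double-quote text qualifiers."""
    return next(csv.reader(io.StringIO(line)))


def write_record(fields):
    """Write one record with every field enclosed in double quotes."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(fields)
    return buffer.getvalue().rstrip("\n")


quoted_fields = parse_record(quoted)
unquoted_fields = parse_record(unquoted)

print(len(quoted_fields))    # 6: the embedded comma stays inside field 3
print(len(unquoted_fields))  # 7: the embedded comma is read as a delimiter
print(write_record(quoted_fields) == quoted)  # True
```

The unquoted record gains an extra field, which is exactly the inconsistent format described above.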

My system can only produce fixed length records. Can I process this kind of file?

Yes you can. By default Matchcode expects Comma delimited variable length records, but it is simple to change this by specifying field lengths instead.


Creating A New Job File

What information do I need before I can clean my address file?

There are three basic groups of information that Matchcode will need.

  1. Details about the content and format of your existing address file.
  2. Details about the content and format of your proposed new address file.
  3. The matching parameters and levels to use during processing.

This information is supplied to Matchcode and stored within what we call a Job File. The Job File is created using the Job File Wizard from within the Job Control software. Once the necessary information has been entered, the Job File can be saved to disk. Because the Job File is one single file, it can easily be copied and edited for reuse in further jobs or passes. To start creating a Job File, click New from the Job Control File menu.

I have several non-address fields in my file. How does Matchcode handle these?

When you are specifying your Input Fields, you will click the Add button. A number of fields are shown, the first of which is User Field. Any fields that are not in the list, in other words fields that are not address related, or person's name fields when Electoral Roll data is not available, are specified as User Fields. If these fields are also put into the output format, then Matchcode will simply transfer this information across to the new file untouched.

If my input records are fixed length, can I output CSV?

Yes you can. Within Matchcode Job Control you can define the input and output file formats completely independently of each other. So it is very simple to output CSV from a fixed length record. In fact each field of both input and output can be defined independently.

I have a huge number of fields in my file, including some large notes fields. Can I process this file?

Matchcode can handle a large number of fields, but notes fields can be a problem. Many Windows applications that support notes fields allocate several thousand bytes per notes field. Matchcode cannot cope with fields this big. In a situation such as this, we would recommend exporting just the relevant fields plus some kind of unique identifier. This makes setting up a job quicker and simpler. Once batch processing has finished you can import the file and update the relevant fields using the unique identifier as your link.
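The export-and-relink approach can be sketched as follows. This is only an illustration in Python, outside Matchcode itself; the field names (ref, addr1 and so on) are hypothetical and the cleaned record is a hand-made stand-in for real batch output.

```python
# Hypothetical field names; a real export would use your own system's fields.
KEY_FIELD = "ref"
ADDRESS_FIELDS = ["addr1", "addr2", "town", "postcode"]


def cut_down(records):
    """Keep only the unique identifier and the address fields of each record."""
    wanted = [KEY_FIELD] + ADDRESS_FIELDS
    return [{name: record[name] for name in wanted} for record in records]


def merge_back(records, cleaned):
    """Copy cleaned address fields onto the original records, linked by KEY_FIELD."""
    cleaned_by_key = {row[KEY_FIELD]: row for row in cleaned}
    merged = []
    for record in records:
        updated = dict(record)
        updated.update(cleaned_by_key.get(record[KEY_FIELD], {}))
        merged.append(updated)
    return merged


original = [{"ref": "000001", "addr1": "1 the cresent", "addr2": "", "town": "THE TOWN",
             "postcode": "", "notes": "a very long notes field..."}]
export = cut_down(original)
# Stand-in for the cleaned file that comes back after batch processing.
cleaned = [dict(export[0], addr1="1 THE CRESCENT", postcode="AB1 1AB")]
result = merge_back(original, cleaned)
```

The notes field never leaves the source system, yet it is still present on the merged record.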

Are there any good general settings to use when outputting new addresses?

PLEASE NOTE: These settings are provided as a guide only and should not be used without careful testing.

Consumer Address File (Names Level – Assuming Electoral Roll is available)

  Success Criteria   Postcode to Delivery Point Suffix level
                     Voter Name
  Advanced Options   Accept postcode changes
                     Output the input address if matching is unsuccessful
                     Names Option = Address Improvement mode
  Fuzzy Matching     Advanced fuzzy
                     Max Towns=3

Consumer Address File (Address Level Only)

  Success Criteria   Postcode to Delivery Point Suffix level
  Advanced Options   Accept postcode changes
                     Output the input address if matching is unsuccessful
  Fuzzy Matching     Advanced fuzzy
                     Max Towns=3

Business Address File (Matching to Company Name level)

  Success Criteria   Postcode to Delivery Point Suffix level and Address Key
  Advanced Options   Accept postcode changes
                     Output the input address if matching is unsuccessful
                     Assign an address key only if top of address matches
  Fuzzy Matching     Orthographic
                     Max Towns=3

Business Address File (Where we are not outputting a new Company name)

  Success Criteria   Postcode to Delivery Point Suffix level
  Advanced Options   Accept postcode changes
                     Output the input address if matching is unsuccessful
  Fuzzy Matching     Orthographic
                     Max Towns=3

Any File (Postcode Level Only)

  Success Criteria   Postcode
  Advanced Options   Accept postcode changes
  Fuzzy Matching     Orthographic
                     Max Towns=3

For more information about Fuzzy Matching settings see General


Running Batches

Can anything be done to increase the speed of a batch?

Yes, several factors impact upon the performance of the Batch software.

  1. With regard to the machine itself, Memory and disk I/O speed directly affect the speed. More memory and faster disk I/O will improve performance significantly. This is because the process of searching PAF involves heavy disk I/O and so the quicker this information can be retrieved and the more that can be committed to cached memory the better. Also, and for the same reasons as above, it is helpful to sort the input file on the postcode and address fields to organise the records into address order as much as possible.
  2. Other factors include not setting the location of the RCDB files and other additional data sets, if these are not required. This is because Matchcode will retrieve information from these files whether they are used in the output file or not.
  3. Poor input address quality will cause Matchcode to spend longer searching for a match though of course this cannot be avoided.
  4. Make sure that both data files and the PAF database are held on the local machine rather than across a network if possible.

You may also consider:

  • Not checking for foreign addresses.
  • Not using input contact names (If Electoral Roll data is available) if this is not providing useful benefits.
  • If the file contains a lot of unused, non-address fields, try creating a cut-down file of address related fields only.
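Sorting the input file into postcode and address order, as suggested in point 1 above, can be done before the batch with any general-purpose tool. The following is a minimal Python sketch, assuming a quoted CSV file small enough to sort in memory; the column positions are hypothetical.

```python
import csv
import io


def sort_by_postcode(csv_text, postcode_col, address_cols):
    """Return the CSV text with records ordered by postcode, then address fields."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: [row[postcode_col]] + [row[c] for c in address_cols])
    out = io.StringIO()
    csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
    return out.getvalue()


sample = '"2","5 HIGH ST","ZZ9 9ZZ"\n"1","1 LOW RD","AB1 1AB"\n'
sorted_text = sort_by_postcode(sample, postcode_col=2, address_cols=[1])
```

For files of several million records an external sort utility would be more appropriate than loading everything into memory.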

I have several batches to run. Do I need to wait until a batch has finished before starting another one?

No. Several batches can be run simultaneously, though this will of course affect the speed of each running batch. The number of batches that can be run at the same time is limited by the available memory.

My address file contains several million records. Can Matchcode cope with this?

Yes it can. Matchcode deals with each record sequentially, so it doesn’t matter how many records you have in the file. However, you may want to consider splitting the file into smaller batches, simply because external factors may cause a problem with a batch and it may be sensible not to put all of one’s eggs in one basket. Having said this, a reasonably well-specified machine dedicated to running the batch should be able to process more than a million records overnight. Also bear in mind that the output file may be as large as, if not larger than, the input file, so make sure that you have enough available disk space to handle it.
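Splitting a large file into smaller batches can be done with a few lines of script before the job is set up. This Python sketch assumes one record per line, which will not hold for files without record terminators; the function name is illustrative.

```python
def split_records(lines, batch_size):
    """Yield successive lists containing at most batch_size records each."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # the final, possibly shorter, batch
        yield batch


records = ["rec1", "rec2", "rec3", "rec4", "rec5"]
batches = list(split_records(records, batch_size=2))
print([len(b) for b in batches])  # [2, 2, 1]
```

Because the function is a generator, it can be fed a file object directly and never holds more than one batch in memory.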


Typical Scenarios

I want to get as many matches from my file as possible, I don’t mind spending a little extra time on it. What can I do?

A strategy of Multiple Passes is often the best way to go. Starting with the entire file and very strict match settings we can put successful matches aside, and continue with just the rejects. This process can be repeated, reducing the match level with each pass until no more matches can be found.

My file contains a mixture of consumer and business addresses. Can Matchcode process this file?

Yes. Matchcode can handle files of mixed address content. It may be best to run multiple passes, see Multiple Passes. In this way Matchcode can be tuned to get optimum results for each type. Each pass can be set to pick up certain address types. This method is particularly effective if Electoral Roll data is available. For example, Pass 1 could be set to use names, with the criteria set to insist on a person’s name match. In this way all the private addresses that have matched names can be removed from the file as successful matches first. Secondly, a pass that insists on Company name matches can be used. Gradually we are left with the addresses that do not easily fit into these categories.

I often clean the same files regularly. Can I reuse an old Job File that was used for this file previously?

Yes you can. Simply make a copy of it using Windows Explorer or some similar program. Rename the copy to something meaningful and then open it. You will need to change the Input and Output filenames and also re-count the records. Once you have done this you can save and run the new batch.

I want to update my addresses. Should I replace them in my Output Format layout or append to my records?

You could simply create an output format that mirrors your input format with Formatted Address Lines substituting your Input Address Lines. And while this is a perfectly acceptable method, it imposes some limitations. For example it is useful to include a Process Code within the output records to indicate the records that have been matched. Also there are situations where you may want to carry out some post-processing on your file, which would be easier to achieve if new fields were appended. Another situation that requires access to a more complete set of output format arises when we have sole-trader type addresses within our file, see Some successfully coded records have only partial output addresses, or incorrect building numbers. Why?

Usually we would recommend appending information to your records and therefore giving yourself the opportunity to examine and update the resulting file using a database package if required.

I have a file of very poor quality addresses. Is there any way to flag the hopeless cases?

It is difficult to suggest a reliable method of doing this because simply setting a very low match level, say area code, does not guarantee that failed records are actually of no use. In fact the reverse can sometimes be the case. Take for example a situation where we can match some of these records satisfactorily to postcode level, and yet others cannot even be assigned area codes. It may seem that the records that matched to postcode level stand a far better chance of being good, and possibly suitable candidates for manual processing, than those that failed to reach area code. However, you may discover that the postcode level records failed because, having been matched to street level quite easily, they were rejected because the building number or name on the input simply does not exist and could never be found. On the other hand, the records that failed to reach area code may just need slight correction to the town name for them to fall fully into place, their building names or numbers being accurate but let down by poor area (town/locality) information. So you see, it is not easy to make a reliable rule that can be applied to identify such hopeless records.

That said, it might be possible to run a pre-process to remove records that have very little information in the address fields. In most cases though, creating and running such a process would take longer to complete than it would take Matchcode to process and reject them.

My available address layout is only 4 fields of 30 bytes. Can Matchcode produce an address this small?

Yes but you need to use the Advanced Address Formatting utility to do it safely. This set of routines attempts to force an address into a smaller than recommended space by using various techniques including Word Abbreviation, Field Concatenation, Field Truncation and Field Elimination. How this is done can be carefully controlled using the interface provided.

Can anything be done with the residue records that the Batch system cannot find?

Yes, you could try cleaning the records manually using the Interactive program. Open the Job File using the Interactive program after your batch and select to view uncoded records when prompted. You can then use the search mechanisms provided to search for addresses and save your successful results to the output file. Typically around 20%-30% of the residue records can be found using manual techniques.

I regularly process the same file, some records manually; do I have to do these same records each time?

Probably not, you could consider including Address Keys & Organisation Keys in your output file format. This would make it possible to run an Address Generation batch on the file at some later stage. In this way you wouldn’t need to search for the addresses each time. Instead you would do a very thorough job the first time, and once the records were found, either in batch or interactive mode, they would have Address Keys & Organisation Keys assigned which could be used to regenerate the required address fields or postcodes from each new version of PAF.

I have a file of consumer addresses to clean. I don’t have time to run multiple passes. What should I do?

If you plan to create an output file that is exactly the same format as your input file you should:

  1. Define your input layout specifying all address lines etc. as usual.
  2. Create an output format in which your original address lines are substituted with a Formatted Address.
  3. Set the Success Criteria code as Postcode, which should be to Delivery Point Suffix level.

    Success Criteria

  4. On the Advanced Options dialog select Accept postcode changes, and Output the input address if matching is unsuccessful.

    Advanced Cross-Matching Options dialog


  5. Highlight the output postcode field and click Advanced to set the source as Matched – Input – Floating.


Make any final adjustments to the Job File and run the batch. Matched addresses will be replaced along with postcodes. Unmatched records will have the original address carried across to the new Formatted Address lines.


Troubleshooting

The Test Read button reported an error. What can I do?

If Test Read reported a problem, first check that the format you have specified agrees with the actual file layout. At a basic level you could start by clicking on the View File button, checking the layout visually and counting fields. You could also try the Analyse File button to see if the format is in fact consistent but perhaps not as you have specified it. If this fails you may need to go to the data file itself and check the format using some other method. If your file is of the CSV type, perhaps a variable length Comma delimited structure, make sure that your fields have double-quote field qualifiers. See I have had problems processing my CSV file, are there any known pitfalls with this type of file?

Some successfully coded records have only partial output addresses, or incorrect building numbers. Why?

If you are outputting new replacement addresses, you need to make sure that you have set the Batch Success level correctly. By default Matchcode sets the Batch Success to postcode level in a new Job File. As most postcodes, and in particular consumer address postcodes, are shared by several addresses, matching to this level does not guarantee a full address match. You need to set the Batch Success level to Postcode in the top half of the dialog box, and change it from Postcode to Delivery Point Suffix in the bottom half of the dialog. A Delivery Point Suffix (DPS) relates to an individual address or delivery point, so to achieve a DPS level match Matchcode must find the whole address. Once this has been changed, re-run the batch and you should see that the bad matches are now marked as rejects.

I seem to be getting a very low match rate on what I think is a reasonably good file. What can I do?

Once you have confirmed that your layout is correct and that Matchcode is using all of your address lines, including any company names and postcodes, you should check which Advanced settings you are using. One that you need to check in particular is ‘Allow Postcode Changes’. By default this tick box is not checked. You should change it by putting a tick in the box. Without this box checked, Matchcode will reject all records where it cannot agree with any original postcodes that you may have. By making this change and re-running the batch you are allowing Matchcode to supply postcodes even if they are different from your original ones.

I have examined the output file from a business file and some of the company names are missing. Where are they?

Providing that you are sure that your match levels are set correctly, it is probably safe to say that the records in question are sole-trader type addresses. What happens is that Matchcode safely matches the address you have provided, but on PAF the address is a private residential property, perhaps operating as a business. Consequently there is no business name on PAF, so Matchcode can only return the private address. The solution to this problem is to separate the output company name from the address so that this situation can easily be checked for after the batch. To do this you need to create an output format that starts with the PAF Address element Organisation and is followed by Formatted Address lines, which exclude the Company name itself.

In this way, rather than producing for example a 5 line Formatted Address, you produce a 4 line Formatted Address plus a company name in a field of its own. Later you can check to see if the company name field is populated and, if not, populate it using the original company name from your input address. This can be achieved quite easily using a database package.

To exclude the company name from a Formatted Address, simply click on any one of the Formatted Address fields in the Output Format tab and then click on the Advanced button to the right. In this dialog you have various options, one of which is to remove the organisation name from the address.
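The post-batch check described in this answer amounts to a simple fallback rule. A minimal Python sketch follows, standing in for whatever database package you use; the function and argument names are hypothetical.

```python
def fill_company_name(output_company, input_company):
    """Return the PAF organisation name, or the input name if PAF supplied none."""
    return output_company if output_company.strip() else input_company


# PAF returned an organisation: keep it.
print(fill_company_name("ACME TRADING LTD", "Acme Trading"))  # ACME TRADING LTD
# Sole-trader address with no organisation on PAF: fall back to the input name.
print(fill_company_name("", "J SMITH PLUMBING"))              # J SMITH PLUMBING
```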

My building number ranges used to have Hyphens between the numbers, now they don’t, what can I do?

Punctuation characters such as these are removed from PAF, and so number ranges such as 5-7 THE HIGH STREET become 5 7 THE HIGH STREET. However, this problem can be overcome by using Advanced Address Formatting.

My batch has finished with a message indicating that it didn’t find as many records as I specified. What has happened?

Sometimes a batch may finish before it reaches the number of records specified in the Job File, even though count records was used in the first place. This usually means that you are dealing with a CSV type file that has fewer fields than you specified in your input format. To check this, try Analyse File and Test Read to see if they confirm your specified layout.
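Outside Matchcode, a quick way to confirm whether every record really has the expected number of fields is to tally field counts per record. This Python sketch assumes a comma-delimited file with double-quote qualifiers; it is an independent check, not a description of what Analyse File does internally.

```python
import csv
import io
from collections import Counter


def field_count_profile(csv_text):
    """Count how many records have each number of fields (blank lines ignored)."""
    return Counter(len(row) for row in csv.reader(io.StringIO(csv_text)) if row)


sample = '"1","1 LOW RD","AB1 1AB"\n"2","5 HIGH ST"\n"3","9 MILL LN","CD2 2CD"\n'
profile = field_count_profile(sample)
print(profile[3])  # 2 records have three fields
print(profile[2])  # 1 record has only two fields
```

A consistent file produces a profile with a single entry; anything else points to short or over-split records.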

My output records are fixed length. I have specified them correctly but Matchcode insists that there is a problem. Why?

Some systems, when producing fixed length records, do not pad out the last field to a consistent length, effectively making the file variable length. Try specifying your last input field as being variable length rather than fixed. This may cure the problem.

The Town field in my formatted address is in upper case. Can I fix this?

Yes, simply go to the output format tab of your Job File. Select any one of the Formatted Address lines and click on the Advanced button. Click on Format the town in upper and lower case.

My file of business addresses is coding very poorly, what can I do?

Make sure that you have specified the company name as Address Line 1. Matchcode needs to see all available address information to get the best and safest results.

I have a file of mixed addresses, business and consumer, but only the business records are matching. Why?

Make sure that you haven’t set Organisation Key as one of your match criteria. Doing so would force Matchcode to flag as successful only those addresses where it could assign a positive Organisation Key. This would mean that any Large User or Residential type addresses would be rejected, as neither of these types ever has an Organisation Key.

Alternatively, if you have the commercial data set BUSINESS.PAF make sure that it isn’t the only PAF being used.

I want to examine my uncoded records using Interactive. But Interactive complains about the output file that batch created!

Batch processing can mess up an output file format if you started with an input file that had Double-Quote field qualifiers which you forgot to put back into the output file. In other words a perfectly well formatted input file can become a badly formatted output. This happens because Matchcode removes the double quotes as it handles each record. Consequently you need to make sure that all output fields are Double-Quoted. This can be done from the Output Format tab of Job Control.


Matchcode Fields

(Not all fields are covered in this section)

Input Fields

User Field

User Field is one of the available input fields. It should be used to refer to any input field or fields that Matchcode does not use. If the input field cannot be found amongst the other available fields, such as Input Address Lines then it should be referred to as a User Field.

So typically things such as Unique Reference Numbers, Telephone numbers and other contact specific details would be referred to in this way.

Input Addr Key and Input Orgn Key

These are used if the batch is of the Address Generation type. See Address Keys & Organisation Keys.

Address Line (1-7)

These fields are used to specify any of the Address or Company name fields. Postcodes can either be specified as address lines, or as Input Postcode if they are in a field of their own.

Input Postcode

If the input file has a specific postcode field, then this field should be used to represent it.

Elector Fields

If Electoral Roll data is available and the input file has people’s names on consumer type addresses then these fields can be used to specify the name field(s). If the entire name is held within a single field then Elector Name should be used. The Elector Name field can also be used to specify surname.

Output Field Groups

Input Fields

These are the fields as specified in the input file.

PAF Codes

Postcode and Postcode Type

The PAF codes are, as the name suggests, taken directly from PAF. They include postcode. There is also a field called Postcode Type. This field indicates the address type (L=Large User: S=Small User: ‘ ‘=Unclassified).

Outcode / Incode / DPS Code

These are the 3 parts of the postcode split out.

AB10 1AJ 1A

AB10 = Outcode
1AJ = Incode
1A = DPS Code
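The same split can be expressed in a few lines of Python. This sketch assumes the three parts are separated by spaces, exactly as in the example above; real output fields may be laid out differently.

```python
def split_postcode(value):
    """Split 'OUTCODE INCODE DPS' into its three named parts."""
    outcode, incode, dps = value.split()
    return {"outcode": outcode, "incode": incode, "dps": dps}


parts = split_postcode("AB10 1AJ 1A")
print(parts)  # {'outcode': 'AB10', 'incode': '1AJ', 'dps': '1A'}
```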

Address Keys & Organisation Keys

These two 8 character codes can be assigned to addresses that have been matched to Delivery Point level. They are generally used together. All properties that receive mail should have their own Address Key. Unlike postcode, which can change from time to time, the Address Key should remain static. In some cases, several delivery points share a single address key. This can happen when several businesses are operating from a single building. To see an example of this, in Matchcode Data Capture select Search|Address Key Lookup..., enter the following Address Key, and press Find.

Address Key = 29402275

It will return a small list of companies, each operating from:

S G HOUSE
41 TOWER HILL
LONDON
EC3N 4DU

Each of these organisations shares the above Address Key, but each has its own Organisation Key and its own DPS. If you then remove the postcode from the address elements and browse (F5) on the address itself, you will notice that another organisation is added to the list. This organisation, as you will notice, has its own postcode and address key. If you look at the address key you will see that it starts with the number ‘6’. This indicates that this is a Large User.

Because of their constant nature, Address Key and Organisation Keys are useful as part of a dedupe.

Also, if a large file of addresses needs to be updated regularly, and some manual processing is part of this process, it may be useful to attach these keys to the output file. In this way, the next time the file is processed, an Address Generation job could be used to generate new postcodes etc. for the file without requiring manual processing again.

By examining the Address Key and Organisation Key it is possible to get some idea of the address type.

IF THE ADDRESS KEY STARTS WITH THE NUMBER ‘6’ OR ‘7’
   THEN THE ADDRESS IS A LARGE USER
ELSE
   IF THE ORGANISATION KEY > ZERO
      THEN THE ADDRESS IS A SMALL USER ORGANISATION
   ELSE
      THE ADDRESS IS UNREGISTERED AND MAY BE A RESIDENTIAL ADDRESS
   ENDIF
ENDIF
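The rule above translates directly into code. The following Python sketch assumes both keys arrive as 8 character strings of digits, with an all-zero or blank Organisation Key meaning that none was assigned; that representation is an assumption, not something this FAQ specifies.

```python
def address_type(address_key, organisation_key):
    """Classify an address from its Address Key and Organisation Key."""
    if address_key[:1] in ("6", "7"):
        return "Large User"
    if int(organisation_key.strip() or 0) > 0:
        return "Small User Organisation"
    return "Unregistered (may be residential)"


print(address_type("60000001", "00000000"))  # Large User
print(address_type("29402275", "00000123"))  # Small User Organisation
print(address_type("29402275", "00000000"))  # Unregistered (may be residential)
```

The key values used here are made up for illustration, apart from the Address Key quoted earlier in this section.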

PAF Address

These are the raw PAF elements and can be used to create a structured address where no formatting or field concatenation takes place.

Formatted Address

This group contains the formatted version of the PAF address where elements such as Building Number and Street are concatenated together into a single field. Up to 7 fields can be used and either basic or Advanced Address Formatting can take place to structure the address.

Special Fields

This group contains various fields such as date and record sequence number but it also contains the flags that Matchcode can output to indicate matching success level etc. See: Process Code and Non-PAF Status and AddrFrmt Status

RCDB Fields

This group contains various other address related codes such as O/S Grid Reference Codes.

Elector Fields

This group contains Electoral Roll names elements etc.


Matchcode Flags

Process Code

The Process Code can be found by going to the Output tab of an open Job File, clicking on the Add button and then selecting it from the Special Fields group. It is a single character which indicates one of four states for each record. These states are:

Automatic   The record has been automatically matched to the requested level.
Manual      The record has been saved manually using the Interactive software.
Uncoded     The record has not been automatically matched to the requested level.
Foreign     The record is a suspected foreign address.

Although by default these codes are represented as A/M/U/F they can be changed by clicking on the Advanced button of the output tab when Process Code is selected.
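When reviewing a batch, or preparing the rejects for a further pass, the Process Code makes it easy to separate matched records from the residue. The Python sketch below assumes a CSV output file, the default A/M/U/F letters, and a hypothetical column position for the Process Code.

```python
import csv
import io
from collections import Counter

MATCHED_CODES = {"A", "M"}  # Automatic and Manual, using the default letters


def split_by_process_code(csv_text, code_column):
    """Separate matched records from the residue and tally each Process Code."""
    matched, residue, counts = [], [], Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        code = row[code_column]
        counts[code] += 1
        (matched if code in MATCHED_CODES else residue).append(row)
    return matched, residue, counts


sample = '"000001","AB1 1AB","A"\n"000002","","U"\n"000003","PARIS","F"\n'
matched, residue, counts = split_by_process_code(sample, code_column=2)
print(len(matched), len(residue))  # 1 2
```

If the codes have been changed from their defaults, MATCHED_CODES would need to be adjusted to match.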

Non-PAF Status

This flag should be used whenever Non-PAF elements are retained. It is a single character and contains a number. It can be interpreted as follows:

0   No Non-PAF elements were found
1   Non-PAF element found and successfully retained
2   Non-PAF element found but could not be successfully retained
3   A combination of 1 & 2

If we combine its use with the option not to flag PNR localities, we can then use this flag to help us examine the output data in a more focused way, concentrating on truly unrecognised elements that were kept. These will be found in records flagged with either 1 or 3.

AddrFrmt Status

This flag should be used when the Advanced Address Formatting is used. It will contain a number which is made of a combination of numbers that represent the methods used within the address. The basic numbers are as follows:

‘0’ No formatting required
‘1’ Abbreviation took place
‘2’ Concatenation took place
‘4’ Truncation took place
‘8’ Field elimination took place

Example (1)

Abbreviation + Truncation + Field Elimination = 1 + 4 + 8 = 13

Example (2)

Abbreviation + Concatenation = 1 + 2 = 3
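Because each basic number is a distinct power of two, a combined AddrFrmt Status value can be split back into its parts. The following is a minimal sketch of that arithmetic; the function name is hypothetical and not part of Matchcode.

```python
# Basic AddrFrmt Status numbers, as listed above.
ADDR_FRMT_METHODS = {
    1: "Abbreviation",
    2: "Concatenation",
    4: "Truncation",
    8: "Field elimination",
}


def decode_addrfrmt_status(value):
    """Return the formatting methods that make up an AddrFrmt Status value."""
    value = int(value)
    if not 0 <= value <= 15:
        raise ValueError("AddrFrmt Status is expected to be between 0 and 15")
    # Test each basic number in turn; 0 yields an empty list (no formatting required).
    return [name for bit, name in ADDR_FRMT_METHODS.items() if value & bit]


print(decode_addrfrmt_status(13))  # Example (1)
print(decode_addrfrmt_status(3))   # Example (2)
```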


Glossary

Advanced Address Formatting

Matchcode has advanced address-formatting features, which can be used to format addresses in a more controlled way.

Some of the things that can be done with the advanced formatting are as follows:

  • Hyphens or Slashes can be put into number ranges.
  • Words can be abbreviated to shorten field lengths.
  • The word case of each address element can be controlled.
  • The field position of each address element can be controlled.
  • Field concatenation can be controlled.
  • Fields can be eliminated in any order of priority to save space.

These and other features within the formatting can be used to write addresses into very limited address lines, or when a more ordered Formatted Address is required.

To use Advanced Formatting

  1. Make a local copy of ADDRFRMT.INI. This can be found in the installation directory.
  2. Open the required Job File and go to the output format tab.
  3. Click on any one of the Formatted Address lines.
  4. Click on the advanced button.
  5. In the dialog box click on ‘Use a configuration file for formatting options’.
  6. Click on the Configuration File tab.
  7. Run the ‘Address Formatting Configuration’ tool from either the Run menu or from the Matchcode group.
  8. Browse for and select your ADDRFRMT.INI file in the browser provided.
  9. Select the number of Formatted Address lines to use
  10. Click ‘Change Settings’

Analyse File

The Analyse button on the input format tab of an open Job File requests that Matchcode read the file and produce a simple report showing such information as field and record delimiters, maximum record length etc. It will also show if the file does not have a consistent format.

Test Read

The Test Read button causes Matchcode to use the specified input format layout to read the file as though it were running the batch. This is a way of simulating the read part of the batch to provide an early warning of any potential format errors.

Success Criteria

Perhaps the most important subject to understand when processing address files is the Success Criteria. This is because if it is not set correctly, unreliable results may occur.

Address Generation

The term Address Generation refers to using the Address Key and Organisation Key to retrieve a single address from PAF without the need for Cross Matching of the address itself. This can be done manually in Interactive mode, or as a batch by specifying the position of Input Addr Key and Input Orgn Key in the input file and setting the type of batch to Address Generation on the Batch Processing tab of Job Control.

Electoral Roll Names

If the Electoral Roll (Names) files are available then this can assist the Cross Matching process. Depending on set up, Matchcode uses any available names in the input address to disambiguate otherwise ambiguous addresses.

An example of where this may be useful is where an input record uses a building name which is unknown to the Royal Mail. Let's say the input address is something like this:

MR J SMITHERS
DUNROAMIN COTTAGE
LITTLE AVENUE
SMALLTOWN
SOMEPLACE

Now let's assume that LITTLE AVENUE has ten properties on it. These are numbered, predictably, 1 to 10. There are no registered building names. Without the benefit of Electoral Roll names it would be impossible to do more than assign a postcode. The address itself would remain unmatched. However, if Matchcode is using Names it will examine a list of all the people in LITTLE AVENUE and, if it finds a good match to MR J SMITHERS, it will return the address where he lives. Let's say number ‘7’.

If we combined this method with Add Non-PAF elements to the output address, and outputting names, we could create an output record like this:

MR JASON SMITHERS
DUNROAMIN COTTAGE
7 LITTLE AVENUE
SMALLTOWN
SOMEPLACE

Note that we were able to provide the full name, retain the unmatched building name and add the missing building number.

Add Non-PAF elements to the output address

This option is enabled from the ‘Advanced Cross Matching Options’ of the Batch Processing tab of an open Job File.

This option instructs Matchcode to keep, where possible, unmatched input address elements. Any information that Matchcode did not match during the Cross Matching is put back into the output address if space is available. This may include building names etc. It is advisable to run tests before committing oneself to retaining these unmatched elements to avoid unwanted results.

FOR

  • Preferred address details such as house names and localities can be put back into a clean version of the address.
  • Extra information such as department names may be retained though not actually on PAF.

AGAINST

  • Unmatched parts of the input address, which may not be desirable to keep, may be retained, e.g. ** UNPAID ACCOUNT **.
  • Matchcode may add what it considers to be missing address elements when they are in fact present but badly spelled, and as a result we could get elements repeated: SURBITON GROVE, ZERBITTEN GROWVE for example.

We have some control over how Matchcode deals with PNR (Postally Not Required) Localities. These are locality names that are commonly known and used but not recognised by the Royal Mail, and therefore not held on PAF. These are known to Matchcode and may be used to assist matching but by default play no part in the output. However, we can specify whether these elements are retained and also whether Matchcode flags them as Non-PAF elements. If we choose to use the Non-PAF option we should include a flag to show what, if anything, has taken place. This flag is known as Non-PAF Status.

Multiple Passes

A strategy of multiple passes is a good way to get the best results from Matchcode. Effectively we use Matchcode to filter out records with each pass through the file. We should start by setting the highest possible levels of match, and gradually remove successful records, creating ever-smaller residue files which are processed using reduced levels of matching. Each level should be indicated in the output file by a change of Process Code, so that when all possible passes have been completed and the resulting files merged back into a single file we can easily identify the match level of each record.

What to do after the first batch has finished

Once a pass has been completed, the files can be split ready to go through the next pass. This is achieved by selecting Extract Records from the File menu of Job Control. This will bring up a dialog box. In here we can first select Automatically Coded records from the Output file, and write these to a new file. These successful records can be set aside until we are ready to merge the output data. Next, we repeat the process, selecting all other categories (Uncoded / Foreign / Manual) from the input file. This file will become the input file for a new, smaller batch. All we need to do then is to take a copy of the Job File using Windows Explorer or some other software. Then we edit the parameters of this new job file to:

  1. Change the input and output filename to refer to the new residue file.
  2. Re-count the records.
  3. Make changes to the match level and parameters to apply to the new batch.
  4. Change the Process Code to indicate the new match level.

For example if we have access to Electoral Roll Names data we may carry out the following levels of matching.

Pass-1 Name & Address matches only – Process Code of ‘N’ for successful records.
Pass-2 Address matches only – Process Code of ‘A’ for successful records.
Pass-3 Postcode matches only – Process Code of ‘P’ for successful records.

So our output file will contain the following Process Code flags – N / A / P / U

Obviously by the time we are looking for postcodes only we would have to retain our original addresses rather than replacing them.
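The filtering logic of the three passes can be sketched as follows. This only models the bookkeeping: the matcher functions are hypothetical stand-ins for real Matchcode batches, and the record fields are invented for the example.

```python
def run_passes(records, passes, uncoded_code="U"):
    """Assign a Process Code to each record.

    passes is a list of (process_code, matcher) pairs, strictest first.
    A record takes the code of the first pass whose matcher accepts it;
    anything left in the final residue is marked as uncoded.
    """
    results = {}
    residue = list(records)
    for code, matcher in passes:
        still_unmatched = []
        for record in residue:
            if matcher(record):
                results[record["id"]] = code      # set aside as successful
            else:
                still_unmatched.append(record)    # goes into the next residue file
        residue = still_unmatched
    for record in residue:
        results[record["id"]] = uncoded_code
    return results


# Toy records: each field says whether that level of match would succeed.
records = [
    {"id": 1, "name": True, "address": True, "postcode": True},
    {"id": 2, "name": False, "address": True, "postcode": True},
    {"id": 3, "name": False, "address": False, "postcode": True},
    {"id": 4, "name": False, "address": False, "postcode": False},
]
passes = [
    ("N", lambda r: r["name"] and r["address"]),  # Pass-1 Name & Address
    ("A", lambda r: r["address"]),                # Pass-2 Address only
    ("P", lambda r: r["postcode"]),               # Pass-3 Postcode only
]
merged = run_passes(records, passes)
print(merged)
```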

Fuzzy Matching dialog box

When you open a saved Job File and go to the Batch Processing tab you will see the Fuzzy Matching button. By clicking on this button you will bring up a dialog box in which you can control some of the ways Matchcode searches for your addresses.

The dialog box is split into 3 sections. These are General, Max Towns and Number Matching.

General

Orthographic is the basic kind of fuzzy matching which Matchcode uses. It allows for such errors as character transposition and missing or duplicated characters. In most cases this is probably the safest option to use when processing business addresses.
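To give a feel for the kinds of error described here, the sketch below counts character transpositions, missing characters, extra characters and substitutions between two strings using an optimal string alignment distance. This is a general technique chosen for illustration only; it is an assumption and not a description of how Matchcode implements Orthographic matching.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between a and b.

    Insertions, deletions, substitutions and swaps of two adjacent
    characters each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # character missing from b
                d[i][j - 1] + 1,         # extra character in b
                d[i - 1][j - 1] + cost,  # same or substituted character
            )
            # Two adjacent characters swapped.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("SURBITON", "SURBTION"))  # transposed characters
print(osa_distance("AVENUE", "AVNUE"))       # missing character
print(osa_distance("GROVE", "GROVVE"))       # duplicated character
```

A small distance relative to the word length suggests a spelling slip rather than a different word.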

Advanced Fuzzy Matching does everything that Orthographic does but in addition it allows for missing or extra words, incorrect word order etc. It is a more heavyweight form of fuzzy matching and as such it will probably increase the number of successful matches. It can also increase the number of mismatches as more differences between input and output are allowed.

Check For Foreign causes Matchcode to search an internal list of known foreign towns if a record fails to match. If the address is believed to be foreign then it is flagged accordingly.

Employ User Codes: see the Manual for details.

Max Towns

Setting this option to 2 or more causes Matchcode to continue searching when an initial town match fails to yield a successful match. This is particularly useful in situations where an address could contain more than one viable town name. This can occur when Localities have been promoted into Towns, or when records contain such things as ‘Near Guildford’.

Number Matching

By deselecting these options you can limit the amount of tolerance Matchcode applies when matching building numbers.

© Capscan Limited 2011. All rights reserved