Special Functions



There are three special functions





Samples of special function data extractions

Image of spreadsheet showing extract and conversion examples

The left hand column contains the original text and the remaining columns contain data converted and/or extracted from the left hand cell. The way that individual numbers are displayed when extracted depends on the formatting applied to the cells. In the fourth example, the value 888 is shown as 888.00 because the cell has been formatted to show two decimal places. All the cells in the top three blocks have been formatted to show two decimal places and to use commas between thousands.

Click the item below to get more detail

blue line

Extracting Multiple Numbers from a Single Cell

Introduction

Extracting multiple numbers from one cell allows several numbers in one cell to be recognized, extracted and either used directly in the calculator or pasted into spreadsheet cells.
Imported data may contain multiple numbers, such as comma delimited data that has not been separated into columns, information containing more than one number, such as 180 @ $56.95, or numbers embedded in text. In–Cell can extract numbers from text or multiple numbers.

What it Does

To extract numbers from a cell, move In–Cell's Active Cell to the cell with the multiple numbers, or select the cell in Excel and use the Resample button.
The button labelled numbers in the special functions area will run the extract routine. Image of the Multiple numbers extract group of buttons The display text box will display the first number extracted from the cell. Click on the Display button to show the next number extracted. When you get to the last number extracted, clicking Display again, brings up the first number again.
The number shown in the text box can be moved to Input 1 or Input 2, using Inp1 or Inp2 as appropriate. Paste, pastes the value in the box to In–Cell's Active Cell.
After the extract, the Active Cell can be moved using the scroll buttons, Image of the Active Cell address box and associated scroll bars or select a new cell on the spreadsheet and click the Resample button. Image of the Resample button Note that clicking Resample may enter new data into Input 1 and Input 2. If a single, empty cell is selected, Resample will not replace the content of either input.

The Paste button can be double–clicked, resulting in all the numbers extracted from the cell being pasted into adjacent cells. After clicking the Numbers button, the Active Cell is moved one place in the default direction. The default direction, (down, right or no–move), is user–definable, (see the Customize and options page). The double–click causes the numbers to be pasted into sequential cells in the default direction. If any of the cells contains data, a warning dialog is shown, advising the user that this overwrite cannot be undone. (A single click paste, does not include an overwrite warning, as a single paste operation can be reversed with the Undo paste button). If the move direction is set to 'none', only the first number extracted is pasted, i.e., the same as a single–click.

The extraction of multiple numbers is also dependent on another User–definable settings, the Comma value. The comma options are: 'Thousands', 'Delimiter' and 'Auto'.

The comma as thousand separator (and the auto setting), checks the numbers for valid thousands groupings, so that even if the comma as thousands separator option is selected, a comma that cannot be a thousands separator is considered to be a separator between numbers. For example 123,456,78 is not a valid single number, and is extracted as two numbers: 123456 and 78. If the Auto setting is chosen, then four numbers are extracted 123 / 456 / 78 and 123456. In the example in the table at the top of this page, the value 888,999,12,456 has six numbers extracted, and this was done in 'Auto' mode. In 'Thousands' mode only two numbers are returned, 888999 and 12456. In 'Delimiter' mode four numbers are returned 888 / 999 / 12 and 456.


How it Works

The multiple number extract function is contained in a class module, which is called by the calculator. The calculator passes the cell information to the class module, which returns the number of valid numbers extracted. The calculator also passes the value of the comma user option to the module. The extraction is based on recognition of characters which Excel recognizes as numbers.
Individual extracted numbers are called–up from the conversion module.

If the user setting is auto, the conversion module creates a copy of the information sent to it. The first copy retains commas and the second copy has all commas removed. This allows two conversions to be undertaken.
The class module also provides the conversion function for a cell formula '=MultiNum('cell reference' instance number' 'comma option'), where the cell reference points to the cell containing the data to be extracted, instance number is the number to be shown, i.e, if the cell has three numbers, this formula can be repeated three times with instance numbers of 1, 2 and 3. The comma options in the formula are "Thou", "Delim" or "Auto". e.g., =(MultiNum($C$10, 3, "Delim"), where cell C10 contains the text with multiple numbers.

The pre–select and Resample functions in In–Cell both use the text to number function in the Class Module, to look for numbers as text to populate the two inputs. For Input 1, the instance number is set to one and for Input 2, the instance number is set to two. In both cases the Class Module makes available a value for the number of numbers found, and this value is tested to see whether a valid number is available to populate one or both inputs.

The text to number converter in the Class Module can also be called from user programs.

Back to top of page blue line

Convert Numbers written in Text to Numbers - Including several numbers in a Single Cell

Introduction

The text to numbers conversion converts English–language text to numbers. As numbers written in text can vary widely in the way they are written, not only from country to country but also due to individual variations, the accuracy of conversion does vary. Despite this a reasonable range of text variations can be recognized. In addition, the text to number conversion is able to recognize ordinary numbers, so that mixed numbers such as thirty four point 10 can be converted. The decimal point '.' is recognized as are the words 'point' and 'decimal'.

What it Does

Clicking on the 'Text as #' button results in the contents of the Active Cell being parsed for the presence of numbers in the form of text, as well as numbers present as standard digits, and mixed text/digit combinations. Multiple numbers can be converted and extracted. Image of the Text to numbers extract group of buttons The display text box will display the first number extracted from the cell. Click on the Display button to show the next number extracted. When you get to the last number extracted, clicking Display again, brings up the first number again.
The number shown in the text box can be moved to Input 1 or Input 2, using Inp1 or Inp2 as appropriate. Paste, pastes the value in the box to In–Cell's Active Cell.
After the extract, the Active Cell can be moved using the scroll buttons, Image of the Active Cell address box and associated scroll bars or select a new cell on the spreadsheet and click the Resample button. Image of the Resample button Note that clicking Resample may enter new data into Input 1 and Input 2. If a single, empty cell is selected, Resample will not replace the content of either input.

The Paste button can be double–clicked, resulting in all the numbers extracted from the cell being pasted into adjacent cells. After clicking the Numbers button, the Active Cell is moved one place in the default direction. The default direction, (down, right or no–move), is user–definable, (see the Customize and options page). The double–click causes the numbers to be pasted into sequential cells in the default direction. If any of the cells contains data, a warning dialog is shown, advising the user that this overwrite cannot be undone. (A single click paste, does not include an overwrite warning, as a single paste operation can be reversed with the Undo paste button). If the move direction is set to 'none', only the first number extracted is pasted, i.e., the same as a single–click.


How it Works

The text to number conversion function is contained in a class module, which is called by the calculator. The calculator passes the cell information to the class module, which returns the number of valid numbers extracted. The text is parsed twice, once to find the decimal part of any number and a second time to identify the integer part. The conversion routine was built specifically for the In–Cell calculator using an algorithm designed for use with text. Despite this, the conversion routine takes a noticeable time to parse a long paragraph of text contained in a cell. A delay after clicking 'Text as #' is to be expected, and is also dependant on the processor speed and how heavily loaded the processor is with other running programs and processes.
Individual extracted numbers are called–up from the conversion module by the main calculator program. There is one additional function in the class module that can be set, – a flag to indicate whether to recognize numeric digits as well as text. This option is set to recognize digits, and is not user–definable within In–Cell, although it could be changed if the class module was called from a user program.

The class module also provides the conversion function for a cell formula '=TextToNum('cell reference', 'instance'), where the cell reference points to the cell containing the data to be extracted, and instance number is the number to be shown. If the text in cell B10 contained ten numbers then the formula =TextToNum(B10, 10) would return the tenth number in the text.

The text to number conversion in the class module can be called from user programs.

Back to top of page blue line

Manipulating Dates - Including Converting Dates that Excel doesn't Recognize, and Dates before 01 January 1900

Introduction

The date conversion function will convert dates in a variety of formats into a standard text format, using your Windows short date format, as well as converting the date text to a serial number, (see the section on Excel Dates for more information on Excel's date serial number system, and how In–Cell handles dates before 1900, and between 01 January 1900 and 01 March 1900. The date conversion function will also extract day, month and year as numbers, ready for further calculation, and as with the other two special functions, extracted numbers (day, month, year, hour, minute, second and serial number), can be transferred to input 1 or input 2 or pasted to a cell in the spreadsheet. .

What it Does

Clicking on the 'Dates' button results in the contents of the Active Cell being parsed for the presence of dates. Image of the Dates extract group of buttons The display text box will display a text version of the first date found. The date will be displayed in the user's Windows short date format. In this website the display uses the short date format set on the PC I was using to get the screen capture images.
Clicking on the Display button displays various elements of the date information, in the adjacent text box. The information displayed is:

The caption on the Display button changes, to indicate the item being displayed, e.g., Day or Month. When you get to the last date extracted, clicking Display again, brings up the first date. Multiple dates can be extracted, and each is displayed in–turn.
The date element, e.g., serial number, shown in the text box can be moved to Input 1 or Input 2, using Inp1 or Inp2 as appropriate. Paste, pastes the value in the box to In–Cell's Active Cell. The text display of the date cannot be moved to either of the inputs, as these are for numeric values only.

If you want to convert a date from a format not recognized by Windows, such as 4th July 2000, (July 04, 2000 and several other formats are recognized by Windows), after conversion, paste the serial number into a cell, not the text display. The serial number can then be formatted as a date, using the Cell Format menu item or toolbar icon, and selecting Numbers, Date and a specific date format.

If the cell selected for the date extract contains a single number, the date extract function treats the number as a date serial number, and displays all the same elements, such as Year, Month and serial number. If the number has a decimal component, this is treated as a time, and is converted using Excel's time numbering system, for example 0.5 is noon, i.e., half–way through the day. Unlike Excel, In–Cell recognizes serial numbers which are negative. Excel returns "#######" for negative numbers formatted as dates. In–Cell converts negative numbers to dates, back to the year 1582. Prior to that date, the Gregorian date system had not been announced or introduced into any country, and using serial numbers before that date has no value. For more information go to the Gregorian calendar information. After the extract, the Active Cell can be moved using the scroll buttons, Image of the Active Cell address box and associated scroll bars or select a new cell on the spreadsheet and click the Resample button. Image of the Resample button Note that clicking Resample may enter new data into Input 1 and Input 2. If a single, empty cell is selected, Resample will not replace the content of either input.



How it Works

The date conversion and extract function is contained in a Class Module and the routine is called by In–Cell, which passes the cell contents to the Class Module. The Class module makes available the number of valid dates that it has extracted. In–Cell then sets a value on the Class Module, for the date to be returned and then reads eight returned values for that date. Seven of the values have been described above and are the values shown in the display textbox. The eighth is a code showing how the Class Module coped with the conversion. If the conversion was from text and was definite, e.g. 01 February 1903, the code returned is 'D1M1Y1'. On the other hand 01/02/03 returns UU3, meaning that all of the three elements were Undefined, 01 could be a two digit year, a month or a day, as could each of the other two numbers. 01 02 1903 would return Y1UU2. In–Cell tests this value. If the result is a 'definate' conversion, the result is displayed immediately, if not, the result is passed to the date option dialog box for user input. Image of the Date option dialog box The date option dialog box allows the user to change the order of the elements, as well as force a different century for two digit years, but it won't overide a 4 digit year.
In the image of the date option dialog, the initial order of conversion is shown, as well as a text box showing what the user has selected. The top three sections with radar buttons, allows the user to specify how each of the components found is to be used. Clicking a radar button will move an element from say, day to month, and the month will be flipped into the day position. The initial order display remains unchanged. The text box displays the revised order. For clarity, the month is displayed a text. The century of the selected year can be changed using the spin buttons. in the Century box. This can be used to override a 4 digit year in the original text!
There is a final check that the date elements selected are acceptable, e.g., month must be from 1 to 12. The revised date is passed back to the Class Module for a final check and errors such as 29 February in a non–leap year are detected, returning an error message.

When a number is passed to the Class Module, this is treated as a date serial number. The return codes indicate how the serial number was handled. S1 indicates a standard, Windows–recognized date serial number. P1 is a special case where the integer part of the number is zero. In this case, the serial number is treated as a time and no date is displayed. The zero serial number represents 30 December 1899. (If a text format date for 30 December 1899 is passed to the Class Module it is converted as normal, returning the correct date, and no time component). Most times in cells in a spreadsheet have zero as the date component, hence displaying 30 December 1899 would be inappropriate. The code returned for a time only conversion is T1.
The code P1 is returned when the date serial number was negative. Negative numbers are not recognized by Excel as valid dates, but In–Cell does recognize them. (P1 = Pre 1900). There is a user selectable option to stop conversion of dates prior to 1900, (see the Customize and options page).
Short error text may also be returned if an invalid date has been detected, such as 29 February in a non–leap year.

The Options box in the date dialog allow the user to decide whether the selected order of date elements is to be the default for future conversions, e.g., is the date always month, day, year as opposed to day, month, year. The selection can be used throughout this session, until In–Cell is closed, or it can be saved for future sessions, or the current conversion can be a one–off.

The Class Module also has three other inputs, a Century value, a Century range and a preferred date conversion order. These are used internally by In–Cell and are also used by a date extract formula, =ExtractDates('cell reference', 'instance number', 'item number', 'preferred order'), for example =ExtractDates($I$48,1,3,"DMY"). There are two optional parameters to this formula, 'century range' and 'Force Century'. The century range is similar to the Windows action for handling two digit years, so that some 'years' are future and some 'years' are past. Windows for example, treats 99 as 1999 and 11 as 2011. The Century Range value can be set to match the range of dates being converted. The Force century will use a specific century as the base for two digit year conversions, in conjunction with the range specified. Windows base is the current century, i.e., 2000, but the base in the Class Module can be changed. For example with a Force Century value of 17 and a Century range of 20, the two digit year 99 becomes 1699 and 11 becomes 1711. All years up to 20 are treated as current century, e.g., 19 would be 1719, but 21 would be 1621. Changing the Century range alters the range, and can be of value when a specific group of dates are being handled. If a four digit year is found, this takes precedence of the Force century and Century range values.
When the formula is used for conversion, there is no date option dialog and the conversion relies entirely on the parameters passed to it to interpret equivocal dates. The conversion code is available by using instance number 8.

Back to top of page blue line

Additional Information on Dates in Excel and How In-Cell Handles Dates

Dates in Excel

Excel 2003 states that Dates are stored in Microsoft® Excel as serial numbers starting with 1, which represents January 1, 1900; See office.microsoft.com/training

If you use the Day, Month, Year and Weekday functions for the date serial number 1, this is what you get:

screen cap of XL with serial #1 conversions

At first glance this looks OK, but 01, January 1900 was actually a Monday.

Check out the day of the week calculation at: www.faqs.org The following is one of the suggested calculations in the sci-math-faq website:

Here is a standard method suitable for mental computation:


Remember also that 1900 was NOT a leap year.
See About Calculating Leap Years, by Dan Mabbutt.

Dan Mabutt’s site has a short program in Visual Basic, for calculating leap years.

This can be re-written as a formula: =IF(MOD(C3,4)=0,IF(MOD(C3,100)=0,IF(MOD(C3,400)=0,TRUE,FALSE),TRUE),FALSE) where cell C3 contains the year, (not the date serial number).

I have put together a formula following the sci-math-faq website process described above and Dan Mabbutt’s leap year calculation.

Here it is broken down into sections:
=CHOOSE
[chooses the day of the week based on the result of the rest of the calculation]
(MOD(((
[A modulus division by 7 - see last but one line]
(INT(RIGHT(TEXT(B3,"0000"),2)/4))
[Divide the last two digits of the year by 4]
+D3
[Add the day]
+CHOOSE(LEFT(TEXT(B3,"0000"),2)-16,4,2,0,6))
[Add a value based on the century - 19xx adds 0, 20xx adds 6]
+(CHOOSE(MONTH(A3),1,4,4,0,2,5,0,3,6,1,4,6)
[Adds a number based on the month]
+IF(AND
[If both of the next two lines of the formula are true subtract one, else ‘add’ zero]
(IF(MOD(B3,4)=0,IF(MOD(B3,100)=0,IF(MOD(B3,400)=0,TRUE,FALSE),TRUE),FALSE),
[tests if the year was a leap year]
MONTH(A3)<3),-1,0))
[test if the month was January or February i.e., 1 or 2]
+RIGHT(TEXT(B3,"0000"),2))
[Add the last two digits of the year]
,7)
[the modulus divisor of 7]
+1,"Sat","Sun","Mon","Tue","Wed","Thu","Fri")
[Choose the day of the week - the result yields values from 0 to 6, so add 1 to the result so that the choose function works]

The above version of the formula has been broken up to make it easier to read. To use this formula, copy the formulas below into cells B2 to E2 and enter a date serial number or a date in an Excel recognized format, into cell A2. Note that Excel does not handle dates before 01 January 1900 and also has an error in the 1900 leap year (it wasn't a leap year - there is more about this a few rows down this page). As a result this formula will not work before 01 January 1900 and is not correct before 1 March 1900.

B2 =TEXT(YEAR(A2),"0000")
C2 =CHOOSE(MONTH(A2),"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
D2 =TEXT(DAY(A2),"00")
E2 =CHOOSE(MOD((((INT(RIGHT(TEXT(B2,"0000"),2)/4))+D2+CHOOSE(LEFT(TEXT(B2,"0000"),2)-16,4,2,0,6))+(CHOOSE(MONTH(A2),1,4,4,0,2,5,0,3,6,1,4,6)+IF(AND(IF(MOD(B2,4)=0,IF(MOD(B2,100)=0,IF(MOD(B2,400)=0,TRUE,FALSE),TRUE),FALSE), MONTH(A2)<3),-1,0))+RIGHT(TEXT(B2,"0000"),2)),7)+1,"Sat","Sun","Mon","Tue","Wed","Thu","Fri")

The formula in E2 uses the intermediate results in cells B2, & D2.
Although this formula can be changed to remove the dependency on cell D2, the Year is required in B2, as the maximum number of calculation levels (nested functions) is exceeded if the year extraction from the date serial number is included in the formula in E2. (Excel 2007 allows far more levels of nesting and the formula can be re–written for use in Excel 2007 without the intermediate results in D2).

On the free code page there is a version of this calculation as a user defined formula. It works back to 1 March 1704. (I haven't worked out why it goes wrong before that date). Click here for the day of the week user defined function.

Here are a few dates and weekdays using both Excel’s weekday function and the above weekday formula. screen cap of XL with serial # conversions around 28Feb 1900
This table shows where the problem with weekdays arises - February 29th of 1900. As 1900 was NOT a leap year there was no 29th of February. Unfortunately, the fact that 1900 was not a leap was missed when this serial numbering system was devised. One of the results is that some calculations in Microsoft Windows® are not correct in the period between 01 January 1900 and the end of February 1900.
The calculations for 2000 (Y2K) were correct, including the fact that 2000 was a leap year.

Visual Basic for Applications (VBA)

Date calculations in VBA give different results for the period 01 January 1900 to 01 March 1900 when compared to using Excel formulas. This was a surprise finding and caused a lot of headaches when writing routines using VBA.

I am not an expert in the inner workings of VBA, and if the following is not correct, I am sure someone will let me know:
From my observations, it appears that:



The table below was completed by running this VBA program:
(The code was entered into a standard module in the Personal.xls workbook. This is a hidden workbook when using Excel, but it is visible when switching to Visual Basic (Tools – Macro – Visual Basic Editor), or alt key plus f11 function key. If no modules are shown under Personal.xls in the Project explorer pane (in the VBA window use Tools – View – Project Explorer), Right click Personal.xls and use Insert – Module).



Option Explicit
Private Sub DateCalcTest()
Dim n As Single

ActiveSheet.Range("B2").Activate

'Table headers
ActiveCell(1, 1) = "Serial Number"
ActiveCell(1, 2) = "Day"
ActiveCell(1, 3) = "Month"
ActiveCell(1, 4) = "Year"

'Format the headers
ActiveSheet.Rows(2).RowHeight = 25
For n = 1 To 4
 With ActiveCell(1, n)
  .Font.Bold = True
  .HorizontalAlignment = xlCenter
  .VerticalAlignment = xlVAlignCenter
  .WrapText = True
 End With
Next

'This is the heart of the program
'and displays dates for date serial numbers -3 to +63
For n = -3 To 63
 ActiveCell(n + 5, 1) = n
 ActiveCell(n + 5, 2) = Day(n)
 ActiveCell(n + 5, 3) = MonthName(Month(n), True)
 ActiveCell(n + 5, 4) = Year(n)
Next

End Sub


Make sure the active spreadsheet is blank and then in the module you have created, hit function key f5, and the program will run.


screen cap of VBA with serial # conversions around 28Feb 1900

(some rows from the output have been hidden);

Note that in VBA:

What In-Cell Calculator does with Dates

In-Cell uses the following assumptions when handling dates:

The Gregorian date system was described in a Papal Bull (described in The Catholic Encyclopedia and reproduced in New Advent as follows: For practical purposes a bull may be conveniently defined to be "an Apostolic letter with a leaden seal..."). Many web sites include information on the Papal Bull which introduced the Gregorian Calendar, including a translation into English by Bill Spencer. The first year in which any country used the Gregorian date system was 1582, the year in which October 15, followed the fourth of October. Some countries did not convert to the Gregorian system of dates until well into the twentieth century. See this Wikipedia entry for information on the introduction of the Gregorian calendar. Most countries had converted by 1753, but the Wikipedia article referenced above, states that Greece did not convert until 1923 and Turkey did not convert until 1926.

In general any date calculations before 1753 should take into account which country the date was derived from, and the fact that some countries used dual dating for a period of time. As the Gregorian calendar did not exist before 1582, In-Cell does not carry–out any date conversions before the 10 day 'jump' in October 1582.

Back to top of page blue line

Using Spreadsheet Formulas for the Special Functions

Introduction

The three special functions used to extract multiple numbers from a single cell, to convert text to numbers, including cells with multiple numbers and mixed numbers and text, and multiple date conversions, available in the In–Cell calculator, are all available as cell formulas.

These formulas are available as soon as the In–Cell Add–In is loaded. The following two images show the formulas in the first picture and the output obtained by the formulas in the second picture: screen cap of special function formulas screen cap of special function formula results



You can see that in each formula the first parameter is the address of the cell containing the information to be converted or extracted. The second parameter is the number or date to be shown, when there is more than one number or date in the cell.
For the date extract there are four further parameters. The first is a number from 1 to 8 specifying which part of the date information to display. The values are as follows:


The second is a three character code identifying the date order, e.g., "DMY" for day, month, year order. "MDY" is another common arrangement but any order of dates can be specified.
The third is the century range and the fourth is the Century. These last two work together when there is a two-digit year. If the year is in 4 digit format, it will be converted and displayed as that year, irrespective of the two 'century' values.
If the year is two digits only, then the century range specifies how much of the period from 00 to 99 is in the Century specified and how much is in the previous Century. For example using '50, 19' will result in years from 00 to 50 being converted to 1900 to 1950 and years from 51 to 99 being converted to 1851 to 1899. If '99, 19' is used then all two digit years will be in the range 1900 to 1999. You select these parameters depending on the source data you are converting. The conversion is much more flexible than Excel's handling of two digit years, as well as the fact that the conversion can handle a wider range of date formats. The date number is the number used by In–Cell and Excell to store date information. The numbering is the same in In–Cell and Excell for dates from 01 March 1900 onwards. For dates before 01 March 1900, see what In–Cell does with dates.
The Conversion code is a text string that In–Cell creates during conversion. It gives an insight into how the date information was handled. A value "D1M1Y1" means that the conversion was absolute, i.e., In–Cell saw no ambiguity in the date information. At the other end of the spectrum a value of "UU3" means that all three date components were unplaceable, such as '09/10/11', where each number could be a day, month or year. A date of '31/10/30' results in "DY2M1", as 31 and 30 could be day or year values, and 10, although it could be day, month or year, has to be month, as the other two values have to fill the day and year slots, although which is which is not clear. The actual conversion in this case was dependant on the "DMY" parameter in the formula.

The MultiNum function handles multiple numbers in a single cell. The way it handles commas depends on your user setting in the Options and Customization dialog box.
The TextToNum function converts numbers written as text, as well as finding numeric digits, so that all numbers in mixed text, such as 'two at $15.50' are extracted, in this case 2 and 15.5.

Back to top of page

Questions or comments can be sent to Feedback

Copyright © 2007 Humar Consulting Inc.