Samples of special function data extractions

The left hand column contains the original text and the remaining columns contain data converted and/or
extracted from the left hand cell. The way that individual numbers are displayed when extracted depends on the
formatting applied to the cells. In the fourth example, the value 888 is shown as 888.00 because the cell has been
formatted to show two decimal places. All the cells in the top three blocks have been formatted to show two decimal
places and to use commas between thousands.
Click the item below to get more detail
Extracting multiple numbers from one cell allows several numbers in one cell to be recognized, extracted and either used directly in the calculator or
pasted into spreadsheet cells.
Imported data may contain multiple numbers, such as comma delimited data that has not been separated into columns, information containing more than one
number, such as 180 @ $56.95, or numbers embedded in text. In–Cell can extract numbers from text or multiple numbers.
To extract numbers from a cell, move In–Cell's Active Cell to the cell with the multiple numbers, or select the cell in Excel and use the Resample button.
The button labelled numbers in the special functions area will run the extract routine.
The display text box will display the first number extracted from the cell. Click on the Display button to show the next number extracted.
When you get to the last number extracted, clicking Display again, brings up the first number again.
The number shown in the text box can be moved to Input 1 or Input 2, using Inp1 or Inp2 as appropriate. Paste, pastes the value in the box
to In–Cell's Active Cell.
After the extract, the Active Cell can be moved using the scroll buttons,
or select a new cell on the spreadsheet and click the Resample button.
Note that clicking Resample may enter new data into Input 1 and Input 2. If a single, empty cell is selected, Resample will not replace the content
of either input.
The Paste button can be double–clicked, resulting in all the numbers extracted from the cell being pasted into adjacent cells. After clicking the Numbers
button, the Active Cell is moved one place in the default direction. The default direction, (down, right or no–move), is
user–definable, (see the Customize and options page). The double–click causes the numbers
to be pasted into sequential cells in the default direction. If any of the cells contains data, a warning dialog is shown, advising the user that this
overwrite cannot be undone. (A single click paste, does not include an overwrite warning, as a single paste operation can be reversed with
the Undo paste button). If the move direction is set to 'none', only the first number extracted is pasted, i.e., the same as a single–click.
The extraction of multiple numbers is also dependent on another User–definable settings, the Comma value. The comma options are: 'Thousands',
'Delimiter' and 'Auto'.
The comma as thousand separator (and the auto setting), checks the numbers for valid thousands groupings, so that even if the comma as thousands separator option is selected, a comma that cannot be a thousands separator is considered to be a separator between numbers. For example 123,456,78 is not a valid single number, and is extracted as two numbers: 123456 and 78. If the Auto setting is chosen, then four numbers are extracted 123 / 456 / 78 and 123456. In the example in the table at the top of this page, the value 888,999,12,456 has six numbers extracted, and this was done in 'Auto' mode. In 'Thousands' mode only two numbers are returned, 888999 and 12456. In 'Delimiter' mode four numbers are returned 888 / 999 / 12 and 456.
The multiple number extract function is contained in a class module, which is called by the calculator. The calculator passes the
cell information to the class module, which returns the number of valid numbers extracted. The calculator also passes the value of the
comma user option to the module. The extraction is based on recognition of characters which Excel recognizes as numbers.
Individual extracted numbers are called–up from the conversion module.
If the user setting is auto, the conversion module creates a copy of the information sent to it. The first copy retains commas and the second copy has all
commas removed. This allows two conversions to be undertaken.
The class module also provides the conversion function for a cell formula '=MultiNum('cell reference' instance number' 'comma option'), where
the cell reference points to the cell containing the data to be extracted, instance number is the number to be shown, i.e, if the cell has three
numbers, this formula can be repeated three times with instance numbers of 1, 2 and 3. The comma options in the formula are "Thou", "Delim" or "Auto".
e.g., =(MultiNum($C$10, 3, "Delim"), where cell C10 contains the text with multiple numbers.
The pre–select and Resample functions in In–Cell both use the text to number function in the Class Module, to look for numbers as
text to populate the two inputs. For Input 1, the instance number is set to one and for Input 2, the instance number is set to two. In both cases
the Class Module makes available a value for the number of numbers found, and this value is tested to see whether a valid number is available
to populate one or both inputs.
The text to number converter in the Class Module can also be called from user programs.
The text to numbers conversion converts English–language text to numbers. As numbers written in text can vary widely in the way they are written,
not only from country to country but also due to individual variations, the accuracy of conversion does vary. Despite this a reasonable
range of text variations can be recognized. In addition, the text to number conversion is able to recognize ordinary numbers, so that mixed numbers
such as thirty four point 10 can be converted. The decimal point '.' is recognized as are the words 'point' and 'decimal'.
Clicking on the 'Text as #' button results in the contents of the Active Cell being parsed for the presence of numbers in the form of text, as well as
numbers present as standard digits, and mixed text/digit combinations. Multiple numbers can be converted and extracted.
The display text box will display the first number extracted from the cell. Click on the Display button to show the next number extracted.
When you get to the last number extracted, clicking Display again, brings up the first number again.
The number shown in the text box can be moved to Input 1 or Input 2, using Inp1 or Inp2 as appropriate. Paste, pastes the value in the box
to In–Cell's Active Cell.
After the extract, the Active Cell can be moved using the scroll buttons,
or select a new cell on the spreadsheet and click the Resample button.
Note that clicking Resample may enter new data into Input 1 and Input 2. If a single, empty cell is selected, Resample will not replace the content
of either input.
The Paste button can be double–clicked, resulting in all the numbers extracted from the cell being pasted into adjacent cells. After clicking the Numbers
button, the Active Cell is moved one place in the default direction. The default direction, (down, right or no–move), is
user–definable, (see the Customize and options page). The double–click causes the numbers
to be pasted into sequential cells in the default direction. If any of the cells contains data, a warning dialog is shown, advising the user that this
overwrite cannot be undone. (A single click paste, does not include an overwrite warning, as a single paste operation can be reversed with
the Undo paste button). If the move direction is set to 'none', only the first number extracted is pasted, i.e., the same as a single–click.
The text to number conversion function is contained in a class module, which is called by the calculator. The calculator passes the
cell information to the class module, which returns the number of valid numbers extracted. The text is parsed twice, once to find the decimal part
of any number and a second time to identify the integer part. The conversion routine was built specifically for the In–Cell calculator
using an algorithm designed for use with text. Despite this, the conversion routine takes a noticeable time to parse a long paragraph of text
contained in a cell. A delay after clicking 'Text as #' is to be expected, and is also dependant on the processor speed and how heavily loaded
the processor is with other running programs and processes.
Individual extracted numbers are called–up from the conversion module by the main calculator program. There is one additional function
in the class module that can be set, – a flag to indicate whether to recognize numeric digits as well as text. This option is set to
recognize digits, and is not user–definable within In–Cell, although it could be changed if the class module was called from a user program.
The class module also provides the conversion function for a cell formula '=TextToNum('cell reference', 'instance'), where the cell reference points
to the cell containing the data to be extracted, and instance number is the number to be shown. If the text in cell B10 contained ten numbers
then the formula =TextToNum(B10, 10) would return the tenth number in the text.
The text to number conversion in the class module can be called from user programs.
The date conversion function will convert dates in a variety of formats into a standard text format, using your Windows short date format,
as well as converting the date text to a serial number, (see the section on Excel Dates for more information on Excel's date serial number
system, and how In–Cell handles dates before 1900, and between 01 January 1900 and 01 March 1900. The date conversion function will also extract day,
month and year as numbers, ready for further calculation, and as with the other two special functions, extracted numbers (day, month, year,
hour, minute, second and serial number), can be transferred to input 1 or input 2 or pasted to a cell in the spreadsheet. .
Clicking on the 'Dates' button results in the contents of the Active Cell being parsed for the presence of dates.
The display text box will display a text version of the first date found. The date will be displayed in the user's Windows short date format. In this website
the display uses the short date format set on the PC I was using to get the screen capture images.
Clicking on the Display button displays various elements of the date information, in the adjacent text box. The information displayed is:
The caption on the Display button changes, to indicate the item being displayed, e.g., Day or Month. When you get to the last date extracted,
clicking Display again, brings up the first date. Multiple dates can be extracted, and each is displayed in–turn.
The date element, e.g., serial number, shown in the text box can be moved to Input 1 or Input 2, using Inp1 or Inp2 as appropriate.
Paste, pastes the value in the box to In–Cell's Active Cell. The text display of the date cannot be moved to either of the inputs, as
these are for numeric values only.
If you want to convert a date from a format not recognized by Windows, such as 4th July 2000, (July 04, 2000 and several other formats are recognized by Windows),
after conversion, paste the serial number into a cell, not the text display. The serial number can then be formatted as a date, using the Cell Format
menu item or toolbar icon, and selecting Numbers, Date and a specific date format.
If the cell selected for the date extract contains a single number, the date extract function treats the number as a date serial number, and
displays all the same elements, such as Year, Month and serial number. If the number has a decimal component, this is treated as a time, and is converted
using Excel's time numbering system, for example 0.5 is noon, i.e., half–way through the day. Unlike Excel, In–Cell recognizes serial numbers which
are negative. Excel returns "#######" for negative numbers formatted as dates. In–Cell converts negative numbers to dates, back to the year
1582. Prior to that date, the Gregorian date system had not been announced or introduced into any country, and using serial numbers before that date
has no value. For more information go to the Gregorian calendar information.
After the extract, the Active Cell can be moved using the scroll buttons,
or select a new cell on the spreadsheet and click the Resample button.
Note that clicking Resample may enter new data into Input 1 and Input 2. If a single, empty cell is selected, Resample will not replace the content
of either input.
The date conversion and extract function is contained in a Class Module and the routine is called by In–Cell, which passes the cell contents
to the Class Module. The Class module makes available the number of valid dates that it has extracted. In–Cell then sets a value on the
Class Module, for the date to be returned and then reads eight returned values for that date. Seven of the values have been described above and are the values
shown in the display textbox. The eighth is a code showing how the Class Module coped with the conversion. If the conversion was from text and was
definite, e.g. 01 February 1903, the code returned is 'D1M1Y1'. On the other hand 01/02/03 returns UU3, meaning that all of the three elements
were Undefined, 01 could be a two digit year, a month or a day, as could each of the other two numbers. 01 02 1903 would return Y1UU2. In–Cell
tests this value. If the result is a 'definate' conversion, the result is displayed immediately, if not, the result is passed to the date option dialog
box for user input.
The date option dialog box allows the
user to change the order of the elements, as well as force a different century for two digit years, but it won't overide a 4 digit year.
In the image of the date option dialog, the initial order of conversion is shown, as well as a text box showing what the user has selected. The
top three sections with radar buttons, allows the user to specify how each of the components found is to be used. Clicking a radar button will move
an element from say, day to month, and the month will be flipped into the day position. The initial order display remains unchanged. The
text box displays the revised order. For clarity, the month is displayed a text. The century of the selected year can be changed using the spin buttons.
in the Century box. This can be used to override a 4 digit year in the original text!
There is a final check that the date elements selected are acceptable, e.g., month must be from 1 to 12. The revised date is passed back to the
Class Module for a final check and errors such as 29 February in a non–leap year are detected, returning an error message.
When a number is passed to the Class Module, this is treated as a date serial number. The return codes indicate how the serial number was handled.
S1 indicates a standard, Windows–recognized date serial number. P1 is a special case where the integer part of the number is zero. In this
case, the serial number is treated as a time and no date is displayed. The zero serial number represents 30 December 1899. (If a text format date for
30 December 1899 is passed to the Class Module it is converted as normal, returning the correct date, and no time component). Most times in cells in
a spreadsheet have zero as the date component, hence displaying 30 December 1899 would be inappropriate. The code returned for a time only conversion is T1.
The code P1 is returned when the date serial number was negative. Negative numbers are not recognized by Excel as valid dates, but In–Cell
does recognize them. (P1 = Pre 1900). There is a user selectable option to stop conversion of dates prior to 1900,
(see the Customize and options page).
Short error text may also be returned if an invalid date has been detected, such as 29 February in a non–leap year.
The Options box in the date dialog allow the user to decide whether the selected order of date elements is to be the default for future
conversions, e.g., is the date always month, day, year as opposed to day, month, year. The selection can be used throughout this session, until In–Cell
is closed, or it can be saved for future sessions, or the current conversion can be a one–off.
The Class Module also has three other inputs, a Century value, a Century range and a preferred date conversion order. These are used internally by In–Cell
and are also used by a date extract formula, =ExtractDates('cell reference', 'instance number', 'item number', 'preferred order'), for
example =ExtractDates($I$48,1,3,"DMY"). There are two optional parameters to this formula, 'century range' and 'Force Century'. The
century range is similar to the Windows action for handling two digit years, so that some 'years' are future and some 'years' are past. Windows
for example, treats 99 as 1999 and 11 as 2011. The Century Range value can be set to match the range of dates being converted. The Force century
will use a specific century as the base for two digit year conversions, in conjunction with the range specified. Windows base is the current century, i.e.,
2000, but the base in the Class Module can be changed. For example with a Force Century value of 17 and a Century range of 20, the two digit
year 99 becomes 1699 and 11 becomes 1711. All years up to 20 are treated as current century, e.g., 19 would be 1719, but 21 would be 1621. Changing
the Century range alters the range, and can be of value when a specific group of dates are being handled. If a four digit year is found,
this takes precedence of the Force century and Century range values.
When the formula is used for conversion, there is no date option dialog and the conversion relies entirely on the parameters passed to it
to interpret equivocal dates. The conversion code is available by using instance number 8.
Excel 2003 states that Dates are stored in Microsoft® Excel as serial numbers starting with 1, which represents January 1, 1900;
See office.microsoft.com/training
If you use the Day, Month, Year and Weekday functions for the date serial number 1, this is what you get:
At first glance this looks OK, but 01, January 1900 was actually a Monday.
Check out the day of the week calculation at: www.faqs.org The following is one of the suggested calculations in the sci-math-faq website:
Here is a standard method suitable for mental computation:
Remember also that 1900 was NOT a leap year.
See About Calculating Leap Years, by Dan Mabbutt.
Dan Mabutt’s site has a short program in Visual Basic, for calculating leap years.
This can be re-written as a formula: =IF(MOD(C3,4)=0,IF(MOD(C3,100)=0,IF(MOD(C3,400)=0,TRUE,FALSE),TRUE),FALSE)
where cell C3 contains the year, (not the date serial number).
I have put together a formula following the sci-math-faq website process described above and Dan Mabbutt’s leap year calculation.
Here it is broken down into sections:
=CHOOSE
[chooses the day of the week based on the result of the rest of the calculation]
(MOD(((
[A modulus division by 7 - see last but one line]
(INT(RIGHT(TEXT(B3,"0000"),2)/4))
[Divide the last two digits of the year by 4]
+D3
[Add the day]
+CHOOSE(LEFT(TEXT(B3,"0000"),2)-16,4,2,0,6))
[Add a value based on the century - 19xx adds 0, 20xx adds 6]
+(CHOOSE(MONTH(A3),1,4,4,0,2,5,0,3,6,1,4,6)
[Adds a number based on the month]
+IF(AND
[If both of the next two lines of the formula are true subtract one, else ‘add’ zero]
(IF(MOD(B3,4)=0,IF(MOD(B3,100)=0,IF(MOD(B3,400)=0,TRUE,FALSE),TRUE),FALSE),
[tests if the year was a leap year]
MONTH(A3)<3),-1,0))
[test if the month was January or February i.e., 1 or 2]
+RIGHT(TEXT(B3,"0000"),2))
[Add the last two digits of the year]
,7)
[the modulus divisor of 7]
+1,"Sat","Sun","Mon","Tue","Wed","Thu","Fri")
[Choose the day of the week - the result yields values from 0 to 6, so add 1 to the result so that the choose function works]
The above version of the formula has been broken up to make it easier to read. To use this formula, copy the formulas below into cells B2 to E2
and enter a date serial number or a date in an Excel recognized format, into cell A2. Note that Excel does not handle dates before 01 January 1900 and also has an error in the 1900 leap year (it wasn't a leap year - there is more about this a few rows down this page). As a result this formula will not work before 01 January 1900 and is not correct before 1 March 1900.
B2 =TEXT(YEAR(A2),"0000")
C2 =CHOOSE(MONTH(A2),"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
D2 =TEXT(DAY(A2),"00")
E2 =CHOOSE(MOD((((INT(RIGHT(TEXT(B2,"0000"),2)/4))+D2+CHOOSE(LEFT(TEXT(B2,"0000"),2)-16,4,2,0,6))+(CHOOSE(MONTH(A2),1,4,4,0,2,5,0,3,6,1,4,6)+IF(AND(IF(MOD(B2,4)=0,IF(MOD(B2,100)=0,IF(MOD(B2,400)=0,TRUE,FALSE),TRUE),FALSE), MONTH(A2)<3),-1,0))+RIGHT(TEXT(B2,"0000"),2)),7)+1,"Sat","Sun","Mon","Tue","Wed","Thu","Fri")
The formula in E2 uses the intermediate results in cells B2, & D2.
Although this formula can be changed to remove the dependency on cell D2, the Year is required in B2, as the maximum number of calculation levels (nested functions) is exceeded if the year extraction from the date serial number is included in the formula in E2. (Excel 2007 allows far more levels of nesting and the formula can be re–written for use in Excel 2007 without the intermediate results in D2).
On the free code page there is a version of this calculation as a user defined formula. It works back to 1 March 1704. (I haven't worked out why it goes wrong before that date). Click here for the day of the week user defined function.
Here are a few dates and weekdays using both Excel’s weekday function and the above weekday formula.
This table shows where the problem with weekdays arises - February 29th of 1900. As 1900 was NOT a leap year there was no 29th of February.
Unfortunately, the fact that 1900 was not a leap was missed when this serial numbering system was devised. One of the results is that some
calculations in Microsoft Windows® are not correct in the period between 01 January 1900 and the end of February 1900.
The calculations for 2000 (Y2K) were correct, including the fact that 2000 was a leap year.
Date calculations in VBA give different results for the period 01 January 1900 to 01 March 1900 when compared to using Excel formulas. This was a surprise finding
and caused a lot of headaches when writing routines using VBA.
I am not an expert in the inner workings of VBA, and if the following is not correct, I am sure someone will let me know:
From my observations, it appears that:
The table below was completed by running this VBA program:
(The code was entered into a standard module in the Personal.xls workbook. This is a hidden workbook when using Excel, but it is visible when switching
to Visual Basic (Tools – Macro – Visual Basic Editor), or alt key plus f11 function key. If no modules are shown under Personal.xls in the Project explorer pane
(in the VBA window use Tools – View – Project Explorer), Right click Personal.xls and use Insert – Module).
Option Explicit
Private Sub DateCalcTest()
Dim n As Single
ActiveSheet.Range("B2").Activate
'Table headers
ActiveCell(1, 1) = "Serial Number"
ActiveCell(1, 2) = "Day"
ActiveCell(1, 3) = "Month"
ActiveCell(1, 4) = "Year"
'Format the headers
ActiveSheet.Rows(2).RowHeight = 25
For n = 1 To 4
With ActiveCell(1, n)
.Font.Bold = True
.HorizontalAlignment = xlCenter
.VerticalAlignment = xlVAlignCenter
.WrapText = True
End With
Next
'This is the heart of the program
'and displays dates for date serial numbers -3 to +63
For n = -3 To 63
ActiveCell(n + 5, 1) = n
ActiveCell(n + 5, 2) = Day(n)
ActiveCell(n + 5, 3) = MonthName(Month(n), True)
ActiveCell(n + 5, 4) = Year(n)
Next
End Sub
Make sure the active spreadsheet is blank and then in the module you have created, hit function key f5, and the program will run.
(some rows from the output have been hidden);
Note that in VBA:
In-Cell uses the following assumptions when handling dates:
The Gregorian date system was described in a Papal Bull (described in The Catholic Encyclopedia and reproduced in New Advent
as follows: For practical purposes a bull may be conveniently defined to be "an Apostolic letter with a leaden seal..."). Many web sites include information on the
Papal Bull which introduced the Gregorian Calendar, including a translation into English by Bill Spencer.
The first year in which any country used the Gregorian date system was 1582, the year in which October 15, followed the fourth of October. Some countries did not convert to the Gregorian system of dates
until well into the twentieth century.
See this Wikipedia entry for information on the introduction of the Gregorian calendar.
Most countries had converted by 1753, but the Wikipedia article referenced above, states that Greece did not convert until 1923 and Turkey did not convert until 1926.
In general any date calculations before 1753 should take into account which country the date was derived from, and the fact that some countries used dual dating for a period of time.
As the Gregorian calendar did not exist before 1582, In-Cell does not carry–out any date conversions before the 10 day 'jump' in October 1582.
The three special functions used to extract multiple numbers from a single cell, to convert text to numbers, including cells with multiple numbers
and mixed numbers and text, and multiple date conversions, available in the In–Cell calculator, are all available as cell formulas.
These formulas are available as soon as the In–Cell Add–In is loaded. The following two images show the formulas in the first picture
and the output obtained by the formulas in the second picture:
You can see that in each formula the first parameter is the address of the cell containing the information to be converted or extracted.
The second parameter is the number or date to be shown, when there is more than one number or date in the cell.
For the date extract there are four further parameters. The first is a number from 1 to 8 specifying which part of the date information to display.
The values are as follows:
The second is a three character code identifying the date order, e.g., "DMY" for day, month, year order. "MDY" is another common arrangement
but any order of dates can be specified.
The third is the century range and the fourth is the Century. These last two work together when there is a two-digit year. If the year is in 4
digit format, it will be converted and displayed as that year, irrespective of the two 'century' values.
If the year is two digits only, then the century range specifies how much of the period from 00 to 99 is in the Century specified and how
much is in the previous Century. For example using '50, 19' will result in years from 00 to 50 being converted to 1900 to 1950 and years
from 51 to 99 being converted to 1851 to 1899. If '99, 19' is used then all two digit years will be in the range 1900 to 1999. You select these
parameters depending on the source data you are converting. The conversion is much more flexible than Excel's handling of two digit
years, as well as the fact that the conversion can handle a wider range of date formats.
The date number is the number used by In–Cell and Excell to store date information. The numbering is the same in In–Cell and Excell
for dates from 01 March 1900 onwards. For dates before 01 March 1900, see what In–Cell does with
dates.
The Conversion code is a text string that In–Cell creates during conversion. It gives an insight into how the date information was handled.
A value "D1M1Y1" means that the conversion was absolute, i.e., In–Cell saw no ambiguity in the date information. At the other end of the
spectrum a value of "UU3" means that all three date components were unplaceable, such as '09/10/11', where each number could be a day, month or
year. A date of '31/10/30' results in "DY2M1", as 31 and 30 could be day or year values, and 10, although it could be day, month or year, has to be month,
as the other two values have to fill the day and year slots, although which is which is not clear. The actual conversion in this case was
dependant on the "DMY" parameter in the formula.
The MultiNum function handles multiple numbers in a single cell. The way it handles commas depends on your user setting in the
Options and Customization dialog box.
The TextToNum function converts numbers written as text, as well as finding numeric digits, so that all numbers in mixed text, such as 'two at
$15.50' are extracted, in this case 2 and 15.5.
Questions or comments can be sent to Feedback