Get a string between two tags


PHP Get a string between two substrings

1 The regex method

It will find the first possible match (non greedy + case insensitive)

<?php

function regex_get_string_between($string, $start, $end)
{
    // preg_quote escapes regex metacharacters (and the "/" delimiter) in the tags
    $pattern = "/" . preg_quote($start, "/") . "(.*?)" . preg_quote($end, "/") . "/i";

    preg_match($pattern, $string, $matches);
    if ( ! empty($matches[1]))
        return $matches[1];

    return false;
}

?>


Slow: in the benchmarks (section 9) only the case insensitive string method is slower.
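To illustrate what "non greedy" means here and why the tags are passed through preg_quote, the following sketch builds the same pattern inline. The sample string and the variable names are made up for this example.

```php
<?php
// Hypothetical sample input: two bracketed values in one string.
$string = "Value: [first] and [second]";
$start  = "[";
$end    = "]";

// Without preg_quote, "[" and "]" would be read as a character class.
$quotedStart = preg_quote($start, "/");
$quotedEnd   = preg_quote($end, "/");

// Non greedy: (.*?) stops at the first $end after $start.
preg_match("/" . $quotedStart . "(.*?)" . $quotedEnd . "/i", $string, $m);
$lazy = $m[1];

// Greedy: (.*) runs on to the last $end in the string.
preg_match("/" . $quotedStart . "(.*)" . $quotedEnd . "/i", $string, $m);
$greedy = $m[1];

var_dump($lazy);   // string(5) "first"
var_dump($greedy); // string(18) "first] and [second"
```

Note that "." does not match a newline unless the s modifier is added, so this pattern only finds matches that sit on one line.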


2 The array method

It will find the first possible match (non greedy + case sensitive)

<?php

function array_get_string_between($string, $start, $end)
{
    $r = explode($start, $string, 2);
    if (isset($r[1]))
    {
        $a = explode($end, $r[1], 2);
        if ($a[0] !== $r[1]) // if $end is not in $string, $a[0] will be equal to $r[1]
            return $a[0];
    }

    return false;
}

?>


Faster than the regex method


3 The string method from the internet

It will find the first possible match (non greedy + case sensitive)

On the internet you will find the following function:

<?php
// DO NOT USE THIS FUNCTION
function get_string_between($string, $start, $end)
{
    $string = " " . $string;                        // line 1
    $ini = strpos($string, $start);                 // line 2
    if ($ini == 0)                                  // line 3
        return false;                               // line 4
                                                    // line 5
    $ini += strlen($start);                         // line 6
    $len = strpos($string, $end, $ini) - $ini;      // line 7
    return substr($string, $ini, $len);             // line 8
}

?>


The fastest method so far, but can it be improved?
The concatenation in line 1 is not necessary.
The assignment in line 7 is also not necessary.

<?php
// DO NOT USE THIS FUNCTION
function better_get_string_between($string, $start, $end)
{
    $pos = strpos($string, $start, 0);
    if ($pos === false)  // avoid concatenation
        return false;

    $pos += strlen($start);
    return substr($string, $pos, strpos($string, $end, $pos) - $pos);  // no assignment
}

?>


This is 5% faster than the string method from the internet.

But the get_string_between function has a nasty bug (and so has the improved version):

<?php

$string = "I do have start but not the other one";
$start  = "start";
$end    = "end";

?>


The function will find $start at position 11 (position 10 plus the space prepended in line 1)
so it will set $ini to 11
it will add the length of "start", which is 5, to $ini
$ini = 16

$len = strpos($string,$end,$ini) - $ini;
But strpos will return false for finding $end, so it becomes:
$len = false - 16;
$len will be -16
Thus the final statement will read:
substr($string,16,-16);
get_string_between returns " but n"

Not the intended behaviour

But depending on where $start is found and on the length of the string, this can go unnoticed:
For example, if we change the start to $start = "other"
$ini = 29, adding the length of "other" gives $ini = 34
Thus the final statement will read:
substr($string,34,-34);
Now substr will fail and return false (an empty string as of PHP 8.0).
Actually the array method had the same sort of bug: when it could not find $end it returned everything after $start, so I fixed it in the version above.
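The walkthrough above can be reproduced with a short script. This is a sketch for demonstration only: the function is a copy of the internet version under the made-up name buggy_get_string_between, so the snippet runs on its own.

```php
<?php
// Copy of the "internet" function, renamed for this demo. Do not use it.
function buggy_get_string_between($string, $start, $end)
{
    $string = " " . $string;
    $ini = strpos($string, $start);
    if ($ini == 0)
        return false;

    $ini += strlen($start);
    $len = strpos($string, $end, $ini) - $ini; // false - $ini when $end is missing
    return substr($string, $ini, $len);
}

$string = "I do have start but not the other one";

// $end is missing: a negative length cuts 16 characters off the end instead.
$first = buggy_get_string_between($string, "start", "end");
var_dump($first);  // string(6) " but n"

// Here the negative length ends before the start offset.
$second = buggy_get_string_between($string, "other", "end");
var_dump($second); // bool(false) before PHP 8.0, string(0) "" from PHP 8.0
```

The first call is the dangerous one: it returns a plausible-looking string even though $end never occurs.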

Let's fix this

4 The final string method

It will find the first possible match (non greedy + case sensitive)

<?php

function final_get_string_between($string, $start, $end)
{
    if (false === ($pos = strpos($string, $start, 0)))
        return false;                                // $start is not found

    $pos += strlen($start);
    if (false === ($endpos = strpos($string, $end, $pos)))
        return false;                                // $end is not found after $start

    return substr($string, $pos, $endpos - $pos);
}

?>


This works: it now returns false if $start or $end is not found. If $end follows $start immediately ($endpos === $pos) it returns an empty string, so compare the result with === false.
The extra check slows it down a little but it is still faster than the buggy internet method.
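A few calls make the return values concrete. The function is repeated here only so that the snippet runs on its own, and the inputs are made-up examples. Note the last case: when nothing sits between the two tags, the result is an empty string rather than false.

```php
<?php
// Repeated from the section above so this snippet is self-contained.
function final_get_string_between($string, $start, $end)
{
    if (false === ($pos = strpos($string, $start, 0)))
        return false;

    $pos += strlen($start);
    if (false === ($endpos = strpos($string, $end, $pos)))
        return false;

    return substr($string, $pos, $endpos - $pos);
}

$found    = final_get_string_between("<b>bold</b>", "<b>", "</b>");
$noEnd    = final_get_string_between("I do have start but not the other one", "start", "end");
$noStart  = final_get_string_between("nothing here", "[", "]");
$emptyHit = final_get_string_between("[]", "[", "]");

var_dump($found);    // string(4) "bold"
var_dump($noEnd);    // bool(false)
var_dump($noStart);  // bool(false)
var_dump($emptyHit); // string(0) ""
```

Because "" and false are both falsy, a caller that needs to tell "not found" from "found but empty" should test with === false.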


5 The final case insensitive string method

It will find the first possible match (non greedy + case insensitive)

<?php

function insensitive_get_string_between($string, $start, $end)
{
    if (false === ($pos = stripos($string, $start, 0)))
        return false;

    $pos += strlen($start);
    if (false === ($endpos = stripos($string, $end, $pos)))
        return false;

    return substr($string, $pos, $endpos - $pos);
}

?>


The slowest method


6 The no checks method

It will find the first possible match (non greedy + case sensitive)

If you are sure both start and end are in the string you could use:

<?php
// DO NOT USE THIS FUNCTION
function nocheck_get_string_between($string, $start, $end)
{
    $pos = strpos($string, $start, 0) + strlen($start);
    return substr($string, $pos, strpos($string, $end, $pos) - $pos);
}

?>


This is the fastest method so far, but can you ever be sure?


7 The 1 character delimiters method

It will find the first possible match (non greedy)

This function will work if $start and $end are each a single character, for example: $start="<";   $end=">";

<?php

function one_character_get_string_between($string, $start, $end)
{
    return trim(strstr(strstr($string, $start), $end, true), $start . $end);
}

?>


Faster than the no check method.
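One thing to be aware of with this approach: trim removes every leading and trailing occurrence of the delimiter characters, not just one. The sketch below shows the difference on made-up inputs. The function is repeated so the snippet is self-contained.

```php
<?php
// Repeated from the section above so this snippet runs on its own.
function one_character_get_string_between($string, $start, $end)
{
    return trim(strstr(strstr($string, $start), $end, true), $start . $end);
}

// The usual case: one "<" and one ">".
$plain = one_character_get_string_between("a <tag> b", "<", ">");

// A doubled start character is stripped as well, so the result is "tag",
// whereas a strpos/substr based method would return "<tag".
$doubled = one_character_get_string_between("a <<tag> b", "<", ">");

var_dump($plain);   // string(3) "tag"
var_dump($doubled); // string(3) "tag"
```

If the text between the delimiters may itself begin with the start character, one of the strpos based functions is the safer choice.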


8 The single delimiter method

It will find the first possible match (non greedy)

This function will work if $start = $end

<?php

function array_delimiter_get_string_between($string, $delim)
{
    $string = explode($delim, $string, 3);
    return isset($string[1]) ? $string[1] : false;
}

?>


This is by far the fastest method
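A short usage sketch with made-up inputs shows the three possible outcomes. The function is repeated so the snippet runs on its own. Note that a string with only one delimiter still yields a result: everything after that delimiter.

```php
<?php
// Repeated from the section above so this snippet is self-contained.
function array_delimiter_get_string_between($string, $delim)
{
    $string = explode($delim, $string, 3);
    return isset($string[1]) ? $string[1] : false;
}

$both = array_delimiter_get_string_between("say |hello| world", "|");
$none = array_delimiter_get_string_between("no delimiter", "|");
$one  = array_delimiter_get_string_between("only |one", "|");

var_dump($both); // string(5) "hello"
var_dump($none); // bool(false)
var_dump($one);  // string(3) "one"
```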


9 Benchmarks

10000 executions

results in seconds (smaller is better)

seconds           name                    paragraph
0.470390081405640 string case insensitive (5)
0.053134918212891 regex (1)
0.047200202941895 array  (2)
0.030622005462646 internet string method buggy (3)
0.029839992523193 final string with checks on start and end (4)
0.028485774993896 no checks string method buggy (6)
0.027677059173584 string with two 1 character delimiters (7)
0.023669004440308 array single delimiter $end=$start (8)
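The article does not show how the timings were taken. A minimal sketch of such a measurement, assuming a simple loop around microtime(true), might look like the following. The benchmark helper, the sample string and the two inline candidates are hypothetical and only illustrate the idea. Absolute numbers will differ per machine and PHP version, and the closure call adds its own overhead.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return the elapsed seconds.
function benchmark(callable $fn, $iterations = 10000)
{
    $begin = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $begin;
}

$string = "The quick [brown] fox";

// Two inline candidates; both tags are known to be present in $string.
$candidates = array(
    'strpos/substr' => function () use ($string) {
        $pos = strpos($string, "[") + 1;
        return substr($string, $pos, strpos($string, "]", $pos) - $pos);
    },
    'explode' => function () use ($string) {
        $r = explode("[", $string, 2);
        $a = explode("]", $r[1], 2);
        return $a[0];
    },
);

$results = array();
foreach ($candidates as $name => $fn) {
    $results[$name] = benchmark($fn);
}

arsort($results); // slowest first, like the table above
foreach ($results as $name => $seconds) {
    printf("%.15f %s\n", $seconds, $name);
}
```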


method       function calls  language constr.  arithmetic op.  assignments  logical checks  concatenations
array        3               1                 0               2            2               0
internet     4               0                 2               2            1               1
final        4               0                 2               2            2               0
no checks    4               0                 2               1            0               0
1 character  3               0                 0               0            0               1
1 delimiter  1               1                 0               1            1               0


The table also explains a little about the benchmarks:
function calls are the most expensive
strpos is faster than strstr
explode and preg_match are slow
language constructs like isset or empty are faster than functions
arithmetic operations are fast
concatenations are slow
logical checks are slow

10 Conclusion

If $end = $start, use array_delimiter_get_string_between.
If $end and $start are each a single character, use one_character_get_string_between.
If you need case insensitivity, use the regex method; in the benchmarks it is much faster than the stripos based string method. Removing the i from the end of the pattern makes it case sensitive, like the array method.
In all other cases use final_get_string_between.
When memory is an issue (dealing with large strings), prefer the string based functions.
And if you want final_get_string_between to be greedy, see below.

<?php

function greedy_get_string_between($string, $start, $end)
{
    $pos = strpos($string, $start, 0);
    if ($pos === false)
        return false;                                // $start is not found

    $pos += strlen($start);
    $endpos = strrpos($string, $end, $pos);          // last $end after $start
    if ($endpos === false)
        return false;                                // $end is not found after $start

    return substr($string, $pos, $endpos - $pos);
}

/* this combines the functions selecting the best method depending on the start and end */
function get_between($string, $start, $end, $insensitive = false)
{
    if (empty($string) || empty($start) || empty($end))
        return false;

    if ($start === $end)
    {
        // single delimiter method
        $string = explode($start, $string, 3);
        return isset($string[1]) ? $string[1] : false;
    }
    else if (strlen($start) === 1 && strlen($end) === 1)
    {
        // 1 character delimiters method
        return trim(strstr(strstr($string, $start), $end, true), $start . $end);
    }
    else if ( ! $insensitive)
    {
        // final string method
        if (false === ($pos = strpos($string, $start, 0)))
            return false;                            // $start is not found

        $pos += strlen($start);
        if (false === ($endpos = strpos($string, $end, $pos)))
            return false;                            // $end is not found after $start

        return substr($string, $pos, $endpos - $pos);
    }
    else
    {
        // final case insensitive string method
        if (false === ($pos = stripos($string, $start, 0)))
            return false;

        $pos += strlen($start);
        if (false === ($endpos = stripos($string, $end, $pos)))
            return false;

        return substr($string, $pos, $endpos - $pos);
    }
}
?>
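The only difference between the greedy and the non greedy string method is which occurrence of $end is used: strpos finds the first one after $start, strrpos the last one. A minimal inline sketch on a made-up string, with no error checks because both tags are known to be present:

```php
<?php
// Hypothetical sample input with several start/end pairs.
$string = "[a][b][c]";
$start  = "[";
$end    = "]";

// Position just after the first $start.
$pos = strpos($string, $start) + strlen($start);

// Non greedy: stop at the first $end after $pos.
$lazy = substr($string, $pos, strpos($string, $end, $pos) - $pos);

// Greedy: stop at the last $end after $pos.
$greedy = substr($string, $pos, strrpos($string, $end, $pos) - $pos);

var_dump($lazy);   // string(1) "a"
var_dump($greedy); // string(7) "a][b][c"
```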