php截斷帶html字符串文章內容的方法

2024-05-04 21:57:05

字體：大中小

來源：轉載

供稿：網友

文章截斷使用主要是在列表頁面時我沒有寫描述這樣只能在文章中截取字符串了，但使用php 自帶函數會導致div未結束，從而頁面混亂了，那么要如何解決此問題呢？

博主寫好一篇文章，博客后臺一般會在搜索頁面或者列表頁面給出文章標題和截斷了的的文章部分作為進一步閱讀的入口。

Function: mb_substr( $str, $start, $length, $encoding )

$str，需要截斷的字符串

$start，截斷開始處

$length，長度（注意，這個跟mb_strimwidth不同，1就代表一個中文字符）

$encoding，編碼，我設為 utf-8

例,截斷文章標題,控制在15個文字,代碼如下:

<?php echo mb_substr('m.survivalescaperooms.com原創', 0, 15,"utf-8"); ?>

這樣對于純文本沒問題,但是我的是中間有html標簽的于是問題來了,怎樣截斷一篇文章,注意,這篇文章不僅僅是普通的字符串文本,而是包含了各種格式化標簽和樣式內容的文本,如果處理不當,這些閉合標簽無法正常關閉,從而破壞整個文檔流。

如果單純是純文本,下面這個函數差不多是夠用的,代碼如下:

<?php

    /**

     * 字符串截取，支持中文和其他編碼

     *

     * @param string $str 需要轉換的字符串

     * @param string $start 開始位置

     * @param string $length 截取長度

     * @param string $charset 編碼格式

     * @param string $suffix 截斷字符串后綴

     * @return string

     */

    function substr_ext($str, $start=0, $length, $charset="utf-8", $suffix="")

    {

        if(function_exists("mb_substr")){

             return mb_substr($str, $start, $length, $charset).$suffix;

    }

        elseif(function_exists('iconv_substr')){

             return iconv_substr($str,$start,$length,$charset).$suffix;

        }

        $re['utf-8']  = "/[x01-x7f]|[xc2-xdf][x80-xbf]|[xe0-xef][x80-xbf]{2}|[xf0-xff][x80-xbf]{3}/";

        $re['gb2312'] = "/[x01-x7f]|[xb0-xf7][xa0-xfe]/";

        $re['gbk']    = "/[x01-x7f]|[x81-xfe][x40-xfe]/";

        $re['big5']   = "/[x01-x7f]|[x81-xfe]([x40-x7e]|xa1-xfe])/";

        preg_match_all($re[$charset], $str, $match);

        $slice = join("",array_slice($match[0], $start, $length));

        return $slice.$suffix;

    }

但是,如果需要截斷是網頁中的某部分格式化文本,上面的函數就不夠用了,它不具備處理格式化標簽的能力。

這時,需要一個新函數,它應該是以上函數的升級加強版,它必須有能力正確的處理標簽,下面找到一個

strip_tags() 函數剝去 HTML、XML 以及 PHP 的標簽。

例子1,代碼如下:

<?php

echo strip_tags("Hello <b>world!</b>");

?>

輸出:Hello world!

這樣就好做了我們只要在上面基礎上如下操作,代碼如下:

<?php

$a = strip_tags("Hello <b>world!</b>");

substr_ext( $a,10) ;

//但是發現html不見了這個也不是什么好的解決辦法了。

?>

接著google 發現cns寫了一個支持html截取字符串的函數,代碼如下:

/**

* 獲取字符在字符串中第N次出現的位置

* @param string $text 字符串

* @param string $key 字符

* @param int $int N

* @return int

*/

function strpos_int($text, $key, $int)

{

    $keylen = strlen($key);

    global $textlen;

    if (!$textlen)

        $textlen = strlen($text);

    static $textpos = 0;

    $pos = strpos($text, $key);

    $int--;

    if ($pos)

    {

        if ($int == 0)

            $textpos+=$pos;

        else

            $textpos+=$pos + $keylen;

    }

    else

    {

        $int = 0;

        $textpos = $textlen;

    }

    if ($int > 0)

    {

        strpos_int(substr($text, $pos + $keylen), $key, $int);

    }

    return $textpos;

}

/**

* 截取HTML

* @param string $string  HTML 字符串

* @param int $length 截取的長度

* @param string $dot

* @param string $append

* @return string

*/

function cuthtml($string, $length, $dot = ' ...', $append = "")

{

    $str = strip_tags($string);//先過濾標簽

    $new_str = iconv_substr($str, 0, $length, 'utf-8');

    $last = iconv_substr($new_str, -1, 1, 'utf-8');

    $sc = substr_count($new_str, $last);

    $position = strpos_int($string, $last, $sc); //獲取截取真實的長度

    if (function_exists('tidy_parse_string'))//服務器開啟tidy的話直接用函數不全html代碼即可

    {

        $options = array("show-body-only" => true);

        return tidy_parse_string(mb_substr($string, 0, $position) . $dot . $append, $options, 'UTF8');

    } else //沒有開啟tidy

    {

        if (strlen($string) <= $position)

        {

            return $string;

        }

        $pre = chr(1);

        $end = chr(1);

        $string = str_replace(array('&', '"', '<', '>'), array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), $string);

        $strcut = '';

        $n = $tn = $noc = 0;

        while ($n < strlen($string))

        {

            $t = ord($string[$n]);

            if ($t == 9 || $t == 10 || (32 <= $t && $t <= 126))

            {

                $tn = 1;

                $n++;

                $noc++;

            } elseif (194 <= $t && $t <= 223)

            {

                $tn = 2;

                $n += 2;

                $noc += 2;

            } elseif (224 <= $t && $t <= 239)

            {

                $tn = 3;

                $n += 3;

                $noc += 2;

            } elseif (240 <= $t && $t <= 247)

            {

                $tn = 4;

                $n += 4;

                $noc += 2;

            } elseif (248 <= $t && $t <= 251)

            {

                $tn = 5;

                $n += 5;

                $noc += 2;

            } elseif ($t == 252 || $t == 253)

            {

                $tn = 6;

                $n += 6;

                $noc += 2;

            } else

            {

                $n++;

            }

            if ($noc >= $position)

            {

                break;

            }

        }

        if ($noc > $position)

        {

            $n -= $tn;

        }

        $strcut = substr($string, 0, $n);

        $strcut = str_replace(array($pre . '&' . $end, $pre . '"' . $end, $pre . '<' . $end, $pre . '>' . $end), array('&', '"', '<', '>'), $strcut);

        $pos = strrpos($strcut, chr(1));

        if ($pos !== false)

        {

            $strcut = substr($strcut, 0, $pos);

        }

        return $strcut . $dot . $append;

    }

}