Hello!
I created a simple program which is supposed to search 'query' in the first column of all .xls or .xlsx files in a given 'directory' and then return each row (6 cells) where 1st cell is matching 'query', grouping them as a table.
This program actually works as it was supposed, but it's VERY slow to be usable. The script is working about 400 seconds and more, also it takes a lot of memory - up to several hundreds of Mb for some large files. In the 'directory' I have 38 .xls and .xlsx files each up to 5 Mb of size (~ up to 120 000 rows).
So my question: is it possible to speed up somehow the program (maybe the algorythm in general or the way I use PHPExcel) to consume some reasonable amount of time (several seconds)? Would be ideal to speed it up to 1 second or less. Thank you in advance!
My code:
My software:
PHPExcel 1.8.0
PHP 5.6.0
Apache 2.4.10
Windows 7
I created a simple program which is supposed to search 'query' in the first column of all .xls or .xlsx files in a given 'directory' and then return each row (6 cells) where 1st cell is matching 'query', grouping them as a table.
This program actually works as it was supposed, but it's VERY slow to be usable. The script is working about 400 seconds and more, also it takes a lot of memory - up to several hundreds of Mb for some large files. In the 'directory' I have 38 .xls and .xlsx files each up to 5 Mb of size (~ up to 120 000 rows).
So my question: is it possible to speed up somehow the program (maybe the algorythm in general or the way I use PHPExcel) to consume some reasonable amount of time (several seconds)? Would be ideal to speed it up to 1 second or less. Thank you in advance!
My code:
<?php /** Error reporting */ error_reporting(E_ALL); ini_set('display_errors', TRUE); ini_set('display_startup_errors', TRUE); define('EOL',(PHP_SAPI == 'cli') ? PHP_EOL : '<br />'); /** Include PHPExcel */require_once dirname(__FILE__) . '/../../PHPExcel/Classes/PHPExcel.php'; date_default_timezone_set('UTC'); if(isset($_POST['query']) && trim($_POST['query']) != '') { $dir = dirname(__FILE__); $files = scandir($dir); $found = False; print"<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"><title>Test search</title>"; print"<table border=1>\n"; print"<tr>\n"; print"<td>Col1</td><td>Col2</td><td>Col3</td><td>Col4</td><td>Col5</td><td>Col6</td>"; print"</tr>\n"; for($i=0;$i<count($files);$i++){ $file=$files[$i]; $is_xls = stripos($file, ".xls"); $is_xlsx = stripos($file, ".xlsx"); if($is_xls === False && $is_xlsx === False) continue; $pathToFile = $dir."/".$file; $inputFileType = PHPExcel_IOFactory::identify($pathToFile); $objReader = PHPExcel_IOFactory::createReader($inputFileType); $objReader->setReadDataOnly(true); $objPHPExcel = $objReader->load($pathToFile); $objWorksheet = $objPHPExcel->getActiveSheet(); // Get the highest row and column numbers referenced in the worksheet $highestRow = $objWorksheet->getHighestRow(); $highestColumn = $objWorksheet->getHighestColumn(); $highestColumnIndex = PHPExcel_Cell::columnIndexFromString($highestColumn); $query = str_replace('-', '', preg_replace('/\s+/', '', strtolower(trim($_POST['query'])))); for($row=1;$row<$highestRow;$row++){ $firstColumn = $objWorksheet->getCellByColumnAndRow(0, $row)->getValue(); $name = str_replace('-', '', preg_replace('/\s+/', '', strtolower(trim($firstColumn)))); $pos = strpos($name, $query); if( $pos == True || $pos === 0){ $found=True; print"<tr>\n"; print"<td>".$objWorksheet->getCellByColumnAndRow(0, $row)->getValue()."</td> <td>".$objWorksheet->getCellByColumnAndRow(1, $row)->getValue()."</td> <td>".$objWorksheet->getCellByColumnAndRow(2, $row)->getValue()."</td> <td>".$objWorksheet->getCellByColumnAndRow(3, $row)->getValue()."</td> <td>".$objWorksheet->getCellByColumnAndRow(4, $row)->getValue()."</td> <td>".$objWorksheet->getCellByColumnAndRow(5, $row)->getValue()."</td>\n"; print"</tr>\n"; } } $objPHPExcel->disconnectWorksheets(); unset($objReader); unset($objPHPExcel); unset($objWorksheet); } if(!$found){ print"<tr>\n"; print"<td>Nothing found.</td>\n"; print"</tr>\n"; } print"</table>\n"; } else# If "search" parameter doesn't exist, show an error and exit { header("HTTP/1.1 500 Internal Server Error"); print"<h1>500 Internal Server Error</h1>". "Request is not set"; exit; }
PHPExcel 1.8.0
PHP 5.6.0
Apache 2.4.10
Windows 7