It parses a PDF file into an array of document objects, which is then processed further to get what we need. There is an interesting library called smalot/pdfparser. It works perfectly for the majority of these, but it seems to just time out and stop working for certain