Why is the analysis of the text by regular expression interrupted at the end of the line?



  • I have this code:

    import re
    import urllib.request
    

    def main():
    pattern = re.compile(r"(?P<Block><div id="entryContent">.*)")
    data = urllib.request.urlopen("http://www.oxfordlearnersdictionaries.com/"
    "definition/english/hello").read().decode()
    result = pattern.search(data)
    print(result.group("Block"))

    main()
    

    Result:

    <div id="entryContent"><div xmlns="http://www.w3.org/1999/xhtml" class="entry" sk="hello11" id="hello_1"><ol class="h-g" id="hello_1__1"><div class="top-container"><div class="top-g" id="hello_1__2">

    In my understanding, I was supposed to get the text from the RP to the end. Specially put ".*." But somehow, the analysis of the text is interrupted almost immediately and I only get a small part. Why would it be there?


  • QA Engineer

    Because the metashimwall. . default doesn't match \n
    Use regular flag expression DOTALL

    https://docs.python.org/2/library/re.html#module-contents


    For PCRE &apos; s regular expressions, this flag is recorded as follows:

    /your_regex/s
    



Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2