
A semantic-based approach for handling incomplete and inaccurate provenance in reservoir engineering. (English) Zbl 1250.68255

Summary: Provenance is becoming an important issue as a reliable estimator of data quality. However, provenance collection mechanisms in the reservoir engineering domain often result in incomplete provenance information. In this paper, we address the problem of predicting missing provenance information in reservoir engineering. Based on the observation that data items with specific semantic “connections” may share the same provenance, our approach annotates data items with domain entities defined in a domain ontology, and represents these “connections” as sequences of relationships (also known as semantic associations) in the ontology graph. By analyzing annotated historical datasets with complete provenance information, we capture semantic associations that may imply identical provenance. A statistical analysis is applied to assign probability values to the discovered associations, which indicate the confidence of each association when it is used for future provenance prediction.
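The summary does not spell out the statistical analysis, so the following is only a minimal sketch of one plausible reading: the confidence of a semantic association is estimated as the fraction of historical data-item pairs linked by that association that share the same provenance. The function name `association_confidence`, the tuple layout, and the sample relationship and provenance names are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def association_confidence(pairs):
    """Estimate a confidence value per semantic association.

    `pairs` is an iterable of (association, provenance_a, provenance_b),
    where `association` is a tuple of relationship names (an assumed
    encoding of a path in the ontology graph) linking two data items
    whose provenance is fully known.
    """
    total = defaultdict(int)  # pairs observed per association
    same = defaultdict(int)   # pairs whose provenance is identical
    for assoc, prov_a, prov_b in pairs:
        total[assoc] += 1
        if prov_a == prov_b:
            same[assoc] += 1
    # Relative frequency of "same provenance" among linked pairs.
    return {assoc: same[assoc] / total[assoc] for assoc in total}


# Hypothetical historical data with complete provenance.
history = [
    (("producedBy", "locatedIn"), "simulatorA", "simulatorA"),
    (("producedBy", "locatedIn"), "simulatorA", "simulatorB"),
    (("measuredAt",), "fieldLog", "fieldLog"),
    (("measuredAt",), "fieldLog", "fieldLog"),
]
confidence = association_confidence(history)
```

In this toy data the single-step association always coincides with identical provenance, while the two-step association does so only half of the time, so it would carry less weight in a later prediction.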
We develop a voting algorithm which utilizes the semantic associations and their confidence measures to predict the missing provenance information. Because the existing provenance information can be incorrect due to errors during the manual provenance annotation procedure, as an extension of the voting algorithm, we further design an algorithm for prediction which takes into account both the confidence measures of semantic associations and the accuracy of the existing provenance. A probability value is calculated as the trust of each prediction result. We develop the ProPSA (Provenance Prediction based on Semantic Associations) system which uses our proposed approaches to handle incomplete and inaccurate provenance information in reservoir engineering. Our evaluation shows that the average precision of our approach is above 85% when one-third of the provenance information is missing.
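As an illustration of the voting idea described above, here is a hedged sketch, not the ProPSA implementation: each data item connected to the target by a semantic association votes for its own provenance, weighted by the association's confidence multiplied by an assumed accuracy of that item's existing provenance. The function `predict_provenance`, the multiplicative weighting, and the normalised score used as a "trust" value are all assumptions, since the summary does not give the actual formulas.

```python
from collections import defaultdict


def predict_provenance(neighbors):
    """Confidence-weighted vote over candidate provenance values.

    `neighbors` is a list of (provenance, confidence, accuracy):
      provenance - provenance recorded for a connected data item
      confidence - confidence of the connecting semantic association
      accuracy   - assumed probability that the recorded provenance is correct
    Returns (predicted provenance, trust), where trust is the winner's
    share of the total weighted vote (an illustrative choice).
    """
    scores = defaultdict(float)
    for provenance, confidence, accuracy in neighbors:
        scores[provenance] += confidence * accuracy
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # no usable evidence
    best = max(scores, key=scores.get)
    return best, scores[best] / total


# Hypothetical neighbours of a data item with missing provenance.
neighbors = [
    ("simulatorA", 0.9, 1.0),
    ("simulatorA", 0.6, 0.5),
    ("fieldLog", 0.8, 0.5),
]
best, trust = predict_provenance(neighbors)
```

Setting every accuracy to 1.0 reduces this sketch to a plain confidence-weighted vote, which corresponds to the case where the existing provenance is assumed to be error-free.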

MSC:

68T30 Knowledge representation
68P20 Information storage and retrieval of data
